Overview
Messages referenced by cryptographic hash (content address) rather than location, enabling deduplication, verifiable delivery, and IPFS-style distributed storage.
Problem Statement
Traditional email is location-addressed:
- Messages are stored on specific servers
- Forwarding creates duplicates (wasteful)
- No verifiable proof of content
- Centralized storage (single point of failure)
- No content-based deduplication
Vision
Traditional: Message stored at imap://msgs.global/INBOX/12345
Content-Addressed: Message is hash sha256:a3f4b2c... (stored anywhere, retrieved by hash)
Architecture
1. Content Addressing
Message Hash: SHA-256 of canonical message
IPFS CID: QmYwAPJzv5CZsnA... (multihash format)
References:
- Parent: sha256:parent-hash
- Attachments: [sha256:att1-hash, sha256:att2-hash]
- Thread: sha256:thread-root-hash
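The page does not define the "canonical message" that gets hashed. The sketch below assumes one simple canonical form: JSON with lower-cased header names, sorted keys, compact separators, and UTF-8 encoding. `canonicalize` and `message_hash` are hypothetical names, not an existing msgs.global API.

```python
import hashlib
import json


def canonicalize(message: dict) -> bytes:
    """Hypothetical canonical form: lower-cased header names, sorted keys,
    compact separators, UTF-8. Logically equal messages yield equal bytes."""
    normalized = dict(message)
    normalized["headers"] = {k.lower(): v for k, v in message["headers"].items()}
    return json.dumps(normalized, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def message_hash(message: dict) -> str:
    """Content address in the page's 'sha256:<hex>' notation."""
    return "sha256:" + hashlib.sha256(canonicalize(message)).hexdigest()


a = {"headers": {"From": "alice@msgs.global", "Subject": "Project Update"},
     "body": "Hi Bob"}
b = {"body": "Hi Bob",
     "headers": {"subject": "Project Update", "from": "alice@msgs.global"}}
print(message_hash(a) == message_hash(b))  # True: same content, same address
```

Parent, thread, and attachment references are then just these address strings. Python's `sort_keys` is not a full canonicalization standard (number formatting, for instance, is left to `json`); a scheme such as RFC 8785 (JCS) would be a sturdier basis.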
2. Storage Layer
                    ┌──────────────┐
                    │   IPFS/S3    │
                    │ (immutable)  │
                    └──────┬───────┘
                           │
                    ┌──────▼───────┐
                    │  Hash Index  │
                    │ (PostgreSQL) │
                    └──────┬───────┘
                           │
     ┌─────────────────────┼─────────────────────┐
     │                     │                     │
 ┌───▼────┐          ┌─────▼─────┐         ┌─────▼─────┐
 │ User A │          │  User B   │         │  User C   │
 │  Refs  │          │   Refs    │         │   Refs    │
 └────────┘          └───────────┘         └───────────┘
[Only references are stored per user; the content itself is shared]
3. Message Format
{
"hash": "sha256:a3f4b2c8...",
"ipfs_cid": "QmYwAPJzv5CZsnA...",
"headers": {
"from": "alice@msgs.global",
"to": ["bob@msgs.global"],
"subject": "Project Update",
"date": "2026-03-07T20:00:00Z",
"message-id": "unique-id@msgs.global"
},
"body": {
"type": "text/plain",
"hash": "sha256:body-hash...",
"size": 1234
},
"attachments": [
{
"filename": "report.pdf",
"hash": "sha256:att-hash...",
"size": 524288,
"ipfs_cid": "QmXbZ..."
}
],
"references": {
"in-reply-to": "sha256:parent-hash...",
"thread-root": "sha256:thread-hash..."
}
}
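The example record carries `hash` and `ipfs_cid` inside the object they describe, so those two fields cannot be part of the bytes being hashed. One way to handle this, offered as an assumption: hash the canonical JSON of every other top-level field, with the body and attachments already replaced by hash references. `envelope_hash`, `blob_ref`, and `SELF_FIELDS` are hypothetical names.

```python
import hashlib
import json

# Top-level fields that name the message rather than describe its content
# (assumption based on the format above); they are excluded from the hash.
SELF_FIELDS = {"hash", "ipfs_cid"}


def envelope_hash(envelope: dict) -> str:
    content = {k: v for k, v in envelope.items() if k not in SELF_FIELDS}
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


def blob_ref(data: bytes, **extra) -> dict:
    """Reference a body or attachment by hash instead of embedding it."""
    return {"hash": "sha256:" + hashlib.sha256(data).hexdigest(),
            "size": len(data), **extra}


body = b"Status: on track for Friday."
pdf = b"%PDF-1.7 placeholder bytes"
envelope = {
    "headers": {"from": "alice@msgs.global", "to": ["bob@msgs.global"],
                "subject": "Project Update"},
    "body": blob_ref(body, type="text/plain"),
    "attachments": [blob_ref(pdf, filename="report.pdf")],
    "references": {},
}
envelope["hash"] = envelope_hash(envelope)
# Filling in "hash" afterwards does not change what envelope_hash computes.
```

Nested `hash` fields (body, attachments) are deliberately kept: they are content references, so changing an attachment changes the envelope's address.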
Key Benefits
1. Deduplication
Alice sends a message with a 2 MB attachment to Bob and Carol
Traditional: 4 MB stored (2 MB × 2 recipients)
Content-Addressed: 2 MB stored (1 copy, 2 refs)
Alice forwards a message from Bob to Carol
Traditional: message stored twice
Content-Addressed: 1 ref added
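The accounting above can be modeled with a toy in-memory store: each distinct blob is kept once under its SHA-256, and every delivery only adds a reference. `DedupStore` is a hypothetical illustration, not the msgs.global storage service.

```python
import hashlib
from collections import defaultdict


class DedupStore:
    """In-memory sketch: content stored once per hash, references per user."""

    def __init__(self):
        self.blobs = {}                    # hash -> bytes (shared content)
        self.ref_count = defaultdict(int)  # hash -> number of references
        self.refs = defaultdict(list)      # user -> [hash, ...]

    def put(self, user: str, data: bytes) -> str:
        h = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(h, data)     # a second copy is never stored
        self.ref_count[h] += 1
        self.refs[user].append(h)
        return h

    def bytes_stored(self) -> int:
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
attachment = b"\x00" * (2 * 1024 * 1024)   # stand-in 2 MB attachment
for recipient in ("bob", "carol"):
    store.put(recipient, attachment)

h = store.refs["bob"][0]
print(store.bytes_stored(), store.ref_count[h])  # 2097152 2
```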
2. Verifiable Delivery
# Bob's server signs a delivery receipt, which Alice keeps as proof of delivery
message_hash = 'sha256:a3f4b2c...'
timestamp = '2026-03-07T20:00:00Z'
delivery_proof = {
    'message_hash': message_hash,
    'recipient': 'bob@msgs.global',
    'timestamp': timestamp,
    'signature': sign(bob_private_key, message_hash + timestamp),  # recipient's private key
}
# Anyone with Bob's public key can verify he received this exact message
verify(delivery_proof, bob_public_key)  # Cryptographic proof, non-repudiable
3. Distributed Storage
# Add the message to the local IPFS node
ipfs add message.eml # -> added QmYwAPJzv5CZsnA... message.eml
# Anyone with the CID can retrieve it from any node that holds it
ipfs cat QmYwAPJzv5CZsnA...
# msgs.global nodes pin important messages so they stay available
ipfs pin add QmYwAPJzv5CZsnA...
# Users can pin their own messages elsewhere
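Because a message is named by its hash, a reader does not need to trust the node that serves it: it can re-hash the bytes and reject anything that does not match. The sketch below shows that check over a list of interchangeable sources. Note that an IPFS CID produced by `ipfs add` is not the plain SHA-256 of the file (it hashes the encoded DAG node), so the sketch uses the page's `sha256:` addresses; IPFS performs its own block verification against the CID. `fetch_verified` and the node callables are hypothetical.

```python
import hashlib
from typing import Callable, Iterable, Optional

# A "node" is any callable that may return bytes for an address, e.g. a local
# cache, an S3 bucket, or an IPFS gateway (hypothetical adapters).
Fetcher = Callable[[str], Optional[bytes]]


def fetch_verified(address: str, nodes: Iterable[Fetcher]) -> Optional[bytes]:
    """Try each node in turn; accept only bytes whose SHA-256 matches."""
    algo, _, expected = address.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported address: {address}")
    for node in nodes:
        data = node(address)
        if data is not None and hashlib.sha256(data).hexdigest() == expected:
            return data
    return None  # no node returned authentic content


message = b"From: alice@msgs.global\r\nSubject: Project Update\r\n\r\nHi Bob"
addr = "sha256:" + hashlib.sha256(message).hexdigest()


def evil_node(address: str) -> Optional[bytes]:
    return b"tampered copy"  # serves the wrong bytes for every address


good_node = {addr: message}.get  # returns None for unknown addresses
```

With `[evil_node, good_node]`, the tampered copy is rejected and the authentic bytes come from the second node.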
4. Thread Integrity
# Entire thread is a Merkle tree
thread_root = "sha256:thread-hash"
# Verify entire conversation integrity
verify_thread_integrity(thread_root) # -> True/False
# Detects whether any message in the thread was tampered with
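The page does not say how the thread hash is built. One possible construction, sketched here as an assumption: a binary Merkle tree over the message hashes in thread order, with distinct leaf and inner-node prefixes (an idea borrowed from Certificate Transparency, RFC 6962). Unlike a pointer to the first message, such a root commits to every message, so it has to be recomputed whenever a reply is added. `merkle_root` and `verify_thread` are hypothetical names.

```python
import hashlib
from typing import List


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(message_hashes: List[str]) -> str:
    """Binary Merkle root over hex message hashes, in thread order.

    Leaves are prefixed with 0x00 and inner nodes with 0x01 so a leaf can
    never be mistaken for an inner node. An unpaired node is promoted
    unchanged to the next level.
    """
    if not message_hashes:
        raise ValueError("empty thread")
    level = [_h(b"\x00" + bytes.fromhex(h)) for h in message_hashes]
    while len(level) > 1:
        nxt = [_h(b"\x01" + level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return "sha256:" + level[0].hex()


def verify_thread(message_hashes: List[str], expected_root: str) -> bool:
    return merkle_root(message_hashes) == expected_root


msgs = [b"original", b"reply 1", b"reply 2"]
hashes = [hashlib.sha256(m).hexdigest() for m in msgs]
root = merkle_root(hashes)
```

Editing, reordering, or dropping any message changes the recomputed root, so `verify_thread` returns `False` for such a thread.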
Integration with msgs.global
Database Schema
-- Content-addressed messages
CREATE TABLE ca_messages (
hash VARCHAR(64) PRIMARY KEY, -- hex SHA-256 digest, without the "sha256:" prefix
ipfs_cid VARCHAR(100),
message_data JSONB,
content BYTEA, -- Canonical representation
stored_at TIMESTAMP DEFAULT NOW(),
ref_count INTEGER DEFAULT 0
);
-- User message references
CREATE TABLE user_message_refs (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
message_hash VARCHAR(64) REFERENCES ca_messages(hash),
folder VARCHAR(50), -- INBOX, Sent, etc.
flags TEXT[], -- SEEN, FLAGGED, etc.
received_at TIMESTAMP DEFAULT NOW()
);
-- Attachment deduplication
CREATE TABLE ca_attachments (
hash VARCHAR(64) PRIMARY KEY,
ipfs_cid VARCHAR(100),
filename VARCHAR(255),
mime_type VARCHAR(100),
size INTEGER,
content BYTEA,
ref_count INTEGER DEFAULT 0
);
CREATE INDEX idx_user_refs_folder ON user_message_refs(user_id, folder);
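The `ref_count` column only stays correct if it changes in the same transaction as the reference rows. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL, with a reduced subset of the columns above; `add_ref` and `remove_ref` are hypothetical helpers, and `INSERT OR IGNORE` corresponds roughly to PostgreSQL's `ON CONFLICT DO NOTHING`.

```python
import hashlib
import sqlite3

# SQLite stand-in for the PostgreSQL schema above (subset of columns).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE ca_messages (
    hash TEXT PRIMARY KEY,
    content BLOB NOT NULL,
    ref_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE user_message_refs (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    message_hash TEXT NOT NULL REFERENCES ca_messages(hash),
    folder TEXT NOT NULL
);
""")


def add_ref(user_id: int, folder: str, content: bytes) -> str:
    h = hashlib.sha256(content).hexdigest()
    with db:  # one transaction: content row, ref row, and count move together
        db.execute("INSERT OR IGNORE INTO ca_messages (hash, content) "
                   "VALUES (?, ?)", (h, content))
        db.execute("INSERT INTO user_message_refs (user_id, message_hash, folder) "
                   "VALUES (?, ?, ?)", (user_id, h, folder))
        db.execute("UPDATE ca_messages SET ref_count = ref_count + 1 "
                   "WHERE hash = ?", (h,))
    return h


def remove_ref(user_id: int, message_hash: str) -> None:
    with db:
        cur = db.execute(
            "DELETE FROM user_message_refs WHERE id = (SELECT id FROM "
            "user_message_refs WHERE user_id = ? AND message_hash = ? LIMIT 1)",
            (user_id, message_hash))
        if cur.rowcount:  # only decrement if a reference actually existed
            db.execute("UPDATE ca_messages SET ref_count = ref_count - 1 "
                       "WHERE hash = ?", (message_hash,))
            # Garbage-collect content nobody references any more
            db.execute("DELETE FROM ca_messages WHERE hash = ? AND ref_count = 0",
                       (message_hash,))
```

Delivering the same content to two users leaves one `ca_messages` row with `ref_count = 2`; removing both references deletes the row, while removing a reference that does not exist is a no-op.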
Storage Service
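import hashlib

class ContentAddressedStorage:
    # db, ipfs_client, canonicalize_message and parse_message are assumed
    # helpers; queries use ?-style placeholders as in the original sketch
    def __init__(self, db, ipfs_client=None):
        self.db = db
        self.ipfs_client = ipfs_client  # None disables IPFS

    def store_message(self, message):
        """Store message and return its hex SHA-256 hash"""
        # 1. Reference attachments by hash (att.content is assumed to hold the bytes)
        message.attachment_hashes = [
            hashlib.sha256(att.content).hexdigest() for att in message.attachments
        ]
        # 2. Canonicalize message (attachments appear as hashes, not inline bytes)
        canonical = canonicalize_message(message)
        # 3. Hash content
        message_hash = hashlib.sha256(canonical).hexdigest()
        # 4. Already stored: only add a reference
        if self.exists(message_hash):
            self.increment_ref_count(message_hash)
            return message_hash
        # 5. Store attachments separately (deduplicated by their own hashes)
        for att in message.attachments:
            self.store_attachment(att)
        # 6. Store to IPFS (optional)
        ipfs_cid = self.ipfs_client.add(canonical) if self.ipfs_client else None
        # 7. Store to database; the first copy starts with one reference
        self.db.execute("""
            INSERT INTO ca_messages (hash, ipfs_cid, message_data, content, ref_count)
            VALUES (?, ?, ?, ?, 1)
        """, message_hash, ipfs_cid, message.to_json(), canonical)
        return message_hash

    def get_message(self, message_hash):
        """Retrieve message by hash"""
        # Try local database
        msg = self.db.query("SELECT * FROM ca_messages WHERE hash = ?", message_hash)
        if msg:
            return msg
        # Try IPFS; re-check the hash, since any node may serve the bytes
        ipfs_cid = self.get_ipfs_cid(message_hash)
        if ipfs_cid and self.ipfs_client:
            content = self.ipfs_client.cat(ipfs_cid)
            if hashlib.sha256(content).hexdigest() == message_hash:
                return parse_message(content)
        return None

    def delete_message(self, user_id, message_hash):
        """Delete user reference (not content)"""
        with self.db.transaction():  # assumed API: both statements commit together
            deleted = self.db.execute("""
                DELETE FROM user_message_refs
                WHERE user_id = ? AND message_hash = ?
            """, user_id, message_hash)
            # Decrement ref count by the number of references actually removed
            if deleted.rowcount:
                self.db.execute("""
                    UPDATE ca_messages
                    SET ref_count = ref_count - ?
                    WHERE hash = ?
                """, deleted.rowcount, message_hash)
        # Garbage collect if ref_count = 0 (optional)
        self.gc_if_needed(message_hash)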
IMAP Bridge
# Transparent IMAP interface
# Users see normal IMAP folders, but backed by content-addressed storage
class ContentAddressedIMAP:
    def fetch(self, message_num):
        """Fetch message by IMAP sequence number (1-based)"""
        # Map sequence number -> hash; id breaks ties between equal timestamps
        ref = db.query("""
            SELECT message_hash FROM user_message_refs
            WHERE user_id = ? AND folder = ?
            ORDER BY received_at, id
            LIMIT 1 OFFSET ?
        """, self.user_id, self.folder, message_num - 1)
        if ref is None:
            return None  # no message with that sequence number
        # Retrieve by hash
        return storage.get_message(ref.message_hash)
API Endpoints
import hashlib

@app.route('/api/v1/messages/<message_hash>')
def get_message_by_hash(message_hash):
    """Retrieve message by content address"""
    message = storage.get_message(message_hash)
    if not message:
        return {'error': 'Message not found'}, 404
    return message.to_json()

def compute_message_hash(message_hash):
    """Re-hash the stored message's canonical form; None if it is missing"""
    message = storage.get_message(message_hash)
    if message is None:
        return None
    return hashlib.sha256(canonicalize_message(message)).hexdigest()

@app.route('/api/v1/messages/<message_hash>/verify')
def verify_message_integrity(message_hash):
    """Verify message integrity"""
    computed_hash = compute_message_hash(message_hash)
    if computed_hash is None:
        return {'error': 'Message not found'}, 404
    return {
        'valid': computed_hash == message_hash,
        'claimed_hash': message_hash,
        'computed_hash': computed_hash
    }

@app.route('/api/v1/threads/<thread_hash>/verify')
def verify_thread_integrity(thread_hash):
    """Verify entire thread integrity"""
    # Reconstruct thread from references
    thread = reconstruct_thread(thread_hash)
    # Verify each message via the helper (a view's non-empty dict is always truthy)
    for msg in thread:
        if compute_message_hash(msg.hash) != msg.hash:
            return {'valid': False, 'invalid_message': msg.hash}
    return {'valid': True, 'message_count': len(thread)}
Migration Strategy
Phase 1: Hybrid Storage (6 months)
- New messages stored content-addressed
- Existing messages remain traditional
- IMAP interface unchanged (transparent)
Phase 2: Background Migration (6 months)
- Deduplicate existing messages
- Compute hashes for historical data
- Migrate to content-addressed refs
Phase 3: Full Content-Addressed (12 months)
- All storage content-addressed
- IPFS pinning for distributed backup
- Enable user-controlled storage
Storage Savings Estimate
Current: 100,000 users × 5 GB avg = 500 TB
Deduplication (estimated reduction within each category):
- Attachments: 40% reduction (common files)
- Forwards: 30% reduction
- Thread replies: 20% reduction
Estimated overall savings: ~35% of 500 TB = 175 TB
Cost savings: ~$3,500/month @ $20/TB per month
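The three per-category reductions do not add up directly: the overall figure is each category's reduction weighted by its share of stored bytes, and the page does not state those shares. The sketch below makes that arithmetic explicit. The byte shares are hypothetical, chosen only to show one mix consistent with the ~35% estimate; real values would come from the planned benchmark on production data.

```python
# Hypothetical byte shares per category (not measured; chosen only to show
# one mix that reproduces the ~35% estimate above).
categories = {
    # name: (share of total bytes, reduction within that category)
    "attachments":    (0.65, 0.40),
    "forwards":       (0.20, 0.30),
    "thread_replies": (0.15, 0.20),
}

total_tb = 100_000 * 5 / 1000          # 100,000 users x 5 GB = 500 TB
saving_fraction = sum(share * cut for share, cut in categories.values())
saved_tb = total_tb * saving_fraction
monthly_usd = saved_tb * 20             # $20 per TB per month

print(f"{saving_fraction:.0%} -> {saved_tb:.0f} TB -> ${monthly_usd:,.0f}/month")
```

With a different mix (for example, attachments making up a smaller share of bytes) the same per-category reductions would yield a noticeably lower overall saving.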
Challenges & Solutions
| Challenge | Solution |
|---|---|
| Hash collision | Use SHA-256 (probability negligible) |
| Message mutability | Store canonical form (normalized) |
| IMAP compatibility | Transparent mapping layer |
| Performance | Aggressive caching, local index |
| IPFS pinning cost | Selective pinning, user-pays model |
Related Technologies
- IPFS: InterPlanetary File System (distributed storage)
- Git: Content-addressed version control (inspiration)
- Perkeep: Personal archival system
- Dat Protocol: Distributed data sharing
- Filecoin: Incentivized IPFS storage
Status
🔬 Research & Prototyping Phase
Next Steps
- Prototype content-addressed storage layer
- Benchmark deduplication savings on real data
- Test IPFS integration
- Build Merkle tree thread verification
- IMAP compatibility testing
- Measure performance vs traditional storage