Implementing Immutable WORM Storage for Records + Metadata
SEC Rule 17a-4 and FINRA Rule 4511 require that specified records be preserved in a "non-rewriteable, non-erasable" format (classic WORM, or an equivalent such as an audit-trail method) for prescribed periods. HIPAA requires that records be retrievable and safeguarded, with auditable activity logs (often retained 6+ years).
Core Principles
Immutability
- Once a record and its metadata are written, they cannot be modified or deleted until retain_until passes
- New versions are new immutable records linked by lineage
- No in-place edits allowed
Integrity & Verification
- Hash each stored object
- Optionally chain hashes (blockchain-style)
- Anchor periodic "top hashes" to an external timestamp service
- Prove non-tampering
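A minimal integrity check recomputes the digest of a retrieved object and compares it with the hash recorded at write time (the `sha256:` prefix follows the convention used in the serialized record later in this document; function names are illustrative):

```python
import hashlib


def content_digest(record_bytes: bytes) -> str:
    # Digest in the "sha256:..." format stored alongside each record.
    return 'sha256:' + hashlib.sha256(record_bytes).hexdigest()


def verify_record(record_bytes: bytes, stored_hash: str) -> bool:
    # Recompute the digest of the retrieved bytes and compare it with the
    # hash captured at write time; a mismatch signals tampering or corruption.
    return content_digest(record_bytes) == stored_hash
```

Run this on every read (or on a scheduled sweep) so tampering is detected well before an auditor asks.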
Separation of Duties
- Admins cannot bypass WORM
- Admins cannot silently alter retention rules
- All changes logged
- Logs themselves are WORM-archived
Cloud WORM Storage Options
AWS S3 Object Lock
import boto3

s3 = boto3.client('s3')

# Object Lock must be enabled on the bucket at creation time.
# COMPLIANCE mode: nobody, including the root account, can shorten
# or remove the lock before RetainUntilDate.
s3.put_object(
    Bucket='compliance-archive',
    Key=f'records/{doc_id}/{version}.json',
    Body=content,
    ObjectLockMode='COMPLIANCE',
    ObjectLockRetainUntilDate=retain_until_date,
)
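Before attempting a delete, the lock state can be inspected with `s3.get_object_retention`. A hedged sketch of the decision logic (the function name, dict shape, and `bypass_governance` flag are illustrative, modeled on the `Retention` dict boto3 returns):

```python
from datetime import datetime, timezone


def deletion_allowed(retention: dict, now: datetime,
                     bypass_governance: bool = False) -> bool:
    # `retention` mirrors s3.get_object_retention(...)['Retention'],
    # e.g. {'Mode': 'COMPLIANCE', 'RetainUntilDate': datetime(...)}.
    if retention['RetainUntilDate'] <= now:
        return True
    # GOVERNANCE locks can be lifted early by principals holding
    # s3:BypassGovernanceRetention; COMPLIANCE locks cannot.
    return retention['Mode'] == 'GOVERNANCE' and bypass_governance
```

The asymmetry is the whole point of choosing COMPLIANCE mode for regulated records: no credential in the account can shorten the clock.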
Azure Immutable Blob Storage
from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobServiceClient, ImmutabilityPolicy

blob_service = BlobServiceClient.from_connection_string(conn_str)
blob_client = blob_service.get_blob_client('compliance', blob_name)

# Requires version-level immutability support enabled on the container.
# (Container-wide time-based retention is configured through the
# management plane, e.g. azure-mgmt-storage, not container metadata.)
blob_client.upload_blob(content)
blob_client.set_immutability_policy(
    ImmutabilityPolicy(
        expiry_time=datetime.now(timezone.utc) + timedelta(days=365 * 7),
        policy_mode='Unlocked',  # lock after verification; locked policies cannot be shortened
    )
)
GCP Retention Policy
from google.cloud import storage

client = storage.Client()
bucket = client.bucket('compliance-archive')
bucket.retention_period = 365 * 7 * 24 * 60 * 60  # 7 years in seconds
bucket.patch()
# An unlocked retention policy can still be reduced or removed; locking
# makes the bucket genuinely WORM. Locking is irreversible.
bucket.lock_retention_policy()
Practical Pattern
Write Path
1. Validate metadata
   - Check required fields
   - Verify retention category exists
2. Calculate retention
   - Look up retention_category in policy table
   - Compute retain_until from effective date
3. Serialize record
   {
     "markdown": "# Document Content...",
     "metadata": {
       "doc_id": "uuid",
       "title": "...",
       "retention_category": "HIPAA-6Y",
       "retain_until": "2031-01-15",
       ...
     },
     "content_hash": "sha256:...",
     "created_at": "2025-01-15T10:00:00Z"
   }
4. Write to WORM store
   - Set retention lock >= policy requirement
   - Store content + metadata together
5. Record pointer in PostgreSQL
   INSERT INTO documents (
     doc_id, path, worm_object_id, content_hash, ...
   ) VALUES (...);
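Steps 1-3 above can be sketched as a pure function. The policy table, category names, and calendar-year arithmetic are illustrative assumptions; real policies may compute retention differently:

```python
import hashlib
import json
from datetime import date

# Hypothetical policy table: retention_category -> years to retain.
RETENTION_YEARS = {'HIPAA-6Y': 6, 'SEC17A4-6Y': 6}


def _add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def build_worm_record(doc_id: str, markdown: str, metadata: dict,
                      effective: date) -> dict:
    # Step 1: validate that the retention category exists.
    category = metadata['retention_category']
    if category not in RETENTION_YEARS:
        raise ValueError(f'unknown retention category: {category}')
    # Step 2: compute retain_until from the effective date.
    retain_until = _add_years(effective, RETENTION_YEARS[category])
    # Step 3: serialize record + metadata together.
    record = {
        'markdown': markdown,
        'metadata': {
            **metadata,
            'doc_id': doc_id,
            'retain_until': retain_until.isoformat(),
        },
        'created_at': effective.isoformat(),
    }
    # Hash the canonical serialization so step 4 stores the digest
    # alongside the content and later reads can verify integrity.
    body = json.dumps(record, sort_keys=True).encode()
    record['content_hash'] = 'sha256:' + hashlib.sha256(body).hexdigest()
    return record
```

The returned dict is what step 4 writes to the WORM store, with the object lock set at or beyond `retain_until`.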
Update/Versioning Path
1. Create new WORM object
   - Increment version
   - New metadata with supersedes link
2. Update PostgreSQL
   - New row in document_versions
   - Update documents.current_version
3. Previous versions remain immutable
   - May be required for full retention period
   - Never delete until retain_until has passed
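The versioning step can be sketched as follows; field names are illustrative, and the key property is that the prior version's WORM object is never touched:

```python
def new_version_metadata(current: dict) -> dict:
    # Build metadata for a superseding version: bump the version number
    # and carry a lineage link back to the object being superseded.
    return {
        **current,
        'version': current['version'] + 1,
        'supersedes': current['worm_object_id'],
        'worm_object_id': None,  # assigned once the new WORM object is written
    }
```

After the new object is written, a row goes into document_versions and documents.current_version is updated to point at it.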
Deletion Path
1. Scheduled job identifies eligible documents:
   SELECT doc_id FROM document_metadata
   WHERE retain_until <= CURRENT_DATE
     AND legal_hold = FALSE;
2. Check WORM mode
   - If compliance mode with retention expired: can delete
   - If governance mode: may need admin override
3. Issue delete/expiry request
   s3.delete_object(
     Bucket='compliance-archive',
     Key=f'records/{doc_id}/{version}.json'
   )
4. Log destruction event
   INSERT INTO destruction_log (
     doc_id, destroyed_at, destroyed_by, reason
   ) VALUES (...);
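The eligibility check in step 1 is easy to mirror (and unit-test) in application code; a sketch with illustrative field names:

```python
from datetime import date


def eligible_for_destruction(docs: list, today: date) -> list:
    # Same predicate as the SQL above: retention expired and no legal hold.
    return [d['doc_id'] for d in docs
            if d['retain_until'] <= today and not d['legal_hold']]
```

Keeping the predicate in one tested place matters because an over-eager destruction job is itself a compliance violation.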
Audit Log Storage
Audit logs themselves need WORM protection:
Append-Only Log Table
CREATE TABLE audit_events (
    event_id   BIGSERIAL PRIMARY KEY,
    timestamp  TIMESTAMPTZ NOT NULL DEFAULT now(),
    user_id    TEXT NOT NULL,
    action     TEXT NOT NULL,
    doc_id     UUID,
    old_value  JSONB,
    new_value  JSONB,
    source_ip  INET
);
-- Make the table append-only for application roles. The table owner and
-- superusers can still bypass this, which is one reason the logs are
-- also archived to WORM storage.
REVOKE DELETE, UPDATE ON audit_events FROM PUBLIC;
Periodic Archive
import gzip

# Weekly job: archive audit logs to WORM.
# db.query and to_jsonl() are stand-ins for your data-access layer;
# to_jsonl() is assumed to return bytes.
audit_batch = db.query("""
    SELECT * FROM audit_events
    WHERE timestamp >= %s AND timestamp < %s
""", start, end)

s3.put_object(
    Bucket='audit-archive',
    Key=f'audit/{year}/{week}/events.jsonl.gz',
    Body=gzip.compress(audit_batch.to_jsonl()),
    ObjectLockMode='COMPLIANCE',
    ObjectLockRetainUntilDate=retain_until,
)
Hash Chain
import hashlib
import json

def compute_chain_hash(events, previous_hash):
    # Fold the previous batch's hash and each event (canonically
    # serialized) into one digest, linking batches into a chain.
    h = hashlib.sha256()
    h.update(previous_hash.encode())
    for event in events:
        h.update(json.dumps(event, sort_keys=True).encode())
    return h.hexdigest()

# Store chain hash with batch
batch_metadata = {
    'chain_hash': compute_chain_hash(events, previous_batch_hash),
    'previous_batch': previous_batch_id,
}
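To prove non-tampering, the chain can be re-verified end to end; a sketch, restating compute_chain_hash so the snippet is self-contained (the genesis value is an assumption):

```python
import hashlib
import json


def compute_chain_hash(events, previous_hash):
    h = hashlib.sha256()
    h.update(previous_hash.encode())
    for event in events:
        h.update(json.dumps(event, sort_keys=True).encode())
    return h.hexdigest()


def verify_chain(batches, genesis_hash='0' * 64):
    # batches: [(events, stored_chain_hash), ...] in archive order.
    # Any edit to an archived event changes that batch's hash and,
    # through the links, every later hash in the chain.
    prev = genesis_hash
    for events, stored in batches:
        if compute_chain_hash(events, prev) != stored:
            return False
        prev = stored
    return True
```

Anchoring the latest chain hash to an external timestamp service means even a compromised operator cannot rewrite history without the anchors disagreeing.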
Architecture Diagram
┌─────────────────────────────────────────────────────────────┐
│ Application │
│ Document Create/Update → Validate → Serialize │
└───────────────────────────┬─────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ PostgreSQL │ │ Meilisearch │ │ WORM Store │
│ - Metadata │ │ - Search │ │ - S3 Object │
│ - Pointers │ │ - Facets │ │ Lock │
│ - Audit │ │ │ │ - Retention │
└───────────────┘ └───────────────┘ └───────────────┘