Implementing Immutable WORM Storage for Records + Metadata

SEC Rule 17a-4 and FINRA Rule 4511 require that specified records be preserved in a "non-rewriteable, non-erasable" format (classic WORM, or an equivalent audit-trail alternative) for prescribed periods. HIPAA requires that records be retrievable and safeguarded, with auditable activity logs, commonly retained for six years or more.

Core Principles

Immutability

  • Once a record + metadata are written, they cannot be modified or deleted until retain_until
  • New versions are new immutable records linked by lineage
  • No in-place edits allowed

Integrity & Verification

  • Hash each stored object
  • Optionally chain hashes (blockchain-style)
  • Anchor periodic "top hashes" to external timestamp service
  • Prove non-tampering
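
The per-object hashing step is a few lines of standard-library Python; this sketch uses the `sha256:<hex>` convention that matches the `content_hash` field shown later in this document:

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Return the digest in the 'sha256:<hex>' form stored with each record."""
    return 'sha256:' + hashlib.sha256(data).hexdigest()

def verify(data: bytes, stored_hash: str) -> bool:
    """Re-hash retrieved content and compare against the stored digest."""
    return content_hash(data) == stored_hash
```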

Separation of Duties

  • Admins cannot bypass WORM
  • Admin cannot silently alter retention rules
  • All changes logged
  • Logs themselves are WORM-archived
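
On S3, COMPLIANCE-mode Object Lock already blocks deletion, but a bucket policy can additionally deny lock reconfiguration to everyone except a designated role. A sketch (the `records-admin` role name and `ACCOUNT_ID` are placeholders, not values from this document):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyLockTampering",
      "Effect": "Deny",
      "Principal": "*",
      "Action": [
        "s3:PutBucketObjectLockConfiguration",
        "s3:PutObjectRetention"
      ],
      "Resource": [
        "arn:aws:s3:::compliance-archive",
        "arn:aws:s3:::compliance-archive/*"
      ],
      "Condition": {
        "ArnNotEquals": {
          "aws:PrincipalArn": "arn:aws:iam::ACCOUNT_ID:role/records-admin"
        }
      }
    }
  ]
}
```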

Cloud WORM Storage Options

AWS S3 Object Lock

import boto3

s3 = boto3.client('s3')

# Write the object under a COMPLIANCE-mode lock. The bucket must have been
# created with Object Lock enabled (which also enables versioning).
# doc_id, version, content, and retain_until_date are assumed to be defined.
s3.put_object(
    Bucket='compliance-archive',
    Key=f'records/{doc_id}/{version}.json',
    Body=content,
    ObjectLockMode='COMPLIANCE',
    ObjectLockRetainUntilDate=retain_until_date,
)

Azure Immutable Blob Storage

from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobServiceClient, ImmutabilityPolicy

blob_service = BlobServiceClient.from_connection_string(conn_str)
blob_client = blob_service.get_blob_client('compliance', blob_name)

# Setting container metadata does NOT create an immutability policy. Use a
# time-based immutability policy on the blob; this requires version-level
# immutability support to be enabled on the container.
blob_client.set_immutability_policy(
    ImmutabilityPolicy(
        expiry_time=datetime.now(timezone.utc) + timedelta(days=365 * 7),
        policy_mode='Locked',  # 'Locked' cannot be shortened once set
    )
)

GCP Retention Policy

from google.cloud import storage

client = storage.Client()
bucket = client.bucket('compliance-archive')
bucket.retention_period = 365 * 7 * 24 * 60 * 60  # 7 years, in seconds
bucket.patch()

# An unlocked retention policy can still be reduced or removed. Lock it to
# make retention irrevocable -- this cannot be undone:
bucket.lock_retention_policy()

Practical Pattern

Write Path

  1. Validate metadata

    • Check required fields
    • Verify retention category exists
  2. Calculate retention

    • Look up retention_category in policy table
    • Compute retain_until from effective date
  3. Serialize record

    {
      "markdown": "# Document Content...",
      "metadata": {
        "doc_id": "uuid",
        "title": "...",
        "retention_category": "HIPAA-6Y",
        "retain_until": "2031-01-15",
        ...
      },
      "content_hash": "sha256:...",
      "created_at": "2025-01-15T10:00:00Z"
    }
  4. Write to WORM store

    • Set retention lock >= policy requirement
    • Store content + metadata together
  5. Record pointer in PostgreSQL

    INSERT INTO documents (
    doc_id, path, worm_object_id, content_hash, ...
    ) VALUES (...);
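
Steps 1–3 of the write path can be sketched as one pure function. The policy table, field names, and document shape below are illustrative; the actual WORM write (step 4) and database insert (step 5) are left to the cloud and DB code shown elsewhere in this document:

```python
import hashlib
from datetime import date, timedelta

# Illustrative retention policy table (step 2); a real system loads this
# from a governed policy store, not a hard-coded dict.
RETENTION_POLICIES = {
    'HIPAA-6Y': timedelta(days=6 * 365),
    'SEC-17A4': timedelta(days=7 * 365),
}

def build_record(markdown: str, metadata: dict, effective: date) -> dict:
    # Step 1: validate metadata
    for field in ('doc_id', 'title', 'retention_category'):
        if field not in metadata:
            raise ValueError(f'missing required field: {field}')
    if metadata['retention_category'] not in RETENTION_POLICIES:
        raise ValueError('unknown retention category')

    # Step 2: compute retain_until from the effective date
    period = RETENTION_POLICIES[metadata['retention_category']]
    retain_until = effective + period

    # Step 3: serialize the immutable record
    digest = hashlib.sha256(markdown.encode()).hexdigest()
    return {
        'markdown': markdown,
        'metadata': {**metadata, 'retain_until': retain_until.isoformat()},
        'content_hash': f'sha256:{digest}',
        'created_at': effective.isoformat(),
    }
```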

Update/Versioning Path

  1. Create new WORM object

    • Increment version
    • New metadata with supersedes link
  2. Update PostgreSQL

    • New row in document_versions
    • Update documents.current_version
  3. Previous versions remain immutable

    • May be required for full retention period
    • Never delete until retain_until passed
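
The supersedes link in step 1 is plain metadata construction; a sketch (field names are assumptions consistent with the record shape in the write path):

```python
def next_version_metadata(current: dict) -> dict:
    """Build metadata for a new immutable version that supersedes the old one.

    `current` is the latest version's metadata; it is never edited in place --
    the new version is a new WORM object linked back by `supersedes`.
    """
    return {
        **current,
        'version': current['version'] + 1,
        'supersedes': current['version'],  # lineage link to the prior version
    }
```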

Deletion Path

  1. Scheduled job identifies eligible documents:

    SELECT doc_id FROM document_metadata
    WHERE retain_until <= CURRENT_DATE
    AND legal_hold = FALSE;
  2. Check WORM mode

    • COMPLIANCE mode: the object can be deleted only after retain_until has passed; no one can shorten the lock
    • GOVERNANCE mode: users with s3:BypassGovernanceRetention may remove the lock early, which should itself be logged
  3. Issue delete/expiry request

    # Object Lock buckets are versioned: delete the specific version,
    # otherwise delete_object only inserts a delete marker.
    s3.delete_object(
        Bucket='compliance-archive',
        Key=f'records/{doc_id}/{version}.json',
        VersionId=object_version_id,  # assumed tracked at write time
    )
  4. Log destruction event

    INSERT INTO destruction_log (
    doc_id, destroyed_at, destroyed_by, reason
    ) VALUES (...);
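
The eligibility check in steps 1–2 reduces to a pure predicate, sketched here; a legal hold always blocks destruction, and a COMPLIANCE-mode lock cannot be removed early regardless:

```python
from datetime import date

def eligible_for_destruction(retain_until, legal_hold, today=None):
    """Mirror of the SQL predicate: retention expired and no legal hold."""
    today = today or date.today()
    return retain_until <= today and not legal_hold
```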

Audit Log Storage

Audit logs themselves need WORM protection:

Append-Only Log Table

CREATE TABLE audit_events (
  event_id   BIGSERIAL PRIMARY KEY,
  timestamp  TIMESTAMPTZ NOT NULL DEFAULT now(),
  user_id    TEXT NOT NULL,
  action     TEXT NOT NULL,
  doc_id     UUID,
  old_value  JSONB,
  new_value  JSONB,
  source_ip  INET
);

-- Make table append-only. Note: this does not bind the table owner or a
-- superuser, which is why the logs are also archived to WORM storage.
REVOKE DELETE, UPDATE ON audit_events FROM PUBLIC;

Periodic Archive

# Weekly job: archive audit logs to WORM.
# db.query and .to_jsonl() stand in for your data-access layer.
audit_batch = db.query("""
    SELECT * FROM audit_events
    WHERE timestamp >= %s AND timestamp < %s
""", start, end)

s3.put_object(
    Bucket='audit-archive',
    Key=f'audit/{year}/{week}/events.jsonl.gz',
    Body=gzip.compress(audit_batch.to_jsonl()),
    ObjectLockMode='COMPLIANCE',
    ObjectLockRetainUntilDate=retain_until,
)

Hash Chain

import hashlib
import json

def compute_chain_hash(events, previous_hash):
    """Hash a batch of events, chained to the previous batch's hash."""
    h = hashlib.sha256()
    h.update(previous_hash.encode())
    for event in events:
        h.update(json.dumps(event, sort_keys=True).encode())
    return h.hexdigest()

# Store the chain hash alongside the batch
batch_metadata = {
    'chain_hash': compute_chain_hash(events, previous_batch_hash),
    'previous_batch': previous_batch_id,
}
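
Verification then replays the chain: recompute each batch's hash from the archived events and compare it with the stored value; a single edited event changes every hash downstream. A self-contained check (redefining `compute_chain_hash` so the sketch runs on its own):

```python
import hashlib
import json

def compute_chain_hash(events, previous_hash):
    h = hashlib.sha256()
    h.update(previous_hash.encode())
    for event in events:
        h.update(json.dumps(event, sort_keys=True).encode())
    return h.hexdigest()

def verify_chain(batches, genesis_hash=''):
    """batches: list of (events, stored_chain_hash) in archive order.

    Returns True only if every recomputed hash matches the stored one.
    """
    prev = genesis_hash
    for events, stored in batches:
        if compute_chain_hash(events, prev) != stored:
            return False
        prev = stored
    return True
```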

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                         Application                         │
│        Document Create/Update → Validate → Serialize        │
└──────────────────────────────┬──────────────────────────────┘
                               │
            ┌──────────────────┼──────────────────┐
            ▼                  ▼                  ▼
    ┌───────────────┐  ┌───────────────┐  ┌───────────────┐
    │  PostgreSQL   │  │  Meilisearch  │  │  WORM Store   │
    │  - Metadata   │  │  - Search     │  │  - S3 Object  │
    │  - Pointers   │  │  - Facets     │  │    Lock       │
    │  - Audit      │  │               │  │  - Retention  │
    └───────────────┘  └───────────────┘  └───────────────┘
