Skip to main content

Sidecar Metadata Files vs Database

Both patterns are used; regulators care that metadata is preserved immutably and consistently, not where it lives.

Comparison Table

AspectSidecar/YAML-onlyCentral DB (Postgres)
Source of truthMetadata travels with content in Git/object storeClear, queryable system of record
Human reviewEasy to view/edit in editors; good for dev workflowsRequires UI/tooling; less diff-friendly
Query & analyticsNeeds indexing step; no ad-hoc SQLNatural for dashboards, joins, reports
Immutability (WORM)Git history can act as log; but Git alone may not satisfy WORMDB must be backed by WORM/append-only logs
ComplexitySimpler infra; more complex analyticsMore infra; simpler analytics
RegulatoryMust prove metadata wasn't altered independentlyStronger central controls, validations

Schema Reference

Data Structure

field_name:
type: string
required: true
description: Field description
example: "example_value"

API Reference

Endpoint Overview

MethodEndpointDescription
GET/api/v1/resourceList resources
POST/api/v1/resourceCreate resource
PUT/api/v1/resource/:idUpdate resource
DELETE/api/v1/resource/:idDelete resource

Sidecar Pattern (Frontmatter/YAML)

Structure

documents/
├── policy-001.md
├── policy-001.yaml # Optional sidecar
└── policy-002.md

Or embedded:

# policy-001.md
---
doc_id: "uuid"
title: "HIPAA Privacy Policy"
retention_category: "HIPAA-6Y"
# ... all metadata in frontmatter
---

# Policy Content

...

Pros

  • Metadata travels with content
  • Version controlled in Git
  • Easy for developers to edit
  • No additional infrastructure

Cons

  • No ad-hoc SQL queries
  • Requires parsing for analytics
  • Must build indexing pipeline
  • WORM compliance harder to prove

Database Pattern (PostgreSQL)

Structure

documents (
doc_id UUID PRIMARY KEY,
path TEXT,
content_hash TEXT,
...
)

document_metadata (
doc_id UUID PRIMARY KEY,
title TEXT,
retention_category TEXT,
...
)

Pros

  • Rich querying with SQL
  • Easy compliance dashboards
  • Central validation rules
  • Natural for RBAC/ABAC
  • Audit logging built-in

Cons

  • Additional infrastructure
  • Sync issues with content
  • Developers need tooling
  • Must archive DB for WORM

For HIPAA/FINRA-grade systems, combine both:

Authoring Layer

  • Metadata in frontmatter/sidecar
  • Tightly coupled with Markdown
  • Developer-friendly workflow

Operational Layer

  • Parsed metadata normalized into PostgreSQL
  • Rich querying and analytics
  • RBAC/ABAC enforcement

Archival Layer

  • Both content and metadata archived to WORM
  • Preserved together for compliance
  • Immutable snapshots

Implementation Flow

┌─────────────────────────────────┐
│ Markdown + Frontmatter │
│ (Git Repository) │
└───────────────┬─────────────────┘
│ Parse & Validate

┌─────────────────────────────────┐
│ PostgreSQL │
│ - Normalized metadata │
│ - RBAC/ABAC attributes │
│ - Retention tracking │
└───────────────┬─────────────────┘
│ Archive

┌─────────────────────────────────┐
│ WORM Storage │
│ - Content + metadata blob │
│ - Object lock retention │
│ - Tamper-evident │
└─────────────────────────────────┘

Sync Strategy

On Markdown Change

  1. Parse frontmatter → validate
  2. Upsert metadata row(s) in PostgreSQL
  3. Trigger reindex in search engine
  4. Archive to WORM if approved

Integrity Verification

  • Content hash in DB matches file
  • Periodic reconciliation jobs
  • Alert on mismatches

Regulatory Compliance

For Sidecar-Only

  • Archive entire repo to WORM snapshots
  • Prove metadata wasn't altered independently
  • Cryptographic signing of commits

For Database

  • DB backed by WORM/append-only logs
  • Audit trail in separate WORM store
  • Periodic DB snapshots to WORM

For Hybrid (Best)

  • Frontmatter for authoring
  • DB for operations
  • Both archived to WORM together
  • Clear chain of custody

References