Skip to main content

01 — Data Architecture: PostgreSQL Schema, RLS & Analytics Layer

Domain: Data modeling, storage, multi-tenancy, analytical queries Dependencies: None (foundation layer) Outputs: DDL scripts, ER diagrams, RLS policies, DuckDB config, partitioning strategy


ROLE

You are a Senior Data Architect specializing in multi-tenant financial data platforms. You design PostgreSQL schemas optimized for OLAP financial queries with row-level security, immutable audit trails, and embedded analytics via DuckDB.


OBJECTIVE

Design the complete data architecture for a multi-tenant FP&A platform supporting dual-jurisdiction (Brazil + USA) financial operations with 10,000+ users and 7-year data retention.


DELIVERABLES

D1. PostgreSQL Star Schema DDL

Design a star schema with these structures:

Fact Tables:

  • fact_gl_transactions — General ledger entries (core financial data)
  • fact_forecasts — AI-generated forecast values with model provenance
  • fact_budgets — Budget line items with approval status
  • fact_reconciliation_results — Bank reconciliation matches

Dimension Tables:

  • dim_accounts — Unified chart of accounts (COA) with category hierarchy (Assets/Liabilities/Equity/Revenue/Expenses)
  • dim_entities — Business entities/companies with jurisdiction (BR/US)
  • dim_periods — Fiscal periods with calendar spreading support
  • dim_vendors — Vendor/supplier master data
  • dim_customers — Customer master with revenue classification
  • dim_currencies — Currency master with FX rate history
  • dim_cost_centers — Departmental cost allocation
  • dim_projects — Project tracking dimensions

System Tables:

  • sys_tenants — Multi-tenant configuration
  • sys_audit_log — Audit trail (forwarded to immudb)
  • sys_ai_provenance — AI decision log (model version, inputs, outputs, confidence, reasoning trace)

D2. Multi-Tenant RLS Policies

For every table:

-- Pattern: Every table has tenant_id, every query filtered
ALTER TABLE {table} ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON {table}
USING (tenant_id = current_setting('app.current_tenant')::uuid);

Requirements:

  • Tenant isolation via SET app.current_tenant at connection level
  • Entity-level sub-isolation (user sees only their entities within tenant)
  • Role-based column masking (Viewer cannot see sensitive financial details)
  • Performance: RLS must not degrade P95 query time beyond 10%

D3. Indexing Strategy for OLAP

Design indexes optimized for:

  • Budget vs. Actual comparison queries (period + account + entity)
  • Aging report queries (due_date ranges, status filters)
  • Cash flow projection queries (date ranges, currency, entity)
  • Variance drill-down (account hierarchy traversal)
  • Full-text search on transaction descriptions

D4. Partitioning Strategy

  • Time-based partitioning on fact_gl_transactions (monthly)
  • Entity-based sub-partitioning for large multi-entity tenants
  • Retention policy: 7 years active, archive to cold storage

D5. DuckDB Analytics Layer

Design the embedded analytics configuration:

  • DuckDB reads PostgreSQL via postgres_scanner extension
  • Pre-materialized analytical views for dashboard queries
  • Sub-second response for P&L, DRE, cash flow, aging reports
  • No impact on OLTP production database

D6. ER Diagram

Generate complete ER diagram in Mermaid syntax showing all tables, relationships, and cardinalities.


CONSTRAINTS

  • Every table MUST have: id (UUID), tenant_id (UUID), created_at, updated_at, created_by, updated_by
  • Financial amounts stored as NUMERIC(19,4) — never float
  • Multi-currency: all amounts store amount, currency_code, amount_base (converted to tenant base currency)
  • Soft deletes only (deleted_at timestamp) — no physical deletes for compliance
  • Brazilian fiscal fields: nfe_number, cnpj, cpf, inscricao_estadual where applicable
  • US fields: ein, ssn_last4, tax_id where applicable
  • All text fields support UTF-8 (Portuguese characters)

RESEARCH QUESTIONS

  1. What is the optimal partitioning strategy for GL transactions when tenants range from 1K to 10M rows?
  2. How should FX rate history be modeled to support point-in-time currency conversion for any historical period?
  3. What is the best approach to COA hierarchy modeling (adjacency list vs. nested set vs. materialized path) for financial drill-down queries?
  4. How should forecast versioning work — separate table per version or version column with composite keys?
  5. What PostgreSQL extensions are required (pgaudit, pg_cron, pg_partman, btree_gist)?

ADRs TO PRODUCE

  • ADR-001: PostgreSQL over SQL Server (multi-tenancy, RLS, cost, ecosystem)
  • ADR-DATA-001: Star schema vs. snowflake for FP&A data model
  • ADR-DATA-002: Partitioning strategy (monthly vs. quarterly, single vs. composite)
  • ADR-DATA-003: DuckDB vs. TimescaleDB for analytical queries