Architecture Decision Records (ADRs)
AI-First Open-Source FP&A Platform
ADR-001: PostgreSQL as Primary Database
Status: Accepted
Date: 2026-02-03
Decision Makers: Platform Architecture Team
Context
The platform requires a reliable, scalable OLTP database for financial data with strict consistency requirements, multi-tenant isolation, and regulatory compliance capabilities.
Decision
Adopt PostgreSQL 16 as the primary OLTP database.
Rationale
| Factor | PostgreSQL | SQL Server | MySQL |
|---|---|---|---|
| Cost | Open source | $15K+/core | Open source |
| RLS Support | Native | Limited | None |
| JSONB | Native | JSON | JSON |
| Extensions | Rich ecosystem | Limited | Limited |
| Cloud Support | All providers | Azure-preferred | All providers |
| Compliance | Excellent | Excellent | Good |
Consequences
Positive:
- Zero license cost
- Native Row-Level Security for multi-tenancy
- Rich extension ecosystem (TimescaleDB, pgvector)
- Excellent tooling and community
Negative:
- Team training needed (from SQL Server background)
- Different optimization patterns
Implementation Notes
- Use RLS for tenant isolation
- Enable logical replication for CDC
- Partition large tables by date
- Use pgvector for embeddings
ADR-002: TimescaleDB for Time-Series Data
Status: Accepted
Date: 2026-02-03
Context
Financial forecasts, metrics, and audit logs generate large volumes of time-series data requiring efficient storage and querying.
Decision
Deploy TimescaleDB 2.x as PostgreSQL extension for time-series workloads.
Rationale
- 10-100x compression vs standard PostgreSQL
- Native PostgreSQL compatibility
- Continuous aggregates for real-time dashboards
- Automatic data retention policies
Consequences
Positive:
- Single database technology (PostgreSQL + extension)
- Efficient storage for forecast values
- Built-in downsampling
Negative:
- Additional operational complexity
- Learning curve for hypertables
ADR-003: Airbyte for Data Integration
Status: Accepted
Date: 2026-02-03
Context
The platform must integrate with 50+ ERP systems, banking APIs, and other data sources with minimal custom development.
Decision
Adopt Airbyte as the primary ELT platform for data integration.
Rationale
| Factor | Airbyte | Meltano | Fivetran |
|---|---|---|---|
| Connectors | 600+ | 300+ | 400+ |
| Cost | Open source | Open source | $$/connector |
| Self-hosted | Yes | Yes | No |
| Custom connectors | CDK | SDK | Limited |
| Brazilian ERPs | Community | Limited | None |
Consequences
Positive:
- 600+ pre-built connectors
- Open-source, self-hosted option
- Active community developing Brazilian ERP connectors
- Visual mapping interface
Negative:
- Operational overhead for self-hosting
- Some connector quality varies
Implementation Notes
- Deploy Airbyte on Kubernetes
- Use CDC mode where available
- Implement dbt for transformations
- Build custom connectors for Omie, Conta Azul
ADR-004: Dagster for Pipeline Orchestration
Status: Accepted
Date: 2026-02-03
Context
Financial close processes require reliable, observable workflow orchestration with complex dependencies and compliance requirements.
Decision
Adopt Dagster 1.6+ as the data pipeline orchestration platform.
Rationale
| Factor | Dagster | Airflow | Prefect |
|---|---|---|---|
| Asset-centric | Yes | No | Partial |
| Data lineage | Native | Limited | Limited |
| Testing | Excellent | Basic | Good |
| Observability | Built-in | External | Built-in |
| Compliance | Audit logs | Manual | Manual |
Consequences
Positive:
- Asset-centric paradigm fits FP&A well
- Native data lineage for audit
- Software-defined assets for consistency
- Built-in testing framework
Negative:
- Smaller community than Airflow
- Team retraining required
ADR-005: DeepSeek-R1 for Local LLM
Status: Accepted
Date: 2026-02-03
Context
Regulated financial environments require local LLM deployment to prevent sensitive data from leaving the infrastructure while maintaining reasoning capabilities.
Decision
Deploy DeepSeek-R1-Distill-Qwen-32B as the primary local LLM for financial reasoning.
Rationale
| Factor | DeepSeek-R1-32B | Llama 3.1-70B | GPT-4 API |
|---|---|---|---|
| Reasoning | Excellent | Good | Excellent |
| Local Deploy | Yes (2x A100) | Yes (4x A100) | No |
| Cost/month | $8K (hardware) | $15K (hardware) | $20K+ (API) |
| Data Privacy | Full control | Full control | Third-party |
| Compliance | SOC2/HIPAA ready | SOC2/HIPAA ready | Depends |
Consequences
Positive:
- Full data sovereignty
- Predictable costs
- No API rate limits
- Compliance friendly
Negative:
- GPU infrastructure required
- Model updates require redeployment
- Smaller context than GPT-4
Implementation Notes
- Deploy via vLLM with tensor parallelism
- Use INT8 quantization for efficiency
- Implement prompt caching
- Monitor GPU utilization
ADR-006: LangGraph for Agent Workflows
Status: Accepted
Date: 2026-02-03
Context
AI agents for FP&A (reconciliation, variance analysis, forecasting) require stateful, controllable workflows with human-in-the-loop capabilities and compliance checkpoints.
Decision
Adopt LangGraph 0.1+ as the agent workflow framework.
Rationale
| Factor | LangGraph | CrewAI | AutoGPT |
|---|---|---|---|
| Determinism | High | Medium | Low |
| State Management | PostgreSQL native | In-memory | External |
| Compliance | Checkpoints built-in | Manual | None |
| Control Flow | Explicit graphs | Implicit | Autonomous |
| Human-in-loop | Native | Limited | Limited |
Consequences
Positive:
- Deterministic, auditable workflows
- PostgreSQL-based state persistence
- Native checkpoint/resume
- Explicit compliance gates
Negative:
- More code than declarative frameworks
- Learning curve for graph-based thinking
Implementation Notes
# Core pattern for FP&A agents
workflow = StateGraph(AgentState)
workflow.add_node("compliance_check", compliance_gate)
workflow.add_conditional_edges("action", should_escalate, {...})
ADR-007: OpenFGA for Authorization
Status: Accepted
Date: 2026-02-03
Context
Multi-tenant FP&A platform requires fine-grained authorization with relationship-based access control, segregation of duties, and audit capabilities.
Decision
Adopt OpenFGA as the authorization engine following the Zanzibar model.
Rationale
| Factor | OpenFGA | Cerbos | Custom RBAC |
|---|---|---|---|
| Model | ReBAC (Zanzibar) | Policy-based | Role-based |
| Performance | Sub-ms | Sub-ms | Varies |
| Segregation of Duties | Native | Manual | Manual |
| Audit Trail | Built-in | Limited | Manual |
| Complexity | Medium | Medium | High |
Consequences
Positive:
- Relationship-based authorization
- Native segregation of duties (approver ≠ creator)
- Sub-millisecond check latency
- Audit-ready
Negative:
- New paradigm for team
- Requires careful model design
Authorization Model Summary
organization → tenant → account/journal_entry/budget
user → can_view/can_edit/can_approve
Segregation: can_approve = admin AND NOT author
ADR-008: immudb for Audit Trails
Status: Accepted
Date: 2026-02-03
Context
Financial platforms require tamper-evident audit trails for SOX, SOC 2, and FDA 21 CFR Part 11 compliance with cryptographic verification capabilities.
Decision
Adopt immudb as the immutable audit log database alongside PostgreSQL.
Rationale
| Factor | immudb | PostgreSQL | Blockchain |
|---|---|---|---|
| Immutability | Cryptographic | Logical (can bypass) | Cryptographic |
| Verification | Merkle proofs | None | Native |
| Performance | 100K+ ops/sec | Excellent | Poor |
| Complexity | Low | N/A | Very High |
| Compliance | SOX/SOC2 ready | Needs controls | Overkill |
Consequences
Positive:
- Cryptographic tamper evidence
- Built-in verification APIs
- High performance
- SQL-like query interface
Negative:
- Additional database to operate
- Synchronization complexity
Implementation Notes
- Write all state changes to immudb
- Store verification hashes in PostgreSQL
- Implement periodic verification jobs
- Generate compliance reports from immudb
ADR-009: NeuralProphet for Forecasting
Status: Accepted
Date: 2026-02-03
Context
Cash flow and financial forecasting requires accurate, explainable predictions that can incorporate business drivers and seasonal patterns.
Decision
Adopt NeuralProphet 0.8+ as the primary forecasting engine with ensemble fallbacks.
Rationale
| Factor | NeuralProphet | Prophet | ARIMA | XGBoost |
|---|---|---|---|---|
| Accuracy | Highest | Good | Medium | High |
| Explainability | High | High | Low | Low |
| Seasonality | Auto-detect | Manual | Manual | Manual |
| Regressors | Native | Native | External | Native |
| GPU Acceleration | Yes | No | No | Yes |
Consequences
Positive:
- 55-92% accuracy improvement over Prophet
- Interpretable components
- GPU-accelerated training
- Native trend changepoint detection
Negative:
- More complex than Prophet
- Longer training times
Implementation Notes
- Use ensemble with XGBoost fallback
- Implement automatic model selection
- Track accuracy metrics per account type
- Retrain weekly with latest data
ADR-010: Multi-Tenant RLS Strategy
Status: Accepted
Date: 2026-02-03
Context
Platform serves multiple tenants with strict data isolation requirements. Must prevent cross-tenant data leaks while maintaining query performance.
Decision
Implement Row-Level Security (RLS) at the database level with application-level tenant context.
Rationale
- Defense in depth (even SQL injection can't access other tenants)
- Transparent to application code
- Single database, lower operational cost
- Supported natively by PostgreSQL
Implementation Pattern
-- Enable RLS
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
-- Create policy
CREATE POLICY tenant_isolation ON accounts
USING (tenant_id = current_setting('app.current_tenant')::UUID);
-- Set context at connection
SET app.current_tenant = '550e8400-...';
Consequences
Positive:
- Database-enforced isolation
- Cannot bypass via application bugs
- Works with all PostgreSQL tools
Negative:
- Slight performance overhead (~5%)
- Must manage connection-level context
- Backup/restore requires care
ADR-011: Event-Driven Architecture
Status: Accepted
Date: 2026-02-03
Context
Loose coupling between services, audit trail generation, and real-time updates require an event-driven communication pattern.
Decision
Adopt Apache Kafka as the event backbone with event sourcing for critical domains.
Rationale
- Guaranteed delivery for financial events
- Natural audit trail from event log
- Enables real-time dashboards
- Decouples services effectively
Event Categories
| Category | Examples | Retention |
|---|---|---|
| Domain Events | journal_entry.posted | 1 year |
| Integration Events | sync.completed | 30 days |
| Audit Events | user.login | 7 years |
| Analytics Events | report.viewed | 90 days |
Consequences
Positive:
- Loose coupling
- Natural audit trail
- Enable CQRS pattern
Negative:
- Eventual consistency
- Operational complexity
ADR-012: API-First Design
Status: Accepted
Date: 2026-02-03
Context
Platform must support multiple clients (web, mobile, API) and enable third-party integrations.
Decision
Adopt API-first design with OpenAPI 3.1 specification as the contract.
Standards
- REST for CRUD operations
- WebSocket for real-time updates
- GraphQL considered but rejected (complexity)
- gRPC for internal service communication
Versioning Strategy
- URL versioning:
/v1/,/v2/ - No breaking changes within version
- 12-month deprecation notice
- Sunset header for deprecated endpoints
ADR-013: Encryption Strategy
Status: Accepted
Date: 2026-02-03
Context
Financial data requires encryption at rest and in transit with proper key management.
Decision
Implement comprehensive encryption:
- TLS 1.3 for all network traffic
- AES-256 for data at rest
- AWS KMS / HashiCorp Vault for key management
- Field-level encryption for PII
Implementation
| Layer | Method | Key Management |
|---|---|---|
| Network | TLS 1.3 | Certificate Manager |
| Database | TDE (Transparent) | KMS |
| Backups | AES-256-GCM | KMS |
| PII Fields | Application-level | Vault |
ADR-014: Observability Strategy
Status: Accepted
Date: 2026-02-03
Context
Production systems require comprehensive observability for debugging, performance optimization, and compliance monitoring.
Decision
Adopt OpenTelemetry as the standard with the following stack:
- Tracing: Jaeger / Tempo
- Metrics: Prometheus + Grafana
- Logging: Loki / ELK
- Alerting: PagerDuty integration
Key Metrics
| Category | Metrics |
|---|---|
| Business | close_days, reconciliation_rate, forecast_accuracy |
| Performance | latency_p95, throughput, error_rate |
| Infrastructure | cpu, memory, disk, network |
| AI | llm_latency, token_usage, accuracy |
ADR-015: Compliance Framework
Status: Accepted
Date: 2026-02-03
Context
Platform serves regulated industries requiring SOX, SOC 2, HIPAA, and FDA 21 CFR Part 11 compliance.
Decision
Implement compliance as a first-class architectural concern with:
- Automated evidence collection
- Policy-as-code for controls
- Continuous compliance monitoring
- Audit-ready documentation
Framework Mapping
| Control | SOX | SOC 2 | FDA 21 CFR Part 11 |
|---|---|---|---|
| Audit Trail | 302/404 | CC6.1 | 11.10(e) |
| Access Control | 302 | CC6.1-6.3 | 11.10(d) |
| Data Integrity | 404 | CC1.1 | 11.10(a) |
| Electronic Signatures | N/A | N/A | 11.50-11.200 |
| Change Management | 404 | CC8.1 | 11.10(k) |
Implementation
- OpenFGA for access control
- immudb for audit trails
- Git-based change management
- Automated compliance reports
Summary Table
| ADR | Decision | Status |
|---|---|---|
| ADR-001 | PostgreSQL 16 for OLTP | Accepted |
| ADR-002 | TimescaleDB for time-series | Accepted |
| ADR-003 | Airbyte for ELT | Accepted |
| ADR-004 | Dagster for orchestration | Accepted |
| ADR-005 | DeepSeek-R1 for local LLM | Accepted |
| ADR-006 | LangGraph for agents | Accepted |
| ADR-007 | OpenFGA for authorization | Accepted |
| ADR-008 | immudb for audit trails | Accepted |
| ADR-009 | NeuralProphet for forecasting | Accepted |
| ADR-010 | Multi-tenant RLS | Accepted |
| ADR-011 | Event-driven architecture | Accepted |
| ADR-012 | API-first design | Accepted |
| ADR-013 | Encryption strategy | Accepted |
| ADR-014 | Observability with OpenTelemetry | Accepted |
| ADR-015 | Compliance framework | Accepted |