Track L: Data & Business Intelligence
Priority: MEDIUM — Analytics for growth decisions
Agent: data-pipeline-specialist, data-warehouse-architect
Sprint Range: S7-S9
Reference: docs/operations/63-data-architecture.md (22 tables, L0-L4 classification)
Status Summary
Progress: 0% (0/22 tasks)
| Section | Title | Status | Tasks |
|---|---|---|---|
| L.1 | Data Warehouse Architecture | Pending | 0/4 |
| L.2 | ETL/ELT Pipeline | Pending | 0/4 |
| L.3 | BI Dashboards & Reporting | Pending | 0/5 |
| L.4 | Data Governance & Quality | Pending | 0/4 |
| L.5 | Regulatory Reporting Automation | Pending | 0/5 |
L.1: Data Warehouse Architecture
Sprint: S7 | Priority: P1 | Depends On: C.1, E.2
Goal: Star schema warehouse with CDC and a data lake for regulatory retention
- L.1.1: Design data warehouse schema
- Model: star schema — fact tables (WO events, billing events, usage events) + dimensions (time, tenant, user, asset)
- Separation: operational DB (PostgreSQL) vs. analytical DB (BigQuery)
- Classification: data classification mapping from L0-L4 per doc 63
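The fact/dimension split above can be sketched as a lookup from natural keys to surrogate keys; the table names, keys, and the YYYYMMDD time key are illustrative assumptions, not the final schema:

```python
from datetime import date

# Hypothetical dimension lookups: natural key -> surrogate key.
dim_tenant = {"acme": 101}
dim_user = {"jdoe": 501}
dim_time = {date(2026, 2, 14): 20260214}  # YYYYMMDD smart key, one row per day

def to_fact_row(event: dict) -> dict:
    """Resolve natural keys in a raw WO event into a fact-table row."""
    return {
        "time_key": dim_time[event["event_date"]],
        "tenant_key": dim_tenant[event["tenant"]],
        "user_key": dim_user[event["user"]],
        "wo_id": event["wo_id"],  # degenerate dimension (no lookup table)
        "cycle_time_hours": event["cycle_time_hours"],  # additive measure
    }

fact = to_fact_row({
    "event_date": date(2026, 2, 14),
    "tenant": "acme",
    "user": "jdoe",
    "wo_id": "WO-1001",
    "cycle_time_hours": 6.5,
})
```

In BigQuery the same shape becomes fact tables partitioned on the time key with dimension tables joined at query time.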
- L.1.2: Implement CDC (Change Data Capture)
- Source: PostgreSQL logical replication to warehouse
- Tool: Debezium or Cloud Datastream
- Schema evolution: handling for upstream changes
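One common policy for the schema-evolution item is "additive changes apply automatically, type changes go to review"; a minimal sketch of that rule (column/type names are illustrative):

```python
def evolve_schema(target: dict, incoming: dict) -> dict:
    """Additively merge upstream columns into the warehouse schema.

    New columns are appended; a type change on an existing column raises,
    so it can be routed to manual review instead of silently corrupting data.
    """
    merged = dict(target)
    for col, typ in incoming.items():
        if col not in merged:
            merged[col] = typ  # additive change: safe to apply
        elif merged[col] != typ:
            raise ValueError(f"incompatible type change on {col}: {merged[col]} -> {typ}")
    return merged

current = {"id": "INT64", "status": "STRING"}
upstream = {"id": "INT64", "status": "STRING", "closed_at": "TIMESTAMP"}
evolved = evolve_schema(current, upstream)
```

Debezium and Datastream both emit schema metadata with change events; the gate above would sit between that metadata and the warehouse DDL.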
- L.1.3: Build data lake for raw events
- Storage: GCS bucket with partitioned Parquet files
- Layers: raw → cleansed → curated architecture
- Retention: 24 months hot, 7 years cold (regulatory)
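The layer/partition layout above can be made concrete with Hive-style object paths; the bucket name and dataset are hypothetical:

```python
from datetime import date

LAYERS = ("raw", "cleansed", "curated")

def lake_path(layer: str, dataset: str, event_date: date) -> str:
    """Build a Hive-style partitioned Parquet path inside the lake bucket."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return (f"gs://acme-lake/{layer}/{dataset}/"
            f"year={event_date.year}/month={event_date.month:02d}/"
            f"day={event_date.day:02d}/part-0000.parquet")

path = lake_path("raw", "wo_events", date(2026, 2, 14))
```

Day-level partitioning keeps the 24-month hot / 7-year cold split a matter of moving whole prefixes between storage classes.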
- L.1.4: Create data catalog
- Documentation: table/column documentation with business glossary
- Lineage: tracking (source → transformation → destination)
- Access control: per classification level (L0-L4)
L.2: ETL/ELT Pipeline
Sprint: S7-S8 | Priority: P1 | Depends On: L.1
Goal: Automated data pipelines with quality checks and orchestration
- L.2.1: Build core data pipelines
- WO lifecycle: raw events → dimensional model
- User activity: login, feature usage → engagement metrics
- Financial: subscriptions, invoices → revenue metrics
- L.2.2: Implement data transformation layer
- Tool: dbt (data build tool) for SQL transformations
- Materialized views: for common queries
- Incremental processing: for efficiency
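The incremental-processing item follows the usual watermark pattern (process only rows newer than the last run, then advance the watermark), which is also how dbt incremental models behave; field names here are illustrative:

```python
from datetime import datetime

def incremental_batch(rows, watermark: datetime):
    """Return only rows newer than the watermark, plus the new watermark."""
    fresh = [r for r in rows if r["updated_at"] > watermark]
    new_wm = max((r["updated_at"] for r in fresh), default=watermark)
    return fresh, new_wm

rows = [
    {"id": 1, "updated_at": datetime(2026, 2, 14, 8)},
    {"id": 2, "updated_at": datetime(2026, 2, 14, 12)},
]
fresh, wm = incremental_batch(rows, datetime(2026, 2, 14, 10))
```

The watermark itself would be persisted between runs (dbt does this via `max()` over the target table inside `is_incremental()` blocks).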
- L.2.3: Create pipeline orchestration
- Tool: Cloud Composer (Airflow) or Cloud Workflows
- Schedule: real-time for events, hourly for aggregates, daily for reports
- Alerting: pipeline failures with auto-retry
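The failure-alerting-with-auto-retry behavior can be sketched independently of the orchestrator (Airflow provides `retries` and `on_failure_callback` natively; this is the same contract in plain Python, with the alert sink as an assumption):

```python
import time

def run_with_retry(task, *, retries=3, delay_s=0.0, alert=print):
    """Run a pipeline task, retrying on failure and alerting when exhausted."""
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == retries:
                alert(f"pipeline task failed after {retries} attempts: {exc}")
                raise
            time.sleep(delay_s)  # back-off between attempts
```

In Cloud Composer the equivalent is per-task `retries`/`retry_delay` plus a failure callback that pages on the final attempt.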
- L.2.4: Build data quality checks
- Validation: row count, null checks, referential integrity
- Freshness: stale data alerting
- Contracts: Great Expectations or dbt tests for data contracts
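The three check families above (row count, nulls, freshness) reduce to simple predicates; a stdlib sketch with illustrative column names, where Great Expectations or dbt tests would supply the production equivalent:

```python
from datetime import datetime, timedelta

def quality_report(rows, *, required=("wo_id", "tenant"), min_rows=1,
                   max_staleness=timedelta(hours=2), now=None):
    """Row-count, null, and freshness checks; returns pass/fail per check."""
    now = now or datetime.utcnow()
    newest = max((r["loaded_at"] for r in rows), default=None)
    return {
        "row_count_ok": len(rows) >= min_rows,
        "nulls_ok": all(r.get(c) is not None for r in rows for c in required),
        "fresh_ok": newest is not None and now - newest <= max_staleness,
    }
```

Any failing key in the report would feed the pipeline alerting from L.2.3.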
L.3: BI Dashboards & Reporting
Sprint: S8 | Priority: P1 | Depends On: L.1, L.2
Goal: Executive, operational, compliance, and customer-facing dashboards
- L.3.1: Build executive dashboard
- Metrics: ARR, customer count, NRR, logo churn, gross margin
- Trends: monthly, quarterly, YoY
- Segments: by tier, ICP, region
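Of the metrics above, NRR is the one worth pinning down early, since definitions vary; the standard formula is (starting ARR + expansion − contraction − churned ARR) / starting ARR, sketched here with illustrative numbers:

```python
def net_revenue_retention(start_arr, expansion, contraction, churn):
    """NRR = (starting ARR + expansion - contraction - churned ARR) / starting ARR."""
    return (start_arr + expansion - contraction - churn) / start_arr

# $1.0M starting cohort ARR, $150k expansion, $30k downgrades, $50k churned
nrr = net_revenue_retention(1_000_000, 150_000, 30_000, 50_000)  # 1.07, i.e. 107%
```

Logo churn, by contrast, counts customers rather than dollars, so the two can move in opposite directions on the same dashboard.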
- L.3.2: Create operational dashboard
- WO metrics: throughput, cycle time, SLA compliance
- AI metrics: agent autonomy rate, token consumption
- Support: ticket volume, resolution time
- L.3.3: Build compliance dashboard
- Readiness: audit readiness score per tenant
- Open items: CAPAs, overdue deviations, training compliance %
- Coverage: regulatory framework heat map
- L.3.4: Create customer-facing analytics
- Embedded: analytics in customer portal
- WO performance: per organization
- Compliance: metrics and trend reports
- L.3.5: Implement report scheduling and distribution
- Delivery: automated email (daily, weekly, monthly)
- Format: PDF report generation
- Custom: report builder for power users
L.4: Data Governance & Quality
Sprint: S8-S9 | Priority: P1 | Depends On: L.1
Goal: Role-based access, quality monitoring, privacy compliance, and retention automation
- L.4.1: Implement data access governance
- RBAC: role-based access to warehouse tables
- Column-level: security for L3/L4 data
- Audit: access audit logging
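The column-level rule for L3/L4 data is essentially "redact any column classified above the role's clearance"; a sketch with hypothetical role names and column classifications (BigQuery policy tags would enforce this natively):

```python
# Hypothetical per-column classification and per-role clearance (doc 63 scale).
CLASSIFICATION = {"wo_id": "L1", "status": "L0", "email": "L3", "ssn": "L4"}
ROLE_CLEARANCE = {"analyst": "L2", "compliance_officer": "L4"}

def mask_row(row: dict, role: str) -> dict:
    """Redact columns classified above the role's clearance level."""
    clearance = int(ROLE_CLEARANCE[role][1])
    return {
        col: (val if int(CLASSIFICATION.get(col, "L4")[1]) <= clearance
              else "***REDACTED***")  # unknown columns default to L4 (deny)
        for col, val in row.items()
    }

row = {"wo_id": "WO-1", "status": "open", "email": "j@x.co", "ssn": "123-45-6789"}
masked = mask_row(row, "analyst")
```

Defaulting unclassified columns to L4 keeps the policy fail-closed until the data catalog (L.1.4) tags them.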
- L.4.2: Build data quality monitoring
- Scoring: automated data quality scoring per table
- Anomaly detection: on key metrics
- SLA: data quality SLA and reporting
- L.4.3: Create data privacy compliance
- PII inventory: mapping across all tables
- Right-to-erasure: implementation in warehouse
- Cross-border: data transfer tracking
- L.4.4: Implement data retention automation
- Policies: per-regulation retention (Part 11: 7yr, HIPAA: 6yr per doc 63)
- Archival: automated to cold storage
- Reporting: retention compliance reporting
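When multiple regulations apply to one record, the longest retention period wins; a minimal sketch of that rule using the periods from doc 63 (leap-day handling is omitted for brevity):

```python
from datetime import date

# Retention periods in years, per doc 63.
RETENTION_YEARS = {"part11": 7, "hipaa": 6}

def purge_date(created: date, regulations) -> date:
    """A record may only be purged after the longest applicable retention period."""
    years = max(RETENTION_YEARS[r] for r in regulations)
    # Naive year shift; a Feb 29 created date would need special handling.
    return created.replace(year=created.year + years)

d = purge_date(date(2026, 2, 14), ["part11", "hipaa"])  # 2033-02-14 (Part 11 wins)
```

The archival job would move records past the hot window to cold storage and only delete them after `purge_date`.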
L.5: Regulatory Reporting Automation
Sprint: S9 | Priority: P2 | Depends On: L.2, L.3
Goal: Automated evidence packages for FDA, HIPAA, and SOC 2 audits
- L.5.1: Build FDA submission data package
- CSV evidence: Computer System Validation (CSV) documentation export
- Audit trail: extracts (date range, user, action filters)
- Validation: IQ/OQ/PQ test execution evidence
- L.5.2: Create HIPAA reporting suite
- Access reports: frequency per PHI field
- Anomaly: unusual access pattern detection
- Breach: notification data preparation
- L.5.3: Build SOC 2 evidence automation
- Controls: effectiveness reports per Trust Service Criteria
- Snapshots: monthly evidence with tamper detection
- Auditor portal: read-only, time-limited access
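One way to get the tamper-detection property is a hash chain: each monthly snapshot digest incorporates the previous digest, so any retroactive edit invalidates every later digest. A sketch, with the evidence payload shape as an assumption:

```python
import hashlib
import json

def snapshot_digest(prev_digest: str, evidence: dict) -> str:
    """Chain a snapshot to its predecessor; editing any past snapshot
    changes every subsequent digest, making tampering detectable."""
    payload = json.dumps(evidence, sort_keys=True).encode()
    return hashlib.sha256(prev_digest.encode() + payload).hexdigest()

genesis = "0" * 64
jan = snapshot_digest(genesis, {"month": "2026-01", "controls_passed": 42})
feb = snapshot_digest(jan, {"month": "2026-02", "controls_passed": 43})
```

Publishing each month's digest to the auditor portal lets a reviewer verify the chain without write access to the evidence store.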
- L.5.4: Create audit readiness dashboard
- Score: real-time audit readiness score per framework (FDA, HIPAA, SOC 2)
- Evidence gaps: automated identification of missing or stale evidence
- Timeline: audit preparation timeline with task assignments
- Historical: past audit findings tracking and remediation status
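The readiness score above can start as simply as the fraction of required evidence items that are present and not stale, computed per framework; the item shape here is an illustrative assumption:

```python
def readiness_score(evidence_items) -> float:
    """Fraction of required evidence items that are present and not stale."""
    if not evidence_items:
        return 0.0
    ok = sum(1 for e in evidence_items if e["present"] and not e["stale"])
    return round(ok / len(evidence_items), 3)

score = readiness_score([
    {"present": True, "stale": False},
    {"present": True, "stale": True},   # stale evidence counts as a gap
    {"present": False, "stale": False},  # missing evidence counts as a gap
    {"present": True, "stale": False},
])  # 0.5
```

The items failing the predicate are exactly the "evidence gaps" list the dashboard surfaces.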
- L.5.5: Implement regulatory submission automation
- Packaging: automated evidence package assembly per submission type
- Formatting: agency-specific format requirements (FDA eCTD, EMA NeeS)
- Tracking: submission tracking with agency response monitoring
- Templates: configurable submission templates per regulatory pathway
Updated: 2026-02-14
Compliance: CODITECT Track Nomenclature Standard (ADR-054)