
02 — AI/ML Pipeline: Forecasting, NLQ, Model Serving & Explainability

Domain: Machine learning infrastructure, LLM serving, forecasting, natural language queries
Dependencies: 01-Data Architecture (schema for training data and provenance tables)
Outputs: Training pipelines, model registry, inference APIs, explainability framework


ROLE

You are a Senior ML Engineer specializing in financial AI systems with expertise in time-series forecasting, local LLM deployment, model explainability for regulated industries, and agentic AI pipelines.


OBJECTIVE

Design the complete AI/ML infrastructure for an FP&A platform that runs entirely air-gapped with self-hosted models, produces auditable and explainable predictions, and supports natural language financial queries.


DELIVERABLES

D1. Forecasting Engine (NeuralProphet)

Training Pipeline:

  • Data extraction from fact_gl_transactions → feature engineering → model training
  • Automatic model selection: NeuralProphet vs. SARIMAX vs. XGBoost vs. ensemble
  • Selection criteria: data volume, seasonality presence, trend stability, outlier density
  • Hyperparameter tuning via Optuna with financial-specific constraints
  • Nightly retraining on new actuals with drift detection (PSI, KS-test)
  • Model versioning with full lineage (training data hash, parameters, metrics)
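
For reference, the PSI half of the drift check above can be sketched in a few lines of stdlib Python. The bin count and the conventional 0.2 alert threshold are illustrative assumptions, not requirements of this spec:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between baseline (training) and new actuals.

    Bins are derived from the baseline's range; outer edges are opened to
    +/- infinity so out-of-range new values still land in a bin.
    """
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[0], edges[-1] = float("-inf"), float("inf")

    def fractions(values):
        counts = [0] * bins
        for x in values:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        n = len(values)
        return [max(c / n, 1e-6) for c in counts]  # clamp to avoid log(0)

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Common rule of thumb: PSI > 0.2 signals meaningful drift -> queue retraining.
```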

Forecast Types:

  • Revenue forecast (by entity, product line, customer segment)
  • Expense forecast (by cost center, GL account category)
  • Cash flow forecast (net of AR/AP aging + scheduled payments)
  • Rolling 12/18/24-month horizons with confidence intervals

Accuracy Metrics:

  • MAPE, RMSE, MAE per forecast type
  • Quality scoring: A (MAPE <5%), B (5-10%), C (10-20%), D (>20%)
  • Backtesting framework: train on T-12, predict T, compare to actuals
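
The MAPE computation and A-D grade bands above map directly to code; a minimal sketch (function names are illustrative):

```python
def mape(actuals, preds):
    """Mean Absolute Percentage Error, in percent. Assumes nonzero actuals."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actuals, preds)) / len(actuals)

def quality_grade(mape_pct):
    """Map MAPE to the quality bands defined above: A <5%, B 5-10%, C 10-20%, D >20%."""
    if mape_pct < 5:
        return "A"
    if mape_pct < 10:
        return "B"
    if mape_pct < 20:
        return "C"
    return "D"
```

In the backtesting loop, `actuals` would be period T observations and `preds` the model's output trained on T-12 data.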

D2. LLM Serving (Self-Hosted)

Infrastructure:

  • vLLM serving DeepSeek-R1 (32B or 70B) on GPU instances
  • Ollama as fallback for smaller models (7B) on CPU-only deployments
  • Model routing: complexity-based selection (simple tasks → Haiku-class equivalent, medium → Sonnet-class equivalent, compliance-critical → Opus-class equivalent)
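
A routing policy of this shape can be expressed as a small pure function. The model names, task labels, and token threshold below are assumptions for illustration, not mandated choices:

```python
# Illustrative tier mapping for the self-hosted stack described above.
SMALL = "qwen2.5-7b"        # Ollama CPU fallback (Haiku-class equivalent)
MEDIUM = "deepseek-r1-32b"  # default GPU model (Sonnet-class equivalent)
LARGE = "deepseek-r1-70b"   # compliance-grade reasoning (Opus-class equivalent)

def route_model(task_type: str, token_estimate: int) -> str:
    """Pick a serving tier from task type and prompt size."""
    if task_type in {"compliance", "tax_optimization"}:
        return LARGE   # audit-sensitive reasoning always gets the largest model
    if task_type in {"narrative", "variance_explanation"} or token_estimate > 2000:
        return MEDIUM  # long-form generation or large context
    return SMALL       # categorization, intent classification, short lookups
```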

Financial LLM Use Cases:

  • Variance explanation generation
  • Executive narrative writing (board books, CFO memos)
  • NLQ → SQL translation
  • Transaction categorization
  • Anomaly explanation
  • Tax optimization suggestions

Guardrails:

  • Financial accuracy validation (LLM outputs cross-checked against DB)
  • Hallucination detection (claim verification against source data)
  • Confidence scoring on every output
  • PII/financial data filtering in prompts
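
One way to implement the accuracy-validation guardrail is to extract every dollar figure an LLM narrative asserts and check it against the figures actually returned by the database. A stdlib sketch, with the tolerance and regex as illustrative assumptions:

```python
import re

def unverified_claims(narrative: str, db_facts: list[float], tolerance=0.005):
    """Return dollar figures in the narrative that match no source-data value.

    A claim is verified if it is within `tolerance` (relative) of some fact;
    anything left over should block or flag the output for review.
    """
    claims = [
        float(m.replace(",", ""))
        for m in re.findall(r"\$([\d,]+(?:\.\d+)?)", narrative)
    ]
    return [
        c for c in claims
        if not any(abs(c - f) <= tolerance * max(abs(f), 1.0) for f in db_facts)
    ]
```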

D3. Natural Language Query (NLQ) Engine

Pipeline:

User Question → Intent Classification → SQL Generation → DuckDB Execution
→ Result Formatting → Plotly Visualization → Narrative Generation

Requirements:

  • Intent classification: query type (drill-down, comparison, trend, anomaly, forecast)
  • Schema-aware SQL generation (inject table/column metadata as context)
  • Parameterized queries only (prevent SQL injection)
  • Chart type auto-selection based on query type and result shape
  • Narrative that explains the data in plain language
  • Multi-turn conversation with context memory
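
The "parameterized queries only" requirement combines driver-level parameter binding for values with a schema-metadata whitelist for identifiers (which cannot be bound as parameters). The sketch below uses stdlib `sqlite3` so it is self-contained; DuckDB's Python API accepts the same `?` placeholder style. Table and column names are illustrative:

```python
import sqlite3

# Schema-aware whitelist: identifiers come from injected table/column metadata,
# never from raw user text.
ALLOWED_COLUMNS = {"amount", "account", "period"}

def run_nlq_sql(conn, column: str, period: str):
    """Execute a generated aggregate safely: whitelist the identifier,
    bind the value as a parameter."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column {column!r} not in schema metadata")
    # The period value is bound by the driver, never string-formatted into SQL.
    return conn.execute(
        f"SELECT SUM({column}) FROM fact_gl WHERE period = ?", (period,)
    ).fetchone()[0]
```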

D4. Anomaly Detection (PyOD)

Models:

  • Isolation Forest for GL transaction outliers
  • ECOD for distribution-based anomaly scoring
  • AutoEncoder for pattern-break detection in time-series

Integration:

  • Real-time scoring on new transactions (< 100ms per transaction)
  • Batch scoring nightly for historical review
  • Alert generation with severity classification (INFO/WARNING/CRITICAL)
  • Explainable anomaly reasons ("Transaction $45K exceeds 3σ for Marketing category")
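
The explainable-reason format above can be generated from simple summary statistics even before a PyOD score is attached; a stdlib sketch of the 3σ rule (the message template follows the example above, thresholds are illustrative):

```python
import statistics

def explain_anomaly(amount: float, category_history: list[float], category: str):
    """Return a human-readable reason if the amount breaches 3 sigma
    for its category, else None."""
    mu = statistics.fmean(category_history)
    sigma = statistics.pstdev(category_history)
    z = (amount - mu) / sigma if sigma else 0.0
    if abs(z) > 3:
        return (f"Transaction ${amount:,.0f} exceeds 3σ "
                f"for {category} category (z={z:.1f})")
    return None
```

In production the same message would carry the model's anomaly score and feed the INFO/WARNING/CRITICAL severity classification.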

D5. AI Explainability & Provenance API

Schema (sys_ai_provenance):

ai_decision_id, tenant_id, timestamp,
model_name, model_version, model_hash,
input_features (JSONB), input_data_hash,
output_value, output_type, confidence_score,
reasoning_trace (TEXT), source_transaction_ids (UUID[]),
user_id (who triggered), execution_time_ms
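
A provenance row of this shape can be assembled in application code before it is written to Postgres and forwarded to immudb. The sketch below covers a subset of the columns above; hashing the canonicalized inputs makes `input_data_hash` reproducible regardless of key order:

```python
import hashlib
import json
import time
import uuid

def provenance_record(model_name, model_version, input_features,
                      output_value, confidence_score, user_id):
    """Build a sys_ai_provenance row (subset of columns, for illustration)."""
    # sort_keys gives a canonical serialization, so identical inputs
    # always hash identically.
    blob = json.dumps(input_features, sort_keys=True).encode("utf-8")
    return {
        "ai_decision_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_name": model_name,
        "model_version": model_version,
        "input_features": input_features,
        "input_data_hash": hashlib.sha256(blob).hexdigest(),
        "output_value": output_value,
        "confidence_score": confidence_score,
        "user_id": user_id,
    }
```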

API Endpoints (OpenAPI 3.0):

  • POST /ai/forecast — generate forecast with provenance
  • GET /ai/provenance/{decision_id} — retrieve full reasoning chain
  • GET /ai/provenance/trace/{forecast_id} — trace forecast → source transactions
  • POST /ai/nlq — natural language query with response + provenance
  • GET /ai/models — list active models with accuracy metrics
  • POST /ai/explain/{transaction_id} — explain anomaly or categorization

CONSTRAINTS

  • All models run locally — zero external API calls
  • Every AI output logged to sys_ai_provenance and forwarded to immudb
  • Forecast confidence intervals required (not just point estimates)
  • NLQ must refuse queries that would bypass RLS tenant isolation
  • Model training must not use data across tenant boundaries
  • GPU budget: single A100 (80GB) or 2x A10G for inference

RESEARCH QUESTIONS

  1. What is the optimal NeuralProphet configuration for financial GL time-series with strong monthly seasonality and fiscal year patterns?
  2. How should the model registry be structured for SOC 2 compliance (version control, access logging, rollback)?
  3. What SQL generation approach works best for financial schemas — fine-tuned model vs. few-shot prompting vs. semantic layer?
  4. How to implement drift detection that triggers retraining without causing forecast instability?
  5. What is the right confidence threshold for auto-categorization vs. human review queue?

ADRs TO PRODUCE

  • ADR-003: DeepSeek-R1 over Azure OpenAI (data sovereignty, cost, air-gap)
  • ADR-006: NeuralProphet over Prophet (accuracy, PyTorch ecosystem, explainability)
  • ADR-AI-001: Model routing strategy (which model for which task)
  • ADR-AI-002: NLQ SQL generation approach (fine-tuned vs. few-shot vs. semantic layer)