02 — AI/ML Pipeline: Forecasting, NLQ, Model Serving & Explainability
Domain: Machine learning infrastructure, LLM serving, forecasting, natural language queries
Dependencies: 01-Data Architecture (schema for training data and provenance tables)
Outputs: Training pipelines, model registry, inference APIs, explainability framework
ROLE
You are a Senior ML Engineer specializing in financial AI systems with expertise in time-series forecasting, local LLM deployment, model explainability for regulated industries, and agentic AI pipelines.
OBJECTIVE
Design the complete AI/ML infrastructure for an FP&A platform that runs entirely air-gapped with self-hosted models, produces auditable and explainable predictions, and supports natural language financial queries.
DELIVERABLES
D1. Forecasting Engine (NeuralProphet)
Training Pipeline:
- Data extraction from fact_gl_transactions → feature engineering → model training
- Automatic model selection: NeuralProphet vs. SARIMAX vs. XGBoost vs. ensemble
- Selection criteria: data volume, seasonality presence, trend stability, outlier density
- Hyperparameter tuning via Optuna with financial-specific constraints
- Nightly retraining on new actuals with drift detection (PSI, KS-test)
- Model versioning with full lineage (training data hash, parameters, metrics)
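The drift-detection step above can be sketched with a pure-Python PSI check; the bin count and the 0.2 retrain threshold are common conventions, not requirements from this spec, and production code would compare training-time feature distributions against the latest month's actuals:

```python
import bisect
import math

def population_stability_index(baseline, current, bins=10):
    """PSI between the training-time distribution and new actuals.
    Rule of thumb: < 0.1 stable, 0.1-0.2 moderate drift, > 0.2 retrain."""
    s = sorted(baseline)
    # Bin edges at baseline quantiles so each bin holds ~1/bins of baseline mass
    edges = [s[int(len(s) * i / bins)] for i in range(1, bins)]

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # Clip to avoid log(0) for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = bin_fractions(baseline), bin_fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))
```

Pairing PSI with the KS-test, as the pipeline specifies, guards against blind spots in any single quantile binning.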
Forecast Types:
- Revenue forecast (by entity, product line, customer segment)
- Expense forecast (by cost center, GL account category)
- Cash flow forecast (net of AR/AP aging + scheduled payments)
- Rolling 12/18/24-month horizons with confidence intervals
Accuracy Metrics:
- MAPE, RMSE, MAE per forecast type
- Quality scoring: A (MAPE <5%), B (5-10%), C (10-20%), D (>20%)
- Backtesting framework: train on T-12, predict T, compare to actuals
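The MAPE bands above map directly to a grading helper (a minimal sketch; treating band edges as half-open intervals is an assumption, since the spec does not say which grade a boundary value receives):

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent. Assumes nonzero actuals."""
    errors = [abs((a - f) / a) for a, f in zip(actuals, forecasts)]
    return 100.0 * sum(errors) / len(errors)

def quality_grade(mape_pct):
    """Map MAPE to the quality score: A < 5%, B 5-10%, C 10-20%, D > 20%."""
    for grade, upper in (("A", 5.0), ("B", 10.0), ("C", 20.0)):
        if mape_pct < upper:
            return grade
    return "D"
```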
D2. LLM Serving (Self-Hosted)
Infrastructure:
- vLLM serving DeepSeek-R1 (32B or 70B) on GPU instances
- Ollama as fallback for smaller models (7B) on CPU-only deployments
- Model routing: complexity-based selection across the local tiers (simple tasks → 7B Haiku-class, medium → 32B Sonnet-class, compliance-critical → 70B Opus-class equivalent)
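The routing rule can start as a static task-to-tier map (a sketch; the model tags and the task taxonomy are illustrative assumptions, not names mandated by this spec):

```python
# Tiers correspond to the locally deployed models; tags are illustrative.
MODEL_TIERS = {
    "small": "llama3.1:7b",       # CPU-friendly, served via the Ollama fallback
    "medium": "deepseek-r1:32b",  # default vLLM deployment
    "large": "deepseek-r1:70b",   # compliance-critical reasoning
}

# Hypothetical task taxonomy drawn from the use cases listed below.
TASK_COMPLEXITY = {
    "transaction_categorization": "small",
    "nlq_sql_generation": "medium",
    "variance_explanation": "medium",
    "executive_narrative": "large",
    "tax_optimization": "large",
}

def route_model(task, require_compliance=False):
    """Pick a model tier for a task; compliance review always escalates."""
    tier = "large" if require_compliance else TASK_COMPLEXITY.get(task, "medium")
    return MODEL_TIERS[tier]
```

Defaulting unknown tasks to the medium tier trades a little latency for safety; a production router would also consider queue depth and GPU memory headroom.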
Financial LLM Use Cases:
- Variance explanation generation
- Executive narrative writing (board books, CFO memos)
- NLQ → SQL translation
- Transaction categorization
- Anomaly explanation
- Tax optimization suggestions
Guardrails:
- Financial accuracy validation (LLM outputs cross-checked against DB)
- Hallucination detection (claim verification against source data)
- Confidence scoring on every output
- PII/financial data filtering in prompts
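The hallucination-detection guardrail can be approximated by extracting every figure stated in a generated narrative and checking it against values fetched from the database (a sketch; the regex and the 0.5% tolerance are assumptions, and a production validator would also resolve units, signs, and aggregations):

```python
import re

def unverified_numeric_claims(narrative, source_values, rel_tol=0.005):
    """Return numbers stated in the narrative that match no source value."""
    raw = re.findall(r"\$?(\d[\d,]*(?:\.\d+)?)", narrative)
    claims = [float(c.replace(",", "")) for c in raw]

    def matches(claim, value):
        if value == 0:
            return claim == 0
        return abs(claim - value) / abs(value) <= rel_tol

    return [c for c in claims if not any(matches(c, v) for v in source_values)]
```

An LLM answer with any unverified claim would be blocked or flagged for human review rather than shown to the user.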
D3. Natural Language Query (NLQ) Engine
Pipeline:
User Question → Intent Classification → SQL Generation → DuckDB Execution
→ Result Formatting → Plotly Visualization → Narrative Generation
Requirements:
- Intent classification: query type (drill-down, comparison, trend, anomaly, forecast)
- Schema-aware SQL generation (inject table/column metadata as context)
- Parameterized queries only (prevent SQL injection)
- Chart type auto-selection based on query type and result shape
- Narrative that explains the data in plain language
- Multi-turn conversation with context memory
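The intent-classification stage of the pipeline can start as a keyword router before graduating to an LLM classifier (the keyword lists and the drill-down fallback are assumptions for illustration):

```python
# Ordered rules: the first matching intent wins; drill-down is the fallback.
INTENT_RULES = [
    ("forecast",   ("forecast", "project", "expect", "next quarter", "next year")),
    ("anomaly",    ("unusual", "anomal", "outlier", "spike", "suspicious")),
    ("comparison", ("compare", "versus", " vs ", "difference between")),
    ("trend",      ("trend", "over time", "growth", "trajectory")),
]

def classify_intent(question):
    """Classify an NLQ question into the query types listed above."""
    q = f" {question.lower()} "
    for intent, keywords in INTENT_RULES:
        if any(k in q for k in keywords):
            return intent
    return "drill-down"
```

The resulting intent then drives both SQL-generation prompting and chart-type auto-selection downstream.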
D4. Anomaly Detection (PyOD)
Models:
- Isolation Forest for GL transaction outliers
- ECOD for distribution-based anomaly scoring
- AutoEncoder for pattern-break detection in time-series
Integration:
- Real-time scoring on new transactions (< 100ms per transaction)
- Batch scoring nightly for historical review
- Alert generation with severity classification (INFO/WARNING/CRITICAL)
- Explainable anomaly reasons ("Transaction $45K exceeds 3σ for Marketing category")
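The explainable-reason string can be generated from simple category statistics (a sketch using a sample z-score; production code would pull the category distribution from DuckDB and combine it with the PyOD model score):

```python
import statistics

def explain_amount_outlier(amount, category, history, sigma_threshold=3.0):
    """Return a human-readable reason if the amount breaks the 3-sigma band."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    z = (amount - mean) / stdev
    if abs(z) > sigma_threshold:
        return (f"Transaction ${amount:,.0f} exceeds "
                f"{abs(z):.1f}\u03c3 for {category} category")
    return None
```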
D5. AI Explainability & Provenance API
Schema (sys_ai_provenance):
ai_decision_id, tenant_id, timestamp,
model_name, model_version, model_hash,
input_features (JSONB), input_data_hash,
output_value, output_type, confidence_score,
reasoning_trace (TEXT), source_transaction_ids (UUID[]),
user_id (who triggered), execution_time_ms
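Assembling a sys_ai_provenance row can be sketched as building a record with a deterministic input hash (field names follow the schema above; the sorted-key JSON canonicalization is an assumption, chosen so identical inputs always hash identically):

```python
import datetime
import hashlib
import json
import uuid

def build_provenance_record(tenant_id, user_id, model_name, model_version,
                            input_features, output_value, output_type,
                            confidence_score, source_transaction_ids,
                            model_hash="", reasoning_trace="", execution_time_ms=0):
    """Assemble a sys_ai_provenance row ready for insert and immudb forwarding."""
    # Canonical JSON (sorted keys, no whitespace) makes the hash deterministic
    canonical = json.dumps(input_features, sort_keys=True, separators=(",", ":"))
    return {
        "ai_decision_id": str(uuid.uuid4()),
        "tenant_id": tenant_id,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model_name": model_name,
        "model_version": model_version,
        "model_hash": model_hash,
        "input_features": input_features,
        "input_data_hash": hashlib.sha256(canonical.encode()).hexdigest(),
        "output_value": output_value,
        "output_type": output_type,
        "confidence_score": confidence_score,
        "reasoning_trace": reasoning_trace,
        "source_transaction_ids": source_transaction_ids,
        "user_id": user_id,
        "execution_time_ms": execution_time_ms,
    }
```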
API Endpoints (OpenAPI 3.0):
- POST /ai/forecast — generate forecast with provenance
- GET /ai/provenance/{decision_id} — retrieve full reasoning chain
- GET /ai/provenance/trace/{forecast_id} — trace forecast → source transactions
- POST /ai/nlq — natural language query with response + provenance
- GET /ai/models — list active models with accuracy metrics
- POST /ai/explain/{transaction_id} — explain anomaly or categorization
CONSTRAINTS
- All models run locally — zero external API calls
- Every AI output logged to sys_ai_provenance and forwarded to immudb
- Forecast confidence intervals required (not just point estimates)
- NLQ must refuse queries that would bypass RLS tenant isolation
- Model training must not use data across tenant boundaries
- GPU budget: single A100 (80GB) or 2x A10G for inference
RESEARCH QUESTIONS
- What is the optimal NeuralProphet configuration for financial GL time-series with strong monthly seasonality and fiscal year patterns?
- How should the model registry be structured for SOC 2 compliance (version control, access logging, rollback)?
- What SQL generation approach works best for financial schemas — fine-tuned model vs. few-shot prompting vs. semantic layer?
- How to implement drift detection that triggers retraining without causing forecast instability?
- What is the right confidence threshold for auto-categorization vs. human review queue?
ADRs TO PRODUCE
- ADR-003: DeepSeek-R1 over Azure OpenAI (data sovereignty, cost, air-gap)
- ADR-006: NeuralProphet over Prophet (accuracy, PyTorch ecosystem, explainability)
- ADR-AI-001: Model routing strategy (which model for which task)
- ADR-AI-002: NLQ SQL generation approach (fine-tuned vs. few-shot vs. semantic layer)