
02 — AI/ML Pipeline: Forecasting, NLQ, Model Serving & Explainability

Domain: Machine learning infrastructure, LLM serving, forecasting, natural language queries
Dependencies: 01-Data Architecture (schema for training data and provenance tables)
Outputs: Training pipelines, model registry, inference APIs, explainability framework


ROLE

You are a Senior ML Engineer specializing in financial AI systems with expertise in time-series forecasting, local LLM deployment, model explainability for regulated industries, and agentic AI pipelines.


OBJECTIVE

Design the complete AI/ML infrastructure for an FP&A platform that runs entirely air-gapped with self-hosted models, produces auditable and explainable predictions, and supports natural language financial queries.


DELIVERABLES

D1. Forecasting Engine (NeuralProphet)

Training Pipeline:

  • Data extraction from fact_gl_transactions → feature engineering → model training
  • Automatic model selection: NeuralProphet vs. SARIMAX vs. XGBoost vs. ensemble
  • Selection criteria: data volume, seasonality presence, trend stability, outlier density
  • Hyperparameter tuning via Optuna with financial-specific constraints
  • Nightly retraining on new actuals with drift detection (PSI, KS-test)
  • Model versioning with full lineage (training data hash, parameters, metrics)
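
For reference, the PSI half of the drift check above can be sketched in a few lines of stdlib Python. The bin count and the conventional 0.2 alert threshold are illustrative assumptions, not requirements of this spec:

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between baseline (training) and new actuals.

    Bins are derived from the baseline's range; outer edges are opened to
    +/- infinity so out-of-range new values still land in a bin.
    """
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    edges[0], edges[-1] = float("-inf"), float("inf")

    def fractions(values):
        counts = [0] * bins
        for x in values:
            for i in range(bins):
                if edges[i] <= x < edges[i + 1]:
                    counts[i] += 1
                    break
        n = len(values)
        return [max(c / n, 1e-6) for c in counts]  # clamp to avoid log(0)

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Common rule of thumb: PSI > 0.2 signals meaningful drift -> queue retraining.
```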

Forecast Types:

  • Revenue forecast (by entity, product line, customer segment)
  • Expense forecast (by cost center, GL account category)
  • Cash flow forecast (net of AR/AP aging + scheduled payments)
  • Rolling 12/18/24-month horizons with confidence intervals

Accuracy Metrics:

  • MAPE, RMSE, MAE per forecast type
  • Quality scoring: A (MAPE <5%), B (5-10%), C (10-20%), D (>20%)
  • Backtesting framework: train on T-12, predict T, compare to actuals
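
The MAPE computation and A-D grade bands above map directly to code; a minimal sketch (function names are illustrative):

```python
def mape(actuals, preds):
    """Mean Absolute Percentage Error, in percent. Assumes nonzero actuals."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actuals, preds)) / len(actuals)

def quality_grade(mape_pct):
    """Map MAPE to the quality bands defined above: A <5%, B 5-10%, C 10-20%, D >20%."""
    if mape_pct < 5:
        return "A"
    if mape_pct < 10:
        return "B"
    if mape_pct < 20:
        return "C"
    return "D"
```

In the backtesting loop, `actuals` would be period T observations and `preds` the model's output trained on T-12 data.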

D2. LLM Serving (Self-Hosted)

Infrastructure:

  • vLLM serving DeepSeek-R1 (32B or 70B) on GPU instances
  • Ollama as fallback for smaller models (7B) on CPU-only deployments
  • Model routing: complexity-based selection (simple tasks → Haiku-class equivalent, medium → Sonnet-class equivalent, compliance-critical → Opus-class equivalent)
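
A routing policy of this shape can be expressed as a small pure function. The model names, task labels, and token threshold below are assumptions for illustration, not mandated choices:

```python
# Illustrative tier mapping for the self-hosted stack described above.
SMALL = "qwen2.5-7b"        # Ollama CPU fallback (Haiku-class equivalent)
MEDIUM = "deepseek-r1-32b"  # default GPU model (Sonnet-class equivalent)
LARGE = "deepseek-r1-70b"   # compliance-grade reasoning (Opus-class equivalent)

def route_model(task_type: str, token_estimate: int) -> str:
    """Pick a serving tier from task type and prompt size."""
    if task_type in {"compliance", "tax_optimization"}:
        return LARGE   # audit-sensitive reasoning always gets the largest model
    if task_type in {"narrative", "variance_explanation"} or token_estimate > 2000:
        return MEDIUM  # long-form generation or large context
    return SMALL       # categorization, intent classification, short lookups
```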

Financial LLM Use Cases:

  • Variance explanation generation
  • Executive narrative writing (board books, CFO memos)
  • NLQ → SQL translation
  • Transaction categorization
  • Anomaly explanation
  • Tax optimization suggestions

Guardrails:

  • Financial accuracy validation (LLM outputs cross-checked against DB)
  • Hallucination detection (claim verification against source data)
  • Confidence scoring on every output
  • PII/financial data filtering in prompts
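
One way to implement the accuracy-validation guardrail is to extract every dollar figure an LLM narrative asserts and check it against the figures actually returned by the database. A stdlib sketch, with the tolerance and regex as illustrative assumptions:

```python
import re

def unverified_claims(narrative: str, db_facts: list[float], tolerance=0.005):
    """Return dollar figures in the narrative that match no source-data value.

    A claim is verified if it is within `tolerance` (relative) of some fact;
    anything left over should block or flag the output for review.
    """
    claims = [
        float(m.replace(",", ""))
        for m in re.findall(r"\$([\d,]+(?:\.\d+)?)", narrative)
    ]
    return [
        c for c in claims
        if not any(abs(c - f) <= tolerance * max(abs(f), 1.0) for f in db_facts)
    ]
```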

D3. Natural Language Query (NLQ) Engine

Pipeline:

User Question → Intent Classification → SQL Generation → DuckDB Execution
→ Result Formatting → Plotly Visualization → Narrative Generation

Requirements:

  • Intent classification: query type (drill-down, comparison, trend, anomaly, forecast)
  • Schema-aware SQL generation (inject table/column metadata as context)
  • Parameterized queries only (prevent SQL injection)
  • Chart type auto-selection based on query type and result shape
  • Narrative that explains the data in plain language
  • Multi-turn conversation with context memory
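
The "parameterized queries only" requirement combines driver-level parameter binding for values with a schema-metadata whitelist for identifiers (which cannot be bound as parameters). The sketch below uses stdlib `sqlite3` so it is self-contained; DuckDB's Python API accepts the same `?` placeholder style. Table and column names are illustrative:

```python
import sqlite3

# Schema-aware whitelist: identifiers come from injected table/column metadata,
# never from raw user text.
ALLOWED_COLUMNS = {"amount", "account", "period"}

def run_nlq_sql(conn, column: str, period: str):
    """Execute a generated aggregate safely: whitelist the identifier,
    bind the value as a parameter."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column {column!r} not in schema metadata")
    # The period value is bound by the driver, never string-formatted into SQL.
    return conn.execute(
        f"SELECT SUM({column}) FROM fact_gl WHERE period = ?", (period,)
    ).fetchone()[0]
```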

D4. Anomaly Detection (PyOD)

Models:

  • Isolation Forest for GL transaction outliers
  • ECOD for distribution-based anomaly scoring
  • AutoEncoder for pattern-break detection in time-series

Integration:

  • Real-time scoring on new transactions (< 100ms per transaction)
  • Batch scoring nightly for historical review
  • Alert generation with severity classification (INFO/WARNING/CRITICAL)
  • Explainable anomaly reasons ("Transaction $45K exceeds 3σ for Marketing category")
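
The explainable-reason format above can be generated from simple summary statistics even before a PyOD score is attached; a stdlib sketch of the 3σ rule (the message template follows the example above, thresholds are illustrative):

```python
import statistics

def explain_anomaly(amount: float, category_history: list[float], category: str):
    """Return a human-readable reason if the amount breaches 3 sigma
    for its category, else None."""
    mu = statistics.fmean(category_history)
    sigma = statistics.pstdev(category_history)
    z = (amount - mu) / sigma if sigma else 0.0
    if abs(z) > 3:
        return (f"Transaction ${amount:,.0f} exceeds 3σ "
                f"for {category} category (z={z:.1f})")
    return None
```

In production the same message would carry the model's anomaly score and feed the INFO/WARNING/CRITICAL severity classification.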

D5. AI Explainability & Provenance API

Schema (sys_ai_provenance):

ai_decision_id, tenant_id, timestamp,
model_name, model_version, model_hash,
input_features (JSONB), input_data_hash,
output_value, output_type, confidence_score,
reasoning_trace (TEXT), source_transaction_ids (UUID[]),
user_id (who triggered), execution_time_ms
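
A provenance row of this shape can be assembled in application code before it is written to Postgres and forwarded to immudb. The sketch below covers a subset of the columns above; hashing the canonicalized inputs makes `input_data_hash` reproducible regardless of key order:

```python
import hashlib
import json
import time
import uuid

def provenance_record(model_name, model_version, input_features,
                      output_value, confidence_score, user_id):
    """Build a sys_ai_provenance row (subset of columns, for illustration)."""
    # sort_keys gives a canonical serialization, so identical inputs
    # always hash identically.
    blob = json.dumps(input_features, sort_keys=True).encode("utf-8")
    return {
        "ai_decision_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_name": model_name,
        "model_version": model_version,
        "input_features": input_features,
        "input_data_hash": hashlib.sha256(blob).hexdigest(),
        "output_value": output_value,
        "confidence_score": confidence_score,
        "user_id": user_id,
    }
```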

API Endpoints (OpenAPI 3.0):

  • POST /ai/forecast — generate forecast with provenance
  • GET /ai/provenance/{decision_id} — retrieve full reasoning chain
  • GET /ai/provenance/trace/{forecast_id} — trace forecast → source transactions
  • POST /ai/nlq — natural language query with response + provenance
  • GET /ai/models — list active models with accuracy metrics
  • POST /ai/explain/{transaction_id} — explain anomaly or categorization

CONSTRAINTS

  • All models run locally — zero external API calls
  • Every AI output logged to sys_ai_provenance and forwarded to immudb
  • Forecast confidence intervals required (not just point estimates)
  • NLQ must refuse queries that would bypass RLS tenant isolation
  • Model training must not use data across tenant boundaries
  • GPU budget: single A100 (80GB) or 2x A10G for inference

RESEARCH QUESTIONS

  1. What is the optimal NeuralProphet configuration for financial GL time-series with strong monthly seasonality and fiscal year patterns?
  2. How should the model registry be structured for SOC 2 compliance (version control, access logging, rollback)?
  3. What SQL generation approach works best for financial schemas — fine-tuned model vs. few-shot prompting vs. semantic layer?
  4. How to implement drift detection that triggers retraining without causing forecast instability?
  5. What is the right confidence threshold for auto-categorization vs. human review queue?

ADRs TO PRODUCE

  • ADR-003: DeepSeek-R1 over Azure OpenAI (data sovereignty, cost, air-gap)
  • ADR-006: NeuralProphet over Prophet (accuracy, PyTorch ecosystem, explainability)
  • ADR-AI-001: Model routing strategy (which model for which task)
  • ADR-AI-002: NLQ SQL generation approach (fine-tuned vs. few-shot vs. semantic layer)