FP&A Platform — AI Model Cards
Version: 1.0
Last Updated: 2026-02-03
Document ID: AI-001
Classification: Internal
Overview
This document provides model cards for all AI/ML models deployed in the FP&A Platform, following the Google Model Card framework. Each card documents model architecture, training data, performance metrics, limitations, and ethical considerations.
Model 1: Bank Reconciliation Matching Model
Model Details
| Attribute | Value |
|---|---|
| Model Name | fpa-recon-matcher-v2.1 |
| Model Type | Ensemble (XGBoost + Sentence Transformers) |
| Version | 2.1.0 |
| Release Date | 2026-01-15 |
| Framework | scikit-learn 1.4, XGBoost 2.0, sentence-transformers 2.3 |
| License | Proprietary |
| Owner | AI/ML Team |
Intended Use
Primary Use Case: Automatically matching bank transactions to GL journal entries during reconciliation.
Intended Users:
- FP&A analysts performing bank reconciliations
- Controllers reviewing reconciliation suggestions
- Automated reconciliation workflows
Out-of-Scope Uses:
- Fraud detection (use dedicated fraud model)
- Credit decisioning
- Transaction classification (use categorization model)
Model Architecture
┌─────────────────────────────────────────────────────────────┐
│ RECONCILIATION MATCHER v2.1 │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Feature │ │ Text │ │ Rule │ │
│ │ Engineering │ │ Embedding │ │ Engine │ │
│ │ │ │ (MiniLM) │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Feature Concatenation │ │
│ │ [amount_diff, date_diff, text_sim, rule_matches] │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ XGBoost Classifier │ │
│ │ (500 trees, max_depth=6, lr=0.1) │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Confidence Calibration (Platt) │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ Match Probability [0.0 - 1.0] │
└─────────────────────────────────────────────────────────────┘
Input Features (32 total):
- amount_diff: Absolute difference in amounts
- amount_diff_pct: Percentage difference in amounts
- date_diff: Days between transaction dates
- text_similarity: Cosine similarity of description embeddings
- payee_fuzzy_score: Levenshtein ratio of payee names
- reference_match: Binary reference number match
- check_number_match: Binary check number match
- amount_bucket: Categorical amount range
- day_of_week_match: Same day of week
- month_match: Same month
- Plus 22 rule-based binary features
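Several of these features are simple pairwise computations over a bank line and a GL line. The sketch below illustrates a handful of them; the record field names are assumptions, and `difflib`'s ratio is used as a stand-in for a true Levenshtein ratio:

```python
from datetime import date
from difflib import SequenceMatcher

def pair_features(bank, gl):
    """Compute a few of the 32 matcher features for one candidate pair."""
    amount_diff = abs(bank["amount"] - gl["amount"])
    # Guard against zero-amount GL lines when computing the percentage.
    amount_diff_pct = amount_diff / max(abs(gl["amount"]), 0.01)
    date_diff = abs((bank["date"] - gl["date"]).days)
    # difflib's ratio approximates a Levenshtein-style similarity in [0, 1].
    payee_fuzzy_score = SequenceMatcher(
        None, bank["payee"].lower(), gl["payee"].lower()
    ).ratio()
    reference_match = int(bank.get("reference") == gl.get("reference"))
    return {
        "amount_diff": amount_diff,
        "amount_diff_pct": amount_diff_pct,
        "date_diff": date_diff,
        "payee_fuzzy_score": payee_fuzzy_score,
        "reference_match": reference_match,
    }

feats = pair_features(
    {"amount": 1250.00, "date": date(2026, 1, 10),
     "payee": "ACME Corp", "reference": "INV-884"},
    {"amount": 1250.00, "date": date(2026, 1, 12),
     "payee": "Acme Corporation", "reference": "INV-884"},
)
```

The resulting dictionary would be concatenated with the embedding similarity and the 22 rule-based binaries before being fed to the XGBoost classifier.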
Training Data
| Attribute | Value |
|---|---|
| Dataset | Internal reconciliation history |
| Size | 2.4M transaction pairs |
| Positive/Negative Ratio | 1:15 (class imbalanced) |
| Time Range | 2022-01-01 to 2025-12-31 |
| Entities | 847 unique entities |
| Industries | Manufacturing, Retail, Healthcare, Tech, Services |
Data Preprocessing:
- Amounts normalized to USD
- Text cleaned (lowercase, remove punctuation)
- Dates converted to days-from-reference
- SMOTE oversampling for class balance
Data Splits:
- Training: 70% (temporal, pre-2025-07)
- Validation: 15% (2025-07 to 2025-09)
- Test: 15% (2025-10 to 2025-12)
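The splits above are temporal rather than random, so the test window is strictly later than anything the model trained on. A minimal sketch of that split logic (the record shape is an assumption):

```python
from datetime import date

def temporal_split(pairs,
                   val_start=date(2025, 7, 1),
                   test_start=date(2025, 10, 1)):
    """Split transaction pairs by date boundary, never randomly, so the
    validation and test windows sit strictly after the training window."""
    train = [p for p in pairs if p["date"] < val_start]
    val = [p for p in pairs if val_start <= p["date"] < test_start]
    test = [p for p in pairs if p["date"] >= test_start]
    return train, val, test

pairs = [
    {"id": 1, "date": date(2024, 3, 1)},
    {"id": 2, "date": date(2025, 8, 15)},
    {"id": 3, "date": date(2025, 11, 2)},
]
train, val, test = temporal_split(pairs)
```

A temporal split avoids leakage: a random split would let the model see post-cutoff reconciliation patterns during training and inflate the reported metrics.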
Evaluation Results
Overall Performance:
| Metric | Validation | Test | Target |
|---|---|---|---|
| Accuracy | 94.2% | 93.8% | ≥90% ✅ |
| Precision | 91.5% | 91.0% | ≥90% ✅ |
| Recall | 87.3% | 86.8% | ≥85% ✅ |
| F1 Score | 89.4% | 88.9% | ≥87% ✅ |
| AUC-ROC | 0.973 | 0.968 | ≥0.95 ✅ |
Performance by Confidence Tier:
| Confidence | % of Predictions | Accuracy | Action |
|---|---|---|---|
| ≥0.95 | 62% | 99.2% | Auto-match |
| 0.85-0.95 | 18% | 94.5% | Suggest with review |
| 0.70-0.85 | 12% | 82.3% | Manual review |
| <0.70 | 8% | 61.2% | No suggestion |
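The tier thresholds above map directly to a routing function. A sketch (action names are illustrative, not the production API):

```python
def route(match_probability):
    """Map a calibrated match probability to a workflow action,
    using the confidence-tier cutoffs from the table above."""
    if match_probability >= 0.95:
        return "auto_match"
    if match_probability >= 0.85:
        return "suggest_with_review"
    if match_probability >= 0.70:
        return "manual_review"
    return "no_suggestion"
```

Because the probabilities are Platt-calibrated upstream, these fixed cutoffs correspond to the measured per-tier accuracies rather than raw classifier margins.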
Performance by Transaction Type:
| Type | Count | Precision | Recall | Notes |
|---|---|---|---|---|
| Wire transfers | 245K | 94.2% | 91.5% | Strong reference matching |
| ACH | 890K | 92.1% | 88.3% | Good description matching |
| Checks | 320K | 89.5% | 84.2% | Check # matching helps |
| Credit cards | 156K | 88.3% | 82.1% | Variable descriptions |
| Cash | 45K | 78.4% | 71.3% | Limited features |
Fairness Analysis
Segment Analysis:
| Segment | Accuracy | Gap vs Overall | Status |
|---|---|---|---|
| Small entities (<$1M rev) | 92.1% | -1.7% | ✅ Acceptable |
| Medium entities | 94.2% | +0.4% | ✅ |
| Large entities (>$100M) | 93.5% | -0.3% | ✅ |
| US entities | 94.1% | +0.3% | ✅ |
| Brazilian entities | 92.8% | -1.0% | ✅ Acceptable |
| Healthcare industry | 91.9% | -1.9% | ⚠️ Monitor |
Bias Mitigation:
- Resampled training data to balance industry representation
- Added industry-specific features
- Separate calibration per entity size tier
Limitations and Risks
Known Limitations:
- Minimum Training Data: Requires 1,000+ historical matches per entity for good performance
- Cold Start: New entities with no history default to rule-based matching
- Multi-currency: Performance degrades with frequent currency conversions
- Batch Transactions: Struggles with batched payments split differently
- Description Variance: Low performance when bank descriptions change format
Failure Modes:
| Failure Mode | Frequency | Impact | Mitigation |
|---|---|---|---|
| False positive (wrong match) | 2.1% | High (accounting error) | Confidence thresholds, human review |
| False negative (missed match) | 4.8% | Medium (manual work) | Fuzzy matching rules |
| Duplicate suggestion | 0.3% | Medium | Deduplication logic |
Risk Assessment:
- Financial Impact: False matches could cause accounting errors → Mitigated by SOX-compliant review process
- Operational Impact: False negatives (4.8% on test) increase manual reconciliation work → Mitigated by continuous model improvement
- Regulatory Impact: Audit trail required for all matches → Mitigated by immutable immudb logging
Monitoring and Maintenance
Production Monitoring:
metrics:
  - name: daily_accuracy
    threshold: 0.88
    alert: slack_channel_ml
  - name: confidence_distribution
    expected_p95: 0.92
    alert_on: drift > 10%
  - name: inference_latency_p99
    threshold: 500ms
    alert: pagerduty
  - name: feature_drift
    method: PSI
    threshold: 0.1
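The feature-drift check compares live feature distributions against a training baseline using the Population Stability Index. A minimal sketch of a PSI computation under the 0.1 threshold (the quantile-binning scheme and smoothing constant are assumptions; the production job may differ):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample (expected)
    and a live sample (actual), using quantile bins from the baseline."""
    expected = sorted(expected)
    # Cut points at baseline quantiles, giving roughly equal-mass bins.
    edges = [expected[int(len(expected) * i / bins)] for i in range(1, bins)]

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        # Smooth empty buckets so the log term stays finite.
        return [max(c / len(values), 1e-4) for c in counts]

    e_pct = bucket_shares(expected)
    a_pct = bucket_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_pct, a_pct))
```

A PSI near 0 means the live distribution matches the baseline; values above the 0.1 threshold in the config above would trigger the drift alert and a retraining review.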
Retraining Triggers:
- Accuracy drops below 88% for 7 consecutive days
- Feature drift (PSI > 0.1) detected
- Monthly scheduled retraining
- Major data schema changes
Model Versioning:
| Version | Date | Changes | Performance |
|---|---|---|---|
| 2.1.0 | 2026-01-15 | Added Brazilian bank patterns | +2.1% accuracy |
| 2.0.0 | 2025-10-01 | Upgraded to sentence-transformers | +4.5% accuracy |
| 1.5.0 | 2025-07-01 | Added check number matching | +1.8% accuracy |
Ethical Considerations
Data Privacy:
- Model trained on de-identified transaction data
- No PII features used (names hashed, accounts masked)
- Training data retained for 3 years for audit purposes
Transparency:
- All match suggestions include confidence score
- Explanation API provides feature importance for each suggestion
- Audit trail captures model version used
Human Oversight:
- Matches below 0.95 confidence require human review
- All auto-matches can be reversed
- Daily sampling audit of auto-matched transactions
Model 2: Cash Flow Forecasting Model
Model Details
| Attribute | Value |
|---|---|
| Model Name | fpa-forecast-ensemble-v1.3 |
| Model Type | Ensemble (NeuralProphet + ARIMA + XGBoost) |
| Version | 1.3.0 |
| Release Date | 2026-01-20 |
| Framework | NeuralProphet 0.7, statsmodels 0.14, XGBoost 2.0 |
| License | Proprietary |
| Owner | AI/ML Team |
Intended Use
Primary Use Case: Generating 13-week and 12-month cash flow forecasts.
Intended Users:
- Treasury managers for liquidity planning
- CFOs for strategic planning
- FP&A analysts for scenario analysis
Out-of-Scope Uses:
- Intraday cash positioning
- FX rate prediction
- Stock price forecasting
Model Architecture
┌─────────────────────────────────────────────────────────────┐
│ FORECAST ENSEMBLE v1.3 │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ NeuralProphet │ │ ARIMA │ │ XGBoost │ │
│ │ (Primary) │ │ (Fallback) │ │ (Regressors) │ │
│ │ │ │ │ │ │ │
│ │ • Trend │ │ • AR(p) │ │ • Lag features│ │
│ │ • Seasonality │ │ • I(d) │ │ • External │ │
│ │ • Holidays │ │ • MA(q) │ │ drivers │ │
│ │ • Regressors │ │ │ │ │ │
│ └───────┬────────┘ └───────┬────────┘ └───────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Weighted Ensemble │ │
│ │ NeuralProphet: 0.5 ARIMA: 0.3 XGBoost: 0.2 │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Prediction Intervals │ │
│ │ (Conformal Prediction, 80/90/95%) │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Input Features:
- Historical cash flows (24+ months required)
- Day-of-week, month, quarter indicators
- Holiday calendars (US, Brazil)
- Known future events (payroll dates, tax payments)
- Economic indicators (optional: interest rates, GDP)
- AR receipts aging schedule
- AP payment schedule
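The ensemble stage combines the three component forecasts with the fixed weights shown in the diagram. A minimal sketch of that blend (the function name is illustrative):

```python
def ensemble_forecast(neuralprophet_pred, arima_pred, xgboost_pred,
                      weights=(0.5, 0.3, 0.2)):
    """Blend the three component forecasts point-by-point with the
    fixed v1.3 weights: NeuralProphet 0.5, ARIMA 0.3, XGBoost 0.2."""
    w_np, w_ar, w_xgb = weights
    return [
        w_np * a + w_ar * b + w_xgb * c
        for a, b, c in zip(neuralprophet_pred, arima_pred, xgboost_pred)
    ]

blended = ensemble_forecast([100.0], [200.0], [50.0])
```

The fixed weights keep the blend auditable; a learned stacking layer would likely score better but would be harder to explain to reviewers.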
Training Data
| Attribute | Value |
|---|---|
| Dataset | Historical cash flow data |
| Size | 156 entities × 36 months average |
| Granularity | Daily cash positions |
| Time Range | 2021-01-01 to 2025-12-31 |
| Industries | Multi-industry |
Data Quality Requirements:
- Minimum 24 months of history
- No gaps > 7 consecutive days
- Consistent currency reporting
Evaluation Results
Point Forecast Accuracy:
| Horizon | MAPE | RMSE | MAE | Target |
|---|---|---|---|---|
| 1 week | 4.2% | $32K | $28K | <5% ✅ |
| 4 weeks | 6.8% | $58K | $49K | <10% ✅ |
| 13 weeks | 9.1% | $87K | $72K | <15% ✅ |
| 26 weeks | 12.3% | $112K | $95K | <20% ✅ |
| 52 weeks | 18.7% | $156K | $134K | <25% ✅ |
Prediction Interval Coverage:
| Interval | Expected | Actual | Status |
|---|---|---|---|
| 80% | 80% | 81.2% | ✅ |
| 90% | 90% | 89.4% | ✅ |
| 95% | 95% | 94.8% | ✅ |
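The intervals above come from conformal prediction: the point forecast is widened by an empirical quantile of held-out absolute residuals, which is what makes the actual coverage track the nominal level. A split-conformal sketch (the conservative quantile index is a standard choice, but an assumption about this implementation):

```python
import math

def conformal_interval(point_forecast, calibration_residuals, coverage=0.90):
    """Split-conformal interval: widen the point forecast by the
    coverage-quantile of absolute residuals from a calibration window."""
    abs_res = sorted(abs(r) for r in calibration_residuals)
    n = len(abs_res)
    # Conservative (ceiling) quantile index for finite-sample validity.
    k = min(n - 1, math.ceil(coverage * (n + 1)) - 1)
    q = abs_res[k]
    return point_forecast - q, point_forecast + q

residuals = [i - 50 for i in range(101)]  # symmetric errors in [-50, 50]
lo, hi = conformal_interval(100, residuals, coverage=0.90)
```

Because the width depends only on observed residuals, the coverage guarantee holds regardless of which ensemble member drove the point forecast.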
Performance by Industry:
| Industry | MAPE (13-week) | Notes |
|---|---|---|
| Manufacturing | 8.2% | Seasonal patterns well-captured |
| Retail | 11.5% | Holiday spikes challenging |
| Healthcare | 7.1% | Regular payment cycles |
| Technology | 10.8% | Lumpy revenue recognition |
| Services | 9.3% | Moderate seasonality |
Limitations and Risks
Known Limitations:
- Minimum History: Requires 24 months of data for reliable forecasts
- Regime Changes: Cannot predict M&A, pandemics, major business shifts
- External Shocks: No economic downturn prediction
- Seasonality: New businesses without seasonal history underperform
Failure Modes:
| Failure Mode | Frequency | Mitigation |
|---|---|---|
| Extreme over-prediction | 2.3% | Sanity checks (>50% YoY growth flagged) |
| Missed seasonality | 1.1% | Manual override capability |
| Negative forecast | 0.4% | Floor at zero with warning |
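The two automated mitigations in the table reduce to simple guards on each forecast point. A sketch (function and warning names are illustrative):

```python
def sanity_check(forecast, same_period_last_year):
    """Apply the two guards from the failure-mode table: floor negative
    cash forecasts at zero, and flag growth above 50% year-over-year."""
    warnings = []
    if forecast < 0:
        warnings.append("negative_forecast_floored")
        forecast = 0.0
    if same_period_last_year > 0 and forecast > 1.5 * same_period_last_year:
        warnings.append("yoy_growth_over_50pct")
    return forecast, warnings
```

Flagged points are surfaced to the analyst rather than silently corrected, preserving the manual-override path noted above.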
Monitoring
Production Metrics:
- MAPE by horizon (daily tracking)
- Interval coverage (weekly)
- Forecast bias (systematic over/under)
- Model inference latency
Retraining Schedule: Weekly incremental, monthly full retrain
Model 3: Variance Analysis NLG Model
Model Details
| Attribute | Value |
|---|---|
| Model Name | fpa-nlg-variance-v1.0 |
| Model Type | Fine-tuned LLM (DeepSeek-R1-32B) |
| Version | 1.0.0 |
| Release Date | 2026-01-25 |
| Framework | vLLM, DeepSeek-R1 |
| License | DeepSeek License + Proprietary Fine-tuning |
| Owner | AI/ML Team |
Intended Use
Primary Use Case: Generating natural language commentary for budget variance reports.
Intended Users:
- CFOs reviewing monthly results
- FP&A analysts preparing board decks
- Controllers documenting variances
Model Architecture
┌─────────────────────────────────────────────────────────────┐
│ NLG VARIANCE COMMENTARY v1.0 │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ DeepSeek-R1-32B Base │ │
│ │ (Quantized INT8 for inference) │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ LoRA Fine-tuning Layer │ │
│ │ (rank=16, α=32, trained on CFO commentaries) │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Structured Prompt Template │ │
│ │ │ │
│ │ System: You are a senior financial analyst... │ │
│ │ Context: {variance_data_json} │ │
│ │ Task: Generate 3-5 paragraph executive summary... │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Output Validation │ │
│ │ • Factual grounding check │ │
│ │ • Number verification │ │
│ │ • Tone consistency │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Training Data
| Attribute | Value |
|---|---|
| Fine-tuning Dataset | CFO commentary examples |
| Size | 12,500 variance report + commentary pairs |
| Source | Anonymized customer data, public earnings calls |
| Quality | Human-reviewed, edited for clarity |
Evaluation Results
Automated Metrics:
| Metric | Score | Target |
|---|---|---|
| ROUGE-L | 0.42 | >0.35 ✅ |
| BERTScore F1 | 0.87 | >0.80 ✅ |
| Factual Accuracy | 94.2% | >90% ✅ |
| Number Accuracy | 98.1% | >95% ✅ |
Human Evaluation (n=200):
| Criterion | Score (1-5) | Target |
|---|---|---|
| Clarity | 4.3 | >4.0 ✅ |
| Accuracy | 4.5 | >4.0 ✅ |
| Usefulness | 4.1 | >4.0 ✅ |
| Tone Appropriateness | 4.4 | >4.0 ✅ |
Limitations and Risks
Hallucination Prevention:
- All numbers must appear in input data
- Citation required for specific claims
- Post-generation validation checks
- Human review recommended for external use
Known Limitations:
- Cannot generate novel business insights
- May miss industry-specific context
- Tone can be overly formal for some audiences
- Limited support for non-English inputs and outputs (multilingual support is planned)
Ethical Considerations
Content Safety:
- No generation of financial advice
- Disclaimers required for external distribution
- Bias review for sentiment analysis components
Transparency:
- AI-generated content labeled
- Human can edit/override all outputs
- Audit trail of all generations
Model 4: Transaction Anomaly Detection Model
Model Details
| Attribute | Value |
|---|---|
| Model Name | fpa-anomaly-detector-v1.2 |
| Model Type | Ensemble (Isolation Forest + Statistical) |
| Version | 1.2.0 |
| Release Date | 2025-11-15 |
| Framework | scikit-learn 1.4, scipy 1.11 |
Intended Use
Primary Use Case: Detecting unusual transactions that may indicate errors, fraud, or control failures.
Out-of-Scope: This is NOT a fraud detection system. Refer to dedicated fraud models for that purpose.
Model Architecture
┌──────────────────────────────────────────────────────────┐
│ ANOMALY DETECTION ENSEMBLE │
├──────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ │ Isolation │ │ Z-Score │ │ Business Rule │ │
│ │ Forest │ │ Detector │ │ Detector │ │
│ └──────┬──────┘ └──────┬──────┘ └────────┬────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Anomaly Aggregator │ │
│ │ Score = 0.4×IF + 0.3×ZScore + 0.3×Rules │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
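The aggregator's weighted blend can be sketched directly from the formula in the diagram. How each component is scaled into [0, 1] is an assumption here (the z-score cap at 4σ and the rule-hit ratio are illustrative):

```python
def anomaly_score(if_score, z_score, rule_hits, rule_count):
    """Weighted blend from the aggregator: 0.4×IF + 0.3×Z-score + 0.3×Rules.
    Assumes the Isolation Forest score is already normalized to [0, 1]
    with higher values meaning more anomalous."""
    z_component = min(abs(z_score) / 4.0, 1.0)  # cap at 4 sigma
    rule_component = rule_hits / rule_count if rule_count else 0.0
    return 0.4 * if_score + 0.3 * z_component + 0.3 * rule_component
```

Transactions would then be ranked by this score, with the top of the list feeding the Precision@100 metric reported below.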
Evaluation Results
| Metric | Value | Target |
|---|---|---|
| Precision@100 | 78.5% | >70% ✅ |
| Recall (known anomalies) | 91.2% | >85% ✅ |
| False Positive Rate | 3.2% | <5% ✅ |
Limitations
- High false positive rate for seasonal businesses
- Cannot detect collusion or sophisticated fraud
- Requires 12+ months of baseline data
Appendix: Model Governance
Model Approval Process
1. Development → 2. Validation → 3. Approval → 4. Deployment → 5. Monitoring
↑ ↓
└───── Retraining ←─────────────┘
Required Approvals
| Model Risk Tier | Approvers |
|---|---|
| High (financial decisions) | ML Lead + CFO + Compliance |
| Medium (suggestions) | ML Lead + Product Owner |
| Low (internal tooling) | ML Lead |
Model Inventory
| Model ID | Name | Risk Tier | Last Review | Next Review |
|---|---|---|---|---|
| ML-001 | Reconciliation Matcher | High | 2026-01-15 | 2026-04-15 |
| ML-002 | Forecast Ensemble | High | 2026-01-20 | 2026-04-20 |
| ML-003 | NLG Variance | Medium | 2026-01-25 | 2026-07-25 |
| ML-004 | Anomaly Detector | Medium | 2025-11-15 | 2026-05-15 |
| ML-005 | Transaction Categorizer | Low | 2025-09-01 | 2026-09-01 |