FP&A Platform — AI Model Cards

Version: 1.0
Last Updated: 2026-02-03
Document ID: AI-001
Classification: Internal


Overview

This document provides model cards for all AI/ML models deployed in the FP&A Platform, following the Google Model Card framework. Each card documents model architecture, training data, performance metrics, limitations, and ethical considerations.


Model 1: Bank Reconciliation Matching Model

Model Details

| Attribute | Value |
|---|---|
| Model Name | fpa-recon-matcher-v2.1 |
| Model Type | Ensemble (XGBoost + Sentence Transformers) |
| Version | 2.1.0 |
| Release Date | 2026-01-15 |
| Framework | scikit-learn 1.4, XGBoost 2.0, sentence-transformers 2.3 |
| License | Proprietary |
| Owner | AI/ML Team |

Intended Use

Primary Use Case: Automatically matching bank transactions to GL journal entries during reconciliation.

Intended Users:

  • FP&A analysts performing bank reconciliations
  • Controllers reviewing reconciliation suggestions
  • Automated reconciliation workflows

Out-of-Scope Uses:

  • Fraud detection (use dedicated fraud model)
  • Credit decisioning
  • Transaction classification (use categorization model)

Model Architecture

┌─────────────────────────────────────────────────────────────┐
│                 RECONCILIATION MATCHER v2.1                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐     │
│  │   Feature    │   │     Text     │   │     Rule     │     │
│  │ Engineering  │   │  Embedding   │   │    Engine    │     │
│  │              │   │   (MiniLM)   │   │              │     │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘     │
│         │                  │                  │             │
│         ▼                  ▼                  ▼             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                Feature Concatenation                 │   │
│  │   [amount_diff, date_diff, text_sim, rule_matches]   │   │
│  └──────────────────────────┬───────────────────────────┘   │
│                             │                               │
│                             ▼                               │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                  XGBoost Classifier                  │   │
│  │           (500 trees, max_depth=6, lr=0.1)           │   │
│  └──────────────────────────┬───────────────────────────┘   │
│                             │                               │
│                             ▼                               │
│  ┌──────────────────────────────────────────────────────┐   │
│  │            Confidence Calibration (Platt)            │   │
│  └──────────────────────────┬───────────────────────────┘   │
│                             │                               │
│                             ▼                               │
│                Match Probability [0.0 - 1.0]                │
└─────────────────────────────────────────────────────────────┘

Input Features (32 total):

  • amount_diff: Absolute difference in amounts
  • amount_diff_pct: Percentage difference
  • date_diff: Days between transactions
  • text_similarity: Cosine similarity of embeddings
  • payee_fuzzy_score: Levenshtein ratio of payee names
  • reference_match: Binary reference number match
  • check_number_match: Binary check number match
  • amount_bucket: Categorical amount range
  • day_of_week_match: Same day of week
  • month_match: Same month
  • + 22 rule-based binary features
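
To make the feature list concrete, here is an illustrative sketch of how a candidate pair might be turned into these features. The field names and the use of `difflib.SequenceMatcher` as a stand-in for the Levenshtein-based payee ratio are assumptions for illustration, not the production code:

```python
from datetime import date
from difflib import SequenceMatcher

def engineer_features(bank_txn: dict, gl_entry: dict) -> dict:
    """Sketch of a few of the numeric features fed to the XGBoost matcher."""
    amt_b, amt_g = bank_txn["amount"], gl_entry["amount"]
    return {
        "amount_diff": abs(amt_b - amt_g),
        "amount_diff_pct": abs(amt_b - amt_g) / max(abs(amt_b), 1e-9),
        "date_diff": abs((bank_txn["date"] - gl_entry["date"]).days),
        # stand-in for the Levenshtein-based payee ratio
        "payee_fuzzy_score": SequenceMatcher(
            None, bank_txn["payee"].lower(), gl_entry["payee"].lower()
        ).ratio(),
        "reference_match": int(bank_txn.get("reference") == gl_entry.get("reference")),
        "day_of_week_match": int(bank_txn["date"].weekday() == gl_entry["date"].weekday()),
        "month_match": int(bank_txn["date"].month == gl_entry["date"].month),
    }

feats = engineer_features(
    {"amount": 1250.00, "date": date(2026, 1, 5), "payee": "ACME CORP", "reference": "INV-884"},
    {"amount": 1250.00, "date": date(2026, 1, 6), "payee": "Acme Corporation", "reference": "INV-884"},
)
```

The resulting dictionary would then be concatenated with the text-embedding similarity and rule-based binaries before scoring.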

Training Data

| Attribute | Value |
|---|---|
| Dataset | Internal reconciliation history |
| Size | 2.4M transaction pairs |
| Positive/Negative Ratio | 1:15 (class imbalanced) |
| Time Range | 2022-01-01 to 2025-12-31 |
| Entities | 847 unique entities |
| Industries | Manufacturing, Retail, Healthcare, Tech, Services |

Data Preprocessing:

  • Amounts normalized to USD
  • Text cleaned (lowercase, remove punctuation)
  • Dates converted to days-from-reference
  • SMOTE oversampling for class balance
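
The text-cleaning step above could look like this minimal sketch; the exact regex choices are illustrative assumptions, not the production pipeline:

```python
import re
import unicodedata

def clean_description(text: str) -> str:
    """Normalize, lowercase, strip punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)      # punctuation -> space
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace

clean_description("WIRE/TRF: ACME, Corp. #884")
```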

Data Splits:

  • Training: 70% (temporal, pre-2025-07)
  • Validation: 15% (2025-07 to 2025-09)
  • Test: 15% (2025-10 to 2025-12)

Evaluation Results

Overall Performance:

| Metric | Validation | Test | Target |
|---|---|---|---|
| Accuracy | 94.2% | 93.8% | ≥90% ✅ |
| Precision | 91.5% | 91.0% | ≥90% ✅ |
| Recall | 87.3% | 86.8% | ≥85% ✅ |
| F1 Score | 89.4% | 88.9% | ≥87% ✅ |
| AUC-ROC | 0.973 | 0.968 | ≥0.95 ✅ |

Performance by Confidence Tier:

| Confidence | % of Predictions | Accuracy | Action |
|---|---|---|---|
| ≥0.95 | 62% | 99.2% | Auto-match |
| 0.85-0.95 | 18% | 94.5% | Suggest with review |
| 0.70-0.85 | 12% | 82.3% | Manual review |
| <0.70 | 8% | 61.2% | No suggestion |
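
The tier thresholds map directly to workflow actions. A minimal routing sketch (function and action names are hypothetical, not the platform's API):

```python
def route_match(probability: float) -> str:
    """Map a calibrated match probability to a reconciliation workflow action."""
    if probability >= 0.95:
        return "auto_match"            # 99.2% observed accuracy
    if probability >= 0.85:
        return "suggest_with_review"
    if probability >= 0.70:
        return "manual_review"
    return "no_suggestion"
```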

Performance by Transaction Type:

| Type | Count | Precision | Recall | Notes |
|---|---|---|---|---|
| Wire transfers | 245K | 94.2% | 91.5% | Strong reference matching |
| ACH | 890K | 92.1% | 88.3% | Good description matching |
| Checks | 320K | 89.5% | 84.2% | Check # matching helps |
| Credit cards | 156K | 88.3% | 82.1% | Variable descriptions |
| Cash | 45K | 78.4% | 71.3% | Limited features |

Fairness Analysis

Segment Analysis:

| Segment | Accuracy | Gap vs Overall | Status |
|---|---|---|---|
| Small entities (<$1M rev) | 92.1% | -1.7% | ✅ Acceptable |
| Medium entities | 94.2% | +0.4% | |
| Large entities (>$100M) | 93.5% | -0.3% | |
| US entities | 94.1% | +0.3% | |
| Brazilian entities | 92.8% | -1.0% | ✅ Acceptable |
| Healthcare industry | 91.9% | -1.9% | ⚠️ Monitor |

Bias Mitigation:

  • Resampled training data to balance industry representation
  • Added industry-specific features
  • Separate calibration per entity size tier

Limitations and Risks

Known Limitations:

  1. Minimum Training Data: Requires 1,000+ historical matches per entity for good performance
  2. Cold Start: New entities with no history default to rule-based matching
  3. Multi-currency: Performance degrades with frequent currency conversions
  4. Batch Transactions: Struggles with batched payments split differently
  5. Description Variance: Low performance when bank descriptions change format

Failure Modes:

| Failure Mode | Frequency | Impact | Mitigation |
|---|---|---|---|
| False positive (wrong match) | 2.1% | High (accounting error) | Confidence thresholds, human review |
| False negative (missed match) | 4.8% | Medium (manual work) | Fuzzy matching rules |
| Duplicate suggestion | 0.3% | Medium | Deduplication logic |

Risk Assessment:

  • Financial Impact: False matches could cause accounting errors → Mitigated by SOX-compliant review process
  • Operational Impact: High false negative rate increases manual work → Continuous model improvement
  • Regulatory Impact: Audit trail required for all matches → immudb logging

Monitoring and Maintenance

Production Monitoring:

metrics:
  - name: daily_accuracy
    threshold: 0.88
    alert: slack_channel_ml

  - name: confidence_distribution
    expected_p95: 0.92
    alert_on: drift > 10%

  - name: inference_latency_p99
    threshold: 500ms
    alert: pagerduty

  - name: feature_drift
    method: PSI
    threshold: 0.1
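
The PSI feature-drift check above can be computed as follows. This is a generic sketch (the bin count and the clipping constant are assumptions), not the production monitor:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline and a production feature distribution.
    PSI > 0.1 is the drift alert threshold in the monitoring config."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # catch out-of-range values
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)           # avoid log(0)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))
```

Quantile-based bins keep the baseline fractions roughly equal, so the statistic reacts to shifts in the production distribution rather than to bin placement.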

Retraining Triggers:

  • Accuracy drops below 88% for 7 consecutive days
  • Feature drift (PSI > 0.1) detected
  • Monthly scheduled retraining
  • Major data schema changes

Model Versioning:

| Version | Date | Changes | Performance |
|---|---|---|---|
| 2.1.0 | 2026-01-15 | Added Brazilian bank patterns | +2.1% accuracy |
| 2.0.0 | 2025-10-01 | Upgraded to sentence-transformers | +4.5% accuracy |
| 1.5.0 | 2025-07-01 | Added check number matching | +1.8% accuracy |

Ethical Considerations

Data Privacy:

  • Model trained on de-identified transaction data
  • No PII features used (names hashed, accounts masked)
  • Training data retained for 3 years for audit purposes

Transparency:

  • All match suggestions include confidence score
  • Explanation API provides feature importance for each suggestion
  • Audit trail captures model version used

Human Oversight:

  • Matches below 0.95 confidence require human review
  • All auto-matches can be reversed
  • Daily sampling audit of auto-matched transactions

Model 2: Cash Flow Forecasting Model

Model Details

| Attribute | Value |
|---|---|
| Model Name | fpa-forecast-ensemble-v1.3 |
| Model Type | Ensemble (NeuralProphet + ARIMA + XGBoost) |
| Version | 1.3.0 |
| Release Date | 2026-01-20 |
| Framework | NeuralProphet 0.7, statsmodels 0.14, XGBoost 2.0 |
| License | Proprietary |
| Owner | AI/ML Team |

Intended Use

Primary Use Case: Generating 13-week and 12-month cash flow forecasts.

Intended Users:

  • Treasury managers for liquidity planning
  • CFOs for strategic planning
  • FP&A analysts for scenario analysis

Out-of-Scope Uses:

  • Intraday cash positioning
  • FX rate prediction
  • Stock price forecasting

Model Architecture

┌─────────────────────────────────────────────────────────────┐
│                   FORECAST ENSEMBLE v1.3                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ ┌────────────────┐  ┌────────────────┐  ┌────────────────┐  │
│ │ NeuralProphet  │  │     ARIMA      │  │    XGBoost     │  │
│ │   (Primary)    │  │   (Fallback)   │  │  (Regressors)  │  │
│ │                │  │                │  │                │  │
│ │ • Trend        │  │ • AR(p)        │  │ • Lag features │  │
│ │ • Seasonality  │  │ • I(d)         │  │ • External     │  │
│ │ • Holidays     │  │ • MA(q)        │  │   drivers      │  │
│ │ • Regressors   │  │                │  │                │  │
│ └───────┬────────┘  └───────┬────────┘  └───────┬────────┘  │
│         │                   │                   │           │
│         ▼                   ▼                   ▼           │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                  Weighted Ensemble                   │   │
│  │    NeuralProphet: 0.5   ARIMA: 0.3   XGBoost: 0.2    │   │
│  └──────────────────────────┬───────────────────────────┘   │
│                             │                               │
│                             ▼                               │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                 Prediction Intervals                 │   │
│  │          (Conformal Prediction, 80/90/95%)           │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
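
The final two stages of the diagram, fixed-weight averaging followed by split-conformal intervals, can be sketched as below. The residual-handling details are assumptions for illustration, not the production implementation:

```python
import numpy as np

def ensemble_forecast(np_pred, arima_pred, xgb_pred, weights=(0.5, 0.3, 0.2)):
    """Fixed-weight average of the three component forecasts (weights from the diagram)."""
    w1, w2, w3 = weights
    return [w1 * a + w2 * b + w3 * c
            for a, b, c in zip(np_pred, arima_pred, xgb_pred)]

def conformal_interval(point, calibration_residuals, coverage=0.90):
    """Split-conformal band: point ± the coverage-quantile of held-out |residuals|."""
    q = float(np.quantile(np.abs(calibration_residuals), coverage))
    return point - q, point + q
```

In split conformal prediction the residuals come from a held-out calibration window, which is what gives the empirical coverage guarantees reported in the evaluation tables below.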

Input Features:

  • Historical cash flows (24+ months required)
  • Day-of-week, month, quarter indicators
  • Holiday calendars (US, Brazil)
  • Known future events (payroll dates, tax payments)
  • Economic indicators (optional: interest rates, GDP)
  • AR receipts aging schedule
  • AP payment schedule

Training Data

| Attribute | Value |
|---|---|
| Dataset | Historical cash flow data |
| Size | 156 entities × 36 months average |
| Granularity | Daily cash positions |
| Time Range | 2021-01-01 to 2025-12-31 |
| Industries | Multi-industry |

Data Quality Requirements:

  • Minimum 24 months of history
  • No gaps > 7 consecutive days
  • Consistent currency reporting

Evaluation Results

Point Forecast Accuracy:

| Horizon | MAPE | RMSE | MAE | Target |
|---|---|---|---|---|
| 1 week | 4.2% | $32K | $28K | <5% ✅ |
| 4 weeks | 6.8% | $58K | $49K | <10% ✅ |
| 13 weeks | 9.1% | $87K | $72K | <15% ✅ |
| 26 weeks | 12.3% | $112K | $95K | <20% ✅ |
| 52 weeks | 18.7% | $156K | $134K | <25% ✅ |
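
For reference, the point-forecast metrics in the table follow the standard definitions; a minimal sketch (excluding zero actuals from MAPE is an assumption about the evaluation protocol):

```python
def mape(actual, forecast):
    """Mean absolute percentage error over non-zero actuals, as a percent."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)

def rmse(actual, forecast):
    """Root mean squared error."""
    return (sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)) ** 0.5
```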

Prediction Interval Coverage:

| Interval | Expected | Actual | Status |
|---|---|---|---|
| 80% | 80% | 81.2% | |
| 90% | 90% | 89.4% | |
| 95% | 95% | 94.8% | |
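
Empirical coverage, as reported above, is simply the hit rate of actuals falling inside their prediction intervals; a sketch:

```python
def interval_coverage(actuals, lowers, uppers):
    """Fraction of actual values that land inside their prediction interval."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lowers, uppers))
    return hits / len(actuals)
```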

Performance by Industry:

| Industry | MAPE (13-week) | Notes |
|---|---|---|
| Manufacturing | 8.2% | Seasonal patterns well-captured |
| Retail | 11.5% | Holiday spikes challenging |
| Healthcare | 7.1% | Regular payment cycles |
| Technology | 10.8% | Lumpy revenue recognition |
| Services | 9.3% | Moderate seasonality |

Limitations and Risks

Known Limitations:

  1. Minimum History: Requires 24 months of data for reliable forecasts
  2. Regime Changes: Cannot predict M&A, pandemics, major business shifts
  3. External Shocks: No economic downturn prediction
  4. Seasonality: New businesses without seasonal history underperform

Failure Modes:

| Failure Mode | Frequency | Mitigation |
|---|---|---|
| Extreme over-prediction | 2.3% | Sanity checks (>50% YoY growth flagged) |
| Missed seasonality | 1.1% | Manual override capability |
| Negative forecast | 0.4% | Floor at zero with warning |

Monitoring

Production Metrics:

  • MAPE by horizon (daily tracking)
  • Interval coverage (weekly)
  • Forecast bias (systematic over/under)
  • Model inference latency

Retraining Schedule: Weekly incremental, monthly full retrain


Model 3: Variance Analysis NLG Model

Model Details

| Attribute | Value |
|---|---|
| Model Name | fpa-nlg-variance-v1.0 |
| Model Type | Fine-tuned LLM (DeepSeek-R1-32B) |
| Version | 1.0.0 |
| Release Date | 2026-01-25 |
| Framework | vLLM, DeepSeek-R1 |
| License | DeepSeek License + Proprietary Fine-tuning |
| Owner | AI/ML Team |

Intended Use

Primary Use Case: Generating natural language commentary for budget variance reports.

Intended Users:

  • CFOs reviewing monthly results
  • FP&A analysts preparing board decks
  • Controllers documenting variances

Model Architecture

┌─────────────────────────────────────────────────────────────┐
│                NLG VARIANCE COMMENTARY v1.0                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                 DeepSeek-R1-32B Base                 │   │
│  │            (Quantized INT8 for inference)            │   │
│  └──────────────────────────┬───────────────────────────┘   │
│                             │                               │
│                             ▼                               │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                LoRA Fine-tuning Layer                │   │
│  │     (rank=16, α=32, trained on CFO commentaries)     │   │
│  └──────────────────────────┬───────────────────────────┘   │
│                             │                               │
│                             ▼                               │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              Structured Prompt Template              │   │
│  │                                                      │   │
│  │ System: You are a senior financial analyst...        │   │
│  │ Context: {variance_data_json}                        │   │
│  │ Task: Generate 3-5 paragraph executive summary...    │   │
│  └──────────────────────────┬───────────────────────────┘   │
│                             │                               │
│                             ▼                               │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                  Output Validation                   │   │
│  │ • Factual grounding check                            │   │
│  │ • Number verification                                │   │
│  │ • Tone consistency                                   │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘
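
The number-verification step in the output-validation stage can be approximated with a simple set comparison. This sketch flags any figure in the commentary that does not appear in the input data; the regex and the formatting of the example inputs are illustrative assumptions:

```python
import re

def verify_numbers(commentary: str, source_data: str) -> list[str]:
    """Return numbers in the generated commentary absent from the input data."""
    def extract(text):
        # drop thousands separators, then pull integers and decimals
        return set(re.findall(r"\d+(?:\.\d+)?", text.replace(",", "")))
    return sorted(extract(commentary) - extract(source_data))

# A figure not grounded in the input ($1.3M) is flagged; grounded ones pass.
verify_numbers(
    "Revenue beat budget by $1.2M (8.4%), driven by $1.3M in new bookings.",
    '{"revenue_variance": "$1.2M", "revenue_variance_pct": "8.4%"}',
)
```

A production grounding check would also need unit normalization (e.g. 1.2M vs 1,200,000), which this sketch deliberately omits.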

Training Data

| Attribute | Value |
|---|---|
| Fine-tuning Dataset | CFO commentary examples |
| Size | 12,500 variance report + commentary pairs |
| Source | Anonymized customer data, public earnings calls |
| Quality | Human-reviewed, edited for clarity |

Evaluation Results

Automated Metrics:

| Metric | Score | Target |
|---|---|---|
| ROUGE-L | 0.42 | >0.35 ✅ |
| BERTScore F1 | 0.87 | >0.80 ✅ |
| Factual Accuracy | 94.2% | >90% ✅ |
| Number Accuracy | 98.1% | >95% ✅ |

Human Evaluation (n=200):

| Criterion | Score (1-5) | Target |
|---|---|---|
| Clarity | 4.3 | >4.0 ✅ |
| Accuracy | 4.5 | >4.0 ✅ |
| Usefulness | 4.1 | >4.0 ✅ |
| Tone Appropriateness | 4.4 | >4.0 ✅ |

Limitations and Risks

Hallucination Prevention:

  • All numbers must appear in input data
  • Citation required for specific claims
  • Post-generation validation checks
  • Human review recommended for external use

Known Limitations:

  1. Cannot generate novel business insights
  2. May miss industry-specific context
  3. Tone can be overly formal for some audiences
  4. Does not handle non-English well (yet)

Ethical Considerations

Content Safety:

  • No generation of financial advice
  • Disclaimers required for external distribution
  • Bias review for sentiment analysis components

Transparency:

  • AI-generated content labeled
  • Human can edit/override all outputs
  • Audit trail of all generations

Model 4: Transaction Anomaly Detection Model

Model Details

| Attribute | Value |
|---|---|
| Model Name | fpa-anomaly-detector-v1.2 |
| Model Type | Ensemble (Isolation Forest + Statistical) |
| Version | 1.2.0 |
| Release Date | 2025-11-15 |
| Framework | scikit-learn 1.4, scipy 1.11 |

Intended Use

Primary Use Case: Detecting unusual transactions that may indicate errors, fraud, or control failures.

Out-of-Scope: This is NOT a fraud detection system. Refer to dedicated fraud models for that purpose.

Model Architecture

┌──────────────────────────────────────────────────────────┐
│                ANOMALY DETECTION ENSEMBLE                │
├──────────────────────────────────────────────────────────┤
│ ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐    │
│ │  Isolation  │  │   Z-Score   │  │  Business Rule  │    │
│ │   Forest    │  │  Detector   │  │    Detector     │    │
│ └──────┬──────┘  └──────┬──────┘  └────────┬────────┘    │
│        │                │                  │             │
│        ▼                ▼                  ▼             │
│  ┌──────────────────────────────────────────────────┐    │
│  │                Anomaly Aggregator                │    │
│  │     Score = 0.4×IF + 0.3×ZScore + 0.3×Rules      │    │
│  └──────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────┘
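
The aggregation formula in the diagram can be sketched as below, with an assumed per-component normalization to [0, 1]; the 4-sigma cap on the z-score and the rule-count normalization are hypothetical choices, not the documented ones:

```python
def anomaly_score(if_score: float, z_score: float,
                  rule_hits: int, total_rules: int = 10) -> float:
    """Weighted aggregation: 0.4×IF + 0.3×Z + 0.3×Rules (components in [0, 1])."""
    z_norm = min(abs(z_score) / 4.0, 1.0)   # cap at 4 sigma (assumption)
    rule_norm = rule_hits / total_rules     # fraction of rules triggered
    return 0.4 * if_score + 0.3 * z_norm + 0.3 * rule_norm
```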

Evaluation Results

| Metric | Value | Target |
|---|---|---|
| Precision@100 | 78.5% | >70% ✅ |
| Recall (known anomalies) | 91.2% | >85% ✅ |
| False Positive Rate | 3.2% | <5% ✅ |

Limitations

  • High false positive rate for seasonal businesses
  • Cannot detect collusion or sophisticated fraud
  • Requires 12+ months of baseline data

Appendix: Model Governance

Model Approval Process

1. Development → 2. Validation → 3. Approval → 4. Deployment → 5. Monitoring
       ↑                                                             │
       └──────────────────────── Retraining ←────────────────────────┘

Required Approvals

| Model Risk Tier | Approvers |
|---|---|
| High (financial decisions) | ML Lead + CFO + Compliance |
| Medium (suggestions) | ML Lead + Product Owner |
| Low (internal tooling) | ML Lead |

Model Inventory

| Model ID | Name | Risk Tier | Last Review | Next Review |
|---|---|---|---|---|
| ML-001 | Reconciliation Matcher | High | 2026-01-15 | 2026-04-15 |
| ML-002 | Forecast Ensemble | High | 2026-01-20 | 2026-04-20 |
| ML-003 | NLG Variance | Medium | 2026-01-25 | 2026-07-25 |
| ML-004 | Anomaly Detector | Medium | 2025-11-15 | 2026-05-15 |
| ML-005 | Transaction Categorizer | Low | 2025-09-01 | 2026-09-01 |

AI Model Cards v1.0 — FP&A Platform Document ID: AI-001