FP&A Platform — AI Model Cards
Version: 1.0
Last Updated: 2026-02-03
Document ID: AI-001
Classification: Internal
Overview
This document provides model cards for all AI/ML models deployed in the FP&A Platform, following the Google Model Card framework. Each card documents model architecture, training data, performance metrics, limitations, and ethical considerations.
Model 1: Bank Reconciliation Matching Model
Model Details
| Attribute | Value |
|---|---|
| Model Name | fpa-recon-matcher-v2.1 |
| Model Type | Ensemble (XGBoost + Sentence Transformers) |
| Version | 2.1.0 |
| Release Date | 2026-01-15 |
| Framework | scikit-learn 1.4, XGBoost 2.0, sentence-transformers 2.3 |
| License | Proprietary |
| Owner | AI/ML Team |
Intended Use
Primary Use Case: Automatically matching bank transactions to GL journal entries during reconciliation.
Intended Users:
- FP&A analysts performing bank reconciliations
- Controllers reviewing reconciliation suggestions
- Automated reconciliation workflows
Out-of-Scope Uses:
- Fraud detection (use dedicated fraud model)
- Credit decisioning
- Transaction classification (use categorization model)
Model Architecture
┌─────────────────────────────────────────────────────────────┐
│ RECONCILIATION MATCHER v2.1 │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Feature │ │ Text │ │ Rule │ │
│ │ Engineering │ │ Embedding │ │ Engine │ │
│ │ │ │ (MiniLM) │ │ │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Feature Concatenation │ │
│ │ [amount_diff, date_diff, text_sim, rule_matches] │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ XGBoost Classifier │ │
│ │ (500 trees, max_depth=6, lr=0.1) │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Confidence Calibration (Platt) │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ Match Probability [0.0 - 1.0] │
└─────────────────────────────────────────────────────────────┘
Input Features (32 total):
- amount_diff: Absolute difference in amounts
- amount_diff_pct: Percentage difference in amounts
- date_diff: Days between transaction dates
- text_similarity: Cosine similarity of description embeddings
- payee_fuzzy_score: Levenshtein ratio of payee names
- reference_match: Binary reference number match
- check_number_match: Binary check number match
- amount_bucket: Categorical amount range
- day_of_week_match: Same day of week
- month_match: Same month
- Plus 22 rule-based binary features
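Several of these features are simple pairwise computations over a bank line and a GL line. The sketch below illustrates a handful of them; the record field names are assumptions, and `difflib`'s ratio is used as a stand-in for a true Levenshtein ratio:

```python
from datetime import date
from difflib import SequenceMatcher

def pair_features(bank, gl):
    """Compute a few of the 32 matcher features for one candidate pair."""
    amount_diff = abs(bank["amount"] - gl["amount"])
    # Guard against zero-amount GL lines when computing the percentage.
    amount_diff_pct = amount_diff / max(abs(gl["amount"]), 0.01)
    date_diff = abs((bank["date"] - gl["date"]).days)
    # difflib's ratio approximates a Levenshtein-style similarity in [0, 1].
    payee_fuzzy_score = SequenceMatcher(
        None, bank["payee"].lower(), gl["payee"].lower()
    ).ratio()
    reference_match = int(bank.get("reference") == gl.get("reference"))
    return {
        "amount_diff": amount_diff,
        "amount_diff_pct": amount_diff_pct,
        "date_diff": date_diff,
        "payee_fuzzy_score": payee_fuzzy_score,
        "reference_match": reference_match,
    }

feats = pair_features(
    {"amount": 1250.00, "date": date(2026, 1, 10),
     "payee": "ACME Corp", "reference": "INV-884"},
    {"amount": 1250.00, "date": date(2026, 1, 12),
     "payee": "Acme Corporation", "reference": "INV-884"},
)
```

The resulting dictionary would be concatenated with the embedding similarity and the 22 rule-based binaries before being fed to the XGBoost classifier.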
Training Data
| Attribute | Value |
|---|---|
| Dataset | Internal reconciliation history |
| Size | 2.4M transaction pairs |
| Positive/Negative Ratio | 1:15 (class imbalanced) |
| Time Range | 2022-01-01 to 2025-12-31 |
| Entities | 847 unique entities |
| Industries | Manufacturing, Retail, Healthcare, Tech, Services |
Data Preprocessing:
- Amounts normalized to USD
- Text cleaned (lowercase, remove punctuation)
- Dates converted to days-from-reference
- SMOTE oversampling for class balance
Data Splits:
- Training: 70% (temporal, pre-2025-07)
- Validation: 15% (2025-07 to 2025-09)
- Test: 15% (2025-10 to 2025-12)
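The splits above are temporal rather than random, so the test window is strictly later than anything the model trained on. A minimal sketch of that split logic (the record shape is an assumption):

```python
from datetime import date

def temporal_split(pairs,
                   val_start=date(2025, 7, 1),
                   test_start=date(2025, 10, 1)):
    """Split transaction pairs by date boundary, never randomly, so the
    validation and test windows sit strictly after the training window."""
    train = [p for p in pairs if p["date"] < val_start]
    val = [p for p in pairs if val_start <= p["date"] < test_start]
    test = [p for p in pairs if p["date"] >= test_start]
    return train, val, test

pairs = [
    {"id": 1, "date": date(2024, 3, 1)},
    {"id": 2, "date": date(2025, 8, 15)},
    {"id": 3, "date": date(2025, 11, 2)},
]
train, val, test = temporal_split(pairs)
```

A temporal split avoids leakage: a random split would let the model see post-cutoff reconciliation patterns during training and inflate the reported metrics.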
Evaluation Results
Overall Performance:
| Metric | Validation | Test | Target |
|---|---|---|---|
| Accuracy | 94.2% | 93.8% | ≥90% ✅ |
| Precision | 91.5% | 91.0% | ≥90% ✅ |
| Recall | 87.3% | 86.8% | ≥85% ✅ |
| F1 Score | 89.4% | 88.9% | ≥87% ✅ |
| AUC-ROC | 0.973 | 0.968 | ≥0.95 ✅ |
Performance by Confidence Tier:
| Confidence | % of Predictions | Accuracy | Action |
|---|---|---|---|
| ≥0.95 | 62% | 99.2% | Auto-match |
| 0.85-0.95 | 18% | 94.5% | Suggest with review |
| 0.70-0.85 | 12% | 82.3% | Manual review |
| <0.70 | 8% | 61.2% | No suggestion |
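The tier thresholds above map directly to a routing function. A sketch (action names are illustrative, not the production API):

```python
def route(match_probability):
    """Map a calibrated match probability to a workflow action,
    using the confidence-tier cutoffs from the table above."""
    if match_probability >= 0.95:
        return "auto_match"
    if match_probability >= 0.85:
        return "suggest_with_review"
    if match_probability >= 0.70:
        return "manual_review"
    return "no_suggestion"
```

Because the probabilities are Platt-calibrated upstream, these fixed cutoffs correspond to the measured per-tier accuracies rather than raw classifier margins.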
Performance by Transaction Type:
| Type | Count | Precision | Recall | Notes |
|---|---|---|---|---|
| Wire transfers | 245K | 94.2% | 91.5% | Strong reference matching |
| ACH | 890K | 92.1% | 88.3% | Good description matching |
| Checks | 320K | 89.5% | 84.2% | Check # matching helps |
| Credit cards | 156K | 88.3% | 82.1% | Variable descriptions |
| Cash | 45K | 78.4% | 71.3% | Limited features |
Fairness Analysis
Segment Analysis:
| Segment | Accuracy | Gap vs Overall | Status |
|---|---|---|---|
| Small entities (<$1M rev) | 92.1% | -1.7% | ✅ Acceptable |
| Medium entities | 94.2% | +0.4% | ✅ |
| Large entities (>$100M) | 93.5% | -0.3% | ✅ |
| US entities | 94.1% | +0.3% | ✅ |
| Brazilian entities | 92.8% | -1.0% | ✅ Acceptable |
| Healthcare industry | 91.9% | -1.9% | ⚠️ Monitor |
Bias Mitigation:
- Resampled training data to balance industry representation
- Added industry-specific features
- Separate calibration per entity size tier
Limitations and Risks
Known Limitations:
- Minimum Training Data: Requires 1,000+ historical matches per entity for good performance
- Cold Start: New entities with no history default to rule-based matching
- Multi-currency: Performance degrades with frequent currency conversions
- Batch Transactions: Struggles with batched payments split differently
- Description Variance: Low performance when bank descriptions change format
Failure Modes:
| Failure Mode | Frequency | Impact | Mitigation |
|---|---|---|---|
| False positive (wrong match) | 2.1% | High (accounting error) | Confidence thresholds, human review |
| False negative (missed match) | 4.8% | Medium (manual work) | Fuzzy matching rules |
| Duplicate suggestion | 0.3% | Medium | Deduplication logic |
Risk Assessment:
- Financial Impact: False matches could cause accounting errors → Mitigated by SOX-compliant review process
- Operational Impact: False negatives (4.8% on test) increase manual reconciliation work → Mitigated by continuous model improvement
- Regulatory Impact: Audit trail required for all matches → Mitigated by immutable immudb logging
Monitoring and Maintenance
Production Monitoring:
metrics:
  - name: daily_accuracy
    threshold: 0.88
    alert: slack_channel_ml
  - name: confidence_distribution
    expected_p95: 0.92
    alert_on: drift > 10%
  - name: inference_latency_p99
    threshold: 500ms
    alert: pagerduty
  - name: feature_drift
    method: PSI
    threshold: 0.1
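The feature-drift check compares live feature distributions against a training baseline using the Population Stability Index. A minimal sketch of a PSI computation under the 0.1 threshold (the quantile-binning scheme and smoothing constant are assumptions; the production job may differ):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample (expected)
    and a live sample (actual), using quantile bins from the baseline."""
    expected = sorted(expected)
    # Cut points at baseline quantiles, giving roughly equal-mass bins.
    edges = [expected[int(len(expected) * i / bins)] for i in range(1, bins)]

    def bucket_shares(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        # Smooth empty buckets so the log term stays finite.
        return [max(c / len(values), 1e-4) for c in counts]

    e_pct = bucket_shares(expected)
    a_pct = bucket_shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_pct, a_pct))
```

A PSI near 0 means the live distribution matches the baseline; values above the 0.1 threshold in the config above would trigger the drift alert and a retraining review.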
Retraining Triggers:
- Accuracy drops below 88% for 7 consecutive days
- Feature drift (PSI > 0.1) detected
- Monthly scheduled retraining
- Major data schema changes
Model Versioning:
| Version | Date | Changes | Performance |
|---|---|---|---|
| 2.1.0 | 2026-01-15 | Added Brazilian bank patterns | +2.1% accuracy |
| 2.0.0 | 2025-10-01 | Upgraded to sentence-transformers | +4.5% accuracy |
| 1.5.0 | 2025-07-01 | Added check number matching | +1.8% accuracy |
Ethical Considerations
Data Privacy:
- Model trained on de-identified transaction data
- No PII features used (names hashed, accounts masked)
- Training data retained for 3 years for audit purposes
Transparency:
- All match suggestions include confidence score
- Explanation API provides feature importance for each suggestion
- Audit trail captures model version used
Human Oversight:
- Matches below 0.95 confidence require human review
- All auto-matches can be reversed
- Daily sampling audit of auto-matched transactions
Model 2: Cash Flow Forecasting Model
Model Details
| Attribute | Value |
|---|---|
| Model Name | fpa-forecast-ensemble-v1.3 |
| Model Type | Ensemble (NeuralProphet + ARIMA + XGBoost) |
| Version | 1.3.0 |
| Release Date | 2026-01-20 |
| Framework | NeuralProphet 0.7, statsmodels 0.14, XGBoost 2.0 |
| License | Proprietary |
| Owner | AI/ML Team |
Intended Use
Primary Use Case: Generating 13-week and 12-month cash flow forecasts.
Intended Users:
- Treasury managers for liquidity planning
- CFOs for strategic planning
- FP&A analysts for scenario analysis
Out-of-Scope Uses:
- Intraday cash positioning
- FX rate prediction
- Stock price forecasting
Model Architecture
┌─────────────────────────────────────────────────────────────┐
│ FORECAST ENSEMBLE v1.3 │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ NeuralProphet │ │ ARIMA │ │ XGBoost │ │
│ │ (Primary) │ │ (Fallback) │ │ (Regressors) │ │
│ │ │ │ │ │ │ │
│ │ • Trend │ │ • AR(p) │ │ • Lag features│ │
│ │ • Seasonality │ │ • I(d) │ │ • External │ │
│ │ • Holidays │ │ • MA(q) │ │ drivers │ │
│ │ • Regressors │ │ │ │ │ │
│ └───────┬────────┘ └───────┬────────┘ └───────┬───────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Weighted Ensemble │ │
│ │ NeuralProphet: 0.5 ARIMA: 0.3 XGBoost: 0.2 │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Prediction Intervals │ │
│ │ (Conformal Prediction, 80/90/95%) │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Input Features:
- Historical cash flows (24+ months required)
- Day-of-week, month, quarter indicators
- Holiday calendars (US, Brazil)
- Known future events (payroll dates, tax payments)
- Economic indicators (optional: interest rates, GDP)
- AR receipts aging schedule
- AP payment schedule
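The ensemble stage combines the three component forecasts with the fixed weights shown in the diagram. A minimal sketch of that blend (the function name is illustrative):

```python
def ensemble_forecast(neuralprophet_pred, arima_pred, xgboost_pred,
                      weights=(0.5, 0.3, 0.2)):
    """Blend the three component forecasts point-by-point with the
    fixed v1.3 weights: NeuralProphet 0.5, ARIMA 0.3, XGBoost 0.2."""
    w_np, w_ar, w_xgb = weights
    return [
        w_np * a + w_ar * b + w_xgb * c
        for a, b, c in zip(neuralprophet_pred, arima_pred, xgboost_pred)
    ]

blended = ensemble_forecast([100.0], [200.0], [50.0])
```

The fixed weights keep the blend auditable; a learned stacking layer would likely score better but would be harder to explain to reviewers.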
Training Data
| Attribute | Value |
|---|---|
| Dataset | Historical cash flow data |
| Size | 156 entities × 36 months average |
| Granularity | Daily cash positions |
| Time Range | 2021-01-01 to 2025-12-31 |
| Industries | Multi-industry |
Data Quality Requirements:
- Minimum 24 months of history
- No gaps > 7 consecutive days
- Consistent currency reporting
Evaluation Results
Point Forecast Accuracy:
| Horizon | MAPE | RMSE | MAE | Target |
|---|---|---|---|---|
| 1 week | 4.2% | $32K | $28K | <5% ✅ |
| 4 weeks | 6.8% | $58K | $49K | <10% ✅ |
| 13 weeks | 9.1% | $87K | $72K | <15% ✅ |
| 26 weeks | 12.3% | $112K | $95K | <20% ✅ |
| 52 weeks | 18.7% | $156K | $134K | <25% ✅ |
Prediction Interval Coverage:
| Interval | Expected | Actual | Status |
|---|---|---|---|
| 80% | 80% | 81.2% | ✅ |
| 90% | 90% | 89.4% | ✅ |
| 95% | 95% | 94.8% | ✅ |
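The intervals above come from conformal prediction: the point forecast is widened by an empirical quantile of held-out absolute residuals, which is what makes the actual coverage track the nominal level. A split-conformal sketch (the conservative quantile index is a standard choice, but an assumption about this implementation):

```python
import math

def conformal_interval(point_forecast, calibration_residuals, coverage=0.90):
    """Split-conformal interval: widen the point forecast by the
    coverage-quantile of absolute residuals from a calibration window."""
    abs_res = sorted(abs(r) for r in calibration_residuals)
    n = len(abs_res)
    # Conservative (ceiling) quantile index for finite-sample validity.
    k = min(n - 1, math.ceil(coverage * (n + 1)) - 1)
    q = abs_res[k]
    return point_forecast - q, point_forecast + q

residuals = [i - 50 for i in range(101)]  # symmetric errors in [-50, 50]
lo, hi = conformal_interval(100, residuals, coverage=0.90)
```

Because the width depends only on observed residuals, the coverage guarantee holds regardless of which ensemble member drove the point forecast.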
Performance by Industry:
| Industry | MAPE (13-week) | Notes |
|---|---|---|
| Manufacturing | 8.2% | Seasonal patterns well-captured |
| Retail | 11.5% | Holiday spikes challenging |
| Healthcare | 7.1% | Regular payment cycles |
| Technology | 10.8% | Lumpy revenue recognition |
| Services | 9.3% | Moderate seasonality |
Limitations and Risks
Known Limitations:
- Minimum History: Requires 24 months of data for reliable forecasts
- Regime Changes: Cannot predict M&A, pandemics, major business shifts
- External Shocks: No economic downturn prediction
- Seasonality: New businesses without seasonal history underperform
Failure Modes:
| Failure Mode | Frequency | Mitigation |
|---|---|---|
| Extreme over-prediction | 2.3% | Sanity checks (>50% YoY growth flagged) |
| Missed seasonality | 1.1% | Manual override capability |
| Negative forecast | 0.4% | Floor at zero with warning |
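The two automated mitigations in the table reduce to simple guards on each forecast point. A sketch (function and warning names are illustrative):

```python
def sanity_check(forecast, same_period_last_year):
    """Apply the two guards from the failure-mode table: floor negative
    cash forecasts at zero, and flag growth above 50% year-over-year."""
    warnings = []
    if forecast < 0:
        warnings.append("negative_forecast_floored")
        forecast = 0.0
    if same_period_last_year > 0 and forecast > 1.5 * same_period_last_year:
        warnings.append("yoy_growth_over_50pct")
    return forecast, warnings
```

Flagged points are surfaced to the analyst rather than silently corrected, preserving the manual-override path noted above.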
Monitoring
Production Metrics:
- MAPE by horizon (daily tracking)
- Interval coverage (weekly)
- Forecast bias (systematic over/under)
- Model inference latency
Retraining Schedule: Weekly incremental, monthly full retrain
Model 3: Variance Analysis NLG Model
Model Details
| Attribute | Value |
|---|---|
| Model Name | fpa-nlg-variance-v1.0 |
| Model Type | Fine-tuned LLM (DeepSeek-R1-32B) |
| Version | 1.0.0 |
| Release Date | 2026-01-25 |
| Framework | vLLM, DeepSeek-R1 |
| License | DeepSeek License + Proprietary Fine-tuning |
| Owner | AI/ML Team |
Intended Use
Primary Use Case: Generating natural language commentary for budget variance reports.
Intended Users:
- CFOs reviewing monthly results
- FP&A analysts preparing board decks
- Controllers documenting variances
Model Architecture
┌─────────────────────────────────────────────────────────────┐
│ NLG VARIANCE COMMENTARY v1.0 │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ DeepSeek-R1-32B Base │ │
│ │ (Quantized INT8 for inference) │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ LoRA Fine-tuning Layer │ │
│ │ (rank=16, α=32, trained on CFO commentaries) │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Structured Prompt Template │ │
│ │ │ │
│ │ System: You are a senior financial analyst... │ │
│ │ Context: {variance_data_json} │ │
│ │ Task: Generate 3-5 paragraph executive summary... │ │
│ └──────────────────────────┬───────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Output Validation │ │
│ │ • Factual grounding check │ │
│ │ • Number verification │ │
│ │ • Tone consistency │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Training Data
| Attribute | Value |
|---|---|
| Fine-tuning Dataset | CFO commentary examples |
| Size | 12,500 variance report + commentary pairs |
| Source | Anonymized customer data, public earnings calls |
| Quality | Human-reviewed, edited for clarity |
Evaluation Results
Automated Metrics:
| Metric | Score | Target |
|---|---|---|
| ROUGE-L | 0.42 | >0.35 ✅ |
| BERTScore F1 | 0.87 | >0.80 ✅ |
| Factual Accuracy | 94.2% | >90% ✅ |
| Number Accuracy | 98.1% | >95% ✅ |
Human Evaluation (n=200):
| Criterion | Score (1-5) | Target |
|---|---|---|
| Clarity | 4.3 | >4.0 ✅ |
| Accuracy | 4.5 | >4.0 ✅ |
| Usefulness | 4.1 | >4.0 ✅ |
| Tone Appropriateness | 4.4 | >4.0 ✅ |
Limitations and Risks
Hallucination Prevention:
- All numbers must appear in input data
- Citation required for specific claims
- Post-generation validation checks
- Human review recommended for external use
Known Limitations:
- Cannot generate novel business insights
- May miss industry-specific context
- Tone can be overly formal for some audiences
- Limited support for non-English inputs and outputs (multilingual support is planned)
Ethical Considerations
Content Safety:
- No generation of financial advice
- Disclaimers required for external distribution
- Bias review for sentiment analysis components
Transparency:
- AI-generated content labeled
- Human can edit/override all outputs
- Audit trail of all generations
Model 4: Transaction Anomaly Detection Model
Model Details
| Attribute | Value |
|---|---|
| Model Name | fpa-anomaly-detector-v1.2 |
| Model Type | Ensemble (Isolation Forest + Statistical) |
| Version | 1.2.0 |
| Release Date | 2025-11-15 |
| Framework | scikit-learn 1.4, scipy 1.11 |
Intended Use
Primary Use Case: Detecting unusual transactions that may indicate errors, fraud, or control failures.
Out-of-Scope: This is NOT a fraud detection system. Refer to dedicated fraud models for that purpose.
Model Architecture
┌──────────────────────────────────────────────────────────┐
│ ANOMALY DETECTION ENSEMBLE │
├──────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │
│ │ Isolation │ │ Z-Score │ │ Business Rule │ │
│ │ Forest │ │ Detector │ │ Detector │ │
│ └──────┬──────┘ └──────┬──────┘ └────────┬────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────┐ │
│ │ Anomaly Aggregator │ │
│ │ Score = 0.4×IF + 0.3×ZScore + 0.3×Rules │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
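The aggregator's weighted blend can be sketched directly from the formula in the diagram. How each component is scaled into [0, 1] is an assumption here (the z-score cap at 4σ and the rule-hit ratio are illustrative):

```python
def anomaly_score(if_score, z_score, rule_hits, rule_count):
    """Weighted blend from the aggregator: 0.4×IF + 0.3×Z-score + 0.3×Rules.
    Assumes the Isolation Forest score is already normalized to [0, 1]
    with higher values meaning more anomalous."""
    z_component = min(abs(z_score) / 4.0, 1.0)  # cap at 4 sigma
    rule_component = rule_hits / rule_count if rule_count else 0.0
    return 0.4 * if_score + 0.3 * z_component + 0.3 * rule_component
```

Transactions would then be ranked by this score, with the top of the list feeding the Precision@100 metric reported below.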
Evaluation Results
| Metric | Value | Target |
|---|---|---|
| Precision@100 | 78.5% | >70% ✅ |
| Recall (known anomalies) | 91.2% | >85% ✅ |
| False Positive Rate | 3.2% | <5% ✅ |
Limitations
- High false positive rate for seasonal businesses
- Cannot detect collusion or sophisticated fraud
- Requires 12+ months of baseline data
Appendix: Model Governance
Model Approval Process
1. Development → 2. Validation → 3. Approval → 4. Deployment → 5. Monitoring
↑ ↓
└───── Retraining ←─────────────┘
Required Approvals
| Model Risk Tier | Approvers |
|---|---|
| High (financial decisions) | ML Lead + CFO + Compliance |
| Medium (suggestions) | ML Lead + Product Owner |
| Low (internal tooling) | ML Lead |
Model Inventory
| Model ID | Name | Risk Tier | Last Review | Next Review |
|---|---|---|---|---|
| ML-001 | Reconciliation Matcher | High | 2026-01-15 | 2026-04-15 |
| ML-002 | Forecast Ensemble | High | 2026-01-20 | 2026-04-20 |
| ML-003 | NLG Variance | Medium | 2026-01-25 | 2026-07-25 |
| ML-004 | Anomaly Detector | Medium | 2025-11-15 | 2026-05-15 |
| ML-005 | Transaction Categorizer | Low | 2025-09-01 | 2026-09-01 |