07 — Agentic AI System: Multi-Agent Orchestration, Chat & Autonomous Workflows
Domain: LangGraph agents, multi-agent coordination, conversational AI, autonomous financial workflows
Dependencies: 02-AI/ML (inference), 04-Security (RBAC for agents), 05-Core Ops (data), 06-FP&A (analysis targets)
Outputs: Agent architecture, LangGraph state machines, delegation templates, chat interface spec
ROLE
You are a Senior AI Systems Architect specializing in agentic AI for regulated industries. You design multi-agent orchestration systems using the Anthropic agent taxonomy (augmented LLM → workflow → autonomous agent) with LangGraph for deterministic financial workflows and CrewAI for parallel research/analysis tasks.
OBJECTIVE
Design the complete agentic AI layer that transforms Avivatec from a passive reporting tool into an autonomous financial co-pilot. This is the primary market differentiator — competitors offer copilots (workflow-level); Avivatec deploys agents (autonomous-level per Anthropic taxonomy).
DELIVERABLES
D1. Agent Architecture & Taxonomy
Agent Classification (Anthropic Pattern):
| Agent | Type | Pattern | Trust Level | Human Checkpoint |
|---|---|---|---|---|
| Chat Agent | Augmented LLM | Single-agent + tools | Medium | None for queries, confirm for actions |
| Categorization Agent | Workflow | Prompt chaining | High (validated) | Review queue for <80% confidence |
| Reconciliation Agent | Workflow | Orchestrator-workers | High | Exceptions only |
| Variance Analysis Agent | Agent | Evaluator-optimizer | Medium | Narrative review before distribution |
| Month-End Close Agent | Agent | Orchestrator-workers | Low (advisory) | Mandatory at each close step |
| Board Book Agent | Agent | Prompt chaining + parallel | Low | Final approval before distribution |
| Anomaly Detection Agent | Workflow | Parallelization | High (automated) | Critical alerts only |
| Forecast Agent | Workflow | Evaluator-optimizer | Medium | Drift alerts, model selection review |
D2. LangGraph State Machines
State Machine 1: Budget vs. Actual Variance Analysis
```python
# States
from typing import TypedDict

from langgraph.graph import END, StateGraph

class BvAState(TypedDict):
    tenant_id: str
    period: str
    entity_ids: list[str]
    actuals: dict               # Fetched GL data
    budget: dict                # Fetched budget data
    variances: dict             # Calculated variances
    significant_items: list     # Items exceeding threshold
    driver_decomposition: dict  # Root cause analysis
    narrative: str              # AI-generated explanation
    alerts: list                # Threshold-exceeded alerts
    audit_trail: list           # Provenance log

# Graph (node functions such as fetch_actuals_from_gl are implemented elsewhere)
graph = StateGraph(BvAState)
graph.add_node("fetch_actuals", fetch_actuals_from_gl)
graph.add_node("fetch_budget", fetch_budget_version)
graph.add_node("calculate_variances", compute_bva_variances)
graph.add_node("filter_significant", apply_threshold_filter)
graph.add_node("decompose_drivers", analyze_variance_drivers)
graph.add_node("generate_narrative", llm_narrative_generation)
graph.add_node("route_alerts", send_threshold_alerts)
graph.add_node("log_provenance", write_audit_trail)

# Edges
graph.set_entry_point("fetch_actuals")
graph.add_edge("fetch_actuals", "fetch_budget")
graph.add_edge("fetch_budget", "calculate_variances")
graph.add_edge("calculate_variances", "filter_significant")
graph.add_conditional_edges(
    "filter_significant",
    has_significant_variances,
    {True: "decompose_drivers", False: "log_provenance"},
)
graph.add_edge("decompose_drivers", "generate_narrative")
graph.add_edge("generate_narrative", "route_alerts")
graph.add_edge("route_alerts", "log_provenance")
graph.add_edge("log_provenance", END)

app = graph.compile()
```
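The threshold filter node and its routing predicate can be sketched in plain Python. The 5% / 10K thresholds and the `amount`/`pct` field names are illustrative assumptions, not part of the spec:

```python
SIGNIFICANCE_PCT = 5.0       # assumed: flag variances beyond ±5% of budget
SIGNIFICANCE_ABS = 10_000.0  # assumed: or beyond 10K absolute

def apply_threshold_filter(state: dict) -> dict:
    """Node: keep only variances exceeding either threshold."""
    significant = [
        {"account": acct, **v}
        for acct, v in state["variances"].items()
        if abs(v["amount"]) >= SIGNIFICANCE_ABS or abs(v["pct"]) >= SIGNIFICANCE_PCT
    ]
    return {**state, "significant_items": significant}

def has_significant_variances(state: dict) -> bool:
    """Predicate for the conditional edge: True routes to driver decomposition."""
    return len(state["significant_items"]) > 0
```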
State Machine 2: Autonomous Month-End Close
[Start] → Check Reconciliation Status
→ IF incomplete: [Reconciliation Agent] → Auto-match remaining
→ Check AP Accruals → [AP Agent] → Post accrual entries
→ Check AR Recognition → [AR Agent] → Apply ASC 606 / CPC 47
→ FX Revaluation → Calculate unrealized gains/losses
→ Intercompany Elimination → Generate elimination entries
→ Trial Balance Review → [Anomaly Agent] → Flag suspicious items
→ CHECKPOINT: Human review of anomalies
→ Generate Variance Report → [Narrative Agent] → Write commentary
→ CHECKPOINT: Controller approves narrative
→ Lock Period → Prevent backdated entries
→ Distribute Reports → Email/Slack/WhatsApp
→ [End] → Log complete close with timing metrics
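The checkpoint discipline above can be sketched as a gated step runner. The step names mirror the flow but are illustrative, and the `approve` callback is a hypothetical stand-in for the real human-in-the-loop interrupt:

```python
from typing import Callable

# Hypothetical step list mirroring the close flow; True marks a human checkpoint
CLOSE_STEPS = [
    ("reconciliation", False),
    ("ap_accruals", False),
    ("ar_recognition", False),
    ("fx_revaluation", False),
    ("intercompany_elimination", False),
    ("trial_balance_review", True),   # CHECKPOINT: human review of anomalies
    ("variance_report", True),        # CHECKPOINT: controller approves narrative
    ("lock_period", False),
    ("distribute_reports", False),
]

def run_close(approve: Callable[[str], bool]) -> list[str]:
    """Run steps in order; halt at the first checkpoint the human rejects."""
    completed = []
    for step, needs_approval in CLOSE_STEPS:
        if needs_approval and not approve(step):
            break  # close pauses here until the checkpoint is resolved
        completed.append(step)
    return completed
```

Note the period lock can never execute before both checkpoints pass, which is the property the "Low (advisory)" trust level requires.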
State Machine 3: Natural Language Query (NLQ)
[User Question] → Intent Classification
→ Route: {query, drill_down, comparison, forecast, action}
→ IF query: SQL Generation → RLS-filtered execution → Chart + Narrative
→ IF drill_down: Identify parent → Generate child query → Recurse
→ IF comparison: Generate parallel queries → Side-by-side formatting
→ IF forecast: Invoke forecast engine → Explain with provenance
→ IF action: Parse action → Confirm with user → Execute → Audit log
→ [Response] → Update conversation memory
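A minimal sketch of the intent router, using keyword heuristics as placeholders for the real LLM classifier (the patterns are illustrative, and `query` is the default branch):

```python
import re

# Placeholder heuristics; production would use an LLM intent classifier
INTENT_PATTERNS = {
    "forecast": r"\b(forecast|project|predict|next (month|quarter|year))\b",
    "comparison": r"\b(vs\.?|versus|compare[d]?|against)\b",
    "drill_down": r"\b(break (it )?down|drill)\b",
    "action": r"\b(approve|post|lock|send|pay)\b",
}

def classify_intent(question: str) -> str:
    """Route a user question to one of the five NLQ branches."""
    q = question.lower()
    for intent, pattern in INTENT_PATTERNS.items():
        if re.search(pattern, q):
            return intent
    return "query"  # default: SQL generation with RLS-filtered execution
```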
D3. Agent Service Accounts & Permissions
Principle of Least Privilege:
| Agent | Read Access | Write Access | Delete | Special |
|---|---|---|---|---|
| svc_chat | All read-only (RLS-filtered) | Conversation logs only | None | NLQ SQL execution (parameterized) |
| svc_categorizer | Uncategorized transactions | Category assignments | None | Confidence < 80% → human queue |
| svc_reconciler | Bank statements + GL | Match records, post adjustments | None | Unmatched → exception queue |
| svc_variance | GL + budgets + forecasts | Variance reports, narratives | None | Distribution requires human approval |
| svc_close | All financial data | Accrual entries, period locks | None | Mandatory human checkpoints |
| svc_anomaly | GL transactions | Alert records | None | Critical alerts → immediate notification |
| svc_forecast | Historical GL data | Forecast table writes | None | Model registry access |
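The least-privilege table reduces to a deny-by-default scope check. The scope names are illustrative, and only three service accounts are shown:

```python
# Assumed scope names; "*" means all read-only tables under RLS
AGENT_SCOPES = {
    "svc_chat":        {"read": {"*"}, "write": {"conversation_logs"}},
    "svc_categorizer": {"read": {"transactions_uncategorized"},
                        "write": {"category_assignments"}},
    "svc_variance":    {"read": {"gl", "budgets", "forecasts"},
                        "write": {"variance_reports", "narratives"}},
}

def is_allowed(agent: str, action: str, resource: str) -> bool:
    """Deny by default: no delete for any agent, no unlisted scope."""
    if action == "delete":
        return False  # the Delete column is None for every agent
    scopes = AGENT_SCOPES.get(agent, {}).get(action, set())
    return "*" in scopes or resource in scopes
```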
D4. Conversational AI Interface
Chat Capabilities:
- Multi-turn financial Q&A with conversation memory
- Natural language commands: "Show me revenue by entity for Q3"
- Action execution: "Approve all invoices under R$5,000 from Supplier X"
- Proactive insights: "Cash balance will drop below R$100K in 3 weeks based on AP schedule"
- Voice input support (Whisper transcription → NLQ pipeline)
- Context-aware: understands fiscal periods, entity names, account categories
Multi-Channel Deployment:
| Channel | Capabilities | Auth Method |
|---|---|---|
| Web chat (in-app) | Full functionality | Session cookie |
| WhatsApp Business | Query + alerts + approvals | Phone number verification |
| Slack | Query + alerts + commands | Slack OAuth |
| Microsoft Teams | Query + alerts + commands | Azure AD SSO |
| Email | Scheduled reports + alerts | Verified sender |
D5. Agent Orchestration Infrastructure
Token Budget Management:
```python
class AgentTokenBudget:
    MAX_TOKENS = {
        "chat_query": 4_000,
        "categorization": 2_000,
        "variance_analysis": 15_000,
        "month_end_close": 50_000,
        "board_book": 30_000,
        "narrative": 8_000,
    }

    def allocate(self, task_type: str, subtask_count: int) -> dict:
        # Reserve 20% for the orchestrator; split the rest evenly across workers
        total = self.MAX_TOKENS[task_type]
        orchestrator_share = int(total * 0.2)
        worker_share = int((total * 0.8) / max(subtask_count, 1))
        return {"orchestrator": orchestrator_share, "per_worker": worker_share}
```
Model Routing:
| Task | Model | Rationale |
|---|---|---|
| Transaction categorization | DeepSeek-R1 7B (Haiku-class) | High volume, pattern-based |
| SQL generation from NLQ | DeepSeek-R1 32B (Sonnet-class) | Schema reasoning required |
| Variance narrative | DeepSeek-R1 32B (Sonnet-class) | Financial writing quality |
| Compliance interpretation | DeepSeek-R1 70B (Opus-class) | Regulatory accuracy critical |
| Anomaly explanation | DeepSeek-R1 32B | Balance of speed and quality |
| Board book generation | DeepSeek-R1 70B | Executive-quality output |
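The routing table reduces to a lookup with a mid-tier fallback; the task keys and model identifier strings below are assumptions, not fixed API names:

```python
# Assumed task keys and model identifiers mirroring the routing table
MODEL_ROUTES = {
    "categorization": "deepseek-r1-7b",
    "nlq_sql": "deepseek-r1-32b",
    "variance_narrative": "deepseek-r1-32b",
    "compliance": "deepseek-r1-70b",
    "anomaly_explanation": "deepseek-r1-32b",
    "board_book": "deepseek-r1-70b",
}

def route_model(task_type: str) -> str:
    """Unknown tasks fall back to the mid-tier model rather than failing."""
    return MODEL_ROUTES.get(task_type, "deepseek-r1-32b")
```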
Error Handling & Recovery:
- Circuit breaker per agent (3 failures → 60s cooldown)
- Graceful degradation: if agent fails, fall back to rule-based logic
- Dead letter queue for failed tasks with retry policy
- Agent execution timeout: 30s for queries, 5min for analysis, 30min for close
- Partial result delivery: show what's available if some agents fail
D6. Agent Evaluation Framework
Quality Metrics:
| Metric | Target | Measurement |
|---|---|---|
| Categorization accuracy | > 95% | Backtesting against human-labeled data |
| Reconciliation auto-match rate | > 90% | Matched / Total transactions |
| NLQ SQL correctness | > 98% | Execution success + result validation |
| Narrative factual accuracy | > 99% | Cross-check claims against source data |
| Variance detection recall | > 95% | Significant variances identified / Total |
| Response latency (chat) | < 3s | P95 end-to-end |
Evaluation Loop (Evaluator-Optimizer):
Agent Output → Fact Checker (verify numbers against DB)
→ Compliance Checker (verify regulatory claims)
→ Quality Scorer (fluency, completeness, accuracy)
→ IF score < threshold: regenerate with feedback
→ IF score >= threshold: deliver to user
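The loop above can be sketched with `generate`/`score` callables standing in for the LLM and the combined fact/compliance/quality checkers (threshold and round limit are illustrative):

```python
from typing import Callable

def evaluator_optimizer(
    generate: Callable[[str], str],   # draft, or regenerate from feedback
    score: Callable[[str], float],    # combined fact/compliance/quality score
    threshold: float = 0.9,
    max_rounds: int = 3,
) -> tuple[str, float]:
    """Regenerate with feedback until the score clears the threshold."""
    feedback = ""
    best_output, best_score = "", -1.0
    for _ in range(max_rounds):
        output = generate(feedback)
        s = score(output)
        if s > best_score:
            best_output, best_score = output, s  # keep best-so-far as fallback
        if s >= threshold:
            break
        feedback = f"score {s:.2f} below {threshold}; revise for accuracy"
    return best_output, best_score
```

Capping rounds and returning the best-so-far output keeps the loop inside the agent's token budget even when the threshold is never reached.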
CONSTRAINTS
- Every agent action logged to `sys_ai_provenance` with full reasoning trace
- Agents MUST respect RLS — no cross-tenant data access
- Action-executing agents MUST get human confirmation before mutations
- Token budgets enforced — agents terminate gracefully when budget exhausted
- All agent outputs include confidence scores
- Hallucination prevention: every numerical claim verified against database
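The hallucination-prevention constraint can be sketched as a tolerance-based cross-check of numbers in the narrative against database values; the regex and 0.5% tolerance are illustrative:

```python
import re

def extract_numeric_claims(narrative: str) -> list[float]:
    """Pull numbers (with optional thousands separators) from narrative text."""
    return [float(m.replace(",", ""))
            for m in re.findall(r"\d[\d,]*\.?\d*", narrative)]

def verify_claims(narrative: str, facts: set[float], tol: float = 0.005) -> list[float]:
    """Return claims with no matching database value within tolerance."""
    unverified = []
    for claim in extract_numeric_claims(narrative):
        if not any(abs(claim - f) <= tol * max(abs(f), 1.0) for f in facts):
            unverified.append(claim)
    return unverified
```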
RESEARCH QUESTIONS
- What is the optimal LangGraph checkpoint strategy for long-running financial workflows (month-end close)?
- How should agent memory work across sessions — conversation history vs. summarized context?
- What is the best approach for WhatsApp Business API integration with financial approval workflows?
- How to implement agent-to-agent delegation in LangGraph when one agent needs another's output?
- What guardrails prevent the NLQ agent from generating SQL that could leak cross-tenant data?
ADRs TO PRODUCE
- ADR-007: LangGraph over CrewAI (deterministic control for finance vs. parallel execution)
- ADR-AGENT-001: Agent trust levels (which actions require human approval)
- ADR-AGENT-002: Token budget allocation strategy across agent hierarchy
- ADR-AGENT-003: Multi-channel deployment architecture (WhatsApp/Slack/Teams)