
07 — Agentic AI System: Multi-Agent Orchestration, Chat & Autonomous Workflows

Domain: LangGraph agents, multi-agent coordination, conversational AI, autonomous financial workflows
Dependencies: 02-AI/ML (inference), 04-Security (RBAC for agents), 05-Core Ops (data), 06-FP&A (analysis targets)
Outputs: Agent architecture, LangGraph state machines, delegation templates, chat interface spec


ROLE

You are a Senior AI Systems Architect specializing in agentic AI for regulated industries. You design multi-agent orchestration systems using the Anthropic agent taxonomy (augmented LLM → workflow → autonomous agent) with LangGraph for deterministic financial workflows and CrewAI for parallel research/analysis tasks.


OBJECTIVE

Design the complete agentic AI layer that transforms Avivatec from a passive reporting tool into an autonomous financial co-pilot. This is the primary market differentiator — competitors offer copilots (workflow-level); Avivatec deploys agents (autonomous-level per Anthropic taxonomy).


DELIVERABLES

D1. Agent Architecture & Taxonomy

Agent Classification (Anthropic Pattern):

| Agent | Type | Pattern | Trust Level | Human Checkpoint |
|---|---|---|---|---|
| Chat Agent | Augmented LLM | Single-agent + tools | Medium | None for queries, confirm for actions |
| Categorization Agent | Workflow | Prompt chaining | High (validated) | Review queue for <80% confidence |
| Reconciliation Agent | Workflow | Orchestrator-workers | High | Exceptions only |
| Variance Analysis Agent | Agent | Evaluator-optimizer | Medium | Narrative review before distribution |
| Month-End Close Agent | Agent | Orchestrator-workers | Low (advisory) | Mandatory at each close step |
| Board Book Agent | Agent | Prompt chaining + parallel | Low | Final approval before distribution |
| Anomaly Detection Agent | Workflow | Parallelization | High (automated) | Critical alerts only |
| Forecast Agent | Workflow | Evaluator-optimizer | Medium | Drift alerts, model selection review |

D2. LangGraph State Machines

State Machine 1: Budget vs. Actual Variance Analysis

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

# States
class BvAState(TypedDict):
    tenant_id: str
    period: str
    entity_ids: list[str]
    actuals: dict                # Fetched GL data
    budget: dict                 # Fetched budget data
    variances: dict              # Calculated variances
    significant_items: list      # Items exceeding threshold
    driver_decomposition: dict   # Root cause analysis
    narrative: str               # AI-generated explanation
    alerts: list                 # Threshold-exceeded alerts
    audit_trail: list            # Provenance log

# Graph (node functions and the routing predicate are defined elsewhere)
graph = StateGraph(BvAState)
graph.add_node("fetch_actuals", fetch_actuals_from_gl)
graph.add_node("fetch_budget", fetch_budget_version)
graph.add_node("calculate_variances", compute_bva_variances)
graph.add_node("filter_significant", apply_threshold_filter)
graph.add_node("decompose_drivers", analyze_variance_drivers)
graph.add_node("generate_narrative", llm_narrative_generation)
graph.add_node("route_alerts", send_threshold_alerts)
graph.add_node("log_provenance", write_audit_trail)

# Edges
graph.add_edge("fetch_actuals", "fetch_budget")
graph.add_edge("fetch_budget", "calculate_variances")
graph.add_edge("calculate_variances", "filter_significant")
graph.add_conditional_edges(
    "filter_significant",
    has_significant_variances,
    {True: "decompose_drivers", False: "log_provenance"},
)
graph.add_edge("decompose_drivers", "generate_narrative")
graph.add_edge("generate_narrative", "route_alerts")
graph.add_edge("route_alerts", "log_provenance")
graph.add_edge("log_provenance", END)
```
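The conditional edge depends on a threshold-filter node and a routing predicate over the shared state. A minimal sketch of what `apply_threshold_filter` and `has_significant_variances` could look like — the 10% / R$10,000 thresholds and the per-account variance layout are illustrative assumptions, not part of the spec:

```python
# Illustrative thresholds: flag a variance when either limit is exceeded.
PCT_THRESHOLD = 0.10
ABS_THRESHOLD = 10_000.0

def apply_threshold_filter(state: dict) -> dict:
    """Keep only variances exceeding the absolute or percentage threshold."""
    significant = [
        {"account": account, **v}
        for account, v in state["variances"].items()
        if abs(v["amount"]) >= ABS_THRESHOLD or abs(v["pct"]) >= PCT_THRESHOLD
    ]
    return {**state, "significant_items": significant}

def has_significant_variances(state: dict) -> bool:
    """Routing predicate for the conditional edge."""
    return len(state["significant_items"]) > 0

state = {"variances": {
    "4000-Revenue": {"amount": -25_000.0, "pct": -0.08},
    "6100-Travel": {"amount": 1_200.0, "pct": 0.03},
}}
state = apply_threshold_filter(state)
```

Only the revenue variance clears a threshold here, so the graph would route to `decompose_drivers`.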

State Machine 2: Autonomous Month-End Close

[Start] → Check Reconciliation Status
→ IF incomplete: [Reconciliation Agent] → Auto-match remaining
→ Check AP Accruals → [AP Agent] → Post accrual entries
→ Check AR Recognition → [AR Agent] → Apply ASC 606 / CPC 47
→ FX Revaluation → Calculate unrealized gains/losses
→ Intercompany Elimination → Generate elimination entries
→ Trial Balance Review → [Anomaly Agent] → Flag suspicious items
→ CHECKPOINT: Human review of anomalies
→ Generate Variance Report → [Narrative Agent] → Write commentary
→ CHECKPOINT: Controller approves narrative
→ Lock Period → Prevent backdated entries
→ Distribute Reports → Email/Slack/WhatsApp
→ [End] → Log complete close with timing metrics

State Machine 3: Natural Language Query (NLQ)

[User Question] → Intent Classification
→ Route: {query, drill_down, comparison, forecast, action}
→ IF query: SQL Generation → RLS-filtered execution → Chart + Narrative
→ IF drill_down: Identify parent → Generate child query → Recurse
→ IF comparison: Generate parallel queries → Side-by-side formatting
→ IF forecast: Invoke forecast engine → Explain with provenance
→ IF action: Parse action → Confirm with user → Execute → Audit log
→ [Response] → Update conversation memory
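The routing step can be expressed as a dispatch table keyed by the classified intent. A sketch with stubbed handlers — the handler names and return strings are placeholders for the real SQL/forecast/action pipelines:

```python
from typing import Callable

def handle_query(q: str) -> str:
    return f"sql+chart for: {q}"

def handle_drill_down(q: str) -> str:
    return f"child query for: {q}"

def handle_comparison(q: str) -> str:
    return f"parallel queries for: {q}"

def handle_forecast(q: str) -> str:
    return f"forecast with provenance for: {q}"

def handle_action(q: str) -> str:
    # Actions always require explicit user confirmation before any mutation.
    return f"confirm before executing: {q}"

# Dispatch table mirroring the route set {query, drill_down, comparison, forecast, action}.
ROUTES: dict[str, Callable[[str], str]] = {
    "query": handle_query,
    "drill_down": handle_drill_down,
    "comparison": handle_comparison,
    "forecast": handle_forecast,
    "action": handle_action,
}

def route(intent: str, question: str) -> str:
    handler = ROUTES.get(intent)
    return handler(question) if handler else "clarify: unrecognized intent"
```

Unrecognized intents fall through to a clarification response rather than guessing.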

D3. Agent Service Accounts & Permissions

Principle of Least Privilege:

| Agent | Read Access | Write Access | Delete | Special |
|---|---|---|---|---|
| svc_chat | All read-only (RLS-filtered) | Conversation logs only | None | NLQ SQL execution (parameterized) |
| svc_categorizer | Uncategorized transactions | Category assignments | None | Confidence < 80% → human queue |
| svc_reconciler | Bank statements + GL | Match records, post adjustments | None | Unmatched → exception queue |
| svc_variance | GL + budgets + forecasts | Variance reports, narratives | None | Distribution requires human approval |
| svc_close | All financial data | Accrual entries, period locks | None | Mandatory human checkpoints |
| svc_anomaly | GL transactions | Alert records | None | Critical alerts → immediate notification |
| svc_forecast | Historical GL data | Forecast table writes | None | Model registry access |
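The `svc_chat` row implies that every NLQ execution is parameterized and tenant-scoped. A minimal sketch of that guard using stdlib sqlite3 — the schema, table name, and data are illustrative stand-ins for the RLS-enforced production database:

```python
import sqlite3

# Illustrative schema: every row carries a tenant_id, mirroring RLS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gl (tenant_id TEXT, account TEXT, amount REAL)")
conn.executemany("INSERT INTO gl VALUES (?, ?, ?)", [
    ("t1", "4000", 100.0),
    ("t2", "4000", 999.0),
])

def scoped_query(tenant_id: str, account: str) -> list[tuple]:
    # Parameter binding (never string formatting) plus a mandatory tenant
    # predicate stand in here for database-level row-level security.
    return conn.execute(
        "SELECT account, amount FROM gl WHERE tenant_id = ? AND account = ?",
        (tenant_id, account),
    ).fetchall()

rows = scoped_query("t1", "4000")
```

The same query for tenant `t2` sees only its own row; the agent never constructs SQL by string concatenation.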

D4. Conversational AI Interface

Chat Capabilities:

  • Multi-turn financial Q&A with conversation memory
  • Natural language commands: "Show me revenue by entity for Q3"
  • Action execution: "Approve all invoices under R$5,000 from Supplier X"
  • Proactive insights: "Cash balance will drop below R$100K in 3 weeks based on AP schedule"
  • Voice input support (Whisper transcription → NLQ pipeline)
  • Context-aware: understands fiscal periods, entity names, account categories

Multi-Channel Deployment:

| Channel | Capabilities | Auth Method |
|---|---|---|
| Web chat (in-app) | Full functionality | Session cookie |
| WhatsApp Business | Query + alerts + approvals | Phone number verification |
| Slack | Query + alerts + commands | Slack OAuth |
| Microsoft Teams | Query + alerts + commands | Azure AD SSO |
| Email | Scheduled reports + alerts | Verified sender |

D5. Agent Orchestration Infrastructure

Token Budget Management:

```python
class AgentTokenBudget:
    MAX_TOKENS = {
        "chat_query": 4_000,
        "categorization": 2_000,
        "variance_analysis": 15_000,
        "month_end_close": 50_000,
        "board_book": 30_000,
        "narrative": 8_000,
    }

    def allocate(self, task_type: str, subtask_count: int) -> dict:
        total = self.MAX_TOKENS[task_type]
        orchestrator_share = int(total * 0.2)
        worker_share = int((total * 0.8) / max(subtask_count, 1))
        return {"orchestrator": orchestrator_share, "per_worker": worker_share}

# e.g. allocate("month_end_close", 4) → {"orchestrator": 10_000, "per_worker": 10_000}
```

Model Routing:

| Task | Model | Rationale |
|---|---|---|
| Transaction categorization | DeepSeek-R1 7B (Haiku-class) | High volume, pattern-based |
| SQL generation from NLQ | DeepSeek-R1 32B (Sonnet-class) | Schema reasoning required |
| Variance narrative | DeepSeek-R1 32B (Sonnet-class) | Financial writing quality |
| Compliance interpretation | DeepSeek-R1 70B (Opus-class) | Regulatory accuracy critical |
| Anomaly explanation | DeepSeek-R1 32B | Balance of speed and quality |
| Board book generation | DeepSeek-R1 70B | Executive-quality output |

Error Handling & Recovery:

  • Circuit breaker per agent (3 failures → 60s cooldown)
  • Graceful degradation: if agent fails, fall back to rule-based logic
  • Dead letter queue for failed tasks with retry policy
  • Agent execution timeout: 30s for queries, 5min for analysis, 30min for close
  • Partial result delivery: show what's available if some agents fail

D6. Agent Evaluation Framework

Quality Metrics:

| Metric | Target | Measurement |
|---|---|---|
| Categorization accuracy | > 95% | Backtesting against human-labeled data |
| Reconciliation auto-match rate | > 90% | Matched / Total transactions |
| NLQ SQL correctness | > 98% | Execution success + result validation |
| Narrative factual accuracy | > 99% | Cross-check claims against source data |
| Variance detection recall | > 95% | Significant variances identified / Total |
| Response latency (chat) | < 3s | P95 end-to-end |

Evaluation Loop (Evaluator-Optimizer):

Agent Output → Fact Checker (verify numbers against DB)
→ Compliance Checker (verify regulatory claims)
→ Quality Scorer (fluency, completeness, accuracy)
→ IF score < threshold: regenerate with feedback
→ IF score >= threshold: deliver to user
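The regenerate-with-feedback loop can be captured in a few lines of control flow. A sketch with stubbed generator and scorer — in practice the scorer would combine the fact checker, compliance checker, and quality scorer, and the threshold and round cap are illustrative:

```python
def evaluator_optimizer(generate, score, threshold: float = 0.8, max_rounds: int = 3):
    """Regenerate with feedback until the score clears the threshold."""
    feedback = None
    for _ in range(max_rounds):
        draft = generate(feedback)
        s, feedback = score(draft)
        if s >= threshold:
            return draft, s
    return draft, s  # best-effort result after the round budget is exhausted

# Stubs to show the control flow only.
drafts = iter(["rough draft", "revised draft"])
def generate(feedback): return next(drafts)
def score(draft): return (0.9, None) if "revised" in draft else (0.5, "fix numbers")

result, final_score = evaluator_optimizer(generate, score)
```

The first draft scores below threshold, so the loop feeds the scorer's feedback back into a second generation, which passes.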

CONSTRAINTS

  • Every agent action logged to sys_ai_provenance with full reasoning trace
  • Agents MUST respect RLS — no cross-tenant data access
  • Action-executing agents MUST get human confirmation before mutations
  • Token budgets enforced — agents terminate gracefully when budget exhausted
  • All agent outputs include confidence scores
  • Hallucination prevention: every numerical claim verified against database
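The last constraint — verifying every numerical claim against the database — can be sketched as number extraction plus tolerant matching against source values. The regex, tolerance, and narrative below are illustrative assumptions:

```python
import re

def extract_amounts(narrative: str) -> list[float]:
    """Pull numeric claims like 'R$ 8,400.00' or '12.5%' out of generated text."""
    raw = re.findall(r"[\d][\d,]*\.?\d*", narrative)
    return [float(token.replace(",", "")) for token in raw]

def verify_claims(narrative: str, source_values: set[float], tol: float = 0.01) -> list[float]:
    """Return claimed numbers with no match in the source data (potential hallucinations)."""
    return [
        claim for claim in extract_amounts(narrative)
        if not any(abs(claim - v) <= tol for v in source_values)
    ]

narrative = "Travel spend rose 12.5% to R$ 8,400.00 against a budget of R$ 7,466.67."
source = {12.5, 8400.00, 7466.67}
unverified = verify_claims(narrative, source)
```

An empty `unverified` list means every number in the narrative traces back to the source data; any survivor blocks delivery and triggers regeneration.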

RESEARCH QUESTIONS

  1. What is the optimal LangGraph checkpoint strategy for long-running financial workflows (month-end close)?
  2. How should agent memory work across sessions — conversation history vs. summarized context?
  3. What is the best approach for WhatsApp Business API integration with financial approval workflows?
  4. How to implement agent-to-agent delegation in LangGraph when one agent needs another's output?
  5. What guardrails prevent the NLQ agent from generating SQL that could leak cross-tenant data?

ADRs TO PRODUCE

  • ADR-007: LangGraph over CrewAI (deterministic control for finance vs. parallel execution)
  • ADR-AGENT-001: Agent trust levels (which actions require human approval)
  • ADR-AGENT-002: Token budget allocation strategy across agent hierarchy
  • ADR-AGENT-003: Multi-channel deployment architecture (WhatsApp/Slack/Teams)