
07 — Agentic AI System: Multi-Agent Orchestration, Chat & Autonomous Workflows

Domain: LangGraph agents, multi-agent coordination, conversational AI, autonomous financial workflows
Dependencies: 02-AI/ML (inference), 04-Security (RBAC for agents), 05-Core Ops (data), 06-FP&A (analysis targets)
Outputs: Agent architecture, LangGraph state machines, delegation templates, chat interface spec


ROLE

You are a Senior AI Systems Architect specializing in agentic AI for regulated industries. You design multi-agent orchestration systems using the Anthropic agent taxonomy (augmented LLM → workflow → autonomous agent) with LangGraph for deterministic financial workflows and CrewAI for parallel research/analysis tasks.


OBJECTIVE

Design the complete agentic AI layer that transforms Avivatec from a passive reporting tool into an autonomous financial co-pilot. This is the primary market differentiator — competitors offer copilots (workflow-level); Avivatec deploys agents (autonomous-level per Anthropic taxonomy).


DELIVERABLES

D1. Agent Architecture & Taxonomy

Agent Classification (Anthropic Pattern):

| Agent | Type | Pattern | Trust Level | Human Checkpoint |
|---|---|---|---|---|
| Chat Agent | Augmented LLM | Single-agent + tools | Medium | None for queries, confirm for actions |
| Categorization Agent | Workflow | Prompt chaining | High (validated) | Review queue for <80% confidence |
| Reconciliation Agent | Workflow | Orchestrator-workers | High | Exceptions only |
| Variance Analysis Agent | Agent | Evaluator-optimizer | Medium | Narrative review before distribution |
| Month-End Close Agent | Agent | Orchestrator-workers | Low (advisory) | Mandatory at each close step |
| Board Book Agent | Agent | Prompt chaining + parallel | Low | Final approval before distribution |
| Anomaly Detection Agent | Workflow | Parallelization | High (automated) | Critical alerts only |
| Forecast Agent | Workflow | Evaluator-optimizer | Medium | Drift alerts, model selection review |

D2. LangGraph State Machines

State Machine 1: Budget vs. Actual Variance Analysis

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph

# States
class BvAState(TypedDict):
    tenant_id: str
    period: str
    entity_ids: list[str]
    actuals: dict                # Fetched GL data
    budget: dict                 # Fetched budget data
    variances: dict              # Calculated variances
    significant_items: list      # Items exceeding threshold
    driver_decomposition: dict   # Root cause analysis
    narrative: str               # AI-generated explanation
    alerts: list                 # Threshold-exceeded alerts
    audit_trail: list            # Provenance log

# Graph (node functions and the routing predicate are defined elsewhere)
graph = StateGraph(BvAState)
graph.add_node("fetch_actuals", fetch_actuals_from_gl)
graph.add_node("fetch_budget", fetch_budget_version)
graph.add_node("calculate_variances", compute_bva_variances)
graph.add_node("filter_significant", apply_threshold_filter)
graph.add_node("decompose_drivers", analyze_variance_drivers)
graph.add_node("generate_narrative", llm_narrative_generation)
graph.add_node("route_alerts", send_threshold_alerts)
graph.add_node("log_provenance", write_audit_trail)

# Edges
graph.add_edge("fetch_actuals", "fetch_budget")
graph.add_edge("fetch_budget", "calculate_variances")
graph.add_edge("calculate_variances", "filter_significant")
graph.add_conditional_edges(
    "filter_significant",
    has_significant_variances,
    {True: "decompose_drivers", False: "log_provenance"},
)
graph.add_edge("decompose_drivers", "generate_narrative")
graph.add_edge("generate_narrative", "route_alerts")
graph.add_edge("route_alerts", "log_provenance")
graph.add_edge("log_provenance", END)
```
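The conditional edge depends on a threshold-filter node and a routing predicate over the shared state. A minimal sketch of what `apply_threshold_filter` and `has_significant_variances` could look like — the 10% / R$10,000 thresholds and the per-account variance layout are illustrative assumptions, not part of the spec:

```python
# Illustrative thresholds: flag a variance when either limit is exceeded.
PCT_THRESHOLD = 0.10
ABS_THRESHOLD = 10_000.0

def apply_threshold_filter(state: dict) -> dict:
    """Keep only variances exceeding the absolute or percentage threshold."""
    significant = [
        {"account": account, **v}
        for account, v in state["variances"].items()
        if abs(v["amount"]) >= ABS_THRESHOLD or abs(v["pct"]) >= PCT_THRESHOLD
    ]
    return {**state, "significant_items": significant}

def has_significant_variances(state: dict) -> bool:
    """Routing predicate for the conditional edge."""
    return len(state["significant_items"]) > 0

state = {"variances": {
    "4000-Revenue": {"amount": -25_000.0, "pct": -0.08},
    "6100-Travel": {"amount": 1_200.0, "pct": 0.03},
}}
state = apply_threshold_filter(state)
```

Only the revenue variance clears a threshold here, so the graph would route to `decompose_drivers`.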

State Machine 2: Autonomous Month-End Close

[Start] → Check Reconciliation Status
→ IF incomplete: [Reconciliation Agent] → Auto-match remaining
→ Check AP Accruals → [AP Agent] → Post accrual entries
→ Check AR Recognition → [AR Agent] → Apply ASC 606 / CPC 47
→ FX Revaluation → Calculate unrealized gains/losses
→ Intercompany Elimination → Generate elimination entries
→ Trial Balance Review → [Anomaly Agent] → Flag suspicious items
→ CHECKPOINT: Human review of anomalies
→ Generate Variance Report → [Narrative Agent] → Write commentary
→ CHECKPOINT: Controller approves narrative
→ Lock Period → Prevent backdated entries
→ Distribute Reports → Email/Slack/WhatsApp
→ [End] → Log complete close with timing metrics

State Machine 3: Natural Language Query (NLQ)

[User Question] → Intent Classification
→ Route: {query, drill_down, comparison, forecast, action}
→ IF query: SQL Generation → RLS-filtered execution → Chart + Narrative
→ IF drill_down: Identify parent → Generate child query → Recurse
→ IF comparison: Generate parallel queries → Side-by-side formatting
→ IF forecast: Invoke forecast engine → Explain with provenance
→ IF action: Parse action → Confirm with user → Execute → Audit log
→ [Response] → Update conversation memory
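The routing step can be expressed as a dispatch table keyed by the classified intent. A sketch with stubbed handlers — the handler names and return strings are placeholders for the real SQL/forecast/action pipelines:

```python
from typing import Callable

def handle_query(q: str) -> str:
    return f"sql+chart for: {q}"

def handle_drill_down(q: str) -> str:
    return f"child query for: {q}"

def handle_comparison(q: str) -> str:
    return f"parallel queries for: {q}"

def handle_forecast(q: str) -> str:
    return f"forecast with provenance for: {q}"

def handle_action(q: str) -> str:
    # Actions always require explicit user confirmation before any mutation.
    return f"confirm before executing: {q}"

# Dispatch table mirroring the route set {query, drill_down, comparison, forecast, action}.
ROUTES: dict[str, Callable[[str], str]] = {
    "query": handle_query,
    "drill_down": handle_drill_down,
    "comparison": handle_comparison,
    "forecast": handle_forecast,
    "action": handle_action,
}

def route(intent: str, question: str) -> str:
    handler = ROUTES.get(intent)
    return handler(question) if handler else "clarify: unrecognized intent"
```

Unrecognized intents fall through to a clarification response rather than guessing.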

D3. Agent Service Accounts & Permissions

Principle of Least Privilege:

| Agent | Read Access | Write Access | Delete | Special |
|---|---|---|---|---|
| svc_chat | All read-only (RLS-filtered) | Conversation logs only | None | NLQ SQL execution (parameterized) |
| svc_categorizer | Uncategorized transactions | Category assignments | None | Confidence < 80% → human queue |
| svc_reconciler | Bank statements + GL | Match records, post adjustments | None | Unmatched → exception queue |
| svc_variance | GL + budgets + forecasts | Variance reports, narratives | None | Distribution requires human approval |
| svc_close | All financial data | Accrual entries, period locks | None | Mandatory human checkpoints |
| svc_anomaly | GL transactions | Alert records | None | Critical alerts → immediate notification |
| svc_forecast | Historical GL data | Forecast table writes | None | Model registry access |
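The `svc_chat` row implies that every NLQ execution is parameterized and tenant-scoped. A minimal sketch of that guard using stdlib sqlite3 — the schema, table name, and data are illustrative stand-ins for the RLS-enforced production database:

```python
import sqlite3

# Illustrative schema: every row carries a tenant_id, mirroring RLS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gl (tenant_id TEXT, account TEXT, amount REAL)")
conn.executemany("INSERT INTO gl VALUES (?, ?, ?)", [
    ("t1", "4000", 100.0),
    ("t2", "4000", 999.0),
])

def scoped_query(tenant_id: str, account: str) -> list[tuple]:
    # Parameter binding (never string formatting) plus a mandatory tenant
    # predicate stand in here for database-level row-level security.
    return conn.execute(
        "SELECT account, amount FROM gl WHERE tenant_id = ? AND account = ?",
        (tenant_id, account),
    ).fetchall()

rows = scoped_query("t1", "4000")
```

The same query for tenant `t2` sees only its own row; the agent never constructs SQL by string concatenation.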

D4. Conversational AI Interface

Chat Capabilities:

  • Multi-turn financial Q&A with conversation memory
  • Natural language commands: "Show me revenue by entity for Q3"
  • Action execution: "Approve all invoices under R$5,000 from Supplier X"
  • Proactive insights: "Cash balance will drop below R$100K in 3 weeks based on AP schedule"
  • Voice input support (Whisper transcription → NLQ pipeline)
  • Context-aware: understands fiscal periods, entity names, account categories

Multi-Channel Deployment:

| Channel | Capabilities | Auth Method |
|---|---|---|
| Web chat (in-app) | Full functionality | Session cookie |
| WhatsApp Business | Query + alerts + approvals | Phone number verification |
| Slack | Query + alerts + commands | Slack OAuth |
| Microsoft Teams | Query + alerts + commands | Azure AD SSO |
| Email | Scheduled reports + alerts | Verified sender |

D5. Agent Orchestration Infrastructure

Token Budget Management:

```python
class AgentTokenBudget:
    MAX_TOKENS = {
        "chat_query": 4_000,
        "categorization": 2_000,
        "variance_analysis": 15_000,
        "month_end_close": 50_000,
        "board_book": 30_000,
        "narrative": 8_000,
    }

    def allocate(self, task_type: str, subtask_count: int) -> dict:
        total = self.MAX_TOKENS[task_type]
        orchestrator_share = int(total * 0.2)
        worker_share = int((total * 0.8) / max(subtask_count, 1))
        return {"orchestrator": orchestrator_share, "per_worker": worker_share}

# e.g. allocate("month_end_close", 4) → {"orchestrator": 10_000, "per_worker": 10_000}
```

Model Routing:

| Task | Model | Rationale |
|---|---|---|
| Transaction categorization | DeepSeek-R1 7B (Haiku-class) | High volume, pattern-based |
| SQL generation from NLQ | DeepSeek-R1 32B (Sonnet-class) | Schema reasoning required |
| Variance narrative | DeepSeek-R1 32B (Sonnet-class) | Financial writing quality |
| Compliance interpretation | DeepSeek-R1 70B (Opus-class) | Regulatory accuracy critical |
| Anomaly explanation | DeepSeek-R1 32B | Balance of speed and quality |
| Board book generation | DeepSeek-R1 70B | Executive-quality output |

Error Handling & Recovery:

  • Circuit breaker per agent (3 failures → 60s cooldown)
  • Graceful degradation: if agent fails, fall back to rule-based logic
  • Dead letter queue for failed tasks with retry policy
  • Agent execution timeout: 30s for queries, 5min for analysis, 30min for close
  • Partial result delivery: show what's available if some agents fail

D6. Agent Evaluation Framework

Quality Metrics:

| Metric | Target | Measurement |
|---|---|---|
| Categorization accuracy | > 95% | Backtesting against human-labeled data |
| Reconciliation auto-match rate | > 90% | Matched / Total transactions |
| NLQ SQL correctness | > 98% | Execution success + result validation |
| Narrative factual accuracy | > 99% | Cross-check claims against source data |
| Variance detection recall | > 95% | Significant variances identified / Total |
| Response latency (chat) | < 3s | P95 end-to-end |

Evaluation Loop (Evaluator-Optimizer):

Agent Output → Fact Checker (verify numbers against DB)
→ Compliance Checker (verify regulatory claims)
→ Quality Scorer (fluency, completeness, accuracy)
→ IF score < threshold: regenerate with feedback
→ IF score >= threshold: deliver to user
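The regenerate-with-feedback loop can be captured in a few lines of control flow. A sketch with stubbed generator and scorer — in practice the scorer would combine the fact checker, compliance checker, and quality scorer, and the threshold and round cap are illustrative:

```python
def evaluator_optimizer(generate, score, threshold: float = 0.8, max_rounds: int = 3):
    """Regenerate with feedback until the score clears the threshold."""
    feedback = None
    for _ in range(max_rounds):
        draft = generate(feedback)
        s, feedback = score(draft)
        if s >= threshold:
            return draft, s
    return draft, s  # best-effort result after the round budget is exhausted

# Stubs to show the control flow only.
drafts = iter(["rough draft", "revised draft"])
def generate(feedback): return next(drafts)
def score(draft): return (0.9, None) if "revised" in draft else (0.5, "fix numbers")

result, final_score = evaluator_optimizer(generate, score)
```

The first draft scores below threshold, so the loop feeds the scorer's feedback back into a second generation, which passes.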

CONSTRAINTS

  • Every agent action logged to sys_ai_provenance with full reasoning trace
  • Agents MUST respect RLS — no cross-tenant data access
  • Action-executing agents MUST get human confirmation before mutations
  • Token budgets enforced — agents terminate gracefully when budget exhausted
  • All agent outputs include confidence scores
  • Hallucination prevention: every numerical claim verified against database
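The last constraint — verifying every numerical claim against the database — can be sketched as number extraction plus tolerant matching against source values. The regex, tolerance, and narrative below are illustrative assumptions:

```python
import re

def extract_amounts(narrative: str) -> list[float]:
    """Pull numeric claims like 'R$ 8,400.00' or '12.5%' out of generated text."""
    raw = re.findall(r"[\d][\d,]*\.?\d*", narrative)
    return [float(token.replace(",", "")) for token in raw]

def verify_claims(narrative: str, source_values: set[float], tol: float = 0.01) -> list[float]:
    """Return claimed numbers with no match in the source data (potential hallucinations)."""
    return [
        claim for claim in extract_amounts(narrative)
        if not any(abs(claim - v) <= tol for v in source_values)
    ]

narrative = "Travel spend rose 12.5% to R$ 8,400.00 against a budget of R$ 7,466.67."
source = {12.5, 8400.00, 7466.67}
unverified = verify_claims(narrative, source)
```

An empty `unverified` list means every number in the narrative traces back to the source data; any survivor blocks delivery and triggers regeneration.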

RESEARCH QUESTIONS

  1. What is the optimal LangGraph checkpoint strategy for long-running financial workflows (month-end close)?
  2. How should agent memory work across sessions — conversation history vs. summarized context?
  3. What is the best approach for WhatsApp Business API integration with financial approval workflows?
  4. How to implement agent-to-agent delegation in LangGraph when one agent needs another's output?
  5. What guardrails prevent the NLQ agent from generating SQL that could leak cross-tenant data?

ADRs TO PRODUCE

  • ADR-007: LangGraph over CrewAI (deterministic control for finance vs. parallel execution)
  • ADR-AGENT-001: Agent trust levels (which actions require human approval)
  • ADR-AGENT-002: Token budget allocation strategy across agent hierarchy
  • ADR-AGENT-003: Multi-channel deployment architecture (WhatsApp/Slack/Teams)