ADR-010: Autonomous Multi-Agent Orchestration System

Document: ADR-010-autonomous-orchestration-system
Version: 1.0.0
Purpose: Document architectural decisions for fully autonomous multi-agent task orchestration with automated sync, intelligent dispatch, and parallel execution
Audience: Framework contributors, developers, AI agents, operations teams
Date Created: 2025-12-19
Status: APPROVED
Related ADRs:
- ADR-006-work-item-hierarchy (task data model)
- ADR-001-async-task-executor-refactoring (execution patterns)
Related Documents:
- scripts/autonomous-orchestrator.py
- scripts/task-dispatcher.py
- scripts/agent-executor.py
- scripts/sync-daemon.py
- config/orchestrator-config.json

Context and Problem Statement

The Autonomous Operation Problem

CODITECT's V2 project plan contains 122+ tasks organized in the ADR-006 hierarchy (Epic → Feature → Task), but executing them today runs into five problems:

  1. Manual Agent Coordination - Humans must assign tasks to appropriate agents
  2. Manual Status Sync - Markdown checkboxes and database drift apart
  3. Sequential Execution - Tasks executed one-at-a-time, not parallelized
  4. No Dependency Management - Tasks executed in arbitrary order
  5. No Progress Persistence - Session breaks lose execution state

Current State (Human-in-the-Loop):

User → Read Task → Pick Agent → Execute → Update Checkbox → Repeat
└── Manual sync ──┘ └── Manual sync ────┘

Target State (95% Autonomous):

User → Start Orchestrator → Autonomous Loop
├── Sync Daemon (bidirectional)
├── Task Dispatcher (intelligent assignment)
├── Agent Executor (parallel execution)
└── Checkpoint System (state preservation)

Business Impact (projected):

  • 60% reduction in coordination overhead
  • 10x increase in task throughput (parallel execution)
  • 99.9% sync accuracy (automated bidirectional sync)
  • Zero state loss across sessions (checkpoint system)
  • March 11, 2026 launch timeline achievable

Decision Drivers

  1. Time to Market - 83 days to public launch requires parallel execution
  2. Human Bottleneck - Manual coordination cannot scale to 122+ tasks
  3. Quality Consistency - Automated dispatch ensures correct agent-to-task matching
  4. State Persistence - Multi-day execution requires checkpoint/resume capability
  5. Audit Trail - Compliance requires complete execution logging

Considered Options

Option A: Enhanced Manual Workflow

  • Improve tooling but keep human-in-the-loop
  • Rejected: Does not solve coordination bottleneck

Option B: Simple Queue System

  • Basic FIFO task queue with single agent
  • Rejected: No parallelization, no intelligent dispatch

Option C: Full Autonomous System (Selected)

  • Sync daemon + task dispatcher + parallel executor + orchestrator
  • Selected: Achieves 95% autonomy with checkpoint recovery

Option D: External Workflow Engine (Airflow/Temporal)

  • Use enterprise workflow orchestration
  • Rejected: Over-engineered for 122 tasks, adds infrastructure complexity

Decision

Implement Option C: Full Autonomous System with four integrated components:

1. Sync Daemon (sync-daemon.py)

Purpose: Bidirectional synchronization between markdown tasklist and database

Architecture:

┌─────────────────────────────────────────────────────────────┐
│                         SYNC DAEMON                         │
├─────────────────────────────────────────────────────────────┤
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐   │
│  │ File Watcher │ ←→ │ Sync Engine  │ ←→ │  DB Poller   │   │
│  └──────────────┘    └──────────────┘    └──────────────┘   │
│         ↓                   ↓                   ↓           │
│  [MD Checkboxes]        [Debounce]        [Task Status]     │
│         ↓                   ↓                   ↓           │
│  ┌──────────────────────────────────────────────────────┐   │
│  │               v2_plan_sync (Audit Log)               │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────┘

Key Features:

  • MD5 hash-based change detection
  • 2-second debounce to prevent thrashing
  • WAL mode for concurrent read/write
  • Audit trail in v2_plan_sync table
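
The hash-based change detection and debounce listed above can be sketched as follows. This is a minimal illustration, not the contents of sync-daemon.py: the `DebouncedWatcher` class, its `poll` method, and the injectable `clock` parameter are hypothetical names introduced here, and the sketch assumes the daemon polls the file rather than subscribing to OS file events.

```python
import hashlib
import time
from pathlib import Path


def file_md5(path: Path) -> str:
    """Return the MD5 hex digest of a file (used for change detection only)."""
    return hashlib.md5(path.read_bytes()).hexdigest()


class DebouncedWatcher:
    """Report a change only after the file has been stable for `debounce` seconds."""

    def __init__(self, path: Path, debounce: float = 2.0, clock=time.monotonic):
        self.path = path
        self.debounce = debounce
        self.clock = clock                  # injectable so tests need not sleep
        self.synced_hash = file_md5(path)   # hash of the last synced content
        self.pending_hash = None            # hash seen but not yet synced
        self.pending_since = 0.0

    def poll(self) -> bool:
        """Return True when a settled change is ready to sync."""
        current = file_md5(self.path)
        if current == self.synced_hash:
            self.pending_hash = None        # edit was reverted, nothing to sync
            return False
        if current != self.pending_hash:
            # New or still-changing content: restart the debounce window.
            self.pending_hash = current
            self.pending_since = self.clock()
            return False
        if self.clock() - self.pending_since >= self.debounce:
            self.synced_hash = current
            self.pending_hash = None
            return True
        return False
```

Restarting the window on every new hash means a burst of rapid edits triggers a single sync once the file stops changing, which is the thrashing the 2-second debounce is meant to prevent.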

2. Task Dispatcher (task-dispatcher.py)

Purpose: Intelligent task-to-agent matching with dependency resolution

Agent Mapping Algorithm:

AGENT_MAPPINGS = {
    "devops-engineer": ["deploy", "docker", "kubernetes", "ci/cd"],
    "security-specialist": ["security", "auth", "compliance"],
    "testing-specialist": ["test", "validation", "coverage"],
    "database-architect": ["database", "schema", "migration"],
    "backend-development": ["api", "endpoint", "server"],
    "frontend-development-agent": ["ui", "component", "react"],
    "codi-documentation-writer": ["document", "guide", "readme"],
    "general-purpose": [],  # Default fallback
}

def match_agent(task_description, epic_name):
    """Return the agent whose keywords best match the task, else the fallback."""
    combined = f"{task_description} {epic_name}".lower()
    # Score each agent by how many of its keywords occur as substrings.
    scores = {agent: sum(1 for kw in keywords if kw in combined)
              for agent, keywords in AGENT_MAPPINGS.items()}
    best = max(scores, key=scores.get)
    # max() always returns a key, so check the score explicitly for the fallback.
    return best if scores[best] > 0 else "general-purpose"

Dispatch Queue:

Priority Order: P0 → P1 → P2
Dependency Check: blocked_by field must be empty
Duplicate Prevention: task_assignments tracking table
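
The three dispatch rules above can be expressed as a single query. The sketch below is an assumption-laden illustration rather than the actual task-dispatcher.py: it assumes SQLite, assumes `v2_tasks` has `priority`, `status`, and `blocked_by` columns (only `task_id` is confirmed by this ADR), and assumes `blocked_by` is empty or NULL when nothing blocks the task. The function name `next_tasks` is hypothetical.

```python
import sqlite3


def next_tasks(conn: sqlite3.Connection, limit: int = 5) -> list[str]:
    """Return up to `limit` dispatchable task IDs, highest priority first."""
    rows = conn.execute(
        """
        SELECT t.task_id
        FROM v2_tasks AS t
        WHERE t.status = 'pending'
          AND (t.blocked_by IS NULL OR t.blocked_by = '')   -- dependency check
          AND NOT EXISTS (                                  -- duplicate prevention
              SELECT 1 FROM task_assignments AS a
              WHERE a.task_id = t.task_id
                AND a.status IN ('assigned', 'in_progress')
          )
        ORDER BY t.priority, t.task_id                      -- 'P0' < 'P1' < 'P2'
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


# Tiny in-memory fixture with assumed columns, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE v2_tasks (task_id TEXT PRIMARY KEY, priority TEXT,
                       status TEXT, blocked_by TEXT);
CREATE TABLE task_assignments (task_id TEXT, status TEXT);
INSERT INTO v2_tasks VALUES
    ('T001.001', 'P1', 'pending', ''),
    ('T001.002', 'P0', 'pending', ''),
    ('T001.003', 'P0', 'pending', 'T001.002'),
    ('T001.004', 'P0', 'pending', '');
INSERT INTO task_assignments VALUES ('T001.004', 'in_progress');
""")
print(next_tasks(conn))  # ['T001.002', 'T001.001']
```

T001.003 is skipped because it is blocked, and T001.004 because it already has an active assignment.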

3. Agent Executor (agent-executor.py)

Purpose: Execute tasks via Claude Code with status tracking

Execution Flow:

1. Get assignment from task_assignments table
2. Update status to "in_progress"
3. Build prompt with task context
4. Execute via `claude --print -p "prompt"`
5. Capture output and exit code
6. Update status to "completed" or "failed"
7. Log execution details to file
8. Trigger sync daemon
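
Steps 4 to 6 of the flow above might look like the following sketch. It is not the code in agent-executor.py: `run_task` and its return shape are hypothetical, and the command is passed in as a parameter so the example runs without the Claude Code binary. In the real executor the command would presumably mirror the `claude --print -p "prompt"` invocation given in step 4.

```python
import subprocess
import sys


def run_task(prompt: str, command: list[str], timeout: float = 7200.0) -> dict:
    """Run `command + [prompt]`, capturing output and exit code."""
    try:
        proc = subprocess.run(
            command + [prompt],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A task that exceeds its time budget is treated as failed.
        return {"status": "failed", "exit_code": None, "output": "timed out"}
    status = "completed" if proc.returncode == 0 else "failed"
    return {"status": status, "exit_code": proc.returncode, "output": proc.stdout}


# Stand-in command: a Python one-liner that echoes the prompt it receives.
echo = [sys.executable, "-c", "import sys; print(sys.argv[1])"]
result = run_task("Implement T001.002", echo, timeout=30)
print(result["status"], result["exit_code"])
```

Passing the command as a list avoids shell quoting problems when the prompt contains quotes or newlines.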

Timeout and Retry:

  • Default timeout: 2 hours per task
  • Max retries: 3 with exponential backoff
  • Failed tasks return to pending queue
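
A retry policy along these lines can be sketched as below. The helper `run_with_retries` is hypothetical, and two details are assumptions: that "max retries: 3" means three retries after the first attempt, and that the delay doubles from the 30-second `retry_delay` found in the configuration.

```python
import time
from typing import Callable


def run_with_retries(
    attempt: Callable[[], bool],
    max_retries: int = 3,
    base_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `attempt` until it succeeds or the retries are exhausted.

    Waits base_delay * 2**n before retry n+1 (30s, 60s, 120s by default).
    Returns False if every attempt failed, so the caller can re-queue the task.
    """
    for n in range(max_retries + 1):        # first try plus max_retries retries
        if attempt():
            return True
        if n < max_retries:
            sleep(base_delay * 2 ** n)
    return False


# Usage: fail twice, then succeed; record delays instead of really sleeping.
delays = []
outcomes = iter([False, False, True])
ok = run_with_retries(lambda: next(outcomes), sleep=delays.append)
print(ok, delays)  # True [30.0, 60.0]
```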

4. Autonomous Orchestrator (autonomous-orchestrator.py)

Purpose: Master control loop coordinating all components

Architecture:

┌──────────────────────────────────────────────────────────────────┐
│                     AUTONOMOUS ORCHESTRATOR                      │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│    ┌──────────────┐     ┌──────────────┐     ┌──────────────┐    │
│    │ Sync Daemon  │     │  Task Pool   │     │  Checkpoint  │    │
│    │   (Thread)   │     │  (Executor)  │     │   Manager    │    │
│    └──────┬───────┘     └──────┬───────┘     └──────┬───────┘    │
│           │                    │                    │            │
│           v                    v                    v            │
│    ┌────────────────────────────────────────────────────────┐    │
│    │                      CONTROL LOOP                      │    │
│    │  ┌──────────────────────────────────────────────────┐  │    │
│    │  │ 1. Check completed futures                       │  │    │
│    │  │ 2. Update status (completed/failed)              │  │    │
│    │  │ 3. Check failure threshold (pause if exceeded)   │  │    │
│    │  │ 4. Get pending tasks (priority ordered)          │  │    │
│    │  │ 5. Assign to available agent slots               │  │    │
│    │  │ 6. Submit to thread pool                         │  │    │
│    │  │ 7. Create checkpoint if milestone reached        │  │    │
│    │  │ 8. Sleep(poll_interval)                          │  │    │
│    │  └──────────────────────────────────────────────────┘  │    │
│    └────────────────────────────────────────────────────────┘    │
│                                                                  │
│  Concurrent Agents: 5 (configurable)                             │
│  Poll Interval: 10 seconds                                       │
│  Checkpoint Interval: Every 10 tasks                             │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
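
The control loop in the diagram can be approximated with a thread pool, as in the sketch below. This is an illustration under stated assumptions, not autonomous-orchestrator.py itself: `control_loop` and the `execute` callback are hypothetical, the input list is assumed to be priority ordered already, and step 7 (checkpointing) is left out to keep the example short.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable


def control_loop(
    pending: list[str],
    execute: Callable[[str], bool],
    max_agents: int = 5,
    poll_interval: float = 10.0,
    pause_on_failure_count: int = 5,
) -> dict:
    """Run tasks in parallel until the queue drains or failures hit the threshold."""
    queue = list(pending)
    running: dict[Future, str] = {}
    completed: list[str] = []
    failed: list[str] = []
    state = "running"
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        while queue or running:
            # Steps 1-2: collect finished futures and record their outcome.
            for fut in [f for f in running if f.done()]:
                task_id = running.pop(fut)
                ok = fut.exception() is None and bool(fut.result())
                (completed if ok else failed).append(task_id)
            # Step 3: stop dispatching once too many tasks have failed.
            if len(failed) >= pause_on_failure_count:
                state = "paused"
                break
            # Steps 4-6: fill free agent slots from the queue.
            while queue and len(running) < max_agents:
                task_id = queue.pop(0)
                running[pool.submit(execute, task_id)] = task_id
            # Step 8: wait before the next poll.
            if running:
                time.sleep(poll_interval)
    if state == "running":
        state = "stopped"   # queue drained normally
    return {"state": state, "completed": completed, "failed": failed,
            "unfinished": queue + list(running.values())}


# Usage: four tasks, two agent slots, and a fake executor in which T3 fails.
result = control_loop(["T1", "T2", "T3", "T4"],
                      execute=lambda task_id: task_id != "T3",
                      max_agents=2, poll_interval=0.01)
print(result["state"], sorted(result["completed"]), result["failed"])
```

Polling `Future.done()` keeps the loop non-blocking, so the failure threshold is checked between dispatches rather than only at the end of the run.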

State Machine:

                   ┌──────────────┐
                   │   STOPPED    │
                   └──────┬───────┘
                          │ start()
                          v
┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│    PAUSED    │ ←─│   RUNNING    │ ←─│    ERROR     │
│              │ ─→│              │ ─→│              │
└──────────────┘   └──────┬───────┘   └──────────────┘
        ↑                 │
        │                 │ failures >= threshold
        └─────────────────┘
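
The transitions drawn above can be encoded as a small table-driven state machine. The sketch below is one possible reading of the diagram: only `start()` and the `failures >= threshold` edge are labeled in the source, so the meaning of the other arrows (resume, fault, recover) and the class and method names are assumptions.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"
    ERROR = "error"


# Allowed transitions as drawn in the diagram.
TRANSITIONS = {
    State.STOPPED: {State.RUNNING},               # start()
    State.RUNNING: {State.PAUSED, State.ERROR},   # failure threshold or fault
    State.PAUSED: {State.RUNNING},                # resume (assumed label)
    State.ERROR: {State.RUNNING},                 # recover (assumed label)
}


class Orchestrator:
    def __init__(self, failure_threshold: int = 5):
        self.state = State.STOPPED
        self.failures = 0
        self.failure_threshold = failure_threshold

    def transition(self, target: State) -> None:
        """Move to `target`, rejecting edges that the diagram does not show."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {target.name}")
        self.state = target

    def record_failure(self) -> None:
        """Count a failed task and pause once failures >= threshold."""
        self.failures += 1
        if (self.state is State.RUNNING
                and self.failures >= self.failure_threshold):
            self.transition(State.PAUSED)
```

The string values match the `status` CHECK constraint of the `orchestrator_state` table, so a state can be persisted as `state.value`.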

Database Schema Extensions

New Tables

-- Task assignment tracking
CREATE TABLE task_assignments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id TEXT NOT NULL,
    agent_type TEXT NOT NULL,
    assigned_at TEXT NOT NULL,
    started_at TEXT,
    completed_at TEXT,
    status TEXT DEFAULT 'assigned'
        CHECK(status IN ('assigned', 'in_progress', 'completed', 'failed', 'cancelled')),
    result TEXT,
    FOREIGN KEY (task_id) REFERENCES v2_tasks(task_id)
);

-- Orchestrator state persistence
CREATE TABLE orchestrator_state (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at TEXT NOT NULL,
    stopped_at TEXT,
    tasks_completed INTEGER DEFAULT 0,
    tasks_failed INTEGER DEFAULT 0,
    status TEXT DEFAULT 'running'
        CHECK(status IN ('running', 'stopped', 'paused', 'error'))
);

-- Checkpoint tracking
CREATE TABLE orchestrator_checkpoints (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    checkpoint_name TEXT NOT NULL,
    created_at TEXT NOT NULL,
    tasks_completed INTEGER,
    tasks_pending INTEGER,
    notes TEXT
);

-- Indexes for performance
CREATE INDEX idx_task_assignments_task ON task_assignments(task_id);
CREATE INDEX idx_task_assignments_status ON task_assignments(status);
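
To show how these tables might be used together, the sketch below writes a checkpoint row from the current assignment counts. It assumes SQLite (suggested by the `AUTOINCREMENT` syntax), uses a trimmed copy of the schema in an in-memory database, and approximates `tasks_pending` by unfinished assignments. The helper name `create_checkpoint` is hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def create_checkpoint(conn: sqlite3.Connection, name: str, notes: str = "") -> int:
    """Insert a row into orchestrator_checkpoints and return its id."""
    completed = conn.execute(
        "SELECT COUNT(*) FROM task_assignments WHERE status = 'completed'"
    ).fetchone()[0]
    # Assumption: "pending" is approximated by assignments not yet finished.
    pending = conn.execute(
        "SELECT COUNT(*) FROM task_assignments "
        "WHERE status IN ('assigned', 'in_progress')"
    ).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO orchestrator_checkpoints "
        "(checkpoint_name, created_at, tasks_completed, tasks_pending, notes) "
        "VALUES (?, ?, ?, ?, ?)",
        (name, datetime.now(timezone.utc).isoformat(), completed, pending, notes),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task_assignments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id TEXT NOT NULL,
    agent_type TEXT NOT NULL,
    assigned_at TEXT NOT NULL,
    status TEXT DEFAULT 'assigned'
);
CREATE TABLE orchestrator_checkpoints (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    checkpoint_name TEXT NOT NULL,
    created_at TEXT NOT NULL,
    tasks_completed INTEGER,
    tasks_pending INTEGER,
    notes TEXT
);
INSERT INTO task_assignments (task_id, agent_type, assigned_at, status) VALUES
    ('T001.001', 'devops-engineer', '2025-12-19T10:00:00', 'completed'),
    ('T001.002', 'testing-specialist', '2025-12-19T10:05:00', 'completed'),
    ('T001.003', 'general-purpose', '2025-12-19T10:10:00', 'in_progress');
""")
checkpoint_id = create_checkpoint(conn, "pre-deploy")
```

Timestamps are stored as ISO 8601 strings to match the `TEXT` columns in the schema.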

Configuration

orchestrator-config.json

{
  "orchestrator": {
    "max_concurrent_agents": 5,
    "poll_interval": 10,
    "checkpoint_interval": 10,
    "retry_limit": 3,
    "pause_on_failure_count": 5
  },
  "executor": {
    "timeout": 7200,
    "max_retries": 3,
    "retry_delay": 30
  },
  "sync": {
    "interval": 30,
    "debounce": 2.0
  },
  "agent_mappings": {
    "devops-engineer": ["deploy", "docker", "kubernetes"],
    "security-specialist": ["security", "auth", "compliance"],
    ...
  }
}
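
One way to consume the `orchestrator` section is to load it into a typed settings object with the documented values as defaults. The sketch below is hypothetical (`OrchestratorSettings` and `load_settings` are not names from the codebase) and assumes that unknown keys should be ignored rather than rejected.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OrchestratorSettings:
    # Defaults mirror the values shown in orchestrator-config.json.
    max_concurrent_agents: int = 5
    poll_interval: int = 10
    checkpoint_interval: int = 10
    retry_limit: int = 3
    pause_on_failure_count: int = 5


def load_settings(raw_json: str) -> OrchestratorSettings:
    """Parse the "orchestrator" section, falling back to defaults for missing keys."""
    section = json.loads(raw_json).get("orchestrator", {})
    known = {f.name for f in fields(OrchestratorSettings)}
    settings = OrchestratorSettings(
        **{k: v for k, v in section.items() if k in known})
    if settings.max_concurrent_agents < 1:
        raise ValueError("max_concurrent_agents must be at least 1")
    return settings


# Usage: override one value and rely on defaults for the rest.
settings = load_settings('{"orchestrator": {"max_concurrent_agents": 8}}')
print(settings.max_concurrent_agents, settings.poll_interval)  # 8 10
```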

Success Metrics

| Metric              | Target         | Measurement                                      |
|---------------------|----------------|--------------------------------------------------|
| Autonomy Rate       | 95%+           | (tasks without human intervention) / total tasks |
| Dispatch Latency    | <5s (p95)      | Time from task available to assigned             |
| Task Throughput     | 10+ tasks/hour | Completed tasks per hour                         |
| Success Rate        | 95%+           | Completed / (completed + failed)                 |
| Sync Accuracy       | 99.9%          | MD checkboxes matching DB status                 |
| Checkpoint Coverage | 100%           | All milestone states recoverable                 |
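
The two ratio metrics can be computed directly from task counts. The sketch below simply restates the formulas from the Measurement column; the function names and the sample counts are illustrative, not measured results.

```python
def success_rate(completed: int, failed: int) -> float:
    """Completed / (completed + failed); 0.0 when nothing has finished yet."""
    finished = completed + failed
    return completed / finished if finished else 0.0


def autonomy_rate(tasks_without_intervention: int, total_tasks: int) -> float:
    """(tasks without human intervention) / total tasks."""
    return tasks_without_intervention / total_tasks if total_tasks else 0.0


# Illustrative counts only: 114 completed, 6 failed, 116 of 122 autonomous.
meets_success_target = success_rate(114, 6) >= 0.95
meets_autonomy_target = autonomy_rate(116, 122) >= 0.95
print(meets_success_target, meets_autonomy_target)
```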

Usage Examples

Start Orchestrator (Full Autonomous)

cd submodules/core/coditect-core

# Start with 5 concurrent agents
python3 scripts/autonomous-orchestrator.py --max-agents 5

# P0 tasks only (critical path)
python3 scripts/autonomous-orchestrator.py --priority P0

# Preview mode (no actual execution)
python3 scripts/autonomous-orchestrator.py --dry-run

Manual Operations

# Check sync status
python3 scripts/sync-daemon.py --status

# Get next available tasks
python3 scripts/task-dispatcher.py --next 5

# Execute specific task
python3 scripts/agent-executor.py --task T001.002

# Create checkpoint
python3 scripts/autonomous-orchestrator.py --checkpoint "pre-deploy"

Monitor Progress

# Live dashboard
python3 scripts/autonomous-orchestrator.py --dashboard

# JSON status for scripting
python3 scripts/autonomous-orchestrator.py --status

# Sync status
python3 scripts/sync-project-plan.py --status

Consequences

Positive

  • 95% Autonomous Operation - Minimal human intervention required
  • 10x Throughput - Parallel execution with 5+ concurrent agents
  • Zero State Loss - Checkpoint/resume across sessions
  • Near-Perfect Sync - Bidirectional MD↔DB synchronization (99.9% accuracy target)
  • Full Audit Trail - Every execution logged with results
  • Intelligent Dispatch - Tasks matched to optimal agents

Negative

  • Complexity - Four integrated components to maintain
  • Resource Usage - Concurrent agents consume more compute
  • Debugging - Distributed execution harder to troubleshoot
  • Claude Dependency - Requires Claude Code binary availability

Mitigations

  • Dry-run Mode - Preview execution without changes
  • Pause Threshold - Auto-pause on repeated failures
  • Comprehensive Logging - Full execution logs per task
  • Checkpoint Recovery - Resume from any milestone

Implementation Checklist

  • Sync Daemon (sync-daemon.py)
  • Task Dispatcher (task-dispatcher.py)
  • Agent Executor (agent-executor.py)
  • Autonomous Orchestrator (autonomous-orchestrator.py)
  • Configuration (orchestrator-config.json)
  • Database schema extensions (in scripts)
  • ADR documentation (this document)
  • Integration tests
  • Load testing (10+ concurrent agents)
  • Production deployment guide

References

  • ADR-006: Work Item Hierarchy (task data model)
  • V2-CONSOLIDATED-project-plan.md (project plan source)
  • V2-tasklist-with-checkboxes.md (markdown tasklist)
  • v2-work-items.json (JSON extraction)

Decision: APPROVED
Date: 2025-12-19
Author: CODITECT Orchestrator Agent
Reviewers: Hal Casteel (Founder/CEO/CTO)