Multi-Agent Workflow Orchestration

Expert skill for managing complex multi-agent workflows with token budget awareness, context management, and recursive state-based resolution. It coordinates multiple specialized agents across multi-phase development workflows.

How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

When to Use This Skill

Use this skill when:

  • Complex Multi-Phase Workflows: Tasks requiring coordination of 3+ phases across multiple modules
  • Token Budget Management: Workflows approaching 70K+ tokens (out of 160K limit)
  • Cascading Dependencies: Issues spanning backend → frontend → database → infrastructure
  • Recursive Resolution: Iterative problem-solving with retry/traceback logic
  • Context Collapse Prevention: Long-running sessions needing checkpoint/resume
  • Proven: 7 production workflows (full-stack features, security audits, deployment validation)
  • Token efficiency: Prevents 95% context collapse with 85%+ checkpoint triggers

Don't use this skill when:

  • Single-module tasks (no coordination needed)
  • Simple workflows (< 3 phases)
  • Token usage < 50% (no budget concerns)
  • No cascading dependencies

Core Capabilities

1. Complexity Assessment

Before starting any multi-phase workflow, assess complexity using the complexity gauge:

/complexity-gauge

Interpret Results:

  • Safe Zone (< 70% tokens): Proceed normally
  • Warning Zone (70-85% tokens): Apply context compression
  • Critical Zone (85-95% tokens): Mandatory checkpoint before continuing
  • Over-budget (> 95% tokens): STOP - Create handoff document
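
The zone thresholds above can be expressed as a small classifier. This is a minimal sketch, not the `/complexity-gauge` implementation: `Zone`, `classify_usage`, and `TOKEN_LIMIT` are hypothetical names, the 160K limit is the figure quoted earlier in this skill, and assigning the shared boundaries (exactly 85% and exactly 95% both count as Critical) is an assumption, since the ranges as written overlap.

```python
from enum import Enum

TOKEN_LIMIT = 160_000  # limit quoted in this skill


class Zone(Enum):
    SAFE = "proceed normally"
    WARNING = "apply context compression"
    CRITICAL = "mandatory checkpoint before continuing"
    OVER_BUDGET = "stop and create handoff document"


def classify_usage(tokens_used: int, limit: int = TOKEN_LIMIT) -> Zone:
    """Map token usage to one of the four zones listed above."""
    pct = 100 * tokens_used / limit
    if pct < 70:
        return Zone.SAFE
    if pct < 85:
        return Zone.WARNING
    if pct <= 95:  # assumption: boundary values fall into the Critical zone
        return Zone.CRITICAL
    return Zone.OVER_BUDGET
```

For example, `classify_usage(112_000)` is exactly 70% of the limit and therefore lands in the Warning zone.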

Complexity Factors (scoring):

  • Module count: 5 points per module
  • Dependency depth: 10 points per level
  • Subagent invocations: 3 points per agent
  • File operations: 1 point per file
  • Context switches: 15 points per handoff
  • Recursive calls: 20 points per iteration
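
As a worked illustration of the scoring, the sketch below sums the per-factor points. The weights are copied from the list above; the function name, the dictionary keys, and the sample counts are hypothetical, and the skill does not state how a score maps to a token estimate, so none is attempted here.

```python
# Points per unit, copied from the factor list above.
WEIGHTS = {
    "modules": 5,
    "dependency_depth": 10,
    "subagent_invocations": 3,
    "file_operations": 1,
    "context_switches": 15,
    "recursive_calls": 20,
}


def complexity_score(counts: dict[str, int]) -> int:
    """Sum weight * count over the factors present in `counts`."""
    unknown = set(counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    return sum(WEIGHTS[name] * n for name, n in counts.items())


# Hypothetical workflow: 3 modules, 2 dependency levels, 4 subagent calls,
# 10 file operations, 1 handoff, 2 recursive iterations.
example = {
    "modules": 3,               # 15
    "dependency_depth": 2,      # 20
    "subagent_invocations": 4,  # 12
    "file_operations": 10,      # 10
    "context_switches": 1,      # 15
    "recursive_calls": 2,       # 40
}
print(complexity_score(example))  # 112
```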

2. Orchestrated Workflows

Use the orchestrator for multi-step coordinated workflows:

When to invoke:

"Use orchestrator to [implement full-stack feature / run security audit / validate deployment]"

Orchestrator will:

  • Create detailed execution plan with phase breakdown
  • Assign specialized subagents (codebase-locator, codebase-analyzer, etc.)
  • Generate ready-to-execute Task calls
  • Provide token budget tracking
  • Specify error handling strategies

7 Production Workflows:

  1. Full-Stack Feature Development (~60K tokens, 15-25 min)
  2. Bug Investigation & Fix (~50K tokens, 10-20 min)
  3. Security Audit (~55K tokens, 12-18 min)
  4. Deployment Validation (~50K tokens, 10-15 min)
  5. Code Quality Cycle (~60K tokens, 15-20 min)
  6. Codebase Research (~45K tokens, 8-12 min)
  7. Project Cleanup (~30K tokens, 5-10 min)

3. Recursive Workflows

For cascading multi-module issues requiring iterative resolution:

/recursive-workflow

FSM States:

INITIATE → IDENTIFY → DOCUMENT → SOLVE → CODE → DEPLOY → TEST → VALIDATE → COMPLETE

Features:

  • State persistence to FoundationDB (resume across sessions)
  • Traceback/retry logic (max 10 iterations)
  • Automatic /complexity-gauge checks at each state transition
  • Checkpoint creation when tokens > 85%
  • Context handoff for session restarts
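
The state sequence and retry limit above can be sketched as a small driver loop. This is an illustration under stated assumptions, not the `/recursive-workflow` implementation: `run_workflow` and the `step` callback are hypothetical, a failed state is assumed to trace back to IDENTIFY, and persistence, `/complexity-gauge` checks, and checkpoints are left out.

```python
from typing import Callable

STATES = [
    "INITIATE", "IDENTIFY", "DOCUMENT", "SOLVE", "CODE",
    "DEPLOY", "TEST", "VALIDATE", "COMPLETE",
]
MAX_ITERATIONS = 10  # retry limit from the feature list above


def run_workflow(
    step: Callable[[str, int], bool],
    max_iterations: int = MAX_ITERATIONS,
) -> tuple[str, int, list[str]]:
    """Drive the FSM. `step(state, iteration)` returns True when the state succeeds.

    Returns (final_state, iteration_count, visited_states).
    """
    index, iteration, history = 0, 1, []
    while STATES[index] != "COMPLETE":
        state = STATES[index]
        history.append(state)
        if step(state, iteration):
            index += 1  # advance to the next state
        elif iteration >= max_iterations:
            return state, iteration, history  # give up at the retry limit
        else:
            iteration += 1
            index = STATES.index("IDENTIFY")  # assumption: trace back to IDENTIFY
    history.append("COMPLETE")
    return "COMPLETE", iteration, history
```

A caller whose TEST state fails once, for instance `run_workflow(lambda state, i: not (state == "TEST" and i == 1))`, finishes in COMPLETE on the second iteration.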

Example Use Case:

Issue: "Session invalidation not propagating from backend to frontend"

Affected Modules:
- Backend: backend/src/handlers/auth.rs
- Frontend: src/services/authStore.ts
- Database: FoundationDB session table

Workflow:
1. IDENTIFY: Locate all session-related code
2. DOCUMENT: Capture current session flow
3. SOLVE: Design propagation strategy
4. CODE: Implement backend → frontend sync
5. DEPLOY: Build and deploy changes
6. TEST: Validate end-to-end flow
7. VALIDATE: Confirm no regressions

Token Budget Management

Pre-Workflow (Phase 0)

# Always assess before starting
# /complexity-gauge

if projected_tokens > 100_000:
    # Split into 2 sessions with context_save between
    session_1 = phases[0:3]
    session_2 = phases[3:7]  # start at index 3 so no phase is skipped
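
One way to choose the split point is to pack phases into sessions until the next phase would exceed the per-session budget. The sketch below assumes per-phase token estimates are available; `split_into_sessions` and the sample phase costs are hypothetical, and only the 100K threshold comes from the text above.

```python
SESSION_BUDGET = 100_000  # threshold from the pre-workflow check above


def split_into_sessions(
    phase_costs: list[tuple[str, int]],
    budget: int = SESSION_BUDGET,
) -> list[list[str]]:
    """Greedily group phases, in order, so each session stays within `budget`."""
    sessions: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, cost in phase_costs:
        # Start a new session when adding this phase would exceed the budget.
        if current and used + cost > budget:
            sessions.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        sessions.append(current)
    return sessions


# Hypothetical per-phase estimates, for illustration only.
estimates = [
    ("Research", 15_000), ("Design", 18_000), ("Backend", 40_000),
    ("Frontend", 35_000), ("Test", 20_000),
]
print(split_into_sessions(estimates))
# [['Research', 'Design', 'Backend'], ['Frontend', 'Test']]
```

A `/context-save` would then run between the sessions, as the comment above describes.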

Mid-Workflow (Phase 3+)

# Check after 50% completion or Phase 3
# /complexity-gauge

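if current_tokens > 0.70 * 160_000:  # Warning zone starts at 112K tokens
    # Apply context compression:
    #   - Summarize completed phases (3-5 bullets)
    #   - Remove verbose subagent outputs
    #   - Keep only file:line references
    compress_context()  # hypothetical helper standing in for the steps above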

Critical Zone (> 85%)

/complexity-gauge  # Confirm critical status

/context-save project_root=/home/hal/v4/PROJECTS/t2 context_type=comprehensive

# Create handoff document:
# - What's complete
# - What remains
# - Exact next steps
# - Context recovery instructions

## Context Compression Strategies

### Level 1: Light Compression (10-20% reduction)
- Summarize completed phases (5 bullets max per phase)
- Remove verbose subagent outputs, keep findings
- Archive file contents, keep references

### Level 2: Moderate Compression (30-40% reduction)
- Aggressive phase summarization (1-2 sentences per phase)
- Remove all file contents, keep metadata only
- Consolidate duplicate information
- Store details in FoundationDB, keep references

### Level 3: Heavy Compression (50-70% reduction)
- Minimal phase tracking (FSM state only)
- External storage for all details (FDB + filesystem)
- Keep only: current state, next actions, critical blockers
- Use `/context-save` to archive everything else
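
As an illustration of Level 1, the sketch below reduces a completed phase record to at most five findings plus its file references, dropping verbose output and file contents. The record layout (`findings`, `file_refs`, `raw_output`, `file_contents`) and the function name are assumptions made for this example, not a format defined by the skill.

```python
def compress_phase_level1(phase: dict, max_bullets: int = 5) -> dict:
    """Keep the phase name, up to `max_bullets` findings, and file references only."""
    return {
        "phase": phase["phase"],
        "summary": list(phase.get("findings", []))[:max_bullets],
        "file_refs": list(phase.get("file_refs", [])),
        # raw_output and file_contents are intentionally not copied over
    }


# Hypothetical phase record, for illustration only.
research = {
    "phase": "Research",
    "findings": [f"finding {i}" for i in range(1, 8)],
    "file_refs": ["backend/src/handlers/auth.rs:42"],
    "raw_output": "verbose subagent transcript",
    "file_contents": {"backend/src/handlers/auth.rs": "full file text"},
}
compressed = compress_phase_level1(research)
print(len(compressed["summary"]))  # 5
```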

## Workflow Patterns

### Pattern 1: Context-Aware Orchestration

  1. /complexity-gauge (assess upfront)
  2. Orchestrator creates plan
  3. Execute Phase 1-2
  4. /complexity-gauge (mid-check)
  5. If Warning: Apply Level 1 compression
  6. Execute Phase 3-4
  7. /complexity-gauge (final check)
  8. Complete or checkpoint

### Pattern 2: Recursive Resolution

  1. /complexity-gauge (baseline)
  2. /recursive-workflow (initiate FSM)
  3. Auto-monitors tokens at each state
  4. If tokens > 85%: Auto-checkpoint
  5. Resume from FDB state in next session

### Pattern 3: Parallel Decomposition

  1. /complexity-gauge (assess)
  2. If complex: Split into parallel paths
  3. Execute independent modules concurrently
  4. Final /complexity-gauge before merge
  5. Merge results with validation
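
Steps 3 and 5 of this pattern can be sketched with Python's standard `concurrent.futures` module. The helper names and the string results are hypothetical, and the sketch assumes the tasks really are independent; it does not check that for you.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_parallel(tasks: dict[str, Callable[[], str]]) -> dict[str, str]:
    """Run independent module tasks concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {name: pool.submit(task) for name, task in tasks.items()}
        # result() re-raises any exception from the task, so failures surface here.
        return {name: future.result() for name, future in futures.items()}


def merge_results(results: dict[str, str], required: list[str]) -> list[str]:
    """Validate that every required module reported, then merge in a fixed order."""
    missing = [name for name in required if name not in results]
    if missing:
        raise ValueError(f"missing results for: {', '.join(missing)}")
    return [f"{name}: {results[name]}" for name in required]
```

A call such as `merge_results(run_parallel({"backend": build_backend, "frontend": build_frontend}), ["backend", "frontend"])` would fail loudly if either path did not report, where `build_backend` and `build_frontend` stand in for your own task functions.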

## Integration with T2 Architecture

### Backend (Rust/Actix-web/FoundationDB)
- Use `WorkflowState` model for FSM persistence
- Use `StateTransition` for audit trail
- Use `WorkflowCheckpoint` for recovery points

**FDB Key Patterns**:

- `/{tenant_id}/workflows/{workflow_id}/state`
- `/{tenant_id}/workflows/{workflow_id}/history/{n}`
- `/{tenant_id}/workflows/{workflow_id}/checkpoints/{n}`
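
The key patterns can be mirrored by small builder functions, which keeps the layout in one place. This sketch only formats plain strings; the function names are hypothetical, and how the backend actually encodes keys in FoundationDB is not shown in this skill.

```python
def state_key(tenant_id: str, workflow_id: str) -> str:
    """Key holding the current FSM state of a workflow."""
    return f"/{tenant_id}/workflows/{workflow_id}/state"


def history_key(tenant_id: str, workflow_id: str, n: int) -> str:
    """Key for the n-th state transition in the audit trail."""
    return f"/{tenant_id}/workflows/{workflow_id}/history/{n}"


def checkpoint_key(tenant_id: str, workflow_id: str, n: int) -> str:
    """Key for the n-th recovery checkpoint."""
    return f"/{tenant_id}/workflows/{workflow_id}/checkpoints/{n}"
```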


### Frontend (React/TypeScript/Theia)
- Workflows tied to `WorkspaceSession` via `session_id`
- UI shows workflow progress (current_state, iteration_count)
- Checkpoint/resume UI for long-running workflows

### Specialized Subagents (7 available)
1. **codebase-analyzer** - Implementation details
2. **codebase-locator** - File/component location
3. **codebase-pattern-finder** - Pattern identification
4. **project-organizer** - Directory structure maintenance
5. **thoughts-analyzer** - Decision extraction
6. **thoughts-locator** - Documentation finding
7. **web-search-researcher** - External research


## Best Practices

### ✅ Do This
- Always run `/complexity-gauge` before multi-phase workflows
- Check token budget at 50% completion
- Use orchestrator for 3+ phase workflows
- Invoke `/recursive-workflow` for cascading issues
- Create checkpoints when tokens > 85%
- Compress context proactively at 70%

### ❌ Avoid This
- Don't skip complexity assessment
- Don't continue workflows > 95% token budget
- Don't use recursive workflow for simple issues
- Don't ignore checkpoint warnings
- Don't forget to use `/context-save` before stopping

## Troubleshooting

### "Workflow stuck in loop (IDENTIFY → TEST → IDENTIFY)"
**Cause**: Test failures not providing enough information for fix

**Solution**:
- Add explicit failure analysis in TEST state
- Use `codebase-pattern-finder` for similar fixes
- Consider ESCALATE if iteration > 5

### "Context collapse during workflow"
**Cause**: Too many iterations without checkpoint

**Solution**:
- Trigger checkpoint at 70% tokens (not 85%)
- Use Level 2+ compression earlier
- Archive completed states immediately

### "Can't resume workflow after session restart"
**Cause**: FDB state not persisted or query failed

**Solution**:
- Verify FDB connection
- Check `workflow_id` is correct
- Use `/context-restore` with project ID
- Review FDB key structure

## Examples

## Multi-Context Window Support

This skill supports long-running multi-agent orchestration across multiple context windows using Claude 4.5's enhanced state management capabilities.

### State Tracking

**Orchestration State (JSON):**
```json
{
  "checkpoint_id": "ckpt_20251129_161000",
  "workflow_type": "full_stack_feature",
  "phases_completed": [
    {"phase": "Research", "token_cost": 15000, "status": "complete"},
    {"phase": "Design", "token_cost": 18000, "status": "complete"},
    {"phase": "Backend", "token_cost": 22000, "status": "in_progress"}
  ],
  "agent_invocations": [
    {"agent": "codebase-locator", "phase": "Research", "result": "cached"},
    {"agent": "codebase-analyzer", "phase": "Design", "result": "cached"}
  ],
  "complexity_score": 185,
  "token_budget": {"used": 55000, "projected": 120000, "limit": 160000},
  "fsm_state": "CODE",
  "iteration_count": 2,
  "created_at": "2025-11-29T16:10:00Z"
}
```

Progress Notes (Markdown):

Multi-Agent Workflow Progress - 2025-11-29

Completed

  • Research phase: Located all relevant files (15 min)
  • Design phase: Architecture validated (18 min)
  • Backend implementation started

In Progress

  • Backend endpoint implementation (22K tokens used)
  • Iteration 2 of CODE phase (validation fixing)

Next Actions

  • Complete backend validation fixes
  • Start frontend implementation phase
  • Run integration tests
  • Create deployment plan

### Session Recovery

When starting a fresh context window after orchestration work:

1. **Load Checkpoint State**: Read `.coditect/checkpoints/orchestration-latest.json`
2. **Review Progress Notes**: Check `orchestration-progress.md` for context
3. **Verify FSM State**: Check current workflow state (INITIATE, CODE, etc.)
4. **Check Token Budget**: Review projected vs. actual token usage
5. **Resume Workflow**: Continue from last incomplete phase
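
The recovery steps above can also be sketched in Python, using the field names from the orchestration state JSON shown earlier in this section. `load_checkpoint` and `resume_point` are hypothetical helpers, and the sketch assumes the first phase whose status is not `complete` is the one to resume.

```python
import json
from pathlib import Path

CHECKPOINT = Path(".coditect/checkpoints/orchestration-latest.json")


def load_checkpoint(path: Path = CHECKPOINT) -> dict:
    """Step 1: read the latest checkpoint file."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def resume_point(checkpoint: dict) -> dict:
    """Steps 3-5: report the FSM state, remaining budget, and phase to resume."""
    pending = [
        p["phase"]
        for p in checkpoint["phases_completed"]
        if p["status"] != "complete"
    ]
    budget = checkpoint["token_budget"]
    return {
        "fsm_state": checkpoint["fsm_state"],
        "resume_phase": pending[0] if pending else None,
        "tokens_remaining": budget["limit"] - budget["used"],
    }
```

For the example state above, `resume_point` would report `fsm_state` `CODE`, `resume_phase` `Backend`, and 105000 tokens remaining.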

**Recovery Commands:**
```bash
# 1. Check latest checkpoint
jq '.phases_completed' .coditect/checkpoints/orchestration-latest.json

# 2. Review progress
tail -40 orchestration-progress.md

# 3. Check FSM state
jq '.fsm_state' .coditect/checkpoints/orchestration-latest.json

# 4. Review token budget
jq '.token_budget' .coditect/checkpoints/orchestration-latest.json

# 5. Check complexity score
jq '.complexity_score' .coditect/checkpoints/orchestration-latest.json

# 6. List agent results
jq '.agent_invocations' .coditect/checkpoints/orchestration-latest.json
```

State Management Best Practices

Checkpoint Files (JSON Schema):

  • Store in .coditect/checkpoints/orchestration-{timestamp}.json
  • Track phase completion with token costs
  • Cache agent invocation results
  • Monitor complexity score and iteration count

Progress Tracking (Markdown Narrative):

  • Maintain orchestration-progress.md with phase status
  • Document workflow decisions and trade-offs
  • Note iteration reasons and fixes applied
  • List next phases with time estimates

Git Integration:

  • Create checkpoint after each major phase
  • Commit with phase markers: feat(workflow): Complete Design phase
  • Tag workflow milestones: git tag workflow-backend-complete

Progress Checkpoints

Natural Breaking Points:

  1. After each workflow phase completed (Research, Design, CODE, etc.)
  2. After complexity gauge shows Warning or Critical zone
  3. Before FSM state transitions
  4. When iteration count reaches 5+
  5. After token budget reaches 70%

Checkpoint Creation Pattern:

# Automatic checkpoint at workflow milestones
if phase_complete or token_percentage >= 70 or iteration_count >= 5:
    create_checkpoint({
        "workflow": workflow_type,
        "phases": completed_phases,
        "agents": invocation_results,
        "fsm": current_state,
        "complexity": score,
        "tokens": budget_status,
    })
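
A runnable version of this pattern might look like the sketch below. The trigger condition and the `.coditect/checkpoints/orchestration-{timestamp}.json` location come from this skill; the function names and the `YYYYMMDD_HHMMSS` timestamp format are assumptions (the format is inferred from the example `checkpoint_id`, not specified).

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

CHECKPOINT_DIR = Path(".coditect/checkpoints")


def should_checkpoint(phase_complete: bool, token_percentage: float, iteration_count: int) -> bool:
    """Trigger condition from the pattern above."""
    return phase_complete or token_percentage >= 70 or iteration_count >= 5


def checkpoint_path(now: datetime, root: Path = CHECKPOINT_DIR) -> Path:
    """Build the checkpoint file path; the timestamp format is an assumption."""
    return root / f"orchestration-{now:%Y%m%d_%H%M%S}.json"


def write_checkpoint(payload: dict, now: Optional[datetime] = None, root: Path = CHECKPOINT_DIR) -> Path:
    """Serialize the payload as JSON and return the path that was written."""
    now = now or datetime.now(timezone.utc)
    path = checkpoint_path(now, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path
```

Keeping the trigger check separate from the file write makes the threshold logic easy to test without touching the filesystem.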

Example: Multi-Context Full-Stack Workflow

Context Window 1: Research & Design

{
  "checkpoint_id": "ckpt_workflow_design",
  "phase": "design_complete",
  "workflow": "full_stack_feature",
  "phases": ["Research", "Design"],
  "token_used": 33000,
  "next_action": "Start Backend implementation",
  "projected_total": 95000
}

Context Window 2: Implementation & Testing

# Load checkpoint
cat .coditect/checkpoints/ckpt_workflow_design.json

# Continue with Backend → Frontend → TEST
# Token savings: ~15000 tokens (research results cached)

Token Savings Analysis:

  • Without checkpoint: 110000 tokens (re-run research + design)
  • With checkpoint: 95000 tokens (resume from design state)
  • Savings: 14% reduction (110000 → 95000 tokens)
  • Context collapse prevented: Would have failed at ~105K without checkpoints
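
The savings figure follows from the two totals above: (110000 - 95000) / 110000 is roughly 13.6%, which rounds to the 14% quoted. A minimal check, with `savings_percent` as a hypothetical helper name:

```python
def savings_percent(without_checkpoint: int, with_checkpoint: int) -> float:
    """Percentage of tokens saved relative to the run without a checkpoint."""
    return 100 * (without_checkpoint - with_checkpoint) / without_checkpoint


print(round(savings_percent(110_000, 95_000)))  # 14
```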

Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: multi-agent-workflow

Workflow Execution Summary:
- Workflow type: {full_stack_feature|bug_investigation|security_audit|deployment_validation|etc.}
- Phases completed: {N/7}
- Token budget: {used}/{projected} of {limit} (X% efficiency)
- Complexity score: {score}
- FSM state: COMPLETE
- Iteration count: {N}

Completed:
- [x] Complexity assessment performed
- [x] Orchestration plan created
- [x] Phase 1-N executed successfully
- [x] Token budget managed within limits
- [x] Context compression applied (Level X)
- [x] Checkpoint created at {percentage}% completion
- [x] All specialized agents invoked successfully
- [x] Verification completed

Outputs:
- Checkpoint file: .coditect/checkpoints/orchestration-{timestamp}.json
- Progress notes: orchestration-progress.md
- Token savings: {X}% vs. manual approach
- Agent invocations cached: {N}

Completion Checklist

Before marking this skill as complete, verify:

  • /complexity-gauge ran before workflow start
  • Orchestrator created detailed execution plan
  • All required phases completed (or checkpointed)
  • Token budget stayed below 95% limit
  • Context compression applied when tokens > 70%
  • Checkpoint created when tokens > 85%
  • All specialized subagent invocations successful
  • FSM state persisted to FoundationDB (if using recursive workflow)
  • Final verification passed
  • Handoff document created if workflow incomplete

Failure Indicators

This skill has FAILED if:

  • ❌ Complexity assessment skipped (no /complexity-gauge)
  • ❌ Token budget exceeded 95% without checkpoint
  • ❌ Workflow stuck in retry loop (iteration > 10)
  • ❌ Context collapse occurred (>100% token usage)
  • ❌ FSM state not persisted (cannot resume)
  • ❌ Specialized agents returned errors without resolution
  • ❌ Cascading dependencies not resolved
  • ❌ Phase completion verification missing
  • ❌ No checkpoint created for long-running workflow
  • ❌ Incomplete workflow without handoff document

When NOT to Use

Do NOT use this skill when:

  • Single-module tasks requiring no coordination
  • Simple workflows with fewer than 3 phases
  • Token usage under 50% of limit (no budget concerns)
  • No cascading dependencies across modules
  • Task can be completed by single specialized agent
  • Quick exploratory work without formal orchestration
  • Prototype or throwaway code development
  • Documentation-only updates without code changes

Use alternatives instead:

  • Direct specialized agent invocation for single-module tasks
  • Manual coordination for simple 2-phase workflows
  • Standard development workflow for low-complexity tasks
  • Exploratory agents for research and investigation

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|--------------|---------|----------|
| Skipping complexity assessment | Unmanaged context collapse | Always run /complexity-gauge first |
| Continuing workflow > 95% tokens | Context failure imminent | Create checkpoint, create handoff |
| Using recursive workflow for simple issues | Unnecessary FSM overhead | Use orchestrator for 3+ phases, direct agents for simple |
| Ignoring checkpoint warnings | Lost progress on failure | Checkpoint at 85% tokens automatically |
| Not compressing context at 70% | Gradual context degradation | Apply Level 1-3 compression proactively |
| Manual phase tracking | Inconsistent state management | Use FSM with FoundationDB persistence |
| No verification between phases | Errors compound across phases | Verify each phase before proceeding |
| Parallel execution without independence check | Conflicts and race conditions | Validate tasks are truly independent |
| Missing handoff documentation | Cannot resume after session | Always create handoff if incomplete |

Principles

This skill embodies:

  • #4 Separation of Concerns - Each phase has clear responsibilities, specialized agents
  • #8 No Assumptions - Complexity assessment before execution, verification after
  • #10 Measure → Learn → Improve - Token budget tracking, complexity scoring
  • #11 Inform → Do → Verify - Orchestration plan → execution → verification
  • #12 Self-Provisioning - Automatic checkpoint creation, context compression

Multi-Agent Orchestration Principles:

  1. Always assess complexity upfront with /complexity-gauge
  2. Check token budget at 50% completion (mid-workflow)
  3. Use orchestrator for 3+ phase workflows
  4. Invoke /recursive-workflow for cascading multi-module issues
  5. Create checkpoints when tokens > 85%
  6. Compress context proactively at 70% utilization
  7. Persist FSM state for resumability
  8. Verify each phase before proceeding to next

Token Budget Management:

  • Safe Zone (<70%): Proceed normally
  • Warning Zone (70-85%): Apply compression
  • Critical Zone (85-95%): Mandatory checkpoint
  • Over-budget (>95%): STOP, create handoff

Full Standard: CODITECT-STANDARD-AUTOMATION.md


Example 1: Full-Stack Feature

User: "Implement user profile editing with backend API and frontend UI"