# Multi-Agent Workflow Orchestration
Expert skill for managing complex multi-agent workflows with token budget awareness, context management, and recursive state-based resolution. This skill enables coordination of multiple specialized agents across complex, multi-phase development workflows.
## How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
## When to Use This Skill
✅ Use this skill when:
- Complex Multi-Phase Workflows: Tasks requiring coordination of 3+ phases across multiple modules
- Token Budget Management: Workflows approaching 70K+ tokens (out of 160K limit)
- Cascading Dependencies: Issues spanning backend → frontend → database → infrastructure
- Recursive Resolution: Iterative problem-solving with retry/traceback logic
- Context Collapse Prevention: Long-running sessions needing checkpoint/resume
- Proven in 7 production workflows (full-stack features, security audits, deployment validation)
- Token efficiency: checkpoints triggered at 85%+ usage keep workflows from reaching context collapse at 95%
❌ Don't use this skill when:
- Single-module tasks (no coordination needed)
- Simple workflows (< 3 phases)
- Token usage < 50% (no budget concerns)
- No cascading dependencies
## Core Capabilities
### 1. Complexity Assessment
Before starting any multi-phase workflow, assess complexity using the complexity gauge:
/complexity-gauge
Interpret Results:
- Safe Zone (< 70% tokens): Proceed normally
- Warning Zone (70-85% tokens): Apply context compression
- Critical Zone (85-95% tokens): Mandatory checkpoint before continuing
- Over-budget (> 95% tokens): STOP - Create handoff document
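The zone thresholds above can be expressed as a small classifier. This is a minimal sketch, not the logic behind `/complexity-gauge`: the function name `token_zone` and the zone labels are hypothetical, the 160K default comes from the limit quoted earlier, and the handling of exact boundaries (85% stays in Warning, 95% stays in Critical) is an assumption, since the list does not say which side they fall on.

```python
def token_zone(used_tokens: int, limit: int = 160_000) -> str:
    """Classify token usage into the four zones listed above."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    pct = 100.0 * used_tokens / limit
    if pct < 70:
        return "safe"          # proceed normally
    if pct <= 85:
        return "warning"       # apply context compression
    if pct <= 95:
        return "critical"      # mandatory checkpoint before continuing
    return "over_budget"       # stop and create a handoff document


if __name__ == "__main__":
    for used in (55_000, 120_000, 140_000, 155_000):
        print(used, token_zone(used))
```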
Complexity Factors (scoring):
- Module count: 5 points per module
- Dependency depth: 10 points per level
- Subagent invocations: 3 points per agent
- File operations: 1 point per file
- Context switches: 15 points per handoff
- Recursive calls: 20 points per iteration
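As a rough illustration of the scoring list, the sketch below adds up the factors as a plain weighted sum. The additive combination, the function name `complexity_score`, and the keyword names are assumptions; the list gives only the per-unit points, not how `/complexity-gauge` combines them or which totals map to which zone.

```python
# Points per unit, copied from the scoring list above.
WEIGHTS = {
    "modules": 5,
    "dependency_depth": 10,
    "subagent_invocations": 3,
    "file_operations": 1,
    "context_switches": 15,
    "recursive_calls": 20,
}


def complexity_score(**counts: int) -> int:
    """Weighted sum of factor counts; unknown factor names are rejected."""
    unknown = set(counts) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown factors: {sorted(unknown)}")
    return sum(WEIGHTS[name] * n for name, n in counts.items())


if __name__ == "__main__":
    # 3 modules, depth 2, 5 subagents, 10 files, 1 handoff, 2 iterations
    score = complexity_score(
        modules=3, dependency_depth=2, subagent_invocations=5,
        file_operations=10, context_switches=1, recursive_calls=2,
    )
    print(score)  # 15 + 20 + 15 + 10 + 15 + 40 = 115
```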
### 2. Orchestrated Workflows
Use the orchestrator for multi-step coordinated workflows:
When to invoke:
"Use orchestrator to [implement full-stack feature / run security audit / validate deployment]"
Orchestrator will:
- Create detailed execution plan with phase breakdown
- Assign specialized subagents (codebase-locator, codebase-analyzer, etc.)
- Generate ready-to-execute Task calls
- Provide token budget tracking
- Specify error handling strategies
7 Production Workflows:
- Full-Stack Feature Development (~60K tokens, 15-25 min)
- Bug Investigation & Fix (~50K tokens, 10-20 min)
- Security Audit (~55K tokens, 12-18 min)
- Deployment Validation (~50K tokens, 10-15 min)
- Code Quality Cycle (~60K tokens, 15-20 min)
- Codebase Research (~45K tokens, 8-12 min)
- Project Cleanup (~30K tokens, 5-10 min)
### 3. Recursive Workflows
For cascading multi-module issues requiring iterative resolution:
/recursive-workflow
FSM States:
INITIATE → IDENTIFY → DOCUMENT → SOLVE → CODE → DEPLOY → TEST → VALIDATE → COMPLETE
Features:
- State persistence to FoundationDB (resume across sessions)
- Traceback/retry logic (max 10 iterations)
- Automatic `/complexity-gauge` checks at each state transition
- Checkpoint creation when tokens > 85%
- Context handoff for session restarts
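The state sequence and retry cap can be sketched as a small state machine. This is an illustration under stated assumptions, not the `/recursive-workflow` implementation: the class `WorkflowFSM`, the `advance` method, and the choice to trace back to IDENTIFY on any failure are hypothetical, and persistence to FoundationDB is left out entirely.

```python
from enum import Enum


class State(Enum):
    INITIATE = 0
    IDENTIFY = 1
    DOCUMENT = 2
    SOLVE = 3
    CODE = 4
    DEPLOY = 5
    TEST = 6
    VALIDATE = 7
    COMPLETE = 8


MAX_ITERATIONS = 10  # retry cap from the feature list above


class WorkflowFSM:
    def __init__(self) -> None:
        self.state = State.INITIATE
        self.iteration_count = 0
        self.history = [State.INITIATE]

    def advance(self, ok: bool = True) -> State:
        """Move one state forward on success; trace back to IDENTIFY on failure."""
        if self.state is State.COMPLETE:
            return self.state
        if ok:
            self.state = State(self.state.value + 1)
        else:
            self.iteration_count += 1
            if self.iteration_count > MAX_ITERATIONS:
                raise RuntimeError("retry limit exceeded; create a handoff")
            self.state = State.IDENTIFY  # assumed traceback target
        self.history.append(self.state)
        return self.state
```

A caller would invoke `advance()` after each state finishes and `advance(ok=False)` when a state fails, checking the token budget between calls.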
Example Use Case:
Issue: "Session invalidation not propagating from backend to frontend"
Affected Modules:
- Backend: backend/src/handlers/auth.rs
- Frontend: src/services/authStore.ts
- Database: FoundationDB session table
Workflow:
1. IDENTIFY: Locate all session-related code
2. DOCUMENT: Capture current session flow
3. SOLVE: Design propagation strategy
4. CODE: Implement backend → frontend sync
5. DEPLOY: Build and deploy changes
6. TEST: Validate end-to-end flow
7. VALIDATE: Confirm no regressions
## Token Budget Management
### Pre-Workflow (Phase 0)
# Always assess before starting
# /complexity-gauge
if projected_tokens > 100_000:
    # Split into 2 sessions with /context-save between them
    session_1 = phases[0:3]
    session_2 = phases[3:7]  # start at index 3 so no phase is skipped
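The split above can be generalized to any number of sessions. The sketch below greedily packs phases, in order, into sessions that each stay under a token budget. The function `split_into_sessions`, the 100K per-session default (reusing the threshold above), and the per-phase estimates in the example are all hypothetical.

```python
def split_into_sessions(phases, session_budget=100_000):
    """Greedily pack (name, projected_tokens) pairs into ordered sessions."""
    sessions, current, used = [], [], 0
    for name, tokens in phases:
        if tokens > session_budget:
            raise ValueError(f"phase {name!r} alone exceeds the session budget")
        if current and used + tokens > session_budget:
            # Close the current session; a /context-save would go here.
            sessions.append(current)
            current, used = [], 0
        current.append(name)
        used += tokens
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    # Illustrative estimates only, totalling 120K tokens.
    plan = [("Research", 15_000), ("Design", 18_000), ("Backend", 22_000),
            ("Frontend", 30_000), ("Test", 20_000), ("Deploy", 15_000)]
    print(split_into_sessions(plan))
```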
### Mid-Workflow (Phase 3+)
# Check after 50% completion or Phase 3
# /complexity-gauge
if current_tokens > 0.70 * 160_000:
    # Apply context compression:
    # - Summarize completed phases (3-5 bullets)
    # - Remove verbose subagent outputs
    # - Keep only file:line references
    compress_context()  # hypothetical helper that performs the steps above
### Critical Zone (> 85%)
/complexity-gauge # Confirm critical status
/context-save project_root=/home/hal/v4/PROJECTS/t2 context_type=comprehensive
# Create handoff document:
# - What's complete
# - What remains
# - Exact next steps
# - Context recovery instructions
## Context Compression Strategies
### Level 1: Light Compression (10-20% reduction)
- Summarize completed phases (5 bullets max per phase)
- Remove verbose subagent outputs, keep findings
- Archive file contents, keep references
### Level 2: Moderate Compression (30-40% reduction)
- Aggressive phase summarization (1-2 sentences per phase)
- Remove all file contents, keep metadata only
- Consolidate duplicate information
- Store details in FoundationDB, keep references
### Level 3: Heavy Compression (50-70% reduction)
- Minimal phase tracking (FSM state only)
- External storage for all details (FDB + filesystem)
- Keep only: current state, next actions, critical blockers
- Use `/context-save` to archive everything else
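To make the three levels concrete, here is a minimal sketch that shrinks the record kept for a completed phase. It is an assumption-heavy illustration: `PhaseRecord` and `compress` are hypothetical names, truncating the bullet list stands in for real summarization, and the external storage step (FoundationDB, filesystem, `/context-save`) is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class PhaseRecord:
    name: str
    bullets: list[str]                                   # summary points
    file_refs: list[str] = field(default_factory=list)   # e.g. "src/app.ts:42"
    raw_output: str = ""                                 # verbose subagent output


def compress(phase: PhaseRecord, level: int) -> PhaseRecord:
    """Return a smaller copy of a completed phase record."""
    if level == 1:
        # Light: at most 5 bullets, drop verbose output, keep references.
        return PhaseRecord(phase.name, phase.bullets[:5], list(phase.file_refs))
    if level == 2:
        # Moderate: 1-2 summary lines, references only.
        return PhaseRecord(phase.name, phase.bullets[:2], list(phase.file_refs))
    if level == 3:
        # Heavy: keep only the phase name; details are assumed archived elsewhere.
        return PhaseRecord(phase.name, [])
    raise ValueError("level must be 1, 2, or 3")
```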
## Workflow Patterns
### Pattern 1: Context-Aware Orchestration
- /complexity-gauge (assess upfront)
- Orchestrator creates plan
- Execute Phase 1-2
- /complexity-gauge (mid-check)
- If Warning: Apply Level 1 compression
- Execute Phase 3-4
- /complexity-gauge (final check)
- Complete or checkpoint
### Pattern 2: Recursive Resolution
- /complexity-gauge (baseline)
- /recursive-workflow (initiate FSM)
- Auto-monitors tokens at each state
- If tokens > 85%: Auto-checkpoint
- Resume from FDB state in next session
### Pattern 3: Parallel Decomposition
- /complexity-gauge (assess)
- If complex: Split into parallel paths
- Execute independent modules concurrently
- Final /complexity-gauge before merge
- Merge results with validation
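Pattern 3 only pays off when the parallel paths really are independent. The sketch below treats two tasks as independent when they touch disjoint sets of files, which is an assumed criterion and not one stated in this skill; `are_independent`, `run_parallel`, and the `worker` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def are_independent(tasks: dict[str, set[str]]) -> bool:
    """True when no two tasks touch the same file."""
    return all(a.isdisjoint(b) for a, b in combinations(tasks.values(), 2))


def run_parallel(tasks: dict[str, set[str]], worker) -> dict[str, object]:
    """Run worker(name, files) per task, concurrently only if independent."""
    if not are_independent(tasks):
        # Fall back to sequential execution rather than risk conflicting edits.
        return {name: worker(name, files) for name, files in tasks.items()}
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(worker, name, files)
                   for name, files in tasks.items()}
        return {name: fut.result() for name, fut in futures.items()}
```

The returned dictionary is what a merge step would validate before combining results.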
## Integration with T2 Architecture
### Backend (Rust/Actix-web/FoundationDB)
- Use `WorkflowState` model for FSM persistence
- Use `StateTransition` for audit trail
- Use `WorkflowCheckpoint` for recovery points
**FDB Key Patterns**:
- `/{tenant_id}/workflows/{workflow_id}/state`
- `/{tenant_id}/workflows/{workflow_id}/history/{n}`
- `/{tenant_id}/workflows/{workflow_id}/checkpoints/{n}`
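The key layout above can be captured in three small helpers. This sketch only shows how the path segments fit together; the function names are hypothetical, the backend described here is Rust rather than Python, and since FoundationDB keys are byte strings a real client would need to encode these paths (for example with `.encode()`) or build them with the tuple layer instead.

```python
def state_key(tenant_id: str, workflow_id: str) -> str:
    """Key holding the current FSM state of a workflow."""
    return f"/{tenant_id}/workflows/{workflow_id}/state"


def history_key(tenant_id: str, workflow_id: str, n: int) -> str:
    """Key for the n-th recorded state transition."""
    return f"/{tenant_id}/workflows/{workflow_id}/history/{n}"


def checkpoint_key(tenant_id: str, workflow_id: str, n: int) -> str:
    """Key for the n-th recovery checkpoint."""
    return f"/{tenant_id}/workflows/{workflow_id}/checkpoints/{n}"


if __name__ == "__main__":
    print(state_key("acme", "wf-42"))
    print(history_key("acme", "wf-42", 3))
```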
### Frontend (React/TypeScript/Theia)
- Workflows tied to `WorkspaceSession` via `session_id`
- UI shows workflow progress (current_state, iteration_count)
- Checkpoint/resume UI for long-running workflows
### Specialized Subagents (7 available)
1. **codebase-analyzer** - Implementation details
2. **codebase-locator** - File/component location
3. **codebase-pattern-finder** - Pattern identification
4. **project-organizer** - Directory structure maintenance
5. **thoughts-analyzer** - Decision extraction
6. **thoughts-locator** - Documentation finding
7. **web-search-researcher** - External research
## Best Practices
### ✅ Do This
- Always run `/complexity-gauge` before multi-phase workflows
- Check token budget at 50% completion
- Use orchestrator for 3+ phase workflows
- Invoke `/recursive-workflow` for cascading issues
- Create checkpoints when tokens > 85%
- Compress context proactively at 70%
### ❌ Avoid This
- Don't skip complexity assessment
- Don't continue workflows > 95% token budget
- Don't use recursive workflow for simple issues
- Don't ignore checkpoint warnings
- Don't forget to use `/context-save` before stopping
## Troubleshooting
### "Workflow stuck in loop (IDENTIFY → TEST → IDENTIFY)"
**Cause**: Test failures not providing enough information for fix
**Solution**:
- Add explicit failure analysis in TEST state
- Use `codebase-pattern-finder` for similar fixes
- Consider ESCALATE if iteration > 5
### "Context collapse during workflow"
**Cause**: Too many iterations without checkpoint
**Solution**:
- Trigger checkpoint at 70% tokens (not 85%)
- Use Level 2+ compression earlier
- Archive completed states immediately
### "Can't resume workflow after session restart"
**Cause**: FDB state not persisted or query failed
**Solution**:
- Verify FDB connection
- Check `workflow_id` is correct
- Use `/context-restore` with project ID
- Review FDB key structure
## Examples
## Multi-Context Window Support
This skill supports long-running multi-agent orchestration across multiple context windows using Claude 4.5's enhanced state management capabilities.
### State Tracking
**Orchestration State (JSON):**
```json
{
  "checkpoint_id": "ckpt_20251129_161000",
  "workflow_type": "full_stack_feature",
  "phases_completed": [
    {"phase": "Research", "token_cost": 15000, "status": "complete"},
    {"phase": "Design", "token_cost": 18000, "status": "complete"},
    {"phase": "Backend", "token_cost": 22000, "status": "in_progress"}
  ],
  "agent_invocations": [
    {"agent": "codebase-locator", "phase": "Research", "result": "cached"},
    {"agent": "codebase-analyzer", "phase": "Design", "result": "cached"}
  ],
  "complexity_score": 185,
  "token_budget": {"used": 55000, "projected": 120000, "limit": 160000},
  "fsm_state": "CODE",
  "iteration_count": 2,
  "created_at": "2025-11-29T16:10:00Z"
}
```

**Progress Notes (Markdown):**
Multi-Agent Workflow Progress - 2025-11-29
Completed
- Research phase: Located all relevant files (15 min)
- Design phase: Architecture validated (18 min)
- Backend implementation started
In Progress
- Backend endpoint implementation (22K tokens used)
- Iteration 2 of CODE phase (validation fixing)
Next Actions
- Complete backend validation fixes
- Start frontend implementation phase
- Run integration tests
- Create deployment plan
### Session Recovery
When starting a fresh context window after orchestration work:
1. **Load Checkpoint State**: Read `.coditect/checkpoints/orchestration-latest.json`
2. **Review Progress Notes**: Check `orchestration-progress.md` for context
3. **Verify FSM State**: Check current workflow state (INITIATE, CODE, etc.)
4. **Check Token Budget**: Review projected vs. actual token usage
5. **Resume Workflow**: Continue from last incomplete phase
**Recovery Commands:**
```bash
# 1. Check latest checkpoint
cat .coditect/checkpoints/orchestration-latest.json | jq '.phases_completed'

# 2. Review progress
tail -40 orchestration-progress.md

# 3. Check FSM state
cat .coditect/checkpoints/orchestration-latest.json | jq '.fsm_state'

# 4. Review token budget
cat .coditect/checkpoints/orchestration-latest.json | jq '.token_budget'

# 5. Check complexity score
cat .coditect/checkpoints/orchestration-latest.json | jq '.complexity_score'

# 6. List agent results
cat .coditect/checkpoints/orchestration-latest.json | jq '.agent_invocations'
```
### State Management Best Practices
**Checkpoint Files (JSON Schema):**
- Store in `.coditect/checkpoints/orchestration-{timestamp}.json`
- Track phase completion with token costs
- Cache agent invocation results
- Monitor complexity score and iteration count

**Progress Tracking (Markdown Narrative):**
- Maintain `orchestration-progress.md` with phase status
- Document workflow decisions and trade-offs
- Note iteration reasons and fixes applied
- List next phases with time estimates

**Git Integration:**
- Create checkpoint after each major phase
- Commit with phase markers: `feat(workflow): Complete Design phase`
- Tag workflow milestones: `git tag workflow-backend-complete`
### Progress Checkpoints
Natural Breaking Points:
- After each workflow phase completed (Research, Design, CODE, etc.)
- After complexity gauge shows Warning or Critical zone
- Before FSM state transitions
- When iteration count reaches 5+
- After token budget reaches 70%
Checkpoint Creation Pattern:
# Automatic checkpoint at workflow milestones
if phase_complete or token_percentage >= 70 or iteration_count >= 5:
    create_checkpoint({
        "workflow": workflow_type,
        "phases": completed_phases,
        "agents": invocation_results,
        "fsm": current_state,
        "complexity": score,
        "tokens": budget_status
    })
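The pattern above leaves `create_checkpoint` undefined. One way it might look is sketched below, under these assumptions: the checkpoint is plain JSON on disk, the directory and file names follow the `.coditect/checkpoints/orchestration-{timestamp}.json` convention mentioned earlier, and a copy is written to `orchestration-latest.json` so recovery commands can find it. The helper `should_checkpoint` and the `directory` parameter are hypothetical.

```python
import json
import time
from pathlib import Path


def should_checkpoint(phase_complete: bool, token_percentage: float,
                      iteration_count: int) -> bool:
    """Same trigger condition as the pattern above."""
    return phase_complete or token_percentage >= 70 or iteration_count >= 5


def create_checkpoint(state: dict, directory: str = ".coditect/checkpoints") -> Path:
    """Write state to a timestamped JSON file and refresh the 'latest' copy."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    now = time.gmtime()
    stamp = time.strftime("%Y%m%d_%H%M%S", now)
    record = {**state, "created_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", now)}
    payload = json.dumps(record, indent=2)
    path = out_dir / f"orchestration-{stamp}.json"
    path.write_text(payload)
    (out_dir / "orchestration-latest.json").write_text(payload)
    return path
```

Writing the `latest` copy alongside the timestamped file keeps older checkpoints available for rollback while giving a new session one fixed path to read.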
### Example: Multi-Context Full-Stack Workflow
Context Window 1: Research & Design
{
"checkpoint_id": "ckpt_workflow_design",
"phase": "design_complete",
"workflow": "full_stack_feature",
"phases": ["Research", "Design"],
"token_used": 33000,
"next_action": "Start Backend implementation",
"projected_total": 95000
}
Context Window 2: Implementation & Testing
# Load checkpoint
cat .coditect/checkpoints/ckpt_workflow_design.json
# Continue with Backend → Frontend → TEST
# Token savings: ~15000 tokens (research results cached)
Token Savings Analysis:
- Without checkpoint: 110000 tokens (re-run research + design)
- With checkpoint: 95000 tokens (resume from design state)
- Savings: 14% reduction (110000 → 95000 tokens)
- Context collapse prevented: Would have failed at ~105K without checkpoints
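The savings figure follows directly from the two totals. A quick check of the arithmetic, with `savings_pct` as a hypothetical helper name:

```python
def savings_pct(without_checkpoint: int, with_checkpoint: int) -> float:
    """Percentage of tokens saved relative to the no-checkpoint total."""
    return 100.0 * (without_checkpoint - with_checkpoint) / without_checkpoint


if __name__ == "__main__":
    # (110000 - 95000) / 110000 = 15000 / 110000
    print(round(savings_pct(110_000, 95_000), 1))  # 13.6, i.e. roughly 14%
```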
## Success Output
When successful, this skill MUST output:
✅ SKILL COMPLETE: multi-agent-workflow
Workflow Execution Summary:
- Workflow type: {full_stack_feature|bug_investigation|security_audit|deployment_validation|etc.}
- Phases completed: {N/7}
- Token budget: {used}/{projected} of {limit} (X% efficiency)
- Complexity score: {score}
- FSM state: COMPLETE
- Iteration count: {N}
Completed:
- [x] Complexity assessment performed
- [x] Orchestration plan created
- [x] Phase 1-N executed successfully
- [x] Token budget managed within limits
- [x] Context compression applied (Level X)
- [x] Checkpoint created at {percentage}% completion
- [x] All specialized agents invoked successfully
- [x] Verification completed
Outputs:
- Checkpoint file: .coditect/checkpoints/orchestration-{timestamp}.json
- Progress notes: orchestration-progress.md
- Token savings: {X}% vs. manual approach
- Agent invocations cached: {N}
## Completion Checklist
Before marking this skill as complete, verify:
- `/complexity-gauge` ran before workflow start
- Orchestrator created detailed execution plan
- All required phases completed (or checkpointed)
- Token budget stayed below 95% limit
- Context compression applied when tokens > 70%
- Checkpoint created when tokens > 85%
- All specialized subagent invocations successful
- FSM state persisted to FoundationDB (if using recursive workflow)
- Final verification passed
- Handoff document created if workflow incomplete
## Failure Indicators
This skill has FAILED if:
- ❌ Complexity assessment skipped (no `/complexity-gauge`)
- ❌ Token budget exceeded 95% without checkpoint
- ❌ Workflow stuck in retry loop (iteration > 10)
- ❌ Context collapse occurred (>100% token usage)
- ❌ FSM state not persisted (cannot resume)
- ❌ Specialized agents returned errors without resolution
- ❌ Cascading dependencies not resolved
- ❌ Phase completion verification missing
- ❌ No checkpoint created for long-running workflow
- ❌ Incomplete workflow without handoff document
## When NOT to Use
Do NOT use this skill when:
- Single-module tasks requiring no coordination
- Simple workflows with fewer than 3 phases
- Token usage under 50% of limit (no budget concerns)
- No cascading dependencies across modules
- Task can be completed by single specialized agent
- Quick exploratory work without formal orchestration
- Prototype or throwaway code development
- Documentation-only updates without code changes
Use alternatives instead:
- Direct specialized agent invocation for single-module tasks
- Manual coordination for simple 2-phase workflows
- Standard development workflow for low-complexity tasks
- Exploratory agents for research and investigation
## Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Skipping complexity assessment | Unmanaged context collapse | Always run /complexity-gauge first |
| Continuing workflow > 95% tokens | Context failure imminent | Create checkpoint, create handoff |
| Using recursive workflow for simple issues | Unnecessary FSM overhead | Use orchestrator for 3+ phases, direct agents for simple |
| Ignoring checkpoint warnings | Lost progress on failure | Checkpoint at 85% tokens automatically |
| Not compressing context at 70% | Gradual context degradation | Apply Level 1-3 compression proactively |
| Manual phase tracking | Inconsistent state management | Use FSM with FoundationDB persistence |
| No verification between phases | Errors compound across phases | Verify each phase before proceeding |
| Parallel execution without independence check | Conflicts and race conditions | Validate tasks are truly independent |
| Missing handoff documentation | Cannot resume after session | Always create handoff if incomplete |
## Principles
This skill embodies:
- #4 Separation of Concerns - Each phase has clear responsibilities, specialized agents
- #8 No Assumptions - Complexity assessment before execution, verification after
- #10 Measure → Learn → Improve - Token budget tracking, complexity scoring
- #11 Inform → Do → Verify - Orchestration plan → execution → verification
- #12 Self-Provisioning - Automatic checkpoint creation, context compression
Multi-Agent Orchestration Principles:
- Always assess complexity upfront with `/complexity-gauge`
- Check token budget at 50% completion (mid-workflow)
- Use orchestrator for 3+ phase workflows
- Invoke `/recursive-workflow` for cascading multi-module issues
- Create checkpoints when tokens > 85%
- Compress context proactively at 70% utilization
- Persist FSM state for resumability
- Verify each phase before proceeding to next
Token Budget Management:
- Safe Zone (<70%): Proceed normally
- Warning Zone (70-85%): Apply compression
- Critical Zone (85-95%): Mandatory checkpoint
- Over-budget (>95%): STOP, create handoff
Full Standard: CODITECT-STANDARD-AUTOMATION.md
### Example 1: Full-Stack Feature
User: "Implement user profile editing with backend API and frontend UI"