# Multi-Agent Workflow Orchestration

Expert skill for managing complex multi-agent workflows with token budget awareness, context management, and recursive state-based resolution. This skill enables coordination of multiple specialized agents across complex, multi-phase development workflows.

## How to Use This Skill

1. Review the patterns and examples below
2. Apply the relevant patterns to your implementation
3. Follow the best practices outlined in this skill

## When to Use This Skill

✅ Use this skill when:

- **Complex Multi-Phase Workflows**: Tasks requiring coordination of 3+ phases across multiple modules
- **Token Budget Management**: Workflows approaching 70K+ tokens (out of 160K limit)
- **Cascading Dependencies**: Issues spanning backend → frontend → database → infrastructure
- **Recursive Resolution**: Iterative problem-solving with retry/traceback logic
- **Context Collapse Prevention**: Long-running sessions needing checkpoint/resume
- **Proven**: 7 production workflows (full-stack features, security audits, deployment validation)
- **Token efficiency**: Prevents 95% context collapse with 85%+ checkpoint triggers

❌ Don't use this skill when:

- Single-module tasks (no coordination needed)
- Simple workflows (< 3 phases)
- Token usage < 50% (no budget concerns)
- No cascading dependencies

## Core Capabilities

### 1. Complexity Assessment

Before starting any multi-phase workflow, assess complexity using the complexity gauge:

/complexity-gauge

**Interpret Results:**

- **Safe Zone** (< 70% tokens): Proceed normally
- **Warning Zone** (70-85% tokens): Apply context compression
- **Critical Zone** (85-95% tokens): Mandatory checkpoint before continuing
- **Over-budget** (> 95% tokens): STOP - Create handoff document
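
The zone table above can be sketched as a small classifier. This is an illustration only: the `Zone` enum and `classify_zone` function are hypothetical names, and treating exactly 85% as Critical and exactly 95% as still Critical is an assumption, since the ranges above overlap at their edges.

```python
from enum import Enum

TOKEN_LIMIT = 160_000  # limit quoted in this skill


class Zone(Enum):
    SAFE = "proceed normally"
    WARNING = "apply context compression"
    CRITICAL = "mandatory checkpoint before continuing"
    OVER_BUDGET = "stop and create handoff document"


def classify_zone(tokens_used: int, limit: int = TOKEN_LIMIT) -> Zone:
    """Map a token count onto the four zones (boundary handling is assumed)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    pct = 100 * tokens_used / limit
    if pct < 70:
        return Zone.SAFE
    if pct < 85:
        return Zone.WARNING
    if pct <= 95:
        return Zone.CRITICAL
    return Zone.OVER_BUDGET
```

For example, `classify_zone(112_000)` is exactly 70% of 160K and lands in the Warning Zone.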

**Complexity Factors (scoring):**

- Module count: 5 points per module
- Dependency depth: 10 points per level
- Subagent invocations: 3 points per agent
- File operations: 1 point per file
- Context switches: 15 points per handoff
- Recursive calls: 20 points per iteration
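
As a worked example of the scoring factors, the sketch below sums the weighted counts. The weights come from the list above; the function name `complexity_score` and the keyword names are assumptions made for illustration, and how `/complexity-gauge` actually computes its score is not shown in this skill.

```python
# Points per unit, taken from the factor list above.
WEIGHTS = {
    "modules": 5,
    "dependency_depth": 10,
    "subagent_invocations": 3,
    "file_operations": 1,
    "context_switches": 15,
    "recursive_calls": 20,
}


def complexity_score(**counts: int) -> int:
    """Return the weighted sum of the supplied factor counts."""
    unknown = set(counts) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown factors: {sorted(unknown)}")
    return sum(WEIGHTS[name] * n for name, n in counts.items())


score = complexity_score(
    modules=3,               # 15 points
    dependency_depth=2,      # 20 points
    subagent_invocations=4,  # 12 points
    file_operations=10,      # 10 points
    context_switches=1,      # 15 points
    recursive_calls=2,       # 40 points
)
print(score)  # 112
```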

### 2. Orchestrated Workflows

Use the orchestrator for multi-step coordinated workflows:

**When to invoke:**

"Use orchestrator to [implement full-stack feature / run security audit / validate deployment]"

**Orchestrator will:**

- Create detailed execution plan with phase breakdown
- Assign specialized subagents (codebase-locator, codebase-analyzer, etc.)
- Generate ready-to-execute Task calls
- Provide token budget tracking
- Specify error handling strategies

**7 Production Workflows:**

1. Full-Stack Feature Development (~60K tokens, 15-25 min)
2. Bug Investigation & Fix (~50K tokens, 10-20 min)
3. Security Audit (~55K tokens, 12-18 min)
4. Deployment Validation (~50K tokens, 10-15 min)
5. Code Quality Cycle (~60K tokens, 15-20 min)
6. Codebase Research (~45K tokens, 8-12 min)
7. Project Cleanup (~30K tokens, 5-10 min)

### 3. Recursive Workflows

For cascading multi-module issues requiring iterative resolution:

/recursive-workflow

**FSM States:**

INITIATE → IDENTIFY → DOCUMENT → SOLVE → CODE → DEPLOY → TEST → VALIDATE → COMPLETE

**Features:**

- State persistence to FoundationDB (resume across sessions)
- Traceback/retry logic (max 10 iterations)
- Automatic `/complexity-gauge` checks at each state transition
- Checkpoint creation when tokens > 85%
- Context handoff for session restarts
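
A minimal in-memory sketch of the state machine, to make the traceback/retry behaviour concrete. The `Workflow` class is hypothetical, persistence to FoundationDB is deliberately left out, and tracing back to IDENTIFY on failure is an assumption based on the IDENTIFY → TEST → IDENTIFY loop described under Troubleshooting.

```python
from dataclasses import dataclass, field

STATES = ["INITIATE", "IDENTIFY", "DOCUMENT", "SOLVE", "CODE",
          "DEPLOY", "TEST", "VALIDATE", "COMPLETE"]
MAX_ITERATIONS = 10  # retry cap quoted in the feature list


@dataclass
class Workflow:
    state: str = "INITIATE"
    iteration_count: int = 0
    history: list = field(default_factory=list)

    def advance(self, ok: bool = True) -> str:
        """Move to the next state, or trace back to IDENTIFY on failure."""
        if self.state == "COMPLETE":
            return self.state
        if ok:
            nxt = STATES[STATES.index(self.state) + 1]
        else:
            self.iteration_count += 1
            if self.iteration_count > MAX_ITERATIONS:
                raise RuntimeError("retry limit reached; escalate")
            nxt = "IDENTIFY"  # assumed traceback target
        self.history.append((self.state, nxt))  # audit trail of transitions
        self.state = nxt
        return nxt


wf = Workflow()
for _ in range(6):      # INITIATE ... TEST
    wf.advance()
wf.advance(ok=False)    # a failing TEST traces back to IDENTIFY
```

A real implementation would also run the token check at each call to `advance` and write `state`, `iteration_count`, and `history` to durable storage before returning.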

Example Use Case:

Issue: "Session invalidation not propagating from backend to frontend"

Affected Modules:
- Backend: backend/src/handlers/auth.rs
- Frontend: src/services/authStore.ts
- Database: FoundationDB session table

Workflow:
1. IDENTIFY: Locate all session-related code
2. DOCUMENT: Capture current session flow
3. SOLVE: Design propagation strategy
4. CODE: Implement backend → frontend sync
5. DEPLOY: Build and deploy changes
6. TEST: Validate end-to-end flow
7. VALIDATE: Confirm no regressions

## Token Budget Management

### Pre-Workflow (Phase 0)

# Always assess before starting
# /complexity-gauge

if projected_tokens > 100_000:
    # Split into 2 sessions with /context-save between them
    session_1 = phases[0:3]  # first three phases
    session_2 = phases[3:7]  # remaining phases, starting where session_1 ends
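
The split can be sketched more generally as greedy packing of phases into sessions. This is an assumption-laden illustration: `split_sessions` is a hypothetical helper, and the per-phase token estimates are made-up inputs rather than figures from this skill. Only the 100K default threshold is taken from the text above.

```python
def split_sessions(phase_costs, budget=100_000):
    """Group (phase, projected_tokens) pairs into sessions that stay under budget.

    Phases keep their order; a new session starts whenever adding the next
    phase would push the running total past the budget.
    """
    sessions, current, used = [], [], 0
    for name, cost in phase_costs:
        if current and used + cost > budget:
            sessions.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        sessions.append(current)
    return sessions


plan = split_sessions(
    [("Research", 15_000), ("Design", 18_000), ("Backend", 22_000),
     ("Frontend", 30_000), ("Test", 25_000)],
    budget=60_000,  # small budget chosen only to force a split in this example
)
print(plan)  # [['Research', 'Design', 'Backend'], ['Frontend', 'Test']]
```

A context save would go between the sessions, as the comment in the snippet above describes.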

### Mid-Workflow (Phase 3+)

# Check after 50% completion or Phase 3
# /complexity-gauge

if current_tokens > 0.70 * 160_000:
    # Apply context compression:
    #   - Summarize completed phases (3-5 bullets)
    #   - Remove verbose subagent outputs
    #   - Keep only file:line references
    pass

### Critical Zone (> 85%)

/complexity-gauge  # Confirm critical status

/context-save project_root=/home/hal/v4/PROJECTS/t2 context_type=comprehensive

# Create handoff document:
# - What's complete
# - What remains
# - Exact next steps
# - Context recovery instructions

## Context Compression Strategies

### Level 1: Light Compression (10-20% reduction)
- Summarize completed phases (5 bullets max per phase)
- Remove verbose subagent outputs, keep findings
- Archive file contents, keep references

### Level 2: Moderate Compression (30-40% reduction)
- Aggressive phase summarization (1-2 sentences per phase)
- Remove all file contents, keep metadata only
- Consolidate duplicate information
- Store details in FoundationDB, keep references

### Level 3: Heavy Compression (50-70% reduction)
- Minimal phase tracking (FSM state only)
- External storage for all details (FDB + filesystem)
- Keep only: current state, next actions, critical blockers
- Use `/context-save` to archive everything else
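
As one illustration of the levels above, Level 1 could look like the sketch below. The shape of the phase record (`summary`, `raw_output`, `findings`, `file_contents`) is an assumption made for this example, and archiving the dropped file contents is left out.

```python
def compress_level1(phase: dict, max_bullets: int = 5) -> dict:
    """Light compression: cap summary bullets, drop raw output, keep references."""
    return {
        "phase": phase["phase"],
        # At most 5 bullets per phase, as Level 1 specifies.
        "summary": phase.get("summary", [])[:max_bullets],
        # Findings survive; the verbose subagent output is not copied over.
        "findings": phase.get("findings", []),
        # Keep only the paths; file contents would be archived elsewhere.
        "file_refs": sorted(phase.get("file_contents", {})),
    }


phase = {
    "phase": "Research",
    "summary": [f"point {i}" for i in range(8)],
    "raw_output": "x" * 1000,
    "findings": ["auth.rs handles session invalidation"],
    "file_contents": {"src/services/authStore.ts": "...",
                      "backend/src/handlers/auth.rs": "..."},
}
compact = compress_level1(phase)
```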

## Workflow Patterns

### Pattern 1: Context-Aware Orchestration

1. `/complexity-gauge` (assess upfront)
2. Orchestrator creates plan
3. Execute Phase 1-2
4. `/complexity-gauge` (mid-check)
5. If Warning: Apply Level 1 compression
6. Execute Phase 3-4
7. `/complexity-gauge` (final check)
8. Complete or checkpoint

### Pattern 2: Recursive Resolution

1. `/complexity-gauge` (baseline)
2. `/recursive-workflow` (initiate FSM)
3. Auto-monitors tokens at each state
4. If tokens > 85%: Auto-checkpoint
5. Resume from FDB state in next session

### Pattern 3: Parallel Decomposition

1. `/complexity-gauge` (assess)
2. If complex: Split into parallel paths
3. Execute independent modules concurrently
4. Final `/complexity-gauge` before merge
5. Merge results with validation
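
A rough sketch of the concurrent step, with an independence check before anything runs. The `run_parallel` helper and its inputs are hypothetical: each task is a plain callable, and "independent" is approximated here as "no two tasks touch the same file", which is an assumption rather than a rule stated by this skill.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(tasks: dict, touched_files: dict) -> dict:
    """Run module tasks concurrently, refusing tasks with overlapping file sets."""
    owner = {}
    for name, files in touched_files.items():
        for path in files:
            if path in owner:
                raise ValueError(f"{name} and {owner[path]} both touch {path}")
            owner[path] = name
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        # result() re-raises any exception from the task, so failures surface at merge time.
        return {name: fut.result() for name, fut in futures.items()}


results = run_parallel(
    tasks={"backend": lambda: "backend done", "frontend": lambda: "frontend done"},
    touched_files={"backend": {"backend/src/handlers/auth.rs"},
                   "frontend": {"src/services/authStore.ts"}},
)
```

The returned dictionary is what the merge step would validate.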

## Integration with T2 Architecture

### Backend (Rust/Actix-web/FoundationDB)
- Use `WorkflowState` model for FSM persistence
- Use `StateTransition` for audit trail
- Use `WorkflowCheckpoint` for recovery points

**FDB Key Patterns**:

- `/{tenant_id}/workflows/{workflow_id}/state`
- `/{tenant_id}/workflows/{workflow_id}/history/{n}`
- `/{tenant_id}/workflows/{workflow_id}/checkpoints/{n}`
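
The key patterns can be expressed as small builder functions. This sketch only formats the path strings shown above; the function names are hypothetical, and how the keys are actually encoded for FoundationDB (for example through its tuple layer) is not specified here.

```python
def state_key(tenant_id: str, workflow_id: str) -> str:
    """Key holding the current FSM state of a workflow."""
    return f"/{tenant_id}/workflows/{workflow_id}/state"


def history_key(tenant_id: str, workflow_id: str, n: int) -> str:
    """Key for the n-th state transition in the audit trail."""
    return f"/{tenant_id}/workflows/{workflow_id}/history/{n}"


def checkpoint_key(tenant_id: str, workflow_id: str, n: int) -> str:
    """Key for the n-th recovery checkpoint."""
    return f"/{tenant_id}/workflows/{workflow_id}/checkpoints/{n}"


print(state_key("acme", "wf-42"))  # /acme/workflows/wf-42/state
```

Keeping the tenant first means all of one tenant's workflows share a common key prefix.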

### Frontend (React/TypeScript/Theia)
- Workflows tied to `WorkspaceSession` via `session_id`
- UI shows workflow progress (current_state, iteration_count)
- Checkpoint/resume UI for long-running workflows

### Specialized Subagents (7 available)
1. **codebase-analyzer** - Implementation details
2. **codebase-locator** - File/component location
3. **codebase-pattern-finder** - Pattern identification
4. **project-organizer** - Directory structure maintenance
5. **thoughts-analyzer** - Decision extraction
6. **thoughts-locator** - Documentation finding
7. **web-search-researcher** - External research

## Best Practices

### ✅ Do This
- Always run `/complexity-gauge` before multi-phase workflows
- Check token budget at 50% completion
- Use orchestrator for 3+ phase workflows
- Invoke `/recursive-workflow` for cascading issues
- Create checkpoints when tokens > 85%
- Compress context proactively at 70%

### ❌ Avoid This
- Don't skip complexity assessment
- Don't continue workflows > 95% token budget
- Don't use recursive workflow for simple issues
- Don't ignore checkpoint warnings
- Don't forget to use `/context-save` before stopping

## Troubleshooting

### "Workflow stuck in loop (IDENTIFY → TEST → IDENTIFY)"
**Cause**: Test failures not providing enough information for fix

**Solution**:
- Add explicit failure analysis in TEST state
- Use `codebase-pattern-finder` for similar fixes
- Consider ESCALATE if iteration > 5

### "Context collapse during workflow"
**Cause**: Too many iterations without checkpoint

**Solution**:
- Trigger checkpoint at 70% tokens (not 85%)
- Use Level 2+ compression earlier
- Archive completed states immediately

### "Can't resume workflow after session restart"
**Cause**: FDB state not persisted or query failed

**Solution**:
- Verify FDB connection
- Check `workflow_id` is correct
- Use `/context-restore` with project ID
- Review FDB key structure

## Examples

## Multi-Context Window Support

This skill supports long-running multi-agent orchestration across multiple context windows using Claude 4.5's enhanced state management capabilities.

### State Tracking

**Orchestration State (JSON):**
```json
{
  "checkpoint_id": "ckpt_20251129_161000",
  "workflow_type": "full_stack_feature",
  "phases_completed": [
    {"phase": "Research", "token_cost": 15000, "status": "complete"},
    {"phase": "Design", "token_cost": 18000, "status": "complete"},
    {"phase": "Backend", "token_cost": 22000, "status": "in_progress"}
  ],
  "agent_invocations": [
    {"agent": "codebase-locator", "phase": "Research", "result": "cached"},
    {"agent": "codebase-analyzer", "phase": "Design", "result": "cached"}
  ],
  "complexity_score": 185,
  "token_budget": {"used": 55000, "projected": 120000, "limit": 160000},
  "fsm_state": "CODE",
  "iteration_count": 2,
  "created_at": "2025-11-29T16:10:00Z"
}
```

**Progress Notes (Markdown):**

Multi-Agent Workflow Progress - 2025-11-29

Completed

- Research phase: Located all relevant files (15 min)
- Design phase: Architecture validated (18 min)
- Backend implementation started

In Progress

- Backend endpoint implementation (22K tokens used)
- Iteration 2 of CODE phase (validation fixing)

Next Actions

- Complete backend validation fixes
- Start frontend implementation phase
- Run integration tests
- Create deployment plan

### Session Recovery

When starting a fresh context window after orchestration work:

1. **Load Checkpoint State**: Read `.coditect/checkpoints/orchestration-latest.json`
2. **Review Progress Notes**: Check `orchestration-progress.md` for context
3. **Verify FSM State**: Check current workflow state (INITIATE, CODE, etc.)
4. **Check Token Budget**: Review projected vs. actual token usage
5. **Resume Workflow**: Continue from last incomplete phase

**Recovery Commands:**
```bash
# 1. Check latest checkpoint
jq '.phases_completed' .coditect/checkpoints/orchestration-latest.json

# 2. Review progress
tail -40 orchestration-progress.md

# 3. Check FSM state
jq '.fsm_state' .coditect/checkpoints/orchestration-latest.json

# 4. Review token budget
jq '.token_budget' .coditect/checkpoints/orchestration-latest.json

# 5. Check complexity score
jq '.complexity_score' .coditect/checkpoints/orchestration-latest.json

# 6. List agent results
jq '.agent_invocations' .coditect/checkpoints/orchestration-latest.json
```

### State Management Best Practices

**Checkpoint Files (JSON Schema):**

- Store in `.coditect/checkpoints/orchestration-{timestamp}.json`
- Track phase completion with token costs
- Cache agent invocation results
- Monitor complexity score and iteration count

**Progress Tracking (Markdown Narrative):**

- Maintain `orchestration-progress.md` with phase status
- Document workflow decisions and trade-offs
- Note iteration reasons and fixes applied
- List next phases with time estimates

**Git Integration:**

- Create checkpoint after each major phase
- Commit with phase markers: `feat(workflow): Complete Design phase`
- Tag workflow milestones: `git tag workflow-backend-complete`

### Progress Checkpoints

**Natural Breaking Points:**

- After each workflow phase completed (Research, Design, CODE, etc.)
- After complexity gauge shows Warning or Critical zone
- Before FSM state transitions
- When iteration count reaches 5+
- After token budget reaches 70%

**Checkpoint Creation Pattern:**

# Automatic checkpoint at workflow milestones
if phase_complete or token_percentage >= 70 or iteration_count >= 5:
    create_checkpoint({
        "workflow": workflow_type,
        "phases": completed_phases,
        "agents": invocation_results,
        "fsm": current_state,
        "complexity": score,
        "tokens": budget_status
    })
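
The pattern above leaves `create_checkpoint` undefined. One possible shape is sketched below, assuming checkpoints are plain JSON files in the `.coditect/checkpoints` directory this skill names. The function bodies, the `should_checkpoint` helper, and the choice to rewrite `orchestration-latest.json` on every checkpoint are all assumptions for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def should_checkpoint(phase_complete: bool, token_percentage: float,
                      iteration_count: int) -> bool:
    """Mirror the trigger condition from the pattern above."""
    return phase_complete or token_percentage >= 70 or iteration_count >= 5


def create_checkpoint(state: dict, directory: str = ".coditect/checkpoints") -> Path:
    """Write a timestamped checkpoint and refresh the 'latest' copy."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d_%H%M%S")
    record = {
        "checkpoint_id": f"ckpt_{stamp}",
        "created_at": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        **state,
    }
    text = json.dumps(record, indent=2)
    path = out_dir / f"orchestration-{stamp}.json"
    path.write_text(text)
    # Assumed convention: the newest checkpoint is duplicated under a fixed name.
    (out_dir / "orchestration-latest.json").write_text(text)
    return path
```

Passing a different `directory` makes the function easy to exercise against a temporary folder.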

### Example: Multi-Context Full-Stack Workflow

**Context Window 1: Research & Design**

{
  "checkpoint_id": "ckpt_workflow_design",
  "phase": "design_complete",
  "workflow": "full_stack_feature",
  "phases": ["Research", "Design"],
  "token_used": 33000,
  "next_action": "Start Backend implementation",
  "projected_total": 95000
}

**Context Window 2: Implementation & Testing**

# Load checkpoint
cat .coditect/checkpoints/ckpt_workflow_design.json

# Continue with Backend → Frontend → TEST
# Token savings: ~15000 tokens (research results cached)

**Token Savings Analysis:**

- Without checkpoint: 110000 tokens (re-run research + design)
- With checkpoint: 95000 tokens (resume from design state)
- Savings: 14% reduction (110000 → 95000 tokens)
- Context collapse prevented: Would have failed at ~105K without checkpoints
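
The savings figure is simple arithmetic on the two totals. A short check, using a hypothetical helper name:

```python
def savings_pct(without_checkpoint: int, with_checkpoint: int) -> float:
    """Percentage of tokens saved relative to the run without a checkpoint."""
    if without_checkpoint <= 0:
        raise ValueError("without_checkpoint must be positive")
    return 100 * (without_checkpoint - with_checkpoint) / without_checkpoint


saved = savings_pct(110_000, 95_000)
print(round(saved, 1))  # 13.6, which the analysis above rounds to 14%
```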

## Success Output

When successful, this skill MUST output:

✅ SKILL COMPLETE: multi-agent-workflow

Workflow Execution Summary:
- Workflow type: {full_stack_feature|bug_investigation|security_audit|deployment_validation|etc.}
- Phases completed: {N/7}
- Token budget: {used}/{projected} of {limit} (X% efficiency)
- Complexity score: {score}
- FSM state: COMPLETE
- Iteration count: {N}

Completed:
- [x] Complexity assessment performed
- [x] Orchestration plan created
- [x] Phase 1-N executed successfully
- [x] Token budget managed within limits
- [x] Context compression applied (Level X)
- [x] Checkpoint created at {percentage}% completion
- [x] All specialized agents invoked successfully
- [x] Verification completed

Outputs:
- Checkpoint file: .coditect/checkpoints/orchestration-{timestamp}.json
- Progress notes: orchestration-progress.md
- Token savings: {X}% vs. manual approach
- Agent invocations cached: {N}

## Completion Checklist

Before marking this skill as complete, verify:

## Failure Indicators

This skill has FAILED if:

- ❌ Complexity assessment skipped (no `/complexity-gauge`)
- ❌ Token budget exceeded 95% without checkpoint
- ❌ Workflow stuck in retry loop (iteration > 10)
- ❌ Context collapse occurred (>100% token usage)
- ❌ FSM state not persisted (cannot resume)
- ❌ Specialized agents returned errors without resolution
- ❌ Cascading dependencies not resolved
- ❌ Phase completion verification missing
- ❌ No checkpoint created for long-running workflow
- ❌ Incomplete workflow without handoff document

## When NOT to Use

Do NOT use this skill when:

- Single-module tasks requiring no coordination
- Simple workflows with fewer than 3 phases
- Token usage under 50% of limit (no budget concerns)
- No cascading dependencies across modules
- Task can be completed by single specialized agent
- Quick exploratory work without formal orchestration
- Prototype or throwaway code development
- Documentation-only updates without code changes

Use alternatives instead:

- Direct specialized agent invocation for single-module tasks
- Manual coordination for simple 2-phase workflows
- Standard development workflow for low-complexity tasks
- Exploratory agents for research and investigation

## Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
|--------------|---------|----------|
| Skipping complexity assessment | Unmanaged context collapse | Always run `/complexity-gauge` first |
| Continuing workflow > 95% tokens | Context failure imminent | Create checkpoint, create handoff |
| Using recursive workflow for simple issues | Unnecessary FSM overhead | Use orchestrator for 3+ phases, direct agents for simple tasks |
| Ignoring checkpoint warnings | Lost progress on failure | Checkpoint at 85% tokens automatically |
| Not compressing context at 70% | Gradual context degradation | Apply Level 1-3 compression proactively |
| Manual phase tracking | Inconsistent state management | Use FSM with FoundationDB persistence |
| No verification between phases | Errors compound across phases | Verify each phase before proceeding |
| Parallel execution without independence check | Conflicts and race conditions | Validate tasks are truly independent |
| Missing handoff documentation | Cannot resume after session | Always create handoff if incomplete |

## Principles

This skill embodies:

- **#4 Separation of Concerns** - Each phase has clear responsibilities, specialized agents
- **#8 No Assumptions** - Complexity assessment before execution, verification after
- **#10 Measure → Learn → Improve** - Token budget tracking, complexity scoring
- **#11 Inform → Do → Verify** - Orchestration plan → execution → verification
- **#12 Self-Provisioning** - Automatic checkpoint creation, context compression

**Multi-Agent Orchestration Principles:**

1. Always assess complexity upfront with `/complexity-gauge`
2. Check token budget at 50% completion (mid-workflow)
3. Use orchestrator for 3+ phase workflows
4. Invoke `/recursive-workflow` for cascading multi-module issues
5. Create checkpoints when tokens > 85%
6. Compress context proactively at 70% utilization
7. Persist FSM state for resumability
8. Verify each phase before proceeding to next

**Token Budget Management:**

- Safe Zone (<70%): Proceed normally
- Warning Zone (70-85%): Apply compression
- Critical Zone (85-95%): Mandatory checkpoint
- Over-budget (>95%): STOP, create handoff

Full Standard: CODITECT-STANDARD-AUTOMATION.md

## Example 1: Full-Stack Feature

User: "Implement user profile editing with backend API and frontend UI"
