Multi-LLM Autonomous Workflow Examples: RESEARCH → PLAN → CODE

Analysis Date: 2025-10-14
Scope: Concrete examples of autonomous development workflows across Claude Code, Gemini CLI, Grok CLI, and OpenAI CLI

Overview

This document gives concrete examples of how the autonomous RESEARCH → PLAN → CODE workflow can be implemented across different LLMs while reusing HumanLayer's .claude components and approval infrastructure.

Common Workflow Foundation

Reusable .claude Components

Autonomous Development System Prompt (Shared)

## 🤖 Autonomous Development Mode

### Four Operating Modes
- **DELIBERATION**: Analyze requirements, decompose tasks, NO code generation
- **RESEARCH**: Execute tool calls to verify assumptions and gather information
- **PLAN**: Synthesize research into detailed implementation specification
- **ACTION**: Generate working code artifacts with human approval gates

### Artifact Management Rules
- Size threshold: >20 lines OR >1500 characters becomes artifact
- One artifact per response to prevent state explosion
- Update vs rewrite: Max 4 updates before mandatory rewrite
- All artifacts include error handling and integration points

### State Manifest (Required in every response)

CURRENT STATE:

Phase: [DELIBERATION|RESEARCH|PLAN|ACTION]
Active artifact: [current file being worked on]
Iteration: [count]
Integration: [status of connections between components]
Next: [planned next steps]

### Error Ritual (When issues occur)
1. STOP - Don't continue broken iteration
2. ANALYZE - What caused the error?
3. EXTRACT - What lesson prevents recurrence?
4. CLEAN - Remove false assumptions
5. RETRY - Fresh attempt with learned knowledge
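
The error ritual above can be read as a retry loop in which every failure yields a lesson that is handed to the next attempt. The following is only a sketch under that reading: `runWithErrorRitual`, `RitualOutcome`, and the default of three attempts are hypothetical names and values, not part of any of the CLIs discussed here.

```typescript
// Hypothetical sketch of the STOP / ANALYZE / EXTRACT / CLEAN / RETRY ritual.
interface RitualOutcome<T> {
  result: T;
  attempts: number;
  lessons: string[]; // one lesson extracted per failed attempt
}

// Runs `attempt` until it succeeds or `maxAttempts` is exhausted.
// Each attempt receives a copy of the lessons learned so far.
function runWithErrorRitual<T>(
  attempt: (lessons: string[]) => T,
  maxAttempts: number = 3
): RitualOutcome<T> {
  const lessons: string[] = [];
  for (let n = 1; n <= maxAttempts; n++) {
    try {
      return { result: attempt(lessons.slice()), attempts: n, lessons: lessons };
    } catch (err) {
      // STOP: the broken iteration is abandoned here.
      // ANALYZE: capture what caused the error.
      const cause = err instanceof Error ? err.message : String(err);
      // EXTRACT: turn the cause into a lesson for the next attempt.
      lessons.push('avoid: ' + cause);
      // CLEAN + RETRY: the next loop iteration starts a fresh attempt.
    }
  }
  throw new Error('gave up after ' + maxAttempts + ' attempts: ' + lessons.join('; '));
}
```

Passing the lessons as an argument, rather than mutating shared state, keeps each retry a fresh attempt that still carries forward what was learned.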

Example Scenario: Building a Real-Time Task Management System

Requirements:

Real-time collaboration features
WebSocket connections
Database persistence
User authentication
React frontend with TypeScript
Node.js backend

Let's see how each LLM handles this using the autonomous workflow.

Claude Code Implementation (Baseline)

Phase 1: DELIBERATION

// Claude Code starts with MCP-integrated deliberation
const session = await createSession({
  provider: 'claude',
  model: 'sonnet',
  query: 'Build a real-time task management system with WebSocket collaboration',
  mode: 'autonomous'
})

// Claude's built-in reasoning follows the system prompt:

Claude's Deliberation Response:

ANALYSIS:
- Core requirements: Real-time updates, task CRUD, user auth, collaborative editing
- Technical constraints: WebSocket scalability, database consistency, auth security
- Unknown factors: WebSocket library choice, database schema design, state management

DECOMPOSITION:
Phase 1: Database schema + auth foundation (User, Task, Session models)
Phase 2: WebSocket server + real-time event handling
Phase 3: React frontend + real-time state management
Phase 4: Integration testing + error handling + deployment

ARTIFACT STRATEGY:
- Primary artifact: Backend API server (~200 lines)
- Supporting artifacts: Database schema, React components, WebSocket handlers
- Integration points: REST API ↔ WebSocket ↔ Database ↔ Frontend

READY TO PROCEED: No - need to research WebSocket libraries and database options

Phase 2: RESEARCH

// Claude executes research via MCP tools with approval
await session.executeTool('request_permission', {
  tool_name: 'web_search',
  input: { query: 'Node.js WebSocket libraries 2024 Socket.io vs ws comparison' },
  explanation: 'Research WebSocket library options for real-time features'
})

await session.executeTool('request_permission', {
  tool_name: 'read_file',
  input: { path: 'package.json' },
  explanation: 'Check existing project dependencies'
})

Research Findings:

RESEARCH RESULTS:
- WebSocket Library: Socket.io chosen (better fallback support, room management)
- Database: PostgreSQL with Prisma ORM (better real-time subscriptions)
- Auth: JWT + bcrypt (standard, secure)
- Frontend State: Zustand + Socket.io client (lightweight, real-time friendly)

TECH DECISIONS:
- Backend: Node.js + Express + Socket.io + Prisma
- Frontend: React + TypeScript + Zustand + Socket.io-client
- Database: PostgreSQL with row-level security

Phase 3: PLAN

// Claude synthesizes research into implementation plan
await session.executePlanPhase({
  researchFindings: researchResults,
  requirements: originalRequirements,
  constraints: technicalConstraints
})

Implementation Plan:

IMPLEMENTATION PLAN:

## System Architecture
- Real-time WebSocket server with Socket.io rooms
- PostgreSQL database with Prisma ORM and row-level security  
- React frontend with Zustand state management
- JWT-based authentication with secure token management

## Phase Breakdown
Phase 1: Foundation (300 lines)
- Database schema and models (User, Task, Room)
- Authentication system with JWT
- Basic API structure with Express

Phase 2: Real-time Core (400 lines)
- WebSocket server with Socket.io
- Room management and user sessions
- Real-time event broadcasting system

Phase 3: Frontend Integration (350 lines)
- React components with real-time hooks
- Zustand store with WebSocket integration
- Task management UI with live updates

Phase 4: Polish (150 lines)
- Error handling and offline support
- Performance optimization
- Testing and deployment configuration

## Approval Gates
- Phase 1→2: Database schema and auth review
- Phase 2→3: WebSocket implementation and security audit
- Phase 3→4: UI/UX review and performance validation
- Phase 4: Final security and deployment review

## Success Criteria
- All real-time updates work across multiple clients
- Authentication is secure and performant
- UI is responsive and handles edge cases
- System scales to 100+ concurrent users

REQUEST APPROVAL TO PROCEED TO ACTION PHASE
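
One way to make the approval gates in this plan enforceable is to treat them as data and compute which phase is unlocked. This is a minimal sketch: the gate labels come from the plan above, while `ApprovalGate`, `approvalGates`, and `highestUnlockedPhase` are assumed names for illustration.

```typescript
// Illustrative only: gate labels mirror the plan; all identifiers are assumptions.
interface ApprovalGate {
  from: number;       // phase being completed
  to: number | null;  // next phase, or null for the final review
  review: string;
}

const approvalGates: ApprovalGate[] = [
  { from: 1, to: 2, review: 'Database schema and auth review' },
  { from: 2, to: 3, review: 'WebSocket implementation and security audit' },
  { from: 3, to: 4, review: 'UI/UX review and performance validation' },
  { from: 4, to: null, review: 'Final security and deployment review' },
];

// Returns the highest phase that may be worked on, given the gates
// (identified by their `from` phase) that a human has approved so far.
function highestUnlockedPhase(approvedFrom: number[]): number {
  let phase = 1;
  for (const gate of approvalGates) {
    if (gate.from !== phase || gate.to === null) break;
    if (approvedFrom.indexOf(gate.from) === -1) break;
    phase = gate.to;
  }
  return phase;
}
```

Because the walk stops at the first unapproved gate, approving a later gate out of order (for example only the Phase 2→3 gate) does not unlock anything.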

Phase 4: ACTION (CODE)

// Claude generates code artifacts with approval gates
await session.executeArtifactCreation({
  phase: 'ACTION',
  approvalRequired: true,
  artifacts: [
    'database-schema.prisma',
    'websocket-server.ts', 
    'task-api.ts',
    'react-task-manager.tsx'
  ]
})

Generated Artifacts (with approval):

Database schema with real-time subscriptions (Phase 1)
WebSocket server with room management (Phase 2)
REST API with authentication (Phase 1)
React components with real-time updates (Phase 3)

Gemini CLI Implementation

Phase 1: DELIBERATION via GEMINI.md Context

# GEMINI.md - Task Management Project Context

## Current Session
```yaml
phase: "DELIBERATION"
iteration: 1
requirements: "Real-time task management with WebSocket collaboration"
extensions_loaded: ["humanlayer-approval", "web-research", "file-operations"]
```

Analysis Framework

Use the structured approach for autonomous development:

ANALYSIS:

Analyze the requirements for a real-time task management system. Identify core capabilities, technical constraints, and unknown factors.

DECOMPOSITION:

Break down into 4 implementation phases following autonomous methodology.

RESEARCH NEEDS:

Identify what technology choices need validation before proceeding.

Use humanlayer-approval extension for proceeding to research phase.

Gemini's Deliberation Process:

```bash
gemini --context-file GEMINI.md --extensions humanlayer-approval,web-research
```

Gemini's Analysis:

ANALYSIS COMPLETE:
- Core requirements: Task CRUD + real-time sync + multi-user collaboration
- Technical constraints: WebSocket connection scaling, conflict resolution, offline support
- Unknown factors: Real-time database choice, WebSocket library performance, state sync strategy

DECOMPOSITION:
Phase 1: Foundation - User models, authentication, basic task CRUD
Phase 2: Real-time layer - WebSocket integration, event broadcasting
Phase 3: Frontend - React components with real-time state synchronization  
Phase 4: Polish - Conflict resolution, offline support, performance optimization

RESEARCH PRIORITIES:
1. WebSocket library comparison (Socket.io vs native WebSockets vs alternatives)
2. Real-time database solutions (Supabase vs Firebase vs custom with PostgreSQL)
3. State synchronization patterns for collaborative editing

Requesting approval to proceed to RESEARCH phase via humanlayer-approval extension.

Phase 2: RESEARCH via Extensions

// Gemini uses extensions for research with approval integration
await gemini.useExtension('humanlayer-approval').requestPermission({
  action_type: 'research_phase',
  description: 'Research WebSocket libraries and real-time database solutions',
  risk_level: 'low'
})

// Approved - proceed with research using web-research extension
await gemini.useExtension('web-research').searchAndAnalyze([
  'Socket.io vs ws vs uws performance comparison 2024',
  'Supabase real-time subscriptions vs PostgreSQL LISTEN/NOTIFY',
  'React real-time state management patterns WebSockets'
])

Updated GEMINI.md Context:

phase: "RESEARCH"
iteration: 2
research_findings:
  websocket_library: "Socket.io - better browser fallback and room management"
  database: "Supabase - built-in real-time subscriptions and auth"
  frontend_state: "Zustand + Supabase client for real-time subscriptions"
tech_decisions:
  backend: "Node.js + Express + Socket.io + Supabase SDK"
  frontend: "React + TypeScript + Zustand + Supabase client"
  deployment: "Vercel + Supabase hosting"
next_phase: "PLAN"
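
The context-file update above (DELIBERATION at iteration 1 becoming RESEARCH at iteration 2, with PLAN queued next) can be sketched as a pure transition function. The field names follow the YAML shown; `SessionContext`, `PHASE_ORDER`, and `advancePhase` are hypothetical and not a Gemini CLI API.

```typescript
// Hypothetical sketch of a context-file phase transition.
type Phase = 'DELIBERATION' | 'RESEARCH' | 'PLAN' | 'ACTION';
const PHASE_ORDER: Phase[] = ['DELIBERATION', 'RESEARCH', 'PLAN', 'ACTION'];

interface SessionContext {
  phase: Phase;
  iteration: number;
  next_phase?: Phase;
}

// Moves the context one phase forward, but only when approval was granted.
// The final phase (ACTION) has no successor, so it is returned unchanged.
function advancePhase(ctx: SessionContext, approved: boolean): SessionContext {
  if (!approved) return ctx;
  const index = PHASE_ORDER.indexOf(ctx.phase);
  if (index === PHASE_ORDER.length - 1) return ctx;
  const updated: SessionContext = {
    phase: PHASE_ORDER[index + 1],
    iteration: ctx.iteration + 1,
  };
  const following = PHASE_ORDER[index + 2];
  if (following !== undefined) updated.next_phase = following;
  return updated;
}
```

Calling `advancePhase({ phase: 'DELIBERATION', iteration: 1 }, true)` yields a context with `phase: 'RESEARCH'`, `iteration: 2`, and `next_phase: 'PLAN'`, matching the updated file above.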

Phase 3: PLAN with Context File Synthesis

// Gemini synthesizes research into implementation plan
await gemini.updateContext({
  phase: 'PLAN',
  research_complete: true,
  planning_mode: 'detailed_specification'
})

Implementation Plan in GEMINI.md:

## Implementation Plan

### System Architecture
- Supabase backend with real-time subscriptions
- Socket.io for enhanced WebSocket features
- React frontend with optimistic updates
- Zustand for client-side state management

### Phase Implementation
Phase 1: Supabase Setup (200 lines)
- Database schema with RLS policies
- Authentication configuration
- Real-time subscription setup

Phase 2: Socket.io Integration (300 lines)  
- WebSocket server with Supabase integration
- Room management and event broadcasting
- Conflict resolution for real-time updates

Phase 3: React Frontend (400 lines)
- Components with real-time hooks
- Zustand store with Supabase sync
- Optimistic UI updates with rollback

Phase 4: Integration Testing (100 lines)
- End-to-end testing setup
- Performance monitoring
- Deployment configuration

### Approval Strategy
- Context file updates after each phase
- Extension-based approval for file operations
- Batch approvals for related components

Phase 4: ACTION with Context File Updates

// Gemini generates code with context tracking
await gemini.useExtension('humanlayer-approval').requestPermission({
  action_type: 'code_generation_phase',
  description: 'Generate task management application with real-time features',
  parameters: {
    files_to_create: ['server.js', 'database.sql', 'TaskManager.tsx', 'useRealtimeTasks.ts'],
    estimated_lines: 800
  },
  risk_level: 'medium'
})

// Context file updated automatically after each artifact

Generated with Extensions Integration:

Supabase schema with real-time subscriptions
Socket.io server with Supabase integration
React hooks for real-time task synchronization
Context file continuously updated with progress

Grok CLI Implementation

Phase 1: DELIBERATION with Multi-Round Planning

# grok-session-config.yaml
session:
  model: "grok-4-latest"
  max_tool_rounds: 400
  high_speed_editing: true
  
deliberation_prompt: |
  MULTI-ROUND AUTONOMOUS DEVELOPMENT
  TOTAL ROUNDS: 400 available
  TASK: Build real-time task management system

  Plan round distribution:
  - DELIBERATION: 20 rounds (5% budget)
  - RESEARCH: 80 rounds (20% budget)  
  - ACTION: 300 rounds (75% budget)

  Use multi-round capabilities for parallel research and high-speed code generation.

Grok's Multi-Round Deliberation:

ROUND BUDGET ALLOCATION:
- Deliberation: 20 rounds for analysis and planning
- Research: 80 rounds for parallel technology investigation
- Action: 300 rounds for high-speed implementation

MULTI-ROUND STRATEGY:
- Parallel research streams: WebSocket libs, databases, frontend patterns
- Bulk code generation using Morph Fast Apply
- Incremental integration testing with approval checkpoints

DECOMPOSITION WITH ROUND ESTIMATES:
Phase 1: Foundation (80 rounds) - Auth, models, basic API
Phase 2: Real-time (120 rounds) - WebSocket server, event handling
Phase 3: Frontend (80 rounds) - React components, real-time hooks  
Phase 4: Integration (20 rounds) - Testing, optimization, deployment

APPROVAL STRATEGY:
- Batch approval for file operations (efficiency)
- Multi-round approval for complex operation sequences
- High-speed generation with approval checkpoints
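
The round budget above is plain percentage arithmetic: 5%, 20%, and 75% of 400 rounds give 20, 80, and 300. A small sketch of that calculation follows; `allocateRounds` is an illustrative helper, not a Grok CLI function, and giving the rounding remainder to the last entry is an assumption.

```typescript
// Hypothetical helper: splits a round budget by integer percentages.
function allocateRounds(
  totalRounds: number,
  percentages: Record<string, number>
): Record<string, number> {
  const keys = Object.keys(percentages);
  const sum = keys.reduce((acc, key) => acc + percentages[key], 0);
  if (sum !== 100) {
    throw new Error('percentages must sum to 100, got ' + sum);
  }
  const allocation: Record<string, number> = {};
  let assigned = 0;
  keys.forEach((key, i) => {
    // The last entry absorbs any rounding remainder so the total is exact.
    const isLast = i === keys.length - 1;
    const rounds = isLast
      ? totalRounds - assigned
      : Math.floor((totalRounds * percentages[key]) / 100);
    allocation[key] = rounds;
    assigned += rounds;
  });
  return allocation;
}

const roundPlan = allocateRounds(400, { deliberation: 5, research: 20, action: 75 });
// roundPlan is { deliberation: 20, research: 80, action: 300 }
```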

Phase 2: RESEARCH with Parallel Execution

// Grok executes multiple research streams in parallel
const highSpeedResearchResults = await grokSession.executeMultiRound({
  rounds: 80,
  parallelStreams: [
    {
      topic: 'WebSocket libraries',
      rounds: 25,
      approach: 'comparative_analysis'
    },
    {
      topic: 'Real-time databases', 
      rounds: 30,
      approach: 'performance_benchmarks'
    },
    {
      topic: 'Frontend real-time patterns',
      rounds: 25, 
      approach: 'implementation_examples'
    }
  ],
  approvalRequired: true
})

High-Speed Research Results:

{
  "rounds_used": 78,
  "research_efficiency": "97.5%",
  "findings": {
    "websocket_choice": "Socket.io with clustering for scalability",
    "database_solution": "Redis + PostgreSQL hybrid (Redis for real-time, PG for persistence)",
    "frontend_pattern": "Optimistic updates with conflict resolution"
  },
  "parallel_execution_benefit": "3x faster than sequential research"
}

Phase 3: PLAN with Multi-Round Strategy Planning

// Grok creates comprehensive implementation plan optimized for 400-round execution
await grokSession.executePlanPhase({
  researchResults: highSpeedResearchResults,
  totalRounds: 400,
  remainingRounds: 302, // 400 - 98 (20 budgeted for deliberation + 78 used in research)
  planningBudget: 50
})

Multi-Round Implementation Plan:

{
  "system_architecture": {
    "websocket_clustering": "Socket.io with Redis adapter for horizontal scaling",
    "database_hybrid": "PostgreSQL for persistence + Redis for real-time cache",
    "frontend_optimization": "React with concurrent features for high-performance updates"
  },
  "multi_round_strategy": {
    "total_rounds": 400,
    "phase_allocation": {
      "foundation": 80,
      "real_time_core": 120, 
      "frontend_integration": 80,
      "performance_optimization": 40
    },
    "parallel_execution_opportunities": [
      "Database migration + Socket.io server setup",
      "Frontend component generation + API endpoint creation",
      "Testing setup + deployment configuration"
    ]
  },
  "high_speed_targets": [
    {
      "component": "database_models",
      "target_speed": "5000+ tokens/sec",
      "estimated_rounds": 15
    },
    {
      "component": "react_components", 
      "target_speed": "4500+ tokens/sec",
      "estimated_rounds": 25
    }
  ],
  "approval_strategy": {
    "batch_approvals": "Group file operations by component",
    "multi_round_context": "Track approval context across 300+ rounds",
    "efficiency_target": "Maximum 5 approval cycles for entire implementation"
  }
}

Phase 4: ACTION with High-Speed Code Generation

// Grok uses Morph Fast Apply for bulk code generation
await grokSession.requestApproval({
  operation: 'high_speed_code_generation',
  details: {
    target_speed: '4500+ tokens/sec',
    files_to_generate: 12,
    estimated_total_lines: 1200,
    multi_round_sequence: [
      'Database schemas and migrations',
      'WebSocket server with clustering', 
      'REST API with authentication',
      'React components with real-time hooks',
      'Integration tests and deployment config'
    ]
  },
  multi_round_context: {
    total_rounds_planned: 300,
    batch_generation: true
  }
})

// Execute high-speed generation with Morph Fast Apply
const generationResult = await grokSession.executeHighSpeedGeneration({
  morphFastApply: true,
  targetSpeed: 4500, // tokens/sec
  batchSize: 'optimal',
  approvalCheckpoints: [100, 200, 280] // rounds
})

High-Speed Generation Results:

Speed achieved: 4,650 tokens/sec
Files generated: 12 complete files
Approval efficiency: 3 batch approvals vs 12 individual approvals
Round utilization: 287/300 rounds (95.7% efficiency)
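
The utilization and batch-approval figures above follow from simple arithmetic on the round counts. The sketch below reproduces them; both function names are hypothetical, and the rule that one batch approval is requested per checkpoint crossed is an assumption drawn from the `approvalCheckpoints` list.

```typescript
// Percentage of a round budget that was actually used, to one decimal place.
function utilizationPercent(used: number, budget: number): string {
  if (budget <= 0) throw new Error('budget must be positive');
  return ((used / budget) * 100).toFixed(1) + '%';
}

// Checkpoints passed when moving from `previousRound` (exclusive)
// to `currentRound` (inclusive).
function checkpointsCrossed(
  previousRound: number,
  currentRound: number,
  checkpoints: number[]
): number[] {
  return checkpoints.filter((c) => c > previousRound && c <= currentRound);
}

const actionUtilization = utilizationPercent(287, 300);   // "95.7%"
const researchUtilization = utilizationPercent(78, 80);   // "97.5%"
const crossed = checkpointsCrossed(0, 287, [100, 200, 280]); // [100, 200, 280]
```

With 287 rounds used, all three checkpoints are crossed, which is consistent with the three batch approvals reported.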

OpenAI CLI Implementation

Phase 1: DELIBERATION with Auto-Approval

# openai-config.yaml
auto_approval:
  safe_operations: ["read_file", "analyze_code", "list_directory", "generate_docs"]
  requires_approval: ["write_file", "execute_command", "install_package"]

deliberation_prompt: |
  AUTONOMOUS DEVELOPMENT with SELECTIVE AUTO-APPROVAL
  TASK: Real-time task management system

  Use execute_operation function for all operations.
  Safe operations (read, analyze, document) will auto-approve.
  Write operations require human approval.
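
A configuration like this reduces to a small decision function. The sketch below copies the two operation lists from the YAML; `ApprovalPolicy`, `defaultPolicy`, and `decide` are hypothetical names, and routing unlisted operations to a human (failing closed) is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch of the selective auto-approval policy.
interface ApprovalPolicy {
  safeOperations: string[];
  requiresApproval: string[];
}

const defaultPolicy: ApprovalPolicy = {
  safeOperations: ['read_file', 'analyze_code', 'list_directory', 'generate_docs'],
  requiresApproval: ['write_file', 'execute_command', 'install_package'],
};

type Decision = 'auto_approve' | 'ask_human';

// Only operations explicitly listed as safe are auto-approved.
// Anything else, including operation types missing from both lists,
// is sent to a human (assumed fail-closed default).
function decide(operationType: string, policy: ApprovalPolicy = defaultPolicy): Decision {
  return policy.safeOperations.indexOf(operationType) !== -1
    ? 'auto_approve'
    : 'ask_human';
}
```

Failing closed means a newly introduced operation type never bypasses review just because nobody remembered to classify it.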

OpenAI's Selective Approval Deliberation:

// Auto-approved operations during deliberation
await openaiSession.executeOperation({
  operation_type: 'analyze_code',
  operation_details: { directory: './project' },
  risk_assessment: 'safe',
  justification: 'Analyzing existing project structure'
})

await openaiSession.executeOperation({
  operation_type: 'read_file', 
  operation_details: { path: 'package.json' },
  risk_assessment: 'safe',
  justification: 'Understanding current dependencies'
})

Deliberation with Auto-Approved Analysis:

ANALYSIS (using 15 auto-approved operations):
- Existing codebase: React + Node.js foundation already present
- Current dependencies: Express, React 18, TypeScript configured
- Missing components: WebSocket support, real-time state management
- Architecture gaps: No database integration, no authentication

DECOMPOSITION:
Phase 1: Add database models and authentication (requires approval for new files)
Phase 2: WebSocket server integration (requires approval for server modifications)
Phase 3: Real-time React components (mix of auto-approved analysis + approved writes)
Phase 4: Testing and deployment (requires approval for config changes)

AUTO-APPROVAL STRATEGY:
- Analysis and documentation: Auto-approved for speed
- Code reading and exploration: Auto-approved 
- File creation and modification: Requires approval
- Package installation: Requires approval

Phase 2: RESEARCH with Heavy Auto-Approval

// OpenAI leverages auto-approved operations for rapid research
const researchOperations = [
  { type: 'read_file', path: 'docs/websocket-patterns.md', auto: true },
  { type: 'analyze_code', target: 'existing-auth-patterns', auto: true },
  { type: 'list_directory', path: 'examples/', auto: true },
  { type: 'generate_docs', topic: 'real-time-architecture-options', auto: true }
]

// Execute 25+ auto-approved operations for comprehensive research
const researchResults = await openaiSession.batchAutoApprovedOperations(researchOperations)

Efficient Research with Auto-Approval:

{
  "auto_approved_operations": 28,
  "manual_approvals_required": 3,
  "research_efficiency": "90% operations auto-approved",
  "findings": {
    "websocket_solution": "Socket.io integration with existing Express server",
    "database_choice": "SQLite + real-time WebSocket bridge (simple deployment)",
    "frontend_integration": "Custom React hooks with Socket.io client"
  },
  "approval_requests": [
    "External API documentation fetch",
    "Performance benchmark data collection", 
    "Community examples repository clone"
  ]
}

Phase 3: PLAN with Intelligent Batch Planning

// OpenAI creates implementation plan optimized for selective auto-approval
await openaiSession.executePlanPhase({
  researchResults, // returned by batchAutoApprovedOperations in the RESEARCH phase
  planningMode: 'intelligent_batch_optimization',
  contextManagement: 'stateless_reconstruction'
})

Intelligent Implementation Plan:

{
  "system_architecture": {
    "websocket_integration": "Socket.io with existing Express server",
    "database_strategy": "SQLite with real-time WebSocket synchronization", 
    "frontend_optimization": "React hooks with optimistic updates and auto-approved analysis"
  },
  "batch_approval_strategy": {
    "auto_approved_operations": [
      "read_file", "analyze_code", "generate_docs", "list_directory"
    ],
    "batch_approval_groups": [
      {
        "group": "database_setup",
        "operations": ["create schema.sql", "create models.js", "create migrations/"],
        "justification": "Database foundation setup"
      },
      {
        "group": "websocket_server", 
        "operations": ["create websocket-server.js", "modify server.js", "create socket-handlers/"],
        "justification": "Real-time server implementation"
      }
    ]
  },
  "efficiency_optimization": {
    "auto_approval_target": "70% of operations",
    "batch_reduction": "50% fewer approval cycles",
    "context_management": "Stateless session reconstruction with 95% accuracy"
  },
  "approval_checkpoints": [
    "Database schema and models approval",
    "WebSocket server implementation approval", 
    "Frontend components batch approval",
    "Integration testing and deployment approval"
  ]
}

Phase 4: ACTION with Batch Approval Optimization

// OpenAI optimizes approval workflow with batching
await openaiSession.batchOperations({
  operations: [
    {
      operation_type: 'write_file',
      operation_details: { path: 'server/websocket.js', content: '...' }
    },
    {
      operation_type: 'write_file', 
      operation_details: { path: 'client/useRealtime.js', content: '...' }
    },
    {
      operation_type: 'write_file',
      operation_details: { path: 'database/schema.sql', content: '...' }
    }
  ],
  batch_justification: 'Creating core real-time infrastructure files for task management'
})

// Mix auto-approved analysis with batch-approved creation

Optimized Implementation Results:

Batch approvals: 4 approval cycles vs 12 individual approvals
Auto-approved operations: 67% of all operations
Implementation efficiency: 40% reduction in approval overhead
Mixed workflow: Safe operations auto-approved, risky operations batched
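
The saving from batching comes from asking for one approval per group of related writes instead of one per file. The following sketch illustrates that grouping; the file paths are taken from the example above, while `WriteOperation`, the group labels, and both functions are assumptions for illustration.

```typescript
// Hypothetical sketch: group write operations so each group needs one approval.
interface WriteOperation {
  path: string;
  group: string; // e.g. a component name chosen by the planner
}

function groupForApproval(ops: WriteOperation[]): Record<string, string[]> {
  const batches: Record<string, string[]> = {};
  for (const op of ops) {
    if (!Object.prototype.hasOwnProperty.call(batches, op.group)) {
      batches[op.group] = [];
    }
    batches[op.group].push(op.path);
  }
  return batches;
}

// One approval cycle per batch.
function approvalCycles(batches: Record<string, string[]>): number {
  return Object.keys(batches).length;
}

const pendingWrites: WriteOperation[] = [
  { path: 'server/websocket.js', group: 'websocket_server' },
  { path: 'client/useRealtime.js', group: 'frontend' },
  { path: 'database/schema.sql', group: 'database_setup' },
  { path: 'database/models.js', group: 'database_setup' }, // hypothetical extra file
];
const writeBatches = groupForApproval(pendingWrites);
// approvalCycles(writeBatches) is 3, versus 4 individual approvals
```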

Comparison Summary

| Aspect | Claude Code | Gemini CLI | Grok CLI | OpenAI CLI |
| --- | --- | --- | --- | --- |
| Deliberation | Native reasoning | Context file driven | Multi-round planning | Auto-approval analysis |
| Research | MCP tool integration | Extension-based | Parallel execution | Heavy auto-approval |
| Plan | Comprehensive specification | Context file synthesis | Multi-round strategy | Intelligent batch planning |
| Action (Code) | Artifact management | Extension integration | High-speed bulk generation | Batch approval optimization |
| Approval Efficiency | Standard workflow | Context-aware | Multi-round batching | Selective auto-approval |
| Unique Strengths | Built-in methodology | Enterprise features | Speed + parallel execution | Intelligence + efficiency |
| Implementation Time | Baseline (current) | 8-12 weeks | 10-14 weeks | 12-16 weeks |

Reusable Components Across All LLMs

1. Autonomous System Prompt Template

Four-phase workflow definition (DELIBERATION→RESEARCH→PLAN→ACTION)
Artifact management rules
State manifest requirements
Error ritual process

2. Approval Integration Pattern

interface UniversalApproval {
  requestPermission(operation: ToolOperation): Promise<ApprovalResult>
  batchOperations(operations: ToolOperation[]): Promise<BatchApprovalResult>  
  autoApproveOperation(operation: SafeOperation): Promise<OperationResult>
}

3. State Management Pattern

interface WorkflowState {
  phase: 'DELIBERATION' | 'RESEARCH' | 'PLAN' | 'ACTION'
  iteration: number
  activeArtifacts: string[]
  completedPhases: string[]
  implementationPlan?: ImplementationPlan
  nextSteps: string[]
  providerSpecific: Record<string, any>
}
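
A state object of this kind can be rendered into the CURRENT STATE manifest that the shared system prompt requires in every response. The sketch below uses a reduced, self-contained state type; `ManifestState`, its `integration` field, and `renderStateManifest` are assumptions and not part of the interface above.

```typescript
// Hypothetical sketch: render the required state manifest from workflow state.
interface ManifestState {
  phase: 'DELIBERATION' | 'RESEARCH' | 'PLAN' | 'ACTION';
  iteration: number;
  activeArtifacts: string[];
  integration: string; // assumed free-text status of component connections
  nextSteps: string[];
}

function renderStateManifest(s: ManifestState): string {
  const artifacts = s.activeArtifacts.length > 0 ? s.activeArtifacts.join(', ') : 'none';
  return [
    'CURRENT STATE:',
    '',
    'Phase: ' + s.phase,
    'Active artifact: ' + artifacts,
    'Iteration: ' + s.iteration,
    'Integration: ' + s.integration,
    'Next: ' + s.nextSteps.join('; '),
  ].join('\n');
}

const manifestText = renderStateManifest({
  phase: 'RESEARCH',
  iteration: 2,
  activeArtifacts: [],
  integration: 'not started',
  nextSteps: ['compare WebSocket libraries', 'choose database'],
});
```

Generating the manifest from state, rather than writing it by hand in each response, keeps the five required fields present and in a fixed order.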

4. Artifact Generation Rules

Size thresholds (>20 lines / >1500 chars)
One artifact per response
Update vs rewrite decision tree
Integration point documentation
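
The size threshold and the update limit can be expressed as two small predicates. This is a sketch of one possible reading of the rules (content over 20 lines or over 1500 characters is an artifact; after four updates the next change is a rewrite); the function and constant names are hypothetical.

```typescript
// Hypothetical sketch of the artifact rules from the shared system prompt.
const MAX_INLINE_LINES = 20;
const MAX_INLINE_CHARS = 1500;
const MAX_UPDATES = 4;

// Content becomes an artifact when it exceeds either threshold.
// Note: a trailing newline is counted as one extra (empty) line here.
function isArtifact(content: string): boolean {
  const lineCount = content.split('\n').length;
  return lineCount > MAX_INLINE_LINES || content.length > MAX_INLINE_CHARS;
}

// Decides how the next change to an existing artifact should be applied.
function nextEdit(updatesSoFar: number): 'update' | 'rewrite' {
  return updatesSoFar < MAX_UPDATES ? 'update' : 'rewrite';
}
```

Under this reading, the fourth change is still an in-place update and the fifth triggers the mandatory rewrite.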

This foundation enables consistent autonomous RESEARCH→PLAN→ACTION workflows across all LLM providers while leveraging each platform's particular strengths. The PLAN phase bridges research findings and implementation specifications, so the system design is settled before code generation begins.
