Multi-LLM Autonomous Workflow Examples: RESEARCH → PLAN → CODE
Analysis Date: 2025-10-14
Scope: Concrete examples of autonomous development workflows across Claude Code, Gemini CLI, Grok CLI, and OpenAI CLI
Overview
This document shows how the autonomous RESEARCH → PLAN → CODE workflow can be implemented across different LLMs while reusing HumanLayer's .claude components and approval infrastructure.
Common Workflow Foundation
Reusable .claude Components
Autonomous Development System Prompt (Shared)
## 🤖 Autonomous Development Mode
### Four Operating Modes
- **DELIBERATION**: Analyze requirements, decompose tasks, NO code generation
- **RESEARCH**: Execute tool calls to verify assumptions and gather information
- **PLAN**: Synthesize research into detailed implementation specification
- **ACTION**: Generate working code artifacts with human approval gates
### Artifact Management Rules
- Size threshold: content over 20 lines OR over 1500 characters becomes an artifact
- One artifact per response to prevent state explosion
- Update vs rewrite: Max 4 updates before mandatory rewrite
- All artifacts include error handling and integration points
### State Manifest (Required in every response)
CURRENT STATE:
- Phase: [DELIBERATION|RESEARCH|PLAN|ACTION]
- Active artifact: [current file being worked on]
- Iteration: [count]
- Integration: [status of connections between components]
- Next: [planned next steps]
### Error Ritual (When issues occur)
1. STOP - Don't continue broken iteration
2. ANALYZE - What caused the error?
3. EXTRACT - What lesson prevents recurrence?
4. CLEAN - Remove false assumptions
5. RETRY - Fresh attempt with learned knowledge
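One way to read the five-step error ritual is as a retry loop that discards the failed attempt and carries only the extracted lesson forward. The sketch below is an illustration under that reading, not HumanLayer's implementation; `runWithErrorRitual`, `RitualOutcome`, and the sample error message are hypothetical.

```typescript
// Hypothetical sketch of the error ritual as a retry loop.
// None of these names come from HumanLayer; they are illustrative only.
interface RitualOutcome<T> {
  result?: T;        // set when an attempt succeeds
  attempts: number;  // total attempts made
  lessons: string[]; // lessons extracted from failed attempts
}

function runWithErrorRitual<T>(
  attempt: (lessons: readonly string[]) => T,
  maxAttempts = 3,
): RitualOutcome<T> {
  const lessons: string[] = [];
  for (let n = 1; n <= maxAttempts; n++) {
    try {
      // RETRY: each attempt starts fresh but can read the accumulated lessons.
      return { result: attempt(lessons), attempts: n, lessons };
    } catch (err) {
      // STOP: the failed iteration is abandoned rather than patched.
      // ANALYZE + EXTRACT: reduce the failure to a one-line lesson.
      const cause = err instanceof Error ? err.message : String(err);
      lessons.push(`attempt ${n} failed: ${cause}`);
      // CLEAN: nothing from the failed attempt survives except the lesson.
    }
  }
  return { attempts: maxAttempts, lessons };
}

// Usage: the first attempt relies on a false assumption; the second attempt
// sees the recorded lesson and succeeds.
const ritualOutcome = runWithErrorRitual((lessons) => {
  if (lessons.length === 0) throw new Error("assumed socket.io was installed");
  return "server.ts generated";
});
console.log(ritualOutcome.attempts, ritualOutcome.result, ritualOutcome.lessons);
```

Passing the lessons into the attempt function keeps each retry independent of the broken state while still letting it avoid the same mistake.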
Example Scenario: Building a Real-Time Task Management System
Requirements:
- Real-time collaboration features
- WebSocket connections
- Database persistence
- User authentication
- React frontend with TypeScript
- Node.js backend
The sections below show how each LLM handles this scenario using the autonomous workflow.
Claude Code Implementation (Baseline)
Phase 1: DELIBERATION
// Claude Code starts with MCP-integrated deliberation
const session = await createSession({
  provider: 'claude',
  model: 'sonnet',
  query: 'Build a real-time task management system with WebSocket collaboration',
  mode: 'autonomous'
})
// Claude's built-in reasoning follows the system prompt:
Claude's Deliberation Response:
ANALYSIS:
- Core requirements: Real-time updates, task CRUD, user auth, collaborative editing
- Technical constraints: WebSocket scalability, database consistency, auth security
- Unknown factors: WebSocket library choice, database schema design, state management
DECOMPOSITION:
Phase 1: Database schema + auth foundation (User, Task, Session models)
Phase 2: WebSocket server + real-time event handling
Phase 3: React frontend + real-time state management
Phase 4: Integration testing + error handling + deployment
ARTIFACT STRATEGY:
- Primary artifact: Backend API server (~200 lines)
- Supporting artifacts: Database schema, React components, WebSocket handlers
- Integration points: REST API ↔ WebSocket ↔ Database ↔ Frontend
READY TO PROCEED: No - need to research WebSocket libraries and database options
Phase 2: RESEARCH
// Claude executes research via MCP tools with approval
await session.executeTool('request_permission', {
  tool_name: 'web_search',
  input: { query: 'Node.js WebSocket libraries 2024 Socket.io vs ws comparison' },
  explanation: 'Research WebSocket library options for real-time features'
})
await session.executeTool('request_permission', {
  tool_name: 'read_file',
  input: { path: 'package.json' },
  explanation: 'Check existing project dependencies'
})
Research Findings:
RESEARCH RESULTS:
- WebSocket Library: Socket.io chosen (better fallback support, room management)
- Database: PostgreSQL with Prisma ORM (better real-time subscriptions)
- Auth: JWT + bcrypt (standard, secure)
- Frontend State: Zustand + Socket.io client (lightweight, real-time friendly)
TECH DECISIONS:
- Backend: Node.js + Express + Socket.io + Prisma
- Frontend: React + TypeScript + Zustand + Socket.io-client
- Database: PostgreSQL with row-level security
Phase 3: PLAN
// Claude synthesizes research into implementation plan
await session.executePlanPhase({
  researchFindings: researchResults,
  requirements: originalRequirements,
  constraints: technicalConstraints
})
Implementation Plan:
IMPLEMENTATION PLAN:
## System Architecture
- Real-time WebSocket server with Socket.io rooms
- PostgreSQL database with Prisma ORM and row-level security
- React frontend with Zustand state management
- JWT-based authentication with secure token management
## Phase Breakdown
Phase 1: Foundation (300 lines)
- Database schema and models (User, Task, Room)
- Authentication system with JWT
- Basic API structure with Express
Phase 2: Real-time Core (400 lines)
- WebSocket server with Socket.io
- Room management and user sessions
- Real-time event broadcasting system
Phase 3: Frontend Integration (350 lines)
- React components with real-time hooks
- Zustand store with WebSocket integration
- Task management UI with live updates
Phase 4: Polish (150 lines)
- Error handling and offline support
- Performance optimization
- Testing and deployment configuration
## Approval Gates
- Phase 1→2: Database schema and auth review
- Phase 2→3: WebSocket implementation and security audit
- Phase 3→4: UI/UX review and performance validation
- Phase 4: Final security and deployment review
## Success Criteria
- All real-time updates work across multiple clients
- Authentication is secure and performant
- UI is responsive and handles edge cases
- System scales to 100+ concurrent users
REQUEST APPROVAL TO PROCEED TO ACTION PHASE
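The approval gates in the plan above can be modelled as a lookup from each implementation phase to the review that must pass before the next phase starts. This is a minimal sketch under that assumption; `advancePhase`, `GateDecision`, and the idea of tracking approvals as a set of review names are hypothetical and not part of Claude Code or HumanLayer.

```typescript
// Hypothetical gate table mirroring the "Approval Gates" list in the plan.
type PhaseNumber = 1 | 2 | 3 | 4;

const approvalGates: Record<PhaseNumber, string> = {
  1: "Database schema and auth review",
  2: "WebSocket implementation and security audit",
  3: "UI/UX review and performance validation",
  4: "Final security and deployment review",
};

interface GateDecision {
  advanced: boolean;
  phase: PhaseNumber | "DONE";
  pendingReview?: string; // set when the gate is still closed
}

// Advance only when the review guarding the current phase has been approved.
function advancePhase(
  current: PhaseNumber,
  approvedReviews: ReadonlySet<string>,
): GateDecision {
  const gate = approvalGates[current];
  if (!approvedReviews.has(gate)) {
    return { advanced: false, phase: current, pendingReview: gate };
  }
  const next: PhaseNumber | "DONE" =
    current === 4 ? "DONE" : ((current + 1) as PhaseNumber);
  return { advanced: true, phase: next };
}

// Usage: only the Phase 1 review has been approved so far.
const approvedSoFar = new Set(["Database schema and auth review"]);
console.log(advancePhase(1, approvedSoFar)); // gate open, moves to phase 2
console.log(advancePhase(2, approvedSoFar)); // gate closed, stays on phase 2
```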
Phase 4: ACTION (CODE)
// Claude generates code artifacts with approval gates
await session.executeArtifactCreation({
  phase: 'ACTION',
  approvalRequired: true,
  artifacts: [
    'database-schema.prisma',
    'websocket-server.ts',
    'task-api.ts',
    'react-task-manager.tsx'
  ]
})
Generated Artifacts (with approval):
- Database schema with real-time subscriptions (Phase 1)
- WebSocket server with room management (Phase 2)
- REST API with authentication (Phase 1)
- React components with real-time updates (Phase 3)
Gemini CLI Implementation
Phase 1: DELIBERATION via GEMINI.md Context
# GEMINI.md - Task Management Project Context
## Current Session
```yaml
phase: "DELIBERATION"
iteration: 1
requirements: "Real-time task management with WebSocket collaboration"
extensions_loaded: ["humanlayer-approval", "web-research", "file-operations"]
```
## Analysis Framework
Use the structured approach for autonomous development:
ANALYSIS:
Analyze the requirements for a real-time task management system. Identify core capabilities, technical constraints, and unknown factors.
DECOMPOSITION:
Break down into 4 implementation phases following autonomous methodology.
RESEARCH NEEDS:
Identify what technology choices need validation before proceeding.
Use the humanlayer-approval extension to request approval before proceeding to the research phase.
### Gemini's Deliberation Process:
```bash
gemini --context-file GEMINI.md --extensions humanlayer-approval,web-research
```
Gemini's Analysis:
ANALYSIS COMPLETE:
- Core requirements: Task CRUD + real-time sync + multi-user collaboration
- Technical constraints: WebSocket connection scaling, conflict resolution, offline support
- Unknown factors: Real-time database choice, WebSocket library performance, state sync strategy
DECOMPOSITION:
Phase 1: Foundation - User models, authentication, basic task CRUD
Phase 2: Real-time layer - WebSocket integration, event broadcasting
Phase 3: Frontend - React components with real-time state synchronization
Phase 4: Polish - Conflict resolution, offline support, performance optimization
RESEARCH PRIORITIES:
1. WebSocket library comparison (Socket.io vs native WebSockets vs alternatives)
2. Real-time database solutions (Supabase vs Firebase vs custom with PostgreSQL)
3. State synchronization patterns for collaborative editing
Requesting approval to proceed to RESEARCH phase via humanlayer-approval extension.
Phase 2: RESEARCH via Extensions
// Gemini uses extensions for research with approval integration
await gemini.useExtension('humanlayer-approval').requestPermission({
  action_type: 'research_phase',
  description: 'Research WebSocket libraries and real-time database solutions',
  risk_level: 'low'
})
// Approved - proceed with research using web-research extension
await gemini.useExtension('web-research').searchAndAnalyze([
  'Socket.io vs ws vs uws performance comparison 2024',
  'Supabase real-time subscriptions vs PostgreSQL LISTEN/NOTIFY',
  'React real-time state management patterns WebSockets'
])
Updated GEMINI.md Context:
phase: "RESEARCH"
iteration: 2
research_findings:
  websocket_library: "Socket.io - better browser fallback and room management"
  database: "Supabase - built-in real-time subscriptions and auth"
  frontend_state: "Zustand + Supabase client for real-time subscriptions"
tech_decisions:
  backend: "Node.js + Express + Socket.io + Supabase SDK"
  frontend: "React + TypeScript + Zustand + Supabase client"
  deployment: "Vercel + Supabase hosting"
next_phase: "PLAN"
Phase 3: PLAN with Context File Synthesis
// Gemini synthesizes research into implementation plan
await gemini.updateContext({
  phase: 'PLAN',
  research_complete: true,
  planning_mode: 'detailed_specification'
})
Implementation Plan in GEMINI.md:
## Implementation Plan
### System Architecture
- Supabase backend with real-time subscriptions
- Socket.io for enhanced WebSocket features
- React frontend with optimistic updates
- Zustand for client-side state management
### Phase Implementation
Phase 1: Supabase Setup (200 lines)
- Database schema with RLS policies
- Authentication configuration
- Real-time subscription setup
Phase 2: Socket.io Integration (300 lines)
- WebSocket server with Supabase integration
- Room management and event broadcasting
- Conflict resolution for real-time updates
Phase 3: React Frontend (400 lines)
- Components with real-time hooks
- Zustand store with Supabase sync
- Optimistic UI updates with rollback
Phase 4: Integration Testing (100 lines)
- End-to-end testing setup
- Performance monitoring
- Deployment configuration
### Approval Strategy
- Context file updates after each phase
- Extension-based approval for file operations
- Batch approvals for related components
Phase 4: ACTION with Context File Updates
// Gemini generates code with context tracking
await gemini.useExtension('humanlayer-approval').requestPermission({
  action_type: 'code_generation_phase',
  description: 'Generate task management application with real-time features',
  parameters: {
    files_to_create: ['server.js', 'database.sql', 'TaskManager.tsx', 'useRealtimeTasks.ts'],
    estimated_lines: 800
  },
  risk_level: 'medium'
})
// Context file updated automatically after each artifact
Generated with Extensions Integration:
- Supabase schema with real-time subscriptions
- Socket.io server with Supabase integration
- React hooks for real-time task synchronization
- Context file continuously updated with progress
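The "optimistic UI updates with rollback" item in the Gemini plan can be illustrated without any framework. The sketch below is a plain in-memory store, assuming a rollback is simply a closure that restores the previous value; `OptimisticTaskStore` and `TaskItem` are hypothetical names, and no Supabase or Zustand API is used or implied.

```typescript
// Hypothetical in-memory task store showing optimistic update + rollback.
interface TaskItem {
  id: string;
  title: string;
  done: boolean;
}

class OptimisticTaskStore {
  private tasks = new Map<string, TaskItem>();

  constructor(initial: TaskItem[]) {
    for (const task of initial) this.tasks.set(task.id, { ...task });
  }

  // Return a copy so callers cannot mutate store state directly.
  get(id: string): TaskItem | undefined {
    const task = this.tasks.get(id);
    return task ? { ...task } : undefined;
  }

  // Apply a change locally at once and return a rollback function that
  // restores the previous value if the server later rejects the change.
  applyOptimistic(id: string, patch: Partial<Omit<TaskItem, "id">>): () => void {
    const previous = this.tasks.get(id);
    if (!previous) throw new Error(`unknown task ${id}`);
    this.tasks.set(id, { ...previous, ...patch });
    return () => {
      this.tasks.set(id, previous);
    };
  }
}

// Usage: mark a task done immediately, then undo when the write is rejected.
const taskStore = new OptimisticTaskStore([{ id: "t1", title: "Write plan", done: false }]);
const rollbackT1 = taskStore.applyOptimistic("t1", { done: true });
console.log(taskStore.get("t1")?.done); // true (shown to the user right away)
rollbackT1(); // simulate a rejected server write
console.log(taskStore.get("t1")?.done); // false (previous value restored)
```

In a real client the rollback would be called from the error path of the network request, and concurrent edits to the same task would need an explicit conflict-resolution rule, which this sketch leaves out.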
Grok CLI Implementation
Phase 1: DELIBERATION with Multi-Round Planning
# grok-session-config.yaml
session:
  model: "grok-4-latest"
  max_tool_rounds: 400
  high_speed_editing: true
deliberation_prompt: |
  MULTI-ROUND AUTONOMOUS DEVELOPMENT
  TOTAL ROUNDS: 400 available
  TASK: Build real-time task management system
  Plan round distribution:
  - DELIBERATION: 20 rounds (5% budget)
  - RESEARCH: 80 rounds (20% budget)
  - ACTION: 300 rounds (75% budget)
  Use multi-round capabilities for parallel research and high-speed code generation.
Grok's Multi-Round Deliberation:
ROUND BUDGET ALLOCATION:
- Deliberation: 20 rounds for analysis and planning
- Research: 80 rounds for parallel technology investigation
- Action: 300 rounds for high-speed implementation
MULTI-ROUND STRATEGY:
- Parallel research streams: WebSocket libs, databases, frontend patterns
- Bulk code generation using Morph Fast Apply
- Incremental integration testing with approval checkpoints
DECOMPOSITION WITH ROUND ESTIMATES:
Phase 1: Foundation (80 rounds) - Auth, models, basic API
Phase 2: Real-time (120 rounds) - WebSocket server, event handling
Phase 3: Frontend (80 rounds) - React components, real-time hooks
Phase 4: Integration (20 rounds) - Testing, optimization, deployment
APPROVAL STRATEGY:
- Batch approval for file operations (efficiency)
- Multi-round approval for complex operation sequences
- High-speed generation with approval checkpoints
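The round budget above is simple arithmetic, and it is worth checking that the per-phase estimates fit inside the ACTION share. A minimal sketch follows; `allocateRounds` is a hypothetical helper, not a Grok CLI function.

```typescript
// Hypothetical helper: split a total round budget by fractional shares.
function allocateRounds<K extends string>(
  total: number,
  shares: Record<K, number>,
): Record<K, number> {
  const phases = Object.keys(shares) as K[];
  const sum = phases.reduce((acc, phase) => acc + shares[phase], 0);
  // Shares must cover the whole budget (tolerance absorbs float rounding).
  if (Math.abs(sum - 1) > 1e-9) {
    throw new Error(`shares sum to ${sum}, expected 1`);
  }
  const allocation = {} as Record<K, number>;
  for (const phase of phases) {
    allocation[phase] = Math.round(total * shares[phase]);
  }
  return allocation;
}

// Shares taken from the deliberation prompt: 5% / 20% / 75% of 400 rounds.
const roundBudget = allocateRounds(400, { DELIBERATION: 0.05, RESEARCH: 0.2, ACTION: 0.75 });
console.log(roundBudget); // DELIBERATION 20, RESEARCH 80, ACTION 300

// Phase estimates from the decomposition: Foundation, Real-time, Frontend, Integration.
const phaseEstimates = [80, 120, 80, 20];
const actionRoundsNeeded = phaseEstimates.reduce((acc, rounds) => acc + rounds, 0);
console.log(actionRoundsNeeded <= roundBudget.ACTION); // true (300 <= 300)
```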
Phase 2: RESEARCH with Parallel Execution
// Grok executes multiple research streams in parallel
await grokSession.executeMultiRound({
  rounds: 80,
  parallelStreams: [
    {
      topic: 'WebSocket libraries',
      rounds: 25,
      approach: 'comparative_analysis'
    },
    {
      topic: 'Real-time databases',
      rounds: 30,
      approach: 'performance_benchmarks'
    },
    {
      topic: 'Frontend real-time patterns',
      rounds: 25,
      approach: 'implementation_examples'
    }
  ],
  approvalRequired: true
})
High-Speed Research Results:
{
  "rounds_used": 78,
  "research_efficiency": "97.5%",
  "findings": {
    "websocket_choice": "Socket.io with clustering for scalability",
    "database_solution": "Redis + PostgreSQL hybrid (Redis for real-time, PG for persistence)",
    "frontend_pattern": "Optimistic updates with conflict resolution"
  },
  "parallel_execution_benefit": "3x faster than sequential research"
}
Phase 3: PLAN with Multi-Round Strategy Planning
// Grok creates comprehensive implementation plan optimized for 400-round execution
await grokSession.executePlanPhase({
  researchResults: highSpeedResearchResults,
  totalRounds: 400,
  remainingRounds: 302, // 400 - 98 used (20 deliberation budget + 78 research)
  planningBudget: 50
})
Multi-Round Implementation Plan:
{
  "system_architecture": {
    "websocket_clustering": "Socket.io with Redis adapter for horizontal scaling",
    "database_hybrid": "PostgreSQL for persistence + Redis for real-time cache",
    "frontend_optimization": "React with concurrent features for high-performance updates"
  },
  "multi_round_strategy": {
    "total_rounds": 400,
    "phase_allocation": {
      "foundation": 80,
      "real_time_core": 120,
      "frontend_integration": 80,
      "performance_optimization": 20
    },
    "parallel_execution_opportunities": [
      "Database migration + Socket.io server setup",
      "Frontend component generation + API endpoint creation",
      "Testing setup + deployment configuration"
    ]
  },
  "high_speed_targets": [
    {
      "component": "database_models",
      "target_speed": "5000+ tokens/sec",
      "estimated_rounds": 15
    },
    {
      "component": "react_components",
      "target_speed": "4500+ tokens/sec",
      "estimated_rounds": 25
    }
  ],
  "approval_strategy": {
    "batch_approvals": "Group file operations by component",
    "multi_round_context": "Track approval context across 300+ rounds",
    "efficiency_target": "Maximum 5 approval cycles for entire implementation"
  }
}
Phase 4: ACTION with High-Speed Code Generation
// Grok uses Morph Fast Apply for bulk code generation
await grokSession.requestApproval({
  operation: 'high_speed_code_generation',
  details: {
    target_speed: '4500+ tokens/sec',
    files_to_generate: 12,
    estimated_total_lines: 1200,
    multi_round_sequence: [
      'Database schemas and migrations',
      'WebSocket server with clustering',
      'REST API with authentication',
      'React components with real-time hooks',
      'Integration tests and deployment config'
    ]
  },
  multi_round_context: {
    total_rounds_planned: 300,
    batch_generation: true
  }
})
// Execute high-speed generation with Morph Fast Apply
const generationResult = await grokSession.executeHighSpeedGeneration({
  morphFastApply: true,
  targetSpeed: 4500, // tokens/sec
  batchSize: 'optimal',
  approvalCheckpoints: [100, 200, 280] // rounds
})
High-Speed Generation Results:
- Speed achieved: 4,650 tokens/sec
- Files generated: 12 complete files
- Approval efficiency: 3 batch approvals vs 12 individual approvals
- Round utilization: 287/300 rounds (95.7% efficiency)
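The efficiency figures quoted for the Grok run follow directly from the round counts. The sketch below reproduces them; `utilizationPercent` and `approvalReductionPercent` are hypothetical helper names, and the 75% reduction is derived here from the 3-versus-12 approval counts rather than stated in the results.

```typescript
// Percentage of a budget actually used, rounded to one decimal place.
function utilizationPercent(used: number, budget: number): number {
  if (budget <= 0) throw new Error("budget must be positive");
  return Math.round((used / budget) * 1000) / 10;
}

// Percentage of approval requests saved by batching.
function approvalReductionPercent(individual: number, batched: number): number {
  if (individual <= 0) throw new Error("individual count must be positive");
  return Math.round((1 - batched / individual) * 1000) / 10;
}

console.log(utilizationPercent(78, 80));      // 97.5 (research rounds)
console.log(utilizationPercent(287, 300));    // 95.7 (action rounds)
console.log(approvalReductionPercent(12, 3)); // 75 (3 batch approvals instead of 12)
```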
OpenAI CLI Implementation
Phase 1: DELIBERATION with Auto-Approval
# openai-config.yaml
auto_approval:
  safe_operations: ["read_file", "analyze_code", "list_directory", "generate_docs"]
  requires_approval: ["write_file", "execute_command", "install_package"]
deliberation_prompt: |
  AUTONOMOUS DEVELOPMENT with SELECTIVE AUTO-APPROVAL
  TASK: Real-time task management system
  Use execute_operation function for all operations.
  Safe operations (read, analyze, document) will auto-approve.
  Write operations require human approval.
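The two lists in this config amount to a routing rule. A minimal sketch of that rule follows, with one added assumption that the config does not state: operations on neither list fail closed and go to a human. `routeOperation` and `RoutingDecision` are hypothetical names.

```typescript
// Operation names are copied from the config above; everything else is hypothetical.
const SAFE_OPERATIONS = ["read_file", "analyze_code", "list_directory", "generate_docs"] as const;
const APPROVAL_REQUIRED = ["write_file", "execute_command", "install_package"] as const;

type ApprovalRoute = "auto_approve" | "human_approval";

interface RoutingDecision {
  route: ApprovalRoute;
  known: boolean; // false when the operation appears on neither list
}

function routeOperation(operationType: string): RoutingDecision {
  if ((SAFE_OPERATIONS as readonly string[]).includes(operationType)) {
    return { route: "auto_approve", known: true };
  }
  // Assumption: anything not explicitly marked safe needs a human, including
  // operation types the config has never heard of.
  const known = (APPROVAL_REQUIRED as readonly string[]).includes(operationType);
  return { route: "human_approval", known };
}

console.log(routeOperation("read_file").route);     // "auto_approve"
console.log(routeOperation("write_file").route);    // "human_approval"
console.log(routeOperation("drop_database").route); // "human_approval" (unknown operation)
```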
OpenAI's Selective Approval Deliberation:
// Auto-approved operations during deliberation
await openaiSession.executeOperation({
  operation_type: 'analyze_code',
  operation_details: { directory: './project' },
  risk_assessment: 'safe',
  justification: 'Analyzing existing project structure'
})
await openaiSession.executeOperation({
  operation_type: 'read_file',
  operation_details: { path: 'package.json' },
  risk_assessment: 'safe',
  justification: 'Understanding current dependencies'
})
Deliberation with Auto-Approved Analysis:
ANALYSIS (using 15 auto-approved operations):
- Existing codebase: React + Node.js foundation already present
- Current dependencies: Express, React 18, TypeScript configured
- Missing components: WebSocket support, real-time state management
- Architecture gaps: No database integration, no authentication
DECOMPOSITION:
Phase 1: Add database models and authentication (requires approval for new files)
Phase 2: WebSocket server integration (requires approval for server modifications)
Phase 3: Real-time React components (mix of auto-approved analysis + approved writes)
Phase 4: Testing and deployment (requires approval for config changes)
AUTO-APPROVAL STRATEGY:
- Analysis and documentation: Auto-approved for speed
- Code reading and exploration: Auto-approved
- File creation and modification: Requires approval
- Package installation: Requires approval
Phase 2: RESEARCH with Heavy Auto-Approval
// OpenAI leverages auto-approved operations for rapid research
const researchOperations = [
  { type: 'read_file', path: 'docs/websocket-patterns.md', auto: true },
  { type: 'analyze_code', target: 'existing-auth-patterns', auto: true },
  { type: 'list_directory', path: 'examples/', auto: true },
  { type: 'generate_docs', topic: 'real-time-architecture-options', auto: true }
]
// Execute 25+ auto-approved operations for comprehensive research
const researchResults = await openaiSession.batchAutoApprovedOperations(researchOperations)
Efficient Research with Auto-Approval:
{
  "auto_approved_operations": 28,
  "manual_approvals_required": 3,
  "research_efficiency": "90% operations auto-approved",
  "findings": {
    "websocket_solution": "Socket.io integration with existing Express server",
    "database_choice": "SQLite + real-time WebSocket bridge (simple deployment)",
    "frontend_integration": "Custom React hooks with Socket.io client"
  },
  "approval_requests": [
    "External API documentation fetch",
    "Performance benchmark data collection",
    "Community examples repository clone"
  ]
}
Phase 3: PLAN with Intelligent Batch Planning
// OpenAI creates implementation plan optimized for selective auto-approval
await openaiSession.executePlanPhase({
  researchResults, // returned by batchAutoApprovedOperations in the research phase
  planningMode: 'intelligent_batch_optimization',
  contextManagement: 'stateless_reconstruction'
})
Intelligent Implementation Plan:
{
  "system_architecture": {
    "websocket_integration": "Socket.io with existing Express server",
    "database_strategy": "SQLite with real-time WebSocket synchronization",
    "frontend_optimization": "React hooks with optimistic updates and auto-approved analysis"
  },
  "batch_approval_strategy": {
    "auto_approved_operations": [
      "read_file", "analyze_code", "generate_docs", "list_directory"
    ],
    "batch_approval_groups": [
      {
        "group": "database_setup",
        "operations": ["create schema.sql", "create models.js", "create migrations/"],
        "justification": "Database foundation setup"
      },
      {
        "group": "websocket_server",
        "operations": ["create websocket-server.js", "modify server.js", "create socket-handlers/"],
        "justification": "Real-time server implementation"
      }
    ]
  },
  "efficiency_optimization": {
    "auto_approval_target": "70% of operations",
    "batch_reduction": "50% fewer approval cycles",
    "context_management": "Stateless session reconstruction with 95% accuracy"
  },
  "approval_checkpoints": [
    "Database schema and models approval",
    "WebSocket server implementation approval",
    "Frontend components batch approval",
    "Integration testing and deployment approval"
  ]
}
Phase 4: ACTION with Batch Approval Optimization
// OpenAI optimizes approval workflow with batching
await openaiSession.batchOperations({
  operations: [
    {
      operation_type: 'write_file',
      operation_details: { path: 'server/websocket.js', content: '...' }
    },
    {
      operation_type: 'write_file',
      operation_details: { path: 'client/useRealtime.js', content: '...' }
    },
    {
      operation_type: 'write_file',
      operation_details: { path: 'database/schema.sql', content: '...' }
    }
  ],
  batch_justification: 'Creating core real-time infrastructure files for task management'
})
// Mix auto-approved analysis with batch-approved creation
Optimized Implementation Results:
- Batch approvals: 4 approval cycles vs 12 individual approvals
- Auto-approved operations: 67% of all operations
- Implementation efficiency: 40% reduction in approval overhead
- Mixed workflow: Safe operations auto-approved, risky operations batched
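Batching reduces approval cycles because one request covers every operation in a group. The sketch below groups the operations named in the plan's `batch_approval_groups` and counts the resulting requests; `groupForApproval` and `PendingOperation` are hypothetical names, not part of any OpenAI tooling.

```typescript
// Hypothetical shape for a write operation awaiting approval.
interface PendingOperation {
  group: string;  // component the operation belongs to
  action: string; // human-readable description shown to the approver
}

// Collect operations by group so each group needs a single approval request.
function groupForApproval(operations: PendingOperation[]): Map<string, string[]> {
  const batches = new Map<string, string[]>();
  for (const operation of operations) {
    const batch = batches.get(operation.group) ?? [];
    batch.push(operation.action);
    batches.set(operation.group, batch);
  }
  return batches;
}

// Operations copied from the two batch approval groups in the plan.
const pendingOperations: PendingOperation[] = [
  { group: "database_setup", action: "create schema.sql" },
  { group: "database_setup", action: "create models.js" },
  { group: "database_setup", action: "create migrations/" },
  { group: "websocket_server", action: "create websocket-server.js" },
  { group: "websocket_server", action: "modify server.js" },
  { group: "websocket_server", action: "create socket-handlers/" },
];

const approvalBatches = groupForApproval(pendingOperations);
console.log(`${pendingOperations.length} operations -> ${approvalBatches.size} approval requests`);
// prints "6 operations -> 2 approval requests"
```

Grouping by component keeps each approval request reviewable as a unit; grouping by risk level instead would be an equally plausible choice.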
Comparison Summary
| Aspect | Claude Code | Gemini CLI | Grok CLI | OpenAI CLI |
|---|---|---|---|---|
| Deliberation | Native reasoning | Context file driven | Multi-round planning | Auto-approval analysis |
| Research | MCP tool integration | Extension-based | Parallel execution | Heavy auto-approval |
| Plan | Comprehensive specification | Context file synthesis | Multi-round strategy | Intelligent batch planning |
| Action (Code) | Artifact management | Extension integration | High-speed bulk generation | Batch approval optimization |
| Approval Efficiency | Standard workflow | Context-aware | Multi-round batching | Selective auto-approval |
| Unique Strengths | Built-in methodology | Enterprise features | Speed + parallel execution | Intelligence + efficiency |
| Implementation Time | Baseline (current) | 8-12 weeks | 10-14 weeks | 12-16 weeks |
Reusable Components Across All LLMs
1. Autonomous System Prompt Template
- Four-phase workflow definition (DELIBERATION→RESEARCH→PLAN→ACTION)
- Artifact management rules
- State manifest requirements
- Error ritual process
2. Approval Integration Pattern
interface UniversalApproval {
  requestPermission(operation: ToolOperation): Promise<ApprovalResult>
  batchOperations(operations: ToolOperation[]): Promise<BatchApprovalResult>
  autoApproveOperation(operation: SafeOperation): Promise<OperationResult>
}
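The interface leaves `ToolOperation`, `ApprovalResult`, `BatchApprovalResult`, `SafeOperation`, and `OperationResult` undefined. The sketch below fills them with assumed shapes and adds a policy-driven stand-in, `PolicyApproval`, that approves from an allow list instead of asking a human. All of these shapes and the class itself are hypothetical; a real provider adapter would route the request to an approver.

```typescript
// Assumed shapes for the types the interface references but does not define.
interface ToolOperation { tool: string; input: Record<string, unknown> }
interface SafeOperation extends ToolOperation { safe: true }
interface ApprovalResult { approved: boolean; reason: string }
interface BatchApprovalResult { approved: boolean; results: ApprovalResult[] }
interface OperationResult { executed: boolean }

interface UniversalApproval {
  requestPermission(operation: ToolOperation): Promise<ApprovalResult>
  batchOperations(operations: ToolOperation[]): Promise<BatchApprovalResult>
  autoApproveOperation(operation: SafeOperation): Promise<OperationResult>
}

// Stand-in approver driven by an allow list rather than a human decision.
class PolicyApproval implements UniversalApproval {
  private readonly allowedTools: ReadonlySet<string>;

  constructor(allowedTools: Iterable<string>) {
    this.allowedTools = new Set(allowedTools);
  }

  // Synchronous core so the policy can be inspected without awaiting.
  decide(operation: ToolOperation): ApprovalResult {
    const approved = this.allowedTools.has(operation.tool);
    const reason = approved ? "tool is on the allow list" : "tool is not on the allow list";
    return { approved, reason };
  }

  async requestPermission(operation: ToolOperation): Promise<ApprovalResult> {
    return this.decide(operation);
  }

  async batchOperations(operations: ToolOperation[]): Promise<BatchApprovalResult> {
    const results = operations.map((operation) => this.decide(operation));
    // The batch is approved only if every operation in it is approved.
    return { approved: results.every((result) => result.approved), results };
  }

  async autoApproveOperation(operation: SafeOperation): Promise<OperationResult> {
    return { executed: operation.safe };
  }
}

// Usage: reads are allowed by policy, writes are not.
const policyApprover = new PolicyApproval(["read_file", "list_directory"]);
console.log(policyApprover.decide({ tool: "read_file", input: { path: "package.json" } }).approved);  // true
console.log(policyApprover.decide({ tool: "write_file", input: { path: "server.js" } }).approved);    // false
```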
3. State Management Pattern
interface WorkflowState {
  phase: 'DELIBERATION' | 'RESEARCH' | 'PLAN' | 'ACTION'
  iteration: number
  activeArtifacts: string[]
  completedPhases: string[]
  implementationPlan?: ImplementationPlan
  nextSteps: string[]
  providerSpecific: Record<string, any>
}
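A state object like this is mostly useful for two things: moving to the next phase and rendering the "CURRENT STATE" manifest required in every response. The sketch below shows both on a trimmed copy of the interface. `ManifestState`, `advanceWorkflow`, and `renderManifest` are hypothetical names; the manifest's Integration line is omitted because the interface has no matching field, and incrementing `iteration` on each transition is an assumption taken from the Gemini context-file example.

```typescript
type WorkflowPhase = "DELIBERATION" | "RESEARCH" | "PLAN" | "ACTION";

// Trimmed, hypothetical variant of WorkflowState without provider-specific fields.
interface ManifestState {
  phase: WorkflowPhase;
  iteration: number;
  activeArtifacts: string[];
  completedPhases: WorkflowPhase[];
  nextSteps: string[];
}

const PHASE_ORDER: readonly WorkflowPhase[] = ["DELIBERATION", "RESEARCH", "PLAN", "ACTION"];

// Return a new state moved to the following phase; ACTION is terminal.
function advanceWorkflow(state: ManifestState): ManifestState {
  const next = PHASE_ORDER[PHASE_ORDER.indexOf(state.phase) + 1];
  if (next === undefined) throw new Error("ACTION is the final phase");
  return {
    ...state,
    phase: next,
    iteration: state.iteration + 1,
    completedPhases: [...state.completedPhases, state.phase],
  };
}

// Render the manifest block in the layout the system prompt asks for.
function renderManifest(state: ManifestState): string {
  return [
    "CURRENT STATE:",
    `- Phase: ${state.phase}`,
    `- Active artifact: ${state.activeArtifacts.join(", ") || "none"}`,
    `- Iteration: ${state.iteration}`,
    `- Next: ${state.nextSteps.join("; ") || "none"}`,
  ].join("\n");
}

const initialState: ManifestState = {
  phase: "DELIBERATION",
  iteration: 1,
  activeArtifacts: [],
  completedPhases: [],
  nextSteps: ["research WebSocket libraries"],
};
console.log(renderManifest(advanceWorkflow(initialState)));
```

Returning a new object rather than mutating the input keeps earlier states available, which helps when a provider reconstructs its session from saved context.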
4. Artifact Generation Rules
- Size thresholds (>20 lines / >1500 chars)
- One artifact per response
- Update vs rewrite decision tree
- Integration point documentation
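The size thresholds and the update limit are mechanical enough to express as two small checks. The sketch below is one reading of those rules: it assumes "max 4 updates before mandatory rewrite" means the change after the fourth update must be a rewrite, and it counts characters as UTF-16 code units. `shouldBeArtifact` and `nextEditKind` are hypothetical names.

```typescript
const MAX_INLINE_LINES = 20;
const MAX_INLINE_CHARS = 1500;
const MAX_UPDATES_BEFORE_REWRITE = 4;

// Content becomes an artifact when it exceeds either threshold.
// Note: `length` counts UTF-16 code units, not user-perceived characters.
function shouldBeArtifact(content: string): boolean {
  const lineCount = content.split("\n").length;
  return lineCount > MAX_INLINE_LINES || content.length > MAX_INLINE_CHARS;
}

// Assumed reading of the rule: once four updates have been applied,
// the next change must be a full rewrite.
function nextEditKind(updatesSoFar: number): "update" | "rewrite" {
  return updatesSoFar >= MAX_UPDATES_BEFORE_REWRITE ? "rewrite" : "update";
}

const twentyOneLines = Array(21).fill("x").join("\n");
console.log(shouldBeArtifact(twentyOneLines)); // true (21 lines > 20)
console.log(shouldBeArtifact("short reply"));  // false
console.log(nextEditKind(3));                  // "update"
console.log(nextEditKind(4));                  // "rewrite"
```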
This foundation enables a consistent autonomous RESEARCH→PLAN→ACTION workflow across all LLM providers while still using each platform's particular strengths. The PLAN phase is the critical bridge: it turns research findings into an implementation specification, so the system design is complete before code generation begins.