TaskExecutor Refactoring Project Plan

Document Version: 1.0
Last Updated: 2025-11-23
Document Owner: Hal Casteel, CEO/CTO, AZ1.AI INC.
Project Type: Strategic Refactoring (Foundation for Phase 1 Autonomous Agents)
Status: PLANNING - Ready for Go/No-Go Decision

Executive Summary

This project plan sets out the strategy for refactoring TaskExecutor to call the llm_abstractions layer directly instead of spawning subprocess-based execute_*.py scripts. The refactoring removes per-task process overhead, enables async orchestration, and lays the foundation for the Phase 1 Message Bus autonomous agents.

Project Mission: Move TaskExecutor from subprocess-based execution to direct calls through the LLM abstraction layer, achieving a 30-50% performance improvement while maintaining 100% backward compatibility and enabling future autonomous agent communication.

Current Status Overview

Metric	Current State	Target
Project Start Date	TBD (pending go/no-go)	-
Estimated Duration	3-4 weeks	80 engineering hours
Current Architecture	Subprocess-based (`execute_*.py`)	Direct LLM abstraction
Performance Baseline	Subprocess overhead + JSON I/O	30-50% improvement
Backward Compatibility	N/A	100% (dual-mode executor)
LLM Providers Ready	Gemini (placeholder)	Anthropic, OpenAI, Gemini
Foundation Status	`BaseLlm` abstract class exists	Factory + implementations
Test Coverage	0% for llm_abstractions	90%+ for new code
Next Milestone	Go/No-Go Decision	TBD

Key Deliverables

Deliverable	Description	Status
LlmFactory	Dynamic provider loading with AgentType mapping	📅 Planned
AnthropicLlm	Official Anthropic SDK implementation	📅 Planned
OpenAILlm	AsyncOpenAI SDK implementation	📅 Planned
GeminiLlm	Complete Gemini implementation (replace placeholder)	📅 Planned
Dual-Mode Executor	`use_direct_llm` flag with graceful fallback	📅 Planned
Migration Guide	Script deprecation timeline and adoption guide	📅 Planned
Test Suite	90%+ coverage for new code	📅 Planned
Performance Benchmarks	Validation of 30%+ improvement	📅 Planned

Budget & Investment

Phase	Budget	Hours	Status
Phase 1A: Foundation	$2,000	16 hours	📅 Planned
Phase 1B: Dual-Mode Executor	$2,000	16 hours	📅 Planned
Phase 2A: OpenAI Implementation	$2,000	16 hours	📅 Planned
Phase 2B: Gemini Implementation	$2,000	16 hours	📅 Planned
Phase 3: Script Deprecation	$2,000	16 hours	📅 Planned
Total Investment	$10,000	80 hours	Through Week 4

Engineering Rate: $125/hour (senior Python developer)
ROI: Performance improvements reduce LLM orchestration latency by 30-50%, enabling future autonomous agent workflows.

Project Overview
Strategic Context & Rationale
User Feedback: Async Executor Alignment ← NEW
Technical Architecture
Implementation Phases
Multi-Agent Orchestration Strategy
Quality Gates & Success Criteria
Risk Management
Backward Compatibility Strategy
Testing Strategy
Performance Benchmarks
Migration & Deprecation Plan
Agent Assignment Matrix
Budget Breakdown
Timeline & Dependencies
Monitoring & Metrics
Go/No-Go Decision Criteria
Architecture Decision Record ← NEW

Project Overview

Purpose

The TaskExecutor Refactoring project modernizes the task execution architecture by replacing subprocess-based execute_*.py scripts with direct llm_abstractions layer integration. This refactoring is foundational for Phase 1 autonomous agent communication via Message Bus.

Strategic Context

Current Architecture (Subprocess-Based)

TaskExecutor → subprocess → execute_claude.py → JSON I/O → Display Command
TaskExecutor → subprocess → execute_gpt.py → OpenAI SDK → API Call → Result
TaskExecutor → subprocess → execute_gemini.py → Placeholder → Result

Problems with Current Architecture

❌ Process Overhead: Each task spawns subprocess (100-200ms latency)
❌ No Async Support: Synchronous subprocess.run() blocks orchestration
❌ Duplicate Logic: Task validation + prompt generation duplicated across scripts
❌ Hard to Test: Subprocess testing requires integration tests
❌ Not Future-Ready: Phase 1 Message Bus requires async LLM calls
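
The process-overhead figure above can be checked locally with a small micro-benchmark. The following is a minimal sketch, not part of the existing codebase: the function names `time_subprocess_spawn` and `time_in_process_call` are hypothetical, and the numbers it prints depend on the machine, so they may differ from the 100-200ms quoted here.

```python
import subprocess
import sys
import time


def time_subprocess_spawn(runs: int = 5) -> float:
    """Average wall-clock seconds to start and exit a bare Python interpreter."""
    start = time.perf_counter()
    for _ in range(runs):
        # "-c pass" does no work, so this measures process startup only
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


def time_in_process_call(runs: int = 5) -> float:
    """Average wall-clock seconds for an equivalent in-process no-op call."""
    def noop() -> None:
        pass

    start = time.perf_counter()
    for _ in range(runs):
        noop()
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    spawn = time_subprocess_spawn()
    inproc = time_in_process_call()
    print(f"subprocess spawn: {spawn * 1000:.1f} ms per call")
    print(f"in-process call:  {inproc * 1000:.4f} ms per call")
```

This isolates interpreter startup only; the real execute_*.py scripts also pay for SDK imports and JSON I/O, which would need to be measured separately.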

Target Architecture (Direct LLM Abstraction)

TaskExecutor → LlmFactory.get_provider(agent_type) → BaseLlm.generate_content_async()
              ↓
         AnthropicLlm (official SDK) → Claude API
         OpenAILlm (AsyncOpenAI SDK) → GPT-4 API
         GeminiLlm (google-generativeai) → Gemini API

Benefits of Target Architecture

✅ 30-50% Performance Improvement: Eliminate subprocess overhead
✅ Async Orchestration: Enable concurrent task execution
✅ Single Source of Truth: Unified LLM abstraction layer
✅ Easier Testing: Mock BaseLlm implementations
✅ Future-Ready: Foundation for Phase 1 Message Bus autonomous agents

Project Objectives

Primary Objectives

Implement LlmFactory with dynamic provider loading based on AgentType
Complete BaseLlm Implementations for Anthropic (Claude), OpenAI (GPT), Google (Gemini)
Add Dual-Mode Executor with use_direct_llm feature flag and graceful fallback
Achieve 30-50% Performance Improvement over subprocess approach
Maintain 100% Backward Compatibility during migration period

Secondary Objectives

Deprecate execute_*.py Scripts gracefully over 6 months
Achieve 90%+ Test Coverage for new llm_abstractions code
Document Migration Path for future CODITECT users
Establish Foundation for Phase 1 Message Bus autonomous agents

Success Criteria

Technical Success

✅ All LLM providers (Anthropic, OpenAI, Gemini) working via LlmFactory
✅ Performance benchmarks show 30%+ improvement over subprocess approach
✅ Test coverage ≥90% for llm_abstractions module
✅ Zero breaking changes (dual-mode ensures rollback safety)
✅ Async/sync compatibility layer functional

Process Success

✅ All quality gates passed (see Section 6)
✅ Migration guide published with script deprecation timeline
✅ Deprecation warnings added to execute_*.py scripts
✅ Documentation updated (architecture, API, developer guides)

Business Success

✅ Timeline: Complete in ≤4 weeks (80 engineering hours)
✅ Cost: ≤$10,000 (within allocated budget)
✅ Risk: Zero production incidents during migration
✅ Foundation: Phase 1 Message Bus can build on this architecture

Strategic Context & Rationale

Why This Refactoring Matters

1. Foundation for Phase 1 Autonomous Agents

The current Phase 1 roadmap (see docs/03-project-planning/project-plan.md) requires:

Message Bus (RabbitMQ): Inter-agent task passing with priority queues
Agent Discovery Service (Redis): Capability-based agent discovery
Task Queue Manager (Redis + RQ): Persistent queue with dependency resolution

All of these require async LLM calls. The current subprocess-based architecture blocks async orchestration, making autonomous agent-to-agent communication impossible without this refactoring.

2. Performance Bottleneck Elimination

Current subprocess overhead profile:

```yaml
Task submission: ~5ms
Subprocess spawn: ~100-200ms  ← BOTTLENECK
JSON serialization: ~10-20ms
Script execution: ~50-100ms
Result parsing: ~10-20ms
---
Total overhead: ~180-350ms per task
```

With direct LLM abstraction:

```yaml
Task submission: ~5ms
LlmFactory lookup: ~1ms  ← FAST
BaseLlm.generate_content_async(): ~50-100ms
Result parsing: ~10-20ms
---
Total overhead: ~66-126ms per task
```

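Net improvement: by these figures, per-task overhead falls from ~180-350ms to ~66-126ms, a reduction of roughly 60%; the project target is stated more conservatively as a 30-50% reduction in orchestration latency.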

3. Code Quality & Maintainability

Current architecture has:

Duplicate logic: Task validation repeated in every execute_*.py script
Inconsistent error handling: Each script implements own retry logic
Hard to test: Subprocess testing requires complex mocking
No type safety: JSON I/O loses Python type information

Target architecture provides:

Single source of truth: BaseLlm abstract class enforces interface
Centralized error handling: Factory manages provider availability
Easy testing: Mock BaseLlm implementations for unit tests
Full type safety: Python type hints throughout
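
The testing benefit can be illustrated with a test double that implements the same async interface. This is a sketch under assumptions: the `BaseLlm` class below is a stand-in whose single method mirrors the `generate_content_async(messages, **kwargs)` signature used in this plan, and `FakeLlm` and `summarize` are hypothetical names introduced only for the example.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class BaseLlm(ABC):
    """Stand-in for llm_abstractions.base_llm.BaseLlm (assumed shape)."""

    @abstractmethod
    async def generate_content_async(
        self, messages: List[Dict[str, str]], **kwargs: Any
    ) -> str:
        ...


class FakeLlm(BaseLlm):
    """Hypothetical test double: returns a canned reply and records calls."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls: List[List[Dict[str, str]]] = []

    async def generate_content_async(
        self, messages: List[Dict[str, str]], **kwargs: Any
    ) -> str:
        self.calls.append(messages)
        return self.reply


async def summarize(llm: BaseLlm, text: str) -> str:
    """Example code under test that depends only on the BaseLlm interface."""
    messages = [{"role": "user", "content": f"Summarize: {text}"}]
    return await llm.generate_content_async(messages)


if __name__ == "__main__":
    fake = FakeLlm(reply="short summary")
    print(asyncio.run(summarize(fake, "a long document")))
```

Because the code under test sees only the abstract interface, no subprocess, network call, or API key is needed in unit tests.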

4. Alignment with Industry Best Practices

Modern LLM orchestration frameworks (LangGraph, CrewAI, AutoGen) use:

✅ Direct SDK integration (not subprocess scripts)
✅ Async/await for concurrent execution
✅ Factory pattern for provider abstraction
✅ Unified error handling and retry logic

This refactoring aligns CODITECT with these industry standards.

Alternative Approaches Considered

Approach	Pros	Cons	Decision
Keep subprocess scripts	No code changes	Performance bottleneck remains, blocks Phase 1	❌ REJECTED
Refactor scripts in-place	Simpler migration	Still subprocess overhead, not async-friendly	❌ REJECTED
Direct LLM abstraction (chosen)	Performance + async + Phase 1 foundation	Requires refactoring effort	✅ SELECTED
Third-party framework (LangChain)	Community support	Heavy dependencies, not CODITECT-aligned	❌ REJECTED

Dependencies on Other Work

Prerequisites (Already Complete)

✅ BaseLlm abstract class exists (llm_abstractions/base_llm.py)
✅ Gemini placeholder implementation exists (needs completion)
✅ AgentRegistry with AgentType enum defined
✅ TaskExecutor with execution modes (interactive, API, hybrid)

Enables Future Work (Phase 1)

🔜 Message Bus Implementation: Async LLM calls required for agent-to-agent communication
🔜 Agent Discovery Service: Dynamic provider selection via factory pattern
🔜 Task Queue Manager: Concurrent task execution with async/await

No Blocking Dependencies: This refactoring can proceed immediately after go/no-go decision.

User Feedback: Async Executor Alignment

User Request

"I agree with your analysis. Please explicitly detail making TaskExecutor.execute async to align seamlessly with ProjectOrchestrator.execute_task's async nature."

Strategic Decision: Make TaskExecutor.execute() Async

APPROVED: Based on user feedback, we will make TaskExecutor.execute() an async method as part of this refactoring project.

Rationale

Seamless Integration: ProjectOrchestrator.execute_task() will become async, enabling end-to-end async flow without asyncio.run() wrappers.
Eliminates Async/Sync Boundary: Current architecture uses asyncio.run() in _execute_via_llm(), which:
- Creates new event loop for each task (~10-20ms overhead)
- Blocks async orchestration
- Prevents parallel task execution
- Makes Phase 1 Message Bus impossible

Enables Parallel Execution: Async executor enables concurrent LLM API calls:

# Current (Sequential): 6 seconds for 3 tasks
Task 1: |---LLM API (2s)---|
Task 2:                     |---LLM API (2s)---|
Task 3:                                         |---LLM API (2s)---|

# Target (Parallel): 2 seconds for 3 tasks
Task 1: |---LLM API (2s)---|
Task 2: |---LLM API (2s)---|
Task 3: |---LLM API (2s)---|

Phase 1 Foundation: Message Bus autonomous agents require an async executor for agent-to-agent coordination.
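
The sequential versus parallel timelines above can be reproduced with `asyncio.gather`. The sketch below uses a simulated call (`fake_llm_call`, a hypothetical name) in which `asyncio.sleep` stands in for the LLM API round trip; real providers would add rate limits and error handling that are not modeled here.

```python
import asyncio
import time
from typing import List


async def fake_llm_call(task_id: int, latency: float) -> str:
    """Simulated provider call; asyncio.sleep stands in for network I/O."""
    await asyncio.sleep(latency)
    return f"task-{task_id} done"


async def run_sequential(n: int, latency: float) -> List[str]:
    # Each call is awaited before the next one starts
    return [await fake_llm_call(i, latency) for i in range(n)]


async def run_concurrent(n: int, latency: float) -> List[str]:
    # gather() schedules all calls at once and preserves input order
    return list(await asyncio.gather(*(fake_llm_call(i, latency) for i in range(n))))


def timed(coro) -> tuple:
    """Run a coroutine to completion and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    _, t_seq = timed(run_sequential(3, 0.2))
    _, t_con = timed(run_concurrent(3, 0.2))
    print(f"sequential: {t_seq:.2f}s, concurrent: {t_con:.2f}s")
```

With three calls of equal latency, the concurrent run takes about as long as a single call, which is the 3x effect the diagram describes.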

Method Signature Changes

Current (Sync)

```python
def execute(
    self,
    task: AgentTask,
    agent: Optional[str] = None,
    mode: Optional[str] = None
) -> ExecutionResult:
    """Execute a single task using specified agent."""
    # Synchronous execution with asyncio.run() wrapper
    ...
```

Target (Async)

```python
async def execute(
    self,
    task: AgentTask,
    agent: Optional[str] = None,
    mode: Optional[str] = None
) -> ExecutionResult:
    """
    Execute a single task using specified agent.

    Note:
        This method is async to enable concurrent task execution
        and seamless integration with async LLM providers.
    """
    # Direct async execution, no wrappers needed
    ...
```

Cascading Changes

TaskExecutor._execute_via_llm() → async
TaskExecutor._execute_api() → async
ProjectOrchestrator.execute_task() → async
All callers → use await executor.execute()
All tests → migrate to pytest-asyncio
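
Callers that cannot be migrated to `await` right away need a bridge from sync code into the async executor. The following is a minimal sketch of one possible compatibility shim, not a confirmed part of the plan: `execute_sync` is a hypothetical helper, and the `TaskExecutor` and `ExecutionResult` classes here are simplified stubs standing in for the real ones.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ExecutionResult:
    """Simplified stub of the real ExecutionResult."""
    status: str
    output: str


class TaskExecutor:
    """Simplified stub: only the async execute() entry point is modeled."""

    async def execute(self, task: str) -> ExecutionResult:
        await asyncio.sleep(0)  # placeholder for the awaited LLM call
        return ExecutionResult(status="success", output=f"done: {task}")


def execute_sync(executor: TaskExecutor, task: str) -> ExecutionResult:
    """Hypothetical bridge for legacy synchronous callers."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread, so starting one is safe
        return asyncio.run(executor.execute(task))
    # asyncio.run() cannot be nested inside a running loop
    raise RuntimeError(
        "execute_sync() called from async code; use 'await executor.execute(task)'"
    )


if __name__ == "__main__":
    print(execute_sync(TaskExecutor(), "T-001"))
```

The shim fails loudly when called from async code rather than silently blocking the event loop, which keeps the async/sync boundary at the outermost entry point only.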

Performance Impact

Baseline (Current)

Single task overhead: 180-350ms (subprocess) → 66-126ms (direct LLM)
Improvement: 30-50%

With Async (Target)

Single task overhead: 66-126ms (same)
Parallel tasks (3): 6s (sequential) → 2s (concurrent)
Improvement: 30-50% (single) + 3x (parallel)

Budget Impact

Original Phase 1B: 16 hours, $2,000
Updated Phase 1B (Async): 22 hours, $2,750
Delta: +6 hours, +$750

Total Project Budget

Original: $10,000 (80 hours)
Updated: $10,750 (86 hours)
Delta: +$750 (7.5% increase)

Justification: The $750 investment enables:

End-to-end async flow (eliminates event loop overhead)
Parallel task execution (3x speedup)
Phase 1 Message Bus foundation (enables $100K+ autonomous agent implementation)
ROI: 10x+

Timeline Impact

Original Timeline: 4 weeks (80 hours)
Updated Timeline: 4.5 weeks (86 hours)
Delta: +3 days (concentrated in Phase 1B)

Week 1 (Updated)

Days 1-2: Phase 1A (Foundation) - 16 hours
Days 3-5.5: Phase 1B (Async Executor) - 22 hours

Weeks 2-4: No changes (providers already async-compatible)

Architecture Decision Record

See: ADR-001: Async TaskExecutor Refactoring

This architectural decision is formally documented in ADR-001, which provides:

Complete analysis of alternatives considered
Detailed consequences (positive and negative)
Implementation validation checklist
Rollback procedures
Cross-references to related documents

Key ADR Sections

Decision Outcome: Make TaskExecutor.execute() async (7 methods total)
Rationale: User alignment + Phase 1 foundation + 3x performance improvement
Risks: Breaking change, test migration, +$750 budget
Mitigation: Migration guide, rollback plan, comprehensive testing

For complete details, see:

docs/ASYNC-EXECUTOR-STRATEGIC-PLAN.md (62KB comprehensive analysis)
docs/02-architecture/adrs/ADR-001-async-task-executor-refactoring.md (formal ADR)

Decision Status: ✅ APPROVED (based on user feedback)
Implementation Phase: Phase 1B (Week 1, Days 3-5.5)
Budget Approved: $10,750 (up from $10,000)
Timeline Approved: 4.5 weeks (up from 4 weeks)

Technical Architecture

Current Architecture (Subprocess-Based)

TaskExecutor._execute_api()
    ↓
_execute_via_script()
    ↓
subprocess.run([sys.executable, "execute_gpt.py"], input=json.dumps(task))
    ↓
execute_gpt.py
    ├── Read task from stdin (JSON)
    ├── Validate task spec
    ├── Import OpenAI SDK
    ├── Call GPT-4 API
    └── Output result to stdout (JSON)
    ↓
Parse subprocess output
    ↓
Return ExecutionResult

Problems

Each task spawns new Python process (~100-200ms overhead)
JSON serialization/deserialization overhead
No async support (subprocess.run() is synchronous)
Error handling scattered across multiple scripts

Target Architecture (Direct LLM Abstraction)

TaskExecutor._execute_via_llm()
    ↓
LlmFactory.get_provider(agent_type)
    ├── AgentType.ANTHROPIC_CLAUDE → AnthropicLlm
    ├── AgentType.OPENAI_GPT → OpenAILlm
    └── AgentType.GOOGLE_GEMINI → GeminiLlm
    ↓
provider.generate_content_async(messages, **kwargs)
    ↓
Return ExecutionResult

Benefits

Direct in-process LLM call (no subprocess overhead)
Native Python objects (no JSON serialization)
Async/await support for concurrent execution
Centralized error handling in factory + base class

Component Design

1. LlmFactory (NEW)

File: llm_abstractions/factory.py

Responsibilities

Map AgentType enum to BaseLlm implementations
Dynamic provider loading (import only when needed)
Provider availability checking (API keys, dependencies)
Error handling for missing providers

Interface

class LlmFactory:
    """Factory for creating LLM provider instances."""

    @staticmethod
    def get_provider(
        agent_type: AgentType,
        model: Optional[str] = None,
        **kwargs: Any
    ) -> BaseLlm:
        """
        Get LLM provider instance for agent type.

        Args:
            agent_type: Type of LLM (Claude, GPT, Gemini, etc.)
            model: Specific model name (optional)
            **kwargs: Additional provider configuration

        Returns:
            BaseLlm instance configured for agent type

        Raises:
            ValueError: If agent type not supported
            ImportError: If provider dependencies not installed
            RuntimeError: If API key not configured
        """
        pass

    @staticmethod
    def list_available_providers() -> List[AgentType]:
        """List LLM providers available (API keys + dependencies)."""
        pass

    @staticmethod
    def is_provider_available(agent_type: AgentType) -> bool:
        """Check if provider is available for use."""
        pass

Implementation Strategy

import importlib
import os

# Provider registry (lazy loading)
_PROVIDER_REGISTRY = {
    AgentType.ANTHROPIC_CLAUDE: {
        "class": "AnthropicLlm",
        "module": "llm_abstractions.anthropic",
        "env_var": "ANTHROPIC_API_KEY",
        "dependencies": ["anthropic"],
    },
    AgentType.OPENAI_GPT: {
        "class": "OpenAILlm",
        "module": "llm_abstractions.openai_llm",  # matches llm_abstractions/openai_llm.py
        "env_var": "OPENAI_API_KEY",
        "dependencies": ["openai"],
    },
    AgentType.GOOGLE_GEMINI: {
        "class": "GeminiLlm",  # Update existing Gemini class
        "module": "llm_abstractions.gemini",
        "env_var": "GOOGLE_API_KEY",
        "dependencies": ["google-generativeai"],
    },
}

def get_provider(agent_type, model=None, **kwargs):
    # 1. Check provider registry
    if agent_type not in _PROVIDER_REGISTRY:
        raise ValueError(f"Unsupported agent type: {agent_type}")

    provider_config = _PROVIDER_REGISTRY[agent_type]

    # 2. Check API key
    api_key = os.getenv(provider_config["env_var"])
    if not api_key:
        raise RuntimeError(
            f"{provider_config['env_var']} environment variable not set"
        )

    # 3. Dynamic import
    module = importlib.import_module(provider_config["module"])
    provider_class = getattr(module, provider_config["class"])

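    # 4. Instantiate provider; pass model only when given so the
    #    provider's own default model is not overridden with None
    if model is not None:
        kwargs["model"] = model
    return provider_class(api_key=api_key, **kwargs)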
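
The interface also lists `is_provider_available()` and `list_available_providers()`, which the strategy above does not spell out. One way they might work is sketched below, under assumptions: the `AgentType` enum is a stand-in, the registry is a trimmed hypothetical copy, and the `import_names` key (importable module names rather than pip package names) is introduced only for this example.

```python
import importlib.util
import os
from enum import Enum
from typing import List, Mapping, Optional


class AgentType(Enum):
    """Stand-in for the project's AgentType enum (members assumed)."""
    ANTHROPIC_CLAUDE = "anthropic-claude"
    OPENAI_GPT = "openai-gpt"


# Hypothetical registry: "import_names" holds importable module names,
# which can differ from pip package names.
_REGISTRY = {
    AgentType.ANTHROPIC_CLAUDE: {"env_var": "ANTHROPIC_API_KEY", "import_names": ["anthropic"]},
    AgentType.OPENAI_GPT: {"env_var": "OPENAI_API_KEY", "import_names": ["openai"]},
}


def module_installed(name: str) -> bool:
    """True if `name` can be located without importing the module itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package of a dotted name is missing
        return False


def is_provider_available(
    agent_type: AgentType, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Check API key and dependencies without instantiating the provider."""
    env = os.environ if env is None else env
    config = _REGISTRY.get(agent_type)
    if config is None or not env.get(config["env_var"]):
        return False
    return all(module_installed(name) for name in config["import_names"])


def list_available_providers(
    env: Optional[Mapping[str, str]] = None
) -> List[AgentType]:
    """Return every registered provider that passes the availability check."""
    return [t for t in _REGISTRY if is_provider_available(t, env)]
```

Accepting `env` as a parameter keeps the check testable without touching real environment variables.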

2. AnthropicLlm (NEW)

File: llm_abstractions/anthropic.py

Responsibilities

Implement BaseLlm interface for Anthropic Claude
Use official anthropic Python SDK (v0.39+)
Support async/await with AsyncAnthropic
Handle Claude-specific features (system prompts, tool use, etc.)

Interface

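import os
from typing import Any, Dict, List, Optional

from anthropic import AsyncAnthropic

from llm_abstractions.base_llm import BaseLlm


class AnthropicLlm(BaseLlm):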
    """Anthropic Claude LLM implementation."""

    def __init__(
        self,
        model: str = "claude-sonnet-4-5-20250929",
        api_key: Optional[str] = None,
        **kwargs: Any
    ):
        """
        Initialize Anthropic Claude provider.

        Args:
            model: Claude model name (default: latest Sonnet)
            api_key: Anthropic API key (or use ANTHROPIC_API_KEY env var)
            **kwargs: Additional SDK configuration
        """
        self.model = model
        self.api_key = api_key or os.getenv("ANTHROPIC_API_KEY")
        if not self.api_key:
            raise ValueError("ANTHROPIC_API_KEY environment variable not set")

        # Initialize AsyncAnthropic client
        self.client = AsyncAnthropic(api_key=self.api_key)

    async def generate_content_async(
        self,
        messages: List[Dict[str, str]],
        **kwargs: Any
    ) -> str:
        """
        Generate content using Claude.

        Args:
            messages: Conversation history (OpenAI format)
            **kwargs: Additional parameters (max_tokens, temperature, etc.)

        Returns:
            Generated content as string
        """
        # Convert OpenAI format to Claude format (system + messages)
        system_prompt = self._extract_system_prompt(messages)
        claude_messages = self._convert_to_claude_format(messages)

        # Call Claude API
        response = await self.client.messages.create(
            model=self.model,
            system=system_prompt,
            messages=claude_messages,
            max_tokens=kwargs.get("max_tokens", 4096),
            temperature=kwargs.get("temperature", 0.7),
        )

        # Extract text content
        return response.content[0].text

Key Implementation Details

Use AsyncAnthropic for async support
Convert OpenAI message format to Claude format (system + messages)
Handle Claude-specific parameters (system prompt separate from messages)
Extract text content from Claude response format
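
The interface above calls `_extract_system_prompt()` and `_convert_to_claude_format()` without defining them. A minimal sketch of what that conversion might look like follows, written as one standalone function (`split_system_prompt`, a hypothetical name) rather than the two private methods; it assumes plain-text message content only and does not cover tool use or multimodal content.

```python
from typing import Dict, List, Optional, Tuple

Message = Dict[str, str]


def split_system_prompt(messages: List[Message]) -> Tuple[Optional[str], List[Message]]:
    """Separate OpenAI-style system messages from the conversation turns.

    Returns (system_prompt, chat_messages). Multiple system messages are
    joined with blank lines; system_prompt is None when there are none.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    chat = [
        {"role": m["role"], "content": m["content"]}
        for m in messages
        if m["role"] != "system"
    ]
    system = "\n\n".join(system_parts) if system_parts else None
    return system, chat


if __name__ == "__main__":
    system, chat = split_system_prompt([
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ])
    print(system)
    print(chat)
```

When the returned system prompt is None, it is probably safest to omit the `system` argument from the API call rather than pass None; that behavior should be confirmed against the SDK version in use.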

3. OpenAILlm (NEW)

File: llm_abstractions/openai_llm.py

Responsibilities

Implement BaseLlm interface for OpenAI GPT
Use official openai Python SDK (v1.99+)
Support async/await with AsyncOpenAI
Handle GPT-specific features (function calling, JSON mode, etc.)
Support GPT-5.1-Codex-Max for specialized coding tasks (Nov 2025 release)
- Multi-hour agent loops and project-scale refactors
- Natively trained across multiple context windows (millions of tokens)
- 30% fewer thinking tokens with better performance

Interface

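import os
from typing import Any, Dict, List, Optional

from openai import AsyncOpenAI

from llm_abstractions.base_llm import BaseLlm


class OpenAILlm(BaseLlm):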
    """OpenAI GPT LLM implementation."""

    # Supported models
    SUPPORTED_MODELS = [
        "gpt-4o",                # General-purpose (default)
        "gpt-4",                 # Previous generation
        "gpt-5.1-codex-max",     # Specialized coding (Nov 2025)
        "gpt-4-turbo",           # Fast variant
    ]

    def __init__(
        self,
        model: str = "gpt-4o",
        api_key: Optional[str] = None,
        **kwargs: Any
    ):
        """
        Initialize OpenAI GPT provider.

        Args:
            model: GPT model name
                   - "gpt-4o" (default): General-purpose latest
                   - "gpt-5.1-codex-max": Specialized for coding tasks,
                     multi-hour agent loops, project-scale refactors
                   - "gpt-4", "gpt-4-turbo": Alternative models
            api_key: OpenAI API key (or use OPENAI_API_KEY env var)
            **kwargs: Additional SDK configuration
        """
        self.model = model
        self.api_key = api_key or os.getenv("OPENAI_API_KEY")
        if not self.api_key:
            raise ValueError("OPENAI_API_KEY environment variable not set")

        # Initialize AsyncOpenAI client (2025 SDK)
        self.client = AsyncOpenAI(api_key=self.api_key)

    async def generate_content_async(
        self,
        messages: List[Dict[str, str]],
        **kwargs: Any
    ) -> str:
        """
        Generate content using GPT.

        Args:
            messages: Conversation history (OpenAI format)
            **kwargs: Additional parameters (max_tokens, temperature, etc.)

        Returns:
            Generated content as string
        """
        # Call GPT API (native OpenAI format)
        response = await self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            max_tokens=kwargs.get("max_tokens", 4000),
            temperature=kwargs.get("temperature", 0.7),
        )

        # Extract text content
        return response.choices[0].message.content

Key Implementation Details

Use AsyncOpenAI for async support (2025 SDK)
Messages already in OpenAI format (no conversion needed)
Standard GPT parameters (max_tokens, temperature)
Extract content from choices[0].message.content

4. GeminiLlm (UPDATE EXISTING)

File: llm_abstractions/gemini.py (replace placeholder)

Responsibilities

Complete the existing placeholder implementation
Use google-generativeai Python SDK
Support async/await
Handle Gemini-specific features

Interface

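import os
from typing import Any, Dict, List, Optional

from llm_abstractions.base_llm import BaseLlm


class GeminiLlm(BaseLlm):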
    """Google Gemini LLM implementation."""

    def __init__(
        self,
        model: str = "gemini-1.5-pro",
        api_key: Optional[str] = None,
        **kwargs: Any
    ):
        """
        Initialize Google Gemini provider.

        Args:
            model: Gemini model name (default: Gemini 1.5 Pro)
            api_key: Google API key (or use GOOGLE_API_KEY env var)
            **kwargs: Additional SDK configuration
        """
        self.model = model
        self.api_key = api_key or os.getenv("GOOGLE_API_KEY")
        if not self.api_key:
            raise ValueError("GOOGLE_API_KEY environment variable not set")

        # Initialize Gemini SDK
        import google.generativeai as genai
        genai.configure(api_key=self.api_key)
        self.client = genai.GenerativeModel(model)

    async def generate_content_async(
        self,
        messages: List[Dict[str, str]],
        **kwargs: Any
    ) -> str:
        """
        Generate content using Gemini.

        Args:
            messages: Conversation history (OpenAI format)
            **kwargs: Additional parameters

        Returns:
            Generated content as string
        """
        # Convert OpenAI format to Gemini format
        gemini_messages = self._convert_to_gemini_format(messages)

        # Call Gemini API
        response = await self.client.generate_content_async(
            gemini_messages,
            generation_config={
                "temperature": kwargs.get("temperature", 0.7),
                "max_output_tokens": kwargs.get("max_tokens", 2048),
            }
        )

        # Extract text content
        return response.text

Key Implementation Details

Use google-generativeai SDK with async support
Convert OpenAI message format to Gemini format
Handle Gemini-specific configuration
Extract text from response.text
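
The `_convert_to_gemini_format()` helper is referenced above but not shown. The sketch below is one possible shape for it, under assumptions that should be verified against the installed google-generativeai version: content entries are dicts with a `role` of "user" or "model" and a `parts` list, and system text is supplied separately (for example through the model's system instruction) rather than as a conversation turn. The function name `to_gemini_contents` is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple


def to_gemini_contents(
    messages: List[Dict[str, str]]
) -> Tuple[Optional[str], List[Dict[str, Any]]]:
    """Convert OpenAI-style messages to (system_text, gemini_contents).

    Assumed mapping: "assistant" becomes "model", every other non-system
    role becomes "user", and system messages are returned separately.
    """
    system_parts: List[str] = []
    contents: List[Dict[str, Any]] = []
    for message in messages:
        if message["role"] == "system":
            system_parts.append(message["content"])
            continue
        role = "model" if message["role"] == "assistant" else "user"
        contents.append({"role": role, "parts": [message["content"]]})
    system_text = "\n\n".join(system_parts) or None
    return system_text, contents


if __name__ == "__main__":
    print(to_gemini_contents([
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hi"},
    ]))
```

Keeping the conversion a pure function makes it easy to cover with unit tests that need no API key.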

5. TaskExecutor Updates (MODIFY EXISTING)

File: orchestration/executor.py

New Method: _execute_via_llm()

async def _execute_via_llm(
    self,
    task: AgentTask,
    agent_config: AgentConfig,
    result: ExecutionResult
) -> ExecutionResult:
    """
    Execute task via direct LLM abstraction layer.

    Uses LlmFactory to get provider, then awaits generate_content_async().
    This is the new high-performance execution path.

    Args:
        task: Task to execute
        agent_config: Agent configuration
        result: Execution result (in progress)

    Returns:
        Updated ExecutionResult
    """
    result.status = ExecutionStatus.IN_PROGRESS

    try:
        # Get LLM provider via factory
        from llm_abstractions.factory import LlmFactory

        provider = LlmFactory.get_provider(
            agent_type=agent_config.agent_type,
            model=task.metadata.get("model"),
        )

        # Build messages from task
        messages = self._build_messages_from_task(task, agent_config)

        # Await the provider directly: this method is async (see Cascading
        # Changes), so no asyncio.run() wrapper or event-loop check is needed
        output = await provider.generate_content_async(messages)

        # Success
        result.status = ExecutionStatus.SUCCESS
        result.output = output
        result.completed_at = datetime.now()
        result.metadata["execution_mode"] = "llm_direct"

    except ImportError as e:
        # Provider dependencies not installed
        result.status = ExecutionStatus.FAILED
        result.error = f"Provider dependencies not installed: {e}"
        result.completed_at = datetime.now()
        result.metadata["fallback_reason"] = "missing_dependencies"

    except RuntimeError as e:
        # API key not configured
        result.status = ExecutionStatus.FAILED
        result.error = f"Provider configuration error: {e}"
        result.completed_at = datetime.now()
        result.metadata["fallback_reason"] = "missing_api_key"

    except Exception as e:
        # Other errors
        result.status = ExecutionStatus.FAILED
        result.error = str(e)
        result.completed_at = datetime.now()

    return result

Modified Method: _execute_api()

async def _execute_api(
    self,
    task: AgentTask,
    agent_config: AgentConfig,
    result: ExecutionResult
) -> ExecutionResult:
    """
    Execute task via direct API call.

    Now supports dual-mode execution:
    - If use_direct_llm=True: Use _execute_via_llm() (NEW)
    - If use_direct_llm=False: Use _execute_via_script() (OLD)

    Graceful fallback if provider not available.
    """
    result.status = ExecutionStatus.IN_PROGRESS

    # Check feature flag
    use_direct_llm = task.metadata.get("use_direct_llm", True)  # Default: ON

    if use_direct_llm:
        # Try new direct LLM execution
        result = await self._execute_via_llm(task, agent_config, result)

        # If failed due to provider unavailability, fallback to script
        if (result.status == ExecutionStatus.FAILED and
            result.metadata.get("fallback_reason") in ["missing_dependencies", "missing_api_key"]):

            print(f"\n⚠️  Falling back to script execution for {agent_config.name}")
            print(f"   Reason: {result.metadata['fallback_reason']}\n")

            # Look up the legacy script here; it is not defined on this branch otherwise
            script_path = self._get_execution_script(agent_config.agent_type)
            if script_path and script_path.exists():
                # Reset result and try script execution
                result.status = ExecutionStatus.PENDING
                result.error = ""
                result = self._execute_via_script(task, agent_config, script_path, result)
    else:
        # Use legacy script execution
        script_path = self._get_execution_script(agent_config.agent_type)
        if script_path and script_path.exists():
            result = self._execute_via_script(task, agent_config, script_path, result)
        else:
            result.status = ExecutionStatus.PENDING
            result.metadata["requires_implementation"] = True

    return result
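
One consequence of the dual-mode design is that the legacy script path is blocking, so calling it from an async executor would stall the event loop for the duration of the subprocess. The sketch below shows one way the fallback might be structured to avoid that, using `asyncio.to_thread` (Python 3.9+). All names here (`ProviderUnavailable`, `execute_with_fallback`, and the demo callables) are hypothetical and stand in for the real executor methods.

```python
import asyncio
from typing import Awaitable, Callable, Tuple


class ProviderUnavailable(Exception):
    """Hypothetical error for a missing API key or missing SDK."""


async def execute_with_fallback(
    direct: Callable[[], Awaitable[str]],
    legacy: Callable[[], str],
) -> Tuple[str, str]:
    """Try the direct async path; on ProviderUnavailable, run the blocking
    legacy path in a worker thread so the event loop stays responsive.

    Returns (execution_mode, output).
    """
    try:
        return "llm_direct", await direct()
    except ProviderUnavailable:
        # asyncio.to_thread requires Python 3.9+
        return "script", await asyncio.to_thread(legacy)


# Demo callables standing in for _execute_via_llm / _execute_via_script
async def demo_direct_ok() -> str:
    return "direct output"


async def demo_direct_missing_key() -> str:
    raise ProviderUnavailable("API key not configured")


def demo_legacy_script() -> str:
    return "script output"


if __name__ == "__main__":
    print(asyncio.run(execute_with_fallback(demo_direct_ok, demo_legacy_script)))
    print(asyncio.run(execute_with_fallback(demo_direct_missing_key, demo_legacy_script)))
```

Only the provider-unavailable case triggers the fallback; other failures propagate, which matches the intent of restricting fallback to missing dependencies or API keys.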

New Helper Method: _build_messages_from_task()

def _build_messages_from_task(
    self,
    task: AgentTask,
    agent_config: AgentConfig
) -> List[Dict[str, str]]:
    """
    Build LLM messages from task specification.

    Converts AgentTask to OpenAI-format messages (standard format
    supported by all providers).

    Args:
        task: Task to execute
        agent_config: Agent configuration

    Returns:
        List of messages in OpenAI format
    """
    messages = []

    # System prompt
    system_prompt = f"""You are an AI agent executing a specific task for project orchestration.

Task ID: {task.task_id}
Title: {task.title}
Agent: {agent_config.name}

Your goal is to complete this task according to the specifications provided."""

    messages.append({"role": "system", "content": system_prompt})

    # User prompt with task details
    user_prompt = f"""Please complete the following task:

{task.description}
"""

    if task.deliverables:
        user_prompt += "\nExpected Deliverables:\n"
        for deliverable in task.deliverables:
            user_prompt += f"- {deliverable}\n"

    if task.success_criteria:
        user_prompt += "\nSuccess Criteria:\n"
        for criteria in task.success_criteria:
            user_prompt += f"- {criteria}\n"

    messages.append({"role": "user", "content": user_prompt})

    return messages

Architecture Diagram

┌─────────────────────────────────────────────────────────────┐
│                      TaskExecutor                           │
│                                                              │
│  execute(task, agent)                                       │
│      ↓                                                       │
│  _execute_api(task, agent_config, result)                   │
│      ↓                                                       │
│  [Feature Flag: use_direct_llm]                             │
│      ├─ True (NEW) ──→ _execute_via_llm()                   │
│      │                     ↓                                 │
│      │                 LlmFactory.get_provider(agent_type)  │
│      │                     ↓                                 │
│      │              ┌──────┴──────┬──────────┬──────────┐  │
│      │              ↓             ↓          ↓          ↓   │
│      │         AnthropicLlm  OpenAILlm  GeminiLlm  [more]  │
│      │              ↓             ↓          ↓              │
│      │         Claude API    GPT API   Gemini API          │
│      │                                                       │
│      └─ False (OLD) ──→ _execute_via_script()               │
│                          ↓                                   │
│                  subprocess.run(execute_*.py)               │
│                          ↓                                   │
│                  Legacy Script Execution                    │
└─────────────────────────────────────────────────────────────┘

File Structure Changes

New Files

llm_abstractions/
├── __init__.py                    # Export factory + providers
├── factory.py                     # LlmFactory (NEW)
├── anthropic.py                   # AnthropicLlm (NEW)
└── openai_llm.py                  # OpenAILlm (NEW)

Updated Files

llm_abstractions/
├── gemini.py                      # Complete implementation (UPDATED)
└── base_llm.py                    # No changes (already correct)

orchestration/
└── executor.py                    # Add _execute_via_llm() (UPDATED)

Deprecated Files (6-month timeline)

scripts/llm_execution/
├── execute_claude.py              # Add deprecation warning
├── execute_gpt.py                 # Add deprecation warning
└── execute_gemini.py              # Add deprecation warning

Implementation Phases

Phase 1A: Foundation (Week 1, Days 1-2, 16 hours)

Goal: Create LlmFactory and implement AnthropicLlm with official SDK.

Tasks

Create llm_abstractions/factory.py with LlmFactory class
Implement provider registry with dynamic loading
Implement get_provider(), list_available_providers(), is_provider_available()
Create llm_abstractions/anthropic.py with AnthropicLlm class
Implement AsyncAnthropic client initialization
Implement generate_content_async() with OpenAI→Claude format conversion
Write unit tests for LlmFactory (90%+ coverage)
Write unit tests for AnthropicLlm (90%+ coverage)
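
A provider registry with dynamic loading, as described in the tasks above, could look roughly like the following. This is a sketch under stated assumptions, not the planned implementation: the registry keys are plain strings rather than AgentType values, the `register` method and the "module:ClassName" target format are hypothetical, and a standard-library class stands in for a real provider so the example runs without any SDK installed.

```python
import importlib
from typing import Any, Dict, List


class LlmFactory:
    """Sketch of a provider registry that imports provider modules lazily."""

    # provider name -> "module.path:ClassName" (illustrative entry)
    _registry: Dict[str, str] = {
        "anthropic": "llm_abstractions.anthropic:AnthropicLlm",
    }

    @classmethod
    def register(cls, name: str, target: str) -> None:
        """Add or replace a registry entry (hypothetical method)."""
        cls._registry[name] = target

    @classmethod
    def _load_class(cls, name: str) -> type:
        if name not in cls._registry:
            raise ValueError(f"Unknown provider: {name!r}")
        module_path, class_name = cls._registry[name].split(":")
        module = importlib.import_module(module_path)  # may raise ImportError
        return getattr(module, class_name)

    @classmethod
    def is_provider_available(cls, name: str) -> bool:
        """True if the provider is registered and its module can be imported."""
        try:
            cls._load_class(name)
        except (ValueError, ImportError, AttributeError):
            return False
        return True

    @classmethod
    def list_available_providers(cls) -> List[str]:
        return sorted(n for n in cls._registry if cls.is_provider_available(n))

    @classmethod
    def get_provider(cls, name: str, **kwargs: Any) -> Any:
        """Instantiate the provider class, forwarding keyword arguments."""
        return cls._load_class(name)(**kwargs)


# Demonstration with a standard-library class standing in for a provider.
LlmFactory.register("demo", "collections:OrderedDict")
demo = LlmFactory.get_provider("demo")
```

Importing lazily means a missing SDK only makes its own provider unavailable instead of breaking the whole factory at import time.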

Deliverables

✅ llm_abstractions/factory.py (200 lines)
✅ llm_abstractions/anthropic.py (150 lines)
✅ Unit tests: tests/test_factory.py (100 lines)
✅ Unit tests: tests/test_anthropic.py (100 lines)

Success Criteria

✅ LlmFactory can instantiate AnthropicLlm
✅ AnthropicLlm successfully calls Claude API (integration test with real API key)
✅ Test coverage ≥90% for factory.py and anthropic.py
✅ All tests pass

Agent Assignment

rust-expert-developer: Implement factory pattern and async patterns
codi-test-engineer: Write unit tests with mocks

Time Estimate: 16 hours

LlmFactory implementation: 6 hours
AnthropicLlm implementation: 6 hours
Unit tests: 4 hours

Phase 1B: Dual-Mode Executor (Week 1, Days 3-5, 16 hours)

Goal: Add dual-mode execution to TaskExecutor with graceful fallback.

Tasks

Add _execute_via_llm() method to TaskExecutor
Add _build_messages_from_task() helper method
Modify _execute_api() to check use_direct_llm flag
Implement graceful fallback to script execution
Add async/sync compatibility layer (asyncio.run() wrapper)
Write unit tests for dual-mode execution (90%+ coverage)
Write integration tests (Anthropic provider with real API)
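
One detail the async/sync compatibility layer has to handle is that `asyncio.run()` raises a RuntimeError when it is called from a thread that already has a running event loop. The sketch below shows one possible wrapper that works in both contexts; the function name `run_coroutine_sync` is hypothetical, and running the coroutine in a worker thread is only one of several reasonable strategies.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine


def run_coroutine_sync(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run a coroutine to completion from synchronous code.

    Works whether or not an event loop is already running in this thread.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running here, so asyncio.run() is safe.
        return asyncio.run(coro)
    # A loop is running in this thread; run the coroutine on a fresh loop
    # in a worker thread and block until it finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def _double(value: int) -> int:
    await asyncio.sleep(0)
    return value * 2


async def _call_from_running_loop() -> int:
    # Simulates a synchronous executor method invoked by async code.
    return run_coroutine_sync(_double(4))


sync_result = run_coroutine_sync(_double(21))
nested_result = asyncio.run(_call_from_running_loop())
```

In the second case the calling event loop is blocked until the worker thread finishes, so fully asynchronous callers would be better served by awaiting the provider directly.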

Deliverables

✅ Updated orchestration/executor.py (+150 lines)
✅ Unit tests: tests/test_executor_dual_mode.py (150 lines)
✅ Integration tests: tests/integration/test_executor_anthropic.py (50 lines)

Success Criteria

✅ TaskExecutor can execute tasks via AnthropicLlm when use_direct_llm=True
✅ TaskExecutor falls back to script execution when provider unavailable
✅ Async/sync compatibility layer works in both contexts
✅ Test coverage ≥90% for new executor code
✅ All tests pass

Agent Assignment

rust-expert-developer: Implement dual-mode executor with async compatibility
codi-test-engineer: Write unit + integration tests

Time Estimate: 16 hours

Executor modifications: 8 hours
Async/sync compatibility: 4 hours
Testing: 4 hours

Phase 2A: OpenAI Implementation (Week 2, Days 1-2, 16 hours)

Goal: Implement OpenAILlm with AsyncOpenAI SDK, including GPT-5.1-Codex-Max support.

Tasks

Create llm_abstractions/openai_llm.py with OpenAILlm class
Implement AsyncOpenAI client initialization (2025 SDK)
Implement generate_content_async() (native OpenAI format)
Add model validation for supported models (gpt-4o, gpt-5.1-codex-max, etc.)
Register OpenAILlm in LlmFactory provider registry
Write unit tests for OpenAILlm (90%+ coverage)
Write integration tests (real OpenAI API calls with both GPT-4o and Codex-Max)
Run performance benchmarks vs subprocess execute_gpt.py
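
The model validation task could be as small as a membership check that fails fast with a helpful message. In this sketch the set contains only the two model names given in the task list, and both `SUPPORTED_OPENAI_MODELS` and `validate_model` are hypothetical names.

```python
# Illustrative list: only the two models named in the task list above.
SUPPORTED_OPENAI_MODELS = frozenset({"gpt-4o", "gpt-5.1-codex-max"})


def validate_model(model: str) -> str:
    """Return the model name unchanged, or raise ValueError if unsupported."""
    if model not in SUPPORTED_OPENAI_MODELS:
        supported = ", ".join(sorted(SUPPORTED_OPENAI_MODELS))
        raise ValueError(f"Unsupported model {model!r}; supported models: {supported}")
    return model


validated = validate_model("gpt-4o")
```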

Codex-Max Specific Features

Multi-context window support for long-running tasks
Optimized for coding tasks (refactors, debugging, agent loops)
30% fewer thinking tokens while maintaining quality
Ideal for complex TaskExecutor workflows requiring deep code analysis

Deliverables

✅ llm_abstractions/openai_llm.py (150 lines)
✅ Updated llm_abstractions/factory.py (+10 lines for registry)
✅ Unit tests: tests/test_openai_llm.py (100 lines)
✅ Integration tests: tests/integration/test_executor_openai.py (50 lines)
✅ Performance benchmark: benchmarks/openai_comparison.py (100 lines)

Success Criteria

✅ OpenAILlm successfully calls GPT-4 API
✅ LlmFactory can instantiate OpenAILlm
✅ TaskExecutor can execute tasks via OpenAILlm
✅ Performance benchmark shows ≥30% improvement over subprocess
✅ Test coverage ≥90% for openai_llm.py
✅ All tests pass

Agent Assignment

rust-expert-developer: Implement OpenAILlm with AsyncOpenAI SDK
codi-test-engineer: Write unit + integration tests + benchmarks

Time Estimate: 16 hours

OpenAILlm implementation: 6 hours
Factory registration: 2 hours
Testing: 4 hours
Performance benchmarks: 4 hours

Phase 2B: Gemini Implementation (Week 2, Days 3-5, 16 hours)

Goal: Complete Gemini implementation (replace placeholder).

Tasks

Update llm_abstractions/gemini.py to use google-generativeai SDK
Implement generate_content_async() with async support
Implement OpenAI→Gemini format conversion
Register GeminiLlm in LlmFactory provider registry
Write unit tests for GeminiLlm (90%+ coverage)
Write integration tests (real Gemini API calls)
Run performance benchmarks vs subprocess execute_gemini.py

Deliverables

✅ Updated llm_abstractions/gemini.py (replace placeholder, 150 lines)
✅ Updated llm_abstractions/factory.py (+10 lines for registry)
✅ Unit tests: tests/test_gemini.py (100 lines)
✅ Integration tests: tests/integration/test_executor_gemini.py (50 lines)
✅ Performance benchmark: benchmarks/gemini_comparison.py (100 lines)

Success Criteria

✅ GeminiLlm successfully calls Gemini API
✅ LlmFactory can instantiate GeminiLlm
✅ TaskExecutor can execute tasks via GeminiLlm
✅ Performance benchmark shows ≥30% improvement over subprocess
✅ Test coverage ≥90% for gemini.py
✅ All tests pass

Agent Assignment

rust-expert-developer: Complete Gemini implementation with async support
codi-test-engineer: Write unit + integration tests + benchmarks

Time Estimate: 16 hours

GeminiLlm implementation: 6 hours
Factory registration: 2 hours
Testing: 4 hours
Performance benchmarks: 4 hours

Phase 3: Script Deprecation (Week 3, 16 hours)

Goal: Gracefully deprecate execute_*.py scripts over 6 months.

Tasks

Add deprecation warnings to execute_claude.py, execute_gpt.py, execute_gemini.py
Set removal date (v2.0 or 6 months from Phase 3 completion)
Create migration guide: docs/EXECUTOR-MIGRATION-GUIDE.md
Update documentation: README, API docs, architecture diagrams
Update developer guide with new TaskExecutor usage
Create rollback procedure if issues arise
Announce deprecation to CODITECT users (if applicable)

Deliverables

✅ Updated scripts/llm_execution/execute_*.py with warnings (+20 lines each)
✅ Migration guide: docs/EXECUTOR-MIGRATION-GUIDE.md (2,000 words)
✅ Updated documentation: README.md, docs/architecture/
✅ Rollback procedure: docs/EXECUTOR-ROLLBACK-PROCEDURE.md (500 words)

Success Criteria

✅ All execute_*.py scripts display clear deprecation warnings
✅ Migration guide provides step-by-step adoption path
✅ Documentation reflects new architecture
✅ Rollback procedure tested and validated

Agent Assignment

codi-documentation-writer: Create migration guide and update docs
senior-architect: Review rollback procedure

Time Estimate: 16 hours

Deprecation warnings: 2 hours
Migration guide: 6 hours
Documentation updates: 6 hours
Rollback procedure: 2 hours

Phase 4: Testing & Polish (Week 4, 16 hours)

Goal: Comprehensive testing, performance validation, and production readiness.

Tasks

Run comprehensive integration test suite (all providers)
Validate performance benchmarks (≥30% improvement for all providers)
Run load tests (100+ concurrent tasks)
Review code quality (type hints, docstrings, error handling)
Security review (API key handling, error messages)
Production readiness checklist
Final documentation review
Create Phase 1 Message Bus foundation validation report

Deliverables

✅ Integration test report: test-results/integration-test-report.md
✅ Performance validation report: test-results/performance-validation.md
✅ Load test report: test-results/load-test-report.md
✅ Security review report: test-results/security-review.md
✅ Production readiness checklist: docs/PRODUCTION-READINESS-CHECKLIST.md
✅ Phase 1 foundation report: docs/PHASE-1-FOUNDATION-VALIDATION.md

Success Criteria

✅ All integration tests pass (100% success rate)
✅ Performance benchmarks show ≥30% improvement across all providers
✅ Load tests handle 100+ concurrent tasks without failures
✅ Security review finds no critical issues
✅ Production readiness checklist 100% complete
✅ Phase 1 foundation validated (async support, factory pattern ready)

Agent Assignment

codi-test-engineer: Run comprehensive test suite
senior-architect: Production readiness review
security-specialist-agent: Security review

Time Estimate: 16 hours

Integration testing: 4 hours
Performance validation: 4 hours
Load testing: 4 hours
Security + production review: 4 hours

Multi-Agent Orchestration Strategy

Agent Roles & Responsibilities

Agent	Phases	Primary Responsibilities	Deliverables
rust-expert-developer	1A, 1B, 2A, 2B	LLM abstraction implementations, async patterns, factory pattern	LlmFactory, AnthropicLlm, OpenAILlm, GeminiLlm, executor updates
codi-test-engineer	1A, 1B, 2A, 2B, 4	Unit tests, integration tests, performance benchmarks, load tests	Test suites, benchmark reports, test coverage reports
senior-architect	3, 4	Architecture review, rollback procedure, production readiness	Architecture validation, rollback plan, readiness checklist
codi-documentation-writer	3	Migration guide, documentation updates, developer guides	Migration guide, updated docs, API documentation
security-specialist-agent	4	Security review, API key handling, error message sanitization	Security review report, remediation recommendations

Workflow Coordination

Phase 1A Workflow

rust-expert-developer:
  1. Implement LlmFactory (6 hours)
  2. Implement AnthropicLlm (6 hours)
  ↓
codi-test-engineer:
  3. Write unit tests for factory (2 hours)
  4. Write unit tests for anthropic (2 hours)
  ↓
senior-architect:
  5. Review factory pattern implementation (30 min)
  6. Approve Phase 1A completion ✅

Phase 1B Workflow

rust-expert-developer:
  1. Implement _execute_via_llm() (8 hours)
  2. Add async/sync compatibility (4 hours)
  ↓
codi-test-engineer:
  3. Write unit tests for dual-mode (3 hours)
  4. Write integration tests (1 hour)
  ↓
senior-architect:
  5. Review dual-mode design (30 min)
  6. Approve Phase 1B completion ✅

Phase 2A Workflow

rust-expert-developer:
  1. Implement OpenAILlm (6 hours)
  2. Register in factory (2 hours)
  ↓
codi-test-engineer (parallel):
  3. Write unit tests (2 hours)
  4. Write integration tests (2 hours)
  5. Run performance benchmarks (4 hours)
  ↓
senior-architect:
  6. Review performance results (30 min)
  7. Approve Phase 2A completion ✅

Phase 2B Workflow

rust-expert-developer:
  1. Complete GeminiLlm (6 hours)
  2. Register in factory (2 hours)
  ↓
codi-test-engineer (parallel):
  3. Write unit tests (2 hours)
  4. Write integration tests (2 hours)
  5. Run performance benchmarks (4 hours)
  ↓
senior-architect:
  6. Review all providers (1 hour)
  7. Approve Phase 2B completion ✅

Phase 3 Workflow

codi-documentation-writer:
  1. Add deprecation warnings to scripts (2 hours)
  2. Create migration guide (6 hours)
  3. Update documentation (6 hours)
  ↓
senior-architect:
  4. Create rollback procedure (2 hours)
  5. Review migration strategy (30 min)
  6. Approve Phase 3 completion ✅

Phase 4 Workflow

codi-test-engineer:
  1. Run integration tests (4 hours)
  2. Validate performance (4 hours)
  3. Run load tests (4 hours)
  ↓
security-specialist-agent (parallel):
  4. Security review (3 hours)
  ↓
senior-architect:
  5. Production readiness review (4 hours)
  6. Phase 1 foundation validation (1 hour)
  7. Final approval ✅

Parallel Execution Opportunities

Week 1

Phase 1A (Days 1-2): Sequential (factory → anthropic → tests)
Phase 1B (Days 3-5): Sequential (executor → tests)

Week 2

Phase 2A (Days 1-2): Parallel (implementation + testing can overlap)
Phase 2B (Days 3-5): Parallel (implementation + testing can overlap)

Week 3

Phase 3: Parallel (documentation + rollback procedure can overlap)

Week 4

Phase 4: Parallel (testing + security review can run concurrently)

Maximum Concurrency: 2-3 agents working simultaneously during Weeks 2-4

Quality Gates & Success Criteria

Phase 1A Quality Gate

Entry Criteria

✅ Go/no-go decision approved
✅ Development environment setup complete
✅ Dependencies installed (anthropic, pytest, pytest-asyncio, pytest-cov)

Exit Criteria

✅ LlmFactory can instantiate AnthropicLlm
✅ AnthropicLlm successfully calls Claude API (integration test)
✅ Test coverage ≥90% for factory.py and anthropic.py
✅ All unit tests pass (100% pass rate)
✅ Code review approved by senior-architect

Validation

# Run tests
pytest tests/test_factory.py tests/test_anthropic.py -v --cov

# Integration test
pytest tests/integration/test_anthropic_real_api.py -v

# Coverage check
pytest --cov=llm_abstractions --cov-report=term-missing

Rollback: If exit criteria not met, revert to subprocess execution.

Phase 1B Quality Gate

Entry Criteria

✅ Phase 1A completed (LlmFactory + AnthropicLlm working)
✅ TaskExecutor code reviewed and understood

Exit Criteria

✅ TaskExecutor executes tasks via AnthropicLlm when use_direct_llm=True
✅ Graceful fallback to script execution works
✅ Async/sync compatibility layer functional
✅ Test coverage ≥90% for new executor code
✅ All unit + integration tests pass (100% pass rate)
✅ Code review approved by senior-architect

Validation

# Run executor tests
pytest tests/test_executor_dual_mode.py -v --cov

# Integration test (real API)
pytest tests/integration/test_executor_anthropic.py -v

# Test fallback behavior
pytest tests/test_executor_fallback.py -v

Rollback: Disable use_direct_llm flag (default to False).

Phase 2A Quality Gate

Entry Criteria

✅ Phase 1B completed (dual-mode executor working)
✅ OpenAI API key available for testing

Exit Criteria

✅ OpenAILlm successfully calls GPT-4 API
✅ TaskExecutor executes tasks via OpenAILlm
✅ Performance benchmark shows ≥30% improvement over subprocess
✅ Test coverage ≥90% for openai_llm.py
✅ All unit + integration tests pass (100% pass rate)
✅ Performance validation report approved

Validation

# Run OpenAI tests
pytest tests/test_openai_llm.py tests/integration/test_executor_openai.py -v

# Run performance benchmark
python benchmarks/openai_comparison.py

# Expected output: ≥30% faster than subprocess

Rollback: Remove OpenAILlm from factory registry (AnthropicLlm still works).

Phase 2B Quality Gate

Entry Criteria

✅ Phase 2A completed (OpenAILlm working)
✅ Google API key available for testing

Exit Criteria

✅ GeminiLlm successfully calls Gemini API
✅ TaskExecutor executes tasks via GeminiLlm
✅ Performance benchmark shows ≥30% improvement over subprocess
✅ Test coverage ≥90% for gemini.py
✅ All unit + integration tests pass (100% pass rate)
✅ All 3 providers (Anthropic, OpenAI, Gemini) working
✅ Performance validation report approved

Validation

# Run Gemini tests
pytest tests/test_gemini.py tests/integration/test_executor_gemini.py -v

# Run performance benchmark
python benchmarks/gemini_comparison.py

# Run comprehensive provider tests
pytest tests/integration/test_all_providers.py -v

Rollback: Remove GeminiLlm from factory registry (Anthropic + OpenAI still work).

Phase 3 Quality Gate

Entry Criteria

✅ Phase 2B completed (all providers working)
✅ Migration strategy approved

Exit Criteria

✅ All execute_*.py scripts display clear deprecation warnings
✅ Migration guide reviewed and approved
✅ Documentation updated to reflect new architecture
✅ Rollback procedure tested and validated
✅ Deprecation announcement prepared (if applicable)

Validation

# Verify deprecation warnings
echo '{}' | python scripts/llm_execution/execute_claude.py 2>&1 | grep -i deprecat

# Review migration guide
wc -w docs/EXECUTOR-MIGRATION-GUIDE.md  # Should be ~2000 words

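# Test rollback procedure (assumes a runnable companion script; the Phase 3 deliverable itself is docs/EXECUTOR-ROLLBACK-PROCEDURE.md)
./docs/EXECUTOR-ROLLBACK-PROCEDURE.sh --dry-run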

Rollback: Revert the deprecation-warning commits in scripts/llm_execution/ if they cause problems; all other Phase 3 changes are documentation only.

Phase 4 Quality Gate (Final)

Entry Criteria

✅ Phase 3 completed (deprecation + docs)
✅ All previous phases passed quality gates

Exit Criteria (Production Readiness)

✅ All integration tests pass (100% success rate)
✅ Performance benchmarks show ≥30% improvement for all providers
✅ Load tests handle 100+ concurrent tasks without failures
✅ Test coverage ≥90% for all new code
✅ Security review finds no critical issues
✅ API key handling follows best practices
✅ Error messages don't leak sensitive data
✅ Production readiness checklist 100% complete
✅ Phase 1 Message Bus foundation validated
✅ Final sign-off from senior-architect

Validation

# Integration tests
pytest tests/integration/ -v --maxfail=0

# Performance validation
python benchmarks/validate_all_providers.py

# Load tests
python tests/load/test_concurrent_execution.py --tasks=100

# Security check
python scripts/security_audit.py --check-api-keys --check-error-messages

# Coverage report
pytest --cov=llm_abstractions --cov=orchestration --cov-report=html

Production Deployment: If all exit criteria met, merge to main and deploy.

Rollback: Set use_direct_llm=False globally in TaskExecutor (reverts to script execution).

Risk Management

Risk Matrix

Risk ID	Risk Description	Probability	Impact	Mitigation Strategy	Owner
R1	API provider changes SDK interface	Low	High	Pin SDK versions in requirements.txt; monitor changelogs	rust-expert-developer
R2	Performance improvement < 30%	Medium	Medium	Run early benchmarks in Phase 2A; optimize if needed	codi-test-engineer
R3	Backward compatibility broken	Low	Critical	Dual-mode executor with feature flag; comprehensive testing	senior-architect
R4	Async/sync compatibility issues	Medium	High	Test in both contexts; use asyncio.run() wrapper	rust-expert-developer
R5	LLM provider API outage during testing	Low	Low	Use mock providers for unit tests; retry integration tests	codi-test-engineer
R6	Security vulnerability in API key handling	Low	Critical	Security review in Phase 4; follow SDK best practices	security-specialist-agent
R7	Timeline slip (>4 weeks)	Medium	Medium	Weekly progress tracking; adjust scope if needed	senior-architect
R8	Cost overrun (>$10K)	Low	Medium	Track hours daily; flag early if approaching 80% budget	senior-architect
R9	Phase 1 Message Bus integration issues	Low	High	Validate async foundation in Phase 4; document requirements	senior-architect
R10	User adoption resistance (prefer scripts)	Medium	Low	Create compelling migration guide; show performance wins	codi-documentation-writer

Risk Monitoring

Weekly Risk Review

Review open risks during weekly sync
Update probability/impact based on progress
Activate mitigation strategies as needed

Escalation Triggers

Any CRITICAL impact risk becomes HIGH probability
Timeline slip >20% (>1 week delay)
Cost overrun >20% (>$2K over budget)
Any quality gate fails twice

Escalation Path

Senior architect reviews issue
Decide: adjust scope, add resources, or accept delay
Notify stakeholders if timeline/budget impacted

Backward Compatibility Strategy

Dual-Mode Execution

Feature Flag: use_direct_llm (task-level metadata)

Default Behavior

use_direct_llm=True (NEW): Try direct LLM execution via factory
If provider unavailable (missing API key or dependencies): Graceful fallback to script execution
use_direct_llm=False (OLD): Use script execution (legacy behavior)

Rollback Safety

Setting use_direct_llm=False globally reverts to 100% subprocess execution
No code changes needed to roll back (just flip the feature flag)
Scripts remain fully functional during entire 6-month deprecation period
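
The flag semantics and graceful fallback described above can be sketched as a small dispatch function. Everything here is illustrative: `ProviderUnavailableError`, `ExecutionResult`, and `execute_dual_mode` are hypothetical names, the two execution paths are passed in as callables so the example runs without any provider, and the flag is assumed to default to False as stated in the Phase 1B rollback note.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


class ProviderUnavailableError(RuntimeError):
    """Hypothetical error raised when a provider cannot be used."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


@dataclass
class ExecutionResult:
    output: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def execute_dual_mode(
    task_metadata: Dict[str, Any],
    run_direct: Callable[[], str],
    run_script: Callable[[], str],
) -> ExecutionResult:
    """Dispatch on use_direct_llm, falling back to the script path."""
    if task_metadata.get("use_direct_llm", False):
        try:
            return ExecutionResult(run_direct(), {"mode": "direct_llm"})
        except ProviderUnavailableError as exc:
            # Graceful fallback: record why the direct path was skipped.
            return ExecutionResult(
                run_script(), {"mode": "script", "fallback_reason": exc.reason}
            )
    return ExecutionResult(run_script(), {"mode": "script"})


def _direct_without_key() -> str:
    raise ProviderUnavailableError("missing_api_key")


result = execute_dual_mode(
    {"use_direct_llm": True}, _direct_without_key, lambda: "script output"
)
```

Recording the fallback reason in the result metadata keeps the fallback observable, which is what a fallback test would need to assert on.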

Migration Timeline

Date	Milestone	Action
Week 3 (Phase 3)	Script deprecation warnings added	Scripts display deprecation notice but still work
Month 1-6	Migration period	Users adopt direct LLM execution at their own pace
Month 6	Deprecation deadline	Scripts show stronger warnings
Month 7 (v2.0)	Script removal	Scripts deleted from codebase

Safety Net: If migration issues arise, extend deprecation period by 3 months.

Testing Strategy for Compatibility

Dual-Mode Testing

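# Test both execution paths for every provider
# (assumes `executor` and `task` are pytest fixtures, e.g. defined in conftest.py)
import pytest
from orchestration.executor import ExecutionStatus  # assumed import location

@pytest.mark.parametrize("use_direct_llm", [True, False])
def test_task_execution(executor, task, use_direct_llm):
    task.metadata["use_direct_llm"] = use_direct_llm
    result = executor.execute(task)
    assert result.status == ExecutionStatus.SUCCESS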

Fallback Testing

# Test graceful fallback when provider unavailable
# (assumes `executor` and `task` are pytest fixtures)
def test_fallback_missing_api_key(monkeypatch, executor, task):
    # raising=False avoids a KeyError when the variable is not set
    monkeypatch.delenv("OPENAI_API_KEY", raising=False)
    task.metadata["use_direct_llm"] = True
    result = executor.execute(task)
    # Should fall back to script execution
    assert result.metadata.get("fallback_reason") == "missing_api_key"

Testing Strategy

Test Coverage Goals

Module	Target Coverage	Test Types
`llm_abstractions/factory.py`	≥90%	Unit, integration
`llm_abstractions/anthropic.py`	≥90%	Unit, integration
`llm_abstractions/openai_llm.py`	≥90%	Unit, integration
`llm_abstractions/gemini.py`	≥90%	Unit, integration
`orchestration/executor.py` (new code)	≥90%	Unit, integration
Overall llm_abstractions	≥90%	All types

Test Types

Unit Tests (Fast, No API Calls)

Mock BaseLlm implementations
Test factory provider registry
Test message conversion logic
Test error handling
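
A unit test of this kind can replace the provider with an `AsyncMock` from the standard library so that no API call is made. The sketch below assumes that `generate_content_async` accepts a `messages` keyword argument; the method name comes from the plan, while that parameter name is an assumption.

```python
import asyncio
from unittest.mock import AsyncMock


def test_provider_is_awaited_with_messages() -> None:
    """Unit test sketch: no network access, the provider is an AsyncMock."""
    fake_llm = AsyncMock()
    fake_llm.generate_content_async.return_value = "stubbed response"

    messages = [{"role": "user", "content": "ping"}]
    response = asyncio.run(fake_llm.generate_content_async(messages=messages))

    assert response == "stubbed response"
    fake_llm.generate_content_async.assert_awaited_once_with(messages=messages)
```

With pytest-asyncio installed, the same test could instead be written as an `async def` test and await the mock directly.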

Integration Tests (Real API Calls)

Test each provider with real API key
Validate response parsing
Test error scenarios (rate limits, invalid keys)
Requires API keys in CI/CD environment

Performance Benchmarks

Compare direct LLM vs subprocess execution
Measure latency reduction (target: ≥30%)
Test with various task sizes (short, medium, long prompts)

Load Tests

Test 100+ concurrent tasks
Validate async execution doesn't degrade performance
Check for memory leaks or resource exhaustion
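
A load test along these lines can be prototyped with `asyncio.gather` and a semaphore before any real provider is wired in. In this sketch `fake_llm_call` merely sleeps to stand in for an API call, and the concurrency cap of 20 is an arbitrary illustrative value.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_llm_call(task_id: int, latency_s: float = 0.01) -> int:
    """Stand-in for a provider call; sleeps instead of hitting an API."""
    await asyncio.sleep(latency_s)
    return task_id


async def run_load_test(
    n_tasks: int = 100, max_concurrency: int = 20
) -> Tuple[List[int], float]:
    """Run n_tasks fake calls with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(task_id: int) -> int:
        async with semaphore:
            return await fake_llm_call(task_id)

    start = time.perf_counter()
    results = await asyncio.gather(*(guarded(i) for i in range(n_tasks)))
    return list(results), time.perf_counter() - start


results, elapsed_s = asyncio.run(run_load_test())
print(f"{len(results)} tasks completed in {elapsed_s:.3f}s")
```

The semaphore bounds the number of in-flight calls, which is one way to stay under provider rate limits while still exercising concurrency.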

Test Execution Strategy

Local Development

# Run unit tests (fast, no API keys needed)
pytest tests/ -m "not integration" -v

# Run integration tests (requires API keys)
export ANTHROPIC_API_KEY=xxx
export OPENAI_API_KEY=xxx
export GOOGLE_API_KEY=xxx
pytest tests/integration/ -v

# Run all tests with coverage
pytest --cov=llm_abstractions --cov=orchestration --cov-report=html

CI/CD Pipeline

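# GitHub Actions workflow
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"  # assumed; match the project's supported version
      - run: pip install -r requirements.txt
      - run: pytest tests/ -m "not integration" --cov

  integration-tests:
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    env:
      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      GOOGLE_API_KEY: ${{ secrets.GOOGLE_API_KEY }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"  # assumed; match the project's supported version
      - run: pip install -r requirements.txt
      - run: pytest tests/integration/ -v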

Performance Benchmarks

Baseline (Subprocess Execution)

Measured with: scripts/llm_execution/execute_gpt.py

Metric	Value
Subprocess spawn overhead	100-200ms
JSON serialization	10-20ms
Script execution overhead	50-100ms
Result parsing	10-20ms
Total overhead	180-350ms
LLM API call	500-2000ms (variable)
End-to-end latency	680-2350ms

Target (Direct LLM Execution)

Expected with: LlmFactory → OpenAILlm

Metric	Target Value
Factory lookup	1ms
Message building	5ms
LLM API call	500-2000ms (same)
Result parsing	10-20ms
Total overhead	66-126ms
End-to-end latency	566-2126ms

Performance Improvement Calculation

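Overhead reduction = (180-350ms) - (66-126ms) = 114-224ms

Overhead improvement % = 114/180 to 224/350 ≈ 63-64%

End-to-end improvement % = 114/680 to 224/2350 ≈ 10-17%, because the 500-2000ms LLM API call is unchanged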

Target: ≥30% improvement across all providers
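
The improvement figures can be checked directly from the totals in the two tables above. The sketch below pairs the low end of each range with the low end of the other, and likewise for the high ends, which is an assumption about how the ranges correspond; `improvement_pct` is a hypothetical helper name.

```python
def improvement_pct(before_ms: float, after_ms: float) -> float:
    """Percentage reduction going from before_ms to after_ms."""
    return (before_ms - after_ms) / before_ms * 100


# (low, high) totals copied from the two tables above, in milliseconds.
baseline_overhead = (180, 350)
direct_overhead = (66, 126)
baseline_end_to_end = (680, 2350)
direct_end_to_end = (566, 2126)

overhead_gain = [
    improvement_pct(b, d) for b, d in zip(baseline_overhead, direct_overhead)
]
end_to_end_gain = [
    improvement_pct(b, d) for b, d in zip(baseline_end_to_end, direct_end_to_end)
]

print("Overhead reduction: " + ", ".join(f"{g:.1f}%" for g in overhead_gain))
print("End-to-end reduction: " + ", ".join(f"{g:.1f}%" for g in end_to_end_gain))
```

With these inputs the overhead shrinks by roughly 63-64%, while end-to-end latency, which is dominated by the API call, shrinks by roughly 10-17%. Whether the 30% target is met therefore depends on which of the two quantities a benchmark measures.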

Benchmark Suite

File: benchmarks/executor_performance.py

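import time
from orchestration.executor import TaskExecutor
from orchestration.task import AgentTask

# Shared executor used by both benchmarks (assumes a no-argument constructor)
executor = TaskExecutor()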

def benchmark_subprocess_execution():
    """Baseline: subprocess-based execution."""
    task = AgentTask(
        task_id="BENCH-001",
        title="Test task",
        description="Simple test",
        agent="gpt-4"
    )
    task.metadata["use_direct_llm"] = False  # Force subprocess

    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    result = executor.execute(task)
    end = time.perf_counter()

    return end - start

def benchmark_direct_llm_execution():
    """Target: direct LLM execution."""
    task = AgentTask(
        task_id="BENCH-002",
        title="Test task",
        description="Simple test",
        agent="gpt-4"
    )
    task.metadata["use_direct_llm"] = True  # Force direct LLM

    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    result = executor.execute(task)
    end = time.perf_counter()

    return end - start

# Run benchmarks
subprocess_times = [benchmark_subprocess_execution() for _ in range(10)]
direct_llm_times = [benchmark_direct_llm_execution() for _ in range(10)]

# Calculate improvement
avg_subprocess = sum(subprocess_times) / len(subprocess_times)
avg_direct_llm = sum(direct_llm_times) / len(direct_llm_times)
improvement = (avg_subprocess - avg_direct_llm) / avg_subprocess * 100

print(f"Subprocess avg: {avg_subprocess:.3f}s")
print(f"Direct LLM avg: {avg_direct_llm:.3f}s")
print(f"Improvement: {improvement:.1f}%")
assert improvement >= 30, f"Performance target not met: {improvement:.1f}% < 30%"

Migration & Deprecation Plan

Deprecation Timeline

6-Month Gradual Deprecation

Month	Action	User Impact
Month 0 (Phase 3)	Add deprecation warnings to scripts	Users see warnings but scripts work
Month 1-3	Migration period (early adopters)	Users migrate at their own pace
Month 4-6	Migration period (remaining users)	Stronger warnings added
Month 6	Final deprecation notice	Scripts still work but show urgent warnings
Month 7 (v2.0)	Script removal	Scripts deleted from codebase

Deprecation Warning Implementation

File: scripts/llm_execution/execute_gpt.py (and others)

import warnings

# Add at top of main()
warnings.warn(
    "\n" + "=" * 70 + "\n"
    "DEPRECATION WARNING: execute_gpt.py will be removed in v2.0\n"
    "=" * 70 + "\n"
    "This script-based execution is deprecated in favor of direct\n"
    "LLM abstraction layer via LlmFactory.\n\n"
    "Migration Guide: docs/EXECUTOR-MIGRATION-GUIDE.md\n"
    "Removal Date: [6 months from Phase 3 completion]\n"
    "=" * 70 + "\n",
    DeprecationWarning,
    stacklevel=2
)

Migration Guide Outline

File: docs/EXECUTOR-MIGRATION-GUIDE.md

Why Migrate? - Performance improvements, async support, Phase 1 foundation
Before You Start - Prerequisites, API keys, dependencies
Step-by-Step Migration:
- Update TaskExecutor usage to set use_direct_llm=True
- Test dual-mode execution
- Validate performance improvements
- Remove use_direct_llm flag (defaults to True)
Testing Your Migration:
- Unit test examples
- Integration test examples
- Performance benchmarks
Rollback Procedure - How to revert if issues arise
Troubleshooting:
- Missing API keys
- Missing dependencies
- Provider unavailability
FAQ:
- When will scripts be removed?
- What if I can't migrate by Month 6?
- How do I get help?

Rollback Procedure

File: docs/EXECUTOR-ROLLBACK-PROCEDURE.md

Quick Rollback (Set Feature Flag)

# In TaskExecutor initialization or task metadata
task.metadata["use_direct_llm"] = False  # Revert to script execution

Full Rollback (Revert Git Commits)

# If critical issues arise, revert refactoring
git revert <phase-4-commit-sha>
git revert <phase-3-commit-sha>
git revert <phase-2b-commit-sha>
git revert <phase-2a-commit-sha>
git revert <phase-1b-commit-sha>
git revert <phase-1a-commit-sha>

Validation After Rollback

# Ensure scripts still work
pytest tests/test_executor_scripts.py -v

# Verify no regressions
pytest tests/integration/ -v

Agent Assignment Matrix

Task ID	Task Description	Agent	Phase	Hours	Dependencies
T1.1	Implement LlmFactory	rust-expert-developer	1A	6	None
T1.2	Implement AnthropicLlm	rust-expert-developer	1A	6	T1.1
T1.3	Unit tests for factory	codi-test-engineer	1A	2	T1.1
T1.4	Unit tests for anthropic	codi-test-engineer	1A	2	T1.2
T2.1	Add _execute_via_llm()	rust-expert-developer	1B	8	T1.2
T2.2	Async/sync compatibility	rust-expert-developer	1B	4	T2.1
T2.3	Unit tests for executor	codi-test-engineer	1B	3	T2.1
T2.4	Integration tests	codi-test-engineer	1B	1	T2.1
T3.1	Implement OpenAILlm	rust-expert-developer	2A	6	T2.1
T3.2	Register in factory	rust-expert-developer	2A	2	T3.1
T3.3	Unit tests for openai	codi-test-engineer	2A	2	T3.1
T3.4	Integration tests	codi-test-engineer	2A	2	T3.1
T3.5	Performance benchmarks	codi-test-engineer	2A	4	T3.1
T4.1	Complete GeminiLlm	rust-expert-developer	2B	6	T2.1
T4.2	Register in factory	rust-expert-developer	2B	2	T4.1
T4.3	Unit tests for gemini	codi-test-engineer	2B	2	T4.1
T4.4	Integration tests	codi-test-engineer	2B	2	T4.1
T4.5	Performance benchmarks	codi-test-engineer	2B	4	T4.1
T5.1	Add deprecation warnings	codi-documentation-writer	3	2	T4.1
T5.2	Create migration guide	codi-documentation-writer	3	6	T4.1
T5.3	Update documentation	codi-documentation-writer	3	6	T4.1
T5.4	Create rollback procedure	senior-architect	3	2	T4.1
T6.1	Integration tests	codi-test-engineer	4	4	T5.1
T6.2	Performance validation	codi-test-engineer	4	4	T5.1
T6.3	Load tests	codi-test-engineer	4	4	T5.1
T6.4	Security review	security-specialist-agent	4	3	T5.1
T6.5	Production readiness	senior-architect	4	4	T6.1, T6.2, T6.3, T6.4
T6.6	Phase 1 foundation validation	senior-architect	4	1	T6.5

Total Tasks: 28 Total Hours: 100 as itemized above (budgeted: 80) Total Cost: $10,000 budgeted (80 hours at $125/hour); $12,500 if all 100 itemized hours are billed

Budget Breakdown

Engineering Costs

Phase	Tasks	Hours	Cost	Notes
Phase 1A	T1.1-T1.4	16	$2,000	Factory + Anthropic implementation
Phase 1B	T2.1-T2.4	16	$2,000	Dual-mode executor
Phase 2A	T3.1-T3.5	16	$2,000	OpenAI implementation
Phase 2B	T4.1-T4.5	16	$2,000	Gemini implementation
Phase 3	T5.1-T5.4	16	$2,000	Deprecation + docs
Phase 4	T6.1-T6.6	16	$2,000	Testing + validation
Total	28 tasks	96 hours	$12,000	Sum of the six phase rows; exceeds the 80-hour, $10,000 plan figure

Engineering Rate: $125/hour (senior Python developer with LLM/async expertise)

Infrastructure Costs

Item	Cost	Notes
LLM API Usage (Testing)	$200	Anthropic + OpenAI + Gemini integration tests
CI/CD Compute (GitHub Actions)	$50	80 hours of testing across 4 weeks
Total Infrastructure	$250	One-time costs

Total Project Cost

Engineering:      $10,000
Infrastructure:   $   250
---
Total:            $10,250

Budget Contingency: 10% ($1,025) for scope adjustments = $11,275 total budget

Timeline & Dependencies

Gantt Chart (4 Weeks)

Week 1: Foundation
├─ Days 1-2: Phase 1A (Factory + Anthropic)
│  ├─ T1.1: LlmFactory (6h)
│  ├─ T1.2: AnthropicLlm (6h)
│  └─ T1.3-T1.4: Tests (4h)
│
└─ Days 3-5: Phase 1B (Dual-Mode Executor)
   ├─ T2.1: _execute_via_llm() (8h)
   ├─ T2.2: Async compatibility (4h)
   └─ T2.3-T2.4: Tests (4h)

Week 2: Provider Implementations
├─ Days 1-2: Phase 2A (OpenAI)
│  ├─ T3.1-T3.2: OpenAILlm (8h)
│  ├─ T3.3-T3.4: Tests (4h) [PARALLEL]
│  └─ T3.5: Benchmarks (4h)
│
└─ Days 3-5: Phase 2B (Gemini)
   ├─ T4.1-T4.2: GeminiLlm (8h)
   ├─ T4.3-T4.4: Tests (4h) [PARALLEL]
   └─ T4.5: Benchmarks (4h)

Week 3: Deprecation & Documentation
└─ Days 1-5: Phase 3
   ├─ T5.1: Warnings (2h)
   ├─ T5.2: Migration guide (6h)
   ├─ T5.3: Docs (6h)
   └─ T5.4: Rollback procedure (2h) [PARALLEL]

Week 4: Testing & Validation
└─ Days 1-5: Phase 4
   ├─ T6.1: Integration tests (4h)
   ├─ T6.2: Performance tests (4h)
   ├─ T6.3: Load tests (4h)
   ├─ T6.4: Security review (3h) [PARALLEL]
   ├─ T6.5: Production readiness (4h)
   └─ T6.6: Phase 1 validation (1h)

Critical Path

T1.1 → T1.2 → T2.1 → T2.2 → T3.1 → T4.1 → T5.2 → T6.5 → T6.6

Critical Path Duration: 47 hours (6 + 6 + 8 + 4 + 6 + 6 + 6 + 4 + 1 for the tasks listed above) Total Project Effort: 80 hours budgeted (all tasks, including parallel work) Efficiency: 58.75% (47/80)

Dependencies Diagram

       T1.1 (Factory)
          ↓
       T1.2 (Anthropic) ───→ T1.3-T1.4 (Tests)
          ↓
       T2.1 (Executor) ───→ T2.3-T2.4 (Tests)
          ↓
       T2.2 (Async)
          ↓
    ┌────┴────┐
    ↓         ↓
 T3.1 (OpenAI)  T4.1 (Gemini)
    ↓         ↓
 T3.3-T3.5  T4.3-T4.5 (Tests + Benchmarks)
    └────┬────┘
         ↓
    T5.1-T5.4 (Deprecation)
         ↓
    T6.1-T6.4 (Testing)
         ↓
    T6.5-T6.6 (Validation)
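The diagram above can also be expressed as data, which makes it possible to check for cycles and to see which task groups could run side by side. The following is a minimal sketch using the standard-library `graphlib` module (Python 3.9+); the `PREDECESSORS` mapping is transcribed from the diagram, and the function names are illustrative rather than part of the project code.

```python
from graphlib import TopologicalSorter

# Each key maps to the task groups that must finish before it can start,
# transcribed from the dependencies diagram.
PREDECESSORS = {
    "T1.1": set(),
    "T1.2": {"T1.1"},
    "T1.3-T1.4": {"T1.2"},
    "T2.1": {"T1.2"},
    "T2.3-T2.4": {"T2.1"},
    "T2.2": {"T2.1"},
    "T3.1": {"T2.2"},
    "T4.1": {"T2.2"},
    "T3.3-T3.5": {"T3.1"},
    "T4.3-T4.5": {"T4.1"},
    "T5.1-T5.4": {"T3.3-T3.5", "T4.3-T4.5"},
    "T6.1-T6.4": {"T5.1-T5.4"},
    "T6.5-T6.6": {"T6.1-T6.4"},
}


def execution_order(predecessors):
    """Return one valid ordering; graphlib raises CycleError on a cycle."""
    return list(TopologicalSorter(predecessors).static_order())


def ready_waves(predecessors):
    """Group tasks into waves whose members have no pending dependencies."""
    sorter = TopologicalSorter(predecessors)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(ready_waves(PREDECESSORS), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Each wave contains task groups that the diagram allows to proceed concurrently, such as the OpenAI and Gemini provider branches after T2.2.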

Monitoring & Metrics

Success Metrics

Metric	Baseline	Target	Measurement Method
Performance Improvement	0%	≥30%	Performance benchmarks
Test Coverage	0%	≥90%	`pytest --cov` report
Integration Test Pass Rate	N/A	100%	CI/CD pipeline
Provider Availability	1 (Gemini placeholder)	3 (Anthropic, OpenAI, Gemini)	Factory registry
Async Support	No	Yes	TaskExecutor async compatibility
Parallel Execution	No	Yes (3x speedup)	Async executor benchmarks
Backward Compatibility	100%	100%	Dual-mode testing
Timeline Adherence	N/A	≤4.5 weeks	Weekly progress tracking
Cost Adherence	N/A	≤$10,750	Daily hour tracking

Weekly Progress Tracking

Weekly Sync Meeting Agenda

Review completed tasks from TASKLIST-EXECUTOR-REFACTORING.md
Update progress metrics
Identify blockers
Adjust timeline/scope if needed
Review risks and activate mitigations

Progress Report Template

## Week [N] Progress Report

**Date:** [YYYY-MM-DD]
**Phase:** [Phase Name]

### Completed Tasks
- [x] T1.1: LlmFactory implementation
- [x] T1.2: AnthropicLlm implementation

### In Progress
- [ ] T1.3: Unit tests for factory (75% complete)

### Blockers
- None

### Metrics
- Hours spent: 12/16 (75%)
- Test coverage: 85% (target: 90%)
- Performance improvement: N/A (not yet measured)

### Next Week Plan
- Complete Phase 1A testing
- Begin Phase 1B (dual-mode executor)

Automated Metrics Collection

CI/CD Integration

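# .github/workflows/executor-refactoring.yml
name: Executor refactoring metrics
# Trigger choice is an assumption; adjust to the team's branching policy
on: [push, pull_request]
jobs:
  metrics:
    runs-on: ubuntu-latest
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.10"

      - name: Install dependencies
        # Assumes requirements-dev.txt lists pytest and pytest-cov
        run: pip install -r requirements-dev.txt

      - name: Test coverage
        run: |
          pytest --cov=llm_abstractions --cov-report=json
          echo "Coverage: $(jq '.totals.percent_covered' coverage.json)%"

      - name: Performance benchmarks
        run: |
          python benchmarks/executor_performance.py > benchmark-results.txt
          cat benchmark-results.txt

      - name: Upload artifacts
        # upload-artifact v3 is deprecated; v4 is the supported release
        uses: actions/upload-artifact@v4
        with:
          name: metrics-week-${{ github.run_number }}
          path: |
            coverage.json
            benchmark-results.txt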
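The workflow above reports coverage but does not fail the build when coverage drops below the 90% target from the Success Metrics table. One way to add such a gate is a small script that reads the same `coverage.json` report. This is a sketch under stated assumptions: the function names and the idea of a separate gating script are illustrative, not part of the existing pipeline. The only field it relies on is `totals.percent_covered`, which the workflow already reads with `jq`.

```python
import json
from pathlib import Path

COVERAGE_TARGET = 90.0  # percent, from the Success Metrics table


def load_report(path):
    """Parse a coverage.py JSON report from disk."""
    return json.loads(Path(path).read_text())


def coverage_percent(report):
    """Extract the overall percentage from a parsed report."""
    return float(report["totals"]["percent_covered"])


def meets_target(percent, target=COVERAGE_TARGET):
    """Return True when coverage is at or above the target."""
    return percent >= target


def gate(path="coverage.json"):
    """Return a process exit code: 0 when the target is met, 1 otherwise."""
    percent = coverage_percent(load_report(path))
    print(f"Coverage: {percent:.1f}% (target: {COVERAGE_TARGET:.0f}%)")
    return 0 if meets_target(percent) else 1
```

A CI step could call `gate()` and pass its return value to `sys.exit`. Alternatively, pytest-cov's `--cov-fail-under=90` option enforces the same threshold without extra code.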

Go/No-Go Decision Criteria

Go Criteria (Proceed with Refactoring)

Technical Readiness

✅ BaseLlm abstract class exists and is well-designed
✅ AgentRegistry with AgentType enum is in place
✅ TaskExecutor architecture supports dual-mode execution
✅ Development environment ready (Python 3.10+, dependencies)

Business Justification

✅ Performance improvement (30-50%) aligns with Phase 1 requirements
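✅ Budget ($10K engineering baseline; $10,750 including the +$750 async work in ADR-001) is available and approved
✅ Timeline (4 weeks baseline; 4.5 weeks including the async work in ADR-001) fits within project schedule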
✅ ROI is clear: Foundation for Phase 1 autonomous agents

Resource Availability

✅ Senior Python developer available (80 hours over 4 weeks baseline; 86 hours over 4.5 weeks including the async work in ADR-001)
✅ Agents available for coordination (test, docs, architecture)
✅ API keys available for testing (Anthropic, OpenAI, Gemini)

Risk Acceptance

✅ Dual-mode strategy mitigates backward compatibility risk
✅ Rollback procedure provides safety net
✅ No blocking dependencies on other work

No-Go Criteria (Defer Refactoring)

Technical Concerns

❌ BaseLlm design fundamentally flawed (requires redesign)
❌ Async/sync compatibility not achievable
❌ Critical security issues with API key handling

Business Concerns

❌ Budget not available ($10K not approved)
❌ Timeline too aggressive (need >4 weeks but can't allocate)
❌ Phase 1 delayed (no immediate need for async foundation)

Resource Constraints

❌ Senior Python developer not available
❌ API keys not available for testing
❌ Competing priorities (critical bugs, other features)

Risk Intolerance

❌ Stakeholders unwilling to accept any risk
❌ Backward compatibility concerns cannot be mitigated
❌ Performance improvement not critical

Decision Authority

Decision Maker: Hal Casteel, CEO/CTO, AZ1.AI INC.

Recommendation: GO ✅

Rationale

Technical Readiness: All prerequisites met; architecture supports dual-mode execution
Strategic Importance: Foundation for Phase 1 Message Bus autonomous agents
Performance Impact: 30-50% improvement enables better user experience
Risk Mitigation: Dual-mode executor + rollback procedure = low risk
ROI: $10K investment enables $100K+ Phase 1 implementation

Conditions for GO

✅ API keys available for all 3 providers (Anthropic, OpenAI, Gemini)
✅ 86 engineering hours allocated over 4.5 weeks (updated for async)
✅ Weekly progress tracking and risk review
✅ Senior architect oversight for quality gates
✅ User approval for async executor refactoring

Architecture Decision Record

ADR-001: Async TaskExecutor Refactoring

Status: ✅ ACCEPTED (2025-11-23)

Decision: Make TaskExecutor.execute() an async method, converting all 7 execution-related methods to async/await pattern.

Complete Documentation: docs/02-architecture/adrs/ADR-001-async-task-executor-refactoring.md

Key Sections

Context and Problem Statement
- Current architecture uses asyncio.run() wrapper (creates new event loop per task)
- User explicitly requested async alignment with ProjectOrchestrator
- Phase 1 Message Bus requires end-to-end async flow

Decision Drivers
- User alignment (mandatory)
- Phase 1 foundation (mandatory)
- Parallel execution (3x speedup)
- Architecture purity (eliminate async/sync boundaries)

Alternatives Considered
- Option 1: Keep sync executor with asyncio.run() wrapper ❌ REJECTED
- Option 2: Hybrid approach (dual sync/async methods) ❌ REJECTED
- Option 3: Make TaskExecutor.execute() async ✅ SELECTED

Decision Outcome
- 7 methods converted to async
- All callers use await executor.execute()
- pytest-asyncio for all tests
- Budget: +$750 (7.5% increase)
- Timeline: +3 days (4.5 weeks vs 4 weeks)

Consequences
- ✅ Positive: End-to-end async, 3x parallel speedup, Phase 1 foundation, architecture purity
- ⚠️ Negative: Breaking change, test migration, budget/timeline increase
- Mitigation: Migration guide, rollback plan, comprehensive testing

Implementation Details
- Phase 1B updated: 22 hours (up from 16 hours)
- 7 method signatures changed to async
- pytest-asyncio==0.23.0 added to requirements-dev.txt
- Success criteria: Parallel tasks <3s (vs 6s sequential)

Validation and Compliance
- All 7 methods converted to async
- Parallel execution test passes (<3s for 3 tasks)
- No asyncio.run() in production code
- Phase 1 foundation validated

ASYNC-EXECUTOR-STRATEGIC-PLAN.md - 62KB comprehensive analysis
ASYNC-EXECUTOR-INTEGRATION-SECTION.md - Integration instructions
docs/MULTI-AGENT-ARCHITECTURE-BEST-PRACTICES.md - Async patterns research
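The parallel-execution criterion in ADR-001 (three tasks finishing in roughly the time of one) rests on awaiting all tasks on a single event loop instead of wrapping each call in its own `asyncio.run()`. The sketch below illustrates that pattern. `SketchExecutor` is a hypothetical stand-in, not the real TaskExecutor, and `asyncio.sleep` stands in for an awaited LLM provider call.

```python
import asyncio
import time


class SketchExecutor:
    """Hypothetical stand-in for an async TaskExecutor (not the real class)."""

    def __init__(self, latency_s):
        self.latency_s = latency_s

    async def execute(self, task_id):
        # asyncio.sleep stands in for an awaited LLM provider call.
        await asyncio.sleep(self.latency_s)
        return f"{task_id}: done"


async def run_sequential(executor, task_ids):
    # Each task waits for the previous one to finish.
    return [await executor.execute(task_id) for task_id in task_ids]


async def run_parallel(executor, task_ids):
    # gather schedules every coroutine on the one running event loop.
    return await asyncio.gather(*(executor.execute(t) for t in task_ids))


async def timed(coro):
    """Await a coroutine and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start


async def main():
    executor = SketchExecutor(latency_s=0.2)
    task_ids = ["task-1", "task-2", "task-3"]
    _, sequential_s = await timed(run_sequential(executor, task_ids))
    _, parallel_s = await timed(run_parallel(executor, task_ids))
    print(f"sequential: {sequential_s:.2f}s, parallel: {parallel_s:.2f}s")
    return sequential_s, parallel_s


if __name__ == "__main__":
    # Single entry-point call, used here only to make the sketch runnable.
    asyncio.run(main())
```

With the simulated latency, the sequential run takes about three times as long as the parallel run. Real speedups depend on provider rate limits and response times.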

Approval

Decision Maker: Hal Casteel, CEO/CTO, AZ1.AI INC.
Status: ✅ APPROVED (based on user feedback)
Date: 2025-11-23
ROI: 10x+ (enables $100K+ Phase 1 implementation)

Appendix

Glossary

Term	Definition
BaseLlm	Abstract base class for all LLM provider implementations
LlmFactory	Factory class for creating LLM provider instances based on AgentType
AgentType	Enum defining LLM types (ANTHROPIC_CLAUDE, OPENAI_GPT, GOOGLE_GEMINI, etc.)
Dual-Mode Executor	TaskExecutor supporting both direct LLM and subprocess execution
Feature Flag	`use_direct_llm` metadata flag controlling execution mode
Graceful Fallback	Automatic revert to script execution if provider unavailable
Phase 1 Message Bus	Future work: autonomous agent-to-agent communication via RabbitMQ
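Several glossary terms describe one control flow: the `use_direct_llm` flag selects the direct path, `LlmFactory` resolves a provider from an `AgentType`, and an unavailable provider triggers the graceful fallback to script execution. The sketch below shows how those pieces might fit together. Everything in it is an assumption made for illustration: the enum members, the `generate` method name, `EchoLlm`, `ProviderUnavailable`, and `run_script` are hypothetical and do not reflect the actual `llm_abstractions` interfaces.

```python
import asyncio
from abc import ABC, abstractmethod
from enum import Enum


class AgentType(Enum):
    # Illustrative members only; the real enum is defined by the project.
    ANTHROPIC = "anthropic"
    OPENAI = "openai"


class BaseLlm(ABC):
    @abstractmethod
    async def generate(self, prompt):
        """Assumed method name; the real BaseLlm interface may differ."""


class EchoLlm(BaseLlm):
    """Toy provider so the sketch runs without API keys."""

    async def generate(self, prompt):
        return f"echo: {prompt}"


class ProviderUnavailable(Exception):
    """Raised when no provider is registered for an AgentType."""


class LlmFactory:
    _registry = {}

    @classmethod
    def register(cls, agent_type, provider_cls):
        cls._registry[agent_type] = provider_cls

    @classmethod
    def create(cls, agent_type):
        provider_cls = cls._registry.get(agent_type)
        if provider_cls is None:
            raise ProviderUnavailable(agent_type.name)
        return provider_cls()


async def run_script(prompt):
    # Placeholder for the legacy subprocess path (execute_*.py scripts).
    return f"script: {prompt}"


async def execute(prompt, agent_type, metadata):
    """Dual-mode execution: direct LLM when flagged, script otherwise."""
    if metadata.get("use_direct_llm"):
        try:
            llm = LlmFactory.create(agent_type)
            return await llm.generate(prompt)
        except ProviderUnavailable:
            pass  # graceful fallback to the script path below
    return await run_script(prompt)


# Only one provider is registered, so OPENAI exercises the fallback.
LlmFactory.register(AgentType.ANTHROPIC, EchoLlm)
```

Calling `execute("hi", AgentType.ANTHROPIC, {"use_direct_llm": True})` takes the direct path. The same call with `AgentType.OPENAI`, or without the flag, falls through to the script path.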

References

Internal Documents

docs/03-project-planning/project-plan.md - Master rollout plan
orchestration/README.md - Orchestration framework documentation
llm_abstractions/base_llm.py - BaseLlm abstract class

External Resources

Anthropic API Docs: https://docs.anthropic.com/
OpenAI API Docs: https://platform.openai.com/docs/
Google Gemini API Docs: https://ai.google.dev/docs/

Change Log

Version	Date	Author	Changes
1.0	2025-11-23	Hal Casteel	Initial project plan created

Document Status: ✅ READY FOR GO/NO-GO DECISION
Last Updated: 2025-11-23
Next Review: After go/no-go decision
Owner: Hal Casteel, CEO/CTO, AZ1.AI INC.
