Architectural Investment Deep Dive: Protocol Translation & Session State Management

Analysis Date: 2025-10-14
Focus: Detailed breakdown of architectural investment requirements for multi-LLM support

Investment Scope Overview​

Total Estimated Effort: 18-24 months, 4-6 engineers
Code Impact: ~35,000 lines of new code, ~15,000 lines refactored
Infrastructure: New abstraction layers, protocol adapters, state management systems

Part 1: Protocol Translation Layer Investment​

Current Protocol Coupling Depth​

Critical Dependencies across 8 core components:

  • hlyr/src/mcp.ts - 192 lines of MCP-specific code
  • hld/mcp/server.go - 261 lines of daemon MCP integration
  • hld/api/handlers/proxy_transform.go - 370 lines of message transformation
  • hld/session/manager.go - Claude binary integration and MCP injection

Refactoring Scope: ~2,500 lines directly MCP-coupled, requiring complete abstraction

Required Translation Architecture​

1. Universal Protocol Interface​

type ProtocolAdapter interface {
	// Core transformations (5 methods × 4 providers = 20 implementations)
	TransformInbound([]byte, string) (*UniversalMessage, error)
	TransformOutbound(*UniversalMessage, string) ([]byte, error)
	TranslateToolCall([]byte, string) (*UniversalToolCall, error)
	TranslateApprovalResponse(*ApprovalDecision, string) ([]byte, error)
	InjectApprovalMechanism(interface{}, *ApprovalConfig) error
}

// 4 adapters × 2,000 lines each = 8,000 lines
// MCP, OpenAI, Gemini Extensions, Custom Protocol adapters
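As a sketch of what one adapter implementation might look like, the inbound half for a JSON-RPC 2.0 (MCP-style) payload could be structured as follows. `UniversalMessage` and the envelope fields here are illustrative placeholders, not the final design:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UniversalMessage is a hypothetical provider-neutral envelope;
// field names are illustrative only.
type UniversalMessage struct {
	Role     string                 `json:"role"`
	Content  string                 `json:"content"`
	Provider string                 `json:"provider"`
	Raw      json.RawMessage        `json:"raw"`
	Meta     map[string]interface{} `json:"meta,omitempty"`
}

// mcpAdapter sketches the inbound half of the ProtocolAdapter
// interface for a JSON-RPC 2.0 payload.
type mcpAdapter struct{}

func (a mcpAdapter) TransformInbound(raw []byte, sessionID string) (*UniversalMessage, error) {
	var env struct {
		JSONRPC string `json:"jsonrpc"`
		Method  string `json:"method"`
		Params  struct {
			Role    string `json:"role"`
			Content string `json:"content"`
		} `json:"params"`
	}
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, fmt.Errorf("mcp: malformed envelope: %w", err)
	}
	if env.JSONRPC != "2.0" {
		return nil, fmt.Errorf("mcp: unsupported jsonrpc version %q", env.JSONRPC)
	}
	return &UniversalMessage{
		Role:     env.Params.Role,
		Content:  env.Params.Content,
		Provider: "mcp",
		Raw:      json.RawMessage(raw),
		Meta:     map[string]interface{}{"session": sessionID, "method": env.Method},
	}, nil
}

func main() {
	raw := []byte(`{"jsonrpc":"2.0","method":"sampling/createMessage","params":{"role":"user","content":"hello"}}`)
	msg, err := (mcpAdapter{}).TransformInbound(raw, "sess-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(msg.Provider, msg.Role, msg.Content) // mcp user hello
}
```

Preserving the raw provider payload alongside the normalized fields keeps lossless round-tripping possible when the outbound transform targets the same provider.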

2. Message Transformation Pipeline​

Complexity: Handle 6 different message formats with validation and error handling

type MessageTransformer struct {
	validators map[string]*jsonschema.Schema // 6 schemas
	cache      map[string][]byte             // Transform cache
	metrics    *TransformMetrics             // Performance monitoring
}

// Estimated: 3,000 lines for transformation logic
// Additional: 1,500 lines for validation and caching

3. Protocol-Specific Error Handling​

Challenge: Map error codes across providers with different semantics

  • MCP: JSON-RPC 2.0 error codes
  • OpenAI: HTTP status + error types
  • Gemini: Extension-specific errors
  • Custom: Provider-defined formats

type ProtocolErrorTranslator struct {
	mappings map[string]map[string]UniversalErrorCode
	circuit  map[string]*CircuitBreaker
}

// Estimated: 2,500 lines for error mapping and circuit breakers
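The mapping table could be exercised as below; the provider names and codes shown are examples, and unmapped codes degrade to a generic error rather than failing the translation:

```go
package main

import "fmt"

type UniversalErrorCode string

const (
	ErrRateLimited  UniversalErrorCode = "rate_limited"
	ErrInvalidInput UniversalErrorCode = "invalid_input"
	ErrUnknown      UniversalErrorCode = "unknown"
)

// ProtocolErrorTranslator sketch: a two-level lookup from
// provider name and native error code to a normalized code.
type ProtocolErrorTranslator struct {
	mappings map[string]map[string]UniversalErrorCode
}

func NewTranslator() *ProtocolErrorTranslator {
	return &ProtocolErrorTranslator{mappings: map[string]map[string]UniversalErrorCode{
		"mcp":    {"-32602": ErrInvalidInput},                    // JSON-RPC "invalid params"
		"openai": {"429": ErrRateLimited, "400": ErrInvalidInput}, // HTTP status codes
	}}
}

func (t *ProtocolErrorTranslator) Translate(provider, code string) UniversalErrorCode {
	if byCode, ok := t.mappings[provider]; ok {
		if uc, ok := byCode[code]; ok {
			return uc
		}
	}
	return ErrUnknown // unmapped codes fall back to a generic error
}

func main() {
	t := NewTranslator()
	fmt.Println(t.Translate("openai", "429"), t.Translate("gemini", "XYZ")) // rate_limited unknown
}
```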

Performance Investment Requirements​

Latency Overhead Analysis​

  • Current Claude MCP: 50ms baseline
  • Protocol translation: +15-25ms per transformation
  • Validation overhead: +5-10ms per message
  • Total impact: 40-85ms additional latency per interaction (each round trip involves at least one inbound and one outbound transform-and-validate pass)

Caching and Optimization​

type ProtocolCache struct {
	transformCache  map[string][]byte             // 500MB memory allocation
	schemaCache     map[string]*jsonschema.Schema
	connectionPools map[string]*ConnectionPool    // Provider-specific pools
	metrics         *CacheMetrics
}

// Estimated: 2,000 lines for caching infrastructure

Testing Investment​

Test Matrix: 4 providers × 6 message types × 3 error scenarios = 72 test cases

  • Unit tests: ~3,000 lines
  • Integration tests: ~4,000 lines
  • Performance benchmarks: ~1,500 lines
  • Total testing: ~8,500 lines
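The matrix itself can be generated rather than hand-written, keeping the suite in sync as providers are added. A table-driven enumeration might look like this (names are illustrative):

```go
package main

import "fmt"

// Illustrative dimensions of the 4 × 6 × 3 test matrix.
var (
	providers = []string{"mcp", "openai", "gemini", "custom"}
	msgTypes  = []string{"text", "tool_call", "tool_result", "approval", "stream_chunk", "error"}
	errCases  = []string{"none", "malformed", "timeout"}
)

type testCase struct{ provider, msgType, errCase string }

// matrix enumerates every provider × message type × error
// scenario combination; each tuple becomes one subtest.
func matrix() []testCase {
	var cases []testCase
	for _, p := range providers {
		for _, m := range msgTypes {
			for _, e := range errCases {
				cases = append(cases, testCase{p, m, e})
			}
		}
	}
	return cases
}

func main() {
	fmt.Println(len(matrix())) // 72
}
```

In a real suite each tuple would drive `t.Run(fmt.Sprintf("%s/%s/%s", ...), ...)` so a new provider expands the matrix automatically.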

Part 2: Session State Management Investment​

Current State Architecture Limitations​

Database Schema Issues:

-- Claude-specific fields throughout schema
claude_session_id TEXT,    -- needs a provider-agnostic equivalent
model TEXT,                -- provider-specific model formats
proxy_model_override TEXT, -- limited to proxy providers

Required Schema Refactoring:

-- New provider-agnostic schema (+15 new columns)
ALTER TABLE sessions ADD COLUMN provider_type TEXT;
ALTER TABLE sessions ADD COLUMN provider_config JSON;
ALTER TABLE sessions ADD COLUMN provider_session_id TEXT;
ALTER TABLE sessions ADD COLUMN provider_metadata JSON;
ALTER TABLE sessions ADD COLUMN context_tokens_used INTEGER;
ALTER TABLE sessions ADD COLUMN context_tokens_limit INTEGER;
-- ... additional provider-neutral fields

State Management Complexity​

1. Stateful vs Stateless Provider Abstraction​

type SessionStateManager interface {
	// 3 implementations: Local Process, HTTP API, Hybrid
	CreateSession(SessionConfig) (ProviderSession, error)
	RestoreSession(string) (ProviderSession, error)
	PersistState(*SessionSnapshot) error
}

type SessionSnapshot struct {
	Messages      []UniversalMessage     // Variable size per provider
	ToolCalls     []UniversalToolCall    // Cross-provider tool tracking
	ApprovalState map[string]Approval    // Approval correlation
	ProviderState map[string]interface{} // Provider-specific state
	ContextTokens int                    // Provider-specific limits
}

// Estimated: 4,000 lines for state management abstractions
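A JSON round-trip sketch of the persist/restore path, with minimal stand-in types; the real `PersistState` would write to the sessions database rather than a byte slice:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal stand-ins for the types above; real definitions would
// live in the state-management package.
type UniversalMessage struct {
	Role    string
	Content string
}

type SessionSnapshot struct {
	Messages      []UniversalMessage
	ProviderState map[string]interface{}
	ContextTokens int
}

// persist/restore sketch: a JSON round-trip standing in for the
// database-backed PersistState/RestoreSession pair.
func persist(s *SessionSnapshot) ([]byte, error) { return json.Marshal(s) }

func restore(b []byte) (*SessionSnapshot, error) {
	var s SessionSnapshot
	if err := json.Unmarshal(b, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

func main() {
	orig := &SessionSnapshot{
		Messages:      []UniversalMessage{{Role: "user", Content: "hi"}},
		ProviderState: map[string]interface{}{"checkpoint": "abc"},
		ContextTokens: 42,
	}
	blob, _ := persist(orig)
	got, _ := restore(blob)
	fmt.Println(got.ContextTokens, got.Messages[0].Content) // 42 hi
}
```

The opaque `ProviderState` map is what lets stateless HTTP providers and stateful local-process providers share one snapshot format.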

2. Context Window Management​

Challenge: Handle different context limits and formats

  • Claude: 200K tokens, structured messages
  • GPT-4: 128K tokens, OpenAI format
  • Gemini: 2M tokens, conversation checkpoints
  • Grok: Variable limits, multi-round execution

type ContextManager struct {
	summarizers  map[string]ContextSummarizer  // Provider-specific
	transformers map[string]MessageTransformer // Format conversion
	limiters     map[string]TokenLimiter       // Context enforcement
}

// Estimated: 3,500 lines for context management
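As a sketch of limit enforcement, evicting oldest-first until the history fits; real limiters would summarize dropped turns (via the provider-specific summarizers above) rather than discard them:

```go
package main

import "fmt"

type msg struct {
	Content string
	Tokens  int
}

// enforceLimit sketches per-provider context enforcement: drop
// the oldest messages until the running total fits the limit.
func enforceLimit(msgs []msg, limit int) []msg {
	total := 0
	for _, m := range msgs {
		total += m.Tokens
	}
	for len(msgs) > 0 && total > limit {
		total -= msgs[0].Tokens
		msgs = msgs[1:] // evict oldest first
	}
	return msgs
}

func main() {
	history := []msg{{"a", 100}, {"b", 100}, {"c", 100}}
	kept := enforceLimit(history, 250)
	fmt.Println(len(kept), kept[0].Content) // 2 b
}
```

The same function parameterized by each provider's limit (200K for Claude, 128K for GPT-4, and so on) is what the `limiters` map would dispatch to.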

3. Cross-Provider Event Sourcing​

type EventStore interface {
	StoreEvent(*UniversalEvent) error
	GetEvents(sessionID string, filters EventFilters) ([]UniversalEvent, error)
	ReconstructSession(sessionID string, provider string) (*SessionState, error)
}

type UniversalEvent struct {
	ID          string
	SessionID   string
	Provider    string                 // Event attribution
	Type        UniversalEventType     // Normalized event types
	Data        map[string]interface{} // Provider-neutral data
	ProviderRaw json.RawMessage        // Original provider format
	Timestamp   time.Time
}

// Estimated: 2,500 lines for event sourcing system

Memory and Performance Impact​

Memory Requirements​

  • Current per session: ~10MB
  • Multi-provider overhead: +15MB per session
  • State caching: +50MB for 100 concurrent sessions
  • Protocol adapters: +100MB for adapter instances

Concurrent Session Handling​

type ConcurrentSessionManager struct {
	sessions  map[string]ProviderSession // Mixed session types
	pools     map[string]*ResourcePool   // Provider-specific pools
	scheduler *SessionScheduler          // Load balancing
	monitor   *HealthMonitor             // Cross-provider health
}

// Estimated: 2,000 lines for concurrency management
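One way to bound per-provider concurrency is a counting semaphore built on a buffered channel; a sketch of `ResourcePool` under that assumption:

```go
package main

import "fmt"

// ResourcePool sketch: a buffered channel used as a counting
// semaphore that bounds concurrent sessions per provider.
type ResourcePool struct {
	slots chan struct{}
}

func NewResourcePool(size int) *ResourcePool {
	return &ResourcePool{slots: make(chan struct{}, size)}
}

// TryAcquire claims a slot without blocking; it returns false
// when the pool is exhausted.
func (p *ResourcePool) TryAcquire() bool {
	select {
	case p.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release returns a previously acquired slot to the pool.
func (p *ResourcePool) Release() { <-p.slots }

func main() {
	pool := NewResourcePool(2)
	fmt.Println(pool.TryAcquire(), pool.TryAcquire(), pool.TryAcquire()) // true true false
	pool.Release()
	fmt.Println(pool.TryAcquire()) // true
}
```

Keeping one pool per provider lets a rate-limited provider saturate without starving sessions on the others.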

Part 3: Investment Quantification​

Development Timeline​

Phase 1: Core Abstraction (6 months, 3 engineers)

  • Universal message format: 4 weeks
  • Protocol adapter interface: 6 weeks
  • Session state abstraction: 8 weeks
  • Database schema migration: 4 weeks

Phase 2: Provider Implementations (12 months, 4 engineers)

  • Claude adapter (maintain compatibility): 6 weeks
  • OpenAI and Grok adapters: 8 weeks each
  • Gemini adapter: 10 weeks
  • Testing and integration: 16 weeks

Phase 3: Production Hardening (6 months, 2 engineers)

  • Performance optimization: 12 weeks
  • Error handling and monitoring: 8 weeks
  • Documentation and migration tools: 4 weeks

Code Volume Estimates​

New Code:
- Protocol adapters: 12,000 lines
- State management: 8,000 lines
- Testing infrastructure: 10,000 lines
- Migration and tooling: 5,000 lines
Total New: ~35,000 lines

Refactored Code:
- Session management: 5,000 lines
- Database layer: 3,000 lines
- API handlers: 4,000 lines
- UI components: 3,000 lines
Total Refactored: ~15,000 lines

Infrastructure Requirements​

  • Development environments: 4 provider test setups
  • CI/CD expansion: 3x increase in test matrix
  • Monitoring systems: Provider-specific dashboards
  • Documentation: Provider integration guides

Risk Factors and Mitigation​

Technical Risks​

  1. Protocol evolution: Providers change APIs independently
    • Mitigation: Version management and backward compatibility layers
  2. Performance degradation: Translation overhead impacts user experience
    • Mitigation: Caching, connection pooling, async processing
  3. Complexity explosion: Maintenance burden becomes unsustainable
    • Mitigation: Clear abstraction boundaries, automated testing

Business Risks​

  1. Development timeline: 18-24 months before multi-LLM value delivery
  2. Resource allocation: 4-6 engineers dedicated to this effort
  3. Opportunity cost: Delayed feature development in other areas

Success Criteria​

  • Feature parity: 95% of Claude Code features work across all providers
  • Performance: <2x latency increase from current baseline
  • Reliability: 99.9% approval delivery success rate
  • Maintainability: <20% increase in bug report volume

Conclusion​

The architectural investment for multi-LLM support represents a fundamental platform evolution requiring:

  • 18-24 months development time
  • 4-6 engineer team
  • ~50,000 lines of code (new + refactored)
  • Significant infrastructure expansion

While technically feasible, this investment should be weighed against alternative strategies like provider-specific implementations or gradual migration approaches. The complexity and timeline suggest this is a major architectural decision requiring substantial organizational commitment.