Architectural Investment Deep Dive: Protocol Translation & Session State Management

Analysis Date: 2025-10-14
Focus: Detailed breakdown of architectural investment requirements for multi-LLM support

Investment Scope Overview​

Total Estimated Effort: 18-24 months, 4-6 engineers
Code Impact: ~35,000 lines of new code, ~15,000 lines refactored
Infrastructure: New abstraction layers, protocol adapters, state management systems

Part 1: Protocol Translation Layer Investment​

Current Protocol Coupling Depth​

Critical Dependencies across 8 core components:

  • hlyr/src/mcp.ts - 192 lines of MCP-specific code
  • hld/mcp/server.go - 261 lines of daemon MCP integration
  • hld/api/handlers/proxy_transform.go - 370 lines of message transformation
  • hld/session/manager.go - Claude binary integration and MCP injection

Refactoring Scope: ~2,500 lines directly MCP-coupled, requiring complete abstraction

Required Translation Architecture​

1. Universal Protocol Interface​

type ProtocolAdapter interface {
	// Core transformations (5 methods × 4 providers = 20 implementations)
	TransformInbound([]byte, string) (*UniversalMessage, error)
	TransformOutbound(*UniversalMessage, string) ([]byte, error)
	TranslateToolCall([]byte, string) (*UniversalToolCall, error)
	TranslateApprovalResponse(*ApprovalDecision, string) ([]byte, error)
	InjectApprovalMechanism(interface{}, *ApprovalConfig) error
}

// 4 adapters × 2,000 lines each = 8,000 lines
// MCP, OpenAI, Gemini Extensions, Custom Protocol adapters
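As a sketch of what one adapter implementation might look like, the inbound half for a JSON-RPC 2.0 (MCP-style) payload could be structured as follows. `UniversalMessage` and the envelope fields here are illustrative placeholders, not the final design:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UniversalMessage is a hypothetical provider-neutral envelope;
// field names are illustrative only.
type UniversalMessage struct {
	Role     string                 `json:"role"`
	Content  string                 `json:"content"`
	Provider string                 `json:"provider"`
	Raw      json.RawMessage        `json:"raw"`
	Meta     map[string]interface{} `json:"meta,omitempty"`
}

// mcpAdapter sketches the inbound half of the ProtocolAdapter
// interface for a JSON-RPC 2.0 payload.
type mcpAdapter struct{}

func (a mcpAdapter) TransformInbound(raw []byte, sessionID string) (*UniversalMessage, error) {
	var env struct {
		JSONRPC string `json:"jsonrpc"`
		Method  string `json:"method"`
		Params  struct {
			Role    string `json:"role"`
			Content string `json:"content"`
		} `json:"params"`
	}
	if err := json.Unmarshal(raw, &env); err != nil {
		return nil, fmt.Errorf("mcp: malformed envelope: %w", err)
	}
	if env.JSONRPC != "2.0" {
		return nil, fmt.Errorf("mcp: unsupported jsonrpc version %q", env.JSONRPC)
	}
	return &UniversalMessage{
		Role:     env.Params.Role,
		Content:  env.Params.Content,
		Provider: "mcp",
		Raw:      json.RawMessage(raw),
		Meta:     map[string]interface{}{"session": sessionID, "method": env.Method},
	}, nil
}

func main() {
	raw := []byte(`{"jsonrpc":"2.0","method":"sampling/createMessage","params":{"role":"user","content":"hello"}}`)
	msg, err := (mcpAdapter{}).TransformInbound(raw, "sess-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(msg.Provider, msg.Role, msg.Content) // mcp user hello
}
```

Preserving the raw provider payload alongside the normalized fields keeps lossless round-tripping possible when the outbound transform targets the same provider.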

2. Message Transformation Pipeline​

Complexity: Handle 6 different message formats with validation and error handling

type MessageTransformer struct {
	validators map[string]*jsonschema.Schema // 6 schemas
	cache      map[string][]byte             // Transform cache
	metrics    *TransformMetrics             // Performance monitoring
}

// Estimated: 3,000 lines for transformation logic
// Additional: 1,500 lines for validation and caching

3. Protocol-Specific Error Handling​

Challenge: Map error codes across providers with different semantics

  • MCP: JSON-RPC 2.0 error codes
  • OpenAI: HTTP status + error types
  • Gemini: Extension-specific errors
  • Custom: Provider-defined formats

type ProtocolErrorTranslator struct {
	mappings map[string]map[string]UniversalErrorCode
	circuit  map[string]*CircuitBreaker
}

// Estimated: 2,500 lines for error mapping and circuit breakers
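The mapping table could be exercised as below; the provider names and codes shown are examples, and unmapped codes degrade to a generic error rather than failing the translation:

```go
package main

import "fmt"

type UniversalErrorCode string

const (
	ErrRateLimited  UniversalErrorCode = "rate_limited"
	ErrInvalidInput UniversalErrorCode = "invalid_input"
	ErrUnknown      UniversalErrorCode = "unknown"
)

// ProtocolErrorTranslator sketch: a two-level lookup from
// provider name and native error code to a normalized code.
type ProtocolErrorTranslator struct {
	mappings map[string]map[string]UniversalErrorCode
}

func NewTranslator() *ProtocolErrorTranslator {
	return &ProtocolErrorTranslator{mappings: map[string]map[string]UniversalErrorCode{
		"mcp":    {"-32602": ErrInvalidInput},                    // JSON-RPC "invalid params"
		"openai": {"429": ErrRateLimited, "400": ErrInvalidInput}, // HTTP status codes
	}}
}

func (t *ProtocolErrorTranslator) Translate(provider, code string) UniversalErrorCode {
	if byCode, ok := t.mappings[provider]; ok {
		if uc, ok := byCode[code]; ok {
			return uc
		}
	}
	return ErrUnknown // unmapped codes fall back to a generic error
}

func main() {
	t := NewTranslator()
	fmt.Println(t.Translate("openai", "429"), t.Translate("gemini", "XYZ")) // rate_limited unknown
}
```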

Performance Investment Requirements​

Latency Overhead Analysis​

  • Current Claude MCP: 50ms baseline
  • Protocol translation: +15-25ms per transformation
  • Validation overhead: +5-10ms per message
  • Total impact: 40-85ms additional latency per interaction (each round trip involves at least one inbound and one outbound transform-and-validate pass)

Caching and Optimization​

type ProtocolCache struct {
	transformCache  map[string][]byte             // 500MB memory allocation
	schemaCache     map[string]*jsonschema.Schema
	connectionPools map[string]*ConnectionPool    // Provider-specific pools
	metrics         *CacheMetrics
}

// Estimated: 2,000 lines for caching infrastructure

Testing Investment​

Test Matrix: 4 providers × 6 message types × 3 error scenarios = 72 test cases

  • Unit tests: ~3,000 lines
  • Integration tests: ~4,000 lines
  • Performance benchmarks: ~1,500 lines
  • Total testing: ~8,500 lines
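The matrix itself can be generated rather than hand-written, keeping the suite in sync as providers are added. A table-driven enumeration might look like this (names are illustrative):

```go
package main

import "fmt"

// Illustrative dimensions of the 4 × 6 × 3 test matrix.
var (
	providers = []string{"mcp", "openai", "gemini", "custom"}
	msgTypes  = []string{"text", "tool_call", "tool_result", "approval", "stream_chunk", "error"}
	errCases  = []string{"none", "malformed", "timeout"}
)

type testCase struct{ provider, msgType, errCase string }

// matrix enumerates every provider × message type × error
// scenario combination; each tuple becomes one subtest.
func matrix() []testCase {
	var cases []testCase
	for _, p := range providers {
		for _, m := range msgTypes {
			for _, e := range errCases {
				cases = append(cases, testCase{p, m, e})
			}
		}
	}
	return cases
}

func main() {
	fmt.Println(len(matrix())) // 72
}
```

In a real suite each tuple would drive `t.Run(fmt.Sprintf("%s/%s/%s", ...), ...)` so a new provider expands the matrix automatically.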

Part 2: Session State Management Investment​

Current State Architecture Limitations​

Database Schema Issues:

-- Claude-specific fields throughout schema
claude_session_id TEXT,    -- needs a provider-agnostic equivalent
model TEXT,                -- provider-specific model formats
proxy_model_override TEXT, -- limited to proxy providers

Required Schema Refactoring:

-- New provider-agnostic schema (+15 new columns)
ALTER TABLE sessions ADD COLUMN provider_type TEXT;
ALTER TABLE sessions ADD COLUMN provider_config JSON;
ALTER TABLE sessions ADD COLUMN provider_session_id TEXT;
ALTER TABLE sessions ADD COLUMN provider_metadata JSON;
ALTER TABLE sessions ADD COLUMN context_tokens_used INTEGER;
ALTER TABLE sessions ADD COLUMN context_tokens_limit INTEGER;
-- ... additional provider-neutral fields

State Management Complexity​

1. Stateful vs Stateless Provider Abstraction​

type SessionStateManager interface {
	// 3 implementations: Local Process, HTTP API, Hybrid
	CreateSession(SessionConfig) (ProviderSession, error)
	RestoreSession(string) (ProviderSession, error)
	PersistState(*SessionSnapshot) error
}

type SessionSnapshot struct {
	Messages      []UniversalMessage     // Variable size per provider
	ToolCalls     []UniversalToolCall    // Cross-provider tool tracking
	ApprovalState map[string]Approval    // Approval correlation
	ProviderState map[string]interface{} // Provider-specific state
	ContextTokens int                    // Provider-specific limits
}

// Estimated: 4,000 lines for state management abstractions
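A JSON round-trip sketch of the persist/restore path, with minimal stand-in types; the real `PersistState` would write to the sessions database rather than a byte slice:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal stand-ins for the types above; real definitions would
// live in the state-management package.
type UniversalMessage struct {
	Role    string
	Content string
}

type SessionSnapshot struct {
	Messages      []UniversalMessage
	ProviderState map[string]interface{}
	ContextTokens int
}

// persist/restore sketch: a JSON round-trip standing in for the
// database-backed PersistState/RestoreSession pair.
func persist(s *SessionSnapshot) ([]byte, error) { return json.Marshal(s) }

func restore(b []byte) (*SessionSnapshot, error) {
	var s SessionSnapshot
	if err := json.Unmarshal(b, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

func main() {
	orig := &SessionSnapshot{
		Messages:      []UniversalMessage{{Role: "user", Content: "hi"}},
		ProviderState: map[string]interface{}{"checkpoint": "abc"},
		ContextTokens: 42,
	}
	blob, _ := persist(orig)
	got, _ := restore(blob)
	fmt.Println(got.ContextTokens, got.Messages[0].Content) // 42 hi
}
```

The opaque `ProviderState` map is what lets stateless HTTP providers and stateful local-process providers share one snapshot format.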

2. Context Window Management​

Challenge: Handle different context limits and formats

  • Claude: 200K tokens, structured messages
  • GPT-4: 128K tokens, OpenAI format
  • Gemini: 2M tokens, conversation checkpoints
  • Grok: Variable limits, multi-round execution

type ContextManager struct {
	summarizers  map[string]ContextSummarizer  // Provider-specific
	transformers map[string]MessageTransformer // Format conversion
	limiters     map[string]TokenLimiter       // Context enforcement
}

// Estimated: 3,500 lines for context management
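As a sketch of limit enforcement, evicting oldest-first until the history fits; real limiters would summarize dropped turns (via the provider-specific summarizers above) rather than discard them:

```go
package main

import "fmt"

type msg struct {
	Content string
	Tokens  int
}

// enforceLimit sketches per-provider context enforcement: drop
// the oldest messages until the running total fits the limit.
func enforceLimit(msgs []msg, limit int) []msg {
	total := 0
	for _, m := range msgs {
		total += m.Tokens
	}
	for len(msgs) > 0 && total > limit {
		total -= msgs[0].Tokens
		msgs = msgs[1:] // evict oldest first
	}
	return msgs
}

func main() {
	history := []msg{{"a", 100}, {"b", 100}, {"c", 100}}
	kept := enforceLimit(history, 250)
	fmt.Println(len(kept), kept[0].Content) // 2 b
}
```

The same function parameterized by each provider's limit (200K for Claude, 128K for GPT-4, and so on) is what the `limiters` map would dispatch to.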

3. Cross-Provider Event Sourcing​

type EventStore interface {
	StoreEvent(*UniversalEvent) error
	GetEvents(sessionID string, filters EventFilters) ([]UniversalEvent, error)
	ReconstructSession(sessionID string, provider string) (*SessionState, error)
}

type UniversalEvent struct {
	ID          string
	SessionID   string
	Provider    string                 // Event attribution
	Type        UniversalEventType     // Normalized event types
	Data        map[string]interface{} // Provider-neutral data
	ProviderRaw json.RawMessage        // Original provider format
	Timestamp   time.Time
}

// Estimated: 2,500 lines for event sourcing system

Memory and Performance Impact​

Memory Requirements​

  • Current per session: ~10MB
  • Multi-provider overhead: +15MB per session
  • State caching: +50MB for 100 concurrent sessions
  • Protocol adapters: +100MB for adapter instances

Concurrent Session Handling​

type ConcurrentSessionManager struct {
	sessions  map[string]ProviderSession // Mixed session types
	pools     map[string]*ResourcePool   // Provider-specific pools
	scheduler *SessionScheduler          // Load balancing
	monitor   *HealthMonitor             // Cross-provider health
}

// Estimated: 2,000 lines for concurrency management
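One way to bound per-provider concurrency is a counting semaphore built on a buffered channel; a sketch of `ResourcePool` under that assumption:

```go
package main

import "fmt"

// ResourcePool sketch: a buffered channel used as a counting
// semaphore that bounds concurrent sessions per provider.
type ResourcePool struct {
	slots chan struct{}
}

func NewResourcePool(size int) *ResourcePool {
	return &ResourcePool{slots: make(chan struct{}, size)}
}

// TryAcquire claims a slot without blocking; it returns false
// when the pool is exhausted.
func (p *ResourcePool) TryAcquire() bool {
	select {
	case p.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release returns a previously acquired slot to the pool.
func (p *ResourcePool) Release() { <-p.slots }

func main() {
	pool := NewResourcePool(2)
	fmt.Println(pool.TryAcquire(), pool.TryAcquire(), pool.TryAcquire()) // true true false
	pool.Release()
	fmt.Println(pool.TryAcquire()) // true
}
```

Keeping one pool per provider lets a rate-limited provider saturate without starving sessions on the others.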

Part 3: Investment Quantification​

Development Timeline​

Phase 1: Core Abstraction (6 months, 3 engineers)

  • Universal message format: 4 weeks
  • Protocol adapter interface: 6 weeks
  • Session state abstraction: 8 weeks
  • Database schema migration: 4 weeks

Phase 2: Provider Implementations (12 months, 4 engineers)

  • Claude adapter (maintain compatibility): 6 weeks
  • OpenAI and Grok adapters: 8 weeks each
  • Gemini adapter: 10 weeks
  • Testing and integration: 16 weeks

Phase 3: Production Hardening (6 months, 2 engineers)

  • Performance optimization: 12 weeks
  • Error handling and monitoring: 8 weeks
  • Documentation and migration tools: 4 weeks

Code Volume Estimates​

New Code:
- Protocol adapters: 12,000 lines
- State management: 8,000 lines
- Testing infrastructure: 10,000 lines
- Migration and tooling: 5,000 lines
Total New: ~35,000 lines

Refactored Code:
- Session management: 5,000 lines
- Database layer: 3,000 lines
- API handlers: 4,000 lines
- UI components: 3,000 lines
Total Refactored: ~15,000 lines

Infrastructure Requirements​

  • Development environments: 4 provider test setups
  • CI/CD expansion: 3x increase in test matrix
  • Monitoring systems: Provider-specific dashboards
  • Documentation: Provider integration guides

Risk Factors and Mitigation​

Technical Risks​

  1. Protocol evolution: Providers change APIs independently
    • Mitigation: Version management and backward compatibility layers
  2. Performance degradation: Translation overhead impacts user experience
    • Mitigation: Caching, connection pooling, async processing
  3. Complexity explosion: Maintenance burden becomes unsustainable
    • Mitigation: Clear abstraction boundaries, automated testing

Business Risks​

  1. Development timeline: 18-24 months before multi-LLM value delivery
  2. Resource allocation: 4-6 engineers dedicated to this effort
  3. Opportunity cost: Delayed feature development in other areas

Success Criteria​

  • Feature parity: 95% of Claude Code features work across all providers
  • Performance: <2x latency increase from current baseline
  • Reliability: 99.9% approval delivery success rate
  • Maintainability: <20% increase in bug report volume

Conclusion​

The architectural investment for multi-LLM support represents a fundamental platform evolution requiring:

  • 18-24 months development time
  • 4-6 engineer team
  • ~50,000 lines of code (new + refactored)
  • Significant infrastructure expansion

While technically feasible, this investment should be weighed against alternative strategies like provider-specific implementations or gradual migration approaches. The complexity and timeline suggest this is a major architectural decision requiring substantial organizational commitment.