Architectural Investment Deep Dive: Protocol Translation & Session State Management
Analysis Date: 2025-10-14
Focus: Detailed breakdown of architectural investment requirements for multi-LLM support
Investment Scope Overview
Total Estimated Effort: 18-24 months, 4-6 engineers
Code Impact: ~35,000 lines of new code, ~15,000 lines refactored
Infrastructure: New abstraction layers, protocol adapters, state management systems
Part 1: Protocol Translation Layer Investment
Current Protocol Coupling Depth
Critical Dependencies across 8 core components, including:
- hlyr/src/mcp.ts: 192 lines of MCP-specific code
- hld/mcp/server.go: 261 lines of daemon MCP integration
- hld/api/handlers/proxy_transform.go: 370 lines of message transformation
- hld/session/manager.go: Claude binary integration and MCP injection
Refactoring Scope: ~2,500 lines directly MCP-coupled, requiring complete abstraction
Required Translation Architecture
1. Universal Protocol Interface
```go
type ProtocolAdapter interface {
    // Core transformations (5 methods × 4 providers = 20 implementations)
    TransformInbound([]byte, string) (*UniversalMessage, error)
    TransformOutbound(*UniversalMessage, string) ([]byte, error)
    TranslateToolCall([]byte, string) (*UniversalToolCall, error)
    TranslateApprovalResponse(*ApprovalDecision, string) ([]byte, error)
    InjectApprovalMechanism(interface{}, *ApprovalConfig) error
}

// 4 adapters × 2,000 lines each = 8,000 lines
// MCP, OpenAI, Gemini Extensions, Custom Protocol adapters
```
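To make the adapter surface concrete, here is a minimal registry sketch. The one-method interface, the `Register`/`AdapterFor` helpers, and the stub `mcpAdapter` are illustrative simplifications, not the production design:

```go
package main

import (
	"fmt"
	"strings"
)

// UniversalMessage and ProtocolAdapter are trimmed to one method for brevity;
// the full interface is shown above.
type UniversalMessage struct {
	Role    string
	Content string
}

type ProtocolAdapter interface {
	TransformInbound(raw []byte, sessionID string) (*UniversalMessage, error)
}

// registry maps provider names to adapters; the names used are illustrative.
var registry = map[string]ProtocolAdapter{}

func Register(name string, a ProtocolAdapter) {
	registry[strings.ToLower(name)] = a
}

func AdapterFor(name string) (ProtocolAdapter, error) {
	a, ok := registry[strings.ToLower(name)]
	if !ok {
		return nil, fmt.Errorf("no adapter registered for provider %q", name)
	}
	return a, nil
}

// mcpAdapter is a stub standing in for the real ~2,000-line implementation.
type mcpAdapter struct{}

func (mcpAdapter) TransformInbound(raw []byte, _ string) (*UniversalMessage, error) {
	return &UniversalMessage{Role: "user", Content: string(raw)}, nil
}

func main() {
	Register("mcp", mcpAdapter{})
	a, _ := AdapterFor("MCP")
	msg, _ := a.TransformInbound([]byte(`{"method":"tools/call"}`), "session-1")
	fmt.Println(msg.Role)
}
```

The registry keeps provider selection at a single seam, so adding a fifth provider means one new adapter plus one `Register` call.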
2. Message Transformation Pipeline
Complexity: Handle 6 different message formats with validation and error handling
```go
type MessageTransformer struct {
    validators map[string]*jsonschema.Schema // 6 schemas
    cache      map[string][]byte             // Transform cache
    metrics    *TransformMetrics             // Performance monitoring
}

// Estimated: 3,000 lines for transformation logic
// Additional: 1,500 lines for validation and caching
```
3. Protocol-Specific Error Handling
Challenge: Map error codes across providers with different semantics
- MCP: JSON-RPC 2.0 error codes
- OpenAI: HTTP status + error types
- Gemini: Extension-specific errors
- Custom: Provider-defined formats
```go
type ProtocolErrorTranslator struct {
    mappings map[string]map[string]UniversalErrorCode
    circuit  map[string]*CircuitBreaker
}

// Estimated: 2,500 lines for error mapping and circuit breakers
```
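A minimal translation-table sketch. JSON-RPC's -32601 (method not found) and HTTP 429 are real provider codes, but the universal codes and the fallback policy shown here are illustrative:

```go
package main

import "fmt"

type UniversalErrorCode int

const (
	ErrUnknown UniversalErrorCode = iota
	ErrMethodNotFound
	ErrRateLimited
)

// mappings: provider -> provider-specific code -> universal code.
var mappings = map[string]map[string]UniversalErrorCode{
	"mcp":    {"-32601": ErrMethodNotFound}, // JSON-RPC 2.0
	"openai": {"429": ErrRateLimited},       // HTTP status
}

func Translate(provider, code string) UniversalErrorCode {
	if byCode, ok := mappings[provider]; ok {
		if u, ok := byCode[code]; ok {
			return u
		}
	}
	return ErrUnknown // unmapped errors degrade to a generic code
}

func main() {
	fmt.Println(Translate("openai", "429") == ErrRateLimited)  // true
	fmt.Println(Translate("gemini", "anything") == ErrUnknown) // true
}
```

The explicit fallback matters: a new provider error code should degrade gracefully rather than crash the approval flow.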
Performance Investment Requirements
Latency Overhead Analysis
- Current Claude MCP: 50ms baseline
- Protocol translation: +15-25ms per transformation
- Validation overhead: +5-10ms per message
- Total impact: +40-70ms additional latency per interaction (one transformation plus validation each way, inbound and outbound)
Caching and Optimization
```go
type ProtocolCache struct {
    transformCache  map[string][]byte // 500MB memory allocation
    schemaCache     map[string]*jsonschema.Schema
    connectionPools map[string]*ConnectionPool // Provider-specific pools
    metrics         *CacheMetrics
}

// Estimated: 2,000 lines for caching infrastructure
```
Testing Investment
Test Matrix: 4 providers × 6 message types × 3 error scenarios = 72 test cases
- Unit tests: ~3,000 lines
- Integration tests: ~4,000 lines
- Performance benchmarks: ~1,500 lines
- Total testing: ~8,500 lines
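The matrix above can be generated as table-driven cases; the provider, message-type, and scenario names below are placeholders:

```go
package main

import "fmt"

// matrix enumerates 4 providers × 6 message types × 3 error scenarios.
func matrix() []string {
	providers := []string{"mcp", "openai", "gemini", "custom"}
	msgTypes := []string{"text", "tool_call", "tool_result", "approval", "error", "stream"}
	scenarios := []string{"ok", "malformed", "timeout"}
	var cases []string
	for _, p := range providers {
		for _, m := range msgTypes {
			for _, s := range scenarios {
				cases = append(cases, fmt.Sprintf("%s/%s/%s", p, m, s))
			}
		}
	}
	return cases
}

func main() {
	fmt.Println(len(matrix())) // prints "72"
}
```

Generating the matrix rather than hand-writing 72 cases keeps the suite honest when a fifth provider or seventh message type is added.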
Part 2: Session State Management Investment
Current State Architecture Limitations
Database Schema Issues:
```sql
-- Claude-specific fields throughout schema
claude_session_id TEXT,    -- Provider-agnostic needed
model TEXT,                -- Provider-specific model formats
proxy_model_override TEXT, -- Limited to proxy providers
```
Required Schema Refactoring:
```sql
-- New provider-agnostic schema (+15 new columns)
ALTER TABLE sessions ADD COLUMN provider_type TEXT;
ALTER TABLE sessions ADD COLUMN provider_config JSON;
ALTER TABLE sessions ADD COLUMN provider_session_id TEXT;
ALTER TABLE sessions ADD COLUMN provider_metadata JSON;
ALTER TABLE sessions ADD COLUMN context_tokens_used INTEGER;
ALTER TABLE sessions ADD COLUMN context_tokens_limit INTEGER;
-- ... additional provider-neutral fields
```
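The migration also needs a backfill for existing rows. A sketch of the assumed mapping (all current sessions are Claude sessions, so provider_type can be fixed during migration; the row types are trimmed to a subset of columns):

```go
package main

import "fmt"

// oldRow and newRow mirror the before/after schema (subset, illustrative).
type oldRow struct {
	ClaudeSessionID string
	Model           string
}

type newRow struct {
	ProviderType      string
	ProviderSessionID string
	Model             string
}

// backfill maps a legacy row onto the provider-neutral columns.
func backfill(o oldRow) newRow {
	return newRow{
		ProviderType:      "claude", // every pre-migration session is Claude
		ProviderSessionID: o.ClaudeSessionID,
		Model:             o.Model,
	}
}

func main() {
	n := backfill(oldRow{ClaudeSessionID: "abc123", Model: "claude-sonnet"})
	fmt.Println(n.ProviderType, n.ProviderSessionID) // prints "claude abc123"
}
```

Keeping the legacy columns in place during a transition window allows a rollback path if the backfill surfaces bad data.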
State Management Complexity
1. Stateful vs Stateless Provider Abstraction
```go
type SessionStateManager interface {
    // 3 implementations: Local Process, HTTP API, Hybrid
    CreateSession(SessionConfig) (ProviderSession, error)
    RestoreSession(string) (ProviderSession, error)
    PersistState(*SessionSnapshot) error
}

type SessionSnapshot struct {
    Messages      []UniversalMessage     // Variable size per provider
    ToolCalls     []UniversalToolCall    // Cross-provider tool tracking
    ApprovalState map[string]Approval    // Approval correlation
    ProviderState map[string]interface{} // Provider-specific state
    ContextTokens int                    // Provider-specific limits
}

// Estimated: 4,000 lines for state management abstractions
```
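As a sketch of the PersistState/RestoreSession round trip, a JSON serialization of a trimmed snapshot (Messages reduced to strings for brevity) that could back the provider_metadata column:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified snapshot; field names follow the full struct above.
type SessionSnapshot struct {
	Messages      []string               `json:"messages"`
	ProviderState map[string]interface{} `json:"provider_state"`
	ContextTokens int                    `json:"context_tokens"`
}

// persist serializes a snapshot for storage.
func persist(s *SessionSnapshot) ([]byte, error) {
	return json.Marshal(s)
}

// restore rebuilds a snapshot from its stored form.
func restore(raw []byte) (*SessionSnapshot, error) {
	var s SessionSnapshot
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	return &s, nil
}

func main() {
	raw, _ := persist(&SessionSnapshot{Messages: []string{"hi"}, ContextTokens: 42})
	s, _ := restore(raw)
	fmt.Println(s.ContextTokens) // prints "42"
}
```

The round trip is the crux: a snapshot persisted for one provider must survive restoration even when ProviderState contains opaque provider-specific blobs.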
2. Context Window Management
Challenge: Handle different context limits and formats
- Claude: 200K tokens, structured messages
- GPT-4: 128K tokens, OpenAI format
- Gemini: 2M tokens, conversation checkpoints
- Grok: Variable limits, multi-round execution
```go
type ContextManager struct {
    summarizers  map[string]ContextSummarizer  // Provider-specific
    transformers map[string]MessageTransformer // Format conversion
    limiters     map[string]TokenLimiter       // Context enforcement
}

// Estimated: 3,500 lines for context management
```
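A sketch of the token limiter's core decision, using the window sizes listed above; the 0.8 trigger threshold is an assumed policy, and unknown providers (e.g. Grok's variable limits) fall back to the conservative path:

```go
package main

import "fmt"

// Per-provider context limits in tokens, from the list above.
var limits = map[string]int{
	"claude": 200_000,
	"gpt-4":  128_000,
	"gemini": 2_000_000,
}

// needsSummarization triggers when usage crosses a threshold fraction
// of the provider's context window (0.8 is an assumed policy).
func needsSummarization(provider string, used int) bool {
	limit, ok := limits[provider]
	if !ok {
		return true // unknown or variable limit: be conservative
	}
	return float64(used) >= 0.8*float64(limit)
}

func main() {
	fmt.Println(needsSummarization("gpt-4", 110_000))  // above 80% of 128K: true
	fmt.Println(needsSummarization("gemini", 110_000)) // far under 2M: false
}
```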
3. Cross-Provider Event Sourcing
```go
type EventStore interface {
    StoreEvent(*UniversalEvent) error
    GetEvents(sessionID string, filters EventFilters) ([]UniversalEvent, error)
    ReconstructSession(sessionID string, provider string) (*SessionState, error)
}

type UniversalEvent struct {
    ID          string
    SessionID   string
    Provider    string                 // Event attribution
    Type        UniversalEventType     // Normalized event types
    Data        map[string]interface{} // Provider-neutral data
    ProviderRaw json.RawMessage        // Original provider format
    Timestamp   time.Time
}

// Estimated: 2,500 lines for event sourcing system
```
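ReconstructSession reduces to replaying a filtered event log. A sketch with trimmed types:

```go
package main

import "fmt"

// UniversalEvent is trimmed to the fields the filter needs.
type UniversalEvent struct {
	SessionID string
	Provider  string
	Type      string
}

// reconstruct filters the log by session and provider, yielding only
// the events that should be replayed into a fresh session state.
func reconstruct(events []UniversalEvent, sessionID, provider string) []UniversalEvent {
	var replay []UniversalEvent
	for _, e := range events {
		if e.SessionID == sessionID && e.Provider == provider {
			replay = append(replay, e)
		}
	}
	return replay
}

func main() {
	log := []UniversalEvent{
		{SessionID: "s1", Provider: "claude", Type: "message"},
		{SessionID: "s1", Provider: "openai", Type: "message"},
		{SessionID: "s2", Provider: "claude", Type: "message"},
	}
	fmt.Println(len(reconstruct(log, "s1", "claude"))) // prints "1"
}
```

Storing ProviderRaw alongside the normalized event is what makes this replay lossless: if the universal schema evolves, the original payloads can be re-normalized.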
Memory and Performance Impact
Memory Requirements
- Current per session: ~10MB
- Multi-provider overhead: +15MB per session
- State caching: +50MB for 100 concurrent sessions
- Protocol adapters: +100MB for adapter instances
Concurrent Session Handling
```go
type ConcurrentSessionManager struct {
    sessions  map[string]ProviderSession // Mixed session types
    pools     map[string]*ResourcePool   // Provider-specific pools
    scheduler *SessionScheduler          // Load balancing
    monitor   *HealthMonitor             // Cross-provider health
}

// Estimated: 2,000 lines for concurrency management
```
Part 3: Investment Quantification
Development Timeline
Phase 1: Core Abstraction (6 months, 3 engineers)
- Universal message format: 4 weeks
- Protocol adapter interface: 6 weeks
- Session state abstraction: 8 weeks
- Database schema migration: 4 weeks
Phase 2: Provider Implementations (12 months, 4 engineers)
- Claude adapter (maintain compatibility): 6 weeks
- OpenAI/Grok adapter: 8 weeks each
- Gemini adapter: 10 weeks
- Testing and integration: 16 weeks
Phase 3: Production Hardening (6 months, 2 engineers)
- Performance optimization: 12 weeks
- Error handling and monitoring: 8 weeks
- Documentation and migration tools: 4 weeks
Code Volume Estimates
New Code:
- Protocol adapters: 12,000 lines
- State management: 8,000 lines
- Testing infrastructure: 10,000 lines
- Migration and tooling: 5,000 lines
Total New: ~35,000 lines
Refactored Code:
- Session management: 5,000 lines
- Database layer: 3,000 lines
- API handlers: 4,000 lines
- UI components: 3,000 lines
Total Refactored: ~15,000 lines
Infrastructure Requirements
- Development environments: 4 provider test setups
- CI/CD expansion: 3x increase in test matrix
- Monitoring systems: Provider-specific dashboards
- Documentation: Provider integration guides
Risk Factors and Mitigation
Technical Risks
- Protocol evolution: Providers change APIs independently
  - Mitigation: Version management and backward compatibility layers
- Performance degradation: Translation overhead impacts user experience
  - Mitigation: Caching, connection pooling, async processing
- Complexity explosion: Maintenance burden becomes unsustainable
  - Mitigation: Clear abstraction boundaries, automated testing
Business Risks
- Development timeline: 18-24 months before multi-LLM value delivery
- Resource allocation: 4-6 engineers dedicated to this effort
- Opportunity cost: Delayed feature development in other areas
Success Criteria
- Feature parity: 95% of Claude Code features work across all providers
- Performance: <2x latency increase from current baseline
- Reliability: 99.9% approval delivery success rate
- Maintainability: <20% increase in bug report volume
Conclusion
The architectural investment for multi-LLM support represents a fundamental platform evolution requiring:
- 18-24 months development time
- 4-6 engineer team
- ~50,000 lines of code (new + refactored)
- Significant infrastructure expansion
While technically feasible, this investment should be weighed against alternative strategies like provider-specific implementations or gradual migration approaches. The complexity and timeline suggest this is a major architectural decision requiring substantial organizational commitment.