ADR-017: WebSocket Backend Architecture
Status: Accepted
Date: 2025-10-06
Deciders: Development Team, Backend Team
Related: ADR-016 (NGINX), ADR-014 (Theia), ADR-010 (MCP)
Context
The AZ1.AI LLM IDE requires real-time bidirectional communication between the browser frontend and multiple backend services:
- Local filesystem operations (read, write, watch)
- MCP protocol (tool calls, resource access, prompts)
- Agent communication (A2A protocol)
- Terminal I/O (xterm.js backend)
- File system events (file changes, directory updates)
- Session state (multi-session synchronization)
Current State
- Eclipse Theia uses WebSocket for terminal and file operations
- No unified WebSocket architecture for custom services
- MCP currently uses HTTP polling (inefficient)
- File watchers use separate event streams
- No real-time agent status updates
Requirements
- Real-time: Sub-100ms latency for file operations and LLM responses
- Scalable: Handle 1000+ concurrent WebSocket connections
- Reliable: Automatic reconnection, message queuing
- Secure: Authentication, encryption, authorization
- Efficient: Message compression, batching, delta updates
- Multi-service: Single WebSocket for filesystem, MCP, agents, terminal
Decision
We will implement a unified WebSocket backend architecture using the following stack:
Architecture
┌─────────────────────────────────────────────────────────────┐
│                  Browser (Theia Frontend)                   │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │               WebSocket Client Manager                │  │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐   │  │
│  │  │Filesystem│ │   MCP    │ │  Agent   │ │Terminal │   │  │
│  │  │  Client  │ │  Client  │ │  Client  │ │ Client  │   │  │
│  │  └──────────┘ └──────────┘ └──────────┘ └─────────┘   │  │
│  └───────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          │ WebSocket (wss://)               │
└──────────────────────────┼──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                    NGINX (Load Balancer)                    │
│                    WebSocket Proxy (/ws)                    │
└──────────────────────────┬──────────────────────────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
   ┌──────────┐       ┌──────────┐       ┌──────────┐
   │ Node.js  │       │ Node.js  │       │ Node.js  │
   │ Backend  │       │ Backend  │       │ Backend  │
   │  :4000   │       │  :4001   │       │  :4002   │
   └──────────┘       └──────────┘       └──────────┘
        │                  │                  │
        └──────────────────┼──────────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
   ┌──────────┐       ┌──────────┐       ┌──────────┐
   │  Local   │       │   MCP    │       │  Agent   │
   │Filesystem│       │  Server  │       │  System  │
   └──────────┘       └──────────┘       └──────────┘
Technology Stack
WebSocket Server: ws (Node.js library)
Message Format: JSON-RPC 2.0 (same as MCP)
Compression: zlib (gzip compression for messages >1KB)
Authentication: JWT tokens (passed in initial handshake)
Protocol: WSS (WebSocket Secure) over TLS 1.3
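Since every service shares one JSON-RPC 2.0 envelope, it may help to see how the three message kinds differ on the wire. The following is a minimal sketch, not the project's actual types: the names `JsonRpcRequest`, `JsonRpcNotification`, `JsonRpcResponse`, and `classify` are hypothetical and chosen for illustration only.

```typescript
// Hypothetical envelope types (names are illustrative, not from the ADR).
export interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: string | number;
  method: string;
  params?: unknown;
}

export interface JsonRpcNotification {
  jsonrpc: '2.0';
  method: string;
  params?: unknown;
}

export interface JsonRpcResponse {
  jsonrpc: '2.0';
  id: string | number | null;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}

export type MessageKind = 'request' | 'notification' | 'response' | 'invalid';

// Classify a raw text frame by the members JSON-RPC 2.0 requires for each kind.
export function classify(raw: string): MessageKind {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return 'invalid';
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return 'invalid';
  }
  const msg = parsed as Record<string, unknown>;
  if (msg['jsonrpc'] !== '2.0') {
    return 'invalid';
  }
  if (typeof msg['method'] === 'string') {
    // A method with an id expects a reply; without an id it is fire-and-forget.
    return 'id' in msg ? 'request' : 'notification';
  }
  if ('id' in msg && ('result' in msg || 'error' in msg)) {
    return 'response';
  }
  return 'invalid';
}
```

In this scheme `filesystem/read` would travel as a request, while server pushes such as `filesystem/changed` or `agent/progress` would be notifications.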
Implementation
1. WebSocket Server (Node.js)
// src/backend/websocket-server.ts
import WebSocket from 'ws';
import { createServer } from 'http';
import { verify } from 'jsonwebtoken';
import zlib from 'zlib';
interface WebSocketMessage {
jsonrpc: '2.0';
id?: string | number;
method?: string;
params?: any;
result?: any;
error?: {
code: number;
message: string;
data?: any;
};
}
export class WebSocketServer {
private wss: WebSocket.Server;
private connections = new Map<string, WebSocket>();
constructor(port: number) {
const server = createServer();
this.wss = new WebSocket.Server({ server });
this.wss.on('connection', (ws, req) => {
this.handleConnection(ws, req);
});
server.listen(port, () => {
console.log(`WebSocket server listening on port ${port}`);
});
}
private async handleConnection(ws: WebSocket, req: any) {
// 1. Authenticate
const token = this.extractToken(req);
if (!token) {
ws.close(1008, 'Authentication required');
return;
}
try {
const user = verify(token, process.env.JWT_SECRET!) as { userId: string };
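// Browsers cannot set custom headers on the WebSocket handshake, so fall back
// to a `sessionId` query parameter (parameter name is an assumption)
const sessionId = req.headers['x-session-id']
  || new URL(req.url ?? '/', 'http://localhost').searchParams.get('sessionId')
  || 'default';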
const connectionId = `${user.userId}:${sessionId}`;
this.connections.set(connectionId, ws);
// 2. Setup message handler
ws.on('message', async (data) => {
await this.handleMessage(ws, connectionId, data);
});
// 3. Setup error handler
ws.on('error', (error) => {
console.error(`WebSocket error for ${connectionId}:`, error);
});
// 4. Setup close handler
ws.on('close', () => {
this.connections.delete(connectionId);
console.log(`Connection closed: ${connectionId}`);
});
// 5. Send welcome message
this.sendMessage(ws, {
jsonrpc: '2.0',
method: 'server/connected',
params: { sessionId, timestamp: Date.now() }
});
} catch (error) {
ws.close(1008, 'Invalid authentication token');
}
}
private async handleMessage(ws: WebSocket, connectionId: string, data: WebSocket.Data) {
try {
// Decompress if needed
let messageData = data;
if (Buffer.isBuffer(data) && data[0] === 0x1f && data[1] === 0x8b) {
messageData = zlib.gunzipSync(data);
}
const message: WebSocketMessage = JSON.parse(messageData.toString());
// Route message based on method
if (message.method?.startsWith('filesystem/')) {
await this.handleFilesystemMessage(ws, message);
} else if (message.method?.startsWith('mcp/')) {
await this.handleMCPMessage(ws, message);
} else if (message.method?.startsWith('agent/')) {
await this.handleAgentMessage(ws, message);
} else if (message.method?.startsWith('terminal/')) {
await this.handleterminalMessage(ws, message);
} else {
this.sendError(ws, message.id, -32601, 'Method not found');
}
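} catch (error: any) {
  console.error('Message handling error:', error);
  // JSON.parse failures map to -32700 (Parse error); anything else is an internal error.
  // JSON-RPC 2.0 requires id: null when the request id is unknown (undefined would be
  // dropped by JSON.stringify).
  const code = error instanceof SyntaxError ? -32700 : -32603;
  this.sendError(ws, null, code, error.message);
}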
}
private async handleFilesystemMessage(ws: WebSocket, message: WebSocketMessage) {
const { method, params, id } = message;
switch (method) {
case 'filesystem/read':
const content = await this.readFile(params.path);
this.sendMessage(ws, { jsonrpc: '2.0', id, result: { content } });
break;
case 'filesystem/write':
await this.writeFile(params.path, params.content);
this.sendMessage(ws, { jsonrpc: '2.0', id, result: { success: true } });
break;
case 'filesystem/watch':
this.watchFile(ws, params.path);
this.sendMessage(ws, { jsonrpc: '2.0', id, result: { watching: true } });
break;
case 'filesystem/list':
const files = await this.listDirectory(params.path);
this.sendMessage(ws, { jsonrpc: '2.0', id, result: { files } });
break;
default:
this.sendError(ws, id, -32601, 'Filesystem method not found');
}
}
private async handleMCPMessage(ws: WebSocket, message: WebSocketMessage) {
const { method, params, id } = message;
switch (method) {
case 'mcp/tools/list':
const tools = await this.mcpClient.listTools();
this.sendMessage(ws, { jsonrpc: '2.0', id, result: { tools } });
break;
case 'mcp/tools/call':
const result = await this.mcpClient.callTool(params.name, params.arguments);
this.sendMessage(ws, { jsonrpc: '2.0', id, result });
break;
case 'mcp/resources/list':
const resources = await this.mcpClient.listResources();
this.sendMessage(ws, { jsonrpc: '2.0', id, result: { resources } });
break;
case 'mcp/resources/read':
const resource = await this.mcpClient.readResource(params.uri);
this.sendMessage(ws, { jsonrpc: '2.0', id, result: resource });
break;
default:
this.sendError(ws, id, -32601, 'MCP method not found');
}
}
private async handleAgentMessage(ws: WebSocket, message: WebSocketMessage) {
const { method, params, id } = message;
switch (method) {
case 'agent/execute':
// Stream agent responses
const stream = this.agentSystem.execute(params.agentId, params.task);
for await (const chunk of stream) {
this.sendMessage(ws, {
jsonrpc: '2.0',
method: 'agent/progress',
params: { taskId: id, progress: chunk }
});
}
this.sendMessage(ws, { jsonrpc: '2.0', id, result: { completed: true } });
break;
case 'agent/status':
const status = await this.agentSystem.getStatus(params.agentId);
this.sendMessage(ws, { jsonrpc: '2.0', id, result: status });
break;
default:
this.sendError(ws, id, -32601, 'Agent method not found');
}
}
private async handleterminalMessage(ws: WebSocket, message: WebSocketMessage) {
const { method, params, id } = message;
switch (method) {
case 'terminal/input':
this.terminalManager.write(params.terminalId, params.data);
break;
case 'terminal/resize':
this.terminalManager.resize(params.terminalId, params.cols, params.rows);
break;
default:
this.sendError(ws, id, -32601, 'terminal method not found');
}
}
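private sendMessage(ws: WebSocket, message: WebSocketMessage) {
  // Skip sockets that are no longer open (e.g. closed while an async handler was running)
  if (ws.readyState !== WebSocket.OPEN) {
    return;
  }
  const json = JSON.stringify(message);
  // Compress if the UTF-8 payload exceeds 1 KB (string length counts UTF-16 code units, not bytes)
  if (Buffer.byteLength(json) > 1024) {
    const compressed = zlib.gzipSync(json);
    ws.send(compressed);
  } else {
    ws.send(json);
  }
}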
private sendError(ws: WebSocket, id: any, code: number, message: string) {
this.sendMessage(ws, {
jsonrpc: '2.0',
id,
error: { code, message }
});
}
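private extractToken(req: any): string | null {
  const auth = req.headers.authorization;
  if (auth && auth.startsWith('Bearer ')) {
    return auth.substring(7);
  }
  // Browsers cannot set headers on the WebSocket handshake, so also accept a
  // `token` query parameter (parameter name is an assumption)
  return new URL(req.url ?? '/', 'http://localhost').searchParams.get('token');
}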
// Filesystem operations
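private async readFile(path: string): Promise<string> {
  // Implementation (placeholder throws so the class type-checks until it is filled in)
  throw new Error('readFile not implemented');
}
private async writeFile(path: string, content: string): Promise<void> {
  // Implementation
  throw new Error('writeFile not implemented');
}
private watchFile(ws: WebSocket, path: string): void {
  // Setup file watcher
  // Send notifications on changes
}
private async listDirectory(path: string): Promise<string[]> {
  // Implementation
  throw new Error('listDirectory not implemented');
}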
}
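The filesystem handlers above accept a client-supplied `params.path`, so whatever implementation fills in `readFile`, `writeFile`, and `listDirectory` will probably want to confine paths to the workspace. The sketch below shows one common way to do that; `resolveInWorkspace` is a hypothetical helper, and using `/workspace` as the root is an assumption based on the volume mount in the Docker Compose file. It does not resolve symlinks, which a real implementation would also need to consider.

```typescript
import * as path from 'path';

// Hypothetical helper: resolve a client-supplied path against the workspace
// root and reject anything that would land outside it.
export function resolveInWorkspace(workspaceRoot: string, requested: string): string {
  const root = path.resolve(workspaceRoot);
  const target = path.resolve(root, requested);
  const relative = path.relative(root, target);
  const escapes =
    relative === '..' ||
    relative.startsWith('..' + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`Path escapes workspace: ${requested}`);
  }
  return target;
}
```

A handler could then call `resolveInWorkspace('/workspace', params.path)` before touching the disk and map the thrown error to a JSON-RPC error response.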
2. WebSocket Client (Theia Frontend)
// src/browser/services/websocket-client.ts
import { injectable, inject } from '@theia/core/shared/inversify';
import { Emitter, Event } from '@theia/core';
interface WebSocketMessage {
jsonrpc: '2.0';
id?: string | number;
method?: string;
params?: any;
result?: any;
error?: any;
}
@injectable()
export class WebSocketClient {
private ws: WebSocket | null = null;
private messageId = 0;
private pendingRequests = new Map<string | number, {
resolve: (result: any) => void;
reject: (error: any) => void;
}>();
private readonly onMessageEmitter = new Emitter<WebSocketMessage>();
readonly onMessage: Event<WebSocketMessage> = this.onMessageEmitter.event;
async connect(url: string, token: string, sessionId: string): Promise<void> {
return new Promise((resolve, reject) => {
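// The browser WebSocket constructor has no headers option (its second argument
// is a list of subprotocols), so the token and session ID travel as query
// parameters. This assumes the server reads `token` and `sessionId` from the URL.
// Query strings can end up in access logs, so prefer short-lived tokens.
const endpoint = new URL(url);
endpoint.searchParams.set('token', token);
endpoint.searchParams.set('sessionId', sessionId);
this.ws = new WebSocket(endpoint.toString());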
this.ws.onopen = () => {
console.log('WebSocket connected');
resolve();
};
this.ws.onerror = (error) => {
console.error('WebSocket error:', error);
reject(error);
};
this.ws.onmessage = async (event) => {
await this.handleMessage(event.data);
};
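this.ws.onclose = () => {
  // Fail in-flight requests now instead of letting each wait for its 30 s timeout
  for (const pending of this.pendingRequests.values()) {
    pending.reject(new Error('WebSocket closed'));
  }
  this.pendingRequests.clear();
  console.log('WebSocket closed, reconnecting...');
  // Reconnect in the background; a failed attempt closes again and schedules
  // the next retry, so its rejection is swallowed here
  setTimeout(() => this.connect(url, token, sessionId).catch(() => undefined), 5000);
};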
});
}
private async handleMessage(data: any) {
let message: WebSocketMessage;
// Handle compressed messages
if (data instanceof Blob) {
const arrayBuffer = await data.arrayBuffer();
const buffer = new Uint8Array(arrayBuffer);
// Check for gzip header
if (buffer[0] === 0x1f && buffer[1] === 0x8b) {
const decompressed = await this.decompress(buffer);
message = JSON.parse(new TextDecoder().decode(decompressed));
} else {
message = JSON.parse(new TextDecoder().decode(buffer));
}
} else {
message = JSON.parse(data);
}
// Handle response to request
if (message.id && this.pendingRequests.has(message.id)) {
const pending = this.pendingRequests.get(message.id)!;
this.pendingRequests.delete(message.id);
if (message.error) {
pending.reject(message.error);
} else {
pending.resolve(message.result);
}
return;
}
// Handle notification (no id)
if (message.method && !message.id) {
this.onMessageEmitter.fire(message);
}
}
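async request(method: string, params?: any): Promise<any> {
  const id = ++this.messageId;
  return new Promise((resolve, reject) => {
    if (!this.ws || this.ws.readyState !== WebSocket.OPEN) {
      reject(new Error('WebSocket is not connected'));
      return;
    }
    // Time out after 30 seconds; the timer is cleared as soon as the request settles
    const timer = setTimeout(() => {
      if (this.pendingRequests.delete(id)) {
        reject(new Error('Request timeout'));
      }
    }, 30000);
    this.pendingRequests.set(id, {
      resolve: (result) => { clearTimeout(timer); resolve(result); },
      reject: (error) => { clearTimeout(timer); reject(error); }
    });
    const message: WebSocketMessage = { jsonrpc: '2.0', id, method, params };
    this.ws.send(JSON.stringify(message));
  });
}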
notify(method: string, params?: any): void {
const message: WebSocketMessage = {
jsonrpc: '2.0',
method,
params
};
this.ws!.send(JSON.stringify(message));
}
private async decompress(data: Uint8Array): Promise<Uint8Array> {
const ds = new DecompressionStream('gzip');
const writer = ds.writable.getWriter();
writer.write(data);
writer.close();
const decompressed = new Response(ds.readable).arrayBuffer();
return new Uint8Array(await decompressed);
}
}
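The client above retries on a fixed 5-second timer. One possible refinement is exponential backoff with jitter, which reconnects quickly after a brief drop and avoids every client retrying at the same instant after a backend restart. This is a sketch only: `backoffDelay`, `BackoffOptions`, and the default values are hypothetical, with the 5000 ms cap picked to line up with the reconnection-time target in Success Metrics.

```typescript
// Hypothetical reconnect-delay helper (not part of the ADR's client).
export interface BackoffOptions {
  baseMs: number;  // delay ceiling for the first retry
  maxMs: number;   // upper bound for any retry
  factor: number;  // growth per attempt
}

// "Equal jitter": half of the exponential delay is fixed, half is random.
export function backoffDelay(
  attempt: number,
  opts: BackoffOptions = { baseMs: 250, maxMs: 5000, factor: 2 },
  random: () => number = Math.random
): number {
  const exponential = Math.min(opts.maxMs, opts.baseMs * Math.pow(opts.factor, attempt));
  const half = exponential / 2;
  return Math.floor(half + random() * half);
}
```

The `onclose` handler could keep an attempt counter, schedule the retry with `setTimeout(reconnect, backoffDelay(attempt++))`, and reset the counter to zero in `onopen`.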
3. Filesystem Service (Using WebSocket)
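// src/browser/services/filesystem-service.ts
import { injectable, inject } from '@theia/core/shared/inversify';
import { WebSocketClient } from './websocket-client';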
@injectable()
export class FilesystemService {
@inject(WebSocketClient)
protected readonly ws!: WebSocketClient;
async readFile(path: string): Promise<string> {
const result = await this.ws.request('filesystem/read', { path });
return result.content;
}
async writeFile(path: string, content: string): Promise<void> {
await this.ws.request('filesystem/write', { path, content });
}
async listDirectory(path: string): Promise<string[]> {
const result = await this.ws.request('filesystem/list', { path });
return result.files;
}
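watchFile(path: string, callback: (event: any) => void): { dispose(): void } {
  // Surface a failed watch registration instead of leaving the promise unhandled
  this.ws.request('filesystem/watch', { path }).catch((error) => {
    console.error(`Failed to watch ${path}:`, error);
  });
  // Return the listener's Disposable so callers can stop receiving events
  return this.ws.onMessage((message) => {
    if (message.method === 'filesystem/changed' && message.params.path === path) {
      callback(message.params);
    }
  });
}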
}
4. NGINX WebSocket Proxy Configuration
# Add to /etc/nginx/nginx.conf
# WebSocket upgrade map
map $http_upgrade $connection_upgrade {
default upgrade;
'' close;
}
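upstream websocket_backend {
least_conn;
# These addresses assume NGINX runs on the same host as the backends.
# Inside the Docker Compose network (section 5), 127.0.0.1 is the NGINX
# container itself; use the service names instead (e.g. websocket1:4000).
server 127.0.0.1:4000 max_fails=3 fail_timeout=30s;
server 127.0.0.1:4001 max_fails=3 fail_timeout=30s;
server 127.0.0.1:4002 max_fails=3 fail_timeout=30s;
}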
server {
listen 443 ssl http2;
server_name ide.az1.ai;
# WebSocket endpoint
location /ws {
proxy_pass http://websocket_backend;
# WebSocket headers
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;
# Standard proxy headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts for WebSocket (long-lived connections)
proxy_read_timeout 86400s; # 24 hours
proxy_send_timeout 86400s; # 24 hours
proxy_connect_timeout 10s;
# Buffer settings
proxy_buffering off;
}
}
5. Docker Compose
# docker-compose.yml
version: '3.8'
services:
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/letsencrypt:ro
    depends_on:
      - websocket1
      - websocket2
      - websocket3
    networks:
      - ide-network
  websocket1:
    build: ./backend
    ports:
      - "4000:4000"
    environment:
      - PORT=4000
      - JWT_SECRET=${JWT_SECRET}
      - NODE_ENV=production
    volumes:
      - workspace:/workspace
    networks:
      - ide-network
  websocket2:
    build: ./backend
    ports:
      # The container listens on PORT, so both sides of the mapping must match it
      - "4001:4001"
    environment:
      - PORT=4001
      - JWT_SECRET=${JWT_SECRET}
      - NODE_ENV=production
    volumes:
      - workspace:/workspace
    networks:
      - ide-network
  websocket3:
    build: ./backend
    ports:
      - "4002:4002"
    environment:
      - PORT=4002
      - JWT_SECRET=${JWT_SECRET}
      - NODE_ENV=production
    volumes:
      - workspace:/workspace
    networks:
      - ide-network
networks:
  ide-network:
    driver: bridge
volumes:
  workspace:
    driver: local
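Each backend container receives its listening port through the `PORT` environment variable, which the server entry point has to read and validate before constructing `WebSocketServer`. A minimal sketch of that step follows; `resolvePort` and the fallback of 4000 are assumptions for illustration, not code from the backend.

```typescript
// Hypothetical helper: turn the PORT environment variable into a validated port number.
export function resolvePort(
  env: Record<string, string | undefined>,
  fallback: number = 4000
): number {
  const raw = env['PORT'];
  if (raw === undefined || raw.trim() === '') {
    return fallback;
  }
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT value: ${raw}`);
  }
  return port;
}
```

An entry point could then start the server with `new WebSocketServer(resolvePort(process.env))`, failing fast on a misconfigured container instead of listening on an unexpected port.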
Rationale
Why WebSocket?
Real-time Communication:
- ✅ Sub-100ms latency (vs HTTP polling at 1-5s)
- ✅ Bidirectional (server can push updates)
- ✅ Persistent connection (no per-request connection setup)
Efficiency:
- ✅ Low overhead (single TCP connection)
- ✅ No HTTP headers on every message (saves bandwidth)
- ✅ Built-in compression support
Ecosystem:
- ✅ Native browser support
- ✅ Mature Node.js libraries (ws)
- ✅ NGINX has excellent WebSocket proxy support
Why JSON-RPC 2.0?
Compatibility:
- ✅ Same as MCP protocol (consistency)
- ✅ Standard request/response pattern
- ✅ Error handling built-in
Simplicity:
- ✅ Easy to debug (human-readable)
- ✅ Language-agnostic
- ✅ Extensible
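The built-in error handling refers to the error object and reserved codes that JSON-RPC 2.0 defines. The server code in this ADR uses -32601 and -32603 directly; a sketch of naming those codes and building a spec-shaped error response might look like the following, where `JsonRpcErrorCode` and `errorResponse` are hypothetical names.

```typescript
// Reserved error codes from the JSON-RPC 2.0 specification.
export const JsonRpcErrorCode = {
  ParseError: -32700,
  InvalidRequest: -32600,
  MethodNotFound: -32601,
  InvalidParams: -32602,
  InternalError: -32603,
} as const;

// Hypothetical helper: build an error response. The id is null when the
// request id could not be determined (for example, after a parse error).
export function errorResponse(
  id: string | number | null,
  code: number,
  message: string,
  data?: unknown
) {
  const error = data === undefined ? { code, message } : { code, message, data };
  return { jsonrpc: '2.0' as const, id, error };
}
```

For example, `errorResponse(7, JsonRpcErrorCode.MethodNotFound, 'Method not found')` yields the same shape that `sendError` puts on the wire.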
Why Single WebSocket?
Efficiency:
- ✅ One connection vs multiple (saves resources)
- ✅ Shared authentication
- ✅ Unified error handling
Simplicity:
- ✅ Single reconnection logic
- ✅ One place to handle compression
- ✅ Easier to monitor/debug
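Sharing one connection works because each method name carries its service as a prefix (`filesystem/`, `mcp/`, `agent/`, `terminal/`). The server in this ADR routes with a chain of `startsWith` checks; a table-driven variant is sketched below as one possible alternative. `MethodRouter`, `register`, `resolve`, and `dispatch` are hypothetical names, not part of the existing code.

```typescript
// Hypothetical handler signature: receives the full method name and its params.
export type Handler = (method: string, params: unknown) => Promise<unknown>;

export class MethodRouter {
  private readonly handlers = new Map<string, Handler>();

  // Register one handler per service namespace, e.g. "filesystem".
  register(namespace: string, handler: Handler): void {
    this.handlers.set(namespace, handler);
  }

  // Look up the handler for the prefix before the first "/".
  resolve(method: string): Handler | undefined {
    const slash = method.indexOf('/');
    const namespace = slash === -1 ? method : method.slice(0, slash);
    return this.handlers.get(namespace);
  }

  // Dispatch to the owning service, rejecting unknown namespaces.
  async dispatch(method: string, params: unknown): Promise<unknown> {
    const handler = this.resolve(method);
    if (!handler) {
      throw new Error(`Method not found: ${method}`);
    }
    return handler(method, params);
  }
}
```

With this shape, adding a new service means registering one more namespace rather than extending the `if`/`else` chain in `handleMessage`.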
Alternatives Considered
Alternative 1: HTTP/2 Server-Sent Events (SSE)
Pros:
- Built on HTTP (easier to proxy)
- Automatic reconnection
- Simple API
Cons:
- ❌ Unidirectional (server → client only)
- ❌ No binary data support
- ❌ No support in legacy browsers (e.g. Internet Explorer)
Rejected: Need bidirectional communication
Alternative 2: gRPC with HTTP/2
Pros:
- Type-safe with Protobuf
- High performance
- Streaming support
Cons:
- ❌ Complex setup
- ❌ Browser support requires grpc-web proxy
- ❌ Overkill for simple messages
Rejected: Too complex for current needs
Alternative 3: Socket.io
Pros:
- Automatic reconnection
- Room/namespace support
- Fallback to polling
Cons:
- ❌ Heavier (larger bundle size)
- ❌ Custom protocol (not standard WebSocket)
- ❌ Additional abstraction layer
Rejected: Native WebSocket is sufficient
Alternative 4: Multiple WebSocket Connections
Pros:
- Isolated error handling per service
- Easier to scale services independently
Cons:
- ❌ More connections (higher resource usage)
- ❌ Multiple authentication flows
- ❌ Complex reconnection logic
Rejected: Single connection is more efficient
Consequences
Positive
- ✅ Real-time: Sub-100ms latency for all operations
- ✅ Efficient: Single persistent connection, compression
- ✅ Scalable: Load balanced across multiple backends
- ✅ Reliable: Automatic reconnection, message queuing
- ✅ Secure: JWT authentication, TLS encryption
- ✅ Unified: Single protocol for all backend services
- ✅ Standard: JSON-RPC 2.0 (compatible with MCP)
Negative
- ❌ Complexity: More complex than simple HTTP
- ❌ State Management: Need to track connections, sessions
- ❌ Debugging: Harder to debug than REST API
- ❌ Scaling: Need sticky sessions or shared state
Mitigation
Complexity:
- Use battle-tested libraries (ws, Theia's MessageConnection)
- Comprehensive documentation and examples
- Abstract complexity in service layer
State Management:
- Use Redis for shared session state
- Connection ID → Session ID mapping
- Graceful degradation on connection loss
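One form graceful degradation could take on the client is buffering outbound messages while the socket is down and flushing them after reconnection. The sketch below illustrates that idea with a bounded in-memory queue; `OutboundQueue` and its default limit of 1000 messages are hypothetical, and a buffer like this does not by itself guarantee delivery if the page is closed or the limit is exceeded.

```typescript
// Hypothetical bounded buffer for messages produced while disconnected.
export class OutboundQueue {
  private readonly buffer: string[] = [];

  constructor(private readonly maxSize: number = 1000) {}

  // Buffer a serialized message; returns false when the queue is full.
  push(message: string): boolean {
    if (this.buffer.length >= this.maxSize) {
      return false;
    }
    this.buffer.push(message);
    return true;
  }

  // Send buffered messages in order. A message is removed only after send()
  // returns, so a throwing send() leaves it at the head for the next flush.
  flush(send: (message: string) => void): number {
    let sent = 0;
    while (true) {
      const next = this.buffer[0];
      if (next === undefined) {
        break;
      }
      send(next);
      this.buffer.shift();
      sent++;
    }
    return sent;
  }

  get size(): number {
    return this.buffer.length;
  }
}
```

A client could call `push` from `notify` whenever the socket is not open and call `flush((m) => ws.send(m))` from its `onopen` handler.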
Debugging:
- Structured logging (all messages logged)
- WebSocket inspector in browser DevTools
- Monitoring dashboard (active connections, message throughput)
Scaling:
- NGINX ip_hash for sticky sessions
- Or use Redis pub/sub for cross-backend communication
- Horizontal scaling with multiple NGINX instances
Implementation Plan
Phase 1: Core WebSocket Server ✅
- Node.js WebSocket server setup
- JSON-RPC 2.0 message handling
- JWT authentication
- Basic error handling
- Connection management
Phase 2: Service Integration 🔲
- Filesystem operations (read, write, watch, list)
- MCP protocol integration (tools, resources, prompts)
- Agent system integration (execute, status, progress)
- Terminal I/O integration
Phase 3: Frontend Client 🔲
- WebSocket client in Theia
- Reconnection logic
- Message compression/decompression
- Service layer (filesystem, MCP, agents)
Phase 4: Load Balancing 🔲
- NGINX WebSocket proxy configuration
- Multiple backend instances
- Sticky session support
- Health checks
Phase 5: Production Hardening 🔲
- Redis for session state
- Message queue for reliability
- Monitoring (Prometheus, Grafana)
- Load testing (10K concurrent connections)
Success Metrics
Performance:
- < 100ms latency for file operations
- < 50ms latency for WebSocket messages
- 1000+ concurrent connections per backend instance
Reliability:
- < 5s reconnection time
- 99.9% message delivery
- Zero data loss on connection drop
Scalability:
- Linear scaling with backend instances
- 10K concurrent users with 10 backend instances
- < 10MB memory per connection
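The scalability targets can be cross-checked with a little arithmetic: how many connections each instance carries, and how much memory that implies if every connection sits at the per-connection ceiling. The sketch below assumes one WebSocket per user (in line with the single-connection decision) and treats the 10 MB figure as a worst case; `capacityPerInstance` and `CapacityTargets` are hypothetical names.

```typescript
// Hypothetical capacity check derived from the Success Metrics targets.
export interface CapacityTargets {
  totalUsers: number;
  backendInstances: number;
  maxMemoryPerConnectionMb: number;
}

export function capacityPerInstance(targets: CapacityTargets) {
  // Assumes one WebSocket connection per user, spread evenly across instances.
  const connections = Math.ceil(targets.totalUsers / targets.backendInstances);
  const worstCaseMemoryMb = connections * targets.maxMemoryPerConnectionMb;
  return { connections, worstCaseMemoryMb };
}

const plan = capacityPerInstance({
  totalUsers: 10000,
  backendInstances: 10,
  maxMemoryPerConnectionMb: 10,
});
console.log(plan); // { connections: 1000, worstCaseMemoryMb: 10000 }
```

Under these assumptions each instance would need headroom for roughly 10 GB of connection state in the worst case, which is worth keeping in mind when sizing containers.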
Related Decisions
- ADR-016: NGINX Load Balancer - Frontend load balancing
- ADR-014: Eclipse Theia - IDE framework
- ADR-010: MCP Protocol - Tool/resource access
- ADR-009: xterm.js - Terminal I/O
Status: ✅ Accepted
Next Review: 2025-11-06 (1 month)
Last Updated: 2025-10-06