
ADR-017: WebSocket Backend Architecture

Status: Accepted
Date: 2025-10-06
Deciders: Development Team, Backend Team
Related: ADR-016 (NGINX), ADR-014 (Theia), ADR-010 (MCP)


Context

The AZ1.AI LLM IDE requires real-time, bidirectional communication between the browser frontend and multiple backend services:

  • Local filesystem operations (read, write, watch)
  • MCP protocol (tool calls, resource access, prompts)
  • Agent communication (A2A protocol)
  • Terminal I/O (xterm.js backend)
  • File system events (file changes, directory updates)
  • Session state (multi-session synchronization)

Current State

  • Eclipse Theia uses WebSocket for terminal and file operations
  • No unified WebSocket architecture for custom services
  • MCP currently uses HTTP polling, which costs a full HTTP request per poll and adds up to one polling interval of latency
  • File watchers use separate event streams
  • No real-time agent status updates

Requirements

  1. Real-time: Sub-100ms latency for file operations and LLM responses
  2. Scalable: Handle 1000+ concurrent WebSocket connections
  3. Reliable: Automatic reconnection, message queuing
  4. Secure: Authentication, encryption, authorization
  5. Efficient: Message compression, batching, delta updates
  6. Multi-service: Single WebSocket for filesystem, MCP, agents, terminal
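The efficiency requirement lists batching. JSON-RPC 2.0 defines a batch as a JSON array of message objects, so bursts of small notifications (terminal output, file-change events) could be coalesced into one frame. A minimal sketch, assuming a hypothetical `NotificationBatcher` with illustrative limits (50 messages or 16 ms); the server routing described in this ADR handles single objects only, so it would also need to accept arrays:

```typescript
// Shape of a JSON-RPC 2.0 notification (no id, so no response is expected)
interface Notification {
  jsonrpc: '2.0';
  method: string;
  params?: unknown;
}

// Hypothetical batcher: queues notifications and sends them as one frame,
// flushing when maxSize is reached or windowMs after the first queued message.
class NotificationBatcher {
  private queue: Notification[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private readonly send: (frame: string) => void,
    private readonly maxSize = 50,
    private readonly windowMs = 16
  ) {}

  push(message: Notification): void {
    this.queue.push(message);
    if (this.queue.length >= this.maxSize) {
      this.flush();
    } else if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.windowMs);
    }
  }

  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.queue.length === 0) return;

    const batch = this.queue;
    this.queue = [];
    // One message goes out as a plain object; several go out as a batch array
    this.send(JSON.stringify(batch.length === 1 ? batch[0] : batch));
  }
}
```

Sending a lone message unwrapped means receivers that only understand single objects still work when traffic is light.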

Decision

We will implement a unified WebSocket backend architecture using the following stack:

Architecture

┌─────────────────────────────────────────────────────────────┐
│                  Browser (Theia Frontend)                   │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │               WebSocket Client Manager                │  │
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐   │  │
│  │  │Filesystem│ │   MCP    │ │  Agent   │ │Terminal │   │  │
│  │  │  Client  │ │  Client  │ │  Client  │ │ Client  │   │  │
│  │  └──────────┘ └──────────┘ └──────────┘ └─────────┘   │  │
│  └───────────────────────────────────────────────────────┘  │
│                          │                                  │
│                          │ WebSocket (wss://)               │
└──────────────────────────┼──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                    NGINX (Load Balancer)                    │
│                    WebSocket Proxy (/ws)                    │
└──────────────────────────┬──────────────────────────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
   ┌──────────┐       ┌──────────┐       ┌──────────┐
   │ Node.js  │       │ Node.js  │       │ Node.js  │
   │ Backend  │       │ Backend  │       │ Backend  │
   │  :4000   │       │  :4001   │       │  :4002   │
   └──────────┘       └──────────┘       └──────────┘
        │                  │                  │
        └──────────────────┼──────────────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
        ▼                  ▼                  ▼
   ┌──────────┐       ┌──────────┐       ┌──────────┐
   │  Local   │       │   MCP    │       │  Agent   │
   │Filesystem│       │  Server  │       │  System  │
   └──────────┘       └──────────┘       └──────────┘

Technology Stack

  • WebSocket Server: ws (Node.js library)
  • Message Format: JSON-RPC 2.0 (same as MCP)
  • Compression: zlib (gzip compression for messages >1KB)
  • Authentication: JWT tokens (passed in initial handshake)
  • Protocol: WSS (WebSocket Secure) over TLS 1.3
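With this stack, a frame on the wire is either plain JSON-RPC text or gzip bytes. A gzip stream always begins with the magic bytes `0x1f 0x8b` and JSON text never does, so the receiver can tell them apart without an extra header. A minimal Node.js sketch of that convention; `encodeFrame` and `decodeFrame` are hypothetical names, and measuring the 1KB threshold in UTF-8 bytes is an assumption:

```typescript
import { gzipSync, gunzipSync } from 'node:zlib';

// JSON-RPC 2.0 envelope used for requests, responses, and notifications
interface RpcMessage {
  jsonrpc: '2.0';
  id?: string | number | null;
  method?: string;
  params?: unknown;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}

// Assumed reading of ">1KB": compare the UTF-8 byte length against 1024
const COMPRESSION_THRESHOLD = 1024;

// Small messages stay JSON text; larger ones become gzip bytes (binary frame)
function encodeFrame(message: RpcMessage): string | Buffer {
  const json = JSON.stringify(message);
  return Buffer.byteLength(json, 'utf8') > COMPRESSION_THRESHOLD ? gzipSync(json) : json;
}

// gzip output starts with 0x1f 0x8b; JSON text cannot start with those bytes
function decodeFrame(frame: string | Buffer): RpcMessage {
  if (typeof frame !== 'string' && frame[0] === 0x1f && frame[1] === 0x8b) {
    return JSON.parse(gunzipSync(frame).toString('utf8'));
  }
  return JSON.parse(frame.toString());
}

const small = encodeFrame({ jsonrpc: '2.0', id: 1, method: 'filesystem/read', params: { path: '/workspace/a.ts' } });
const large = encodeFrame({ jsonrpc: '2.0', id: 2, result: { content: 'x'.repeat(5000) } });
console.log(typeof small, Buffer.isBuffer(large)); // string true
```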


Implementation

1. WebSocket Server (Node.js)

// src/backend/websocket-server.ts

import WebSocket from 'ws';
import { createServer, IncomingMessage } from 'http';
import { verify } from 'jsonwebtoken';
import zlib from 'zlib';

interface WebSocketMessage {
  jsonrpc: '2.0';
  id?: string | number | null;
  method?: string;
  params?: any;
  result?: any;
  error?: {
    code: number;
    message: string;
    data?: any;
  };
}

// Minimal shapes of the collaborating services, inferred from the calls made
// below; their real implementations live elsewhere.
interface MCPClient {
  listTools(): Promise<unknown[]>;
  callTool(name: string, args: unknown): Promise<unknown>;
  listResources(): Promise<unknown[]>;
  readResource(uri: string): Promise<unknown>;
}

interface AgentSystem {
  execute(agentId: string, task: unknown): AsyncIterable<unknown>;
  getStatus(agentId: string): Promise<unknown>;
}

interface TerminalManager {
  write(terminalId: string, data: string): void;
  resize(terminalId: string, cols: number, rows: number): void;
}

export class WebSocketServer {
  private wss: WebSocket.Server;
  private connections = new Map<string, WebSocket>();

  constructor(
    port: number,
    private readonly mcpClient: MCPClient,
    private readonly agentSystem: AgentSystem,
    private readonly terminalManager: TerminalManager
  ) {
    const server = createServer();
    this.wss = new WebSocket.Server({ server });

    this.wss.on('connection', (ws, req) => {
      this.handleConnection(ws, req);
    });

    server.listen(port, () => {
      console.log(`WebSocket server listening on port ${port}`);
    });
  }

  private handleConnection(ws: WebSocket, req: IncomingMessage) {
    // 1. Authenticate (only token verification counts as an auth failure)
    const token = this.extractToken(req);
    if (!token) {
      ws.close(1008, 'Authentication required');
      return;
    }

    let userId: string;
    try {
      userId = (verify(token, process.env.JWT_SECRET!) as { userId: string }).userId;
    } catch {
      ws.close(1008, 'Invalid authentication token');
      return;
    }

    const sessionId = this.extractSessionId(req);
    const connectionId = `${userId}:${sessionId}`;
    this.connections.set(connectionId, ws);

    // 2. Message handler
    ws.on('message', (data) => {
      void this.handleMessage(ws, connectionId, data);
    });

    // 3. Error handler
    ws.on('error', (error) => {
      console.error(`WebSocket error for ${connectionId}:`, error);
    });

    // 4. Close handler: keep the entry if a newer socket for the same
    //    user/session has already replaced this one
    ws.on('close', () => {
      if (this.connections.get(connectionId) === ws) {
        this.connections.delete(connectionId);
      }
      console.log(`Connection closed: ${connectionId}`);
    });

    // 5. Welcome notification
    this.sendMessage(ws, {
      jsonrpc: '2.0',
      method: 'server/connected',
      params: { sessionId, timestamp: Date.now() }
    });
  }

  private async handleMessage(ws: WebSocket, connectionId: string, data: WebSocket.Data) {
    const message = this.parseMessage(data);
    if (!message) {
      this.sendError(ws, null, -32700, 'Parse error');
      return;
    }

    try {
      // Route message based on method prefix
      const method = message.method ?? '';
      if (method.startsWith('filesystem/')) {
        await this.handleFilesystemMessage(ws, message);
      } else if (method.startsWith('mcp/')) {
        await this.handleMCPMessage(ws, message);
      } else if (method.startsWith('agent/')) {
        await this.handleAgentMessage(ws, message);
      } else if (method.startsWith('terminal/')) {
        await this.handleTerminalMessage(ws, message);
      } else {
        this.sendError(ws, message.id, -32601, 'Method not found');
      }
    } catch (error: any) {
      console.error(`Message handling error for ${connectionId}:`, error);
      this.sendError(ws, message.id, -32603, error.message);
    }
  }

  private parseMessage(data: WebSocket.Data): WebSocketMessage | null {
    try {
      // Decompress gzip payloads (magic bytes 0x1f 0x8b)
      let raw = data;
      if (Buffer.isBuffer(data) && data[0] === 0x1f && data[1] === 0x8b) {
        raw = zlib.gunzipSync(data);
      }
      return JSON.parse(raw.toString());
    } catch {
      return null;
    }
  }

  private async handleFilesystemMessage(ws: WebSocket, message: WebSocketMessage) {
    const { method, params, id } = message;

    switch (method) {
      case 'filesystem/read': {
        const content = await this.readFile(params.path);
        this.sendMessage(ws, { jsonrpc: '2.0', id, result: { content } });
        break;
      }
      case 'filesystem/write': {
        await this.writeFile(params.path, params.content);
        this.sendMessage(ws, { jsonrpc: '2.0', id, result: { success: true } });
        break;
      }
      case 'filesystem/watch': {
        this.watchFile(ws, params.path);
        this.sendMessage(ws, { jsonrpc: '2.0', id, result: { watching: true } });
        break;
      }
      case 'filesystem/list': {
        const files = await this.listDirectory(params.path);
        this.sendMessage(ws, { jsonrpc: '2.0', id, result: { files } });
        break;
      }
      default:
        this.sendError(ws, id, -32601, 'Filesystem method not found');
    }
  }

  private async handleMCPMessage(ws: WebSocket, message: WebSocketMessage) {
    const { method, params, id } = message;

    switch (method) {
      case 'mcp/tools/list': {
        const tools = await this.mcpClient.listTools();
        this.sendMessage(ws, { jsonrpc: '2.0', id, result: { tools } });
        break;
      }
      case 'mcp/tools/call': {
        const result = await this.mcpClient.callTool(params.name, params.arguments);
        this.sendMessage(ws, { jsonrpc: '2.0', id, result });
        break;
      }
      case 'mcp/resources/list': {
        const resources = await this.mcpClient.listResources();
        this.sendMessage(ws, { jsonrpc: '2.0', id, result: { resources } });
        break;
      }
      case 'mcp/resources/read': {
        const resource = await this.mcpClient.readResource(params.uri);
        this.sendMessage(ws, { jsonrpc: '2.0', id, result: resource });
        break;
      }
      default:
        this.sendError(ws, id, -32601, 'MCP method not found');
    }
  }

  private async handleAgentMessage(ws: WebSocket, message: WebSocketMessage) {
    const { method, params, id } = message;

    switch (method) {
      case 'agent/execute': {
        // Stream agent output as 'agent/progress' notifications tagged with the request id
        const stream = this.agentSystem.execute(params.agentId, params.task);
        for await (const chunk of stream) {
          this.sendMessage(ws, {
            jsonrpc: '2.0',
            method: 'agent/progress',
            params: { taskId: id, progress: chunk }
          });
        }
        this.sendMessage(ws, { jsonrpc: '2.0', id, result: { completed: true } });
        break;
      }
      case 'agent/status': {
        const status = await this.agentSystem.getStatus(params.agentId);
        this.sendMessage(ws, { jsonrpc: '2.0', id, result: status });
        break;
      }
      default:
        this.sendError(ws, id, -32601, 'Agent method not found');
    }
  }

  // terminal/input and terminal/resize are notifications (no id): no response is sent
  private async handleTerminalMessage(ws: WebSocket, message: WebSocketMessage) {
    const { method, params, id } = message;

    switch (method) {
      case 'terminal/input':
        this.terminalManager.write(params.terminalId, params.data);
        break;

      case 'terminal/resize':
        this.terminalManager.resize(params.terminalId, params.cols, params.rows);
        break;

      default:
        this.sendError(ws, id, -32601, 'Terminal method not found');
    }
  }

  private sendMessage(ws: WebSocket, message: WebSocketMessage) {
    // Skip sockets that closed while a handler was still running
    if (ws.readyState !== WebSocket.OPEN) return;

    const json = JSON.stringify(message);

    // Compress messages over 1 KB; the client detects the gzip magic bytes
    if (Buffer.byteLength(json) > 1024) {
      ws.send(zlib.gzipSync(json));
    } else {
      ws.send(json);
    }
  }

  private sendError(ws: WebSocket, id: string | number | null | undefined, code: number, message: string) {
    // JSON-RPC 2.0 error responses carry the request id, or null if it is unknown
    this.sendMessage(ws, {
      jsonrpc: '2.0',
      id: id ?? null,
      error: { code, message }
    });
  }

  // Browsers cannot set headers on the WebSocket handshake, so the token is also
  // accepted as a `token` query parameter (wss://ide.az1.ai/ws?token=...)
  private extractToken(req: IncomingMessage): string | null {
    const auth = req.headers.authorization;
    if (auth && auth.startsWith('Bearer ')) {
      return auth.substring(7);
    }
    return this.queryParam(req, 'token');
  }

  private extractSessionId(req: IncomingMessage): string {
    const header = req.headers['x-session-id'];
    if (typeof header === 'string') return header;
    return this.queryParam(req, 'session') ?? 'default';
  }

  private queryParam(req: IncomingMessage, name: string): string | null {
    return new URL(req.url ?? '/', 'http://localhost').searchParams.get(name);
  }

  // Filesystem operations (stubs). Paths arrive from the client, so they must be
  // confined to the workspace root before any disk access.
  private async readFile(path: string): Promise<string> {
    throw new Error('Not implemented');
  }

  private async writeFile(path: string, content: string): Promise<void> {
    throw new Error('Not implemented');
  }

  private watchFile(ws: WebSocket, path: string): void {
    // Set up a file watcher and send 'filesystem/changed' notifications to ws
  }

  private async listDirectory(path: string): Promise<string[]> {
    throw new Error('Not implemented');
  }
}
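The filesystem stubs above receive paths straight from the browser, so an implementation has to keep them inside the workspace. A minimal sketch, assuming `/workspace` (the volume mounted in the Docker Compose file) as the root; `resolveWorkspacePath` and `readWorkspaceFile` are hypothetical helpers, and symlinks pointing outside the root are not handled (a stricter version would also compare `fs.realpath` results):

```typescript
import { readFile } from 'node:fs/promises';
import * as path from 'node:path';

// Assumption: the workspace volume from docker-compose.yml is mounted here
const WORKSPACE_ROOT = '/workspace';

// Resolve a client-supplied path against the root and reject anything that
// lands outside it (e.g. "../etc/passwd" or "/etc/passwd").
function resolveWorkspacePath(requested: string, root: string = WORKSPACE_ROOT): string {
  const resolved = path.resolve(root, requested);
  const relative = path.relative(root, resolved);
  const escapes =
    relative === '..' || relative.startsWith(`..${path.sep}`) || path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`Path outside workspace: ${requested}`);
  }
  return resolved;
}

// Possible body for the server's readFile stub (hypothetical helper name)
async function readWorkspaceFile(requested: string): Promise<string> {
  return readFile(resolveWorkspacePath(requested), 'utf8');
}
```

Checking the `path.relative` result instead of a plain string prefix avoids rejecting legitimate names such as `..notes` while still catching `src/../../x`.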

2. WebSocket Client (Theia Frontend)

// src/browser/services/websocket-client.ts

import { injectable } from '@theia/core/shared/inversify';
import { Emitter, Event } from '@theia/core';

interface WebSocketMessage {
  jsonrpc: '2.0';
  id?: string | number | null;
  method?: string;
  params?: any;
  result?: any;
  error?: any;
}

interface PendingRequest {
  resolve: (result: any) => void;
  reject: (error: any) => void;
  timer: ReturnType<typeof setTimeout>;
}

@injectable()
export class WebSocketClient {
  private ws: WebSocket | null = null;
  private messageId = 0;
  private pendingRequests = new Map<string | number, PendingRequest>();

  private readonly onMessageEmitter = new Emitter<WebSocketMessage>();
  readonly onMessage: Event<WebSocketMessage> = this.onMessageEmitter.event;

  connect(url: string, token: string, sessionId: string): Promise<void> {
    return new Promise((resolve, reject) => {
      // The browser WebSocket API cannot set request headers, so the token and
      // session ID travel as query parameters. Query strings can appear in proxy
      // access logs, so short-lived tokens are advisable.
      const target = new URL(url);
      target.searchParams.set('token', token);
      target.searchParams.set('session', sessionId);

      const ws = new WebSocket(target.toString());
      ws.binaryType = 'arraybuffer'; // compressed frames arrive as ArrayBuffer
      this.ws = ws;

      ws.onopen = () => {
        console.log('WebSocket connected');
        resolve();
      };

      ws.onerror = (error) => {
        console.error('WebSocket error:', error);
        reject(error);
      };

      ws.onmessage = (event) => {
        this.handleMessage(event.data).catch((error) => console.error('Invalid message:', error));
      };

      ws.onclose = () => {
        // Responses to in-flight requests cannot arrive on a new socket
        this.rejectPending(new Error('WebSocket closed'));
        console.log('WebSocket closed, reconnecting in 5s...');
        setTimeout(() => {
          // A failed attempt closes again, which schedules the next retry
          this.connect(url, token, sessionId).catch(() => undefined);
        }, 5000);
      };
    });
  }

  private async handleMessage(data: string | ArrayBuffer) {
    let message: WebSocketMessage;

    if (typeof data === 'string') {
      message = JSON.parse(data);
    } else {
      // Binary frames carry gzip-compressed JSON (magic bytes 0x1f 0x8b)
      const bytes = new Uint8Array(data);
      const payload = bytes[0] === 0x1f && bytes[1] === 0x8b ? await this.decompress(bytes) : bytes;
      message = JSON.parse(new TextDecoder().decode(payload));
    }

    // Response to a pending request
    if (message.id != null && this.pendingRequests.has(message.id)) {
      const pending = this.pendingRequests.get(message.id)!;
      clearTimeout(pending.timer);
      this.pendingRequests.delete(message.id);

      if (message.error) {
        pending.reject(message.error);
      } else {
        pending.resolve(message.result);
      }
      return;
    }

    // Server notification (no id)
    if (message.method && message.id === undefined) {
      this.onMessageEmitter.fire(message);
    }
  }

  async request(method: string, params?: any): Promise<any> {
    const ws = this.ws;
    if (!ws || ws.readyState !== WebSocket.OPEN) {
      throw new Error('WebSocket is not connected');
    }
    const id = ++this.messageId;

    return new Promise((resolve, reject) => {
      // Fail the request if no response arrives within 30 seconds
      const timer = setTimeout(() => {
        this.pendingRequests.delete(id);
        reject(new Error(`Request timeout: ${method}`));
      }, 30000);
      this.pendingRequests.set(id, { resolve, reject, timer });

      const message: WebSocketMessage = { jsonrpc: '2.0', id, method, params };
      ws.send(JSON.stringify(message));
    });
  }

  notify(method: string, params?: any): void {
    if (!this.ws || this.ws.readyState !== WebSocket.OPEN) {
      throw new Error('WebSocket is not connected');
    }
    const message: WebSocketMessage = { jsonrpc: '2.0', method, params };
    this.ws.send(JSON.stringify(message));
  }

  private rejectPending(error: Error) {
    for (const pending of this.pendingRequests.values()) {
      clearTimeout(pending.timer);
      pending.reject(error);
    }
    this.pendingRequests.clear();
  }

  private async decompress(data: Uint8Array): Promise<Uint8Array> {
    const stream = new Blob([data]).stream().pipeThrough(new DecompressionStream('gzip'));
    return new Uint8Array(await new Response(stream).arrayBuffer());
  }
}
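The client above waits a fixed 5 seconds before every reconnect, so after a backend restart all clients retry in lockstep. Exponential backoff with random jitter spreads those retries out. A sketch of the delay calculation only; the `reconnectDelay` helper and its 250 ms base are assumptions, and the 5 s cap is chosen to match the "< 5s reconnection time" target under Success Metrics:

```typescript
// Exponential backoff with "full jitter": wait a random time in
// [0, min(maxMs, baseMs * 2^attempt)). Hypothetical defaults: 250 ms base,
// 5 s cap to stay within the reconnection-time target.
function reconnectDelay(
  attempt: number,
  baseMs = 250,
  maxMs = 5000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

In `onclose`, the fixed `5000` would become `reconnectDelay(attempt++)`, with `attempt` reset to 0 in `onopen`. Injecting `random` keeps the function deterministic under test.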

3. Filesystem Service (Using WebSocket)

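// src/browser/services/filesystem-service.ts

import { injectable, inject } from '@theia/core/shared/inversify';
import { Disposable } from '@theia/core';
import { WebSocketClient } from './websocket-client';

@injectable()
export class FilesystemService {
  @inject(WebSocketClient)
  protected readonly ws!: WebSocketClient;

  async readFile(path: string): Promise<string> {
    const result = await this.ws.request('filesystem/read', { path });
    return result.content;
  }

  async writeFile(path: string, content: string): Promise<void> {
    await this.ws.request('filesystem/write', { path, content });
  }

  async listDirectory(path: string): Promise<string[]> {
    const result = await this.ws.request('filesystem/list', { path });
    return result.files;
  }

  // Resolves to a Disposable that stops delivering change events to `callback`
  // (the protocol has no unwatch method, so the server-side watcher keeps running)
  async watchFile(path: string, callback: (event: any) => void): Promise<Disposable> {
    // Subscribe first so a change reported right after the watch starts is not missed
    const listener = this.ws.onMessage((message) => {
      if (message.method === 'filesystem/changed' && message.params?.path === path) {
        callback(message.params);
      }
    });

    try {
      await this.ws.request('filesystem/watch', { path });
    } catch (error) {
      listener.dispose();
      throw error;
    }
    return listener;
  }
}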

4. NGINX WebSocket Proxy Configuration

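# Add inside the http { } block of /etc/nginx/nginx.conf, or save as a file in
# /etc/nginx/conf.d/ (the stock nginx.conf includes that directory inside http { })

# WebSocket upgrade map: "Connection: upgrade" when the client asked to upgrade,
# "Connection: close" otherwise
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

upstream websocket_backend {
    least_conn;
    # Compose service names and the PORT each backend listens on; if NGINX runs
    # on the host instead, use 127.0.0.1 with the same ports
    server websocket1:4000 max_fails=3 fail_timeout=30s;
    server websocket2:4001 max_fails=3 fail_timeout=30s;
    server websocket3:4002 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name ide.az1.ai;

    # TLS: a "listen ... ssl" server needs a certificate. Paths assume the
    # certbot layout under the mounted /etc/letsencrypt directory.
    ssl_certificate     /etc/letsencrypt/live/ide.az1.ai/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ide.az1.ai/privkey.pem;
    ssl_protocols       TLSv1.3;

    # WebSocket endpoint
    location /ws {
        proxy_pass http://websocket_backend;

        # WebSocket headers
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;

        # Standard proxy headers
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Timeouts for WebSocket (long-lived connections)
        proxy_read_timeout 86400s;  # 24 hours
        proxy_send_timeout 86400s;  # 24 hours
        proxy_connect_timeout 10s;

        # Buffer settings
        proxy_buffering off;
    }
}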

5. Docker Compose

# docker-compose.yml

version: '3.8'

services:
  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
      - "80:80"
    volumes:
      # The config holds http-level directives (map, upstream, server), so it is
      # mounted where the stock nginx.conf includes it inside its http block
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
      - ./ssl:/etc/letsencrypt:ro
    depends_on:
      - websocket1
      - websocket2
      - websocket3
    networks:
      - ide-network

  websocket1:
    build: ./backend
    ports:
      - "4000:4000"
    environment:
      - PORT=4000
      - JWT_SECRET=${JWT_SECRET}
      - NODE_ENV=production
    volumes:
      - workspace:/workspace
    networks:
      - ide-network

  websocket2:
    build: ./backend
    ports:
      - "4001:4001"   # host port must match the PORT the container listens on
    environment:
      - PORT=4001
      - JWT_SECRET=${JWT_SECRET}
      - NODE_ENV=production
    volumes:
      - workspace:/workspace
    networks:
      - ide-network

  websocket3:
    build: ./backend
    ports:
      - "4002:4002"   # host port must match the PORT the container listens on
    environment:
      - PORT=4002
      - JWT_SECRET=${JWT_SECRET}
      - NODE_ENV=production
    volumes:
      - workspace:/workspace
    networks:
      - ide-network

networks:
  ide-network:
    driver: bridge

volumes:
  workspace:
    driver: local

Rationale

Why WebSocket?

Real-time Communication:

  • ✅ Sub-100ms latency (vs HTTP polling at 1-5s)
  • ✅ Bidirectional (server can push updates)
  • ✅ Persistent connection (no per-request connection setup)

Efficiency:

  • ✅ Low overhead (single TCP connection)
  • ✅ Small per-message overhead (a 2 to 14 byte frame header instead of full HTTP headers)
  • ✅ Built-in compression support (the permessage-deflate extension)

Ecosystem:

  • ✅ Native browser support
  • ✅ Mature Node.js libraries (ws)
  • ✅ NGINX proxies WebSocket upgrades natively

Why JSON-RPC 2.0?

Compatibility:

  • ✅ Same as MCP protocol (consistency)
  • ✅ Standard request/response pattern
  • ✅ Error handling built-in

Simplicity:

  • ✅ Easy to debug (human-readable)
  • ✅ Language-agnostic
  • ✅ Extensible
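The request/response pattern and error handling come from a few rules in the JSON-RPC 2.0 specification: `method` plus `id` is a request, `method` without `id` is a notification (which must not be answered), and a response carries `id` with exactly one of `result` or `error`. A sketch of a classifier that client and server could share, together with the specification's reserved error codes (the server code above uses -32601 and -32603); `classify` is a hypothetical name, and batch arrays are out of scope:

```typescript
// Reserved error codes from the JSON-RPC 2.0 specification
const RpcErrorCode = {
  ParseError: -32700,
  InvalidRequest: -32600,
  MethodNotFound: -32601,
  InvalidParams: -32602,
  InternalError: -32603,
} as const;

type MessageKind = 'request' | 'notification' | 'response' | 'invalid';

// Classify a single parsed message (batch arrays are not handled here)
function classify(value: unknown): MessageKind {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) return 'invalid';
  const msg = value as Record<string, unknown>;
  if (msg.jsonrpc !== '2.0') return 'invalid';

  const hasId = 'id' in msg;
  const id = msg.id;
  // An id, when present, must be a string, a number, or null
  if (hasId && !(id === null || typeof id === 'string' || typeof id === 'number')) return 'invalid';

  if (typeof msg.method === 'string') return hasId ? 'request' : 'notification';
  // A response carries an id and exactly one of result / error
  if (hasId && ('result' in msg) !== ('error' in msg)) return 'response';
  return 'invalid';
}
```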

Why Single WebSocket?

Efficiency:

  • ✅ One connection vs multiple (saves resources)
  • ✅ Shared authentication
  • ✅ Unified error handling

Simplicity:

  • ✅ Single reconnection logic
  • ✅ One place to handle compression
  • ✅ Easier to monitor/debug
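Because one socket carries every service, each message is routed by its method prefix, which the server's `handleMessage` does with an if/else chain. A table-driven router is one alternative that turns adding a service into a single registration call. A sketch with a hypothetical `MethodRouter` and handler signature:

```typescript
// Handler for one service; receives the full method name and its params
type Handler = (method: string, params: unknown) => Promise<unknown>;

class MethodRouter {
  private readonly routes = new Map<string, Handler>();

  // Register a handler for a service prefix such as "filesystem" or "mcp"
  register(prefix: string, handler: Handler): this {
    this.routes.set(prefix, handler);
    return this;
  }

  // Route on the segment before the first "/", e.g. "mcp" in "mcp/tools/call"
  async dispatch(method: string, params: unknown): Promise<unknown> {
    const prefix = method.split('/', 1)[0];
    const handler = this.routes.get(prefix);
    if (!handler) {
      // Same JSON-RPC error code the server returns for unknown methods
      throw Object.assign(new Error(`Method not found: ${method}`), { code: -32601 });
    }
    return handler(method, params);
  }
}

const router = new MethodRouter()
  .register('filesystem', async (method, params) => ({ method, params }))
  .register('mcp', async (method) => ({ method }));
```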

Alternatives Considered

Alternative 1: HTTP/2 Server-Sent Events (SSE)

Pros:

  • Built on HTTP (easier to proxy)
  • Automatic reconnection
  • Simple API

Cons:

  • ❌ Unidirectional (server → client only)
  • ❌ No binary data support
  • ❌ Client-to-server messages need separate HTTP requests, so request/response pairs span two channels

Rejected: Need bidirectional communication

Alternative 2: gRPC with HTTP/2

Pros:

  • Type-safe with Protobuf
  • High performance
  • Streaming support

Cons:

  • ❌ Complex setup
  • ❌ Browser support requires grpc-web proxy
  • ❌ Overkill for simple messages

Rejected: Too complex for current needs

Alternative 3: Socket.io

Pros:

  • Automatic reconnection
  • Room/namespace support
  • Fallback to polling

Cons:

  • ❌ Heavier (larger bundle size)
  • ❌ Custom protocol (not standard WebSocket)
  • ❌ Additional abstraction layer

Rejected: Native WebSocket is sufficient

Alternative 4: Multiple WebSocket Connections

Pros:

  • Isolated error handling per service
  • Easier to scale services independently

Cons:

  • ❌ More connections (higher resource usage)
  • ❌ Multiple authentication flows
  • ❌ Complex reconnection logic

Rejected: Single connection is more efficient


Consequences

Positive

  • ✅ Real-time: Sub-100ms latency target for file operations and message delivery
  • ✅ Efficient: Single persistent connection, compression
  • ✅ Scalable: Load balanced across multiple backends
  • ✅ Reliable: Automatic reconnection (message queuing planned for Phase 5)
  • ✅ Secure: JWT authentication, TLS encryption
  • ✅ Unified: Single protocol for all backend services
  • ✅ Standard: JSON-RPC 2.0 (compatible with MCP)

Negative

  • ❌ Complexity: More complex than simple HTTP
  • ❌ State Management: Need to track connections and sessions
  • ❌ Debugging: Harder to debug than a REST API
  • ❌ Scaling: Need sticky sessions or shared state

Mitigation

Complexity:

  • Use battle-tested libraries (ws, Theia's MessageConnection)
  • Comprehensive documentation and examples
  • Abstract complexity in service layer

State Management:

  • Use Redis for shared session state
  • Connection ID → Session ID mapping
  • Graceful degradation on connection loss

Debugging:

  • Structured logging (all messages logged)
  • WebSocket inspector in browser DevTools
  • Monitoring dashboard (active connections, message throughput)

Scaling:

  • NGINX ip_hash for sticky sessions
  • Or use Redis pub/sub for cross-backend communication
  • Horizontal scaling with multiple NGINX instances
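With several backends behind NGINX, an event for a connection (for example an agent progress update) can originate on an instance that does not hold that connection's socket. Redis pub/sub addresses this by publishing on a per-connection channel that only the owning instance subscribes to. A sketch of that contract with an in-memory stand-in for Redis; `MessageBus`, `InMemoryBus`, and the `conn:<connectionId>` channel naming are assumptions:

```typescript
type Listener = (payload: string) => void;

// Contract a Redis pub/sub adapter could implement (one channel per connection)
interface MessageBus {
  publish(channel: string, payload: string): void;
  subscribe(channel: string, listener: Listener): () => void; // returns unsubscribe
}

// In-process stand-in for Redis, e.g. for tests or a single backend instance
class InMemoryBus implements MessageBus {
  private readonly channels = new Map<string, Set<Listener>>();

  publish(channel: string, payload: string): void {
    this.channels.get(channel)?.forEach((listener) => listener(payload));
  }

  subscribe(channel: string, listener: Listener): () => void {
    const listeners = this.channels.get(channel) ?? new Set<Listener>();
    this.channels.set(channel, listeners);
    listeners.add(listener);
    return () => {
      listeners.delete(listener);
    };
  }
}

// The instance holding the socket for `connectionId` (the server's
// "userId:sessionId" key) subscribes to this channel; any instance may publish.
const channelFor = (connectionId: string): string => `conn:${connectionId}`;
```

A Redis-backed implementation would map `publish` and `subscribe` onto the client library's pub/sub calls while keeping the same channel names.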

Implementation Plan

Phase 1: Core WebSocket Server ✅

  • Node.js WebSocket server setup
  • JSON-RPC 2.0 message handling
  • JWT authentication
  • Basic error handling
  • Connection management

Phase 2: Service Integration 🔲

  • Filesystem operations (read, write, watch, list)
  • MCP protocol integration (tools, resources, prompts)
  • Agent system integration (execute, status, progress)
  • Terminal I/O integration

Phase 3: Frontend Client 🔲

  • WebSocket client in Theia
  • Reconnection logic
  • Message compression/decompression
  • Service layer (filesystem, MCP, agents)

Phase 4: Load Balancing 🔲

  • NGINX WebSocket proxy configuration
  • Multiple backend instances
  • Sticky session support
  • Health checks

Phase 5: Production Hardening 🔲

  • Redis for session state
  • Message queue for reliability
  • Monitoring (Prometheus, Grafana)
  • Load testing (10K concurrent connections)

Success Metrics

Performance:

  • < 100ms latency for file operations
  • < 50ms latency for WebSocket messages
  • 1000+ concurrent connections per backend instance

Reliability:

  • < 5s reconnection time
  • 99.9% message delivery
  • Zero data loss on connection drop

Scalability:

  • Linear scaling with backend instances
  • 10K concurrent users with 10 backend instances
  • < 10MB memory per connection
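Latency targets like these are usually checked against a high percentile of measured round-trip times rather than the mean, since a few slow requests can hide behind a good average. A sketch using the nearest-rank method; the p95 choice and the sample values are illustrative assumptions, as the metrics above do not name a percentile:

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  if (p <= 0 || p > 100) throw new RangeError('p must be in (0, 100]');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Illustrative round-trip times in ms, e.g. from timing client.request(...) calls
const rttMs = [12, 15, 18, 22, 25, 31, 40, 48, 75, 140];
const p95 = percentile(rttMs, 95);
console.log(p95, p95 < 100 ? 'meets target' : 'misses target'); // 140 misses target
```

With ten samples, p95 lands on the single slow request even though the median (25 ms) is well under target.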




Status: ✅ Accepted
Next Review: 2025-11-06 (1 month)
Last Updated: 2025-10-06