
ADR-019: MCP Protocol Backend Architecture

Status: Accepted Date: 2025-10-06 Deciders: Development Team, AI/ML Team Related: ADR-010 (MCP Protocol), ADR-017 (WebSocket), ADR-015 (Multi-LLM)


Context

The AZ1.AI LLM IDE uses the Model Context Protocol (MCP), created by Anthropic, to give LLMs standardized access to tools, resources, and prompts.

MCP Overview

MCP defines three core primitives:

  1. Tools (Model-controlled):

    • The LLM decides when to invoke them
    • Examples: lmstudio_chat, file_read, web_search
  2. Resources (App-controlled):

    • Application provides context
    • Examples: file:///workspace/src/main.ts, session://current
  3. Prompts (User-controlled):

    • User-triggered templates
    • Examples: code-review, explain-code, refactor
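
On the wire, each primitive maps to its own JSON-RPC 2.0 method family. A sketch of the three request shapes (the tool, resource, and prompt names reuse the examples above; the `McpRequest` interface is a simplified stand-in, not the SDK type):

```typescript
// MCP messages are JSON-RPC 2.0; each primitive has its own method family.
interface McpRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Tools: the model decides to invoke these
const toolCall: McpRequest = {
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/call',
  params: { name: 'web_search', arguments: { query: 'MCP spec' } },
};

// Resources: the application supplies context by URI
const resourceRead: McpRequest = {
  jsonrpc: '2.0',
  id: 2,
  method: 'resources/read',
  params: { uri: 'file:///workspace/src/main.ts' },
};

// Prompts: user-triggered templates, resolved server-side
const promptGet: McpRequest = {
  jsonrpc: '2.0',
  id: 3,
  method: 'prompts/get',
  params: { name: 'code-review', arguments: { code: 'let x = 1;' } },
};
```

The gateway below implements the server side of exactly these method families.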

Current State

  • MCP LM Studio server implemented (mcp-lmstudio/)
  • LM Studio models accessible via MCP
  • No integration with other LLM providers
  • No unified MCP backend for all services
  • Client-side MCP integration incomplete

Requirements

  1. Multi-Provider: Support all 6 LLM providers via MCP
  2. Tool Discovery: Dynamic tool registration and discovery
  3. Resource Management: Unified resource access (files, sessions, agents)
  4. Prompt Templates: User-customizable prompt library
  5. Session Isolation: Per-session MCP context
  6. Streaming: Support streaming responses
  7. Caching: Cache tool results, resources

Decision

We will implement a unified MCP backend that acts as a gateway for all LLM providers and services:

Architecture

┌─────────────────────────────────────────────────────────────┐
│ Browser (Theia Frontend) │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ MCP Client │ │
│ │ ┌──────────────────────────────────────────────────┐ │ │
│ │ │ Tool Registry │ Resource Cache │ Prompts │ │ │
│ │ └──────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │ │
│ │ WebSocket (mcp/* methods) │
└─────────────────────────┼───────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ MCP Gateway (Node.js) │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ MCP Protocol Handler │ │
│ │ ┌──────────────────────────────────────────────────┐ │ │
│ │ │ tools/list │ tools/call │ resources/list │ etc. │ │ │
│ │ └──────────────────────────────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌─────────────┐ ┌────────────┐ │
│ │ Tool │ │ Resource │ │ Prompt │ │
│ │ Providers│ │ Providers │ │ Registry │ │
│ └──────────┘ └─────────────┘ └────────────┘ │
│ │ │ │ │
│ │ │ │ │
└─────────┼───────────────┼───────────────┼──────────────────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ LLM Providers│ │ Filesystem │ │FoundationDB │
│ (6 providers)│ │ Service │ │ Service │
└──────────────┘ └──────────────┘ └──────────────┘

MCP Server Components

mcp-gateway/
├── index.ts # Main MCP server
├── tools/ # Tool providers
│ ├── llm-tools.ts # LLM chat tools (6 providers)
│ ├── filesystem-tools.ts # File operations
│ ├── agent-tools.ts # Agent execution
│ └── web-tools.ts # Web search, fetch
├── resources/ # Resource providers
│ ├── file-resources.ts # File content access
│ ├── session-resources.ts # Session data
│ └── agent-resources.ts # Agent state
├── prompts/ # Prompt templates
│ ├── code-review.ts
│ ├── explain-code.ts
│ └── refactor.ts
└── services/ # Backend services
├── cache-service.ts # Result caching
├── session-service.ts # Session management
└── auth-service.ts # Authentication

Implementation

1. MCP Gateway Server

// mcp-gateway/index.ts

import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import {
CallToolRequestSchema,
ListToolsRequestSchema,
ListResourcesRequestSchema,
ReadResourceRequestSchema,
ListPromptsRequestSchema,
GetPromptRequestSchema
} from '@modelcontextprotocol/sdk/types.js';

// Tool providers
import { llmTools } from './tools/llm-tools.js';
import { FilesystemTools } from './tools/filesystem-tools.js';
import { AgentTools } from './tools/agent-tools.js';

// Resource providers
import { FileResources } from './resources/file-resources.js';
import { SessionResources } from './resources/session-resources.js';

// Prompt registry
import { PromptRegistry } from './prompts/index.js';

// Services
import { CacheService } from './services/cache-service.js';
import { SessionService } from './services/session-service.js';

class MCPGateway {
private server: Server;
private llmTools: llmTools;
private filesystemTools: FilesystemTools;
private agentTools: AgentTools;
private fileResources: FileResources;
private sessionResources: SessionResources;
private promptRegistry: PromptRegistry;
private cacheService: CacheService;
private sessionService: SessionService;

constructor() {
this.server = new Server(
{
name: 'az1ai-mcp-gateway',
version: '1.0.0',
},
{
capabilities: {
tools: {},
resources: {},
prompts: {},
},
}
);

// Initialize services
this.cacheService = new CacheService();
this.sessionService = new SessionService();

// Initialize providers
this.llmTools = new llmTools(this.cacheService);
this.filesystemTools = new FilesystemTools();
this.agentTools = new AgentTools(this.sessionService);
this.fileResources = new FileResources();
this.sessionResources = new SessionResources(this.sessionService);
this.promptRegistry = new PromptRegistry();

this.setupHandlers();
}

private setupHandlers() {
// Tools
this.server.setRequestHandler(ListToolsRequestSchema, async () => ({
tools: [
...this.llmTools.listTools(),
...this.filesystemTools.listTools(),
...this.agentTools.listTools(),
],
}));

this.server.setRequestHandler(CallToolRequestSchema, async (request) => {
const { name, arguments: args } = request.params;

// Route to appropriate tool provider
if (name.startsWith('llm_')) {
return this.llmTools.callTool(name, args);
} else if (name.startsWith('fs_')) {
return this.filesystemTools.callTool(name, args);
} else if (name.startsWith('agent_')) {
return this.agentTools.callTool(name, args);
}

throw new Error(`Unknown tool: ${name}`);
});

// Resources
this.server.setRequestHandler(ListResourcesRequestSchema, async () => ({
resources: [
...this.fileResources.listResources(),
...this.sessionResources.listResources(),
],
}));

this.server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
const { uri } = request.params;

if (uri.startsWith('file://')) {
return this.fileResources.readResource(uri);
} else if (uri.startsWith('session://')) {
return this.sessionResources.readResource(uri);
}

throw new Error(`Unknown resource scheme: ${uri}`);
});

// Prompts
this.server.setRequestHandler(ListPromptsRequestSchema, async () => ({
prompts: this.promptRegistry.listPrompts(),
}));

this.server.setRequestHandler(GetPromptRequestSchema, async (request) => {
const { name, arguments: args } = request.params;
return this.promptRegistry.getPrompt(name, args);
});
}

async run() {
const transport = new StdioServerTransport();
await this.server.connect(transport);
console.error('MCP Gateway server running on stdio');
}
}

// Start server
const gateway = new MCPGateway();
gateway.run().catch(console.error);

2. LLM Tool Provider

// mcp-gateway/tools/llm-tools.ts

import { Tool } from '@modelcontextprotocol/sdk/types.js';
import { llmService } from '../../src/browser/llm-integration/services/llm-service.js';
import { CacheService } from '../services/cache-service.js';

export class llmTools {
private llmService: llmService;
private cacheService: CacheService;

constructor(cacheService: CacheService) {
this.llmService = new llmService();
this.cacheService = cacheService;
}

listTools(): Tool[] {
return [
{
name: 'llm_chat',
description: 'Send a chat message to any LLM provider',
inputSchema: {
type: 'object',
properties: {
provider: {
type: 'string',
enum: ['lmstudio', 'claude', 'ollama', 'openai', 'gemini', 'grok'],
description: 'LLM provider to use'
},
model: {
type: 'string',
description: 'Model ID (e.g., "meta-llama-3.3-70b-instruct")'
},
messages: {
type: 'array',
items: {
type: 'object',
properties: {
role: { type: 'string', enum: ['user', 'assistant', 'system'] },
content: { type: 'string' }
},
required: ['role', 'content']
},
description: 'Conversation messages'
},
temperature: {
type: 'number',
minimum: 0,
maximum: 2,
default: 0.7,
description: 'Sampling temperature'
},
maxTokens: {
type: 'number',
default: 1000,
description: 'Maximum tokens to generate'
}
},
required: ['provider', 'model', 'messages']
}
},
{
name: 'llm_list_models',
description: 'List available models for a provider',
inputSchema: {
type: 'object',
properties: {
provider: {
type: 'string',
enum: ['lmstudio', 'claude', 'ollama', 'openai', 'gemini', 'grok'],
description: 'LLM provider'
}
},
required: ['provider']
}
},
{
name: 'llm_parallel',
description: 'Run same prompt on multiple models in parallel',
inputSchema: {
type: 'object',
properties: {
models: {
type: 'array',
items: { type: 'string' },
description: 'Model IDs to query'
},
messages: {
type: 'array',
items: {
type: 'object',
properties: {
role: { type: 'string' },
content: { type: 'string' }
}
}
}
},
required: ['models', 'messages']
}
},
{
name: 'llm_consensus',
description: 'Get consensus answer from multiple models',
inputSchema: {
type: 'object',
properties: {
models: {
type: 'array',
items: { type: 'string' },
minItems: 3,
description: 'At least 3 models for consensus'
},
prompt: {
type: 'string',
description: 'Prompt to send to all models'
}
},
required: ['models', 'prompt']
}
}
];
}

async callTool(name: string, args: any): Promise<any> {
switch (name) {
case 'llm_chat':
return this.chatCompletion(args);

case 'llm_list_models':
return this.listModels(args.provider);

case 'llm_parallel':
return this.parallelCompletion(args);

case 'llm_consensus':
return this.consensusCompletion(args);

default:
throw new Error(`Unknown LLM tool: ${name}`);
}
}

private async chatCompletion(args: any) {
const { provider, model, messages, temperature, maxTokens } = args;

// Check cache first
const cacheKey = `llm:${provider}:${model}:${JSON.stringify(messages)}`;
const cached = await this.cacheService.get(cacheKey);
if (cached) {
return {
content: [{
type: 'text',
text: cached.response,
annotations: { cached: true }
}]
};
}

// Call the LLM service
const response = await this.llmService.chatCompletion(
model,
messages,
temperature,
maxTokens
);

// Cache result
await this.cacheService.set(cacheKey, { response }, 3600); // 1 hour TTL

return {
content: [{
type: 'text',
text: response
}]
};
}

private async listModels(provider: string) {
const models = await this.llmService.getAvailableModels();
const filtered = models.filter(m => m.provider === provider);

return {
content: [{
type: 'text',
text: JSON.stringify(filtered, null, 2)
}]
};
}

private async parallelCompletion(args: any) {
const { models, messages } = args;

const results = await Promise.all(
models.map(model =>
this.llmService.chatCompletion(model, messages)
)
);

return {
content: [{
type: 'text',
text: JSON.stringify(
results.map((result, i) => ({
model: models[i],
response: result
})),
null,
2
)
}]
};
}

private async consensusCompletion(args: any) {
const { models, prompt } = args;

const messages = [{ role: 'user', content: prompt }];
const results = await Promise.all(
models.map(model =>
this.llmService.chatCompletion(model, messages)
)
);

// Simple consensus: most common response
const responseCounts = new Map<string, number>();
for (const result of results) {
responseCounts.set(result, (responseCounts.get(result) || 0) + 1);
}

let consensus = '';
let maxCount = 0;
for (const [response, count] of responseCounts.entries()) {
if (count > maxCount) {
consensus = response;
maxCount = count;
}
}

return {
content: [{
type: 'text',
text: consensus,
annotations: {
agreementCount: maxCount,
totalModels: models.length,
allResponses: results
}
}]
};
}
}

3. Resource Providers

// mcp-gateway/resources/file-resources.ts

import { Resource } from '@modelcontextprotocol/sdk/types.js';
import { promises as fs } from 'fs';
import path from 'path';

export class FileResources {
private workspaceRoot: string;

constructor(workspaceRoot = '/workspace') {
this.workspaceRoot = workspaceRoot;
}

listResources(): Resource[] {
// In production, dynamically list workspace files
return [
{
uri: 'file:///workspace/src/main.ts',
name: 'main.ts',
description: 'Application entry point',
mimeType: 'text/typescript'
},
{
uri: 'file:///workspace/package.json',
name: 'package.json',
description: 'Package configuration',
mimeType: 'application/json'
}
];
}

async readResource(uri: string) {
// Parse file:// URI
const filePath = uri.replace('file://', '');
const fullPath = path.resolve(this.workspaceRoot, filePath);

// Security: prevent path traversal; compare against the root plus a
// separator so siblings like /workspace-evil are also rejected
if (fullPath !== this.workspaceRoot &&
!fullPath.startsWith(this.workspaceRoot + path.sep)) {
throw new Error('Invalid file path');
}

const content = await fs.readFile(fullPath, 'utf-8');
const mimeType = this.getMimeType(filePath);

return {
contents: [{
uri,
mimeType,
text: content
}]
};
}

private getMimeType(filePath: string): string {
const ext = path.extname(filePath);
const mimeTypes: Record<string, string> = {
'.ts': 'text/typescript',
'.tsx': 'text/typescript',
'.js': 'text/javascript',
'.jsx': 'text/javascript',
'.json': 'application/json',
'.md': 'text/markdown',
'.txt': 'text/plain'
};
return mimeTypes[ext] || 'text/plain';
}
}
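
The security check in readResource deserves care: a bare prefix test would also accept sibling directories such as /workspace-evil, since that string does start with /workspace. A separator-aware helper (a standalone sketch, not part of the gateway code; posix paths are used so it behaves the same on any OS) makes the distinction explicit:

```typescript
import path from 'node:path';

// Containment needs a separator-aware comparison: a bare prefix test
// would accept '/workspace-evil/x' as being inside '/workspace'.
function isInside(root: string, candidate: string): boolean {
  const resolved = path.posix.resolve(root, candidate);
  return resolved === root || resolved.startsWith(root + '/');
}
```

For example, `isInside('/workspace', '/workspace/src/main.ts')` is true, while both `isInside('/workspace', '/workspace-evil/x')` and `isInside('/workspace', '../etc/passwd')` are false.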

4. Prompt Registry

// mcp-gateway/prompts/index.ts

import { Prompt } from '@modelcontextprotocol/sdk/types.js';

export class PromptRegistry {
private prompts: Map<string, Prompt> = new Map();

constructor() {
this.registerDefaultPrompts();
}

private registerDefaultPrompts() {
this.prompts.set('code-review', {
name: 'code-review',
description: 'Review code for quality, security, and best practices',
arguments: [
{
name: 'code',
description: 'Code to review',
required: true
},
{
name: 'level',
description: 'Review level: basic, intermediate, advanced',
required: false
}
]
});

this.prompts.set('explain-code', {
name: 'explain-code',
description: 'Explain how code works',
arguments: [
{
name: 'code',
description: 'Code to explain',
required: true
},
{
name: 'audience',
description: 'Target audience: beginner, intermediate, expert',
required: false
}
]
});

this.prompts.set('refactor', {
name: 'refactor',
description: 'Suggest refactoring improvements',
arguments: [
{
name: 'code',
description: 'Code to refactor',
required: true
},
{
name: 'goals',
description: 'Refactoring goals: performance, readability, maintainability',
required: false
}
]
});
}

listPrompts(): Prompt[] {
return Array.from(this.prompts.values());
}

getPrompt(name: string, args?: any) {
const prompt = this.prompts.get(name);
if (!prompt) {
throw new Error(`Prompt not found: ${name}`);
}

let messages: Array<{ role: string; content: { type: string; text: string } }> = [];

switch (name) {
case 'code-review':
messages = [{
role: 'user',
content: {
type: 'text',
text: `Review the following code (${args?.level || 'intermediate'} level):\n\n${args.code}\n\nProvide feedback on:\n- Code quality\n- Security issues\n- Best practices\n- Potential bugs`
}
}];
break;

case 'explain-code':
messages = [{
role: 'user',
content: {
type: 'text',
text: `Explain how this code works for a ${args?.audience || 'intermediate'} developer:\n\n${args.code}`
}
}];
break;

case 'refactor':
messages = [{
role: 'user',
content: {
type: 'text',
text: `Suggest refactoring improvements for this code (goals: ${args?.goals || 'general improvement'}):\n\n${args.code}\n\nProvide:\n1. Specific issues\n2. Suggested changes\n3. Refactored code`
}
}];
break;
}

return { messages };
}
}

5. Cache Service

// mcp-gateway/services/cache-service.ts

import { createClient } from 'redis';

export class CacheService {
private client: ReturnType<typeof createClient>;

constructor() {
this.client = createClient({
url: process.env.REDIS_URL || 'redis://localhost:6379'
});
// Connect in the background; log failures instead of dropping them silently
this.client.connect().catch(console.error);
}

async get(key: string): Promise<any | null> {
const value = await this.client.get(key);
return value ? JSON.parse(value) : null;
}

async set(key: string, value: any, ttl = 3600): Promise<void> {
await this.client.setEx(key, ttl, JSON.stringify(value));
}

async delete(key: string): Promise<void> {
await this.client.del(key);
}

async clear(): Promise<void> {
await this.client.flushAll();
}
}
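
Alternative 3 below rejects a pure in-memory cache for production, but for unit tests or local development a hypothetical drop-in with the same get/set/delete/clear surface is still handy. This sketch expires entries lazily on read, mirroring Redis TTL semantics:

```typescript
// Hypothetical in-memory stand-in for CacheService (same surface as the
// Redis-backed version); entries carry a deadline checked on each get.
export class InMemoryCacheService {
  private store = new Map<string, { value: unknown; expiresAt: number }>();

  async get(key: string): Promise<any | null> {
    const entry = this.store.get(key);
    if (!entry) return null;
    if (Date.now() > entry.expiresAt) {
      this.store.delete(key); // lazy expiration, like a Redis TTL
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: any, ttl = 3600): Promise<void> {
    this.store.set(key, { value, expiresAt: Date.now() + ttl * 1000 });
  }

  async delete(key: string): Promise<void> {
    this.store.delete(key);
  }

  async clear(): Promise<void> {
    this.store.clear();
  }
}
```

Because the surface matches, it can be injected anywhere the gateway expects a CacheService during tests.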

Rationale

Why Unified MCP Gateway?

Single Integration Point:

  • ✅ All LLM providers accessible via one interface
  • ✅ Consistent tool/resource/prompt access
  • ✅ Easier to add new providers

Caching & Optimization:

  • ✅ Cache LLM responses (save API costs)
  • ✅ Batch similar requests
  • ✅ Rate limiting

Security & Control:

  • ✅ Centralized auth/permission checks
  • ✅ Request logging/auditing
  • ✅ Resource access control

Why Separate Tool Providers?

Modularity:

  • ✅ Easy to add/remove tool providers
  • ✅ Each provider focused on one domain
  • ✅ Testable in isolation

Performance:

  • ✅ Lazy-load providers as needed
  • ✅ Parallel tool execution
  • ✅ Provider-specific optimizations

Why Redis for Caching?

Performance:

  • ✅ In-memory, sub-millisecond latency
  • ✅ TTL support (auto-expiration)
  • ✅ Pub/sub for cache invalidation

Scalability:

  • ✅ Horizontal scaling (Redis Cluster)
  • ✅ High throughput (100K+ ops/sec)
  • ✅ Low memory overhead

Alternatives Considered

Alternative 1: Direct MCP Servers per Provider

Pros:

  • Simple (no gateway)
  • Provider isolation

Cons:

  • ❌ Client must manage multiple servers
  • ❌ No unified caching
  • ❌ Duplicate auth/logging

Rejected: Too complex for the client

Alternative 2: HTTP REST API

Pros:

  • Simple HTTP requests
  • Wide tooling support

Cons:

  • ❌ Not MCP-compliant
  • ❌ No standard for tools/resources
  • ❌ Less efficient than stdio/WebSocket

Rejected: MCP is the standard

Alternative 3: In-Memory Cache (No Redis)

Pros:

  • Simple (no external dependency)
  • Fast (in-process)

Cons:

  • ❌ Lost on restart
  • ❌ Can't scale horizontally
  • ❌ Memory limited

Rejected: Need persistence and scaling


Consequences

Positive

  • ✅ Unified Interface: Single MCP gateway for all providers
  • ✅ Efficient: Caching reduces API costs and latency
  • ✅ Extensible: Easy to add new tools/resources/prompts
  • ✅ Secure: Centralized auth and access control
  • ✅ Standard: MCP-compliant (works with any MCP client)
  • ✅ Scalable: Redis caching, horizontal scaling

Negative

  • ❌ Complexity: Extra layer between client and providers
  • ❌ Single Point of Failure: Gateway down = no MCP access
  • ❌ Latency: Extra hop adds ~10-20ms
  • ❌ Cache Invalidation: Need strategy for stale data

Mitigation

Complexity:

  • Comprehensive documentation
  • Example integrations
  • Testing framework

SPOF:

  • Deploy multiple gateway instances
  • Health checks + auto-restart
  • Fallback to direct provider access

Latency:

  • Optimize gateway code
  • Use local Redis (same machine)
  • Monitor and profile

Cache Invalidation:

  • TTL on all cache entries
  • Manual invalidation API
  • Cache versioning
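
The cache-versioning bullet can be made concrete with a small key builder. `CACHE_VERSION` is a hypothetical constant bumped whenever the cached response format changes; old entries become unreachable immediately and are reclaimed when their TTL expires, with no flush required:

```typescript
// Hypothetical versioned cache-key builder: bumping CACHE_VERSION orphans
// every previously written entry, so stale data ages out via TTL alone.
const CACHE_VERSION = 2;

interface ChatMessage { role: string; content: string; }

function llmCacheKey(provider: string, model: string, messages: ChatMessage[]): string {
  return `llm:v${CACHE_VERSION}:${provider}:${model}:${JSON.stringify(messages)}`;
}
```

The manual invalidation API then only needs to delete keys matching the current version prefix.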

Implementation Plan

Phase 1: Core Gateway ✅

  • MCP server setup
  • Tool/resource/prompt handlers
  • Basic LLM tool provider

Phase 2: Multi-Provider Support 🔲

  • All 6 LLM providers integrated
  • Provider-specific optimizations
  • Model listing per provider

Phase 3: Resource Providers 🔲

  • File resources
  • Session resources
  • Agent resources

Phase 4: Caching 🔲

  • Redis integration
  • Cache key strategy
  • TTL configuration
  • Cache invalidation API

Phase 5: Advanced Features 🔲

  • Prompt library (user-customizable)
  • Parallel/consensus tools
  • Streaming responses
  • Rate limiting

Phase 6: Production Hardening 🔲

  • Monitoring (Prometheus)
  • Logging (structured)
  • Error tracking (Sentry)
  • Performance testing

Success Metrics

Performance:

  • < 50ms gateway overhead
  • > 80% cache hit rate (for repeated queries)
  • 1000+ requests/second throughput

Reliability:

  • 99.9% uptime
  • < 0.1% error rate
  • Graceful degradation on provider failures

Developer Experience:

  • < 5 minutes to add new tool
  • Clear error messages
  • Comprehensive examples



Status: ✅ Accepted Next Review: 2025-11-06 (1 month) Last Updated: 2025-10-06