# Multi-LLM Provider Support
Status: ✅ Implemented | Version: 1.0 | Last Updated: 2025-10-06
## Overview
AZ1.AI LLM IDE supports seven LLM providers through a unified interface, letting you switch seamlessly between local and cloud models.
## Supported Providers
| Provider | Type | API Key Required | Default URL |
|---|---|---|---|
| Claude Code ⭐ | Local CLI | ❌ No (with Max account) | N/A (CLI) |
| LM Studio | Local | ❌ No | http://host.docker.internal:1234 |
| Ollama | Local | ❌ No | http://host.docker.internal:11434 |
| OpenAI | Cloud | ✅ Yes | https://api.openai.com/v1 |
| Anthropic Claude | Cloud | ✅ Yes | https://api.anthropic.com/v1 |
| Google Gemini | Cloud | ✅ Yes | https://generativelanguage.googleapis.com/v1beta |
| xAI Grok | Cloud | ✅ Yes | https://api.x.ai/v1 |
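The table above can be expressed as a small registry object. This is an illustrative sketch, not the IDE's actual internal data structure; the entry names and shape are assumptions, while the types, key requirements, and URLs come from the table.

```javascript
// Hypothetical provider registry mirroring the table above.
const PROVIDERS = {
  'claude-code': { type: 'local-cli', apiKeyRequired: false, baseUrl: null },
  'lm-studio':   { type: 'local', apiKeyRequired: false, baseUrl: 'http://host.docker.internal:1234' },
  'ollama':      { type: 'local', apiKeyRequired: false, baseUrl: 'http://host.docker.internal:11434' },
  'openai':      { type: 'cloud', apiKeyRequired: true,  baseUrl: 'https://api.openai.com/v1' },
  'anthropic':   { type: 'cloud', apiKeyRequired: true,  baseUrl: 'https://api.anthropic.com/v1' },
  'gemini':      { type: 'cloud', apiKeyRequired: true,  baseUrl: 'https://generativelanguage.googleapis.com/v1beta' },
  'grok':        { type: 'cloud', apiKeyRequired: true,  baseUrl: 'https://api.x.ai/v1' },
};

// Providers usable without any API key (the local ones).
function keylessProviders() {
  return Object.keys(PROVIDERS).filter((name) => !PROVIDERS[name].apiKeyRequired);
}
```

A registry like this makes it easy to answer questions such as "which providers work with no key?" in one place instead of scattering URLs through the code.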
## Provider Details
### 1. Claude Code (Default) ⭐

**Why default?**
- No API key needed with Claude Max account
- Built-in tool access (Read, Write, Bash, etc.)
- Tightest integration with the Theia IDE
- Local execution via CLI
**Configuration:**

```bash
# No configuration needed!
# Works out of the box with a Claude Max subscription
```

**Models:**

- `claude-code`: Claude Code with full tool access

**Features:**
- ✅ Tool execution
- ✅ File operations
- ✅ Bash commands
- ✅ Code generation
- ✅ Code review
### 2. LM Studio (Local)

**Description:** Run 70B+ models locally on your own hardware

**Configuration:**

```bash
# Environment variable (optional)
export LM_STUDIO_API="http://host.docker.internal:1234/v1"

# Or in .env
LM_STUDIO_API=http://host.docker.internal:1234/v1
```

**Available Models (16+):**

- `qwen/qwq-32b`: Reasoning
- `qwen/qwen3-coder-30b`: Coding
- `nousresearch/hermes-4-70b`: General
- `meta-llama/llama-3.3-70b`: Meta's Llama
- Plus 12+ more models

**Recommended for:**
- Privacy-sensitive work
- Offline development
- High-throughput tasks
- Cost-free inference
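LM Studio serves an OpenAI-compatible API, so a chat request is a plain POST to its `/chat/completions` endpoint. The helper below is a sketch: `buildChatRequest` is a hypothetical name, and only the default base URL comes from the table above.

```javascript
// Build a request for LM Studio's OpenAI-compatible chat endpoint.
// Separating "build" from "send" keeps the logic testable without a server.
function buildChatRequest(model, messages, baseUrl = 'http://host.docker.internal:1234/v1') {
  return {
    url: `${baseUrl}/chat/completions`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

Sending it would then be `fetch(req.url, req.options)` with whatever error handling your setup needs.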
### 3. Ollama (Local)

**Description:** Lightweight local LLM runner

**Configuration:**

```bash
# Install Ollama on the Windows host
# Download from: https://ollama.ai

# Start the Ollama server
ollama serve

# Environment variable (optional)
export OLLAMA_API="http://host.docker.internal:11434/v1"
```

**Available Models:** Automatically detected from Ollama. Common models:

- `llama2`
- `codellama`
- `mistral`
- `mixtral`
- `phi`
**Pull models:**

```bash
ollama pull llama2
ollama pull codellama
ollama pull mistral
```

**Recommended for:**
- Quick local experiments
- Lightweight models
- Fast iteration
- Low resource usage
### 4. OpenAI (Cloud)

**Description:** GPT-4, GPT-3.5, and other OpenAI models

**Configuration:**

```bash
# Get an API key from: https://platform.openai.com/api-keys

# Environment variable
export OPENAI_API_KEY="sk-..."

# Or in .env
OPENAI_API_KEY=sk-...
```

**Available Models:**

- `gpt-4-turbo-preview`: Latest GPT-4
- `gpt-4`: GPT-4 (8K context)
- `gpt-3.5-turbo`: Fast and cost-effective

**Pricing (approximate):**
- GPT-4 Turbo: $0.01/1K input, $0.03/1K output
- GPT-4: $0.03/1K input, $0.06/1K output
- GPT-3.5 Turbo: $0.0005/1K input, $0.0015/1K output
**Recommended for:**
- Production applications
- Complex reasoning tasks
- When quality matters most
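The approximate prices above can be turned into a rough cost estimator. This is an illustration only: the function name is made up, the rates are the approximate figures listed here, and real billing may differ.

```javascript
// Approximate USD prices per 1K tokens, taken from the list above.
const PRICES_PER_1K = {
  'gpt-4-turbo-preview': { input: 0.01,   output: 0.03 },
  'gpt-4':               { input: 0.03,   output: 0.06 },
  'gpt-3.5-turbo':       { input: 0.0005, output: 0.0015 },
};

// Estimate the cost of one call from token counts.
function estimateCostUSD(model, inputTokens, outputTokens) {
  const p = PRICES_PER_1K[model];
  if (!p) throw new Error(`No pricing for ${model}`);
  return (inputTokens / 1000) * p.input + (outputTokens / 1000) * p.output;
}
```

For example, a GPT-3.5 Turbo call with 1,000 input and 1,000 output tokens comes to roughly $0.002.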
### 5. Anthropic Claude (Cloud)

**Description:** Claude 3 Opus and Sonnet models via the API

**Configuration:**

```bash
# Get an API key from: https://console.anthropic.com/

# Environment variable
export ANTHROPIC_API_KEY="sk-ant-..."

# Or in .env
ANTHROPIC_API_KEY=sk-ant-...
```

**Available Models:**

- `claude-3-5-sonnet-20241022`: Latest Sonnet (best balance)
- `claude-3-opus-20240229`: Most capable
- `claude-3-sonnet-20240229`: Fast and efficient

**Pricing (approximate):**
- Opus: $15/MTok input, $75/MTok output
- Sonnet 3.5: $3/MTok input, $15/MTok output
**Recommended for:**
- Long-context tasks (200K tokens)
- Nuanced understanding
- Complex analysis
### 6. Google Gemini (Cloud)

**Description:** Google's multimodal AI models

**Configuration:**

```bash
# Get an API key from: https://makersuite.google.com/app/apikey

# Environment variable
export GEMINI_API_KEY="..."

# Or in .env
GEMINI_API_KEY=...
```

**Available Models:**

- `gemini-pro`: Text generation
- `gemini-ultra`: Most capable (when available)

**Pricing (approximate):**
- Gemini Pro: Free tier available, then $0.00025/1K chars
**Recommended for:**
- Multimodal tasks (future)
- Cost-effective cloud inference
- Google ecosystem integration
### 7. xAI Grok (Cloud)

**Description:** Grok models from xAI

**Configuration:**

```bash
# Get an API key from: https://x.ai

# Environment variable
export GROK_API_KEY="..."

# Or in .env
GROK_API_KEY=...
```

**Available Models:**

- `grok-beta`: Latest Grok model

**Recommended for:**
- Real-time information (when connected)
- Experimental features
- X platform integration
## Usage Examples

### Single Mode (One Provider)

```javascript
// Use Claude Code (default, no API key)
const claudeResponse = await llmService.chatCompletion(
  'claude-code',
  [{ role: 'user', content: 'Explain async/await' }]
);

// Use LM Studio (local)
const localResponse = await llmService.chatCompletion(
  'qwen/qwq-32b',
  [{ role: 'user', content: 'Explain async/await' }]
);

// Use OpenAI (cloud, requires API key)
const openaiResponse = await llmService.chatCompletion(
  'gpt-4-turbo-preview',
  [{ role: 'user', content: 'Explain async/await' }]
);
```
### Parallel Mode (Compare Providers)

```javascript
// Compare local vs cloud
const [localResponse, cloudResponse] = await Promise.all([
  llmService.chatCompletion('qwen/qwq-32b', messages),        // LM Studio
  llmService.chatCompletion('gpt-4-turbo-preview', messages)  // OpenAI
]);

// Compare multiple local providers
const [lmStudioResponse, ollamaResponse, claudeResponse] = await Promise.all([
  llmService.chatCompletion('qwen/qwq-32b', messages),  // LM Studio
  llmService.chatCompletion('mistral', messages),       // Ollama
  llmService.chatCompletion('claude-code', messages)    // Claude Code
]);
```
### Sequential Mode (Chain Providers)

```javascript
// LM Studio generates → Claude Code reviews
const code = await llmService.chatCompletion(
  'qwen/qwen3-coder-30b',
  [{ role: 'user', content: 'Write a React component' }]
);

const review = await llmService.chatCompletion(
  'claude-code',
  [{ role: 'user', content: `Review this code:\n${code}` }]
);

// Ollama drafts → OpenAI refines
const draft = await llmService.chatCompletion(
  'llama2',
  [{ role: 'user', content: 'Draft a product description' }]
);

const refined = await llmService.chatCompletion(
  'gpt-4',
  [{ role: 'user', content: `Improve:\n${draft}` }]
);
```
### Consensus Mode (Multi-Provider Synthesis)

```javascript
// Get opinions from multiple providers
const responses = await Promise.all([
  llmService.chatCompletion('qwen/qwq-32b', messages),               // LM Studio
  llmService.chatCompletion('claude-3-5-sonnet-20241022', messages), // Anthropic
  llmService.chatCompletion('gpt-4-turbo-preview', messages)         // OpenAI
]);

// Synthesize with Claude Code
const synthesis = await llmService.chatCompletion(
  'claude-code',
  [{
    role: 'user',
    content: `Synthesize the best answer from:\n\n${responses.join('\n\n---\n\n')}`
  }]
);
```
## Provider Selection Strategy

### By Use Case

**Privacy-Sensitive Work:**
- ✅ Claude Code (local CLI)
- ✅ LM Studio (local inference)
- ✅ Ollama (local lightweight)
**Cost Optimization:**
- ✅ Claude Code (free with Max)
- ✅ LM Studio (free local)
- ✅ Ollama (free local)
- ✅ Gemini Pro (free tier)
- OpenAI GPT-3.5 (cheapest cloud)
**Quality Priority:**
- ✅ Claude Code (with tools)
- Claude 3.5 Sonnet
- GPT-4 Turbo
- Claude 3 Opus
**Speed Priority:**
- ✅ Claude Code (local)
- ✅ Ollama (lightweight)
- GPT-3.5 Turbo
- ✅ LM Studio (if GPU)
**Offline Capability:**
- ✅ Claude Code
- ✅ LM Studio
- ✅ Ollama
## Configuration Best Practices

### Environment Variables

Create a `.env` file in the project root:

```bash
# Local providers (optional overrides)
LM_STUDIO_API=http://host.docker.internal:1234/v1
OLLAMA_API=http://host.docker.internal:11434/v1

# Cloud providers (required for cloud models)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
GROK_API_KEY=...
```
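In a Node.js setup the `.env` file is typically loaded with the `dotenv` package; for illustration, a minimal dependency-free parser for the format used above (function name and behavior are a sketch, not the IDE's actual loader):

```javascript
// Parse simple KEY=value lines, skipping blanks and # comments.
function parseEnv(text) {
  const vars = {};
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue; // skip blanks and comments
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue; // ignore malformed lines
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}
```

Note this sketch does not handle quoting or multi-line values; for anything beyond simple keys, prefer `dotenv`.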
### Security

⚠️ **Never commit API keys to git!**

```bash
# .gitignore
.env
.env.local
.env.*.local
```

✅ Use environment variables:

```bash
# Linux/Mac
export OPENAI_API_KEY="sk-..."

# Windows
set OPENAI_API_KEY=sk-...
```
### Cost Management

**Free Options:**
- ✅ Claude Code (with Max subscription)
- ✅ LM Studio (100% free)
- ✅ Ollama (100% free)
- ✅ Gemini Pro (free tier)
**Paid Options (use wisely):**
- Set budget alerts in provider dashboards
- Monitor usage in IDE
- Use cheaper models for drafts
- Use expensive models for final output
## Provider Comparison
| Feature | Claude Code | LM Studio | Ollama | OpenAI | Anthropic | Gemini |
|---|---|---|---|---|---|---|
| Cost | Free* | Free | Free | Paid | Paid | Free tier |
| Privacy | ✅ Local | ✅ Local | ✅ Local | ❌ Cloud | ❌ Cloud | ❌ Cloud |
| Offline | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Tools | ✅ Yes | ❌ No | ❌ No | ✅ Yes | ✅ Yes | ❌ Limited |
| Max Context | 200K | Varies | Varies | 128K | 200K | 32K |
| Speed | Fast | Fast** | Very Fast | Medium | Medium | Fast |
| Quality | Excellent | Good | Good | Excellent | Excellent | Good |
\*With a Claude Max subscription. \*\*Depends on hardware (GPU recommended).
## Troubleshooting

### Claude Code Not Working

```bash
# Check the Claude Code CLI
claude --version

# Verify your Max subscription
claude account

# Test the connection
claude chat "Hello"
```
### LM Studio Connection Failed

```bash
# Check that LM Studio is running
curl http://host.docker.internal:1234/v1/models

# Verify the environment variable
echo $LM_STUDIO_API

# Check the Docker network
docker network ls
```
### Ollama Not Detected

```bash
# Check that Ollama is running
ollama list

# Verify the API endpoint
curl http://host.docker.internal:11434/api/tags

# Pull models if needed
ollama pull llama2
```
### Cloud Provider API Errors

```bash
# Check that the API key is set
echo $OPENAI_API_KEY

# Verify the key format:
#   OpenAI:    sk-...
#   Anthropic: sk-ant-...
#   Gemini:    AIza...

# Test the API directly
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```
## Future Enhancements
- Provider performance metrics
- Cost tracking per provider
- Automatic provider failover
- Provider-specific settings UI
- Model recommendations
- Usage analytics dashboard
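The "automatic provider failover" idea above could be sketched as a wrapper that tries models in order and returns the first successful response. Everything here is hypothetical: `withFailover` is a made-up name and `chatFn` stands in for a call like `llmService.chatCompletion`.

```javascript
// Try each model in turn; return the first success, or throw with
// the collected reasons if every provider fails.
async function withFailover(chatFn, models, messages) {
  const errors = [];
  for (const model of models) {
    try {
      return await chatFn(model, messages);
    } catch (err) {
      errors.push(`${model}: ${err.message}`); // remember why, then try the next one
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}
```

A natural ordering is free local models first, then paid cloud models as the fallback, which also matches the cost guidance earlier in this document.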
## Related Documentation

---

Last Updated: 2025-10-06 | Maintained By: AZ1.AI Development Team