# Multi-LLM Provider Support
Status: ✅ Implemented | Version: 1.0 | Last Updated: 2025-10-06
## Overview
AZ1.AI LLM IDE supports seven LLM providers through a unified interface, letting you switch seamlessly between local and cloud models.
## Supported Providers
| Provider | Type | API Key Required | Default URL |
|---|---|---|---|
| Claude Code ⭐ | Local CLI | ❌ No (with Max account) | N/A (CLI) |
| LM Studio | Local | ❌ No | http://host.docker.internal:1234 |
| Ollama | Local | ❌ No | http://host.docker.internal:11434 |
| OpenAI | Cloud | ✅ Yes | https://api.openai.com/v1 |
| Anthropic Claude | Cloud | ✅ Yes | https://api.anthropic.com/v1 |
| Google Gemini | Cloud | ✅ Yes | https://generativelanguage.googleapis.com/v1beta |
| xAI Grok | Cloud | ✅ Yes | https://api.x.ai/v1 |
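The table above can be expressed as a small registry object. This is an illustrative sketch, not the IDE's actual internal data structure; the entry names and shape are assumptions, while the types, key requirements, and URLs come from the table.

```javascript
// Hypothetical provider registry mirroring the table above.
const PROVIDERS = {
  'claude-code': { type: 'local-cli', apiKeyRequired: false, baseUrl: null },
  'lm-studio':   { type: 'local', apiKeyRequired: false, baseUrl: 'http://host.docker.internal:1234' },
  'ollama':      { type: 'local', apiKeyRequired: false, baseUrl: 'http://host.docker.internal:11434' },
  'openai':      { type: 'cloud', apiKeyRequired: true,  baseUrl: 'https://api.openai.com/v1' },
  'anthropic':   { type: 'cloud', apiKeyRequired: true,  baseUrl: 'https://api.anthropic.com/v1' },
  'gemini':      { type: 'cloud', apiKeyRequired: true,  baseUrl: 'https://generativelanguage.googleapis.com/v1beta' },
  'grok':        { type: 'cloud', apiKeyRequired: true,  baseUrl: 'https://api.x.ai/v1' },
};

// Providers usable without any API key (the local ones).
function keylessProviders() {
  return Object.keys(PROVIDERS).filter((name) => !PROVIDERS[name].apiKeyRequired);
}
```

A registry like this makes it easy to answer questions such as "which providers work with no key?" in one place instead of scattering URLs through the code.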
## Provider Details
### 1. Claude Code (Default) ⭐

**Why default?**
- No API key needed with Claude Max account
- Built-in tool access (Read, Write, Bash, etc.)
- Tightest integration with the Theia IDE
- Local execution via CLI
**Configuration:**

```bash
# No configuration needed!
# Works out of the box with a Claude Max subscription
```

**Models:**

- `claude-code`: Claude Code with full tool access

**Features:**
- ✅ Tool execution
- ✅ File operations
- ✅ Bash commands
- ✅ Code generation
- ✅ Code review
### 2. LM Studio (Local)

**Description:** Run 70B+ models locally on your own hardware

**Configuration:**

```bash
# Environment variable (optional)
export LM_STUDIO_API="http://host.docker.internal:1234/v1"

# Or in .env
LM_STUDIO_API=http://host.docker.internal:1234/v1
```

**Available Models (16+):**

- `qwen/qwq-32b`: Reasoning
- `qwen/qwen3-coder-30b`: Coding
- `nousresearch/hermes-4-70b`: General
- `meta-llama/llama-3.3-70b`: Meta's Llama
- Plus 12+ more models

**Recommended for:**
- Privacy-sensitive work
- Offline development
- High-throughput tasks
- Cost-free inference
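LM Studio serves an OpenAI-compatible API, so a chat request is a plain POST to its `/chat/completions` endpoint. The helper below is a sketch: `buildChatRequest` is a hypothetical name, and only the default base URL comes from the table above.

```javascript
// Build a request for LM Studio's OpenAI-compatible chat endpoint.
// Separating "build" from "send" keeps the logic testable without a server.
function buildChatRequest(model, messages, baseUrl = 'http://host.docker.internal:1234/v1') {
  return {
    url: `${baseUrl}/chat/completions`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, messages }),
    },
  };
}
```

Sending it would then be `fetch(req.url, req.options)` with whatever error handling your setup needs.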
### 3. Ollama (Local)

**Description:** Lightweight local LLM runner

**Configuration:**

```bash
# Install Ollama on the Windows host
# Download from: https://ollama.ai

# Start the Ollama server
ollama serve

# Environment variable (optional)
export OLLAMA_API="http://host.docker.internal:11434/v1"
```

**Available Models:** Automatically detected from Ollama. Common models:

- `llama2`
- `codellama`
- `mistral`
- `mixtral`
- `phi`
**Pull models:**

```bash
ollama pull llama2
ollama pull codellama
ollama pull mistral
```

**Recommended for:**
- Quick local experiments
- Lightweight models
- Fast iteration
- Low resource usage
### 4. OpenAI (Cloud)

**Description:** GPT-4, GPT-3.5, and other OpenAI models

**Configuration:**

```bash
# Get an API key from: https://platform.openai.com/api-keys

# Environment variable
export OPENAI_API_KEY="sk-..."

# Or in .env
OPENAI_API_KEY=sk-...
```

**Available Models:**

- `gpt-4-turbo-preview`: Latest GPT-4
- `gpt-4`: GPT-4 (8K context)
- `gpt-3.5-turbo`: Fast and cost-effective

**Pricing (approximate):**
- GPT-4 Turbo: $0.01/1K input, $0.03/1K output
- GPT-4: $0.03/1K input, $0.06/1K output
- GPT-3.5 Turbo: $0.0005/1K input, $0.0015/1K output
**Recommended for:**
- Production applications
- Complex reasoning tasks
- When quality matters most
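The approximate prices above can be turned into a rough cost estimator. This is an illustration only: the function name is made up, the rates are the approximate figures listed here, and real billing may differ.

```javascript
// Approximate USD prices per 1K tokens, taken from the list above.
const PRICES_PER_1K = {
  'gpt-4-turbo-preview': { input: 0.01,   output: 0.03 },
  'gpt-4':               { input: 0.03,   output: 0.06 },
  'gpt-3.5-turbo':       { input: 0.0005, output: 0.0015 },
};

// Estimate the cost of one call from token counts.
function estimateCostUSD(model, inputTokens, outputTokens) {
  const p = PRICES_PER_1K[model];
  if (!p) throw new Error(`No pricing for ${model}`);
  return (inputTokens / 1000) * p.input + (outputTokens / 1000) * p.output;
}
```

For example, a GPT-3.5 Turbo call with 1,000 input and 1,000 output tokens comes to roughly $0.002.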
### 5. Anthropic Claude (Cloud)

**Description:** Claude 3 Opus and Sonnet models via the API

**Configuration:**

```bash
# Get an API key from: https://console.anthropic.com/

# Environment variable
export ANTHROPIC_API_KEY="sk-ant-..."

# Or in .env
ANTHROPIC_API_KEY=sk-ant-...
```

**Available Models:**

- `claude-3-5-sonnet-20241022`: Latest Sonnet (best balance)
- `claude-3-opus-20240229`: Most capable
- `claude-3-sonnet-20240229`: Fast and efficient

**Pricing (approximate):**
- Opus: $15/MTok input, $75/MTok output
- Sonnet 3.5: $3/MTok input, $15/MTok output
**Recommended for:**
- Long-context tasks (200K tokens)
- Nuanced understanding
- Complex analysis
### 6. Google Gemini (Cloud)

**Description:** Google's multimodal AI models

**Configuration:**

```bash
# Get an API key from: https://makersuite.google.com/app/apikey

# Environment variable
export GEMINI_API_KEY="..."

# Or in .env
GEMINI_API_KEY=...
```

**Available Models:**

- `gemini-pro`: Text generation
- `gemini-ultra`: Most capable (when available)

**Pricing (approximate):**
- Gemini Pro: Free tier available, then $0.00025/1K chars
**Recommended for:**
- Multimodal tasks (future)
- Cost-effective cloud inference
- Google ecosystem integration
### 7. xAI Grok (Cloud)

**Description:** Grok models from xAI

**Configuration:**

```bash
# Get an API key from: https://x.ai

# Environment variable
export GROK_API_KEY="..."

# Or in .env
GROK_API_KEY=...
```

**Available Models:**

- `grok-beta`: Latest Grok model

**Recommended for:**
- Real-time information (when connected)
- Experimental features
- X platform integration
## Usage Examples

### Single Mode (One Provider)

```javascript
// Use Claude Code (default, no API key)
const claudeResponse = await llmService.chatCompletion(
  'claude-code',
  [{ role: 'user', content: 'Explain async/await' }]
);

// Use LM Studio (local)
const localResponse = await llmService.chatCompletion(
  'qwen/qwq-32b',
  [{ role: 'user', content: 'Explain async/await' }]
);

// Use OpenAI (cloud, requires API key)
const openaiResponse = await llmService.chatCompletion(
  'gpt-4-turbo-preview',
  [{ role: 'user', content: 'Explain async/await' }]
);
```
### Parallel Mode (Compare Providers)

```javascript
// Compare local vs cloud
const [localResponse, cloudResponse] = await Promise.all([
  llmService.chatCompletion('qwen/qwq-32b', messages),        // LM Studio
  llmService.chatCompletion('gpt-4-turbo-preview', messages)  // OpenAI
]);

// Compare multiple local providers
const [lmStudioResponse, ollamaResponse, claudeResponse] = await Promise.all([
  llmService.chatCompletion('qwen/qwq-32b', messages),  // LM Studio
  llmService.chatCompletion('mistral', messages),       // Ollama
  llmService.chatCompletion('claude-code', messages)    // Claude Code
]);
```
### Sequential Mode (Chain Providers)

```javascript
// LM Studio generates → Claude Code reviews
const code = await llmService.chatCompletion(
  'qwen/qwen3-coder-30b',
  [{ role: 'user', content: 'Write a React component' }]
);

const review = await llmService.chatCompletion(
  'claude-code',
  [{ role: 'user', content: `Review this code:\n${code}` }]
);

// Ollama drafts → OpenAI refines
const draft = await llmService.chatCompletion(
  'llama2',
  [{ role: 'user', content: 'Draft a product description' }]
);

const refined = await llmService.chatCompletion(
  'gpt-4',
  [{ role: 'user', content: `Improve:\n${draft}` }]
);
```
### Consensus Mode (Multi-Provider Synthesis)

```javascript
// Get opinions from multiple providers
const responses = await Promise.all([
  llmService.chatCompletion('qwen/qwq-32b', messages),               // LM Studio
  llmService.chatCompletion('claude-3-5-sonnet-20241022', messages), // Anthropic
  llmService.chatCompletion('gpt-4-turbo-preview', messages)         // OpenAI
]);

// Synthesize with Claude Code
const synthesis = await llmService.chatCompletion(
  'claude-code',
  [{
    role: 'user',
    content: `Synthesize the best answer from:\n\n${responses.join('\n\n---\n\n')}`
  }]
);
```
## Provider Selection Strategy

### By Use Case

**Privacy-Sensitive Work:**
- ✅ Claude Code (local CLI)
- ✅ LM Studio (local inference)
- ✅ Ollama (local lightweight)
**Cost Optimization:**
- ✅ Claude Code (free with Max)
- ✅ LM Studio (free local)
- ✅ Ollama (free local)
- ✅ Gemini Pro (free tier)
- OpenAI GPT-3.5 (cheapest cloud)
**Quality Priority:**
- ✅ Claude Code (with tools)
- Claude 3.5 Sonnet
- GPT-4 Turbo
- Claude 3 Opus
**Speed Priority:**
- ✅ Claude Code (local)
- ✅ Ollama (lightweight)
- GPT-3.5 Turbo
- ✅ LM Studio (if GPU)
**Offline Capability:**
- ✅ Claude Code
- ✅ LM Studio
- ✅ Ollama
## Configuration Best Practices

### Environment Variables

Create a `.env` file in the project root:

```bash
# Local providers (optional overrides)
LM_STUDIO_API=http://host.docker.internal:1234/v1
OLLAMA_API=http://host.docker.internal:11434/v1

# Cloud providers (required for cloud models)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
GROK_API_KEY=...
```
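In a Node.js setup the `.env` file is typically loaded with the `dotenv` package; for illustration, a minimal dependency-free parser for the format used above (function name and behavior are a sketch, not the IDE's actual loader):

```javascript
// Parse simple KEY=value lines, skipping blanks and # comments.
function parseEnv(text) {
  const vars = {};
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue; // skip blanks and comments
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue; // ignore malformed lines
    vars[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return vars;
}
```

Note this sketch does not handle quoting or multi-line values; for anything beyond simple keys, prefer `dotenv`.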
### Security

⚠️ **Never commit API keys to git!**

```bash
# .gitignore
.env
.env.local
.env.*.local
```

✅ Use environment variables:

```bash
# Linux/Mac
export OPENAI_API_KEY="sk-..."

# Windows
set OPENAI_API_KEY=sk-...
```
### Cost Management

**Free Options:**
- ✅ Claude Code (with Max subscription)
- ✅ LM Studio (100% free)
- ✅ Ollama (100% free)
- ✅ Gemini Pro (free tier)
**Paid Options (use wisely):**
- Set budget alerts in provider dashboards
- Monitor usage in IDE
- Use cheaper models for drafts
- Use expensive models for final output
## Provider Comparison
| Feature | Claude Code | LM Studio | Ollama | OpenAI | Anthropic | Gemini |
|---|---|---|---|---|---|---|
| Cost | Free* | Free | Free | Paid | Paid | Free tier |
| Privacy | ✅ Local | ✅ Local | ✅ Local | ❌ Cloud | ❌ Cloud | ❌ Cloud |
| Offline | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Tools | ✅ Yes | ❌ No | ❌ No | ✅ Yes | ✅ Yes | ❌ Limited |
| Max Context | 200K | Varies | Varies | 128K | 200K | 32K |
| Speed | Fast | Fast** | Very Fast | Medium | Medium | Fast |
| Quality | Excellent | Good | Good | Excellent | Excellent | Good |
\*With a Claude Max subscription. \*\*Depends on hardware (GPU recommended).
## Troubleshooting

### Claude Code Not Working

```bash
# Check the Claude Code CLI
claude --version

# Verify your Max subscription
claude account

# Test the connection
claude chat "Hello"
```
### LM Studio Connection Failed

```bash
# Check that LM Studio is running
curl http://host.docker.internal:1234/v1/models

# Verify the environment variable
echo $LM_STUDIO_API

# Check the Docker network
docker network ls
```
### Ollama Not Detected

```bash
# Check that Ollama is running
ollama list

# Verify the API endpoint
curl http://host.docker.internal:11434/api/tags

# Pull models if needed
ollama pull llama2
```
### Cloud Provider API Errors

```bash
# Check that the API key is set
echo $OPENAI_API_KEY

# Verify the key format:
#   OpenAI:    sk-...
#   Anthropic: sk-ant-...
#   Gemini:    AIza...

# Test the API directly
curl https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY"
```
## Future Enhancements
- Provider performance metrics
- Cost tracking per provider
- Automatic provider failover
- Provider-specific settings UI
- Model recommendations
- Usage analytics dashboard
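The "automatic provider failover" idea above could be sketched as a wrapper that tries models in order and returns the first successful response. Everything here is hypothetical: `withFailover` is a made-up name and `chatFn` stands in for a call like `llmService.chatCompletion`.

```javascript
// Try each model in turn; return the first success, or throw with
// the collected reasons if every provider fails.
async function withFailover(chatFn, models, messages) {
  const errors = [];
  for (const model of models) {
    try {
      return await chatFn(model, messages);
    } catch (err) {
      errors.push(`${model}: ${err.message}`); // remember why, then try the next one
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}
```

A natural ordering is free local models first, then paid cloud models as the fallback, which also matches the cost guidance earlier in this document.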
## Related Documentation

---

Last Updated: 2025-10-06 | Maintained By: AZ1.AI Development Team