
Multi-LLM Provider Support

Status: ✅ Implemented · Version: 1.0 · Last Updated: 2025-10-06


Overview

AZ1.AI LLM IDE supports 7 LLM providers through a unified interface, letting you use local and cloud models interchangeably.

Supported Providers

| Provider | Type | API Key Required | Default URL |
|---|---|---|---|
| Claude Code | Local CLI | ❌ No (with Max account) | N/A (CLI) |
| LM Studio | Local | ❌ No | http://host.docker.internal:1234 |
| Ollama | Local | ❌ No | http://host.docker.internal:11434 |
| OpenAI | Cloud | ✅ Yes | https://api.openai.com/v1 |
| Anthropic Claude | Cloud | ✅ Yes | https://api.anthropic.com/v1 |
| Google Gemini | Cloud | ✅ Yes | https://generativelanguage.googleapis.com/v1beta |
| xAI Grok | Cloud | ✅ Yes | https://api.x.ai/v1 |
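The table above maps naturally onto a small provider registry. The sketch below is illustrative only — the type names, keys, and helper are assumptions, not the IDE's actual API:

```typescript
// Illustrative provider registry mirroring the table above.
type ProviderKind = "local" | "local-cli" | "cloud";

interface ProviderConfig {
  kind: ProviderKind;
  requiresApiKey: boolean;
  baseUrl?: string; // undefined for CLI-based providers like Claude Code
}

const PROVIDERS: Record<string, ProviderConfig> = {
  "claude-code": { kind: "local-cli", requiresApiKey: false },
  "lm-studio":   { kind: "local", requiresApiKey: false, baseUrl: "http://host.docker.internal:1234" },
  "ollama":      { kind: "local", requiresApiKey: false, baseUrl: "http://host.docker.internal:11434" },
  "openai":      { kind: "cloud", requiresApiKey: true,  baseUrl: "https://api.openai.com/v1" },
  "anthropic":   { kind: "cloud", requiresApiKey: true,  baseUrl: "https://api.anthropic.com/v1" },
  "gemini":      { kind: "cloud", requiresApiKey: true,  baseUrl: "https://generativelanguage.googleapis.com/v1beta" },
  "grok":        { kind: "cloud", requiresApiKey: true,  baseUrl: "https://api.x.ai/v1" },
};

// A provider works offline iff it runs locally (CLI or local server).
function worksOffline(name: string): boolean {
  const p = PROVIDERS[name];
  return p !== undefined && p.kind !== "cloud";
}
```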

Provider Details

1. Claude Code (Default) ⭐

Why Default?

  • No API key needed with Claude Max account
  • Built-in tool access (Read, Write, Bash, etc.)
  • Best integration with the Theia IDE
  • Local execution via CLI

Configuration:

# No configuration needed!
# Works out of the box with Claude Max subscription

Models:

  • claude-code - Claude Code with full tool access

Features:

  • ✅ Tool execution
  • ✅ File operations
  • ✅ Bash commands
  • ✅ Code generation
  • ✅ Code review

2. LM Studio (Local)

Description: Run 70B+ models locally on your hardware

Configuration:

# Environment variable (optional)
export LM_STUDIO_API="http://host.docker.internal:1234/v1"

# Or in .env
LM_STUDIO_API=http://host.docker.internal:1234/v1

Available Models (16+):

  • qwen/qwq-32b - Reasoning
  • qwen/qwen3-coder-30b - Coding
  • nousresearch/hermes-4-70b - General
  • meta-llama/llama-3.3-70b - Meta's Llama
  • Plus 12+ more models

Recommended for:

  • Privacy-sensitive work
  • Offline development
  • High-throughput tasks
  • Cost-free inference
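Because LM Studio serves an OpenAI-compatible API, the installed models can be listed with a plain GET /v1/models request. A minimal sketch (the helper names are illustrative, not part of the IDE):

```typescript
// Shape of an OpenAI-compatible GET /v1/models response (extra fields omitted).
interface ModelsResponse { data: { id: string }[] }

// Extract the plain model identifiers from the payload.
function modelIds(payload: ModelsResponse): string[] {
  return payload.data.map((m) => m.id);
}

// Query a running LM Studio server for its loaded/available models.
async function listLmStudioModels(
  base = "http://host.docker.internal:1234/v1"
): Promise<string[]> {
  const res = await fetch(`${base}/models`);
  return modelIds(await res.json() as ModelsResponse);
}
```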

3. Ollama (Local)

Description: Lightweight local LLM runner

Configuration:

# Install Ollama on Windows host
# Download from: https://ollama.ai

# Start Ollama server
ollama serve

# Environment variable (optional)
export OLLAMA_API="http://host.docker.internal:11434/v1"

Available Models: Automatically detected from Ollama. Common models:

  • llama2
  • codellama
  • mistral
  • mixtral
  • phi

Pull models:

ollama pull llama2
ollama pull codellama
ollama pull mistral
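The automatic detection mentioned above can be done against Ollama's GET /api/tags endpoint, which returns `{ models: [{ name, ... }] }`. A minimal sketch of deciding whether a pull is needed (helper names are illustrative):

```typescript
// Shape of Ollama's GET /api/tags response (fields beyond `name` omitted).
interface OllamaTags { models: { name: string }[] }

// Names of all locally installed models.
function installedModels(tags: OllamaTags): string[] {
  return tags.models.map((m) => m.name);
}

// True when `model` is missing locally and `ollama pull <model>` is required.
function needsPull(tags: OllamaTags, model: string): boolean {
  return !installedModels(tags).includes(model);
}

// Fetch the tag list from a running Ollama server.
async function fetchTags(
  base = "http://host.docker.internal:11434"
): Promise<OllamaTags> {
  const res = await fetch(`${base}/api/tags`);
  return await res.json() as OllamaTags;
}
```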

Recommended for:

  • Quick local experiments
  • Lightweight models
  • Fast iteration
  • Low resource usage

4. OpenAI (Cloud)

Description: GPT-4, GPT-3.5, and other OpenAI models

Configuration:

# Get API key from: https://platform.openai.com/api-keys

# Environment variable
export OPENAI_API_KEY="sk-..."

# Or in .env
OPENAI_API_KEY=sk-...

Available Models:

  • gpt-4-turbo-preview - Latest GPT-4
  • gpt-4 - GPT-4 (8K context)
  • gpt-3.5-turbo - Fast and cost-effective

Pricing (approximate):

  • GPT-4 Turbo: $0.01/1K input, $0.03/1K output
  • GPT-4: $0.03/1K input, $0.06/1K output
  • GPT-3.5 Turbo: $0.0005/1K input, $0.0015/1K output
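These per-1K-token rates make rough budgeting easy to script. A back-of-envelope estimator using the approximate numbers above (rates drift over time, so treat them as illustrative, not billing-accurate):

```typescript
// Approximate USD rates per 1K tokens, taken from the list above.
const RATES_PER_1K: Record<string, { input: number; output: number }> = {
  "gpt-4-turbo-preview": { input: 0.01,   output: 0.03 },
  "gpt-4":               { input: 0.03,   output: 0.06 },
  "gpt-3.5-turbo":       { input: 0.0005, output: 0.0015 },
};

// Rough cost of one request, given input/output token counts.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const r = RATES_PER_1K[model];
  if (!r) throw new Error(`no rate table entry for ${model}`);
  return (inputTokens / 1000) * r.input + (outputTokens / 1000) * r.output;
}
```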

Recommended for:

  • Production applications
  • Complex reasoning tasks
  • When quality matters most

5. Anthropic Claude (Cloud)

Description: Claude 3 Opus, Sonnet models via API

Configuration:

# Get API key from: https://console.anthropic.com/

# Environment variable
export ANTHROPIC_API_KEY="sk-ant-..."

# Or in .env
ANTHROPIC_API_KEY=sk-ant-...

Available Models:

  • claude-3-5-sonnet-20241022 - Latest Sonnet (best balance)
  • claude-3-opus-20240229 - Most capable
  • claude-3-sonnet-20240229 - Fast and efficient

Pricing (approximate):

  • Opus: $15/MTok input, $75/MTok output
  • Sonnet 3.5: $3/MTok input, $15/MTok output

Recommended for:

  • Long-context tasks (200K tokens)
  • Nuanced understanding
  • Complex analysis

6. Google Gemini (Cloud)

Description: Google's multimodal AI models

Configuration:

# Get API key from: https://makersuite.google.com/app/apikey

# Environment variable
export GEMINI_API_KEY="..."

# Or in .env
GEMINI_API_KEY=...

Available Models:

  • gemini-pro - Text generation
  • gemini-ultra - Most capable (when available)

Pricing (approximate):

  • Gemini Pro: Free tier available, then $0.00025/1K chars

Recommended for:

  • Multimodal tasks (future)
  • Cost-effective cloud inference
  • Google ecosystem integration

7. xAI Grok (Cloud)

Description: Grok models from xAI

Configuration:

# Get API key from: https://x.ai

# Environment variable
export GROK_API_KEY="..."

# Or in .env
GROK_API_KEY=...

Available Models:

  • grok-beta - Latest Grok model

Recommended for:

  • Real-time information (when connected)
  • Experimental features
  • X platform integration

Usage Examples

Single Mode (One Provider)

// Use Claude Code (default, no API key)
const claudeResponse = await llmService.chatCompletion(
  'claude-code',
  [{ role: 'user', content: 'Explain async/await' }]
);

// Use LM Studio (local)
const lmStudioResponse = await llmService.chatCompletion(
  'qwen/qwq-32b',
  [{ role: 'user', content: 'Explain async/await' }]
);

// Use OpenAI (cloud, requires API key)
const openaiResponse = await llmService.chatCompletion(
  'gpt-4-turbo-preview',
  [{ role: 'user', content: 'Explain async/await' }]
);

Parallel Mode (Compare Providers)

// Compare local vs cloud
const [localResponse, cloudResponse] = await Promise.all([
  llmService.chatCompletion('qwen/qwq-32b', messages),        // LM Studio
  llmService.chatCompletion('gpt-4-turbo-preview', messages)  // OpenAI
]);

// Compare multiple local providers
const [lmStudioResponse, ollamaResponse, claudeResponse] = await Promise.all([
  llmService.chatCompletion('qwen/qwq-32b', messages),  // LM Studio
  llmService.chatCompletion('mistral', messages),       // Ollama
  llmService.chatCompletion('claude-code', messages)    // Claude Code
]);

Sequential Mode (Chain Providers)

// LM Studio generates → Claude Code reviews
const code = await llmService.chatCompletion(
  'qwen/qwen3-coder-30b',
  [{ role: 'user', content: 'Write a React component' }]
);

const review = await llmService.chatCompletion(
  'claude-code',
  [{ role: 'user', content: `Review this code:\n${code}` }]
);

// Ollama drafts → OpenAI refines
const draft = await llmService.chatCompletion(
  'llama2',
  [{ role: 'user', content: 'Draft a product description' }]
);

const refined = await llmService.chatCompletion(
  'gpt-4',
  [{ role: 'user', content: `Improve:\n${draft}` }]
);

Consensus Mode (Multi-Provider Synthesis)

// Get opinions from multiple providers
const responses = await Promise.all([
  llmService.chatCompletion('qwen/qwq-32b', messages),               // LM Studio
  llmService.chatCompletion('claude-3-5-sonnet-20241022', messages), // Anthropic
  llmService.chatCompletion('gpt-4-turbo-preview', messages)         // OpenAI
]);

// Synthesize with Claude Code
const synthesis = await llmService.chatCompletion(
  'claude-code',
  [{
    role: 'user',
    content: `Synthesize the best answer from:\n\n${responses.join('\n\n---\n\n')}`
  }]
);

Provider Selection Strategy

By Use Case

Privacy-Sensitive Work:

  1. ✅ Claude Code (local CLI)
  2. ✅ LM Studio (local inference)
  3. ✅ Ollama (local lightweight)

Cost Optimization:

  1. ✅ Claude Code (free with Max)
  2. ✅ LM Studio (free local)
  3. ✅ Ollama (free local)
  4. ✅ Gemini Pro (free tier)
  5. OpenAI GPT-3.5 (cheapest cloud)

Quality Priority:

  1. ✅ Claude Code (with tools)
  2. Claude 3.5 Sonnet
  3. GPT-4 Turbo
  4. Claude 3 Opus

Speed Priority:

  1. ✅ Claude Code (local)
  2. ✅ Ollama (lightweight)
  3. GPT-3.5 Turbo
  4. ✅ LM Studio (if GPU)

Offline Capability:

  1. ✅ Claude Code
  2. ✅ LM Studio
  3. ✅ Ollama
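The priority lists above can be collapsed into a simple picker: walk the ranked list for a use case and return the first provider that is actually available. A sketch (the provider identifiers and function names are illustrative):

```typescript
// Use cases and rankings taken from the lists above.
type UseCase = "privacy" | "cost" | "quality" | "speed" | "offline";

const PRIORITY: Record<UseCase, string[]> = {
  privacy: ["claude-code", "lm-studio", "ollama"],
  cost:    ["claude-code", "lm-studio", "ollama", "gemini-pro", "gpt-3.5-turbo"],
  quality: ["claude-code", "claude-3-5-sonnet", "gpt-4-turbo", "claude-3-opus"],
  speed:   ["claude-code", "ollama", "gpt-3.5-turbo", "lm-studio"],
  offline: ["claude-code", "lm-studio", "ollama"],
};

// First available provider in ranked order, or undefined if none match.
function pickProvider(useCase: UseCase, available: Set<string>): string | undefined {
  return PRIORITY[useCase].find((p) => available.has(p));
}
```

Availability checking (is the server up, is the key set) is left to the caller.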

Configuration Best Practices

Environment Variables

Create .env file in project root:

# Local providers (optional overrides)
LM_STUDIO_API=http://host.docker.internal:1234/v1
OLLAMA_API=http://host.docker.internal:11434/v1

# Cloud providers (required for cloud models)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=...
GROK_API_KEY=...
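A startup check can surface unset keys before a cloud model is first called. A minimal sketch using the variable names from the .env example above (the mapping and helper are assumptions, not IDE internals):

```typescript
// Which environment variable each cloud provider needs, per the .env example.
const KEY_FOR_PROVIDER: Record<string, string> = {
  openai:    "OPENAI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
  gemini:    "GEMINI_API_KEY",
  grok:      "GROK_API_KEY",
};

// Names of required variables that are unset or empty in the given env.
function missingKeys(env: Record<string, string | undefined>): string[] {
  return Object.values(KEY_FOR_PROVIDER).filter((k) => !env[k]);
}
```

Call it with `process.env` at startup and warn about each missing entry; local providers need no keys and are skipped entirely.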

Security

⚠️ Never commit API keys to git!

# .gitignore
.env
.env.local
.env.*.local

Use environment variables:

# Linux/Mac
export OPENAI_API_KEY="sk-..."

# Windows
set OPENAI_API_KEY=sk-...

Cost Management

Free Options:

  • ✅ Claude Code (with Max subscription)
  • ✅ LM Studio (100% free)
  • ✅ Ollama (100% free)
  • ✅ Gemini Pro (free tier)

Paid Options (use wisely):

  • Set budget alerts in provider dashboards
  • Monitor usage in IDE
  • Use cheaper models for drafts
  • Use expensive models for final output

Provider Comparison

| Feature | Claude Code | LM Studio | Ollama | OpenAI | Anthropic | Gemini |
|---|---|---|---|---|---|---|
| Cost | Free* | Free | Free | Paid | Paid | Free tier |
| Privacy | ✅ Local | ✅ Local | ✅ Local | ❌ Cloud | ❌ Cloud | ❌ Cloud |
| Offline | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No | ❌ No | ❌ No |
| Tools | ✅ Yes | ❌ No | ❌ No | ✅ Yes | ✅ Yes | ❌ Limited |
| Max Context | 200K | Varies | Varies | 128K | 200K | 32K |
| Speed | Fast | Fast** | Very Fast | Medium | Medium | Fast |
| Quality | Excellent | Good | Good | Excellent | Excellent | Good |

*With Claude Max subscription. **Depends on hardware (GPU recommended).


Troubleshooting

Claude Code Not Working

# Check Claude Code CLI
claude --version

# Verify Max subscription
claude account

# Test connection
claude chat "Hello"

LM Studio Connection Failed

# Check LM Studio is running
curl http://host.docker.internal:1234/v1/models

# Verify environment variable
echo $LM_STUDIO_API

# Check Docker network
docker network ls

Ollama Not Detected

# Check Ollama is running
ollama list

# Verify API endpoint
curl http://host.docker.internal:11434/api/tags

# Pull models if needed
ollama pull llama2

Cloud Provider API Errors

# Check API key is set
echo $OPENAI_API_KEY

# Verify key format
# OpenAI: sk-...
# Anthropic: sk-ant-...
# Gemini: AIza...

# Test API directly
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer $OPENAI_API_KEY"
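The prefix conventions above can be turned into a quick heuristic check before sending a request. A sketch (key prefixes can change, so treat this as a sanity check, not real validation):

```typescript
// Heuristic key-prefix check based on the formats listed above:
// OpenAI keys start with "sk-", Anthropic with "sk-ant-", Gemini with "AIza".
function looksLikeValidKey(
  provider: "openai" | "anthropic" | "gemini",
  key: string
): boolean {
  if (provider === "anthropic") return key.startsWith("sk-ant-");
  // "sk-ant-" also starts with "sk-", so exclude it for OpenAI.
  if (provider === "openai") return key.startsWith("sk-") && !key.startsWith("sk-ant-");
  return key.startsWith("AIza");
}
```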

Future Enhancements

  • Provider performance metrics
  • Cost tracking per provider
  • Automatic provider failover
  • Provider-specific settings UI
  • Model recommendations
  • Usage analytics dashboard


Last Updated: 2025-10-06 · Maintained By: AZ1.AI Development Team