ADR-201: Multi-Provider LLM Gateway
Status: Accepted Date: 2026-02-15 Author: Hal Casteel Track: H.3.10
Context
Problem Statement
Claude Code is locked to Anthropic's API by default. However, multiple LLM providers now expose Anthropic Messages API-compatible endpoints, meaning the same anthropic SDK works with only a base_url change. Additionally, proxy services like OpenRouter provide Anthropic-compatible access to 400+ models from providers that don't natively support the Anthropic API format.
Users need a zero-friction way to switch between providers for:
- Cost optimization — DeepSeek input tokens cost $0.28/M vs Anthropic's ~$3/M (10x cheaper)
- Capability matching — MiniMax scores 80.2% on SWE-Bench; DeepSeek excels at reasoning
- Redundancy — Provider outages shouldn't block development
- Experimentation — Compare model quality on the same task across providers
Current State
- ADR-200 added MiniMax as the 5th provider in coditect-core's MoE system (backend)
- Claude Code supports `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` env var overrides
- `ANTHROPIC_AUTH_TOKEN` takes priority over `ANTHROPIC_API_KEY` in the auth chain
- No built-in Claude Code mechanism for provider switching (no `--provider` flag)
Requirements
- Switch providers with a single command (no file edits)
- Maintain existing Anthropic API key for default Claude usage
- Securely store per-provider API keys
- Support both direct providers and proxy gateways
- Allow model override at invocation time
Decision
Implement a shell alias-based multi-provider gateway using Claude Code's native environment variable override mechanism. Each provider gets a dedicated alias (claude-{provider}) that sets the required env vars inline.
Architecture
┌──────────────────────────────────────────────────────────┐
│ User's Terminal │
│ │
│ claude → Anthropic (default, ANTHROPIC_API_KEY)│
│ claude-minimax → MiniMax (api.minimax.io/anthropic) │
│ claude-deepseek → DeepSeek (api.deepseek.com/anthropic)│
│ claude-kimi → Kimi (api.kimi.com/coding) │
│ claude-glm → GLM/z.ai (api.z.ai/api/anthropic) │
│ claude-openrouter → OpenRouter (openrouter.ai/api) │
│ │
│ All aliases → same `claude` binary │
│ Different env vars → different LLM backend │
└──────────────────────────────────────────────────────────┘
Environment Variable Chain
Claude Code resolves credentials in this priority order:
1. `ANTHROPIC_AUTH_TOKEN` (highest — used by aliases)
2. `ANTHROPIC_API_KEY` (default — set in `.zshrc`)
3. Settings file (`~/.claude/settings.json`)
This means aliases using ANTHROPIC_AUTH_TOKEN naturally override the default ANTHROPIC_API_KEY without unsetting it.
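This layering is ordinary shell scoping: an assignment prefixed to a command is exported only into that command's environment and never touches the parent shell. A minimal sketch with placeholder values (not real credentials):

```shell
# Placeholder values only — demonstrates scoping, not real keys.
unset ANTHROPIC_AUTH_TOKEN
export ANTHROPIC_API_KEY="sk-ant-default"

# The inline assignment is visible to the child process...
seen_by_child="$(ANTHROPIC_AUTH_TOKEN="provider-token" sh -c 'echo "$ANTHROPIC_AUTH_TOKEN"')"

# ...but the parent shell keeps its default key and never sees the token.
still_default="$ANTHROPIC_API_KEY"
after="${ANTHROPIC_AUTH_TOKEN:-unset}"
```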
Alias Pattern
Each alias follows this template:
```shell
# Standard pattern (API key from file)
alias claude-{provider}='\
  ANTHROPIC_BASE_URL="{endpoint}" \
  ANTHROPIC_AUTH_TOKEN="$(cat ~/.{Provider}AZ1.api.key 2>/dev/null | tr -d '"'"'[:space:]'"'"')" \
  ANTHROPIC_MODEL="{primary_model}" \
  ANTHROPIC_SMALL_FAST_MODEL="{fast_model}" \
  CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 \
  claude'

# Function pattern (Kimi — auto-starts token-refreshing proxy)
# Note: zsh requires `function name-with-hyphens { }` syntax, not `name() { }`
function claude-kimi {
  kimi-proxy --check >/dev/null 2>&1 || kimi-proxy
  ANTHROPIC_BASE_URL="http://127.0.0.1:18462" \
  ANTHROPIC_AUTH_TOKEN="$(kimi-token 2>/dev/null)" \
  ANTHROPIC_MODEL="kimi-k2.5" \
  ANTHROPIC_SMALL_FAST_MODEL="kimi-k2.5" \
  CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 \
  claude "$@"
}
```
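For reference, instantiating the standard pattern with the DeepSeek endpoint and model names from the provider table below yields a sketch like this (a `~/.zshrc` fragment; it assumes the key file exists at `~/.DeepSeekAZ1.api.key`):

```shell
# Standard pattern filled in for DeepSeek (endpoint and models per the
# provider table). Requires ~/.DeepSeekAZ1.api.key to exist.
alias claude-deepseek='\
  ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic" \
  ANTHROPIC_AUTH_TOKEN="$(cat ~/.DeepSeekAZ1.api.key 2>/dev/null | tr -d '"'"'[:space:]'"'"')" \
  ANTHROPIC_MODEL="deepseek-reasoner" \
  ANTHROPIC_SMALL_FAST_MODEL="deepseek-chat" \
  CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 \
  claude'
```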
Authentication
API Key Pattern (MiniMax, DeepSeek, GLM, OpenRouter)
API keys are stored in home directory files following the pattern ~/.{Provider}AZ1.api.key:
| Provider | Key File | Key Source |
|---|---|---|
| MiniMax | ~/.MiniMaxAZ1.api.key | https://platform.minimax.io |
| DeepSeek | ~/.DeepSeekAZ1.api.key | https://platform.deepseek.com |
| GLM | ~/.GLMAZ1.api.key | https://open.z.ai |
| OpenRouter | ~/.OpenRouterAZ1.api.key | https://openrouter.ai/settings/keys |
Security: Key files should be `chmod 600` (readable and writable by the owner only). The `cat ... | tr -d '[:space:]'` pipeline strips trailing newlines or other whitespace from key files.
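The read-and-strip pipeline can be exercised on its own; this sketch substitutes a temporary file for a real key file:

```shell
# Simulate a key file saved with a trailing newline (a common copy-paste artifact).
keyfile="$(mktemp)"                 # stands in for ~/.{Provider}AZ1.api.key
printf 'sk-demo-key\n' > "$keyfile"
chmod 600 "$keyfile"                # owner read/write only

# Same pipeline the aliases use: read the key and strip all whitespace.
key="$(cat "$keyfile" 2>/dev/null | tr -d '[:space:]')"

rm -f "$keyfile"
```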
OAuth Pattern (Kimi)
Kimi uses OAuth device flow authentication (NOT platform API keys):
- Login: run the `kimi` CLI → triggers browser-based OAuth login
- Credentials: OAuth tokens stored at `~/.kimi/credentials/kimi-code.json`
- Token refresh: the `kimi-token` helper script (`~/.local/bin/kimi-token`) auto-refreshes expired tokens using the refresh_token (30-day validity)
- Access token: ~15-minute validity, refreshed on each `claude-kimi` invocation
Why OAuth? Kimi's api.kimi.com/coding endpoint is Anthropic Messages API-compatible but only accepts OAuth access tokens, not platform API keys (sk-kimi-* keys are for the separate api.moonshot.ai platform).
Token lifecycle:
kimi login → OAuth device flow → access_token (15m) + refresh_token (30d)
↓
claude-kimi → starts kimi-proxy (if not running) → ANTHROPIC_BASE_URL=localhost:18462
↓
kimi-proxy (daemon on port 18462) ←── Claude Code sends requests
↓
reads fresh token from ~/.kimi/credentials/kimi-code.json
↓ (refreshes via auth.kimi.com if <5m remaining)
forwards request to api.kimi.com/coding with fresh token
↓
streams response back to Claude Code
Token-refreshing proxy (kimi-proxy):
- `~/.local/bin/kimi-proxy` — local HTTP proxy that injects fresh OAuth tokens per request
- Auto-started by the `claude-kimi` function if not already running
- Daemon on `127.0.0.1:18462`, PID file at `~/.local/run/kimi-proxy.pid`
- Supports SSE streaming for real-time responses
- Sessions run indefinitely — no 15-minute token-expiry limit
- Stop with: `kimi-proxy --stop`
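The auto-start check can be approximated from the documented PID-file location alone. `proxy_running` below is a hypothetical helper sketching that logic, not the actual `kimi-proxy --check` implementation:

```shell
# Sketch: is a daemon alive, judging by its PID file?
proxy_running() {
  pidfile="${1:-$HOME/.local/run/kimi-proxy.pid}"
  [ -f "$pidfile" ] || return 1
  pid="$(cat "$pidfile" 2>/dev/null)"
  # kill -0 delivers no signal; it only tests whether the process exists.
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Usage sketch mirroring the claude-kimi function:
# proxy_running || kimi-proxy
```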
Provider Configuration
Direct Providers (Anthropic Messages API-compatible)
| Provider | Base URL | Primary Model | Fast Model | Context | Input $/M | Output $/M |
|---|---|---|---|---|---|---|
| MiniMax | https://api.minimax.io/anthropic | MiniMax-M2.5 | MiniMax-M2.5 | 204K | $0.15 | $1.20 |
| DeepSeek | https://api.deepseek.com/anthropic | deepseek-reasoner | deepseek-chat | 128K | $0.28 | $0.42 |
| Kimi | https://api.kimi.com/coding | kimi-k2.5 | kimi-k2.5 | 128K | $0.60 | $2.50 |
| GLM | https://api.z.ai/api/anthropic | glm-4.6 | glm-4.5-air | 128K | $0.55 | $2.20 |
Proxy Provider (Universal Gateway)
| Provider | Base URL | Default Model | Access To |
|---|---|---|---|
| OpenRouter | https://openrouter.ai/api | anthropic/claude-sonnet-4 | 400+ models |
OpenRouter model override at invocation:
```shell
ANTHROPIC_MODEL="google/gemini-2.5-pro" claude-openrouter
ANTHROPIC_MODEL="openai/gpt-4o" claude-openrouter
ANTHROPIC_MODEL="meta-llama/llama-4-maverick" claude-openrouter
```
Provider Quirks
| Provider | Quirk | Mitigation |
|---|---|---|
| MiniMax | Returns ThinkingBlock + TextBlock by default (no extended thinking requested) | Iterate response.content for type == "text", never assume content[0] |
| MiniMax | Temperature must be > 0.0 (exclusive range) | Guard with max(0.01, temperature) |
| MiniMax | No vision/image input | Text-only tasks |
| DeepSeek | Cache hits at $0.028/M (10x cheaper than uncached) | Prefer for repetitive tasks |
| Kimi | OAuth-only auth — sk-kimi-* platform keys rejected on api.kimi.com/coding | Use kimi-token helper for OAuth access tokens |
| Kimi | Access tokens expire in ~15 minutes | kimi-proxy auto-refreshes per-request; sessions run indefinitely |
| Kimi | Accepts kimi-k2.5 model name but returns kimi-for-coding in response | Parse model field accordingly |
| OpenRouter | 5.5% fee on credit purchases (not per-token) | Buy credits in bulk |
| OpenRouter | Model names prefixed with provider/ | Use anthropic/claude-sonnet-4 format |
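The MiniMax ThinkingBlock quirk is easy to guard against at the response-parsing layer. A sketch using `jq` on a simulated response (the block shapes follow the Anthropic Messages format; the payload itself is made up):

```shell
# Simulated response: a thinking block arrives BEFORE the text block,
# so content[0] is not the answer.
response='{"content":[{"type":"thinking","thinking":"..."},{"type":"text","text":"Hello from MiniMax"}]}'

# Select only the text blocks instead of assuming content[0]:
answer="$(printf '%s' "$response" \
  | jq -r '[.content[] | select(.type == "text") | .text] | join("")')"
```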
SDK URL Construction
Critical: The Anthropic SDK appends /v1/messages to the base URL. Therefore, ANTHROPIC_BASE_URL must NOT include /v1:
```
ANTHROPIC_BASE_URL="https://api.kimi.com/coding"
  ↓ SDK appends /v1/messages
Final URL: https://api.kimi.com/coding/v1/messages ✓

ANTHROPIC_BASE_URL="https://api.kimi.com/coding/v1"   ← WRONG
  ↓ SDK appends /v1/messages
Final URL: https://api.kimi.com/coding/v1/v1/messages ✗ (404)
```
Env Var Reference
| Variable | Purpose | Example |
|---|---|---|
| `ANTHROPIC_BASE_URL` | Provider API endpoint | `https://api.deepseek.com/anthropic` |
| `ANTHROPIC_AUTH_TOKEN` | Provider API key (overrides `ANTHROPIC_API_KEY`) | Read from key file |
| `ANTHROPIC_MODEL` | Primary model for main tasks | `deepseek-reasoner` |
| `ANTHROPIC_SMALL_FAST_MODEL` | Fast model for subtasks/subagents | `deepseek-chat` |
| `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC` | Disable telemetry to Anthropic | `1` |
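When debugging which backend a session will hit, a tiny hypothetical helper (not a Claude Code feature) can echo the effective settings before launching:

```shell
# Hypothetical diagnostic helper (not part of Claude Code): reports which
# backend the current environment would select. Underscored name for
# portability; an interactive zsh version could be hyphenated like the aliases.
claude_env() {
  echo "endpoint: ${ANTHROPIC_BASE_URL:-api.anthropic.com (default)}"
  echo "model: ${ANTHROPIC_MODEL:-provider default}"
}
```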
Consequences
Positive
- Zero friction — switch providers with a single command, no file edits
- No interference — the default `claude` command is unchanged; aliases are additive
- Secure — keys live in separate files; `ANTHROPIC_AUTH_TOKEN` overrides without unsetting the base key
- Extensible — new providers are added with one alias plus one key file
- Cost savings — DeepSeek is ~10x cheaper than Anthropic for many tasks
- Redundancy — 5 independent providers, any outage has 4 fallbacks
Negative
- Manual key management — users must obtain and save API keys per provider
- No auto-routing — user must choose provider (unlike MoE backend which routes automatically)
- Provider-specific quirks — ThinkingBlock, temperature ranges, model naming vary
- Shell-specific — aliases work in zsh/bash but not in IDE integrations or scripts
Neutral
- Telemetry disabled — `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` prevents Anthropic usage tracking for non-Anthropic providers
- Same Claude Code binary — all aliases invoke the same `claude` CLI; only the backend differs
- OpenRouter double-hop — requests through OpenRouter add latency vs direct provider connections
Alternatives Considered
1. Claude Code settings.json Profiles
Configure multiple profiles in ~/.claude/settings.json with a --profile flag.
Rejected: Claude Code doesn't support profiles. Would require upstream changes.
2. LiteLLM Proxy
Run a local LiteLLM proxy that translates between API formats. Rejected: Adds infrastructure (local proxy process), complexity, and latency. Unnecessary since target providers already support Anthropic API format.
3. Wrapper Script (claude-provider)
A shell script that reads a config file and sets env vars dynamically. Rejected: Over-engineered for 5 providers. Shell aliases are simpler, more transparent, and easier to debug.
4. Environment Module System
Use direnv or autoenv to set provider per project directory.
Rejected: Project-level provider locking is too rigid. Users want to choose per-session.
Related Decisions
| ADR | Relationship |
|---|---|
| ADR-073 | MoE Provider Flexibility — backend multi-provider support |
| ADR-122 | Unified LLM Component Architecture — per-provider directory structure |
| ADR-190 | Cross-LLM Bridge Architecture — vendor-agnostic orchestration layer |
| ADR-200 | MiniMax Provider Integration — first Anthropic-compatible third-party provider |
Implementation
Completed
- 5 shell aliases/functions added to `~/.zshrc`
- `claude-minimax` tested and working (MiniMax-M2.5)
- ThinkingBlock fix deployed (`f638bb56`)
- Key file convention established (`~/.{Provider}AZ1.api.key`)
- Kimi OAuth auth via the `kimi-token` helper (`~/.local/bin/kimi-token`)
- `kimi-proxy` token-refreshing proxy (`~/.local/bin/kimi-proxy`) — indefinite session support
- `claude-kimi` tested and working (kimi-k2.5 via proxy)
Pending (User Action)
- Obtain DeepSeek API key → save to `~/.DeepSeekAZ1.api.key`
- Obtain OpenRouter API key → save to `~/.OpenRouterAZ1.api.key`
- Obtain GLM API key → save to `~/.GLMAZ1.api.key`
Track: H.3.10 (Multi-Provider LLM Gateway) Author: Hal Casteel Updated: 2026-02-15