ADR-201: Multi-Provider LLM Gateway
Status: Accepted Date: 2026-02-15 Author: Hal Casteel Track: H.3.10
Context
Problem Statement
Claude Code is locked to Anthropic's API by default. However, multiple LLM providers now expose Anthropic Messages API-compatible endpoints, meaning the same anthropic SDK works with only a base_url change. Additionally, proxy services like OpenRouter provide Anthropic-compatible access to 400+ models from providers that don't natively support the Anthropic API format.
Users need a zero-friction way to switch between providers for:
- Cost optimization — DeepSeek input tokens cost $0.28/M vs Anthropic's ~$3/M (10x cheaper)
- Capability matching — MiniMax scores 80.2% on SWE-Bench; DeepSeek excels at reasoning
- Redundancy — Provider outages shouldn't block development
- Experimentation — Compare model quality on the same task across providers
Current State
- ADR-200 added MiniMax as the 5th provider in coditect-core's MoE system (backend)
- Claude Code supports `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` env var overrides
- `ANTHROPIC_AUTH_TOKEN` takes priority over `ANTHROPIC_API_KEY` in the auth chain
- No built-in Claude Code mechanism for provider switching (no `--provider` flag)
Requirements
- Switch providers with a single command (no file edits)
- Maintain existing Anthropic API key for default Claude usage
- Securely store per-provider API keys
- Support both direct providers and proxy gateways
- Allow model override at invocation time
Decision
Implement a shell alias-based multi-provider gateway using Claude Code's native environment variable override mechanism. Each provider gets a dedicated alias (claude-{provider}) that sets the required env vars inline.
Architecture
┌──────────────────────────────────────────────────────────┐
│ User's Terminal │
│ │
│ claude → Anthropic (default, ANTHROPIC_API_KEY)│
│ claude-minimax → MiniMax (api.minimax.io/anthropic) │
│ claude-deepseek → DeepSeek (api.deepseek.com/anthropic)│
│ claude-kimi → Kimi (api.kimi.com/coding) │
│ claude-glm → GLM/z.ai (api.z.ai/api/anthropic) │
│ claude-openrouter → OpenRouter (openrouter.ai/api) │
│ │
│ All aliases → same `claude` binary │
│ Different env vars → different LLM backend │
└──────────────────────────────────────────────────────────┘
Environment Variable Chain
Claude Code resolves credentials in this priority order:
1. `ANTHROPIC_AUTH_TOKEN` (highest — used by aliases)
2. `ANTHROPIC_API_KEY` (default — set in `.zshrc`)
3. Settings file (`~/.claude/settings.json`)
This means aliases using ANTHROPIC_AUTH_TOKEN naturally override the default ANTHROPIC_API_KEY without unsetting it.
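This layering is ordinary shell scoping: an assignment prefixed to a command is exported only into that command's environment and never touches the parent shell. A minimal sketch with placeholder values (not real credentials):

```shell
# Placeholder values only — demonstrates scoping, not real keys.
unset ANTHROPIC_AUTH_TOKEN
export ANTHROPIC_API_KEY="sk-ant-default"

# The inline assignment is visible to the child process...
seen_by_child="$(ANTHROPIC_AUTH_TOKEN="provider-token" sh -c 'echo "$ANTHROPIC_AUTH_TOKEN"')"

# ...but the parent shell keeps its default key and never sees the token.
still_default="$ANTHROPIC_API_KEY"
after="${ANTHROPIC_AUTH_TOKEN:-unset}"
```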
Alias Pattern
Each alias follows this template:
```shell
# Standard pattern (API key from file)
alias claude-{provider}='\
  ANTHROPIC_BASE_URL="{endpoint}" \
  ANTHROPIC_AUTH_TOKEN="$(cat ~/.{Provider}AZ1.api.key 2>/dev/null | tr -d '"'"'[:space:]'"'"')" \
  ANTHROPIC_MODEL="{primary_model}" \
  ANTHROPIC_SMALL_FAST_MODEL="{fast_model}" \
  CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 \
  claude'

# Function pattern (Kimi — auto-starts token-refreshing proxy)
# Note: zsh requires `function name-with-hyphens { }` syntax, not `name() { }`
function claude-kimi {
  kimi-proxy --check >/dev/null 2>&1 || kimi-proxy
  ANTHROPIC_BASE_URL="http://127.0.0.1:18462" \
  ANTHROPIC_AUTH_TOKEN="$(kimi-token 2>/dev/null)" \
  ANTHROPIC_MODEL="kimi-k2.5" \
  ANTHROPIC_SMALL_FAST_MODEL="kimi-k2.5" \
  CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 \
  claude "$@"
}
```
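For reference, instantiating the standard pattern with the DeepSeek endpoint and model names from the provider table below yields a sketch like this (a `~/.zshrc` fragment; it assumes the key file exists at `~/.DeepSeekAZ1.api.key`):

```shell
# Standard pattern filled in for DeepSeek (endpoint and models per the
# provider table). Requires ~/.DeepSeekAZ1.api.key to exist.
alias claude-deepseek='\
  ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic" \
  ANTHROPIC_AUTH_TOKEN="$(cat ~/.DeepSeekAZ1.api.key 2>/dev/null | tr -d '"'"'[:space:]'"'"')" \
  ANTHROPIC_MODEL="deepseek-reasoner" \
  ANTHROPIC_SMALL_FAST_MODEL="deepseek-chat" \
  CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 \
  claude'
```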
Authentication
API Key Pattern (MiniMax, DeepSeek, GLM, OpenRouter)
API keys are stored in home directory files following the pattern ~/.{Provider}AZ1.api.key:
| Provider | Key File | Key Source |
|---|---|---|
| MiniMax | ~/.MiniMaxAZ1.api.key | https://platform.minimax.io |
| DeepSeek | ~/.DeepSeekAZ1.api.key | https://platform.deepseek.com |
| GLM | ~/.GLMAZ1.api.key | https://open.z.ai |
| OpenRouter | ~/.OpenRouterAZ1.api.key | https://openrouter.ai/settings/keys |
Security: Key files should be `chmod 600` (readable and writable by the owner only). The `cat ... | tr -d '[:space:]'` pipeline strips trailing newlines or other whitespace from key files.
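The read-and-strip pipeline can be exercised on its own; this sketch substitutes a temporary file for a real key file:

```shell
# Simulate a key file saved with a trailing newline (a common copy-paste artifact).
keyfile="$(mktemp)"                 # stands in for ~/.{Provider}AZ1.api.key
printf 'sk-demo-key\n' > "$keyfile"
chmod 600 "$keyfile"                # owner read/write only

# Same pipeline the aliases use: read the key and strip all whitespace.
key="$(cat "$keyfile" 2>/dev/null | tr -d '[:space:]')"

rm -f "$keyfile"
```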
OAuth Pattern (Kimi)
Kimi uses OAuth device flow authentication (NOT platform API keys):
- Login: run the `kimi` CLI → triggers browser-based OAuth login
- Credentials: OAuth tokens stored at `~/.kimi/credentials/kimi-code.json`
- Token refresh: the `kimi-token` helper script (`~/.local/bin/kimi-token`) auto-refreshes expired tokens using the refresh_token (30-day validity)
- Access token: ~15-minute validity, refreshed on each `claude-kimi` invocation
Why OAuth? Kimi's api.kimi.com/coding endpoint is Anthropic Messages API-compatible but only accepts OAuth access tokens, not platform API keys (sk-kimi-* keys are for the separate api.moonshot.ai platform).
Token lifecycle:
kimi login → OAuth device flow → access_token (15m) + refresh_token (30d)
↓
claude-kimi → starts kimi-proxy (if not running) → ANTHROPIC_BASE_URL=localhost:18462
↓
kimi-proxy (daemon on port 18462) ←── Claude Code sends requests
↓
reads fresh token from ~/.kimi/credentials/kimi-code.json
↓ (refreshes via auth.kimi.com if <5m remaining)
forwards request to api.kimi.com/coding with fresh token
↓
streams response back to Claude Code
Token-refreshing proxy (kimi-proxy):
- `~/.local/bin/kimi-proxy` — local HTTP proxy that injects fresh OAuth tokens per request
- Auto-started by the `claude-kimi` function if not already running
- Daemon on `127.0.0.1:18462`, PID file at `~/.local/run/kimi-proxy.pid`
- Supports SSE streaming for real-time responses
- Sessions run indefinitely — no 15-minute token-expiry limit
- Stop with: `kimi-proxy --stop`
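The auto-start check can be approximated from the documented PID-file location alone. `proxy_running` below is a hypothetical helper sketching that logic, not the actual `kimi-proxy --check` implementation:

```shell
# Sketch: is a daemon alive, judging by its PID file?
proxy_running() {
  pidfile="${1:-$HOME/.local/run/kimi-proxy.pid}"
  [ -f "$pidfile" ] || return 1
  pid="$(cat "$pidfile" 2>/dev/null)"
  # kill -0 delivers no signal; it only tests whether the process exists.
  [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Usage sketch mirroring the claude-kimi function:
# proxy_running || kimi-proxy
```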
Provider Configuration
Direct Providers (Anthropic Messages API-compatible)
| Provider | Base URL | Primary Model | Fast Model | Context | Input $/M | Output $/M |
|---|---|---|---|---|---|---|
| MiniMax | https://api.minimax.io/anthropic | MiniMax-M2.5 | MiniMax-M2.5 | 204K | $0.15 | $1.20 |
| DeepSeek | https://api.deepseek.com/anthropic | deepseek-reasoner | deepseek-chat | 128K | $0.28 | $0.42 |
| Kimi | https://api.kimi.com/coding | kimi-k2.5 | kimi-k2.5 | 128K | $0.60 | $2.50 |
| GLM | https://api.z.ai/api/anthropic | glm-4.6 | glm-4.5-air | 128K | $0.55 | $2.20 |
Proxy Provider (Universal Gateway)
| Provider | Base URL | Default Model | Access To |
|---|---|---|---|
| OpenRouter | https://openrouter.ai/api | anthropic/claude-sonnet-4 | 400+ models |
OpenRouter model override at invocation:
```shell
ANTHROPIC_MODEL="google/gemini-2.5-pro" claude-openrouter
ANTHROPIC_MODEL="openai/gpt-4o" claude-openrouter
ANTHROPIC_MODEL="meta-llama/llama-4-maverick" claude-openrouter
```
Provider Quirks
| Provider | Quirk | Mitigation |
|---|---|---|
| MiniMax | Returns ThinkingBlock + TextBlock by default (no extended thinking requested) | Iterate response.content for type == "text", never assume content[0] |
| MiniMax | Temperature must be > 0.0 (exclusive range) | Guard with max(0.01, temperature) |
| MiniMax | No vision/image input | Text-only tasks |
| DeepSeek | Cache hits at $0.028/M (10x cheaper than uncached) | Prefer for repetitive tasks |
| Kimi | OAuth-only auth — sk-kimi-* platform keys rejected on api.kimi.com/coding | Use kimi-token helper for OAuth access tokens |
| Kimi | Access tokens expire in ~15 minutes | kimi-proxy auto-refreshes per-request; sessions run indefinitely |
| Kimi | Accepts kimi-k2.5 model name but returns kimi-for-coding in response | Parse model field accordingly |
| OpenRouter | 5.5% fee on credit purchases (not per-token) | Buy credits in bulk |
| OpenRouter | Model names prefixed with provider/ | Use anthropic/claude-sonnet-4 format |
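The MiniMax ThinkingBlock quirk is easy to guard against at the response-parsing layer. A sketch using `jq` on a simulated response (the block shapes follow the Anthropic Messages format; the payload itself is made up):

```shell
# Simulated response: a thinking block arrives BEFORE the text block,
# so content[0] is not the answer.
response='{"content":[{"type":"thinking","thinking":"..."},{"type":"text","text":"Hello from MiniMax"}]}'

# Select only the text blocks instead of assuming content[0]:
answer="$(printf '%s' "$response" \
  | jq -r '[.content[] | select(.type == "text") | .text] | join("")')"
```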
SDK URL Construction
Critical: The Anthropic SDK appends /v1/messages to the base URL. Therefore, ANTHROPIC_BASE_URL must NOT include /v1:
```
ANTHROPIC_BASE_URL="https://api.kimi.com/coding"
  ↓ SDK appends /v1/messages
Final URL: https://api.kimi.com/coding/v1/messages ✓

ANTHROPIC_BASE_URL="https://api.kimi.com/coding/v1"   ← WRONG
  ↓ SDK appends /v1/messages
Final URL: https://api.kimi.com/coding/v1/v1/messages ✗ (404)
```
Env Var Reference
| Variable | Purpose | Example |
|---|---|---|
| `ANTHROPIC_BASE_URL` | Provider API endpoint | `https://api.deepseek.com/anthropic` |
| `ANTHROPIC_AUTH_TOKEN` | Provider API key (overrides `ANTHROPIC_API_KEY`) | Read from key file |
| `ANTHROPIC_MODEL` | Primary model for main tasks | `deepseek-reasoner` |
| `ANTHROPIC_SMALL_FAST_MODEL` | Fast model for subtasks/subagents | `deepseek-chat` |
| `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC` | Disable telemetry to Anthropic | `1` |
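When debugging which backend a session will hit, a tiny hypothetical helper (not a Claude Code feature) can echo the effective settings before launching:

```shell
# Hypothetical diagnostic helper (not part of Claude Code): reports which
# backend the current environment would select. Underscored name for
# portability; an interactive zsh version could be hyphenated like the aliases.
claude_env() {
  echo "endpoint: ${ANTHROPIC_BASE_URL:-api.anthropic.com (default)}"
  echo "model: ${ANTHROPIC_MODEL:-provider default}"
}
```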
Consequences
Positive
- Zero friction — switch providers with a single command, no file edits
- No interference — the default `claude` command is unchanged; aliases are additive
- Secure — keys live in separate files; `ANTHROPIC_AUTH_TOKEN` overrides without unsetting the base key
- Extensible — new providers are added with one alias plus one key file
- Cost savings — DeepSeek is ~10x cheaper than Anthropic for many tasks
- Redundancy — 5 independent providers, any outage has 4 fallbacks
Negative
- Manual key management — users must obtain and save API keys per provider
- No auto-routing — user must choose provider (unlike MoE backend which routes automatically)
- Provider-specific quirks — ThinkingBlock, temperature ranges, model naming vary
- Shell-specific — aliases work in zsh/bash but not in IDE integrations or scripts
Neutral
- Telemetry disabled — `CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1` prevents Anthropic usage tracking for non-Anthropic providers
- Same Claude Code binary — all aliases invoke the same `claude` CLI; only the backend differs
- OpenRouter double-hop — requests through OpenRouter add latency vs direct provider connections
Alternatives Considered
1. Claude Code settings.json Profiles
Configure multiple profiles in ~/.claude/settings.json with a --profile flag.
Rejected: Claude Code doesn't support profiles. Would require upstream changes.
2. LiteLLM Proxy
Run a local LiteLLM proxy that translates between API formats. Rejected: Adds infrastructure (local proxy process), complexity, and latency. Unnecessary since target providers already support Anthropic API format.
3. Wrapper Script (claude-provider)
A shell script that reads a config file and sets env vars dynamically. Rejected: Over-engineered for 5 providers. Shell aliases are simpler, more transparent, and easier to debug.
4. Environment Module System
Use direnv or autoenv to set provider per project directory.
Rejected: Project-level provider locking is too rigid. Users want to choose per-session.
Related Decisions
| ADR | Relationship |
|---|---|
| ADR-073 | MoE Provider Flexibility — backend multi-provider support |
| ADR-122 | Unified LLM Component Architecture — per-provider directory structure |
| ADR-190 | Cross-LLM Bridge Architecture — vendor-agnostic orchestration layer |
| ADR-200 | MiniMax Provider Integration — first Anthropic-compatible third-party provider |
Implementation
Completed
- 5 shell aliases/functions added to `~/.zshrc`
- `claude-minimax` tested and working (MiniMax-M2.5)
- ThinkingBlock fix deployed (`f638bb56`)
- Key file convention established (`~/.{Provider}AZ1.api.key`)
- Kimi OAuth auth via the `kimi-token` helper (`~/.local/bin/kimi-token`)
- `kimi-proxy` token-refreshing proxy (`~/.local/bin/kimi-proxy`) — indefinite session support
- `claude-kimi` tested and working (kimi-k2.5 via proxy)
Pending (User Action)
- Obtain DeepSeek API key → save to `~/.DeepSeekAZ1.api.key`
- Obtain OpenRouter API key → save to `~/.OpenRouterAZ1.api.key`
- Obtain GLM API key → save to `~/.GLMAZ1.api.key`
Track: H.3.10 (Multi-Provider LLM Gateway) Author: Hal Casteel Updated: 2026-02-15