ADR-167: Claude Code CLI Integration for Zero-Cost LLM Access
Status: Proposed
Date: 2026-02-09
Author: Claude (Opus 4.6)
Deciders: Hal Casteel, Engineering Team
Tags: llm, claude-code, cli, cost-optimization, sidecar
Context
The codestoryai/sidecar supports 12 LLM providers (OpenAI, Anthropic, Ollama, OpenRouter, Groq, Azure, etc.) via its llm_client crate. Each requires API keys and incurs per-token costs.
CODITECT's development workflow already relies on Claude Code CLI (claude), which provides access to Claude models through the user's existing Anthropic subscription at no additional per-token API cost. The serve.py API server for the UDOM Pipeline Navigator already demonstrated this pattern: invoking `claude --print --message <prompt>` as a subprocess with a 5-second minimum between calls.
We need to integrate Claude Code CLI as an LLM provider in the sidecar to eliminate API costs during development.
Decision
Add ClaudeCodeCLI as a new LLM provider in the sidecar's llm_client crate, using subprocess invocation of the locally-installed claude CLI binary.
Implementation
- Add a `ClaudeCodeCLI` variant to the `LLMProvider` enum in `llm_client/src/provider.rs`
- Create a `ClaudeCodeCLIClient` implementing the `LLMClient` trait
- Use `tokio::process::Command` for async subprocess management
- Implement rate limiting (5-second minimum between calls) to stay within the CLI's rate limits
- Capture stdout in batch initially; add streaming later by reading stdout line-by-line
- Fall back to API providers if the CLI is not installed or fails
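To make the shape of the client concrete, here is a minimal synchronous sketch. It is illustrative only: the type name `ClaudeCodeCLIClient` comes from the list above, but the real sidecar client would implement the `LLMClient` trait asynchronously with `tokio::process::Command`; this version uses only the standard library and assumes `claude --print <prompt>` prints a completion to stdout.

```rust
use std::process::Command;
use std::time::{Duration, Instant};

/// Sketch of a CLI-backed client (synchronous stand-in for the real
/// tokio-based implementation).
struct ClaudeCodeCLIClient {
    min_delay: Duration,
    last_call: Option<Instant>,
}

impl ClaudeCodeCLIClient {
    fn new() -> Self {
        Self { min_delay: Duration::from_secs(5), last_call: None }
    }

    /// Availability probe: true if a `claude` binary is on PATH and
    /// exits successfully. Used to decide whether to fall back to API
    /// providers.
    fn is_available(&self) -> bool {
        Command::new("claude")
            .arg("--version")
            .output()
            .map(|o| o.status.success())
            .unwrap_or(false)
    }

    /// Run one prompt through the CLI, enforcing the 5 s minimum
    /// spacing between calls.
    fn complete(&mut self, prompt: &str) -> Result<String, String> {
        if let Some(t) = self.last_call {
            let elapsed = t.elapsed();
            if elapsed < self.min_delay {
                std::thread::sleep(self.min_delay - elapsed);
            }
        }
        self.last_call = Some(Instant::now());
        let out = Command::new("claude")
            .args(["--print", prompt])
            .output()
            .map_err(|e| e.to_string())?;
        if out.status.success() {
            Ok(String::from_utf8_lossy(&out.stdout).into_owned())
        } else {
            Err(String::from_utf8_lossy(&out.stderr).into_owned())
        }
    }
}
```

The availability probe is what drives the fallback behavior: if `is_available()` returns false, the provider chain moves on to the next entry.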
Provider Priority Chain
ClaudeCodeCLI (zero cost) -> Ollama (free, local) -> OpenRouter (cheap) -> Anthropic API (direct)
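The chain can be expressed as a simple first-available scan. The sketch below is hypothetical (the sidecar's actual `LLMProvider` enum has 12 variants; only the four in the chain are shown), with availability abstracted as a probe function:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum LLMProvider {
    ClaudeCodeCLI, // zero cost
    Ollama,        // free, local
    OpenRouter,    // cheap
    AnthropicAPI,  // direct
}

/// Walk the priority chain and return the first provider whose
/// availability probe succeeds, or None if none is usable.
fn select_provider(available: impl Fn(LLMProvider) -> bool) -> Option<LLMProvider> {
    use LLMProvider::*;
    [ClaudeCodeCLI, Ollama, OpenRouter, AnthropicAPI]
        .into_iter()
        .find(|p| available(*p))
}
```

Keeping the chain as an ordered array makes the cost-based priority explicit and easy to reorder in config.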
Rate Limiting
| Parameter | Value | Rationale |
|---|---|---|
| Min delay between calls | 5 seconds | Respect CLI rate limits |
| Max concurrent calls | 1 | CLI is single-threaded |
| Timeout per call | 120 seconds | Long reasoning tasks |
| Retry on failure | 2 attempts | CLI can be flaky |
| Queue depth | 10 | Backpressure limit |
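The retry row of the table ("2 attempts, CLI can be flaky") amounts to a small generic helper. This is a sketch, not the sidecar's actual retry machinery; `call_with_retry` is a hypothetical name:

```rust
/// Try `call` up to `attempts` times, returning the first success or
/// the last error. `attempts` must be >= 1.
fn call_with_retry<T, E>(
    attempts: u32,
    mut call: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut last_err = None;
    for _ in 0..attempts {
        match call() {
            Ok(v) => return Ok(v),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("attempts must be >= 1"))
}
```

In practice each attempt would also be wrapped in the 120-second timeout and spaced by the 5-second rate limit from the table.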
Alternatives Considered
Alternative 1: Anthropic API Direct
Use Anthropic API with API keys stored in sidecar config.
Rejected as default because:
- Per-token cost accumulates rapidly during development
- User already pays for Claude Code subscription
- API key management is an extra burden
- Retained as fallback when CLI unavailable
Alternative 2: MCP Server Bridge
Create an MCP server that wraps Claude Code CLI and connect via mcp_client_rs.
Rejected because:
- Over-engineered for subprocess invocation
- MCP adds protocol overhead for a simple request-response
- No benefit over direct subprocess for single-provider use
Alternative 3: Ollama with Claude-Compatible Model
Use Ollama with a local model (e.g., Llama 3) instead of Claude.
Rejected as primary because:
- Significantly lower quality for code editing tasks
- Retained as secondary fallback in the priority chain
Consequences
Positive
- Zero incremental LLM cost for all sidecar operations
- No API key configuration required
- Uses same Claude model quality as direct API
- Proven pattern: serve.py already validated this approach
- Transparent fallback to API providers if CLI unavailable
Negative
- 5s rate limit constrains throughput (1 request per 5 seconds max)
- Single-threaded: no concurrent LLM requests
- Depends on the `claude` binary being installed and authenticated
- No streaming support initially (stdout capture is batch)
- CLI subprocess is ~500ms slower startup than direct API call
Mitigations
- Queue with backpressure prevents overloading
- MCTS planning batches multiple decisions into single prompts
- Agent loop can pre-fetch context while waiting for rate limit
- Streaming support can be added later by reading stdout incrementally
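The incremental-streaming mitigation reduces to reading the child's piped stdout line by line instead of waiting for the full output. A std-only sketch (the real implementation would use `tokio::process::Command` with an async reader; `stream_lines` is a hypothetical helper name):

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

/// Spawn the command, pipe its stdout, and forward each line to the
/// callback as it arrives rather than buffering the whole response.
fn stream_lines(
    mut cmd: Command,
    mut on_line: impl FnMut(&str),
) -> std::io::Result<()> {
    let mut child = cmd.stdout(Stdio::piped()).spawn()?;
    let stdout = child.stdout.take().expect("stdout was piped");
    for line in BufReader::new(stdout).lines() {
        on_line(&line?);
    }
    child.wait()?;
    Ok(())
}
```

Swapping the batch `output()` call for this loop is the later change the mitigation refers to; the rate limiter and queue are unaffected.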
Related
- ADR-165: WASM Split Architecture
- SDD: `docs/architecture/browser-ide/SDD-CODITECT-BROWSER-IDE.md`
- serve.py: `analyze-new-artifacts/udom-batch-runs/serve.py` (reference implementation)