ADR-167: Claude Code CLI Integration for Zero-Cost LLM Access

Status: Proposed
Date: 2026-02-09
Author: Claude (Opus 4.6)
Deciders: Hal Casteel, Engineering Team
Tags: llm, claude-code, cli, cost-optimization, sidecar

Context

The codestoryai/sidecar supports 12 LLM providers (OpenAI, Anthropic, Ollama, OpenRouter, Groq, Azure, etc.) via its llm_client crate. Each requires API keys and incurs per-token costs.

CODITECT's development workflow already relies on the Claude Code CLI (claude), which provides access to Claude models through the user's existing Anthropic subscription at no additional per-token API cost. The serve.py API server for the UDOM Pipeline Navigator already demonstrates this pattern: it invokes claude --print --message <prompt> as a subprocess with 5-second rate limiting.

We need to integrate Claude Code CLI as an LLM provider in the sidecar to eliminate API costs during development.

Decision

Add ClaudeCodeCLI as a new LLM provider in the sidecar's llm_client crate, using subprocess invocation of the locally-installed claude CLI binary.

Implementation

  1. Add ClaudeCodeCLI variant to LLMProvider enum in llm_client/src/provider.rs
  2. Create ClaudeCodeCLIClient implementing the LLMClient trait
  3. Use tokio::process::Command for async subprocess management
  4. Implement rate limiting (5s minimum between calls) to respect CLI SLA
  5. Capture stdout in batch initially; add streaming later by reading stdout line-by-line
  6. Fall back to API providers if CLI is not installed or fails
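Steps 3 and 6 above can be sketched as follows. This is a hedged illustration, not the sidecar's actual code: the helper name run_claude_cli is hypothetical, and the real client would use tokio::process::Command inside the async LLMClient trait rather than the synchronous std::process shown here. The --print flag and stdin-piped prompt mirror the serve.py invocation pattern described in the Context section.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Hypothetical helper: invoke the `claude` CLI with a prompt on stdin
/// and return its stdout. The real sidecar client would be async
/// (tokio::process::Command) and implement the LLMClient trait.
fn run_claude_cli(binary: &str, prompt: &str) -> Result<String, String> {
    let mut child = Command::new(binary)
        .arg("--print") // non-interactive mode: print the response and exit
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()
        // Spawn failure (e.g. binary not installed) becomes an Err,
        // which is the signal to fall back to API providers (step 6).
        .map_err(|e| format!("CLI not available: {e}"))?;

    if let Some(mut stdin) = child.stdin.take() {
        stdin
            .write_all(prompt.as_bytes())
            .map_err(|e| e.to_string())?;
    } // stdin dropped here, sending EOF to the CLI

    let out = child.wait_with_output().map_err(|e| e.to_string())?;
    if out.status.success() {
        Ok(String::from_utf8_lossy(&out.stdout).into_owned())
    } else {
        Err("CLI returned non-zero status".into())
    }
}

fn main() {
    // A missing binary surfaces as Err, triggering the API fallback path.
    assert!(run_claude_cli("no-such-claude-binary-xyz", "hi").is_err());
    println!("fallback path exercised");
}
```

Returning Err rather than panicking keeps the fallback decision with the caller, which matches step 6: the provider chain can simply try the next entry.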

Provider Priority Chain

ClaudeCodeCLI (zero cost) -> Ollama (free, local) -> OpenRouter (cheap) -> Anthropic API (direct)
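The chain above can be expressed as a simple ordered probe. The enum and select_provider function below are illustrative sketches (the real LLMProvider enum lives in llm_client/src/provider.rs); the availability check is passed in as a closure so the selection logic stays testable.

```rust
/// Hypothetical provider ordering mirroring the ADR's priority chain.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Provider {
    ClaudeCodeCli, // zero cost
    Ollama,        // free, local
    OpenRouter,    // cheap
    AnthropicApi,  // direct, paid
}

const PRIORITY: [Provider; 4] = [
    Provider::ClaudeCodeCli,
    Provider::Ollama,
    Provider::OpenRouter,
    Provider::AnthropicApi,
];

/// Walk the chain and return the first provider whose probe succeeds.
fn select_provider(is_available: impl Fn(Provider) -> bool) -> Option<Provider> {
    PRIORITY.into_iter().find(|p| is_available(*p))
}

fn main() {
    // Simulate: CLI missing, Ollama missing, OpenRouter reachable.
    let picked = select_provider(|p| p == Provider::OpenRouter);
    assert_eq!(picked, Some(Provider::OpenRouter));
    println!("selected: {:?}", picked.unwrap());
}
```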

Rate Limiting

Parameter                  Value        Rationale
Min delay between calls    5 seconds    Respect CLI rate limits
Max concurrent calls       1            CLI is single-threaded
Timeout per call           120 seconds  Long reasoning tasks
Retry on failure           2 attempts   CLI can be flaky
Queue depth                10           Backpressure limit
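The minimum-delay rule can be enforced with a small serializing limiter. This is a sketch, not the sidecar's implementation: the RateLimiter type is hypothetical, it blocks the current thread for clarity (the real async version would use tokio::time::sleep behind a mutex), and the demo shortens the 5-second gap to 50 ms.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical serializing rate limiter: enforces a minimum gap
/// between consecutive CLI calls (5 s per the ADR; shortened here).
struct RateLimiter {
    min_gap: Duration,
    last_call: Option<Instant>,
}

impl RateLimiter {
    fn new(min_gap: Duration) -> Self {
        Self { min_gap, last_call: None }
    }

    /// Block until at least `min_gap` has elapsed since the last call.
    fn acquire(&mut self) {
        if let Some(last) = self.last_call {
            let elapsed = last.elapsed();
            if elapsed < self.min_gap {
                sleep(self.min_gap - elapsed);
            }
        }
        self.last_call = Some(Instant::now());
    }
}

fn main() {
    let mut limiter = RateLimiter::new(Duration::from_millis(50));
    let start = Instant::now();
    limiter.acquire(); // first call: no wait
    limiter.acquire(); // second call: waits out the remaining gap
    assert!(start.elapsed() >= Duration::from_millis(50));
    println!("min gap enforced");
}
```

Because max concurrency is 1, a single shared limiter guarded by a mutex is sufficient; no token-bucket machinery is needed.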

Alternatives Considered

Alternative 1: Anthropic API Direct

Use Anthropic API with API keys stored in sidecar config.

Rejected as default because:

  • Per-token cost accumulates rapidly during development
  • User already pays for a Claude Code subscription
  • API key management is an extra burden

Retained as a fallback when the CLI is unavailable.

Alternative 2: MCP Server Bridge

Create an MCP server that wraps Claude Code CLI and connect via mcp_client_rs.

Rejected because:

  • Over-engineered for subprocess invocation
  • MCP adds protocol overhead for a simple request-response
  • No benefit over direct subprocess for single-provider use

Alternative 3: Ollama with Claude-Compatible Model

Use Ollama with a local model (e.g., Llama 3) instead of Claude.

Rejected as primary because:

  • Significantly lower quality for code-editing tasks

Retained as a secondary fallback in the priority chain.

Consequences

Positive

  • Zero incremental LLM cost for all sidecar operations
  • No API key configuration required
  • Uses same Claude model quality as direct API
  • Proven pattern: serve.py already validated this approach
  • Transparent fallback to API providers if CLI unavailable

Negative

  • 5s rate limit constrains throughput (1 request per 5 seconds max)
  • Single-threaded: no concurrent LLM requests
  • Depends on claude binary being installed and authenticated
  • No streaming support initially (stdout capture is batch)
  • CLI subprocess is ~500ms slower startup than direct API call

Mitigations

  • Queue with backpressure prevents overloading
  • MCTS planning batches multiple decisions into single prompts
  • Agent loop can pre-fetch context while waiting for rate limit
  • Streaming support can be added later by reading stdout incrementally
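The queue-with-backpressure mitigation maps directly onto a bounded channel. As a sketch (assuming the queue depth of 10 from the rate-limiting table), std::sync::mpsc::sync_channel rejects a send once the buffer is full instead of queueing work unboundedly behind the single CLI worker; the real implementation would likely use a tokio bounded mpsc channel.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

fn main() {
    // Capacity 10 models the ADR's queue-depth limit: once 10 requests
    // are pending, producers get immediate backpressure.
    let (tx, _rx) = sync_channel::<String>(10);
    for i in 0..10 {
        tx.try_send(format!("request {i}")).expect("queue has room");
    }
    // The 11th request is rejected rather than queued without bound.
    match tx.try_send("one too many".to_string()) {
        Err(TrySendError::Full(_)) => println!("backpressure applied"),
        _ => panic!("expected a full queue"),
    }
}
```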
Related Documents

  • ADR-165: WASM Split Architecture
  • SDD: docs/architecture/browser-ide/SDD-CODITECT-BROWSER-IDE.md
  • serve.py: analyze-new-artifacts/udom-batch-runs/serve.py (reference implementation)