ADR-005: Token Accounting System for Cost Analysis and Budget Management

Document: ADR-005-token-accounting
Version: 1.0.0
Purpose: Document architectural decisions for token usage tracking, cost analysis, and budget management via /cx and /cxq integration
Audience: Framework contributors, developers, project managers
Date Created: 2025-12-11
Status: ACCEPTED
Related ADRs:
- ADR-004-context-extraction-command (/cx)
- ADR-CXQ-context-query-system (/cxq)
Related Documents:
- commands/cx.md
- commands/cxq.md
- scripts/unified-message-extractor.py
- scripts/context-db.py

Context and Problem Statement

The Token Visibility Problem

AI-assisted development with Claude Code generates significant token usage across sessions, but developers and organizations lack visibility into:

  1. Per-session costs - How much each development session actually costs
  2. Per-project costs - Total token expenditure across all sessions for a project
  3. Usage patterns - Which types of work consume the most tokens
  4. Cache efficiency - How effective prompt caching is at reducing costs
  5. Budget tracking - Whether spending is within acceptable limits
  6. ROI analysis - Value delivered relative to token costs

Business Impact:

  • Unpredictable API costs for organizations
  • No data-driven optimization of AI usage patterns
  • Inability to justify AI tooling expenditure to stakeholders
  • Missing cost allocation for project budgeting
  • No visibility into cache effectiveness

Existing Data (Already Available)

Claude Code JSONL Session Files: Every assistant message in ~/.claude/projects/{project}/sessions/*.jsonl contains a usage field:

{
  "type": "assistant",
  "message": {
    "id": "msg_...",
    "type": "message",
    "role": "assistant",
    "content": [...],
    "usage": {
      "input_tokens": 12847,
      "output_tokens": 3521,
      "cache_read_input_tokens": 89234,
      "cache_creation_input_tokens": 45123
    }
  },
  "costUSD": 0.082437,
  "durationMs": 4523
}

Available Metrics:

  • input_tokens - Fresh tokens sent to API (not cached)
  • output_tokens - Response tokens generated
  • cache_read_input_tokens - Tokens read from prompt cache (90% discount)
  • cache_creation_input_tokens - Tokens written to cache (25% premium)
  • costUSD - Pre-calculated cost (Claude Code calculates this)
  • durationMs - API call duration
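As a sketch of how these fields can be consumed (field names follow the sample record above; the aggregation itself is illustrative, not the /cx implementation):

```python
import json

def summarize_session(jsonl_lines):
    """Sum token usage and cost across assistant messages in one session."""
    totals = {"input_tokens": 0, "output_tokens": 0,
              "cache_read_input_tokens": 0, "cache_creation_input_tokens": 0,
              "cost_usd": 0.0}
    for line in jsonl_lines:
        record = json.loads(line)
        if record.get("type") != "assistant":
            continue  # only assistant messages carry a usage field
        usage = record.get("message", {}).get("usage", {})
        for key in ("input_tokens", "output_tokens",
                    "cache_read_input_tokens", "cache_creation_input_tokens"):
            totals[key] += usage.get(key, 0)
        totals["cost_usd"] += record.get("costUSD", 0.0)
    return totals
```

Missing fields default to zero, matching the graceful-degradation requirement below.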

Requirements

Must-Have (Critical):

  • Extract token usage from JSONL sessions during /cx processing
  • Store usage data in context.db for querying via /cxq
  • Per-session, per-project, and aggregate cost reporting
  • Cache efficiency metrics (read/create ratio)
  • Date range filtering for budget periods

Should-Have (Important):

  • Cost estimation with multiple pricing tiers (Opus/Sonnet/Haiku)
  • Comparison with pre-calculated costUSD values
  • Export to CSV/JSON for external analysis
  • Historical trend visualization data

Nice-to-Have:

  • Budget alerts when thresholds exceeded
  • Automatic cost optimization recommendations
  • Integration with billing systems

Decision Drivers

Technical Constraints

  • Must integrate with existing /cx extraction pipeline
  • Must use existing SQLite database (context.db)
  • Must work offline (no API calls for accounting)
  • Python-only implementation (stdlib preference)

User Experience Goals

  • Simple query commands (/cxq --tokens, /cxq --cost)
  • Default daily/weekly/monthly/project views
  • Human-readable output with formatting
  • JSON output for programmatic access

Data Quality Requirements

  • Handle sessions with missing usage data gracefully
  • Validate token counts against message content length
  • Preserve original timestamps for accurate date filtering
  • Support incremental updates as new sessions are added
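The "validate token counts against message content length" requirement could be approximated with a chars-per-token heuristic; the ~4 characters/token figure and the bounds are illustrative assumptions, not from this document:

```python
def plausible_output_tokens(text: str, output_tokens: int) -> bool:
    """Crude sanity check: English text averages roughly 4 chars per token."""
    approx = max(len(text) / 4, 1)
    # Flag counts wildly out of proportion to the message length
    return 0.2 * approx <= output_tokens <= 5 * approx
```

Records failing the check would be logged rather than dropped, per the data-quality rules above.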

Decision Outcome

CHOSEN: Extend /cx and /cxq with Token Accounting Capabilities

Implementation Architecture

┌─────────────────────────────────────────────────────────────┐
│                 Token Accounting Data Flow                  │
└─────────────────────────────────────────────────────────────┘

┌───────────────────────────────────────┐
│ ~/.claude/projects/{project}/sessions/│
│ *.jsonl (Raw Claude Code sessions)    │
└───────────────────┬───────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────┐
│ /cx (unified-message-extractor.py)       │
│ - Extract messages (existing)            │
│ - NEW: Extract usage field per message   │
│ - NEW: Extract costUSD per message       │
│ - NEW: Extract durationMs per call       │
│ - Write to unified_messages.jsonl        │
└────────────────────┬─────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ context-db.py --index                    │
│ ┌────────────────────────────────────┐   │
│ │ NEW: token_usage table             │   │
│ ├────────────────────────────────────┤   │
│ │ message_id    - Link to messages   │   │
│ │ session_id    - Session reference  │   │
│ │ project_path  - Project reference  │   │
│ │ input_tokens  - Fresh input        │   │
│ │ output_tokens - Response output    │   │
│ │ cache_read    - Cached input       │   │
│ │ cache_create  - Cache writes       │   │
│ │ cost_usd      - Calculated cost    │   │
│ │ duration_ms   - API latency        │   │
│ │ timestamp     - When occurred      │   │
│ │ model         - claude-opus-4, etc.│   │
│ └────────────────────────────────────┘   │
└────────────────────┬─────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ /cxq Query Interface                     │
│ --tokens            Summary by session   │
│ --cost              Cost breakdown       │
│ --by-session        Per-session detail   │
│ --by-project        Per-project totals   │
│ --by-date           Daily aggregation    │
│ --cache-efficiency  Cache effectiveness  │
│ --budget N          Alert if > $N        │
│ --since/--until     Date filtering       │
│ --json              Programmatic output  │
└──────────────────────────────────────────┘

Core Components

1. Token Usage Extraction (unified-message-extractor.py extension)

New Fields to Extract:

from typing import Any, Dict

def extract_token_usage(message: Dict[str, Any]) -> Dict[str, Any]:
    """Extract token usage from a JSONL assistant message."""
    usage = message.get('message', {}).get('usage', {})

    return {
        'input_tokens': usage.get('input_tokens', 0),
        'output_tokens': usage.get('output_tokens', 0),
        'cache_read_input_tokens': usage.get('cache_read_input_tokens', 0),
        'cache_creation_input_tokens': usage.get('cache_creation_input_tokens', 0),
        'cost_usd': message.get('costUSD', 0.0),
        'duration_ms': message.get('durationMs', 0),
        # Helper defined elsewhere in the extractor,
        # returns e.g. "claude-opus-4-5-20251101"
        'model': extract_model_from_message(message)
    }

Enhanced JSONL Message Format:

{
  "hash": "a1b2c3d4...",
  "content": "Full message text...",
  "role": "assistant",
  "provenance": {
    "source_type": "jsonl",
    "source_file": "/path/to/session.jsonl",
    "session_id": "abc123"
  },
  "timestamps": {
    "occurred": "2025-12-11T12:00:00Z",
    "extracted_at": "2025-12-11T19:00:00Z"
  },
  "token_usage": {
    "input_tokens": 12847,
    "output_tokens": 3521,
    "cache_read_input_tokens": 89234,
    "cache_creation_input_tokens": 45123,
    "cost_usd": 0.082437,
    "duration_ms": 4523,
    "model": "claude-opus-4-5-20251101"
  }
}

2. Database Schema Extension (context-db.py)

-- Token usage tracking table
CREATE TABLE token_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id INTEGER,        -- FK to messages table (nullable for aggregate entries)
    session_id TEXT NOT NULL,  -- Session identifier
    project_path TEXT,         -- Project root path

    -- Token counts
    input_tokens INTEGER DEFAULT 0,                 -- Fresh tokens (not cached)
    output_tokens INTEGER DEFAULT 0,                -- Generated response tokens
    cache_read_input_tokens INTEGER DEFAULT 0,      -- Tokens read from cache
    cache_creation_input_tokens INTEGER DEFAULT 0,  -- Tokens written to cache

    -- Cost tracking
    cost_usd REAL DEFAULT 0.0,  -- Pre-calculated cost from Claude Code
    estimated_cost_usd REAL,    -- Our calculated estimate (for validation)

    -- Performance metrics
    duration_ms INTEGER DEFAULT 0,  -- API call duration

    -- Metadata
    model TEXT,               -- Model used (claude-opus-4-5-20251101)
    timestamp TEXT NOT NULL,  -- When the API call occurred
    indexed_at TEXT DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE SET NULL
);

-- Indexes for common query patterns
CREATE INDEX idx_token_usage_session ON token_usage(session_id);
CREATE INDEX idx_token_usage_project ON token_usage(project_path);
CREATE INDEX idx_token_usage_timestamp ON token_usage(timestamp);
CREATE INDEX idx_token_usage_model ON token_usage(model);

-- Aggregate views for performance
CREATE VIEW token_usage_by_session AS
SELECT
    session_id,
    project_path,
    COUNT(*) as api_calls,
    SUM(input_tokens) as total_input,
    SUM(output_tokens) as total_output,
    SUM(cache_read_input_tokens) as total_cache_read,
    SUM(cache_creation_input_tokens) as total_cache_create,
    SUM(cost_usd) as total_cost,
    SUM(duration_ms) as total_duration,
    MIN(timestamp) as first_call,
    MAX(timestamp) as last_call
FROM token_usage
GROUP BY session_id, project_path;

CREATE VIEW token_usage_by_project AS
SELECT
    project_path,
    COUNT(DISTINCT session_id) as sessions,
    COUNT(*) as api_calls,
    SUM(input_tokens) as total_input,
    SUM(output_tokens) as total_output,
    SUM(cache_read_input_tokens) as total_cache_read,
    SUM(cache_creation_input_tokens) as total_cache_create,
    SUM(cost_usd) as total_cost,
    MIN(timestamp) as first_call,
    MAX(timestamp) as last_call
FROM token_usage
GROUP BY project_path;

CREATE VIEW token_usage_by_date AS
SELECT
    DATE(timestamp) as date,
    COUNT(DISTINCT session_id) as sessions,
    COUNT(*) as api_calls,
    SUM(input_tokens) as total_input,
    SUM(output_tokens) as total_output,
    SUM(cache_read_input_tokens) as total_cache_read,
    SUM(cost_usd) as total_cost
FROM token_usage
GROUP BY DATE(timestamp)
ORDER BY date DESC;
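A trimmed, in-memory sketch (columns reduced for brevity; not the full context-db.py schema) of how such a view answers per-session questions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    input_tokens INTEGER DEFAULT 0,
    cost_usd REAL DEFAULT 0.0,
    timestamp TEXT NOT NULL
);
CREATE VIEW token_usage_by_session AS
SELECT session_id, COUNT(*) AS api_calls, SUM(cost_usd) AS total_cost
FROM token_usage GROUP BY session_id;
""")
conn.executemany(
    "INSERT INTO token_usage (session_id, input_tokens, cost_usd, timestamp) "
    "VALUES (?, ?, ?, ?)",
    [("abc", 100, 0.05, "2025-12-11T10:00:00Z"),
     ("abc", 200, 0.07, "2025-12-11T10:05:00Z")],
)
# The view pre-aggregates, so the query stays a simple SELECT
row = conn.execute(
    "SELECT api_calls, total_cost FROM token_usage_by_session "
    "WHERE session_id = 'abc'"
).fetchone()
```

Because views are computed on read, no extra write path is needed when new rows are indexed.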

3. Query Commands (/cxq extensions)

New Command Options:

# Token usage summary
/cxq --tokens # Overall token summary
/cxq --tokens --by-session # Per-session breakdown
/cxq --tokens --by-project # Per-project breakdown
/cxq --tokens --by-date # Daily breakdown

# Cost analysis
/cxq --cost # Total cost summary
/cxq --cost --by-session # Per-session costs
/cxq --cost --by-project # Per-project costs
/cxq --cost --by-date # Daily costs
/cxq --cost --by-model # Costs per model tier

# Cache efficiency
/cxq --cache-efficiency # Cache hit ratio, savings
/cxq --cache-efficiency --by-session # Per-session cache stats

# Date filtering (works with all above)
/cxq --cost --today # Today's costs
/cxq --cost --week # Last 7 days
/cxq --cost --month # Last 30 days
/cxq --cost --since 2025-12-01 # Since specific date
/cxq --cost --until 2025-12-31 # Until specific date

# Budget alerts
/cxq --budget 100 # Alert if monthly cost > $100
/cxq --budget 50 --by-project # Per-project budget check

# Output formats
/cxq --cost --json # JSON output for scripting
/cxq --cost --csv # CSV for spreadsheet import

Example Output:

$ /cxq --cost --today

Token Cost Summary (2025-12-11)
============================================================

Sessions: 3
API Calls: 127
Total Duration: 23.4 min

Token Breakdown:
  Input Tokens:    185,432 (fresh)
  Output Tokens:   67,891
  Cache Read:      2,341,567 (saved $31.61 via caching)
  Cache Creation:  123,456

Cost Breakdown:
  Input Cost:        $2.78
  Output Cost:       $5.09
  Cache Read Cost:   $3.51 (90% discount applied)
  Cache Create Cost: $2.31 (25% premium)
  ─────────────────────
  Total Cost:       $13.69
  Pre-calculated:   $13.65 (difference: $0.04)

Cache Efficiency:
  Cache Hit Rate:     92.7%
  Estimated Savings:  $31.61 (69.8%)

============================================================
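The Cache Hit Rate figure in the report above is the share of input tokens that were served from the prompt cache, which can be reproduced directly:

```python
# Figures from the example report above
fresh_input = 185_432   # uncached input tokens
cache_read = 2_341_567  # tokens served from the prompt cache

hit_rate = cache_read / (cache_read + fresh_input)
print(f"{hit_rate:.1%}")  # → 92.7%
```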

$ /cxq --cost --by-project --json
{
  "projects": [
    {
      "project_path": "/Users/hal/PROJECTS/coditect-rollout-master",
      "sessions": 459,
      "api_calls": 47841,
      "input_tokens": 6824731,
      "output_tokens": 19247382,
      "cache_read_tokens": 4200000000,
      "cache_create_tokens": 558000000,
      "total_cost_usd": 9902.47,
      "first_call": "2025-10-15T08:23:41Z",
      "last_call": "2025-12-11T19:45:23Z"
    }
  ],
  "summary": {
    "total_projects": 1,
    "total_sessions": 459,
    "total_cost_usd": 9902.47,
    "generated_at": "2025-12-11T20:00:00Z"
  }
}

4. Cost Calculation Logic

Pricing Model (Claude Opus 4.5 as of Dec 2025):

from typing import Dict

PRICING = {
    'claude-opus-4-5-20251101': {
        'input_per_1m': 15.00,         # $15/M input tokens
        'output_per_1m': 75.00,        # $75/M output tokens
        'cache_read_per_1m': 1.50,     # $1.50/M (90% discount)
        'cache_create_per_1m': 18.75,  # $18.75/M (25% premium)
    },
    'claude-sonnet-4-20250514': {
        'input_per_1m': 3.00,
        'output_per_1m': 15.00,
        'cache_read_per_1m': 0.30,
        'cache_create_per_1m': 3.75,
    },
    'claude-3-5-haiku-20241022': {
        'input_per_1m': 0.80,
        'output_per_1m': 4.00,
        'cache_read_per_1m': 0.08,
        'cache_create_per_1m': 1.00,
    }
}

def calculate_cost(usage: Dict[str, int], model: str) -> float:
    """Calculate estimated cost from token usage."""
    rates = PRICING.get(model, PRICING['claude-opus-4-5-20251101'])

    cost = (
        (usage['input_tokens'] / 1_000_000) * rates['input_per_1m'] +
        (usage['output_tokens'] / 1_000_000) * rates['output_per_1m'] +
        (usage['cache_read_input_tokens'] / 1_000_000) * rates['cache_read_per_1m'] +
        (usage['cache_creation_input_tokens'] / 1_000_000) * rates['cache_create_per_1m']
    )

    return round(cost, 6)

def calculate_cache_savings(usage: Dict[str, int], model: str) -> float:
    """Calculate how much was saved via prompt caching."""
    rates = PRICING.get(model, PRICING['claude-opus-4-5-20251101'])

    # What we would have paid without caching
    full_price = (usage['cache_read_input_tokens'] / 1_000_000) * rates['input_per_1m']
    # What we actually paid (90% discount)
    cached_price = (usage['cache_read_input_tokens'] / 1_000_000) * rates['cache_read_per_1m']

    return round(full_price - cached_price, 6)
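A worked example of the same arithmetic, using rates that mirror the Opus row of the pricing table above and a hypothetical round-number usage record:

```python
# USD per million tokens, mirroring the Opus row of the PRICING table above
RATES = {"input": 15.00, "output": 75.00, "cache_read": 1.50, "cache_create": 18.75}

def estimate(usage: dict) -> float:
    """Same formula as calculate_cost, inlined for a single model."""
    return round(
        usage["input_tokens"] / 1e6 * RATES["input"]
        + usage["output_tokens"] / 1e6 * RATES["output"]
        + usage["cache_read_input_tokens"] / 1e6 * RATES["cache_read"]
        + usage["cache_creation_input_tokens"] / 1e6 * RATES["cache_create"],
        6,
    )

# Hypothetical call: 100K fresh input, 10K output, 1M cache reads, 200K cache writes
sample = {"input_tokens": 100_000, "output_tokens": 10_000,
          "cache_read_input_tokens": 1_000_000,
          "cache_creation_input_tokens": 200_000}
cost = estimate(sample)  # 1.50 + 0.75 + 1.50 + 3.75 = 7.50
```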

Storage Specifications

Token Usage in unified_messages.jsonl:

{
  "hash": "sha256...",
  "content": "...",
  "role": "assistant",
  "provenance": {...},
  "timestamps": {...},
  "token_usage": {
    "input_tokens": 12847,
    "output_tokens": 3521,
    "cache_read_input_tokens": 89234,
    "cache_creation_input_tokens": 45123,
    "cost_usd": 0.082437,
    "duration_ms": 4523,
    "model": "claude-opus-4-5-20251101"
  }
}

Note: Only assistant messages contain usage data. User messages and tool calls do not have token counts.

Consequences

Positive Consequences

Cost Visibility:

  • Full transparency into AI development costs
  • Per-project cost tracking for budgeting
  • Historical trends for forecasting
  • Cache efficiency optimization opportunities

Budget Management:

  • Alerts when spending exceeds thresholds
  • Data for cost allocation to projects/teams
  • Evidence for ROI calculations
  • Justification for AI tooling investment

Performance Insights:

  • API latency tracking (duration_ms)
  • Cache hit ratio optimization
  • Model usage patterns
  • Session efficiency comparison

Integration Benefits:

  • Builds on existing /cx and /cxq infrastructure
  • No additional dependencies required
  • Offline analysis (no API calls)
  • JSON export for external tools

Negative Consequences

Storage Overhead:

  • Additional ~200 bytes per assistant message
  • token_usage table grows with usage
  • Database size increase ~10-15%

Processing Time:

  • Slightly longer /cx extraction
  • Additional indexing pass for token data
  • Mitigated by incremental processing

Data Accuracy:

  • Only captures assistant messages (not streaming chunks)
  • Cost estimates may drift if pricing changes
  • Some sessions may have missing usage data (older formats)

Risk Mitigation

Missing Usage Data:

  • Graceful handling of messages without usage field
  • Default to 0 for missing values
  • Log warnings for incomplete data

Pricing Changes:

  • Store raw token counts (not just costs)
  • Configurable pricing table
  • Can recalculate costs retroactively
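Because raw token counts are stored alongside costs, a retroactive recalculation is a single UPDATE; a sketch (the table name matches the schema above, the replacement rates are hypothetical):

```python
import sqlite3

# Hypothetical replacement rates, USD per million tokens
NEW_RATES = {"input": 12.00, "output": 60.00,
             "cache_read": 1.20, "cache_create": 15.00}

def recalc(conn: sqlite3.Connection) -> None:
    """Recompute estimated_cost_usd from the stored raw token counts."""
    conn.execute(
        """
        UPDATE token_usage SET estimated_cost_usd =
            input_tokens / 1e6 * :input
            + output_tokens / 1e6 * :output
            + cache_read_input_tokens / 1e6 * :cache_read
            + cache_creation_input_tokens / 1e6 * :cache_create
        """,
        NEW_RATES,
    )
```

The Claude Code cost_usd column is left untouched, so original and recalculated figures can still be compared.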

Database Growth:

  • Views provide aggregated data efficiently
  • Optional archival of old detailed records
  • Indexes optimize query performance

Implementation Plan

Phase 1: Extraction (Priority 1)

  1. Extend unified-message-extractor.py to extract usage fields
  2. Add token_usage to JSONL output format
  3. Backward compatible with existing data

Phase 2: Storage (Priority 1)

  1. Add token_usage table to context.db schema
  2. Create indexes and aggregation views
  3. Migrate existing data (re-run /cx)

Phase 3: Queries (Priority 1)

  1. Add --tokens, --cost options to /cxq
  2. Implement date filtering
  3. Add JSON/CSV output formats

Phase 4: Analysis (Priority 2)

  1. Cache efficiency reporting
  2. Budget alert functionality
  3. Trend visualization data

Validation Criteria

Functional:

  • /cx extracts token usage from JSONL sessions (via unified-message-extractor.py)
  • token_usage table populated during --index (context-db.py index_messages())
  • /cxq --tokens returns session summaries
  • /cxq --cost returns cost breakdowns
  • Date filtering works correctly (--since, --until)
  • JSON output is valid and complete (--json flag)

Additional Commands Implemented:

  • /cxq --by-session - Token usage breakdown by session
  • /cxq --by-project - Token usage breakdown by project
  • /cxq --by-date - Token usage breakdown by date
  • /cxq --by-model - Token usage breakdown by model
  • /cxq --cache-efficiency - Prompt cache statistics
  • /cxq --token-stats - Full token statistics

Performance:

  • Extraction adds <10% overhead to /cx
  • Queries complete in <1s for 50K messages
  • Aggregate views pre-compute common queries (4 views created)

Accuracy:

  • Uses pre-calculated costUSD from Claude Code (most accurate)
  • Token counts match content length proportionally
  • Cache efficiency calculations implemented (cache hit rate, estimated savings)

Upstream (Data Sources)

  • Claude Code JSONL sessions: ~/.claude/projects/
  • costUSD pre-calculated by Claude Code CLI

Downstream (Consumers)

  • /cxq query interface
  • Budget management workflows
  • External analytics tools (via JSON/CSV export)

Integration Points

  • Extends existing /cx extraction pipeline
  • Uses existing context.db database
  • Compatible with existing /cxq commands

Alternatives Considered

Alternative 1: Separate Token Accounting Tool

Pros: Clean separation of concerns
Cons: Duplicate data extraction, different storage
Rejected: Violates DRY principle, fragments tooling

Alternative 2: External Analytics Service

Pros: Advanced visualization, real-time updates
Cons: Cloud dependency, privacy concerns, cost
Rejected: Violates offline-first requirement

Alternative 3: Spreadsheet Export Only

Pros: Simple implementation, familiar tool
Cons: No query capability, manual process
Rejected: Insufficient for programmatic access


Status: ACCEPTED
Decision Date: 2025-12-11
Implementation Status: IMPLEMENTED (Phases 1-3 complete as of 2025-12-12)
Maintainer: CODITECT Core Team
Review Date: 2026-03-11 (quarterly review)


Document Version: 1.1.0
Last Updated: 2025-12-12
Author: CODITECT Architecture Team
Copyright: 2025 AZ1.AI INC. All rights reserved.

Implementation Summary (2025-12-12)

Files Modified:

  • scripts/context-db.py - Added token_usage table, views, indexes, query functions, CLI handlers
  • scripts/unified-message-extractor.py - Added extract_token_usage() and token_usage field extraction

Schema Added:

  • token_usage table with 14 columns (message_id, session_id, project_path, input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, cost_usd, estimated_cost_usd, duration_ms, model, timestamp, indexed_at)
  • 4 indexes for common query patterns
  • 4 aggregate views (by_session, by_project, by_date, by_model)

CLI Commands Added:

  • --tokens / --cost - Token usage summary
  • --by-session - Per-session breakdown
  • --by-project - Per-project breakdown
  • --by-date - Daily usage trend
  • --by-model - Per-model breakdown
  • --cache-efficiency - Cache hit rate and savings
  • --token-stats - Full statistics