ADR-005: Token Accounting System for Cost Analysis and Budget Management

Document: ADR-005-token-accounting
Version: 1.0.0
Purpose: Document architectural decisions for token usage tracking, cost analysis, and budget management via /cx and /cxq integration
Audience: Framework contributors, developers, project managers
Date Created: 2025-12-11
Status: ACCEPTED
Related ADRs:
- ADR-004-context-extraction-command (/cx)
- ADR-CXQ-context-query-system (/cxq)
Related Documents:
- commands/cx.md
- commands/cxq.md
- scripts/unified-message-extractor.py
- scripts/context-db.py

Context and Problem Statement

The Token Visibility Problem

AI-assisted development with Claude Code generates significant token usage across sessions, but developers and organizations lack visibility into:

  1. Per-session costs - How much each development session actually costs
  2. Per-project costs - Total token expenditure across all sessions for a project
  3. Usage patterns - Which types of work consume the most tokens
  4. Cache efficiency - How effective prompt caching is at reducing costs
  5. Budget tracking - Whether spending is within acceptable limits
  6. ROI analysis - Value delivered relative to token costs

Business Impact:

  • Unpredictable API costs for organizations
  • No data-driven optimization of AI usage patterns
  • Inability to justify AI tooling expenditure to stakeholders
  • Missing cost allocation for project budgeting
  • No visibility into cache effectiveness

Existing Data (Already Available)

Claude Code JSONL Session Files: Every assistant message in ~/.claude/projects/{project}/sessions/*.jsonl contains a usage field:

{
  "type": "assistant",
  "message": {
    "id": "msg_...",
    "type": "message",
    "role": "assistant",
    "content": [...],
    "usage": {
      "input_tokens": 12847,
      "output_tokens": 3521,
      "cache_read_input_tokens": 89234,
      "cache_creation_input_tokens": 45123
    }
  },
  "costUSD": 0.082437,
  "durationMs": 4523
}

Available Metrics:

  • input_tokens - Fresh tokens sent to API (not cached)
  • output_tokens - Response tokens generated
  • cache_read_input_tokens - Tokens read from prompt cache (90% discount)
  • cache_creation_input_tokens - Tokens written to cache (25% premium)
  • costUSD - Pre-calculated cost (Claude Code calculates this)
  • durationMs - API call duration
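As a sketch of how these fields can be consumed (field names follow the sample record above; the aggregation itself is illustrative, not the /cx implementation):

```python
import json

def summarize_session(jsonl_lines):
    """Sum token usage and cost across assistant messages in one session."""
    totals = {"input_tokens": 0, "output_tokens": 0,
              "cache_read_input_tokens": 0, "cache_creation_input_tokens": 0,
              "cost_usd": 0.0}
    for line in jsonl_lines:
        record = json.loads(line)
        if record.get("type") != "assistant":
            continue  # only assistant messages carry a usage field
        usage = record.get("message", {}).get("usage", {})
        for key in ("input_tokens", "output_tokens",
                    "cache_read_input_tokens", "cache_creation_input_tokens"):
            totals[key] += usage.get(key, 0)
        totals["cost_usd"] += record.get("costUSD", 0.0)
    return totals
```

Missing fields default to zero, matching the graceful-degradation requirement below.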

Requirements

Must-Have (Critical):

  • Extract token usage from JSONL sessions during /cx processing
  • Store usage data in context.db for querying via /cxq
  • Per-session, per-project, and aggregate cost reporting
  • Cache efficiency metrics (read/create ratio)
  • Date range filtering for budget periods

Should-Have (Important):

  • Cost estimation with multiple pricing tiers (Opus/Sonnet/Haiku)
  • Comparison with pre-calculated costUSD values
  • Export to CSV/JSON for external analysis
  • Historical trend visualization data

Nice-to-Have:

  • Budget alerts when thresholds exceeded
  • Automatic cost optimization recommendations
  • Integration with billing systems

Decision Drivers

Technical Constraints

  • Must integrate with existing /cx extraction pipeline
  • Must use existing SQLite database (context.db)
  • Must work offline (no API calls for accounting)
  • Python-only implementation (stdlib preference)

User Experience Goals

  • Simple query commands (/cxq --tokens, /cxq --cost)
  • Default daily/weekly/monthly/project views
  • Human-readable output with formatting
  • JSON output for programmatic access

Data Quality Requirements

  • Handle sessions with missing usage data gracefully
  • Validate token counts against message content length
  • Preserve original timestamps for accurate date filtering
  • Support incremental updates as new sessions are added
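The "validate token counts against message content length" requirement could be approximated with a chars-per-token heuristic; the ~4 characters/token figure and the bounds are illustrative assumptions, not from this document:

```python
def plausible_output_tokens(text: str, output_tokens: int) -> bool:
    """Crude sanity check: English text averages roughly 4 chars per token."""
    approx = max(len(text) / 4, 1)
    # Flag counts wildly out of proportion to the message length
    return 0.2 * approx <= output_tokens <= 5 * approx
```

Records failing the check would be logged rather than dropped, per the data-quality rules above.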

Decision Outcome

CHOSEN: Extend /cx and /cxq with Token Accounting Capabilities

Implementation Architecture

┌─────────────────────────────────────────────────────────────┐
│                 Token Accounting Data Flow                  │
└─────────────────────────────────────────────────────────────┘

┌───────────────────────────────────────┐
│ ~/.claude/projects/{project}/sessions/│
│ *.jsonl (Raw Claude Code sessions)    │
└───────────────────┬───────────────────┘
                    │
                    ▼
┌──────────────────────────────────────────┐
│ /cx (unified-message-extractor.py)       │
│ - Extract messages (existing)            │
│ - NEW: Extract usage field per message   │
│ - NEW: Extract costUSD per message       │
│ - NEW: Extract durationMs per call       │
│ - Write to unified_messages.jsonl        │
└────────────────────┬─────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ context-db.py --index                    │
│ ┌────────────────────────────────────┐   │
│ │ NEW: token_usage table             │   │
│ ├────────────────────────────────────┤   │
│ │ message_id    - Link to messages   │   │
│ │ session_id    - Session reference  │   │
│ │ project_path  - Project reference  │   │
│ │ input_tokens  - Fresh input        │   │
│ │ output_tokens - Response output    │   │
│ │ cache_read    - Cached input       │   │
│ │ cache_create  - Cache writes       │   │
│ │ cost_usd      - Calculated cost    │   │
│ │ duration_ms   - API latency        │   │
│ │ timestamp     - When occurred      │   │
│ │ model         - claude-opus-4, etc.│   │
│ └────────────────────────────────────┘   │
└────────────────────┬─────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ /cxq Query Interface                     │
│ --tokens            Summary by session   │
│ --cost              Cost breakdown       │
│ --by-session        Per-session detail   │
│ --by-project        Per-project totals   │
│ --by-date           Daily aggregation    │
│ --cache-efficiency  Cache effectiveness  │
│ --budget N          Alert if > $N        │
│ --since/--until     Date filtering       │
│ --json              Programmatic output  │
└──────────────────────────────────────────┘

Core Components

1. Token Usage Extraction (unified-message-extractor.py extension)

New Fields to Extract:

from typing import Any, Dict

def extract_token_usage(message: Dict[str, Any]) -> Dict[str, Any]:
    """Extract token usage from a JSONL assistant message."""
    usage = message.get('message', {}).get('usage', {})

    return {
        'input_tokens': usage.get('input_tokens', 0),
        'output_tokens': usage.get('output_tokens', 0),
        'cache_read_input_tokens': usage.get('cache_read_input_tokens', 0),
        'cache_creation_input_tokens': usage.get('cache_creation_input_tokens', 0),
        'cost_usd': message.get('costUSD', 0.0),
        'duration_ms': message.get('durationMs', 0),
        # Helper defined elsewhere in the extractor,
        # returns e.g. "claude-opus-4-5-20251101"
        'model': extract_model_from_message(message)
    }

Enhanced JSONL Message Format:

{
  "hash": "a1b2c3d4...",
  "content": "Full message text...",
  "role": "assistant",
  "provenance": {
    "source_type": "jsonl",
    "source_file": "/path/to/session.jsonl",
    "session_id": "abc123"
  },
  "timestamps": {
    "occurred": "2025-12-11T12:00:00Z",
    "extracted_at": "2025-12-11T19:00:00Z"
  },
  "token_usage": {
    "input_tokens": 12847,
    "output_tokens": 3521,
    "cache_read_input_tokens": 89234,
    "cache_creation_input_tokens": 45123,
    "cost_usd": 0.082437,
    "duration_ms": 4523,
    "model": "claude-opus-4-5-20251101"
  }
}

2. Database Schema Extension (context-db.py)

-- Token usage tracking table
CREATE TABLE token_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id INTEGER,        -- FK to messages table (nullable for aggregate entries)
    session_id TEXT NOT NULL,  -- Session identifier
    project_path TEXT,         -- Project root path

    -- Token counts
    input_tokens INTEGER DEFAULT 0,                 -- Fresh tokens (not cached)
    output_tokens INTEGER DEFAULT 0,                -- Generated response tokens
    cache_read_input_tokens INTEGER DEFAULT 0,      -- Tokens read from cache
    cache_creation_input_tokens INTEGER DEFAULT 0,  -- Tokens written to cache

    -- Cost tracking
    cost_usd REAL DEFAULT 0.0,  -- Pre-calculated cost from Claude Code
    estimated_cost_usd REAL,    -- Our calculated estimate (for validation)

    -- Performance metrics
    duration_ms INTEGER DEFAULT 0,  -- API call duration

    -- Metadata
    model TEXT,               -- Model used (claude-opus-4-5-20251101)
    timestamp TEXT NOT NULL,  -- When the API call occurred
    indexed_at TEXT DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE SET NULL
);

-- Indexes for common query patterns
CREATE INDEX idx_token_usage_session ON token_usage(session_id);
CREATE INDEX idx_token_usage_project ON token_usage(project_path);
CREATE INDEX idx_token_usage_timestamp ON token_usage(timestamp);
CREATE INDEX idx_token_usage_model ON token_usage(model);

-- Aggregate views for performance
CREATE VIEW token_usage_by_session AS
SELECT
    session_id,
    project_path,
    COUNT(*) as api_calls,
    SUM(input_tokens) as total_input,
    SUM(output_tokens) as total_output,
    SUM(cache_read_input_tokens) as total_cache_read,
    SUM(cache_creation_input_tokens) as total_cache_create,
    SUM(cost_usd) as total_cost,
    SUM(duration_ms) as total_duration,
    MIN(timestamp) as first_call,
    MAX(timestamp) as last_call
FROM token_usage
GROUP BY session_id, project_path;

CREATE VIEW token_usage_by_project AS
SELECT
    project_path,
    COUNT(DISTINCT session_id) as sessions,
    COUNT(*) as api_calls,
    SUM(input_tokens) as total_input,
    SUM(output_tokens) as total_output,
    SUM(cache_read_input_tokens) as total_cache_read,
    SUM(cache_creation_input_tokens) as total_cache_create,
    SUM(cost_usd) as total_cost,
    MIN(timestamp) as first_call,
    MAX(timestamp) as last_call
FROM token_usage
GROUP BY project_path;

CREATE VIEW token_usage_by_date AS
SELECT
    DATE(timestamp) as date,
    COUNT(DISTINCT session_id) as sessions,
    COUNT(*) as api_calls,
    SUM(input_tokens) as total_input,
    SUM(output_tokens) as total_output,
    SUM(cache_read_input_tokens) as total_cache_read,
    SUM(cost_usd) as total_cost
FROM token_usage
GROUP BY DATE(timestamp)
ORDER BY date DESC;
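A trimmed, in-memory sketch (columns reduced for brevity; not the full context-db.py schema) of how such a view answers per-session questions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE token_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    input_tokens INTEGER DEFAULT 0,
    cost_usd REAL DEFAULT 0.0,
    timestamp TEXT NOT NULL
);
CREATE VIEW token_usage_by_session AS
SELECT session_id, COUNT(*) AS api_calls, SUM(cost_usd) AS total_cost
FROM token_usage GROUP BY session_id;
""")
conn.executemany(
    "INSERT INTO token_usage (session_id, input_tokens, cost_usd, timestamp) "
    "VALUES (?, ?, ?, ?)",
    [("abc", 100, 0.05, "2025-12-11T10:00:00Z"),
     ("abc", 200, 0.07, "2025-12-11T10:05:00Z")],
)
# The view pre-aggregates, so the query stays a simple SELECT
row = conn.execute(
    "SELECT api_calls, total_cost FROM token_usage_by_session "
    "WHERE session_id = 'abc'"
).fetchone()
```

Because views are computed on read, no extra write path is needed when new rows are indexed.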

3. Query Commands (/cxq extensions)

New Command Options:

# Token usage summary
/cxq --tokens # Overall token summary
/cxq --tokens --by-session # Per-session breakdown
/cxq --tokens --by-project # Per-project breakdown
/cxq --tokens --by-date # Daily breakdown

# Cost analysis
/cxq --cost # Total cost summary
/cxq --cost --by-session # Per-session costs
/cxq --cost --by-project # Per-project costs
/cxq --cost --by-date # Daily costs
/cxq --cost --by-model # Costs per model tier

# Cache efficiency
/cxq --cache-efficiency # Cache hit ratio, savings
/cxq --cache-efficiency --by-session # Per-session cache stats

# Date filtering (works with all above)
/cxq --cost --today # Today's costs
/cxq --cost --week # Last 7 days
/cxq --cost --month # Last 30 days
/cxq --cost --since 2025-12-01 # Since specific date
/cxq --cost --until 2025-12-31 # Until specific date

# Budget alerts
/cxq --budget 100 # Alert if monthly cost > $100
/cxq --budget 50 --by-project # Per-project budget check

# Output formats
/cxq --cost --json # JSON output for scripting
/cxq --cost --csv # CSV for spreadsheet import

Example Output:

$ /cxq --cost --today

Token Cost Summary (2025-12-11)
============================================================

Sessions: 3
API Calls: 127
Total Duration: 23.4 min

Token Breakdown:
  Input Tokens:    185,432 (fresh)
  Output Tokens:   67,891
  Cache Read:      2,341,567 (saved $31.61 via caching)
  Cache Creation:  123,456

Cost Breakdown:
  Input Cost:        $2.78
  Output Cost:       $5.09
  Cache Read Cost:   $3.51 (90% discount applied)
  Cache Create Cost: $2.31 (25% premium)
  ─────────────────────
  Total Cost:       $13.69
  Pre-calculated:   $13.65 (difference: $0.04)

Cache Efficiency:
  Cache Hit Rate:     92.7%
  Estimated Savings:  $31.61 (69.8%)

============================================================
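The Cache Hit Rate figure in the report above is the share of input tokens that were served from the prompt cache, which can be reproduced directly:

```python
# Figures from the example report above
fresh_input = 185_432   # uncached input tokens
cache_read = 2_341_567  # tokens served from the prompt cache

hit_rate = cache_read / (cache_read + fresh_input)
print(f"{hit_rate:.1%}")  # → 92.7%
```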

$ /cxq --cost --by-project --json
{
  "projects": [
    {
      "project_path": "/Users/hal/PROJECTS/coditect-rollout-master",
      "sessions": 459,
      "api_calls": 47841,
      "input_tokens": 6824731,
      "output_tokens": 19247382,
      "cache_read_tokens": 4200000000,
      "cache_create_tokens": 558000000,
      "total_cost_usd": 9902.47,
      "first_call": "2025-10-15T08:23:41Z",
      "last_call": "2025-12-11T19:45:23Z"
    }
  ],
  "summary": {
    "total_projects": 1,
    "total_sessions": 459,
    "total_cost_usd": 9902.47,
    "generated_at": "2025-12-11T20:00:00Z"
  }
}

4. Cost Calculation Logic

Pricing Model (Claude Opus 4.5 as of Dec 2025):

from typing import Dict

PRICING = {
    'claude-opus-4-5-20251101': {
        'input_per_1m': 15.00,         # $15/M input tokens
        'output_per_1m': 75.00,        # $75/M output tokens
        'cache_read_per_1m': 1.50,     # $1.50/M (90% discount)
        'cache_create_per_1m': 18.75,  # $18.75/M (25% premium)
    },
    'claude-sonnet-4-20250514': {
        'input_per_1m': 3.00,
        'output_per_1m': 15.00,
        'cache_read_per_1m': 0.30,
        'cache_create_per_1m': 3.75,
    },
    'claude-3-5-haiku-20241022': {
        'input_per_1m': 0.80,
        'output_per_1m': 4.00,
        'cache_read_per_1m': 0.08,
        'cache_create_per_1m': 1.00,
    }
}

def calculate_cost(usage: Dict[str, int], model: str) -> float:
    """Calculate estimated cost from token usage."""
    rates = PRICING.get(model, PRICING['claude-opus-4-5-20251101'])

    cost = (
        (usage['input_tokens'] / 1_000_000) * rates['input_per_1m'] +
        (usage['output_tokens'] / 1_000_000) * rates['output_per_1m'] +
        (usage['cache_read_input_tokens'] / 1_000_000) * rates['cache_read_per_1m'] +
        (usage['cache_creation_input_tokens'] / 1_000_000) * rates['cache_create_per_1m']
    )

    return round(cost, 6)

def calculate_cache_savings(usage: Dict[str, int], model: str) -> float:
    """Calculate how much was saved via prompt caching."""
    rates = PRICING.get(model, PRICING['claude-opus-4-5-20251101'])

    # What we would have paid without caching
    full_price = (usage['cache_read_input_tokens'] / 1_000_000) * rates['input_per_1m']
    # What we actually paid (90% discount)
    cached_price = (usage['cache_read_input_tokens'] / 1_000_000) * rates['cache_read_per_1m']

    return round(full_price - cached_price, 6)
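A worked example of the same arithmetic, using rates that mirror the Opus row of the pricing table above and a hypothetical round-number usage record:

```python
# USD per million tokens, mirroring the Opus row of the PRICING table above
RATES = {"input": 15.00, "output": 75.00, "cache_read": 1.50, "cache_create": 18.75}

def estimate(usage: dict) -> float:
    """Same formula as calculate_cost, inlined for a single model."""
    return round(
        usage["input_tokens"] / 1e6 * RATES["input"]
        + usage["output_tokens"] / 1e6 * RATES["output"]
        + usage["cache_read_input_tokens"] / 1e6 * RATES["cache_read"]
        + usage["cache_creation_input_tokens"] / 1e6 * RATES["cache_create"],
        6,
    )

# Hypothetical call: 100K fresh input, 10K output, 1M cache reads, 200K cache writes
sample = {"input_tokens": 100_000, "output_tokens": 10_000,
          "cache_read_input_tokens": 1_000_000,
          "cache_creation_input_tokens": 200_000}
cost = estimate(sample)  # 1.50 + 0.75 + 1.50 + 3.75 = 7.50
```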

Storage Specifications

Token Usage in unified_messages.jsonl:

{
  "hash": "sha256...",
  "content": "...",
  "role": "assistant",
  "provenance": {...},
  "timestamps": {...},
  "token_usage": {
    "input_tokens": 12847,
    "output_tokens": 3521,
    "cache_read_input_tokens": 89234,
    "cache_creation_input_tokens": 45123,
    "cost_usd": 0.082437,
    "duration_ms": 4523,
    "model": "claude-opus-4-5-20251101"
  }
}

Note: Only assistant messages contain usage data. User messages and tool calls do not have token counts.

Consequences

Positive Consequences

Cost Visibility:

  • Full transparency into AI development costs
  • Per-project cost tracking for budgeting
  • Historical trends for forecasting
  • Cache efficiency optimization opportunities

Budget Management:

  • Alerts when spending exceeds thresholds
  • Data for cost allocation to projects/teams
  • Evidence for ROI calculations
  • Justification for AI tooling investment

Performance Insights:

  • API latency tracking (duration_ms)
  • Cache hit ratio optimization
  • Model usage patterns
  • Session efficiency comparison

Integration Benefits:

  • Builds on existing /cx and /cxq infrastructure
  • No additional dependencies required
  • Offline analysis (no API calls)
  • JSON export for external tools

Negative Consequences

Storage Overhead:

  • Additional ~200 bytes per assistant message
  • token_usage table grows with usage
  • Database size increase ~10-15%

Processing Time:

  • Slightly longer /cx extraction
  • Additional indexing pass for token data
  • Mitigated by incremental processing

Data Accuracy:

  • Only captures assistant messages (not streaming chunks)
  • Cost estimates may drift if pricing changes
  • Some sessions may have missing usage data (older formats)

Risk Mitigation

Missing Usage Data:

  • Graceful handling of messages without usage field
  • Default to 0 for missing values
  • Log warnings for incomplete data

Pricing Changes:

  • Store raw token counts (not just costs)
  • Configurable pricing table
  • Can recalculate costs retroactively
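Because raw token counts are stored alongside costs, a retroactive recalculation is a single UPDATE; a sketch (the table name matches the schema above, the replacement rates are hypothetical):

```python
import sqlite3

# Hypothetical replacement rates, USD per million tokens
NEW_RATES = {"input": 12.00, "output": 60.00,
             "cache_read": 1.20, "cache_create": 15.00}

def recalc(conn: sqlite3.Connection) -> None:
    """Recompute estimated_cost_usd from the stored raw token counts."""
    conn.execute(
        """
        UPDATE token_usage SET estimated_cost_usd =
            input_tokens / 1e6 * :input
            + output_tokens / 1e6 * :output
            + cache_read_input_tokens / 1e6 * :cache_read
            + cache_creation_input_tokens / 1e6 * :cache_create
        """,
        NEW_RATES,
    )
```

The Claude Code cost_usd column is left untouched, so original and recalculated figures can still be compared.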

Database Growth:

  • Views provide aggregated data efficiently
  • Optional archival of old detailed records
  • Indexes optimize query performance

Implementation Plan

Phase 1: Extraction (Priority 1)

  1. Extend unified-message-extractor.py to extract usage fields
  2. Add token_usage to JSONL output format
  3. Backward compatible with existing data

Phase 2: Storage (Priority 1)

  1. Add token_usage table to context.db schema
  2. Create indexes and aggregation views
  3. Migrate existing data (re-run /cx)

Phase 3: Queries (Priority 1)

  1. Add --tokens, --cost options to /cxq
  2. Implement date filtering
  3. Add JSON/CSV output formats

Phase 4: Analysis (Priority 2)

  1. Cache efficiency reporting
  2. Budget alert functionality
  3. Trend visualization data

Validation Criteria

Functional:

  • /cx extracts token usage from JSONL sessions (via unified-message-extractor.py)
  • token_usage table populated during --index (context-db.py index_messages())
  • /cxq --tokens returns session summaries
  • /cxq --cost returns cost breakdowns
  • Date filtering works correctly (--since, --until)
  • JSON output is valid and complete (--json flag)

Additional Commands Implemented:

  • /cxq --by-session - Token usage breakdown by session
  • /cxq --by-project - Token usage breakdown by project
  • /cxq --by-date - Token usage breakdown by date
  • /cxq --by-model - Token usage breakdown by model
  • /cxq --cache-efficiency - Prompt cache statistics
  • /cxq --token-stats - Full token statistics

Performance:

  • Extraction adds <10% overhead to /cx
  • Queries complete in <1s for 50K messages
  • Aggregate views pre-compute common queries (4 views created)

Accuracy:

  • Uses pre-calculated costUSD from Claude Code (most accurate)
  • Token counts match content length proportionally
  • Cache efficiency calculations implemented (cache hit rate, estimated savings)

Upstream (Data Sources)

  • Claude Code JSONL sessions: ~/.claude/projects/
  • costUSD pre-calculated by Claude Code CLI

Downstream (Consumers)

  • /cxq query interface
  • Budget management workflows
  • External analytics tools (via JSON/CSV export)

Integration Points

  • Extends existing /cx extraction pipeline
  • Uses existing context.db database
  • Compatible with existing /cxq commands

Alternatives Considered

Alternative 1: Separate Token Accounting Tool

Pros: Clean separation of concerns
Cons: Duplicate data extraction, different storage
Rejected: Violates DRY principle, fragments tooling

Alternative 2: External Analytics Service

Pros: Advanced visualization, real-time updates
Cons: Cloud dependency, privacy concerns, cost
Rejected: Violates offline-first requirement

Alternative 3: Spreadsheet Export Only

Pros: Simple implementation, familiar tool
Cons: No query capability, manual process
Rejected: Insufficient for programmatic access


Status: ACCEPTED
Decision Date: 2025-12-11
Implementation Status: IMPLEMENTED (Phases 1-3 complete as of 2025-12-12)
Maintainer: CODITECT Core Team
Review Date: 2026-03-11 (quarterly review)


Document Version: 1.1.0
Last Updated: 2025-12-12
Author: CODITECT Architecture Team
Copyright: 2025 AZ1.AI INC. All rights reserved.

Implementation Summary (2025-12-12)

Files Modified:

  • scripts/context-db.py - Added token_usage table, views, indexes, query functions, CLI handlers
  • scripts/unified-message-extractor.py - Added extract_token_usage() and token_usage field extraction

Schema Added:

  • token_usage table with 14 columns (message_id, session_id, project_path, input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, cost_usd, estimated_cost_usd, duration_ms, model, timestamp, indexed_at)
  • 4 indexes for common query patterns
  • 4 aggregate views (by_session, by_project, by_date, by_model)

CLI Commands Added:

  • --tokens / --cost - Token usage summary
  • --by-session - Per-session breakdown
  • --by-project - Per-project breakdown
  • --by-date - Daily usage trend
  • --by-model - Per-model breakdown
  • --cache-efficiency - Cache hit rate and savings
  • --token-stats - Full statistics