ADR 005: Token Accounting System for Cost Analysis and Budget Management
Document: ADR-005-token-accounting
Version: 1.0.0
Purpose: Document architectural decisions for token usage tracking, cost analysis, and budget management via /cx and /cxq integration
Audience: Framework contributors, developers, project managers
Date Created: 2025-12-11
Status: ACCEPTED
Related ADRs:
- ADR-004-context-extraction-command (/cx)
- ADR-CXQ-context-query-system (/cxq)
Related Documents:
- commands/cx.md
- commands/cxq.md
- scripts/unified-message-extractor.py
- scripts/context-db.py
Context and Problem Statement
The Token Visibility Problem
AI-assisted development with Claude Code generates significant token usage across sessions, but developers and organizations lack visibility into:
- Per-session costs - How much each development session actually costs
- Per-project costs - Total token expenditure across all sessions for a project
- Usage patterns - Which types of work consume the most tokens
- Cache efficiency - How effective prompt caching is at reducing costs
- Budget tracking - Whether spending is within acceptable limits
- ROI analysis - Value delivered relative to token costs
Business Impact:
- Unpredictable API costs for organizations
- No data-driven optimization of AI usage patterns
- Inability to justify AI tooling expenditure to stakeholders
- Missing cost allocation for project budgeting
- No visibility into cache effectiveness
Existing Data (Already Available)
Claude Code JSONL Session Files:
Every assistant message in ~/.claude/projects/{project}/sessions/*.jsonl contains a usage field:
{
  "type": "assistant",
  "message": {
    "id": "msg_...",
    "type": "message",
    "role": "assistant",
    "content": [...],
    "usage": {
      "input_tokens": 12847,
      "output_tokens": 3521,
      "cache_read_input_tokens": 89234,
      "cache_creation_input_tokens": 45123
    }
  },
  "costUSD": 0.082437,
  "durationMs": 4523
}
Available Metrics:
- input_tokens - Fresh tokens sent to API (not cached)
- output_tokens - Response tokens generated
- cache_read_input_tokens - Tokens read from prompt cache (90% discount)
- cache_creation_input_tokens - Tokens written to cache (25% premium)
- costUSD - Pre-calculated cost (Claude Code calculates this)
- durationMs - API call duration
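As a sketch of how these fields can be pulled out of a raw session file (the function name and totals structure here are illustrative, not part of the existing extractor):

```python
import json
from collections import Counter
from pathlib import Path

USAGE_KEYS = ('input_tokens', 'output_tokens',
              'cache_read_input_tokens', 'cache_creation_input_tokens')

def summarize_session(path: str) -> Counter:
    """Sum usage across all assistant messages in one session JSONL file.

    User messages and tool results carry no usage block and are skipped;
    missing fields default to 0 for older session formats.
    """
    totals = Counter()
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        usage = record.get('message', {}).get('usage')
        if record.get('type') == 'assistant' and usage:
            for key in USAGE_KEYS:
                totals[key] += usage.get(key, 0)
            totals['cost_usd'] += record.get('costUSD', 0.0)
            totals['duration_ms'] += record.get('durationMs', 0)
    return totals
```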
Requirements
Must-Have (Critical):
- Extract token usage from JSONL sessions during /cx processing
- Store usage data in context.db for querying via /cxq
- Per-session, per-project, and aggregate cost reporting
- Cache efficiency metrics (read/create ratio)
- Date range filtering for budget periods
Should-Have (Important):
- Cost estimation with multiple pricing tiers (Opus/Sonnet/Haiku)
- Comparison with pre-calculated costUSD values
- Export to CSV/JSON for external analysis
- Historical trend visualization data
Nice-to-Have:
- Budget alerts when thresholds exceeded
- Automatic cost optimization recommendations
- Integration with billing systems
Decision Drivers
Technical Constraints
- Must integrate with existing /cx extraction pipeline
- Must use existing SQLite database (context.db)
- Must work offline (no API calls for accounting)
- Python-only implementation (stdlib preference)
User Experience Goals
- Simple query commands (/cxq --tokens, /cxq --cost)
- Default daily/weekly/monthly/project views
- Human-readable output with formatting
- JSON output for programmatic access
Data Quality Requirements
- Handle sessions with missing usage data gracefully
- Validate token counts against message content length
- Preserve original timestamps for accurate date filtering
- Support incremental updates as new sessions are added
Decision Outcome
CHOSEN: Extend /cx and /cxq with Token Accounting Capabilities
Implementation Architecture
┌─────────────────────────────────────────────────────────────┐
│ Token Accounting Data Flow │
└─────────────────────────────────────────────────────────────┘
│
┌───────────────────┴───────────────────┐
│ ~/.claude/projects/{project}/sessions/│
│ *.jsonl (Raw Claude Code sessions) │
└───────────────────┬───────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ /cx (unified-message-extractor.py) │
│ - Extract messages (existing) │
│ - NEW: Extract usage field per message │
│ - NEW: Extract costUSD per message │
│ - NEW: Extract durationMs per call │
│ - Write to unified_messages.jsonl │
└──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ context-db.py --index │
│ ┌────────────────────────────────────┐ │
│ │ NEW: token_usage table │ │
│ ├────────────────────────────────────┤ │
│ │ message_id - Link to messages │ │
│ │ session_id - Session reference │ │
│ │ project_path - Project reference │ │
│ │ input_tokens - Fresh input │ │
│ │ output_tokens - Response output │ │
│ │ cache_read - Cached input │ │
│ │ cache_create - Cache writes │ │
│ │ cost_usd - Calculated cost │ │
│ │ duration_ms - API latency │ │
│ │ timestamp - When occurred │ │
│ │ model - claude-opus-4, etc│ │
│ └────────────────────────────────────┘ │
└──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ /cxq Query Interface │
│ --tokens Summary by session │
│ --cost Cost breakdown │
│ --by-session Per-session detail │
│ --by-project Per-project totals │
│ --by-date Daily aggregation │
│ --cache-efficiency Cache effectiveness │
│ --budget N Alert if > $N │
│ --since/--until Date filtering │
│ --json Programmatic output │
└──────────────────────────────────────────┘
Core Components
1. Token Usage Extraction (unified-message-extractor.py extension)
New Fields to Extract:
from typing import Any, Dict

def extract_token_usage(message: Dict[str, Any]) -> Dict[str, Any]:
    """Extract token usage from a JSONL assistant message.

    Missing fields default to 0 so older session formats without
    usage data are handled gracefully.
    """
    usage = message.get('message', {}).get('usage', {})
    return {
        'input_tokens': usage.get('input_tokens', 0),
        'output_tokens': usage.get('output_tokens', 0),
        'cache_read_input_tokens': usage.get('cache_read_input_tokens', 0),
        'cache_creation_input_tokens': usage.get('cache_creation_input_tokens', 0),
        'cost_usd': message.get('costUSD', 0.0),
        'duration_ms': message.get('durationMs', 0),
        'model': extract_model_from_message(message)  # helper defined elsewhere; e.g. "claude-opus-4-5-20251101"
    }
Enhanced JSONL Message Format:
{
  "hash": "a1b2c3d4...",
  "content": "Full message text...",
  "role": "assistant",
  "provenance": {
    "source_type": "jsonl",
    "source_file": "/path/to/session.jsonl",
    "session_id": "abc123"
  },
  "timestamps": {
    "occurred": "2025-12-11T12:00:00Z",
    "extracted_at": "2025-12-11T19:00:00Z"
  },
  "token_usage": {
    "input_tokens": 12847,
    "output_tokens": 3521,
    "cache_read_input_tokens": 89234,
    "cache_creation_input_tokens": 45123,
    "cost_usd": 0.082437,
    "duration_ms": 4523,
    "model": "claude-opus-4-5-20251101"
  }
}
2. Database Schema Extension (context-db.py)
-- Token usage tracking table
CREATE TABLE token_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    message_id INTEGER,                            -- FK to messages table (nullable for aggregate entries)
    session_id TEXT NOT NULL,                      -- Session identifier
    project_path TEXT,                             -- Project root path
    -- Token counts
    input_tokens INTEGER DEFAULT 0,                -- Fresh tokens (not cached)
    output_tokens INTEGER DEFAULT 0,               -- Generated response tokens
    cache_read_input_tokens INTEGER DEFAULT 0,     -- Tokens read from cache
    cache_creation_input_tokens INTEGER DEFAULT 0, -- Tokens written to cache
    -- Cost tracking
    cost_usd REAL DEFAULT 0.0,                     -- Pre-calculated cost from Claude Code
    estimated_cost_usd REAL,                       -- Our calculated estimate (for validation)
    -- Performance metrics
    duration_ms INTEGER DEFAULT 0,                 -- API call duration
    -- Metadata
    model TEXT,                                    -- Model used (claude-opus-4-5-20251101)
    timestamp TEXT NOT NULL,                       -- When the API call occurred
    indexed_at TEXT DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (message_id) REFERENCES messages(id) ON DELETE SET NULL
);
-- Indexes for common query patterns
CREATE INDEX idx_token_usage_session ON token_usage(session_id);
CREATE INDEX idx_token_usage_project ON token_usage(project_path);
CREATE INDEX idx_token_usage_timestamp ON token_usage(timestamp);
CREATE INDEX idx_token_usage_model ON token_usage(model);
-- Aggregate views for performance
CREATE VIEW token_usage_by_session AS
SELECT
    session_id,
    project_path,
    COUNT(*) as api_calls,
    SUM(input_tokens) as total_input,
    SUM(output_tokens) as total_output,
    SUM(cache_read_input_tokens) as total_cache_read,
    SUM(cache_creation_input_tokens) as total_cache_create,
    SUM(cost_usd) as total_cost,
    SUM(duration_ms) as total_duration,
    MIN(timestamp) as first_call,
    MAX(timestamp) as last_call
FROM token_usage
GROUP BY session_id, project_path;

CREATE VIEW token_usage_by_project AS
SELECT
    project_path,
    COUNT(DISTINCT session_id) as sessions,
    COUNT(*) as api_calls,
    SUM(input_tokens) as total_input,
    SUM(output_tokens) as total_output,
    SUM(cache_read_input_tokens) as total_cache_read,
    SUM(cache_creation_input_tokens) as total_cache_create,
    SUM(cost_usd) as total_cost,
    MIN(timestamp) as first_call,
    MAX(timestamp) as last_call
FROM token_usage
GROUP BY project_path;

CREATE VIEW token_usage_by_date AS
SELECT
    DATE(timestamp) as date,
    COUNT(DISTINCT session_id) as sessions,
    COUNT(*) as api_calls,
    SUM(input_tokens) as total_input,
    SUM(output_tokens) as total_output,
    SUM(cache_read_input_tokens) as total_cache_read,
    SUM(cost_usd) as total_cost
FROM token_usage
GROUP BY DATE(timestamp)
ORDER BY date DESC;
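The table and views above can be exercised end to end with nothing but the standard library. A minimal sketch (trimmed to one table and the per-session view; the sample rows are invented values, not real session data):

```python
import sqlite3

# Trimmed version of the schema above: core columns plus the per-session view.
SCHEMA = """
CREATE TABLE token_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    project_path TEXT,
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    cache_read_input_tokens INTEGER DEFAULT 0,
    cache_creation_input_tokens INTEGER DEFAULT 0,
    cost_usd REAL DEFAULT 0.0,
    timestamp TEXT NOT NULL
);
CREATE VIEW token_usage_by_session AS
SELECT session_id,
       COUNT(*) as api_calls,
       SUM(input_tokens) as total_input,
       SUM(cost_usd) as total_cost
FROM token_usage
GROUP BY session_id;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO token_usage (session_id, input_tokens, cost_usd, timestamp) "
    "VALUES (?, ?, ?, ?)",
    [("abc123", 12847, 0.08, "2025-12-11T12:00:00Z"),
     ("abc123", 9000, 0.05, "2025-12-11T12:05:00Z")])

# The view rolls both API calls into one session-level row.
row = conn.execute(
    "SELECT api_calls, total_input, total_cost FROM token_usage_by_session "
    "WHERE session_id = ?", ("abc123",)).fetchone()
```

Because the views are plain GROUP BY queries, /cxq can read them like tables without duplicating any aggregation logic in Python.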
3. Query Commands (/cxq extensions)
New Command Options:
# Token usage summary
/cxq --tokens # Overall token summary
/cxq --tokens --by-session # Per-session breakdown
/cxq --tokens --by-project # Per-project breakdown
/cxq --tokens --by-date # Daily breakdown
# Cost analysis
/cxq --cost # Total cost summary
/cxq --cost --by-session # Per-session costs
/cxq --cost --by-project # Per-project costs
/cxq --cost --by-date # Daily costs
/cxq --cost --by-model # Costs per model tier
# Cache efficiency
/cxq --cache-efficiency # Cache hit ratio, savings
/cxq --cache-efficiency --by-session # Per-session cache stats
# Date filtering (works with all above)
/cxq --cost --today # Today's costs
/cxq --cost --week # Last 7 days
/cxq --cost --month # Last 30 days
/cxq --cost --since 2025-12-01 # Since specific date
/cxq --cost --until 2025-12-31 # Until specific date
# Budget alerts
/cxq --budget 100 # Alert if monthly cost > $100
/cxq --budget 50 --by-project # Per-project budget check
# Output formats
/cxq --cost --json # JSON output for scripting
/cxq --cost --csv # CSV for spreadsheet import
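The flag surface above maps naturally onto argparse. A sketch of how /cxq might wire the new options (parsing only, no query logic; treating the grouping flags as mutually exclusive is an assumption, not something the spec above mandates):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="cxq")
    # What to report
    p.add_argument("--tokens", action="store_true", help="token usage summary")
    p.add_argument("--cost", action="store_true", help="cost breakdown")
    p.add_argument("--cache-efficiency", action="store_true",
                   help="cache hit ratio and savings")
    # Grouping (assumed mutually exclusive)
    g = p.add_mutually_exclusive_group()
    g.add_argument("--by-session", action="store_true")
    g.add_argument("--by-project", action="store_true")
    g.add_argument("--by-date", action="store_true")
    g.add_argument("--by-model", action="store_true")
    # Date filtering
    p.add_argument("--today", action="store_true")
    p.add_argument("--week", action="store_true")
    p.add_argument("--month", action="store_true")
    p.add_argument("--since", metavar="YYYY-MM-DD")
    p.add_argument("--until", metavar="YYYY-MM-DD")
    # Budget and output format
    p.add_argument("--budget", type=float, metavar="N",
                   help="alert if cost exceeds $N")
    p.add_argument("--json", action="store_true")
    p.add_argument("--csv", action="store_true")
    return p

args = build_parser().parse_args(["--cost", "--by-project", "--since", "2025-12-01"])
```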
Example Output:
$ /cxq --cost --today
Token Cost Summary (2025-12-11)
============================================================
Sessions: 3
API Calls: 127
Total Duration: 23.4 min
Token Breakdown:
Input Tokens: 185,432 (fresh)
Output Tokens: 67,891
Cache Read: 2,341,567 (saved $35.12 via caching)
Cache Creation: 123,456
Cost Breakdown:
Input Cost: $2.78
Output Cost: $10.18
Cache Read Cost: $3.51 (90% discount applied)
Cache Create Cost: $0.46 (25% premium)
─────────────────────
Total Cost: $16.93
Pre-calculated: $16.89 (difference: $0.04)
Cache Efficiency:
Cache Hit Rate: 92.7%
Estimated Savings: $35.12 (67.5%)
============================================================
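The cache hit rate in that report is derivable from two of the stored counters. A sketch of the assumed formula (cached reads over all input context), checked against the sample report's own numbers:

```python
def cache_hit_rate(input_tokens: int, cache_read_tokens: int) -> float:
    """Fraction of input context served from the prompt cache."""
    total_input = input_tokens + cache_read_tokens
    return cache_read_tokens / total_input if total_input else 0.0

# Numbers from the sample report above:
# 185,432 fresh input tokens, 2,341,567 cache-read tokens.
rate = cache_hit_rate(185_432, 2_341_567)  # -> 0.9266..., i.e. 92.7%
```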
$ /cxq --cost --by-project --json
{
  "projects": [
    {
      "project_path": "/Users/hal/PROJECTS/coditect-rollout-master",
      "sessions": 459,
      "api_calls": 47841,
      "input_tokens": 6824731,
      "output_tokens": 19247382,
      "cache_read_tokens": 4200000000,
      "cache_create_tokens": 558000000,
      "total_cost_usd": 9902.47,
      "first_call": "2025-10-15T08:23:41Z",
      "last_call": "2025-12-11T19:45:23Z"
    }
  ],
  "summary": {
    "total_projects": 1,
    "total_sessions": 459,
    "total_cost_usd": 9902.47,
    "generated_at": "2025-12-11T20:00:00Z"
  }
}
4. Cost Calculation Logic
Pricing Model (Claude Opus 4.5 as of Dec 2025):
PRICING = {
    'claude-opus-4-5-20251101': {
        'input_per_1m': 15.00,        # $15/M input tokens
        'output_per_1m': 75.00,       # $75/M output tokens
        'cache_read_per_1m': 1.50,    # $1.50/M (90% discount)
        'cache_create_per_1m': 18.75, # $18.75/M (25% premium)
    },
    'claude-sonnet-4-20250514': {
        'input_per_1m': 3.00,
        'output_per_1m': 15.00,
        'cache_read_per_1m': 0.30,
        'cache_create_per_1m': 3.75,
    },
    'claude-3-5-haiku-20241022': {
        'input_per_1m': 0.80,
        'output_per_1m': 4.00,
        'cache_read_per_1m': 0.08,
        'cache_create_per_1m': 1.00,
    }
}

def calculate_cost(usage: Dict[str, int], model: str) -> float:
    """Calculate estimated cost from token usage."""
    rates = PRICING.get(model, PRICING['claude-opus-4-5-20251101'])
    cost = (
        (usage['input_tokens'] / 1_000_000) * rates['input_per_1m'] +
        (usage['output_tokens'] / 1_000_000) * rates['output_per_1m'] +
        (usage['cache_read_input_tokens'] / 1_000_000) * rates['cache_read_per_1m'] +
        (usage['cache_creation_input_tokens'] / 1_000_000) * rates['cache_create_per_1m']
    )
    return round(cost, 6)

def calculate_cache_savings(usage: Dict[str, int], model: str) -> float:
    """Calculate how much was saved via prompt caching."""
    rates = PRICING.get(model, PRICING['claude-opus-4-5-20251101'])
    # What we would have paid without caching
    full_price = (usage['cache_read_input_tokens'] / 1_000_000) * rates['input_per_1m']
    # What we actually paid (90% discount)
    cached_price = (usage['cache_read_input_tokens'] / 1_000_000) * rates['cache_read_per_1m']
    return round(full_price - cached_price, 6)
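A quick sanity check of the pricing arithmetic, with the Opus rates from the table above inlined so the snippet stands alone (round token counts chosen for easy mental math):

```python
# Opus 4.5 rates from the PRICING table above, per million tokens.
RATES = {'input_per_1m': 15.00, 'output_per_1m': 75.00,
         'cache_read_per_1m': 1.50, 'cache_create_per_1m': 18.75}

usage = {
    'input_tokens': 1_000_000,               # 1.000 * 15.00  = $15.000
    'output_tokens': 100_000,                # 0.100 * 75.00  = $ 7.500
    'cache_read_input_tokens': 2_000_000,    # 2.000 *  1.50  = $ 3.000
    'cache_creation_input_tokens': 500_000,  # 0.500 * 18.75  = $ 9.375
}

cost = (
    usage['input_tokens'] / 1e6 * RATES['input_per_1m'] +
    usage['output_tokens'] / 1e6 * RATES['output_per_1m'] +
    usage['cache_read_input_tokens'] / 1e6 * RATES['cache_read_per_1m'] +
    usage['cache_creation_input_tokens'] / 1e6 * RATES['cache_create_per_1m']
)  # total: $34.875

# Savings: 2M cached tokens billed at $1.50/M instead of $15.00/M.
savings = usage['cache_read_input_tokens'] / 1e6 * (
    RATES['input_per_1m'] - RATES['cache_read_per_1m'])  # $27.00
```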
Storage Specifications
Token Usage in unified_messages.jsonl:
{
  "hash": "sha256...",
  "content": "...",
  "role": "assistant",
  "provenance": {...},
  "timestamps": {...},
  "token_usage": {
    "input_tokens": 12847,
    "output_tokens": 3521,
    "cache_read_input_tokens": 89234,
    "cache_creation_input_tokens": 45123,
    "cost_usd": 0.082437,
    "duration_ms": 4523,
    "model": "claude-opus-4-5-20251101"
  }
}
Note: Only assistant messages contain usage data. User messages and tool calls do not have token counts.
Consequences
Positive Consequences
Cost Visibility:
- Full transparency into AI development costs
- Per-project cost tracking for budgeting
- Historical trends for forecasting
- Cache efficiency optimization opportunities
Budget Management:
- Alerts when spending exceeds thresholds
- Data for cost allocation to projects/teams
- Evidence for ROI calculations
- Justification for AI tooling investment
Performance Insights:
- API latency tracking (duration_ms)
- Cache hit ratio optimization
- Model usage patterns
- Session efficiency comparison
Integration Benefits:
- Builds on existing /cx and /cxq infrastructure
- No additional dependencies required
- Offline analysis (no API calls)
- JSON export for external tools
Negative Consequences
Storage Overhead:
- Additional ~200 bytes per assistant message
- token_usage table grows with usage
- Database size increase ~10-15%
Processing Time:
- Slightly longer /cx extraction
- Additional indexing pass for token data
- Mitigated by incremental processing
Data Accuracy:
- Only captures assistant messages (not streaming chunks)
- Cost estimates may drift if pricing changes
- Some sessions may have missing usage data (older formats)
Risk Mitigation
Missing Usage Data:
- Graceful handling of messages without usage field
- Default to 0 for missing values
- Log warnings for incomplete data
Pricing Changes:
- Store raw token counts (not just costs)
- Configurable pricing table
- Can recalculate costs retroactively
Database Growth:
- Views provide aggregated data efficiently
- Optional archival of old detailed records
- Indexes optimize query performance
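The "recalculate costs retroactively" mitigation can be a single UPDATE per model, since the raw token counts are stored. A sketch against the token_usage table (stdlib sqlite3 only; the function name is illustrative, and Claude Code's own cost_usd is deliberately left untouched so the two figures can still be compared):

```python
import sqlite3

def recalculate_estimates(conn: sqlite3.Connection, pricing: dict) -> None:
    """Recompute estimated_cost_usd from stored raw token counts,
    e.g. after a pricing change. pricing maps model name to the
    per-million rates used in the PRICING table earlier."""
    for model, r in pricing.items():
        conn.execute(
            """UPDATE token_usage SET estimated_cost_usd =
                   input_tokens / 1e6 * :inp +
                   output_tokens / 1e6 * :out +
                   cache_read_input_tokens / 1e6 * :cread +
                   cache_creation_input_tokens / 1e6 * :ccreate
               WHERE model = :model""",
            {"inp": r['input_per_1m'], "out": r['output_per_1m'],
             "cread": r['cache_read_per_1m'],
             "ccreate": r['cache_create_per_1m'], "model": model})
    conn.commit()
```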
Implementation Plan
Phase 1: Extraction (Priority 1)
- Extend unified-message-extractor.py to extract usage fields
- Add token_usage to JSONL output format
- Backward compatible with existing data
Phase 2: Storage (Priority 1)
- Add token_usage table to context.db schema
- Create indexes and aggregation views
- Migrate existing data (re-run /cx)
Phase 3: Queries (Priority 1)
- Add --tokens, --cost options to /cxq
- Implement date filtering
- Add JSON/CSV output formats
Phase 4: Analysis (Priority 2)
- Cache efficiency reporting
- Budget alert functionality
- Trend visualization data
Validation Criteria
Functional:
- /cx extracts token usage from JSONL sessions (via unified-message-extractor.py)
- token_usage table populated during --index (context-db.py index_messages())
- /cxq --tokens returns session summaries
- /cxq --cost returns cost breakdowns
- Date filtering works correctly (--since, --until)
- JSON output is valid and complete (--json flag)
Additional Commands Implemented:
- /cxq --by-session - Token usage breakdown by session
- /cxq --by-project - Token usage breakdown by project
- /cxq --by-date - Token usage breakdown by date
- /cxq --by-model - Token usage breakdown by model
- /cxq --cache-efficiency - Prompt cache statistics
- /cxq --token-stats - Full token statistics
Performance:
- Extraction adds <10% overhead to /cx
- Queries complete in <1s for 50K messages
- Aggregate views pre-compute common queries (4 views created)
Accuracy:
- Uses pre-calculated costUSD from Claude Code (most accurate)
- Token counts match content length proportionally
- Cache efficiency calculations implemented (cache hit rate, estimated savings)
Related Systems
Upstream (Data Sources)
- Claude Code JSONL sessions: ~/.claude/projects/
- costUSD pre-calculated by Claude Code CLI
Downstream (Consumers)
- /cxq query interface
- Budget management workflows
- External analytics tools (via JSON/CSV export)
Integration Points
- Extends existing /cx extraction pipeline
- Uses existing context.db database
- Compatible with existing /cxq commands
Alternatives Considered
Alternative 1: Separate Token Accounting Tool
Pros: Clean separation of concerns
Cons: Duplicate data extraction, different storage
Rejected: Violates DRY principle, fragments tooling
Alternative 2: External Analytics Service
Pros: Advanced visualization, real-time updates
Cons: Cloud dependency, privacy concerns, cost
Rejected: Violates offline-first requirement
Alternative 3: Spreadsheet Export Only
Pros: Simple implementation, familiar tool
Cons: No query capability, manual process
Rejected: Insufficient for programmatic access
References
Documentation
Related ADRs
External References
- Anthropic API Pricing: https://www.anthropic.com/pricing
- Prompt Caching Documentation: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching
Status: ACCEPTED
Decision Date: 2025-12-11
Implementation Status: IMPLEMENTED (Phases 1-3 complete as of 2025-12-12)
Maintainer: CODITECT Core Team
Review Date: 2026-03-11 (quarterly review)

Document Version: 1.1.0
Last Updated: 2025-12-12
Author: CODITECT Architecture Team
Copyright: 2025 AZ1.AI INC. All rights reserved.
Implementation Summary (2025-12-12)
Files Modified:
- scripts/context-db.py - Added token_usage table, views, indexes, query functions, CLI handlers
- scripts/unified-message-extractor.py - Added extract_token_usage() and token_usage field extraction
Schema Added:
- token_usage table with 14 columns (id, message_id, session_id, project_path, input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, cost_usd, estimated_cost_usd, duration_ms, model, timestamp, indexed_at)
- 4 indexes for common query patterns
- 4 aggregate views (by_session, by_project, by_date, by_model)
CLI Commands Added:
- --tokens / --cost - Token usage summary
- --by-session - Per-session breakdown
- --by-project - Per-project breakdown
- --by-date - Daily usage trend
- --by-model - Per-model breakdown
- --cache-efficiency - Cache hit rate and savings
- --token-stats - Full statistics