ADR-111: Token Economics Instrumentation
Status
Proposed
Context
Problem Statement
Multi-agent autonomous development has a 15x token multiplier compared to single-agent workflows. Without instrumentation:
- Costs are unpredictable and can spike unexpectedly
- Budget overruns are detected too late
- Cost attribution per task/agent is impossible
- Optimization opportunities are invisible
- Enterprise customers cannot forecast spend
Key Insight from Ralph Wiggum Analysis
"Token economics matter—15x multiplier for multi-agent means cost awareness is essential."
Current State
- No per-agent token tracking
- No per-task cost attribution
- No budget enforcement
- No cost forecasting
- No real-time cost visibility
- No throttling on budget exceeded
Decision
Implement comprehensive token economics instrumentation that tracks consumption per agent/task/iteration, enforces budget limits with auto-throttling, enables cost projection, and supports enterprise chargeback.
1. Token Tracking Schema
```yaml
token_record:
  record_id: string              # UUID
  timestamp: datetime            # When consumption occurred
  context:
    organization_id: string      # For multi-tenant
    project_id: string           # Project/workspace
    task_id: string              # Parent task
    agent_id: string             # Executing agent
    iteration: integer           # Loop iteration
    checkpoint_id: string        # Associated checkpoint
  consumption:
    model: string                # claude-opus-4-5, claude-sonnet-4-5, etc.
    input_tokens: integer        # Prompt tokens
    output_tokens: integer       # Completion tokens
    cache_read_tokens: integer   # Prompt cache hits
    cache_write_tokens: integer  # Prompt cache writes
    total_tokens: integer        # Sum of all
  cost:
    input_cost_usd: decimal      # Calculated cost
    output_cost_usd: decimal
    cache_cost_usd: decimal
    total_cost_usd: decimal
  metadata:
    tool_calls: array            # Tools invoked in this call
    latency_ms: integer          # API response time
    success: boolean             # Whether call succeeded
    error_type: string           # If failed, error category
```
2. Budget Hierarchy
```yaml
budgets:
  organization:
    monthly_limit_usd: 10000
    alert_threshold_percent: 80
    hard_limit_action: "throttle"   # throttle, pause, alert_only
  project:
    daily_limit_usd: 500
    task_limit_usd: 50
    alert_threshold_percent: 75
  task:
    max_tokens: 1000000
    max_iterations: 50
    max_cost_usd: 100
    per_iteration_limit_tokens: 50000
  agent:
    max_tokens_per_call: 100000
    max_cost_per_call_usd: 5
    rate_limit_calls_per_minute: 60

inheritance:
  # A lower level cannot exceed its parent:
  # organization → project → task → agent
```
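The inheritance rule can be sketched as a tree-walk validation. The `BudgetNode` shape below is hypothetical, introduced only for illustration; it is not part of the schema above.

```typescript
// Sketch of the inheritance rule: a child budget may not exceed its parent.
// BudgetNode is a hypothetical shape, not part of the ADR's schema.
interface BudgetNode {
  scope: string;        // organization, project, task, agent
  limitUsd: number;     // spending limit at this level
  children: BudgetNode[];
}

// Returns the scopes whose limit exceeds their parent's limit.
function findBudgetViolations(node: BudgetNode, parentLimit = Infinity): string[] {
  const violations: string[] = node.limitUsd > parentLimit ? [node.scope] : [];
  for (const child of node.children) {
    violations.push(...findBudgetViolations(child, node.limitUsd));
  }
  return violations;
}

const tree: BudgetNode = {
  scope: "organization", limitUsd: 10000,
  children: [{
    scope: "project", limitUsd: 500,
    children: [{ scope: "task", limitUsd: 600, children: [] }],
  }],
};
const violations = findBudgetViolations(tree);
// Flags "task": its $600 limit exceeds the project's $500 limit.
```

A check like this would run whenever budgets are created or updated, rejecting configurations that violate the hierarchy.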
3. Budget Enforcement Protocol
```text
PRE-CALL CHECK:
1. Retrieve current consumption for context
2. Estimate call cost (based on prompt size + expected output)
3. Check against all applicable budgets:
   ├── Agent call limit
   ├── Task iteration limit
   ├── Task total limit
   ├── Project daily limit
   └── Organization monthly limit
4. If ANY limit would be exceeded:
   ├── If hard limit: Block call, return budget_exceeded error
   ├── If soft limit: Log warning, proceed with monitoring
   └── If alert threshold: Notify but proceed

POST-CALL RECORDING:
1. Record actual consumption
2. Update running totals
3. Check if any threshold newly crossed
4. Emit events for alerts/dashboard

THROTTLING BEHAVIOR:
├── Approach threshold (80%): Emit warning event
├── At threshold (100%):
│   ├── throttle: Delay calls by 1s, then 2s, then 4s...
│   ├── pause: Block all calls until budget reset/increase
│   └── alert_only: Log and continue
└── Override available for emergency/human approval
```
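The pre-call check can be sketched as a pure function over the applicable limits. The names below (`BudgetCheck`, `preCallCheck`) are illustrative, not a fixed API; the limits would come from whatever budget store is in use.

```typescript
// Sketch of the PRE-CALL CHECK above. All names are illustrative.
interface BudgetCheck { name: string; limitUsd: number; spentUsd: number; hard: boolean }

type Verdict =
  | { allow: true; warnings: string[] }
  | { allow: false; reason: string };  // corresponds to budget_exceeded

function preCallCheck(estimatedCostUsd: number, checks: BudgetCheck[], alertPct = 80): Verdict {
  const warnings: string[] = [];
  for (const c of checks) {
    const projected = c.spentUsd + estimatedCostUsd;
    if (projected > c.limitUsd) {
      // Hard limit: block the call. Soft limit: warn and proceed.
      if (c.hard) return { allow: false, reason: `budget_exceeded: ${c.name}` };
      warnings.push(`soft limit exceeded: ${c.name}`);
    } else if (projected >= c.limitUsd * (alertPct / 100)) {
      warnings.push(`approaching limit: ${c.name}`);
    }
  }
  return { allow: true, warnings };
}

// A call estimated at $2 against a $50 hard task budget with $49 already spent:
const verdict = preCallCheck(2, [
  { name: "task_total", limitUsd: 50, spentUsd: 49, hard: true },
]);
// Blocked: 49 + 2 exceeds the hard task limit of 50.
```

In production the projected-spend update would need to be atomic (see the budget race condition risk below the Risks table), but the decision logic itself is this simple.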
4. Pricing Model (Configurable)
```typescript
interface ModelPricing {
  model: string;
  inputPerMillionTokens: number;   // USD
  outputPerMillionTokens: number;  // USD
  cacheReadPerMillionTokens: number;
  cacheWritePerMillionTokens: number;
}

const PRICING: ModelPricing[] = [
  {
    model: "claude-opus-4-5-20251101",
    inputPerMillionTokens: 15.00,
    outputPerMillionTokens: 75.00,
    cacheReadPerMillionTokens: 1.50,
    cacheWritePerMillionTokens: 18.75
  },
  {
    model: "claude-sonnet-4-5-20250929",
    inputPerMillionTokens: 3.00,
    outputPerMillionTokens: 15.00,
    cacheReadPerMillionTokens: 0.30,
    cacheWritePerMillionTokens: 3.75
  },
  {
    model: "claude-haiku-4-5-20251001",
    inputPerMillionTokens: 0.80,
    outputPerMillionTokens: 4.00,
    cacheReadPerMillionTokens: 0.08,
    cacheWritePerMillionTokens: 1.00
  }
];

function calculateCost(record: TokenConsumption): CostBreakdown {
  const pricing = PRICING.find(p => p.model === record.model);
  if (!pricing) {
    throw new Error(`No pricing configured for model: ${record.model}`);
  }
  const input = (record.input_tokens / 1_000_000) * pricing.inputPerMillionTokens;
  const output = (record.output_tokens / 1_000_000) * pricing.outputPerMillionTokens;
  const cacheRead = (record.cache_read_tokens / 1_000_000) * pricing.cacheReadPerMillionTokens;
  const cacheWrite = (record.cache_write_tokens / 1_000_000) * pricing.cacheWritePerMillionTokens;
  return { input, output, cacheRead, cacheWrite, total: input + output + cacheRead + cacheWrite };
}
```
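To make the per-million arithmetic concrete, here is a worked example using the claude-sonnet-4-5 rates from the table ($3/M input, $15/M output); the token counts are illustrative:

```typescript
// Worked example of the per-million-token pricing arithmetic:
// a call with 10,000 input tokens and 2,000 output tokens on claude-sonnet-4-5.
const inputCost = (10_000 / 1_000_000) * 3.0;   // ≈ $0.03
const outputCost = (2_000 / 1_000_000) * 15.0;  // ≈ $0.03
const total = inputCost + outputCost;           // ≈ $0.06
console.log(total.toFixed(2)); // "0.06"
```

Note that output tokens dominate cost at a 5:1 price ratio, which is why the output/input ratio appears among the efficiency metrics below.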
5. Aggregation and Efficiency Metrics
```yaml
real_time_aggregations:   # Updated on each record
  - per_agent_running_total
  - per_task_running_total
  - per_project_daily_total
  - per_organization_monthly_total
  - global_system_total

periodic_rollups:         # Hourly, daily, monthly
  - task_summaries
  - project_summaries
  - model_usage_breakdown
  - agent_efficiency_metrics
  - cost_trend_analysis

efficiency_metrics:
  tokens_per_tool_call: total_tokens / tool_calls
  cost_per_iteration: total_cost / iterations
  cache_hit_rate: cache_read / (cache_read + input)
  output_input_ratio: output_tokens / input_tokens
  cost_per_completed_task: total_cost / successful_tasks
```
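The efficiency ratios above are straightforward divisions over aggregated totals. A sketch, using illustrative numbers and field names taken from the token record schema:

```typescript
// Efficiency metrics computed from aggregated totals (illustrative numbers).
const totals = {
  total_tokens: 120_000, tool_calls: 40, total_cost_usd: 1.8, iterations: 6,
  cache_read_tokens: 30_000, input_tokens: 70_000, output_tokens: 20_000,
};

const tokensPerToolCall = totals.total_tokens / totals.tool_calls;  // 3000
const costPerIteration = totals.total_cost_usd / totals.iterations; // ≈ $0.30
const cacheHitRate =
  totals.cache_read_tokens / (totals.cache_read_tokens + totals.input_tokens); // 0.3
const outputInputRatio = totals.output_tokens / totals.input_tokens; // ≈ 0.29
```

A low cache hit rate on a long-running task, for example, suggests prompts are not structured to reuse the cached prefix, which the cache read/write pricing above makes worth fixing.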
6. Forecasting
```yaml
forecasting:
  short_term:    # Task-level
    based_on: "Current consumption rate, remaining work estimate"
    formula: "current_rate * estimated_remaining_iterations"
    confidence: "Based on variance in iteration costs"
    update_frequency: "Every iteration"
  medium_term:   # Project-level
    based_on: "Historical project patterns, active task count"
    formula: "avg_task_cost * remaining_tasks"
    considers: "Task complexity estimates"
    update_frequency: "Daily"
  long_term:     # Organization-level
    based_on: "Monthly trends, growth rate, planned projects"
    formula: "Exponential smoothing of historical data"
    considers: "Seasonality, team growth"
    update_frequency: "Weekly"
```
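The short-term formula can be sketched directly: mean iteration cost times estimated remaining iterations, with a spread derived from the variance in observed iteration costs. The spread calculation here is a deliberately crude plus/minus one standard deviation, standing in for whatever confidence model is ultimately chosen:

```typescript
// Sketch of the task-level forecast: current_rate * estimated_remaining_iterations,
// with a crude +/- one-stddev spread as the confidence band.
function forecastTaskCost(iterationCostsUsd: number[], remainingIterations: number) {
  const n = iterationCostsUsd.length;
  const mean = iterationCostsUsd.reduce((a, b) => a + b, 0) / n;
  const variance = iterationCostsUsd.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
  const stddev = Math.sqrt(variance);
  return {
    expectedUsd: mean * remainingIterations,
    lowUsd: Math.max(0, (mean - stddev) * remainingIterations),
    highUsd: (mean + stddev) * remainingIterations,
  };
}

// Five observed iterations averaging $0.50, ten iterations remaining:
const forecast = forecastTaskCost([0.4, 0.5, 0.6, 0.5, 0.5], 10);
// forecast.expectedUsd ≈ $5.00, with low/high bounds around it
```

Because it reruns every iteration, the band tightens as more iteration costs are observed, which is what makes the FORECAST_WARNING event (defined below) actionable rather than noisy.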
7. Database Schema
Database Architecture Note (ADR-002, ADR-089):
- Local: SQLite (context.db) for offline operation and immediate recording
- Cloud: PostgreSQL with RLS for multi-tenant isolation and aggregations
- Sync via cursor-based polling (ADR-053)
Local SQLite (context.db):
```sql
CREATE TABLE token_records (
  id TEXT PRIMARY KEY,
  timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  task_id TEXT,
  agent_id TEXT,
  iteration INTEGER,
  checkpoint_id TEXT,
  model TEXT NOT NULL,
  input_tokens INTEGER DEFAULT 0,
  output_tokens INTEGER DEFAULT 0,
  cache_read_tokens INTEGER DEFAULT 0,
  cache_write_tokens INTEGER DEFAULT 0,
  total_tokens INTEGER GENERATED ALWAYS AS
    (input_tokens + output_tokens + cache_read_tokens + cache_write_tokens) STORED,
  total_cost_usd REAL,
  tool_calls TEXT,            -- JSON array
  latency_ms INTEGER,
  success INTEGER DEFAULT 1,
  synced INTEGER DEFAULT 0
);

CREATE TABLE budgets (
  id TEXT PRIMARY KEY,
  scope TEXT NOT NULL,        -- organization, project, task, agent
  scope_id TEXT NOT NULL,
  limit_type TEXT NOT NULL,   -- tokens, cost_usd, calls
  limit_value REAL NOT NULL,
  current_value REAL DEFAULT 0,
  period TEXT,                -- hourly, daily, monthly, total
  action TEXT DEFAULT 'alert_only',  -- throttle, pause, alert_only
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_token_records_task ON token_records(task_id, timestamp DESC);
CREATE INDEX idx_token_records_unsynced ON token_records(synced) WHERE synced = 0;
```
Cloud PostgreSQL (with RLS):
```sql
CREATE TABLE token_records (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL REFERENCES organizations(id),
  project_id UUID REFERENCES projects(id),
  task_id UUID REFERENCES tasks(id),
  agent_id TEXT NOT NULL,
  iteration INTEGER DEFAULT 1,
  checkpoint_id UUID REFERENCES checkpoints(id),
  model TEXT NOT NULL,
  input_tokens INTEGER DEFAULT 0,
  output_tokens INTEGER DEFAULT 0,
  cache_read_tokens INTEGER DEFAULT 0,
  cache_write_tokens INTEGER DEFAULT 0,
  total_tokens INTEGER GENERATED ALWAYS AS
    (input_tokens + output_tokens + cache_read_tokens + cache_write_tokens) STORED,
  input_cost_usd DECIMAL(10,6),
  output_cost_usd DECIMAL(10,6),
  cache_cost_usd DECIMAL(10,6),
  total_cost_usd DECIMAL(10,6),
  tool_calls JSONB,
  latency_ms INTEGER,
  success BOOLEAN DEFAULT TRUE,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Row-Level Security
ALTER TABLE token_records ENABLE ROW LEVEL SECURITY;

CREATE POLICY org_isolation_tokens ON token_records
  FOR ALL USING (
    organization_id IN (
      SELECT organization_id FROM organization_members
      WHERE user_id = current_setting('app.current_user_id')::UUID
    )
  );

-- Budgets with hierarchy
CREATE TABLE budgets (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL REFERENCES organizations(id),
  project_id UUID REFERENCES projects(id),
  task_id UUID REFERENCES tasks(id),
  scope TEXT NOT NULL,
  limit_type TEXT NOT NULL,
  limit_value DECIMAL(12,2) NOT NULL,
  current_value DECIMAL(12,2) DEFAULT 0,
  period TEXT,
  action TEXT DEFAULT 'alert_only',
  alert_threshold_percent INTEGER DEFAULT 80,
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE budgets ENABLE ROW LEVEL SECURITY;

-- Materialized view for aggregations (refreshed periodically)
CREATE MATERIALIZED VIEW token_aggregations AS
SELECT
  organization_id,
  project_id,
  DATE_TRUNC('day', created_at) AS period_day,
  model,
  SUM(total_tokens) AS total_tokens,
  SUM(total_cost_usd) AS total_cost,
  COUNT(*) AS call_count,
  AVG(latency_ms) AS avg_latency_ms
FROM token_records
GROUP BY organization_id, project_id, DATE_TRUNC('day', created_at), model;

CREATE UNIQUE INDEX idx_token_agg
  ON token_aggregations(organization_id, project_id, period_day, model);
```
8. Event Definitions
```typescript
type TokenEconomicsEvents =
  | { type: 'TOKEN_RECORDED'; payload: {
      recordId: string; taskId: string; tokens: number; cost: number;
    }}
  | { type: 'BUDGET_THRESHOLD_CROSSED'; payload: {
      context: string; threshold: number; current: number;
    }}
  | { type: 'BUDGET_EXHAUSTED'; payload: {
      context: string; limit: number; action: string;
    }}
  | { type: 'THROTTLE_ACTIVATED'; payload: {
      context: string; delayMs: number;
    }}
  | { type: 'CONSUMPTION_SPIKE'; payload: {
      context: string; rate: number; threshold: number;
    }}
  | { type: 'FORECAST_WARNING'; payload: {
      context: string; forecastedCost: number; budget: number;
    }};
```
9. Configuration Schema
```yaml
token_economics:
  enabled: true
  recording:
    buffer_size: 100
    flush_interval_ms: 1000
    retry_attempts: 3
  pricing:
    source: "config"          # config, api, or external_service
    update_interval: "daily"
  budgets:
    default_organization_monthly_usd: 10000
    default_project_daily_usd: 500
    default_task_max_usd: 100
    alert_threshold_percent: 80
    hard_limit_action: "throttle"
  throttling:
    initial_delay_ms: 1000
    max_delay_ms: 60000
    backoff_multiplier: 2
  forecasting:
    task_lookback_iterations: 10
    project_lookback_days: 30
    confidence_level: 0.9
  alerting:
    channels:
      - type: "slack"
        webhook_url: "${SLACK_WEBHOOK_URL}"
        events: ["budget_threshold", "budget_exhausted", "spike"]
      - type: "pagerduty"
        routing_key: "${PAGERDUTY_KEY}"
        events: ["budget_exhausted"]
  dashboard:
    refresh_interval_ms: 5000
    retention_days: 90
```
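The throttling settings (`initial_delay_ms: 1000`, `backoff_multiplier: 2`, `max_delay_ms: 60000`) imply the following exponential backoff schedule; the function name is illustrative:

```typescript
// Backoff schedule implied by the throttling config above:
// delay doubles per throttled attempt, capped at max_delay_ms.
function throttleDelayMs(attempt: number, initial = 1000, mult = 2, max = 60000): number {
  return Math.min(initial * mult ** attempt, max);
}

const schedule = [0, 1, 2, 3, 4, 5, 6].map(a => throttleDelayMs(a));
console.log(schedule); // 1000, 2000, 4000, 8000, 16000, 32000, 60000 (capped)
```

The cap is reached on the seventh throttled attempt (2^6 × 1000 = 64000 ms, clamped to 60000), so a sustained over-budget task settles at one call per minute rather than backing off indefinitely.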
Consequences
Positive
- Cost Visibility - Real-time tracking of all token consumption
- Budget Control - Automatic enforcement prevents overruns
- Cost Attribution - Per-task/agent chargeback for enterprise
- Optimization - Efficiency metrics identify improvement opportunities
- Forecasting - Predictable costs enable planning
- Throttling - Graceful degradation when limits approached
Negative
- Recording Overhead - Each API call has tracking overhead (~10ms)
- Storage Growth - Token records accumulate (mitigated by retention)
- Complexity - Budget checks on every call path
- Pricing Maintenance - Must track model pricing changes
Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Pricing changes | Medium | Incorrect costs | Configurable pricing, alerts on API changes |
| High-volume recording impact | Medium | Performance degradation | Async buffered writes |
| Budget race conditions | Low | Over-budget execution | Atomic budget checks with reservations |
| Forecast inaccuracy | Medium | Poor planning | Confidence intervals, multiple models |
Performance Requirements
| Metric | Target |
|---|---|
| Token recording latency | < 10ms |
| Budget check latency | < 50ms |
| Aggregation query latency | < 100ms |
| Recording throughput | 10,000+ records/minute |
| Dashboard refresh | < 5s |
| Token tracking accuracy | > 99% |
| Budget enforcement latency | < 1s |
| Cost projection accuracy | ±10% |
Implementation Phases
Phase 1: Token Recording (Week 1)
- Token record schema definition
- Recording service with buffering
- API client wrapper for consumption capture
Phase 2: Budget System (Week 2)
- Budget configuration and hierarchy
- Budget enforcement service
- Running totals with real-time updates
Phase 3: Aggregation and Reporting (Week 3)
- Aggregation service (real-time + periodic)
- Query API for consumption data
- Forecasting engine
Phase 4: Dashboard and Alerts (Week 4)
- Dashboard data service (WebSocket updates)
- Alert evaluation and notification
- Testing and documentation
Related ADRs
- ADR-002: PostgreSQL as Primary Database (cloud storage with RLS)
- ADR-053: Cloud Context Sync Architecture (local-to-cloud sync)
- ADR-089: Two-Database Architecture (SQLite local + PostgreSQL cloud)
- ADR-108: Checkpoint Protocol (token metrics in checkpoint)
- ADR-109: Browser Automation (browser test cost attribution)
- ADR-110: Health Monitoring (consumption anomaly detection)
- ADR-112: Ralph Wiggum Database Architecture (consolidates DB decisions)
Glossary
| Term | Definition |
|---|---|
| Token | Basic unit of text processed by LLMs; approximately 4 characters or 0.75 words in English |
| Input Token | Token sent to the model as part of the prompt (context, instructions, user message) |
| Output Token | Token generated by the model in its response |
| Cache Token | Token retrieved from or written to prompt cache, reducing API costs |
| Budget | Spending limit configured at organization, project, task, or agent level |
| Throttling | Slowing request rate when approaching budget limits with exponential backoff |
| Chargeback | Attribution of costs to specific organizational units for billing |
| Forecasting | Predicting future token consumption based on historical patterns |
| Aggregation | Combining token records into summary statistics (hourly, daily, monthly) |
| RLS | Row-Level Security - PostgreSQL feature for automatic tenant data isolation |
| Materialized View | Pre-computed query result stored in database for fast aggregation queries |
| JSONB | PostgreSQL binary JSON type with indexing support for efficient queries |
| Exponential Backoff | Delay strategy that doubles wait time between retries (1s, 2s, 4s, ...) |
| WebSocket | Full-duplex communication protocol for real-time updates |
ADR-111 | Created: 2026-01-24 | Status: Proposed