ADR-111: Token Economics Instrumentation

Status

Proposed

Context

Problem Statement

Multi-agent autonomous development has a 15x token multiplier compared to single-agent workflows. Without instrumentation:

  • Costs are unpredictable and can spike unexpectedly
  • Budget overruns are detected too late
  • Cost attribution per task/agent is impossible
  • Optimization opportunities are invisible
  • Enterprise customers cannot forecast spend

Key Insight from Ralph Wiggum Analysis

"Token economics matter—15x multiplier for multi-agent means cost awareness is essential."

Current State

  • No per-agent token tracking
  • No per-task cost attribution
  • No budget enforcement
  • No cost forecasting
  • No real-time cost visibility
  • No throttling on budget exceeded

Decision

Implement comprehensive token economics instrumentation that tracks consumption per agent/task/iteration, enforces budget limits with auto-throttling, enables cost projection, and supports enterprise chargeback.

1. Token Tracking Schema

token_record:
  record_id: string             # UUID
  timestamp: datetime           # When consumption occurred

  context:
    organization_id: string     # For multi-tenant
    project_id: string          # Project/workspace
    task_id: string             # Parent task
    agent_id: string            # Executing agent
    iteration: integer          # Loop iteration
    checkpoint_id: string       # Associated checkpoint

  consumption:
    model: string               # claude-opus-4-5, claude-sonnet-4-5, etc.
    input_tokens: integer       # Prompt tokens
    output_tokens: integer      # Completion tokens
    cache_read_tokens: integer  # Prompt cache hits
    cache_write_tokens: integer # Prompt cache writes
    total_tokens: integer       # Sum of all

  cost:
    input_cost_usd: decimal     # Calculated cost
    output_cost_usd: decimal
    cache_cost_usd: decimal
    total_cost_usd: decimal

  metadata:
    tool_calls: array           # Tools invoked in this call
    latency_ms: integer         # API response time
    success: boolean            # Whether call succeeded
    error_type: string          # If failed, error category
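The schema above can be mirrored as a TypeScript type for use in the recording pipeline. This is an illustrative sketch: the field names follow the YAML schema, but the type names (`TokenConsumption`, `TokenRecord`) and the invariant-check helper are assumptions, not part of the spec.

```typescript
// Illustrative TypeScript mirror of the token record schema above.
interface TokenConsumption {
  model: string;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  cache_write_tokens: number;
  total_tokens: number;          // sum of the four counts above
}

interface TokenRecord {
  record_id: string;
  timestamp: string;             // ISO-8601 datetime
  context: {
    organization_id: string;
    project_id: string;
    task_id: string;
    agent_id: string;
    iteration: number;
    checkpoint_id: string;
  };
  consumption: TokenConsumption;
  cost: {
    input_cost_usd: number;
    output_cost_usd: number;
    cache_cost_usd: number;
    total_cost_usd: number;
  };
  metadata: {
    tool_calls: string[];
    latency_ms: number;
    success: boolean;
    error_type?: string;         // only present when success is false
  };
}

// Invariant: total_tokens must equal the sum of its components.
function hasConsistentTotals(c: TokenConsumption): boolean {
  return c.total_tokens ===
    c.input_tokens + c.output_tokens + c.cache_read_tokens + c.cache_write_tokens;
}
```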

2. Budget Hierarchy

budgets:
  organization:
    monthly_limit_usd: 10000
    alert_threshold_percent: 80
    hard_limit_action: "throttle"  # throttle, pause, alert_only

  project:
    daily_limit_usd: 500
    task_limit_usd: 50
    alert_threshold_percent: 75

  task:
    max_tokens: 1000000
    max_iterations: 50
    max_cost_usd: 100
    per_iteration_limit_tokens: 50000

  agent:
    max_tokens_per_call: 100000
    max_cost_per_call_usd: 5
    rate_limit_calls_per_minute: 60

inheritance:
  # Lower level cannot exceed parent
  organization → project → task → agent
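The inheritance rule (a lower level cannot exceed its parent) amounts to taking the minimum limit along the chain of ancestors. A minimal sketch, assuming a hypothetical `BudgetNode` shape and `effectiveLimit` helper not defined by this ADR:

```typescript
// Sketch of budget inheritance: the effective limit at any level is the
// minimum of its own configured limit and every ancestor's limit.
interface BudgetNode {
  limit_usd: number;
  parent?: BudgetNode;
}

function effectiveLimit(node: BudgetNode): number {
  let limit = node.limit_usd;
  for (let p = node.parent; p !== undefined; p = p.parent) {
    limit = Math.min(limit, p.limit_usd);
  }
  return limit;
}
```

For example, a task misconfigured with a $800 limit under a $500/day project is clamped to $500 at check time rather than rejected at configuration time; either policy is workable, this sketch shows the clamping variant.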

3. Budget Enforcement Protocol

PRE-CALL CHECK:
  1. Retrieve current consumption for context
  2. Estimate call cost (based on prompt size + expected output)
  3. Check against all applicable budgets:
     ├── Agent call limit
     ├── Task iteration limit
     ├── Task total limit
     ├── Project daily limit
     └── Organization monthly limit
  4. If ANY limit would be exceeded:
     ├── If hard limit: Block call, return budget_exceeded error
     ├── If soft limit: Log warning, proceed with monitoring
     └── If alert threshold: Notify but proceed

POST-CALL RECORDING:
  1. Record actual consumption
  2. Update running totals
  3. Check if any threshold newly crossed
  4. Emit events for alerts/dashboard

THROTTLING BEHAVIOR:
  ├── Approach threshold (80%): Emit warning event
  ├── At threshold (100%):
  │   ├── throttle: Delay calls by 1s, then 2s, then 4s...
  │   ├── pause: Block all calls until budget reset/increase
  │   └── alert_only: Log and continue
  └── Override available for emergency/human approval
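The pre-call decision and the exponential throttle delay can be sketched as pure functions. The names (`checkBudget`, `throttleDelayMs`) and the single-scope signature are simplifications for illustration; the real check runs against every scope in the hierarchy.

```typescript
// Sketch of the pre-call check: block on hard-limit breach, warn at the
// alert threshold, otherwise proceed.
type BudgetDecision = "proceed" | "warn" | "block";

function checkBudget(
  current: number,           // consumption so far in this scope (USD)
  estimated: number,         // estimated cost of the next call (USD)
  limit: number,             // configured limit for this scope (USD)
  alertThresholdPercent: number,
  hardLimitAction: "throttle" | "pause" | "alert_only"
): BudgetDecision {
  const projected = current + estimated;
  if (projected > limit) {
    // alert_only budgets log and continue; throttle/pause block the call
    return hardLimitAction === "alert_only" ? "warn" : "block";
  }
  if (projected >= limit * (alertThresholdPercent / 100)) {
    return "warn";           // threshold crossed: notify but proceed
  }
  return "proceed";
}

// Exponential throttle delay: 1s, 2s, 4s, ... capped at a maximum,
// matching the throttling config defaults later in this ADR.
function throttleDelayMs(attempt: number, initial = 1000, max = 60000): number {
  return Math.min(initial * 2 ** attempt, max);
}
```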

4. Pricing Model (Configurable)

interface ModelPricing {
  model: string;
  inputPerMillionTokens: number;   // USD
  outputPerMillionTokens: number;  // USD
  cacheReadPerMillionTokens: number;
  cacheWritePerMillionTokens: number;
}

const PRICING: ModelPricing[] = [
  {
    model: "claude-opus-4-5-20251101",
    inputPerMillionTokens: 15.00,
    outputPerMillionTokens: 75.00,
    cacheReadPerMillionTokens: 1.50,
    cacheWritePerMillionTokens: 18.75
  },
  {
    model: "claude-sonnet-4-5-20250929",
    inputPerMillionTokens: 3.00,
    outputPerMillionTokens: 15.00,
    cacheReadPerMillionTokens: 0.30,
    cacheWritePerMillionTokens: 3.75
  },
  {
    model: "claude-haiku-4-5-20251001",
    inputPerMillionTokens: 0.80,
    outputPerMillionTokens: 4.00,
    cacheReadPerMillionTokens: 0.08,
    cacheWritePerMillionTokens: 1.00
  }
];

function calculateCost(record: TokenConsumption): CostBreakdown {
  const pricing = PRICING.find(p => p.model === record.model);
  if (!pricing) {
    throw new Error(`No pricing configured for model: ${record.model}`);
  }
  const input = (record.input_tokens / 1_000_000) * pricing.inputPerMillionTokens;
  const output = (record.output_tokens / 1_000_000) * pricing.outputPerMillionTokens;
  const cacheRead = (record.cache_read_tokens / 1_000_000) * pricing.cacheReadPerMillionTokens;
  const cacheWrite = (record.cache_write_tokens / 1_000_000) * pricing.cacheWritePerMillionTokens;
  return {
    input,
    output,
    cacheRead,
    cacheWrite,
    total: input + output + cacheRead + cacheWrite
  };
}
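A worked example using the Sonnet pricing above, for a call with 200,000 input tokens, 10,000 output tokens, and 50,000 cache-read tokens (no cache writes):

```typescript
// Cost breakdown at Sonnet rates: $3.00/M input, $15.00/M output,
// $0.30/M cache read.
const inputCost = (200_000 / 1_000_000) * 3.00;       // 0.60 USD
const outputCost = (10_000 / 1_000_000) * 15.00;      // 0.15 USD
const cacheReadCost = (50_000 / 1_000_000) * 0.30;    // 0.015 USD
const total = inputCost + outputCost + cacheReadCost; // 0.765 USD
```

Note that without the cache, the 50,000 cached tokens would have been billed as input at $3.00/M ($0.15 instead of $0.015), which is why the cache hit rate appears among the efficiency metrics below.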

5. Aggregation and Efficiency Metrics

real_time_aggregations:  # Updated on each record
  - per_agent_running_total
  - per_task_running_total
  - per_project_daily_total
  - per_organization_monthly_total
  - global_system_total

periodic_rollups:  # Hourly, daily, monthly
  - task_summaries
  - project_summaries
  - model_usage_breakdown
  - agent_efficiency_metrics
  - cost_trend_analysis

efficiency_metrics:
  tokens_per_tool_call: total_tokens / tool_calls
  cost_per_iteration: total_cost / iterations
  cache_hit_rate: cache_read / (cache_read + input)
  output_input_ratio: output_tokens / input_tokens
  cost_per_completed_task: total_cost / successful_tasks
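The efficiency metrics above are plain ratios over aggregate counters. A minimal sketch, assuming a hypothetical `Aggregates` input shape and guarding against empty denominators:

```typescript
// Compute the efficiency metrics defined above from raw aggregates.
interface Aggregates {
  total_tokens: number;
  total_cost: number;
  tool_calls: number;
  iterations: number;
  cache_read: number;       // cache_read_tokens
  input: number;            // uncached input tokens
  output_tokens: number;
  input_tokens: number;
  successful_tasks: number;
}

// Avoid NaN/Infinity when a denominator is zero (e.g. no tool calls yet).
const safeDiv = (a: number, b: number): number => (b === 0 ? 0 : a / b);

function efficiencyMetrics(a: Aggregates) {
  return {
    tokens_per_tool_call: safeDiv(a.total_tokens, a.tool_calls),
    cost_per_iteration: safeDiv(a.total_cost, a.iterations),
    cache_hit_rate: safeDiv(a.cache_read, a.cache_read + a.input),
    output_input_ratio: safeDiv(a.output_tokens, a.input_tokens),
    cost_per_completed_task: safeDiv(a.total_cost, a.successful_tasks),
  };
}
```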

6. Forecasting

forecasting:
  short_term:  # Task-level
    based_on: "Current consumption rate, remaining work estimate"
    formula: "current_rate * estimated_remaining_iterations"
    confidence: "Based on variance in iteration costs"
    update_frequency: "Every iteration"

  medium_term:  # Project-level
    based_on: "Historical project patterns, active task count"
    formula: "avg_task_cost * remaining_tasks"
    considers: "Task complexity estimates"
    update_frequency: "Daily"

  long_term:  # Organization-level
    based_on: "Monthly trends, growth rate, planned projects"
    formula: "Exponential smoothing of historical data"
    considers: "Seasonality, team growth"
    update_frequency: "Weekly"
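The short-term formula can be sketched concretely: mean per-iteration cost times estimated remaining iterations, with a band derived from the variance of recent iteration costs. The function name and the exact band construction are illustrative assumptions, not the ADR's mandated model.

```typescript
// Task-level forecast: current_rate * estimated_remaining_iterations,
// with a confidence band widened by observed iteration-cost variance.
function forecastTaskCost(
  recentIterationCosts: number[],  // e.g. the last 10 iterations
  remainingIterations: number
): { expected: number; low: number; high: number } {
  const n = recentIterationCosts.length;
  const mean = recentIterationCosts.reduce((s, c) => s + c, 0) / n;
  const variance =
    recentIterationCosts.reduce((s, c) => s + (c - mean) ** 2, 0) / n;
  const std = Math.sqrt(variance);
  const expected = mean * remainingIterations;
  // Independent iterations: the sum's std grows with sqrt(remaining).
  const spread = std * Math.sqrt(remainingIterations);
  return {
    expected,
    low: Math.max(0, expected - spread),
    high: expected + spread,
  };
}
```

A task whose iteration costs are stable produces a tight band; erratic costs widen it, which is the "confidence based on variance" behavior described above.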

7. Database Schema

Database Architecture Note (ADR-002, ADR-089):

  • Local: SQLite (context.db) for offline operation and immediate recording
  • Cloud: PostgreSQL with RLS for multi-tenant isolation and aggregations
  • Sync via cursor-based polling (ADR-053)

Local SQLite (context.db):

CREATE TABLE token_records (
  id TEXT PRIMARY KEY,
  timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  task_id TEXT,
  agent_id TEXT,
  iteration INTEGER,
  checkpoint_id TEXT,
  model TEXT NOT NULL,
  input_tokens INTEGER DEFAULT 0,
  output_tokens INTEGER DEFAULT 0,
  cache_read_tokens INTEGER DEFAULT 0,
  cache_write_tokens INTEGER DEFAULT 0,
  total_tokens INTEGER GENERATED ALWAYS AS (input_tokens + output_tokens + cache_read_tokens + cache_write_tokens) STORED,
  total_cost_usd REAL,
  tool_calls TEXT, -- JSON array
  latency_ms INTEGER,
  success INTEGER DEFAULT 1,
  synced INTEGER DEFAULT 0
);

CREATE TABLE budgets (
  id TEXT PRIMARY KEY,
  scope TEXT NOT NULL, -- organization, project, task, agent
  scope_id TEXT NOT NULL,
  limit_type TEXT NOT NULL, -- tokens, cost_usd, calls
  limit_value REAL NOT NULL,
  current_value REAL DEFAULT 0,
  period TEXT, -- hourly, daily, monthly, total
  action TEXT DEFAULT 'alert_only', -- throttle, pause, alert_only
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_token_records_task ON token_records(task_id, timestamp DESC);
CREATE INDEX idx_token_records_unsynced ON token_records(synced) WHERE synced = 0;

Cloud PostgreSQL (with RLS):

CREATE TABLE token_records (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL REFERENCES organizations(id),
  project_id UUID REFERENCES projects(id),
  task_id UUID REFERENCES tasks(id),
  agent_id TEXT NOT NULL,
  iteration INTEGER DEFAULT 1,
  checkpoint_id UUID REFERENCES checkpoints(id),
  model TEXT NOT NULL,
  input_tokens INTEGER DEFAULT 0,
  output_tokens INTEGER DEFAULT 0,
  cache_read_tokens INTEGER DEFAULT 0,
  cache_write_tokens INTEGER DEFAULT 0,
  total_tokens INTEGER GENERATED ALWAYS AS (input_tokens + output_tokens + cache_read_tokens + cache_write_tokens) STORED,
  input_cost_usd DECIMAL(10,6),
  output_cost_usd DECIMAL(10,6),
  cache_cost_usd DECIMAL(10,6),
  total_cost_usd DECIMAL(10,6),
  tool_calls JSONB,
  latency_ms INTEGER,
  success BOOLEAN DEFAULT TRUE,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Row-Level Security
ALTER TABLE token_records ENABLE ROW LEVEL SECURITY;

CREATE POLICY org_isolation_tokens ON token_records
  FOR ALL USING (
    organization_id IN (
      SELECT organization_id FROM organization_members
      WHERE user_id = current_setting('app.current_user_id')::UUID
    )
  );

-- Budgets with hierarchy
CREATE TABLE budgets (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL REFERENCES organizations(id),
  project_id UUID REFERENCES projects(id),
  task_id UUID REFERENCES tasks(id),
  scope TEXT NOT NULL,
  limit_type TEXT NOT NULL,
  limit_value DECIMAL(12,2) NOT NULL,
  current_value DECIMAL(12,2) DEFAULT 0,
  period TEXT,
  action TEXT DEFAULT 'alert_only',
  alert_threshold_percent INTEGER DEFAULT 80,
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

ALTER TABLE budgets ENABLE ROW LEVEL SECURITY;

-- Materialized view for aggregations (refreshed periodically)
CREATE MATERIALIZED VIEW token_aggregations AS
SELECT
  organization_id,
  project_id,
  DATE_TRUNC('day', created_at) as period_day,
  model,
  SUM(total_tokens) as total_tokens,
  SUM(total_cost_usd) as total_cost,
  COUNT(*) as call_count,
  AVG(latency_ms) as avg_latency_ms
FROM token_records
GROUP BY organization_id, project_id, DATE_TRUNC('day', created_at), model;

CREATE UNIQUE INDEX idx_token_agg ON token_aggregations(organization_id, project_id, period_day, model);

8. Event Definitions

type TokenEconomicsEvents =
  | { type: 'TOKEN_RECORDED'; payload: { recordId: string; taskId: string; tokens: number; cost: number } }
  | { type: 'BUDGET_THRESHOLD_CROSSED'; payload: { context: string; threshold: number; current: number } }
  | { type: 'BUDGET_EXHAUSTED'; payload: { context: string; limit: number; action: string } }
  | { type: 'THROTTLE_ACTIVATED'; payload: { context: string; delayMs: number } }
  | { type: 'CONSUMPTION_SPIKE'; payload: { context: string; rate: number; threshold: number } }
  | { type: 'FORECAST_WARNING'; payload: { context: string; forecastedCost: number; budget: number } };
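Because the union is discriminated on `type`, consumers get payload narrowing for free when switching on it. A small sketch (redefining a two-member subset of the union so the snippet is self-contained; the `severity` routing is an illustrative assumption):

```typescript
// Switching on the discriminant narrows `payload` automatically.
type TokenEconomicsEvent =
  | { type: "BUDGET_THRESHOLD_CROSSED"; payload: { context: string; threshold: number; current: number } }
  | { type: "BUDGET_EXHAUSTED"; payload: { context: string; limit: number; action: string } };

function severity(event: TokenEconomicsEvent): "warning" | "critical" {
  switch (event.type) {
    case "BUDGET_THRESHOLD_CROSSED":
      return "warning";    // e.g. Slack, per the alerting config below
    case "BUDGET_EXHAUSTED":
      return "critical";   // e.g. PagerDuty, per the alerting config below
  }
}
```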

9. Configuration Schema

token_economics:
  enabled: true

  recording:
    buffer_size: 100
    flush_interval_ms: 1000
    retry_attempts: 3

  pricing:
    source: "config"  # config, api, or external_service
    update_interval: "daily"

  budgets:
    default_organization_monthly_usd: 10000
    default_project_daily_usd: 500
    default_task_max_usd: 100
    alert_threshold_percent: 80
    hard_limit_action: "throttle"

  throttling:
    initial_delay_ms: 1000
    max_delay_ms: 60000
    backoff_multiplier: 2

  forecasting:
    task_lookback_iterations: 10
    project_lookback_days: 30
    confidence_level: 0.9

  alerting:
    channels:
      - type: "slack"
        webhook_url: "${SLACK_WEBHOOK_URL}"
        events: ["budget_threshold", "budget_exhausted", "spike"]
      - type: "pagerduty"
        routing_key: "${PAGERDUTY_KEY}"
        events: ["budget_exhausted"]

  dashboard:
    refresh_interval_ms: 5000
    retention_days: 90

Consequences

Positive

  1. Cost Visibility - Real-time tracking of all token consumption
  2. Budget Control - Automatic enforcement prevents overruns
  3. Cost Attribution - Per-task/agent chargeback for enterprise
  4. Optimization - Efficiency metrics identify improvement opportunities
  5. Forecasting - Predictable costs enable planning
  6. Throttling - Graceful degradation when limits approached

Negative

  1. Recording Overhead - Each API call has tracking overhead (~10ms)
  2. Storage Growth - Token records accumulate (mitigated by retention)
  3. Complexity - Budget checks on every call path
  4. Pricing Maintenance - Must track model pricing changes

Risks

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| Pricing changes | Medium | Incorrect costs | Configurable pricing, alerts on API changes |
| High-volume recording impact | Medium | Performance degradation | Async buffered writes |
| Budget race conditions | Low | Over-budget execution | Atomic budget checks with reservations |
| Forecast inaccuracy | Medium | Poor planning | Confidence intervals, multiple models |

Performance Requirements

| Metric | Target |
|--------|--------|
| Token recording latency | < 10ms |
| Budget check latency | < 50ms |
| Aggregation query latency | < 100ms |
| Recording throughput | 10,000+ records/minute |
| Dashboard refresh | < 5s |
| Token tracking accuracy | > 99% |
| Budget enforcement latency | < 1s |
| Cost projection accuracy | ±10% |

Implementation Phases

Phase 1: Token Recording (Week 1)

  • Token record schema definition
  • Recording service with buffering
  • API client wrapper for consumption capture

Phase 2: Budget System (Week 2)

  • Budget configuration and hierarchy
  • Budget enforcement service
  • Running totals with real-time updates

Phase 3: Aggregation and Reporting (Week 3)

  • Aggregation service (real-time + periodic)
  • Query API for consumption data
  • Forecasting engine

Phase 4: Dashboard and Alerts (Week 4)

  • Dashboard data service (WebSocket updates)
  • Alert evaluation and notification
  • Testing and documentation

Related ADRs

  • ADR-002: PostgreSQL as Primary Database (cloud storage with RLS)
  • ADR-053: Cloud Context Sync Architecture (local-to-cloud sync)
  • ADR-089: Two-Database Architecture (SQLite local + PostgreSQL cloud)
  • ADR-108: Checkpoint Protocol (token metrics in checkpoint)
  • ADR-109: Browser Automation (browser test cost attribution)
  • ADR-110: Health Monitoring (consumption anomaly detection)
  • ADR-112: Ralph Wiggum Database Architecture (consolidates DB decisions)

Glossary

| Term | Definition |
|------|------------|
| Token | Basic unit of text processed by LLMs; approximately 4 characters or 0.75 words in English |
| Input Token | Token sent to the model as part of the prompt (context, instructions, user message) |
| Output Token | Token generated by the model in its response |
| Cache Token | Token retrieved from or written to prompt cache, reducing API costs |
| Budget | Spending limit configured at organization, project, task, or agent level |
| Throttling | Slowing request rate when approaching budget limits with exponential backoff |
| Chargeback | Attribution of costs to specific organizational units for billing |
| Forecasting | Predicting future token consumption based on historical patterns |
| Aggregation | Combining token records into summary statistics (hourly, daily, monthly) |
| RLS | Row-Level Security; PostgreSQL feature for automatic tenant data isolation |
| Materialized View | Pre-computed query result stored in database for fast aggregation queries |
| JSONB | PostgreSQL binary JSON type with indexing support for efficient queries |
| Exponential Backoff | Delay strategy that doubles wait time between retries (1s, 2s, 4s, ...) |
| WebSocket | Full-duplex communication protocol for real-time updates |


ADR-111 | Created: 2026-01-24 | Status: Proposed