Implementation Requirements: Token Economics Instrumentation

Document ID: IMPL-REQ-004
Priority: P1 (High)
Target ADR: ADR-111 (Proposed)
Estimated Effort: 2 Sprints
Dependencies: Orchestrator, Checkpoint Service, Observability Stack

1. Overview

1.1 Problem Statement

Multi-agent autonomous development consumes 15x the tokens of single-agent workflows (per the user preferences documentation). Without instrumentation:

- Costs are unpredictable and can spike unexpectedly
- Budget overruns are detected too late
- Cost attribution per task/agent is impossible
- Optimization opportunities are invisible
- Enterprise customers cannot forecast spend

1.2 Objective

Implement comprehensive token economics instrumentation that:

- Tracks token consumption per agent, task, and iteration
- Provides real-time cost visibility
- Enforces budget limits with auto-throttling
- Enables cost projection and forecasting
- Supports chargeback/attribution for enterprise

1.3 Success Criteria

Metric	Target
Token tracking accuracy	> 99%
Budget enforcement latency	< 1s
Cost projection accuracy	±10%
Dashboard refresh rate	< 5s

2. Functional Requirements

2.1 Token Tracking Schema

FR-001: Token Consumption Record

token_record:
  record_id: string              # UUID
  timestamp: datetime            # When consumption occurred
  
  context:
    organization_id: string      # For multi-tenant
    project_id: string           # Project/workspace
    task_id: string              # Parent task
    agent_id: string             # Executing agent
    iteration: integer           # Loop iteration
    checkpoint_id: string        # Associated checkpoint
    
  consumption:
    model: string                # Model ID, e.g. claude-opus-4-5-20251101
    input_tokens: integer        # Prompt tokens
    output_tokens: integer       # Completion tokens
    cache_read_tokens: integer   # Prompt cache hits
    cache_write_tokens: integer  # Prompt cache writes
    total_tokens: integer        # input + output + cache_read + cache_write
    
  cost:
    input_cost_usd: decimal      # Calculated cost
    output_cost_usd: decimal
    cache_cost_usd: decimal
    total_cost_usd: decimal
    
  metadata:
    tool_calls: array            # Tools invoked in this call
    latency_ms: integer          # API response time
    success: boolean             # Whether call succeeded
    error_type: string           # If failed, error category
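
As a rough illustration of the consumption block above, the following sketch derives total_tokens from its four components instead of trusting a caller-supplied value. The names Consumption and makeConsumption are hypothetical and not part of this spec.

```typescript
// Hypothetical TypeScript mirror of the `consumption` block in token_record.
interface Consumption {
  model: string;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  cache_write_tokens: number;
  total_tokens: number;
}

type TokenCounts = Omit<Consumption, "model" | "total_tokens">;

// Builds a consumption block and computes total_tokens from the components,
// so the stored total can never drift from the individual counts.
function makeConsumption(model: string, counts: TokenCounts): Consumption {
  const values = [
    counts.input_tokens,
    counts.output_tokens,
    counts.cache_read_tokens,
    counts.cache_write_tokens,
  ];
  for (const v of values) {
    if (!Number.isInteger(v) || v < 0) {
      throw new RangeError(`token counts must be non-negative integers, got ${v}`);
    }
  }
  const total_tokens = values.reduce((sum, v) => sum + v, 0);
  return { model, ...counts, total_tokens };
}
```

Deriving the total in one place also makes the "Token tracking accuracy" criterion easier to test, since only the four API-reported counts need to be verified.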

2.2 Budget Configuration

FR-002: Budget Hierarchy

budgets:
  organization:
    monthly_limit_usd: 10000
    alert_threshold_percent: 80
    hard_limit_action: "throttle"  # throttle, pause, alert_only
    
  project:
    daily_limit_usd: 500
    task_limit_usd: 100
    alert_threshold_percent: 75
    
  task:
    max_tokens: 1000000
    max_iterations: 50
    max_cost_usd: 100
    per_iteration_limit_tokens: 50000
    
  agent:
    max_tokens_per_call: 100000
    max_cost_per_call_usd: 5
    rate_limit_calls_per_minute: 60

Budget Inheritance:
└── Organization (top level)
    └── Project (inherits org limits)
        └── Task (inherits project limits)
            └── Agent (inherits task limits)

Constraint: a lower level cannot exceed its parent's limit.
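
One way to read the inheritance rule is that the effective limit at any level is the smallest limit set anywhere above it. The sketch below assumes all limits in the chain are expressed in the same unit (for example USD per task); resolveEffectiveLimit is a hypothetical helper, not a defined part of the budget service.

```typescript
// A limit configured at one level; undefined means "inherit from the parent".
type Limit = number | undefined;

// `chain` is ordered from the top level (organization) down to the level
// being resolved. A child may tighten a parent's limit but never loosen it,
// so the effective limit is the minimum of every limit that is set.
function resolveEffectiveLimit(chain: Limit[]): number {
  let effective = Infinity; // no limit configured anywhere
  for (const limit of chain) {
    if (limit !== undefined) {
      effective = Math.min(effective, limit);
    }
  }
  return effective;
}

// Project sets 500, task tries to set 800: the task is clamped to 500.
const clamped = resolveEffectiveLimit([10000, 500, 800]);
```

Clamping silently is only one possible policy. Rejecting the child configuration at write time would surface the misconfiguration earlier.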

2.3 Budget Enforcement

FR-003: Enforcement Protocol

PRE-CALL CHECK:
1. Retrieve current consumption for context
2. Estimate call cost (based on prompt size + expected output)
3. Check against all applicable budgets:
   ├── Agent call limit
   ├── Task iteration limit
   ├── Task total limit
   ├── Project daily limit
   └── Organization monthly limit
4. If ANY limit would be exceeded:
   ├── If hard limit: Block call, return budget_exceeded error
   ├── If soft limit: Log warning, proceed with monitoring
   └── If alert threshold: Notify but proceed

POST-CALL RECORDING:
1. Record actual consumption
2. Update running totals
3. Check if any threshold newly crossed
4. Emit events for alerts/dashboard

THROTTLING BEHAVIOR:
├── Approaching limit (alert threshold, e.g. 80%): Emit warning event
├── At limit (100%), according to hard_limit_action:
│   ├── throttle: Delay calls by 1s, then 2s, then 4s...
│   ├── pause: Block all calls until budget reset/increase
│   └── alert_only: Log and continue
└── Override available for emergency/human approval
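
The pre-call check and throttling behavior above can be sketched as follows. This is a simplified illustration under stated assumptions: BudgetState, Decision, preCallCheck, and throttleDelayMs are hypothetical names, alert thresholds and overrides are omitted, and the backoff defaults (1s initial delay, multiplier 2, 60s cap) are taken from the throttling values in the configuration schema.

```typescript
type Action = "allow" | "throttle" | "deny";
type HardLimitAction = "throttle" | "pause" | "alert_only";

interface BudgetState {
  name: string; // e.g. "task", "project", "organization"
  limitUsd: number;
  spentUsd: number;
  hardLimitAction: HardLimitAction;
}

interface Decision {
  action: Action;
  delayMs?: number;
  reason?: string;
}

// Delay for the nth consecutive throttled call (n starts at 0):
// 1s, 2s, 4s, ... capped at maxMs.
function throttleDelayMs(n: number, initialMs = 1000, multiplier = 2, maxMs = 60000): number {
  return Math.min(initialMs * Math.pow(multiplier, n), maxMs);
}

// Checks every applicable budget. The strictest outcome wins:
// deny (pause) beats throttle, which beats allow.
function preCallCheck(budgets: BudgetState[], estimatedCostUsd: number, priorThrottles: number): Decision {
  let decision: Decision = { action: "allow" };
  for (const b of budgets) {
    const wouldExceed = b.spentUsd + estimatedCostUsd > b.limitUsd;
    if (!wouldExceed || b.hardLimitAction === "alert_only") continue;
    const reason = `${b.name} budget would be exceeded`;
    if (b.hardLimitAction === "pause") {
      return { action: "deny", reason };
    }
    decision = { action: "throttle", delayMs: throttleDelayMs(priorThrottles), reason };
  }
  return decision;
}
```

Because the check uses an estimate, a call can still overshoot a limit slightly; the post-call recording step is what reconciles the estimate with actual consumption.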

2.4 Cost Calculation

FR-004: Pricing Model (January 2026 rates, configurable)

interface ModelPricing {
  model: string;
  inputPerMillionTokens: number;   // USD
  outputPerMillionTokens: number;  // USD
  cacheReadPerMillionTokens: number;
  cacheWritePerMillionTokens: number;
}

const PRICING: ModelPricing[] = [
  {
    model: "claude-opus-4-5-20251101",
    inputPerMillionTokens: 5.00,
    outputPerMillionTokens: 25.00,
    cacheReadPerMillionTokens: 0.50,
    cacheWritePerMillionTokens: 6.25
  },
  {
    model: "claude-sonnet-4-5-20250929",
    inputPerMillionTokens: 3.00,
    outputPerMillionTokens: 15.00,
    cacheReadPerMillionTokens: 0.30,
    cacheWritePerMillionTokens: 3.75
  },
  {
    model: "claude-haiku-4-5-20251001",
    inputPerMillionTokens: 1.00,
    outputPerMillionTokens: 5.00,
    cacheReadPerMillionTokens: 0.10,
    cacheWritePerMillionTokens: 1.25
  }
];

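// Cost calculation function
function calculateCost(record: TokenConsumption): CostBreakdown {
  const pricing = PRICING.find(p => p.model === record.model);
  if (!pricing) {
    // Fail loudly rather than silently recording a zero cost
    throw new Error(`No pricing configured for model: ${record.model}`);
  }
  const perMillion = (tokens: number, rate: number) => (tokens / 1_000_000) * rate;
  const input = perMillion(record.input_tokens, pricing.inputPerMillionTokens);
  const output = perMillion(record.output_tokens, pricing.outputPerMillionTokens);
  const cacheRead = perMillion(record.cache_read_tokens, pricing.cacheReadPerMillionTokens);
  const cacheWrite = perMillion(record.cache_write_tokens, pricing.cacheWritePerMillionTokens);
  return { input, output, cacheRead, cacheWrite, total: input + output + cacheRead + cacheWrite };
}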
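
For a concrete sense of the arithmetic, here is a worked example using the Sonnet 4.5 rates from the pricing table. The usage figures are invented for illustration, and rounding to whole micro-dollars is an assumption of this sketch (one way to keep floating-point noise out of the stored decimal costs), not a requirement stated in this document.

```typescript
// Sonnet 4.5 rates in USD per million tokens, copied from the pricing table.
const SONNET_RATES = { input: 3.0, output: 15.0, cacheRead: 0.3, cacheWrite: 3.75 };

// Hypothetical usage for a single API call.
const usage = { input: 12_000, output: 2_000, cacheRead: 50_000, cacheWrite: 8_000 };

// Converts a USD amount to whole micro-dollars (1 USD = 1,000,000 micro-USD).
const toMicroUsd = (usd: number): number => Math.round(usd * 1_000_000);

const costMicroUsd = {
  input: toMicroUsd((usage.input / 1_000_000) * SONNET_RATES.input),                // 36,000
  output: toMicroUsd((usage.output / 1_000_000) * SONNET_RATES.output),             // 30,000
  cacheRead: toMicroUsd((usage.cacheRead / 1_000_000) * SONNET_RATES.cacheRead),    // 15,000
  cacheWrite: toMicroUsd((usage.cacheWrite / 1_000_000) * SONNET_RATES.cacheWrite), // 30,000
};

// 111,000 micro-USD, i.e. $0.111 for this call.
const totalMicroUsd =
  costMicroUsd.input + costMicroUsd.output + costMicroUsd.cacheRead + costMicroUsd.cacheWrite;
```

Summing integer micro-dollars keeps running totals exact, which matters once thousands of small per-call costs are aggregated into daily and monthly budgets.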

2.5 Aggregation and Reporting

FR-005: Consumption Aggregations

Real-time aggregations (updated on each record):
├── Per-agent running total (tokens, cost)
├── Per-task running total
├── Per-project daily total
├── Per-organization monthly total
└── Global system total

Periodic rollups (hourly, daily, monthly):
├── Task summaries
├── Project summaries
├── Model usage breakdown
├── Agent efficiency metrics
└── Cost trend analysis

Efficiency Metrics:
├── tokens_per_tool_call: Total tokens / tool calls
├── cost_per_iteration: Total cost / iterations
├── cache_hit_rate: cache_read / (cache_read + input)
├── output_input_ratio: output_tokens / input_tokens
└── cost_per_completed_task: Total cost / successful tasks
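
The efficiency formulas above translate fairly directly into code. In this sketch, TaskTotals and efficiencyMetrics are hypothetical names, and returning 0 when a denominator is zero is an assumption chosen so that dashboards never display NaN or Infinity.

```typescript
// Hypothetical aggregate totals for one task or time window.
interface TaskTotals {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  totalTokens: number;
  totalCostUsd: number;
  toolCalls: number;
  iterations: number;
  completedTasks: number;
}

// Division that yields 0 instead of NaN/Infinity for an empty denominator.
const safeRatio = (numerator: number, denominator: number): number =>
  denominator === 0 ? 0 : numerator / denominator;

function efficiencyMetrics(t: TaskTotals) {
  return {
    tokensPerToolCall: safeRatio(t.totalTokens, t.toolCalls),
    costPerIteration: safeRatio(t.totalCostUsd, t.iterations),
    // Share of prompt tokens served from cache rather than processed fresh.
    cacheHitRate: safeRatio(t.cacheReadTokens, t.cacheReadTokens + t.inputTokens),
    outputInputRatio: safeRatio(t.outputTokens, t.inputTokens),
    costPerCompletedTask: safeRatio(t.totalCostUsd, t.completedTasks),
  };
}
```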

2.6 Forecasting

FR-006: Cost Projection

Short-term (task-level):
├── Based on: Current consumption rate, remaining work estimate
├── Formula: avg_cost_per_iteration * estimated_remaining_iterations
├── Confidence: Based on variance in iteration costs
└── Update frequency: Every iteration

Medium-term (project-level):
├── Based on: Historical project patterns, active task count
├── Formula: avg_task_cost * remaining_tasks
├── Considers: Task complexity estimates
└── Update frequency: Daily

Long-term (organization-level):
├── Based on: Monthly trends, growth rate, planned projects
├── Formula: Exponential smoothing of historical data
├── Considers: Seasonality, team growth
└── Update frequency: Weekly
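
A minimal sketch of the task-level and organization-level formulas follows. It makes several assumptions not stated in this document: iteration costs are treated as independent and roughly normally distributed, the interval uses z = 1.645 (a two-sided 90% normal interval, matching the confidence_level of 0.9 in the configuration schema), and simple exponential smoothing without trend or seasonality stands in for the long-term model. The function names are hypothetical.

```typescript
interface TaskForecast {
  expectedUsd: number;
  lowUsd: number;
  highUsd: number;
}

// Short-term: mean iteration cost times remaining iterations, with a rough
// interval derived from the sample variance of past iteration costs.
function forecastRemainingCost(iterationCostsUsd: number[], remainingIterations: number, z = 1.645): TaskForecast {
  const n = iterationCostsUsd.length;
  if (n === 0) throw new Error("need at least one completed iteration");
  const mean = iterationCostsUsd.reduce((a, b) => a + b, 0) / n;
  const variance =
    n > 1 ? iterationCostsUsd.reduce((acc, c) => acc + (c - mean) * (c - mean), 0) / (n - 1) : 0;
  const expectedUsd = mean * remainingIterations;
  // Variance of a sum of k independent costs is k * variance.
  const halfWidth = z * Math.sqrt(variance * remainingIterations);
  return { expectedUsd, lowUsd: Math.max(0, expectedUsd - halfWidth), highUsd: expectedUsd + halfWidth };
}

// Long-term: simple exponential smoothing, level = alpha * x + (1 - alpha) * level.
// The final level serves as the forecast for the next period.
function exponentialSmoothing(series: number[], alpha: number): number {
  if (series.length === 0) throw new Error("series must not be empty");
  if (alpha <= 0 || alpha > 1) throw new RangeError("alpha must be in (0, 1]");
  return series.slice(1).reduce((level, x) => alpha * x + (1 - alpha) * level, series[0]);
}
```

Capturing seasonality and growth, as listed under the long-term forecast, would need a richer model than this, such as one with explicit trend and seasonal components.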

3. Non-Functional Requirements

3.1 Performance

Requirement	Specification
NFR-001	Token recording < 10ms
NFR-002	Budget check < 50ms
NFR-003	Aggregation query < 100ms
NFR-004	Support 10,000+ records/minute
NFR-005	Dashboard refresh < 5s

3.2 Reliability

Requirement	Specification
NFR-006	Zero lost token records
NFR-007	Budget enforcement always active
NFR-008	Graceful degradation if aggregation fails
NFR-009	Async recording (non-blocking to API calls)

3.3 Accuracy

Requirement	Specification
NFR-010	Token count matches API response exactly
NFR-011	Cost calculation uses current pricing
NFR-012	Aggregations consistent within 1 second
NFR-013	Forecast accuracy ±10% at task level

3.4 Observability

Requirement	Specification
NFR-014	Metrics: token_consumption histogram
NFR-015	Metrics: cost_usd histogram
NFR-016	Metrics: budget_utilization gauge
NFR-017	Alerts: Budget threshold crossed
NFR-018	Logs: All budget enforcement decisions

4. Implementation Steps

Phase 1: Token Recording (Week 1)

Step 1.1: Token Record Schema
├── Define TokenRecord type (TypeScript/Rust)
├── Define FoundationDB key structure:
│   └── /coditect/tokens/{org}/{project}/{task}/{record_id}
├── Add indexing keys for aggregation:
│   ├── /coditect/tokens/by-agent/{agent_id}/{timestamp}
│   ├── /coditect/tokens/by-task/{task_id}/{timestamp}
│   └── /coditect/tokens/by-day/{date}/{org}/{project}
└── Implement serialization
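
The key layout in Step 1.1 could be built with helpers along these lines. This is a sketch that treats keys as plain path strings; a real FoundationDB layout would more likely encode the same components with the tuple layer. All function names are hypothetical. ISO 8601 UTC timestamps are used because they sort lexicographically in chronological order, which suits range reads.

```typescript
const ROOT = "/coditect/tokens";

// Rejects empty segments and embedded separators so that one component
// can never spill into the next level of the key hierarchy.
function segment(value: string): string {
  if (value.length === 0 || value.indexOf("/") !== -1) {
    throw new Error(`invalid key segment: "${value}"`);
  }
  return value;
}

const joinKey = (...parts: string[]): string => [ROOT, ...parts.map(segment)].join("/");

// Primary key: /coditect/tokens/{org}/{project}/{task}/{record_id}
const recordKey = (org: string, project: string, task: string, recordId: string): string =>
  joinKey(org, project, task, recordId);

// Index keys used for aggregation.
const byAgentKey = (agentId: string, at: Date): string =>
  joinKey("by-agent", agentId, at.toISOString());

const byTaskKey = (taskId: string, at: Date): string =>
  joinKey("by-task", taskId, at.toISOString());

// Date portion only (YYYY-MM-DD, in UTC) for the daily index.
const byDayKey = (at: Date, org: string, project: string): string =>
  joinKey("by-day", at.toISOString().slice(0, 10), org, project);
```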

Step 1.2: Recording Service
├── Create TokenRecordingService
├── Implement record() method:
│   ├── Validate record schema
│   ├── Calculate costs
│   ├── Write to FDB
│   └── Update running totals (async)
├── Add buffering for high-volume:
│   ├── In-memory buffer (100 records)
│   ├── Flush every 1 second or on buffer full
│   └── Guaranteed delivery (retry on failure)
└── Emit TOKEN_RECORDED event
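
The buffering behavior in Step 1.2 might be sketched like this. BufferedRecorder and Sink are hypothetical names, the defaults mirror the recording values in the configuration schema, and the 1-second flush timer is left to the caller. Note the limitation: an in-memory buffer is lost if the process crashes, so this sketch alone would not satisfy NFR-006 without a durable queue or write-ahead log behind it.

```typescript
// Writes one batch to durable storage; rejects on failure.
type Sink<T> = (batch: T[]) => Promise<void>;

class BufferedRecorder<T> {
  private buffer: T[] = [];
  private inFlight: Promise<void> = Promise.resolve();

  constructor(
    private readonly sink: Sink<T>,
    private readonly maxSize = 100,
    private readonly retryAttempts = 3
  ) {}

  get pending(): number {
    return this.buffer.length;
  }

  // Non-blocking: queues the record and flushes when the buffer is full.
  record(item: T): void {
    this.buffer.push(item);
    if (this.buffer.length >= this.maxSize) {
      this.flush().catch(() => { /* batch was re-queued for a later flush */ });
    }
  }

  // Hands the current buffer to the sink. Batches are written one at a
  // time, in order, even if flush() is called again before the last finishes.
  flush(): Promise<void> {
    if (this.buffer.length === 0) return this.inFlight;
    const batch = this.buffer;
    this.buffer = [];
    this.inFlight = this.inFlight
      .catch(() => undefined)
      .then(() => this.writeWithRetry(batch));
    return this.inFlight;
  }

  private async writeWithRetry(batch: T[]): Promise<void> {
    let lastError: unknown;
    for (let attempt = 1; attempt <= this.retryAttempts; attempt++) {
      try {
        await this.sink(batch);
        return;
      } catch (err) {
        lastError = err;
      }
    }
    // Out of retries: put the batch back at the front so nothing is dropped.
    this.buffer = batch.concat(this.buffer);
    throw lastError;
  }
}
```

A caller would typically pair this with something like `setInterval(() => recorder.flush().catch(() => {}), 1000)` to cover the time-based flush.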

Step 1.3: API Integration
├── Wrap Anthropic API client
├── Capture usage from response:
│   ├── input_tokens
│   ├── output_tokens
│   └── cache metrics
├── Enrich with context (task_id, agent_id, etc.)
├── Call recording service
└── Non-blocking (asynchronous, with retry on failure)
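
A wrapper for Step 1.3 could look roughly like the sketch below. The ApiUsage shape follows the usage object returned by the Anthropic Messages API (input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens); everything else, including recordedCall, CallContext, and CapturedUsage, is a hypothetical name for illustration. Failed calls are not recorded here because they return no usage data.

```typescript
// Usage fields as reported on a Messages API response.
interface ApiUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

interface CallContext {
  task_id: string;
  agent_id: string;
  iteration: number;
}

interface CapturedUsage extends CallContext {
  model: string;
  input_tokens: number;
  output_tokens: number;
  cache_read_tokens: number;
  cache_write_tokens: number;
  latency_ms: number;
}

// Runs the API call, then hands usage plus context to the recorder.
// A failure inside `record` must never fail the API call itself.
async function recordedCall<R extends { model: string; usage: ApiUsage }>(
  call: () => Promise<R>,
  context: CallContext,
  record: (captured: CapturedUsage) => void
): Promise<R> {
  const startedAt = Date.now();
  const response = await call();
  try {
    record({
      ...context,
      model: response.model,
      input_tokens: response.usage.input_tokens,
      output_tokens: response.usage.output_tokens,
      cache_read_tokens: response.usage.cache_read_input_tokens ?? 0,
      cache_write_tokens: response.usage.cache_creation_input_tokens ?? 0,
      latency_ms: Date.now() - startedAt,
    });
  } catch {
    // Swallowed on purpose; the recording service handles its own retries.
  }
  return response;
}
```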

Phase 2: Budget System (Week 2)

Step 2.1: Budget Configuration
├── Define Budget schema
├── Create budget hierarchy structure
├── Implement budget storage:
│   └── /coditect/budgets/{org}/{project}/{task}
├── Add budget inheritance resolution
└── Support runtime budget updates

Step 2.2: Budget Enforcement
├── Create BudgetEnforcementService
├── Implement pre-call check:
│   ├── Estimate call cost
│   ├── Query running totals
│   ├── Check all applicable budgets
│   └── Return allow/deny/throttle decision
├── Implement throttling:
│   ├── Track throttle state per context
│   ├── Apply exponential backoff
│   └── Reset on budget refresh
└── Add override mechanism for emergencies

Step 2.3: Running Totals
├── Implement real-time aggregation:
│   ├── Atomic increment on record
│   ├── Per-agent total
│   ├── Per-task total
│   ├── Per-project daily total
│   └── Per-organization monthly total
├── Add cache layer for read performance
└── Implement total recalculation (for recovery)

Phase 3: Aggregation and Reporting (Week 3)

Step 3.1: Aggregation Service
├── Create AggregationService
├── Implement real-time aggregations:
│   ├── Subscribe to TOKEN_RECORDED events
│   ├── Update materialized views
│   └── Maintain consistency with source
├── Implement periodic rollups:
│   ├── Hourly job: Task summaries
│   ├── Daily job: Project summaries
│   └── Monthly job: Organization summaries
└── Add efficiency metric calculation

Step 3.2: Query API
├── Implement consumption queries:
│   ├── getAgentConsumption(agentId, timeRange)
│   ├── getTaskConsumption(taskId)
│   ├── getProjectConsumption(projectId, date)
│   ├── getOrganizationConsumption(orgId, month)
│   └── getModelBreakdown(context, timeRange)
├── Implement budget queries:
│   ├── getBudgetUtilization(context)
│   ├── getBudgetHistory(context, timeRange)
│   └── getBudgetForecast(context)
└── Add pagination and filtering

Step 3.3: Forecasting Engine
├── Implement task-level forecast:
│   ├── Calculate iteration cost variance
│   ├── Estimate remaining iterations
│   └── Project completion cost
├── Implement project-level forecast:
│   ├── Aggregate task forecasts
│   ├── Apply historical patterns
│   └── Confidence intervals
└── Add forecast to dashboard data

Phase 4: Dashboard and Alerts (Week 4)

Step 4.1: Dashboard Data Service
├── Create DashboardDataService
├── Implement real-time data feed:
│   ├── WebSocket for live updates
│   ├── Polling fallback
│   └── Delta updates (changed data only)
├── Implement dashboard views:
│   ├── Overview (org summary)
│   ├── Project drill-down
│   ├── Task detail
│   ├── Agent comparison
│   └── Model usage breakdown
└── Add export capabilities (CSV, JSON)

Step 4.2: Alerting
├── Define alert conditions:
│   ├── budget_threshold_crossed
│   ├── budget_exhausted
│   ├── unusual_consumption_spike
│   ├── forecast_exceeds_budget
│   └── efficiency_degradation
├── Implement alert evaluation:
│   ├── Subscribe to relevant events
│   ├── Evaluate conditions
│   └── Emit alerts
├── Integrate with notification channels:
│   ├── Slack
│   ├── PagerDuty
│   ├── Email
│   └── In-app notifications
└── Add alert acknowledgment/mute

Step 4.3: Testing and Documentation
├── Unit tests:
│   ├── Cost calculation accuracy
│   ├── Budget enforcement logic
│   ├── Aggregation correctness
│   └── Forecast algorithms
├── Integration tests:
│   ├── End-to-end recording flow
│   ├── Budget enforcement scenarios
│   ├── Dashboard data accuracy
│   └── Alert triggering
└── Documentation:
    ├── API reference
    ├── Budget configuration guide
    ├── Dashboard user guide
    └── Cost optimization best practices

5. API Specification

interface TokenEconomicsService {
  // Recording
  recordConsumption(record: TokenConsumption): Promise<void>;
  
  // Budget management
  setBudget(context: BudgetContext, budget: Budget): Promise<void>;
  getBudget(context: BudgetContext): Promise<Budget>;
  checkBudget(context: BudgetContext, estimatedCost: number): Promise<BudgetCheckResult>;
  
  // Consumption queries
  getConsumption(context: ConsumptionContext, timeRange?: TimeRange): Promise<ConsumptionSummary>;
  getConsumptionBreakdown(context: ConsumptionContext, groupBy: GroupBy): Promise<ConsumptionBreakdown>;
  
  // Aggregations
  getRunningTotal(context: BudgetContext): Promise<RunningTotal>;
  getEfficiencyMetrics(context: ConsumptionContext): Promise<EfficiencyMetrics>;
  
  // Forecasting
  getForecast(context: BudgetContext): Promise<CostForecast>;
  
  // Dashboard
  getDashboardData(context: DashboardContext): Promise<DashboardData>;
  subscribeToDashboard(context: DashboardContext, callback: DashboardCallback): Unsubscribe;
}

interface BudgetCheckResult {
  allowed: boolean;
  action: 'allow' | 'throttle' | 'deny';
  throttleDelayMs?: number;
  reason?: string;
  budgetUtilization: {
    agent: number;
    task: number;
    project: number;
    organization: number;
  };
  estimatedCost: number;
  remainingBudget: number;
}

interface EfficiencyMetrics {
  tokensPerToolCall: number;
  costPerIteration: number;
  cacheHitRate: number;
  outputInputRatio: number;
  costPerCompletedTask: number;
  modelUsageBreakdown: { model: string; percentage: number; cost: number }[];
}

6. Event Definitions

type TokenEconomicsEvents =
  | { type: 'TOKEN_RECORDED'; payload: { recordId: string; taskId: string; tokens: number; cost: number } }
  | { type: 'BUDGET_THRESHOLD_CROSSED'; payload: { context: string; threshold: number; current: number } }
  | { type: 'BUDGET_EXHAUSTED'; payload: { context: string; limit: number; action: string } }
  | { type: 'THROTTLE_ACTIVATED'; payload: { context: string; delayMs: number } }
  | { type: 'CONSUMPTION_SPIKE'; payload: { context: string; rate: number; threshold: number } }
  | { type: 'FORECAST_WARNING'; payload: { context: string; forecastedCost: number; budget: number } };
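
Because the events form a discriminated union on `type`, consumers can handle them with an exhaustive switch so that adding a new event type becomes a compile-time error until it is handled. The sketch below repeats the union so it is self-contained; describeEvent and its message formats are hypothetical.

```typescript
type TokenEconomicsEvent =
  | { type: 'TOKEN_RECORDED'; payload: { recordId: string; taskId: string; tokens: number; cost: number } }
  | { type: 'BUDGET_THRESHOLD_CROSSED'; payload: { context: string; threshold: number; current: number } }
  | { type: 'BUDGET_EXHAUSTED'; payload: { context: string; limit: number; action: string } }
  | { type: 'THROTTLE_ACTIVATED'; payload: { context: string; delayMs: number } }
  | { type: 'CONSUMPTION_SPIKE'; payload: { context: string; rate: number; threshold: number } }
  | { type: 'FORECAST_WARNING'; payload: { context: string; forecastedCost: number; budget: number } };

// Formats an event for logs or notifications. The `never` assignment in the
// default branch fails to compile if a union member is left unhandled.
function describeEvent(event: TokenEconomicsEvent): string {
  switch (event.type) {
    case 'TOKEN_RECORDED':
      return `recorded ${event.payload.tokens} tokens for ${event.payload.taskId}`;
    case 'BUDGET_THRESHOLD_CROSSED':
      return `${event.payload.context} crossed ${event.payload.threshold}% of budget`;
    case 'BUDGET_EXHAUSTED':
      return `${event.payload.context} exhausted its budget (${event.payload.action})`;
    case 'THROTTLE_ACTIVATED':
      return `${event.payload.context} throttled for ${event.payload.delayMs}ms`;
    case 'CONSUMPTION_SPIKE':
      return `${event.payload.context} consumption spike detected`;
    case 'FORECAST_WARNING':
      return `${event.payload.context} forecast exceeds budget`;
    default: {
      const unhandled: never = event;
      throw new Error(`unhandled event: ${JSON.stringify(unhandled)}`);
    }
  }
}
```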

7. Dashboard Wireframe

┌─────────────────────────────────────────────────────────────────────┐
│ TOKEN ECONOMICS DASHBOARD                           [Export] [⚙️]   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐     │
│  │ MONTHLY SPEND   │  │ TODAY'S SPEND   │  │ ACTIVE AGENTS   │     │
│  │    $2,847.32    │  │    $142.18      │  │       12        │     │
│  │ ▲ 12% vs last  │  │ Budget: $500    │  │ Consuming now   │     │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘     │
│                                                                     │
│  BUDGET UTILIZATION                                                 │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │ Organization ████████████████████░░░░░░░░░░ 71% ($7,100)    │   │
│  │ Project A    ████████████░░░░░░░░░░░░░░░░░░ 45% ($450)      │   │
│  │ Project B    ██████████████████████████░░░░ 89% ($890) ⚠️   │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  CONSUMPTION BY MODEL (Today)                                       │
│  ┌───────────────────────────────────┐  ┌────────────────────────┐ │
│  │  ┌────┐                           │  │ Model        Cost      │ │
│  │  │████│ Opus 4.5                  │  │ Opus 4.5     $98.42    │ │
│  │  │████│ 69%                       │  │ Sonnet 4.5   $38.76    │ │
│  │  │    │                           │  │ Haiku 4.5    $5.00     │ │
│  │  │░░░░│ Sonnet 4.5                │  ├────────────────────────┤ │
│  │  │░░░░│ 27%                       │  │ Total        $142.18   │ │
│  │  │    │                           │  └────────────────────────┘ │
│  │  │░   │ Haiku 4.5                 │                             │
│  │  │░   │ 4%                        │                             │
│  │  └────┘                           │                             │
│  └───────────────────────────────────┘                             │
│                                                                     │
│  EFFICIENCY METRICS                                                 │
│  ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐   │
│  │ Tokens/Tool Call │ │ Cache Hit Rate   │ │ Cost/Task        │   │
│  │      1,247       │ │      34.2%       │ │     $8.42        │   │
│  │    ▼ 5% better   │ │    ▲ 12% better  │ │   ▼ 3% better    │   │
│  └──────────────────┘ └──────────────────┘ └──────────────────┘   │
│                                                                     │
│  RECENT ACTIVITY                                    [View All →]   │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │ 14:32  Task-047  Implementation Agent  2,341 tokens  $0.42  │   │
│  │ 14:31  Task-047  Implementation Agent  1,892 tokens  $0.34  │   │
│  │ 14:28  Task-046  QA Agent             4,521 tokens  $0.81  │   │
│  │ 14:25  Task-046  QA Agent             3,102 tokens  $0.56  │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

8. Configuration Schema

# coditect-config.yaml
token_economics:
  enabled: true
  
  recording:
    buffer_size: 100
    flush_interval_ms: 1000
    retry_attempts: 3
    
  pricing:
    source: "config"  # config, api, or external_service
    update_interval: "daily"
    models:
      - model: "claude-opus-4-5-20251101"
        input_per_million: 5.00
        output_per_million: 25.00
        cache_read_per_million: 0.50
        cache_write_per_million: 6.25
      # ... other models
      
  budgets:
    default_organization_monthly_usd: 10000
    default_project_daily_usd: 500
    default_task_max_usd: 100
    alert_threshold_percent: 80
    hard_limit_action: "throttle"  # throttle, pause, alert_only
    
  throttling:
    initial_delay_ms: 1000
    max_delay_ms: 60000
    backoff_multiplier: 2
    
  forecasting:
    task_lookback_iterations: 10
    project_lookback_days: 30
    confidence_level: 0.9
    
  alerting:
    channels:
      - type: "slack"
        webhook_url: "${SLACK_WEBHOOK_URL}"
        events: ["budget_threshold", "budget_exhausted", "spike"]
      - type: "pagerduty"
        routing_key: "${PAGERDUTY_KEY}"
        events: ["budget_exhausted"]
        
  dashboard:
    refresh_interval_ms: 5000
    retention_days: 90

9. Dependencies

Dependency	Type	Status
FoundationDB	Infrastructure	✅ Available
Event Bus	Platform	✅ Available
Anthropic API Client	Library	✅ Available (requires wrapping)
Checkpoint Service	Platform	🔄 IMPL-REQ-001
Dashboard Framework	Frontend	⚠️ TBD
Alerting (Slack/PagerDuty)	External	✅ Available

10. Risks and Mitigations

Risk	Impact	Likelihood	Mitigation
Pricing changes	Incorrect cost calculations	Medium	Configurable pricing, alerts on API changes
High-volume recording impact	Performance degradation	Medium	Async buffered writes
Budget race conditions	Over-budget execution	Low	Atomic budget checks with reservations
Forecast inaccuracy	Poor planning decisions	Medium	Confidence intervals, multiple models
Dashboard latency	Poor user experience	Low	Caching, incremental updates
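
The "atomic budget checks with reservations" mitigation could work roughly as in this in-memory sketch: the estimated cost is reserved before the call and replaced by the actual cost afterwards, so concurrent calls cannot both pass a check against the same remaining budget. BudgetLedger and its methods are hypothetical, and in production the reserve step would need to run inside a single atomic transaction (for example a FoundationDB transaction) rather than in process memory.

```typescript
class BudgetLedger {
  private spentUsd = 0;
  private reservedUsd = 0;

  constructor(private readonly limitUsd: number) {}

  // Budget not yet spent or promised to an in-flight call.
  get remainingUsd(): number {
    return this.limitUsd - this.spentUsd - this.reservedUsd;
  }

  // Pre-call: claims the estimate, or returns false if it does not fit.
  // Counting reservations is what prevents two concurrent calls from
  // both fitting into the same remaining budget.
  reserve(estimateUsd: number): boolean {
    if (estimateUsd > this.remainingUsd) return false;
    this.reservedUsd += estimateUsd;
    return true;
  }

  // Post-call: releases the reservation and books the actual cost.
  settle(estimateUsd: number, actualUsd: number): void {
    this.reservedUsd -= estimateUsd;
    this.spentUsd += actualUsd;
  }
}
```

A failed or cancelled call would settle with an actual cost of 0, returning its reservation to the pool.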

11. Acceptance Criteria

Document Version: 1.0 | Last Updated: January 24, 2026
