
Implementation Requirements: Token Economics Instrumentation

Document ID: IMPL-REQ-004
Priority: P1 (High)
Target ADR: ADR-111 (Proposed)
Estimated Effort: 2 Sprints
Dependencies: Orchestrator, Checkpoint Service, Observability Stack


1. Overview

1.1 Problem Statement

Multi-agent autonomous development consumes 15x as many tokens as single-agent workflows (per the user preferences documentation). Without instrumentation:

  • Costs are unpredictable and can spike unexpectedly
  • Budget overruns are detected too late
  • Cost attribution per task/agent is impossible
  • Optimization opportunities are invisible
  • Enterprise customers cannot forecast spend

1.2 Objective

Implement comprehensive token economics instrumentation that:

  • Tracks token consumption per agent, task, and iteration
  • Provides real-time cost visibility
  • Enforces budget limits with auto-throttling
  • Enables cost projection and forecasting
  • Supports chargeback/attribution for enterprise

1.3 Success Criteria

Metric                      Target
--------------------------  ------
Token tracking accuracy     > 99%
Budget enforcement latency  < 1s
Cost projection accuracy    ±10%
Dashboard refresh rate      < 5s

2. Functional Requirements

2.1 Token Tracking Schema

FR-001: Token Consumption Record

token_record:
  record_id: string # UUID
  timestamp: datetime # When consumption occurred

  context:
    organization_id: string # For multi-tenant
    project_id: string # Project/workspace
    task_id: string # Parent task
    agent_id: string # Executing agent
    iteration: integer # Loop iteration
    checkpoint_id: string # Associated checkpoint

  consumption:
    model: string # API model ID, e.g. claude-opus-4-5-20251101, claude-sonnet-4-5-20250929
    input_tokens: integer # Prompt tokens
    output_tokens: integer # Completion tokens
    cache_read_tokens: integer # Prompt cache hits
    cache_write_tokens: integer # Prompt cache writes
    total_tokens: integer # Sum of input, output, and cache tokens

  cost:
    input_cost_usd: decimal # Calculated cost
    output_cost_usd: decimal
    cache_cost_usd: decimal
    total_cost_usd: decimal

  metadata:
    tool_calls: array # Tools invoked in this call
    latency_ms: integer # API response time
    success: boolean # Whether call succeeded
    error_type: string # If failed, error category
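Step 1.1 calls for a TokenRecord type. The following is a minimal TypeScript sketch that mirrors the FR-001 fields, plus a hypothetical `validateTokenRecord` consistency check of the kind the recording service's "Validate record schema" step could run. Modeling `decimal` as `number`, `tool_calls` as a list of tool names, and the specific checks are assumptions, not part of the spec.

```typescript
// Hypothetical TypeScript mirror of the FR-001 token_record schema.
// Assumptions: `decimal` is modeled as number; `tool_calls` holds tool names.
interface TokenRecord {
  record_id: string; // UUID
  timestamp: string; // ISO-8601
  context: {
    organization_id: string;
    project_id: string;
    task_id: string;
    agent_id: string;
    iteration: number;
    checkpoint_id: string;
  };
  consumption: {
    model: string;
    input_tokens: number;
    output_tokens: number;
    cache_read_tokens: number;
    cache_write_tokens: number;
    total_tokens: number;
  };
  cost: {
    input_cost_usd: number;
    output_cost_usd: number;
    cache_cost_usd: number;
    total_cost_usd: number;
  };
  metadata: {
    tool_calls: string[];
    latency_ms: number;
    success: boolean;
    error_type?: string; // only meaningful when success is false
  };
}

// Returns a list of problems; an empty list means the record is internally consistent.
function validateTokenRecord(r: TokenRecord): string[] {
  const errors: string[] = [];
  const c = r.consumption;
  const parts = [c.input_tokens, c.output_tokens, c.cache_read_tokens, c.cache_write_tokens];
  if (parts.some((n) => !Number.isInteger(n) || n < 0)) {
    errors.push("token counts must be non-negative integers");
  }
  const sum = parts.reduce((a, b) => a + b, 0);
  if (c.total_tokens !== sum) {
    errors.push(`total_tokens ${c.total_tokens} does not equal the sum of its parts (${sum})`);
  }
  if (!r.metadata.success && !r.metadata.error_type) {
    errors.push("failed calls must carry an error_type");
  }
  return errors;
}
```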

2.2 Budget Configuration

FR-002: Budget Hierarchy

budgets:
  organization:
    monthly_limit_usd: 10000
    alert_threshold_percent: 80
    hard_limit_action: "throttle" # throttle, pause, alert_only

  project:
    daily_limit_usd: 500
    task_limit_usd: 50
    alert_threshold_percent: 75

  task:
    max_tokens: 1000000
    max_iterations: 50
    max_cost_usd: 100
    per_iteration_limit_tokens: 50000

  agent:
    max_tokens_per_call: 100000
    max_cost_per_call_usd: 5
    rate_limit_calls_per_minute: 60

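Budget Inheritance:
└── Organization (top level)
    └── Project (inherits org limits)
        └── Task (inherits project limits)
            └── Agent (inherits task limits)

A lower level cannot exceed its parent: the effective limit at any level is the smallest value set on that level or on any ancestor. With the values above, a task's effective cost cap is min(task.max_cost_usd = 100, project.task_limit_usd = 50) = $50.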
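The inheritance rule can be resolved per limit dimension by clamping against every ancestor. The sketch below assumes each dimension is passed as an array ordered from organization to agent, with `undefined` meaning "not set, inherit"; both helper names are hypothetical.

```typescript
// Limits for one dimension (e.g. per-task USD cap), ordered organization -> agent.
// `undefined` means the level does not set a value and inherits from its parent.
type LimitChain = Array<number | undefined>;

// Effective limit at the bottom of the chain: the minimum of every value set above it.
function effectiveLimit(chain: LimitChain): number {
  return chain.reduce<number>(
    (limit, own) => (own === undefined ? limit : Math.min(limit, own)),
    Infinity,
  );
}

// Indices whose configured value exceeds the effective limit of their ancestors,
// so misconfigurations can be reported when budgets are loaded.
function inheritanceViolations(chain: LimitChain): number[] {
  const violations: number[] = [];
  let parentLimit = Infinity;
  chain.forEach((own, i) => {
    if (own === undefined) return;
    if (own > parentLimit) violations.push(i);
    parentLimit = Math.min(parentLimit, own);
  });
  return violations;
}
```

Applied to the FR-002 per-task cost cap, `effectiveLimit([undefined, 50, 100, undefined])` yields 50, and `inheritanceViolations` flags the task level (index 2).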

2.3 Budget Enforcement

FR-003: Enforcement Protocol

PRE-CALL CHECK:
1. Retrieve current consumption for the call's context
2. Estimate call cost (based on prompt size + expected output)
3. Check against all applicable budgets:
   ├── Agent call limit
   ├── Task iteration limit
   ├── Task total limit
   ├── Project daily limit
   └── Organization monthly limit
4. If ANY limit would be exceeded, or an alert threshold would be crossed:
   ├── Hard limit exceeded: Block call, return budget_exceeded error
   ├── Soft limit exceeded: Log warning, proceed with monitoring
   └── Alert threshold crossed: Notify but proceed

POST-CALL RECORDING:
1. Record actual consumption
2. Update running totals
3. Check whether any threshold was newly crossed
4. Emit events for alerts/dashboard

THROTTLING BEHAVIOR:
├── Approaching limit (alert_threshold_percent, e.g. 80%): Emit warning event
├── At limit (100%), per hard_limit_action:
│   ├── throttle: Delay calls by 1s, then 2s, then 4s... (capped at max_delay_ms)
│   ├── pause: Block all calls until budget reset/increase
│   └── alert_only: Log and continue
└── Override available for emergency/human approval
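The throttle schedule (1s, 2s, 4s, ...) is exponential backoff capped at a maximum. A minimal sketch, assuming the `initial_delay_ms`, `max_delay_ms`, and `backoff_multiplier` values from the configuration schema and an in-memory counter per context; the class and method names are hypothetical.

```typescript
// Values correspond to the throttling section of coditect-config.yaml.
interface ThrottleConfig {
  initialDelayMs: number;
  maxDelayMs: number;
  backoffMultiplier: number;
}

// Delay for the n-th consecutive throttled call (n = 0 for the first), capped at maxDelayMs.
function backoffDelayMs(n: number, cfg: ThrottleConfig): number {
  return Math.min(cfg.initialDelayMs * cfg.backoffMultiplier ** n, cfg.maxDelayMs);
}

// Hypothetical per-context throttle state (e.g. keyed by task or project ID).
class ThrottleTracker {
  private readonly cfg: ThrottleConfig;
  private readonly consecutive = new Map<string, number>();

  constructor(cfg: ThrottleConfig) {
    this.cfg = cfg;
  }

  // Returns the delay to apply before the next call in this context and advances the count.
  nextDelayMs(context: string): number {
    const n = this.consecutive.get(context) ?? 0;
    this.consecutive.set(context, n + 1);
    return backoffDelayMs(n, this.cfg);
  }

  // Clears the backoff, e.g. after a budget reset/increase or an approved override.
  reset(context: string): void {
    this.consecutive.delete(context);
  }
}
```

With the configured 1000 ms, multiplier 2, and 60000 ms cap, the delays run 1000, 2000, 4000, ... and reach the 60 s cap on the seventh consecutive throttled call.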

2.4 Cost Calculation

FR-004: Pricing Model (January 2026 rates, configurable)

interface ModelPricing {
  model: string;
  inputPerMillionTokens: number;      // USD
  outputPerMillionTokens: number;     // USD
  cacheReadPerMillionTokens: number;  // USD
  cacheWritePerMillionTokens: number; // USD (5-minute cache TTL rate)
}

const PRICING: ModelPricing[] = [
  {
    model: "claude-opus-4-5-20251101",
    inputPerMillionTokens: 5.00,
    outputPerMillionTokens: 25.00,
    cacheReadPerMillionTokens: 0.50,
    cacheWritePerMillionTokens: 6.25
  },
  {
    model: "claude-sonnet-4-5-20250929",
    inputPerMillionTokens: 3.00,
    outputPerMillionTokens: 15.00,
    cacheReadPerMillionTokens: 0.30,
    cacheWritePerMillionTokens: 3.75
  },
  {
    model: "claude-haiku-4-5-20251001",
    inputPerMillionTokens: 1.00,
    outputPerMillionTokens: 5.00,
    cacheReadPerMillionTokens: 0.10,
    cacheWritePerMillionTokens: 1.25
  }
];

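// Cost calculation: converts one call's token counts to USD using PRICING
function calculateCost(record: TokenConsumption): CostBreakdown {
  const pricing = PRICING.find(p => p.model === record.model);
  if (!pricing) {
    // Fail loudly instead of silently recording a $0 cost for an unpriced model
    throw new Error(`No pricing configured for model "${record.model}"`);
  }
  const usd = (tokens: number, perMillion: number) => (tokens / 1_000_000) * perMillion;
  const input = usd(record.input_tokens, pricing.inputPerMillionTokens);
  const output = usd(record.output_tokens, pricing.outputPerMillionTokens);
  const cacheRead = usd(record.cache_read_tokens, pricing.cacheReadPerMillionTokens);
  const cacheWrite = usd(record.cache_write_tokens, pricing.cacheWritePerMillionTokens);
  return { input, output, cacheRead, cacheWrite, total: input + output + cacheRead + cacheWrite };
}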

2.5 Aggregation and Reporting

FR-005: Consumption Aggregations

Real-time aggregations (updated on each record):
├── Per-agent running total (tokens, cost)
├── Per-task running total
├── Per-project daily total
├── Per-organization monthly total
└── Global system total

Periodic rollups (hourly, daily, monthly):
├── Task summaries
├── Project summaries
├── Model usage breakdown
├── Agent efficiency metrics
└── Cost trend analysis

Efficiency Metrics:
├── tokens_per_tool_call: Total tokens / tool calls
├── cost_per_iteration: Total cost / iterations
├── cache_hit_rate: cache_read / (cache_read + input)
├── output_input_ratio: output_tokens / input_tokens
└── cost_per_completed_task: Total cost / successful tasks
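The efficiency formulas translate directly into code. This sketch assumes the totals have already been aggregated for the context being reported and returns `null` rather than `NaN` or `Infinity` when a denominator is zero (for example, before any task has completed); the `UsageTotals` shape and function names are hypothetical.

```typescript
// Hypothetical aggregate for one context (agent, task, project, ...).
interface UsageTotals {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
  totalCostUsd: number;
  toolCalls: number;
  iterations: number;
  successfulTasks: number;
}

// Division that yields null instead of NaN/Infinity when there is nothing to divide by.
function ratio(numerator: number, denominator: number): number | null {
  return denominator === 0 ? null : numerator / denominator;
}

function efficiencyMetrics(t: UsageTotals) {
  const totalTokens = t.inputTokens + t.outputTokens + t.cacheReadTokens + t.cacheWriteTokens;
  return {
    tokensPerToolCall: ratio(totalTokens, t.toolCalls),
    costPerIteration: ratio(t.totalCostUsd, t.iterations),
    cacheHitRate: ratio(t.cacheReadTokens, t.cacheReadTokens + t.inputTokens),
    outputInputRatio: ratio(t.outputTokens, t.inputTokens),
    costPerCompletedTask: ratio(t.totalCostUsd, t.successfulTasks),
  };
}
```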

2.6 Forecasting

FR-006: Cost Projection

Short-term (task-level):
├── Based on: Current consumption rate, remaining work estimate
├── Formula: current_rate * estimated_remaining_iterations (current_rate = mean cost per recent iteration)
├── Confidence: Based on variance in iteration costs
└── Update frequency: Every iteration
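One way to attach a confidence band to the short-term formula is sketched below. It treats `current_rate` as the mean cost of the last `task_lookback_iterations` iterations (10 in the configuration schema). As an assumption not stated in the spec, it models iteration costs as independent draws, so the spread of the remaining cost grows with the square root of the remaining iterations; z = 1.645 gives roughly a two-sided 90% normal interval, matching `confidence_level: 0.9`. The function name is hypothetical.

```typescript
interface TaskForecast {
  expectedUsd: number;
  lowUsd: number;
  highUsd: number;
}

// Short-term forecast: current_rate * remaining iterations, with a normal-approximation band.
// Ignores uncertainty in the estimated mean itself (a simplification).
function forecastTaskCost(
  iterationCostsUsd: number[],
  remainingIterations: number,
  lookback = 10,
  z = 1.645,
): TaskForecast {
  const recent = iterationCostsUsd.slice(-lookback);
  if (recent.length === 0) throw new Error("need at least one completed iteration");
  const n = recent.length;
  const mean = recent.reduce((a, b) => a + b, 0) / n; // current_rate
  // Sample variance of per-iteration cost; 0 when only one iteration is known.
  const variance = n > 1 ? recent.reduce((s, x) => s + (x - mean) ** 2, 0) / (n - 1) : 0;
  const expected = mean * remainingIterations;
  // The sum of k independent draws has variance k * variance.
  const halfWidth = z * Math.sqrt(variance * remainingIterations);
  return { expectedUsd: expected, lowUsd: Math.max(0, expected - halfWidth), highUsd: expected + halfWidth };
}
```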

Medium-term (project-level):
├── Based on: Historical project patterns, active task count
├── Formula: avg_task_cost * remaining_tasks
├── Considers: Task complexity estimates
└── Update frequency: Daily

Long-term (organization-level):
├── Based on: Monthly trends, growth rate, planned projects
├── Formula: Exponential smoothing of historical data
├── Considers: Seasonality, team growth
└── Update frequency: Weekly
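The spec names exponential smoothing without fixing a variant. Because it also mentions growth rate, the sketch below uses Holt's linear (double) exponential smoothing, which tracks both a level and a trend. The smoothing constants are illustrative assumptions, and seasonality would need a seasonal variant such as Holt-Winters, which is not shown.

```typescript
// Holt's linear exponential smoothing over monthly spend (oldest first).
// alpha smooths the level, beta smooths the trend; defaults are illustrative only.
function holtForecast(series: number[], stepsAhead: number, alpha = 0.5, beta = 0.3): number {
  if (series.length < 2) throw new Error("need at least two periods");
  let level = series[0];
  let trend = series[1] - series[0];
  for (let t = 1; t < series.length; t++) {
    const prevLevel = level;
    level = alpha * series[t] + (1 - alpha) * (prevLevel + trend);
    trend = beta * (level - prevLevel) + (1 - beta) * trend;
  }
  // Project the last level forward along the smoothed trend.
  return level + stepsAhead * trend;
}
```

On a perfectly linear history such as 100, 200, 300, 400 the level and trend stay exact, so the one-month-ahead forecast is 500 regardless of the constants chosen.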

3. Non-Functional Requirements

3.1 Performance

Requirement  Specification
-----------  ----------------------------
NFR-001      Token recording < 10ms
NFR-002      Budget check < 50ms
NFR-003      Aggregation query < 100ms
NFR-004      Support 10,000+ records/minute
NFR-005      Dashboard refresh < 5s

3.2 Reliability

Requirement  Specification
-----------  -----------------------------------------
NFR-006      Zero lost token records
NFR-007      Budget enforcement always active
NFR-008      Graceful degradation if aggregation fails
NFR-009      Async recording (non-blocking to API calls)

3.3 Accuracy

Requirement  Specification
-----------  ----------------------------------------
NFR-010      Token count matches API response exactly
NFR-011      Cost calculation uses current pricing
NFR-012      Aggregations consistent within 1 second
NFR-013      Forecast accuracy ±10% at task level

3.4 Observability

Requirement  Specification
-----------  ------------------------------------
NFR-014      Metrics: token_consumption histogram
NFR-015      Metrics: cost_usd histogram
NFR-016      Metrics: budget_utilization gauge
NFR-017      Alerts: Budget threshold crossed
NFR-018      Logs: All budget enforcement decisions

4. Implementation Steps

Phase 1: Token Recording (Week 1)

Step 1.1: Token Record Schema
├── Define TokenRecord type (TypeScript/Rust)
├── Define FoundationDB key structure:
│ └── /coditect/tokens/{org}/{project}/{task}/{record_id}
├── Add indexing keys for aggregation:
│ ├── /coditect/tokens/by-agent/{agent_id}/{timestamp}
│ ├── /coditect/tokens/by-task/{task_id}/{timestamp}
│ └── /coditect/tokens/by-day/{date}/{org}/{project}
└── Implement serialization
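A sketch of the Step 1.1 key layout as plain string paths. Assumptions: ID segments are URL-encoded so a `/` inside an ID cannot add path levels; timestamps are zero-padded epoch milliseconds so lexicographic key order matches chronological order; and the by-agent and by-task index keys get a trailing `record_id` so two records with the same timestamp cannot overwrite each other. The by-day key is kept as specified (one key per day, org, and project), which suits a daily aggregate. A production layer would more likely use FoundationDB's tuple layer; all helper names here are hypothetical.

```typescript
const ROOT = "/coditect/tokens";
const seg = encodeURIComponent; // keeps "/" inside an ID from adding path levels

// Zero-padded epoch milliseconds: lexicographic order equals chronological order.
function tsKey(epochMs: number): string {
  return String(epochMs).padStart(15, "0");
}

function primaryKey(org: string, project: string, task: string, recordId: string): string {
  return `${ROOT}/${seg(org)}/${seg(project)}/${seg(task)}/${seg(recordId)}`;
}

interface KeyInput {
  org: string;
  project: string;
  taskId: string;
  agentId: string;
  recordId: string;
  epochMs: number;
}

function indexKeys(r: KeyInput): string[] {
  const day = new Date(r.epochMs).toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const ts = tsKey(r.epochMs);
  return [
    `${ROOT}/by-agent/${seg(r.agentId)}/${ts}/${seg(r.recordId)}`,
    `${ROOT}/by-task/${seg(r.taskId)}/${ts}/${seg(r.recordId)}`,
    `${ROOT}/by-day/${day}/${seg(r.org)}/${seg(r.project)}`,
  ];
}
```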

Step 1.2: Recording Service
├── Create TokenRecordingService
├── Implement record() method:
│ ├── Validate record schema
│ ├── Calculate costs
│ ├── Write to FDB
│ └── Update running totals (async)
├── Add buffering for high-volume:
│ ├── In-memory buffer (100 records)
│ ├── Flush every 1 second or on buffer full
│ └── Guaranteed delivery (retry on failure)
└── Emit TOKEN_RECORDED event
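The buffering rules (100 records, flush every second or when full, retry on failure) can be sketched as a small generic class. `writeBatch` stands in for the FoundationDB write and is an assumption, as are the class and method names. To avoid losing records (NFR-006), a batch that still fails after every retry is put back at the front of the buffer for the next flush rather than dropped.

```typescript
type WriteBatch<T> = (batch: T[]) => Promise<void>;

class BufferedRecorder<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setInterval> | undefined;
  private chain: Promise<void> = Promise.resolve();
  private readonly writeBatch: WriteBatch<T>;
  private readonly bufferSize: number;
  private readonly flushIntervalMs: number;
  private readonly retryAttempts: number;

  constructor(writeBatch: WriteBatch<T>, bufferSize = 100, flushIntervalMs = 1000, retryAttempts = 3) {
    this.writeBatch = writeBatch;
    this.bufferSize = bufferSize;
    this.flushIntervalMs = flushIntervalMs;
    this.retryAttempts = retryAttempts;
  }

  start(): void {
    this.timer = setInterval(() => void this.flush(), this.flushIntervalMs);
  }

  async stop(): Promise<void> {
    if (this.timer !== undefined) clearInterval(this.timer);
    await this.flush(); // drain whatever is still buffered
  }

  // Buffers a record; triggers a flush as soon as the buffer is full.
  record(item: T): Promise<void> {
    this.buffer.push(item);
    return this.buffer.length >= this.bufferSize ? this.flush() : Promise.resolve();
  }

  // Flushes are chained so overlapping triggers never write the same batch twice.
  flush(): Promise<void> {
    this.chain = this.chain.then(() => this.flushOnce());
    return this.chain;
  }

  get pending(): number {
    return this.buffer.length;
  }

  private async flushOnce(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0, this.buffer.length);
    for (let attempt = 1; attempt <= this.retryAttempts; attempt++) {
      try {
        await this.writeBatch(batch);
        return;
      } catch {
        // Immediate retry; a production service would likely back off between attempts.
      }
    }
    this.buffer.unshift(...batch); // re-queue instead of dropping (NFR-006)
  }
}
```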

Step 1.3: API Integration
├── Wrap Anthropic API client
├── Capture usage from response:
│ ├── input_tokens
│ ├── output_tokens
│ └── cache metrics (cache_creation_input_tokens, cache_read_input_tokens)
├── Enrich with context (task_id, agent_id, etc.)
├── Call recording service
└── Non-blocking (fire and forget with retry)
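The Anthropic Messages API returns token counts in a `usage` object (`input_tokens`, `output_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`). The sketch below maps those fields onto the FR-001 consumption block and wraps an arbitrary model call so recording happens after the response without delaying the caller. It deliberately avoids importing the SDK; `ApiUsage`, `toConsumption`, and `withRecording` are hypothetical names, and treating `cache_creation_input_tokens` as `cache_write_tokens` is this sketch's mapping.

```typescript
// Only the usage fields used here; the cache fields may be null or absent.
interface ApiUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

interface CallContext {
  organization_id: string;
  project_id: string;
  task_id: string;
  agent_id: string;
  iteration: number;
  checkpoint_id: string;
}

// Maps API usage onto the FR-001 consumption block.
function toConsumption(model: string, usage: ApiUsage) {
  const cacheWrite = usage.cache_creation_input_tokens ?? 0;
  const cacheRead = usage.cache_read_input_tokens ?? 0;
  return {
    model,
    input_tokens: usage.input_tokens,
    output_tokens: usage.output_tokens,
    cache_read_tokens: cacheRead,
    cache_write_tokens: cacheWrite,
    total_tokens: usage.input_tokens + usage.output_tokens + cacheRead + cacheWrite,
  };
}

type Consumption = ReturnType<typeof toConsumption>;

// Wraps any model call; recording is fire-and-forget so the caller is never delayed.
async function withRecording<R extends { model: string; usage: ApiUsage }>(
  context: CallContext,
  call: () => Promise<R>,
  record: (rec: { context: CallContext; consumption: Consumption }) => Promise<void>,
): Promise<R> {
  const response = await call();
  // Failures are left to the recorder's own retry logic (Step 1.2).
  void record({ context, consumption: toConsumption(response.model, response.usage) }).catch(() => {});
  return response;
}
```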

Phase 2: Budget System (Week 2)

Step 2.1: Budget Configuration
├── Define Budget schema
├── Create budget hierarchy structure
├── Implement budget storage:
│ └── /coditect/budgets/{org}/{project}/{task}
├── Add budget inheritance resolution
└── Support runtime budget updates

Step 2.2: Budget Enforcement
├── Create BudgetEnforcementService
├── Implement pre-call check:
│ ├── Estimate call cost
│ ├── Query running totals
│ ├── Check all applicable budgets
│ └── Return allow/deny/throttle decision
├── Implement throttling:
│ ├── Track throttle state per context
│ ├── Apply exponential backoff
│ └── Reset on budget refresh
└── Add override mechanism for emergencies
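A sketch of the pre-call decision for a single cost dimension (USD). FR-003 says an exceeded hard limit blocks the call, while the throttling rules apply `hard_limit_action`; as an assumption, this sketch reconciles the two by mapping an exceeded hard limit to `deny` for `pause`, `throttle` for `throttle`, and an alert for `alert_only`. The most restrictive decision across all levels wins. Types and names are hypothetical.

```typescript
type Decision = "allow" | "throttle" | "deny";
type HardLimitAction = "throttle" | "pause" | "alert_only";

interface ApplicableBudget {
  level: "agent" | "task" | "project" | "organization";
  spentUsd: number; // running total for the budget's window
  limitUsd: number;
  alertThresholdPercent: number;
  hard: boolean;
}

interface PreCallResult {
  action: Decision;
  reason?: string;
  alerts: string[];
  utilization: Record<string, number>; // percent of each limit after the estimated call
}

function preCallCheck(budgets: ApplicableBudget[], estimatedCostUsd: number, hardLimitAction: HardLimitAction): PreCallResult {
  const result: PreCallResult = { action: "allow", alerts: [], utilization: {} };
  const severity: Record<Decision, number> = { allow: 0, throttle: 1, deny: 2 };
  for (const b of budgets) {
    const projected = b.spentUsd + estimatedCostUsd;
    const pct = (projected * 100) / b.limitUsd;
    result.utilization[b.level] = pct;
    let action: Decision = "allow";
    if (projected > b.limitUsd && b.hard) {
      if (hardLimitAction === "pause") action = "deny";
      else if (hardLimitAction === "throttle") action = "throttle";
      else result.alerts.push(`${b.level}: hard limit reached (alert_only)`);
    } else if (projected > b.limitUsd) {
      result.alerts.push(`${b.level}: soft limit exceeded`);
    } else if (pct >= b.alertThresholdPercent) {
      result.alerts.push(`${b.level}: ${b.alertThresholdPercent}% threshold`);
    }
    // Keep the most restrictive decision seen across all levels.
    if (severity[action] > severity[result.action]) {
      result.action = action;
      result.reason = `${b.level} budget_exceeded`;
    }
  }
  return result;
}
```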

Step 2.3: Running Totals
├── Implement real-time aggregation:
│ ├── Atomic increment on record
│ ├── Per-agent total
│ ├── Per-task total
│ ├── Per-project daily total
│ └── Per-organization monthly total
├── Add cache layer for read performance
└── Implement total recalculation (for recovery)
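Running totals updated by atomic increments are easier to keep exact if costs are stored as integer micro-dollars rather than floating-point USD. FoundationDB's atomic add mutation operates on little-endian integers, so the sketch below shows that encoding; `atomicAddLocal` is a local stand-in for the database operation, not a client API, and negative deltas (which would need two's-complement handling) are out of scope. All names are hypothetical.

```typescript
// Integer micro-dollars: 1 USD = 1_000_000 micro-USD.
function usdToMicro(usd: number): number {
  return Math.round(usd * 1_000_000);
}

function microToUsd(micro: number): number {
  return micro / 1_000_000;
}

const TWO_32 = 2 ** 32;

// 8-byte little-endian encoding of a non-negative integer (safe-integer range only).
function encodeUint64LE(value: number): Uint8Array {
  if (!Number.isSafeInteger(value) || value < 0) {
    throw new RangeError("expected a non-negative safe integer");
  }
  const view = new DataView(new ArrayBuffer(8));
  view.setUint32(0, value % TWO_32, true); // low 32 bits
  view.setUint32(4, Math.floor(value / TWO_32), true); // high 32 bits
  return new Uint8Array(view.buffer);
}

function decodeUint64LE(bytes: Uint8Array): number {
  const view = new DataView(bytes.buffer, bytes.byteOffset, 8);
  return view.getUint32(0, true) + view.getUint32(4, true) * TWO_32;
}

// Local stand-in for the database-side atomic add: decode, add, re-encode.
function atomicAddLocal(current: Uint8Array, deltaMicro: number): Uint8Array {
  return encodeUint64LE(decodeUint64LE(current) + deltaMicro);
}
```

Because every increment is an integer, summing $0.1 and $0.2 yields exactly 300000 micro-USD instead of the floating-point 0.30000000000000004.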

Phase 3: Aggregation and Reporting (Week 3)

Step 3.1: Aggregation Service
├── Create AggregationService
├── Implement real-time aggregations:
│ ├── Subscribe to TOKEN_RECORDED events
│ ├── Update materialized views
│ └── Maintain consistency with source
├── Implement periodic rollups:
│ ├── Hourly job: Task summaries
│ ├── Daily job: Project summaries
│ └── Monthly job: Organization summaries
└── Add efficiency metric calculation

Step 3.2: Query API
├── Implement consumption queries:
│ ├── getAgentConsumption(agentId, timeRange)
│ ├── getTaskConsumption(taskId)
│ ├── getProjectConsumption(projectId, date)
│ ├── getOrganizationConsumption(orgId, month)
│ └── getModelBreakdown(context, timeRange)
├── Implement budget queries:
│ ├── getBudgetUtilization(context)
│ ├── getBudgetHistory(context, timeRange)
│ └── getBudgetForecast(context)
└── Add pagination and filtering

Step 3.3: Forecasting Engine
├── Implement task-level forecast:
│ ├── Calculate iteration cost variance
│ ├── Estimate remaining iterations
│ └── Project completion cost
├── Implement project-level forecast:
│ ├── Aggregate task forecasts
│ ├── Apply historical patterns
│ └── Confidence intervals
└── Add forecast to dashboard data

Phase 4: Dashboard and Alerts (Week 4)

Step 4.1: Dashboard Data Service
├── Create DashboardDataService
├── Implement real-time data feed:
│ ├── WebSocket for live updates
│ ├── Polling fallback
│ └── Delta updates (changed data only)
├── Implement dashboard views:
│ ├── Overview (org summary)
│ ├── Project drill-down
│ ├── Task detail
│ ├── Agent comparison
│ └── Model usage breakdown
└── Add export capabilities (CSV, JSON)

Step 4.2: Alerting
├── Define alert conditions:
│ ├── budget_threshold_crossed
│ ├── budget_exhausted
│ ├── unusual_consumption_spike
│ ├── forecast_exceeds_budget
│ └── efficiency_degradation
├── Implement alert evaluation:
│ ├── Subscribe to relevant events
│ ├── Evaluate conditions
│ └── Emit alerts
├── Integrate with notification channels:
│ ├── Slack
│ ├── PagerDuty
│ ├── Email
│ └── In-app notifications
└── Add alert acknowledgment/mute
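The `budget_threshold_crossed` and `budget_exhausted` conditions, together with FR-003's "check whether any threshold was newly crossed", amount to edge detection: an alert fires when an update moves utilization from below a threshold to at or above it, not on every call above it. A sketch using the Section 6 event payloads; reporting `current` as a utilization percentage and the helper names are assumptions.

```typescript
type BudgetEvent =
  | { type: "BUDGET_THRESHOLD_CROSSED"; payload: { context: string; threshold: number; current: number } }
  | { type: "BUDGET_EXHAUSTED"; payload: { context: string; limit: number; action: string } };

// Thresholds (in percent) that this update moved utilization across.
function newlyCrossed(prevSpent: number, newSpent: number, limit: number, thresholdsPercent: number[]): number[] {
  if (limit <= 0) return [];
  const before = (prevSpent * 100) / limit;
  const after = (newSpent * 100) / limit;
  return thresholdsPercent.filter((t) => before < t && after >= t);
}

// Emits one event per newly crossed threshold; 100% maps to BUDGET_EXHAUSTED.
function budgetEvents(
  context: string,
  prevSpent: number,
  newSpent: number,
  limit: number,
  alertPercent: number,
  action: string,
): BudgetEvent[] {
  const current = (newSpent * 100) / limit;
  return newlyCrossed(prevSpent, newSpent, limit, [alertPercent, 100]).map(
    (t): BudgetEvent =>
      t >= 100
        ? { type: "BUDGET_EXHAUSTED", payload: { context, limit, action } }
        : { type: "BUDGET_THRESHOLD_CROSSED", payload: { context, threshold: t, current } },
  );
}
```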

Step 4.3: Testing and Documentation
├── Unit tests:
│ ├── Cost calculation accuracy
│ ├── Budget enforcement logic
│ ├── Aggregation correctness
│ └── Forecast algorithms
├── Integration tests:
│ ├── End-to-end recording flow
│ ├── Budget enforcement scenarios
│ ├── Dashboard data accuracy
│ └── Alert triggering
└── Documentation:
    ├── API reference
    ├── Budget configuration guide
    ├── Dashboard user guide
    └── Cost optimization best practices

5. API Specification

interface TokenEconomicsService {
  // Recording
  recordConsumption(record: TokenConsumption): Promise<void>;

  // Budget management
  setBudget(context: BudgetContext, budget: Budget): Promise<void>;
  getBudget(context: BudgetContext): Promise<Budget>;
  checkBudget(context: BudgetContext, estimatedCost: number): Promise<BudgetCheckResult>;

  // Consumption queries
  getConsumption(context: ConsumptionContext, timeRange?: TimeRange): Promise<ConsumptionSummary>;
  getConsumptionBreakdown(context: ConsumptionContext, groupBy: GroupBy): Promise<ConsumptionBreakdown>;

  // Aggregations
  getRunningTotal(context: BudgetContext): Promise<RunningTotal>;
  getEfficiencyMetrics(context: ConsumptionContext): Promise<EfficiencyMetrics>;

  // Forecasting
  getForecast(context: BudgetContext): Promise<CostForecast>;

  // Dashboard
  getDashboardData(context: DashboardContext): Promise<DashboardData>;
  subscribeToDashboard(context: DashboardContext, callback: DashboardCallback): Unsubscribe;
}

interface BudgetCheckResult {
  allowed: boolean;
  action: 'allow' | 'throttle' | 'deny';
  throttleDelayMs?: number;
  reason?: string;
  budgetUtilization: {
    agent: number;
    task: number;
    project: number;
    organization: number;
  };
  estimatedCost: number;
  remainingBudget: number;
}

interface EfficiencyMetrics {
  tokensPerToolCall: number;
  costPerIteration: number;
  cacheHitRate: number;
  outputInputRatio: number;
  costPerCompletedTask: number;
  modelUsageBreakdown: { model: string; percentage: number; cost: number }[];
}

6. Event Definitions

type TokenEconomicsEvents =
  | { type: 'TOKEN_RECORDED'; payload: { recordId: string; taskId: string; tokens: number; cost: number } }
  | { type: 'BUDGET_THRESHOLD_CROSSED'; payload: { context: string; threshold: number; current: number } }
  | { type: 'BUDGET_EXHAUSTED'; payload: { context: string; limit: number; action: string } }
  | { type: 'THROTTLE_ACTIVATED'; payload: { context: string; delayMs: number } }
  | { type: 'CONSUMPTION_SPIKE'; payload: { context: string; rate: number; threshold: number } }
  | { type: 'FORECAST_WARNING'; payload: { context: string; forecastedCost: number; budget: number } };

7. Dashboard Wireframe

┌─────────────────────────────────────────────────────────────────────┐
│ TOKEN ECONOMICS DASHBOARD                             [Export] [⚙️] │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐  │
│  │ MONTHLY SPEND   │    │ TODAY'S SPEND   │    │ ACTIVE AGENTS   │  │
│  │ $2,847.32       │    │ $142.18         │    │ 12              │  │
│  │ ▲ 12% vs last   │    │ Budget: $500    │    │ Consuming now   │  │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘  │
│                                                                     │
│ BUDGET UTILIZATION                                                  │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ Organization  █████████████████████░░░░░░░░░  71% ($7,100)    │  │
│  │ Project A     ██████████████░░░░░░░░░░░░░░░░  45% ($450)      │  │
│  │ Project B     ███████████████████████████░░░  89% ($890) ⚠️   │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
│ CONSUMPTION BY MODEL (Today)                                        │
│  ┌─────────────────────────────────┐  ┌──────────────────────────┐  │
│  │ ┌────┐                          │  │ Model             Cost   │  │
│  │ │████│ Opus 4.5                 │  │ Opus 4.5        $98.42   │  │
│  │ │████│ 69%                      │  │ Sonnet 4.5      $38.76   │  │
│  │ │    │                          │  │ Haiku 4.5        $5.00   │  │
│  │ │░░░░│ Sonnet 4.5               │  ├──────────────────────────┤  │
│  │ │░░░░│ 27%                      │  │ Total          $142.18   │  │
│  │ │    │                          │  └──────────────────────────┘  │
│  │ │░   │ Haiku 4.5                │                                │
│  │ │░   │ 4%                       │                                │
│  │ └────┘                          │                                │
│  └─────────────────────────────────┘                                │
│                                                                     │
│ EFFICIENCY METRICS                                                  │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐  │
│  │ Tokens/Tool Call│    │ Cache Hit Rate  │    │ Cost/Task       │  │
│  │ 1,247           │    │ 34.2%           │    │ $8.42           │  │
│  │ ▼ 5% better     │    │ ▲ 12% better    │    │ ▼ 3% better     │  │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘  │
│                                                                     │
│ RECENT ACTIVITY                                        [View All →] │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ 14:32  Task-047  Implementation Agent  2,341 tokens  $0.42    │  │
│  │ 14:31  Task-047  Implementation Agent  1,892 tokens  $0.34    │  │
│  │ 14:28  Task-046  QA Agent              4,521 tokens  $0.81    │  │
│  │ 14:25  Task-046  QA Agent              3,102 tokens  $0.56    │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

8. Configuration Schema

# coditect-config.yaml
token_economics:
  enabled: true

  recording:
    buffer_size: 100
    flush_interval_ms: 1000
    retry_attempts: 3

  pricing:
    source: "config" # config, api, or external_service
    update_interval: "daily"
    models:
      - model: "claude-opus-4-5-20251101"
        input_per_million: 5.00
        output_per_million: 25.00
        cache_read_per_million: 0.50
        cache_write_per_million: 6.25
      # ... other models

  budgets:
    default_organization_monthly_usd: 10000
    default_project_daily_usd: 500
    default_task_max_usd: 100
    alert_threshold_percent: 80
    hard_limit_action: "throttle" # throttle, pause, alert_only

  throttling:
    initial_delay_ms: 1000
    max_delay_ms: 60000
    backoff_multiplier: 2

  forecasting:
    task_lookback_iterations: 10
    project_lookback_days: 30
    confidence_level: 0.9

  alerting:
    channels:
      - type: "slack"
        webhook_url: "${SLACK_WEBHOOK_URL}"
        events: ["budget_threshold", "budget_exhausted", "spike"]
      - type: "pagerduty"
        routing_key: "${PAGERDUTY_KEY}"
        events: ["budget_exhausted"]

  dashboard:
    refresh_interval_ms: 5000
    retention_days: 90

9. Dependencies

Dependency                  Type            Status
--------------------------  --------------  --------------------
FoundationDB                Infrastructure  ✅ Available
Event Bus                   Platform        ✅ Available
Anthropic API Client        Library         ✅ Requires wrapping
Checkpoint Service          Platform        🔄 IMPL-REQ-001
Dashboard Framework         Frontend        ⚠️ TBD
Alerting (Slack/PagerDuty)  External        ✅ Available

10. Risks and Mitigations

Risk                          Impact                       Likelihood  Mitigation
----------------------------  ---------------------------  ----------  -------------------------------------------
Pricing changes               Incorrect cost calculations  Medium      Configurable pricing, alerts on API changes
High-volume recording impact  Performance degradation      Medium      Async buffered writes
Budget race conditions        Over-budget execution        Low         Atomic budget checks with reservations
Forecast inaccuracy           Poor planning decisions      Medium      Confidence intervals, multiple models
Dashboard latency             Poor user experience         Low         Caching, incremental updates
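The "atomic budget checks with reservations" mitigation can be illustrated with an in-process ledger: a call reserves its estimated cost before running, and the reservation is later settled with the actual cost or released. In a single Node.js process a synchronous method runs without interleaving, so check-and-reserve is atomic here; across processes the same check would need to run inside one database transaction (an assumption about deployment). The class is hypothetical and tracks a single budget.

```typescript
class BudgetLedger {
  private spentUsd = 0;
  private readonly reservations = new Map<string, number>();
  private nextId = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  private reservedUsd(): number {
    let total = 0;
    for (const amount of this.reservations.values()) total += amount;
    return total;
  }

  // Check-and-reserve in one synchronous step; null means the call must not proceed.
  reserve(estimatedUsd: number): string | null {
    if (this.spentUsd + this.reservedUsd() + estimatedUsd > this.limitUsd) return null;
    const id = `r${this.nextId++}`;
    this.reservations.set(id, estimatedUsd);
    return id;
  }

  // Replaces the estimate with the actual cost once the call has completed.
  settle(id: string, actualUsd: number): void {
    if (!this.reservations.delete(id)) throw new Error(`unknown reservation ${id}`);
    this.spentUsd += actualUsd;
  }

  // Frees the estimate when the call was never made or failed without charge.
  release(id: string): void {
    this.reservations.delete(id);
  }

  get committedUsd(): number {
    return this.spentUsd;
  }
}
```

Two concurrent calls that each estimate $6 against a $10 budget cannot both pass the check, because the first reservation is counted before the second is evaluated.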

11. Acceptance Criteria

  • All API calls have token consumption recorded
  • Cost calculations match manual verification
  • Budget enforcement blocks calls at limit
  • Throttling applies correct exponential backoff
  • Running totals update in real-time
  • Dashboard displays accurate current state
  • Alerts fire within 1 minute of threshold crossing
  • Forecasts within ±10% of actual (measured over 1 week)
  • Export functionality produces valid CSV/JSON
  • Documentation covers all configuration options

Document Version: 1.0 | Last Updated: January 24, 2026