
Component Token Scaling Analysis

Executive Summary

CODITECT is a product framework that provides AI-assisted development environments to customers. Each customer installation includes 3,392 components (766 agents, 431 skills, 364 commands, 108 hooks, 563 scripts, 1,152 workflows). When Claude Code loads agent descriptions into the Task tool's system prompt, the cumulative token cost is ~25.6k tokens -- exceeding the recommended 15k token budget by 70%.

This is a product architecture scaling problem, not merely an internal tooling concern. Every CODITECT customer bears these costs in every session. As the component inventory grows, this problem compounds across the entire customer base.

This document defines the problem space, surveys academic and industry solutions, presents measured data on the exact loading mechanism, and recommends a Progressive Component Disclosure architecture (ADR-162) that can achieve 85-96% token reduction while maintaining full component access.


1. Problem Definition

1.1 Current State (Measured)

| Metric | Value | Source |
|---|---|---|
| Total components | 3,392 | config/component-counts.json |
| Agent definitions (.claude/agents/*.md) | 770 files | Filesystem count |
| Agent file total size | 4.88 MB (~452K tokens) | Measured |
| Agent descriptions in Task tool schema | ~25.6k tokens | Estimated: ~33 tokens/description x 770 |
| Skill files | 432 files (5.75 MB, ~510K tokens) | Measured |
| Command files | 365 files (2.87 MB, ~260K tokens) | Measured |
| CLAUDE.md system prompt | 954 tokens | Measured |
| Tool definitions (built-in + MCP) | ~15-35K tokens | Estimated |
| Recommended agent description budget | 15k tokens | Anthropic guidance |
| Overage | +70% (10.6k tokens) | - |
| Context consumed before user input | ~13% of 200k window (CLAUDE.md + agent descriptions) | - |

1.2 Loading Mechanism (Discovered)

Claude Code loads custom agents via the .claude/agents/ directory. In CODITECT, .claude is a symlink:

.claude -> .coditect -> submodules/core/coditect-core/

Therefore .claude/agents/ resolves to coditect-core/agents/ -- all 770 agent markdown files are loaded by Claude Code at session start. Claude Code extracts name + description from each file's frontmatter and embeds them as subagent_type enum values in the Task tool's function schema. The full markdown body is loaded only when an agent is dispatched.
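The extraction step can be sketched in a few lines. This is an approximation, not Claude Code's actual implementation: the flat `key: value` frontmatter parsing and the ~4-characters-per-token heuristic are assumptions.

```python
import re
from pathlib import Path

FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---", re.DOTALL)

def extract_agent_entry(md_text: str) -> dict:
    """Pull `name` and `description` out of a markdown file's YAML
    frontmatter, as Claude Code conceptually builds its subagent_type enum."""
    match = FRONTMATTER_RE.match(md_text)
    if not match:
        return {}
    entry = {}
    for line in match.group(1).splitlines():
        key, _, value = line.partition(":")
        if key.strip() in ("name", "description"):
            entry[key.strip()] = value.strip()
    return entry

def estimate_schema_tokens(agents_dir: str) -> int:
    """Rough Task-schema cost: ~1 token per 4 characters of name+description."""
    total_chars = 0
    for md_file in Path(agents_dir).glob("*.md"):
        entry = extract_agent_entry(md_file.read_text())
        total_chars += len(entry.get("name", "")) + len(entry.get("description", ""))
    return total_chars // 4
```

Only the frontmatter fields enter the schema; the markdown body below the closing `---` never contributes to the per-call cost.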

Token breakdown per API call:

| Component | Tokens | When Loaded |
|---|---|---|
| CLAUDE.md (system prompt) | ~954 | Every call |
| Agent descriptions (Task schema) | ~25,600 | Every call |
| Tool definitions (built-in + MCP) | ~15,000-35,000 | Every call |
| Agent full body | ~589 avg | Only on dispatch |
| Total system overhead | ~41,500-61,500 | Every call |

Key insight: The ~25.6k token agent description cost is embedded in the Task tool's JSON schema definition and is paid on every API call, whether or not the Task tool is invoked. It cannot be avoided at request time -- only reduced by loading fewer agents into the schema.

1.3 Dual Loading Mechanism

CODITECT has two parallel agent dispatch mechanisms:

  1. Claude Code native (.claude/agents/): Agent name+description loaded into Task tool schema. Claude dispatches directly via Task(subagent_type="agent-name", prompt="...").

  2. CODITECT proxy pattern (/agent command): Routes through subagent_type="general-purpose" via invoke-agent.py, injecting the full agent system prompt into the prompt parameter.

Both mechanisms coexist. The proxy pattern was designed for flexibility (dynamic prompt injection, agent chaining) but does not reduce the Task tool schema cost -- all 770 agent descriptions are still loaded into the schema regardless of which dispatch mechanism is used.
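The two dispatch shapes can be contrasted as the payloads each mechanism produces. This is illustrative: the field names mirror the Task tool parameters described above, and both builder functions are hypothetical helpers, not CODITECT code.

```python
def build_native_dispatch(agent_name: str, task: str) -> dict:
    """Native path: the agent name must exist in the Task tool's
    subagent_type enum, so its description is already in the schema."""
    return {"subagent_type": agent_name, "prompt": task}

def build_proxy_dispatch(agent_system_prompt: str, task: str) -> dict:
    """Proxy path (invoke-agent.py pattern): always dispatches
    general-purpose and injects the agent's full prompt inline."""
    return {
        "subagent_type": "general-purpose",
        "prompt": f"{agent_system_prompt}\n\n---\n\nTask: {task}",
    }
```

Note that the proxy path still requires "general-purpose" (and every other agent) to be present in the enum, which is why it saves nothing on schema tokens.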

1.4 Growth Trajectory

| Date | Agents | Skills | Total Components |
|---|---|---|---|
| Dec 2025 | ~118 | ~221 | ~1,200 |
| Jan 2026 | ~210 | ~376 | ~2,100 |
| Feb 2026 | 770 | 431 | 3,392 |
| Jun 2026 (projected) | ~1,200 | ~600 | ~5,000+ |

Growth rate: the component count has nearly tripled in two months (~1,200 to 3,392). If growth continues at anything near this pace, agent descriptions alone will consume 75k+ tokens by Q3 2026 -- exceeding the entire system prompt budget.
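At the measured ~33 tokens per description, the schema cost scales linearly with agent count; a back-of-envelope sketch:

```python
TOKENS_PER_DESCRIPTION = 33  # measured average (Section 1.1)

def schema_tokens(agent_count: int) -> int:
    """Estimated Task-schema token cost for a given number of loaded agents."""
    return agent_count * TOKENS_PER_DESCRIPTION

# 770 agents -> 25,410 tokens today; ~2,300 agents would cross 75,000
```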

1.5 Impact

Direct Costs (Per Customer):

  • Every API call pays for ~25.6k input tokens of agent descriptions
  • With Claude Opus at $15/M input tokens (uncached), that is $0.384 per call, or $384 per 1,000 API calls, for agent descriptions alone
  • At ~200 API calls/session: ~$77/session wasted on unused agent descriptions
  • At 100 customers x 5 sessions/day: ~$38,400/day, roughly $14M/year before prompt caching discounts
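As a sanity check, the per-call, per-session, and fleet-level figures are straight multiplication (assuming uncached input pricing and the stated call volumes):

```python
DESC_TOKENS = 25_600        # agent description tokens on every API call
OPUS_INPUT_PER_M = 15.00    # $ per million input tokens, uncached

cost_per_call = DESC_TOKENS / 1_000_000 * OPUS_INPUT_PER_M  # $0.384
cost_per_session = cost_per_call * 200                      # ~$76.80 at 200 calls
cost_per_day = cost_per_session * 100 * 5                   # ~$38,400 for 100 customers x 5 sessions
```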

Indirect Costs:

  • Reduced context window for actual work (code, conversation history, RAG retrieval)
  • Increased latency (more tokens to process per request)
  • Tool selection accuracy degrades catastrophically as tool count increases (see Section 2.5)
  • Agents compete for attention in a crowded prompt

Product Quality Impact:

  • Customers experience slower responses
  • Agent selection becomes unreliable past ~100 agents
  • Context window available for user work shrinks from 87% to 63% (projected Q3 2026)
  • Competitive disadvantage vs. tools that use progressive disclosure

1.6 Root Cause

The Claude Code Task tool definition embeds ALL available subagent_type values and their descriptions directly in the function schema. This is loaded into every API call regardless of which agents are actually needed.

Task tool schema (illustrative):

  subagent_type: enum[
    "Explore": "Fast agent for exploring codebases...",
    "Plan": "Software architect agent...",
    "backend-architect": "You are a Backend Architect...",
    ... (770 entries)
  ]

This "load everything upfront" pattern doesn't scale past ~50-100 agents. The .claude/agents/ directory provides no mechanism for selective loading, tiering, or lazy discovery.


2. Literature Review

2.1 Academic Research

ToolLLM (ICLR 2024 Spotlight)

  • Addresses tool selection across 16,464 real-world APIs
  • Introduces Neural API Retriever: recommends APIs semantically rather than listing all
  • DFSDT (Depth-First Search Decision Tree) for multi-step reasoning
  • Key insight: pre-filtering via retrieval is essential at scale
  • Source: https://arxiv.org/abs/2307.16789

AutoTool: Efficient Tool Selection (2025)

  • Execution-driven validation for tool selection accuracy
  • Demonstrates that retrieval-based selection outperforms exhaustive listing
  • Source: https://arxiv.org/html/2511.14650v1

MasRouter: Multi-Agent LLM Routing (ACL 2025)

TokenOps: Compiler-Style Token Optimization (2025)

SUPO: Summarization-Based Context Management (2025)

2.2 Industry Case Studies

Speakeasy: 100x Token Reduction (2025)

Anthropic: Token-Efficient Tool Use (2025)

SynapticLabs: Meta-Tool Pattern (2025)

MCPJam: Progressive Disclosure (2025)

  • Three-level architecture: metadata -> core content -> detailed resources
  • Agents discover context incrementally through exploration
  • Tradeoff: runtime performance vs. context window savings
  • Source: https://www.mcpjam.com/blog/claude-agent-skills

2.3 Semantic Routing Research

vLLM Semantic Router (2025)

Semantic Tool Discovery (2025)

2.4 Competitive Tool Scaling Strategies

GitHub Copilot: Core Tool Reduction

  • Explicitly reduced from 40+ tools to 13 core tools + 4 grouped categories
  • Embedding-guided routing for tool discovery beyond core set
  • Rationale: smaller core set improves selection accuracy dramatically
  • Proven at massive scale (millions of users)

Cursor: RAG-Based Tool Discovery

  • Uses retrieval-augmented generation for tool/context selection
  • Tools loaded dynamically based on semantic relevance to current task
  • No upfront loading of full tool catalog

Windsurf (Codeium): Auto-Context

  • Automatic context assembly based on active file and task intent
  • Tool set adapts per-request rather than per-session

Cline: All-Upfront Loading

  • Loads all tools into system prompt (similar to current CODITECT approach)
  • Works at small scale (<30 tools) but acknowledged as non-scalable

Continue.dev: MCP-Based Extension

  • Uses MCP protocol for tool discovery and dispatch
  • Progressive loading via MCP server negotiation

Key Pattern: GitHub's strategy of reducing to ~13 core tools is the most validated approach at scale. Their research showed that smaller, curated tool sets significantly outperform large catalogs for selection accuracy.

2.5 Tool Selection Accuracy Degradation

Research and benchmarks demonstrate a catastrophic accuracy cliff as tool count increases:

| Tool Count | Selection Accuracy | Source |
|---|---|---|
| 10-20 | 90-95% | Berkeley BFCL v4 |
| 50 | 84-90% | Speakeasy data |
| 100 | 65-75% | ToolLLM benchmarks |
| 200 | 40-60% | AutoTool evaluation |
| 500+ | 15-30% | Extrapolated from trend |
| 740+ | 0-20% | Speakeasy empirical (405k tokens) |

Critical finding: At CODITECT's current 770 agents, the model is operating in the catastrophic failure zone for tool selection. This is not a gradual degradation -- accuracy collapses once the tool count exceeds the model's effective attention span for function schemas.

Anthropic's own data confirms this: when they introduced defer_loading, Opus 4 accuracy jumped from 49% to 74% -- a 25-percentage-point improvement simply from shrinking the visible tool set. This directly validates the progressive disclosure approach.

2.6 Key Findings Summary

| Approach | Token Reduction | Accuracy Impact | Complexity |
|---|---|---|---|
| Meta-tool pattern | 96% | Neutral to positive | Medium |
| Anthropic defer_loading | 85% | +25 points (Opus 4) | Low |
| Progressive disclosure | 85-95% | Positive | Medium |
| Semantic routing | 90-99% | +5-10% | High |
| Description compression | 40-46% | Neutral | Low |
| Core tool reduction (GitHub) | 87-95% | Significant positive | Low |

Consensus: The industry has converged on progressive disclosure + semantic routing as the standard for systems with 100+ tools. Upfront loading of all descriptions is universally recognized as an anti-pattern at scale. GitHub's proven approach of 13 core tools provides the strongest real-world validation.


3. CODITECT-Specific Analysis

3.1 Product Architecture Implications

CODITECT is a product delivered to customers, not merely an internal development tool. This fundamentally changes the analysis:

  • Every customer installation bears the 25.6k token overhead
  • Customer agent customization will push component counts higher still
  • Multi-tenant scaling multiplies the cost: N customers x 25.6k tokens on every call
  • Perceived product quality depends on agent selection accuracy
  • Competitive positioning requires UX comparable to or better than GitHub Copilot, Cursor, etc.

The solution must be:

  1. Transparent to customers -- no configuration burden
  2. Extensible -- customers can add agents without degrading the system
  3. Tunable -- enterprise customers may want different core agent sets
  4. Backward compatible -- existing /agent workflows must continue working

3.2 Existing Infrastructure

CODITECT already has key building blocks:

| Component | Location | Relevance |
|---|---|---|
| MoE Classifier | scripts/moe_classifier/classify.py | 13 Type Experts, 5 judges, semantic classification |
| Track Registry | scripts/moe_classifier/track_registry.py | 38 tracks with keyword/pattern matching |
| Intelligent Track Mapper | scripts/moe_classifier/intelligent_track_mapper.py | Semantic track assignment |
| Agent Registry | lib/orchestration/agent_registry.py | LLM-agnostic agent discovery by capability |
| Dynamic Capability Router | skills/dynamic-capability-router/SKILL.md | Intent-based agent routing pattern |
| Agent Validator | scripts/validate-agent-structure.py | Frontmatter validation with required fields |
| Component Indexer | scripts/component-indexer.py | Searchable component index |
| Framework Registry | config/framework-registry.json | Auto-generated component registry |

3.3 Control Surface: .claude/agents/ Directory

The critical control surface is the .claude/agents/ directory (symlinked from coditect-core/agents/). Claude Code reads this directory at session start and populates the Task tool schema.

Available interventions:

  1. Reduce agent files in .claude/agents/ -- move non-core agents to a different directory
  2. Shorten descriptions -- compress frontmatter description fields
  3. Tiered directory structure -- only core agents in .claude/agents/, rest discoverable via MCP
  4. Agent manifest file -- if supported, declare which agents to load

Constraint: Claude Code's .claude/agents/ loader has no configuration for selective loading. All .md files in the directory are loaded. The only way to reduce loaded agents is to physically reduce the number of files in that directory.

3.4 Agent Type Distribution

| Agent Type | Count | % | Token Load Pattern |
|---|---|---|---|
| enterprise-specialist | ~500 | 65% | Rarely invoked per-session |
| specialist | ~80 | 10% | Domain-specific, moderate use |
| orchestrator | ~20 | 3% | Frequently invoked |
| reviewer | ~6 | <1% | Session-dependent |
| generator | ~5 | <1% | Task-specific |
| Core session agents | ~25 | 3% | Used in most sessions |

Key insight: Only ~25 agents (3%) are used in a typical session, yet all 770 (100%) are loaded every time.

3.5 Track-Based Clustering

Agents naturally cluster into the 38-track taxonomy:

| Track Cluster | Agent Count | Typical Session Relevance |
|---|---|---|
| A: Backend API | ~15 | High (development) |
| B: Frontend UI | ~10 | Medium |
| C: DevOps Infra | ~25 | Medium |
| D-E: Security/Testing | ~20 | Medium |
| F: Documentation | ~15 | Low |
| G: DMS Product | ~20 | Low (deferred) |
| H: Framework | ~15 | High (meta-operations) |
| I: UI Components | ~15 | Medium |
| O-AA: PCF Business | ~100+ | Low (domain-specific) |
| Finance/HR/Sales | ~200+ | Very low |
| Remaining | ~200+ | Very low |

4. Solution Architecture

4.1 Three-Tier Design

A three-tier architecture that leverages existing CODITECT infrastructure:

Tier 0: Core Agents (Always Loaded via .claude/agents/)
~25 agents, ~800 tokens
Explore, Plan, Bash, general-purpose, senior-architect,
testing-specialist, code-reviewer, debugger, etc.

Tier 1: Track Index (Loaded on Demand via MCP)
38 track summaries, ~600 tokens when loaded
"Backend API: 15 agents for REST, GraphQL, database design"
Loaded when user invokes /which or semantic routing triggers

Tier 2: Agent Details (Loaded on Selection via MCP)
Individual agent full spec, ~50-200 tokens each
Loaded only when specific agent is dispatched

Token Budget:

  • Always loaded: ~800 tokens (Tier 0)
  • Worst case per-query: +600 (Tier 1) + 200 (Tier 2) = ~1,600 tokens
  • Total: ~1,600 tokens vs. current 25,600 tokens = 94% reduction
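The budget arithmetic, with the worst case assuming one track-index load plus one large agent spec per query:

```python
TIER0_CORE = 800         # ~25 core agents, always in the Task schema
TIER1_TRACK_INDEX = 600  # 38 track summaries, loaded on demand
TIER2_AGENT_SPEC = 200   # one full agent spec, upper bound
CURRENT_COST = 25_600    # today's always-loaded description cost

worst_case = TIER0_CORE + TIER1_TRACK_INDEX + TIER2_AGENT_SPEC  # 1,600 tokens
reduction = 1 - worst_case / CURRENT_COST                       # 0.9375 -> ~94%
```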

4.2 Concrete Implementation

Phase 1: Directory Restructuring (Week 1)

The .claude/agents/ directory must be physically reorganized:

coditect-core/
├── agents/ # Tier 0: ~25 core agents (loaded by Claude Code)
│ ├── senior-architect.md
│ ├── testing-specialist.md
│ ├── code-reviewer.md
│ └── ... (~25 files)
├── agents-extended/ # Tier 1-2: ~745 agents (NOT loaded by Claude Code)
│ ├── enterprise/ # PCF business agents
│ ├── specialist/ # Domain specialists
│ └── ... (organized by track)

Because .claude -> coditect-core, only files in coditect-core/agents/ are loaded into the Task schema. Files in agents-extended/ are invisible to Claude Code's auto-loader but accessible via MCP discovery tools.

Phase 1 Tasks:

  1. Create agents-extended/ directory structure
  2. Identify ~25 core agents (based on usage frequency, session universality)
  3. Move ~745 non-core agents from agents/ to agents-extended/
  4. Create config/core-agents.yaml manifest
  5. Update scripts/validate-agent-structure.py to scan both directories
  6. Update scripts/component-indexer.py to index both directories

Phase 2: MCP Discovery Tools (Weeks 2-3)

Add two MCP tools to the coditect-call-graph server (or new coditect-agent-router server):

# Tool 1: discover_agents
def discover_agents(query: str, track: str = None, limit: int = 5) -> list:
    """Search agent registry by keyword, semantic similarity, or track.

    Returns: [{name, description, track, capabilities}]
    """

# Tool 2: get_agent_spec
def get_agent_spec(agent_name: str) -> dict:
    """Load full agent specification for dispatch.

    Returns: {name, description, tools, model, domain, full_prompt}
    """

These tools use agent_registry.py and track_registry.py to find agents in agents-extended/, then return the spec for Claude to dispatch via the general-purpose proxy pattern.
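A minimal keyword-overlap version of discover_agents, for illustration only: the production tool would delegate to agent_registry.py and track_registry.py and could swap word overlap for embeddings, and the registry is passed in explicitly here rather than loaded from disk.

```python
from typing import Optional

def discover_agents(query: str, registry: list,
                    track: Optional[str] = None, limit: int = 5) -> list:
    """Rank registry entries by word overlap between the query and each
    agent's name + description, optionally filtered by track."""
    words = set(query.lower().split())

    def score(agent: dict) -> int:
        text = f"{agent['name']} {agent['description']}".lower()
        return sum(1 for w in words if w in text)

    candidates = [a for a in registry if track is None or a.get("track") == track]
    ranked = sorted(candidates, key=score, reverse=True)
    return [a for a in ranked if score(a) > 0][:limit]
```

Even this naive scorer only ever returns `limit` entries, so the tokens Claude sees stay bounded no matter how large agents-extended/ grows.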

Phase 3: Anthropic API Integration (When Stable)

Adopt Anthropic's Tool Search Tool with defer_loading: true once the API is stable. It natively supports 10,000+ tools and is the mechanism behind the measured 49% to 74% Opus 4 accuracy improvement.

4.3 Alternatives Considered

| Alternative | Pros | Cons | Decision |
|---|---|---|---|
| Prune agents to <100 | Simple, immediate | Loses capability, doesn't scale | Rejected |
| Compress descriptions only | Easy, no architecture change | Only 40-46% reduction, still doesn't scale | Supplement only |
| Full MCP agent server | Maximum flexibility | Complex, latency overhead, new infrastructure | Deferred |
| Progressive disclosure | 94% reduction, leverages existing infra | Requires directory restructure, MCP tools | Selected |
| Do nothing | Zero effort | 75k+ tokens by Q3 2026; catastrophic accuracy | Rejected |

5. Risk Assessment

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Agent selection accuracy drops | Low | High | Core agents always loaded; MCP discovery maintains access |
| Extra round-trip latency | Medium | Low | <100ms for local index lookup; parallel with user thinking |
| Stale index after agent additions | Low | Medium | Hook on agent file creation auto-updates index |
| Breaking change for existing workflows | Low | High | /agent command continues via proxy pattern; core agents unchanged |
| Customer confusion | Low | Medium | Transparent -- discovery "just works"; no customer configuration needed |
| Agent accuracy improves so much it changes UX | Medium | Positive | Monitor and adjust core agent list based on data |

6. Success Metrics

| Metric | Current | Target | Method |
|---|---|---|---|
| Agent description tokens in system prompt | 25,600 | <2,000 | Token counter hook |
| Agent selection accuracy | ~20-40% (estimated, 770 agents) | >85% | Benchmark suite |
| Time to first agent dispatch | ~0ms (preloaded) | <200ms | Latency measurement |
| Context window available for work | 87% | 99% | System prompt audit |
| Customer-reported agent misfires | Unknown | <5% of dispatches | Telemetry |

7. Open Research Gaps

7.1 Usage Analytics (Gap #2 -- Open)

No data on which agents are actually dispatched per session. Need telemetry to validate the ~25 core agent hypothesis.

Proposed: Add a PostToolUse hook that logs subagent_type to sessions.db for each Task tool invocation. Analyze 30 days of data to determine actual core agent set.

7.2 Customer Perspective (Gap #4 -- Open)

Multi-tenant implications not yet analyzed:

  • How do customer-specific agents interact with the core set?
  • Should the core agent list be configurable per customer/tier?
  • What's the impact on plugin/marketplace agent loading?

7.3 Impact Measurement (Gap #6 -- Open)

No baseline measurements for:

  • Actual agent selection accuracy at 770 agents
  • Latency impact of current token overhead
  • Dollar cost attribution per customer session

Proposed: Run a benchmark suite with 50 standardized agent selection tasks, measure accuracy at current 770 agents vs. 25 core agents.
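The benchmark itself reduces to scoring a selector against labeled tasks (a sketch; select_agent stands in for whichever dispatch path, 770-agent schema or 25-agent core set, is under test):

```python
from typing import Callable

def selection_accuracy(select_agent: Callable,
                       tasks: list) -> float:
    """Fraction of (prompt, expected_agent) pairs the selector resolves
    to the expected agent."""
    correct = sum(1 for prompt, expected in tasks
                  if select_agent(prompt) == expected)
    return correct / len(tasks)
```

Running the same 50 tasks against both configurations gives the before/after accuracy numbers for Section 6.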


8. References

Academic Papers

  1. Qin et al. "ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs" ICLR 2024. https://arxiv.org/abs/2307.16789
  2. Li et al. "MasRouter: Learning to Route LLMs for Multi-Agent System" ACL 2025. https://aclanthology.org/2025.acl-long.757.pdf
  3. "AutoTool: Efficient Tool Selection for Large Language Model Agents" 2025. https://arxiv.org/html/2511.14650v1
  4. "A Compiler-Style Architecture for Token Optimization in LLMs" 2025. https://www.chitrangana.com/wp-content/uploads/2025/04/Research-Paper-TokenOps.pdf
  5. "Scaling LLM Multi-turn RL with End-to-end Summarization" 2025. https://miaolu3.github.io/ArXiv_SUPO.pdf

Industry Sources

  1. Speakeasy. "100x Token Reduction with Dynamic Toolsets" 2025. https://www.speakeasy.com/blog/100x-token-reduction-dynamic-toolsets
  2. Anthropic. "Token-efficient tool use" 2025. https://docs.claude.com/en/docs/agents-and-tools/tool-use/token-efficient-tool-use
  3. SynapticLabs. "The Meta-Tool Pattern" 2025. https://blog.synapticlabs.ai/bounded-context-packs-meta-tool-pattern
  4. MCPJam. "Progressive Disclosure for Claude Agent Skills" 2025. https://www.mcpjam.com/blog/claude-agent-skills
  5. Anthropic. "Building Effective Agents" 2025. https://www.anthropic.com/research/building-effective-agents
  6. ZenML. "LLM Agents in Production" 2025. https://www.zenml.io/blog/llm-agents-in-production-architectures-challenges-and-best-practices
  7. Berkeley. "Function Calling Leaderboard V4" 2025. https://gorilla.cs.berkeley.edu/leaderboard.html

Competitive Intelligence

  1. GitHub Copilot: 13 core tools + 4 grouped categories with embedding-guided routing
  2. Cursor: RAG-based tool discovery, dynamic loading
  3. Windsurf (Codeium): Auto-context assembly, per-request tool adaptation

CODITECT Internal

  1. ADR-003: CODITECT Agent System
  2. ADR-026: Intent Classification System
  3. Dynamic Capability Router skill: skills/dynamic-capability-router/SKILL.md
  4. Agent Registry: lib/orchestration/agent_registry.py
  5. MoE Classifier: scripts/moe_classifier/classify.py

Author: Claude (Opus 4.6)
Date: 2026-02-07 (Updated)
Classification: Internal Reference