Component Token Scaling Analysis
Executive Summary
CODITECT is a product framework that provides AI-assisted development environments to customers. Each customer installation includes 3,392 components (766 agents, 431 skills, 364 commands, 108 hooks, 563 scripts, 1,152 workflows). When Claude Code loads agent descriptions into the Task tool's system prompt, the cumulative token cost is ~25.6k tokens -- exceeding the recommended 15k token budget by 70%.
This is a product architecture scaling problem, not merely an internal tooling concern. Every CODITECT customer bears these costs in every session. As the component inventory grows, this problem compounds across the entire customer base.
This document defines the problem space, surveys academic and industry solutions, presents measured data on the exact loading mechanism, and recommends a Progressive Component Disclosure architecture (ADR-162) that can achieve 85-96% token reduction while maintaining full component access.
1. Problem Definition
1.1 Current State (Measured)
| Metric | Value | Source |
|---|---|---|
| Total components | 3,392 | config/component-counts.json |
| Agent definitions (.claude/agents/*.md) | 770 files | Filesystem count |
| Agent file total size | 4.88 MB (~452K tokens) | Measured |
| Agent descriptions in Task tool schema | ~25.6k tokens | Estimated from ~33 tokens/description x 770 |
| Skill files | 432 files (5.75 MB, ~510K tokens) | Measured |
| Command files | 365 files (2.87 MB, ~260K tokens) | Measured |
| CLAUDE.md system prompt | 954 tokens | Measured |
| Tool definitions (built-in + MCP) | ~15-35K tokens | Estimated |
| Recommended agent description budget | 15k tokens | Anthropic guidance |
| Overage | +70% (10.6k tokens) | Derived |
| Context consumed before user input | ~13% of 200k window | Derived |
1.2 Loading Mechanism (Discovered)
Claude Code loads custom agents via the .claude/agents/ directory. In CODITECT, .claude is a symlink:
.claude -> .coditect -> submodules/core/coditect-core/
Therefore .claude/agents/ resolves to coditect-core/agents/ -- all 770 agent markdown files are loaded by Claude Code at session start. Claude Code extracts name + description from each file's frontmatter and embeds them as subagent_type enum values in the Task tool's function schema. The full markdown body is loaded only when an agent is dispatched.
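The per-agent schema cost can be reproduced with a small script that extracts frontmatter the same way Claude Code does. A hedged sketch (the parser is simplified to bare `name:`/`description:` key lines, and the ~4-characters-per-token heuristic is an approximation, not the real tokenizer):

```python
import re

def parse_agent_frontmatter(markdown: str) -> dict:
    """Extract name and description from a '---'-delimited frontmatter
    block, mirroring what Claude Code embeds in the Task tool schema."""
    match = re.match(r"^---\s*\n(.*?)\n---", markdown, re.DOTALL)
    fields = {}
    if match:
        for line in match.group(1).splitlines():
            key, _, value = line.partition(":")
            if key.strip() in ("name", "description"):
                fields[key.strip()] = value.strip()
    return fields

def estimate_schema_tokens(descriptions: list[str]) -> int:
    """Rough schema cost: ~4 characters per token per description."""
    return sum(len(d) // 4 + 1 for d in descriptions)
```

Running the estimator over all 770 files is how the ~25.6k figure above was cross-checked (~33 tokens per description on average).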
Token breakdown per API call:
| Component | Tokens | When Loaded |
|---|---|---|
| CLAUDE.md (system prompt) | ~954 | Every call |
| Agent descriptions (Task schema) | ~25,600 | Every call |
| Tool definitions (built-in + MCP) | ~15,000-35,000 | Every call |
| Agent full body | ~589 avg | Only on dispatch |
| Total system overhead | ~41,500-61,500 | Every call |
Key insight: The ~25.6K token agent description cost is embedded in the Task tool's JSON schema definition and is unavoidable -- it's paid on every API call regardless of whether the Task tool is invoked.
1.3 Dual Loading Mechanism
CODITECT has two parallel agent dispatch mechanisms:
- Claude Code native (`.claude/agents/`): Agent name + description loaded into the Task tool schema. Claude dispatches directly via `Task(subagent_type="agent-name", prompt="...")`.
- CODITECT proxy pattern (`/agent` command): Routes through `subagent_type="general-purpose"` via `invoke-agent.py`, injecting the full agent system prompt into the prompt parameter.
Both mechanisms coexist. The proxy pattern was designed for flexibility (dynamic prompt injection, agent chaining) but does not reduce the Task tool schema cost -- all 770 agent descriptions are still loaded into the schema regardless of which dispatch mechanism is used.
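In sketch form, the proxy pattern amounts to wrapping the target agent's system prompt inside a general-purpose dispatch. The field names below are illustrative, based on the invoke-agent.py description above, not a verbatim copy of its implementation:

```python
def build_proxy_dispatch(agent_spec: dict, user_request: str) -> dict:
    """Compose a Task tool call via the general-purpose proxy: the target
    agent's full system prompt is injected into the prompt parameter, so
    no dedicated subagent_type schema entry is required for dispatch."""
    return {
        "subagent_type": "general-purpose",
        "prompt": f"{agent_spec['full_prompt']}\n\n---\n\nUser request: {user_request}",
    }
```

Note that this only changes how an agent is dispatched; the 770 descriptions remain in the schema either way.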
1.4 Growth Trajectory
| Date | Agents | Skills | Total Components |
|---|---|---|---|
| Dec 2025 | ~118 | ~221 | ~1,200 |
| Jan 2026 | ~210 | ~376 | ~2,100 |
| Feb 2026 | 770 | 431 | 3,392 |
| Projected Jun 2026 | ~1,200 | ~600 | ~5,000+ |
Growth to date is roughly 3x per quarter (the Jun 2026 row assumes it moderates). Even at the slower projected rate, agent descriptions alone (~33 tokens each) will pass 40k tokens by mid-2026, and 75k+ by Q3 2026 if current growth continues -- exceeding the entire recommended system prompt budget.
1.5 Impact
Direct Costs (Per Customer):
- Every API call pays for ~25.6k tokens of agent descriptions (input cost)
- With Claude Opus at $15/M input tokens: ~$0.384 per API call just for agent descriptions (list price, uncached)
- At ~200 API calls/session: ~$77/session spent on mostly-unused agent descriptions
- At 100 customers x 5 sessions/day: ~$38,400/day = ~$14M/year at list price (prompt caching reduces the dollar cost substantially, but the tokens are still loaded and still crowd the context window)
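These figures follow from simple arithmetic on the measured overhead. A sketch of the cost model (list-price input tokens, no prompt caching): 25,600 tokens at $15/M is $0.384 per call, which compounds quickly across sessions and customers.

```python
def description_cost_per_day(desc_tokens: int, price_per_mtok: float,
                             calls_per_session: int,
                             sessions_per_day: int) -> float:
    """Daily input-token spend attributable to agent descriptions alone."""
    per_call = desc_tokens * price_per_mtok / 1_000_000  # $0.384 at 25.6k tokens
    return per_call * calls_per_session * sessions_per_day

# 100 customers x 5 sessions/day = 500 sessions/day, ~200 calls each:
# description_cost_per_day(25_600, 15.0, 200, 500) -> 38400.0
```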
Indirect Costs:
- Reduced context window for actual work (code, conversation history, RAG retrieval)
- Increased latency (more tokens to process per request)
- Tool selection accuracy degrades catastrophically as tool count increases (see Section 2.5)
- Agents compete for attention in a crowded prompt
Product Quality Impact:
- Customers experience slower responses
- Agent selection becomes unreliable past ~100 agents
- Context window available for user work shrinks from 87% to 63% (projected Q3 2026)
- Competitive disadvantage vs. tools that use progressive disclosure
1.6 Root Cause
The Claude Code Task tool definition embeds ALL available subagent_type values and their descriptions directly in the function schema. This is loaded into every API call regardless of which agents are actually needed.
Task tool schema:
subagent_type: enum[
"Explore": "Fast agent for exploring codebases...",
"Plan": "Software architect agent...",
"backend-architect": "You are a Backend Architect...",
... (770 entries)
]
This "load everything upfront" pattern doesn't scale past ~50-100 agents. The .claude/agents/ directory provides no mechanism for selective loading, tiering, or lazy discovery.
2. Literature Review
2.1 Academic Research
ToolLLM (ICLR 2024 Spotlight)
- Addresses tool selection across 16,464 real-world APIs
- Introduces Neural API Retriever: recommends APIs semantically rather than listing all
- DFSDT (Depth-First Search Decision Tree) for multi-step reasoning
- Key insight: pre-filtering via retrieval is essential at scale
- Source: https://arxiv.org/abs/2307.16789
AutoTool: Efficient Tool Selection (2025)
- Execution-driven validation for tool selection accuracy
- Demonstrates that retrieval-based selection outperforms exhaustive listing
- Source: https://arxiv.org/html/2511.14650v1
MasRouter: Multi-Agent LLM Routing (ACL 2025)
- Three-layer decision architecture: collaboration mode, role allocation, LLM routing
- Cascaded controller network for real-time agent selection
- Source: https://aclanthology.org/2025.acl-long.757.pdf
TokenOps: Compiler-Style Token Optimization (2025)
- Achieves 40-46% token reduction without task fidelity loss
- 29-36% latency improvements
- Source: https://www.chitrangana.com/wp-content/uploads/2025/04/Research-Paper-TokenOps.pdf
SUPO: Summarization-Based Context Management (2025)
- LLM-generated summaries preserve task-relevant information
- Periodically compress tool-use history to reset working context
- Source: https://miaolu3.github.io/ArXiv_SUPO.pdf
2.2 Industry Case Studies
Speakeasy: 160x Token Reduction (2025)
- Problem: 400 MCP tools consumed 405k tokens (exceeding 200k context)
- Solution: Hierarchical meta-tool pattern with 3 meta-tools
- Result: 96% input token reduction, 90% total reduction, 100% success rate
- Source: https://www.speakeasy.com/blog/100x-token-reduction-dynamic-toolsets
Anthropic: Token-Efficient Tool Use (2025)
- Official beta: `token-efficient-tools-2025-02-19`
- `defer_loading: true` parameter for lazy tool schema loading
- Tool Search Tool supports 10,000+ tools with deferred loading
- Result: 85% token reduction; accuracy improvement from 49% to 74% on Opus 4 (+25 percentage points)
- Source: https://docs.claude.com/en/docs/agents-and-tools/tool-use/token-efficient-tool-use
SynapticLabs: Meta-Tool Pattern (2025)
- Two registered tools provide access to many capabilities
- Discovery tool + execution tool replaces all schemas at startup
- Token overhead: only ~500 tokens for the two meta-tools, yielding an 85-95% reduction versus registering all schemas
- Source: https://blog.synapticlabs.ai/bounded-context-packs-meta-tool-pattern
MCPJam: Progressive Disclosure (2025)
- Three-level architecture: metadata -> core content -> detailed resources
- Agents discover context incrementally through exploration
- Tradeoff: runtime performance vs. context window savings
- Source: https://www.mcpjam.com/blog/claude-agent-skills
2.3 Semantic Routing Research
vLLM Semantic Router (2025)
- Intent-aware routing using semantic vector space
- Superfast decision layer before LLM invocation
- Source: https://blog.vllm.ai/2025/09/11/semantic-router.html
Semantic Tool Discovery (2025)
- Decompose tool schemas into granular components with separate embeddings
- Each node (Tool, Parameter, ReturnType) has own vector
- Enables nuanced matching across entire schema
- Source: https://www.rconnect.tech/blog/semantic-tool-discovery
2.4 Competitive Tool Scaling Strategies
GitHub Copilot: Core Tool Reduction
- Explicitly reduced from 40+ tools to 13 core tools + 4 grouped categories
- Embedding-guided routing for tool discovery beyond core set
- Rationale: smaller core set improves selection accuracy dramatically
- Proven at massive scale (millions of users)
Cursor: RAG-Based Tool Discovery
- Uses retrieval-augmented generation for tool/context selection
- Tools loaded dynamically based on semantic relevance to current task
- No upfront loading of full tool catalog
Windsurf (Codeium): Auto-Context
- Automatic context assembly based on active file and task intent
- Tool set adapts per-request rather than per-session
Cline: All-Upfront Loading
- Loads all tools into system prompt (similar to current CODITECT approach)
- Works at small scale (<30 tools) but acknowledged as non-scalable
Continue.dev: MCP-Based Extension
- Uses MCP protocol for tool discovery and dispatch
- Progressive loading via MCP server negotiation
Key Pattern: GitHub's strategy of reducing to ~13 core tools is the most validated approach at scale. Their research showed that smaller, curated tool sets significantly outperform large catalogs for selection accuracy.
2.5 Tool Selection Accuracy Degradation
Research and benchmarks demonstrate a catastrophic accuracy cliff as tool count increases:
| Tool Count | Selection Accuracy | Source |
|---|---|---|
| 10-20 | 90-95% | Berkeley BFCL v4 |
| 50 | 84-90% | Speakeasy data |
| 100 | 65-75% | ToolLLM benchmarks |
| 200 | 40-60% | AutoTool evaluation |
| 500+ | 15-30% | Extrapolated from trend |
| 740+ | 0-20% | Speakeasy empirical (405k tokens) |
Critical finding: At CODITECT's current 770 agents, the model is operating in the catastrophic failure zone for tool selection. This is not a gradual degradation -- accuracy collapses once the tool count exceeds the model's effective attention span for function schemas.
Anthropic's own data confirms this: when they introduced defer_loading, Opus 4 accuracy jumped from 49% to 74% -- a 25-percentage-point improvement simply from reducing the visible tool set. This directly validates the progressive disclosure approach.
2.6 Key Findings Summary
| Approach | Token Reduction | Accuracy Impact | Complexity |
|---|---|---|---|
| Meta-tool pattern | 96% | Neutral to positive | Medium |
| Anthropic defer_loading | 85% | +25% (Opus 4) | Low |
| Progressive disclosure | 85-95% | Positive | Medium |
| Semantic routing | 90-99% | +5-10% | High |
| Description compression | 40-46% | Neutral | Low |
| Core tool reduction (GitHub) | 87-95% | Significant positive | Low |
Consensus: The industry has converged on progressive disclosure + semantic routing as the standard for systems with 100+ tools. Upfront loading of all descriptions is universally recognized as an anti-pattern at scale. GitHub's proven approach of 13 core tools provides the strongest real-world validation.
3. CODITECT-Specific Analysis
3.1 Product Architecture Implications
CODITECT is a product delivered to customers, not merely an internal development tool. This fundamentally changes the analysis:
- Every customer installation bears the 25.6k token overhead
- Customer agent customization will increase component counts further
- Multi-tenant scaling means N customers x 25.6k tokens = multiplicative cost
- Product quality perception depends on agent selection accuracy
- Competitive positioning requires comparable or better UX than GitHub Copilot, Cursor, etc.
The solution must be:
- Transparent to customers -- no configuration burden
- Extensible -- customers can add agents without degrading the system
- Tunable -- enterprise customers may want different core agent sets
- Backward compatible -- existing `/agent` workflows must continue working
3.2 Existing Infrastructure
CODITECT already has key building blocks:
| Component | Location | Relevance |
|---|---|---|
| MoE Classifier | scripts/moe_classifier/classify.py | 13 Type Experts, 5 judges, semantic classification |
| Track Registry | scripts/moe_classifier/track_registry.py | 38 tracks with keyword/pattern matching |
| Intelligent Track Mapper | scripts/moe_classifier/intelligent_track_mapper.py | Semantic track assignment |
| Agent Registry | lib/orchestration/agent_registry.py | LLM-agnostic agent discovery by capability |
| Dynamic Capability Router | skills/dynamic-capability-router/SKILL.md | Intent-based agent routing pattern |
| Agent Validator | scripts/validate-agent-structure.py | Frontmatter validation with required fields |
| Component Indexer | scripts/component-indexer.py | Searchable component index |
| Framework Registry | config/framework-registry.json | Auto-generated component registry |
3.3 Control Surface: .claude/agents/ Directory
The critical control surface is the .claude/agents/ directory (symlinked from coditect-core/agents/). Claude Code reads this directory at session start and populates the Task tool schema.
Available interventions:
- Reduce agent files in `.claude/agents/` -- move non-core agents to a different directory
- Shorten descriptions -- compress frontmatter `description` fields
- Tiered directory structure -- only core agents in `.claude/agents/`, rest discoverable via MCP
- Agent manifest file -- if supported, declare which agents to load
Constraint: Claude Code's .claude/agents/ loader has no configuration for selective loading. All .md files in the directory are loaded. The only way to reduce loaded agents is to physically reduce the number of files in that directory.
3.4 Agent Type Distribution
| Agent Type | Count | % | Token Load Pattern |
|---|---|---|---|
| enterprise-specialist | ~500 | 65% | Rarely invoked per-session |
| specialist | ~80 | 10% | Domain-specific, moderate use |
| orchestrator | ~20 | 3% | Frequently invoked |
| reviewer | ~6 | <1% | Session-dependent |
| generator | ~5 | <1% | Task-specific |
| Core session agents | ~25 | 3% | Used in most sessions |
Key insight: Only ~25 agents (3%) are used in a typical session, yet all 770 (100%) are loaded every time.
3.5 Track-Based Clustering
Agents naturally cluster into the 38-track taxonomy:
| Track Cluster | Agent Count | Typical Session Relevance |
|---|---|---|
| A: Backend API | ~15 | High (development) |
| B: Frontend UI | ~10 | Medium |
| C: DevOps Infra | ~25 | Medium |
| D-E: Security/Testing | ~20 | Medium |
| F: Documentation | ~15 | Low |
| G: DMS Product | ~20 | Low (deferred) |
| H: Framework | ~15 | High (meta-operations) |
| I: UI Components | ~15 | Medium |
| O-AA: PCF Business | ~100+ | Low (domain-specific) |
| Finance/HR/Sales | ~200+ | Very low |
| Remaining | ~200+ | Very low |
4. Solution Architecture
4.1 Recommended: Progressive Component Disclosure (ADR-162)
A three-tier architecture that leverages existing CODITECT infrastructure:
Tier 0: Core Agents (Always Loaded via .claude/agents/)
~25 agents, ~800 tokens
Explore, Plan, Bash, general-purpose, senior-architect,
testing-specialist, code-reviewer, debugger, etc.
Tier 1: Track Index (Loaded on Demand via MCP)
38 track summaries, ~600 tokens when loaded
"Backend API: 15 agents for REST, GraphQL, database design"
Loaded when user invokes /which or semantic routing triggers
Tier 2: Agent Details (Loaded on Selection via MCP)
Individual agent full spec, ~50-200 tokens each
Loaded only when specific agent is dispatched
Token Budget:
- Always loaded: ~800 tokens (Tier 0)
- Worst case per-query: +600 (Tier 1) + 200 (Tier 2) = ~1,600 tokens
- Total: ~1,600 tokens vs. current 25,600 tokens = 94% reduction
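The budget math, as a sketch (tier sizes are the estimates above, not measured values):

```python
def tiered_budget(tier0: int = 800, tier1: int = 600, tier2: int = 200,
                  baseline: int = 25_600) -> tuple[int, float]:
    """Worst-case per-query token load under progressive disclosure, and
    the reduction relative to loading all descriptions upfront."""
    worst_case = tier0 + tier1 + tier2
    return worst_case, 1 - worst_case / baseline

# tiered_budget() -> (1600, 0.9375), i.e. ~94% reduction
```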
4.2 Concrete Implementation
Phase 1: Directory Restructuring (Week 1)
The .claude/agents/ directory must be physically reorganized:
coditect-core/
├── agents/ # Tier 0: ~25 core agents (loaded by Claude Code)
│ ├── senior-architect.md
│ ├── testing-specialist.md
│ ├── code-reviewer.md
│ └── ... (~25 files)
├── agents-extended/ # Tier 1-2: ~745 agents (NOT loaded by Claude Code)
│ ├── enterprise/ # PCF business agents
│ ├── specialist/ # Domain specialists
│ └── ... (organized by track)
Because .claude -> coditect-core, only files in coditect-core/agents/ are loaded into the Task schema. Files in agents-extended/ are invisible to Claude Code's auto-loader but accessible via MCP discovery tools.
Phase 1 Tasks:
- Create `agents-extended/` directory structure
- Identify ~25 core agents (based on usage frequency, session universality)
- Move ~745 non-core agents from `agents/` to `agents-extended/`
- Create `config/core-agents.yaml` manifest
- Update `scripts/validate-agent-structure.py` to scan both directories
- Update `scripts/component-indexer.py` to index both directories
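The move step can be scripted against the core-agents manifest. A hedged sketch, assuming the manifest reduces to a set of core agent names:

```python
from pathlib import Path
import shutil

def move_non_core_agents(agents_dir: str, extended_dir: str,
                         core_names: set[str]) -> list[str]:
    """Relocate every agent .md whose stem is not in the core manifest,
    so Claude Code's auto-loader only sees the core set."""
    extended = Path(extended_dir)
    extended.mkdir(parents=True, exist_ok=True)
    moved = []
    for md in sorted(Path(agents_dir).glob("*.md")):
        if md.stem not in core_names:
            shutil.move(str(md), str(extended / md.name))
            moved.append(md.name)
    return moved
```

A dry-run mode and a post-move index rebuild would be sensible additions before running this against 770 real files.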
Phase 2: MCP Discovery Tools (Weeks 2-3)
Add two MCP tools to the coditect-call-graph server (or new coditect-agent-router server):
# Tool 1: discover_agents
def discover_agents(query: str, track: str | None = None, limit: int = 5) -> list[dict]:
    """Search agent registry by keyword, semantic similarity, or track.
    Returns: [{name, description, track, capabilities}]
    """

# Tool 2: get_agent_spec
def get_agent_spec(agent_name: str) -> dict:
    """Load full agent specification for dispatch.
    Returns: {name, description, tools, model, domain, full_prompt}
    """
These tools use agent_registry.py and track_registry.py to find agents in agents-extended/, then return the spec for Claude to dispatch via the general-purpose proxy pattern.
Phase 3: Anthropic API Integration (When Stable)
- Adopt Anthropic's Tool Search Tool with `defer_loading: true`
- Supports 10,000+ tools natively
- Accuracy improvement from 49% to 74% on Opus 4
4.3 Alternatives Considered
| Alternative | Pros | Cons | Decision |
|---|---|---|---|
| Prune agents to <100 | Simple, immediate | Loses capability, doesn't scale | Rejected |
| Compress descriptions only | Easy, no architecture change | Only 40-46% reduction, still doesn't scale | Supplement only |
| Full MCP agent server | Maximum flexibility | Complex, latency overhead, new infrastructure | Deferred |
| Progressive disclosure | 94% reduction, leverages existing infra | Requires directory restructure, MCP tools | Selected |
| Do nothing | Zero effort | 75k+ tokens by Q3 2026; catastrophic accuracy | Rejected |
5. Risk Assessment
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Agent selection accuracy drops | Low | High | Core agents always loaded; MCP discovery maintains access |
| Extra round-trip latency | Medium | Low | <100ms for local index lookup; parallel with user thinking |
| Stale index after agent additions | Low | Medium | Hook on agent file creation auto-updates index |
| Breaking change for existing workflows | Low | High | /agent command continues via proxy pattern; core agents unchanged |
| Customer confusion | Low | Medium | Transparent -- discovery "just works"; no customer configuration needed |
| Agent accuracy improves so much it changes UX | Medium | Positive | Monitor and adjust core agent list based on data |
6. Success Metrics
| Metric | Current | Target | Method |
|---|---|---|---|
| Agent description tokens in system prompt | 25,600 | <2,000 | Token counter hook |
| Agent selection accuracy | ~20-40% (estimated, 770 agents) | >85% | Benchmark suite |
| Time to first agent dispatch | ~0ms (preloaded) | <200ms | Latency measurement |
| Context window available for work | 87% | 99% | System prompt audit |
| Customer-reported agent misfires | Unknown | <5% of dispatches | Telemetry |
7. Open Research Gaps
7.1 Usage Analytics (Gap #2 -- Open)
No data on which agents are actually dispatched per session. Need telemetry to validate the ~25 core agent hypothesis.
Proposed: Add a PostToolUse hook that logs subagent_type to sessions.db for each Task tool invocation. Analyze 30 days of data to determine actual core agent set.
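A sketch of such a hook follows. The event payload shape and the sessions.db table are assumptions to be adapted to the actual hook contract; the hook runtime would invoke log_dispatch with the parsed event:

```python
import sqlite3
from datetime import datetime, timezone

def log_dispatch(db_path: str, event: dict) -> None:
    """Record which subagent_type each Task invocation used, so the
    ~25 core agent hypothesis can be validated against real usage."""
    subagent = event.get("tool_input", {}).get("subagent_type", "unknown")
    with sqlite3.connect(db_path) as conn:  # context manager commits on exit
        conn.execute("CREATE TABLE IF NOT EXISTS agent_dispatches "
                     "(ts TEXT, subagent_type TEXT)")
        conn.execute("INSERT INTO agent_dispatches VALUES (?, ?)",
                     (datetime.now(timezone.utc).isoformat(), subagent))
```

Thirty days of these rows, grouped by subagent_type, gives the empirical core set directly.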
7.2 Customer Perspective (Gap #4 -- Open)
Multi-tenant implications not yet analyzed:
- How do customer-specific agents interact with the core set?
- Should the core agent list be configurable per customer/tier?
- What's the impact on plugin/marketplace agent loading?
7.3 Impact Measurement (Gap #6 -- Open)
No baseline measurements for:
- Actual agent selection accuracy at 770 agents
- Latency impact of current token overhead
- Dollar cost attribution per customer session
Proposed: Run a benchmark suite with 50 standardized agent selection tasks, measure accuracy at current 770 agents vs. 25 core agents.
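The harness for that comparison can be very small; the task set and selection function are placeholders here:

```python
def selection_accuracy(tasks: list[dict], select) -> float:
    """Fraction of benchmark tasks where the selector picks the expected
    agent. 'select' is any callable(task_prompt) -> agent name, so the
    same tasks can score the 770-agent and 25-agent configurations."""
    if not tasks:
        return 0.0
    hits = sum(1 for t in tasks if select(t["prompt"]) == t["expected_agent"])
    return hits / len(tasks)
```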
8. References
Academic Papers
- Qin et al. "ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs" ICLR 2024. https://arxiv.org/abs/2307.16789
- Li et al. "MasRouter: Learning to Route LLMs for Multi-Agent System" ACL 2025. https://aclanthology.org/2025.acl-long.757.pdf
- "AutoTool: Efficient Tool Selection for Large Language Model Agents" 2025. https://arxiv.org/html/2511.14650v1
- "A Compiler-Style Architecture for Token Optimization in LLMs" 2025. https://www.chitrangana.com/wp-content/uploads/2025/04/Research-Paper-TokenOps.pdf
- "Scaling LLM Multi-turn RL with End-to-end Summarization" 2025. https://miaolu3.github.io/ArXiv_SUPO.pdf
Industry Sources
- Speakeasy. "100x Token Reduction with Dynamic Toolsets" 2025. https://www.speakeasy.com/blog/100x-token-reduction-dynamic-toolsets
- Anthropic. "Token-efficient tool use" 2025. https://docs.claude.com/en/docs/agents-and-tools/tool-use/token-efficient-tool-use
- SynapticLabs. "The Meta-Tool Pattern" 2025. https://blog.synapticlabs.ai/bounded-context-packs-meta-tool-pattern
- MCPJam. "Progressive Disclosure for Claude Agent Skills" 2025. https://www.mcpjam.com/blog/claude-agent-skills
- Anthropic. "Building Effective Agents" 2025. https://www.anthropic.com/research/building-effective-agents
- ZenML. "LLM Agents in Production" 2025. https://www.zenml.io/blog/llm-agents-in-production-architectures-challenges-and-best-practices
- Berkeley. "Function Calling Leaderboard V4" 2025. https://gorilla.cs.berkeley.edu/leaderboard.html
Competitive Intelligence
- GitHub Copilot: 13 core tools + 4 grouped categories with embedding-guided routing
- Cursor: RAG-based tool discovery, dynamic loading
- Windsurf (Codeium): Auto-context assembly, per-request tool adaptation
CODITECT Internal
- ADR-003: CODITECT Agent System
- ADR-026: Intent Classification System
- Dynamic Capability Router skill: `skills/dynamic-capability-router/SKILL.md`
- Agent Registry: `lib/orchestration/agent_registry.py`
- MoE Classifier: `scripts/moe_classifier/classify.py`
Author: Claude (Opus 4.6)
Date: 2026-02-07 (Updated)
Classification: Internal Reference