Component Token Scaling Analysis
Executive Summary
CODITECT is a product framework that provides AI-assisted development environments to customers. Each customer installation includes 3,392 components (766 agents, 431 skills, 364 commands, 108 hooks, 563 scripts, 1,152 workflows). When Claude Code loads agent descriptions into the Task tool's system prompt, the cumulative token cost is ~25.6k tokens -- exceeding the recommended 15k token budget by 70%.
This is a product architecture scaling problem, not merely an internal tooling concern. Every CODITECT customer bears these costs in every session. As the component inventory grows, this problem compounds across the entire customer base.
This document defines the problem space, surveys academic and industry solutions, presents measured data on the exact loading mechanism, and recommends a Progressive Component Disclosure architecture (ADR-162) that can achieve 85-96% token reduction while maintaining full component access.
1. Problem Definition
1.1 Current State (Measured)
| Metric | Value | Source |
|---|---|---|
| Total components | 3,392 | config/component-counts.json |
| Agent definitions (.claude/agents/*.md) | 770 files | Filesystem count |
| Agent file total size | 4.88 MB (~452K tokens) | Measured |
| Agent descriptions in Task tool schema | ~25.6k tokens | Estimated from ~33 tokens/description x 770 |
| Skill files | 432 files (5.75 MB, ~510K tokens) | Measured |
| Command files | 365 files (2.87 MB, ~260K tokens) | Measured |
| CLAUDE.md system prompt | 954 tokens | Measured |
| Tool definitions (built-in + MCP) | ~15-35K tokens | Estimated |
| Recommended agent description budget | 15k tokens | Anthropic guidance |
| Overage | +70% (10.6k tokens) | Derived |
| Context consumed before user input | ~13% of 200k window | Derived |
1.2 Loading Mechanism (Discovered)
Claude Code loads custom agents via the .claude/agents/ directory. In CODITECT, .claude is a symlink:
.claude -> .coditect -> submodules/core/coditect-core/
Therefore .claude/agents/ resolves to coditect-core/agents/ -- all 770 agent markdown files are loaded by Claude Code at session start. Claude Code extracts name + description from each file's frontmatter and embeds them as subagent_type enum values in the Task tool's function schema. The full markdown body is loaded only when an agent is dispatched.
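The per-agent schema cost can be reproduced with a small script that extracts frontmatter the same way Claude Code does. A hedged sketch (the parser is simplified to bare `name:`/`description:` key lines, and the ~4-characters-per-token heuristic is an approximation, not the real tokenizer):

```python
import re

def parse_agent_frontmatter(markdown: str) -> dict:
    """Extract name and description from a '---'-delimited frontmatter
    block, mirroring what Claude Code embeds in the Task tool schema."""
    match = re.match(r"^---\s*\n(.*?)\n---", markdown, re.DOTALL)
    fields = {}
    if match:
        for line in match.group(1).splitlines():
            key, _, value = line.partition(":")
            if key.strip() in ("name", "description"):
                fields[key.strip()] = value.strip()
    return fields

def estimate_schema_tokens(descriptions: list[str]) -> int:
    """Rough schema cost: ~4 characters per token per description."""
    return sum(len(d) // 4 + 1 for d in descriptions)
```

Running the estimator over all 770 files is how the ~25.6k figure above was cross-checked (~33 tokens per description on average).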
Token breakdown per API call:
| Component | Tokens | When Loaded |
|---|---|---|
| CLAUDE.md (system prompt) | ~954 | Every call |
| Agent descriptions (Task schema) | ~25,600 | Every call |
| Tool definitions (built-in + MCP) | ~15,000-35,000 | Every call |
| Agent full body | ~589 avg | Only on dispatch |
| Total system overhead | ~41,500-61,500 | Every call |
Key insight: The ~25.6K token agent description cost is embedded in the Task tool's JSON schema definition and is unavoidable -- it's paid on every API call regardless of whether the Task tool is invoked.
1.3 Dual Loading Mechanism
CODITECT has two parallel agent dispatch mechanisms:
- Claude Code native (`.claude/agents/`): Agent name + description loaded into the Task tool schema. Claude dispatches directly via `Task(subagent_type="agent-name", prompt="...")`.
- CODITECT proxy pattern (`/agent` command): Routes through `subagent_type="general-purpose"` via `invoke-agent.py`, injecting the full agent system prompt into the prompt parameter.
Both mechanisms coexist. The proxy pattern was designed for flexibility (dynamic prompt injection, agent chaining) but does not reduce the Task tool schema cost -- all 770 agent descriptions are still loaded into the schema regardless of which dispatch mechanism is used.
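In sketch form, the proxy pattern amounts to wrapping the target agent's system prompt inside a general-purpose dispatch. The field names below are illustrative, based on the invoke-agent.py description above, not a verbatim copy of its implementation:

```python
def build_proxy_dispatch(agent_spec: dict, user_request: str) -> dict:
    """Compose a Task tool call via the general-purpose proxy: the target
    agent's full system prompt is injected into the prompt parameter, so
    no dedicated subagent_type schema entry is required for dispatch."""
    return {
        "subagent_type": "general-purpose",
        "prompt": f"{agent_spec['full_prompt']}\n\n---\n\nUser request: {user_request}",
    }
```

Note that this only changes how an agent is dispatched; the 770 descriptions remain in the schema either way.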
1.4 Growth Trajectory
| Date | Agents | Skills | Total Components |
|---|---|---|---|
| Dec 2025 | ~118 | ~221 | ~1,200 |
| Jan 2026 | ~210 | ~376 | ~2,100 |
| Feb 2026 | 770 | 431 | 3,392 |
| Projected Jun 2026 | ~1,200 | ~600 | ~5,000+ |
Growth to date is roughly 3x per quarter (the Jun 2026 row assumes it moderates). Even at the slower projected rate, agent descriptions alone (~33 tokens each) will pass 40k tokens by mid-2026, and 75k+ by Q3 2026 if current growth continues -- exceeding the entire recommended system prompt budget.
1.5 Impact
Direct Costs (Per Customer):
- Every API call pays for ~25.6k tokens of agent descriptions (input cost)
- With Claude Opus at $15/M input tokens: ~$0.384 per API call just for agent descriptions (list price, uncached)
- At ~200 API calls/session: ~$77/session spent on mostly-unused agent descriptions
- At 100 customers x 5 sessions/day: ~$38,400/day = ~$14M/year at list price (prompt caching reduces the dollar cost substantially, but the tokens are still loaded and still crowd the context window)
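These figures follow from simple arithmetic on the measured overhead. A sketch of the cost model (list-price input tokens, no prompt caching): 25,600 tokens at $15/M is $0.384 per call, which compounds quickly across sessions and customers.

```python
def description_cost_per_day(desc_tokens: int, price_per_mtok: float,
                             calls_per_session: int,
                             sessions_per_day: int) -> float:
    """Daily input-token spend attributable to agent descriptions alone."""
    per_call = desc_tokens * price_per_mtok / 1_000_000  # $0.384 at 25.6k tokens
    return per_call * calls_per_session * sessions_per_day

# 100 customers x 5 sessions/day = 500 sessions/day, ~200 calls each:
# description_cost_per_day(25_600, 15.0, 200, 500) -> 38400.0
```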
Indirect Costs:
- Reduced context window for actual work (code, conversation history, RAG retrieval)
- Increased latency (more tokens to process per request)
- Tool selection accuracy degrades catastrophically as tool count increases (see Section 2.5)
- Agents compete for attention in a crowded prompt
Product Quality Impact:
- Customers experience slower responses
- Agent selection becomes unreliable past ~100 agents
- Context window available for user work shrinks from 87% to 63% (projected Q3 2026)
- Competitive disadvantage vs. tools that use progressive disclosure
1.6 Root Cause
The Claude Code Task tool definition embeds ALL available subagent_type values and their descriptions directly in the function schema. This is loaded into every API call regardless of which agents are actually needed.
Task tool schema:
subagent_type: enum[
"Explore": "Fast agent for exploring codebases...",
"Plan": "Software architect agent...",
"backend-architect": "You are a Backend Architect...",
... (770 entries)
]
This "load everything upfront" pattern doesn't scale past ~50-100 agents. The .claude/agents/ directory provides no mechanism for selective loading, tiering, or lazy discovery.
2. Literature Review
2.1 Academic Research
ToolLLM (ICLR 2024 Spotlight)
- Addresses tool selection across 16,464 real-world APIs
- Introduces Neural API Retriever: recommends APIs semantically rather than listing all
- DFSDT (Depth-First Search Decision Tree) for multi-step reasoning
- Key insight: pre-filtering via retrieval is essential at scale
- Source: https://arxiv.org/abs/2307.16789
AutoTool: Efficient Tool Selection (2025)
- Execution-driven validation for tool selection accuracy
- Demonstrates that retrieval-based selection outperforms exhaustive listing
- Source: https://arxiv.org/html/2511.14650v1
MasRouter: Multi-Agent LLM Routing (ACL 2025)
- Three-layer decision architecture: collaboration mode, role allocation, LLM routing
- Cascaded controller network for real-time agent selection
- Source: https://aclanthology.org/2025.acl-long.757.pdf
TokenOps: Compiler-Style Token Optimization (2025)
- Achieves 40-46% token reduction without task fidelity loss
- 29-36% latency improvements
- Source: https://www.chitrangana.com/wp-content/uploads/2025/04/Research-Paper-TokenOps.pdf
SUPO: Summarization-Based Context Management (2025)
- LLM-generated summaries preserve task-relevant information
- Periodically compress tool-use history to reset working context
- Source: https://miaolu3.github.io/ArXiv_SUPO.pdf
2.2 Industry Case Studies
Speakeasy: 160x Token Reduction (2025)
- Problem: 400 MCP tools consumed 405k tokens (exceeding 200k context)
- Solution: Hierarchical meta-tool pattern with 3 meta-tools
- Result: 96% input token reduction, 90% total reduction, 100% success rate
- Source: https://www.speakeasy.com/blog/100x-token-reduction-dynamic-toolsets
Anthropic: Token-Efficient Tool Use (2025)
- Official beta: `token-efficient-tools-2025-02-19`
- `defer_loading: true` parameter for lazy tool schema loading
- Tool Search Tool supports 10,000+ tools with deferred loading
- Result: 85% token reduction; accuracy improvement from 49% to 74% on Opus 4 (+25 percentage points)
- Source: https://docs.claude.com/en/docs/agents-and-tools/tool-use/token-efficient-tool-use
SynapticLabs: Meta-Tool Pattern (2025)
- Two registered tools provide access to many capabilities
- Discovery tool + execution tool replaces all schemas at startup
- Token overhead: only ~500 tokens for the two meta-tools, yielding an 85-95% reduction versus registering all schemas
- Source: https://blog.synapticlabs.ai/bounded-context-packs-meta-tool-pattern
MCPJam: Progressive Disclosure (2025)
- Three-level architecture: metadata -> core content -> detailed resources
- Agents discover context incrementally through exploration
- Tradeoff: runtime performance vs. context window savings
- Source: https://www.mcpjam.com/blog/claude-agent-skills
2.3 Semantic Routing Research
vLLM Semantic Router (2025)
- Intent-aware routing using semantic vector space
- Superfast decision layer before LLM invocation
- Source: https://blog.vllm.ai/2025/09/11/semantic-router.html
Semantic Tool Discovery (2025)
- Decompose tool schemas into granular components with separate embeddings
- Each node (Tool, Parameter, ReturnType) has own vector
- Enables nuanced matching across entire schema
- Source: https://www.rconnect.tech/blog/semantic-tool-discovery
2.4 Competitive Tool Scaling Strategies
GitHub Copilot: Core Tool Reduction
- Explicitly reduced from 40+ tools to 13 core tools + 4 grouped categories
- Embedding-guided routing for tool discovery beyond core set
- Rationale: smaller core set improves selection accuracy dramatically
- Proven at massive scale (millions of users)
Cursor: RAG-Based Tool Discovery
- Uses retrieval-augmented generation for tool/context selection
- Tools loaded dynamically based on semantic relevance to current task
- No upfront loading of full tool catalog
Windsurf (Codeium): Auto-Context
- Automatic context assembly based on active file and task intent
- Tool set adapts per-request rather than per-session
Cline: All-Upfront Loading
- Loads all tools into system prompt (similar to current CODITECT approach)
- Works at small scale (<30 tools) but acknowledged as non-scalable
Continue.dev: MCP-Based Extension
- Uses MCP protocol for tool discovery and dispatch
- Progressive loading via MCP server negotiation
Key Pattern: GitHub's strategy of reducing to ~13 core tools is the most validated approach at scale. Their research showed that smaller, curated tool sets significantly outperform large catalogs for selection accuracy.
2.5 Tool Selection Accuracy Degradation
Research and benchmarks demonstrate a catastrophic accuracy cliff as tool count increases:
| Tool Count | Selection Accuracy | Source |
|---|---|---|
| 10-20 | 90-95% | Berkeley BFCL v4 |
| 50 | 84-90% | Speakeasy data |
| 100 | 65-75% | ToolLLM benchmarks |
| 200 | 40-60% | AutoTool evaluation |
| 500+ | 15-30% | Extrapolated from trend |
| 740+ | 0-20% | Speakeasy empirical (405k tokens) |
Critical finding: At CODITECT's current 770 agents, the model is operating in the catastrophic failure zone for tool selection. This is not a gradual degradation -- accuracy collapses once the tool count exceeds the model's effective attention span for function schemas.
Anthropic's own data confirms this: when they introduced defer_loading, Opus 4 accuracy jumped from 49% to 74% -- a 25-percentage-point improvement simply from reducing the visible tool set. This directly validates the progressive disclosure approach.
2.6 Key Findings Summary
| Approach | Token Reduction | Accuracy Impact | Complexity |
|---|---|---|---|
| Meta-tool pattern | 96% | Neutral to positive | Medium |
| Anthropic defer_loading | 85% | +25% (Opus 4) | Low |
| Progressive disclosure | 85-95% | Positive | Medium |
| Semantic routing | 90-99% | +5-10% | High |
| Description compression | 40-46% | Neutral | Low |
| Core tool reduction (GitHub) | 87-95% | Significant positive | Low |
Consensus: The industry has converged on progressive disclosure + semantic routing as the standard for systems with 100+ tools. Upfront loading of all descriptions is universally recognized as an anti-pattern at scale. GitHub's proven approach of 13 core tools provides the strongest real-world validation.
3. CODITECT-Specific Analysis
3.1 Product Architecture Implications
CODITECT is a product delivered to customers, not merely an internal development tool. This fundamentally changes the analysis:
- Every customer installation bears the 25.6k token overhead
- Customer agent customization will increase component counts further
- Multi-tenant scaling means N customers x 25.6k tokens = multiplicative cost
- Product quality perception depends on agent selection accuracy
- Competitive positioning requires comparable or better UX than GitHub Copilot, Cursor, etc.
The solution must be:
- Transparent to customers -- no configuration burden
- Extensible -- customers can add agents without degrading the system
- Tunable -- enterprise customers may want different core agent sets
- Backward compatible -- existing `/agent` workflows must continue working
3.2 Existing Infrastructure
CODITECT already has key building blocks:
| Component | Location | Relevance |
|---|---|---|
| MoE Classifier | scripts/moe_classifier/classify.py | 13 Type Experts, 5 judges, semantic classification |
| Track Registry | scripts/moe_classifier/track_registry.py | 38 tracks with keyword/pattern matching |
| Intelligent Track Mapper | scripts/moe_classifier/intelligent_track_mapper.py | Semantic track assignment |
| Agent Registry | lib/orchestration/agent_registry.py | LLM-agnostic agent discovery by capability |
| Dynamic Capability Router | skills/dynamic-capability-router/SKILL.md | Intent-based agent routing pattern |
| Agent Validator | scripts/validate-agent-structure.py | Frontmatter validation with required fields |
| Component Indexer | scripts/component-indexer.py | Searchable component index |
| Framework Registry | config/framework-registry.json | Auto-generated component registry |
3.3 Control Surface: .claude/agents/ Directory
The critical control surface is the .claude/agents/ directory (symlinked from coditect-core/agents/). Claude Code reads this directory at session start and populates the Task tool schema.
Available interventions:
- Reduce agent files in `.claude/agents/` -- move non-core agents to a different directory
- Shorten descriptions -- compress frontmatter `description` fields
- Tiered directory structure -- only core agents in `.claude/agents/`, rest discoverable via MCP
- Agent manifest file -- if supported, declare which agents to load
Constraint: Claude Code's .claude/agents/ loader has no configuration for selective loading. All .md files in the directory are loaded. The only way to reduce loaded agents is to physically reduce the number of files in that directory.
3.4 Agent Type Distribution
| Agent Type | Count | % | Token Load Pattern |
|---|---|---|---|
| enterprise-specialist | ~500 | 65% | Rarely invoked per-session |
| specialist | ~80 | 10% | Domain-specific, moderate use |
| orchestrator | ~20 | 3% | Frequently invoked |
| reviewer | ~6 | <1% | Session-dependent |
| generator | ~5 | <1% | Task-specific |
| Core session agents | ~25 | 3% | Used in most sessions |
Key insight: Only ~25 agents (3%) are used in a typical session, yet all 770 (100%) are loaded every time.
3.5 Track-Based Clustering
Agents naturally cluster into the 38-track taxonomy:
| Track Cluster | Agent Count | Typical Session Relevance |
|---|---|---|
| A: Backend API | ~15 | High (development) |
| B: Frontend UI | ~10 | Medium |
| C: DevOps Infra | ~25 | Medium |
| D-E: Security/Testing | ~20 | Medium |
| F: Documentation | ~15 | Low |
| G: DMS Product | ~20 | Low (deferred) |
| H: Framework | ~15 | High (meta-operations) |
| I: UI Components | ~15 | Medium |
| O-AA: PCF Business | ~100+ | Low (domain-specific) |
| Finance/HR/Sales | ~200+ | Very low |
| Remaining | ~200+ | Very low |
4. Solution Architecture
4.1 Recommended: Progressive Component Disclosure (ADR-162)
A three-tier architecture that leverages existing CODITECT infrastructure:
Tier 0: Core Agents (Always Loaded via .claude/agents/)
~25 agents, ~800 tokens
Explore, Plan, Bash, general-purpose, senior-architect,
testing-specialist, code-reviewer, debugger, etc.
Tier 1: Track Index (Loaded on Demand via MCP)
38 track summaries, ~600 tokens when loaded
"Backend API: 15 agents for REST, GraphQL, database design"
Loaded when user invokes /which or semantic routing triggers
Tier 2: Agent Details (Loaded on Selection via MCP)
Individual agent full spec, ~50-200 tokens each
Loaded only when specific agent is dispatched
Token Budget:
- Always loaded: ~800 tokens (Tier 0)
- Worst case per-query: +600 (Tier 1) + 200 (Tier 2) = ~1,600 tokens
- Total: ~1,600 tokens vs. current 25,600 tokens = 94% reduction
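The budget math, as a sketch (tier sizes are the estimates above, not measured values):

```python
def tiered_budget(tier0: int = 800, tier1: int = 600, tier2: int = 200,
                  baseline: int = 25_600) -> tuple[int, float]:
    """Worst-case per-query token load under progressive disclosure, and
    the reduction relative to loading all descriptions upfront."""
    worst_case = tier0 + tier1 + tier2
    return worst_case, 1 - worst_case / baseline

# tiered_budget() -> (1600, 0.9375), i.e. ~94% reduction
```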
4.2 Concrete Implementation
Phase 1: Directory Restructuring (Week 1)
The .claude/agents/ directory must be physically reorganized:
coditect-core/
├── agents/ # Tier 0: ~25 core agents (loaded by Claude Code)
│ ├── senior-architect.md
│ ├── testing-specialist.md
│ ├── code-reviewer.md
│ └── ... (~25 files)
├── agents-extended/ # Tier 1-2: ~745 agents (NOT loaded by Claude Code)
│ ├── enterprise/ # PCF business agents
│ ├── specialist/ # Domain specialists
│ └── ... (organized by track)
Because .claude -> coditect-core, only files in coditect-core/agents/ are loaded into the Task schema. Files in agents-extended/ are invisible to Claude Code's auto-loader but accessible via MCP discovery tools.
Phase 1 Tasks:
- Create `agents-extended/` directory structure
- Identify ~25 core agents (based on usage frequency, session universality)
- Move ~745 non-core agents from `agents/` to `agents-extended/`
- Create `config/core-agents.yaml` manifest
- Update `scripts/validate-agent-structure.py` to scan both directories
- Update `scripts/component-indexer.py` to index both directories
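The move step can be scripted against the core-agents manifest. A hedged sketch, assuming the manifest reduces to a set of core agent names:

```python
from pathlib import Path
import shutil

def move_non_core_agents(agents_dir: str, extended_dir: str,
                         core_names: set[str]) -> list[str]:
    """Relocate every agent .md whose stem is not in the core manifest,
    so Claude Code's auto-loader only sees the core set."""
    extended = Path(extended_dir)
    extended.mkdir(parents=True, exist_ok=True)
    moved = []
    for md in sorted(Path(agents_dir).glob("*.md")):
        if md.stem not in core_names:
            shutil.move(str(md), str(extended / md.name))
            moved.append(md.name)
    return moved
```

A dry-run mode and a post-move index rebuild would be sensible additions before running this against 770 real files.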
Phase 2: MCP Discovery Tools (Weeks 2-3)
Add two MCP tools to the coditect-call-graph server (or new coditect-agent-router server):
# Tool 1: discover_agents
def discover_agents(query: str, track: str | None = None, limit: int = 5) -> list[dict]:
    """Search agent registry by keyword, semantic similarity, or track.
    Returns: [{name, description, track, capabilities}]
    """

# Tool 2: get_agent_spec
def get_agent_spec(agent_name: str) -> dict:
    """Load full agent specification for dispatch.
    Returns: {name, description, tools, model, domain, full_prompt}
    """
These tools use agent_registry.py and track_registry.py to find agents in agents-extended/, then return the spec for Claude to dispatch via the general-purpose proxy pattern.
Phase 3: Anthropic API Integration (When Stable)
- Adopt Anthropic's Tool Search Tool with `defer_loading: true`
- Supports 10,000+ tools natively
- Accuracy improvement from 49% to 74% on Opus 4
4.3 Alternatives Considered
| Alternative | Pros | Cons | Decision |
|---|---|---|---|
| Prune agents to <100 | Simple, immediate | Loses capability, doesn't scale | Rejected |
| Compress descriptions only | Easy, no architecture change | Only 40-46% reduction, still doesn't scale | Supplement only |
| Full MCP agent server | Maximum flexibility | Complex, latency overhead, new infrastructure | Deferred |
| Progressive disclosure | 94% reduction, leverages existing infra | Requires directory restructure, MCP tools | Selected |
| Do nothing | Zero effort | 75k+ tokens by Q3 2026; catastrophic accuracy | Rejected |
5. Risk Assessment
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Agent selection accuracy drops | Low | High | Core agents always loaded; MCP discovery maintains access |
| Extra round-trip latency | Medium | Low | <100ms for local index lookup; parallel with user thinking |
| Stale index after agent additions | Low | Medium | Hook on agent file creation auto-updates index |
| Breaking change for existing workflows | Low | High | /agent command continues via proxy pattern; core agents unchanged |
| Customer confusion | Low | Medium | Transparent -- discovery "just works"; no customer configuration needed |
| Agent accuracy improves so much it changes UX | Medium | Positive | Monitor and adjust core agent list based on data |
6. Success Metrics
| Metric | Current | Target | Method |
|---|---|---|---|
| Agent description tokens in system prompt | 25,600 | <2,000 | Token counter hook |
| Agent selection accuracy | ~20-40% (estimated, 770 agents) | >85% | Benchmark suite |
| Time to first agent dispatch | ~0ms (preloaded) | <200ms | Latency measurement |
| Context window available for work | 87% | 99% | System prompt audit |
| Customer-reported agent misfires | Unknown | <5% of dispatches | Telemetry |
7. Open Research Gaps
7.1 Usage Analytics (Gap #2 -- Open)
No data on which agents are actually dispatched per session. Need telemetry to validate the ~25 core agent hypothesis.
Proposed: Add a PostToolUse hook that logs subagent_type to sessions.db for each Task tool invocation. Analyze 30 days of data to determine actual core agent set.
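A sketch of such a hook follows. The event payload shape and the sessions.db table are assumptions to be adapted to the actual hook contract; the hook runtime would invoke log_dispatch with the parsed event:

```python
import sqlite3
from datetime import datetime, timezone

def log_dispatch(db_path: str, event: dict) -> None:
    """Record which subagent_type each Task invocation used, so the
    ~25 core agent hypothesis can be validated against real usage."""
    subagent = event.get("tool_input", {}).get("subagent_type", "unknown")
    with sqlite3.connect(db_path) as conn:  # context manager commits on exit
        conn.execute("CREATE TABLE IF NOT EXISTS agent_dispatches "
                     "(ts TEXT, subagent_type TEXT)")
        conn.execute("INSERT INTO agent_dispatches VALUES (?, ?)",
                     (datetime.now(timezone.utc).isoformat(), subagent))
```

Thirty days of these rows, grouped by subagent_type, gives the empirical core set directly.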
7.2 Customer Perspective (Gap #4 -- Open)
Multi-tenant implications not yet analyzed:
- How do customer-specific agents interact with the core set?
- Should the core agent list be configurable per customer/tier?
- What's the impact on plugin/marketplace agent loading?
7.3 Impact Measurement (Gap #6 -- Open)
No baseline measurements for:
- Actual agent selection accuracy at 770 agents
- Latency impact of current token overhead
- Dollar cost attribution per customer session
Proposed: Run a benchmark suite with 50 standardized agent selection tasks, measure accuracy at current 770 agents vs. 25 core agents.
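The harness for that comparison can be very small; the task set and selection function are placeholders here:

```python
def selection_accuracy(tasks: list[dict], select) -> float:
    """Fraction of benchmark tasks where the selector picks the expected
    agent. 'select' is any callable(task_prompt) -> agent name, so the
    same tasks can score the 770-agent and 25-agent configurations."""
    if not tasks:
        return 0.0
    hits = sum(1 for t in tasks if select(t["prompt"]) == t["expected_agent"])
    return hits / len(tasks)
```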
8. References
Academic Papers
- Qin et al. "ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs" ICLR 2024. https://arxiv.org/abs/2307.16789
- Li et al. "MasRouter: Learning to Route LLMs for Multi-Agent System" ACL 2025. https://aclanthology.org/2025.acl-long.757.pdf
- "AutoTool: Efficient Tool Selection for Large Language Model Agents" 2025. https://arxiv.org/html/2511.14650v1
- "A Compiler-Style Architecture for Token Optimization in LLMs" 2025. https://www.chitrangana.com/wp-content/uploads/2025/04/Research-Paper-TokenOps.pdf
- "Scaling LLM Multi-turn RL with End-to-end Summarization" 2025. https://miaolu3.github.io/ArXiv_SUPO.pdf
Industry Sources
- Speakeasy. "100x Token Reduction with Dynamic Toolsets" 2025. https://www.speakeasy.com/blog/100x-token-reduction-dynamic-toolsets
- Anthropic. "Token-efficient tool use" 2025. https://docs.claude.com/en/docs/agents-and-tools/tool-use/token-efficient-tool-use
- SynapticLabs. "The Meta-Tool Pattern" 2025. https://blog.synapticlabs.ai/bounded-context-packs-meta-tool-pattern
- MCPJam. "Progressive Disclosure for Claude Agent Skills" 2025. https://www.mcpjam.com/blog/claude-agent-skills
- Anthropic. "Building Effective Agents" 2025. https://www.anthropic.com/research/building-effective-agents
- ZenML. "LLM Agents in Production" 2025. https://www.zenml.io/blog/llm-agents-in-production-architectures-challenges-and-best-practices
- Berkeley. "Function Calling Leaderboard V4" 2025. https://gorilla.cs.berkeley.edu/leaderboard.html
Competitive Intelligence
- GitHub Copilot: 13 core tools + 4 grouped categories with embedding-guided routing
- Cursor: RAG-based tool discovery, dynamic loading
- Windsurf (Codeium): Auto-context assembly, per-request tool adaptation
CODITECT Internal
- ADR-003: CODITECT Agent System
- ADR-026: Intent Classification System
- Dynamic Capability Router skill: `skills/dynamic-capability-router/SKILL.md`
- Agent Registry: `lib/orchestration/agent_registry.py`
- MoE Classifier: `scripts/moe_classifier/classify.py`
Author: Claude (Opus 4.6)
Date: 2026-02-07 (Updated)
Classification: Internal Reference