ADR-162: Progressive Component Disclosure Architecture
Status
PROPOSED (2026-02-07)
Context
The Product Problem
CODITECT is a product framework delivered to customers. Each installation includes 3,392 components (770 agents, 431 skills, 364 commands). Claude Code loads all agent descriptions into the Task tool's subagent_type schema on every API call, consuming ~25.6k tokens -- 70% above the recommended 15k budget.
This is a product quality and scalability issue:
- Every customer session pays the full 25.6k token overhead
- Agent selection accuracy degrades catastrophically at 770 agents (estimated 20-40% accuracy vs. 90%+ at 25 agents)
- Context window available for customer work is reduced by 13% (growing to 37% by Q3 2026)
- Competitive disadvantage vs. GitHub Copilot (13 core tools), Cursor (RAG-based), and other tools using progressive disclosure
Loading Mechanism
Claude Code loads custom agents from the .claude/agents/ directory. In CODITECT:
.claude -> .coditect -> submodules/core/coditect-core/
All 770 agent .md files in coditect-core/agents/ are loaded at session start. Claude Code extracts name + description from frontmatter into the Task tool's function schema (~33 tokens/agent x 770 = ~25.6k tokens). The full agent body (~589 tokens avg) is loaded only on dispatch.
There is no configuration mechanism for selective loading. All .md files in .claude/agents/ are loaded unconditionally.
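For illustration, only the frontmatter name and description of each agent file contribute to the per-call schema cost; a hypothetical agent file (all field values invented for this example) might look like:

```markdown
---
name: kubernetes-troubleshooter
description: Diagnoses Kubernetes cluster, node, and pod failures.
tools: Read, Grep, Bash
model: sonnet
---

Full system prompt body (~589 tokens on average), loaded only on dispatch.
```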
Dual Dispatch Mechanisms
CODITECT has two parallel agent dispatch paths:
- Claude Code native: `Task(subagent_type="agent-name", prompt="...")` -- dispatched directly from the Task schema
- CODITECT proxy: `Task(subagent_type="general-purpose", prompt="...")` via invoke-agent.py -- injects the full agent system prompt
Both coexist. The proxy pattern provides flexibility but does not reduce schema cost.
Impact Quantified
| Metric | Current | At Scale (Q3 2026) |
|---|---|---|
| Agent description tokens | 25,600 | 75,000+ |
| Context consumed (of 200k) | 13% | 37% |
| Agent selection accuracy | ~20-40% (estimated) | <15% |
| Wasted cost per 1,000 API calls | $0.38 | $1.13 |
| Session cost overhead (~200 calls) | $0.08 | $0.23 |
| At 100 customers x 5 sessions/day | $38.50/day | $115/day |
Evidence
Accuracy degradation is catastrophic, not gradual:
| Tool Count | Selection Accuracy | Source |
|---|---|---|
| 10-20 | 90-95% | Berkeley BFCL v4 |
| 50 | 84-90% | Speakeasy |
| 100 | 65-75% | ToolLLM |
| 770 (CODITECT) | ~20-40% | Extrapolated |
Anthropic's own defer_loading beta improved Opus 4 accuracy from 49% to 74% (+25 percentage points) by reducing the visible tool set. GitHub explicitly reduced from 40+ to 13 core tools for better performance at scale.
Industry Consensus
Comprehensive research (30+ sources, 5 academic papers, 8 industry case studies) confirms upfront loading of all tool descriptions is an anti-pattern at scale. Progressive disclosure + semantic routing is the industry standard for systems with 100+ tools.
Full analysis: internal/analysis/component-scaling/component-token-scaling-analysis.md
Decision
Adopt a three-tier Progressive Component Disclosure architecture that reduces agent description tokens from ~25,600 to ~800 (97% reduction) while maintaining full access to all 770+ agents through MCP-based discovery.
Architecture
Tier 0: Core Agents (Always in .claude/agents/)
~25 agents | ~800 tokens | Zero latency
Built-in: Explore, Plan, Bash, general-purpose
CODITECT: senior-architect, testing-specialist, code-reviewer,
debugger, devops-engineer, database-architect, etc.
Tier 1: Track Index (Loaded via MCP Tool on Discovery)
38 track summaries | ~600 tokens when queried | <50ms latency
"Track A - Backend API: 15 agents for REST, GraphQL, DB design"
Queried by: /which command, discover_agents MCP tool, or LLM request
Tier 2: Agent Spec (Loaded on Dispatch via MCP Tool)
Individual agent | ~50-200 tokens | <100ms latency
Full description, tools, model, capabilities
Loaded only when agent is about to be dispatched
Implementation
Phase 1: Directory Restructuring (Week 1)
1a. Reorganize coditect-core/ directory:
```
coditect-core/
├── agents/              # Tier 0: ~25 core agents (loaded by Claude Code)
│   ├── senior-architect.md
│   ├── testing-specialist.md
│   ├── code-reviewer.md
│   ├── debugger.md
│   ├── devops-engineer.md
│   ├── database-architect.md
│   └── ... (~25 files total)
├── agents-extended/     # Tier 1-2: ~745 agents (NOT auto-loaded)
│   ├── enterprise/      # PCF business agents (~500)
│   ├── specialist/      # Domain specialists (~80)
│   ├── orchestrator/    # Orchestrators (~20)
│   └── other/           # Remaining agents
```
Because .claude -> coditect-core, only files in coditect-core/agents/ are loaded into the Task schema. Files in agents-extended/ are invisible to Claude Code's auto-loader.
1b. Define core agent list (~25 agents) based on session universality:
```yaml
# config/core-agents.yaml
core_agents:
  builtin:
    - Explore
    - Plan
    - Bash
    - general-purpose
  coditect:
    - senior-architect
    - testing-specialist
    - code-reviewer
    - debugger
    - devops-engineer
    - database-architect
    - security-specialist
    - backend-development
    - documentation-generation
    - ci-pipeline
    - release-manager
    - web-search-researcher
    - code-documentation
    - frontend-react-typescript-expert
    - architect-review
    - backend-api-security
    - monitoring-specialist
    - incident-responder
    - data-engineering
    - cloud-architect
```
1c. Move ~745 non-core agents from agents/ to agents-extended/.
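The move in 1c can be scripted against the core list. A hedged sketch, assuming the flat agents/ layout and the core-agents.yaml shape shown in 1b (track classification into enterprise/specialist/etc. is omitted; everything lands in other/):

```python
import shutil
from pathlib import Path

def load_core_agents(config_path: Path) -> set[str]:
    # Minimal parse of the flat list format shown in 1b (not a full YAML
    # parser); builtin entries have no .md files, so including them is harmless.
    names = set()
    for line in config_path.read_text().splitlines():
        line = line.strip()
        if line.startswith("- "):
            names.add(line[2:].strip())
    return names

def migrate(root: Path) -> list[str]:
    """Move every non-core agent file to agents-extended/other/."""
    core = load_core_agents(root / "config" / "core-agents.yaml")
    dest = root / "agents-extended" / "other"
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for md in sorted((root / "agents").glob("*.md")):
        if md.stem not in core:
            shutil.move(str(md), dest / md.name)
            moved.append(md.stem)
    return moved
```

A dry-run mode and a reference-rewriting pass would be needed before running this against a real installation.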
1d. Update all scripts that scan agents/:
- scripts/validate-agent-structure.py -- scan both directories
- scripts/component-indexer.py -- index both directories
- config/framework-registry.json generation -- include both
- scripts/moe_classifier/ -- classify from both
1e. Verify /agent command continues to work -- it uses invoke-agent.py which reads agent files by name, so it needs updated path resolution to check agents-extended/ as fallback.
Phase 2: MCP Discovery Tools (Weeks 2-3)
2a. Add two MCP tools on the coditect-call-graph server (or new coditect-agent-router server):
```python
# Tool 1: discover_agents
def discover_agents(query: str, track: str | None = None, limit: int = 5) -> list:
    """Search the agent registry by keyword, semantic similarity, or track.

    Searches both agents/ and agents-extended/ directories.
    Returns: [{name, description, track, capabilities, tier}]
    """
    ...

# Tool 2: get_agent_spec
def get_agent_spec(agent_name: str) -> dict:
    """Load the full agent specification for dispatch.

    Returns: {name, description, tools, model, domain, full_prompt}
    """
    ...
```
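A minimal keyword-overlap ranking is enough to back discover_agents before the Phase 3 semantic upgrade. The sketch below is hypothetical (the real AgentRegistry in agent_registry.py may expose richer metadata) and scores agents by query-term overlap with name plus description:

```python
import re

def discover_agents_keyword(query: str, registry: list[dict], limit: int = 5) -> list[dict]:
    """Rank agents by query-term overlap with name + description.

    `registry` entries follow the shape returned by discover_agents:
    {name, description, track, capabilities, tier}.
    """
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))

    def score(agent: dict) -> int:
        haystack = f"{agent['name']} {agent['description']}".lower()
        return len(terms & set(re.findall(r"[a-z0-9]+", haystack)))

    ranked = sorted(registry, key=score, reverse=True)
    return [a for a in ranked if score(a) > 0][:limit]
```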
2b. Discovery workflow:
1. Claude recognizes it needs a specialized agent not in the core set
2. Claude calls `discover_agents(query="kubernetes troubleshooting")`
3. The MCP tool returns the top-5 matching agents from agents-extended/
4. Claude calls `get_agent_spec("kubernetes-troubleshooter")`
5. Claude dispatches via the `Task(subagent_type="general-purpose", prompt="{full_prompt}")` proxy pattern
2c. Enhance /which command to use discover_agents MCP tool.
Phase 3: Semantic Router (Weeks 3-4)
3a. Pre-compute embeddings for all agent descriptions:
- Input: agents/*.md and agents-extended/*.md frontmatter
- Storage: agent_embeddings table in platform.db
- Model: sentence-transformers run locally (fast, no API cost)
3b. Enhance discover_agents() with cosine similarity search.
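Given embeddings precomputed per 3a (e.g. by sentence-transformers and stored in the agent_embeddings table), the similarity search itself reduces to a cosine top-k over a matrix. A sketch, assuming the embeddings are already loaded into a NumPy array:

```python
import numpy as np

def top_k_agents(query_vec: np.ndarray, names: list[str],
                 emb: np.ndarray, k: int = 5) -> list[tuple[str, float]]:
    """Cosine-similarity search over precomputed agent description embeddings.

    `emb` is an (n_agents, dim) matrix, one row per agent in `names`;
    in 3a these rows would come from agent_embeddings in platform.db.
    """
    q = query_vec / np.linalg.norm(query_vec)          # normalize query
    m = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # normalize rows
    sims = m @ q                                       # cosine similarities
    order = np.argsort(-sims)[:k]                      # top-k indices
    return [(names[i], float(sims[i])) for i in order]
```

Embedding the query at request time is the only per-call model cost; the agent matrix is computed once and cached.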
Phase 4: Anthropic API Integration (When Stable)
4a. Adopt Anthropic's Tool Search Tool with defer_loading: true for core agents.
4b. This is additive -- combine with directory restructuring for maximum benefit.
Token Budget
| Tier | When Loaded | Tokens | Latency |
|---|---|---|---|
| Tier 0 (core agents in .claude/agents/) | Every API call | ~800 | 0ms |
| Tier 1 (track index via MCP) | On agent search | ~600 | <50ms |
| Tier 2 (agent spec via MCP) | On agent dispatch | ~200 | <100ms |
| Typical session (Tiers 0-2) | -- | ~1,600 | <150ms |
| Current (all 770 agents) | Every API call | ~25,600 | 0ms |
Net savings: ~24,800 tokens/call (97% reduction for Tier 0 alone)
Leveraged Existing Infrastructure
| Component | How Used |
|---|---|
| agent_registry.py | Capability-based discovery (AgentRegistry class) |
| track_registry.py | 38-track taxonomy for Tier 1 index |
| intelligent_track_mapper.py | Semantic keyword matching |
| classify.py | MoE classification confidence scores |
| validate-agent-structure.py | Ensure all agents have required frontmatter |
| component-indexer.py | Searchable component index |
| invoke-agent.py | Proxy dispatch pattern for non-core agents |
| Dynamic Capability Router skill | Intent-based routing decision tree |
Product Considerations
| Concern | Approach |
|---|---|
| Customer transparency | Core agents just work; discovery is seamless via MCP |
| Customer agent additions | Customers add to agents/ (core) or agents-extended/ (discoverable) |
| Enterprise configurability | config/core-agents.yaml is overridable per installation |
| Backward compatibility | /agent command unchanged; proxy pattern for extended agents |
| Multi-tenant scaling | Per-customer core-agents.yaml enables tenant-specific core sets |
| Plugin/marketplace agents | Plugins add to agents-extended/ with track metadata for discovery |
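A per-installation override of config/core-agents.yaml could be as small as the fragment below. This is a hypothetical sketch: the key names (coditect_extra, remove) and the additive merge semantics are invented for illustration and remain an open design point.

```yaml
# Hypothetical tenant override, merged over the shipped defaults in 1b
core_agents:
  coditect_extra:
    - payments-compliance-specialist   # customer-specific agent promoted to core
  remove:
    - data-engineering                 # unused by this tenant; stays discoverable
```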
Consequences
Positive
- 97% token reduction in system prompt (25,600 -> ~800 for Tier 0)
- Dramatic accuracy improvement -- smaller tool set means correct agent selection (per Anthropic and GitHub data)
- Scales to 10,000+ agents without system prompt growth
- Reduced cost -- $0.36/1,000 calls saved at current scale; $14K/year at 100 customers
- Leverages existing infrastructure -- MoE, track registry, agent registry all reused
- Backward compatible -- core agents work identically; proxy pattern for extended agents
- Competitive parity -- matches GitHub Copilot's proven core-tool approach
Negative
- Extra round-trip for non-core agents -- 50-150ms latency for MCP discovery+load
- Directory restructuring -- one-time migration of 745 agent files; potential for broken references
- MCP dependency -- full agent access requires MCP server running
- Maintenance burden -- core agent list must be curated as usage patterns evolve
- Embedding computation -- one-time cost for Phase 3
Neutral
- Agent file format (agents/*.md, agents-extended/*.md) unchanged
- Existing /agent command continues via the proxy pattern
- /which command enhanced but not broken
- Component counts in config/component-counts.json still reflect all agents
Alternatives Considered
| Alternative | Token Reduction | Why Not Selected |
|---|---|---|
| Prune to <100 agents | 87% | Loses capability permanently; doesn't address growth |
| Compress descriptions only | 40-46% | Insufficient; still 14k+ tokens at current scale |
| Full MCP agent server | 99% | Over-engineered for Phase 1; adds infrastructure complexity |
| Semantic routing only | 95% | Requires embedding infra upfront; directory restructure is simpler first step |
| Do nothing | 0% | 75k+ tokens by Q3 2026; catastrophic accuracy degradation |
Compliance
- ADR-003 (Agent System): Extended, not replaced. Agent definition format unchanged. New agents-extended/ directory follows the same conventions.
- ADR-010 (Component Capability Schema): Leveraged for capability-based MCP discovery.
- ADR-026 (Intent Classification): Semantic router in Phase 3 builds on intent classification.
- ADR-052 (Intent-Aware Context): Progressive disclosure is context-aware by design.
Open Questions
- Usage analytics needed: Which ~25 agents should be in the core set? Need telemetry data from real sessions (proposed: PostToolUse hook logging subagent_type).
- Customer agent placement: Should customer-added agents go to agents/ (core) or agents-extended/ (discoverable) by default?
- Anthropic API timeline: When will defer_loading and the Tool Search Tool be GA? Phase 4 depends on stability.
Review
- Architecture Review: Hal Casteel
- Implementation: Track H (Framework)
- Testing: Benchmark suite for agent selection accuracy at 770 vs. 25 agents
- Monitoring: Token counter hook for system prompt size
- Migration Plan: Script to safely move 745 agents to agents-extended/
Author: Claude (Opus 4.6) Date: 2026-02-07 (Updated)