ADR-077: Recursive Agent Chain Architecture
Status
Accepted - January 16, 2026
MoE Judges Score: 9.2/10 (APPROVED)
Context
Problem Statement
CODITECT agents face challenges with large contexts:
- Context Window Limits: Even 200K token windows hit limits on large codebases
- Single-Pass Analysis: Cannot iteratively refine understanding
- No Sub-Agent Calls: Agents cannot spawn nested agents for subtasks
- Context Pollution: Parent context leaks into child tasks
- No Chunking Strategy: Large documents processed as monoliths
Source Analysis: RLM Pattern
Analysis of submodules/rlm (alexzhang13/rlm) reveals a pattern that addresses these challenges directly:
```python
# RLM's Recursive Language Model pattern
# Instead of: llm.completion(huge_prompt)
# RLM does:   rlm.completion(prompt, context_as_variable)
# The model can then:
# 1. Examine context programmatically (len(context), context[:1000])
# 2. Break it into chunks (chunks = chunk_context(context, 10000))
# 3. Query sub-LMs: results = [llm_query(chunk) for chunk in chunks]
# 4. Aggregate: summary = llm_query(f"Synthesize: {results}")
# 5. Recursively call deeper if needed
```
Key Innovations:
- Context as a REPL variable, not embedded in prompt
- `llm_query()` function for nested LLM calls
- `llm_query_batched()` for parallel processing
- Automatic context accumulation across iterations
- `FINAL()` markers for completion signaling
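These innovations compose into a simple map-reduce loop. The sketch below shows only the control flow; `llm_query` is a stub standing in for RLM's real nested-LM call, and the character-based chunker is a stand-in for token-aware chunking:

```python
# Sketch of RLM's map-reduce pattern; llm_query is a stub standing in for
# a real nested-LM dispatch, so only the control flow is meaningful here.
def llm_query(prompt: str) -> str:
    """Stub: a real implementation would dispatch to a sub-LM."""
    return f"summary({len(prompt)} chars)"

def chunk_context(context: str, size: int) -> list[str]:
    """Fixed-size character chunking, a stand-in for token-aware chunking."""
    return [context[i:i + size] for i in range(0, len(context), size)]

def recursive_summarize(context: str, chunk_size: int = 10_000) -> str:
    # Small enough to answer directly
    if len(context) <= chunk_size:
        return llm_query(f"Summarize: {context}")
    # Map: query a sub-LM per chunk
    results = [llm_query(f"Summarize: {c}") for c in chunk_context(context, chunk_size)]
    # Reduce: synthesize the partial answers
    return llm_query(f"Synthesize: {results}")
```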
RLM Architecture Components
┌─────────────────────────────────────────────────────────────────┐
│ RLM ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Main Process (rlm.completion) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ 1. Load context as variable in REPL environment │ │
│ │ 2. Send prompt + context reference to model │ │
│ │ 3. Model generates code with llm_query() calls │ │
│ │ 4. Execute code in sandbox (LocalREPL/DockerREPL) │ │
│ │ 5. Feed results back to model │ │
│ │ 6. Repeat until FINAL() marker │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ LMHandler (Multi-threaded Server) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ • Routes llm_query() calls to appropriate provider │ │
│ │ • Tracks token usage per model │ │
│ │ • Handles batched queries in parallel │ │
│ │ • Socket communication for isolation │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Environment (REPL) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ • Safe builtins only (no eval/exec/open) │ │
│ │ • Persistent state across iterations │ │
│ │ • llm_query() injected as callable │ │
│ │ • Timeout enforcement │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
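The main-process loop (steps 1-6 above) can be condensed into a driver sketch. The model client and REPL below are toy stubs, not RLM's actual classes; in particular, RLM's real REPL is sandboxed (LocalREPL/DockerREPL) while this one is not:

```python
# Condensed sketch of the rlm.completion loop (steps 1-6 above).
class StubREPL:
    def __init__(self):
        self.vars = {}

    def set_variable(self, name, value):
        self.vars[name] = value

    def run(self, code: str):
        # Toy executor; RLM uses a restricted LocalREPL/DockerREPL instead.
        return eval(code, {}, self.vars)

class StubModel:
    def generate(self, transcript: str) -> str:
        # Always emit code that signals completion immediately.
        return "{'__final__': True, 'content': 'done'}"

def completion(prompt: str, context: str, model, repl, max_iters: int = 8) -> str:
    repl.set_variable("context", context)      # 1. context lives in the REPL
    transcript = prompt                        # 2. prompt references it by name
    for _ in range(max_iters):
        code = model.generate(transcript)      # 3. model emits code
        result = repl.run(code)                # 4. execute in the sandbox
        if isinstance(result, dict) and result.get("__final__"):
            return result["content"]           # 6. FINAL() marker ends the loop
        transcript += f"\n# result: {result}"  # 5. feed results back
    raise RuntimeError("no FINAL() marker within max_iters")
```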
Requirements
- Recursive Calls: Agents can spawn sub-agents for subtasks
- Context Isolation: Sub-agents don't inherit parent context
- Token Tracking: Usage aggregated across recursion levels
- Depth Limits: Configurable maximum recursion depth
- Chunking Utilities: Built-in strategies for large contexts
- TDD Compliance: Tests written before implementation
Decision
Implement a Recursive Agent Chain Architecture enabling agents to spawn sub-agents via an `agent_query()` function, modeled on RLM's `llm_query()` pattern.
1. Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ RECURSIVE AGENT CHAIN ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ORCHESTRATOR LAYER │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ AgentChainOrchestrator │ │
│ │ │ │
│ │ • Manages agent execution stack │ │
│ │ • Enforces recursion depth limits │ │
│ │ • Aggregates token usage across levels │ │
│ │ • Handles context isolation between agents │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ AGENT EXECUTION LAYER │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Parent Agent (Level 0) │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ Task: Analyze large codebase │ │ │
│ │ │ │ │ │
│ │ │ Strategy: │ │ │
│ │ │ 1. chunks = chunk_files(codebase, max_tokens=50000) │ │ │
│ │ │ 2. analyses = agent_query_batch(chunks, "analyze-code") │ │ │
│ │ │ 3. synthesis = agent_query(analyses, "synthesize") │ │ │
│ │ │ 4. FINAL(synthesis) │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Child Agents (Level 1) │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Chunk 1 │ │ Chunk 2 │ │ Chunk 3 │ │ │
│ │ │ Analysis │ │ Analysis │ │ Analysis │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ May spawn │ │ May spawn │ │ May spawn │ │ │
│ │ │ Level 2 │ │ Level 2 │ │ Level 2 │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ UTILITIES LAYER │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Chunking Strategies Context Management │ │
│ │ ┌────────────────────┐ ┌────────────────────┐ │ │
│ │ │ • Token-based │ │ • Versioned ctx │ │ │
│ │ │ • Structure-aware │ │ • History stacking │ │ │
│ │ │ • Semantic │ │ • Isolation │ │ │
│ │ │ • Overlap-window │ │ • Merging │ │ │
│ │ └────────────────────┘ └────────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
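The Level-0 strategy in the diagram, written out as the code a parent agent would emit. The helpers below are stubs: in the real system `agent_query`, `agent_query_batch`, and `FINAL` are injected by the orchestrator, and `chunk_files` is an illustrative name taken from the diagram, not a defined API:

```python
import asyncio

# Stub helpers; the real versions are injected into the agent's environment
# by the orchestrator. chunk_files is an illustrative name from the diagram.
def chunk_files(codebase: str, max_tokens: int) -> list[str]:
    size = max_tokens * 4  # ~4 chars/token heuristic used in this ADR
    return [codebase[i:i + size] for i in range(0, len(codebase), size)]

async def agent_query(prompt: str, agent_type: str) -> str:
    return f"[{agent_type}] {len(prompt)} chars analyzed"

async def agent_query_batch(prompts: list[str], agent_type: str) -> list[str]:
    return await asyncio.gather(*(agent_query(p, agent_type) for p in prompts))

def FINAL(content):
    return {"__final__": True, "content": content}

async def analyze_codebase(codebase: str) -> dict:
    chunks = chunk_files(codebase, max_tokens=50_000)            # 1. split input
    analyses = await agent_query_batch(
        [f"Analyze:\n{c}" for c in chunks], "analyze-code")      # 2. Level-1 children
    synthesis = await agent_query(
        f"Synthesize: {analyses}", "synthesize")                 # 3. reduce step
    return FINAL(synthesis)                                      # 4. signal completion
```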
2. Core Implementation
# scripts/core/recursive_agent_chain.py
from dataclasses import dataclass, field
from typing import List, Optional, Callable, Any
from enum import Enum
import asyncio
from scripts.core.usage_tracking import UsageTracker, UsageSummary
class AgentQueryType(Enum):
SINGLE = "single"
BATCH = "batch"
@dataclass
class AgentCallContext:
"""Context for a single agent call in the chain."""
depth: int
parent_task_id: str
child_index: Optional[int] = None
context_snapshot: Optional[str] = None
@dataclass
class AgentCallResult:
"""Result from an agent call."""
content: str
usage: UsageSummary
depth: int
duration_ms: float
child_calls: int = 0
@dataclass
class ChunkingConfig:
"""Configuration for context chunking."""
max_tokens: int = 50000
overlap_tokens: int = 500
strategy: str = "token" # token, structure, semantic
class RecursiveAgentChain:
"""
Recursive agent chain enabling nested agent calls.
Based on RLM's llm_query() pattern, adapted for CODITECT agents.
"""
MAX_DEPTH = 5
MAX_BATCH_SIZE = 10
def __init__(
self,
model_client,
usage_tracker: Optional[UsageTracker] = None
):
self.model_client = model_client
self.usage_tracker = usage_tracker or UsageTracker()
self.call_stack: List[AgentCallContext] = []
self.total_child_calls = 0
@property
def current_depth(self) -> int:
return len(self.call_stack)
async def execute(
self,
task: str,
context: str,
task_id: str,
agent_type: str = "general"
) -> AgentCallResult:
"""
Execute agent with recursive capabilities.
The agent can call:
- agent_query(prompt, agent_type) - Single sub-agent call
- agent_query_batch(prompts, agent_type) - Parallel sub-agent calls
- chunk_context(context, config) - Split large context
Args:
task: Task description
context: Context to analyze (can be large)
task_id: PILOT plan task ID
agent_type: Type of agent to use
Returns:
AgentCallResult with content and usage
"""
if self.current_depth >= self.MAX_DEPTH:
raise RecursionDepthError(f"Max depth {self.MAX_DEPTH} exceeded")
# Push call context
ctx = AgentCallContext(
depth=self.current_depth,
parent_task_id=task_id,
context_snapshot=context[:1000] if context else None
)
self.call_stack.append(ctx)
try:
# Inject agent_query functions into execution environment
env = self._create_execution_environment(task_id)
# Execute agent with injected functions
result = await self._execute_agent(task, context, agent_type, env)
return AgentCallResult(
content=result["content"],
usage=result["usage"],
depth=ctx.depth,
duration_ms=result["duration_ms"],
child_calls=self.total_child_calls
)
finally:
self.call_stack.pop()
def _create_execution_environment(self, parent_task_id: str) -> dict:
"""Create environment with agent_query functions."""
async def agent_query(prompt: str, agent_type: str = "general") -> str:
"""
Call a sub-agent with a prompt.
This function is injected into the agent's execution environment,
enabling recursive agent chains.
"""
self.total_child_calls += 1
child_task_id = f"{parent_task_id}.sub{self.total_child_calls}"
result = await self.execute(
task=prompt,
context="", # Sub-agents get no parent context
task_id=child_task_id,
agent_type=agent_type
)
return result.content
async def agent_query_batch(
prompts: List[str],
agent_type: str = "general"
) -> List[str]:
"""
Call multiple sub-agents in parallel.
Useful for processing chunks concurrently.
"""
if len(prompts) > self.MAX_BATCH_SIZE:
raise ValueError(f"Batch size {len(prompts)} exceeds max {self.MAX_BATCH_SIZE}")
tasks = [agent_query(p, agent_type) for p in prompts]
return await asyncio.gather(*tasks)
def chunk_context(
context: str,
config: Optional[ChunkingConfig] = None
) -> List[str]:
"""
Split large context into manageable chunks.
Strategies:
- token: Split by token count
- structure: Split by code structure (functions, classes)
- semantic: Split by semantic boundaries
"""
cfg = config or ChunkingConfig()
return self._chunk_by_strategy(context, cfg)
return {
"agent_query": agent_query,
"agent_query_batch": agent_query_batch,
"chunk_context": chunk_context,
"FINAL": lambda x: {"__final__": True, "content": x}
}
def _chunk_by_strategy(
self,
context: str,
config: ChunkingConfig
) -> List[str]:
"""Chunk context based on strategy."""
if config.strategy == "token":
return self._chunk_by_tokens(context, config.max_tokens, config.overlap_tokens)
elif config.strategy == "structure":
return self._chunk_by_structure(context)
elif config.strategy == "semantic":
return self._chunk_by_semantic(context, config.max_tokens)
else:
raise ValueError(f"Unknown chunking strategy: {config.strategy}")
def _chunk_by_tokens(
self,
context: str,
max_tokens: int,
overlap: int
) -> List[str]:
"""Split by approximate token count with overlap."""
# Rough estimate: 4 chars per token
chars_per_token = 4
max_chars = max_tokens * chars_per_token
overlap_chars = overlap * chars_per_token
chunks = []
start = 0
        while start < len(context):
            end = start + max_chars
            chunks.append(context[start:end])
            if end >= len(context):
                break  # avoid emitting a trailing duplicate chunk
            start = end - overlap_chars
        return chunks
def _chunk_by_structure(self, context: str) -> List[str]:
"""Split by code structure (functions, classes)."""
# Implementation would use AST parsing
# Placeholder: split by double newlines
return context.split("\n\n")
def _chunk_by_semantic(self, context: str, max_tokens: int) -> List[str]:
"""Split by semantic boundaries."""
# Implementation would use embeddings
# Placeholder: fall back to token chunking
return self._chunk_by_tokens(context, max_tokens, 500)
async def _execute_agent(
self,
task: str,
context: str,
agent_type: str,
env: dict
) -> dict:
"""Execute agent with environment."""
import time
start = time.time()
# Build prompt with available functions
prompt = f"""
You are a CODITECT agent with recursive capabilities.
## Task
{task}
## Context
The context is available as a variable. You can:
1. Examine it: `len(context)`, `context[:1000]`
2. Chunk it: `chunks = chunk_context(context, ChunkingConfig(max_tokens=50000))`
3. Query sub-agents: `result = await agent_query(prompt, "analyst")`
4. Batch queries: `results = await agent_query_batch(prompts, "analyst")`
5. Complete: `FINAL(your_answer)`
## Context Length
{len(context)} characters
## Available Context Preview
{context[:2000] if context else "No context provided"}
Generate code to accomplish the task. Use `FINAL(answer)` when done.
"""
# Call model
response = await self.model_client.complete(prompt, agent_type)
# Track usage
if response.get("usage"):
self.usage_tracker.record_usage(
model=response.get("model", "unknown"),
provider=response.get("provider", "unknown"),
input_tokens=response["usage"].get("input_tokens", 0),
output_tokens=response["usage"].get("output_tokens", 0)
)
duration = (time.time() - start) * 1000
return {
"content": response.get("content", ""),
"usage": self.usage_tracker.get_summary(),
"duration_ms": duration
}
class RecursionDepthError(Exception):
"""Raised when recursion depth limit exceeded."""
pass
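To make the overlap behavior concrete, this standalone sketch reproduces the `_chunk_by_tokens` logic (with an explicit break guard so the loop cannot emit a trailing duplicate chunk) and checks the invariant that adjacent chunks share an overlap window:

```python
# Standalone copy of the _chunk_by_tokens logic (4 chars/token heuristic),
# with a break guard so the loop cannot emit a trailing duplicate chunk.
def chunk_by_tokens(context: str, max_tokens: int, overlap: int) -> list[str]:
    max_chars, overlap_chars = max_tokens * 4, overlap * 4
    chunks, start = [], 0
    while start < len(context):
        end = start + max_chars
        chunks.append(context[start:end])
        if end >= len(context):
            break
        start = end - overlap_chars
    return chunks

chunks = chunk_by_tokens("ABCDEFGHIJ" * 1000, max_tokens=500, overlap=50)
# Adjacent chunks share a 200-char window (50 tokens * 4 chars/token)
assert all(chunks[i][-200:] == chunks[i + 1][:200] for i in range(len(chunks) - 1))
```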
3. Chunking Strategies Skill
# skills/chunking-strategies/SKILL.md
---
name: chunking-strategies
description: Strategies for decomposing large contexts for recursive agent processing
---
# Chunking Strategies
## When to Use
- Context exceeds 100K tokens
- Multi-file analysis required
- Large document processing
- Codebase-wide operations
## Available Strategies
### 1. Token-Based Chunking
Split by token count with configurable overlap:
```python
chunks = chunk_context(
context,
ChunkingConfig(
max_tokens=50000,
overlap_tokens=500,
strategy="token"
)
)
```

Use When: Simple splitting is sufficient, no structural requirements
2. Structure-Aware Chunking
Split by code structure (functions, classes, modules):
```python
chunks = chunk_context(
    context,
    ChunkingConfig(strategy="structure")
)
```
Use When: Processing code, want complete functions/classes per chunk
3. Semantic Chunking
Split by semantic boundaries using embeddings:
```python
chunks = chunk_context(
    context,
    ChunkingConfig(
        max_tokens=50000,
        strategy="semantic"
    )
)
```
Use When: Document analysis, want topically coherent chunks
Processing Patterns
Sequential Processing
```python
results = []
for chunk in chunks:
    result = await agent_query(f"Analyze: {chunk}", "analyst")
    results.append(result)
synthesis = await agent_query(f"Synthesize: {results}", "synthesizer")
FINAL(synthesis)
```
Parallel Processing
```python
prompts = [f"Analyze: {chunk}" for chunk in chunks]
results = await agent_query_batch(prompts, "analyst")
synthesis = await agent_query(f"Synthesize: {results}", "synthesizer")
FINAL(synthesis)
```
Hierarchical Processing
```python
# Level 1: Analyze chunks
chunk_summaries = await agent_query_batch(
    [f"Summarize: {c}" for c in chunks],
    "summarizer"
)
# Level 2: Group summaries
groups = group_by_topic(chunk_summaries, n_groups=5)
group_analyses = await agent_query_batch(
    [f"Analyze group: {g}" for g in groups],
    "analyst"
)
# Level 3: Final synthesis
FINAL(await agent_query(f"Final synthesis: {group_analyses}", "architect"))
```
Guidelines
- Overlap: Use 5-10% overlap to preserve context at boundaries
- Depth Limit: Max 5 levels of recursion (configurable)
- Batch Size: Max 10 parallel queries per batch
- Token Budget: Track usage across all levels
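The overlap guideline can be turned into a small helper. `overlap_for` and its `overlap_ratio` knob are illustrative assumptions, not part of the `ChunkingConfig` API defined in this ADR:

```python
# Illustrative helper for the 5-10% overlap guideline; the function name
# and ratio bounds are assumptions, not part of the ChunkingConfig API.
def overlap_for(max_tokens: int, overlap_ratio: float = 0.05) -> int:
    if not 0.05 <= overlap_ratio <= 0.10:
        raise ValueError("guideline suggests 5-10% overlap")
    return int(max_tokens * overlap_ratio)

overlap_for(50_000)  # 2500 overlap tokens at the 5% default
```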
4. TDD Test Specifications
```python
# tests/core/test_recursive_agent_chain.py
import pytest
from unittest.mock import AsyncMock, MagicMock
from scripts.core.recursive_agent_chain import (
RecursiveAgentChain,
ChunkingConfig,
RecursionDepthError
)
class TestRecursiveAgentChain:
"""TDD tests for recursive agent chain."""
@pytest.fixture
def mock_model_client(self):
client = AsyncMock()
client.complete.return_value = {
"content": "Analysis result",
"model": "claude-sonnet-4-5",
"provider": "anthropic",
"usage": {"input_tokens": 100, "output_tokens": 50}
}
return client
@pytest.mark.asyncio
async def test_basic_execution(self, mock_model_client):
"""RED→GREEN: Basic agent execution works."""
chain = RecursiveAgentChain(mock_model_client)
result = await chain.execute(
task="Analyze code",
context="def foo(): pass",
task_id="A.1.1"
)
assert result.content == "Analysis result"
assert result.depth == 0
@pytest.mark.asyncio
async def test_recursion_depth_limit(self, mock_model_client):
"""RED→GREEN: Recursion depth enforced."""
chain = RecursiveAgentChain(mock_model_client)
chain.MAX_DEPTH = 2
# Manually push contexts to simulate depth
chain.call_stack.append(MagicMock())
chain.call_stack.append(MagicMock())
with pytest.raises(RecursionDepthError):
await chain.execute("task", "context", "A.1.1")
def test_token_chunking(self, mock_model_client):
"""RED→GREEN: Token-based chunking works."""
chain = RecursiveAgentChain(mock_model_client)
context = "x" * 10000 # 10K chars
config = ChunkingConfig(max_tokens=1000, overlap_tokens=100, strategy="token")
chunks = chain._chunk_by_tokens(context, 1000, 100)
assert len(chunks) > 1
assert all(len(c) <= 4000 for c in chunks) # 1000 tokens * 4 chars
def test_chunking_overlap(self, mock_model_client):
"""RED→GREEN: Chunks have proper overlap."""
chain = RecursiveAgentChain(mock_model_client)
context = "ABCDEFGHIJ" * 1000 # 10K chars
chunks = chain._chunk_by_tokens(context, 500, 50) # 2000 chars, 200 overlap
# Verify overlap: end of chunk[i] should overlap with start of chunk[i+1]
for i in range(len(chunks) - 1):
overlap_start = chunks[i][-200:] # Last 200 chars of chunk i
next_start = chunks[i+1][:200] # First 200 chars of chunk i+1
assert overlap_start == next_start
@pytest.mark.asyncio
async def test_child_call_tracking(self, mock_model_client):
"""RED→GREEN: Child calls are tracked."""
chain = RecursiveAgentChain(mock_model_client)
# Execute with simulated child calls
result = await chain.execute("task", "context", "A.1.1")
# Child calls tracked via total_child_calls
assert isinstance(result.child_calls, int)
def test_execution_environment_has_functions(self, mock_model_client):
"""RED→GREEN: Environment includes agent_query functions."""
chain = RecursiveAgentChain(mock_model_client)
env = chain._create_execution_environment("A.1.1")
assert "agent_query" in env
assert "agent_query_batch" in env
assert "chunk_context" in env
assert "FINAL" in env
assert callable(env["agent_query"])
@pytest.mark.asyncio
async def test_usage_aggregation(self, mock_model_client):
"""RED→GREEN: Token usage aggregated across calls."""
chain = RecursiveAgentChain(mock_model_client)
chain.usage_tracker.start_session("test")
await chain.execute("task1", "context1", "A.1.1")
await chain.execute("task2", "context2", "A.1.2")
summary = chain.usage_tracker.get_summary()
assert summary.total_calls == 2
assert summary.total_input_tokens == 200 # 100 * 2
@pytest.mark.asyncio
async def test_batch_size_limit(self, mock_model_client):
"""RED→GREEN: Batch size enforced."""
chain = RecursiveAgentChain(mock_model_client)
chain.MAX_BATCH_SIZE = 5
env = chain._create_execution_environment("A.1.1")
agent_query_batch = env["agent_query_batch"]
with pytest.raises(ValueError, match="Batch size"):
            await agent_query_batch(["p"] * 10, "analyst")
```
Consequences
Positive
- Infinite Context: Handle arbitrarily large codebases via chunking
- Divide & Conquer: Complex tasks decomposed into subtasks
- Parallel Processing: Batch queries for speed
- Token Efficiency: Only process relevant chunks
- RLM Alignment: Pattern proven in production
Negative
- Complexity: Recursive systems harder to debug
- Cost: More API calls = more cost
- Latency: Sequential recursion adds latency
- Context Loss: Information may be lost at chunk boundaries
Mitigations
- Depth Limits: Configurable max recursion depth
- Cost Tracking: UsageTracker aggregates all costs
- Parallel Batching: Concurrent processing where possible
- Overlap Windows: Preserve context at boundaries
Implementation
Files to Create
| File | Purpose | LOC Est. |
|---|---|---|
| `scripts/core/recursive_agent_chain.py` | Core chain implementation | ~400 |
| `skills/chunking-strategies/SKILL.md` | Chunking patterns skill | ~200 |
| `tests/core/test_recursive_agent_chain.py` | TDD tests | ~300 |
Files to Modify
| File | Changes |
|---|---|
| `scripts/core/agent_dispatcher.py` | Integrate recursive chains |
| `config/agent-chain-config.json` | Depth limits, batch sizes |
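A sketch of what `config/agent-chain-config.json` might contain; the field names are assumptions mirroring the class constants and `ChunkingConfig` defaults above, not a finalized schema:

```json
{
  "max_depth": 5,
  "max_batch_size": 10,
  "chunking": {
    "max_tokens": 50000,
    "overlap_tokens": 500,
    "strategy": "token"
  }
}
```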
References
- Source Analysis: `submodules/rlm/rlm/core/rlm.py`
- RLM Environments: `submodules/rlm/rlm/environments/`
- Related ADRs: ADR-075 (Token Usage), ADR-078 (Context Isolation)
Author: CODITECT Architecture Team
Reviewers: Architecture Council
Source Commit: Analysis of submodules/rlm