ADR-077: Recursive Agent Chain Architecture
Status
Accepted - January 16, 2026
MoE Judges Score: 9.2/10 (APPROVED)
Context
Problem Statement
CODITECT agents face challenges with large contexts:
- Context Window Limits: Even 200K token windows hit limits on large codebases
- Single-Pass Analysis: Cannot iteratively refine understanding
- No Sub-Agent Calls: Agents cannot spawn nested agents for subtasks
- Context Pollution: Parent context leaks into child tasks
- No Chunking Strategy: Large documents processed as monoliths
Source Analysis: RLM Pattern
Analysis of submodules/rlm (alexzhang13/rlm) reveals a pattern that addresses these challenges directly:
```python
# RLM's Recursive Language Model pattern
# Instead of: llm.completion(huge_prompt)
# RLM does:   rlm.completion(prompt, context_as_variable)
# The model can then:
# 1. Examine context programmatically (len(context), context[:1000])
# 2. Break it into chunks (chunks = chunk_context(context, 10000))
# 3. Query sub-LMs: results = [llm_query(chunk) for chunk in chunks]
# 4. Aggregate: summary = llm_query(f"Synthesize: {results}")
# 5. Recursively call deeper if needed
```
Key Innovations:
- Context as a REPL variable, not embedded in prompt
- `llm_query()` function for nested LLM calls
- `llm_query_batched()` for parallel processing
- Automatic context accumulation across iterations
- `FINAL()` markers for completion signaling
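These innovations compose into a simple map-reduce loop. The sketch below shows only the control flow; `llm_query` is a stub standing in for RLM's real nested-LM call, and the character-based chunker is a stand-in for token-aware chunking:

```python
# Sketch of RLM's map-reduce pattern; llm_query is a stub standing in for
# a real nested-LM dispatch, so only the control flow is meaningful here.
def llm_query(prompt: str) -> str:
    """Stub: a real implementation would dispatch to a sub-LM."""
    return f"summary({len(prompt)} chars)"

def chunk_context(context: str, size: int) -> list[str]:
    """Fixed-size character chunking, a stand-in for token-aware chunking."""
    return [context[i:i + size] for i in range(0, len(context), size)]

def recursive_summarize(context: str, chunk_size: int = 10_000) -> str:
    # Small enough to answer directly
    if len(context) <= chunk_size:
        return llm_query(f"Summarize: {context}")
    # Map: query a sub-LM per chunk
    results = [llm_query(f"Summarize: {c}") for c in chunk_context(context, chunk_size)]
    # Reduce: synthesize the partial answers
    return llm_query(f"Synthesize: {results}")
```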
RLM Architecture Components
┌─────────────────────────────────────────────────────────────────┐
│ RLM ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Main Process (rlm.completion) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ 1. Load context as variable in REPL environment │ │
│ │ 2. Send prompt + context reference to model │ │
│ │ 3. Model generates code with llm_query() calls │ │
│ │ 4. Execute code in sandbox (LocalREPL/DockerREPL) │ │
│ │ 5. Feed results back to model │ │
│ │ 6. Repeat until FINAL() marker │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ LMHandler (Multi-threaded Server) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ • Routes llm_query() calls to appropriate provider │ │
│ │ • Tracks token usage per model │ │
│ │ • Handles batched queries in parallel │ │
│ │ • Socket communication for isolation │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Environment (REPL) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ • Safe builtins only (no eval/exec/open) │ │
│ │ • Persistent state across iterations │ │
│ │ • llm_query() injected as callable │ │
│ │ • Timeout enforcement │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
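The main-process loop (steps 1-6 above) can be condensed into a driver sketch. The model client and REPL below are toy stubs, not RLM's actual classes; in particular, RLM's real REPL is sandboxed (LocalREPL/DockerREPL) while this one is not:

```python
# Condensed sketch of the rlm.completion loop (steps 1-6 above).
class StubREPL:
    def __init__(self):
        self.vars = {}

    def set_variable(self, name, value):
        self.vars[name] = value

    def run(self, code: str):
        # Toy executor; RLM uses a restricted LocalREPL/DockerREPL instead.
        return eval(code, {}, self.vars)

class StubModel:
    def generate(self, transcript: str) -> str:
        # Always emit code that signals completion immediately.
        return "{'__final__': True, 'content': 'done'}"

def completion(prompt: str, context: str, model, repl, max_iters: int = 8) -> str:
    repl.set_variable("context", context)      # 1. context lives in the REPL
    transcript = prompt                        # 2. prompt references it by name
    for _ in range(max_iters):
        code = model.generate(transcript)      # 3. model emits code
        result = repl.run(code)                # 4. execute in the sandbox
        if isinstance(result, dict) and result.get("__final__"):
            return result["content"]           # 6. FINAL() marker ends the loop
        transcript += f"\n# result: {result}"  # 5. feed results back
    raise RuntimeError("no FINAL() marker within max_iters")
```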
Requirements
- Recursive Calls: Agents can spawn sub-agents for subtasks
- Context Isolation: Sub-agents don't inherit parent context
- Token Tracking: Usage aggregated across recursion levels
- Depth Limits: Configurable maximum recursion depth
- Chunking Utilities: Built-in strategies for large contexts
- TDD Compliance: Tests written before implementation
Decision
Implement a Recursive Agent Chain Architecture enabling agents to spawn sub-agents via an `agent_query()` function, modeled on RLM's `llm_query()` pattern.
1. Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│ RECURSIVE AGENT CHAIN ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ORCHESTRATOR LAYER │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ AgentChainOrchestrator │ │
│ │ │ │
│ │ • Manages agent execution stack │ │
│ │ • Enforces recursion depth limits │ │
│ │ • Aggregates token usage across levels │ │
│ │ • Handles context isolation between agents │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ AGENT EXECUTION LAYER │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Parent Agent (Level 0) │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ Task: Analyze large codebase │ │ │
│ │ │ │ │ │
│ │ │ Strategy: │ │ │
│ │ │ 1. chunks = chunk_files(codebase, max_tokens=50000) │ │ │
│ │ │ 2. analyses = agent_query_batch(chunks, "analyze-code") │ │ │
│ │ │ 3. synthesis = agent_query(analyses, "synthesize") │ │ │
│ │ │ 4. FINAL(synthesis) │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Child Agents (Level 1) │ │
│ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │
│ │ │ Chunk 1 │ │ Chunk 2 │ │ Chunk 3 │ │ │
│ │ │ Analysis │ │ Analysis │ │ Analysis │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ May spawn │ │ May spawn │ │ May spawn │ │ │
│ │ │ Level 2 │ │ Level 2 │ │ Level 2 │ │ │
│ │ └────────────┘ └────────────┘ └────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ UTILITIES LAYER │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Chunking Strategies Context Management │ │
│ │ ┌────────────────────┐ ┌────────────────────┐ │ │
│ │ │ • Token-based │ │ • Versioned ctx │ │ │
│ │ │ • Structure-aware │ │ • History stacking │ │ │
│ │ │ • Semantic │ │ • Isolation │ │ │
│ │ │ • Overlap-window │ │ • Merging │ │ │
│ │ └────────────────────┘ └────────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
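The Level-0 strategy in the diagram, written out as the code a parent agent would emit. The helpers below are stubs: in the real system `agent_query`, `agent_query_batch`, and `FINAL` are injected by the orchestrator, and `chunk_files` is an illustrative name taken from the diagram, not a defined API:

```python
import asyncio

# Stub helpers; the real versions are injected into the agent's environment
# by the orchestrator. chunk_files is an illustrative name from the diagram.
def chunk_files(codebase: str, max_tokens: int) -> list[str]:
    size = max_tokens * 4  # ~4 chars/token heuristic used in this ADR
    return [codebase[i:i + size] for i in range(0, len(codebase), size)]

async def agent_query(prompt: str, agent_type: str) -> str:
    return f"[{agent_type}] {len(prompt)} chars analyzed"

async def agent_query_batch(prompts: list[str], agent_type: str) -> list[str]:
    return await asyncio.gather(*(agent_query(p, agent_type) for p in prompts))

def FINAL(content):
    return {"__final__": True, "content": content}

async def analyze_codebase(codebase: str) -> dict:
    chunks = chunk_files(codebase, max_tokens=50_000)            # 1. split input
    analyses = await agent_query_batch(
        [f"Analyze:\n{c}" for c in chunks], "analyze-code")      # 2. Level-1 children
    synthesis = await agent_query(
        f"Synthesize: {analyses}", "synthesize")                 # 3. reduce step
    return FINAL(synthesis)                                      # 4. signal completion
```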
2. Core Implementation
# scripts/core/recursive_agent_chain.py
from dataclasses import dataclass, field
from typing import List, Optional, Callable, Any
from enum import Enum
import asyncio
from scripts.core.usage_tracking import UsageTracker, UsageSummary
class AgentQueryType(Enum):
SINGLE = "single"
BATCH = "batch"
@dataclass
class AgentCallContext:
"""Context for a single agent call in the chain."""
depth: int
parent_task_id: str
child_index: Optional[int] = None
context_snapshot: Optional[str] = None
@dataclass
class AgentCallResult:
"""Result from an agent call."""
content: str
usage: UsageSummary
depth: int
duration_ms: float
child_calls: int = 0
@dataclass
class ChunkingConfig:
"""Configuration for context chunking."""
max_tokens: int = 50000
overlap_tokens: int = 500
strategy: str = "token" # token, structure, semantic
class RecursiveAgentChain:
"""
Recursive agent chain enabling nested agent calls.
Based on RLM's llm_query() pattern, adapted for CODITECT agents.
"""
MAX_DEPTH = 5
MAX_BATCH_SIZE = 10
def __init__(
self,
model_client,
usage_tracker: Optional[UsageTracker] = None
):
self.model_client = model_client
self.usage_tracker = usage_tracker or UsageTracker()
self.call_stack: List[AgentCallContext] = []
self.total_child_calls = 0
@property
def current_depth(self) -> int:
return len(self.call_stack)
async def execute(
self,
task: str,
context: str,
task_id: str,
agent_type: str = "general"
) -> AgentCallResult:
"""
Execute agent with recursive capabilities.
The agent can call:
- agent_query(prompt, agent_type) - Single sub-agent call
- agent_query_batch(prompts, agent_type) - Parallel sub-agent calls
- chunk_context(context, config) - Split large context
Args:
task: Task description
context: Context to analyze (can be large)
task_id: PILOT plan task ID
agent_type: Type of agent to use
Returns:
AgentCallResult with content and usage
"""
if self.current_depth >= self.MAX_DEPTH:
raise RecursionDepthError(f"Max depth {self.MAX_DEPTH} exceeded")
# Push call context
ctx = AgentCallContext(
depth=self.current_depth,
parent_task_id=task_id,
context_snapshot=context[:1000] if context else None
)
self.call_stack.append(ctx)
try:
# Inject agent_query functions into execution environment
env = self._create_execution_environment(task_id)
# Execute agent with injected functions
result = await self._execute_agent(task, context, agent_type, env)
return AgentCallResult(
content=result["content"],
usage=result["usage"],
depth=ctx.depth,
duration_ms=result["duration_ms"],
child_calls=self.total_child_calls
)
finally:
self.call_stack.pop()
def _create_execution_environment(self, parent_task_id: str) -> dict:
"""Create environment with agent_query functions."""
async def agent_query(prompt: str, agent_type: str = "general") -> str:
"""
Call a sub-agent with a prompt.
This function is injected into the agent's execution environment,
enabling recursive agent chains.
"""
self.total_child_calls += 1
child_task_id = f"{parent_task_id}.sub{self.total_child_calls}"
result = await self.execute(
task=prompt,
context="", # Sub-agents get no parent context
task_id=child_task_id,
agent_type=agent_type
)
return result.content
async def agent_query_batch(
prompts: List[str],
agent_type: str = "general"
) -> List[str]:
"""
Call multiple sub-agents in parallel.
Useful for processing chunks concurrently.
"""
if len(prompts) > self.MAX_BATCH_SIZE:
raise ValueError(f"Batch size {len(prompts)} exceeds max {self.MAX_BATCH_SIZE}")
tasks = [agent_query(p, agent_type) for p in prompts]
return await asyncio.gather(*tasks)
def chunk_context(
context: str,
config: Optional[ChunkingConfig] = None
) -> List[str]:
"""
Split large context into manageable chunks.
Strategies:
- token: Split by token count
- structure: Split by code structure (functions, classes)
- semantic: Split by semantic boundaries
"""
cfg = config or ChunkingConfig()
return self._chunk_by_strategy(context, cfg)
return {
"agent_query": agent_query,
"agent_query_batch": agent_query_batch,
"chunk_context": chunk_context,
"FINAL": lambda x: {"__final__": True, "content": x}
}
def _chunk_by_strategy(
self,
context: str,
config: ChunkingConfig
) -> List[str]:
"""Chunk context based on strategy."""
if config.strategy == "token":
return self._chunk_by_tokens(context, config.max_tokens, config.overlap_tokens)
elif config.strategy == "structure":
return self._chunk_by_structure(context)
elif config.strategy == "semantic":
return self._chunk_by_semantic(context, config.max_tokens)
else:
raise ValueError(f"Unknown chunking strategy: {config.strategy}")
def _chunk_by_tokens(
self,
context: str,
max_tokens: int,
overlap: int
) -> List[str]:
"""Split by approximate token count with overlap."""
# Rough estimate: 4 chars per token
chars_per_token = 4
max_chars = max_tokens * chars_per_token
overlap_chars = overlap * chars_per_token
chunks = []
start = 0
        while start < len(context):
            end = start + max_chars
            chunks.append(context[start:end])
            if end >= len(context):
                break  # avoid emitting a trailing duplicate chunk
            start = end - overlap_chars
        return chunks
def _chunk_by_structure(self, context: str) -> List[str]:
"""Split by code structure (functions, classes)."""
# Implementation would use AST parsing
# Placeholder: split by double newlines
return context.split("\n\n")
def _chunk_by_semantic(self, context: str, max_tokens: int) -> List[str]:
"""Split by semantic boundaries."""
# Implementation would use embeddings
# Placeholder: fall back to token chunking
return self._chunk_by_tokens(context, max_tokens, 500)
async def _execute_agent(
self,
task: str,
context: str,
agent_type: str,
env: dict
) -> dict:
"""Execute agent with environment."""
import time
start = time.time()
# Build prompt with available functions
prompt = f"""
You are a CODITECT agent with recursive capabilities.
## Task
{task}
## Context
The context is available as a variable. You can:
1. Examine it: `len(context)`, `context[:1000]`
2. Chunk it: `chunks = chunk_context(context, ChunkingConfig(max_tokens=50000))`
3. Query sub-agents: `result = await agent_query(prompt, "analyst")`
4. Batch queries: `results = await agent_query_batch(prompts, "analyst")`
5. Complete: `FINAL(your_answer)`
## Context Length
{len(context)} characters
## Available Context Preview
{context[:2000] if context else "No context provided"}
Generate code to accomplish the task. Use `FINAL(answer)` when done.
"""
# Call model
response = await self.model_client.complete(prompt, agent_type)
# Track usage
if response.get("usage"):
self.usage_tracker.record_usage(
model=response.get("model", "unknown"),
provider=response.get("provider", "unknown"),
input_tokens=response["usage"].get("input_tokens", 0),
output_tokens=response["usage"].get("output_tokens", 0)
)
duration = (time.time() - start) * 1000
return {
"content": response.get("content", ""),
"usage": self.usage_tracker.get_summary(),
"duration_ms": duration
}
class RecursionDepthError(Exception):
"""Raised when recursion depth limit exceeded."""
pass
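To make the overlap behavior concrete, this standalone sketch reproduces the `_chunk_by_tokens` logic (with an explicit break guard so the loop cannot emit a trailing duplicate chunk) and checks the invariant that adjacent chunks share an overlap window:

```python
# Standalone copy of the _chunk_by_tokens logic (4 chars/token heuristic),
# with a break guard so the loop cannot emit a trailing duplicate chunk.
def chunk_by_tokens(context: str, max_tokens: int, overlap: int) -> list[str]:
    max_chars, overlap_chars = max_tokens * 4, overlap * 4
    chunks, start = [], 0
    while start < len(context):
        end = start + max_chars
        chunks.append(context[start:end])
        if end >= len(context):
            break
        start = end - overlap_chars
    return chunks

chunks = chunk_by_tokens("ABCDEFGHIJ" * 1000, max_tokens=500, overlap=50)
# Adjacent chunks share a 200-char window (50 tokens * 4 chars/token)
assert all(chunks[i][-200:] == chunks[i + 1][:200] for i in range(len(chunks) - 1))
```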
3. Chunking Strategies Skill
# skills/chunking-strategies/SKILL.md
---
name: chunking-strategies
description: Strategies for decomposing large contexts for recursive agent processing
---
# Chunking Strategies
## When to Use
- Context exceeds 100K tokens
- Multi-file analysis required
- Large document processing
- Codebase-wide operations
## Available Strategies
### 1. Token-Based Chunking
Split by token count with configurable overlap:
```python
chunks = chunk_context(
context,
ChunkingConfig(
max_tokens=50000,
overlap_tokens=500,
strategy="token"
)
)
```

Use When: Simple splitting is sufficient, no structural requirements
2. Structure-Aware Chunking
Split by code structure (functions, classes, modules):
```python
chunks = chunk_context(
    context,
    ChunkingConfig(strategy="structure")
)
```
Use When: Processing code, want complete functions/classes per chunk
3. Semantic Chunking
Split by semantic boundaries using embeddings:
```python
chunks = chunk_context(
    context,
    ChunkingConfig(
        max_tokens=50000,
        strategy="semantic"
    )
)
```
Use When: Document analysis, want topically coherent chunks
Processing Patterns
Sequential Processing
```python
results = []
for chunk in chunks:
    result = await agent_query(f"Analyze: {chunk}", "analyst")
    results.append(result)
synthesis = await agent_query(f"Synthesize: {results}", "synthesizer")
FINAL(synthesis)
```
Parallel Processing
```python
prompts = [f"Analyze: {chunk}" for chunk in chunks]
results = await agent_query_batch(prompts, "analyst")
synthesis = await agent_query(f"Synthesize: {results}", "synthesizer")
FINAL(synthesis)
```
Hierarchical Processing
```python
# Level 1: Analyze chunks
chunk_summaries = await agent_query_batch(
    [f"Summarize: {c}" for c in chunks],
    "summarizer"
)
# Level 2: Group summaries
groups = group_by_topic(chunk_summaries, n_groups=5)
group_analyses = await agent_query_batch(
    [f"Analyze group: {g}" for g in groups],
    "analyst"
)
# Level 3: Final synthesis
FINAL(await agent_query(f"Final synthesis: {group_analyses}", "architect"))
```
Guidelines
- Overlap: Use 5-10% overlap to preserve context at boundaries
- Depth Limit: Max 5 levels of recursion (configurable)
- Batch Size: Max 10 parallel queries per batch
- Token Budget: Track usage across all levels
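The overlap guideline can be turned into a small helper. `overlap_for` and its `overlap_ratio` knob are illustrative assumptions, not part of the `ChunkingConfig` API defined in this ADR:

```python
# Illustrative helper for the 5-10% overlap guideline; the function name
# and ratio bounds are assumptions, not part of the ChunkingConfig API.
def overlap_for(max_tokens: int, overlap_ratio: float = 0.05) -> int:
    if not 0.05 <= overlap_ratio <= 0.10:
        raise ValueError("guideline suggests 5-10% overlap")
    return int(max_tokens * overlap_ratio)

overlap_for(50_000)  # 2500 overlap tokens at the 5% default
```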
4. TDD Test Specifications
```python
# tests/core/test_recursive_agent_chain.py
import pytest
from unittest.mock import AsyncMock, MagicMock
from scripts.core.recursive_agent_chain import (
RecursiveAgentChain,
ChunkingConfig,
RecursionDepthError
)
class TestRecursiveAgentChain:
"""TDD tests for recursive agent chain."""
@pytest.fixture
def mock_model_client(self):
client = AsyncMock()
client.complete.return_value = {
"content": "Analysis result",
"model": "claude-sonnet-4-5",
"provider": "anthropic",
"usage": {"input_tokens": 100, "output_tokens": 50}
}
return client
@pytest.mark.asyncio
async def test_basic_execution(self, mock_model_client):
"""RED→GREEN: Basic agent execution works."""
chain = RecursiveAgentChain(mock_model_client)
result = await chain.execute(
task="Analyze code",
context="def foo(): pass",
task_id="A.1.1"
)
assert result.content == "Analysis result"
assert result.depth == 0
@pytest.mark.asyncio
async def test_recursion_depth_limit(self, mock_model_client):
"""RED→GREEN: Recursion depth enforced."""
chain = RecursiveAgentChain(mock_model_client)
chain.MAX_DEPTH = 2
# Manually push contexts to simulate depth
chain.call_stack.append(MagicMock())
chain.call_stack.append(MagicMock())
with pytest.raises(RecursionDepthError):
await chain.execute("task", "context", "A.1.1")
def test_token_chunking(self, mock_model_client):
"""RED→GREEN: Token-based chunking works."""
chain = RecursiveAgentChain(mock_model_client)
context = "x" * 10000 # 10K chars
config = ChunkingConfig(max_tokens=1000, overlap_tokens=100, strategy="token")
chunks = chain._chunk_by_tokens(context, 1000, 100)
assert len(chunks) > 1
assert all(len(c) <= 4000 for c in chunks) # 1000 tokens * 4 chars
def test_chunking_overlap(self, mock_model_client):
"""RED→GREEN: Chunks have proper overlap."""
chain = RecursiveAgentChain(mock_model_client)
context = "ABCDEFGHIJ" * 1000 # 10K chars
chunks = chain._chunk_by_tokens(context, 500, 50) # 2000 chars, 200 overlap
# Verify overlap: end of chunk[i] should overlap with start of chunk[i+1]
for i in range(len(chunks) - 1):
overlap_start = chunks[i][-200:] # Last 200 chars of chunk i
next_start = chunks[i+1][:200] # First 200 chars of chunk i+1
assert overlap_start == next_start
@pytest.mark.asyncio
async def test_child_call_tracking(self, mock_model_client):
"""RED→GREEN: Child calls are tracked."""
chain = RecursiveAgentChain(mock_model_client)
# Execute with simulated child calls
result = await chain.execute("task", "context", "A.1.1")
# Child calls tracked via total_child_calls
assert isinstance(result.child_calls, int)
def test_execution_environment_has_functions(self, mock_model_client):
"""RED→GREEN: Environment includes agent_query functions."""
chain = RecursiveAgentChain(mock_model_client)
env = chain._create_execution_environment("A.1.1")
assert "agent_query" in env
assert "agent_query_batch" in env
assert "chunk_context" in env
assert "FINAL" in env
assert callable(env["agent_query"])
@pytest.mark.asyncio
async def test_usage_aggregation(self, mock_model_client):
"""RED→GREEN: Token usage aggregated across calls."""
chain = RecursiveAgentChain(mock_model_client)
chain.usage_tracker.start_session("test")
await chain.execute("task1", "context1", "A.1.1")
await chain.execute("task2", "context2", "A.1.2")
summary = chain.usage_tracker.get_summary()
assert summary.total_calls == 2
assert summary.total_input_tokens == 200 # 100 * 2
@pytest.mark.asyncio
async def test_batch_size_limit(self, mock_model_client):
"""RED→GREEN: Batch size enforced."""
chain = RecursiveAgentChain(mock_model_client)
chain.MAX_BATCH_SIZE = 5
env = chain._create_execution_environment("A.1.1")
agent_query_batch = env["agent_query_batch"]
with pytest.raises(ValueError, match="Batch size"):
            await agent_query_batch(["p"] * 10, "analyst")
```
Consequences
Positive
- Infinite Context: Handle arbitrarily large codebases via chunking
- Divide & Conquer: Complex tasks decomposed into subtasks
- Parallel Processing: Batch queries for speed
- Token Efficiency: Only process relevant chunks
- RLM Alignment: Pattern proven in production
Negative
- Complexity: Recursive systems harder to debug
- Cost: More API calls = more cost
- Latency: Sequential recursion adds latency
- Context Loss: Information may be lost at chunk boundaries
Mitigations
- Depth Limits: Configurable max recursion depth
- Cost Tracking: UsageTracker aggregates all costs
- Parallel Batching: Concurrent processing where possible
- Overlap Windows: Preserve context at boundaries
Implementation
Files to Create
| File | Purpose | LOC Est. |
|---|---|---|
| `scripts/core/recursive_agent_chain.py` | Core chain implementation | ~400 |
| `skills/chunking-strategies/SKILL.md` | Chunking patterns skill | ~200 |
| `tests/core/test_recursive_agent_chain.py` | TDD tests | ~300 |
Files to Modify
| File | Changes |
|---|---|
| `scripts/core/agent_dispatcher.py` | Integrate recursive chains |
| `config/agent-chain-config.json` | Depth limits, batch sizes |
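A sketch of what `config/agent-chain-config.json` might contain; the field names are assumptions mirroring the class constants and `ChunkingConfig` defaults above, not a finalized schema:

```json
{
  "max_depth": 5,
  "max_batch_size": 10,
  "chunking": {
    "max_tokens": 50000,
    "overlap_tokens": 500,
    "strategy": "token"
  }
}
```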
References
- Source Analysis: `submodules/rlm/rlm/core/rlm.py`
- RLM Environments: `submodules/rlm/rlm/environments/`
- Related ADRs: ADR-075 (Token Usage), ADR-078 (Context Isolation)
Author: CODITECT Architecture Team
Reviewers: Architecture Council
Source Commit: Analysis of submodules/rlm