Context Database Enhancement Analysis

Author: Claude Code Analysis
Date: 2025-12-22
Status: Proposed
Priority: P1 - Critical for MoE Agent Continuity

Executive Summary

The CODITECT context database has a significant gap: agent session outputs enter the database only when /cx is run manually, and even then their agent-specific metadata is discarded. This analysis identifies the root causes and proposes targeted enhancements to context-db.py and unified-message-extractor.py.

Key Findings

Finding	Impact	Status
Agent JSONL files ARE discoverable (505 files)	N/A	✅ Working
Agent JSONL files CAN be extracted (19 messages from test file)	N/A	✅ Working
Agent-specific metadata NOT preserved (agentId, parentUuid, etc.)	Loss of agent lineage tracking	❌ Gap
Only 14 agent messages in 94K total messages (0.01%)	Agent work effectively invisible to queries	❌ Gap
Batch `/cx` not run after recent agent sessions	New content not indexed	❌ Gap

Root Cause

The extraction pipeline works correctly but has two gaps:

No automatic capture on agent completion - Agent outputs only enter the database if /cx is manually run
Agent metadata loss - Fields like agentId, parentUuid, isSidechain are not preserved

Current Architecture

Agent Runs → ~/.claude/projects/agent-*.jsonl (stored)
                    ↓
              [MANUAL] User runs /cx
                    ↓
unified-message-extractor.py → context-storage/unified_messages.jsonl
                    ↓
context-db.py index_messages() → context.db (SQLite FTS5)
                    ↓
              /cxq queries

Statistics

Metric	Value
Total JSONL files	1,040
Agent JSONL files	505 (48.6%)
Main session files	535 (51.4%)
unified_messages.jsonl lines	94,147
Messages with agentId	14 (0.01%)
Last extraction	Dec 22, 2025 15:05
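The file counts above can be re-derived with a short Python sketch. It assumes, as the examples in this document suggest, that agent transcripts are named agent-<id>.jsonl and that every other *.jsonl file under ~/.claude/projects is a main session. The function names are hypothetical and not part of the existing scripts.

```python
from pathlib import Path
from typing import Iterable


def summarize_transcripts(paths: Iterable[Path]) -> dict:
    """Split transcript paths into agent vs. main-session counts and shares."""
    names = [p.name for p in paths]
    total = len(names)
    # Assumption: agent transcripts are named agent-<id>.jsonl.
    agent = sum(1 for name in names if name.startswith("agent-"))
    main = total - agent

    def pct(n: int) -> float:
        return round(100 * n / total, 1) if total else 0.0

    return {
        "total": total,
        "agent": agent,
        "main": main,
        "agent_pct": pct(agent),
        "main_pct": pct(main),
    }


def transcript_stats(projects_root: Path) -> dict:
    """Scan every *.jsonl file below projects_root (recursively)."""
    return summarize_transcripts(projects_root.rglob("*.jsonl"))


if __name__ == "__main__":
    print(transcript_stats(Path.home() / ".claude" / "projects"))
```

Separating the pure counting step from the filesystem scan keeps the percentage logic testable without a real ~/.claude directory.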

Gap Analysis

Gap 1: Agent Metadata Not Preserved

Agent JSONL entry structure:

{
  "parentUuid": "b101bc78-e0c9-4824-b815-3e94f6cc3ba3",
  "isSidechain": true,
  "userType": "external",
  "sessionId": "60cacdf6-403f-4350-8abc-42111016a762",
  "agentId": "ab782dc",
  "slug": "kind-chasing-hennessy",
  "type": "user",
  "message": { "role": "user", "content": "..." },
  "uuid": "62a64845-97ca-4ee9-967b-930196b50567",
  "timestamp": "2025-12-23T03:52:49.876Z"
}

Unified message format (current):

{
  "hash": "...",
  "content": "...",
  "role": "user",
  "provenance": {
    "source_file": "/.../agent-ab782dc.jsonl",
    "session_id": "agent-ab782dc",  // ← Derived from filename only
    "source_line": 1
  }
}

Lost metadata:

parentUuid - Links to parent session conversation
agentId - Unique agent identifier
isSidechain - Whether this is a branching conversation
uuid - Individual message UUID for threading
slug - Human-readable session name
version - Claude Code version that ran the agent

Gap 2: No Automatic Agent Capture

Agent sessions complete without triggering context extraction. Users must manually run /cx to capture outputs, which is easily forgotten.

Impact: MoE expert panel outputs from the 2025-12-22 session were NOT in the database until they were manually extracted during this analysis.

Gap 3: Query Limitations for Agent Content

Current context-db.py queries don't distinguish:

Agent vs main session messages
Parent-child relationships between sessions
Sidechain conversations

Enhancement Proposals

Enhancement 1: Preserve Agent Metadata (P0)

File: scripts/unified-message-extractor.py
Functions: _parse_entry() and create_unified_message()

Change: Add agent-specific fields to unified message format.

New unified format:

{
  "hash": "...",
  "content": "...",
  "role": "user",
  "provenance": {
    "source_type": "jsonl",
    "source_file": "...",
    "session_id": "60cacdf6-403f-4350-8abc-42111016a762",
    "source_line": 1
  },
  "agent_context": {
    "agent_id": "ab782dc",
    "parent_uuid": "b101bc78-e0c9-4824-b815-3e94f6cc3ba3",
    "is_sidechain": true,
    "uuid": "62a64845-97ca-4ee9-967b-930196b50567",
    "slug": "kind-chasing-hennessy"
  }
}

Implementation notes:

Add agent_context field in create_unified_message()
Extract from entry in _parse_entry() when fields present
Set agent_context to null for main session messages (entries without an agentId)
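A minimal Python sketch of the extraction step described in these notes. The helpers extract_agent_context() and build_unified_message() are hypothetical stand-ins for the real _parse_entry() and create_unified_message(), whose signatures are not shown here. The SHA-256 content hash is also an assumption, since the actual hashing scheme is not documented.

```python
import hashlib
from pathlib import Path
from typing import Any, Optional

# Claude Code JSONL keys (camelCase) -> proposed agent_context keys (snake_case).
AGENT_FIELDS = {
    "agentId": "agent_id",
    "parentUuid": "parent_uuid",
    "isSidechain": "is_sidechain",
    "uuid": "uuid",
    "slug": "slug",
}


def extract_agent_context(entry: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the agent_context block, or None for main-session entries."""
    if not entry.get("agentId"):
        return None  # no agentId: treat as a main session message
    return {new: entry.get(old) for old, new in AGENT_FIELDS.items()}


def build_unified_message(entry: dict[str, Any], content: str,
                          source_file: str, source_line: int) -> dict[str, Any]:
    """Hypothetical stand-in for create_unified_message() with agent_context added."""
    role = entry.get("message", {}).get("role") or entry.get("type")
    return {
        # Assumption: SHA-256 over role + content; the real scheme may differ.
        "hash": hashlib.sha256(f"{role}:{content}".encode("utf-8")).hexdigest(),
        "content": content,
        "role": role,
        "provenance": {
            "source_type": "jsonl",
            "source_file": source_file,
            # Prefer the real sessionId; fall back to the filename (current behavior).
            "session_id": entry.get("sessionId") or Path(source_file).stem,
            "source_line": source_line,
        },
        "agent_context": extract_agent_context(entry),
    }
```

Keeping the camelCase-to-snake_case mapping in one table means preserving another field later (for example version) is a one-line change.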

Enhancement 2: Add Agent Query Support (P0)

File: scripts/context-db.py

New flags:

/cxq --agents              # Show all agent messages only
/cxq --agent-id ab782dc    # Messages from specific agent
/cxq --parent-session UUID # Messages linked to parent session
/cxq --sidechains          # Show sidechain conversations
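One way the proposed flags could translate into SQL filters, sketched with argparse. It assumes the agent_context JSON column from the schema change that follows, plus a hypothetical session_id column holding provenance.session_id, so that --parent-session selects agent messages sharing the parent's sessionId. make_parser() and build_agent_filter() are illustrative names, not existing functions in context-db.py.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the proposed /cxq agent flags."""
    parser = argparse.ArgumentParser(prog="cxq")
    parser.add_argument("--agents", action="store_true",
                        help="show agent messages only")
    parser.add_argument("--agent-id", help="messages from a specific agent")
    parser.add_argument("--parent-session", metavar="UUID",
                        help="agent messages linked to a parent session")
    parser.add_argument("--sidechains", action="store_true",
                        help="show sidechain conversations")
    return parser


def build_agent_filter(args: argparse.Namespace) -> tuple[str, list[str]]:
    """Translate the flags into a parameterized SQL WHERE clause."""
    clauses: list[str] = []
    params: list[str] = []
    if args.agents:
        clauses.append("agent_context IS NOT NULL")
    if args.agent_id:
        clauses.append("json_extract(agent_context, '$.agent_id') = ?")
        params.append(args.agent_id)
    if args.parent_session:
        # Assumption: messages stores provenance.session_id in a session_id column.
        clauses.append("agent_context IS NOT NULL AND session_id = ?")
        params.append(args.parent_session)
    if args.sidechains:
        # json_extract() returns JSON true as the integer 1.
        clauses.append("json_extract(agent_context, '$.is_sidechain') = 1")
    return (" AND ".join(clauses) if clauses else "1"), params
```

Values are bound as parameters rather than interpolated into the SQL string, so an agent ID or UUID taken from the command line cannot alter the query.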

Schema change (new column):

ALTER TABLE messages ADD COLUMN agent_context TEXT;  -- JSON blob
CREATE INDEX idx_agent_id ON messages(json_extract(agent_context, '$.agent_id'));
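A sketch of applying this schema change idempotently with Python's sqlite3 module, then querying by agent ID. It assumes messages is a regular table: SQLite rejects both ALTER TABLE ADD COLUMN and CREATE INDEX on a virtual table, so if messages is the FTS5 table itself, the column would need to live in a companion table. It also requires SQLite's JSON functions (built in since 3.38.0). The toy messages schema in the demo is hypothetical.

```python
import json
import sqlite3


def migrate(conn: sqlite3.Connection) -> None:
    """Add agent_context and its expression index; safe to run repeatedly."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(messages)")}
    if "agent_context" not in columns:
        conn.execute("ALTER TABLE messages ADD COLUMN agent_context TEXT")  # JSON blob
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_agent_id "
        "ON messages(json_extract(agent_context, '$.agent_id'))"
    )
    conn.commit()


def messages_for_agent(conn: sqlite3.Connection, agent_id: str) -> list[str]:
    """Return message contents for one agent; repeats the indexed expression."""
    rows = conn.execute(
        "SELECT content FROM messages "
        "WHERE json_extract(agent_context, '$.agent_id') = ? ORDER BY rowid",
        (agent_id,),
    )
    return [content for (content,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal stand-in for the real messages table.
    conn.execute("CREATE TABLE messages (hash TEXT PRIMARY KEY, content TEXT, role TEXT)")
    migrate(conn)
    migrate(conn)  # second run is a no-op
    conn.execute(
        "INSERT INTO messages (hash, content, role, agent_context) VALUES (?, ?, ?, ?)",
        ("h1", "judge verdict", "assistant", json.dumps({"agent_id": "ab782dc"})),
    )
    print(messages_for_agent(conn, "ab782dc"))
```

SQLite only uses an expression index when the query repeats the indexed expression exactly, which is why messages_for_agent() spells out json_extract(agent_context, '$.agent_id') rather than a different path form.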

Enhancement 3: Auto-Capture Agent Outputs (P1)

File: .claude/hooks/post-agent-completion.yaml (new)

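Mechanism: a hook that triggers context extraction after a Task tool (subagent) run completes. The YAML below sketches the intended behavior; in Claude Code's own hook system, configured in settings.json, the SubagentStop event or a PostToolUse hook matching the Task tool would be the natural trigger.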

# .claude/hooks/post-agent-completion.yaml
name: auto-capture-agent
trigger: task_completion
condition: agent_type != "main"
action: |
  python3 scripts/unified-message-extractor.py \
    --jsonl ~/.claude/projects/*-*/agent-${AGENT_ID}.jsonl \
    --no-index
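The shell glob in this hook can match zero or several transcripts, and ${AGENT_ID} is interpolated unchecked. A hedged Python sketch of the same lookup with those cases handled. It assumes the hook runner exports AGENT_ID (as the config implies) and that agent IDs are short lowercase hex strings like ab782dc. The --jsonl and --no-index flags are copied from the config; the function names are hypothetical.

```python
import os
import re
import subprocess
import sys
from pathlib import Path

PROJECTS_ROOT = Path.home() / ".claude" / "projects"
# Assumption: agent IDs are short lowercase hex strings such as "ab782dc".
AGENT_ID_RE = re.compile(r"[0-9a-f]+")


def is_valid_agent_id(agent_id: str) -> bool:
    """Reject IDs that could smuggle glob or path characters into the pattern."""
    return bool(AGENT_ID_RE.fullmatch(agent_id))


def find_agent_transcripts(agent_id: str, projects_root: Path = PROJECTS_ROOT) -> list[Path]:
    """Python equivalent of ~/.claude/projects/*-*/agent-${AGENT_ID}.jsonl."""
    if not is_valid_agent_id(agent_id):
        raise ValueError(f"unexpected agent id: {agent_id!r}")
    return sorted(projects_root.glob(f"*-*/agent-{agent_id}.jsonl"))


def extractor_command(transcript: Path) -> list[str]:
    """Build the extractor invocation used by the hook."""
    return [sys.executable, "scripts/unified-message-extractor.py",
            "--jsonl", str(transcript), "--no-index"]


if __name__ == "__main__":
    agent_id = os.environ.get("AGENT_ID", "")
    matches = find_agent_transcripts(agent_id) if agent_id else []
    if len(matches) != 1:
        # Refuse to guess: zero or several matches means the lookup is ambiguous.
        print(f"expected one transcript for AGENT_ID={agent_id!r}, found {len(matches)}",
              file=sys.stderr)
    else:
        subprocess.run(extractor_command(matches[0]), check=True)
```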

Alternative: Background watcher daemon (more complex but more reliable).
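A rough sketch of the watcher alternative as a polling loop in Python, chosen over filesystem-event APIs to stay dependency-free. It assumes transcripts live one directory below ~/.claude/projects and treats a file as ready once its mtime has been unchanged for a settle period, since an agent may still be appending. The seen map lives in memory, so a restart re-submits every transcript; this assumes the extractor deduplicates by message hash. All names are hypothetical.

```python
import time
from pathlib import Path
from typing import Callable

PROJECTS_ROOT = Path.home() / ".claude" / "projects"


def select_changed(snapshot: dict[Path, float], seen: dict[Path, float],
                   now: float, settle: float = 5.0) -> list[Path]:
    """Pick transcripts that are new or modified and quiet for `settle` seconds.

    `snapshot` maps path -> current mtime; `seen` maps path -> mtime already
    handed off and is updated in place, so each version is processed once.
    """
    ready = []
    for path, mtime in sorted(snapshot.items()):
        if seen.get(path) == mtime:
            continue  # this version was already captured
        if now - mtime < settle:
            continue  # agent may still be writing; retry on a later pass
        seen[path] = mtime
        ready.append(path)
    return ready


def snapshot_transcripts(root: Path) -> dict[Path, float]:
    """Map each agent transcript one level below root to its mtime."""
    snap = {}
    for path in root.glob("*/agent-*.jsonl"):
        try:
            snap[path] = path.stat().st_mtime
        except FileNotFoundError:
            pass  # deleted between glob and stat
    return snap


def watch(on_ready: Callable[[Path], None], root: Path = PROJECTS_ROOT,
          interval: float = 30.0, settle: float = 5.0) -> None:
    """Poll forever, calling on_ready(path) for each settled transcript."""
    seen: dict[Path, float] = {}
    while True:
        for path in select_changed(snapshot_transcripts(root), seen, time.time(), settle):
            on_ready(path)
        time.sleep(interval)
```

A caller passes a callback that runs the extractor on each ready path; watch(print) is a quick way to observe which files would be captured.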

Enhancement 4: Agent Lineage Visualization (P2)

New command: /cxq --agent-tree

Session: 60cacdf6-403f-4350-8abc-42111016a762
├─ User: "Run MoE pattern for CodiFlow"
├─ Assistant: "I'll coordinate 5 experts..."
│  ├─ [Agent: ab782dc] Architecture Judge
│  │  └─ 19 messages, 36/40 score
│  ├─ [Agent: a02eb04] Quality Judge
│  │  └─ 23 messages, 7/40 score
│  └─ ...
└─ Assistant: "Judges complete, synthesizing..."
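The tree could be rendered from the preserved metadata roughly as follows. This sketch assumes each agent run is attached to the main-session message whose uuid equals the parent_uuid of the agent's first message, and that a display label (such as "Architecture Judge") is available, for example from the Task description. MainMessage, AgentRun and label are hypothetical; judge scores are omitted because they are not part of agent_context.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MainMessage:
    uuid: str
    role: str  # "user" or "assistant"
    text: str


@dataclass
class AgentRun:
    agent_id: str
    parent_uuid: str  # assumption: parent_uuid of the agent's first message
    label: str        # hypothetical display name, e.g. from the Task description
    message_count: int


def render_agent_tree(session_id: str, messages: list[MainMessage],
                      agents: list[AgentRun]) -> str:
    """Render main-session messages with agent runs nested under their parent."""
    children: dict[str, list[AgentRun]] = defaultdict(list)
    for run in agents:
        children[run.parent_uuid].append(run)

    lines = [f"Session: {session_id}"]
    for i, msg in enumerate(messages):
        last_msg = i == len(messages) - 1
        branch = "└─ " if last_msg else "├─ "
        lines.append(f'{branch}{msg.role.capitalize()}: "{msg.text}"')
        pad = "   " if last_msg else "│  "  # continuation column for nested rows
        runs = children.get(msg.uuid, [])
        for j, run in enumerate(runs):
            last_run = j == len(runs) - 1
            lines.append(f"{pad}{'└─ ' if last_run else '├─ '}[Agent: {run.agent_id}] {run.label}")
            lines.append(f"{pad}{'   ' if last_run else '│  '}└─ {run.message_count} messages")
    return "\n".join(lines)
```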

Implementation Priority

Enhancement	Priority	Effort	Impact
Preserve agent metadata	P0	4h	Enables all other features
Add agent query support	P0	6h	Makes agent work discoverable
Auto-capture agent outputs	P1	8h	Prevents data loss
Agent lineage visualization	P2	8h	UX improvement

Total estimated effort: 26 hours

Immediate Workaround

Until enhancements are implemented, manually capture agent outputs:

# After completing MoE or multi-agent workflows:
/cx

# Verify agent outputs captured:
python3 scripts/context-db.py --recent 50 | grep -i agent
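Grepping query output for "agent" also matches ordinary messages that merely mention the word. A more targeted check, sketched in Python, counts unified messages per agent transcript. It assumes the current format, where provenance.source_file ends in agent-<id>.jsonl, and also reads agent_context.agent_id if the proposed field is present. The relative path assumes the script runs from the repository root; count_agent_messages() is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Iterable


def count_agent_messages(lines: Iterable[str]) -> dict[str, int]:
    """Count unified messages per agent ID."""
    counts: dict[str, int] = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        msg = json.loads(line)
        # Proposed format: agent_context.agent_id.
        agent_id = (msg.get("agent_context") or {}).get("agent_id")
        if not agent_id:
            # Current format: infer from the transcript filename agent-<id>.jsonl.
            stem = Path(msg.get("provenance", {}).get("source_file", "")).stem
            if stem.startswith("agent-"):
                agent_id = stem[len("agent-"):]
        if agent_id:
            counts[agent_id] = counts.get(agent_id, 0) + 1
    return counts


if __name__ == "__main__":
    path = Path("context-storage/unified_messages.jsonl")  # assumes repo root as cwd
    if path.exists():
        with path.open(encoding="utf-8") as fh:
            for agent_id, n in sorted(count_agent_messages(fh).items()):
                print(f"{agent_id}\t{n}")
    else:
        print(f"{path} not found")
```

An agent ID that ran recently but is missing from this output has not been extracted yet.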

Related Documentation

scripts/unified-message-extractor.py - Current extraction logic
scripts/context-db.py - Current query system
internal/architecture/adrs/ADR-005-* - Token tracking ADR
docs/guides/MEMORY-MANAGEMENT-GUIDE.md - User documentation

Decision Required

Recommendation: Implement Enhancements 1 and 2 (P0) immediately to address the MoE judge output capture issue discovered on 2025-12-22.

Approval Status: PENDING

Last Updated: 2025-12-22T20:00:00Z
Compliance: CODITECT Research Standard v1.0.0
