ADR-151: Context Graph Evolution Architecture
Status
PROPOSED (2026-02-03)
Executive Summary
This ADR proposes evolving CODITECT's database architecture from a document-centric flat store to a relationship-aware context graph that enables:
- Task-specific context graphs - Small, policy-aware subgraphs for each agent turn
- Knowledge graph backbone - Persistent entity/relationship model across sessions
- Governed decision surfaces - What an agent CAN do, not just what it knows
- Multi-hop reasoning - Graph traversal for complex queries
- Lineage and audit trails - Track how knowledge influenced decisions
Key Distinction:
- Knowledge Graph: Persistent, global model of entities and relationships (institutional memory)
- Context Graph: Task-specific, ephemeral subgraph representing "what matters right now"
Context
Problem Statement
CODITECT currently stores valuable data across four databases (ADR-118):
| Database | Content | Current Structure |
|---|---|---|
| org.db | Decisions, skill learnings, error solutions | Flat tables, weak relationships |
| sessions.db | Messages, tool analytics, call graph | Tables + emerging graph (call_graph_*) |
| platform.db | Components, capabilities | Flat with FTS5 search |
| projects.db | Project metadata, embeddings | Flat with vector indices |
Current Limitations:
- No entity relationships - We store decisions but can't traverse "which decisions affected which components?"
- No session-to-entity linking - Messages reference files but not semantically
- No task-specific projections - Every query gets full database access, not relevant subgraphs
- No governance layer - No way to express "agent can access X under policy Y"
- No provenance tracking - Can't trace "this error solution came from this session discussing this ADR"
Evidence: Existing Graph Foundation
We already have a call graph structure in sessions.db:
-- Existing tables (10,204 functions, 73,319 edges)
call_graph_functions (node_id, name, file_path, start_line, end_line, language, signature, docstring, class_name)
call_graph_edges (caller_id, callee_name, call_line, call_file, arguments)
call_graph_memory (node_id, message_id, session_id, change_type, timestamp)
This shows the graph model already works for code (10,204 functions, 73,319 edges). The proposal extends it beyond code to all CODITECT entities.
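A minimal sketch of how the existing call graph can already answer a relationship question, using an in-memory stand-in for sessions.db. Only the table and column names come from the listing above; the sample rows and this standalone `get_callers` helper are illustrative assumptions and may differ from the real MCP tool of the same name.

```python
import sqlite3

# Hypothetical in-memory stand-in for sessions.db (subset of the real columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE call_graph_functions (node_id TEXT PRIMARY KEY, name TEXT, file_path TEXT);
CREATE TABLE call_graph_edges (caller_id TEXT, callee_name TEXT, call_line INTEGER, call_file TEXT);
INSERT INTO call_graph_functions VALUES
  ('f1', 'load_config', 'config.py'),
  ('f2', 'main', 'app.py'),
  ('f3', 'run_tests', 'tests.py');
INSERT INTO call_graph_edges VALUES
  ('f2', 'load_config', 10, 'app.py'),
  ('f3', 'load_config', 4, 'tests.py');
""")


def get_callers(conn, callee_name):
    """Return (caller name, call file, call line) for each recorded call to callee_name."""
    return conn.execute(
        """
        SELECT f.name, e.call_file, e.call_line
        FROM call_graph_edges AS e
        JOIN call_graph_functions AS f ON f.node_id = e.caller_id
        WHERE e.callee_name = ?
        ORDER BY f.name
        """,
        (callee_name,),
    ).fetchall()


callers = get_callers(conn, "load_config")
```

Note that the existing edge table identifies the callee by name rather than by node_id, so the callee side of a join is name-based.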
Research Foundation
Analysis of context graph vs knowledge graph patterns reveals:
| Aspect | Knowledge Graph | Context Graph |
|---|---|---|
| Scope | Large, comprehensive (millions of nodes) | Small, focused (10-100 nodes) |
| Lifecycle | Persistent, versioned | Ephemeral per task/turn |
| Purpose | What exists, factual relations | What's relevant NOW + allowed actions |
| Update | Slow, governed, curated | Built dynamically per request |
Key Insight: CODITECT needs BOTH:
- A knowledge graph as semantic backbone (org.db + enriched schemas)
- A context graph builder that projects task-specific subgraphs
Decision
1. Entity-Relationship Model (Knowledge Graph Layer)
Extend the four-tier architecture with a unified entity model:
┌─────────────────────────────────────────────────────────────────┐
│ KNOWLEDGE GRAPH BACKBONE │
│ │
│ Entity Types (Nodes): │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐ │
│ │ Component │ │ Session │ │ Decision │ │ Error │ │
│ │ (agent, │ │ (messages, │ │ (arch, │ │ Solution │ │
│ │ skill, │ │ tool calls)│ │ api, ui) │ │ │ │
│ │ command) │ └─────────────┘ └─────────────┘ └────────────┘ │
│ └─────────────┘ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐ │
│ │ Project │ │ File │ │ Track │ │ User │ │
│ │ (metadata, │ │ (path, │ │ (A-N, │ │ (tenant, │ │
│ │ config) │ │ language) │ │ task IDs) │ │ team) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └────────────┘ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Policy │ │ ADR │ │ AuditEvent │ │
│ │ (rules, │ │ (status, │ │ (who, what,│ │
│ │ constraints│ │ rationale) │ │ when, why) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ EDGE TYPES (Relationships) │
│ │
│ Structural: │
│ • INVOKES (Session → Component) │
│ • PRODUCES (Session → Decision) │
│ • SOLVES (ErrorSolution → Error pattern) │
│ • BELONGS_TO (Task → Track) │
│ • DEFINES (ADR → Decision) │
│ │
│ Semantic: │
│ • SIMILAR_TO (Component ↔ Component, Decision ↔ Decision) │
│ • REFERENCES (Decision → File, Decision → ADR) │
│ • CALLS (Function → Function) [existing] │
│ • USES (Component → File) │
│ │
│ Governance: │
│ • GOVERNED_BY (Entity → Policy) │
│ • RECORDED_IN (AuditEvent → Session) │
│ • CREATED_BY (Entity → User) │
└─────────────────────────────────────────────────────────────────┘
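The node and edge types above could be modeled in memory roughly as follows. This is a sketch under assumptions: the class names `KGNode`, `KGEdge`, and `KGSubgraph` are the ones this ADR uses later, but their fields and the `outgoing` helper are hypothetical, and the sample data is illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class KGNode:
    node_id: str    # "{type}:{id}", e.g. "decision:42"
    node_type: str  # e.g. "Decision", "ADR", "Policy"
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class KGEdge:
    edge_type: str  # e.g. "DEFINES", "REFERENCES", "GOVERNED_BY"
    from_node: str
    to_node: str
    weight: float = 1.0


@dataclass
class KGSubgraph:
    nodes: List[KGNode] = field(default_factory=list)
    edges: List[KGEdge] = field(default_factory=list)

    def outgoing(self, node_id: str, edge_type: str) -> List[str]:
        """Return target node IDs one hop from node_id over edges of edge_type."""
        return [
            e.to_node
            for e in self.edges
            if e.from_node == node_id and e.edge_type == edge_type
        ]


# Illustrative data only.
graph = KGSubgraph(
    nodes=[
        KGNode("adr:ADR-118", "ADR", "Four-Tier Database Architecture"),
        KGNode("decision:42", "Decision", "example decision"),
    ],
    edges=[KGEdge("DEFINES", "adr:ADR-118", "decision:42")],
)
defined = graph.outgoing("adr:ADR-118", "DEFINES")
```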
2. Database Schema Evolution
Phase 1: Add kg_nodes and kg_edges Tables (org.db)
-- TIER 2: org.db (Critical Knowledge)
-- Universal node table (all entities get a node)
CREATE TABLE kg_nodes (
    node_id TEXT PRIMARY KEY, -- Format: "{type}:{id}" e.g., "decision:42"
    node_type TEXT NOT NULL, -- Entity type
    name TEXT NOT NULL, -- Human-readable name
    properties TEXT, -- JSON properties bag
    embedding_id TEXT, -- Vector embedding reference
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
    -- Multi-tenant (ADR-053)
    tenant_id TEXT,
    project_id TEXT
);
-- Indexing (SQLite does not accept inline INDEX clauses inside CREATE TABLE)
CREATE INDEX idx_kg_nodes_type ON kg_nodes(node_type);
CREATE INDEX idx_kg_nodes_tenant ON kg_nodes(tenant_id);
-- Typed edges with properties
CREATE TABLE kg_edges (
    edge_id INTEGER PRIMARY KEY AUTOINCREMENT,
    edge_type TEXT NOT NULL, -- Relationship type
    from_node TEXT NOT NULL, -- Source node_id
    to_node TEXT NOT NULL, -- Target node_id
    properties TEXT, -- JSON edge properties
    weight REAL DEFAULT 1.0, -- Relevance/confidence score
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    -- Cascades are only enforced when PRAGMA foreign_keys = ON is set on the connection
    FOREIGN KEY (from_node) REFERENCES kg_nodes(node_id) ON DELETE CASCADE,
    FOREIGN KEY (to_node) REFERENCES kg_nodes(node_id) ON DELETE CASCADE
);
CREATE INDEX idx_kg_edges_type ON kg_edges(edge_type);
CREATE INDEX idx_kg_edges_from ON kg_edges(from_node);
CREATE INDEX idx_kg_edges_to ON kg_edges(to_node);
CREATE INDEX idx_kg_edges_pair ON kg_edges(from_node, to_node);
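-- Full-text search over nodes
-- (external-content FTS5 table: it must be kept in sync with kg_nodes via triggers or a 'rebuild')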
CREATE VIRTUAL TABLE kg_nodes_fts USING fts5(
name, properties,
content='kg_nodes',
content_rowid='rowid'
);
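One SQLite detail worth checking before relying on the ON DELETE CASCADE clauses: foreign-key enforcement is off by default and has to be enabled per connection. The sketch below uses a trimmed-down, in-memory version of the two tables (an assumption for brevity, with illustrative rows) to show both dangling-edge rejection and cascade deletion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite ignores foreign keys unless this pragma is set on each connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE kg_nodes (
    node_id   TEXT PRIMARY KEY,
    node_type TEXT NOT NULL,
    name      TEXT NOT NULL
);
CREATE TABLE kg_edges (
    edge_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    edge_type TEXT NOT NULL,
    from_node TEXT NOT NULL REFERENCES kg_nodes(node_id) ON DELETE CASCADE,
    to_node   TEXT NOT NULL REFERENCES kg_nodes(node_id) ON DELETE CASCADE
);
CREATE INDEX idx_kg_edges_from ON kg_edges(from_node);
CREATE INDEX idx_kg_edges_to ON kg_edges(to_node);
""")

INSERT_EDGE = "INSERT INTO kg_edges (edge_type, from_node, to_node) VALUES (?, ?, ?)"

conn.executemany(
    "INSERT INTO kg_nodes (node_id, node_type, name) VALUES (?, ?, ?)",
    [
        ("adr:ADR-118", "ADR", "ADR-118"),
        ("decision:42", "Decision", "example decision"),
    ],
)
conn.execute(INSERT_EDGE, ("DEFINES", "adr:ADR-118", "decision:42"))

# An edge pointing at a node that does not exist is rejected.
try:
    conn.execute(INSERT_EDGE, ("REFERENCES", "decision:42", "file:/missing.py"))
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

# Deleting a node removes every edge that touches it.
conn.execute("DELETE FROM kg_nodes WHERE node_id = ?", ("decision:42",))
edges_left = conn.execute("SELECT COUNT(*) FROM kg_edges").fetchone()[0]
```

Without the pragma, both checks would silently pass through: the dangling edge would be stored and the delete would leave an orphaned edge behind.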
Phase 2: Add Context Graph Builder Tables (sessions.db)
-- TIER 3: sessions.db (Regenerable)
-- Per-session context graphs (task-specific projections)
CREATE TABLE context_graphs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
task_id TEXT, -- ADR-054 format (A.1.1)
projection_mode TEXT DEFAULT 'anchor', -- anchor, semantic, policy_first
anchor_nodes TEXT, -- JSON array of anchor node_ids
-- Budget/limits
max_nodes INTEGER DEFAULT 128,
max_depth INTEGER DEFAULT 2,
actual_node_count INTEGER,
actual_edge_count INTEGER,
-- Serialized graph (for replay/audit)
graph_json TEXT, -- Serialized KGSubgraph
-- Governance
policies_applied TEXT, -- JSON array of policy node_ids
phi_node_count INTEGER DEFAULT 0, -- For compliance tracking
created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Track which context graphs influenced which tool calls
CREATE TABLE context_graph_usage (
id INTEGER PRIMARY KEY AUTOINCREMENT,
context_graph_id INTEGER NOT NULL,
tool_call_id TEXT, -- Reference to tool_analytics
usage_type TEXT, -- 'prompt_injection', 'tool_filter', 'validation'
nodes_used TEXT, -- JSON array of nodes actually referenced
created_at TEXT DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (context_graph_id) REFERENCES context_graphs(id)
);
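As a rough illustration of how these two tables support replay and audit, the sketch below stores a serialized graph, records one usage row, and reads the snapshot back. The helper names (`record_context_graph`, `record_usage`, `replay`), the dict-based graph shape, and the sample IDs are assumptions; the tables are reduced to the columns used here.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE context_graphs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    task_id TEXT,
    anchor_nodes TEXT,
    actual_node_count INTEGER,
    actual_edge_count INTEGER,
    graph_json TEXT
);
CREATE TABLE context_graph_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    context_graph_id INTEGER NOT NULL REFERENCES context_graphs(id),
    tool_call_id TEXT,
    usage_type TEXT,
    nodes_used TEXT
);
""")


def record_context_graph(conn, session_id, task_id, anchors, graph):
    """Persist a context graph snapshot and return its row id."""
    cur = conn.execute(
        "INSERT INTO context_graphs (session_id, task_id, anchor_nodes,"
        " actual_node_count, actual_edge_count, graph_json)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (
            session_id,
            task_id,
            json.dumps(anchors),
            len(graph["nodes"]),
            len(graph["edges"]),
            json.dumps(graph, sort_keys=True),
        ),
    )
    return cur.lastrowid


def record_usage(conn, graph_id, tool_call_id, usage_type, nodes_used):
    """Link a tool call to the context graph nodes it actually referenced."""
    conn.execute(
        "INSERT INTO context_graph_usage (context_graph_id, tool_call_id,"
        " usage_type, nodes_used) VALUES (?, ?, ?, ?)",
        (graph_id, tool_call_id, usage_type, json.dumps(nodes_used)),
    )


def replay(conn, graph_id):
    """Load the serialized graph exactly as it was stored."""
    row = conn.execute(
        "SELECT graph_json FROM context_graphs WHERE id = ?", (graph_id,)
    ).fetchone()
    return json.loads(row[0])


graph = {
    "nodes": ["adr:ADR-118", "decision:42"],
    "edges": [["adr:ADR-118", "DEFINES", "decision:42"]],
}
graph_id = record_context_graph(conn, "s1", "A.1.1", ["decision:42"], graph)
record_usage(conn, graph_id, "tool-call-1", "prompt_injection", ["decision:42"])
```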
3. Context Graph Builder Service
Python service that builds task-specific context graphs:
# scripts/context_graph/builder.py
from typing import Dict, List, Optional

# KGNode, KGEdge, and KGSubgraph are assumed to be defined alongside this module.


class CODITECTContextGraphBuilder:
    """
    Builds task-specific context graphs from the knowledge graph.

    Projection Modes:
    - anchor: Ego network around anchor nodes (task, session, component)
    - semantic: Vector similarity + KG neighborhood expansion
    - policy_first: Start from policies, intersect with relevant entities
    """

    def build_context_graph(
        self,
        anchors: List[str],  # Node IDs to anchor from
        mode: str = "anchor",
        max_nodes: int = 128,
        max_depth: int = 2,
        policies: Optional[List[str]] = None,
    ) -> KGSubgraph:
        """
        Build a task-specific context graph.

        Returns:
            KGSubgraph with nodes, edges, and governance metadata
        """
        nodes: Dict[str, KGNode] = {}
        edges: Dict[int, KGEdge] = {}  # keyed by the integer edge_id

        # 1) Pull structural neighborhood
        for anchor in anchors:
            ego = self._neighborhood(anchor, depth=max_depth, limit=max_nodes)
            nodes.update({n.node_id: n for n in ego.nodes})
            edges.update({e.edge_id: e for e in ego.edges})

        # 2) Add semantic neighbors (optional)
        if mode == "semantic":
            semantic = self._semantic_expansion(list(nodes.keys()), k=20)
            nodes.update({n.node_id: n for n in semantic.nodes})
            edges.update({e.edge_id: e for e in semantic.edges})

        # 3) Apply governance overlay
        if policies:
            policy_nodes = self._get_policy_nodes(policies)
            nodes.update({n.node_id: n for n in policy_nodes})
            # Add GOVERNED_BY edges

        # Enforce the node budget, then drop edges whose endpoints were cut
        # so the returned subgraph never contains dangling edges.
        kept = dict(list(nodes.items())[:max_nodes])
        kept_edges = [
            e for e in edges.values()
            if e.from_node in kept and e.to_node in kept
        ]
        return KGSubgraph(
            nodes=list(kept.values()),
            edges=kept_edges[: max_nodes * 2],
        )
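The anchor projection depends on a bounded neighborhood expansion. One way that step could work is a breadth-first search that stops at a depth limit and a node budget, sketched here over plain tuples rather than database rows. The function name, the tuple edge format, the choice to treat edges as undirected during expansion, and the sample data are all assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (from_node, edge_type, to_node)


def neighborhood(
    edges: List[Edge], anchor: str, depth: int = 2, limit: int = 128
) -> Tuple[List[str], List[Edge]]:
    """Breadth-first ego network around anchor, bounded by depth and node count."""
    # Index edges by both endpoints so expansion follows them in either direction.
    adjacency: Dict[str, List[Edge]] = {}
    for edge in edges:
        adjacency.setdefault(edge[0], []).append(edge)
        adjacency.setdefault(edge[2], []).append(edge)

    visited = {anchor}
    order = [anchor]  # nodes in discovery order, nearest first
    queue = deque([(anchor, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand past the depth limit
        for edge in adjacency.get(node, []):
            other = edge[2] if edge[0] == node else edge[0]
            if other in visited or len(visited) >= limit:
                continue  # already seen, or node budget exhausted
            visited.add(other)
            order.append(other)
            queue.append((other, dist + 1))

    # Keep only edges whose endpoints both survived the budget.
    kept_edges = [e for e in edges if e[0] in visited and e[2] in visited]
    return order, kept_edges


# Illustrative data only.
sample_edges = [
    ("session:s1", "PRODUCES", "decision:42"),
    ("decision:42", "REFERENCES", "file:/src/db.py"),
    ("adr:ADR-118", "DEFINES", "decision:42"),
    ("component:agent/orchestrator", "USES", "file:/src/db.py"),
]
nodes_d2, edges_d2 = neighborhood(sample_edges, "session:s1", depth=2)
nodes_small, edges_small = neighborhood(sample_edges, "session:s1", depth=2, limit=2)
```

Because nodes are discovered nearest-first, a tight budget drops the most distant nodes rather than an arbitrary subset.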
4. MCP Server Extension
Extend the existing coditect-call-graph MCP server:
from typing import Any, Dict, List, Optional

# `server` is the existing coditect-call-graph MCP server instance.

# Current MCP tools (already working):
# - index_file, index_directory
# - get_callers, get_callees
# - call_chain
# - memory_linked_search
# - call_graph_stats

# New tools to add:
@server.tool()
def build_context_graph(
    anchors: List[str],
    mode: str = "anchor",
    max_nodes: int = 128,
    max_depth: int = 2,
) -> Dict[str, Any]:
    """Build a task-specific context graph from anchor nodes."""
    ...


@server.tool()
def query_knowledge_graph(
    query: str,
    node_types: Optional[List[str]] = None,
    limit: int = 20,
) -> Dict[str, Any]:
    """Full-text + semantic search over knowledge graph nodes."""
    ...


@server.tool()
def get_entity_neighbors(
    node_id: str,
    edge_types: Optional[List[str]] = None,
    depth: int = 1,
) -> Dict[str, Any]:
    """Get neighbors of an entity in the knowledge graph."""
    ...


@server.tool()
def trace_decision_lineage(
    decision_id: str,
    direction: str = "both",  # upstream, downstream, both
) -> Dict[str, Any]:
    """Trace what influenced a decision and what it influenced."""
    ...
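The lineage tool's body is left open above. One possible approach is a recursive CTE over kg_edges, sketched below as a standalone function that takes a connection (unlike the MCP tool signature). It assumes that "upstream" means following incoming edges and "downstream" means following outgoing edges; the `max_depth` parameter, the `_walk` helper, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kg_edges (edge_type TEXT, from_node TEXT, to_node TEXT);
INSERT INTO kg_edges VALUES
  ('DEFINES', 'adr:ADR-118', 'decision:42'),
  ('PRODUCES', 'session:s1', 'decision:42'),
  ('REFERENCES', 'decision:42', 'file:/src/database.py');
""")


def _walk(conn, start, src, dst, max_depth):
    """Follow edges from column src to column dst, returning (node_id, hops)."""
    # src and dst are fixed column names chosen by the caller below, never user input.
    sql = f"""
        WITH RECURSIVE walk(node_id, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.{dst}, walk.depth + 1
            FROM kg_edges AS e
            JOIN walk ON e.{src} = walk.node_id
            WHERE walk.depth < ?
        )
        SELECT node_id, MIN(depth) AS hops
        FROM walk
        WHERE depth > 0
        GROUP BY node_id
        ORDER BY hops, node_id
    """
    return conn.execute(sql, (start, max_depth)).fetchall()


def trace_decision_lineage(conn, decision_id, direction="both", max_depth=5):
    """Return upstream and/or downstream nodes with their hop distance."""
    if direction not in ("upstream", "downstream", "both"):
        raise ValueError(f"unknown direction: {direction}")
    result = {}
    if direction in ("upstream", "both"):
        result["upstream"] = _walk(conn, decision_id, "to_node", "from_node", max_depth)
    if direction in ("downstream", "both"):
        result["downstream"] = _walk(conn, decision_id, "from_node", "to_node", max_depth)
    return result


lineage = trace_decision_lineage(conn, "decision:42")
```

Using UNION rather than UNION ALL, together with the depth cap, keeps the walk finite even if the graph contains cycles.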
5. Integration with Existing Systems
┌─────────────────────────────────────────────────────────────────┐
│ INTEGRATION ARCHITECTURE │
└─────────────────────────────────────────────────────────────────┘
/cx (ADR-020) /cxq (ADR-021)
│ │
│ Extract messages │ Query messages
│ + entities + relationships │ + knowledge graph
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ Entity Extractor │ │ Context Graph Query │
│ │ │ │
│ • Decision entities │ │ • /cxq --graph │
│ • Error entities │ │ • /cxq --neighbors │
│ • Component refs │ │ • /cxq --lineage │
│ • File references │ │ • /cxq --subgraph │
└─────────────────────┘ └─────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ KNOWLEDGE GRAPH (org.db) │
│ kg_nodes + kg_edges + existing tables (decisions, etc.) │
└─────────────────────────────────────────────────────────────────┘
│
│ Project per task
▼
┌─────────────────────────────────────────────────────────────────┐
│ CONTEXT GRAPH (sessions.db) │
│ context_graphs + context_graph_usage │
└─────────────────────────────────────────────────────────────────┘
│
│ Inject into agent
▼
┌─────────────────────────────────────────────────────────────────┐
│ AGENT TURN │
│ • Context graph serialized into prompt │
│ • Tool calls filtered by policy nodes │
│ • Audit events recorded back to KG │
└─────────────────────────────────────────────────────────────────┘
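The first two bullets of the agent turn (serializing the graph into the prompt and filtering tool calls by policy) might look roughly like this. The text format, both function names, the idea that a policy carries a list of denied tools, and the tool names are all assumptions made for illustration; the ADR does not define them.

```python
from typing import Dict, List, Tuple


def serialize_for_prompt(
    nodes: Dict[str, str], edges: List[Tuple[str, str, str]]
) -> str:
    """Render a context graph as a compact text block for prompt injection."""
    lines = ["Context graph:"]
    for node_id in sorted(nodes):
        lines.append(f"- {node_id}: {nodes[node_id]}")
    for src, edge_type, dst in edges:
        lines.append(f"- {src} -[{edge_type}]-> {dst}")
    return "\n".join(lines)


def filter_tool_calls(
    requested: List[str], policies: Dict[str, List[str]]
) -> List[str]:
    """Drop any requested tool that an applied policy denies."""
    denied = {tool for tools in policies.values() for tool in tools}
    return [tool for tool in requested if tool not in denied]


# Illustrative data only; tool names are hypothetical.
nodes = {
    "decision:42": "example decision",
    "policy:no-force-push": "No force pushes",
}
edges = [("decision:42", "GOVERNED_BY", "policy:no-force-push")]
prompt_block = serialize_for_prompt(nodes, edges)
allowed = filter_tool_calls(
    ["git_push", "git_push_force"],
    {"policy:no-force-push": ["git_push_force"]},
)
```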
Implementation Phases
Phase 1: Schema Foundation (2 weeks)
Tasks:
- J.16.3.1: Add kg_nodes and kg_edges tables to org.db
- J.16.3.2: Add context_graphs tables to sessions.db
- J.16.3.3: Create entity extraction pipeline from existing tables
- J.16.3.4: Backfill nodes from decisions, error_solutions, skill_learnings
Migration Path:
-- Backfill kg_nodes from existing decisions
INSERT INTO kg_nodes (node_id, node_type, name, properties)
SELECT
'decision:' || id,
'Decision',
SUBSTR(decision, 1, 100),
json_object(
'decision_type', decision_type,
'confidence', confidence,
'rationale', rationale
)
FROM decisions;
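Since the mitigation plan builds the graph alongside the existing tables, the backfill will likely run more than once. Below is a hedged sketch of an idempotent variant driven from Python: it uses INSERT OR IGNORE keyed on node_id and builds the properties JSON with `json.dumps` instead of `json_object`. The `decisions` columns are inferred from the migration query above, and the sample row and the `backfill_decisions` helper are illustrative.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE decisions (
    id INTEGER PRIMARY KEY, decision TEXT, decision_type TEXT,
    confidence REAL, rationale TEXT
);
CREATE TABLE kg_nodes (
    node_id TEXT PRIMARY KEY, node_type TEXT NOT NULL,
    name TEXT NOT NULL, properties TEXT
);
INSERT INTO decisions VALUES (42, 'example decision text', 'architecture', 0.9, 'example rationale');
""")


def backfill_decisions(conn) -> int:
    """Copy decisions into kg_nodes; return how many new nodes were created."""
    before = conn.execute("SELECT COUNT(*) FROM kg_nodes").fetchone()[0]
    rows = conn.execute(
        "SELECT id, decision, decision_type, confidence, rationale FROM decisions"
    ).fetchall()
    for id_, decision, decision_type, confidence, rationale in rows:
        properties = json.dumps(
            {
                "decision_type": decision_type,
                "confidence": confidence,
                "rationale": rationale,
            }
        )
        # OR IGNORE skips rows whose node_id already exists, so reruns are safe.
        # (decision or "") guards the NOT NULL name column against NULL text.
        conn.execute(
            "INSERT OR IGNORE INTO kg_nodes (node_id, node_type, name, properties)"
            " VALUES (?, ?, ?, ?)",
            (f"decision:{id_}", "Decision", (decision or "")[:100], properties),
        )
    after = conn.execute("SELECT COUNT(*) FROM kg_nodes").fetchone()[0]
    return after - before


first_run = backfill_decisions(conn)
second_run = backfill_decisions(conn)
```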
Phase 2: Context Graph Builder (2 weeks)
Tasks:
- J.16.3.5: Implement CODITECTContextGraphBuilder service
- J.16.3.6: Add projection modes (anchor, semantic, policy_first)
- J.16.3.7: Integrate with /cx for automatic entity extraction
- J.16.3.8: Add --graph flags to /cxq for graph queries
Phase 3: MCP Server Extension (1 week)
Tasks:
- J.16.3.9: Add build_context_graph MCP tool
- J.16.3.10: Add query_knowledge_graph MCP tool
- J.16.3.11: Add get_entity_neighbors MCP tool
- J.16.3.12: Add trace_decision_lineage MCP tool
Phase 4: Agent Integration (2 weeks)
Tasks:
- J.16.3.13: Modify agent orchestrator to build context graphs per turn
- J.16.3.14: Serialize context graphs into prompts
- J.16.3.15: Implement governance/policy filtering
- J.16.3.16: Add audit event recording
Value Proposition
For CODITECT Internal Development
| Current State | With Context Graph |
|---|---|
| Decisions stored flat | Decisions linked to components, files, ADRs |
| Error solutions isolated | Error solutions linked to sessions that discovered them |
| No cross-reference | "Which decisions affected this component?" answerable |
| Full database queries | Task-specific subgraphs (estimated 90% token reduction) |
| No provenance | Full lineage tracking |
For CODITECT Customers
| Current State | With Context Graph |
|---|---|
| Session history is flat text | Entities extracted and linked automatically |
| No relationship discovery | "What files are related to this bug?" answerable |
| Context window limits | Small, relevant context graphs fit easily |
| No governance | Policy-aware context filtering |
| No audit trails | Full decision lineage for compliance |
Consequences
Positive
- Relationship-aware queries - "What influenced this decision?" becomes possible
- Token efficiency - Context graphs are 10-100 nodes vs entire database
- Governance ready - Policy nodes can constrain agent actions
- Audit trails - Full provenance for compliance
- Multi-hop reasoning - Traverse relationships for complex queries
- Foundation for GraphRAG - Enables graph-aware retrieval
Negative
- Schema complexity - Five new tables (kg_nodes, kg_edges, kg_nodes_fts, context_graphs, context_graph_usage) plus an entity extraction pipeline
- Migration effort - Backfill existing data into nodes/edges
- Query complexity - Graph traversal is more complex than flat queries
- Storage overhead - Edges add storage (estimated ~2x for relationship data)
Mitigations
- Incremental migration - Build graph alongside existing tables
- Tiered storage - Keep flat tables as source of truth, graph as index
- Query abstraction - Hide graph complexity behind /cxq flags
- Lazy extraction - Only create nodes for entities that matter
Related
- ADR-118: Four-Tier Database Architecture (extends)
- ADR-020: Context Extraction System (integration point)
- ADR-021: Context Query System (new query types)
- ADR-053: Cloud Context Sync (graph sync to PostgreSQL)
- ADR-136: CODITECT Experience Framework (context graph per persona)
- ADR-148: Database Schema Documentation Standard (new tables)
- ADR-149: Query Language Evolution (graph query DSL)
Appendix: Schema Summary
New Tables in org.db (Tier 2 - Critical)
| Table | Purpose | Backup |
|---|---|---|
| kg_nodes | All entities as graph nodes | CRITICAL |
| kg_edges | Relationships between entities | CRITICAL |
| kg_nodes_fts | Full-text search over nodes | Regenerable |
New Tables in sessions.db (Tier 3 - Regenerable)
| Table | Purpose | Backup |
|---|---|---|
| context_graphs | Per-session context graph snapshots | Optional |
| context_graph_usage | How context graphs influenced tool calls | Optional |
Entity Types
| Type | Source | Example node_id |
|---|---|---|
| Decision | decisions table | decision:42 |
| ErrorSolution | error_solutions table | error:abc123 |
| SkillLearning | skill_learnings table | learning:789 |
| Component | platform.db components | component:agent/orchestrator |
| Session | Session metadata | session:uuid-here |
| File | File references | file:/path/to/file.py |
| Function | call_graph_functions | function:file.py:funcname |
| Track | PILOT tracks | track:A |
| ADR | Architecture decisions | adr:ADR-150 |
| Policy | Governance rules | policy:no-force-push |
| AuditEvent | Audit trail | audit:event-123 |
Edge Types
| Type | From → To | Example |
|---|---|---|
| INVOKES | Session → Component | Session used orchestrator agent |
| PRODUCES | Session → Decision | Session resulted in architecture decision |
| SOLVES | ErrorSolution → Error | Solution fixes TypeError pattern |
| BELONGS_TO | Task → Track | Task belongs to Track A |
| DEFINES | ADR → Decision | ADR-118 defines database split |
| REFERENCES | Decision → File | Decision references database.py |
| SIMILAR_TO | Entity ↔ Entity | Semantic similarity |
| CALLS | Function → Function | Code call graph (existing) |
| GOVERNED_BY | Entity → Policy | Component governed by policy |
| RECORDED_IN | AuditEvent → Session | Audit event from this session |
| CREATED_BY | Entity → User | Decision made by user |
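The two tables above can double as a validation rule set when edges are written. The sketch below checks an edge's endpoints against a subset of the edge table, deriving each node's type from its node_id prefix. The rule dictionary, both function names, and the use of `None` for "any entity" are assumptions, not part of the proposed schema.

```python
from typing import Dict, Optional, Tuple

# node_id prefix -> entity type, following the Entity Types table (subset).
PREFIX_TO_TYPE: Dict[str, str] = {
    "decision": "Decision",
    "session": "Session",
    "component": "Component",
    "file": "File",
    "function": "Function",
    "adr": "ADR",
    "policy": "Policy",
    "audit": "AuditEvent",
}

# edge type -> (from type, to type); None stands for any entity type.
EDGE_RULES: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "INVOKES": ("Session", "Component"),
    "PRODUCES": ("Session", "Decision"),
    "DEFINES": ("ADR", "Decision"),
    "REFERENCES": ("Decision", "File"),
    "CALLS": ("Function", "Function"),
    "GOVERNED_BY": (None, "Policy"),
    "RECORDED_IN": ("AuditEvent", "Session"),
}


def node_type_of(node_id: str) -> Optional[str]:
    """Derive the entity type from the "{type}:{id}" prefix, or None if unknown."""
    prefix = node_id.split(":", 1)[0]
    return PREFIX_TO_TYPE.get(prefix)


def is_valid_edge(edge_type: str, from_node: str, to_node: str) -> bool:
    """Check that an edge connects the entity types its rule allows."""
    if edge_type not in EDGE_RULES:
        return False
    actual_from = node_type_of(from_node)
    actual_to = node_type_of(to_node)
    if actual_from is None or actual_to is None:
        return False
    want_from, want_to = EDGE_RULES[edge_type]
    return want_from in (None, actual_from) and want_to in (None, actual_to)


ok = is_valid_edge("DEFINES", "adr:ADR-118", "decision:42")
reversed_ok = is_valid_edge("DEFINES", "decision:42", "adr:ADR-118")
```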
Track: J (Memory Intelligence)
Task: J.16.3
Author: Claude (Opus 4.5)
Created: 2026-02-03