ADR-151: Context Graph Evolution Architecture

Status

PROPOSED (2026-02-03)

Executive Summary

This ADR proposes evolving CODITECT's database architecture from a document-centric flat store to a relationship-aware context graph that enables:

  1. Task-specific context graphs - Small, policy-aware subgraphs for each agent turn
  2. Knowledge graph backbone - Persistent entity/relationship model across sessions
  3. Governed decision surfaces - What an agent CAN do, not just what it knows
  4. Multi-hop reasoning - Graph traversal for complex queries
  5. Lineage and audit trails - Track how knowledge influenced decisions

Key Distinction:

  • Knowledge Graph: Persistent, global model of entities and relationships (institutional memory)
  • Context Graph: Task-specific, ephemeral subgraph representing "what matters right now"

Context

Problem Statement

CODITECT currently stores valuable data across four databases (ADR-118):

| Database | Content | Current Structure |
|---|---|---|
| org.db | Decisions, skill learnings, error solutions | Flat tables, weak relationships |
| sessions.db | Messages, tool analytics, call graph | Tables + emerging graph (call_graph_*) |
| platform.db | Components, capabilities | Flat with FTS5 search |
| projects.db | Project metadata, embeddings | Flat with vector indices |

Current Limitations:

  1. No entity relationships - We store decisions but can't traverse "which decisions affected which components?"
  2. No session-to-entity linking - Messages reference files but not semantically
  3. No task-specific projections - Every query gets full database access, not relevant subgraphs
  4. No governance layer - No way to express "agent can access X under policy Y"
  5. No provenance tracking - Can't trace "this error solution came from this session discussing this ADR"

Evidence: Existing Graph Foundation

We already have a call graph structure in sessions.db:

-- Existing tables (10,204 functions, 73,319 edges)
call_graph_functions (node_id, name, file_path, start_line, end_line, language, signature, docstring, class_name)
call_graph_edges (caller_id, callee_name, call_line, call_file, arguments)
call_graph_memory (node_id, message_id, session_id, change_type, timestamp)

This proves the graph model works. We need to extend it beyond code to all CODITECT entities.
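As an illustration of the kind of query the existing tables already support, the sketch below looks up every call site of a function by joining call_graph_edges to call_graph_functions. It runs against an in-memory stand-in rather than the real sessions.db, creates only the columns it needs, and assumes column types that the source does not state; the helper name `get_callers` mirrors the MCP tool name but is not its actual implementation.

```python
import sqlite3

# In-memory stand-in for sessions.db; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE call_graph_functions (node_id TEXT PRIMARY KEY, name TEXT, file_path TEXT);
CREATE TABLE call_graph_edges (caller_id TEXT, callee_name TEXT, call_line INTEGER, call_file TEXT);
""")
conn.executemany(
    "INSERT INTO call_graph_functions VALUES (?, ?, ?)",
    [("f1", "main", "app.py"), ("f2", "load", "io.py"), ("f3", "parse", "io.py")],
)
conn.executemany(
    "INSERT INTO call_graph_edges VALUES (?, ?, ?, ?)",
    [("f1", "load", 10, "app.py"), ("f2", "parse", 4, "io.py"), ("f1", "parse", 12, "app.py")],
)


def get_callers(conn, callee_name):
    """Return (caller name, call_file, call_line) for each call site of callee_name."""
    return conn.execute(
        """
        SELECT f.name, e.call_file, e.call_line
        FROM call_graph_edges AS e
        JOIN call_graph_functions AS f ON f.node_id = e.caller_id
        WHERE e.callee_name = ?
        ORDER BY e.call_file, e.call_line
        """,
        (callee_name,),
    ).fetchall()


callers_of_parse = get_callers(conn, "parse")
```

Note that edges point at a callee name rather than a callee node_id, so this lookup cannot distinguish two functions that share a name; a unified node table would remove that ambiguity.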

Research Foundation

Analysis of context graph vs knowledge graph patterns reveals:

| Aspect | Knowledge Graph | Context Graph |
|---|---|---|
| Scope | Large, comprehensive (millions of nodes) | Small, focused (10-100 nodes) |
| Lifecycle | Persistent, versioned | Ephemeral per task/turn |
| Purpose | What exists, factual relations | What's relevant NOW + allowed actions |
| Update | Slow, governed, curated | Built dynamically per request |

Key Insight: CODITECT needs BOTH:

  • A knowledge graph as semantic backbone (org.db + enriched schemas)
  • A context graph builder that projects task-specific subgraphs

Decision

1. Entity-Relationship Model (Knowledge Graph Layer)

Extend the four-tier architecture with a unified entity model:

┌─────────────────────────────────────────────────────────────────┐
│ KNOWLEDGE GRAPH BACKBONE │
│ │
│ Entity Types (Nodes): │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐ │
│ │ Component │ │ Session │ │ Decision │ │ Error │ │
│ │ (agent, │ │ (messages, │ │ (arch, │ │ Solution │ │
│ │ skill, │ │ tool calls)│ │ api, ui) │ │ │ │
│ │ command) │ └─────────────┘ └─────────────┘ └────────────┘ │
│ └─────────────┘ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐ │
│ │ Project │ │ File │ │ Track │ │ User │ │
│ │ (metadata, │ │ (path, │ │ (A-N, │ │ (tenant, │ │
│ │ config) │ │ language) │ │ task IDs) │ │ team) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └────────────┘ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Policy │ │ ADR │ │ AuditEvent │ │
│ │ (rules, │ │ (status, │ │ (who, what,│ │
│ │ constraints│ │ rationale) │ │ when, why) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│ EDGE TYPES (Relationships) │
│ │
│ Structural: │
│ • INVOKES (Session → Component) │
│ • PRODUCES (Session → Decision) │
│ • SOLVES (ErrorSolution → Error pattern) │
│ • BELONGS_TO (Task → Track) │
│ • DEFINES (ADR → Decision) │
│ │
│ Semantic: │
│ • SIMILAR_TO (Component ↔ Component, Decision ↔ Decision) │
│ • REFERENCES (Decision → File, Decision → ADR) │
│ • CALLS (Function → Function) [existing] │
│ • USES (Component → File) │
│ │
│ Governance: │
│ • GOVERNED_BY (Entity → Policy) │
│ • RECORDED_IN (AuditEvent → Session) │
│ • CREATED_BY (Entity → User) │
└─────────────────────────────────────────────────────────────────┘
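One way to keep the typed edges above honest is to check endpoint types when an edge is created. The following is a minimal sketch, not the project's data model: the `KGNode` and `KGEdge` dataclasses, the `EDGE_ENDPOINTS` table, and `make_edge` are hypothetical names, and only four of the edge types listed above are registered.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


@dataclass
class KGNode:
    node_id: str      # "{type}:{id}", e.g. "decision:42"
    node_type: str    # e.g. "Decision"
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class KGEdge:
    edge_type: str
    from_node: str
    to_node: str
    weight: float = 1.0


# Allowed (source type, target type) per edge type, taken from the list above.
EDGE_ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "INVOKES": ("Session", "Component"),
    "PRODUCES": ("Session", "Decision"),
    "DEFINES": ("ADR", "Decision"),
    "RECORDED_IN": ("AuditEvent", "Session"),
}


def make_edge(edge_type: str, src: KGNode, dst: KGNode, weight: float = 1.0) -> KGEdge:
    """Create an edge, rejecting endpoint types that contradict the registry."""
    expected = EDGE_ENDPOINTS.get(edge_type)
    if expected is not None and (src.node_type, dst.node_type) != expected:
        raise ValueError(
            f"{edge_type} expects {expected[0]} -> {expected[1]}, "
            f"got {src.node_type} -> {dst.node_type}"
        )
    return KGEdge(edge_type, src.node_id, dst.node_id, weight)


session = KGNode("session:s1", "Session", "Example session")
decision = KGNode("decision:42", "Decision", "Example decision")
edge = make_edge("PRODUCES", session, decision)
```

Edge types that are not in the registry pass through unchecked, which keeps the sketch permissive for generic relationships such as GOVERNED_BY (Entity → Policy).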

2. Database Schema Evolution

Phase 1: Add kg_nodes and kg_edges Tables (org.db)

-- TIER 2: org.db (Critical Knowledge)

-- Universal node table (all entities get a node)
CREATE TABLE kg_nodes (
    node_id TEXT PRIMARY KEY,                  -- Format: "{type}:{id}" e.g., "decision:42"
    node_type TEXT NOT NULL,                   -- Entity type
    name TEXT NOT NULL,                        -- Human-readable name
    properties TEXT,                           -- JSON properties bag
    embedding_id TEXT,                         -- Vector embedding reference
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP,

    -- Multi-tenant (ADR-053)
    tenant_id TEXT,
    project_id TEXT
);

-- Indexing (SQLite has no inline INDEX clause, so indexes are created separately)
CREATE INDEX idx_kg_nodes_type ON kg_nodes (node_type);
CREATE INDEX idx_kg_nodes_tenant ON kg_nodes (tenant_id);

-- Typed edges with properties
CREATE TABLE kg_edges (
    edge_id INTEGER PRIMARY KEY AUTOINCREMENT,
    edge_type TEXT NOT NULL,                   -- Relationship type
    from_node TEXT NOT NULL,                   -- Source node_id
    to_node TEXT NOT NULL,                     -- Target node_id
    properties TEXT,                           -- JSON edge properties
    weight REAL DEFAULT 1.0,                   -- Relevance/confidence score
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,

    -- Enforced only when the connection runs PRAGMA foreign_keys = ON
    FOREIGN KEY (from_node) REFERENCES kg_nodes(node_id) ON DELETE CASCADE,
    FOREIGN KEY (to_node) REFERENCES kg_nodes(node_id) ON DELETE CASCADE
);

-- Indexing (SQLite has no inline INDEX clause, so indexes are created separately)
CREATE INDEX idx_kg_edges_type ON kg_edges (edge_type);
CREATE INDEX idx_kg_edges_from ON kg_edges (from_node);
CREATE INDEX idx_kg_edges_to ON kg_edges (to_node);
CREATE INDEX idx_kg_edges_pair ON kg_edges (from_node, to_node);

-- Full-text search over nodes
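-- External-content FTS5 table: it is not updated automatically, so it must be
-- kept in sync with kg_nodes via triggers or a periodic 'rebuild' command
CREATE VIRTUAL TABLE kg_nodes_fts USING fts5(
    name, properties,
    content='kg_nodes',
    content_rowid='rowid'
);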
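To show how multi-hop reasoning could run directly on this schema, here is a hedged sketch using a depth-bounded recursive CTE over kg_edges. It uses an in-memory database with a trimmed-down version of the two tables; the helper name `reachable` and the sample rows are illustrative assumptions, not project code or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kg_nodes (node_id TEXT PRIMARY KEY, node_type TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE kg_edges (
    edge_id INTEGER PRIMARY KEY AUTOINCREMENT,
    edge_type TEXT NOT NULL,
    from_node TEXT NOT NULL REFERENCES kg_nodes(node_id),
    to_node TEXT NOT NULL REFERENCES kg_nodes(node_id)
);
CREATE INDEX idx_kg_edges_from ON kg_edges (from_node);
""")
conn.executemany("INSERT INTO kg_nodes VALUES (?, ?, ?)", [
    ("adr:ADR-118", "ADR", "Four-Tier Database Architecture"),
    ("decision:42", "Decision", "Illustrative decision"),
    ("file:/src/database.py", "File", "database.py"),
])
conn.executemany(
    "INSERT INTO kg_edges (edge_type, from_node, to_node) VALUES (?, ?, ?)",
    [
        ("DEFINES", "adr:ADR-118", "decision:42"),
        ("REFERENCES", "decision:42", "file:/src/database.py"),
        ("REFERENCES", "decision:42", "adr:ADR-118"),  # cycle back to the ADR
    ],
)


def reachable(conn, start, max_depth):
    """Return (node_id, hops) for nodes within max_depth outgoing hops of start."""
    return conn.execute(
        """
        WITH RECURSIVE reach(node_id, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.to_node, r.depth + 1
            FROM reach AS r
            JOIN kg_edges AS e ON e.from_node = r.node_id
            WHERE r.depth < ?          -- the depth bound also terminates cycles
        )
        SELECT node_id, MIN(depth) AS hops
        FROM reach
        GROUP BY node_id
        ORDER BY hops, node_id
        """,
        (start, max_depth),
    ).fetchall()


reached = reachable(conn, "adr:ADR-118", 2)
```

The traversal follows edges in the from_node → to_node direction only; an undirected ego network would need a second branch that joins on to_node.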

Phase 2: Add Context Graph Builder Tables (sessions.db)

-- TIER 3: sessions.db (Regenerable)

-- Per-session context graphs (task-specific projections)
CREATE TABLE context_graphs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    task_id TEXT,                              -- ADR-054 format (A.1.1)
    projection_mode TEXT DEFAULT 'anchor',     -- anchor, semantic, policy_first
    anchor_nodes TEXT,                         -- JSON array of anchor node_ids

    -- Budget/limits
    max_nodes INTEGER DEFAULT 128,
    max_depth INTEGER DEFAULT 2,
    actual_node_count INTEGER,
    actual_edge_count INTEGER,

    -- Serialized graph (for replay/audit)
    graph_json TEXT,                           -- Serialized KGSubgraph

    -- Governance
    policies_applied TEXT,                     -- JSON array of policy node_ids
    phi_node_count INTEGER DEFAULT 0,          -- For compliance tracking

    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Track which context graphs influenced which tool calls
CREATE TABLE context_graph_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    context_graph_id INTEGER NOT NULL,
    tool_call_id TEXT,                         -- Reference to tool_analytics
    usage_type TEXT,                           -- 'prompt_injection', 'tool_filter', 'validation'
    nodes_used TEXT,                           -- JSON array of nodes actually referenced
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (context_graph_id) REFERENCES context_graphs(id)
);
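The sketch below shows how a projected graph and its later use by a tool call might be written to these two tables. It is an assumption-laden illustration: the helpers `save_context_graph` and `record_usage`, the JSON layout of graph_json, and the sample identifiers are all hypothetical, and the tables are created in memory with only the columns the example touches.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE context_graphs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    task_id TEXT,
    projection_mode TEXT DEFAULT 'anchor',
    anchor_nodes TEXT,
    max_nodes INTEGER DEFAULT 128,
    max_depth INTEGER DEFAULT 2,
    actual_node_count INTEGER,
    actual_edge_count INTEGER,
    graph_json TEXT,
    policies_applied TEXT
);
CREATE TABLE context_graph_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    context_graph_id INTEGER NOT NULL REFERENCES context_graphs(id),
    tool_call_id TEXT,
    usage_type TEXT,
    nodes_used TEXT
);
""")


def save_context_graph(conn, session_id, task_id, anchors, nodes, edges, policies):
    """Persist one projected graph and return its row id."""
    cur = conn.execute(
        """
        INSERT INTO context_graphs
            (session_id, task_id, anchor_nodes, actual_node_count,
             actual_edge_count, graph_json, policies_applied)
        VALUES (?, ?, ?, ?, ?, ?, ?)
        """,
        (
            session_id,
            task_id,
            json.dumps(anchors),
            len(nodes),
            len(edges),
            json.dumps({"nodes": nodes, "edges": edges}),  # assumed layout
            json.dumps(policies),
        ),
    )
    return cur.lastrowid


def record_usage(conn, graph_id, tool_call_id, usage_type, nodes_used):
    """Link a tool call to the context graph nodes it actually referenced."""
    conn.execute(
        "INSERT INTO context_graph_usage (context_graph_id, tool_call_id, usage_type, nodes_used) "
        "VALUES (?, ?, ?, ?)",
        (graph_id, tool_call_id, usage_type, json.dumps(nodes_used)),
    )


graph_id = save_context_graph(
    conn,
    session_id="session-1",
    task_id="A.1.1",
    anchors=["task:A.1.1"],
    nodes=["task:A.1.1", "decision:42"],
    edges=[["task:A.1.1", "REFERENCES", "decision:42"]],
    policies=["policy:no-force-push"],
)
record_usage(conn, graph_id, "tool-call-1", "prompt_injection", ["decision:42"])
```

Storing node lists as JSON text keeps the schema simple, at the cost of not being able to index individual nodes_used entries.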

3. Context Graph Builder Service

Python service that builds task-specific context graphs:

# scripts/context_graph/builder.py

from typing import Dict, List, Optional

# KGNode, KGEdge, and KGSubgraph are the graph data types; they and the
# _neighborhood / _semantic_expansion / _get_policy_nodes helpers are assumed
# to be defined elsewhere in this package.


class CODITECTContextGraphBuilder:
    """
    Builds task-specific context graphs from the knowledge graph.

    Projection Modes:
    - anchor: Ego network around anchor nodes (task, session, component)
    - semantic: Vector similarity + KG neighborhood expansion
    - policy_first: Start from policies, intersect with relevant entities
    """

    def build_context_graph(
        self,
        anchors: List[str],  # Node IDs to anchor from
        mode: str = "anchor",
        max_nodes: int = 128,
        max_depth: int = 2,
        policies: Optional[List[str]] = None,
    ) -> KGSubgraph:
        """
        Build a task-specific context graph.

        Returns:
            KGSubgraph with nodes, edges, and governance metadata
        """
        nodes: Dict[str, KGNode] = {}
        edges: Dict[int, KGEdge] = {}  # keyed by integer edge_id

        # 1) Pull structural neighborhood
        for anchor in anchors:
            ego = self._neighborhood(anchor, depth=max_depth, limit=max_nodes)
            nodes.update({n.node_id: n for n in ego.nodes})
            edges.update({e.edge_id: e for e in ego.edges})

        # 2) Add semantic neighbors (optional)
        if mode == "semantic":
            semantic = self._semantic_expansion(list(nodes.keys()), k=20)
            nodes.update({n.node_id: n for n in semantic.nodes})
            edges.update({e.edge_id: e for e in semantic.edges})

        # 3) Apply governance overlay
        if policies:
            policy_nodes = self._get_policy_nodes(policies)
            nodes.update({n.node_id: n for n in policy_nodes})
            # TODO: add GOVERNED_BY edges from governed entities to these policies

        # 4) Enforce the node budget, then drop edges whose endpoints were cut
        #    so the subgraph never contains dangling edges
        kept = dict(list(nodes.items())[:max_nodes])
        kept_edges = [
            e for e in edges.values()
            if e.from_node in kept and e.to_node in kept
        ]
        return KGSubgraph(
            nodes=list(kept.values()),
            edges=kept_edges[: max_nodes * 2],
        )
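The builder relies on a `_neighborhood` helper that the ADR does not spell out. One plausible shape for it is a breadth-first ego network with a depth bound and a node budget, sketched here over a plain in-memory adjacency map. The function name `neighborhood`, the adjacency format, and the decision to follow outgoing edges only are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List


def neighborhood(adjacency: Dict[str, List[str]], anchor: str, depth: int, limit: int) -> List[str]:
    """Return node ids within `depth` hops of `anchor`, nearest first, capped at `limit`."""
    seen = {anchor}
    order = [anchor]
    queue = deque([(anchor, 0)])
    while queue and len(order) < limit:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the depth bound
        for nxt in adjacency.get(node, []):
            if nxt in seen:
                continue
            seen.add(nxt)
            order.append(nxt)
            if len(order) >= limit:
                break  # node budget exhausted
            queue.append((nxt, dist + 1))
    return order


adjacency = {
    "task:A.1.1": ["decision:42", "component:agent/orchestrator"],
    "decision:42": ["adr:ADR-118"],
    "adr:ADR-118": ["policy:no-force-push"],
}
ego = neighborhood(adjacency, "task:A.1.1", depth=2, limit=128)
tight = neighborhood(adjacency, "task:A.1.1", depth=2, limit=2)
```

Because breadth-first order visits closer nodes first, hitting the budget discards the most distant nodes rather than an arbitrary subset, which is the behavior a small per-turn context graph would likely want.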

4. MCP Server Extension

Extend the existing coditect-call-graph MCP server:

from typing import Any, Dict, List, Optional

# `server` is the existing coditect-call-graph MCP server instance.

# Current MCP tools (already working):
# - index_file, index_directory
# - get_callers, get_callees
# - call_chain
# - memory_linked_search
# - call_graph_stats

# New tools to add:
@server.tool()
def build_context_graph(
    anchors: List[str],
    mode: str = "anchor",
    max_nodes: int = 128,
    max_depth: int = 2,
) -> Dict[str, Any]:
    """Build a task-specific context graph from anchor nodes."""
    ...

@server.tool()
def query_knowledge_graph(
    query: str,
    node_types: Optional[List[str]] = None,
    limit: int = 20,
) -> Dict[str, Any]:
    """Full-text + semantic search over knowledge graph nodes."""
    ...

@server.tool()
def get_entity_neighbors(
    node_id: str,
    edge_types: Optional[List[str]] = None,
    depth: int = 1,
) -> Dict[str, Any]:
    """Get neighbors of an entity in the knowledge graph."""
    ...

@server.tool()
def trace_decision_lineage(
    decision_id: str,
    direction: str = "both",  # upstream, downstream, both
) -> Dict[str, Any]:
    """Trace what influenced a decision and what it influenced."""
    ...
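The body of `trace_decision_lineage` is left open above. As a sketch of the traversal it implies, the function below treats "upstream" as everything with a path into the decision and "downstream" as everything reachable from it. That reading of the two directions, the `(from_node, edge_type, to_node)` tuple format, and the name `trace_lineage` are assumptions, and the MCP wiring is omitted.

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, str]  # (from_node, edge_type, to_node)


def trace_lineage(edges: Iterable[Edge], start: str, direction: str = "both") -> Dict[str, List[str]]:
    """Collect nodes upstream and/or downstream of `start` by following edges transitively."""
    if direction not in ("upstream", "downstream", "both"):
        raise ValueError(f"unknown direction: {direction!r}")

    forward: Dict[str, List[str]] = {}
    backward: Dict[str, List[str]] = {}
    for src, _edge_type, dst in edges:
        forward.setdefault(src, []).append(dst)
        backward.setdefault(dst, []).append(src)

    def walk(adjacency: Dict[str, List[str]]) -> List[str]:
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in adjacency.get(node, []):
                if nxt != start and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return sorted(seen)

    result: Dict[str, List[str]] = {}
    if direction in ("upstream", "both"):
        result["upstream"] = walk(backward)
    if direction in ("downstream", "both"):
        result["downstream"] = walk(forward)
    return result


sample_edges = [
    ("session:s1", "PRODUCES", "decision:42"),
    ("adr:ADR-118", "DEFINES", "decision:42"),
    ("decision:42", "REFERENCES", "file:/src/database.py"),
]
lineage = trace_lineage(sample_edges, "decision:42")
```

A production version would presumably read edges from kg_edges and bound the traversal depth; this version walks the whole reachable set.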

5. Integration with Existing Systems

┌─────────────────────────────────────────────────────────────────┐
│ INTEGRATION ARCHITECTURE │
└─────────────────────────────────────────────────────────────────┘

/cx (ADR-020) /cxq (ADR-021)
│ │
│ Extract messages │ Query messages
│ + entities + relationships │ + knowledge graph
▼ ▼
┌─────────────────────┐ ┌─────────────────────┐
│ Entity Extractor │ │ Context Graph Query │
│ │ │ │
│ • Decision entities │ │ • /cxq --graph │
│ • Error entities │ │ • /cxq --neighbors │
│ • Component refs │ │ • /cxq --lineage │
│ • File references │ │ • /cxq --subgraph │
└─────────────────────┘ └─────────────────────┘
│ │
▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ KNOWLEDGE GRAPH (org.db) │
│ kg_nodes + kg_edges + existing tables (decisions, etc.) │
└─────────────────────────────────────────────────────────────────┘

│ Project per task
▼
┌─────────────────────────────────────────────────────────────────┐
│ CONTEXT GRAPH (sessions.db) │
│ context_graphs + context_graph_usage │
└─────────────────────────────────────────────────────────────────┘

│ Inject into agent
▼
┌─────────────────────────────────────────────────────────────────┐
│ AGENT TURN │
│ • Context graph serialized into prompt │
│ • Tool calls filtered by policy nodes │
│ • Audit events recorded back to KG │
└─────────────────────────────────────────────────────────────────┘
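The agent-turn steps in the diagram (serialize the graph into the prompt, filter tool calls by policy nodes) could look roughly like the sketch below. Everything here is hypothetical: the dict-based node format, the `denied_tools` property on Policy nodes, the tool names, and the text layout of the prompt section are illustrative choices the ADR does not specify.

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]
Edge = Tuple[str, str, str]  # (from_node, edge_type, to_node)


def serialize_for_prompt(nodes: List[Node], edges: List[Edge]) -> str:
    """Render a context graph as a compact text section for the prompt."""
    lines = ["Context graph:"]
    for node in sorted(nodes, key=lambda n: n["node_id"]):
        lines.append(f"- {node['node_id']} ({node['node_type']}): {node['name']}")
    for src, edge_type, dst in edges:
        lines.append(f"- {src} -[{edge_type}]-> {dst}")
    return "\n".join(lines)


def allowed_tools(requested: List[str], nodes: List[Node]) -> List[str]:
    """Drop tools that any Policy node in the graph denies (assumed 'denied_tools' property)."""
    denied = set()
    for node in nodes:
        if node["node_type"] == "Policy":
            denied.update(node.get("properties", {}).get("denied_tools", []))
    return [tool for tool in requested if tool not in denied]


graph_nodes = [
    {"node_id": "decision:42", "node_type": "Decision", "name": "Illustrative decision"},
    {
        "node_id": "policy:no-force-push",
        "node_type": "Policy",
        "name": "No force push",
        "properties": {"denied_tools": ["git_push_force"]},
    },
]
graph_edges = [("decision:42", "GOVERNED_BY", "policy:no-force-push")]

prompt_section = serialize_for_prompt(graph_nodes, graph_edges)
tools = allowed_tools(["git_push_force", "read_file"], graph_nodes)
```

Filtering on the projected graph rather than the full knowledge graph means a policy only applies when the projection includes it, which is one reason the builder needs to protect policy nodes from budget truncation.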

Implementation Phases

Phase 1: Schema Foundation (2 weeks)

Tasks:

  • J.16.3.1: Add kg_nodes and kg_edges tables to org.db
  • J.16.3.2: Add context_graphs tables to sessions.db
  • J.16.3.3: Create entity extraction pipeline from existing tables
  • J.16.3.4: Backfill nodes from decisions, error_solutions, skill_learnings

Migration Path:

-- Backfill kg_nodes from existing decisions
INSERT INTO kg_nodes (node_id, node_type, name, properties)
SELECT
'decision:' || id,
'Decision',
SUBSTR(decision, 1, 100),
json_object(
'decision_type', decision_type,
'confidence', confidence,
'rationale', rationale
)
FROM decisions;
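Since a backfill is likely to be re-run during an incremental migration, an idempotent variant may be useful. The sketch below performs the same mapping from Python, building the properties JSON with `json.dumps` and using `INSERT OR IGNORE` so existing nodes are left alone. The decisions column types, the sample row, and the helper name `backfill_decisions` are assumptions; the column names come from the statement above.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE decisions (
    id INTEGER PRIMARY KEY, decision TEXT, decision_type TEXT,
    confidence REAL, rationale TEXT
);
CREATE TABLE kg_nodes (
    node_id TEXT PRIMARY KEY, node_type TEXT NOT NULL,
    name TEXT NOT NULL, properties TEXT
);
""")
conn.execute(
    "INSERT INTO decisions VALUES (42, 'Illustrative decision text', 'architecture', 0.9, 'Example rationale')"
)


def backfill_decisions(conn):
    """Copy decisions into kg_nodes; return how many new nodes were created."""
    before = conn.execute("SELECT COUNT(*) FROM kg_nodes").fetchone()[0]
    rows = conn.execute(
        "SELECT id, decision, decision_type, confidence, rationale FROM decisions"
    ).fetchall()
    conn.executemany(
        "INSERT OR IGNORE INTO kg_nodes (node_id, node_type, name, properties) VALUES (?, ?, ?, ?)",
        [
            (
                f"decision:{row_id}",
                "Decision",
                (decision or "")[:100],  # same 100-character cap as SUBSTR above
                json.dumps({
                    "decision_type": decision_type,
                    "confidence": confidence,
                    "rationale": rationale,
                }),
            )
            for row_id, decision, decision_type, confidence, rationale in rows
        ],
    )
    after = conn.execute("SELECT COUNT(*) FROM kg_nodes").fetchone()[0]
    return after - before


first_run = backfill_decisions(conn)
second_run = backfill_decisions(conn)
```

`INSERT OR IGNORE` skips rows whose node_id already exists, so a re-run never overwrites properties that were edited after the first backfill.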

Phase 2: Context Graph Builder (2 weeks)

Tasks:

  • J.16.3.5: Implement CODITECTContextGraphBuilder service
  • J.16.3.6: Add projection modes (anchor, semantic, policy_first)
  • J.16.3.7: Integrate with /cx for automatic entity extraction
  • J.16.3.8: Add --graph flags to /cxq for graph queries

Phase 3: MCP Server Extension (1 week)

Tasks:

  • J.16.3.9: Add build_context_graph MCP tool
  • J.16.3.10: Add query_knowledge_graph MCP tool
  • J.16.3.11: Add get_entity_neighbors MCP tool
  • J.16.3.12: Add trace_decision_lineage MCP tool

Phase 4: Agent Integration (2 weeks)

Tasks:

  • J.16.3.13: Modify agent orchestrator to build context graphs per turn
  • J.16.3.14: Serialize context graphs into prompts
  • J.16.3.15: Implement governance/policy filtering
  • J.16.3.16: Add audit event recording

Value Proposition

For CODITECT Internal Development

| Current State | With Context Graph |
|---|---|
| Decisions stored flat | Decisions linked to components, files, ADRs |
| Error solutions isolated | Error solutions linked to sessions that discovered them |
| No cross-reference | "Which decisions affected this component?" answerable |
| Full database queries | Task-specific subgraphs (90% token reduction) |
| No provenance | Full lineage tracking |

For CODITECT Customers

| Current State | With Context Graph |
|---|---|
| Session history is flat text | Entities extracted and linked automatically |
| No relationship discovery | "What files are related to this bug?" answerable |
| Context window limits | Small, relevant context graphs fit easily |
| No governance | Policy-aware context filtering |
| No audit trails | Full decision lineage for compliance |

Consequences

Positive

  1. Relationship-aware queries - "What influenced this decision?" becomes possible
  2. Token efficiency - Context graphs are 10-100 nodes vs entire database
  3. Governance ready - Policy nodes can constrain agent actions
  4. Audit trails - Full provenance for compliance
  5. Multi-hop reasoning - Traverse relationships for complex queries
  6. Foundation for GraphRAG - Enables graph-aware retrieval

Negative

  1. Schema complexity - Five new tables (kg_nodes, kg_edges, kg_nodes_fts, context_graphs, context_graph_usage) plus an entity extraction pipeline
  2. Migration effort - Backfill existing data into nodes/edges
  3. Query complexity - Graph traversal is more complex than flat queries
  4. Storage overhead - Edges add storage (~2x for relationships)

Mitigations

  • Incremental migration - Build graph alongside existing tables
  • Tiered storage - Keep flat tables as source of truth, graph as index
  • Query abstraction - Hide graph complexity behind /cxq flags
  • Lazy extraction - Only create nodes for entities that matter

Related ADRs

  • ADR-118: Four-Tier Database Architecture (extends)
  • ADR-020: Context Extraction System (integration point)
  • ADR-021: Context Query System (new query types)
  • ADR-053: Cloud Context Sync (graph sync to PostgreSQL)
  • ADR-136: CODITECT Experience Framework (context graph per persona)
  • ADR-148: Database Schema Documentation Standard (new tables)
  • ADR-149: Query Language Evolution (graph query DSL)

Appendix: Schema Summary

New Tables in org.db (Tier 2 - Critical)

| Table | Purpose | Backup |
|---|---|---|
| kg_nodes | All entities as graph nodes | CRITICAL |
| kg_edges | Relationships between entities | CRITICAL |
| kg_nodes_fts | Full-text search over nodes | Regenerable |

New Tables in sessions.db (Tier 3 - Regenerable)

| Table | Purpose | Backup |
|---|---|---|
| context_graphs | Per-session context graph snapshots | Optional |
| context_graph_usage | How context graphs influenced tool calls | Optional |

Entity Types

| Type | Source | Example node_id |
|---|---|---|
| Decision | decisions table | decision:42 |
| ErrorSolution | error_solutions table | error:abc123 |
| SkillLearning | skill_learnings table | learning:789 |
| Component | platform.db components | component:agent/orchestrator |
| Session | Session metadata | session:uuid-here |
| File | File references | file:/path/to/file.py |
| Function | call_graph_functions | function:file.py:funcname |
| Track | PILOT tracks | track:A |
| ADR | Architecture decisions | adr:ADR-150 |
| Policy | Governance rules | policy:no-force-push |
| AuditEvent | Audit trail | audit:event-123 |
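Several of the example node_ids above contain extra colons or slashes after the type prefix (for instance file paths and function references), so any parser has to split on the first colon only. A minimal sketch, with `parse_node_id` as a hypothetical helper name:

```python
from typing import Tuple


def parse_node_id(node_id: str) -> Tuple[str, str]:
    """Split "{type}:{id}" on the first colon; the id part may itself contain colons."""
    prefix, sep, local_id = node_id.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a '{{type}}:{{id}}' node id: {node_id!r}")
    return prefix, local_id


parsed_function = parse_node_id("function:file.py:funcname")
parsed_file = parse_node_id("file:/path/to/file.py")
```

Note that the prefix is not always the lowercased type name (ErrorSolution uses `error:` and SkillLearning uses `learning:`), so mapping a prefix back to a node_type needs an explicit lookup rather than string manipulation.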

Edge Types

| Type | From → To | Example |
|---|---|---|
| INVOKES | Session → Component | Session used orchestrator agent |
| PRODUCES | Session → Decision | Session resulted in architecture decision |
| SOLVES | ErrorSolution → Error | Solution fixes TypeError pattern |
| BELONGS_TO | Entity → Track | Task belongs to Track A |
| DEFINES | ADR → Decision | ADR-118 defines database split |
| REFERENCES | Decision → File | Decision references database.py |
| SIMILAR_TO | Entity ↔ Entity | Semantic similarity |
| CALLS | Function → Function | Code call graph (existing) |
| GOVERNED_BY | Entity → Policy | Component governed by policy |
| RECORDED_IN | AuditEvent → Session | Audit event from this session |
| CREATED_BY | Entity → User | Decision made by user |

Track: J (Memory Intelligence)
Task: J.16.3
Author: Claude (Opus 4.5)
Created: 2026-02-03