ADR-151: Context Graph Evolution Architecture

Status

PROPOSED (2026-02-03)

Executive Summary

This ADR proposes evolving CODITECT's database architecture from a document-centric flat store to a relationship-aware context graph that enables:

Task-specific context graphs - Small, policy-aware subgraphs for each agent turn
Knowledge graph backbone - Persistent entity/relationship model across sessions
Governed decision surfaces - What an agent CAN do, not just what it knows
Multi-hop reasoning - Graph traversal for complex queries
Lineage and audit trails - Track how knowledge influenced decisions

Key Distinction:

Knowledge Graph: Persistent, global model of entities and relationships (institutional memory)
Context Graph: Task-specific, ephemeral subgraph representing "what matters right now"

Context

Problem Statement

CODITECT currently stores valuable data across four databases (ADR-118):

Database	Content	Current Structure
`org.db`	Decisions, skill learnings, error solutions	Flat tables, weak relationships
`sessions.db`	Messages, tool analytics, call graph	Tables + emerging graph (call_graph_*)
`platform.db`	Components, capabilities	Flat with FTS5 search
`projects.db`	Project metadata, embeddings	Flat with vector indices

Current Limitations:

No entity relationships - We store decisions but can't traverse "which decisions affected which components?"
No session-to-entity linking - Messages mention files as plain text, but nothing links a message to the file entity it refers to
No task-specific projections - Every query gets full database access, not relevant subgraphs
No governance layer - No way to express "agent can access X under policy Y"
No provenance tracking - Can't trace "this error solution came from this session discussing this ADR"

Evidence: Existing Graph Foundation

We already have a call graph structure in sessions.db:

-- Existing tables (10,204 functions, 73,319 edges)
call_graph_functions (node_id, name, file_path, start_line, end_line, language, signature, docstring, class_name)
call_graph_edges (caller_id, callee_name, call_line, call_file, arguments)
call_graph_memory (node_id, message_id, session_id, change_type, timestamp)

This shows the graph model already works for code at this scale. We need to extend it beyond code to all CODITECT entities.

Research Foundation

Analysis of context graph vs knowledge graph patterns reveals:

Aspect	Knowledge Graph	Context Graph
Scope	Large, comprehensive (millions of nodes)	Small, focused (10-100 nodes)
Lifecycle	Persistent, versioned	Ephemeral per task/turn
Purpose	What exists, factual relations	What's relevant NOW + allowed actions
Update	Slow, governed, curated	Built dynamically per request

Key Insight: CODITECT needs BOTH:

A knowledge graph as semantic backbone (org.db + enriched schemas)
A context graph builder that projects task-specific subgraphs

Decision

1. Entity-Relationship Model (Knowledge Graph Layer)

Extend the four-tier architecture with a unified entity model:

┌─────────────────────────────────────────────────────────────────┐
│                 KNOWLEDGE GRAPH BACKBONE                         │
│                                                                  │
│  Entity Types (Nodes):                                          │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐ │
│  │  Component  │ │   Session   │ │  Decision   │ │   Error    │ │
│  │  (agent,    │ │  (messages, │ │  (arch,     │ │  Solution  │ │
│  │  skill,     │ │  tool calls)│ │  api, ui)   │ │            │ │
│  │  command)   │ └─────────────┘ └─────────────┘ └────────────┘ │
│  └─────────────┘                                                │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌────────────┐ │
│  │   Project   │ │    File     │ │   Track     │ │   User     │ │
│  │  (metadata, │ │  (path,     │ │  (A-N,      │ │  (tenant,  │ │
│  │  config)    │ │  language)  │ │  task IDs)  │ │  team)     │ │
│  └─────────────┘ └─────────────┘ └─────────────┘ └────────────┘ │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐               │
│  │   Policy    │ │    ADR      │ │  AuditEvent │               │
│  │  (rules,    │ │  (status,   │ │  (who, what,│               │
│  │  constraints│ │  rationale) │ │  when, why) │               │
│  └─────────────┘ └─────────────┘ └─────────────┘               │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                 EDGE TYPES (Relationships)                       │
│                                                                  │
│  Structural:                                                     │
│  • INVOKES (Session → Component)                                │
│  • PRODUCES (Session → Decision)                                │
│  • SOLVES (ErrorSolution → Error pattern)                       │
│  • BELONGS_TO (Task → Track)                                    │
│  • DEFINES (ADR → Decision)                                     │
│                                                                  │
│  Semantic:                                                       │
│  • SIMILAR_TO (Component ↔ Component, Decision ↔ Decision)      │
│  • REFERENCES (Decision → File, Decision → ADR)                 │
│  • CALLS (Function → Function) [existing]                       │
│  • USES (Component → File)                                      │
│                                                                  │
│  Governance:                                                     │
│  • GOVERNED_BY (Entity → Policy)                                │
│  • RECORDED_IN (AuditEvent → Session)                           │
│  • CREATED_BY (Entity → User)                                   │
└─────────────────────────────────────────────────────────────────┘
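
The edge catalogue above can be read as a set of endpoint rules. The following is a minimal sketch, assuming a hypothetical `EDGE_RULES` table and `is_valid_edge` helper (neither is defined by this ADR), of how an extractor might reject edges whose endpoints do not match the diagram. SOLVES and BELONGS_TO are left out because their endpoints (Error pattern, Task) do not appear in the node-type diagram.

```python
from typing import Dict, Set, Tuple

# Allowed (from_type, to_type) pairs per edge type, transcribed from the diagram.
# "*" is a hypothetical wildcard standing for "any entity type".
EDGE_RULES: Dict[str, Set[Tuple[str, str]]] = {
    "INVOKES": {("Session", "Component")},
    "PRODUCES": {("Session", "Decision")},
    "DEFINES": {("ADR", "Decision")},
    "SIMILAR_TO": {("Component", "Component"), ("Decision", "Decision")},
    "REFERENCES": {("Decision", "File"), ("Decision", "ADR")},
    "CALLS": {("Function", "Function")},
    "USES": {("Component", "File")},
    "GOVERNED_BY": {("*", "Policy")},
    "RECORDED_IN": {("AuditEvent", "Session")},
    "CREATED_BY": {("*", "User")},
}


def is_valid_edge(edge_type: str, from_type: str, to_type: str) -> bool:
    """Return True if the edge type is known and its endpoints match a rule."""
    rules = EDGE_RULES.get(edge_type)
    if rules is None:
        return False
    return any(
        src in ("*", from_type) and dst in ("*", to_type)
        for src, dst in rules
    )


print(is_valid_edge("INVOKES", "Session", "Component"))
print(is_valid_edge("INVOKES", "Decision", "Component"))
```

Keeping the rules in one table means a new edge type is a data change, not a code change.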

2. Database Schema Evolution

Phase 1: Add `kg_nodes` and `kg_edges` Tables (org.db)

-- TIER 2: org.db (Critical Knowledge)

-- Universal node table (all entities get a node)
CREATE TABLE kg_nodes (
    node_id TEXT PRIMARY KEY,           -- Format: "{type}:{id}" e.g., "decision:42"
    node_type TEXT NOT NULL,            -- Entity type
    name TEXT NOT NULL,                 -- Human-readable name
    properties TEXT,                    -- JSON properties bag
    embedding_id TEXT,                  -- Vector embedding reference
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP,

    -- Multi-tenant (ADR-053)
    tenant_id TEXT,
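    project_id TEXT
);

-- Indexing (SQLite does not accept inline INDEX clauses inside CREATE TABLE)
CREATE INDEX idx_kg_nodes_type ON kg_nodes (node_type);
CREATE INDEX idx_kg_nodes_tenant ON kg_nodes (tenant_id);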

-- Typed edges with properties
CREATE TABLE kg_edges (
    edge_id INTEGER PRIMARY KEY AUTOINCREMENT,
    edge_type TEXT NOT NULL,            -- Relationship type
    from_node TEXT NOT NULL,            -- Source node_id
    to_node TEXT NOT NULL,              -- Target node_id
    properties TEXT,                    -- JSON edge properties
    weight REAL DEFAULT 1.0,            -- Relevance/confidence score
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (from_node) REFERENCES kg_nodes(node_id) ON DELETE CASCADE,
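    FOREIGN KEY (to_node) REFERENCES kg_nodes(node_id) ON DELETE CASCADE
);

-- Foreign keys are only enforced when PRAGMA foreign_keys = ON is set per connection
CREATE INDEX idx_kg_edges_type ON kg_edges (edge_type);
CREATE INDEX idx_kg_edges_from ON kg_edges (from_node);
CREATE INDEX idx_kg_edges_to ON kg_edges (to_node);
CREATE INDEX idx_kg_edges_pair ON kg_edges (from_node, to_node);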

-- Full-text search over nodes (external-content FTS5: keep in sync via triggers on kg_nodes or an explicit 'rebuild')
CREATE VIRTUAL TABLE kg_nodes_fts USING fts5(
    name, properties,
    content='kg_nodes',
    content_rowid='rowid'
);
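
Multi-hop queries over `kg_edges` can be expressed with a recursive common table expression. The sketch below is illustrative only: it uses an in-memory database with a trimmed-down `kg_edges` table, and the helper name `neighbors_within` and the sample rows are assumptions, not part of the proposed schema.

```python
import sqlite3


def neighbors_within(conn: sqlite3.Connection, start: str, max_depth: int = 2) -> dict:
    """Return {node_id: hop_count} for nodes reachable from start via outgoing edges."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node_id, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.to_node, w.depth + 1
            FROM kg_edges AS e
            JOIN walk AS w ON e.from_node = w.node_id
            WHERE w.depth < ?
        )
        SELECT node_id, MIN(depth) FROM walk GROUP BY node_id
        """,
        (start, max_depth),
    ).fetchall()
    return dict(rows)


# Demo data (hypothetical rows; only the columns needed for traversal)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kg_edges (edge_type TEXT, from_node TEXT, to_node TEXT)")
conn.executemany(
    "INSERT INTO kg_edges VALUES (?, ?, ?)",
    [
        ("PRODUCES", "session:s1", "decision:42"),
        ("REFERENCES", "decision:42", "file:database.py"),
        ("DEFINES", "adr:ADR-118", "decision:42"),
    ],
)

print(neighbors_within(conn, "session:s1", max_depth=1))
print(neighbors_within(conn, "session:s1", max_depth=2))
```

The depth bound in the recursive step also guarantees termination when the graph contains cycles.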

Phase 2: Add Context Graph Builder Tables (sessions.db)

-- TIER 3: sessions.db (Regenerable)

-- Per-session context graphs (task-specific projections)
CREATE TABLE context_graphs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    task_id TEXT,                       -- ADR-054 format (A.1.1)
    projection_mode TEXT DEFAULT 'anchor', -- anchor, semantic, policy_first
    anchor_nodes TEXT,                  -- JSON array of anchor node_ids

    -- Budget/limits
    max_nodes INTEGER DEFAULT 128,
    max_depth INTEGER DEFAULT 2,
    actual_node_count INTEGER,
    actual_edge_count INTEGER,

    -- Serialized graph (for replay/audit)
    graph_json TEXT,                    -- Serialized KGSubgraph

    -- Governance
    policies_applied TEXT,              -- JSON array of policy node_ids
    phi_node_count INTEGER DEFAULT 0,   -- For compliance tracking

    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);

-- Track which context graphs influenced which tool calls
CREATE TABLE context_graph_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    context_graph_id INTEGER NOT NULL,
    tool_call_id TEXT,                  -- Reference to tool_analytics
    usage_type TEXT,                    -- 'prompt_injection', 'tool_filter', 'validation'
    nodes_used TEXT,                    -- JSON array of nodes actually referenced
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,

    FOREIGN KEY (context_graph_id) REFERENCES context_graphs(id)
);
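
As a rough illustration of how these two tables might be used together, the sketch below stores one context graph snapshot and then records that a tool call consumed part of it. The helper names (`save_context_graph`, `record_usage`), the reduced column set, and the sample IDs are assumptions made for the example.

```python
import json
import sqlite3

# Reduced versions of the tables above (only the columns the example touches)
SCHEMA = """
CREATE TABLE context_graphs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    task_id TEXT,
    anchor_nodes TEXT,
    actual_node_count INTEGER,
    actual_edge_count INTEGER,
    graph_json TEXT
);
CREATE TABLE context_graph_usage (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    context_graph_id INTEGER NOT NULL REFERENCES context_graphs(id),
    tool_call_id TEXT,
    usage_type TEXT,
    nodes_used TEXT
);
"""


def save_context_graph(conn, session_id, task_id, anchors, nodes, edges) -> int:
    """Persist a snapshot and return its context_graphs.id."""
    cur = conn.execute(
        "INSERT INTO context_graphs (session_id, task_id, anchor_nodes, "
        "actual_node_count, actual_edge_count, graph_json) VALUES (?, ?, ?, ?, ?, ?)",
        (
            session_id,
            task_id,
            json.dumps(anchors),
            len(nodes),
            len(edges),
            json.dumps({"nodes": nodes, "edges": edges}),
        ),
    )
    return cur.lastrowid


def record_usage(conn, graph_id, tool_call_id, usage_type, nodes_used) -> None:
    """Link a tool call to the context graph nodes it actually referenced."""
    conn.execute(
        "INSERT INTO context_graph_usage (context_graph_id, tool_call_id, "
        "usage_type, nodes_used) VALUES (?, ?, ?, ?)",
        (graph_id, tool_call_id, usage_type, json.dumps(nodes_used)),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
graph_id = save_context_graph(
    conn,
    session_id="session-1",
    task_id="J.16.3",
    anchors=["decision:42"],
    nodes=["decision:42", "adr:ADR-118"],
    edges=[["DEFINES", "adr:ADR-118", "decision:42"]],
)
record_usage(conn, graph_id, "tool-call-1", "prompt_injection", ["decision:42"])
```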

3. Context Graph Builder Service

Python service that builds task-specific context graphs:

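# scripts/context_graph/builder.py

from typing import Dict, List, Optional

# KGNode, KGEdge, and KGSubgraph are the graph model types; this ADR does not specify their module.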

class CODITECTContextGraphBuilder:
    """
    Builds task-specific context graphs from the knowledge graph.

    Projection Modes:
    - anchor: Ego network around anchor nodes (task, session, component)
    - semantic: Vector similarity + KG neighborhood expansion
    - policy_first: Start from policies, intersect with relevant entities
    """

    def build_context_graph(
        self,
        anchors: List[str],          # Node IDs to anchor from
        mode: str = "anchor",
        max_nodes: int = 128,
        max_depth: int = 2,
        policies: Optional[List[str]] = None,
    ) -> KGSubgraph:
        """
        Build a task-specific context graph.

        Returns:
            KGSubgraph with nodes, edges, and governance metadata
        """
        nodes: Dict[str, KGNode] = {}
        edges: Dict[str, KGEdge] = {}

        # 1) Pull structural neighborhood
        for anchor in anchors:
            ego = self._neighborhood(anchor, depth=max_depth, limit=max_nodes)
            nodes.update({n.node_id: n for n in ego.nodes})
            edges.update({e.edge_id: e for e in ego.edges})

        # 2) Add semantic neighbors (optional)
        if mode == "semantic":
            semantic = self._semantic_expansion(list(nodes.keys()), k=20)
            nodes.update({n.node_id: n for n in semantic.nodes})
            edges.update({e.edge_id: e for e in semantic.edges})

        # 3) Apply governance overlay
        if policies:
            policy_nodes = self._get_policy_nodes(policies)
            nodes.update({n.node_id: n for n in policy_nodes})
            # Add GOVERNED_BY edges

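        # 4) Enforce the node budget, then drop edges whose endpoints were cut
        #    (assumes KGEdge exposes from_node/to_node, mirroring kg_edges)
        kept = dict(list(nodes.items())[:max_nodes])
        kept_edges = [
            e for e in edges.values()
            if e.from_node in kept and e.to_node in kept
        ]
        return KGSubgraph(
            nodes=list(kept.values()),
            edges=kept_edges[:max_nodes * 2],
        )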
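
The builder relies on a `_neighborhood` helper that this ADR does not spell out. One plausible reading of the "anchor" mode is a breadth-first ego network with a depth limit and a node budget. The sketch below works on an in-memory edge list, treats edges as undirected for expansion, and uses a hypothetical function name (`ego_network`); a real implementation would presumably query `kg_edges` instead.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (edge_type, from_node, to_node)


def ego_network(
    edges: List[Edge], anchor: str, depth: int = 2, limit: int = 128
) -> Tuple[List[str], List[Edge]]:
    """Breadth-first expansion around anchor, capped by depth and node count."""
    adjacency: Dict[str, List[str]] = {}
    for _, src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)  # expand in both directions

    order = [anchor]          # nodes in discovery order (nearest first)
    visited = {anchor}
    queue = deque([(anchor, 0)])
    while queue and len(order) < limit:
        node, dist = queue.popleft()
        if dist == depth:
            continue
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited and len(order) < limit:
                visited.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, dist + 1))

    # Keep only edges whose two endpoints both survived the budget
    kept = [e for e in edges if e[1] in visited and e[2] in visited]
    return order, kept


CHAIN = [
    ("CALLS", "function:a", "function:b"),
    ("CALLS", "function:b", "function:c"),
    ("CALLS", "function:c", "function:d"),
]
print(ego_network(CHAIN, "function:a", depth=2))
```

Because nodes are collected nearest-first, hitting the budget drops the most distant nodes rather than arbitrary ones.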

4. MCP Server Extension

Extend the existing coditect-call-graph MCP server:

from typing import Any, Dict, List, Optional

# Current MCP tools (already working):
# - index_file, index_directory
# - get_callers, get_callees
# - call_chain
# - memory_linked_search
# - call_graph_stats

# New tools to add:
@server.tool()
def build_context_graph(
    anchors: List[str],
    mode: str = "anchor",
    max_nodes: int = 128,
    max_depth: int = 2
) -> Dict[str, Any]:
    """Build a task-specific context graph from anchor nodes."""
    ...

@server.tool()
def query_knowledge_graph(
    query: str,
    node_types: Optional[List[str]] = None,
    limit: int = 20
) -> Dict[str, Any]:
    """Full-text + semantic search over knowledge graph nodes."""
    ...

@server.tool()
def get_entity_neighbors(
    node_id: str,
    edge_types: Optional[List[str]] = None,
    depth: int = 1
) -> Dict[str, Any]:
    """Get neighbors of an entity in the knowledge graph."""
    ...

@server.tool()
def trace_decision_lineage(
    decision_id: str,
    direction: str = "both"  # upstream, downstream, both
) -> Dict[str, Any]:
    """Trace what influenced a decision and what it influenced."""
    ...
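
The tool bodies above are left open. As one possible interpretation of `trace_decision_lineage`, the sketch below treats "upstream" as everything reachable by following edges backwards into the decision and "downstream" as everything reachable by following edges forwards. That interpretation, the in-memory edge list, and the helper name `trace_lineage` are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (edge_type, from_node, to_node)


def trace_lineage(
    edges: List[Edge], node_id: str, direction: str = "both"
) -> Dict[str, List[str]]:
    """Collect nodes that influenced node_id (upstream) or were influenced by it."""
    if direction not in ("upstream", "downstream", "both"):
        raise ValueError(f"unknown direction: {direction}")

    def reachable(forward: bool) -> List[str]:
        seen = set()
        stack = [node_id]
        while stack:
            current = stack.pop()
            for _, src, dst in edges:
                here, there = (src, dst) if forward else (dst, src)
                if here == current and there != node_id and there not in seen:
                    seen.add(there)
                    stack.append(there)
        return sorted(seen)

    result: Dict[str, List[str]] = {}
    if direction in ("upstream", "both"):
        result["upstream"] = reachable(forward=False)
    if direction in ("downstream", "both"):
        result["downstream"] = reachable(forward=True)
    return result


EDGES = [
    ("DEFINES", "adr:ADR-118", "decision:42"),
    ("PRODUCES", "session:s1", "decision:42"),
    ("REFERENCES", "decision:42", "file:database.py"),
]
print(trace_lineage(EDGES, "decision:42"))
```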

5. Integration with Existing Systems

┌─────────────────────────────────────────────────────────────────┐
│                     INTEGRATION ARCHITECTURE                     │
└─────────────────────────────────────────────────────────────────┘

/cx (ADR-020)                    /cxq (ADR-021)
    │                                │
    │ Extract messages               │ Query messages
    │ + entities + relationships     │ + knowledge graph
    ▼                                ▼
┌─────────────────────┐      ┌─────────────────────┐
│ Entity Extractor    │      │ Context Graph Query │
│                     │      │                     │
│ • Decision entities │      │ • /cxq --graph      │
│ • Error entities    │      │ • /cxq --neighbors  │
│ • Component refs    │      │ • /cxq --lineage    │
│ • File references   │      │ • /cxq --subgraph   │
└─────────────────────┘      └─────────────────────┘
         │                           │
         ▼                           ▼
┌─────────────────────────────────────────────────────────────────┐
│                    KNOWLEDGE GRAPH (org.db)                      │
│  kg_nodes + kg_edges + existing tables (decisions, etc.)         │
└─────────────────────────────────────────────────────────────────┘
         │
         │ Project per task
         ▼
┌─────────────────────────────────────────────────────────────────┐
│                   CONTEXT GRAPH (sessions.db)                    │
│  context_graphs + context_graph_usage                            │
└─────────────────────────────────────────────────────────────────┘
         │
         │ Inject into agent
         ▼
┌─────────────────────────────────────────────────────────────────┐
│                        AGENT TURN                                │
│  • Context graph serialized into prompt                          │
│  • Tool calls filtered by policy nodes                          │
│  • Audit events recorded back to KG                              │
└─────────────────────────────────────────────────────────────────┘
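
The last box in the diagram (serialize the graph into the prompt, filter tool calls by policy nodes) could look roughly like the sketch below. The text layout of the prompt section and the `denied_tools` property on Policy nodes are hypothetical; this ADR does not define how policies encode their rules.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]
Edge = Dict[str, str]


def render_context(nodes: List[Node], edges: List[Edge]) -> str:
    """Serialize a context graph into a compact text block for the prompt."""
    lines = ["Context graph:"]
    for node in nodes:
        lines.append(f"- [{node['node_type']}] {node['node_id']}: {node['name']}")
    for edge in edges:
        lines.append(f"- {edge['from_node']} -{edge['edge_type']}-> {edge['to_node']}")
    return "\n".join(lines)


def allowed_tools(tools: List[str], nodes: List[Node]) -> List[str]:
    """Drop tools that any Policy node in the graph denies (hypothetical property)."""
    denied = set()
    for node in nodes:
        if node["node_type"] == "Policy":
            denied.update(node.get("properties", {}).get("denied_tools", []))
    return [tool for tool in tools if tool not in denied]


NODES = [
    {"node_id": "decision:42", "node_type": "Decision", "name": "Split databases"},
    {
        "node_id": "policy:no-force-push",
        "node_type": "Policy",
        "name": "No force push",
        "properties": {"denied_tools": ["git_push_force"]},
    },
]
EDGES = [
    {"edge_type": "GOVERNED_BY", "from_node": "decision:42", "to_node": "policy:no-force-push"}
]

print(render_context(NODES, EDGES))
print(allowed_tools(["git_commit", "git_push_force"], NODES))
```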

Implementation Phases

Phase 1: Schema Foundation (2 weeks)

Tasks:

J.16.3.1: Add kg_nodes and kg_edges tables to org.db
J.16.3.2: Add context_graphs and context_graph_usage tables to sessions.db
J.16.3.3: Create entity extraction pipeline from existing tables
J.16.3.4: Backfill nodes from decisions, error_solutions, skill_learnings

Migration Path:

-- Backfill kg_nodes from existing decisions
INSERT INTO kg_nodes (node_id, node_type, name, properties)
SELECT
    'decision:' || id,
    'Decision',
    SUBSTR(decision, 1, 100),
    json_object(
        'decision_type', decision_type,
        'confidence', confidence,
        'rationale', rationale
    )
FROM decisions;
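
Because the graph is built alongside the existing tables, the backfill will likely run more than once. The sketch below shows one way to make it safe to re-run, using `INSERT OR IGNORE` keyed on `node_id`. The `decisions` columns are inferred from the migration above, and the function name and sample rows are assumptions.

```python
import json
import sqlite3


def backfill_decisions(conn: sqlite3.Connection) -> int:
    """Copy decisions into kg_nodes; return how many new nodes were created."""
    before = conn.execute("SELECT COUNT(*) FROM kg_nodes").fetchone()[0]
    rows = conn.execute(
        "SELECT id, decision, decision_type, confidence, rationale FROM decisions"
    ).fetchall()
    conn.executemany(
        # OR IGNORE skips rows whose node_id already exists, so re-runs are no-ops
        "INSERT OR IGNORE INTO kg_nodes (node_id, node_type, name, properties) "
        "VALUES (?, ?, ?, ?)",
        [
            (
                f"decision:{row_id}",
                "Decision",
                text[:100],
                json.dumps(
                    {"decision_type": dtype, "confidence": conf, "rationale": why}
                ),
            )
            for row_id, text, dtype, conf, why in rows
        ],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM kg_nodes").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE decisions (
        id INTEGER PRIMARY KEY, decision TEXT, decision_type TEXT,
        confidence REAL, rationale TEXT
    );
    CREATE TABLE kg_nodes (
        node_id TEXT PRIMARY KEY, node_type TEXT NOT NULL,
        name TEXT NOT NULL, properties TEXT
    );
    INSERT INTO decisions VALUES (1, 'Split storage into four databases', 'arch', 0.9, 'Isolation');
    INSERT INTO decisions VALUES (2, 'Use FTS5 for component search', 'arch', 0.8, 'Built in');
    """
)
inserted_first = backfill_decisions(conn)
inserted_again = backfill_decisions(conn)
print(inserted_first, inserted_again)
```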

Phase 2: Context Graph Builder (2 weeks)

Tasks:

J.16.3.5: Implement CODITECTContextGraphBuilder service
J.16.3.6: Add projection modes (anchor, semantic, policy_first)
J.16.3.7: Integrate with /cx for automatic entity extraction
J.16.3.8: Add graph query flags to /cxq (--graph, --neighbors, --lineage, --subgraph)

Phase 3: MCP Server Extension (1 week)

Tasks:

J.16.3.9: Add build_context_graph MCP tool
J.16.3.10: Add query_knowledge_graph MCP tool
J.16.3.11: Add get_entity_neighbors MCP tool
J.16.3.12: Add trace_decision_lineage MCP tool

Phase 4: Agent Integration (2 weeks)

Tasks:

J.16.3.13: Modify agent orchestrator to build context graphs per turn
J.16.3.14: Serialize context graphs into prompts
J.16.3.15: Implement governance/policy filtering
J.16.3.16: Add audit event recording

Value Proposition

For CODITECT Internal Development

Current State	With Context Graph
Decisions stored flat	Decisions linked to components, files, ADRs
Error solutions isolated	Error solutions linked to sessions that discovered them
No cross-reference	"Which decisions affected this component?" answerable
Full database queries	Task-specific subgraphs (90% token reduction)
No provenance	Full lineage tracking

For CODITECT Customers

Current State	With Context Graph
Session history is flat text	Entities extracted and linked automatically
No relationship discovery	"What files are related to this bug?" answerable
Context window limits	Small, relevant context graphs fit easily
No governance	Policy-aware context filtering
No audit trails	Full decision lineage for compliance

Consequences

Positive

Relationship-aware queries - "What influenced this decision?" becomes possible
Token efficiency - Context graphs are 10-100 nodes vs entire database
Governance ready - Policy nodes can constrain agent actions
Audit trails - Full provenance for compliance
Multi-hop reasoning - Traverse relationships for complex queries
Foundation for GraphRAG - Enables graph-aware retrieval

Negative

Schema complexity - Four new tables, one FTS5 virtual table, and an entity extraction pipeline
Migration effort - Backfill existing data into nodes/edges
Query complexity - Graph traversal is more complex than flat queries
Storage overhead - Edges add storage (~2x for relationships)

Mitigations

Incremental migration - Build graph alongside existing tables
Tiered storage - Keep flat tables as source of truth, graph as index
Query abstraction - Hide graph complexity behind /cxq flags
Lazy extraction - Only create nodes for entities that matter

Related

ADR-118: Four-Tier Database Architecture (extends)
ADR-020: Context Extraction System (integration point)
ADR-021: Context Query System (new query types)
ADR-053: Cloud Context Sync (graph sync to PostgreSQL)
ADR-136: CODITECT Experience Framework (context graph per persona)
ADR-148: Database Schema Documentation Standard (new tables)
ADR-149: Query Language Evolution (graph query DSL)

Appendix: Schema Summary

New Tables in org.db (Tier 2 - Critical)

Table	Purpose	Backup
`kg_nodes`	All entities as graph nodes	CRITICAL
`kg_edges`	Relationships between entities	CRITICAL
`kg_nodes_fts`	Full-text search over nodes	Regenerable

New Tables in sessions.db (Tier 3 - Regenerable)

Table	Purpose	Backup
`context_graphs`	Per-session context graph snapshots	Optional
`context_graph_usage`	How context graphs influenced tool calls	Optional

Entity Types

Type	Source	Example node_id
Decision	`decisions` table	`decision:42`
ErrorSolution	`error_solutions` table	`error:abc123`
SkillLearning	`skill_learnings` table	`learning:789`
Component	`platform.db` components	`component:agent/orchestrator`
Session	Session metadata	`session:uuid-here`
File	File references	`file:/path/to/file.py`
Function	`call_graph_functions`	`function:file.py:funcname`
Track	PILOT tracks	`track:A`
ADR	Architecture decisions	`adr:ADR-150`
Policy	Governance rules	`policy:no-force-push`
AuditEvent	Audit trail	`audit:event-123`
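
Several example IDs in this table contain more than one colon (`file:/path/to/file.py`, `function:file.py:funcname`), so any parser for the `{type}:{id}` format should split on the first colon only. A small sketch, with hypothetical helper names:

```python
from typing import Tuple


def make_node_id(prefix: str, local_id: str) -> str:
    """Build a node_id in the "{type}:{id}" format."""
    if not prefix or ":" in prefix:
        raise ValueError(f"invalid type prefix: {prefix!r}")
    return f"{prefix}:{local_id}"


def parse_node_id(node_id: str) -> Tuple[str, str]:
    """Split a node_id into (type prefix, local id) at the first colon only."""
    prefix, sep, local_id = node_id.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"malformed node_id: {node_id!r}")
    return prefix, local_id


print(parse_node_id("decision:42"))
print(parse_node_id("function:file.py:funcname"))
```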

Edge Types

Type	From → To	Example
INVOKES	Session → Component	Session used orchestrator agent
PRODUCES	Session → Decision	Session resulted in architecture decision
SOLVES	ErrorSolution → Error	Solution fixes TypeError pattern
BELONGS_TO	Task → Track	Task belongs to Track A
DEFINES	ADR → Decision	ADR-118 defines database split
REFERENCES	Decision → File	Decision references database.py
SIMILAR_TO	Entity ↔ Entity	Semantic similarity
CALLS	Function → Function	Code call graph (existing)
GOVERNED_BY	Entity → Policy	Component governed by policy
RECORDED_IN	AuditEvent → Session	Audit event from this session
CREATED_BY	Entity → User	Decision made by user

Track: J (Memory Intelligence)
Task: J.16.3
Author: Claude (Opus 4.5)
Created: 2026-02-03
