ADR-080: MCP Semantic Search with Hybrid RRF Fusion

Document: ADR-080-mcp-semantic-search-hybrid-rrf
Version: 1.0.0
Purpose: Document architectural decision for MCP-based semantic search with hybrid RRF fusion
Audience: Framework contributors, AI researchers, system architects
Date Created: 2026-01-17
Last Updated: 2026-01-17
Status: ACCEPTED
Task ID: H.5.5.1
Related ADRs: ADR-020 (Context Extraction), ADR-021 (Context Query), ADR-003 (ChromaDB)
Related Documents:
- tools/mcp-semantic-search/server.py
- tools/mcp-semantic-search/CLAUDE.md
- tests/tools/test_mcp_semantic_search.py

Context and Problem Statement

Background

The CODITECT framework has existing semantic search capabilities implemented in scripts/context-db.py (documented in ADR-021). However, these capabilities are:

  1. CLI-bound: Only accessible via /cxq command, not programmatically by AI agents
  2. Single-mode: FTS5 and semantic search operate independently, not combined
  3. Not MCP-enabled: Cannot be used as MCP tools by Claude Code or other MCP clients

Problem Statement

How do we expose semantic search capabilities to AI agents via MCP while improving search quality through hybrid fusion algorithms?

Requirements

Must-Have:

  • MCP protocol compliance for tool integration
  • Combine FTS5 and vector search for better results
  • Use existing infrastructure (context.db, embeddings)
  • Sub-second query latency for interactive use

Should-Have:

  • Configurable fusion weights
  • Multiple search modes (hybrid, semantic-only, keyword-only)
  • Decision and error knowledge base search

Nice-to-Have:

  • Streaming results
  • Search analytics
  • Result caching

Decision Drivers

Technical Constraints

  • T1: Must use existing SQLite database (context.db)
  • T2: Must use existing embeddings (all-MiniLM-L6-v2, 384 dimensions)
  • T3: Must be deployable without additional infrastructure
  • T4: Must support Claude Code MCP integration

Performance Goals

  • P1: Query latency <500ms for interactive use
  • P2: Handle 143K+ messages corpus efficiently
  • P3: Minimal memory footprint (<200MB)

User Experience

  • UX1: Consistent results across search modes
  • UX2: Relevance ranking that makes sense to users
  • UX3: Easy integration with existing workflows

Considered Options

Option A: ChromaDB Migration (Full Rewrite)

Description: Migrate from SQLite to ChromaDB, a purpose-built vector database.

Pros:

  • Native vector operations optimized for similarity search
  • Built-in HNSW indexing for scalability
  • Growing ecosystem and community

Cons:

  • Requires data migration from SQLite
  • Loses FTS5 capabilities (weaker text search)
  • Additional dependency and infrastructure
  • Significant implementation effort (16-24 hours)

Estimated Effort: 16-24 hours

Rejected: Too disruptive, loses FTS5 strengths, high effort.


Option B: Enhanced Hybrid Search (RRF Fusion) - SELECTED

Description: Add MCP server that combines existing FTS5 and vector search using Reciprocal Rank Fusion (RRF).

Pros:

  • Uses existing infrastructure (no migration)
  • Combines strengths of both search methods
  • RRF is proven algorithm (used by Elastic, Cohere)
  • Moderate implementation effort (6-8 hours)
  • MCP-native for AI agent access

Cons:

  • Brute-force vector search (no ANN indexing)
  • Limited to current scale (~150K messages)

Estimated Effort: 6-8 hours

Selected: Best value/effort ratio, builds on existing infrastructure.


Option C: Full AST-Based Chunking + Dedicated Vector Store

Description: Rewrite context extraction with AST-based chunking for code-aware semantic search, plus dedicated vector store.

Pros:

  • Optimal semantic chunks for code (functions, classes, methods)
  • Improved code search relevance
  • Future-proof architecture

Cons:

  • Requires rewriting extraction pipeline
  • Language-specific AST parsers needed
  • Significant implementation effort (16-24 hours)
  • Breaking change to existing data

Estimated Effort: 16-24 hours

Deferred: Good future direction (H.5.5.2-H.5.5.4), but too much scope for H.5.5.1.


Decision Outcome

CHOSEN: Option B - Enhanced Hybrid Search with RRF Fusion

Rationale

  1. Value vs Effort: Option B delivers 80% of the value at 30% of the effort
  2. Incremental: Builds on existing infrastructure without migration risk
  3. MCP-First: Designed for AI agent access from the start
  4. Proven Algorithm: RRF is battle-tested in production systems

Future Path

Option B enables Option C as a future enhancement:

  • H.5.5.2: Add call graph navigation tools
  • H.5.5.3: Create impact analysis MCP tool
  • H.5.5.4: Add document RAG integration (potential AST chunking)

Technical Implementation

Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                  MCP SEMANTIC SEARCH SERVER                  │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  MCP Client (Claude Code)                                    │
│            │                                                 │
│            ▼                                                 │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ MCP Tools                                              │  │
│  │ • hybrid_search     - FTS5 + Vector with RRF fusion    │  │
│  │ • semantic_search   - Vector similarity only           │  │
│  │ • keyword_search    - FTS5 only                        │  │
│  │ • search_decisions  - Decision knowledge base          │  │
│  │ • search_errors     - Error-solution pairs             │  │
│  │ • context_stats     - Database statistics              │  │
│  └────────────────────────────────────────────────────────┘  │
│            │                                                 │
│            ▼                                                 │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Reciprocal Rank Fusion (RRF)                           │  │
│  │                                                        │  │
│  │   RRF_score = Σ (weight_i / (k + rank_i))              │  │
│  │                                                        │  │
│  │   k = 60 (standard constant)                           │  │
│  │   Default weights: FTS=0.4, Vector=0.6                 │  │
│  └────────────────────────────────────────────────────────┘  │
│            │                                                 │
│            ▼                                                 │
│  ┌────────────────────────────────────────────────────────┐  │
│  │ Search Backends                                        │  │
│  │                                                        │  │
│  │   FTS5 Search            Vector Search                 │  │
│  │  ┌────────────┐         ┌────────────┐                 │  │
│  │  │ messages   │         │ embeddings │                 │  │
│  │  │ _fts       │         │ table      │                 │  │
│  │  └────────────┘         └────────────┘                 │  │
│  │        │                      │                        │  │
│  │        └──────────┬───────────┘                        │  │
│  │                   ▼                                    │  │
│  │               context.db                               │  │
│  └────────────────────────────────────────────────────────┘  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```

Reciprocal Rank Fusion (RRF) Algorithm

RRF combines ranked lists from multiple search methods into a single ranking:

```python
from typing import Dict, List

def rrf_fusion(
    fts_results: List[Dict],
    vector_results: List[Dict],
    k: int = 60,
    fts_weight: float = 0.4,
    vector_weight: float = 0.6
) -> List[Dict]:
    """
    Reciprocal Rank Fusion algorithm.

    RRF_score(d) = Σ (weight_i / (k + rank_i(d)))

    Args:
        fts_results: Ranked results from FTS5 search
        vector_results: Ranked results from vector search
        k: RRF constant (default 60, standard value)
        fts_weight: Weight for FTS5 results (0.0-1.0)
        vector_weight: Weight for vector results (0.0-1.0)

    Returns:
        Fused results sorted by combined RRF score
    """
    scores = {}

    # Add FTS5 contributions
    for rank, result in enumerate(fts_results, start=1):
        doc_id = result['id']
        scores[doc_id] = scores.get(doc_id, 0) + fts_weight / (k + rank)

    # Add vector contributions
    for rank, result in enumerate(vector_results, start=1):
        doc_id = result['id']
        scores[doc_id] = scores.get(doc_id, 0) + vector_weight / (k + rank)

    # Sort by combined score (loop variables renamed so they do not shadow k)
    return sorted(
        [{'id': doc_id, 'rrf_score': score} for doc_id, score in scores.items()],
        key=lambda x: x['rrf_score'],
        reverse=True
    )
```

Why k=60?

  • Standard value used by Elastic, Cohere, and academic research
  • Prevents high ranks from dominating (rank 1 gets score 1/61, not 1/1)
  • Empirically validated across many retrieval tasks

Default Weights (FTS=0.4, Vector=0.6):

  • Vector search better for semantic/conceptual queries
  • FTS5 better for exact matches and technical terms
  • 60/40 split favors semantic while preserving keyword precision
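To make the weighting concrete, here is a minimal sketch of how one document's RRF score accumulates. The ranks are hypothetical: a document placed 1st by FTS5 and 3rd by vector search.

```python
K = 60             # standard RRF constant
FTS_WEIGHT = 0.4
VECTOR_WEIGHT = 0.6

# Hypothetical example: document ranked 1st by FTS5, 3rd by vector search
fts_rank, vector_rank = 1, 3

# Each list contributes weight / (k + rank); note rank 1 yields 0.4/61, not 0.4/1
score = FTS_WEIGHT / (K + fts_rank) + VECTOR_WEIGHT / (K + vector_rank)

print(round(score, 6))  # → 0.016081
```

Because k dominates the denominator, swapping the top few ranks changes scores only gradually, which is exactly the damping property the k=60 rationale above describes.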

MCP Tool Definitions

```python
from typing import Any, Dict, List, Optional

@mcp_tool
def hybrid_search(
    query: str,
    limit: int = 20,
    fts_weight: float = 0.4,
    vector_weight: float = 0.6,
    vector_threshold: float = 0.3
) -> List[Dict[str, Any]]:
    """
    Hybrid search combining FTS5 keyword search and vector similarity.

    Uses Reciprocal Rank Fusion (RRF) to combine rankings.
    Best for general queries where you want both exact matches and semantic similarity.
    """

@mcp_tool
def semantic_search(
    query: str,
    limit: int = 20,
    threshold: float = 0.3
) -> List[Dict[str, Any]]:
    """
    Pure vector similarity search using embeddings.

    Best for conceptual queries where exact keywords may not be present.
    """

@mcp_tool
def keyword_search(
    query: str,
    limit: int = 20
) -> List[Dict[str, Any]]:
    """
    Pure FTS5 full-text search.

    Best for exact term matching, error messages, or technical identifiers.
    """

@mcp_tool
def search_decisions(
    query: str,
    limit: int = 10,
    decision_type: Optional[str] = None
) -> List[Dict[str, Any]]:
    """
    Search the decision knowledge base.

    Returns architectural and technical decisions with rationale.
    """

@mcp_tool
def search_errors(
    query: str,
    limit: int = 10
) -> List[Dict[str, Any]]:
    """
    Search the error-solution knowledge base.

    Returns error signatures and their verified solutions.
    """

@mcp_tool
def context_stats() -> Dict[str, Any]:
    """
    Get database statistics.

    Returns message counts, embedding coverage, knowledge base sizes.
    """
```

Configuration

MCP Server Registration (.mcp.json):

```json
{
  "mcpServers": {
    "coditect-semantic-search": {
      "command": "python3",
      "args": [
        "/Users/halcasteel/.coditect/tools/mcp-semantic-search/server.py",
        "--mcp"
      ],
      "env": {
        "CONTEXT_DB_PATH": "~/PROJECTS/.coditect-data/context-storage/context.db"
      }
    }
  }
}
```

CLI Usage (non-MCP):

```shell
# Hybrid search (default)
python3 server.py "authentication error" --limit 10

# Semantic only
python3 server.py "authentication error" --mode semantic

# Keyword only
python3 server.py "authentication error" --mode keyword

# Custom weights
python3 server.py "authentication error" --fts-weight 0.3 --vector-weight 0.7
```

Performance Characteristics

Query Latency (143,743 messages, 99.99% embedding coverage):

| Search Mode  | Latency | Notes                         |
|--------------|---------|-------------------------------|
| FTS5 only    | <50ms   | SQLite native                 |
| Vector only  | ~200ms  | Brute-force cosine similarity |
| Hybrid (RRF) | ~300ms  | Both + fusion                 |

Memory Usage:

  • Embedding model: ~90MB (all-MiniLM-L6-v2)
  • Query processing: <50MB
  • Peak: ~150MB during embedding generation

Embedding Coverage:

  • Messages: 143,743 / 143,750 (99.99%)
  • Generated in: ~4 hours (batch processing)

Consequences

Positive Consequences

P1: MCP Integration

  • AI agents can now access semantic search programmatically
  • Claude Code can use search results for context injection
  • Enables autonomous knowledge retrieval

P2: Improved Search Quality

  • Hybrid search combines keyword precision + semantic recall
  • RRF fusion is proven algorithm with predictable behavior
  • Configurable weights allow tuning for specific use cases

P3: Minimal Disruption

  • Uses existing database and embeddings
  • No data migration required
  • Existing /cxq command remains functional

P4: Foundation for Future Work

  • H.5.5.2-H.5.5.4 can build on this infrastructure
  • Option C (AST chunking) remains viable future enhancement
  • MCP pattern can be applied to other tools

Negative Consequences

N1: Brute-Force Vector Search

  • O(n) complexity for vector similarity
  • Acceptable at current scale (143K messages)
  • May need ANN indexing at 500K+ messages
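The O(n) cost comes from scoring the query against every stored vector. A pure-Python sketch of that brute-force scan, using toy 3-dimensional vectors in place of the real 384-dimensional all-MiniLM-L6-v2 embeddings (production code would use numpy for the same math):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy corpus: id -> embedding (hypothetical data)
corpus = {
    "msg-1": [1.0, 0.0, 0.0],
    "msg-2": [0.9, 0.1, 0.0],
    "msg-3": [0.0, 1.0, 0.0],
}
query = [1.0, 0.0, 0.0]
threshold = 0.3  # mirrors the vector_threshold parameter

# O(n) scan: score every embedding, drop those below the threshold, rank the rest
results = sorted(
    ((doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
results = [(doc_id, score) for doc_id, score in results if score >= threshold]
print(results[0][0])  # → msg-1
```

At 143K vectors this scan is the ~200ms line in the latency table; an ANN index (HNSW, IVF) would replace the full loop with a sublinear candidate lookup.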

N2: No Real-Time Updates

  • Embeddings generated in batch
  • New messages require re-running /cx --with-embeddings
  • Acceptable for session-based workflow

N3: Single Model Dependency

  • Tied to all-MiniLM-L6-v2 embeddings
  • Model upgrade requires regenerating all embeddings
  • Mitigated by batch regeneration capability

Option B vs Option C: Value Analysis

| Dimension             | Option B (Selected)       | Option C (Deferred)         |
|-----------------------|---------------------------|-----------------------------|
| Implementation Effort | 6-8 hours                 | 16-24 hours                 |
| Value Delivered       | 80%                       | 100%                        |
| Risk                  | Low (uses existing infra) | Medium (migration required) |
| Breaking Changes      | None                      | Yes (extraction pipeline)   |
| MCP Integration       | Yes                       | Yes                         |
| Code-Aware Search     | No                        | Yes (AST chunking)          |
| Scalability           | ~500K messages            | Millions                    |
| Time to Value         | Immediate                 | 2-3 weeks                   |

Conclusion: Option B is the right choice for H.5.5.1. Option C features can be added incrementally in H.5.5.2-H.5.5.4 without disrupting the foundation.


Competitive Analysis (January 2026)

Market Landscape

Research conducted January 2026 reveals that AST-based code analysis and call graphs are now table stakes in the AI coding assistant market.

| Tool            | AST Analysis      | Call Graph | Impact Analysis | Memory  | License     |
|-----------------|-------------------|------------|-----------------|---------|-------------|
| Code Pathfinder | Yes (tree-sitter) | Yes        | Partial         | No      | AGPL-3.0    |
| Cursor          | Yes (tree-sitter) | Limited    | No              | No      | Proprietary |
| JetBrains AI    | Yes (native)      | Yes        | Yes             | Limited | Proprietary |
| Augment Code    | Yes               | Yes        | Limited         | Limited | Proprietary |
| CODITECT        | Planned           | Planned    | Planned         | Unique  | Proprietary |

Code Pathfinder (Direct Competitor)

Repository: github.com/shivasurya/code-pathfinder

Code Pathfinder is an open-source (AGPL-3.0) MCP server offering:

  • Tree-sitter AST parsing (Java, Python, Dockerfile)
  • Call graph generation with 5-pass analysis
  • ANTLR-based query DSL
  • CI/CD integration with SARIF output

Key Technical Decisions (to learn from):

  • 5 parallel workers for file parsing
  • SHA-256 node IDs for determinism
  • Lazy loading with byte offsets (reduces memory from 2.32GB to 2.18GB)
  • Object pooling for environment maps

CODITECT's True Moat

What competitors have (table stakes):

  • AST parsing
  • Call graph navigation
  • Basic code search

What CODITECT NOW has (unique):

  • Cross-session memory (ADR-020, ADR-021)
  • Decision tracking with rationale
  • Error-solution knowledge base
  • Hybrid RRF semantic search (H.5.5.1 - this ADR)
  • Memory-linked call graph (H.5.5.2 - COMPLETE)
  • AST parsing via tree-sitter (Python, JavaScript, TypeScript)

The unique combination:

```
╔═══════════════════════════════════════════════════════════════╗
║                     CODITECT's True Moat                      ║
╠═══════════════════════════════════════════════════════════════╣
║ Competitors (Cursor, JetBrains, Code Pathfinder):             ║
║   ✓ AST parsing           ← Table stakes                      ║
║   ✓ Call graph            ← Table stakes                      ║
║   ✗ Cross-session memory  ← Missing                           ║
║   ✗ Decision awareness    ← Missing                           ║
╠═══════════════════════════════════════════════════════════════╣
║ CODITECT:                                                     ║
║   ✓ AST parsing           ← Now implemented (tree-sitter)     ║
║   ✓ Call graph            ← Now implemented (H.5.5.2)         ║
║   ✓ Cross-session memory  ← Unique (memory_linked_search)     ║
║   ✓ Decision awareness    ← Unique (ADR integration)          ║
╚═══════════════════════════════════════════════════════════════╝
```

No competitor can answer:

  • "What did I change last session that might cause this error?"
  • "Which architectural decisions constrain this refactoring?"
  • "Show me all times I've fixed this type of error"
  • "When was this function last discussed and what decisions affected it?" (H.5.5.2)

Strategic Implications for H.5.5.x

| Task    | Original Goal         | Revised Goal (Memory-Aware) | Status      |
|---------|-----------------------|-----------------------------|-------------|
| H.5.5.1 | Semantic search       | Hybrid RRF fusion           | ✅ COMPLETE |
| H.5.5.2 | Call graph navigation | Memory-linked call graph    | ✅ COMPLETE |
| H.5.5.3 | Impact analysis       | Decision-aware impact       | Pending     |
| H.5.5.4 | Document RAG          | Cross-session doc retrieval | Pending     |

H.5.5.2 Implementation Details (Completed Jan 17, 2026)

MCP Call Graph Server: tools/mcp-call-graph/server.py (700+ lines)

| Tool                 | Description                                      |
|----------------------|--------------------------------------------------|
| index_file           | Index source file into call graph                |
| index_directory      | Batch index directory                            |
| get_callers          | Find functions that call a given function        |
| get_callees          | Find functions called by a given function        |
| call_chain           | Find call paths between functions                |
| memory_linked_search | CODITECT UNIQUE - call graph with memory context |
| call_graph_stats     | Database statistics                              |
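At its core, call_chain is a shortest-path search over call_graph_edges. A minimal breadth-first sketch using an in-memory adjacency map with hypothetical function names (the real server reads edges from SQLite):

```python
from collections import deque

# Hypothetical edges: caller -> callees (stands in for call_graph_edges rows)
edges = {
    "handle_request": ["authenticate", "render"],
    "authenticate": ["check_token"],
    "check_token": ["decode_jwt"],
}

def call_chain(start, target, max_depth=10):
    """Return one caller -> ... -> target path via BFS, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) > max_depth:
            continue
        for callee in edges.get(path[-1], []):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None

print(call_chain("handle_request", "decode_jwt"))
# → ['handle_request', 'authenticate', 'check_token', 'decode_jwt']
```

BFS guarantees the shortest path comes back first, and the seen set keeps recursive call cycles from looping forever.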

Current Index Stats:

  • Functions: 5,590
  • Call Edges: 55,548
  • Files Indexed: 441

Database Schema:

```sql
call_graph_functions  -- Functions with signatures, docstrings
call_graph_edges      -- Call relationships
call_graph_memory     -- Links to session messages (CODITECT UNIQUE)
```

Tests: 22 passing (tests/tools/test_mcp_call_graph.py)

Research References

Full academic research and competitive analysis is available in CODE-INTELLIGENCE-RESEARCH.md.

Key papers informing this decision:

  • Cormack et al. (2009) - RRF algorithm foundation
  • Microsoft GraphRAG (2024) - Knowledge graph + RAG
  • TAILOR (2023) - Code Property Graph learning

Files Created/Modified

| File                                       | Purpose                               |
|--------------------------------------------|---------------------------------------|
| tools/mcp-semantic-search/server.py        | MCP server implementation (500 lines) |
| tools/mcp-semantic-search/CLAUDE.md        | Tool documentation                    |
| tools/mcp-semantic-search/requirements.txt | Dependencies                          |
| tests/tools/test_mcp_semantic_search.py    | Unit tests (26 tests)                 |
| .mcp.json (rollout-master)                 | MCP server registration               |

Validation

Test Coverage

```
tests/tools/test_mcp_semantic_search.py - 26 tests
├── TestEmbeddingUtils (5 tests)
├── TestFTS5Search (4 tests)
├── TestVectorSearch (3 tests)
├── TestHybridSearch (5 tests)
├── TestDecisionSearch (2 tests)
├── TestErrorSearch (2 tests)
├── TestDatabaseStats (3 tests)
└── TestRRFScoring (2 tests)
```

Acceptance Criteria

  • MCP server starts and registers tools
  • hybrid_search returns fused results
  • semantic_search returns vector-only results
  • keyword_search returns FTS5-only results
  • search_decisions queries decision knowledge base
  • search_errors queries error-solution pairs
  • context_stats returns database metrics
  • CLI mode works for testing
  • All 26 unit tests passing

Indexing Strategy (v1.3.0)

Problem: Duplicate Data on Reindex

When reindexing files, naive append strategies cause data duplication:

  • Functions: INSERT OR REPLACE works (upserts by key)
  • Edges: INSERT appends, causing duplicates on each reindex

Solution: Delete+Insert Strategy

Per dbt incremental strategies and Microsoft best practices:

```python
# Before inserting edges for a file, delete the old ones
cursor.execute("""
    DELETE FROM call_graph_edges
    WHERE caller_id IN (
        SELECT node_id FROM call_graph_functions
        WHERE file_path = ?
    )
""", (file_path,))
# Then INSERT new edges
```
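Wrapped in a single transaction, the delete+insert pair makes reindexing idempotent. A self-contained sketch against a simplified toy schema (columns and the reindex_file helper are illustrative, not the real server code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE call_graph_functions (
        node_id TEXT PRIMARY KEY,
        file_path TEXT
    );
    CREATE TABLE call_graph_edges (
        caller_id TEXT,
        callee_id TEXT
    );
""")

def reindex_file(file_path, functions, call_edges):
    """Idempotent delete+insert reindex for one file (toy version)."""
    with conn:  # one transaction: either everything lands or nothing does
        conn.executemany(
            "INSERT OR REPLACE INTO call_graph_functions (node_id, file_path) "
            "VALUES (?, ?)",
            [(node_id, file_path) for node_id in functions],
        )
        # Delete this file's old edges before appending the new ones
        conn.execute(
            "DELETE FROM call_graph_edges WHERE caller_id IN "
            "(SELECT node_id FROM call_graph_functions WHERE file_path = ?)",
            (file_path,),
        )
        conn.executemany(
            "INSERT INTO call_graph_edges (caller_id, callee_id) VALUES (?, ?)",
            call_edges,
        )

# Reindexing the same file twice must not duplicate edges
for _ in range(2):
    reindex_file("app.py", ["f", "g"], [("f", "g")])

count = conn.execute("SELECT COUNT(*) FROM call_graph_edges").fetchone()[0]
print(count)  # → 1
```

With a naive append strategy the second pass would leave two copies of the (f, g) edge; here the count stays at one.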

Indexing Strategy Matrix

| Operation            | Strategy                     | Rationale                               |
|----------------------|------------------------------|-----------------------------------------|
| Initial Index        | Full build                   | Complete scan, build all indexes        |
| File Changed         | Delete+Insert                | Remove old edges, add new for that file |
| Periodic Maintenance | VACUUM if fragmentation >30% | SQLite optimization                     |
| Full Rebuild         | --full-refresh flag          | Drop and recreate tables                |

Tree-sitter Incremental Parsing

Tree-sitter provides native incremental parsing:

  • Shares unchanged portions of syntax tree
  • Only re-parses edited portions
  • Future optimization: track file edits, not full re-parse

Implementation Status

| Component              | Strategy                       | Status               |
|------------------------|--------------------------------|----------------------|
| Messages (context.db)  | INSERT OR IGNORE (by hash)     | ✅ Correct           |
| Functions (call_graph) | INSERT OR REPLACE (by node_id) | ✅ Correct           |
| Edges (call_graph)     | Delete+Insert (by file)        | ✅ Fixed in v1.3.0   |
| Embeddings             | Skip if exists (by message_id) | ✅ Correct           |
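The difference between the first two strategies in the table fits in a few lines. A toy sketch (the real schemas key on message hash and node_id respectively):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (key TEXT PRIMARY KEY, value TEXT)")

# INSERT OR IGNORE: first write wins, later rows with the same key are dropped
conn.execute("INSERT OR IGNORE INTO t VALUES ('m1', 'old')")
conn.execute("INSERT OR IGNORE INTO t VALUES ('m1', 'new')")
after_ignore = conn.execute("SELECT value FROM t WHERE key = 'm1'").fetchone()[0]
print(after_ignore)   # → old

# INSERT OR REPLACE: last write wins (an upsert keyed on the primary key)
conn.execute("INSERT OR REPLACE INTO t VALUES ('m1', 'new')")
after_replace = conn.execute("SELECT value FROM t WHERE key = 'm1'").fetchone()[0]
print(after_replace)  # → new
```

IGNORE suits immutable rows like hashed messages; REPLACE suits rows like functions whose definitions change between reindexes.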

References

Research

  • Reciprocal Rank Fusion: Cormack et al., 2009 - "Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods"
  • Hybrid Search: Karpukhin et al., 2020 - "Dense Passage Retrieval for Open-Domain Question Answering"
  • MCP Protocol: Anthropic Model Context Protocol
  • Elastic Hybrid Search (uses RRF)
  • Cohere Rerank (uses RRF fusion)
  • Pinecone Hybrid Search

Status: ACCEPTED
Decision Date: 2026-01-17
Implementation Status: COMPLETE
Task ID: H.5.5.1
Maintainer: CODITECT Core Team
Review Date: 2026-04-17 (quarterly review)


Changelog

v1.3.0 (2026-01-17)

  • Added Indexing Strategy section documenting delete+insert approach
  • Fixed duplicate edges bug in call graph reindex (commit 63582c71)
  • Documented Tree-sitter incremental parsing for future optimization
  • Added references to dbt and Microsoft best practices

v1.2.0 (2026-01-17)

  • H.5.5.2 COMPLETE: Memory-linked call graph MCP server implemented
  • Added competitive position matrix showing CODITECT's moat
  • Updated strategic implications table with implementation status
  • Added H.5.5.2 implementation details (5,590 functions, 55,548 edges, 441 files)
  • Tree-sitter AST parsing now implemented (Python, JavaScript, TypeScript)

v1.1.0 (2026-01-17)

  • Added comprehensive competitive analysis section
  • Analyzed Code Pathfinder as direct competitor
  • Identified CODITECT's unique competitive moat (Memory + Decisions)
  • Updated H.5.5.x strategic recommendations
  • Added reference to CODE-INTELLIGENCE-RESEARCH.md

v1.0.0 (2026-01-17)

  • Initial ADR documenting MCP semantic search with hybrid RRF fusion
  • Documented Option A/B/C analysis and selection rationale
  • Defined MCP tool interfaces
  • Documented RRF algorithm implementation
  • Added performance characteristics and validation criteria