ADR-080: MCP Semantic Search with Hybrid RRF Fusion
Document: ADR-080-mcp-semantic-search-hybrid-rrf
Version: 1.0.0
Purpose: Document architectural decision for MCP-based semantic search with hybrid RRF fusion
Audience: Framework contributors, AI researchers, system architects
Date Created: 2026-01-17
Last Updated: 2026-01-17
Status: ACCEPTED
Task ID: H.5.5.1
Related ADRs: ADR-020 (Context Extraction), ADR-021 (Context Query), ADR-003 (ChromaDB)
Related Documents:
- tools/mcp-semantic-search/server.py
- tools/mcp-semantic-search/CLAUDE.md
- tests/tools/test_mcp_semantic_search.py
Context and Problem Statement
Background
The CODITECT framework has existing semantic search capabilities implemented in scripts/context-db.py (documented in ADR-021). However, these capabilities are:
- CLI-bound: Only accessible via the /cxq command, not programmatically by AI agents
- Single-mode: FTS5 and semantic search operate independently, not combined
- Not MCP-enabled: Cannot be used as MCP tools by Claude Code or other MCP clients
Problem Statement
How do we expose semantic search capabilities to AI agents via MCP while improving search quality through hybrid fusion algorithms?
Requirements
Must-Have:
- MCP protocol compliance for tool integration
- Combine FTS5 and vector search for better results
- Use existing infrastructure (context.db, embeddings)
- Sub-second query latency for interactive use
Should-Have:
- Configurable fusion weights
- Multiple search modes (hybrid, semantic-only, keyword-only)
- Decision and error knowledge base search
Nice-to-Have:
- Streaming results
- Search analytics
- Result caching
Decision Drivers
Technical Constraints
- T1: Must use existing SQLite database (context.db)
- T2: Must use existing embeddings (all-MiniLM-L6-v2, 384 dimensions)
- T3: Must be deployable without additional infrastructure
- T4: Must support Claude Code MCP integration
Performance Goals
- P1: Query latency <500ms for interactive use
- P2: Handle 143K+ messages corpus efficiently
- P3: Minimal memory footprint (<200MB)
User Experience
- UX1: Consistent results across search modes
- UX2: Relevance ranking that makes sense to users
- UX3: Easy integration with existing workflows
Considered Options
Option A: ChromaDB Migration (Full Rewrite)
Description: Migrate from SQLite to ChromaDB, a purpose-built vector database.
Pros:
- Native vector operations optimized for similarity search
- Built-in HNSW indexing for scalability
- Growing ecosystem and community
Cons:
- Requires data migration from SQLite
- Loses FTS5 capabilities (weaker text search)
- Additional dependency and infrastructure
- Significant implementation effort (16-24 hours)
Estimated Effort: 16-24 hours
Rejected: Too disruptive, loses FTS5 strengths, high effort.
Option B: Enhanced Hybrid Search (RRF Fusion) - SELECTED
Description: Add MCP server that combines existing FTS5 and vector search using Reciprocal Rank Fusion (RRF).
Pros:
- Uses existing infrastructure (no migration)
- Combines strengths of both search methods
- RRF is proven algorithm (used by Elastic, Cohere)
- Moderate implementation effort (6-8 hours)
- MCP-native for AI agent access
Cons:
- Brute-force vector search (no ANN indexing)
- Limited to current scale (~150K messages)
Estimated Effort: 6-8 hours
Selected: Best value/effort ratio, builds on existing infrastructure.
Option C: Full AST-Based Chunking + Dedicated Vector Store
Description: Rewrite context extraction with AST-based chunking for code-aware semantic search, plus dedicated vector store.
Pros:
- Optimal semantic chunks for code (functions, classes, methods)
- Improved code search relevance
- Future-proof architecture
Cons:
- Requires rewriting extraction pipeline
- Language-specific AST parsers needed
- Significant implementation effort (16-24 hours)
- Breaking change to existing data
Estimated Effort: 16-24 hours
Deferred: Good future direction (H.5.5.2-H.5.5.4), but too much scope for H.5.5.1.
Decision Outcome
CHOSEN: Option B - Enhanced Hybrid Search with RRF Fusion
Rationale
- Value vs Effort: Option B delivers 80% of the value at 30% of the effort
- Incremental: Builds on existing infrastructure without migration risk
- MCP-First: Designed for AI agent access from the start
- Proven Algorithm: RRF is battle-tested in production systems
Future Path
Option B enables Option C as a future enhancement:
- H.5.5.2: Add call graph navigation tools
- H.5.5.3: Create impact analysis MCP tool
- H.5.5.4: Add document RAG integration (potential AST chunking)
Technical Implementation
Architecture
┌─────────────────────────────────────────────────────────────────┐
│ MCP SEMANTIC SEARCH SERVER │
├─────────────────────────────────────────────────────────────────┤
│ │
│ MCP Client (Claude Code) │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ MCP Tools ││
│ │ • hybrid_search - FTS5 + Vector with RRF fusion ││
│ │ • semantic_search - Vector similarity only ││
│ │ • keyword_search - FTS5 only ││
│ │ • search_decisions - Decision knowledge base ││
│ │ • search_errors - Error-solution pairs ││
│ │ • context_stats - Database statistics ││
│ └─────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Reciprocal Rank Fusion (RRF) ││
│ │ ││
│ │ RRF_score = Σ (weight_i / (k + rank_i)) ││
│ │ ││
│ │ k = 60 (standard constant) ││
│ │ Default weights: FTS=0.4, Vector=0.6 ││
│ └─────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Search Backends ││
│ │ ││
│ │ FTS5 Search Vector Search ││
│ │ ┌────────────┐ ┌────────────┐ ││
│ │ │ messages │ │ embeddings │ ││
│ │ │ _fts │ │ table │ ││
│ │ └────────────┘ └────────────┘ ││
│ │ │ │ ││
│ │ └─────────┬───────────┘ ││
│ │ │ ││
│ │ ▼ ││
│ │ context.db ││
│ └─────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────┘
Reciprocal Rank Fusion (RRF) Algorithm
RRF combines ranked lists from multiple search methods into a single ranking:
from typing import Dict, List

def rrf_fusion(
    fts_results: List[Dict],
    vector_results: List[Dict],
    k: int = 60,
    fts_weight: float = 0.4,
    vector_weight: float = 0.6
) -> List[Dict]:
    """
    Reciprocal Rank Fusion algorithm.
    RRF_score(d) = Σ (weight_i / (k + rank_i(d)))
    Args:
        fts_results: Ranked results from FTS5 search
        vector_results: Ranked results from vector search
        k: RRF constant (default 60, standard value)
        fts_weight: Weight for FTS5 results (0.0-1.0)
        vector_weight: Weight for vector results (0.0-1.0)
    Returns:
        Fused results sorted by combined RRF score
    """
    scores = {}
    # Add FTS5 contributions
    for rank, result in enumerate(fts_results, start=1):
        doc_id = result['id']
        scores[doc_id] = scores.get(doc_id, 0) + fts_weight / (k + rank)
    # Add vector contributions
    for rank, result in enumerate(vector_results, start=1):
        doc_id = result['id']
        scores[doc_id] = scores.get(doc_id, 0) + vector_weight / (k + rank)
    # Sort by combined score (loop variables renamed to avoid shadowing the RRF constant k)
    return sorted(
        [{'id': doc_id, 'rrf_score': score} for doc_id, score in scores.items()],
        key=lambda x: x['rrf_score'],
        reverse=True
    )
Why k=60?
- Standard value used by Elastic, Cohere, and academic research
- Prevents top-ranked results from dominating (rank 1 contributes 1/61, not 1/1)
- Empirically validated across many retrieval tasks
Default Weights (FTS=0.4, Vector=0.6):
- Vector search better for semantic/conceptual queries
- FTS5 better for exact matches and technical terms
- 60/40 split favors semantic while preserving keyword precision
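To make the scoring concrete, here is a small worked example with the default constant and weights (document ranks are illustrative):

```python
# Worked RRF example with k=60 and the default weights.
k = 60
fts_weight, vector_weight = 0.4, 0.6

# Document A: rank 1 in FTS5 results, rank 3 in vector results
score_a = fts_weight / (k + 1) + vector_weight / (k + 3)

# Document B: rank 5 in vector results only (absent from FTS5 results)
score_b = vector_weight / (k + 5)

print(round(score_a, 6))  # 0.016081
print(round(score_b, 6))  # 0.009231
```

Note that a document found by both backends outranks one found by a single backend even at a slightly worse rank, which is exactly the behavior hybrid fusion is meant to produce.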
MCP Tool Definitions
The tool stubs below assume `from typing import Any, Dict, List, Optional` and the server's `mcp_tool` registration decorator:
@mcp_tool
def hybrid_search(
    query: str,
    limit: int = 20,
    fts_weight: float = 0.4,
    vector_weight: float = 0.6,
    vector_threshold: float = 0.3
) -> List[Dict[str, Any]]:
    """
    Hybrid search combining FTS5 keyword search and vector similarity.
    Uses Reciprocal Rank Fusion (RRF) to combine rankings.
    Best for general queries where you want both exact matches and semantic similarity.
    """

@mcp_tool
def semantic_search(
    query: str,
    limit: int = 20,
    threshold: float = 0.3
) -> List[Dict[str, Any]]:
    """
    Pure vector similarity search using embeddings.
    Best for conceptual queries where exact keywords may not be present.
    """

@mcp_tool
def keyword_search(
    query: str,
    limit: int = 20
) -> List[Dict[str, Any]]:
    """
    Pure FTS5 full-text search.
    Best for exact term matching, error messages, or technical identifiers.
    """

@mcp_tool
def search_decisions(
    query: str,
    limit: int = 10,
    decision_type: Optional[str] = None
) -> List[Dict[str, Any]]:
    """
    Search the decision knowledge base.
    Returns architectural and technical decisions with rationale.
    """

@mcp_tool
def search_errors(
    query: str,
    limit: int = 10
) -> List[Dict[str, Any]]:
    """
    Search the error-solution knowledge base.
    Returns error signatures and their verified solutions.
    """

@mcp_tool
def context_stats() -> Dict[str, Any]:
    """
    Get database statistics.
    Returns message counts, embedding coverage, knowledge base sizes.
    """
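For reference, an MCP client invokes one of these tools via a JSON-RPC `tools/call` request, per the Model Context Protocol specification (argument values below are illustrative):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "hybrid_search",
    "arguments": {
      "query": "authentication error",
      "limit": 10,
      "fts_weight": 0.4,
      "vector_weight": 0.6
    }
  }
}
```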
Configuration
MCP Server Registration (.mcp.json):
{
  "mcpServers": {
    "coditect-semantic-search": {
      "command": "python3",
      "args": [
        "/Users/halcasteel/.coditect/tools/mcp-semantic-search/server.py",
        "--mcp"
      ],
      "env": {
        "CONTEXT_DB_PATH": "~/PROJECTS/.coditect-data/context-storage/context.db"
      }
    }
  }
}
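One caveat with the `env` block above: a tilde in `CONTEXT_DB_PATH` is typically passed through literally (tilde expansion is a shell feature), so the server has to expand it itself. A sketch of that handling (the actual server.py may already do this):

```python
import os

# "~" in an environment variable arrives as a literal character, so expand it
# before opening the database. The default mirrors the path in .mcp.json above.
db_path = os.path.expanduser(
    os.environ.get(
        "CONTEXT_DB_PATH",
        "~/PROJECTS/.coditect-data/context-storage/context.db",
    )
)
assert "~" not in db_path  # now safe to pass to sqlite3.connect(db_path)
```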
CLI Usage (non-MCP):
# Hybrid search (default)
python3 server.py "authentication error" --limit 10
# Semantic only
python3 server.py "authentication error" --mode semantic
# Keyword only
python3 server.py "authentication error" --mode keyword
# Custom weights
python3 server.py "authentication error" --fts-weight 0.3 --vector-weight 0.7
Performance Characteristics
Query Latency (143,743 embedded messages, 99.99% embedding coverage):
| Search Mode | Latency | Notes |
|---|---|---|
| FTS5 only | <50ms | SQLite native |
| Vector only | ~200ms | Brute-force cosine similarity |
| Hybrid (RRF) | ~300ms | Both + fusion |
Memory Usage:
- Embedding model: ~90MB (all-MiniLM-L6-v2)
- Query processing: <50MB
- Peak: ~150MB during embedding generation
Embedding Coverage:
- Messages: 143,743 / 143,750 (99.99%)
- Generated in: ~4 hours (batch processing)
Consequences
Positive Consequences
P1: MCP Integration
- AI agents can now access semantic search programmatically
- Claude Code can use search results for context injection
- Enables autonomous knowledge retrieval
P2: Improved Search Quality
- Hybrid search combines keyword precision + semantic recall
- RRF fusion is proven algorithm with predictable behavior
- Configurable weights allow tuning for specific use cases
P3: Minimal Disruption
- Uses existing database and embeddings
- No data migration required
- Existing /cxq command remains functional
P4: Foundation for Future Work
- H.5.5.2-H.5.5.4 can build on this infrastructure
- Option C (AST chunking) remains viable future enhancement
- MCP pattern can be applied to other tools
Negative Consequences
N1: Brute-Force Vector Search
- O(n) complexity for vector similarity
- Acceptable at current scale (143K messages)
- May need ANN indexing at 500K+ messages
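The O(n) cost comes from scoring every stored embedding against the query vector. A minimal pure-Python sketch of the brute-force scan (the production server presumably batches and vectorizes this; names and the 3-dimensional toy vectors are illustrative, real embeddings are 384-dimensional):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def brute_force_search(query_vec, embeddings, threshold=0.3, limit=20):
    # O(n) scan: score every stored embedding, filter by threshold,
    # sort descending, truncate to limit
    scored = [
        (msg_id, cosine(query_vec, vec))
        for msg_id, vec in embeddings.items()
    ]
    hits = [(m, s) for m, s in scored if s >= threshold]
    hits.sort(key=lambda x: x[1], reverse=True)
    return hits[:limit]

# Toy corpus: m3 is orthogonal to the query and falls below the threshold
emb = {"m1": [1.0, 0.0, 0.0], "m2": [0.9, 0.1, 0.0], "m3": [0.0, 1.0, 0.0]}
print(brute_force_search([1.0, 0.0, 0.0], emb, limit=2))
```

At 143K rows this scan stays within the ~200ms budget; an ANN index (e.g. HNSW) would replace the linear scan once the corpus outgrows that.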
N2: No Real-Time Updates
- Embeddings generated in batch
- New messages require re-running /cx --with-embeddings
- Acceptable for session-based workflow
N3: Single Model Dependency
- Tied to all-MiniLM-L6-v2 embeddings
- Model upgrade requires regenerating all embeddings
- Mitigated by batch regeneration capability
Option B vs Option C: Value Analysis
| Dimension | Option B (Selected) | Option C (Deferred) |
|---|---|---|
| Implementation Effort | 6-8 hours | 16-24 hours |
| Value Delivered | 80% | 100% |
| Risk | Low (uses existing infra) | Medium (migration required) |
| Breaking Changes | None | Yes (extraction pipeline) |
| MCP Integration | Yes | Yes |
| Code-Aware Search | No | Yes (AST chunking) |
| Scalability | ~500K messages | Millions |
| Time to Value | Immediate | 2-3 weeks |
Conclusion: Option B is the right choice for H.5.5.1. Option C features can be added incrementally in H.5.5.2-H.5.5.4 without disrupting the foundation.
Competitive Analysis (January 2026)
Market Landscape
Research conducted January 2026 reveals that AST-based code analysis and call graphs are now table stakes in the AI coding assistant market.
| Tool | AST Analysis | Call Graph | Impact Analysis | Memory | License |
|---|---|---|---|---|---|
| Code Pathfinder | Yes (tree-sitter) | Yes | Partial | No | AGPL-3.0 |
| Cursor | Yes (tree-sitter) | Limited | No | No | Proprietary |
| JetBrains AI | Yes (native) | Yes | Yes | Limited | Proprietary |
| Augment Code | Yes | Yes | Limited | Limited | Proprietary |
| CODITECT | Planned | Planned | Planned | Unique | Proprietary |
Code Pathfinder (Direct Competitor)
Repository: github.com/shivasurya/code-pathfinder
Code Pathfinder is an open-source (AGPL-3.0) MCP server offering:
- Tree-sitter AST parsing (Java, Python, Dockerfile)
- Call graph generation with 5-pass analysis
- ANTLR-based query DSL
- CI/CD integration with SARIF output
Key Technical Decisions (to learn from):
- 5 parallel workers for file parsing
- SHA-256 node IDs for determinism
- Lazy loading with byte offsets (reduces memory from 2.32GB to 2.18GB)
- Object pooling for environment maps
CODITECT's True Moat
What competitors have (table stakes):
- AST parsing
- Call graph navigation
- Basic code search
What CODITECT NOW has (unique):
- Cross-session memory (ADR-020, ADR-021)
- Decision tracking with rationale
- Error-solution knowledge base
- Hybrid RRF semantic search (H.5.5.1 - this ADR)
- Memory-linked call graph (H.5.5.2 - COMPLETE)
- AST parsing via tree-sitter (Python, JavaScript, TypeScript)
The unique combination:
╔═══════════════════════════════════════════════════════════════════════════╗
║ CODITECT's True Moat ║
╠═══════════════════════════════════════════════════════════════════════════╣
║ Competitors (Cursor, JetBrains, Code Pathfinder): ║
║ ✓ AST parsing ← Table stakes ║
║ ✓ Call graph ← Table stakes ║
║ ✗ Cross-session memory ← Missing ║
║ ✗ Decision awareness ← Missing ║
╠═══════════════════════════════════════════════════════════════════════════╣
║ CODITECT: ║
║ ✓ AST parsing ← Now implemented (tree-sitter) ║
║ ✓ Call graph ← Now implemented (H.5.5.2) ║
║ ✓ Cross-session memory ← Unique (memory_linked_search) ║
║ ✓ Decision awareness ← Unique (ADR integration) ║
╚═══════════════════════════════════════════════════════════════════════════╝
No competitor can answer:
- "What did I change last session that might cause this error?"
- "Which architectural decisions constrain this refactoring?"
- "Show me all times I've fixed this type of error"
- "When was this function last discussed and what decisions affected it?" (H.5.5.2)
Strategic Implications for H.5.5.x
| Task | Original Goal | Revised Goal (Memory-Aware) | Status |
|---|---|---|---|
| H.5.5.1 | Semantic search | Hybrid RRF fusion | ✅ COMPLETE |
| H.5.5.2 | Call graph navigation | Memory-linked call graph | ✅ COMPLETE |
| H.5.5.3 | Impact analysis | Decision-aware impact | Pending |
| H.5.5.4 | Document RAG | Cross-session doc retrieval | Pending |
H.5.5.2 Implementation Details (Completed Jan 17, 2026)
MCP Call Graph Server: tools/mcp-call-graph/server.py (700+ lines)
| Tool | Description |
|---|---|
| index_file | Index source file into call graph |
| index_directory | Batch index directory |
| get_callers | Find functions that call a given function |
| get_callees | Find functions called by a given function |
| call_chain | Find call paths between functions |
| memory_linked_search | CODITECT UNIQUE - call graph with memory context |
| call_graph_stats | Database statistics |
Current Index Stats:
- Functions: 5,590
- Call Edges: 55,548
- Files Indexed: 441
Database Schema:
call_graph_functions -- Functions with signatures, docstrings
call_graph_edges -- Call relationships
call_graph_memory -- Links to session messages (CODITECT UNIQUE)
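The three-table layout can be sketched as DDL. The column set below is an assumption for illustration (the authoritative schema lives in tools/mcp-call-graph/server.py); only the table names and their roles come from this ADR:

```python
import sqlite3

# Hypothetical column layout; treat everything except the table names as a sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS call_graph_functions (
    node_id    TEXT PRIMARY KEY,  -- deterministic ID, upsert key for reindexing
    name       TEXT NOT NULL,
    file_path  TEXT NOT NULL,
    signature  TEXT,
    docstring  TEXT
);
CREATE TABLE IF NOT EXISTS call_graph_edges (
    caller_id  TEXT NOT NULL REFERENCES call_graph_functions(node_id),
    callee_id  TEXT NOT NULL REFERENCES call_graph_functions(node_id)
);
CREATE TABLE IF NOT EXISTS call_graph_memory (
    node_id    TEXT NOT NULL REFERENCES call_graph_functions(node_id),
    message_id TEXT NOT NULL     -- link into context.db session messages
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```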
Tests: 22 passing (tests/tools/test_mcp_call_graph.py)
Research References
Full academic research and competitive analysis are documented in CODE-INTELLIGENCE-RESEARCH.md.
Key papers informing this decision:
- Cormack et al. (2009) - RRF algorithm foundation
- Microsoft GraphRAG (2024) - Knowledge graph + RAG
- TAILOR (2023) - Code Property Graph learning
Files Created/Modified
| File | Purpose |
|---|---|
| tools/mcp-semantic-search/server.py | MCP server implementation (500 lines) |
| tools/mcp-semantic-search/CLAUDE.md | Tool documentation |
| tools/mcp-semantic-search/requirements.txt | Dependencies |
| tests/tools/test_mcp_semantic_search.py | Unit tests (26 tests) |
| .mcp.json (rollout-master) | MCP server registration |
Validation
Test Coverage
tests/tools/test_mcp_semantic_search.py - 26 tests
├── TestEmbeddingUtils (5 tests)
├── TestFTS5Search (4 tests)
├── TestVectorSearch (3 tests)
├── TestHybridSearch (5 tests)
├── TestDecisionSearch (2 tests)
├── TestErrorSearch (2 tests)
├── TestDatabaseStats (3 tests)
└── TestRRFScoring (2 tests)
Acceptance Criteria
- MCP server starts and registers tools
- hybrid_search returns fused results
- semantic_search returns vector-only results
- keyword_search returns FTS5-only results
- search_decisions queries decision knowledge base
- search_errors queries error-solution pairs
- context_stats returns database metrics
- CLI mode works for testing
- All 26 unit tests passing
Indexing Strategy (v1.3.0)
Problem: Duplicate Data on Reindex
When reindexing files, naive append strategies cause data duplication:
- Functions: INSERT OR REPLACE works (upserts by key)
- Edges: INSERT appends, causing duplicates on each reindex
Solution: Delete+Insert Strategy
Per dbt incremental strategies and Microsoft best practices:
# Before inserting edges for a file, delete the old ones
cursor.execute("""
    DELETE FROM call_graph_edges
    WHERE caller_id IN (
        SELECT node_id FROM call_graph_functions
        WHERE file_path = ?
    )
""", (file_path,))
# Then INSERT new edges
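The key property of Delete+Insert is idempotency: reindexing the same file twice must not grow the edge table. A self-contained sketch demonstrating this (table and key columns follow this ADR; everything else is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE call_graph_functions (node_id TEXT PRIMARY KEY, file_path TEXT);
    CREATE TABLE call_graph_edges (caller_id TEXT, callee_id TEXT);
""")

def reindex_edges(conn, file_path, edges):
    # Delete+Insert: drop edges originating in this file, then add the new set.
    cur = conn.cursor()
    cur.execute("""
        DELETE FROM call_graph_edges
        WHERE caller_id IN (
            SELECT node_id FROM call_graph_functions WHERE file_path = ?
        )
    """, (file_path,))
    cur.executemany(
        "INSERT INTO call_graph_edges (caller_id, callee_id) VALUES (?, ?)",
        edges,
    )
    conn.commit()

conn.execute("INSERT INTO call_graph_functions VALUES ('f1', 'a.py')")
reindex_edges(conn, "a.py", [("f1", "f2"), ("f1", "f3")])
reindex_edges(conn, "a.py", [("f1", "f2"), ("f1", "f3")])  # second run is a no-op
count = conn.execute("SELECT COUNT(*) FROM call_graph_edges").fetchone()[0]
print(count)  # 2, not 4 — a plain INSERT strategy would have duplicated the edges
```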
Indexing Strategy Matrix
| Operation | Strategy | Rationale |
|---|---|---|
| Initial Index | Full build | Complete scan, build all indexes |
| File Changed | Delete+Insert | Remove old edges, add new for that file |
| Periodic Maintenance | VACUUM if fragmentation >30% | SQLite optimization |
| Full Rebuild | --full-refresh flag | Drop and recreate tables |
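The >30% fragmentation trigger for VACUUM can be measured with standard SQLite pragmas (free pages relative to total pages). A sketch of that maintenance check, assuming this is how the threshold is evaluated:

```python
import sqlite3

def fragmentation_ratio(conn):
    # Pages on the freelist (reclaimable) relative to the database's total pages
    free = conn.execute("PRAGMA freelist_count").fetchone()[0]
    total = conn.execute("PRAGMA page_count").fetchone()[0]
    return free / total if total else 0.0

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10_000)])
conn.execute("DELETE FROM t")          # bulk delete leaves free pages behind
conn.commit()

if fragmentation_ratio(conn) > 0.30:   # the ADR's maintenance threshold
    conn.execute("VACUUM")             # rebuild the file, reclaiming free pages
```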
Tree-sitter Incremental Parsing
Tree-sitter provides native incremental parsing:
- Shares unchanged portions of syntax tree
- Only re-parses edited portions
- Future optimization: track file edits, not full re-parse
Implementation Status
| Component | Strategy | Status |
|---|---|---|
| Messages (context.db) | INSERT OR IGNORE (by hash) | ✅ Correct |
| Functions (call_graph) | INSERT OR REPLACE (by node_id) | ✅ Correct |
| Edges (call_graph) | Delete+Insert (by file) | ✅ Fixed in v1.3.0 |
| Embeddings | Skip if exists (by message_id) | ✅ Correct |
References
Research
- Reciprocal Rank Fusion: Cormack et al., 2009 - "Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods"
- Hybrid Search: Karpukhin et al., 2020 - "Dense Passage Retrieval for Open-Domain Question Answering"
- MCP Protocol: Anthropic Model Context Protocol
Related Systems
- Elastic Hybrid Search (uses RRF)
- Cohere Rerank (uses RRF fusion)
- Pinecone Hybrid Search
Status: ACCEPTED
Decision Date: 2026-01-17
Implementation Status: COMPLETE
Task ID: H.5.5.1
Maintainer: CODITECT Core Team
Review Date: 2026-04-17 (quarterly review)
Changelog
v1.3.0 (2026-01-17)
- Added Indexing Strategy section documenting delete+insert approach
- Fixed duplicate edges bug in call graph reindex (commit 63582c71)
- Documented Tree-sitter incremental parsing for future optimization
- Added references to dbt and Microsoft best practices
v1.2.0 (2026-01-17)
- H.5.5.2 COMPLETE: Memory-linked call graph MCP server implemented
- Added competitive position matrix showing CODITECT's moat
- Updated strategic implications table with implementation status
- Added H.5.5.2 implementation details (5,590 functions, 55,548 edges, 441 files)
- Tree-sitter AST parsing now implemented (Python, JavaScript, TypeScript)
v1.1.0 (2026-01-17)
- Added comprehensive competitive analysis section
- Analyzed Code Pathfinder as direct competitor
- Identified CODITECT's unique competitive moat (Memory + Decisions)
- Updated H.5.5.x strategic recommendations
- Added reference to CODE-INTELLIGENCE-RESEARCH.md
v1.0.0 (2026-01-17)
- Initial ADR documenting MCP semantic search with hybrid RRF fusion
- Documented Option A/B/C analysis and selection rationale
- Defined MCP tool interfaces
- Documented RRF algorithm implementation
- Added performance characteristics and validation criteria