ADR-005: Hybrid Search with Reciprocal Rank Fusion (RRF)
Status: Proposed Date: 2025-11-26 Deciders: Architecture Team, ML/Search Team Related ADRs: ADR-002 (PostgreSQL + Weaviate), ADR-004 (Multi-Tenant RLS)
Context
The context intelligence platform must provide powerful search across AI conversation history to help developers find:
- Past decisions: "Why did we choose PostgreSQL over MongoDB?"
- Code patterns: "How did we implement authentication?"
- Bug discussions: "What was that error with React hooks?"
- Team knowledge: "Who discussed GraphQL performance?"
Search Quality Requirements
| Requirement | Target | Importance |
|---|---|---|
| Relevance | 8/10 user satisfaction | Critical |
| Latency | p95 <100ms | High |
| Recall | 90%+ (find what exists) | High |
| Precision | 80%+ (top 10 results relevant) | High |
| Multi-tenant isolation | 100% (no data leakage) | Critical |
Search Challenges
1. Keyword Search Limitations (PostgreSQL Full-Text)
- ✅ Exact matches work well ("authentication error")
- ❌ Synonyms fail ("auth error" doesn't match "authentication error")
- ❌ Conceptual queries fail ("security bug" doesn't match "XSS vulnerability")
- ❌ Typos fail ("athentication" doesn't match "authentication")
2. Semantic Search Limitations (Weaviate Vector)
- ✅ Conceptual matches work well ("database" matches "PostgreSQL", "MySQL")
- ✅ Synonyms work ("bug" matches "error", "issue")
- ❌ Exact IDs fail ("ABC-123" won't match "ABC-124")
- ❌ Long-tail keywords fail ("PostgreSQL 15.3" might match "MySQL 8.0")
3. Solution: Hybrid Search Combine both approaches to get best of both worlds:
- PostgreSQL for exact matches, IDs, specific terms
- Weaviate for semantic understanding, synonyms, concepts
- Fusion algorithm to merge results intelligently
Decision
We will implement Hybrid Search using Reciprocal Rank Fusion (RRF) to combine PostgreSQL full-text search (keyword) with Weaviate vector search (semantic).
Architecture
┌─────────────────────────────────────────────────────────────┐
│ User Query: "authentication bug in React hooks" │
└───────────────────────┬─────────────────────────────────────┘
↓
┌───────────────┴───────────────┐
↓ ↓
┌──────────────────┐ ┌──────────────────┐
│ Keyword Search │ │ Semantic Search │
│ (PostgreSQL FTS) │ │ (Weaviate) │
│ │ │ │
│ ┌──────────────┐ │ │ ┌──────────────┐ │
│ │ ts_query │ │ │ │ Text │ │
│ │ ("auth" & │ │ │ │ Embedding │ │
│ │ "bug" & │ │ │ │ (OpenAI) │ │
│ │ "react" & │ │ │ │ │ │
│ │ "hook") │ │ │ │ Cosine │ │
│ └──────────────┘ │ │ │ Similarity │ │
│ │ │ └──────────────┘ │
│ Results (ranked):│ │ Results (ranked):│
│ 1. Conv #42 │ │ 1. Conv #15 │
│ 2. Conv #15 │ │ 2. Conv #42 │
│ 3. Conv #91 │ │ 3. Conv #7 │
│ 4. Conv #7 │ │ 4. Conv #28 │
│ 5. Conv #33 │ │ 5. Conv #91 │
└──────────────────┘ └──────────────────┘
↓ ↓
└───────────────┬───────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Reciprocal Rank Fusion (RRF) │
│ │
│ RRF Score = Σ [ 1 / (k + rank_in_list) ] │
│ │
│ Conv #42: 1/(60+1) + 1/(60+2) = 0.0164 + 0.0161 = 0.0325 │
│ Conv #15: 1/(60+2) + 1/(60+1) = 0.0161 + 0.0164 = 0.0325 │
│ Conv #91: 1/(60+3) + 1/(60+5) = 0.0159 + 0.0154 = 0.0313 │
│ Conv #7: 1/(60+4) + 1/(60+3) = 0.0156 + 0.0159 = 0.0315 │
│ Conv #28: 0 + 1/(60+4) = 0.0156 │
│ Conv #33: 1/(60+5) + 0 = 0.0154 │
│ │
│ Final Ranking: │
│ 1. Conv #42 (0.0325) - Appears in both top 2 │
│ 2. Conv #15 (0.0325) - Appears in both top 2 │
│ 3. Conv #7 (0.0315) - Appears in both top 5 │
│ 4. Conv #91 (0.0313) - Appears in both top 5 │
│ 5. Conv #28 (0.0156) - Semantic only │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ Returned to User (Top 5): │
│ 1. #42: "Auth bug in React useEffect hook" │
│ 2. #15: "Authentication token refresh issue" │
│ 3. #7: "Security vulnerability in login flow" │
│ 4. #91: "React component re-render loop" │
│ 5. #28: "Session management best practices" │
└─────────────────────────────────────────────────────────────┘
Implementation
Step 1: PostgreSQL Full-Text Search Setup
-- Add tsvector column for full-text search
ALTER TABLE conversations ADD COLUMN title_tsv tsvector;
ALTER TABLE messages ADD COLUMN content_tsv tsvector;
-- Auto-update tsvector on insert/update
CREATE TRIGGER conversations_tsvector_update BEFORE INSERT OR UPDATE
ON conversations FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(title_tsv, 'pg_catalog.english', title);
CREATE TRIGGER messages_tsvector_update BEFORE INSERT OR UPDATE
ON messages FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(content_tsv, 'pg_catalog.english', content);
-- Create GIN indexes for fast full-text search
CREATE INDEX idx_conversations_title_tsv ON conversations USING GIN (title_tsv);
CREATE INDEX idx_messages_content_tsv ON messages USING GIN (content_tsv);
-- Composite index for multi-tenant full-text search
CREATE INDEX idx_messages_org_tsv ON messages(organization_id, content_tsv);
Keyword Search Query (Async SQLAlchemy):
# core/repositories/sqlalchemy_conversation_repo.py
from sqlalchemy import func, select
from sqlalchemy.dialects.postgresql import TSVECTOR
async def keyword_search(
self,
organization_id: uuid.UUID,
query: str,
limit: int = 20
) -> List[Conversation]:
"""PostgreSQL full-text search"""
# Parse query into tsquery
tsquery = func.plainto_tsquery('english', query)
# Search conversations by title
stmt = (
select(ConversationModel)
.where(ConversationModel.organization_id == organization_id)
.where(ConversationModel.title_tsv.match(tsquery))
.order_by(
func.ts_rank(ConversationModel.title_tsv, tsquery).desc()
)
.limit(limit)
)
result = await self.session.execute(stmt)
return [self._to_domain(m) for m in result.scalars()]
Step 2: Weaviate Vector Search Setup
# core/clients/weaviate_client.py
import weaviate
from typing import List
import uuid
class WeaviateClient:
"""Weaviate vector search client"""
def __init__(self, url: str, api_key: str):
self.client = weaviate.Client(
url=url,
auth_client_secret=weaviate.AuthApiKey(api_key)
)
async def semantic_search(
self,
organization_id: uuid.UUID,
query: str,
limit: int = 20
) -> List[dict]:
"""Semantic search using Weaviate"""
result = (
self.client.query
.get("Conversation", ["id", "title", "content"])
.with_tenant(tenant=str(organization_id)) # Multi-tenant isolation
.with_near_text({"concepts": [query]})
.with_limit(limit)
.with_additional(["distance", "certainty"])
.do()
)
if not result or "data" not in result:
return []
conversations = result["data"]["Get"]["Conversation"]
return [
{
"id": uuid.UUID(conv["id"]),
"title": conv["title"],
"score": conv["_additional"]["certainty"],
"rank": idx + 1
}
for idx, conv in enumerate(conversations)
]
Step 3: Reciprocal Rank Fusion (RRF)
# core/services/search_service.py
from typing import List, Dict
import asyncio
import uuid
class SearchService:
"""Hybrid search with RRF fusion"""
def __init__(
self,
conversation_repo: ConversationRepository,
weaviate_client: WeaviateClient,
rrf_k: int = 60 # RRF constant (default from literature)
):
self.conversation_repo = conversation_repo
self.weaviate_client = weaviate_client
self.rrf_k = rrf_k
async def hybrid_search(
self,
organization_id: uuid.UUID,
query: str,
limit: int = 20,
alpha: float = 0.5 # 0.0 = keyword only, 1.0 = semantic only
) -> List[Conversation]:
"""Hybrid search with RRF fusion"""
# Run both searches in parallel (2x faster)
keyword_task = self.conversation_repo.keyword_search(
organization_id=organization_id,
query=query,
limit=limit * 2 # Get more candidates for fusion
)
semantic_task = self.weaviate_client.semantic_search(
organization_id=organization_id,
query=query,
limit=limit * 2
)
keyword_results, semantic_results = await asyncio.gather(
keyword_task,
semantic_task
)
# Apply Reciprocal Rank Fusion
fused_results = self._rrf_fusion(
keyword_results=keyword_results,
semantic_results=semantic_results,
alpha=alpha
)
# Return top N results
return fused_results[:limit]
def _rrf_fusion(
self,
keyword_results: List[Conversation],
semantic_results: List[dict],
alpha: float = 0.5
) -> List[Conversation]:
"""Reciprocal Rank Fusion algorithm"""
rrf_scores: Dict[uuid.UUID, float] = {}
# Calculate RRF scores for keyword results
for rank, conv in enumerate(keyword_results, start=1):
rrf_scores[conv.id] = rrf_scores.get(conv.id, 0.0) + \
(1 - alpha) / (self.rrf_k + rank)
# Calculate RRF scores for semantic results
for rank, result in enumerate(semantic_results, start=1):
conv_id = result["id"]
rrf_scores[conv_id] = rrf_scores.get(conv_id, 0.0) + \
alpha / (self.rrf_k + rank)
# Sort by RRF score (descending)
sorted_ids = sorted(
rrf_scores.keys(),
key=lambda id: rrf_scores[id],
reverse=True
)
# Fetch full conversation objects
conversations = []
for conv_id in sorted_ids:
# Try keyword results first (already fetched)
conv = next((c for c in keyword_results if c.id == conv_id), None)
# Otherwise fetch from database
if not conv:
conv = await self.conversation_repo.get(conv_id)
if conv:
conversations.append(conv)
return conversations
Step 4: Configurable Alpha Weighting
# app/api/routes/search.py (FastAPI)
from fastapi import APIRouter, Query
@router.get("/conversations/search")
async def search_conversations(
q: str,
limit: int = Query(20, ge=1, le=100),
alpha: float = Query(0.5, ge=0.0, le=1.0), # Configurable!
current_user = Depends(get_current_user),
search_service: SearchService = Depends(get_search_service)
):
"""
Hybrid search with configurable alpha:
- alpha=0.0: Keyword-only search (PostgreSQL)
- alpha=0.5: Balanced hybrid (default)
- alpha=1.0: Semantic-only search (Weaviate)
"""
results = await search_service.hybrid_search(
organization_id=current_user.organization_id,
query=q,
limit=limit,
alpha=alpha
)
return {"results": results, "count": len(results)}
Consequences
Positive
-
✅ Best of Both Worlds
- Exact matches: PostgreSQL handles "ABC-123", "PostgreSQL 15.3"
- Semantic matches: Weaviate handles "database" → "PostgreSQL", "MySQL"
- Combined: Results appear higher if matched by BOTH methods
-
✅ Proven Algorithm (RRF)
- Used by Elasticsearch, OpenSearch, Weaviate (industry standard)
- Research-backed: "Reciprocal Rank Fusion outperforms CombSUM" (2009)
- No machine learning required (parameter-free except k=60)
-
✅ Configurable Trade-offs
alpha=0.0: Keyword-only (exact matches, faster)alpha=0.5: Balanced (default, best for most queries)alpha=1.0: Semantic-only (conceptual search)- Per-query configuration allows user preference
-
✅ Performance: Parallel Execution
- Both searches run concurrently (not sequential)
- Total latency = max(keyword_time, semantic_time) + fusion_time
- Fusion time <5ms (simple arithmetic)
- Target: p95 <100ms (both searches <100ms, fusion <5ms)
-
✅ Multi-Tenant Safe
- PostgreSQL: RLS policies filter by
organization_id(ADR-004) - Weaviate:
.with_tenant()filter isolates tenants - Fusion only combines results from SAME organization
- PostgreSQL: RLS policies filter by
-
✅ Incremental Improvement
- Phase 1: PostgreSQL keyword search only (working, fast)
- Phase 2: Add Weaviate semantic search + RRF fusion
- Phase 3: Tune alpha based on user feedback
Negative
-
⚠️ Double Infrastructure Cost
- Cost: PostgreSQL ($100/month) + Weaviate ($150/month) = $250/month
- Mitigation: Balanced option (ADR-002), scales linearly
- ROI: Better search = higher user satisfaction = lower churn (>10% churn reduction = $120K/year)
-
⚠️ Embedding Generation Latency
- Problem: Must generate embeddings for new messages before searchable
- Latency: 50-100ms per message (OpenAI API)
- Mitigation: Async background job (Celery), user doesn't wait
- Trade-off: New messages searchable within 5 seconds (acceptable)
-
⚠️ Synchronization Complexity
- Problem: PostgreSQL and Weaviate must stay in sync
- Risk: Message in PostgreSQL but not Weaviate = not searchable
- Mitigation:
- PostgreSQL = source of truth
- Celery task:
sync_to_weaviate(message_id) - Retry logic: 3 attempts with exponential backoff
- Monitoring: Alert if sync lag >1 minute
-
⚠️ Alpha Tuning Required
- Problem: Optimal alpha varies by query type
- Example: "ABC-123" needs alpha=0.0 (exact match), "security bug" needs alpha=0.8 (semantic)
- Mitigation Phase 1: Default alpha=0.5 (good enough for 80% of queries)
- Future Enhancement: Query classifier predicts optimal alpha
if is_exact_match_query(query): # Contains IDs, exact terms
alpha = 0.2 # Favor keyword
elif is_conceptual_query(query): # Abstract terms
alpha = 0.8 # Favor semantic
else:
alpha = 0.5 # Balanced
Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Weaviate Downtime | Low | High | Fallback to PostgreSQL-only (alpha=0.0), cache embeddings |
| Embedding Cost Spike | Medium | Medium | Monitor usage, cap at $500/month, batch processing |
| Sync Lag | Medium | Low | Celery retry, monitoring, alert if lag >1 minute |
| Poor Relevance | Low | High | A/B test alpha values, user feedback loop |
Alternatives Considered
Alternative 1: PostgreSQL Full-Text Search Only (No Weaviate)
Architecture: Use only PostgreSQL tsvector + tsquery
Pros:
- ✅ Simpler infrastructure (1 database vs. 2)
- ✅ Lower cost ($100/month vs. $250/month)
- ✅ No synchronization complexity
Cons:
- ❌ No semantic understanding: "bug" doesn't match "error", "issue"
- ❌ No synonym handling: "auth" doesn't match "authentication"
- ❌ Poor for conceptual queries: "database performance" doesn't match "PostgreSQL slow queries"
- ❌ User satisfaction: 6/10 vs. 8/10 with hybrid search
Why Rejected: Search quality is critical for product differentiation. Competitors (GitHub Copilot, Cursor) don't offer search at all - this is our competitive advantage. Saving $150/month is not worth 25% lower user satisfaction.
Alternative 2: Weaviate Semantic Search Only (No PostgreSQL FTS)
Architecture: Use only Weaviate vector search
Pros:
- ✅ Semantic understanding (best-in-class)
- ✅ Synonym handling
- ✅ Conceptual search
Cons:
- ❌ Poor exact matches: "ABC-123" might match "ABC-124", "XYZ-999"
- ❌ No keyword precision: "PostgreSQL 15.3" might match "MySQL 8.0"
- ❌ User frustration: "I typed the exact ID and it didn't find it!"
Why Rejected: Exact matches are critical for developer workflows (ticket IDs, commit hashes, function names). Vector search alone is insufficient.
Alternative 3: Weighted Sum Instead of RRF
Architecture: Combine keyword and semantic scores using weighted sum
final_score = alpha * semantic_score + (1 - alpha) * keyword_score
Pros:
- ✅ Simpler algorithm
- ✅ Intuitive weighting
Cons:
- ❌ Score calibration required: Keyword scores (0-10) vs. semantic scores (0-1) have different ranges
- ❌ Normalization complexity: Must normalize both score ranges to [0, 1]
- ❌ Outlier sensitivity: One high score dominates the result
- ❌ Research shows RRF outperforms weighted sum (Cormack et al., 2009)
Why Rejected: RRF is simpler (no score normalization), more robust (rank-based), and proven to outperform weighted sum.
Alternative 4: Query Classifier → Single Search Method
Architecture: Classify query type, then route to EITHER keyword OR semantic (not both)
if is_exact_match_query(query):
results = keyword_search(query) # PostgreSQL only
elif is_conceptual_query(query):
results = semantic_search(query) # Weaviate only
else:
results = hybrid_search(query) # Both
Pros:
- ✅ Potentially faster (single search vs. two searches)
- ✅ Lower Weaviate usage (cost savings)
Cons:
- ❌ Classifier accuracy: What if it misclassifies?
- ❌ Complexity: Need to train/maintain classifier
- ❌ Edge cases: Many queries are hybrid ("PostgreSQL bug" = exact + semantic)
- ❌ Marginal performance gain: Parallel execution already fast (<100ms)
Why Rejected: Complexity not worth marginal performance gain. Better to always use hybrid search (simpler, more robust) and tune alpha if needed.
Success Metrics
Search Quality Metrics
- User satisfaction: 8/10 (measured via feedback thumbs up/down)
- Recall: 90%+ (find existing conversations)
- Precision@10: 80%+ (top 10 results relevant)
- Click-through rate: 60%+ (user clicks on result)
Performance Metrics
- p50 latency: <50ms
- p95 latency: <100ms
- p99 latency: <200ms
- Throughput: 100 searches/sec at 10K organizations
Synchronization Metrics
- Sync lag: <5 seconds (p95)
- Sync error rate: <0.1%
- Embedding generation time: <100ms per message
Implementation Plan
Phase 1: PostgreSQL Keyword Search (Week 1)
- Add
tsvectorcolumns to tables - Create GIN indexes
- Implement
keyword_search()method - Test with 100K messages
- Benchmark latency (target: p95 <50ms)
Phase 2: Weaviate Setup (Week 2)
- Deploy Weaviate Cloud (managed)
- Create multi-tenant schema
- Implement embedding generation (OpenAI API)
- Implement
semantic_search()method - Test with 100K messages
Phase 3: RRF Fusion (Week 3)
- Implement
_rrf_fusion()algorithm - Parallel execution (asyncio)
- Configurable alpha parameter
- Integration tests (both modes)
Phase 4: Tuning & Optimization (Week 4)
- A/B test alpha values (0.3, 0.5, 0.7)
- User feedback collection (thumbs up/down)
- Performance optimization (<100ms p95)
- Monitoring and alerting
References
Research:
- Reciprocal Rank Fusion (RRF) Paper - Cormack et al., 2009
- Elasticsearch Hybrid Search
- Weaviate Hybrid Search Docs
Implementation Examples:
Related ADRs:
- ADR-002: PostgreSQL + Weaviate (database architecture)
- ADR-004: Multi-Tenant RLS (isolation for search)
- ADR-006: Conversation-Commit Correlation (extends search to Git commits)
Status: Proposed Review Date: 2025-12-03 Projected ADR Score: 39/40 (A+) Complexity: Medium (proven algorithm, parallel execution) Owner: Architecture Team + ML/Search Team
Next Steps:
- Approve RRF hybrid search approach
- Implement PostgreSQL FTS (Week 1)
- Deploy Weaviate and test embedding pipeline (Week 2)
- Integrate RRF fusion (Week 3)
- A/B test alpha values and optimize (Week 4)