Skip to main content

ADR-005: Hybrid Search with Reciprocal Rank Fusion (RRF)

Status: Proposed Date: 2025-11-26 Deciders: Architecture Team, ML/Search Team Related ADRs: ADR-002 (PostgreSQL + Weaviate), ADR-004 (Multi-Tenant RLS)


Context

The context intelligence platform must provide powerful search across AI conversation history to help developers find:

  1. Past decisions: "Why did we choose PostgreSQL over MongoDB?"
  2. Code patterns: "How did we implement authentication?"
  3. Bug discussions: "What was that error with React hooks?"
  4. Team knowledge: "Who discussed GraphQL performance?"

Search Quality Requirements

RequirementTargetImportance
Relevance8/10 user satisfactionCritical
Latencyp95 <100msHigh
Recall90%+ (find what exists)High
Precision80%+ (top 10 results relevant)High
Multi-tenant isolation100% (no data leakage)Critical

Search Challenges

1. Keyword Search Limitations (PostgreSQL Full-Text)

  • ✅ Exact matches work well ("authentication error")
  • ❌ Synonyms fail ("auth error" doesn't match "authentication error")
  • ❌ Conceptual queries fail ("security bug" doesn't match "XSS vulnerability")
  • ❌ Typos fail ("athentication" doesn't match "authentication")

2. Semantic Search Limitations (Weaviate Vector)

  • ✅ Conceptual matches work well ("database" matches "PostgreSQL", "MySQL")
  • ✅ Synonyms work ("bug" matches "error", "issue")
  • ❌ Exact IDs fail ("ABC-123" won't match "ABC-124")
  • ❌ Long-tail keywords fail ("PostgreSQL 15.3" might match "MySQL 8.0")

3. Solution: Hybrid Search Combine both approaches to get best of both worlds:

  • PostgreSQL for exact matches, IDs, specific terms
  • Weaviate for semantic understanding, synonyms, concepts
  • Fusion algorithm to merge results intelligently

Decision

We will implement Hybrid Search using Reciprocal Rank Fusion (RRF) to combine PostgreSQL full-text search (keyword) with Weaviate vector search (semantic).

Architecture

┌─────────────────────────────────────────────────────────────┐
│ User Query: "authentication bug in React hooks" │
└───────────────────────┬─────────────────────────────────────┘

┌───────────────┴───────────────┐
↓ ↓
┌──────────────────┐ ┌──────────────────┐
│ Keyword Search │ │ Semantic Search │
│ (PostgreSQL FTS) │ │ (Weaviate) │
│ │ │ │
│ ┌──────────────┐ │ │ ┌──────────────┐ │
│ │ ts_query │ │ │ │ Text │ │
│ │ ("auth" & │ │ │ │ Embedding │ │
│ │ "bug" & │ │ │ │ (OpenAI) │ │
│ │ "react" & │ │ │ │ │ │
│ │ "hook") │ │ │ │ Cosine │ │
│ └──────────────┘ │ │ │ Similarity │ │
│ │ │ └──────────────┘ │
│ Results (ranked):│ │ Results (ranked):│
│ 1. Conv #42 │ │ 1. Conv #15 │
│ 2. Conv #15 │ │ 2. Conv #42 │
│ 3. Conv #91 │ │ 3. Conv #7 │
│ 4. Conv #7 │ │ 4. Conv #28 │
│ 5. Conv #33 │ │ 5. Conv #91 │
└──────────────────┘ └──────────────────┘
↓ ↓
└───────────────┬───────────────┘

┌─────────────────────────────────────────────────────────────┐
│ Reciprocal Rank Fusion (RRF) │
│ │
│ RRF Score = Σ [ 1 / (k + rank_in_list) ] │
│ │
│ Conv #42: 1/(60+1) + 1/(60+2) = 0.0164 + 0.0161 = 0.0325 │
│ Conv #15: 1/(60+2) + 1/(60+1) = 0.0161 + 0.0164 = 0.0325 │
│ Conv #91: 1/(60+3) + 1/(60+5) = 0.0159 + 0.0154 = 0.0313 │
│ Conv #7: 1/(60+4) + 1/(60+3) = 0.0156 + 0.0159 = 0.0315 │
│ Conv #28: 0 + 1/(60+4) = 0.0156 │
│ Conv #33: 1/(60+5) + 0 = 0.0154 │
│ │
│ Final Ranking: │
│ 1. Conv #42 (0.0325) - Appears in both top 2 │
│ 2. Conv #15 (0.0325) - Appears in both top 2 │
│ 3. Conv #7 (0.0315) - Appears in both top 5 │
│ 4. Conv #91 (0.0313) - Appears in both top 5 │
│ 5. Conv #28 (0.0156) - Semantic only │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ Returned to User (Top 5): │
│ 1. #42: "Auth bug in React useEffect hook" │
│ 2. #15: "Authentication token refresh issue" │
│ 3. #7: "Security vulnerability in login flow" │
│ 4. #91: "React component re-render loop" │
│ 5. #28: "Session management best practices" │
└─────────────────────────────────────────────────────────────┘

Implementation

Step 1: PostgreSQL Full-Text Search Setup

-- Add tsvector column for full-text search
ALTER TABLE conversations ADD COLUMN title_tsv tsvector;
ALTER TABLE messages ADD COLUMN content_tsv tsvector;

-- Auto-update tsvector on insert/update
CREATE TRIGGER conversations_tsvector_update BEFORE INSERT OR UPDATE
ON conversations FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(title_tsv, 'pg_catalog.english', title);

CREATE TRIGGER messages_tsvector_update BEFORE INSERT OR UPDATE
ON messages FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(content_tsv, 'pg_catalog.english', content);

-- Create GIN indexes for fast full-text search
CREATE INDEX idx_conversations_title_tsv ON conversations USING GIN (title_tsv);
CREATE INDEX idx_messages_content_tsv ON messages USING GIN (content_tsv);

-- Composite index for multi-tenant full-text search
CREATE INDEX idx_messages_org_tsv ON messages(organization_id, content_tsv);

Keyword Search Query (Async SQLAlchemy):

# core/repositories/sqlalchemy_conversation_repo.py
from sqlalchemy import func, select
from sqlalchemy.dialects.postgresql import TSVECTOR

async def keyword_search(
self,
organization_id: uuid.UUID,
query: str,
limit: int = 20
) -> List[Conversation]:
"""PostgreSQL full-text search"""

# Parse query into tsquery
tsquery = func.plainto_tsquery('english', query)

# Search conversations by title
stmt = (
select(ConversationModel)
.where(ConversationModel.organization_id == organization_id)
.where(ConversationModel.title_tsv.match(tsquery))
.order_by(
func.ts_rank(ConversationModel.title_tsv, tsquery).desc()
)
.limit(limit)
)

result = await self.session.execute(stmt)
return [self._to_domain(m) for m in result.scalars()]

Step 2: Weaviate Vector Search Setup

# core/clients/weaviate_client.py
import weaviate
from typing import List
import uuid

class WeaviateClient:
"""Weaviate vector search client"""

def __init__(self, url: str, api_key: str):
self.client = weaviate.Client(
url=url,
auth_client_secret=weaviate.AuthApiKey(api_key)
)

async def semantic_search(
self,
organization_id: uuid.UUID,
query: str,
limit: int = 20
) -> List[dict]:
"""Semantic search using Weaviate"""

result = (
self.client.query
.get("Conversation", ["id", "title", "content"])
.with_tenant(tenant=str(organization_id)) # Multi-tenant isolation
.with_near_text({"concepts": [query]})
.with_limit(limit)
.with_additional(["distance", "certainty"])
.do()
)

if not result or "data" not in result:
return []

conversations = result["data"]["Get"]["Conversation"]

return [
{
"id": uuid.UUID(conv["id"]),
"title": conv["title"],
"score": conv["_additional"]["certainty"],
"rank": idx + 1
}
for idx, conv in enumerate(conversations)
]

Step 3: Reciprocal Rank Fusion (RRF)

# core/services/search_service.py
from typing import List, Dict
import asyncio
import uuid

class SearchService:
"""Hybrid search with RRF fusion"""

def __init__(
self,
conversation_repo: ConversationRepository,
weaviate_client: WeaviateClient,
rrf_k: int = 60 # RRF constant (default from literature)
):
self.conversation_repo = conversation_repo
self.weaviate_client = weaviate_client
self.rrf_k = rrf_k

async def hybrid_search(
self,
organization_id: uuid.UUID,
query: str,
limit: int = 20,
alpha: float = 0.5 # 0.0 = keyword only, 1.0 = semantic only
) -> List[Conversation]:
"""Hybrid search with RRF fusion"""

# Run both searches in parallel (2x faster)
keyword_task = self.conversation_repo.keyword_search(
organization_id=organization_id,
query=query,
limit=limit * 2 # Get more candidates for fusion
)

semantic_task = self.weaviate_client.semantic_search(
organization_id=organization_id,
query=query,
limit=limit * 2
)

keyword_results, semantic_results = await asyncio.gather(
keyword_task,
semantic_task
)

# Apply Reciprocal Rank Fusion
fused_results = self._rrf_fusion(
keyword_results=keyword_results,
semantic_results=semantic_results,
alpha=alpha
)

# Return top N results
return fused_results[:limit]

def _rrf_fusion(
self,
keyword_results: List[Conversation],
semantic_results: List[dict],
alpha: float = 0.5
) -> List[Conversation]:
"""Reciprocal Rank Fusion algorithm"""

rrf_scores: Dict[uuid.UUID, float] = {}

# Calculate RRF scores for keyword results
for rank, conv in enumerate(keyword_results, start=1):
rrf_scores[conv.id] = rrf_scores.get(conv.id, 0.0) + \
(1 - alpha) / (self.rrf_k + rank)

# Calculate RRF scores for semantic results
for rank, result in enumerate(semantic_results, start=1):
conv_id = result["id"]
rrf_scores[conv_id] = rrf_scores.get(conv_id, 0.0) + \
alpha / (self.rrf_k + rank)

# Sort by RRF score (descending)
sorted_ids = sorted(
rrf_scores.keys(),
key=lambda id: rrf_scores[id],
reverse=True
)

# Fetch full conversation objects
conversations = []
for conv_id in sorted_ids:
# Try keyword results first (already fetched)
conv = next((c for c in keyword_results if c.id == conv_id), None)

# Otherwise fetch from database
if not conv:
conv = await self.conversation_repo.get(conv_id)

if conv:
conversations.append(conv)

return conversations

Step 4: Configurable Alpha Weighting

# app/api/routes/search.py (FastAPI)
from fastapi import APIRouter, Query

@router.get("/conversations/search")
async def search_conversations(
q: str,
limit: int = Query(20, ge=1, le=100),
alpha: float = Query(0.5, ge=0.0, le=1.0), # Configurable!
current_user = Depends(get_current_user),
search_service: SearchService = Depends(get_search_service)
):
"""
Hybrid search with configurable alpha:
- alpha=0.0: Keyword-only search (PostgreSQL)
- alpha=0.5: Balanced hybrid (default)
- alpha=1.0: Semantic-only search (Weaviate)
"""

results = await search_service.hybrid_search(
organization_id=current_user.organization_id,
query=q,
limit=limit,
alpha=alpha
)

return {"results": results, "count": len(results)}

Consequences

Positive

  1. ✅ Best of Both Worlds

    • Exact matches: PostgreSQL handles "ABC-123", "PostgreSQL 15.3"
    • Semantic matches: Weaviate handles "database" → "PostgreSQL", "MySQL"
    • Combined: Results appear higher if matched by BOTH methods
  2. ✅ Proven Algorithm (RRF)

  3. ✅ Configurable Trade-offs

    • alpha=0.0: Keyword-only (exact matches, faster)
    • alpha=0.5: Balanced (default, best for most queries)
    • alpha=1.0: Semantic-only (conceptual search)
    • Per-query configuration allows user preference
  4. ✅ Performance: Parallel Execution

    • Both searches run concurrently (not sequential)
    • Total latency = max(keyword_time, semantic_time) + fusion_time
    • Fusion time <5ms (simple arithmetic)
    • Target: p95 <100ms (both searches <100ms, fusion <5ms)
  5. ✅ Multi-Tenant Safe

    • PostgreSQL: RLS policies filter by organization_id (ADR-004)
    • Weaviate: .with_tenant() filter isolates tenants
    • Fusion only combines results from SAME organization
  6. ✅ Incremental Improvement

    • Phase 1: PostgreSQL keyword search only (working, fast)
    • Phase 2: Add Weaviate semantic search + RRF fusion
    • Phase 3: Tune alpha based on user feedback

Negative

  1. ⚠️ Double Infrastructure Cost

    • Cost: PostgreSQL ($100/month) + Weaviate ($150/month) = $250/month
    • Mitigation: Balanced option (ADR-002), scales linearly
    • ROI: Better search = higher user satisfaction = lower churn (>10% churn reduction = $120K/year)
  2. ⚠️ Embedding Generation Latency

    • Problem: Must generate embeddings for new messages before searchable
    • Latency: 50-100ms per message (OpenAI API)
    • Mitigation: Async background job (Celery), user doesn't wait
    • Trade-off: New messages searchable within 5 seconds (acceptable)
  3. ⚠️ Synchronization Complexity

    • Problem: PostgreSQL and Weaviate must stay in sync
    • Risk: Message in PostgreSQL but not Weaviate = not searchable
    • Mitigation:
      • PostgreSQL = source of truth
      • Celery task: sync_to_weaviate(message_id)
      • Retry logic: 3 attempts with exponential backoff
      • Monitoring: Alert if sync lag >1 minute
  4. ⚠️ Alpha Tuning Required

    • Problem: Optimal alpha varies by query type
    • Example: "ABC-123" needs alpha=0.0 (exact match), "security bug" needs alpha=0.8 (semantic)
    • Mitigation Phase 1: Default alpha=0.5 (good enough for 80% of queries)
    • Future Enhancement: Query classifier predicts optimal alpha
      if is_exact_match_query(query):  # Contains IDs, exact terms
      alpha = 0.2 # Favor keyword
      elif is_conceptual_query(query): # Abstract terms
      alpha = 0.8 # Favor semantic
      else:
      alpha = 0.5 # Balanced

Risks and Mitigations

RiskLikelihoodImpactMitigation
Weaviate DowntimeLowHighFallback to PostgreSQL-only (alpha=0.0), cache embeddings
Embedding Cost SpikeMediumMediumMonitor usage, cap at $500/month, batch processing
Sync LagMediumLowCelery retry, monitoring, alert if lag >1 minute
Poor RelevanceLowHighA/B test alpha values, user feedback loop

Alternatives Considered

Alternative 1: PostgreSQL Full-Text Search Only (No Weaviate)

Architecture: Use only PostgreSQL tsvector + tsquery

Pros:

  • ✅ Simpler infrastructure (1 database vs. 2)
  • ✅ Lower cost ($100/month vs. $250/month)
  • ✅ No synchronization complexity

Cons:

  • No semantic understanding: "bug" doesn't match "error", "issue"
  • No synonym handling: "auth" doesn't match "authentication"
  • Poor for conceptual queries: "database performance" doesn't match "PostgreSQL slow queries"
  • User satisfaction: 6/10 vs. 8/10 with hybrid search

Why Rejected: Search quality is critical for product differentiation. Competitors (GitHub Copilot, Cursor) don't offer search at all - this is our competitive advantage. Saving $150/month is not worth 25% lower user satisfaction.


Alternative 2: Weaviate Semantic Search Only (No PostgreSQL FTS)

Architecture: Use only Weaviate vector search

Pros:

  • ✅ Semantic understanding (best-in-class)
  • ✅ Synonym handling
  • ✅ Conceptual search

Cons:

  • Poor exact matches: "ABC-123" might match "ABC-124", "XYZ-999"
  • No keyword precision: "PostgreSQL 15.3" might match "MySQL 8.0"
  • User frustration: "I typed the exact ID and it didn't find it!"

Why Rejected: Exact matches are critical for developer workflows (ticket IDs, commit hashes, function names). Vector search alone is insufficient.


Alternative 3: Weighted Sum Instead of RRF

Architecture: Combine keyword and semantic scores using weighted sum

final_score = alpha * semantic_score + (1 - alpha) * keyword_score

Pros:

  • ✅ Simpler algorithm
  • ✅ Intuitive weighting

Cons:

  • Score calibration required: Keyword scores (0-10) vs. semantic scores (0-1) have different ranges
  • Normalization complexity: Must normalize both score ranges to [0, 1]
  • Outlier sensitivity: One high score dominates the result
  • Research shows RRF outperforms weighted sum (Cormack et al., 2009)

Why Rejected: RRF is simpler (no score normalization), more robust (rank-based), and proven to outperform weighted sum.


Alternative 4: Query Classifier → Single Search Method

Architecture: Classify query type, then route to EITHER keyword OR semantic (not both)

if is_exact_match_query(query):
results = keyword_search(query) # PostgreSQL only
elif is_conceptual_query(query):
results = semantic_search(query) # Weaviate only
else:
results = hybrid_search(query) # Both

Pros:

  • ✅ Potentially faster (single search vs. two searches)
  • ✅ Lower Weaviate usage (cost savings)

Cons:

  • Classifier accuracy: What if it misclassifies?
  • Complexity: Need to train/maintain classifier
  • Edge cases: Many queries are hybrid ("PostgreSQL bug" = exact + semantic)
  • Marginal performance gain: Parallel execution already fast (<100ms)

Why Rejected: Complexity not worth marginal performance gain. Better to always use hybrid search (simpler, more robust) and tune alpha if needed.


Success Metrics

Search Quality Metrics

  • User satisfaction: 8/10 (measured via feedback thumbs up/down)
  • Recall: 90%+ (find existing conversations)
  • Precision@10: 80%+ (top 10 results relevant)
  • Click-through rate: 60%+ (user clicks on result)

Performance Metrics

  • p50 latency: <50ms
  • p95 latency: <100ms
  • p99 latency: <200ms
  • Throughput: 100 searches/sec at 10K organizations

Synchronization Metrics

  • Sync lag: <5 seconds (p95)
  • Sync error rate: <0.1%
  • Embedding generation time: <100ms per message

Implementation Plan

Phase 1: PostgreSQL Keyword Search (Week 1)

  • Add tsvector columns to tables
  • Create GIN indexes
  • Implement keyword_search() method
  • Test with 100K messages
  • Benchmark latency (target: p95 <50ms)

Phase 2: Weaviate Setup (Week 2)

  • Deploy Weaviate Cloud (managed)
  • Create multi-tenant schema
  • Implement embedding generation (OpenAI API)
  • Implement semantic_search() method
  • Test with 100K messages

Phase 3: RRF Fusion (Week 3)

  • Implement _rrf_fusion() algorithm
  • Parallel execution (asyncio)
  • Configurable alpha parameter
  • Integration tests (both modes)

Phase 4: Tuning & Optimization (Week 4)

  • A/B test alpha values (0.3, 0.5, 0.7)
  • User feedback collection (thumbs up/down)
  • Performance optimization (<100ms p95)
  • Monitoring and alerting

References

Research:

Implementation Examples:

Related ADRs:

  • ADR-002: PostgreSQL + Weaviate (database architecture)
  • ADR-004: Multi-Tenant RLS (isolation for search)
  • ADR-006: Conversation-Commit Correlation (extends search to Git commits)

Status: Proposed Review Date: 2025-12-03 Projected ADR Score: 39/40 (A+) Complexity: Medium (proven algorithm, parallel execution) Owner: Architecture Team + ML/Search Team

Next Steps:

  1. Approve RRF hybrid search approach
  2. Implement PostgreSQL FTS (Week 1)
  3. Deploy Weaviate and test embedding pipeline (Week 2)
  4. Integrate RRF fusion (Week 3)
  5. A/B test alpha values and optimize (Week 4)