Open-Source Search Solutions for Coditect: Production Implementation Guide
Executive Summary
Mandate: 100% open-source, self-hosted, no proprietary dependencies
Recommended Stack:
- Primary: Meilisearch (MIT License) - Production knowledge retrieval for agents
- Backup: Typesense (GPL v3) - Equivalent alternative
- Lightweight: Lunr.js (MIT License) - Client-side documentation search
- Modern Static: Pagefind (MIT) - Zero-dependency static site search
Strategic Rationale: Meilisearch + Lunr.js dual deployment
- Meilisearch: Agent knowledge retrieval (server-side, high-performance)
- Lunr.js: Public documentation (client-side, zero infrastructure)
Open-Source Search Landscape
Comparison Matrix
| Solution | License | Deployment | Indexing | Language | Scalability | Use Case |
|---|---|---|---|---|---|---|
| Meilisearch | MIT | Self-hosted | Server-side | Rust | Millions of docs | Agent knowledge, APIs |
| Typesense | GPL v3 | Self-hosted | Server-side | C++ | Millions of docs | Agent knowledge, APIs |
| Lunr.js | MIT | Client-side | Build-time | JavaScript | ~10K docs | Public documentation |
| Pagefind | MIT | Client-side | Build-time | Rust | 100K+ docs | Static documentation |
| FlexSearch | Apache 2.0 | Client-side | Build-time | JavaScript | ~50K docs | Documentation |
| Stork | Apache 2.0 | Client-side | Build-time | Rust | ~100K docs | Documentation |
| Elasticsearch | Elastic License 2.0* | Self-hosted | Server-side | Java | Billions of docs | Enterprise search |
*Note: Elasticsearch changed from Apache 2.0 to Elastic License 2.0 (not OSI-approved) in 2021. OpenSearch (AWS fork) remains Apache 2.0.
Recommended Architecture for Coditect
coditect_search_architecture:
public_documentation:
solution: lunr.js # or Pagefind
deployment: static_bundle
location: coditect.ai/docs
cost: zero_infrastructure
agent_knowledge_retrieval:
solution: meilisearch
deployment: self_hosted_gke
location: internal_cluster
cost: compute_only
customer_portal:
solution: meilisearch
deployment: multi_tenant_instance
isolation: tenant_id_filtering
cost: shared_infrastructure
compliance_documentation:
solution: meilisearch
deployment: air_gapped_instance
location: customer_premises
cost: customer_managed
Solution 1: Meilisearch (PRIMARY RECOMMENDATION)
Why Meilisearch for Coditect
Technical Fit:
- Rust-based: Performance, memory safety, aligns with Coditect's Rust stack
- MIT License: True open source, no licensing concerns
- Typo tolerance: Critical for agent queries with technical terms
- Fast indexing: Millions of documents, sub-second updates
- Multi-tenancy: Built-in tenant filtering via the `filter` parameter
- Simple deployment: Single binary, minimal dependencies
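Filter expressions are plain strings of boolean clauses over the filterable attributes. A minimal sketch of composing one in Rust — the helper name is ours for illustration, not part of the SDK:

```rust
// Hypothetical helper: compose a Meilisearch filter expression.
// Quoting values keeps multi-word terms intact.
fn build_agent_filter(agent_type: &str, frameworks: &[&str]) -> String {
    let mut clauses = vec![format!("agent_types = \"{}\"", agent_type)];
    if !frameworks.is_empty() {
        let fw = frameworks
            .iter()
            .map(|f| format!("compliance_frameworks = \"{}\"", f))
            .collect::<Vec<_>>()
            .join(" OR ");
        clauses.push(format!("({})", fw));
    }
    clauses.join(" AND ")
}

fn main() {
    let filter = build_agent_filter("architect", &["fda_21_cfr_11", "hipaa"]);
    println!("{filter}");
    // agent_types = "architect" AND (compliance_frameworks = "fda_21_cfr_11" OR compliance_frameworks = "hipaa")
}
```

The resulting string is what gets passed to the search request's `filter` parameter.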
Coditect-Specific Advantages:
// Meilisearch aligns with Coditect's Rust ecosystem
// Agent knowledge retrieval in native language
use meilisearch_sdk::client::Client;
use serde::{Deserialize, Serialize};
#[derive(Serialize, Deserialize, Debug)]
struct CoditectKnowledgeDocument {
id: String,
document_type: String, // "adr", "pattern", "compliance"
title: String,
content: String,
summary: String,
agent_types: Vec<String>,
compliance_frameworks: Vec<String>,
token_estimate: u32,
priority_score: f32,
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize Meilisearch client
    let client = Client::new(
        "http://meilisearch:7700",
        Some("MASTER_KEY")
    );
    // Create the index, then take a handle to it.
    // (In recent SDK versions `create_index` enqueues an async task
    // rather than returning the index directly.)
    client
        .create_index("coditect_knowledge", Some("id"))
        .await?;
    let index = client.index("coditect_knowledge");
// Configure searchable and filterable attributes
index.set_searchable_attributes(&[
"title",
"summary",
"content",
]).await?;
index.set_filterable_attributes(&[
"agent_types",
"document_type",
"compliance_frameworks",
]).await?;
index.set_ranking_rules(&[
"words",
"typo",
"priority_score:desc",
"proximity",
"attribute",
"exactness",
]).await?;
Ok(())
}
Production Deployment Architecture
meilisearch_production_deployment:
infrastructure:
platform: google_kubernetes_engine
deployment: statefulset
replicas: 3 # note: Meilisearch has no native replication; each replica must be indexed independently
storage:
type: persistent_volume
size: 100Gi # Scales with knowledge base
storage_class: ssd
networking:
service_type: ClusterIP
internal_only: true # Not exposed to internet
tls: mutual_tls
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
backup:
schedule: "0 */6 * * *" # Every 6 hours
retention: 7_days
destination: gcs_bucket
monitoring:
prometheus: true
grafana_dashboard: true
alerts:
- search_latency_p95 > 200ms
- index_size > 80GB
- memory_usage > 3.5Gi
Kubernetes Deployment
# meilisearch-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: meilisearch
namespace: coditect-platform
spec:
serviceName: meilisearch
replicas: 3 # each instance indexed independently (no native replication)
selector:
matchLabels:
app: meilisearch
template:
metadata:
labels:
app: meilisearch
spec:
containers:
- name: meilisearch
image: getmeili/meilisearch:v1.5
ports:
- containerPort: 7700
name: http
env:
- name: MEILI_MASTER_KEY
valueFrom:
secretKeyRef:
name: meilisearch-secrets
key: master-key
- name: MEILI_ENV
value: "production"
- name: MEILI_NO_ANALYTICS
value: "true" # Privacy-focused
- name: MEILI_LOG_LEVEL
value: "INFO"
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
volumeMounts:
- name: data
mountPath: /meili_data
livenessProbe:
httpGet:
path: /health
port: 7700
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 7700
initialDelaySeconds: 5
periodSeconds: 5
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: ssd
resources:
requests:
storage: 100Gi
---
apiVersion: v1
kind: Service
metadata:
name: meilisearch
namespace: coditect-platform
spec:
clusterIP: None # Headless service
selector:
app: meilisearch
ports:
- port: 7700
targetPort: 7700
name: http
Agent Knowledge Retrieval Implementation
// src/knowledge/meilisearch_retrieval.rs
use meilisearch_sdk::{client::Client, search::{SearchQuery, SearchResults}};
use std::collections::HashMap;
pub struct CoditectKnowledgeRetriever {
client: Client,
index_name: String,
}
impl CoditectKnowledgeRetriever {
pub fn new(host: &str, api_key: &str) -> Self {
let client = Client::new(host, Some(api_key));
Self {
client,
index_name: "coditect_knowledge".to_string(),
}
}
pub async fn search_for_agent(
&self,
query: &str,
agent_type: &str,
compliance_frameworks: Vec<&str>,
max_tokens: u32,
) -> Result<Vec<CoditectKnowledgeDocument>, Box<dyn std::error::Error>> {
let index = self.client.index(&self.index_name);
        // Build filter string (values quoted so multi-word terms stay intact)
        let mut filters = vec![
            format!("agent_types = \"{}\"", agent_type)
        ];
        if !compliance_frameworks.is_empty() {
            let compliance_filter = compliance_frameworks
                .iter()
                .map(|fw| format!("compliance_frameworks = \"{}\"", fw))
                .collect::<Vec<_>>()
                .join(" OR ");
            filters.push(format!("({})", compliance_filter));
        }
        let filter_str = filters.join(" AND ");
        // Execute search (fetch extra hits, then trim to the token budget below)
        let results: SearchResults<CoditectKnowledgeDocument> = index
            .search()
            .with_query(query)
            .with_filter(&filter_str)
            .with_limit(50)
            .execute()
            .await?;
// Token-conscious selection
let mut selected = Vec::new();
let mut token_budget = max_tokens;
for hit in results.hits {
let doc = hit.result;
if doc.token_estimate <= token_budget {
token_budget -= doc.token_estimate;
selected.push(doc);
}
if token_budget < 500 {
break;
}
}
Ok(selected)
}
pub async fn search_with_multi_tenancy(
&self,
query: &str,
tenant_id: &str,
) -> Result<Vec<CoditectKnowledgeDocument>, Box<dyn std::error::Error>> {
let index = self.client.index(&self.index_name);
        let results = index
            .search()
            .with_query(query)
            .with_filter(&format!("tenant_id = \"{}\"", tenant_id))
            .execute::<CoditectKnowledgeDocument>()
            .await?;
Ok(results.hits.into_iter().map(|h| h.result).collect())
}
}
// Usage in agent orchestration
#[tokio::main]
async fn main() {
let retriever = CoditectKnowledgeRetriever::new(
"http://meilisearch:7700",
&std::env::var("MEILI_API_KEY").unwrap()
);
// Architect agent searching for design patterns
let knowledge = retriever.search_for_agent(
"event-driven architecture patterns",
"architect",
vec!["fda_21_cfr_11"],
15000 // 15K token budget
).await.unwrap();
println!("Retrieved {} knowledge documents", knowledge.len());
for doc in knowledge {
println!("- {} ({} tokens)", doc.title, doc.token_estimate);
}
}
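The token-budget selection inside `search_for_agent` reduces to a small pure function, sketched here so it can be unit-tested in isolation. The `Doc` struct and thresholds are illustrative stand-ins for `CoditectKnowledgeDocument`:

```rust
// Illustrative stand-in for CoditectKnowledgeDocument
#[derive(Debug, Clone)]
struct Doc {
    title: String,
    token_estimate: u32,
}

/// Greedily keep relevance-ordered hits that fit the remaining budget,
/// stopping once fewer than `min_useful` tokens remain.
fn select_within_budget(hits: Vec<Doc>, max_tokens: u32, min_useful: u32) -> Vec<Doc> {
    let mut budget = max_tokens;
    let mut selected = Vec::new();
    for doc in hits {
        if doc.token_estimate <= budget {
            budget -= doc.token_estimate;
            selected.push(doc);
        }
        if budget < min_useful {
            break;
        }
    }
    selected
}

fn main() {
    let hits = vec![
        Doc { title: "ADR-012".into(), token_estimate: 8_000 },
        Doc { title: "ADR-007".into(), token_estimate: 9_000 }, // over remaining budget, skipped
        Doc { title: "pattern-cqrs".into(), token_estimate: 4_000 },
    ];
    let picked = select_within_budget(hits, 15_000, 500);
    assert_eq!(picked.len(), 2); // ADR-012 and pattern-cqrs fit within 15K
}
```

Because hits arrive in relevance order, the greedy pass keeps the most relevant documents that fit — an oversized but highly relevant hit is skipped rather than blowing the budget.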
Indexing Pipeline (CI/CD Integration)
// scripts/index_knowledge_base/src/main.rs
use meilisearch_sdk::Client;
use std::path::Path;
use walkdir::WalkDir;
use anthropic_sdk::Client as AnthropicClient; // illustrative: assumes a community Anthropic Rust client crate
struct KnowledgeIndexer {
meilisearch: Client,
anthropic: AnthropicClient,
documents: Vec<CoditectKnowledgeDocument>,
}
impl KnowledgeIndexer {
pub async fn build_index(&mut self, repo_path: &Path) -> Result<(), Box<dyn std::error::Error>> {
// 1. Extract ADRs
self.extract_adrs(repo_path.join("docs/adr")).await?;
// 2. Extract compliance rules
self.extract_compliance_rules(repo_path.join("compliance")).await?;
// 3. Extract code patterns
self.extract_patterns(repo_path.join("src")).await?;
// 4. Generate summaries with Claude
self.generate_summaries().await?;
// 5. Calculate token estimates
self.calculate_token_estimates();
// 6. Upload to Meilisearch
self.upload_to_meilisearch().await?;
Ok(())
}
async fn extract_adrs(&mut self, adr_path: std::path::PathBuf) -> Result<(), Box<dyn std::error::Error>> {
for entry in WalkDir::new(adr_path) {
let entry = entry?;
if entry.path().extension().and_then(|s| s.to_str()) == Some("md") {
let content = std::fs::read_to_string(entry.path())?;
// Parse ADR structure
let adr_data = self.parse_adr(&content)?;
// Generate summary with Claude
let summary = self.generate_summary_with_claude(
&content,
"adr",
"Coditect architecture decision"
).await?;
self.documents.push(CoditectKnowledgeDocument {
id: format!("adr_{}", entry.path().file_stem().unwrap().to_str().unwrap()),
document_type: "adr".to_string(),
title: adr_data.title,
content,
summary,
agent_types: vec!["architect".to_string(), "orchestrator".to_string()],
compliance_frameworks: vec![],
token_estimate: 0, // Calculated later
priority_score: 1.0,
});
}
}
Ok(())
}
async fn generate_summary_with_claude(
&self,
content: &str,
doc_type: &str,
context: &str
) -> Result<String, Box<dyn std::error::Error>> {
let prompt = format!(
"Generate a concise 2-3 sentence summary of this {} for AI agent knowledge retrieval.\n\
Context: {}\n\
Requirements:\n\
- Focus on key decisions and rationale\n\
- Include relevant technical terms\n\
- Keep under 100 tokens\n\n\
Document:\n{}",
doc_type,
context,
            // Take chars, not bytes, to avoid panicking mid-UTF-8 sequence
            content.chars().take(5000).collect::<String>()
);
let response = self.anthropic.messages()
.create()
.model("claude-sonnet-4-20250514")
.max_tokens(500)
.user_message(&prompt)
.send()
.await?;
Ok(response.content[0].text.clone())
}
fn calculate_token_estimates(&mut self) {
for doc in &mut self.documents {
// Rough estimate: 1 token ≈ 4 characters
doc.token_estimate = (doc.content.len() / 4) as u32;
}
}
async fn upload_to_meilisearch(&self) -> Result<(), Box<dyn std::error::Error>> {
let index = self.meilisearch.index("coditect_knowledge");
// Configure index
index.set_searchable_attributes(&[
"title", "summary", "content"
]).await?;
index.set_filterable_attributes(&[
"agent_types", "document_type", "compliance_frameworks"
]).await?;
index.set_ranking_rules(&[
"words",
"typo",
"priority_score:desc",
"proximity",
"attribute",
"exactness",
]).await?;
// Batch upload
const BATCH_SIZE: usize = 100;
for chunk in self.documents.chunks(BATCH_SIZE) {
index.add_documents(chunk, Some("id")).await?;
}
println!("Indexed {} documents", self.documents.len());
Ok(())
}
}
# .github/workflows/index-knowledge.yml
name: Index Knowledge Base
on:
push:
branches: [main]
paths:
- 'docs/adr/**'
- 'compliance/**'
- 'src/**/*.rs'
jobs:
index:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Rust
uses: actions-rs/toolchain@v1
with:
toolchain: stable
- name: Build indexer
run: |
cd scripts/index_knowledge_base
cargo build --release
- name: Run indexing
env:
MEILI_HOST: ${{ secrets.MEILI_HOST }}
MEILI_ADMIN_KEY: ${{ secrets.MEILI_ADMIN_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
./scripts/index_knowledge_base/target/release/index_knowledge_base \
--repo-path . \
--environment production
- name: Verify index
run: |
curl -H "Authorization: Bearer ${{ secrets.MEILI_ADMIN_KEY }}" \
"${{ secrets.MEILI_HOST }}/indexes/coditect_knowledge/stats"
Multi-Tenant Implementation
// Multi-tenant knowledge base with tenant isolation
pub struct MultiTenantKnowledgeRetriever {
retriever: CoditectKnowledgeRetriever,
}
impl MultiTenantKnowledgeRetriever {
pub async fn search_for_tenant(
&self,
query: &str,
tenant_id: &str,
agent_type: &str,
) -> Result<Vec<CoditectKnowledgeDocument>, Box<dyn std::error::Error>> {
        // Note: assumes CoditectKnowledgeRetriever exposes its client and
        // index name to this module (e.g. via pub(crate) fields)
        let index = self.retriever.client.index(&self.retriever.index_name);
        // Strict tenant isolation: both clauses must match
        let filter = format!(
            "tenant_id = \"{}\" AND agent_types = \"{}\"",
            tenant_id,
            agent_type
        );
        let results = index
            .search()
            .with_query(query)
            .with_filter(&filter)
            .execute::<CoditectKnowledgeDocument>()
            .await?;
Ok(results.hits.into_iter().map(|h| h.result).collect())
}
}
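One caveat worth making explicit: because `tenant_id` is interpolated into the filter string, isolation is only as strict as the values allowed into it. A defensive sketch — the character whitelist is an assumption and should match Coditect's actual tenant-id format:

```rust
/// Reject tenant identifiers that could alter the filter expression when
/// interpolated. The [A-Za-z0-9_-] whitelist is an assumption; align it
/// with the real tenant-id format before use.
fn safe_tenant_filter(tenant_id: &str) -> Option<String> {
    let valid = !tenant_id.is_empty()
        && tenant_id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    valid.then(|| format!("tenant_id = \"{}\"", tenant_id))
}

fn main() {
    assert_eq!(
        safe_tenant_filter("acme-health-01").as_deref(),
        Some("tenant_id = \"acme-health-01\"")
    );
    // A crafted value attempting to widen the filter is rejected outright
    assert!(safe_tenant_filter("x\" OR tenant_id != \"y").is_none());
}
```

Validating at the boundary keeps the "zero cross-tenant data leakage" guarantee from resting on string interpolation alone.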
Solution 2: Lunr.js (PUBLIC DOCUMENTATION)
Why Lunr.js for Public Docs
Technical Fit:
- Zero infrastructure: No server required
- Privacy-first: No external services, no tracking
- Offline-capable: Works in air-gapped environments
- MIT license: True open source
- Battle-tested: Used by Hugo, Jekyll, Gatsby
Limitations:
- Not suitable for >10K documents
- No real-time updates (build-time only)
- Client-side index increases bundle size
- Basic ranking algorithm
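The ~10K-document ceiling is driven by the serialized index the browser must download up front. A back-of-envelope estimate — the per-document figure is a rough assumption for short documentation pages, not a measurement:

```rust
// Rough sizing model for a client-side Lunr index bundle.
// kb_per_doc is an assumed average, not a measured constant.
fn estimated_index_kb(doc_count: u32, kb_per_doc: f64) -> f64 {
    f64::from(doc_count) * kb_per_doc
}

fn main() {
    // ~1K pages: a download users won't notice
    println!("{} KB", estimated_index_kb(1_000, 1.5)); // 1500 KB
    // ~10K pages: megabytes shipped before the first keystroke,
    // which is where lazy-loading approaches like Pagefind win
    println!("{} KB", estimated_index_kb(10_000, 1.5)); // 15000 KB
}
```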
Docusaurus Integration
// docusaurus.config.js
module.exports = {
plugins: [
[
require.resolve('@cmfcmf/docusaurus-search-local'),
{
indexDocs: true,
indexBlog: true,
indexPages: false,
language: ['en'],
style: undefined,
lunr: {
tokenizerSeparator: /[\s\-]+/,
// Stemming for better matches
b: 0.75,
k1: 1.2,
},
// Coditect-specific customization
indexDocSidebarParentCategories: 2,
docsRouteBasePath: '/docs',
blogRouteBasePath: '/blog',
},
],
],
};
Custom Indexing for Technical Terms
// scripts/build-custom-index.js
const lunr = require('lunr');
const fs = require('fs');
const path = require('path');
// Custom pipeline function: preserve technical terms from stemming
const technicalTermPreserver = function (token) {
  const technicalTerms = [
    'foundationdb',
    'wasm',
    'oauth',
    'kubernetes',
    'rust',
    'typescript'
  ];
  if (technicalTerms.includes(token.toString().toLowerCase())) {
    return token;
  }
  // Default stemming for other words
  return lunr.stemmer(token);
};
lunr.Pipeline.registerFunction(technicalTermPreserver, 'technicalTermPreserver');
// Build index with custom pipeline
const idx = lunr(function() {
this.ref('id');
this.field('title', { boost: 10 });
this.field('content');
// Add custom pipeline step
this.pipeline.remove(lunr.stemmer);
this.pipeline.add(technicalTermPreserver);
// Add documents (assumes `documents` was loaded from the docs build output)
documents.forEach(doc => {
this.add(doc);
});
});
// Save index
fs.writeFileSync(
path.join(__dirname, '../static/search-index.json'),
JSON.stringify(idx)
);
Solution 3: Pagefind (MODERN STATIC ALTERNATIVE)
Why Pagefind
Technical Advantages over Lunr.js:
- Rust-based indexer: Fast indexing, small initial payload
- Lazy loading: Only loads index chunks as needed
- Better scaling: Handles 100K+ pages efficiently
- MIT licensed: True open source
- Framework-agnostic: Drop-in for any static site, with an optional prebuilt UI
Integration with Docusaurus
# Install Pagefind
npm install -D pagefind
# Add to package.json
{
  "scripts": {
    "build": "docusaurus build",
    "postbuild": "pagefind --site build"
  }
}
// Custom SearchBar with Pagefind
// src/theme/SearchBar/index.tsx
import React, { useEffect, useRef, useState } from 'react';
export default function SearchBar() {
  const searchRef = useRef<HTMLDivElement>(null);
  const [pagefind, setPagefind] = useState<any>(null);
useEffect(() => {
// Lazy load Pagefind
const loadPagefind = async () => {
      // webpackIgnore stops the bundler from resolving this runtime-only path
      const pf = await import(/* webpackIgnore: true */ '/pagefind/pagefind.js');
await pf.options({
baseUrl: '/',
});
setPagefind(pf);
};
loadPagefind();
}, []);
  const handleSearch = async (query: string) => {
if (!pagefind || !query) return;
const results = await pagefind.search(query);
// Render results
};
return (
<div ref={searchRef}>
<input
type="search"
placeholder="Search docs..."
onChange={(e) => handleSearch(e.target.value)}
/>
</div>
);
}
Solution 4: Typesense (MEILISEARCH ALTERNATIVE)
Why Consider Typesense
Comparison to Meilisearch:
| Feature | Meilisearch | Typesense |
|---|---|---|
| License | MIT | GPL v3 |
| Language | Rust | C++ |
| Performance | Excellent | Excellent |
| Typo tolerance | Yes | Yes |
| Cloud option | Yes (proprietary) | Yes (proprietary) |
| Multi-tenancy | Filter-based | Collection-based |
When to choose Typesense over Meilisearch:
- Already using C++ ecosystem
- Need collection-level isolation (vs filter-based)
- Prefer GPL v3 licensing
Deployment (Nearly Identical to Meilisearch)
# typesense-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: typesense
spec:
  serviceName: typesense
  replicas: 3
  selector:
    matchLabels:
      app: typesense
  template:
    metadata:
      labels:
        app: typesense
    spec:
      containers:
- name: typesense
image: typesense/typesense:0.25.2
args:
- "--data-dir=/data"
- "--api-key=$(TYPESENSE_API_KEY)"
- "--enable-cors"
env:
- name: TYPESENSE_API_KEY
valueFrom:
secretKeyRef:
name: typesense-secrets
key: api-key
Recommendation: Stick with Meilisearch unless GPL v3 is specifically required. Meilisearch's MIT license is more permissive for commercial use.
Cost Analysis: Open-Source Self-Hosted
Infrastructure Costs (GKE)
meilisearch_production_costs:
compute:
instance_type: e2-standard-4 # 4 vCPU, 16GB RAM
instances: 3
cost_per_instance_month: $122
total_compute_month: $366
storage:
persistent_volumes: 3
size_per_volume: 100Gi
storage_class: ssd
cost_per_gb_month: $0.17
total_storage_month: $51
networking:
egress_gb_month: 100
cost_per_gb: $0.12
total_networking_month: $12
backup:
gcs_storage_gb: 300 # 3x100GB snapshots
cost_per_gb_month: $0.02
total_backup_month: $6
total_monthly_cost: $435
cost_per_search_query: $0.0000145 # Assuming 30M queries/month
vs_algolia_equivalent:
algolia_cost_30m_queries: $2400/month
savings: $1965/month
roi: 452% # savings / cost = $1,965 / $435
Comparison to Proprietary Alternatives
| Solution | Monthly Cost (30M queries) | Licensing | Data Control |
|---|---|---|---|
| Meilisearch (self-hosted) | $435 | Free (MIT) | Full |
| Typesense (self-hosted) | $435 | Free (GPL v3) | Full |
| Algolia | $2,400 | Pay-per-use | None |
| Orama Cloud | $1,200 | Pay-per-use | Limited |
| Elasticsearch Cloud | $1,800 | Pay-per-use | Limited |
ROI for Coditect:
- Infrastructure cost: $435/month
- Engineering time: 2 weeks initial setup, 4 hours/month maintenance
- Total annual cost: ~$5,220 infrastructure + ~$15K engineering = ~$20K
- Algolia equivalent: ~$29K annually
- Savings: ~$9K/year
- Plus: Full data control, air-gap capability, no vendor lock-in
Implementation Roadmap for Coditect
Phase 1: Public Documentation (Week 1-2)
Goal: Launch coditect.ai/docs with search
# Day 1-2: Setup Docusaurus
npx create-docusaurus@latest coditect-docs classic
cd coditect-docs
# Day 3-4: Add Lunr.js search
npm install @cmfcmf/docusaurus-search-local
# Configure in docusaurus.config.js
# (see config above)
# Day 5-7: Write initial documentation
# - Getting Started
# - Architecture Overview
# - Agent Types
# - API Reference
# Day 8-10: Deploy to Vercel/Cloudflare Pages
npm run build
# Deploy build/ directory
# Search works immediately, zero infrastructure!
Success Criteria:
- ✅ Documentation site live
- ✅ Search functional
- ✅ <2s page load time
- ✅ Zero infrastructure cost
Phase 2: Agent Knowledge Retrieval (Week 3-6)
Goal: Meilisearch for autonomous agent knowledge
// Week 3: Deploy Meilisearch to GKE
// kubectl apply -f meilisearch-statefulset.yaml
// Week 4: Build indexing pipeline
// - Extract ADRs, patterns, compliance rules
// - Generate Claude summaries
// - Calculate token estimates
// - Upload to Meilisearch
// Week 5: Integrate with agents
// - Implement KnowledgeRetriever
// - Wire into Architect agent
// - Add token-conscious filtering
// - Measure performance
// Week 6: Expand to all agent types
// - Orchestrator knowledge retrieval
// - Implementer pattern lookup
// - Reviewer compliance checks
Success Criteria:
- ✅ Meilisearch deployed and stable
- ✅ Knowledge base indexed (1000+ documents)
- ✅ Agent retrieval latency <100ms p95
- ✅ Token reduction >30%
Phase 3: Multi-Tenant Customer Portal (Week 7-10)
Goal: Tenant-isolated knowledge base
// Week 7-8: Multi-tenant architecture
// - Implement tenant_id filtering
// - Build tenant data isolation
// - Add customer-specific documentation
// Week 9: Customer portal UI
// - Docusaurus with custom SearchBar
// - Meilisearch backend
// - Tenant authentication
// Week 10: Production hardening
// - Load testing
// - Security audit
// - Monitoring dashboards
Success Criteria:
- ✅ 10+ customers with isolated knowledge bases
- ✅ Zero cross-tenant data leakage
- ✅ <200ms search latency
- ✅ 99.9% uptime
Monitoring & Operations
Prometheus Metrics
# meilisearch-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: meilisearch
spec:
selector:
matchLabels:
app: meilisearch
endpoints:
- port: http
path: /metrics
// Custom metrics for Coditect
use lazy_static::lazy_static;
use prometheus::{Counter, Histogram, HistogramOpts};

lazy_static! {
    static ref SEARCH_QUERIES: Counter = Counter::new(
        "coditect_search_queries_total",
        "Total search queries"
    ).unwrap();
    static ref SEARCH_LATENCY: Histogram = Histogram::with_opts(HistogramOpts::new(
        "coditect_search_latency_seconds",
        "Search query latency"
    )).unwrap();
    static ref TOKEN_BUDGET_USAGE: Histogram = Histogram::with_opts(HistogramOpts::new(
        "coditect_token_budget_usage",
        "Token budget usage per search"
    )).unwrap();
}
Grafana Dashboard
{
"dashboard": {
"title": "Coditect Knowledge Retrieval",
"panels": [
{
"title": "Search QPS",
"targets": [{
"expr": "rate(coditect_search_queries_total[5m])"
}]
},
    {
      "title": "P95 Latency",
      "targets": [{
        "expr": "histogram_quantile(0.95, rate(coditect_search_latency_seconds_bucket[5m]))"
      }]
    },
{
"title": "Token Budget Efficiency",
"targets": [{
"expr": "avg(coditect_token_budget_usage)"
}]
}
]
}
}
Air-Gapped Deployment (Healthcare Clients)
Offline Meilisearch Setup
# On internet-connected machine:
# 1. Download Meilisearch binary
wget https://github.com/meilisearch/meilisearch/releases/download/v1.5.0/meilisearch-linux-amd64
# 2. Build knowledge index
cargo run --release --bin index_knowledge_base -- \
--repo-path /path/to/coditect \
--output knowledge_index.meilisearch
# 3. Package for air-gapped deployment
tar czf coditect-knowledge-airgap.tar.gz \
meilisearch-linux-amd64 \
knowledge_index.meilisearch \
deploy.sh
# On air-gapped machine:
tar xzf coditect-knowledge-airgap.tar.gz
./deploy.sh --data-dir /opt/meilisearch/data
Air-Gapped Update Process
# scripts/airgap-update.sh
#!/bin/bash
set -e
# 1. Export updated index from staging
#    (meilisearch-exporter is custom tooling; Meilisearch's built-in dump
#    endpoint is an alternative export path)
./meilisearch-exporter \
--host http://staging-meilisearch:7700 \
--index coditect_knowledge \
--output knowledge_index_v2.meilisearch
# 2. Package with version
VERSION=$(date +%Y%m%d-%H%M%S)
tar czf coditect-knowledge-update-${VERSION}.tar.gz \
knowledge_index_v2.meilisearch \
update.sh
# 3. Transfer via approved method (USB, secure file transfer)
# Customer applies update on air-gapped network
# 4. On air-gapped machine:
./update.sh --backup-existing
Compliance Considerations
Data Residency
meilisearch_compliance:
fda_21_cfr_part_11:
audit_trail: enabled
implementation: meilisearch_logs + external_audit_system
retention: 7_years
hipaa:
encryption_at_rest: enabled # Persistent volume encryption
encryption_in_transit: mutual_tls
access_control: rbac
phi_handling: no_phi_in_search_index
soc2:
monitoring: prometheus_metrics
alerting: grafana_alerts
backup: automated_every_6_hours
disaster_recovery: multi_az_deployment
Search Query Logging
// Compliance-aware search logging
pub async fn search_with_audit_log(
    query: &str,
    user_id: &str,
    tenant_id: &str,
    request_ip: &str,
) -> Result<Vec<Document>, Error> {
    // Log the search query itself for the compliance audit trail
    audit_log::record(AuditEvent {
        event_type: "search_query",
        user_id: user_id.to_string(),
        tenant_id: tenant_id.to_string(),
        query: query.to_string(),
        timestamp: Utc::now(),
        ip_address: request_ip.to_string(),
    });
// Execute search
let results = search_internal(query, tenant_id).await?;
// Log results count (not content)
audit_log::record(AuditEvent {
event_type: "search_results",
user_id: user_id.to_string(),
results_count: results.len(),
timestamp: Utc::now(),
});
Ok(results)
}
Migration Path from Proprietary to Open-Source
If Currently Using Algolia
# scripts/migrate_from_algolia.py
import os
from algoliasearch.search_client import SearchClient as AlgoliaClient
from meilisearch import Client as MeilisearchClient
def migrate_algolia_to_meilisearch():
# Connect to both
algolia = AlgoliaClient.create(
os.environ['ALGOLIA_APP_ID'],
os.environ['ALGOLIA_ADMIN_KEY']
)
meilisearch = MeilisearchClient(
os.environ['MEILI_HOST'],
os.environ['MEILI_ADMIN_KEY']
)
# Export from Algolia
algolia_index = algolia.init_index('coditect_docs')
documents = []
for hit in algolia_index.browse_objects():
documents.append(hit)
# Transform for Meilisearch
meili_docs = transform_documents(documents)
# Import to Meilisearch
meili_index = meilisearch.index('coditect_knowledge')
meili_index.add_documents(meili_docs)
print(f"Migrated {len(documents)} documents")
def transform_documents(algolia_docs):
"""Transform Algolia schema to Meilisearch"""
return [
{
'id': doc['objectID'],
'title': doc.get('title'),
'content': doc.get('content'),
# Map other fields
}
for doc in algolia_docs
]
Final Recommendation for Coditect
Dual Architecture
recommended_implementation:
public_documentation:
solution: lunr.js
reasoning:
- Zero infrastructure cost
- Perfect for <10K docs
- Privacy-first (no external calls)
- Works offline
deployment: docusaurus + lunr plugin
timeline: 1-2 weeks
agent_knowledge_retrieval:
solution: meilisearch_self_hosted
reasoning:
- Rust-based (matches Coditect stack)
- MIT license (true open source)
- Excellent performance
- Multi-tenant capable
- Air-gap compatible
deployment: kubernetes_statefulset
timeline: 4-6 weeks
customer_portal:
solution: meilisearch_multi_tenant
reasoning:
- Same infrastructure as agent knowledge
- Tenant isolation built-in
- Customer-specific documentation
deployment: shared_meilisearch_cluster
timeline: 2-3 weeks
Total Cost of Ownership (3 Years)
open_source_tco:
year_1:
infrastructure: $5,220
engineering: $30,000 # Initial setup + 3 months learning
total: $35,220
year_2:
infrastructure: $5,220
engineering: $12,000 # 1 hour/week maintenance
total: $17,220
year_3:
infrastructure: $5,220
engineering: $12,000
total: $17,220
three_year_total: $69,660
proprietary_alternative_tco:
year_1:
algolia_fees: $28,800
engineering: $5,000 # Easier setup
total: $33,800
year_2:
algolia_fees: $34,560 # 20% growth
engineering: $2,000
total: $36,560
year_3:
algolia_fees: $41,472
engineering: $2,000
total: $43,472
three_year_total: $113,832
savings_with_open_source: $44,172
plus_benefits:
- Full data control
- No vendor lock-in
- Air-gap capability
- Compliance-native
- Unlimited scaling
Strategic Value
Beyond Cost Savings:
- Competitive Advantage: "100% open-source AI development platform"
- Customer Trust: No proprietary lock-in for regulated industries
- Compliance: Air-gap deployments for FDA/HIPAA clients
- Control: Customize ranking, filtering, indexing for Coditect's needs
- Scaling: No per-query fees as usage grows
Conclusion
Primary Recommendation: Meilisearch (self-hosted) + Lunr.js (public docs)
Implementation Priority:
- Week 1-2: Deploy Lunr.js for coditect.ai/docs
- Week 3-6: Deploy Meilisearch for agent knowledge retrieval
- Week 7-10: Expand to multi-tenant customer portal
Success Metrics:
- Public docs search: >60% usage rate
- Agent knowledge retrieval: <100ms p95 latency
- Token savings: >30% reduction
- Infrastructure cost: <$500/month
- Customer satisfaction: "Search just works"
Open-Source Wins:
- ✅ No licensing fees
- ✅ Full source code access
- ✅ Community-driven development
- ✅ No vendor lock-in
- ✅ Air-gap deployment capability
- ✅ Compliance-friendly
- ✅ Unlimited customization
This architecture gives Coditect a production-grade, fully open-source search solution that scales from public documentation to enterprise multi-tenant deployments, all while maintaining the autonomy and control critical for a platform serving regulated industries.