Open-Source Search Solutions for Coditect: Production Implementation Guide

Executive Summary

Mandate: 100% open-source, self-hosted, no proprietary dependencies

Recommended Stack:

  1. Primary: Meilisearch (MIT License) - Production knowledge retrieval for agents
  2. Backup: Typesense (GPL v3) - Equivalent alternative
  3. Lightweight: Lunr.js (MIT) - Client-side documentation search
  4. Modern Static: Pagefind (MIT) - Zero-dependency static site search

Strategic Rationale: Meilisearch + Lunr.js dual deployment

  • Meilisearch: Agent knowledge retrieval (server-side, high-performance)
  • Lunr.js: Public documentation (client-side, zero infrastructure)

Open-Source Search Landscape

Comparison Matrix

| Solution | License | Deployment | Indexing | Language | Scalability | Use Case |
|---|---|---|---|---|---|---|
| Meilisearch | MIT | Self-hosted | Server-side | Rust | Millions of docs | Agent knowledge, APIs |
| Typesense | GPL v3 | Self-hosted | Server-side | C++ | Millions of docs | Agent knowledge, APIs |
| Lunr.js | MIT | Client-side | Build-time | JavaScript | ~10K docs | Public documentation |
| Pagefind | MIT | Client-side | Build-time | Rust | 100K+ docs | Static documentation |
| FlexSearch | Apache 2.0 | Client-side | Build-time | JavaScript | ~50K docs | Documentation |
| Stork | Apache 2.0 | Client-side | Build-time | Rust | ~100K docs | Documentation |
| Elasticsearch | Elastic License 2.0* | Self-hosted | Server-side | Java | Billions of docs | Enterprise search |

*Note: Elasticsearch changed from Apache 2.0 to Elastic License 2.0 (not OSI-approved) in 2021. OpenSearch (AWS fork) remains Apache 2.0.


coditect_search_architecture:

  public_documentation:
    solution: lunr.js  # or Pagefind
    deployment: static_bundle
    location: coditect.ai/docs
    cost: zero_infrastructure

  agent_knowledge_retrieval:
    solution: meilisearch
    deployment: self_hosted_gke
    location: internal_cluster
    cost: compute_only

  customer_portal:
    solution: meilisearch
    deployment: multi_tenant_instance
    isolation: tenant_id_filtering
    cost: shared_infrastructure

  compliance_documentation:
    solution: meilisearch
    deployment: air_gapped_instance
    location: customer_premises
    cost: customer_managed

Solution 1: Meilisearch (PRIMARY RECOMMENDATION)

Why Meilisearch for Coditect

Technical Fit:

  • Rust-based: Performance, memory safety, aligns with Coditect's Rust stack
  • MIT License: True open source, no licensing concerns
  • Typo tolerance: Critical for agent queries with technical terms
  • Fast indexing: Millions of documents, sub-second updates
  • Multi-tenancy: Built-in tenant filtering via filter parameter
  • Simple deployment: Single binary, minimal dependencies

Coditect-Specific Advantages:

// Meilisearch aligns with Coditect's Rust ecosystem
// Agent knowledge retrieval in native language

use meilisearch_sdk::client::Client;
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
struct CoditectKnowledgeDocument {
    id: String,
    document_type: String, // "adr", "pattern", "compliance"
    title: String,
    content: String,
    summary: String,
    agent_types: Vec<String>,
    compliance_frameworks: Vec<String>,
    token_estimate: u32,
    priority_score: f32,
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize Meilisearch client
    // (recent SDK versions return a Result here; adjust for your version)
    let client = Client::new("http://meilisearch:7700", Some("MASTER_KEY"));

    // Create index for Coditect knowledge base, then grab a handle to it
    client.create_index("coditect_knowledge", Some("id")).await?;
    let index = client.index("coditect_knowledge");

    // Configure searchable and filterable attributes
    index.set_searchable_attributes(&[
        "title",
        "summary",
        "content",
    ]).await?;

    index.set_filterable_attributes(&[
        "agent_types",
        "document_type",
        "compliance_frameworks",
    ]).await?;

    index.set_ranking_rules(&[
        "words",
        "typo",
        "priority_score:desc",
        "proximity",
        "attribute",
        "exactness",
    ]).await?;

    Ok(())
}

Production Deployment Architecture

meilisearch_production_deployment:

  infrastructure:
    platform: google_kubernetes_engine
    deployment: statefulset
    replicas: 3  # HA configuration

  storage:
    type: persistent_volume
    size: 100Gi  # Scales with knowledge base
    storage_class: ssd

  networking:
    service_type: ClusterIP
    internal_only: true  # Not exposed to internet
    tls: mutual_tls

  resources:
    requests:
      cpu: 1000m
      memory: 2Gi
    limits:
      cpu: 2000m
      memory: 4Gi

  backup:
    schedule: "0 */6 * * *"  # Every 6 hours
    retention: 7_days
    destination: gcs_bucket

  monitoring:
    prometheus: true
    grafana_dashboard: true
    alerts:
      - search_latency_p95 > 200ms
      - index_size > 80GB
      - memory_usage > 3.5Gi

Kubernetes Deployment

# meilisearch-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: meilisearch
  namespace: coditect-platform
spec:
  serviceName: meilisearch
  replicas: 3
  selector:
    matchLabels:
      app: meilisearch
  template:
    metadata:
      labels:
        app: meilisearch
    spec:
      containers:
        - name: meilisearch
          image: getmeili/meilisearch:v1.5
          ports:
            - containerPort: 7700
              name: http
          env:
            - name: MEILI_MASTER_KEY
              valueFrom:
                secretKeyRef:
                  name: meilisearch-secrets
                  key: master-key
            - name: MEILI_ENV
              value: "production"
            - name: MEILI_NO_ANALYTICS
              value: "true"  # Privacy-focused
            - name: MEILI_LOG_LEVEL
              value: "INFO"
          resources:
            requests:
              cpu: 1000m
              memory: 2Gi
            limits:
              cpu: 2000m
              memory: 4Gi
          volumeMounts:
            - name: data
              mountPath: /meili_data
          livenessProbe:
            httpGet:
              path: /health
              port: 7700
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 7700
            initialDelaySeconds: 5
            periodSeconds: 5
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: ssd
        resources:
          requests:
            storage: 100Gi
---
apiVersion: v1
kind: Service
metadata:
  name: meilisearch
  namespace: coditect-platform
spec:
  clusterIP: None  # Headless service
  selector:
    app: meilisearch
  ports:
    - port: 7700
      targetPort: 7700
      name: http

Agent Knowledge Retrieval Implementation

// src/knowledge/meilisearch_retrieval.rs

use meilisearch_sdk::client::Client;
use meilisearch_sdk::search::SearchResults;

pub struct CoditectKnowledgeRetriever {
    client: Client,
    index_name: String,
}

impl CoditectKnowledgeRetriever {
    pub fn new(host: &str, api_key: &str) -> Self {
        Self {
            client: Client::new(host, Some(api_key)),
            index_name: "coditect_knowledge".to_string(),
        }
    }

    pub async fn search_for_agent(
        &self,
        query: &str,
        agent_type: &str,
        compliance_frameworks: Vec<&str>,
        max_tokens: u32,
    ) -> Result<Vec<CoditectKnowledgeDocument>, Box<dyn std::error::Error>> {
        let index = self.client.index(&self.index_name);

        // Build filter string
        let mut filters = vec![
            format!("agent_types = {}", agent_type)
        ];

        if !compliance_frameworks.is_empty() {
            let compliance_filter = compliance_frameworks
                .iter()
                .map(|fw| format!("compliance_frameworks = {}", fw))
                .collect::<Vec<_>>()
                .join(" OR ");
            filters.push(format!("({})", compliance_filter));
        }

        let filter_str = filters.join(" AND ");

        // Execute search (query, filter, and limit must be attached
        // to the request that is executed)
        let results: SearchResults<CoditectKnowledgeDocument> = index
            .search()
            .with_query(query)
            .with_filter(&filter_str)
            .with_limit(50) // Get more, filter by token budget
            .execute()
            .await?;

        // Token-conscious selection
        let mut selected = Vec::new();
        let mut token_budget = max_tokens;

        for hit in results.hits {
            let doc = hit.result;
            if doc.token_estimate <= token_budget {
                token_budget -= doc.token_estimate;
                selected.push(doc);
            }

            if token_budget < 500 {
                break;
            }
        }

        Ok(selected)
    }

    pub async fn search_with_multi_tenancy(
        &self,
        query: &str,
        tenant_id: &str,
    ) -> Result<Vec<CoditectKnowledgeDocument>, Box<dyn std::error::Error>> {
        let index = self.client.index(&self.index_name);

        let results: SearchResults<CoditectKnowledgeDocument> = index
            .search()
            .with_query(query)
            .with_filter(&format!("tenant_id = {}", tenant_id))
            .execute()
            .await?;

        Ok(results.hits.into_iter().map(|h| h.result).collect())
    }
}

// Usage in agent orchestration
#[tokio::main]
async fn main() {
    let retriever = CoditectKnowledgeRetriever::new(
        "http://meilisearch:7700",
        &std::env::var("MEILI_API_KEY").unwrap(),
    );

    // Architect agent searching for design patterns
    let knowledge = retriever.search_for_agent(
        "event-driven architecture patterns",
        "architect",
        vec!["fda_21_cfr_11"],
        15000, // 15K token budget
    ).await.unwrap();

    println!("Retrieved {} knowledge documents", knowledge.len());
    for doc in knowledge {
        println!("- {} ({} tokens)", doc.title, doc.token_estimate);
    }
}
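The token-budget loop inside search_for_agent can be exercised without a running Meilisearch instance. Below is a standalone sketch of the same greedy selection, using the 500-token cutoff and the 1 token ≈ 4 characters heuristic that appear elsewhere in this guide (estimate_tokens and select_within_budget are illustrative names, not part of any SDK):

```rust
/// Rough token estimate used by the indexing pipeline: 1 token ≈ 4 characters.
fn estimate_tokens(content: &str) -> u32 {
    (content.len() / 4) as u32
}

/// Greedy, order-preserving selection under a token budget, mirroring the
/// retrieval loop above: skip documents that don't fit, and stop once fewer
/// than 500 tokens remain.
fn select_within_budget(docs: Vec<(String, u32)>, max_tokens: u32) -> Vec<String> {
    let mut selected = Vec::new();
    let mut budget = max_tokens;
    for (title, tokens) in docs {
        if tokens <= budget {
            budget -= tokens;
            selected.push(title);
        }
        if budget < 500 {
            break;
        }
    }
    selected
}
```

Because results arrive sorted by relevance, this keeps the most relevant documents that fit rather than the globally optimal packing; that trade-off is deliberate, since relevance order matters more than squeezing out every token.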

Indexing Pipeline (CI/CD Integration)

// scripts/index_knowledge_base/src/main.rs

use meilisearch_sdk::client::Client;
use std::path::Path;
use walkdir::WalkDir;
use anthropic_sdk::Client as AnthropicClient;

struct KnowledgeIndexer {
    meilisearch: Client,
    anthropic: AnthropicClient,
    documents: Vec<CoditectKnowledgeDocument>,
}

impl KnowledgeIndexer {
    pub async fn build_index(&mut self, repo_path: &Path) -> Result<(), Box<dyn std::error::Error>> {
        // 1. Extract ADRs
        self.extract_adrs(repo_path.join("docs/adr")).await?;

        // 2. Extract compliance rules
        self.extract_compliance_rules(repo_path.join("compliance")).await?;

        // 3. Extract code patterns
        self.extract_patterns(repo_path.join("src")).await?;

        // 4. Generate summaries with Claude
        self.generate_summaries().await?;

        // 5. Calculate token estimates
        self.calculate_token_estimates();

        // 6. Upload to Meilisearch
        self.upload_to_meilisearch().await?;

        Ok(())
    }

    async fn extract_adrs(&mut self, adr_path: std::path::PathBuf) -> Result<(), Box<dyn std::error::Error>> {
        for entry in WalkDir::new(adr_path) {
            let entry = entry?;
            if entry.path().extension().and_then(|s| s.to_str()) == Some("md") {
                let content = std::fs::read_to_string(entry.path())?;

                // Parse ADR structure
                let adr_data = self.parse_adr(&content)?;

                // Generate summary with Claude
                let summary = self.generate_summary_with_claude(
                    &content,
                    "adr",
                    "Coditect architecture decision"
                ).await?;

                self.documents.push(CoditectKnowledgeDocument {
                    id: format!("adr_{}", entry.path().file_stem().unwrap().to_str().unwrap()),
                    document_type: "adr".to_string(),
                    title: adr_data.title,
                    content,
                    summary,
                    agent_types: vec!["architect".to_string(), "orchestrator".to_string()],
                    compliance_frameworks: vec![],
                    token_estimate: 0, // Calculated later
                    priority_score: 1.0,
                });
            }
        }
        Ok(())
    }

    async fn generate_summary_with_claude(
        &self,
        content: &str,
        doc_type: &str,
        context: &str
    ) -> Result<String, Box<dyn std::error::Error>> {
        let prompt = format!(
            "Generate a concise 2-3 sentence summary of this {} for AI agent knowledge retrieval.\n\
             Context: {}\n\
             Requirements:\n\
             - Focus on key decisions and rationale\n\
             - Include relevant technical terms\n\
             - Keep under 100 tokens\n\n\
             Document:\n{}",
            doc_type,
            context,
            &content[..content.len().min(5000)]
        );

        let response = self.anthropic.messages()
            .create()
            .model("claude-sonnet-4-20250514")
            .max_tokens(500)
            .user_message(&prompt)
            .send()
            .await?;

        Ok(response.content[0].text.clone())
    }

    fn calculate_token_estimates(&mut self) {
        for doc in &mut self.documents {
            // Rough estimate: 1 token ≈ 4 characters
            doc.token_estimate = (doc.content.len() / 4) as u32;
        }
    }

    async fn upload_to_meilisearch(&self) -> Result<(), Box<dyn std::error::Error>> {
        let index = self.meilisearch.index("coditect_knowledge");

        // Configure index
        index.set_searchable_attributes(&[
            "title", "summary", "content"
        ]).await?;

        index.set_filterable_attributes(&[
            "agent_types", "document_type", "compliance_frameworks"
        ]).await?;

        index.set_ranking_rules(&[
            "words",
            "typo",
            "priority_score:desc",
            "proximity",
            "attribute",
            "exactness",
        ]).await?;

        // Batch upload
        const BATCH_SIZE: usize = 100;
        for chunk in self.documents.chunks(BATCH_SIZE) {
            index.add_documents(chunk, Some("id")).await?;
        }

        println!("Indexed {} documents", self.documents.len());
        Ok(())
    }
}
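The indexer calls a parse_adr helper that is not shown above. A minimal sketch of what it might do, assuming ADRs follow the common convention of opening with a level-1 markdown heading (AdrData and the fallback title are hypothetical; the pipeline version presumably returns a Result and extracts more metadata):

```rust
/// Minimal ADR metadata (hypothetical shape; extend with status, date, etc.).
struct AdrData {
    title: String,
}

/// Extract the first `# Title` heading as the ADR title, falling back to a
/// placeholder when none is present.
fn parse_adr(content: &str) -> AdrData {
    let title = content
        .lines()
        .find_map(|line| line.strip_prefix("# ").map(|t| t.trim().to_string()))
        .unwrap_or_else(|| "Untitled ADR".to_string());
    AdrData { title }
}
```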

# CI/CD integration
# .github/workflows/index-knowledge.yml
name: Index Knowledge Base

on:
  push:
    branches: [main]
    paths:
      - 'docs/adr/**'
      - 'compliance/**'
      - 'src/**/*.rs'

jobs:
  index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Setup Rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable

      - name: Build indexer
        run: |
          cd scripts/index_knowledge_base
          cargo build --release

      - name: Run indexing
        env:
          MEILI_HOST: ${{ secrets.MEILI_HOST }}
          MEILI_ADMIN_KEY: ${{ secrets.MEILI_ADMIN_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          ./scripts/index_knowledge_base/target/release/index_knowledge_base \
            --repo-path . \
            --environment production

      - name: Verify index
        run: |
          curl -H "Authorization: Bearer ${{ secrets.MEILI_ADMIN_KEY }}" \
            "${{ secrets.MEILI_HOST }}/indexes/coditect_knowledge/stats"

Multi-Tenant Implementation

// Multi-tenant knowledge base with tenant isolation

pub struct MultiTenantKnowledgeRetriever {
    retriever: CoditectKnowledgeRetriever,
}

impl MultiTenantKnowledgeRetriever {
    pub async fn search_for_tenant(
        &self,
        query: &str,
        tenant_id: &str,
        agent_type: &str,
    ) -> Result<Vec<CoditectKnowledgeDocument>, Box<dyn std::error::Error>> {
        let index = self.retriever.client.index(&self.retriever.index_name);

        // Strict tenant isolation
        let filter = format!(
            "tenant_id = {} AND agent_types = {}",
            tenant_id,
            agent_type
        );

        let results: SearchResults<CoditectKnowledgeDocument> = index
            .search()
            .with_query(query)
            .with_filter(&filter)
            .execute()
            .await?;

        Ok(results.hits.into_iter().map(|h| h.result).collect())
    }
}
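The tenant filter above is a plain Meilisearch filter expression. A small helper sketch that composes tenant, agent-type, and optional compliance constraints into one expression (build_tenant_filter is an illustrative name; values are assumed to be plain identifiers, so quote or validate user-supplied input in production):

```rust
/// Compose a Meilisearch-style filter expression combining strict tenant
/// isolation with agent-type and optional compliance-framework constraints.
/// Assumes values are plain identifiers (no spaces or quotes).
fn build_tenant_filter(tenant_id: &str, agent_type: &str, frameworks: &[&str]) -> String {
    let mut parts = vec![
        format!("tenant_id = {}", tenant_id),
        format!("agent_types = {}", agent_type),
    ];
    if !frameworks.is_empty() {
        // OR the frameworks together, then AND the group with the rest
        let fw = frameworks
            .iter()
            .map(|f| format!("compliance_frameworks = {}", f))
            .collect::<Vec<_>>()
            .join(" OR ");
        parts.push(format!("({})", fw));
    }
    parts.join(" AND ")
}
```

Centralizing filter construction like this makes it harder for a code path to forget the tenant_id clause, which is the single invariant that the multi-tenant design depends on.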

Solution 2: Lunr.js (PUBLIC DOCUMENTATION)

Why Lunr.js for Public Docs

Technical Fit:

  • Zero infrastructure: No server required
  • Privacy-first: No external services, no tracking
  • Offline-capable: Works in air-gapped environments
  • MIT license: True open source
  • Battle-tested: Used by Hugo, Jekyll, Gatsby

Limitations:

  • Not suitable for >10K documents
  • No real-time updates (build-time only)
  • Client-side index increases bundle size
  • Basic ranking algorithm

Docusaurus Integration

// docusaurus.config.js
module.exports = {
  plugins: [
    [
      require.resolve('@cmfcmf/docusaurus-search-local'),
      {
        indexDocs: true,
        indexBlog: true,
        indexPages: false,
        language: ['en'],
        style: undefined,
        lunr: {
          tokenizerSeparator: /[\s\-]+/,
          // BM25 ranking parameters
          b: 0.75,
          k1: 1.2,
        },
        // Coditect-specific customization
        indexDocSidebarParentCategories: 2,
        docsRouteBasePath: '/docs',
        blogRouteBasePath: '/blog',
      },
    ],
  ],
};

Custom Indexing for Technical Terms

// scripts/build-custom-index.js
const lunr = require('lunr');
const fs = require('fs');
const path = require('path');

// Custom pipeline function: don't stem technical terms
const technicalTermPreserver = function (token) {
  const technicalTerms = [
    'foundationdb',
    'wasm',
    'oauth',
    'kubernetes',
    'rust',
    'typescript',
  ];

  if (technicalTerms.includes(token.toString().toLowerCase())) {
    return token;
  }

  // Default stemming for other words
  return lunr.stemmer(token);
};

lunr.Pipeline.registerFunction(technicalTermPreserver, 'technicalTermPreserver');

// Load the documents to index (path is an assumption; point this at your
// build output)
const documents = JSON.parse(
  fs.readFileSync(path.join(__dirname, '../static/search-docs.json'), 'utf8')
);

// Build index with custom pipeline
const idx = lunr(function () {
  this.ref('id');
  this.field('title', { boost: 10 });
  this.field('content');

  // Swap the default stemmer for the technical-term-aware one
  this.pipeline.remove(lunr.stemmer);
  this.pipeline.add(technicalTermPreserver);

  // Add documents
  documents.forEach((doc) => {
    this.add(doc);
  });
});

// Save index
fs.writeFileSync(
  path.join(__dirname, '../static/search-index.json'),
  JSON.stringify(idx)
);

Solution 3: Pagefind (MODERN STATIC ALTERNATIVE)

Why Pagefind

Technical Advantages over Lunr.js:

  • Rust-based: 10x faster indexing, smaller bundle
  • Lazy loading: Only loads index chunks as needed
  • Better scaling: Handles 100K+ pages efficiently
  • MIT licensed: True open source
  • Zero JavaScript: Optional JS enhancement

Integration with Docusaurus

# Install Pagefind
npm install -D pagefind

# Then add to package.json:
{
  "scripts": {
    "build": "docusaurus build",
    "postbuild": "pagefind --source build --bundle-dir build/pagefind"
  }
}
// Custom SearchBar with Pagefind
// src/theme/SearchBar/index.tsx
import React, { useEffect, useRef, useState } from 'react';

export default function SearchBar() {
  const searchRef = useRef<HTMLDivElement>(null);
  const [pagefind, setPagefind] = useState<any>(null);

  useEffect(() => {
    // Lazy load Pagefind (bundled into build/pagefind by the postbuild step)
    const loadPagefind = async () => {
      const pf = await import('/pagefind/pagefind.js');
      await pf.options({
        baseUrl: '/',
      });
      setPagefind(pf);
    };

    loadPagefind();
  }, []);

  const handleSearch = async (query: string) => {
    if (!pagefind || !query) return;

    const results = await pagefind.search(query);
    // Render results
  };

  return (
    <div ref={searchRef}>
      <input
        type="search"
        placeholder="Search docs..."
        onChange={(e) => handleSearch(e.target.value)}
      />
    </div>
  );
}

Solution 4: Typesense (MEILISEARCH ALTERNATIVE)

Why Consider Typesense

Comparison to Meilisearch:

| Feature | Meilisearch | Typesense |
|---|---|---|
| License | MIT | GPL v3 |
| Language | Rust | C++ |
| Performance | Excellent | Excellent |
| Typo tolerance | Yes | Yes |
| Cloud option | Yes (proprietary) | Yes (proprietary) |
| Multi-tenancy | Filter-based | Collection-based |

When to choose Typesense over Meilisearch:

  • Already using C++ ecosystem
  • Need collection-level isolation (vs filter-based)
  • Prefer GPL v3 licensing

Deployment (Nearly Identical to Meilisearch)

# typesense-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: typesense
spec:
  serviceName: typesense
  replicas: 3
  selector:
    matchLabels:
      app: typesense
  template:
    metadata:
      labels:
        app: typesense
    spec:
      containers:
        - name: typesense
          image: typesense/typesense:0.25.2
          args:
            - "--data-dir=/data"
            - "--api-key=$(TYPESENSE_API_KEY)"
            - "--enable-cors"
          env:
            - name: TYPESENSE_API_KEY
              valueFrom:
                secretKeyRef:
                  name: typesense-secrets
                  key: api-key
Recommendation: Stick with Meilisearch unless GPL v3 is specifically required. Meilisearch's MIT license is more permissive for commercial use.


Cost Analysis: Open-Source Self-Hosted

Infrastructure Costs (GKE)

meilisearch_production_costs:

  compute:
    instance_type: e2-standard-4  # 4 vCPU, 16GB RAM
    instances: 3
    cost_per_instance_month: $122
    total_compute_month: $366

  storage:
    persistent_volumes: 3
    size_per_volume: 100Gi
    storage_class: ssd
    cost_per_gb_month: $0.17
    total_storage_month: $51

  networking:
    egress_gb_month: 100
    cost_per_gb: $0.12
    total_networking_month: $12

  backup:
    gcs_storage_gb: 300  # 3x100GB snapshots
    cost_per_gb_month: $0.02
    total_backup_month: $6

  total_monthly_cost: $435

  cost_per_search_query: $0.0000145  # Assuming 30M queries/month

  vs_algolia_equivalent:
    algolia_cost_30m_queries: $2400/month
    savings: $1965/month
    roi: 452%  # savings / self-hosted cost
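The per-query and savings figures follow directly from the line items above; a quick arithmetic sketch (constants copied from the cost model, function names illustrative):

```rust
/// Monthly line items from the cost model above, in USD.
const COMPUTE: f64 = 366.0;
const STORAGE: f64 = 51.0;
const NETWORKING: f64 = 12.0;
const BACKUP: f64 = 6.0;

/// Total self-hosted monthly cost.
fn monthly_total() -> f64 {
    COMPUTE + STORAGE + NETWORKING + BACKUP
}

/// Cost per query at a given monthly query volume.
fn cost_per_query(queries_per_month: f64) -> f64 {
    monthly_total() / queries_per_month
}

/// Monthly savings versus a proprietary alternative's monthly fee.
fn monthly_savings(alternative_monthly: f64) -> f64 {
    alternative_monthly - monthly_total()
}
```

At 30M queries/month this yields $435 total, roughly $0.0000145 per query, and $1,965/month saved against a $2,400 Algolia bill.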

Comparison to Proprietary Alternatives

| Solution | Monthly Cost (30M queries) | Licensing | Data Control |
|---|---|---|---|
| Meilisearch (self-hosted) | $435 | Free (MIT) | Full |
| Typesense (self-hosted) | $435 | Free (GPL v3) | Full |
| Algolia | $2,400 | Pay-per-use | None |
| Orama Cloud | $1,200 | Pay-per-use | Limited |
| Elasticsearch Cloud | $1,800 | Pay-per-use | Limited |

ROI for Coditect:

  • Infrastructure cost: $435/month
  • Engineering time: 2 weeks initial setup, 4 hours/month maintenance
  • Total annual cost: ~$5,220 infrastructure + ~$15K engineering = ~$20K
  • Algolia equivalent: ~$29K annually
  • Savings: ~$9K/year
  • Plus: Full data control, air-gap capability, no vendor lock-in

Implementation Roadmap for Coditect

Phase 1: Public Documentation (Week 1-2)

Goal: Launch coditect.ai/docs with search

# Day 1-2: Setup Docusaurus
npx create-docusaurus@latest coditect-docs classic
cd coditect-docs

# Day 3-4: Add Lunr.js search
npm install @cmfcmf/docusaurus-search-local

# Configure in docusaurus.config.js
# (see config above)

# Day 5-7: Write initial documentation
# - Getting Started
# - Architecture Overview
# - Agent Types
# - API Reference

# Day 8-10: Deploy to Vercel/Cloudflare Pages
npm run build
# Deploy build/ directory

# Search works immediately, zero infrastructure!

Success Criteria:

  • ✅ Documentation site live
  • ✅ Search functional
  • ✅ <2s page load time
  • ✅ Zero infrastructure cost

Phase 2: Agent Knowledge Retrieval (Week 3-6)

Goal: Meilisearch for autonomous agent knowledge

// Week 3: Deploy Meilisearch to GKE
// kubectl apply -f meilisearch-statefulset.yaml

// Week 4: Build indexing pipeline
// - Extract ADRs, patterns, compliance rules
// - Generate Claude summaries
// - Calculate token estimates
// - Upload to Meilisearch

// Week 5: Integrate with agents
// - Implement KnowledgeRetriever
// - Wire into Architect agent
// - Add token-conscious filtering
// - Measure performance

// Week 6: Expand to all agent types
// - Orchestrator knowledge retrieval
// - Implementer pattern lookup
// - Reviewer compliance checks

Success Criteria:

  • ✅ Meilisearch deployed and stable
  • ✅ Knowledge base indexed (1000+ documents)
  • ✅ Agent retrieval latency <100ms p95
  • ✅ Token reduction >30%

Phase 3: Multi-Tenant Customer Portal (Week 7-10)

Goal: Tenant-isolated knowledge base

// Week 7-8: Multi-tenant architecture
// - Implement tenant_id filtering
// - Build tenant data isolation
// - Add customer-specific documentation

// Week 9: Customer portal UI
// - Docusaurus with custom SearchBar
// - Meilisearch backend
// - Tenant authentication

// Week 10: Production hardening
// - Load testing
// - Security audit
// - Monitoring dashboards

Success Criteria:

  • ✅ 10+ customers with isolated knowledge bases
  • ✅ Zero cross-tenant data leakage
  • ✅ <200ms search latency
  • ✅ 99.9% uptime

Monitoring & Operations

Prometheus Metrics

# meilisearch-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: meilisearch
spec:
  selector:
    matchLabels:
      app: meilisearch
  endpoints:
    - port: http
      path: /metrics

// Custom metrics for Coditect
use lazy_static::lazy_static;
use prometheus::{Counter, Histogram, HistogramOpts};

lazy_static! {
    static ref SEARCH_QUERIES: Counter = Counter::new(
        "coditect_search_queries_total",
        "Total search queries"
    ).unwrap();

    static ref SEARCH_LATENCY: Histogram = Histogram::with_opts(
        HistogramOpts::new(
            "coditect_search_latency_seconds",
            "Search query latency"
        )
    ).unwrap();

    static ref TOKEN_BUDGET_USAGE: Histogram = Histogram::with_opts(
        HistogramOpts::new(
            "coditect_token_budget_usage",
            "Token budget usage per search"
        )
    ).unwrap();
}

Grafana Dashboard

{
  "dashboard": {
    "title": "Coditect Knowledge Retrieval",
    "panels": [
      {
        "title": "Search QPS",
        "targets": [{
          "expr": "rate(coditect_search_queries_total[5m])"
        }]
      },
      {
        "title": "P95 Latency",
        "targets": [{
          "expr": "histogram_quantile(0.95, rate(coditect_search_latency_seconds_bucket[5m]))"
        }]
      },
      {
        "title": "Token Budget Efficiency",
        "targets": [{
          "expr": "coditect_token_budget_usage_sum / coditect_token_budget_usage_count"
        }]
      }
    ]
  }
}

Air-Gapped Deployment (Healthcare Clients)

Offline Meilisearch Setup

# On internet-connected machine:
# 1. Download Meilisearch binary
wget https://github.com/meilisearch/meilisearch/releases/download/v1.5.0/meilisearch-linux-amd64

# 2. Build knowledge index
cargo run --release --bin index_knowledge_base -- \
  --repo-path /path/to/coditect \
  --output knowledge_index.meilisearch

# 3. Package for air-gapped deployment
tar czf coditect-knowledge-airgap.tar.gz \
  meilisearch-linux-amd64 \
  knowledge_index.meilisearch \
  deploy.sh

# On air-gapped machine:
tar xzf coditect-knowledge-airgap.tar.gz
./deploy.sh --data-dir /opt/meilisearch/data

Air-Gapped Update Process

#!/bin/bash
# scripts/airgap-update.sh
set -e

# 1. Export updated index from staging
./meilisearch-exporter \
  --host http://staging-meilisearch:7700 \
  --index coditect_knowledge \
  --output knowledge_index_v2.meilisearch

# 2. Package with version
VERSION=$(date +%Y%m%d-%H%M%S)
tar czf coditect-knowledge-update-${VERSION}.tar.gz \
  knowledge_index_v2.meilisearch \
  update.sh

# 3. Transfer via approved method (USB, secure file transfer)
# Customer applies update on air-gapped network

# 4. On air-gapped machine:
./update.sh --backup-existing

Compliance Considerations

Data Residency

meilisearch_compliance:

  fda_21_cfr_part_11:
    audit_trail: enabled
    implementation: meilisearch_logs + external_audit_system
    retention: 7_years

  hipaa:
    encryption_at_rest: enabled  # Persistent volume encryption
    encryption_in_transit: mutual_tls
    access_control: rbac
    phi_handling: no_phi_in_search_index

  soc2:
    monitoring: prometheus_metrics
    alerting: grafana_alerts
    backup: automated_every_6_hours
    disaster_recovery: multi_az_deployment

Search Query Logging

// Compliance-aware search logging
pub async fn search_with_audit_log(
query: &str,
user_id: &str,
tenant_id: &str,
) -> Result<Vec<Document>, Error> {
// Log search query for compliance
audit_log::record(AuditEvent {
event_type: "search_query",
user_id: user_id.to_string(),
tenant_id: tenant_id.to_string(),
query: query.to_string(),
timestamp: Utc::now(),
ip_address: request_ip,
});

// Execute search
let results = search_internal(query, tenant_id).await?;

// Log results count (not content)
audit_log::record(AuditEvent {
event_type: "search_results",
user_id: user_id.to_string(),
results_count: results.len(),
timestamp: Utc::now(),
});

Ok(results)
}
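The audit_log::record calls above assume an append-only sink. A minimal sketch of rendering one search event as a JSON log line without pulling in serde (SearchAuditEvent and to_json_line are illustrative names; field values are assumed pre-escaped, so sanitize them in production):

```rust
/// Minimal audit record for search queries (a sketch; the real event in the
/// pipeline above carries more fields and a proper timestamp type).
struct SearchAuditEvent {
    event_type: String,
    user_id: String,
    tenant_id: String,
    query: String,
}

impl SearchAuditEvent {
    /// Render as a single JSON line suitable for an append-only audit log.
    fn to_json_line(&self) -> String {
        format!(
            r#"{{"event_type":"{}","user_id":"{}","tenant_id":"{}","query":"{}"}}"#,
            self.event_type, self.user_id, self.tenant_id, self.query
        )
    }
}
```

One-line-per-event output keeps the log greppable and trivially shippable to an external audit system, which matters for the 7-year retention requirement noted above.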

Migration Path from Proprietary to Open-Source

If Currently Using Algolia

# scripts/migrate_from_algolia.py

import os
from algoliasearch.search_client import SearchClient as AlgoliaClient
from meilisearch import Client as MeilisearchClient

def migrate_algolia_to_meilisearch():
    # Connect to both
    algolia = AlgoliaClient.create(
        os.environ['ALGOLIA_APP_ID'],
        os.environ['ALGOLIA_ADMIN_KEY']
    )

    meilisearch = MeilisearchClient(
        os.environ['MEILI_HOST'],
        os.environ['MEILI_ADMIN_KEY']
    )

    # Export from Algolia
    algolia_index = algolia.init_index('coditect_docs')

    documents = []
    for hit in algolia_index.browse_objects():
        documents.append(hit)

    # Transform for Meilisearch
    meili_docs = transform_documents(documents)

    # Import to Meilisearch
    meili_index = meilisearch.index('coditect_knowledge')
    meili_index.add_documents(meili_docs)

    print(f"Migrated {len(documents)} documents")

def transform_documents(algolia_docs):
    """Transform Algolia schema to Meilisearch."""
    return [
        {
            'id': doc['objectID'],
            'title': doc.get('title'),
            'content': doc.get('content'),
            # Map other fields
        }
        for doc in algolia_docs
    ]

Final Recommendation for Coditect

Dual Architecture

recommended_implementation:

  public_documentation:
    solution: lunr.js
    reasoning:
      - Zero infrastructure cost
      - Perfect for <10K docs
      - Privacy-first (no external calls)
      - Works offline
    deployment: docusaurus + lunr plugin
    timeline: 1-2 weeks

  agent_knowledge_retrieval:
    solution: meilisearch_self_hosted
    reasoning:
      - Rust-based (matches Coditect stack)
      - MIT license (true open source)
      - Excellent performance
      - Multi-tenant capable
      - Air-gap compatible
    deployment: kubernetes_statefulset
    timeline: 4-6 weeks

  customer_portal:
    solution: meilisearch_multi_tenant
    reasoning:
      - Same infrastructure as agent knowledge
      - Tenant isolation built-in
      - Customer-specific documentation
    deployment: shared_meilisearch_cluster
    timeline: 2-3 weeks

Total Cost of Ownership (3 Years)

open_source_tco:
  year_1:
    infrastructure: $5,220
    engineering: $30,000  # Initial setup + 3 months learning
    total: $35,220

  year_2:
    infrastructure: $5,220
    engineering: $12,000  # 1 hour/week maintenance
    total: $17,220

  year_3:
    infrastructure: $5,220
    engineering: $12,000
    total: $17,220

  three_year_total: $69,660

proprietary_alternative_tco:
  year_1:
    algolia_fees: $28,800
    engineering: $5,000  # Easier setup
    total: $33,800

  year_2:
    algolia_fees: $34,560  # 20% growth
    engineering: $2,000
    total: $36,560

  year_3:
    algolia_fees: $41,472
    engineering: $2,000
    total: $43,472

  three_year_total: $113,832

savings_with_open_source: $44,172
plus_benefits:
  - Full data control
  - No vendor lock-in
  - Air-gap capability
  - Compliance-native
  - Unlimited scaling

Strategic Value

Beyond Cost Savings:

  1. Competitive Advantage: "100% open-source AI development platform"
  2. Customer Trust: No proprietary lock-in for regulated industries
  3. Compliance: Air-gap deployments for FDA/HIPAA clients
  4. Control: Customize ranking, filtering, indexing for Coditect's needs
  5. Scaling: No per-query fees as usage grows

Conclusion

Primary Recommendation: Meilisearch (self-hosted) + Lunr.js (public docs)

Implementation Priority:

  1. Week 1-2: Deploy Lunr.js for coditect.ai/docs
  2. Week 3-6: Deploy Meilisearch for agent knowledge retrieval
  3. Week 7-10: Expand to multi-tenant customer portal

Success Metrics:

  • Public docs search: >60% usage rate
  • Agent knowledge retrieval: <100ms p95 latency
  • Token savings: >30% reduction
  • Infrastructure cost: <$500/month
  • Customer satisfaction: "Search just works"

Open-Source Wins:

  • ✅ No licensing fees
  • ✅ Full source code access
  • ✅ Community-driven development
  • ✅ No vendor lock-in
  • ✅ Air-gap deployment capability
  • ✅ Compliance-friendly
  • ✅ Unlimited customization

This architecture gives Coditect a production-grade, fully open-source search solution that scales from public documentation to enterprise multi-tenant deployments, all while maintaining the autonomy and control critical for a platform serving regulated industries.