Open-Source Search Solutions for Coditect: Production Implementation Guide
Executive Summary
Mandate: 100% open-source, self-hosted, no proprietary dependencies
Recommended Stack:
- Primary: Meilisearch (MIT License) - Production knowledge retrieval for agents
- Backup: Typesense (GPL v3) - Equivalent alternative
- Lightweight: Lunr.js (MIT License) - Client-side documentation search
- Modern Static: Pagefind (MIT) - Zero-dependency static site search
Strategic Rationale: Meilisearch + Lunr.js dual deployment
- Meilisearch: Agent knowledge retrieval (server-side, high-performance)
- Lunr.js: Public documentation (client-side, zero infrastructure)
Open-Source Search Landscape
Comparison Matrix
| Solution | License | Deployment | Indexing | Language | Scalability | Use Case |
|---|---|---|---|---|---|---|
| Meilisearch | MIT | Self-hosted | Server-side | Rust | Millions of docs | Agent knowledge, APIs |
| Typesense | GPL v3 | Self-hosted | Server-side | C++ | Millions of docs | Agent knowledge, APIs |
| Lunr.js | MIT | Client-side | Build-time | JavaScript | ~10K docs | Public documentation |
| Pagefind | MIT | Client-side | Build-time | Rust | 100K+ docs | Static documentation |
| FlexSearch | Apache 2.0 | Client-side | Build-time | JavaScript | ~50K docs | Documentation |
| Stork | Apache 2.0 | Client-side | Build-time | Rust | ~100K docs | Documentation |
| Elasticsearch | Elastic License 2.0* | Self-hosted | Server-side | Java | Billions of docs | Enterprise search |
*Note: Elasticsearch changed from Apache 2.0 to Elastic License 2.0 (not OSI-approved) in 2021. OpenSearch (AWS fork) remains Apache 2.0.
Recommended Architecture for Coditect
coditect_search_architecture:
public_documentation:
solution: lunr.js # or Pagefind
deployment: static_bundle
location: coditect.ai/docs
cost: zero_infrastructure
agent_knowledge_retrieval:
solution: meilisearch
deployment: self_hosted_gke
location: internal_cluster
cost: compute_only
customer_portal:
solution: meilisearch
deployment: multi_tenant_instance
isolation: tenant_id_filtering
cost: shared_infrastructure
compliance_documentation:
solution: meilisearch
deployment: air_gapped_instance
location: customer_premises
cost: customer_managed
Solution 1: Meilisearch (PRIMARY RECOMMENDATION)
Why Meilisearch for Coditect
Technical Fit:
- Rust-based: Performance, memory safety, aligns with Coditect's Rust stack
- MIT License: True open source, no licensing concerns
- Typo tolerance: Critical for agent queries with technical terms
- Fast indexing: Millions of documents, sub-second updates
- Multi-tenancy: Built-in tenant filtering via the `filter` parameter
- Simple deployment: Single binary, minimal dependencies
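Filter expressions are plain strings of boolean clauses over the filterable attributes. A minimal sketch of composing one in Rust — the helper name is ours for illustration, not part of the SDK:

```rust
// Hypothetical helper: compose a Meilisearch filter expression.
// Quoting values keeps multi-word terms intact.
fn build_agent_filter(agent_type: &str, frameworks: &[&str]) -> String {
    let mut clauses = vec![format!("agent_types = \"{}\"", agent_type)];
    if !frameworks.is_empty() {
        let fw = frameworks
            .iter()
            .map(|f| format!("compliance_frameworks = \"{}\"", f))
            .collect::<Vec<_>>()
            .join(" OR ");
        clauses.push(format!("({})", fw));
    }
    clauses.join(" AND ")
}

fn main() {
    let filter = build_agent_filter("architect", &["fda_21_cfr_11", "hipaa"]);
    println!("{filter}");
    // agent_types = "architect" AND (compliance_frameworks = "fda_21_cfr_11" OR compliance_frameworks = "hipaa")
}
```

The resulting string is what gets passed to the search request's `filter` parameter.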
Coditect-Specific Advantages:
// Meilisearch aligns with Coditect's Rust ecosystem
// Agent knowledge retrieval in native language
use meilisearch_sdk::client::Client;
use serde::{Deserialize, Serialize};
#[derive(Serialize, Deserialize, Debug)]
struct CoditectKnowledgeDocument {
id: String,
document_type: String, // "adr", "pattern", "compliance"
title: String,
content: String,
summary: String,
agent_types: Vec<String>,
compliance_frameworks: Vec<String>,
token_estimate: u32,
priority_score: f32,
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Initialize Meilisearch client
    let client = Client::new(
        "http://meilisearch:7700",
        Some("MASTER_KEY")
    );
    // Create the index, then take a handle to it.
    // (In recent SDK versions `create_index` enqueues an async task
    // rather than returning the index directly.)
    client
        .create_index("coditect_knowledge", Some("id"))
        .await?;
    let index = client.index("coditect_knowledge");
// Configure searchable and filterable attributes
index.set_searchable_attributes(&[
"title",
"summary",
"content",
]).await?;
index.set_filterable_attributes(&[
"agent_types",
"document_type",
"compliance_frameworks",
]).await?;
index.set_ranking_rules(&[
"words",
"typo",
"priority_score:desc",
"proximity",
"attribute",
"exactness",
]).await?;
Ok(())
}
Production Deployment Architecture
meilisearch_production_deployment:
infrastructure:
platform: google_kubernetes_engine
deployment: statefulset
replicas: 3 # note: Meilisearch has no native replication; each replica must be indexed independently
storage:
type: persistent_volume
size: 100Gi # Scales with knowledge base
storage_class: ssd
networking:
service_type: ClusterIP
internal_only: true # Not exposed to internet
tls: mutual_tls
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
backup:
schedule: "0 */6 * * *" # Every 6 hours
retention: 7_days
destination: gcs_bucket
monitoring:
prometheus: true
grafana_dashboard: true
alerts:
- search_latency_p95 > 200ms
- index_size > 80GB
- memory_usage > 3.5Gi
Kubernetes Deployment
# meilisearch-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: meilisearch
namespace: coditect-platform
spec:
serviceName: meilisearch
replicas: 3 # each instance indexed independently (no native replication)
selector:
matchLabels:
app: meilisearch
template:
metadata:
labels:
app: meilisearch
spec:
containers:
- name: meilisearch
image: getmeili/meilisearch:v1.5
ports:
- containerPort: 7700
name: http
env:
- name: MEILI_MASTER_KEY
valueFrom:
secretKeyRef:
name: meilisearch-secrets
key: master-key
- name: MEILI_ENV
value: "production"
- name: MEILI_NO_ANALYTICS
value: "true" # Privacy-focused
- name: MEILI_LOG_LEVEL
value: "INFO"
resources:
requests:
cpu: 1000m
memory: 2Gi
limits:
cpu: 2000m
memory: 4Gi
volumeMounts:
- name: data
mountPath: /meili_data
livenessProbe:
httpGet:
path: /health
port: 7700
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 7700
initialDelaySeconds: 5
periodSeconds: 5
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: ssd
resources:
requests:
storage: 100Gi
---
apiVersion: v1
kind: Service
metadata:
name: meilisearch
namespace: coditect-platform
spec:
clusterIP: None # Headless service
selector:
app: meilisearch
ports:
- port: 7700
targetPort: 7700
name: http
Agent Knowledge Retrieval Implementation
// src/knowledge/meilisearch_retrieval.rs
use meilisearch_sdk::{client::Client, search::{SearchQuery, SearchResults}};
use std::collections::HashMap;
pub struct CoditectKnowledgeRetriever {
client: Client,
index_name: String,
}
impl CoditectKnowledgeRetriever {
pub fn new(host: &str, api_key: &str) -> Self {
let client = Client::new(host, Some(api_key));
Self {
client,
index_name: "coditect_knowledge".to_string(),
}
}
pub async fn search_for_agent(
&self,
query: &str,
agent_type: &str,
compliance_frameworks: Vec<&str>,
max_tokens: u32,
) -> Result<Vec<CoditectKnowledgeDocument>, Box<dyn std::error::Error>> {
let index = self.client.index(&self.index_name);
        // Build filter string (values quoted so multi-word terms stay intact)
        let mut filters = vec![
            format!("agent_types = \"{}\"", agent_type)
        ];
        if !compliance_frameworks.is_empty() {
            let compliance_filter = compliance_frameworks
                .iter()
                .map(|fw| format!("compliance_frameworks = \"{}\"", fw))
                .collect::<Vec<_>>()
                .join(" OR ");
            filters.push(format!("({})", compliance_filter));
        }
        let filter_str = filters.join(" AND ");
        // Execute search (fetch extra hits, then trim to the token budget below)
        let results: SearchResults<CoditectKnowledgeDocument> = index
            .search()
            .with_query(query)
            .with_filter(&filter_str)
            .with_limit(50)
            .execute()
            .await?;
// Token-conscious selection
let mut selected = Vec::new();
let mut token_budget = max_tokens;
for hit in results.hits {
let doc = hit.result;
if doc.token_estimate <= token_budget {
token_budget -= doc.token_estimate;
selected.push(doc);
}
if token_budget < 500 {
break;
}
}
Ok(selected)
}
pub async fn search_with_multi_tenancy(
&self,
query: &str,
tenant_id: &str,
) -> Result<Vec<CoditectKnowledgeDocument>, Box<dyn std::error::Error>> {
let index = self.client.index(&self.index_name);
        let results = index
            .search()
            .with_query(query)
            .with_filter(&format!("tenant_id = \"{}\"", tenant_id))
            .execute::<CoditectKnowledgeDocument>()
            .await?;
Ok(results.hits.into_iter().map(|h| h.result).collect())
}
}
// Usage in agent orchestration
#[tokio::main]
async fn main() {
let retriever = CoditectKnowledgeRetriever::new(
"http://meilisearch:7700",
&std::env::var("MEILI_API_KEY").unwrap()
);
// Architect agent searching for design patterns
let knowledge = retriever.search_for_agent(
"event-driven architecture patterns",
"architect",
vec!["fda_21_cfr_11"],
15000 // 15K token budget
).await.unwrap();
println!("Retrieved {} knowledge documents", knowledge.len());
for doc in knowledge {
println!("- {} ({} tokens)", doc.title, doc.token_estimate);
}
}
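The token-budget selection inside `search_for_agent` reduces to a small pure function, sketched here so it can be unit-tested in isolation. The `Doc` struct and thresholds are illustrative stand-ins for `CoditectKnowledgeDocument`:

```rust
// Illustrative stand-in for CoditectKnowledgeDocument
#[derive(Debug, Clone)]
struct Doc {
    title: String,
    token_estimate: u32,
}

/// Greedily keep relevance-ordered hits that fit the remaining budget,
/// stopping once fewer than `min_useful` tokens remain.
fn select_within_budget(hits: Vec<Doc>, max_tokens: u32, min_useful: u32) -> Vec<Doc> {
    let mut budget = max_tokens;
    let mut selected = Vec::new();
    for doc in hits {
        if doc.token_estimate <= budget {
            budget -= doc.token_estimate;
            selected.push(doc);
        }
        if budget < min_useful {
            break;
        }
    }
    selected
}

fn main() {
    let hits = vec![
        Doc { title: "ADR-012".into(), token_estimate: 8_000 },
        Doc { title: "ADR-007".into(), token_estimate: 9_000 }, // over remaining budget, skipped
        Doc { title: "pattern-cqrs".into(), token_estimate: 4_000 },
    ];
    let picked = select_within_budget(hits, 15_000, 500);
    assert_eq!(picked.len(), 2); // ADR-012 and pattern-cqrs fit within 15K
}
```

Because hits arrive in relevance order, the greedy pass keeps the most relevant documents that fit — an oversized but highly relevant hit is skipped rather than blowing the budget.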
Indexing Pipeline (CI/CD Integration)
// scripts/index_knowledge_base/src/main.rs
use meilisearch_sdk::Client;
use std::path::Path;
use walkdir::WalkDir;
use anthropic_sdk::Client as AnthropicClient; // illustrative: assumes a community Anthropic Rust client crate
struct KnowledgeIndexer {
meilisearch: Client,
anthropic: AnthropicClient,
documents: Vec<CoditectKnowledgeDocument>,
}
impl KnowledgeIndexer {
pub async fn build_index(&mut self, repo_path: &Path) -> Result<(), Box<dyn std::error::Error>> {
// 1. Extract ADRs
self.extract_adrs(repo_path.join("docs/adr")).await?;
// 2. Extract compliance rules
self.extract_compliance_rules(repo_path.join("compliance")).await?;
// 3. Extract code patterns
self.extract_patterns(repo_path.join("src")).await?;
// 4. Generate summaries with Claude
self.generate_summaries().await?;
// 5. Calculate token estimates
self.calculate_token_estimates();
// 6. Upload to Meilisearch
self.upload_to_meilisearch().await?;
Ok(())
}
async fn extract_adrs(&mut self, adr_path: std::path::PathBuf) -> Result<(), Box<dyn std::error::Error>> {
for entry in WalkDir::new(adr_path) {
let entry = entry?;
if entry.path().extension().and_then(|s| s.to_str()) == Some("md") {
let content = std::fs::read_to_string(entry.path())?;
// Parse ADR structure
let adr_data = self.parse_adr(&content)?;
// Generate summary with Claude
let summary = self.generate_summary_with_claude(
&content,
"adr",
"Coditect architecture decision"
).await?;
self.documents.push(CoditectKnowledgeDocument {
id: format!("adr_{}", entry.path().file_stem().unwrap().to_str().unwrap()),
document_type: "adr".to_string(),
title: adr_data.title,
content,
summary,
agent_types: vec!["architect".to_string(), "orchestrator".to_string()],
compliance_frameworks: vec![],
token_estimate: 0, // Calculated later
priority_score: 1.0,
});
}
}
Ok(())
}
async fn generate_summary_with_claude(
&self,
content: &str,
doc_type: &str,
context: &str
) -> Result<String, Box<dyn std::error::Error>> {
let prompt = format!(
"Generate a concise 2-3 sentence summary of this {} for AI agent knowledge retrieval.\n\
Context: {}\n\
Requirements:\n\
- Focus on key decisions and rationale\n\
- Include relevant technical terms\n\
- Keep under 100 tokens\n\n\
Document:\n{}",
doc_type,
context,
            // Take chars, not bytes, to avoid panicking mid-UTF-8 sequence
            content.chars().take(5000).collect::<String>()
);
let response = self.anthropic.messages()
.create()
.model("claude-sonnet-4-20250514")
.max_tokens(500)
.user_message(&prompt)
.send()
.await?;
Ok(response.content[0].text.clone())
}
fn calculate_token_estimates(&mut self) {
for doc in &mut self.documents {
// Rough estimate: 1 token ≈ 4 characters
doc.token_estimate = (doc.content.len() / 4) as u32;
}
}
async fn upload_to_meilisearch(&self) -> Result<(), Box<dyn std::error::Error>> {
let index = self.meilisearch.index("coditect_knowledge");
// Configure index
index.set_searchable_attributes(&[
"title", "summary", "content"
]).await?;
index.set_filterable_attributes(&[
"agent_types", "document_type", "compliance_frameworks"
]).await?;
index.set_ranking_rules(&[
"words",
"typo",
"priority_score:desc",
"proximity",
"attribute",
"exactness",
]).await?;
// Batch upload
const BATCH_SIZE: usize = 100;
for chunk in self.documents.chunks(BATCH_SIZE) {
index.add_documents(chunk, Some("id")).await?;
}
println!("Indexed {} documents", self.documents.len());
Ok(())
}
}
# .github/workflows/index-knowledge.yml
name: Index Knowledge Base
on:
push:
branches: [main]
paths:
- 'docs/adr/**'
- 'compliance/**'
- 'src/**/*.rs'
jobs:
index:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Rust
uses: actions-rs/toolchain@v1
with:
toolchain: stable
- name: Build indexer
run: |
cd scripts/index_knowledge_base
cargo build --release
- name: Run indexing
env:
MEILI_HOST: ${{ secrets.MEILI_HOST }}
MEILI_ADMIN_KEY: ${{ secrets.MEILI_ADMIN_KEY }}
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
run: |
./scripts/index_knowledge_base/target/release/index_knowledge_base \
--repo-path . \
--environment production
- name: Verify index
run: |
curl -H "Authorization: Bearer ${{ secrets.MEILI_ADMIN_KEY }}" \
"${{ secrets.MEILI_HOST }}/indexes/coditect_knowledge/stats"
Multi-Tenant Implementation
// Multi-tenant knowledge base with tenant isolation
pub struct MultiTenantKnowledgeRetriever {
retriever: CoditectKnowledgeRetriever,
}
impl MultiTenantKnowledgeRetriever {
pub async fn search_for_tenant(
&self,
query: &str,
tenant_id: &str,
agent_type: &str,
) -> Result<Vec<CoditectKnowledgeDocument>, Box<dyn std::error::Error>> {
        // Note: assumes CoditectKnowledgeRetriever exposes its client and
        // index name to this module (e.g. via pub(crate) fields)
        let index = self.retriever.client.index(&self.retriever.index_name);
        // Strict tenant isolation: both clauses must match
        let filter = format!(
            "tenant_id = \"{}\" AND agent_types = \"{}\"",
            tenant_id,
            agent_type
        );
        let results = index
            .search()
            .with_query(query)
            .with_filter(&filter)
            .execute::<CoditectKnowledgeDocument>()
            .await?;
Ok(results.hits.into_iter().map(|h| h.result).collect())
}
}
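One caveat worth making explicit: because `tenant_id` is interpolated into the filter string, isolation is only as strict as the values allowed into it. A defensive sketch — the character whitelist is an assumption and should match Coditect's actual tenant-id format:

```rust
/// Reject tenant identifiers that could alter the filter expression when
/// interpolated. The [A-Za-z0-9_-] whitelist is an assumption; align it
/// with the real tenant-id format before use.
fn safe_tenant_filter(tenant_id: &str) -> Option<String> {
    let valid = !tenant_id.is_empty()
        && tenant_id
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-');
    valid.then(|| format!("tenant_id = \"{}\"", tenant_id))
}

fn main() {
    assert_eq!(
        safe_tenant_filter("acme-health-01").as_deref(),
        Some("tenant_id = \"acme-health-01\"")
    );
    // A crafted value attempting to widen the filter is rejected outright
    assert!(safe_tenant_filter("x\" OR tenant_id != \"y").is_none());
}
```

Validating at the boundary keeps the "zero cross-tenant data leakage" guarantee from resting on string interpolation alone.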
Solution 2: Lunr.js (PUBLIC DOCUMENTATION)
Why Lunr.js for Public Docs
Technical Fit:
- Zero infrastructure: No server required
- Privacy-first: No external services, no tracking
- Offline-capable: Works in air-gapped environments
- MIT license: True open source
- Battle-tested: Used by Hugo, Jekyll, Gatsby
Limitations:
- Not suitable for >10K documents
- No real-time updates (build-time only)
- Client-side index increases bundle size
- Basic ranking algorithm
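The ~10K-document ceiling is driven by the serialized index the browser must download up front. A back-of-envelope estimate — the per-document figure is a rough assumption for short documentation pages, not a measurement:

```rust
// Rough sizing model for a client-side Lunr index bundle.
// kb_per_doc is an assumed average, not a measured constant.
fn estimated_index_kb(doc_count: u32, kb_per_doc: f64) -> f64 {
    f64::from(doc_count) * kb_per_doc
}

fn main() {
    // ~1K pages: a download users won't notice
    println!("{} KB", estimated_index_kb(1_000, 1.5)); // 1500 KB
    // ~10K pages: megabytes shipped before the first keystroke,
    // which is where lazy-loading approaches like Pagefind win
    println!("{} KB", estimated_index_kb(10_000, 1.5)); // 15000 KB
}
```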
Docusaurus Integration
// docusaurus.config.js
module.exports = {
plugins: [
[
require.resolve('@cmfcmf/docusaurus-search-local'),
{
indexDocs: true,
indexBlog: true,
indexPages: false,
language: ['en'],
style: undefined,
lunr: {
tokenizerSeparator: /[\s\-]+/,
// Stemming for better matches
b: 0.75,
k1: 1.2,
},
// Coditect-specific customization
indexDocSidebarParentCategories: 2,
docsRouteBasePath: '/docs',
blogRouteBasePath: '/blog',
},
],
],
};
Custom Indexing for Technical Terms
// scripts/build-custom-index.js
const lunr = require('lunr');
const fs = require('fs');
const path = require('path');
// Custom pipeline function: preserve technical terms from stemming
const technicalTermPreserver = function (token) {
  const technicalTerms = [
    'foundationdb',
    'wasm',
    'oauth',
    'kubernetes',
    'rust',
    'typescript'
  ];
  if (technicalTerms.includes(token.toString().toLowerCase())) {
    return token;
  }
  // Default stemming for other words
  return lunr.stemmer(token);
};
lunr.Pipeline.registerFunction(technicalTermPreserver, 'technicalTermPreserver');
// Build index with custom pipeline
const idx = lunr(function() {
this.ref('id');
this.field('title', { boost: 10 });
this.field('content');
// Add custom pipeline step
this.pipeline.remove(lunr.stemmer);
this.pipeline.add(technicalTermPreserver);
// Add documents (assumes `documents` was loaded from the docs build output)
documents.forEach(doc => {
this.add(doc);
});
});
// Save index
fs.writeFileSync(
path.join(__dirname, '../static/search-index.json'),
JSON.stringify(idx)
);
Solution 3: Pagefind (MODERN STATIC ALTERNATIVE)
Why Pagefind
Technical Advantages over Lunr.js:
- Rust-based indexer: Fast indexing, small initial payload
- Lazy loading: Only loads index chunks as needed
- Better scaling: Handles 100K+ pages efficiently
- MIT licensed: True open source
- Framework-agnostic: Drop-in for any static site, with an optional prebuilt UI
Integration with Docusaurus
# Install Pagefind
npm install -D pagefind
# Add to package.json
{
  "scripts": {
    "build": "docusaurus build",
    "postbuild": "pagefind --site build"
  }
}
// Custom SearchBar with Pagefind
// src/theme/SearchBar/index.tsx
import React, { useEffect, useRef, useState } from 'react';
export default function SearchBar() {
  const searchRef = useRef<HTMLDivElement>(null);
  const [pagefind, setPagefind] = useState<any>(null);
useEffect(() => {
// Lazy load Pagefind
const loadPagefind = async () => {
      // webpackIgnore stops the bundler from resolving this runtime-only path
      const pf = await import(/* webpackIgnore: true */ '/pagefind/pagefind.js');
await pf.options({
baseUrl: '/',
});
setPagefind(pf);
};
loadPagefind();
}, []);
  const handleSearch = async (query: string) => {
if (!pagefind || !query) return;
const results = await pagefind.search(query);
// Render results
};
return (
<div ref={searchRef}>
<input
type="search"
placeholder="Search docs..."
onChange={(e) => handleSearch(e.target.value)}
/>
</div>
);
}
Solution 4: Typesense (MEILISEARCH ALTERNATIVE)
Why Consider Typesense
Comparison to Meilisearch:
| Feature | Meilisearch | Typesense |
|---|---|---|
| License | MIT | GPL v3 |
| Language | Rust | C++ |
| Performance | Excellent | Excellent |
| Typo tolerance | Yes | Yes |
| Cloud option | Yes (proprietary) | Yes (proprietary) |
| Multi-tenancy | Filter-based | Collection-based |
When to choose Typesense over Meilisearch:
- Already using C++ ecosystem
- Need collection-level isolation (vs filter-based)
- Prefer GPL v3 licensing
Deployment (Nearly Identical to Meilisearch)
# typesense-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: typesense
spec:
  serviceName: typesense
  replicas: 3
  selector:
    matchLabels:
      app: typesense
  template:
    metadata:
      labels:
        app: typesense
    spec:
      containers:
- name: typesense
image: typesense/typesense:0.25.2
args:
- "--data-dir=/data"
- "--api-key=$(TYPESENSE_API_KEY)"
- "--enable-cors"
env:
- name: TYPESENSE_API_KEY
valueFrom:
secretKeyRef:
name: typesense-secrets
key: api-key
Recommendation: Stick with Meilisearch unless GPL v3 is specifically required. Meilisearch's MIT license is more permissive for commercial use.
Cost Analysis: Open-Source Self-Hosted
Infrastructure Costs (GKE)
meilisearch_production_costs:
compute:
instance_type: e2-standard-4 # 4 vCPU, 16GB RAM
instances: 3
cost_per_instance_month: $122
total_compute_month: $366
storage:
persistent_volumes: 3
size_per_volume: 100Gi
storage_class: ssd
cost_per_gb_month: $0.17
total_storage_month: $51
networking:
egress_gb_month: 100
cost_per_gb: $0.12
total_networking_month: $12
backup:
gcs_storage_gb: 300 # 3x100GB snapshots
cost_per_gb_month: $0.02
total_backup_month: $6
total_monthly_cost: $435
cost_per_search_query: $0.0000145 # Assuming 30M queries/month
vs_algolia_equivalent:
algolia_cost_30m_queries: $2400/month
savings: $1965/month
roi: 452% # savings / cost = $1,965 / $435
Comparison to Proprietary Alternatives
| Solution | Monthly Cost (30M queries) | Licensing | Data Control |
|---|---|---|---|
| Meilisearch (self-hosted) | $435 | Free (MIT) | Full |
| Typesense (self-hosted) | $435 | Free (GPL v3) | Full |
| Algolia | $2,400 | Pay-per-use | None |
| Orama Cloud | $1,200 | Pay-per-use | Limited |
| Elasticsearch Cloud | $1,800 | Pay-per-use | Limited |
ROI for Coditect:
- Infrastructure cost: $435/month
- Engineering time: 2 weeks initial setup, 4 hours/month maintenance
- Total annual cost: ~$5,220 infrastructure + ~$15K engineering = ~$20K
- Algolia equivalent: ~$29K annually
- Savings: ~$9K/year
- Plus: Full data control, air-gap capability, no vendor lock-in
Implementation Roadmap for Coditect
Phase 1: Public Documentation (Week 1-2)
Goal: Launch coditect.ai/docs with search
# Day 1-2: Setup Docusaurus
npx create-docusaurus@latest coditect-docs classic
cd coditect-docs
# Day 3-4: Add Lunr.js search
npm install @cmfcmf/docusaurus-search-local
# Configure in docusaurus.config.js
# (see config above)
# Day 5-7: Write initial documentation
# - Getting Started
# - Architecture Overview
# - Agent Types
# - API Reference
# Day 8-10: Deploy to Vercel/Cloudflare Pages
npm run build
# Deploy build/ directory
# Search works immediately, zero infrastructure!
Success Criteria:
- ✅ Documentation site live
- ✅ Search functional
- ✅ <2s page load time
- ✅ Zero infrastructure cost
Phase 2: Agent Knowledge Retrieval (Week 3-6)
Goal: Meilisearch for autonomous agent knowledge
// Week 3: Deploy Meilisearch to GKE
// kubectl apply -f meilisearch-statefulset.yaml
// Week 4: Build indexing pipeline
// - Extract ADRs, patterns, compliance rules
// - Generate Claude summaries
// - Calculate token estimates
// - Upload to Meilisearch
// Week 5: Integrate with agents
// - Implement KnowledgeRetriever
// - Wire into Architect agent
// - Add token-conscious filtering
// - Measure performance
// Week 6: Expand to all agent types
// - Orchestrator knowledge retrieval
// - Implementer pattern lookup
// - Reviewer compliance checks
Success Criteria:
- ✅ Meilisearch deployed and stable
- ✅ Knowledge base indexed (1000+ documents)
- ✅ Agent retrieval latency <100ms p95
- ✅ Token reduction >30%
Phase 3: Multi-Tenant Customer Portal (Week 7-10)
Goal: Tenant-isolated knowledge base
// Week 7-8: Multi-tenant architecture
// - Implement tenant_id filtering
// - Build tenant data isolation
// - Add customer-specific documentation
// Week 9: Customer portal UI
// - Docusaurus with custom SearchBar
// - Meilisearch backend
// - Tenant authentication
// Week 10: Production hardening
// - Load testing
// - Security audit
// - Monitoring dashboards
Success Criteria:
- ✅ 10+ customers with isolated knowledge bases
- ✅ Zero cross-tenant data leakage
- ✅ <200ms search latency
- ✅ 99.9% uptime
Monitoring & Operations
Prometheus Metrics
# meilisearch-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: meilisearch
spec:
selector:
matchLabels:
app: meilisearch
endpoints:
- port: http
path: /metrics
// Custom metrics for Coditect
use lazy_static::lazy_static;
use prometheus::{Counter, Histogram, HistogramOpts};

lazy_static! {
    static ref SEARCH_QUERIES: Counter = Counter::new(
        "coditect_search_queries_total",
        "Total search queries"
    ).unwrap();
    static ref SEARCH_LATENCY: Histogram = Histogram::with_opts(HistogramOpts::new(
        "coditect_search_latency_seconds",
        "Search query latency"
    )).unwrap();
    static ref TOKEN_BUDGET_USAGE: Histogram = Histogram::with_opts(HistogramOpts::new(
        "coditect_token_budget_usage",
        "Token budget usage per search"
    )).unwrap();
}
Grafana Dashboard
{
"dashboard": {
"title": "Coditect Knowledge Retrieval",
"panels": [
{
"title": "Search QPS",
"targets": [{
"expr": "rate(coditect_search_queries_total[5m])"
}]
},
    {
      "title": "P95 Latency",
      "targets": [{
        "expr": "histogram_quantile(0.95, rate(coditect_search_latency_seconds_bucket[5m]))"
      }]
    },
{
"title": "Token Budget Efficiency",
"targets": [{
"expr": "avg(coditect_token_budget_usage)"
}]
}
]
}
}
Air-Gapped Deployment (Healthcare Clients)
Offline Meilisearch Setup
# On internet-connected machine:
# 1. Download Meilisearch binary
wget https://github.com/meilisearch/meilisearch/releases/download/v1.5.0/meilisearch-linux-amd64
# 2. Build knowledge index
cargo run --release --bin index_knowledge_base -- \
--repo-path /path/to/coditect \
--output knowledge_index.meilisearch
# 3. Package for air-gapped deployment
tar czf coditect-knowledge-airgap.tar.gz \
meilisearch-linux-amd64 \
knowledge_index.meilisearch \
deploy.sh
# On air-gapped machine:
tar xzf coditect-knowledge-airgap.tar.gz
./deploy.sh --data-dir /opt/meilisearch/data
Air-Gapped Update Process
# scripts/airgap-update.sh
#!/bin/bash
set -e
# 1. Export updated index from staging
#    (meilisearch-exporter is custom tooling; Meilisearch's built-in dump
#    endpoint is an alternative export path)
./meilisearch-exporter \
--host http://staging-meilisearch:7700 \
--index coditect_knowledge \
--output knowledge_index_v2.meilisearch
# 2. Package with version
VERSION=$(date +%Y%m%d-%H%M%S)
tar czf coditect-knowledge-update-${VERSION}.tar.gz \
knowledge_index_v2.meilisearch \
update.sh
# 3. Transfer via approved method (USB, secure file transfer)
# Customer applies update on air-gapped network
# 4. On air-gapped machine:
./update.sh --backup-existing
Compliance Considerations
Data Residency
meilisearch_compliance:
fda_21_cfr_part_11:
audit_trail: enabled
implementation: meilisearch_logs + external_audit_system
retention: 7_years
hipaa:
encryption_at_rest: enabled # Persistent volume encryption
encryption_in_transit: mutual_tls
access_control: rbac
phi_handling: no_phi_in_search_index
soc2:
monitoring: prometheus_metrics
alerting: grafana_alerts
backup: automated_every_6_hours
disaster_recovery: multi_az_deployment
Search Query Logging
// Compliance-aware search logging
pub async fn search_with_audit_log(
    query: &str,
    user_id: &str,
    tenant_id: &str,
    request_ip: &str,
) -> Result<Vec<Document>, Error> {
    // Log the search query itself for the compliance audit trail
    audit_log::record(AuditEvent {
        event_type: "search_query",
        user_id: user_id.to_string(),
        tenant_id: tenant_id.to_string(),
        query: query.to_string(),
        timestamp: Utc::now(),
        ip_address: request_ip.to_string(),
    });
// Execute search
let results = search_internal(query, tenant_id).await?;
// Log results count (not content)
audit_log::record(AuditEvent {
event_type: "search_results",
user_id: user_id.to_string(),
results_count: results.len(),
timestamp: Utc::now(),
});
Ok(results)
}
Migration Path from Proprietary to Open-Source
If Currently Using Algolia
# scripts/migrate_from_algolia.py
import os
from algoliasearch.search_client import SearchClient as AlgoliaClient
from meilisearch import Client as MeilisearchClient
def migrate_algolia_to_meilisearch():
# Connect to both
algolia = AlgoliaClient.create(
os.environ['ALGOLIA_APP_ID'],
os.environ['ALGOLIA_ADMIN_KEY']
)
meilisearch = MeilisearchClient(
os.environ['MEILI_HOST'],
os.environ['MEILI_ADMIN_KEY']
)
# Export from Algolia
algolia_index = algolia.init_index('coditect_docs')
documents = []
for hit in algolia_index.browse_objects():
documents.append(hit)
# Transform for Meilisearch
meili_docs = transform_documents(documents)
# Import to Meilisearch
meili_index = meilisearch.index('coditect_knowledge')
meili_index.add_documents(meili_docs)
print(f"Migrated {len(documents)} documents")
def transform_documents(algolia_docs):
"""Transform Algolia schema to Meilisearch"""
return [
{
'id': doc['objectID'],
'title': doc.get('title'),
'content': doc.get('content'),
# Map other fields
}
for doc in algolia_docs
]
Final Recommendation for Coditect
Dual Architecture
recommended_implementation:
public_documentation:
solution: lunr.js
reasoning:
- Zero infrastructure cost
- Perfect for <10K docs
- Privacy-first (no external calls)
- Works offline
deployment: docusaurus + lunr plugin
timeline: 1-2 weeks
agent_knowledge_retrieval:
solution: meilisearch_self_hosted
reasoning:
- Rust-based (matches Coditect stack)
- MIT license (true open source)
- Excellent performance
- Multi-tenant capable
- Air-gap compatible
deployment: kubernetes_statefulset
timeline: 4-6 weeks
customer_portal:
solution: meilisearch_multi_tenant
reasoning:
- Same infrastructure as agent knowledge
- Tenant isolation built-in
- Customer-specific documentation
deployment: shared_meilisearch_cluster
timeline: 2-3 weeks
Total Cost of Ownership (3 Years)
open_source_tco:
year_1:
infrastructure: $5,220
engineering: $30,000 # Initial setup + 3 months learning
total: $35,220
year_2:
infrastructure: $5,220
engineering: $12,000 # 1 hour/week maintenance
total: $17,220
year_3:
infrastructure: $5,220
engineering: $12,000
total: $17,220
three_year_total: $69,660
proprietary_alternative_tco:
year_1:
algolia_fees: $28,800
engineering: $5,000 # Easier setup
total: $33,800
year_2:
algolia_fees: $34,560 # 20% growth
engineering: $2,000
total: $36,560
year_3:
algolia_fees: $41,472
engineering: $2,000
total: $43,472
three_year_total: $113,832
savings_with_open_source: $44,172
plus_benefits:
- Full data control
- No vendor lock-in
- Air-gap capability
- Compliance-native
- Unlimited scaling
Strategic Value
Beyond Cost Savings:
- Competitive Advantage: "100% open-source AI development platform"
- Customer Trust: No proprietary lock-in for regulated industries
- Compliance: Air-gap deployments for FDA/HIPAA clients
- Control: Customize ranking, filtering, indexing for Coditect's needs
- Scaling: No per-query fees as usage grows
Conclusion
Primary Recommendation: Meilisearch (self-hosted) + Lunr.js (public docs)
Implementation Priority:
- Week 1-2: Deploy Lunr.js for coditect.ai/docs
- Week 3-6: Deploy Meilisearch for agent knowledge retrieval
- Week 7-10: Expand to multi-tenant customer portal
Success Metrics:
- Public docs search: >60% usage rate
- Agent knowledge retrieval: <100ms p95 latency
- Token savings: >30% reduction
- Infrastructure cost: <$500/month
- Customer satisfaction: "Search just works"
Open-Source Wins:
- ✅ No licensing fees
- ✅ Full source code access
- ✅ Community-driven development
- ✅ No vendor lock-in
- ✅ Air-gap deployment capability
- ✅ Compliance-friendly
- ✅ Unlimited customization
This architecture gives Coditect a production-grade, fully open-source search solution that scales from public documentation to enterprise multi-tenant deployments, all while maintaining the autonomy and control critical for a platform serving regulated industries.