Multi-Tenant Context Architecture with FoundationDB
Document Version: 1.0
Date: 2025-12-16
Status: DRAFT (Architecture Planning)
Author: Architecture Team
Purpose: Scalable context memory system for 1M+ users across organizations
Executive Summary
This document outlines the architecture for evolving CODITECT's context memory system from a single-user SQLite database to a multi-tenant, multi-user system supporting 1M+ users across thousands of organizations using FoundationDB as the persistence layer.
Current State:
- SQLite-based context database (~600MB)
- 73,000+ unique messages with deduplication
- Single-user, single-session design
- GCP Cloud Storage backup with snapshot + compression
Target State:
- FoundationDB-based distributed context store
- Multi-tenant isolation with per-organization key prefixes
- Real-time sync across multiple sessions and users
- Event-sourced architecture for conflict resolution
- Scale to 1M users with sub-10ms read latency
Table of Contents
- Current Architecture Analysis
- Scaling Challenges
- FoundationDB Architecture
- Multi-Tenant Key Design
- Data Model
- Event Sourcing Pattern
- Session Synchronization
- Migration Strategy
- Performance Projections
- Implementation Roadmap
Current Architecture Analysis
SQLite Context Store
Location: context-storage/context.db
Size: ~600MB (growing)
Schema:
-- Messages table (primary store)
CREATE TABLE messages (
id INTEGER PRIMARY KEY,
hash TEXT UNIQUE NOT NULL,
content TEXT NOT NULL,
role TEXT NOT NULL, -- 'user' | 'assistant'
source_type TEXT, -- 'jsonl' | 'export'
source_file TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- FTS5 full-text search index
CREATE VIRTUAL TABLE messages_fts USING fts5(
content,
content='messages',
content_rowid='id'
);
-- Knowledge extraction (decisions, patterns, errors)
CREATE TABLE knowledge (
id INTEGER PRIMARY KEY,
message_id INTEGER REFERENCES messages(id),
knowledge_type TEXT, -- 'decision' | 'pattern' | 'error_solution'
extracted_data JSON,
created_at TIMESTAMP
);
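The hash UNIQUE constraint is what makes deduplication a single insert. A minimal Python sketch of the indexing path (the role-in-hash choice and the helper names are illustrative, not the shipped extractor):

```python
import hashlib
import sqlite3

def message_hash(content: str, role: str) -> str:
    """Stable content hash used as the dedup key. Including the role is an
    assumption here, so identical text from user vs assistant stays distinct."""
    return hashlib.sha256(f"{role}:{content}".encode("utf-8")).hexdigest()

def index_message(conn: sqlite3.Connection, content: str, role: str) -> bool:
    """Insert a message; returns False if it was already indexed."""
    h = message_hash(content, role)
    try:
        conn.execute(
            "INSERT INTO messages (hash, content, role) VALUES (?, ?, ?)",
            (h, content, role),
        )
        return True
    except sqlite3.IntegrityError:  # UNIQUE constraint on hash fired
        return False
```

The UNIQUE constraint does the heavy lifting: the extractor never needs a separate existence check, so concurrent inserts of the same message degrade to a cheap rejected write.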
Current Workflow
┌────────────────────────────────────────────────────────────┐
│ User Session (Claude Code) │
├────────────────────────────────────────────────────────────┤
│ /cx → Extract messages → Deduplicate → SQLite │
│ /cxq → Query FTS5 index → Return results │
│ /recall → Knowledge retrieval → Context injection │
└────────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────┐
│ GCP Backup (scripts/backup-context-db.sh) │
├────────────────────────────────────────────────────────────┤
│ SQLite .backup snapshot → gzip compression │
│ Upload to gs://coditect-cloud-infra-context-backups │
│ 67% compression (720MB → 235MB) │
└────────────────────────────────────────────────────────────┘
Limitations
- Single-User Design: No tenant isolation
- Local Storage: Not shareable across machines
- Write Conflicts: concurrent sessions can corrupt the database
- Limited Scale: SQLite is single-writer; file-level locking caps practical concurrency and database size well below target scale
- No Real-Time Sync: Changes not propagated to other sessions
Scaling Challenges
Multi-User Scenarios
| Scenario | Challenge | Impact |
|---|---|---|
| Team Context | Shared knowledge base across 10-50 users | Need tenant isolation + shared spaces |
| Parallel Sessions | 5+ Claude Code sessions simultaneously | Write conflicts, data loss |
| Organization Memory | 1000s of users in enterprise | Need hierarchical key design |
| Cross-Project Context | Context relevant across projects | Need flexible key spaces |
Scale Targets (Phase 7)
| Metric | Current | Target | Factor |
|---|---|---|---|
| Users | 1 | 1,000,000 | 1M× |
| Organizations | 1 | 50,000 | 50K× |
| Messages | 73K | 10B | 137K× |
| Concurrent Sessions | 1 | 100,000 | 100K× |
| Read Latency (p99) | ~50ms | <10ms | 5× |
| Write Throughput | ~100/s | 1M/s | 10K× |
FoundationDB Architecture
Why FoundationDB
CODITECT already uses FoundationDB in production for the cloud IDE (coditect.ai):
Current FDB Deployment (from IDE analysis):
- GKE StatefulSet: 5 pods (3 coordinators + 2 proxies)
- Version: FoundationDB 7.1+
- Key space:
/az1ai-ide/sessions/, /az1ai-ide/files/, /az1ai-ide/settings/
FoundationDB Advantages:
- ACID Transactions: Serializable isolation across distributed operations
- Multi-Tenant Native: Record Layer designed for massive multi-tenancy
- Sub-10ms Latency: Production-proven performance at Apple scale
- Horizontal Scale: Linear scaling to millions of operations/second
- Strong Consistency: No eventual consistency complexity
- Open Source: Apache 2.0 license, active community
Cluster Topology (Target)
Phase 6 (10K users):
FoundationDB Cluster
├── Coordinators: 5 (odd number for consensus)
├── Storage: 9 processes (3× replication factor)
├── Transaction Log: 3 processes
└── Proxies: 5 processes
Phase 7 (50K+ users):
FoundationDB Cluster (from diagrams/phase-7-enterprise-scale/README.md)
├── Storage Nodes: 15
├── Transaction Nodes: 7
├── Total Processes: 22+
└── Replication: Triple redundancy
Multi-Tenant Key Design
Hierarchical Key Structure
Using FoundationDB Directory Layer and Record Layer patterns:
/coditect/
├── context/ # Context memory subsystem
│ ├── tenants/
│ │ └── {tenant_id}/ # Organization-level isolation
│ │ ├── metadata # Tenant config, quotas
│ │ ├── users/
│ │ │ └── {user_id}/
│ │ │ ├── sessions/
│ │ │ │ └── {session_id}/
│ │ │ │ ├── messages/
│ │ │ │ │ └── {timestamp}_{hash} # Message data
│ │ │ │ └── state/ # Session state
│ │ │ └── preferences/ # User settings
│ │ ├── shared/ # Team-shared context
│ │ │ ├── decisions/ # Architectural decisions
│ │ │ ├── patterns/ # Code patterns
│ │ │ └── errors/ # Error solutions
│ │ └── projects/
│ │ └── {project_id}/
│ │ ├── messages/ # Project-specific context
│ │ └── knowledge/ # Project knowledge base
│ │
│ └── global/ # Cross-tenant (future: marketplace)
│ └── public_patterns/ # Community-shared patterns
│
└── events/ # Event sourcing log
└── {tenant_id}/
└── {timestamp}_{event_id} # Immutable event log
Key Design Principles
- Tenant Prefix First: All keys start with tenant_id for isolation
- Timestamp-Ordered: Natural time-series ordering for context
- Hash-Based Dedup: Content hash prevents duplicates
- Hierarchical Access: Fine-grained permissions at each level
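The timestamp-ordered principle depends on an encoding where byte order equals numeric order, which is what FDB's tuple layer provides. A simplified Python stand-in (length-prefixed path elements, big-endian timestamp; not the real tuple encoding) shows the property:

```python
import struct

def message_key(tenant_id: str, user_id: str, session_id: str,
                timestamp_us: int, content_hash: str) -> bytes:
    """Simplified stand-in for FDB tuple-layer packing: each path element is
    length-prefixed, and the (non-negative) timestamp is packed big-endian so
    lexicographic byte order matches numeric order."""
    parts = [tenant_id.encode(), user_id.encode(), session_id.encode()]
    prefix = b"".join(struct.pack(">H", len(p)) + p for p in parts)
    return prefix + struct.pack(">q", timestamp_us) + content_hash.encode()
```

Because keys sort by timestamp within a session, a range read over the session's message subspace is already a time-ordered scan; no secondary index is needed for chronology.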
Rust Key Types
use foundationdb::tuple::Subspace;
pub struct ContextKeys {
root: Subspace,
}
impl ContextKeys {
pub fn new() -> Self {
Self {
root: Subspace::all().subspace(&("coditect", "context")),
}
}
pub fn message_key(
&self,
tenant_id: &str,
user_id: &str,
session_id: &str,
timestamp: i64,
hash: &str,
) -> Vec<u8> {
self.root
.subspace(&("tenants", tenant_id, "users", user_id, "sessions", session_id, "messages"))
.pack(&(timestamp, hash))
}
pub fn shared_decision_key(
&self,
tenant_id: &str,
decision_id: &str,
) -> Vec<u8> {
self.root
.subspace(&("tenants", tenant_id, "shared", "decisions"))
.pack(&decision_id)
}
pub fn tenant_range(&self, tenant_id: &str) -> (Vec<u8>, Vec<u8>) {
let subspace = self.root.subspace(&("tenants", tenant_id));
subspace.range()
}
}
Data Model
Message Schema (Protobuf/Serde)
use serde::{Deserialize, Serialize};
use chrono::{DateTime, Utc};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ContextMessage {
/// Content hash (SHA-256)
pub hash: String,
/// Message content
pub content: String,
/// Role: user | assistant | system
pub role: String,
/// Provenance tracking
pub provenance: Provenance,
/// Timestamps
pub occurred_at: DateTime<Utc>,
pub indexed_at: DateTime<Utc>,
/// Extracted metadata
pub metadata: MessageMetadata,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Provenance {
pub tenant_id: String,
pub user_id: String,
pub session_id: String,
pub source_type: String, // "claude-code" | "export" | "api"
pub source_file: Option<String>,
pub source_line: Option<u32>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MessageMetadata {
pub content_length: usize,
pub has_code: bool,
pub has_markdown: bool,
pub language_hints: Vec<String>,
pub topics: Vec<String>,
}
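A hedged sketch of how MessageMetadata might be populated during extraction; the fence regex and markdown heuristics below are illustrative choices, not the shipped rules:

```python
import re
from dataclasses import dataclass

@dataclass
class MessageMetadata:
    content_length: int
    has_code: bool
    has_markdown: bool
    language_hints: list

def extract_metadata(content: str) -> MessageMetadata:
    """Heuristic extraction: code fences signal code, fence info strings
    become language hints, and a few common markers signal markdown."""
    fences = re.findall(r"```(\w*)", content)  # openers carry a language, closers are ""
    return MessageMetadata(
        content_length=len(content),
        has_code=bool(fences),
        has_markdown=bool(re.search(r"^#{1,6} |\*\*|\[.+\]\(.+\)", content, re.M)),
        language_hints=sorted({f for f in fences if f}),
    )
```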
Knowledge Extraction Schema
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum Knowledge {
Decision {
id: String,
title: String,
context: String,
decision: String,
rationale: String,
source_messages: Vec<String>, // Message hashes
created_at: DateTime<Utc>,
},
Pattern {
id: String,
name: String,
description: String,
code_example: Option<String>,
use_cases: Vec<String>,
source_messages: Vec<String>,
created_at: DateTime<Utc>,
},
ErrorSolution {
id: String,
error_signature: String,
solution: String,
steps: Vec<String>,
source_messages: Vec<String>,
created_at: DateTime<Utc>,
},
}
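serde's #[serde(tag = "type")] stores the variant name in a "type" field alongside the variant's own fields. A small Python sketch of the same wire shape (the helper names are hypothetical):

```python
import json

def knowledge_to_json(kind: str, **fields) -> str:
    """Mirror serde's internally-tagged representation: the variant name
    travels in a `type` field next to the variant's payload."""
    return json.dumps({"type": kind, **fields})

def knowledge_from_json(raw: str):
    """Split the tag back out; returns (variant_name, payload_fields)."""
    obj = json.loads(raw)
    kind = obj.pop("type")
    return kind, obj
```

This matters for cross-language consumers: the Python extraction pipeline and the Rust backend agree on one self-describing JSON shape per Knowledge variant.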
Event Sourcing Pattern
Why Event Sourcing
For multi-user context sync, event sourcing provides:
- Conflict Resolution: Events are immutable, conflicts resolved by ordering
- Audit Trail: Complete history of all changes
- Time Travel: Reconstruct state at any point
- Decoupling: Writers and readers operate independently
- Scalability: Append-only is highly parallelizable
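Conflict resolution by ordering can be shown concretely: replaying events sorted by (timestamp, event_id) yields a deterministic materialized state regardless of arrival order. A minimal Python sketch with dict-based state and illustrative field names:

```python
def rebuild_index(events):
    """Replay an immutable event log into a materialized message index.
    Sorting by (timestamp, event_id) makes replay deterministic, which is
    how concurrent writers are reconciled without locks: later events win."""
    state = {}
    for ev in sorted(events, key=lambda e: (e["timestamp"], e["event_id"])):
        if ev["event_type"] == "MessageIndexed":
            state[ev["message_hash"]] = ev["content"]
    return state
```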
Event Types
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "event_type")]
pub enum ContextEvent {
/// New message indexed
MessageIndexed {
event_id: String,
tenant_id: String,
user_id: String,
session_id: String,
message_hash: String,
message: ContextMessage,
timestamp: DateTime<Utc>,
},
/// Knowledge extracted from messages
KnowledgeExtracted {
event_id: String,
tenant_id: String,
knowledge: Knowledge,
source_message_hashes: Vec<String>,
timestamp: DateTime<Utc>,
},
/// Knowledge shared with team
KnowledgeShared {
event_id: String,
tenant_id: String,
user_id: String,
knowledge_id: String,
shared_scope: SharedScope, // Team | Project | Public
timestamp: DateTime<Utc>,
},
/// Session started
SessionStarted {
event_id: String,
tenant_id: String,
user_id: String,
session_id: String,
project_path: String,
timestamp: DateTime<Utc>,
},
/// Session ended
SessionEnded {
event_id: String,
tenant_id: String,
user_id: String,
session_id: String,
message_count: u32,
timestamp: DateTime<Utc>,
},
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum SharedScope {
User, // Private to user
Team, // Shared within organization
Project, // Shared within project
Public, // Public marketplace (future)
}
Event Storage in FDB
use foundationdb::{Database, FdbError, RangeOption};

// NOTE: sketch code. Accessor helpers (`tenant_id()`, `timestamp()`,
// `event_id()`) are assumed on `ContextEvent`, and keys are shown as plain
// strings for readability; the real layout would use the tuple layer from
// `ContextKeys` above.
pub async fn append_event(
    db: &Database,
    event: &ContextEvent,
) -> Result<(), FdbError> {
    let key = format!(
        "/coditect/events/{}/{}_{}",
        event.tenant_id(),
        event.timestamp().timestamp_micros(),
        event.event_id()
    );
    // Serialize once, outside the retry closure.
    let value = serde_json::to_vec(event).expect("event serializes");
    db.run(|trx| {
        let (key, value) = (key.clone(), value.clone());
        async move {
            trx.set(key.as_bytes(), &value);
            Ok(())
        }
    })
    .await
}

pub async fn replay_events(
    db: &Database,
    tenant_id: &str,
    since: DateTime<Utc>,
) -> Result<Vec<ContextEvent>, FdbError> {
    let start_key = format!(
        "/coditect/events/{}/{}",
        tenant_id,
        since.timestamp_micros()
    );
    // "\xff" is not a valid escape in a Rust string literal; build the end
    // key as raw bytes so the range spans every key under the prefix.
    let mut end_key = format!("/coditect/events/{}/", tenant_id).into_bytes();
    end_key.push(0xff);
    db.run(|trx| {
        let (start, end) = (start_key.clone(), end_key.clone());
        async move {
            // Single-shot read; production code would page via `iteration`.
            let range = trx
                .get_range(&RangeOption::from((start.into_bytes(), end)), 1, false)
                .await?;
            range
                .iter()
                .map(|kv| Ok(serde_json::from_slice(kv.value()).expect("valid event")))
                .collect()
        }
    })
    .await
}
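The range-read boundaries deserve care: with string keys, timestamps only sort correctly if they have a fixed digit width, and the end key needs a byte that sorts after any printable suffix. A Python sketch of the key and range construction (the zero-padding is a defensive addition, not part of the key format above):

```python
def event_key(tenant_id: str, timestamp_us: int, event_id: str) -> bytes:
    # Zero-padded timestamp keeps string keys in numeric order even if
    # the digit count ever changes.
    return f"/coditect/events/{tenant_id}/{timestamp_us:020d}_{event_id}".encode()

def event_range(tenant_id: str, since_us: int) -> tuple:
    """Half-open byte range [start, end) covering every event at or after
    `since_us`; the trailing 0xff byte sorts after any printable suffix."""
    start = f"/coditect/events/{tenant_id}/{since_us:020d}".encode()
    end = f"/coditect/events/{tenant_id}/".encode() + b"\xff"
    return start, end
```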
Session Synchronization
Real-Time Sync Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Session A (User 1) Session B (User 1) │
│ ↓ index message ↓ index message │
│ ↓ ↓ │
├─────────────────────────────────────────────────────────────────┤
│ Event Bus (FDB Watches) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ /coditect/events/{tenant}/... │ │
│ │ ├── 1734345600000000_evt1 (Session A indexed msg) │ │
│ │ ├── 1734345600001000_evt2 (Session B indexed msg) │ │
│ │ └── ... │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ↓ │
│ Event Processor │
│ ↓ │
├─────────────────────────────────────────────────────────────────┤
│ Materialized Views (per session) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Session A │ │ Session B │ │ Session C │ │
│ │ local view │ │ local view │ │ local view │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
FDB Watch API for Real-Time
/// Watch for new events in the tenant's event log.
///
/// Assumes writers bump the `/latest` sentinel key in the same transaction
/// that appends the event; otherwise the watch never fires.
pub async fn watch_events(
    db: &Database,
    tenant_id: &str,
    mut last_seen: DateTime<Utc>,
    callback: impl Fn(ContextEvent),
) -> Result<(), FdbError> {
    let watch_key = format!(
        "/coditect/events/{}/latest",
        tenant_id
    );
    loop {
        // Register the watch before replaying, so no update slips in
        // between the replay below and the next wait.
        let watch = db.run(|trx| async {
            Ok(trx.watch(watch_key.as_bytes()))
        }).await?;
        // Process any new events, then advance the cursor so the next
        // iteration does not re-deliver them.
        let events = replay_events(db, tenant_id, last_seen).await?;
        for event in events {
            last_seen = event.timestamp(); // accessor assumed on ContextEvent
            callback(event);
        }
        // Block until the sentinel key changes again.
        watch.await?;
    }
}
Migration Strategy
Phase 1: Dual-Write (Weeks 1-4)
┌─────────────────────────────────────────────────────────────────┐
│ /cx Command (modified) │
├─────────────────────────────────────────────────────────────────┤
│ 1. Extract messages (existing) │
│ 2. Write to SQLite (existing) │
│ 3. Write to FDB (new) ← dual-write │
│ 4. Verify consistency │
└─────────────────────────────────────────────────────────────────┘
Implementation:
- Add FDB client to context extraction scripts
- Write every message to both SQLite and FDB
- Log any discrepancies for debugging
- SQLite remains primary (read path unchanged)
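The dual-write step can be sketched as: primary write first, mirrored write second, with mismatches logged rather than surfaced. The dict stand-ins below replace the real SQLite and FDB clients:

```python
def dual_write(message, sqlite_store, fdb_store, log):
    """Write to the primary (SQLite) first; mirror into FDB and record,
    rather than fail on, any discrepancy so the read path is unaffected."""
    sqlite_store[message["hash"]] = message
    try:
        fdb_store[message["hash"]] = message
        if fdb_store.get(message["hash"]) != sqlite_store[message["hash"]]:
            log.append(("mismatch", message["hash"]))
    except Exception as exc:
        # FDB failures never block the primary write during Phase 1.
        log.append(("fdb_error", message["hash"], str(exc)))
```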
Phase 2: Shadow Read (Weeks 5-8)
┌─────────────────────────────────────────────────────────────────┐
│ /cxq Command (modified) │
├─────────────────────────────────────────────────────────────────┤
│ 1. Query SQLite (primary) │
│ 2. Query FDB (shadow) ← parallel │
│ 3. Compare results │
│ 4. Return SQLite results │
│ 5. Log differences │
└─────────────────────────────────────────────────────────────────┘
Implementation:
- Add FDB read path in parallel with SQLite
- Compare results for consistency validation
- Measure FDB latency vs SQLite
- Build confidence in FDB correctness
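The shadow-read path follows the same shape: the SQLite result is always what the caller sees, while the FDB result is compared and any difference logged. The store callables below are stand-ins for the real query paths:

```python
def shadow_read(query, primary, shadow, diff_log):
    """Serve results from the primary (SQLite) while exercising and
    comparing the shadow (FDB) path; only the primary result is returned."""
    primary_result = primary(query)
    try:
        shadow_result = shadow(query)
        if shadow_result != primary_result:
            diff_log.append((query, primary_result, shadow_result))
    except Exception as exc:
        # A failing shadow path is logged, never surfaced to the caller.
        diff_log.append((query, "shadow_error", str(exc)))
    return primary_result
```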
Phase 3: FDB Primary (Weeks 9-12)
┌─────────────────────────────────────────────────────────────────┐
│ /cx and /cxq Commands │
├─────────────────────────────────────────────────────────────────┤
│ FDB is now primary │
│ SQLite kept as backup (reads on FDB failure) │
│ GCP backup updated to export FDB snapshot │
└─────────────────────────────────────────────────────────────────┘
Phase 4: Multi-Tenant (Weeks 13-16)
- Enable tenant isolation in FDB
- Add user/session tracking
- Implement event sourcing
- Add real-time sync via watches
Performance Projections
Latency Targets
| Operation | SQLite (Current) | FDB (Target) | Notes |
|---|---|---|---|
| Single message read | ~1ms | <5ms | FDB adds network hop |
| FTS query (10 results) | ~50ms | <20ms | FDB secondary indexes |
| Message write | ~2ms | <15ms | FDB with fsync |
| Batch write (100 msgs) | ~100ms | <50ms | FDB batch transactions |
| Knowledge extraction | ~200ms | <100ms | Parallelized |
Throughput Projections
| Scale | Messages/Day | FDB Cluster Size | Est. Cost/Month |
|---|---|---|---|
| 100 users | 100K | 5 nodes | ~$200 |
| 1,000 users | 1M | 9 nodes | ~$500 |
| 10,000 users | 10M | 15 nodes | ~$1,500 |
| 100,000 users | 100M | 25 nodes | ~$4,000 |
| 1,000,000 users | 1B | 50 nodes | ~$10,000 |
Storage Projections
| Scale | Messages | Raw Size | With Compression | Est. Cost/Month |
|---|---|---|---|---|
| 100 users | 1M | 5GB | 1.5GB | ~$0.50 |
| 1,000 users | 10M | 50GB | 15GB | ~$5 |
| 10,000 users | 100M | 500GB | 150GB | ~$50 |
| 100,000 users | 1B | 5TB | 1.5TB | ~$500 |
| 1,000,000 users | 10B | 50TB | 15TB | ~$5,000 |
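The storage table's arithmetic follows from two assumptions: roughly 5KB per raw message, and ~70% compression (the ratio observed in the GCP backups, 720MB to 235MB). A quick check in Python:

```python
def storage_projection(messages: int, avg_msg_bytes: int = 5_000,
                       compression_ratio: float = 0.3) -> tuple:
    """Reproduce the table's numbers: raw bytes from an assumed ~5KB average
    message, compressed to ~30% of raw (the ~70% reduction seen in backups).
    Returns (raw_bytes, compressed_bytes)."""
    raw = messages * avg_msg_bytes
    return raw, int(raw * compression_ratio)
```

For 1M messages (100 users) this gives 5GB raw and 1.5GB compressed, matching the first table row; the remaining rows scale linearly.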
Implementation Roadmap
Sprint 1: FDB Client Integration (Week 1-2)
- Add foundationdb Rust crate to backend
- Create ContextKeys utility for key generation
- Implement basic read/write operations
- Add unit tests for FDB operations
Sprint 2: Dual-Write Path (Week 3-4)
- Modify unified-message-extractor.py for dual-write
- Add FDB write to context extraction
- Implement consistency verification
- Create monitoring dashboard
Sprint 3: Shadow Read Path (Week 5-6)
- Add FDB query path to /cxq
- Implement parallel query execution
- Add result comparison logging
- Performance benchmarking
Sprint 4: Event Sourcing (Week 7-8)
- Implement ContextEvent types
- Create event log storage in FDB
- Add event processor service
- Implement replay_events function
Sprint 5: Multi-Tenant (Week 9-10)
- Add tenant isolation to key design
- Implement tenant creation/management
- Add quota enforcement per tenant
- Test cross-tenant isolation
Sprint 6: Real-Time Sync (Week 11-12)
- Implement FDB watch API integration
- Create session sync service
- Add WebSocket bridge for Claude Code
- End-to-end multi-session testing
Sprint 7: Cutover (Week 13-14)
- FDB becomes primary
- Update backup scripts for FDB
- Migrate existing SQLite data
- Update documentation
Sprint 8: Production Hardening (Week 15-16)
- Performance optimization
- Chaos testing (node failures)
- Security audit
- Documentation finalization
Related Documentation
CODITECT Ecosystem
FoundationDB Resources
Multi-Tenant Best Practices
Appendix: Current Backup Script
The existing backup script (scripts/backup-context-db.sh) will be updated to support FDB:
Current Features:
- SQLite .backup snapshot (parallel-session safe)
- Gzip compression (67% reduction)
- GCP Cloud Storage upload
- 90-day retention policy
FDB Additions (Phase 3+):
- FDB snapshot export using the fdbbackup tool
- Incremental backups for large datasets
- Point-in-time recovery support
Last Updated: 2025-12-16
Maintained By: CODITECT Architecture Team
Next Review: After Sprint 2 completion