
Multi-Tenant Context Architecture with FoundationDB

Document Version: 1.0
Date: 2025-12-16
Status: DRAFT (Architecture Planning)
Author: Architecture Team
Purpose: Scalable context memory system for 1M+ users across organizations


Executive Summary

This document outlines the architecture for evolving CODITECT's context memory system from a single-user SQLite database to a multi-tenant, multi-user system supporting 1M+ users across thousands of organizations using FoundationDB as the persistence layer.

Current State:

  • SQLite-based context database (~600MB)
  • 73,000+ unique messages with deduplication
  • Single-user, single-session design
  • GCP Cloud Storage backup with snapshot + compression

Target State:

  • FoundationDB-based distributed context store
  • Multi-tenant isolation with per-organization key prefixes
  • Real-time sync across multiple sessions and users
  • Event-sourced architecture for conflict resolution
  • Scale to 1M users with sub-10ms read latency

Table of Contents

  1. Current Architecture Analysis
  2. Scaling Challenges
  3. FoundationDB Architecture
  4. Multi-Tenant Key Design
  5. Data Model
  6. Event Sourcing Pattern
  7. Session Synchronization
  8. Migration Strategy
  9. Performance Projections
  10. Implementation Roadmap

Current Architecture Analysis

SQLite Context Store

Location: context-storage/context.db
Size: ~600MB (growing)
Schema:

-- Messages table (primary store)
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    hash TEXT UNIQUE NOT NULL,
    content TEXT NOT NULL,
    role TEXT NOT NULL,       -- 'user' | 'assistant'
    source_type TEXT,         -- 'jsonl' | 'export'
    source_file TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- FTS5 full-text search index
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content,
    content='messages',
    content_rowid='id'
);

-- Knowledge extraction (decisions, patterns, errors)
CREATE TABLE knowledge (
    id INTEGER PRIMARY KEY,
    message_id INTEGER REFERENCES messages(id),
    knowledge_type TEXT,      -- 'decision' | 'pattern' | 'error_solution'
    extracted_data JSON,
    created_at TIMESTAMP
);
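The `hash TEXT UNIQUE` constraint is what drives deduplication: a message is inserted only if its content hash has not been seen. A minimal stdlib sketch of that behavior (using Rust's `DefaultHasher` purely as a dependency-free stand-in for the SHA-256 hash the real pipeline stores):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Stand-in for the SHA-256 content hash stored in the `hash` column.
/// DefaultHasher is not cryptographic; it is used here only to keep
/// the sketch dependency-free.
fn content_hash(content: &str) -> String {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Keep a message only if its content hash is unseen, mirroring the
/// effect of the UNIQUE constraint on insert.
fn dedupe<'a>(messages: &[&'a str]) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut unique = Vec::new();
    for m in messages {
        if seen.insert(content_hash(m)) {
            unique.push(*m);
        }
    }
    unique
}

fn main() {
    let msgs = ["fix the build", "fix the build", "ship it"];
    let unique = dedupe(&msgs);
    println!("{} unique of {}", unique.len(), msgs.len()); // 2 unique of 3
}
```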

Current Workflow

┌────────────────────────────────────────────────────────────┐
│ User Session (Claude Code)                                 │
├────────────────────────────────────────────────────────────┤
│ /cx     → Extract messages → Deduplicate → SQLite          │
│ /cxq    → Query FTS5 index → Return results                │
│ /recall → Knowledge retrieval → Context injection          │
└────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────┐
│ GCP Backup (scripts/backup-context-db.sh)                  │
├────────────────────────────────────────────────────────────┤
│ SQLite .backup snapshot → gzip compression                 │
│ Upload to gs://coditect-cloud-infra-context-backups        │
│ 67% compression (720MB → 235MB)                            │
└────────────────────────────────────────────────────────────┘

Limitations

  1. Single-User Design: No tenant isolation
  2. Local Storage: Not shareable across machines
  3. Write Conflicts: Concurrent sessions corrupt database
  4. Limited Scale: Single-writer design with practical single-machine limits on size and concurrency
  5. No Real-Time Sync: Changes not propagated to other sessions

Scaling Challenges

Multi-User Scenarios

| Scenario              | Challenge                                | Impact                                |
|-----------------------|------------------------------------------|---------------------------------------|
| Team Context          | Shared knowledge base across 10-50 users | Need tenant isolation + shared spaces |
| Parallel Sessions     | 5+ Claude Code sessions simultaneously   | Write conflicts, data loss            |
| Organization Memory   | 1000s of users in enterprise             | Need hierarchical key design          |
| Cross-Project Context | Context relevant across projects         | Need flexible key spaces              |

Scale Targets (Phase 7)

| Metric              | Current | Target    | Factor |
|---------------------|---------|-----------|--------|
| Users               | 1       | 1,000,000 | 1M×    |
| Organizations       | 1       | 50,000    | 50K×   |
| Messages            | 73K     | 10B       | 137K×  |
| Concurrent Sessions | 1       | 100,000   | 100K×  |
| Read Latency (p99)  | ~50ms   | <10ms     | 5×     |
| Write Throughput    | ~100/s  | 1M/s      | 10K×   |

FoundationDB Architecture

Why FoundationDB

CODITECT already uses FoundationDB in production for the cloud IDE (coditect.ai):

Current FDB Deployment (from IDE analysis):

  • GKE StatefulSet: 5 pods (3 coordinators + 2 proxies)
  • Version: FoundationDB 7.1+
  • Key space: /az1ai-ide/sessions/, /az1ai-ide/files/, /az1ai-ide/settings/

FoundationDB Advantages:

  1. ACID Transactions: Serializable isolation across distributed operations
  2. Multi-Tenant Native: Record Layer designed for massive multi-tenancy
  3. Sub-10ms Latency: Production-proven performance at Apple scale
  4. Horizontal Scale: Linear scaling to millions of operations/second
  5. Strong Consistency: No eventual consistency complexity
  6. Open Source: Apache 2.0 license, active community

Cluster Topology (Target)

Phase 6 (10K users):

FoundationDB Cluster
├── Coordinators: 5 (odd number for consensus)
├── Storage: 9 processes (3× replication factor)
├── Transaction Log: 3 processes
└── Proxies: 5 processes

Phase 7 (50K+ users):

FoundationDB Cluster (from diagrams/phase-7-enterprise-scale/README.md)
├── Storage Nodes: 15
├── Transaction Nodes: 7
├── Total Processes: 22+
└── Replication: Triple redundancy

Multi-Tenant Key Design

Hierarchical Key Structure

Using FoundationDB Directory Layer and Record Layer patterns:

/coditect/
├── context/                          # Context memory subsystem
│   ├── tenants/
│   │   └── {tenant_id}/              # Organization-level isolation
│   │       ├── metadata              # Tenant config, quotas
│   │       ├── users/
│   │       │   └── {user_id}/
│   │       │       ├── sessions/
│   │       │       │   └── {session_id}/
│   │       │       │       ├── messages/
│   │       │       │       │   └── {timestamp}_{hash}  # Message data
│   │       │       │       └── state/                  # Session state
│   │       │       └── preferences/                    # User settings
│   │       ├── shared/               # Team-shared context
│   │       │   ├── decisions/        # Architectural decisions
│   │       │   ├── patterns/         # Code patterns
│   │       │   └── errors/           # Error solutions
│   │       └── projects/
│   │           └── {project_id}/
│   │               ├── messages/     # Project-specific context
│   │               └── knowledge/    # Project knowledge base
│   │
│   └── global/                       # Cross-tenant (future: marketplace)
│       └── public_patterns/          # Community-shared patterns
│
└── events/                           # Event sourcing log
    └── {tenant_id}/
        └── {timestamp}_{event_id}    # Immutable event log

Key Design Principles

  1. Tenant Prefix First: All keys start with tenant_id for isolation
  2. Timestamp-Ordered: Natural time-series ordering for context
  3. Hash-Based Dedup: Content hash prevents duplicates
  4. Hierarchical Access: Fine-grained permissions at each level
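Principles 1 and 2 work because of how keys sort: a big-endian integer sorts lexicographically in byte order, so a range scan over one tenant's prefix returns messages in time order. A minimal stdlib sketch of this ordering property (the real implementation uses the FDB tuple layer, which gives the same guarantee; the manual encoding here is purely illustrative):

```rust
/// Build a key as (tenant prefix, big-endian timestamp, content hash).
/// Big-endian integers sort correctly as raw bytes, so messages within
/// a tenant come back in time order from a range scan.
fn message_key(tenant_id: &str, timestamp_micros: u64, hash: &str) -> Vec<u8> {
    let mut key = Vec::new();
    key.extend_from_slice(tenant_id.as_bytes());
    key.push(0x00); // separator so "acme" never prefixes into "acme2"
    key.extend_from_slice(&timestamp_micros.to_be_bytes());
    key.extend_from_slice(hash.as_bytes());
    key
}

fn main() {
    let earlier = message_key("acme", 1_000, "aaa");
    let later = message_key("acme", 2_000, "bbb");
    let other_tenant = message_key("beta", 1_000, "aaa");
    assert!(earlier < later);      // time-ordered within a tenant
    assert!(later < other_tenant); // one tenant's keys sort contiguously
    println!("ordering holds");
}
```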

Rust Key Types

use foundationdb::tuple::Subspace;

pub struct ContextKeys {
    root: Subspace,
}

impl ContextKeys {
    pub fn new() -> Self {
        Self {
            root: Subspace::all().subspace(&("coditect", "context")),
        }
    }

    pub fn message_key(
        &self,
        tenant_id: &str,
        user_id: &str,
        session_id: &str,
        timestamp: i64,
        hash: &str,
    ) -> Vec<u8> {
        self.root
            .subspace(&("tenants", tenant_id, "users", user_id, "sessions", session_id, "messages"))
            .pack(&(timestamp, hash))
    }

    pub fn shared_decision_key(
        &self,
        tenant_id: &str,
        decision_id: &str,
    ) -> Vec<u8> {
        self.root
            .subspace(&("tenants", tenant_id, "shared", "decisions"))
            .pack(&decision_id)
    }

    pub fn tenant_range(&self, tenant_id: &str) -> (Vec<u8>, Vec<u8>) {
        let subspace = self.root.subspace(&("tenants", tenant_id));
        subspace.range()
    }
}

Data Model

Message Schema (Protobuf/Serde)

use serde::{Deserialize, Serialize};
use chrono::{DateTime, Utc};

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ContextMessage {
    /// Content hash (SHA-256)
    pub hash: String,

    /// Message content
    pub content: String,

    /// Role: user | assistant | system
    pub role: String,

    /// Provenance tracking
    pub provenance: Provenance,

    /// Timestamps
    pub occurred_at: DateTime<Utc>,
    pub indexed_at: DateTime<Utc>,

    /// Extracted metadata
    pub metadata: MessageMetadata,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Provenance {
    pub tenant_id: String,
    pub user_id: String,
    pub session_id: String,
    pub source_type: String, // "claude-code" | "export" | "api"
    pub source_file: Option<String>,
    pub source_line: Option<u32>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MessageMetadata {
    pub content_length: usize,
    pub has_code: bool,
    pub has_markdown: bool,
    pub language_hints: Vec<String>,
    pub topics: Vec<String>,
}
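A sketch of how a subset of the `MessageMetadata` fields could be derived from raw content. The detection heuristics below (indented lines for code, heading/list markers for markdown) are illustrative assumptions, not the production extraction rules:

```rust
/// Simplified metadata struct covering three of the schema's fields.
#[derive(Debug)]
struct MessageMetadata {
    content_length: usize,
    has_code: bool,
    has_markdown: bool,
}

/// Derive metadata from message content. The heuristics are hypothetical
/// stand-ins: indentation as a code signal, heading/list markers as a
/// markdown signal.
fn extract_metadata(content: &str) -> MessageMetadata {
    MessageMetadata {
        content_length: content.len(),
        has_code: content.lines().any(|l| l.starts_with("    ")),
        has_markdown: content
            .lines()
            .any(|l| l.starts_with('#') || l.starts_with("- ")),
    }
}

fn main() {
    let m = extract_metadata("# Notes\n    let x = 1;\n- done");
    assert!(m.has_code && m.has_markdown);
    println!("len = {}", m.content_length);
}
```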

Knowledge Extraction Schema

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum Knowledge {
    Decision {
        id: String,
        title: String,
        context: String,
        decision: String,
        rationale: String,
        source_messages: Vec<String>, // Message hashes
        created_at: DateTime<Utc>,
    },

    Pattern {
        id: String,
        name: String,
        description: String,
        code_example: Option<String>,
        use_cases: Vec<String>,
        source_messages: Vec<String>,
        created_at: DateTime<Utc>,
    },

    ErrorSolution {
        id: String,
        error_signature: String,
        solution: String,
        steps: Vec<String>,
        source_messages: Vec<String>,
        created_at: DateTime<Utc>,
    },
}

Event Sourcing Pattern

Why Event Sourcing

For multi-user context sync, event sourcing provides:

  1. Conflict Resolution: Events are immutable, conflicts resolved by ordering
  2. Audit Trail: Complete history of all changes
  3. Time Travel: Reconstruct state at any point
  4. Decoupling: Writers and readers operate independently
  5. Scalability: Append-only is highly parallelizable
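The conflict-resolution property is worth making concrete: because events are immutable and totally ordered by timestamp, every reader converges on the same state no matter which session's writes it observed first. A minimal stdlib sketch with simplified event and state shapes:

```rust
/// Simplified event: just an ordering key and a message hash.
#[derive(Clone)]
struct Event {
    timestamp_micros: u64,
    message_hash: String,
}

/// Rebuild state by replaying events in timestamp order. Because the
/// order is derived from the events themselves, the delivery order
/// does not matter: all readers converge on the same state.
fn replay(mut events: Vec<Event>) -> Vec<String> {
    events.sort_by_key(|e| e.timestamp_micros);
    let mut state = Vec::new();
    for e in events {
        if !state.contains(&e.message_hash) {
            state.push(e.message_hash);
        }
    }
    state
}

fn main() {
    // The same two events, delivered in different orders to two readers.
    let reader_a = vec![
        Event { timestamp_micros: 2, message_hash: "m2".into() },
        Event { timestamp_micros: 1, message_hash: "m1".into() },
    ];
    let reader_b = vec![
        Event { timestamp_micros: 1, message_hash: "m1".into() },
        Event { timestamp_micros: 2, message_hash: "m2".into() },
    ];
    assert_eq!(replay(reader_a), replay(reader_b)); // both converge
    println!("replay is order-independent");
}
```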

Event Types

#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "event_type")]
pub enum ContextEvent {
    /// New message indexed
    MessageIndexed {
        event_id: String,
        tenant_id: String,
        user_id: String,
        session_id: String,
        message_hash: String,
        message: ContextMessage,
        timestamp: DateTime<Utc>,
    },

    /// Knowledge extracted from messages
    KnowledgeExtracted {
        event_id: String,
        tenant_id: String,
        knowledge: Knowledge,
        source_message_hashes: Vec<String>,
        timestamp: DateTime<Utc>,
    },

    /// Knowledge shared with team
    KnowledgeShared {
        event_id: String,
        tenant_id: String,
        user_id: String,
        knowledge_id: String,
        shared_scope: SharedScope, // Team | Project | Public
        timestamp: DateTime<Utc>,
    },

    /// Session started
    SessionStarted {
        event_id: String,
        tenant_id: String,
        user_id: String,
        session_id: String,
        project_path: String,
        timestamp: DateTime<Utc>,
    },

    /// Session ended
    SessionEnded {
        event_id: String,
        tenant_id: String,
        user_id: String,
        session_id: String,
        message_count: u32,
        timestamp: DateTime<Utc>,
    },
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum SharedScope {
    User,    // Private to user
    Team,    // Shared within organization
    Project, // Shared within project
    Public,  // Public marketplace (future)
}

Event Storage in FDB

use chrono::{DateTime, Utc};
use foundationdb::{Database, FdbBindingError, RangeOption};

/// Append one immutable event to the tenant's log.
/// Sketch: assumes accessor helpers (tenant_id(), timestamp(), event_id())
/// on ContextEvent; a production version would use the tuple layer rather
/// than formatted string keys.
pub async fn append_event(
    db: &Database,
    event: &ContextEvent,
) -> Result<(), FdbBindingError> {
    let key = format!(
        "/coditect/events/{}/{}_{}",
        event.tenant_id(),
        event.timestamp().timestamp_micros(),
        event.event_id(),
    );
    // Serialize once, outside the retry loop.
    let value = serde_json::to_vec(event).expect("event serializes");

    db.run(|trx, _maybe_committed| {
        let key = key.clone();
        let value = value.clone();
        async move {
            trx.set(key.as_bytes(), &value);
            Ok(())
        }
    })
    .await
}

/// Replay all events for a tenant since a given time.
pub async fn replay_events(
    db: &Database,
    tenant_id: &str,
    since: DateTime<Utc>,
) -> Result<Vec<ContextEvent>, FdbBindingError> {
    let start_key = format!(
        "/coditect/events/{}/{}",
        tenant_id,
        since.timestamp_micros()
    );
    // 0xff never appears in UTF-8 text, so it upper-bounds the tenant's
    // range. (Built as bytes: \xff is not a valid escape in a &str literal.)
    let mut end_key = format!("/coditect/events/{}/", tenant_id).into_bytes();
    end_key.push(0xff);

    db.run(|trx, _maybe_committed| {
        let start_key = start_key.clone();
        let end_key = end_key.clone();
        async move {
            // A single get_range call is bounded by transaction limits;
            // large replays need pagination.
            let range = trx
                .get_range(&RangeOption::from((start_key.into_bytes(), end_key)), 1, false)
                .await?;

            range
                .iter()
                .map(|kv| {
                    serde_json::from_slice(kv.value())
                        .map_err(|e| FdbBindingError::new_custom_error(Box::new(e)))
                })
                .collect()
        }
    })
    .await
}

Session Synchronization

Real-Time Sync Architecture

┌────────────────────────────────────────────────────────────┐
│ Session A (User 1)              Session B (User 1)         │
│   ↓ index message                 ↓ index message          │
├────────────────────────────────────────────────────────────┤
│ Event Bus (FDB Watches)                                    │
│ ┌────────────────────────────────────────────────────┐     │
│ │ /coditect/events/{tenant}/...                      │     │
│ │ ├── 1734345600000000_evt1 (Session A indexed msg)  │     │
│ │ ├── 1734345600001000_evt2 (Session B indexed msg)  │     │
│ │ └── ...                                            │     │
│ └────────────────────────────────────────────────────┘     │
│                              ↓                             │
│                       Event Processor                      │
│                              ↓                             │
├────────────────────────────────────────────────────────────┤
│ Materialized Views (per session)                           │
│ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│ │ Session A   │  │ Session B   │  │ Session C   │          │
│ │ local view  │  │ local view  │  │ local view  │          │
│ └─────────────┘  └─────────────┘  └─────────────┘          │
└────────────────────────────────────────────────────────────┘

FDB Watch API for Real-Time

use chrono::{DateTime, Utc};
use foundationdb::{Database, FdbBindingError};

/// Watch for new events in the tenant's event log.
/// Sketch only: assumes writers also bump a per-tenant "latest" key so a
/// single watch covers the whole log, and reuses replay_events from above.
pub async fn watch_events(
    db: &Database,
    tenant_id: &str,
    mut last_seen: DateTime<Utc>,
    callback: impl Fn(ContextEvent),
) -> Result<(), FdbBindingError> {
    let watch_key = format!("/coditect/events/{}/latest", tenant_id);

    loop {
        // Register the watch inside a transaction; it fires after commit
        // whenever the key changes.
        let watch = db
            .run(|trx, _maybe_committed| {
                let watch_key = watch_key.clone();
                async move { Ok(trx.watch(watch_key.as_bytes())) }
            })
            .await?;

        // Process any events that arrived since we last looked.
        let events = replay_events(db, tenant_id, last_seen).await?;
        for event in events {
            // Advance the cursor so events are not re-delivered.
            last_seen = last_seen.max(event.timestamp());
            callback(event);
        }

        // Block until the "latest" key changes again.
        watch.await?;
    }
}

Migration Strategy

Phase 1: Dual-Write (Weeks 1-4)

┌────────────────────────────────────────────────────────────┐
│ /cx Command (modified)                                     │
├────────────────────────────────────────────────────────────┤
│ 1. Extract messages (existing)                             │
│ 2. Write to SQLite (existing)                              │
│ 3. Write to FDB (new)   ← dual-write                       │
│ 4. Verify consistency                                      │
└────────────────────────────────────────────────────────────┘

Implementation:

  • Add FDB client to context extraction scripts
  • Write every message to both SQLite and FDB
  • Log any discrepancies for debugging
  • SQLite remains primary (read path unchanged)

Phase 2: Shadow Read (Weeks 5-8)

┌────────────────────────────────────────────────────────────┐
│ /cxq Command (modified)                                    │
├────────────────────────────────────────────────────────────┤
│ 1. Query SQLite (primary)                                  │
│ 2. Query FDB (shadow)   ← parallel                         │
│ 3. Compare results                                         │
│ 4. Return SQLite results                                   │
│ 5. Log differences                                         │
└────────────────────────────────────────────────────────────┘

Implementation:

  • Add FDB read path in parallel with SQLite
  • Compare results for consistency validation
  • Measure FDB latency vs SQLite
  • Build confidence in FDB correctness
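The comparison step can be sketched as a pure function over the two result sets: return the primary's results unchanged and collect any divergence for logging (result types are simplified to message hashes; the `shadow_read` name is illustrative):

```rust
/// Compare primary (SQLite) and shadow (FDB) query results.
/// Returns the primary results untouched, plus the set of hashes that
/// appear in only one store — these are logged, never shown to the user.
fn shadow_read(
    sqlite_results: Vec<String>,
    fdb_results: Vec<String>,
) -> (Vec<String>, Vec<String>) {
    let diffs: Vec<String> = sqlite_results
        .iter()
        .filter(|h| !fdb_results.contains(h))
        .chain(fdb_results.iter().filter(|h| !sqlite_results.contains(h)))
        .cloned()
        .collect();
    // SQLite stays authoritative during this phase.
    (sqlite_results, diffs)
}

fn main() {
    let (results, diffs) = shadow_read(
        vec!["m1".into(), "m2".into()],
        vec!["m1".into()],
    );
    assert_eq!(results.len(), 2);              // primary results unchanged
    assert_eq!(diffs, vec!["m2".to_string()]); // FDB missing m2 → logged
    println!("shadow diffs: {:?}", diffs);
}
```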

Phase 3: FDB Primary (Weeks 9-12)

┌────────────────────────────────────────────────────────────┐
│ /cx and /cxq Commands                                      │
├────────────────────────────────────────────────────────────┤
│ FDB is now primary                                         │
│ SQLite kept as backup (reads on FDB failure)               │
│ GCP backup updated to export FDB snapshot                  │
└────────────────────────────────────────────────────────────┘

Phase 4: Multi-Tenant (Weeks 13-16)

  • Enable tenant isolation in FDB
  • Add user/session tracking
  • Implement event sourcing
  • Add real-time sync via watches

Performance Projections

Latency Targets

| Operation              | SQLite (Current) | FDB (Target) | Notes                  |
|------------------------|------------------|--------------|------------------------|
| Single message read    | ~1ms             | <5ms         | FDB adds network hop   |
| FTS query (10 results) | ~50ms            | <20ms        | FDB secondary indexes  |
| Message write          | ~2ms             | <15ms        | FDB with fsync         |
| Batch write (100 msgs) | ~100ms           | <50ms        | FDB batch transactions |
| Knowledge extraction   | ~200ms           | <100ms       | Parallelized           |
Knowledge extraction~200ms<100msParallelized

Throughput Projections

| Scale           | Messages/Day | FDB Cluster Size | Est. Cost/Month |
|-----------------|--------------|------------------|-----------------|
| 100 users       | 100K         | 5 nodes          | ~$200           |
| 1,000 users     | 1M           | 9 nodes          | ~$500           |
| 10,000 users    | 10M          | 15 nodes         | ~$1,500         |
| 100,000 users   | 100M         | 25 nodes         | ~$4,000         |
| 1,000,000 users | 1B           | 50 nodes         | ~$10,000        |

Storage Projections

| Scale           | Messages | Raw Size | With Compression | Est. Cost/Month |
|-----------------|----------|----------|------------------|-----------------|
| 100 users       | 1M       | 5GB      | 1.5GB            | ~$0.50          |
| 1,000 users     | 10M      | 50GB     | 15GB             | ~$5             |
| 10,000 users    | 100M     | 500GB    | 150GB            | ~$50            |
| 100,000 users   | 1B       | 5TB      | 1.5TB            | ~$500           |
| 1,000,000 users | 10B      | 50TB     | 15TB             | ~$5,000         |
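The storage rows are consistent with roughly 5KB raw per message and keeping ~30% after compression (close to the 67% reduction measured for the SQLite backups). A quick check of that arithmetic, with the per-message size inferred from the table:

```rust
/// Project raw and compressed storage for a given message count,
/// assuming ~5KiB raw per message and a 70% compression ratio (both
/// inferred from the projections, not measured values).
fn projected_bytes(messages: u64) -> (u64, u64) {
    const RAW_PER_MSG: u64 = 5 * 1024;
    let raw = messages * RAW_PER_MSG;
    let compressed = raw * 3 / 10; // keep ~30% after compression
    (raw, compressed)
}

fn main() {
    // 10B messages (the 1M-user row): ~50TB raw, ~15TB compressed,
    // matching the table up to KB-vs-KiB rounding.
    let (raw, compressed) = projected_bytes(10_000_000_000);
    println!(
        "raw = {:.1} TB, compressed = {:.1} TB",
        raw as f64 / 1e12,
        compressed as f64 / 1e12
    );
}
```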

Implementation Roadmap

Sprint 1: FDB Client Integration (Week 1-2)

  • Add foundationdb Rust crate to backend
  • Create ContextKeys utility for key generation
  • Implement basic read/write operations
  • Add unit tests for FDB operations

Sprint 2: Dual-Write Path (Week 3-4)

  • Modify unified-message-extractor.py for dual-write
  • Add FDB write to context extraction
  • Implement consistency verification
  • Create monitoring dashboard

Sprint 3: Shadow Read Path (Week 5-6)

  • Add FDB query path to /cxq
  • Implement parallel query execution
  • Add result comparison logging
  • Performance benchmarking

Sprint 4: Event Sourcing (Week 7-8)

  • Implement ContextEvent types
  • Create event log storage in FDB
  • Add event processor service
  • Implement replay_events function

Sprint 5: Multi-Tenant (Week 9-10)

  • Add tenant isolation to key design
  • Implement tenant creation/management
  • Add quota enforcement per tenant
  • Test cross-tenant isolation

Sprint 6: Real-Time Sync (Week 11-12)

  • Implement FDB watch API integration
  • Create session sync service
  • Add WebSocket bridge for Claude Code
  • End-to-end multi-session testing

Sprint 7: Cutover (Week 13-14)

  • FDB becomes primary
  • Update backup scripts for FDB
  • Migrate existing SQLite data
  • Update documentation

Sprint 8: Production Hardening (Week 15-16)

  • Performance optimization
  • Chaos testing (node failures)
  • Security audit
  • Documentation finalization

CODITECT Ecosystem

FoundationDB Resources

Multi-Tenant Best Practices


Appendix: Current Backup Script

The existing backup script (scripts/backup-context-db.sh) will be updated to support FDB:

Current Features:

  • SQLite .backup snapshot (parallel-session safe)
  • Gzip compression (67% reduction)
  • GCP Cloud Storage upload
  • 90-day retention policy

FDB Additions (Phase 3+):

  • FDB snapshot export using fdbbackup tool
  • Incremental backups for large datasets
  • Point-in-time recovery support

Last Updated: 2025-12-16
Maintained By: CODITECT Architecture Team
Next Review: After Sprint 2 completion