Multi-Tenant Context Architecture with FoundationDB
Document Version: 1.0
Date: 2025-12-16
Status: DRAFT (Architecture Planning)
Author: Architecture Team
Purpose: Scalable context memory system for 1M+ users across organizations
Executive Summary
This document outlines the architecture for evolving CODITECT's context memory system from a single-user SQLite database to a multi-tenant, multi-user system supporting 1M+ users across thousands of organizations using FoundationDB as the persistence layer.
Current State:
- SQLite-based context database (~600MB)
- 73,000+ unique messages with deduplication
- Single-user, single-session design
- GCP Cloud Storage backup with snapshot + compression
Target State:
- FoundationDB-based distributed context store
- Multi-tenant isolation with per-organization key prefixes
- Real-time sync across multiple sessions and users
- Event-sourced architecture for conflict resolution
- Scale to 1M users with sub-10ms read latency
Table of Contents
- Current Architecture Analysis
- Scaling Challenges
- FoundationDB Architecture
- Multi-Tenant Key Design
- Data Model
- Event Sourcing Pattern
- Session Synchronization
- Migration Strategy
- Performance Projections
- Implementation Roadmap
Current Architecture Analysis
SQLite Context Store
Location: context-storage/context.db
Size: ~600MB (growing)
Schema:
-- Messages table (primary store)
CREATE TABLE messages (
id INTEGER PRIMARY KEY,
hash TEXT UNIQUE NOT NULL,
content TEXT NOT NULL,
role TEXT NOT NULL, -- 'user' | 'assistant'
source_type TEXT, -- 'jsonl' | 'export'
source_file TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- FTS5 full-text search index
CREATE VIRTUAL TABLE messages_fts USING fts5(
content,
content='messages',
content_rowid='id'
);
-- Knowledge extraction (decisions, patterns, errors)
CREATE TABLE knowledge (
id INTEGER PRIMARY KEY,
message_id INTEGER REFERENCES messages(id),
knowledge_type TEXT, -- 'decision' | 'pattern' | 'error_solution'
extracted_data JSON,
created_at TIMESTAMP
);
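The hash UNIQUE constraint is what makes deduplication a single insert. A minimal Python sketch of the indexing path (the role-in-hash choice and the helper names are illustrative, not the shipped extractor):

```python
import hashlib
import sqlite3

def message_hash(content: str, role: str) -> str:
    """Stable content hash used as the dedup key. Including the role is an
    assumption here, so identical text from user vs assistant stays distinct."""
    return hashlib.sha256(f"{role}:{content}".encode("utf-8")).hexdigest()

def index_message(conn: sqlite3.Connection, content: str, role: str) -> bool:
    """Insert a message; returns False if it was already indexed."""
    h = message_hash(content, role)
    try:
        conn.execute(
            "INSERT INTO messages (hash, content, role) VALUES (?, ?, ?)",
            (h, content, role),
        )
        return True
    except sqlite3.IntegrityError:  # UNIQUE constraint on hash fired
        return False
```

The UNIQUE constraint does the heavy lifting: the extractor never needs a separate existence check, so concurrent inserts of the same message degrade to a cheap rejected write.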
Current Workflow
┌────────────────────────────────────────────────────────────┐
│ User Session (Claude Code) │
├────────────────────────────────────────────────────────────┤
│ /cx → Extract messages → Deduplicate → SQLite │
│ /cxq → Query FTS5 index → Return results │
│ /recall → Knowledge retrieval → Context injection │
└────────────────────────────────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────────┐
│ GCP Backup (scripts/backup-context-db.sh) │
├────────────────────────────────────────────────────────────┤
│ SQLite .backup snapshot → gzip compression │
│ Upload to gs://coditect-cloud-infra-context-backups │
│ 67% compression (720MB → 235MB) │
└────────────────────────────────────────────────────────────┘
Limitations
- Single-User Design: No tenant isolation
- Local Storage: Not shareable across machines
- Write Conflicts: concurrent sessions can corrupt the database
- Limited Scale: SQLite is single-writer; file-level locking caps practical concurrency and database size well below target scale
- No Real-Time Sync: Changes not propagated to other sessions
Scaling Challenges
Multi-User Scenarios
| Scenario | Challenge | Impact |
|---|---|---|
| Team Context | Shared knowledge base across 10-50 users | Need tenant isolation + shared spaces |
| Parallel Sessions | 5+ Claude Code sessions simultaneously | Write conflicts, data loss |
| Organization Memory | 1000s of users in enterprise | Need hierarchical key design |
| Cross-Project Context | Context relevant across projects | Need flexible key spaces |
Scale Targets (Phase 7)
| Metric | Current | Target | Factor |
|---|---|---|---|
| Users | 1 | 1,000,000 | 1M× |
| Organizations | 1 | 50,000 | 50K× |
| Messages | 73K | 10B | 137K× |
| Concurrent Sessions | 1 | 100,000 | 100K× |
| Read Latency (p99) | ~50ms | <10ms | 5× |
| Write Throughput | ~100/s | 1M/s | 10K× |
FoundationDB Architecture
Why FoundationDB
CODITECT already uses FoundationDB in production for the cloud IDE (coditect.ai):
Current FDB Deployment (from IDE analysis):
- GKE StatefulSet: 5 pods (3 coordinators + 2 proxies)
- Version: FoundationDB 7.1+
- Key space:
/az1ai-ide/sessions/, /az1ai-ide/files/, /az1ai-ide/settings/
FoundationDB Advantages:
- ACID Transactions: Serializable isolation across distributed operations
- Multi-Tenant Native: Record Layer designed for massive multi-tenancy
- Sub-10ms Latency: Production-proven performance at Apple scale
- Horizontal Scale: Linear scaling to millions of operations/second
- Strong Consistency: No eventual consistency complexity
- Open Source: Apache 2.0 license, active community
Cluster Topology (Target)
Phase 6 (10K users):
FoundationDB Cluster
├── Coordinators: 5 (odd number for consensus)
├── Storage: 9 processes (3× replication factor)
├── Transaction Log: 3 processes
└── Proxies: 5 processes
Phase 7 (50K+ users):
FoundationDB Cluster (from diagrams/phase-7-enterprise-scale/README.md)
├── Storage Nodes: 15
├── Transaction Nodes: 7
├── Total Processes: 22+
└── Replication: Triple redundancy
Multi-Tenant Key Design
Hierarchical Key Structure
Using FoundationDB Directory Layer and Record Layer patterns:
/coditect/
├── context/ # Context memory subsystem
│ ├── tenants/
│ │ └── {tenant_id}/ # Organization-level isolation
│ │ ├── metadata # Tenant config, quotas
│ │ ├── users/
│ │ │ └── {user_id}/
│ │ │ ├── sessions/
│ │ │ │ └── {session_id}/
│ │ │ │ ├── messages/
│ │ │ │ │ └── {timestamp}_{hash} # Message data
│ │ │ │ └── state/ # Session state
│ │ │ └── preferences/ # User settings
│ │ ├── shared/ # Team-shared context
│ │ │ ├── decisions/ # Architectural decisions
│ │ │ ├── patterns/ # Code patterns
│ │ │ └── errors/ # Error solutions
│ │ └── projects/
│ │ └── {project_id}/
│ │ ├── messages/ # Project-specific context
│ │ └── knowledge/ # Project knowledge base
│ │
│ └── global/ # Cross-tenant (future: marketplace)
│ └── public_patterns/ # Community-shared patterns
│
└── events/ # Event sourcing log
└── {tenant_id}/
└── {timestamp}_{event_id} # Immutable event log
Key Design Principles
- Tenant Prefix First: All keys start with tenant_id for isolation
- Timestamp-Ordered: Natural time-series ordering for context
- Hash-Based Dedup: Content hash prevents duplicates
- Hierarchical Access: Fine-grained permissions at each level
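The timestamp-ordered principle depends on an encoding where byte order equals numeric order, which is what FDB's tuple layer provides. A simplified Python stand-in (length-prefixed path elements, big-endian timestamp; not the real tuple encoding) shows the property:

```python
import struct

def message_key(tenant_id: str, user_id: str, session_id: str,
                timestamp_us: int, content_hash: str) -> bytes:
    """Simplified stand-in for FDB tuple-layer packing: each path element is
    length-prefixed, and the (non-negative) timestamp is packed big-endian so
    lexicographic byte order matches numeric order."""
    parts = [tenant_id.encode(), user_id.encode(), session_id.encode()]
    prefix = b"".join(struct.pack(">H", len(p)) + p for p in parts)
    return prefix + struct.pack(">q", timestamp_us) + content_hash.encode()
```

Because keys sort by timestamp within a session, a range read over the session's message subspace is already a time-ordered scan; no secondary index is needed for chronology.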
Rust Key Types
use foundationdb::tuple::Subspace;
pub struct ContextKeys {
root: Subspace,
}
impl ContextKeys {
pub fn new() -> Self {
Self {
root: Subspace::all().subspace(&("coditect", "context")),
}
}
pub fn message_key(
&self,
tenant_id: &str,
user_id: &str,
session_id: &str,
timestamp: i64,
hash: &str,
) -> Vec<u8> {
self.root
.subspace(&("tenants", tenant_id, "users", user_id, "sessions", session_id, "messages"))
.pack(&(timestamp, hash))
}
pub fn shared_decision_key(
&self,
tenant_id: &str,
decision_id: &str,
) -> Vec<u8> {
self.root
.subspace(&("tenants", tenant_id, "shared", "decisions"))
.pack(&decision_id)
}
pub fn tenant_range(&self, tenant_id: &str) -> (Vec<u8>, Vec<u8>) {
let subspace = self.root.subspace(&("tenants", tenant_id));
subspace.range()
}
}
Data Model
Message Schema (Protobuf/Serde)
use serde::{Deserialize, Serialize};
use chrono::{DateTime, Utc};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ContextMessage {
/// Content hash (SHA-256)
pub hash: String,
/// Message content
pub content: String,
/// Role: user | assistant | system
pub role: String,
/// Provenance tracking
pub provenance: Provenance,
/// Timestamps
pub occurred_at: DateTime<Utc>,
pub indexed_at: DateTime<Utc>,
/// Extracted metadata
pub metadata: MessageMetadata,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Provenance {
pub tenant_id: String,
pub user_id: String,
pub session_id: String,
pub source_type: String, // "claude-code" | "export" | "api"
pub source_file: Option<String>,
pub source_line: Option<u32>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct MessageMetadata {
pub content_length: usize,
pub has_code: bool,
pub has_markdown: bool,
pub language_hints: Vec<String>,
pub topics: Vec<String>,
}
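A hedged sketch of how MessageMetadata might be populated during extraction; the fence regex and markdown heuristics below are illustrative choices, not the shipped rules:

```python
import re
from dataclasses import dataclass

@dataclass
class MessageMetadata:
    content_length: int
    has_code: bool
    has_markdown: bool
    language_hints: list

def extract_metadata(content: str) -> MessageMetadata:
    """Heuristic extraction: code fences signal code, fence info strings
    become language hints, and a few common markers signal markdown."""
    fences = re.findall(r"```(\w*)", content)  # openers carry a language, closers are ""
    return MessageMetadata(
        content_length=len(content),
        has_code=bool(fences),
        has_markdown=bool(re.search(r"^#{1,6} |\*\*|\[.+\]\(.+\)", content, re.M)),
        language_hints=sorted({f for f in fences if f}),
    )
```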
Knowledge Extraction Schema
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "type")]
pub enum Knowledge {
Decision {
id: String,
title: String,
context: String,
decision: String,
rationale: String,
source_messages: Vec<String>, // Message hashes
created_at: DateTime<Utc>,
},
Pattern {
id: String,
name: String,
description: String,
code_example: Option<String>,
use_cases: Vec<String>,
source_messages: Vec<String>,
created_at: DateTime<Utc>,
},
ErrorSolution {
id: String,
error_signature: String,
solution: String,
steps: Vec<String>,
source_messages: Vec<String>,
created_at: DateTime<Utc>,
},
}
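serde's #[serde(tag = "type")] stores the variant name in a "type" field alongside the variant's own fields. A small Python sketch of the same wire shape (the helper names are hypothetical):

```python
import json

def knowledge_to_json(kind: str, **fields) -> str:
    """Mirror serde's internally-tagged representation: the variant name
    travels in a `type` field next to the variant's payload."""
    return json.dumps({"type": kind, **fields})

def knowledge_from_json(raw: str):
    """Split the tag back out; returns (variant_name, payload_fields)."""
    obj = json.loads(raw)
    kind = obj.pop("type")
    return kind, obj
```

This matters for cross-language consumers: the Python extraction pipeline and the Rust backend agree on one self-describing JSON shape per Knowledge variant.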
Event Sourcing Pattern
Why Event Sourcing
For multi-user context sync, event sourcing provides:
- Conflict Resolution: Events are immutable, conflicts resolved by ordering
- Audit Trail: Complete history of all changes
- Time Travel: Reconstruct state at any point
- Decoupling: Writers and readers operate independently
- Scalability: Append-only is highly parallelizable
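Conflict resolution by ordering can be shown concretely: replaying events sorted by (timestamp, event_id) yields a deterministic materialized state regardless of arrival order. A minimal Python sketch with dict-based state and illustrative field names:

```python
def rebuild_index(events):
    """Replay an immutable event log into a materialized message index.
    Sorting by (timestamp, event_id) makes replay deterministic, which is
    how concurrent writers are reconciled without locks: later events win."""
    state = {}
    for ev in sorted(events, key=lambda e: (e["timestamp"], e["event_id"])):
        if ev["event_type"] == "MessageIndexed":
            state[ev["message_hash"]] = ev["content"]
    return state
```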
Event Types
#[derive(Debug, Clone, Serialize, Deserialize)]
#[serde(tag = "event_type")]
pub enum ContextEvent {
/// New message indexed
MessageIndexed {
event_id: String,
tenant_id: String,
user_id: String,
session_id: String,
message_hash: String,
message: ContextMessage,
timestamp: DateTime<Utc>,
},
/// Knowledge extracted from messages
KnowledgeExtracted {
event_id: String,
tenant_id: String,
knowledge: Knowledge,
source_message_hashes: Vec<String>,
timestamp: DateTime<Utc>,
},
/// Knowledge shared with team
KnowledgeShared {
event_id: String,
tenant_id: String,
user_id: String,
knowledge_id: String,
shared_scope: SharedScope, // Team | Project | Public
timestamp: DateTime<Utc>,
},
/// Session started
SessionStarted {
event_id: String,
tenant_id: String,
user_id: String,
session_id: String,
project_path: String,
timestamp: DateTime<Utc>,
},
/// Session ended
SessionEnded {
event_id: String,
tenant_id: String,
user_id: String,
session_id: String,
message_count: u32,
timestamp: DateTime<Utc>,
},
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum SharedScope {
User, // Private to user
Team, // Shared within organization
Project, // Shared within project
Public, // Public marketplace (future)
}
Event Storage in FDB
use foundationdb::{Database, FdbError, RangeOption};

// NOTE: sketch code. Accessor helpers (`tenant_id()`, `timestamp()`,
// `event_id()`) are assumed on `ContextEvent`, and keys are shown as plain
// strings for readability; the real layout would use the tuple layer from
// `ContextKeys` above.
pub async fn append_event(
    db: &Database,
    event: &ContextEvent,
) -> Result<(), FdbError> {
    let key = format!(
        "/coditect/events/{}/{}_{}",
        event.tenant_id(),
        event.timestamp().timestamp_micros(),
        event.event_id()
    );
    // Serialize once, outside the retry closure.
    let value = serde_json::to_vec(event).expect("event serializes");
    db.run(|trx| {
        let (key, value) = (key.clone(), value.clone());
        async move {
            trx.set(key.as_bytes(), &value);
            Ok(())
        }
    })
    .await
}

pub async fn replay_events(
    db: &Database,
    tenant_id: &str,
    since: DateTime<Utc>,
) -> Result<Vec<ContextEvent>, FdbError> {
    let start_key = format!(
        "/coditect/events/{}/{}",
        tenant_id,
        since.timestamp_micros()
    );
    // "\xff" is not a valid escape in a Rust string literal; build the end
    // key as raw bytes so the range spans every key under the prefix.
    let mut end_key = format!("/coditect/events/{}/", tenant_id).into_bytes();
    end_key.push(0xff);
    db.run(|trx| {
        let (start, end) = (start_key.clone(), end_key.clone());
        async move {
            // Single-shot read; production code would page via `iteration`.
            let range = trx
                .get_range(&RangeOption::from((start.into_bytes(), end)), 1, false)
                .await?;
            range
                .iter()
                .map(|kv| Ok(serde_json::from_slice(kv.value()).expect("valid event")))
                .collect()
        }
    })
    .await
}
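The range-read boundaries deserve care: with string keys, timestamps only sort correctly if they have a fixed digit width, and the end key needs a byte that sorts after any printable suffix. A Python sketch of the key and range construction (the zero-padding is a defensive addition, not part of the key format above):

```python
def event_key(tenant_id: str, timestamp_us: int, event_id: str) -> bytes:
    # Zero-padded timestamp keeps string keys in numeric order even if
    # the digit count ever changes.
    return f"/coditect/events/{tenant_id}/{timestamp_us:020d}_{event_id}".encode()

def event_range(tenant_id: str, since_us: int) -> tuple:
    """Half-open byte range [start, end) covering every event at or after
    `since_us`; the trailing 0xff byte sorts after any printable suffix."""
    start = f"/coditect/events/{tenant_id}/{since_us:020d}".encode()
    end = f"/coditect/events/{tenant_id}/".encode() + b"\xff"
    return start, end
```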
Session Synchronization
Real-Time Sync Architecture
┌─────────────────────────────────────────────────────────────────┐
│ Session A (User 1) Session B (User 1) │
│ ↓ index message ↓ index message │
│ ↓ ↓ │
├─────────────────────────────────────────────────────────────────┤
│ Event Bus (FDB Watches) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ /coditect/events/{tenant}/... │ │
│ │ ├── 1734345600000000_evt1 (Session A indexed msg) │ │
│ │ ├── 1734345600001000_evt2 (Session B indexed msg) │ │
│ │ └── ... │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ↓ │
│ Event Processor │
│ ↓ │
├─────────────────────────────────────────────────────────────────┤
│ Materialized Views (per session) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Session A │ │ Session B │ │ Session C │ │
│ │ local view │ │ local view │ │ local view │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────┘
FDB Watch API for Real-Time
/// Watch for new events in the tenant's event log.
///
/// Assumes writers bump the `/latest` sentinel key in the same transaction
/// that appends the event; otherwise the watch never fires.
pub async fn watch_events(
    db: &Database,
    tenant_id: &str,
    mut last_seen: DateTime<Utc>,
    callback: impl Fn(ContextEvent),
) -> Result<(), FdbError> {
    let watch_key = format!(
        "/coditect/events/{}/latest",
        tenant_id
    );
    loop {
        // Register the watch before replaying, so no update slips in
        // between the replay below and the next wait.
        let watch = db.run(|trx| async {
            Ok(trx.watch(watch_key.as_bytes()))
        }).await?;
        // Process any new events, then advance the cursor so the next
        // iteration does not re-deliver them.
        let events = replay_events(db, tenant_id, last_seen).await?;
        for event in events {
            last_seen = event.timestamp(); // accessor assumed on ContextEvent
            callback(event);
        }
        // Block until the sentinel key changes again.
        watch.await?;
    }
}
Migration Strategy
Phase 1: Dual-Write (Weeks 1-4)
┌─────────────────────────────────────────────────────────────────┐
│ /cx Command (modified) │
├─────────────────────────────────────────────────────────────────┤
│ 1. Extract messages (existing) │
│ 2. Write to SQLite (existing) │
│ 3. Write to FDB (new) ← dual-write │
│ 4. Verify consistency │
└─────────────────────────────────────────────────────────────────┘
Implementation:
- Add FDB client to context extraction scripts
- Write every message to both SQLite and FDB
- Log any discrepancies for debugging
- SQLite remains primary (read path unchanged)
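The dual-write step can be sketched as: primary write first, mirrored write second, with mismatches logged rather than surfaced. The dict stand-ins below replace the real SQLite and FDB clients:

```python
def dual_write(message, sqlite_store, fdb_store, log):
    """Write to the primary (SQLite) first; mirror into FDB and record,
    rather than fail on, any discrepancy so the read path is unaffected."""
    sqlite_store[message["hash"]] = message
    try:
        fdb_store[message["hash"]] = message
        if fdb_store.get(message["hash"]) != sqlite_store[message["hash"]]:
            log.append(("mismatch", message["hash"]))
    except Exception as exc:
        # FDB failures never block the primary write during Phase 1.
        log.append(("fdb_error", message["hash"], str(exc)))
```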
Phase 2: Shadow Read (Weeks 5-8)
┌─────────────────────────────────────────────────────────────────┐
│ /cxq Command (modified) │
├─────────────────────────────────────────────────────────────────┤
│ 1. Query SQLite (primary) │
│ 2. Query FDB (shadow) ← parallel │
│ 3. Compare results │
│ 4. Return SQLite results │
│ 5. Log differences │
└─────────────────────────────────────────────────────────────────┘
Implementation:
- Add FDB read path in parallel with SQLite
- Compare results for consistency validation
- Measure FDB latency vs SQLite
- Build confidence in FDB correctness
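The shadow-read path follows the same shape: the SQLite result is always what the caller sees, while the FDB result is compared and any difference logged. The store callables below are stand-ins for the real query paths:

```python
def shadow_read(query, primary, shadow, diff_log):
    """Serve results from the primary (SQLite) while exercising and
    comparing the shadow (FDB) path; only the primary result is returned."""
    primary_result = primary(query)
    try:
        shadow_result = shadow(query)
        if shadow_result != primary_result:
            diff_log.append((query, primary_result, shadow_result))
    except Exception as exc:
        # A failing shadow path is logged, never surfaced to the caller.
        diff_log.append((query, "shadow_error", str(exc)))
    return primary_result
```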
Phase 3: FDB Primary (Weeks 9-12)
┌─────────────────────────────────────────────────────────────────┐
│ /cx and /cxq Commands │
├─────────────────────────────────────────────────────────────────┤
│ FDB is now primary │
│ SQLite kept as backup (reads on FDB failure) │
│ GCP backup updated to export FDB snapshot │
└─────────────────────────────────────────────────────────────────┘
Phase 4: Multi-Tenant (Weeks 13-16)
- Enable tenant isolation in FDB
- Add user/session tracking
- Implement event sourcing
- Add real-time sync via watches
Performance Projections
Latency Targets
| Operation | SQLite (Current) | FDB (Target) | Notes |
|---|---|---|---|
| Single message read | ~1ms | <5ms | FDB adds network hop |
| FTS query (10 results) | ~50ms | <20ms | FDB secondary indexes |
| Message write | ~2ms | <15ms | FDB with fsync |
| Batch write (100 msgs) | ~100ms | <50ms | FDB batch transactions |
| Knowledge extraction | ~200ms | <100ms | Parallelized |
Throughput Projections
| Scale | Messages/Day | FDB Cluster Size | Est. Cost/Month |
|---|---|---|---|
| 100 users | 100K | 5 nodes | ~$200 |
| 1,000 users | 1M | 9 nodes | ~$500 |
| 10,000 users | 10M | 15 nodes | ~$1,500 |
| 100,000 users | 100M | 25 nodes | ~$4,000 |
| 1,000,000 users | 1B | 50 nodes | ~$10,000 |
Storage Projections
| Scale | Messages | Raw Size | With Compression | Est. Cost/Month |
|---|---|---|---|---|
| 100 users | 1M | 5GB | 1.5GB | ~$0.50 |
| 1,000 users | 10M | 50GB | 15GB | ~$5 |
| 10,000 users | 100M | 500GB | 150GB | ~$50 |
| 100,000 users | 1B | 5TB | 1.5TB | ~$500 |
| 1,000,000 users | 10B | 50TB | 15TB | ~$5,000 |
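The storage table's arithmetic follows from two assumptions: roughly 5KB per raw message, and ~70% compression (the ratio observed in the GCP backups, 720MB to 235MB). A quick check in Python:

```python
def storage_projection(messages: int, avg_msg_bytes: int = 5_000,
                       compression_ratio: float = 0.3) -> tuple:
    """Reproduce the table's numbers: raw bytes from an assumed ~5KB average
    message, compressed to ~30% of raw (the ~70% reduction seen in backups).
    Returns (raw_bytes, compressed_bytes)."""
    raw = messages * avg_msg_bytes
    return raw, int(raw * compression_ratio)
```

For 1M messages (100 users) this gives 5GB raw and 1.5GB compressed, matching the first table row; the remaining rows scale linearly.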
Implementation Roadmap
Sprint 1: FDB Client Integration (Week 1-2)
- Add foundationdb Rust crate to backend
- Create ContextKeys utility for key generation
- Implement basic read/write operations
- Add unit tests for FDB operations
Sprint 2: Dual-Write Path (Week 3-4)
- Modify unified-message-extractor.py for dual-write
- Add FDB write to context extraction
- Implement consistency verification
- Create monitoring dashboard
Sprint 3: Shadow Read Path (Week 5-6)
- Add FDB query path to /cxq
- Implement parallel query execution
- Add result comparison logging
- Performance benchmarking
Sprint 4: Event Sourcing (Week 7-8)
- Implement ContextEvent types
- Create event log storage in FDB
- Add event processor service
- Implement replay_events function
Sprint 5: Multi-Tenant (Week 9-10)
- Add tenant isolation to key design
- Implement tenant creation/management
- Add quota enforcement per tenant
- Test cross-tenant isolation
Sprint 6: Real-Time Sync (Week 11-12)
- Implement FDB watch API integration
- Create session sync service
- Add WebSocket bridge for Claude Code
- End-to-end multi-session testing
Sprint 7: Cutover (Week 13-14)
- FDB becomes primary
- Update backup scripts for FDB
- Migrate existing SQLite data
- Update documentation
Sprint 8: Production Hardening (Week 15-16)
- Performance optimization
- Chaos testing (node failures)
- Security audit
- Documentation finalization
Related Documentation
CODITECT Ecosystem
FoundationDB Resources
Multi-Tenant Best Practices
Appendix: Current Backup Script
The existing backup script (scripts/backup-context-db.sh) will be updated to support FDB:
Current Features:
- SQLite .backup snapshot (parallel-session safe)
- Gzip compression (67% reduction)
- GCP Cloud Storage upload
- 90-day retention policy
FDB Additions (Phase 3+):
- FDB snapshot export using the fdbbackup tool
- Incremental backups for large datasets
- Point-in-time recovery support
Last Updated: 2025-12-16
Maintained By: CODITECT Architecture Team
Next Review: After Sprint 2 completion