
V5 FoundationDB Schema and ADR Analysis

Date: 2025-10-07
Purpose: Comprehensive analysis of V4 database models and ADRs to inform V5 multi-tenant architecture
Status: Active Reference Document


Executive Summary

This document analyzes V4's FoundationDB schema and Architecture Decision Records (ADRs) to extract proven patterns for V5's multi-tenant, multi-session, multi-LLM IDE with automated pod provisioning.

Key Findings:

  1. Multi-tenant patterns are battle-tested - V4 uses tenant isolation extensively
  2. Session management exists - V4 has WebSocket session tracking with JWT auth
  3. License/subscription system ready - Stripe integration, quota tracking, usage monitoring
  4. Workspace pod allocation - V4 tracks user → pod assignments
  5. ⚠️ V4 lacks automated provisioning - Pods were created by hand
  6. ⚠️ No Helm/ArgoCD - V4 used direct kubectl deployments

Table of Contents

  1. FoundationDB Key Schema
  2. Core Data Models
  3. ADR Summary
  4. Multi-Tenant Architecture
  5. Session Management
  6. User Authentication & Authorization
  7. License & Billing
  8. Workspace Pod Management
  9. Gaps for V5
  10. Recommendations for V5

FoundationDB Key Schema

V4 Key Structure (Proven Patterns)

FoundationDB Key Hierarchy (UTF-8 encoded strings)
├── users/
│   ├── {user_id} → User record
│   ├── by_email/{email} → Email → user_id index
│   └── {user_id}/
│       ├── tenants/ → User's tenant associations
│       │   └── {tenant_id} → UserTenantAssociation
│       ├── session/{session_id} → Active user session (JWT)
│       └── apikey/{key_id} → API key metadata
│
├── tenants/
│   ├── {tenant_id} → Tenant record
│   ├── by_domain/{domain} → Domain → tenant_id index
│   └── {tenant_id}/
│       ├── users/ → Reverse index: tenant → users
│       │   └── {user_id} → User ID reference
│       ├── sessions/ → All sessions in tenant
│       │   └── {session_id} → Session metadata
│       └── workspaces/ → Workspace assignments
│           └── {user_id} → workspaceAssignment
│
├── sessions/
│   ├── {session_id} → UserSession record (JWT metadata)
│   └── {session_id}/
│       ├── metadata → Session info, timestamps
│       ├── editor-tabs/ → Open files, positions
│       ├── llm-messages/ → Conversation history
│       └── config/ → Session-specific settings
│
├── workspaces/
│   └── {assignment_id} → workspaceAssignment (user → pod mapping)
│
├── pods/
│   └── {pod_name}/ → PodAllocation (load balancing)
│       ├── users/ → Active users in pod
│       └── health → Last health check
│
├── licenses/
│   ├── {user_id} → UserLicense (current license)
│   ├── history/{history_id} → LicenseHistory (audit log)
│   └── usage/{tenant_id}/{month} → TenantUsage (monthly tracking)
│
├── quotas/
│   └── {user_id}/{period} → UserQuota (daily/monthly limits)
│
├── apikeys/
│   └── {key_id} → APIKey record
│
├── files/
│   └── {file_path}/
│       ├── content → File contents
│       ├── metadata → Language, encoding, timestamps
│       └── versions/ → Version history
│
├── settings/
│   ├── global/ → Global preferences
│   └── workspace/ → Workspace settings
│
├── models/
│   └── {model_id}/ → LLM model configurations
│
└── dspy_prompts/ → DSPy optimization cache
    └── {tenant_id}/{user_id}/{task_type}/{signature} → DspyPromptCache

Key Design Principles (from ADR-004):

  1. Hierarchical Keys: Slash-separated paths for readability and range queries
  2. Secondary Indexes: Dedicated keys for lookups (email, domain, OAuth)
  3. UTF-8 Encoding: Human-readable keys for debugging
  4. Bidirectional Relationships: Both users/{id}/tenants AND tenants/{id}/users
  5. Atomic Transactions: All related updates in single FDB transaction
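
Principles 1, 4, and 5 can be sketched together in a few lines. This is an illustration only, not V4's code: `KeyValueTxn`, `userTenantKey`, `tenantUserKey`, `addMembership`, and `scanPrefix` are hypothetical names, and an in-memory `Map` stands in for a FoundationDB transaction.

```typescript
// Hypothetical stand-in for an FDB transaction: writes are buffered and
// applied together on commit, so both index entries land or neither does.
class KeyValueTxn {
  private pending = new Map<string, string>();
  private store: Map<string, string>;

  constructor(store: Map<string, string>) {
    this.store = store;
  }

  set(key: string, value: string): void {
    this.pending.set(key, value);
  }

  commit(): void {
    this.pending.forEach((value, key) => this.store.set(key, value));
    this.pending.clear();
  }
}

// Slash-separated keys that mirror the hierarchy shown above.
const userTenantKey = (userId: string, tenantId: string): string =>
  `users/${userId}/tenants/${tenantId}`;
const tenantUserKey = (tenantId: string, userId: string): string =>
  `tenants/${tenantId}/users/${userId}`;

// Write both directions of the relationship in a single transaction.
function addMembership(
  store: Map<string, string>,
  userId: string,
  tenantId: string,
  role: string,
): void {
  const txn = new KeyValueTxn(store);
  txn.set(userTenantKey(userId, tenantId), JSON.stringify({ role }));
  txn.set(tenantUserKey(tenantId, userId), userId);
  txn.commit();
}

// Prefix scan: the in-memory analogue of an FDB range read.
function scanPrefix(store: Map<string, string>, prefix: string): string[] {
  return Array.from(store.keys())
    .filter((key) => key.startsWith(prefix))
    .sort();
}
```

Listing a tenant's members then reduces to `scanPrefix(store, "tenants/<tenant_id>/users/")`, which is the kind of range query the hierarchical layout is meant to enable.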

Core Data Models

1. User & Tenant Models

// Core user record (multi-tenant capable)
pub struct User {
pub user_id: Uuid,
pub email: String,
pub first_name: String,
pub last_name: String,
pub is_active: bool,
pub created_at: DateTime<Utc>,
pub updated_at: DateTime<Utc>,
pub password_hash: String, // Argon2
pub primary_tenant_id: Uuid, // Self-tenant (deterministic UUID v5)
}

// Tenant (organization/workspace)
pub struct Tenant {
pub tenant_id: Uuid,
pub name: String,
pub domain: Option<String>, // Custom domain for SSO
pub is_active: bool,
pub created_at: DateTime<Utc>,
pub updated_at: DateTime<Utc>,
}

// User-Tenant association with roles
pub struct UserTenantAssociation {
pub user_id: Uuid,
pub tenant_id: Uuid,
pub role: String, // "owner", "admin", "member", "viewer"
pub company: Option<String>,
pub joined_at: DateTime<Utc>,
pub invited_by: Option<Uuid>,
pub is_active: bool,
}

Self-Tenant Pattern (Critical for Multi-Tenancy):

  • Every user gets a personal "self-tenant" on registration
  • Self-tenant ID is deterministic: UUID v5(namespace, user_id)
  • User is "owner" of their self-tenant
  • Users can join multiple organization tenants with different roles
  • All data is scoped to tenant_id, so each tenant's records can be isolated by key and checked on every access
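
To make the deterministic ID concrete, here is a minimal sketch of UUID v5 derivation (RFC 4122: SHA-1 over the namespace bytes followed by the name). It assumes a Node.js runtime for `node:crypto`. `SELF_TENANT_NAMESPACE` is a placeholder; the namespace V4 actually uses is not given in this document. `uuidV5` and `selfTenantId` are hypothetical helper names.

```typescript
import { createHash } from "node:crypto";

// Parse a canonical UUID string into its 16 raw bytes.
function uuidToBytes(uuid: string): Uint8Array {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-fA-F]{32}$/.test(hex)) {
    throw new Error(`invalid UUID: ${uuid}`);
  }
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// RFC 4122 version 5: hash namespace + name, then stamp version and variant.
function uuidV5(namespace: string, name: string): string {
  const digest = createHash("sha1")
    .update(uuidToBytes(namespace))
    .update(name, "utf8")
    .digest();
  const b = Uint8Array.from(digest.subarray(0, 16));
  b[6] = (b[6] & 0x0f) | 0x50; // version 5
  b[8] = (b[8] & 0x3f) | 0x80; // RFC 4122 variant
  const hex = Array.from(b, (x) => x.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Placeholder namespace (assumption): any fixed UUID works, as long as it
// never changes once users exist.
const SELF_TENANT_NAMESPACE = "6ba7b810-9dad-11d1-80b4-00c04fd430c8";

// The same user_id always maps to the same self-tenant ID.
function selfTenantId(userId: string): string {
  return uuidV5(SELF_TENANT_NAMESPACE, userId);
}
```

Because the ID is derived rather than stored, registration can be retried safely: a second attempt computes the same tenant_id instead of creating a duplicate self-tenant.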

2. Session Models

// From ADR-007 (Multi-Session Architecture)
interface Session {
id: string; // Unique session ID
name: string; // User-defined name
icon?: string; // Optional icon
createdAt: Date;
updatedAt: Date;

// State references
editorState: {
tabs: editorTab[];
activeTabId: string | null;
scrollPosition: number;
};

llmState: {
messages: Message[];
primaryModel: string | null;
secondaryModel: string | null;
mode: WorkflowMode; // 'single' | 'parallel' | 'sequential' | 'consensus'
config: llmConfig;
};

fileState: {
workspaceRoot: string;
expandedFolders: string[];
selectedFile: string | null;
};

terminalState: {
cwd: string;
history: string[];
buffer: string;
};

// Metadata
isDirty: boolean; // Unsaved changes
isActive: boolean; // Currently active
order: number; // Tab order
}

// Backend session tracking (JWT metadata)
pub struct UserSession {
pub session_id: String, // UUID
pub user_id: String,
pub access_token: String, // JWT (15 min)
pub refresh_token: String, // JWT (7 days)
pub ip_address: String,
pub user_agent: String,
pub created_at: i64,
pub expires_at: i64,
pub last_activity_at: i64,
pub is_active: bool,
}

Session Management Pattern:

  • Frontend Sessions: Browser tabs with separate editor/LLM/terminal state (Zustand store)
  • Backend Sessions: JWT-tracked authentication sessions (FoundationDB)
  • Auto-save: Debounced; a save fires 500ms after the last change
  • Persistence: Sessions survive browser restarts (loaded from FDB)

3. Workspace Pod Models

// User → Pod assignment tracking
pub struct workspaceAssignment {
pub id: Uuid,
pub user_id: Uuid,
pub tenant_id: Uuid,
pub pod_name: String, // e.g., "workspace-abc123"
pub namespace: String, // e.g., "user-{user_id}"
pub assigned_at: DateTime<Utc>,
pub last_active: DateTime<Utc>,
pub status: workspaceStatus, // Active | Idle | Suspended | Terminating | Failed
pub resource_usage: ResourceUsage,
}

pub struct ResourceUsage {
pub cpu_cores: f32,
pub memory_mb: u64,
pub storage_gb: f32,
pub network_gb: f32,
}

// Pod allocation for load balancing
pub struct PodAllocation {
pub pod_name: String,
pub namespace: String,
pub total_users: u32,
pub active_users: u32,
pub cpu_allocated: f32,
pub memory_allocated_mb: u64,
pub last_health_check: DateTime<Utc>,
pub is_available: bool,
}

// workspace activity for usage tracking
pub struct workspaceActivity {
pub id: Uuid,
pub workspace_id: Uuid,
pub user_id: Uuid,
pub activity_type: ActivityType, // terminalSession | FileOperation | CodeExecution | BuildProcess | Idle
pub timestamp: DateTime<Utc>,
pub duration_seconds: Option<u64>,
pub metadata: Option<serde_json::Value>,
}

Workspace Pattern (V4 Manual, V5 Automated):

  • Each user gets a dedicated Kubernetes pod (Theia IDE container)
  • Pod is provisioned in per-user namespace
  • FDB tracks user → pod assignment
  • Pod allocation tracks load balancing across cluster
  • Activity logging for billing and idle detection

4. License & Billing Models

// User's current license
pub struct UserLicense {
pub user_id: Uuid,
pub license_id: Uuid,
pub assigned_at: DateTime<Utc>,
pub previous_license_id: Option<Uuid>,
}

// License change audit trail
pub struct LicenseHistory {
pub id: Uuid,
pub user_id: Uuid,
pub license_id: Uuid,
pub action: LicenseAction, // Created | Upgraded | Downgraded | Renewed | Expired | Cancelled | PaymentFailed
pub reason: String,
pub metadata: serde_json::Value,
pub created_at: DateTime<Utc>,
}

// Monthly usage tracking per tenant
pub struct TenantUsage {
pub tenant_id: Uuid,
pub month: String, // "2025-10"
pub projects_count: i32,
pub agents_count: i32,
pub storage_gb: f32,
pub workspace_hours: f32, // Total pod runtime
pub api_calls: i64,
pub last_updated: DateTime<Utc>,
}

// License enforcement
pub struct TenantLicensePolicy {
pub tenant_id: Uuid,
pub owner_user_id: Uuid,
pub license_type: LicenseType, // Free | Starter | Pro | Enterprise
pub enforcement_mode: EnforcementMode, // Strict | Warning | GracePeriod
pub override_limits: Option<serde_json::Value>,
pub updated_at: DateTime<Utc>,
}

// Real-time usage events
pub struct UsageEvent {
pub id: Uuid,
pub tenant_id: Uuid,
pub user_id: Uuid,
pub resource_type: ResourceType, // Projects | Agents | Storage | workspaceHours
pub operation: UsageOperation,
pub amount: f32,
pub timestamp: DateTime<Utc>,
pub metadata: Option<serde_json::Value>,
}

Billing Integration (from ADR-021):

  • Stripe for payment processing
  • License tiers: Free, Starter ($29/mo), Pro ($99/mo), Enterprise (custom)
  • Quotas enforced via middleware (before LLM calls, file operations)
  • Usage tracked in real-time → synced to Stripe monthly
  • Grace period enforcement for upgrades

5. Authentication Models

// From ADR-021 (User Management & Authentication)

interface User {
user_id: string; // UUID
email: string; // Unique
password_hash?: string; // Argon2 (absent for OAuth-only users)
name: string;
avatar_url?: string;
role: 'admin' | 'developer' | 'viewer';
oauth_providers: Array<{
provider: 'google' | 'github';
provider_user_id: string;
access_token?: string; // Encrypted
}>;
created_at: number; // Timestamp
updated_at: number;
email_verified: boolean;
is_active: boolean;
metadata: Record<string, any>;
}

interface APIKey {
key_id: string; // UUID
user_id: string;
key_hash: string; // bcrypt of actual key
key_prefix: string; // First 8 chars for display (e.g., "ak_12ab...")
name: string; // User-defined name
permissions: string[]; // ['read:files', 'write:files', 'llm:chat']
created_at: number;
expires_at?: number;
last_used_at?: number;
is_active: boolean;
}

interface UserQuota {
user_id: string;
period: 'daily' | 'monthly';
period_start: number;

// Usage
llm_requests: number;
llm_tokens_used: number;
llm_cost_usd: number;
storage_bytes: number;
api_requests: number;

// Limits
llm_requests_limit: number;
llm_tokens_limit: number;
llm_cost_limit_usd: number;
storage_bytes_limit: number;
api_requests_limit: number;
}

Auth Pattern (JWT + OAuth + API Keys):

  • JWT: Stateless authentication (15-min access token, 7-day refresh token)
  • OAuth 2.0: Google + GitHub social login (reduces password breach risk)
  • API Keys: For programmatic access (CLI, CI/CD)
  • RBAC: Role-based permissions (admin/developer/viewer)
  • Quotas: Rate limiting and usage enforcement

ADR Summary

Critical ADRs for V5

| ADR | Decision | Impact on V5 | Status |
|---|---|---|---|
| ADR-004 | Use FoundationDB for Persistence | KEEP - Multi-tenant key schema proven | Reuse patterns |
| ADR-007 | Multi-Session Architecture | KEEP - Browser tab sessions with FDB persistence | Implement |
| ADR-014 | Use Eclipse Theia as Foundation | KEEP - VS Code-like IDE in browser | Active |
| ADR-017 | WebSocket Backend Architecture | KEEP - Sidecar pattern for Theia ↔ Backend | Implement |
| ADR-020 | GCP Deployment | KEEP - GKE + Cloud Run infrastructure | Active |
| ADR-021 | User Management & Authentication | KEEP - JWT + OAuth + Stripe billing | Implement |
| ADR-022 | Audit Logging Architecture | KEEP - Compliance tracking (SOC2/GDPR) | Implement |

ADR-004: FoundationDB (Detailed)

Why FoundationDB over alternatives:

  • ACID Transactions: Serializable isolation (critical for multi-tenant)
  • Sub-10ms Latency: Fast session/file operations
  • Horizontal Scaling: Millions of ops/sec
  • Multi-Model: Key-value core + document/graph layers
  • Watch API: Real-time updates for collaborative features
  • Fault Tolerance: Self-healing, automatic replication

Rejected Alternatives:

  • ❌ PostgreSQL: Slower, less scalable for key-value workloads
  • ❌ MongoDB: Eventual consistency too weak
  • ❌ Redis: Not durable enough for primary storage
  • ❌ IndexedDB: Browser-only, no multi-client sync

Data Model:

/az1ai-ide/
├── sessions/{session-id}/
│ ├── metadata # Session info, timestamps
│ ├── editor-tabs/ # Open files, positions
│ ├── llm-messages/ # Conversation history
│ └── config/ # Session-specific settings
├── files/{file-path}/
│ ├── content # File contents
│ ├── metadata # Language, encoding, timestamps
│ └── versions/ # Version history
├── settings/
│ ├── global/ # Global preferences
│ └── workspace/ # workspace settings
└── models/{model-id}/ # Model configurations

ADR-007: Multi-Session Architecture

Decision: Tab-based sessions (like browser tabs)

Benefits:

  • ✅ Work on multiple projects simultaneously
  • ✅ Separate LLM conversations per session
  • ✅ Independent editor/terminal contexts
  • ✅ Persistent across browser restarts
  • ✅ Keyboard shortcuts (Cmd+1-9 for session switching)

Session Lifecycle:

  1. Create: Generate session ID, initialize state
  2. Auto-save: Debounced save, 500ms after the last change
  3. Switch: Save current, load target session
  4. Close: Prompt for unsaved changes, delete from FDB

Session Isolation:

  • Each session has isolated editor tabs, LLM messages, terminal state
  • Sessions saved to FDB with prefix: sessions/{session-id}/
  • OPFS cache for offline mode (FDB is source of truth)

ADR-014: Eclipse Theia Foundation

Decision: Use the Theia framework rather than building an IDE from scratch

Why Theia:

  • EPL 2.0 License: Free commercial use (no license fees)
  • VS Code Compatible: Runs VS Code extensions
  • Monaco Editor: Same editor as VS Code
  • Saves 6-12 months: Don't rebuild file explorer, terminal, settings
  • Dependency Injection: Clean architecture with InversifyJS
  • Multi-Language Support: TypeScript, Python, Rust out-of-box

Theia vs VS Code:

| Feature | VS Code | Theia |
|---|---|---|
| License | MIT | EPL 2.0 |
| Browser Support | | |
| Extension API | Full | Compatible |
| Customization | Limited | Full (framework) |
| Hosting | Desktop only | Cloud-native |

ADR-017: WebSocket Backend Architecture

Decision: Sidecar pattern for Theia ↔ Backend communication

Architecture:

┌─────────────────────────────────────────┐
│ workspace Pod │
│ ┌──────────────┐ ┌─────────────────┐ │
│ │ theia IDE │ │ WebSocket │ │
│ │ (Port 3000) │←→│ Sidecar │ │
│ │ │ │ (Port 8080) │ │
│ └──────────────┘ └─────────────────┘ │
│ ↓ ↓ │
│ localhost:3000 localhost:8080 │
└─────────────────────────────────────────┘

┌─────────────┐
│ Auth Backend│
│ (JWT verify)│
└─────────────┘

┌─────────────┐
│ FoundationDB│
└─────────────┘

Why Sidecar:

  • Localhost Communication: No network latency
  • Security: WebSocket gateway validates JWT before forwarding
  • Simplicity: Theia doesn't need an FDB client
  • Isolation: Each pod has dedicated sidecar

ADR-020: GCP Deployment

Infrastructure (Already deployed):

  • GKE Cluster: codi-poc-e2-cluster (us-central1-a)
  • FoundationDB: 3-node StatefulSet (10.56.x.x:4500)
  • Artifact Registry: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect
  • Domain: coditect.ai (34.8.51.57, Google-managed SSL)
  • CI/CD: Cloud Build pipelines

Deployment Strategy:

  1. Backend API: Cloud Run (stateless, auto-scaling)
  2. Workspace Pods: GKE (stateful, per-user pods)
  3. FoundationDB: GKE StatefulSet (persistent storage)
  4. Ingress: NGINX (load balancer + SSL termination)

ADR-021: User Management & Authentication

Authentication Methods:

  1. Email/Password: Argon2 password hashing
  2. Google OAuth: Social login (reduces friction)
  3. GitHub OAuth: Developer-friendly auth
  4. API Keys: For CLI/API access (bcrypt hashed)

JWT Strategy:

  • Access Token: 15 minutes (short-lived for security)
  • Refresh Token: 7 days (long-lived, revocable)
  • Claims: user_id, email, role, tenant_id

RBAC Roles:

const ROLES = {
admin: {
permissions: [{ resource: '*', actions: ['*'] }] // Full access
},
developer: {
permissions: [
{ resource: 'files', actions: ['read', 'write', 'delete'] },
{ resource: 'sessions', actions: ['read', 'write', 'delete'] },
{ resource: 'llm', actions: ['read', 'write'] },
{ resource: 'agents', actions: ['read', 'write'] },
{ resource: 'users', actions: ['read'] }, // Own profile only
]
},
viewer: {
permissions: [
{ resource: 'files', actions: ['read'] },
{ resource: 'sessions', actions: ['read'] },
{ resource: 'llm', actions: ['read'] },
]
}
};

Quota System:

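// Default monthly limits. These are values rather than types, so they are
// declared as a constant (an interface cannot hold an arithmetic expression).
const DEFAULT_QUOTA_LIMITS = {
  llm_requests_limit: 10000, // 10K requests/month
  llm_tokens_limit: 10000000, // 10M tokens/month
  llm_cost_limit_usd: 100, // $100/month
  storage_bytes_limit: 10 * 1024 * 1024 * 1024, // 10GB
  api_requests_limit: 100000, // 100K API requests/month
};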

Multi-Tenant Architecture

Tenant Isolation Strategy

Key Principle: Every data access is scoped by tenant_id

FDB Key Prefixes:

tenant_id/resource_type/resource_id/...

Examples:
- 123e4567/sessions/abc-session-id/metadata
- 123e4567/files/src/main.ts/content
- 123e4567/workspaces/user-789/assignment

Security Enforcement:

// Middleware extracts tenant_id from JWT
pub async fn require_tenant_access(
req: &Request,
required_tenant_id: &Uuid,
) -> Result<(), AuthError> {
let user = req.user()?;

// Check if user belongs to tenant
let association = UserTenantRepository::get_association(
&user.user_id,
required_tenant_id
).await?;

if !association.is_active {
return Err(AuthError::Forbidden);
}

Ok(())
}

Benefits:

  • Strong Isolation: Tenants never share a key prefix, so a tenant-scoped read cannot return another tenant's data
  • Range Queries: Fetch all sessions/files for tenant
  • Efficient: Single FDB transaction per operation
  • Auditable: All access logged with tenant context

Session Management

Session Types

1. Browser UI Sessions (ADR-007):

  • Tab-based workspaces in Theia frontend
  • Persist to FDB: sessions/{session-id}/
  • Auto-save every 500ms (debounced)
  • Keyboard shortcuts: Cmd+1-9 for switching

2. Backend Auth Sessions (ADR-021):

  • JWT-tracked authentication sessions
  • Stored in FDB: user:sessions/{session-id}
  • Track: IP, user agent, last activity
  • Expire after 7 days or forced logout

3. WebSocket Sessions (ADR-017):

  • Real-time connection to backend
  • Validated with JWT on connect
  • Heartbeat every 30 seconds
  • Auto-reconnect on disconnect
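
The heartbeat and reconnect timing might be computed as follows. Only the 30-second heartbeat comes from the ADR; the exponential backoff, its 1s base and 30s cap, and the two-missed-beats staleness rule are assumptions for illustration, as are the names `reconnectDelayMs` and `isConnectionStale`.

```typescript
const HEARTBEAT_INTERVAL_MS = 30 * 1000; // from ADR-017

// Exponential backoff for reconnect attempts (assumed policy):
// 1s, 2s, 4s, ... capped at maxMs.
function reconnectDelayMs(
  attempt: number,
  baseMs: number = 1000,
  maxMs: number = 30000,
): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Treat the connection as dead once more than `missedAllowed` heartbeat
// intervals have passed without a reply (assumed rule).
function isConnectionStale(
  lastReplyMs: number,
  nowMs: number,
  missedAllowed: number = 2,
): boolean {
  return nowMs - lastReplyMs > HEARTBEAT_INTERVAL_MS * missedAllowed;
}
```

Capping the delay keeps a long outage from pushing retries minutes apart, while the growth factor avoids a reconnect storm when many workspace pods lose the backend at once.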

Session Lifecycle

// Create session
const createSession = async (name?: string) => {
const session: Session = {
id: nanoid(),
name: name || `Session ${sessions.length + 1}`,
createdAt: new Date(),
updatedAt: new Date(),
editorState: { tabs: [], activeTabId: null, scrollPosition: 0 },
llmState: {
messages: [],
primaryModel: null,
secondaryModel: null,
mode: 'single',
config: defaultllmConfig
},
fileState: {
workspaceRoot: '/',
expandedFolders: [],
selectedFile: null
},
terminalState: {
cwd: '~',
history: [],
buffer: ''
},
isDirty: false,
isActive: true,
order: sessions.length
};

await fdbService.saveSession(session);
return session;
};

// Auto-save on changes
useEffect(() => {
if (!activeSession) return;

const saveTimeout = setTimeout(() => {
fdbService.saveSession(activeSession);
}, 500); // Debounced 500ms

return () => clearTimeout(saveTimeout);
}, [activeSession]);

// Load on mount
useEffect(() => {
const restoreLastSession = async () => {
const lastSessionId = localStorage.getItem('lastActiveSession');
if (lastSessionId) {
const session = await fdbService.loadSession(lastSessionId);
setActiveSession(session);
}
};

restoreLastSession();
}, []);

User Authentication & Authorization

JWT Token Flow

1. User Login (Email/Password or OAuth)

2. Backend Validates Credentials

3. Generate Tokens:
- Access Token (15 min, contains: user_id, email, role, tenant_id)
- Refresh Token (7 days, contains: user_id, type='refresh')

4. Store Session in FDB:
- user:sessions/{session_id}
- Contains: IP, user agent, tokens, timestamps

5. Return Tokens to Frontend

6. Frontend Stores in Memory (NOT localStorage for security)

7. All API Requests: Authorization: Bearer <access_token>

8. Backend Middleware Validates JWT

9. If Expired: Use Refresh Token to Get New Access Token

10. If Refresh Expired: Force Re-login
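
Steps 3 and 8 to 10 reduce to a small decision on two expiry timestamps. The sketch below uses the lifetimes stated above; `issueExpiries` and `nextStep` are hypothetical names, and real code would read `exp` from the verified JWTs rather than from a plain object.

```typescript
const ACCESS_TTL_S = 15 * 60; // 15 minutes
const REFRESH_TTL_S = 7 * 24 * 60 * 60; // 7 days

// Expiry times in Unix seconds, as they would appear in the `exp` claims.
interface TokenExpiries {
  accessExp: number;
  refreshExp: number;
}

function issueExpiries(nowS: number): TokenExpiries {
  return {
    accessExp: nowS + ACCESS_TTL_S,
    refreshExp: nowS + REFRESH_TTL_S,
  };
}

type NextStep = "use-access-token" | "refresh" | "re-login";

// Decide what the client should do before sending a request.
function nextStep(tokens: TokenExpiries, nowS: number): NextStep {
  if (nowS < tokens.accessExp) return "use-access-token";
  if (nowS < tokens.refreshExp) return "refresh";
  return "re-login";
}
```

A client that calls `nextStep` before each request refreshes at most once per 15-minute window and falls back to the login screen only after a full week without a refresh.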

Permission Checking

// Middleware checks permissions before allowing operation
async function checkPermission(
user: User,
resource: string,
action: string
): Promise<boolean> {
const role = ROLES[user.role];
if (!role) return false;

for (const perm of role.permissions) {
if (perm.resource === '*' || perm.resource === resource) {
if (perm.actions.includes('*') || perm.actions.includes(action)) {
return true;
}
}
}

return false;
}

// Usage in route handler
app.post('/api/files/save', authenticate, async (req, res) => {
const hasPermission = await checkPermission(req.user, 'files', 'write');
if (!hasPermission) {
return res.status(403).json({ error: 'Permission denied' });
}

// ... save file
});

License & Billing

Stripe Integration

License Tiers:

const LICENSE_TIERS = {
free: {
price: 0,
limits: {
llm_requests: 100, // 100/month
storage_gb: 1, // 1GB
workspace_hours: 10, // 10 hours/month
agents: 1,
projects: 1,
}
},
starter: {
price: 29, // $29/month
stripe_price_id: 'price_abc123',
limits: {
llm_requests: 10000,
storage_gb: 10,
workspace_hours: 100,
agents: 5,
projects: 10,
}
},
pro: {
price: 99, // $99/month
stripe_price_id: 'price_def456',
limits: {
llm_requests: 100000,
storage_gb: 100,
workspace_hours: 500,
agents: -1, // Unlimited
projects: -1, // Unlimited
}
},
enterprise: {
price: null, // Custom pricing
limits: {
llm_requests: -1,
storage_gb: -1,
workspace_hours: -1,
agents: -1,
projects: -1,
}
}
};

Quota Enforcement:

// Before llm call
async function checkllmQuota(user_id: string): Promise<boolean> {
const quota = await fdbService.get(`quota:${user_id}:monthly`);
return quota.llm_requests < quota.llm_requests_limit;
}

// After llm call
async function incrementllmUsage(user_id: string, tokens: number): Promise<void> {
const quota = await fdbService.get(`quota:${user_id}:monthly`);
quota.llm_requests += 1;
quota.llm_tokens_used += tokens;
quota.llm_cost_usd += tokens * 0.00001; // Example: $0.01 per 1K tokens
await fdbService.set(`quota:${user_id}:monthly`, quota);
}
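
Two details are worth making explicit: a separate check and increment can let concurrent requests overshoot the limit, and a limit of -1 (the "Unlimited" sentinel in the tier table) would fail a plain `<` comparison. The sketch below folds both into one transactional step. `QuotaStore` is a hypothetical in-memory stand-in for an FDB transaction, `tryConsumeLlmRequest` is an illustrative name, and the key follows the `quotas/{user_id}/{period}` layout from the key schema.

```typescript
const UNLIMITED = -1; // sentinel used by the license tier limits

interface QuotaCounters {
  llm_requests: number;
  llm_tokens_used: number;
  llm_requests_limit: number;
}

// Hypothetical stand-in for a transactional store. `transact` applies the
// callback to a private copy and persists it in one step, which is the
// guarantee an FDB transaction would give across concurrent requests.
class QuotaStore {
  private data = new Map<string, QuotaCounters>();

  put(key: string, quota: QuotaCounters): void {
    this.data.set(key, { ...quota });
  }

  get(key: string): QuotaCounters | undefined {
    const quota = this.data.get(key);
    return quota ? { ...quota } : undefined;
  }

  transact<T>(key: string, fn: (quota: QuotaCounters) => T): T {
    const current = this.data.get(key);
    if (!current) throw new Error(`no quota record for ${key}`);
    const draft = { ...current };
    const result = fn(draft);
    this.data.set(key, draft);
    return result;
  }
}

// Check and increment together; returns false once the limit is reached.
function tryConsumeLlmRequest(
  store: QuotaStore,
  userId: string,
  tokens: number,
): boolean {
  return store.transact(`quotas/${userId}/monthly`, (quota) => {
    const limit = quota.llm_requests_limit;
    if (limit !== UNLIMITED && quota.llm_requests >= limit) return false;
    quota.llm_requests += 1;
    quota.llm_tokens_used += tokens;
    return true;
  });
}
```

With this shape the middleware makes a single call before the LLM request and rejects on `false`, so there is no window between reading the counter and writing it back.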

Webhook Handler (Stripe → Backend):

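// Stripe signature verification needs the raw request body, so this route
// uses express.raw() instead of a JSON body parser.
app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), async (req, res) => {
  let event;
  try {
    event = stripe.webhooks.constructEvent(
      req.body,
      req.headers['stripe-signature'],
      process.env.STRIPE_WEBHOOK_SECRET
    );
  } catch (err) {
    // Invalid signature or malformed payload: reject without processing
    return res.status(400).send(`Webhook Error: ${err.message}`);
  }

  switch (event.type) {
    case 'customer.subscription.created':
      await handleSubscriptionCreated(event.data.object);
      break;
    case 'customer.subscription.updated':
      await handleSubscriptionUpdated(event.data.object);
      break;
    case 'customer.subscription.deleted':
      await handleSubscriptionCancelled(event.data.object);
      break;
    case 'invoice.payment_failed':
      await handlePaymentFailed(event.data.object);
      break;
  }

  res.json({ received: true });
});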

Workspace Pod Management

V4 Pattern (Manual Provisioning)

// User → Pod assignment (tracked in FDB)
pub struct workspaceAssignment {
pub id: Uuid,
pub user_id: Uuid,
pub tenant_id: Uuid,
pub pod_name: String, // "workspace-abc123"
pub namespace: String, // "user-{user_id}"
pub assigned_at: DateTime<Utc>,
pub last_active: DateTime<Utc>,
pub status: workspaceStatus,
pub resource_usage: ResourceUsage,
}

// Load balancing across pods
pub struct PodAllocation {
pub pod_name: String,
pub namespace: String,
pub total_users: u32,
pub active_users: u32,
pub cpu_allocated: f32,
pub memory_allocated_mb: u64,
pub last_health_check: DateTime<Utc>,
pub is_available: bool,
}

V4 Limitations:

  • ❌ Pods were manually created via kubectl apply
  • ❌ No automated provisioning on user signup
  • ❌ No automated cleanup of idle pods
  • ❌ No Helm charts or GitOps (ArgoCD)

Gaps for V5

What V4 Had (Keep)

  1. Multi-tenant FDB schema - Proven and scalable
  2. JWT authentication - Industry standard
  3. Stripe billing - Payment processing ready
  4. License/quota system - Usage tracking and enforcement
  5. Session management - Multi-session architecture
  6. Workspace tracking - User → Pod assignments in FDB

What V4 Lacked (Need for V5)

  1. Automated Pod Provisioning - V4 required manual kubectl apply
  2. Kubernetes Operator - No controller for watching user signups
  3. Helm Charts - No templated deployments
  4. ArgoCD/GitOps - No declarative deployment pipeline
  5. Auto-Scaling - Pods not auto-scaled based on load
  6. Idle Pod Cleanup - No automated termination of idle workspaces
  7. RBAC Automation - ServiceAccounts/Roles created manually
  8. PVC Provisioning - Persistent volumes not auto-created
  9. Blue-Green Deploys - No zero-downtime rollout strategy

Recommendations for V5

1. Automated Pod Provisioning System

Architecture:

User Registration → Backend API → Provisioning Controller → Kubernetes API

Creates: Namespace + RBAC + PVC + Pod

Stores: workspaceAssignment in FDB

Returns: Pod URL to user

Provisioning Controller (Rust Kubernetes Operator):

pub struct ProvisioningController {
    k8s_client: Client,
    fdb_client: Database,
}

impl ProvisioningController {
    pub async fn provision_workspace(&self, user_id: &str, user_email: &str) -> Result<workspaceAssignment> {
        // Validate the ID first so a malformed value fails before any
        // cluster resources are created
        let user_uuid = Uuid::parse_str(user_id)?;
        let ns_name = format!("user-{}", user_id);

        // 1. Create namespace
        self.create_namespace(&ns_name).await?;

        // 2. Create RBAC (ServiceAccount, Role, RoleBinding)
        self.create_rbac(&ns_name, user_email).await?;

        // 3. Create PVC (10GB default)
        self.create_pvc(&ns_name, "workspace-pvc", "10Gi").await?;

        // 4. Create workspace pod (Theia + Sidecar)
        let pod_name = self.create_workspace_pod(&ns_name, user_id).await?;

        // 5. Wait for pod ready
        self.wait_for_pod_ready(&ns_name, &pod_name, Duration::from_secs(120)).await?;

        // 6. Save assignment to FDB
        let assignment = workspaceAssignment {
            id: Uuid::new_v4(),
            user_id: user_uuid,
            tenant_id: self.get_user_tenant(user_id).await?,
            pod_name,
            namespace: ns_name,
            assigned_at: Utc::now(),
            last_active: Utc::now(),
            status: workspaceStatus::Active,
            resource_usage: ResourceUsage::default(),
        };

        self.save_assignment(&assignment).await?;

        Ok(assignment)
    }
}

2. Helm Charts for Deployment

Chart Structure:

helm/
├── Chart.yaml
├── values.yaml
├── values-prod.yaml
├── values-staging.yaml
└── templates/
├── backend-deployment.yaml
├── backend-service.yaml
├── workspace-pod-template.yaml # Template for user pods
├── foundationdb-statefulset.yaml
├── ingress.yaml
├── secrets.yaml
├── rbac.yaml
└── provisioning-controller.yaml

Install Commands:

# Deploy to staging
helm upgrade --install coditect-staging ./helm \
-f ./helm/values-staging.yaml \
--namespace coditect-staging \
--create-namespace

# Deploy to production
helm upgrade --install coditect-prod ./helm \
-f ./helm/values-prod.yaml \
--namespace coditect-app \
--create-namespace

3. ArgoCD GitOps Pipeline

Workflow:

1. Developer commits to main branch

2. Cloud Build triggers:
- Build Docker images (backend, Theia, sidecar)
- Push to Artifact Registry
- Update Helm values.yaml with new image tags

3. Commit new values.yaml to Git

4. ArgoCD watches Git repo

5. ArgoCD syncs changes to GKE cluster

6. Rolling update (zero-downtime)

7. Health checks pass → Deployment complete
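
The tag update in step 2 can be expressed as a small text transform. This is a sketch under the assumption that image references in the values file take the form `name:tag`; `setImageTag` and `escapeRegExp` are hypothetical helpers, not part of the pipeline.

```typescript
// Escape characters that have special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace the tag of every `<image>:<tag>` reference. Only the tag
// characters are consumed, so closing quotes and trailing comments on the
// same line are left untouched.
function setImageTag(valuesYaml: string, image: string, tag: string): string {
  const pattern = new RegExp(`(${escapeRegExp(image)}):[A-Za-z0-9_.-]+`, "g");
  return valuesYaml.replace(
    pattern,
    (_match: string, name: string) => `${name}:${tag}`,
  );
}
```

Matching only tag characters is the design point: a pattern that runs to the end of the line would also swallow anything written after the tag.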

ArgoCD Application:

# argocd/coditect-prod.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: coditect-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/coditect-ai/Coditect-v5-multiple-llm-IDE
    targetRevision: main
    path: helm
    helm:
      valueFiles:
        - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: coditect-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

4. Idle Pod Cleanup Job

CronJob (runs every hour):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: idle-pod-cleanup
spec:
  schedule: "0 * * * *" # Every hour
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure # Job pods must use OnFailure or Never
          containers:
            - name: cleanup
              image: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/pod-cleanup:latest
              env:
                - name: IDLE_THRESHOLD_HOURS
                  value: "2" # Terminate pods idle for 2+ hours

Cleanup Logic:

async fn cleanup_idle_pods() -> Result<()> {
    let idle_threshold = Duration::hours(2);
    let now = Utc::now();

    // Query FDB for all workspace assignments
    let assignments: Vec<workspaceAssignment> = fdb_client
        .scan("workspaces/")
        .await?;

    // `mut` is needed because the status is updated below
    for mut assignment in assignments {
        if assignment.status == workspaceStatus::Active {
            let idle_duration = now - assignment.last_active;

            if idle_duration > idle_threshold {
                // Mark as idle
                assignment.status = workspaceStatus::Idle;
                fdb_client.update_assignment(&assignment).await?;

                // Delete pod
                k8s_client.delete_pod(&assignment.namespace, &assignment.pod_name).await?;

                println!("Terminated idle pod: {}/{}", assignment.namespace, assignment.pod_name);
            }
        }
    }

    Ok(())
}
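
The selection rule at the heart of the cleanup job can be isolated as a pure function, which makes the threshold easy to test without a cluster. The sketch below is illustrative: `Assignment` is a trimmed-down, hypothetical view of the assignment record, and `findIdleAssignments` is not an existing function.

```typescript
type WorkspaceState = "Active" | "Idle" | "Suspended" | "Terminating" | "Failed";

// Trimmed-down view of an assignment record (assumed field names).
interface Assignment {
  namespace: string;
  podName: string;
  status: WorkspaceState;
  lastActive: Date;
}

// Return the active assignments whose last activity is older than the
// threshold. Nothing is mutated; the caller decides what to do with them.
function findIdleAssignments(
  assignments: Assignment[],
  now: Date,
  thresholdHours: number,
): Assignment[] {
  const thresholdMs = thresholdHours * 60 * 60 * 1000;
  return assignments.filter(
    (a) =>
      a.status === "Active" &&
      now.getTime() - a.lastActive.getTime() > thresholdMs,
  );
}
```

Keeping the rule separate from the FDB and Kubernetes calls also allows a dry-run mode that only logs what would be terminated.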

5. CI/CD Pipeline (Cloud Build)

cloudbuild-v5.yaml:

steps:
  # Build all images in parallel (waitFor: ['-'] starts a step immediately;
  # without it, Cloud Build runs steps one after another)
  - name: 'gcr.io/cloud-builders/docker'
    id: 'build-backend'
    waitFor: ['-']
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/backend-api:$SHORT_SHA', './backend']
  - name: 'gcr.io/cloud-builders/docker'
    id: 'build-theia'
    waitFor: ['-']
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/theia-ide:$SHORT_SHA', './theia-app']
  - name: 'gcr.io/cloud-builders/docker'
    id: 'build-sidecar'
    waitFor: ['-']
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/ws-sidecar:$SHORT_SHA', './websocket-sidecar']

  # Push each image as soon as its own build finishes
  - name: 'gcr.io/cloud-builders/docker'
    id: 'push-backend'
    waitFor: ['build-backend']
    args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/backend-api:$SHORT_SHA']
  - name: 'gcr.io/cloud-builders/docker'
    id: 'push-theia'
    waitFor: ['build-theia']
    args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/theia-ide:$SHORT_SHA']
  - name: 'gcr.io/cloud-builders/docker'
    id: 'push-sidecar'
    waitFor: ['build-sidecar']
    args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/ws-sidecar:$SHORT_SHA']

  # Update Helm values with new image tags
  # (steps without waitFor wait for all previous steps)
  - name: 'gcr.io/cloud-builders/git'
    args:
      - 'config'
      - 'user.email'
      - 'cloud-build@coditect.ai'
  - name: 'gcr.io/cloud-builders/git'
    args:
      - 'config'
      - 'user.name'
      - 'Cloud Build'
  - name: 'gcr.io/cloud-builders/git'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        sed -i "s/backend-api:.*$/backend-api:$SHORT_SHA/" helm/values-prod.yaml
        sed -i "s/theia-ide:.*$/theia-ide:$SHORT_SHA/" helm/values-prod.yaml
        sed -i "s/ws-sidecar:.*$/ws-sidecar:$SHORT_SHA/" helm/values-prod.yaml
        git add helm/values-prod.yaml
        git commit -m "chore: Update production images to $SHORT_SHA"
        git push origin main

timeout: '3600s'
options:
  machineType: 'N1_HIGHCPU_8'

6. Blue-Green Deployment Strategy

Concept: Run two identical production environments (Blue = current, Green = new)

Steps:

  1. Deploy new version to "Green" environment
  2. Run health checks and smoke tests on Green
  3. If tests pass, switch Ingress to point to Green
  4. Monitor Green for 1 hour
  5. If stable, decommission Blue
  6. If issues, rollback Ingress to Blue

Implementation with ArgoCD:

# Blue deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coditect-api-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: coditect-api
      version: blue
  template:
    metadata:
      labels:
        app: coditect-api
        version: blue
    spec:
      containers:
        - name: api
          image: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/backend-api:v1.0.0

---
# Green deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coditect-api-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: coditect-api
      version: green
  template:
    metadata:
      labels:
        app: coditect-api
        version: green
    spec:
      containers:
        - name: api
          image: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/backend-api:v1.1.0

---
# Service (switch between blue/green by changing selector)
apiVersion: v1
kind: Service
metadata:
  name: coditect-api
spec:
  selector:
    app: coditect-api
    version: blue # Change to 'green' for cutover
  ports:
    - port: 8000
      targetPort: 8000

Implementation Priority

Phase 1: MVP (Current - Week 1)

  • ✅ Backend API with JWT auth (DONE)
  • ✅ FoundationDB integration (DONE)
  • 🔲 Frontend wrapper (React + Theia embed)
  • 🔲 Basic session management
  • 🔲 Stripe payment integration (registration flow)

Phase 2: Automated Provisioning (Week 2)

  • 🔲 Kubernetes operator (Rust)
  • 🔲 Automated namespace creation
  • 🔲 RBAC automation
  • 🔲 PVC provisioning
  • 🔲 Workspace pod deployment

Phase 3: CI/CD & GitOps (Week 3)

  • 🔲 Helm charts for all components
  • 🔲 ArgoCD setup
  • 🔲 Cloud Build pipeline
  • 🔲 Blue-green deployment

Phase 4: Production Ready (Week 4)

  • 🔲 Idle pod cleanup job
  • 🔲 Monitoring (Prometheus + Grafana)
  • 🔲 Logging (Cloud Logging)
  • 🔲 Alerting (PagerDuty integration)
  • 🔲 Beta user onboarding

Conclusion

V4 Provides Solid Foundation:

  • ✅ Multi-tenant FDB schema (proven, scalable)
  • ✅ Authentication & authorization (JWT + OAuth + RBAC)
  • ✅ Billing integration (Stripe + quotas)
  • ✅ Session management patterns
  • ✅ Workspace tracking (user → pod assignments)

V5 Needs Automation:

  • ❌ Automated pod provisioning (Kubernetes operator)
  • ❌ Helm charts + ArgoCD (GitOps)
  • ❌ CI/CD pipeline (Cloud Build → ArgoCD)
  • ❌ Idle pod cleanup (CronJob)
  • ❌ Blue-green deployments (zero-downtime)

Next Steps:

  1. Build provisioning controller (Rust Kubernetes operator)
  2. Create Helm charts for all components
  3. Set up ArgoCD for GitOps
  4. Implement CI/CD pipeline (Cloud Build)
  5. Add idle pod cleanup CronJob
  6. Configure blue-green deployment strategy

Timeline: 4 weeks to production-ready MVP with full automation.


Document Status: ✅ Complete
Last Updated: 2025-10-07
Next Review: After Phase 1 completion