V5 FoundationDB Schema and ADR Analysis

Date: 2025-10-07
Purpose: Comprehensive analysis of V4 database models and ADRs to inform V5 multi-tenant architecture
Status: Active Reference Document

Executive Summary

This document analyzes V4's FoundationDB schema and Architecture Decision Records (ADRs) to extract proven patterns for V5's multi-tenant, multi-session, multi-LLM IDE with automated pod provisioning.

Key Findings:

✅ Multi-tenant patterns are battle-tested - V4 uses tenant isolation extensively
✅ Session management exists - V4 has WebSocket session tracking with JWT auth
✅ License/subscription system ready - Stripe integration, quota tracking, usage monitoring
✅ Workspace pod allocation exists - V4 tracks user → pod assignments
⚠️ V4 lacks automated provisioning - Pods were created manually
⚠️ No Helm/ArgoCD - V4 used direct kubectl deployments

FoundationDB Key Schema
Core Data Models
ADR Summary
Multi-Tenant Architecture
Session Management
User Authentication & Authorization
License & Billing
Workspace Pod Management
Gaps for V5
Recommendations for V5

FoundationDB Key Schema

V4 Key Structure (Proven Patterns)

FoundationDB Key Hierarchy (UTF-8 encoded strings)
├── users/
│   ├── {user_id}                           → User record
│   ├── by_email/{email}                    → Email → user_id index
│   └── {user_id}/
│       ├── tenants/                        → User's tenant associations
│       │   └── {tenant_id}                 → UserTenantAssociation
│       ├── session/{session_id}            → Active user session (JWT)
│       └── apikey/{key_id}                 → API key metadata
│
├── tenants/
│   ├── {tenant_id}                         → Tenant record
│   ├── by_domain/{domain}                  → Domain → tenant_id index
│   └── {tenant_id}/
│       ├── users/                          → Reverse index: tenant → users
│       │   └── {user_id}                   → User ID reference
│       ├── sessions/                       → All sessions in tenant
│       │   └── {session_id}                → Session metadata
│       └── workspaces/                     → workspace assignments
│           └── {user_id}                   → workspaceAssignment
│
├── sessions/
│   ├── {session_id}                        → UserSession record (JWT metadata)
│   └── {session_id}/
│       ├── metadata                        → Session info, timestamps
│       ├── editor-tabs/                    → Open files, positions
│       ├── llm-messages/                   → Conversation history
│       └── config/                         → Session-specific settings
│
├── workspaces/
│   └── {assignment_id}                     → workspaceAssignment (user → pod mapping)
│
├── pods/
│   └── {pod_name}/                         → PodAllocation (load balancing)
│       ├── users/                          → Active users in pod
│       └── health                          → Last health check
│
├── licenses/
│   ├── {user_id}                           → UserLicense (current license)
│   ├── history/{history_id}                → LicenseHistory (audit log)
│   └── usage/{tenant_id}/{month}           → TenantUsage (monthly tracking)
│
├── quotas/
│   └── {user_id}/{period}                  → UserQuota (daily/monthly limits)
│
├── apikeys/
│   └── {key_id}                            → APIKey record
│
├── files/
│   └── {file_path}/
│       ├── content                         → File contents
│       ├── metadata                        → Language, encoding, timestamps
│       └── versions/                       → Version history
│
├── settings/
│   ├── global/                             → Global preferences
│   └── workspace/                          → workspace settings
│
├── models/
│   └── {model_id}/                         → LLM model configurations
│
└── dspy_prompts/                           → DSPy optimization cache
    └── {tenant_id}/{user_id}/{task_type}/{signature} → DspyPromptCache

Key Design Principles (from ADR-004):

Hierarchical Keys: Slash-separated paths for readability and range queries
Secondary Indexes: Dedicated keys for lookups (email, domain, OAuth)
UTF-8 Encoding: Human-readable keys for debugging
Bidirectional Relationships: Both users/{id}/tenants AND tenants/{id}/users
Atomic Transactions: All related updates in single FDB transaction
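
As a rough illustration of the hierarchical-key and secondary-index principles, the helpers below build a few of the keys from the tree above and compute the bounds of a prefix range scan. This is a sketch, not V4 code: the function names are hypothetical, and `prefixRange` assumes a non-empty ASCII prefix.

```typescript
// Hypothetical key builders mirroring the slash-separated layout shown above.
const userKey = (userId: string): string => `users/${userId}`;
const emailIndexKey = (email: string): string => `users/by_email/${email}`;

// Both directions of the user <-> tenant relationship.
const userTenantKey = (userId: string, tenantId: string): string =>
  `users/${userId}/tenants/${tenantId}`;
const tenantUserKey = (tenantId: string, userId: string): string =>
  `tenants/${tenantId}/users/${userId}`;

// Half-open range [begin, end) that covers every key starting with `prefix`.
// The end bound is the prefix with its last character incremented by one.
function prefixRange(prefix: string): { begin: string; end: string } {
  if (prefix.length === 0) throw new Error("prefix must not be empty");
  const last = prefix.charCodeAt(prefix.length - 1);
  const end = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return { begin: prefix, end };
}

// All users of tenant "t1" fall inside this range.
const usersOfTenant = prefixRange("tenants/t1/users/");
console.log(userKey("u1"), usersOfTenant);
```

Because every child key shares its parent's path as a prefix, a single range read over `[begin, end)` returns a whole subtree, which is what makes the slash-separated layout convenient for "all sessions in tenant" style queries.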

Core Data Models

1. User & Tenant Models

// Core user record (multi-tenant capable)
pub struct User {
    pub user_id: Uuid,
    pub email: String,
    pub first_name: String,
    pub last_name: String,
    pub is_active: bool,
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
    pub password_hash: String,           // Argon2
    pub primary_tenant_id: Uuid,         // Self-tenant (deterministic UUID v5)
}

// Tenant (organization/workspace)
pub struct Tenant {
    pub tenant_id: Uuid,
    pub name: String,
    pub domain: Option<String>,          // Custom domain for SSO
    pub is_active: bool,
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}

// User-Tenant association with roles
pub struct UserTenantAssociation {
    pub user_id: Uuid,
    pub tenant_id: Uuid,
    pub role: String,                    // "owner", "admin", "member", "viewer"
    pub company: Option<String>,
    pub joined_at: DateTime<Utc>,
    pub invited_by: Option<Uuid>,
    pub is_active: bool,
}

Self-Tenant Pattern (Critical for Multi-Tenancy):

Every user gets a personal "self-tenant" on registration
Self-tenant ID is deterministic: UUID v5(namespace, user_id)
User is "owner" of their self-tenant
Users can join multiple organization tenants with different roles
All data is scoped to tenant_id, which is the basis for tenant isolation
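
The registration step implied by the self-tenant pattern can be sketched as a function that lists every key a new user needs, so that the caller can commit them in one transaction. This is an assumption-laden sketch: `registrationWrites` and the `Write` type are hypothetical, the self-tenant ID is passed in rather than derived (the UUID v5 derivation is not shown), and using the email as the tenant name is a placeholder choice.

```typescript
// One key/value pair to be written inside a single transaction.
type Write = { key: string; value: unknown };

// Lists every write that registering a user implies under the V4 key layout.
// selfTenantId is assumed to have been derived elsewhere as UUID v5(namespace, user_id).
function registrationWrites(
  userId: string,
  email: string,
  selfTenantId: string,
  nowIso: string
): Write[] {
  const association = {
    user_id: userId,
    tenant_id: selfTenantId,
    role: "owner", // the user owns their self-tenant
    joined_at: nowIso,
    is_active: true,
  };
  return [
    { key: `users/${userId}`, value: { user_id: userId, email, primary_tenant_id: selfTenantId } },
    { key: `users/by_email/${email}`, value: userId }, // secondary index
    { key: `tenants/${selfTenantId}`, value: { tenant_id: selfTenantId, name: email } },
    { key: `users/${userId}/tenants/${selfTenantId}`, value: association }, // user -> tenant
    { key: `tenants/${selfTenantId}/users/${userId}`, value: userId }, // tenant -> user
  ];
}
```

Returning the writes as data keeps the function pure and makes it easy to check that both directions of the relationship are always written together.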

2. Session Models

// From ADR-007 (Multi-Session Architecture)
interface Session {
  id: string;                    // Unique session ID
  name: string;                  // User-defined name
  icon?: string;                 // Optional icon
  createdAt: Date;
  updatedAt: Date;

  // State references
  editorState: {
    tabs: editorTab[];
    activeTabId: string | null;
    scrollPosition: number;
  };

  llmState: {
    messages: Message[];
    primaryModel: string | null;
    secondaryModel: string | null;
    mode: WorkflowMode;           // 'single' | 'parallel' | 'sequential' | 'consensus'
    config: llmConfig;
  };

  fileState: {
    workspaceRoot: string;
    expandedFolders: string[];
    selectedFile: string | null;
  };

  terminalState: {
    cwd: string;
    history: string[];
    buffer: string;
  };

  // Metadata
  isDirty: boolean;              // Unsaved changes
  isActive: boolean;             // Currently active
  order: number;                 // Tab order
}

// Backend session tracking (JWT metadata)
pub struct UserSession {
    pub session_id: String,       // UUID
    pub user_id: String,
    pub access_token: String,     // JWT (15 min)
    pub refresh_token: String,    // JWT (7 days)
    pub ip_address: String,
    pub user_agent: String,
    pub created_at: i64,
    pub expires_at: i64,
    pub last_activity_at: i64,
    pub is_active: bool,
}

Session Management Pattern:

Frontend Sessions: Browser tabs with separate editor/LLM/terminal state (Zustand store)
Backend Sessions: JWT-tracked authentication sessions (FoundationDB)
Auto-save: 500ms after the last change (debounced)
Persistence: Sessions survive browser restarts (loaded from FDB)
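
The debounced auto-save can be sketched independently of React or timers by tracking the time of the last change and asking whether a save is due. The `SaveDebouncer` class below is hypothetical; only the 500ms delay comes from the text above.

```typescript
// Tracks the time of the most recent change and reports when a save is due.
class SaveDebouncer {
  private lastChangeAt: number | null = null;
  private readonly delayMs: number;

  constructor(delayMs: number = 500) {
    this.delayMs = delayMs;
  }

  // Call on every state change; each call restarts the quiet period.
  markChanged(nowMs: number): void {
    this.lastChangeAt = nowMs;
  }

  // True once delayMs have elapsed since the last change.
  isSaveDue(nowMs: number): boolean {
    return this.lastChangeAt !== null && nowMs - this.lastChangeAt >= this.delayMs;
  }

  // Call after the save has been written.
  markSaved(): void {
    this.lastChangeAt = null;
  }
}
```

Passing the clock in as a parameter keeps the logic testable; a burst of edits keeps pushing the save back, and only the state after the last edit is written.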

3. Workspace Pod Models

// User → Pod assignment tracking
pub struct workspaceAssignment {
    pub id: Uuid,
    pub user_id: Uuid,
    pub tenant_id: Uuid,
    pub pod_name: String,            // e.g., "workspace-abc123"
    pub namespace: String,            // e.g., "user-{user_id}"
    pub assigned_at: DateTime<Utc>,
    pub last_active: DateTime<Utc>,
    pub status: workspaceStatus,      // Active | Idle | Suspended | Terminating | Failed
    pub resource_usage: ResourceUsage,
}

pub struct ResourceUsage {
    pub cpu_cores: f32,
    pub memory_mb: u64,
    pub storage_gb: f32,
    pub network_gb: f32,
}

// Pod allocation for load balancing
pub struct PodAllocation {
    pub pod_name: String,
    pub namespace: String,
    pub total_users: u32,
    pub active_users: u32,
    pub cpu_allocated: f32,
    pub memory_allocated_mb: u64,
    pub last_health_check: DateTime<Utc>,
    pub is_available: bool,
}

// workspace activity for usage tracking
pub struct workspaceActivity {
    pub id: Uuid,
    pub workspace_id: Uuid,
    pub user_id: Uuid,
    pub activity_type: ActivityType,  // terminalSession | FileOperation | CodeExecution | BuildProcess | Idle
    pub timestamp: DateTime<Utc>,
    pub duration_seconds: Option<u64>,
    pub metadata: Option<serde_json::Value>,
}

Workspace Pattern (V4 Manual, V5 Automated):

Each user gets a dedicated Kubernetes pod (Theia IDE container)
The pod is provisioned in a per-user namespace
FDB tracks the user → pod assignment
PodAllocation records track load across the cluster
Activity is logged for billing and idle detection
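
How V4 chose a pod from its PodAllocation records is not described here. One plausible reading, sketched below under that assumption, is to pick the available pod with the fewest active users. The `pickLeastLoadedPod` function is hypothetical, and the interface keeps only the PodAllocation fields the selection needs.

```typescript
// Subset of the PodAllocation fields used for selection.
interface PodAllocation {
  pod_name: string;
  namespace: string;
  active_users: number;
  is_available: boolean;
}

// Returns the available pod with the fewest active users, or null if none is available.
function pickLeastLoadedPod(pods: PodAllocation[]): PodAllocation | null {
  let best: PodAllocation | null = null;
  for (const pod of pods) {
    if (!pod.is_available) continue; // skip pods that failed their health check
    if (best === null || pod.active_users < best.active_users) {
      best = pod;
    }
  }
  return best;
}
```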

4. License & Billing Models

// User's current license
pub struct UserLicense {
    pub user_id: Uuid,
    pub license_id: Uuid,
    pub assigned_at: DateTime<Utc>,
    pub previous_license_id: Option<Uuid>,
}

// License change audit trail
pub struct LicenseHistory {
    pub id: Uuid,
    pub user_id: Uuid,
    pub license_id: Uuid,
    pub action: LicenseAction,       // Created | Upgraded | Downgraded | Renewed | Expired | Cancelled | PaymentFailed
    pub reason: String,
    pub metadata: serde_json::Value,
    pub created_at: DateTime<Utc>,
}

// Monthly usage tracking per tenant
pub struct TenantUsage {
    pub tenant_id: Uuid,
    pub month: String,               // "2025-10"
    pub projects_count: i32,
    pub agents_count: i32,
    pub storage_gb: f32,
    pub workspace_hours: f32,        // Total pod runtime
    pub api_calls: i64,
    pub last_updated: DateTime<Utc>,
}

// License enforcement
pub struct TenantLicensePolicy {
    pub tenant_id: Uuid,
    pub owner_user_id: Uuid,
    pub license_type: LicenseType,   // Free | Starter | Pro | Enterprise
    pub enforcement_mode: EnforcementMode,  // Strict | Warning | GracePeriod
    pub override_limits: Option<serde_json::Value>,
    pub updated_at: DateTime<Utc>,
}

// Real-time usage events
pub struct UsageEvent {
    pub id: Uuid,
    pub tenant_id: Uuid,
    pub user_id: Uuid,
    pub resource_type: ResourceType,  // Projects | Agents | Storage | workspaceHours
    pub operation: UsageOperation,
    pub amount: f32,
    pub timestamp: DateTime<Utc>,
    pub metadata: Option<serde_json::Value>,
}

Billing Integration (from ADR-021):

Stripe for payment processing
License tiers: Free, Starter ($29/mo), Pro ($99/mo), Enterprise (custom)
Quotas enforced via middleware (before LLM calls, file operations)
Usage tracked in real-time → synced to Stripe monthly
Grace period enforcement for upgrades
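
Monthly usage records are keyed by a "YYYY-MM" string (see TenantUsage.month and the licenses/usage/{tenant_id}/{month} key). A small sketch of deriving that key follows; the function names are hypothetical, and using UTC for the month boundary is an assumption rather than something the source states.

```typescript
// Formats a date as the "YYYY-MM" string used by TenantUsage.month (UTC assumed).
function monthKey(date: Date): string {
  const year = date.getUTCFullYear();
  const month = String(date.getUTCMonth() + 1).padStart(2, "0");
  return `${year}-${month}`;
}

// Key under which a tenant's usage for that month would be stored.
function tenantUsageKey(tenantId: string, date: Date): string {
  return `licenses/usage/${tenantId}/${monthKey(date)}`;
}
```

Zero-padding the month keeps the keys in chronological order under FoundationDB's lexicographic key ordering, so a range read over one tenant's usage returns months in sequence.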

5. Authentication Models

// From ADR-021 (User Management & Authentication)

interface User {
  user_id: string;              // UUID
  email: string;                // Unique
  password_hash?: string;       // Argon2 (absent for OAuth-only users)
  name: string;
  avatar_url?: string;
  role: 'admin' | 'developer' | 'viewer';
  oauth_providers: Array<{
    provider: 'google' | 'github';
    provider_user_id: string;
    access_token?: string;      // Encrypted
  }>;
  created_at: number;           // Timestamp
  updated_at: number;
  email_verified: boolean;
  is_active: boolean;
  metadata: Record<string, any>;
}

interface APIKey {
  key_id: string;               // UUID
  user_id: string;
  key_hash: string;             // bcrypt of actual key
  key_prefix: string;           // First 8 chars for display (e.g., "ak_12ab...")
  name: string;                 // User-defined name
  permissions: string[];        // ['read:files', 'write:files', 'llm:chat']
  created_at: number;
  expires_at?: number;
  last_used_at?: number;
  is_active: boolean;
}

interface UserQuota {
  user_id: string;
  period: 'daily' | 'monthly';
  period_start: number;

  // Usage
  llm_requests: number;
  llm_tokens_used: number;
  llm_cost_usd: number;
  storage_bytes: number;
  api_requests: number;

  // Limits
  llm_requests_limit: number;
  llm_tokens_limit: number;
  llm_cost_limit_usd: number;
  storage_bytes_limit: number;
  api_requests_limit: number;
}

Auth Pattern (JWT + OAuth + API Keys):

JWT: Stateless authentication (15-min access token, 7-day refresh token)
OAuth 2.0: Google + GitHub social login (reduces password breach risk)
API Keys: For programmatic access (CLI, CI/CD)
RBAC: Role-based permissions (admin/developer/viewer)
Quotas: Rate limiting and usage enforcement
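
For the API-key path, a permission check might look like the sketch below. It uses the `permissions`, `is_active` and `expires_at` fields from the APIKey interface above; the `apiKeyAllows` function is hypothetical, and treating `expires_at` as epoch milliseconds is an assumption.

```typescript
// Subset of the APIKey fields needed for an authorization decision.
interface APIKeyRecord {
  permissions: string[]; // e.g. ['read:files', 'write:files', 'llm:chat']
  is_active: boolean;
  expires_at?: number; // assumed to be epoch milliseconds
}

// True if the key is active, not expired, and carries the exact permission string.
function apiKeyAllows(key: APIKeyRecord, permission: string, nowMs: number): boolean {
  if (!key.is_active) return false;
  if (key.expires_at !== undefined && key.expires_at <= nowMs) return false;
  return key.permissions.includes(permission);
}
```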

ADR Summary

Critical ADRs for V5

ADR	Decision	Impact on V5	Status
ADR-004	Use FoundationDB for Persistence	✅ KEEP - Multi-tenant key schema proven	Reuse patterns
ADR-007	Multi-Session Architecture	✅ KEEP - Browser tab sessions with FDB persistence	Implement
ADR-014	Use Eclipse Theia as Foundation	✅ KEEP - VS Code-like IDE in browser	Active
ADR-017	WebSocket Backend Architecture	✅ KEEP - Sidecar pattern for Theia ↔ Backend	Implement
ADR-020	GCP Deployment	✅ KEEP - GKE + Cloud Run infrastructure	Active
ADR-021	User Management & Authentication	✅ KEEP - JWT + OAuth + Stripe billing	Implement
ADR-022	Audit Logging Architecture	✅ KEEP - Compliance tracking (SOC2/GDPR)	Implement

ADR-004: FoundationDB (Detailed)

Why FoundationDB over alternatives:

✅ ACID Transactions: Serializable isolation (critical for multi-tenant)
✅ Sub-10ms Latency: Fast session/file operations
✅ Horizontal Scaling: Millions of ops/sec
✅ Multi-Model: Key-value core + document/graph layers
✅ Watch API: Real-time updates for collaborative features
✅ Fault Tolerance: Self-healing, automatic replication

Rejected Alternatives:

❌ PostgreSQL: Slower, less scalable for key-value workloads
❌ MongoDB: Eventual consistency too weak
❌ Redis: Not durable enough for primary storage
❌ IndexedDB: Browser-only, no multi-client sync

Data Model:

/az1ai-ide/
├── sessions/{session-id}/
│   ├── metadata       # Session info, timestamps
│   ├── editor-tabs/   # Open files, positions
│   ├── llm-messages/  # Conversation history
│   └── config/        # Session-specific settings
├── files/{file-path}/
│   ├── content        # File contents
│   ├── metadata       # Language, encoding, timestamps
│   └── versions/      # Version history
├── settings/
│   ├── global/        # Global preferences
│   └── workspace/     # workspace settings
└── models/{model-id}/ # Model configurations

ADR-007: Multi-Session Architecture

Decision: Tab-based sessions (like browser tabs)

Benefits:

✅ Work on multiple projects simultaneously
✅ Separate LLM conversations per session
✅ Independent editor/terminal contexts
✅ Persistent across browser restarts
✅ Keyboard shortcuts (Cmd+1-9 for session switching)

Session Lifecycle:

Create: Generate session ID, initialize state
Auto-save: Debounced save 500ms after the last change
Switch: Save current, load target session
Close: Prompt for unsaved changes, delete from FDB

Session Isolation:

Each session has isolated editor tabs, LLM messages, terminal state
Sessions saved to FDB with prefix: sessions/{session-id}/
OPFS cache for offline mode (FDB is source of truth)

ADR-014: Eclipse Theia Foundation

Decision: Use the Theia framework (rather than building an IDE from scratch)

Why Theia:

✅ EPL 2.0 License: Free commercial use (no license fees)
✅ VS Code Compatible: Runs VS Code extensions
✅ Monaco Editor: Same editor as VS Code
✅ Saves 6-12 months: Don't rebuild file explorer, terminal, settings
✅ Dependency Injection: Clean architecture with InversifyJS
✅ Multi-Language Support: TypeScript, Python, Rust out-of-box

Theia vs VS Code:

Feature	VS Code	Theia
License	MIT	EPL 2.0
Browser Support	❌	✅
Extension API	Full	Compatible
Customization	Limited	Full (framework)
Hosting	Desktop only	Cloud-native

ADR-017: WebSocket Backend Architecture

Decision: Sidecar pattern for Theia ↔ Backend communication

Architecture:

┌─────────────────────────────────────────┐
│          workspace Pod                  │
│  ┌──────────────┐  ┌─────────────────┐ │
│  │ theia IDE    │  │ WebSocket       │ │
│  │ (Port 3000)  │←→│ Sidecar         │ │
│  │              │  │ (Port 8080)     │ │
│  └──────────────┘  └─────────────────┘ │
│         ↓                    ↓          │
│    localhost:3000      localhost:8080  │
└─────────────────────────────────────────┘
                   ↓
            ┌─────────────┐
            │ Auth Backend│
            │ (JWT verify)│
            └─────────────┘
                   ↓
            ┌─────────────┐
            │ FoundationDB│
            └─────────────┘

Why Sidecar:

✅ Localhost Communication: No network latency
✅ Security: WebSocket gateway validates JWT before forwarding
✅ Simplicity: Theia doesn't need an FDB client
✅ Isolation: Each pod has dedicated sidecar

ADR-020: GCP Deployment

Infrastructure (Already deployed):

✅ GKE Cluster: codi-poc-e2-cluster (us-central1-a)
✅ FoundationDB: 3-node StatefulSet (10.56.x.x:4500)
✅ Container Registry: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect
✅ Domain: coditect.ai (34.8.51.57, Google-managed SSL)
✅ CI/CD: Cloud Build pipelines

Deployment Strategy:

Backend API: Cloud Run (stateless, auto-scaling)
workspace Pods: GKE (stateful, per-user pods)
FoundationDB: GKE StatefulSet (persistent storage)
Ingress: NGINX (load balancer + SSL termination)

ADR-021: User Management & Authentication

Authentication Methods:

Email/Password: Argon2 password hashing
Google OAuth: Social login (reduces friction)
GitHub OAuth: Developer-friendly auth
API Keys: For CLI/API access (bcrypt hashed)

JWT Strategy:

Access Token: 15 minutes (short-lived for security)
Refresh Token: 7 days (long-lived, revocable)
Claims: user_id, email, role, tenant_id

RBAC Roles:

const ROLES = {
  admin: {
    permissions: [{ resource: '*', actions: ['*'] }]  // Full access
  },
  developer: {
    permissions: [
      { resource: 'files', actions: ['read', 'write', 'delete'] },
      { resource: 'sessions', actions: ['read', 'write', 'delete'] },
      { resource: 'llm', actions: ['read', 'write'] },
      { resource: 'agents', actions: ['read', 'write'] },
      { resource: 'users', actions: ['read'] },  // Own profile only
    ]
  },
  viewer: {
    permissions: [
      { resource: 'files', actions: ['read'] },
      { resource: 'sessions', actions: ['read'] },
      { resource: 'llm', actions: ['read'] },
    ]
  }
};

Quota System:

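// Default monthly limits. These are values, so they belong in a const object
// rather than an interface (an interface cannot hold expressions such as 10 * 1024).
const DEFAULT_QUOTA_LIMITS = {
  llm_requests_limit: 10000,      // 10K requests/month
  llm_tokens_limit: 10000000,     // 10M tokens/month
  llm_cost_limit_usd: 100,        // $100/month
  storage_bytes_limit: 10 * 1024 * 1024 * 1024,  // 10GB
  api_requests_limit: 100000,     // 100K API requests/month
};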

Multi-Tenant Architecture

Tenant Isolation Strategy

Key Principle: Every data access is scoped by tenant_id

FDB Key Prefixes:

tenant_id/resource_type/resource_id/...

Examples:
- 123e4567/sessions/abc-session-id/metadata
- 123e4567/files/src/main.ts/content
- 123e4567/workspaces/user-789/assignment

Security Enforcement:

// Middleware extracts tenant_id from JWT
pub async fn require_tenant_access(
    req: &Request,
    required_tenant_id: &Uuid,
) -> Result<(), AuthError> {
    let user = req.user()?;

    // Check if user belongs to tenant
    let association = UserTenantRepository::get_association(
        &user.user_id,
        required_tenant_id
    ).await?;

    if !association.is_active {
        return Err(AuthError::Forbidden);
    }

    Ok(())
}

Benefits:

✅ Strong Isolation: Tenant-prefixed keys plus the membership check above keep each request inside the caller's tenant
✅ Range Queries: Fetch all sessions/files for tenant
✅ Efficient: Single FDB transaction per operation
✅ Auditable: All access logged with tenant context

Session Management

Session Types

1. Browser UI Sessions (ADR-007):

Tab-based workspaces in the Theia frontend
Persist to FDB: sessions/{session-id}/
Auto-save 500ms after the last change (debounced)
Keyboard shortcuts: Cmd+1-9 for switching

2. Backend Auth Sessions (ADR-021):

JWT-tracked authentication sessions
Stored in FDB: user:sessions/{session-id}
Track: IP, user agent, last activity
Expire after 7 days or on forced logout

3. WebSocket Sessions (ADR-017):

Real-time connection to backend
Validated with JWT on connect
Heartbeat every 30 seconds
Auto-reconnect on disconnect
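
The heartbeat and auto-reconnect behaviour can be sketched as two small timing functions. Only the 30-second heartbeat interval comes from the text above; the two-interval staleness tolerance and the exponential backoff values (1s doubling up to 30s) are assumptions for illustration, as are the function names.

```typescript
const HEARTBEAT_INTERVAL_MS = 30_000; // from ADR-017: heartbeat every 30 seconds

// Treats the connection as stale if no heartbeat reply arrived within
// two intervals (assumed tolerance).
function isConnectionStale(lastPongMs: number, nowMs: number): boolean {
  return nowMs - lastPongMs > 2 * HEARTBEAT_INTERVAL_MS;
}

// Delay before reconnect attempt number `attempt` (0-based):
// 1s, 2s, 4s, ... capped at 30s (assumed values).
function reconnectDelayMs(attempt: number): number {
  return Math.min(1000 * 2 ** attempt, 30_000);
}
```

Capping the backoff keeps a long outage from pushing reconnect attempts arbitrarily far apart while still avoiding a tight retry loop.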

Session Lifecycle

// Create session
const createSession = async (name?: string) => {
  const session: Session = {
    id: nanoid(),
    name: name || `Session ${sessions.length + 1}`,
    createdAt: new Date(),
    updatedAt: new Date(),
    editorState: { tabs: [], activeTabId: null, scrollPosition: 0 },
    llmState: {
      messages: [],
      primaryModel: null,
      secondaryModel: null,
      mode: 'single',
      config: defaultllmConfig
    },
    fileState: {
      workspaceRoot: '/',
      expandedFolders: [],
      selectedFile: null
    },
    terminalState: {
      cwd: '~',
      history: [],
      buffer: ''
    },
    isDirty: false,
    isActive: true,
    order: sessions.length
  };

  await fdbService.saveSession(session);
  return session;
};

// Auto-save on changes
useEffect(() => {
  if (!activeSession) return;

  const saveTimeout = setTimeout(() => {
    fdbService.saveSession(activeSession);
  }, 500);  // Debounced 500ms

  return () => clearTimeout(saveTimeout);
}, [activeSession]);

// Load on mount
useEffect(() => {
  const restoreLastSession = async () => {
    const lastSessionId = localStorage.getItem('lastActiveSession');
    if (lastSessionId) {
      const session = await fdbService.loadSession(lastSessionId);
      setActiveSession(session);
    }
  };

  restoreLastSession();
}, []);

User Authentication & Authorization

JWT Token Flow

1. User Login (Email/Password or OAuth)
   ↓
2. Backend Validates Credentials
   ↓
3. Generate Tokens:
   - Access Token (15 min, contains: user_id, email, role, tenant_id)
   - Refresh Token (7 days, contains: user_id, type='refresh')
   ↓
4. Store Session in FDB:
   - user:sessions/{session_id}
   - Contains: IP, user agent, tokens, timestamps
   ↓
5. Return Tokens to Frontend
   ↓
6. Frontend Stores in Memory (NOT localStorage for security)
   ↓
7. All API Requests: Authorization: Bearer <access_token>
   ↓
8. Backend Middleware Validates JWT
   ↓
9. If Expired: Use Refresh Token to Get New Access Token
   ↓
10. If Refresh Expired: Force Re-login
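
Steps 7 to 10 of the flow reduce to a three-way decision on the two expiry times. The sketch below uses the 15-minute and 7-day lifetimes from the flow; the type and function names are hypothetical, and times are in seconds since the epoch, matching the JWT `exp` claim.

```typescript
type TokenAction = "use_access_token" | "refresh" | "relogin";

const ACCESS_TTL_S = 15 * 60; // 15 minutes
const REFRESH_TTL_S = 7 * 24 * 60 * 60; // 7 days

// Expiry times in seconds since the epoch, as in the JWT `exp` claim.
interface TokenTimes {
  accessExp: number;
  refreshExp: number;
}

// Expiry times for a token pair issued at nowS.
function issueTokenTimes(nowS: number): TokenTimes {
  return { accessExp: nowS + ACCESS_TTL_S, refreshExp: nowS + REFRESH_TTL_S };
}

// Decides what the client should do before its next API request.
function nextTokenAction(times: TokenTimes, nowS: number): TokenAction {
  if (nowS < times.accessExp) return "use_access_token";
  if (nowS < times.refreshExp) return "refresh";
  return "relogin";
}
```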

Permission Checking

// Middleware checks permissions before allowing operation
async function checkPermission(
  user: User,
  resource: string,
  action: string
): Promise<boolean> {
  const role = ROLES[user.role];
  if (!role) return false;

  for (const perm of role.permissions) {
    if (perm.resource === '*' || perm.resource === resource) {
      if (perm.actions.includes('*') || perm.actions.includes(action)) {
        return true;
      }
    }
  }

  return false;
}

// Usage in route handler
app.post('/api/files/save', authenticate, async (req, res) => {
  const hasPermission = await checkPermission(req.user, 'files', 'write');
  if (!hasPermission) {
    return res.status(403).json({ error: 'Permission denied' });
  }

  // ... save file
});

License & Billing

Stripe Integration

License Tiers:

const LICENSE_TIERS = {
  free: {
    price: 0,
    limits: {
      llm_requests: 100,       // 100/month
      storage_gb: 1,            // 1GB
      workspace_hours: 10,      // 10 hours/month
      agents: 1,
      projects: 1,
    }
  },
  starter: {
    price: 29,                  // $29/month
    stripe_price_id: 'price_abc123',
    limits: {
      llm_requests: 10000,
      storage_gb: 10,
      workspace_hours: 100,
      agents: 5,
      projects: 10,
    }
  },
  pro: {
    price: 99,                  // $99/month
    stripe_price_id: 'price_def456',
    limits: {
      llm_requests: 100000,
      storage_gb: 100,
      workspace_hours: 500,
      agents: -1,               // Unlimited
      projects: -1,             // Unlimited
    }
  },
  enterprise: {
    price: null,                // Custom pricing
    limits: {
      llm_requests: -1,
      storage_gb: -1,
      workspace_hours: -1,
      agents: -1,
      projects: -1,
    }
  }
};

Quota Enforcement:

// Before LLM call
// Keys follow the quotas/{user_id}/{period} layout from the key schema
async function checkllmQuota(user_id: string): Promise<boolean> {
  const quota = await fdbService.get(`quotas/${user_id}/monthly`);
  return quota.llm_requests < quota.llm_requests_limit;
}

// After LLM call
// Note: this is a read-modify-write. Run it inside a single FDB transaction
// so concurrent requests cannot overwrite each other's increments.
async function incrementllmUsage(user_id: string, tokens: number): Promise<void> {
  const quota = await fdbService.get(`quotas/${user_id}/monthly`);
  quota.llm_requests += 1;
  quota.llm_tokens_used += tokens;
  quota.llm_cost_usd += tokens * 0.00001;  // Example: $0.01 per 1K tokens
  await fdbService.set(`quotas/${user_id}/monthly`, quota);
}
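
The LICENSE_TIERS table uses -1 to mean "unlimited", which a plain `used < limit` comparison would treat as always exceeded. If the same convention carries over to quota limits (an assumption), the comparison needs a guard along these lines. The helper names are hypothetical.

```typescript
const UNLIMITED = -1; // convention used in LICENSE_TIERS

// True if one more unit of usage is allowed under the limit.
function isWithinLimit(used: number, limit: number): boolean {
  return limit === UNLIMITED || used < limit;
}

// Units left before the limit is reached; Infinity for unlimited tiers.
function remainingQuota(used: number, limit: number): number {
  return limit === UNLIMITED ? Infinity : Math.max(0, limit - used);
}
```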

Webhook Handler (Stripe → Backend):

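// Stripe signs the raw request body, so this route needs the unparsed bytes:
// use express.raw() here rather than a global express.json() parser.
app.post('/webhooks/stripe', express.raw({ type: 'application/json' }), async (req, res) => {
  let event;
  try {
    event = stripe.webhooks.constructEvent(
      req.body,
      req.headers['stripe-signature'],
      process.env.STRIPE_WEBHOOK_SECRET
    );
  } catch (err) {
    // Signature verification failed: reject without processing the payload
    return res.status(400).send(`Webhook Error: ${err.message}`);
  }

  switch (event.type) {
    case 'customer.subscription.created':
      await handleSubscriptionCreated(event.data.object);
      break;
    case 'customer.subscription.updated':
      await handleSubscriptionUpdated(event.data.object);
      break;
    case 'customer.subscription.deleted':
      await handleSubscriptionCancelled(event.data.object);
      break;
    case 'invoice.payment_failed':
      await handlePaymentFailed(event.data.object);
      break;
  }

  res.json({ received: true });
});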
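
Each webhook event presumably ends up as a LicenseHistory entry. The mapping below is one way to connect the four Stripe event types handled above to the LicenseAction values listed in the data model. It is a sketch under assumptions: the function name is hypothetical, and classifying an update by comparing old and new prices (with an unchanged price treated as a renewal) is an illustrative rule, not something the source specifies.

```typescript
type LicenseAction =
  | "Created" | "Upgraded" | "Downgraded" | "Renewed"
  | "Expired" | "Cancelled" | "PaymentFailed";

// Maps a Stripe event type to a LicenseAction, or null for unhandled events.
// oldPrice/newPrice are only consulted for subscription updates.
function licenseActionFor(
  eventType: string,
  oldPrice: number = 0,
  newPrice: number = 0
): LicenseAction | null {
  switch (eventType) {
    case "customer.subscription.created":
      return "Created";
    case "customer.subscription.updated":
      if (newPrice > oldPrice) return "Upgraded";
      if (newPrice < oldPrice) return "Downgraded";
      return "Renewed"; // assumed: same price means a renewal
    case "customer.subscription.deleted":
      return "Cancelled";
    case "invoice.payment_failed":
      return "PaymentFailed";
    default:
      return null;
  }
}
```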

Workspace Pod Management

V4 Pattern (Manual Provisioning)

// User → Pod assignment (tracked in FDB)
pub struct workspaceAssignment {
    pub id: Uuid,
    pub user_id: Uuid,
    pub tenant_id: Uuid,
    pub pod_name: String,            // "workspace-abc123"
    pub namespace: String,            // "user-{user_id}"
    pub assigned_at: DateTime<Utc>,
    pub last_active: DateTime<Utc>,
    pub status: workspaceStatus,
    pub resource_usage: ResourceUsage,
}

// Load balancing across pods
pub struct PodAllocation {
    pub pod_name: String,
    pub namespace: String,
    pub total_users: u32,
    pub active_users: u32,
    pub cpu_allocated: f32,
    pub memory_allocated_mb: u64,
    pub last_health_check: DateTime<Utc>,
    pub is_available: bool,
}

V4 Limitations:

❌ Pods were manually created via kubectl apply
❌ No automated provisioning on user signup
❌ No automated cleanup of idle pods
❌ No Helm charts or GitOps (ArgoCD)

Gaps for V5

What V4 Had (Keep)

✅ Multi-tenant FDB schema - Proven and scalable
✅ JWT authentication - Industry standard
✅ Stripe billing - Payment processing ready
✅ License/quota system - Usage tracking and enforcement
✅ Session management - Multi-session architecture
✅ workspace tracking - User → Pod assignments in FDB

What V4 Lacked (Need for V5)

❌ Automated Pod Provisioning - V4 required manual kubectl apply
❌ Kubernetes Operator - No controller for watching user signups
❌ Helm Charts - No templated deployments
❌ ArgoCD/GitOps - No declarative deployment pipeline
❌ Auto-Scaling - Pods not auto-scaled based on load
❌ Idle Pod Cleanup - No automated termination of idle workspaces
❌ RBAC Automation - ServiceAccounts/Roles created manually
❌ PVC Provisioning - Persistent volumes not auto-created
❌ Blue-Green Deploys - No zero-downtime rollout strategy

Recommendations for V5

1. Automated Pod Provisioning System

Architecture:

User Registration → Backend API → Provisioning Controller → Kubernetes API
                                        ↓
                        Creates: Namespace + RBAC + PVC + Pod
                                        ↓
                        Stores: workspaceAssignment in FDB
                                        ↓
                        Returns: Pod URL to user

Provisioning Controller (Rust Kubernetes Operator):

pub struct ProvisioningController {
    k8s_client: Client,
    fdb_client: Database,
}

impl ProvisioningController {
    pub async fn provision_workspace(&self, user_id: &str, user_email: &str) -> Result<workspaceAssignment> {
        let ns_name = format!("user-{}", user_id);

        // 1. Create namespace
        self.create_namespace(&ns_name).await?;

        // 2. Create RBAC (ServiceAccount, Role, RoleBinding)
        self.create_rbac(&ns_name, user_email).await?;

        // 3. Create PVC (10GB default)
        self.create_pvc(&ns_name, "workspace-pvc", "10Gi").await?;

        // 4. Create workspace pod (Theia + sidecar)
        let pod_name = self.create_workspace_pod(&ns_name, user_id).await?;

        // 5. Wait for pod ready
        self.wait_for_pod_ready(&ns_name, &pod_name, Duration::from_secs(120)).await?;

        // 6. Save assignment to FDB
        let assignment = workspaceAssignment {
            id: Uuid::new_v4(),
            user_id: Uuid::parse_str(user_id)?,
            tenant_id: self.get_user_tenant(user_id).await?,
            pod_name: pod_name.clone(),
            namespace: ns_name.clone(),
            assigned_at: Utc::now(),
            last_active: Utc::now(),
            status: workspaceStatus::Active,
            resource_usage: ResourceUsage::default(),
        };

        self.save_assignment(&assignment).await?;

        Ok(assignment)
    }
}
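
The controller derives the namespace as `user-{user_id}`. Kubernetes requires namespace names to be valid RFC 1123 DNS labels (at most 63 characters, lowercase alphanumerics or '-', starting and ending with an alphanumeric), so it may be worth validating the name before calling the API. A sketch of that check follows; the function name is hypothetical, and lowercasing the ID is an assumption about how mixed-case IDs should be handled.

```typescript
// RFC 1123 DNS label: lowercase alphanumerics or '-', alphanumeric at both ends.
const DNS_LABEL = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

// Builds the per-user namespace name and rejects anything Kubernetes would refuse.
function namespaceForUser(userId: string): string {
  const name = `user-${userId.toLowerCase()}`;
  if (name.length > 63 || !DNS_LABEL.test(name)) {
    throw new Error(`invalid namespace name: ${name}`);
  }
  return name;
}
```

A UUID-based name such as `user-123e4567-e89b-12d3-a456-426614174000` is 41 characters long, comfortably inside the 63-character limit.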

2. Helm Charts for Deployment

Chart Structure:

helm/
├── Chart.yaml
├── values.yaml
├── values-prod.yaml
├── values-staging.yaml
└── templates/
    ├── backend-deployment.yaml
    ├── backend-service.yaml
    ├── workspace-pod-template.yaml     # Template for user pods
    ├── foundationdb-statefulset.yaml
    ├── ingress.yaml
    ├── secrets.yaml
    ├── rbac.yaml
    └── provisioning-controller.yaml

Install Commands:

# Deploy to staging
helm upgrade --install coditect-staging ./helm \
  -f ./helm/values-staging.yaml \
  --namespace coditect-staging \
  --create-namespace

# Deploy to production
helm upgrade --install coditect-prod ./helm \
  -f ./helm/values-prod.yaml \
  --namespace coditect-app \
  --create-namespace

3. ArgoCD GitOps Pipeline

Workflow:

1. Developer commits to main branch
   ↓
2. Cloud Build triggers:
   - Build Docker images (backend, Theia, sidecar)
   - Push to Artifact Registry
   - Update Helm values.yaml with new image tags
   ↓
3. Commit new values.yaml to Git
   ↓
4. ArgoCD watches Git repo
   ↓
5. ArgoCD syncs changes to GKE cluster
   ↓
6. Rolling update (zero-downtime)
   ↓
7. Health checks pass → Deployment complete
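
Step 2's "update Helm values with new image tags" amounts to a line-oriented text substitution. The sketch below shows the idea as a pure function; `setImageTag` is a hypothetical name, and it assumes the values file references images as `<name>:<tag>` at the end of a line.

```typescript
// Replaces everything after "<image>:" up to the end of each matching line
// with the new tag. Like a plain text substitution, it also matches longer
// image names that end in `image`.
function setImageTag(valuesYaml: string, image: string, tag: string): string {
  // Escape regex metacharacters in the image name.
  const escaped = image.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`${escaped}:.*$`, "gm");
  // A replacer function avoids special handling of "$" in the tag.
  return valuesYaml.replace(pattern, () => `${image}:${tag}`);
}
```

Keeping the substitution in a tested function rather than an inline shell one-liner makes it easier to catch accidental matches before the change is committed for ArgoCD to sync.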

ArgoCD Application:

# argocd/coditect-prod.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: coditect-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/coditect-ai/Coditect-v5-multiple-llm-IDE
    targetRevision: main
    path: helm
    helm:
      valueFiles:
        - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: coditect-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

4. Idle Pod Cleanup Job

CronJob (runs every hour):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: idle-pod-cleanup
spec:
  schedule: "0 * * * *"  # Every hour
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure  # Required for Job pods (OnFailure or Never)
          containers:
          - name: cleanup
            image: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/pod-cleanup:latest
            env:
            - name: IDLE_THRESHOLD_HOURS
              value: "2"  # Terminate pods idle for 2+ hours

Cleanup Logic:

async fn cleanup_idle_pods() -> Result<()> {
    let idle_threshold = Duration::hours(2);
    let now = Utc::now();

    // Query FDB for all workspace assignments
    let assignments: Vec<workspaceAssignment> = fdb_client
        .scan("workspaces/")
        .await?;

    for mut assignment in assignments {  // mut: status is updated below
        if assignment.status == workspaceStatus::Active {
            let idle_duration = now - assignment.last_active;

            if idle_duration > idle_threshold {
                // Mark as idle
                assignment.status = workspaceStatus::Idle;
                fdb_client.update_assignment(&assignment).await?;

                // Delete pod
                k8s_client.delete_pod(&assignment.namespace, &assignment.pod_name).await?;

                println!("Terminated idle pod: {}/{}", assignment.namespace, assignment.pod_name);
            }
        }
    }

    Ok(())
}
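
The idle check above can be separated from the FDB and Kubernetes calls so that it is testable on its own. The following is a minimal sketch under stated assumptions, not the project's implementation: `Assignment`, `Status`, and `pods_to_terminate` are hypothetical, simplified stand-ins for the workspace assignment types, and `std::time` is used instead of chrono to keep the example dependency-free.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical, simplified stand-in for the workspace status.
#[derive(Debug, Clone, PartialEq)]
pub enum Status {
    Active,
    Idle,
}

/// Hypothetical, simplified stand-in for a workspace assignment record.
#[derive(Debug, Clone)]
pub struct Assignment {
    pub namespace: String,
    pub pod_name: String,
    pub status: Status,
    pub last_active: SystemTime,
}

/// Returns the active assignments whose idle time strictly exceeds `threshold`.
/// No side effects: the caller decides how to update FDB and delete pods.
pub fn pods_to_terminate<'a>(
    assignments: &'a [Assignment],
    now: SystemTime,
    threshold: Duration,
) -> Vec<&'a Assignment> {
    assignments
        .iter()
        .filter(|a| a.status == Status::Active)
        .filter(|a| match now.duration_since(a.last_active) {
            Ok(idle) => idle > threshold,
            // last_active lies in the future (clock skew): treat as not idle.
            Err(_) => false,
        })
        .collect()
}
```

Keeping the selection pure means the threshold comparison (including the boundary case of exactly two hours, which is not terminated) can be unit-tested without a cluster.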

5. CI/CD Pipeline (Cloud Build)

cloudbuild-v5.yaml:

steps:
  # Build all images in parallel (waitFor: ['-'] starts a step immediately;
  # without it, Cloud Build runs steps sequentially)
  - name: 'gcr.io/cloud-builders/docker'
    id: 'build-backend'
    waitFor: ['-']
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/backend-api:$SHORT_SHA', './backend']
  - name: 'gcr.io/cloud-builders/docker'
    id: 'build-theia'
    waitFor: ['-']
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/theia-ide:$SHORT_SHA', './theia-app']
  - name: 'gcr.io/cloud-builders/docker'
    id: 'build-sidecar'
    waitFor: ['-']
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/ws-sidecar:$SHORT_SHA', './websocket-sidecar']

  # Push each image as soon as its build finishes
  - name: 'gcr.io/cloud-builders/docker'
    id: 'push-backend'
    waitFor: ['build-backend']
    args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/backend-api:$SHORT_SHA']
  - name: 'gcr.io/cloud-builders/docker'
    id: 'push-theia'
    waitFor: ['build-theia']
    args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/theia-ide:$SHORT_SHA']
  - name: 'gcr.io/cloud-builders/docker'
    id: 'push-sidecar'
    waitFor: ['build-sidecar']
    args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/ws-sidecar:$SHORT_SHA']

  # Update Helm values with new image tags
  - name: 'gcr.io/cloud-builders/git'
    args:
      - 'config'
      - 'user.email'
      - 'cloud-build@coditect.ai'
  - name: 'gcr.io/cloud-builders/git'
    args:
      - 'config'
      - 'user.name'
      - 'Cloud Build'
  - name: 'gcr.io/cloud-builders/git'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        sed -i "s/backend-api:.*$/backend-api:$SHORT_SHA/" helm/values-prod.yaml
        sed -i "s/theia-ide:.*$/theia-ide:$SHORT_SHA/" helm/values-prod.yaml
        sed -i "s/ws-sidecar:.*$/ws-sidecar:$SHORT_SHA/" helm/values-prod.yaml
        git add helm/values-prod.yaml
        # [skip ci] keeps this commit to main from re-triggering the build
        git commit -m "chore: Update production images to $SHORT_SHA [skip ci]"
        git push origin main

timeout: '3600s'
options:
  machineType: 'N1_HIGHCPU_8'
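
The `sed` step replaces everything from the image name to the end of the line, which would also swallow a closing quote or trailing comment if the values file has one. As an illustration of a narrower rewrite, here is a minimal sketch that replaces only the tag characters. The function name `retag` and the values-file layout used in the usage note are assumptions made for this example; the actual `values-prod.yaml` is not shown in this section.

```rust
/// Replaces the tag that follows `<image>:` on every line that mentions the image.
/// Only characters valid in a Docker tag (alphanumerics, '.', '_', '-') are replaced,
/// so closing quotes and trailing comments on the line are preserved.
pub fn retag(values: &str, image: &str, new_tag: &str) -> String {
    let needle = format!("{}:", image);
    let mut out: Vec<String> = Vec::new();

    for line in values.lines() {
        match line.find(needle.as_str()) {
            Some(pos) => {
                let tag_start = pos + needle.len();
                // The tag ends at the first character that cannot be part of a tag.
                let tag_len = line[tag_start..]
                    .find(|c: char| !(c.is_ascii_alphanumeric() || c == '.' || c == '_' || c == '-'))
                    .unwrap_or(line.len() - tag_start);
                out.push(format!(
                    "{}{}{}",
                    &line[..tag_start],
                    new_tag,
                    &line[tag_start + tag_len..]
                ));
            }
            None => out.push(line.to_string()),
        }
    }

    let mut joined = out.join("\n");
    // `lines()` drops the final newline; restore it if the input had one.
    if values.ends_with('\n') {
        joined.push('\n');
    }
    joined
}
```

For example, calling `retag(contents, "backend-api", short_sha)` on a file whose image line ends in a quoted reference leaves the quote in place and touches no other image.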

6. Blue-Green Deployment Strategy

Concept: Run two identical production environments (Blue = current, Green = new)

Steps:

1. Deploy the new version to the "Green" environment
2. Run health checks and smoke tests on Green
3. If tests pass, switch traffic to Green (repoint the Ingress or change the Service selector)
4. Monitor Green for 1 hour
5. If stable, decommission Blue
6. If issues appear, roll traffic back to Blue

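Implementation with ArgoCD (because the Application above enables selfHeal, commit the selector change to Git; ArgoCD reverts manual edits to the live Service):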

# Blue deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coditect-api-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: coditect-api
      version: blue
  template:
    metadata:
      labels:
        app: coditect-api
        version: blue
    spec:
      containers:
      - name: api
        image: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/backend-api:v1.0.0

---
# Green deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coditect-api-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: coditect-api
      version: green
  template:
    metadata:
      labels:
        app: coditect-api
        version: green
    spec:
      containers:
      - name: api
        image: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/backend-api:v1.1.0

---
# Service (switch between blue/green by changing selector)
apiVersion: v1
kind: Service
metadata:
  name: coditect-api
spec:
  selector:
    app: coditect-api
    version: blue  # Change to 'green' for cutover
  ports:
  - port: 8000
    targetPort: 8000
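
The cutover and rollback rules in the steps above can be expressed as a small state machine. This is a minimal sketch under stated assumptions rather than a definitive implementation: `Color`, `Rollout`, and their methods are hypothetical names introduced for illustration, and the health and stability signals are passed in as plain booleans.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Color {
    Blue,
    Green,
}

impl Color {
    /// The opposite environment.
    pub fn other(self) -> Color {
        match self {
            Color::Blue => Color::Green,
            Color::Green => Color::Blue,
        }
    }

    /// Value used for the `version` label in the manifests.
    pub fn label(self) -> &'static str {
        match self {
            Color::Blue => "blue",
            Color::Green => "green",
        }
    }
}

/// Which environment receives traffic, and whether a cutover is still under observation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rollout {
    pub live: Color,
    pub monitoring: bool,
}

impl Rollout {
    pub fn new(live: Color) -> Self {
        Rollout { live, monitoring: false }
    }

    /// Switch traffic to the candidate only if its smoke tests passed.
    pub fn cutover(self, candidate_healthy: bool) -> Rollout {
        if candidate_healthy && !self.monitoring {
            Rollout { live: self.live.other(), monitoring: true }
        } else {
            self
        }
    }

    /// After the monitoring window: keep the new environment if stable, otherwise roll back.
    pub fn finish(self, stable: bool) -> Rollout {
        if !self.monitoring {
            return self;
        }
        let live = if stable { self.live } else { self.live.other() };
        Rollout { live, monitoring: false }
    }

    /// Label selector the Service should carry for the current state.
    pub fn selector(&self) -> String {
        format!("app=coditect-api,version={}", self.live.label())
    }
}
```

Starting from `Rollout::new(Color::Blue)`, a failed smoke test leaves the selector on blue, while a successful cutover followed by an unstable monitoring window returns traffic to blue.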

Implementation Priority

Phase 1: MVP (Current - Week 1)

✅ Backend API with JWT auth (DONE)
✅ FoundationDB integration (DONE)
🔲 Frontend wrapper (React + Theia embed)
🔲 Basic session management
🔲 Stripe payment integration (registration flow)

Phase 2: Automated Provisioning (Week 2)

🔲 Kubernetes operator (Rust)
🔲 Automated namespace creation
🔲 RBAC automation
🔲 PVC provisioning
🔲 Workspace pod deployment

Phase 3: CI/CD & GitOps (Week 3)

🔲 Helm charts for all components
🔲 ArgoCD setup
🔲 Cloud Build pipeline
🔲 Blue-green deployment

Phase 4: Production Ready (Week 4)

🔲 Idle pod cleanup job
🔲 Monitoring (Prometheus + Grafana)
🔲 Logging (Cloud Logging)
🔲 Alerting (PagerDuty integration)
🔲 Beta user onboarding

Conclusion

V4 Provides Solid Foundation:

✅ Multi-tenant FDB schema (proven, scalable)
✅ Authentication & authorization (JWT + OAuth + RBAC)
✅ Billing integration (Stripe + quotas)
✅ Session management patterns
✅ Workspace tracking (user → pod assignments)

V5 Needs Automation:

❌ Automated pod provisioning (Kubernetes operator)
❌ Helm charts + ArgoCD (GitOps)
❌ CI/CD pipeline (Cloud Build → ArgoCD)
❌ Idle pod cleanup (CronJob)
❌ Blue-green deployments (zero-downtime)

Next Steps:

1. Build provisioning controller (Rust Kubernetes operator)
2. Create Helm charts for all components
3. Set up ArgoCD for GitOps
4. Implement CI/CD pipeline (Cloud Build)
5. Add idle pod cleanup CronJob
6. Configure blue-green deployment strategy

Timeline: 4 weeks to production-ready MVP with full automation.

Document Status: ✅ Complete
Last Updated: 2025-10-07
Next Review: After Phase 1 completion
