V5 FoundationDB Schema and ADR Analysis
Date: 2025-10-07
Purpose: Comprehensive analysis of V4 database models and ADRs to inform V5 multi-tenant architecture
Status: Active Reference Document
Executive Summary
This document analyzes V4's FoundationDB schema and Architecture Decision Records (ADRs) to extract proven patterns for V5's multi-tenant, multi-session, multi-LLM IDE with automated pod provisioning.
Key Findings:
- ✅ Multi-tenant patterns are battle-tested - V4 uses tenant isolation extensively
- ✅ Session management exists - V4 has WebSocket session tracking with JWT auth
- ✅ License/subscription system ready - Stripe integration, quota tracking, usage monitoring
- ✅ Workspace pod allocation - V4 tracks user → pod assignments
- ⚠️ V4 lacks automated provisioning - Pods were manually provisioned, not automated
- ⚠️ No Helm/ArgoCD - V4 used direct kubectl deployments
Table of Contents
- FoundationDB Key Schema
- Core Data Models
- ADR Summary
- Multi-Tenant Architecture
- Session Management
- User Authentication & Authorization
- License & Billing
- Workspace Pod Management
- Gaps for V5
- Recommendations for V5
FoundationDB Key Schema
V4 Key Structure (Proven Patterns)
FoundationDB Key Hierarchy (UTF-8 encoded strings)
├── users/
│ ├── {user_id} → User record
│ ├── by_email/{email} → Email → user_id index
│ └── {user_id}/
│ ├── tenants/ → User's tenant associations
│ │ └── {tenant_id} → UserTenantAssociation
│ ├── session/{session_id} → Active user session (JWT)
│ └── apikey/{key_id} → API key metadata
│
├── tenants/
│ ├── {tenant_id} → Tenant record
│ ├── by_domain/{domain} → Domain → tenant_id index
│ └── {tenant_id}/
│ ├── users/ → Reverse index: tenant → users
│ │ └── {user_id} → User ID reference
│ ├── sessions/ → All sessions in tenant
│ │ └── {session_id} → Session metadata
│ └── workspaces/ → Workspace assignments
│ └── {user_id} → WorkspaceAssignment
│
├── sessions/
│ ├── {session_id} → UserSession record (JWT metadata)
│ └── {session_id}/
│ ├── metadata → Session info, timestamps
│ ├── editor-tabs/ → Open files, positions
│ ├── llm-messages/ → Conversation history
│ └── config/ → Session-specific settings
│
├── workspaces/
└── {assignment_id} → WorkspaceAssignment (user → pod mapping)
│
├── pods/
│ └── {pod_name}/ → PodAllocation (load balancing)
│ ├── users/ → Active users in pod
│ └── health → Last health check
│
├── licenses/
│ ├── {user_id} → UserLicense (current license)
│ ├── history/{history_id} → LicenseHistory (audit log)
│ └── usage/{tenant_id}/{month} → TenantUsage (monthly tracking)
│
├── quotas/
│ └── {user_id}/{period} → UserQuota (daily/monthly limits)
│
├── apikeys/
│ └── {key_id} → APIKey record
│
├── files/
│ └── {file_path}/
│ ├── content → File contents
│ ├── metadata → Language, encoding, timestamps
│ └── versions/ → Version history
│
├── settings/
│ ├── global/ → Global preferences
│ └── workspace/ → workspace settings
│
├── models/
└── {model_id}/ → LLM model configurations
│
└── dspy_prompts/ → DSPy optimization cache
└── {tenant_id}/{user_id}/{task_type}/{signature} → DspyPromptCache
Key Design Principles (from ADR-004):
- Hierarchical Keys: Slash-separated paths for readability and range queries
- Secondary Indexes: Dedicated keys for lookups (email, domain, OAuth)
- UTF-8 Encoding: Human-readable keys for debugging
- Bidirectional Relationships: Both users/{id}/tenants AND tenants/{id}/users
- Atomic Transactions: All related updates in a single FDB transaction
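These conventions can be captured in small key-builder helpers. A minimal sketch (function names are illustrative, not from the V4 codebase):

```rust
// Illustrative key builders following the slash-separated hierarchy above.
fn user_key(user_id: &str) -> String {
    format!("users/{}", user_id)
}

fn email_index_key(email: &str) -> String {
    // Secondary index: email → user_id
    format!("users/by_email/{}", email)
}

fn tenant_user_key(tenant_id: &str, user_id: &str) -> String {
    // Reverse side of the bidirectional user ↔ tenant relationship
    format!("tenants/{}/users/{}", tenant_id, user_id)
}

fn main() {
    // Centralizing key construction keeps both sides of a bidirectional
    // relationship consistent when written inside one transaction.
    println!("{}", user_key("u-42"));
    println!("{}", tenant_user_key("t-7", "u-42"));
}
```

Centralizing key construction like this makes it harder for the two sides of an index to drift apart.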
Core Data Models
1. User & Tenant Models
// Core user record (multi-tenant capable)
pub struct User {
    pub user_id: Uuid,
    pub email: String,
    pub first_name: String,
    pub last_name: String,
    pub is_active: bool,
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
    pub password_hash: String,   // Argon2
    pub primary_tenant_id: Uuid, // Self-tenant (deterministic UUID v5)
}

// Tenant (organization/workspace)
pub struct Tenant {
    pub tenant_id: Uuid,
    pub name: String,
    pub domain: Option<String>, // Custom domain for SSO
    pub is_active: bool,
    pub created_at: DateTime<Utc>,
    pub updated_at: DateTime<Utc>,
}

// User-Tenant association with roles
pub struct UserTenantAssociation {
    pub user_id: Uuid,
    pub tenant_id: Uuid,
    pub role: String, // "owner", "admin", "member", "viewer"
    pub company: Option<String>,
    pub joined_at: DateTime<Utc>,
    pub invited_by: Option<Uuid>,
    pub is_active: bool,
}
Self-Tenant Pattern (Critical for Multi-Tenancy):
- Every user gets a personal "self-tenant" on registration
- Self-tenant ID is deterministic: UUID v5(namespace, user_id)
- User is "owner" of their self-tenant
- Users can join multiple organization tenants with different roles
- All data is scoped to tenant_id → enables perfect isolation
2. Session Models
// From ADR-007 (Multi-Session Architecture)
interface Session {
  id: string;           // Unique session ID
  name: string;         // User-defined name
  icon?: string;        // Optional icon
  createdAt: Date;
  updatedAt: Date;
  // State references
  editorState: {
    tabs: EditorTab[];
    activeTabId: string | null;
    scrollPosition: number;
  };
  llmState: {
    messages: Message[];
    primaryModel: string | null;
    secondaryModel: string | null;
    mode: WorkflowMode; // 'single' | 'parallel' | 'sequential' | 'consensus'
    config: LLMConfig;
  };
  fileState: {
    workspaceRoot: string;
    expandedFolders: string[];
    selectedFile: string | null;
  };
  terminalState: {
    cwd: string;
    history: string[];
    buffer: string;
  };
  // Metadata
  isDirty: boolean;     // Unsaved changes
  isActive: boolean;    // Currently active
  order: number;        // Tab order
}
// Backend session tracking (JWT metadata)
pub struct UserSession {
    pub session_id: String,    // UUID
    pub user_id: String,
    pub access_token: String,  // JWT (15 min)
    pub refresh_token: String, // JWT (7 days)
    pub ip_address: String,
    pub user_agent: String,
    pub created_at: i64,
    pub expires_at: i64,
    pub last_activity_at: i64,
    pub is_active: bool,
}
Session Management Pattern:
- Frontend Sessions: Browser tabs with separate editor/LLM/terminal state (Zustand store)
- Backend Sessions: JWT-tracked authentication sessions (FoundationDB)
- Auto-save: Every 500ms after changes (debounced)
- Persistence: Sessions survive browser restarts (loaded from FDB)
3. Workspace Pod Models
// User → Pod assignment tracking
pub struct WorkspaceAssignment {
    pub id: Uuid,
    pub user_id: Uuid,
    pub tenant_id: Uuid,
    pub pod_name: String,        // e.g., "workspace-abc123"
    pub namespace: String,       // e.g., "user-{user_id}"
    pub assigned_at: DateTime<Utc>,
    pub last_active: DateTime<Utc>,
    pub status: WorkspaceStatus, // Active | Idle | Suspended | Terminating | Failed
    pub resource_usage: ResourceUsage,
}

pub struct ResourceUsage {
    pub cpu_cores: f32,
    pub memory_mb: u64,
    pub storage_gb: f32,
    pub network_gb: f32,
}

// Pod allocation for load balancing
pub struct PodAllocation {
    pub pod_name: String,
    pub namespace: String,
    pub total_users: u32,
    pub active_users: u32,
    pub cpu_allocated: f32,
    pub memory_allocated_mb: u64,
    pub last_health_check: DateTime<Utc>,
    pub is_available: bool,
}

// Workspace activity for usage tracking
pub struct WorkspaceActivity {
    pub id: Uuid,
    pub workspace_id: Uuid,
    pub user_id: Uuid,
    pub activity_type: ActivityType, // TerminalSession | FileOperation | CodeExecution | BuildProcess | Idle
    pub timestamp: DateTime<Utc>,
    pub duration_seconds: Option<u64>,
    pub metadata: Option<serde_json::Value>,
}
Workspace Pattern (V4 Manual, V5 Automated):
- Each user gets a dedicated Kubernetes pod (Theia IDE container)
- Pod is provisioned in a per-user namespace
- FDB tracks the user → pod assignment
- Pod allocation tracks load balancing across the cluster
- Activity logging for billing and idle detection
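The status field drives the idle-detection side of this pattern. A minimal sketch of the transition (the Active → Idle rule and 2-hour threshold mirror the cleanup job described later in this document; everything else is illustrative):

```rust
// Variant names from the workspace status model above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum WorkspaceStatus {
    Active,
    Idle,
    Suspended,
    Terminating,
    Failed,
}

// Sketch: an Active workspace becomes Idle once it has seen no activity
// for longer than the threshold; other states are left untouched.
fn next_status(current: WorkspaceStatus, idle_secs: u64) -> WorkspaceStatus {
    const IDLE_THRESHOLD_SECS: u64 = 2 * 60 * 60; // 2 hours
    match current {
        WorkspaceStatus::Active if idle_secs > IDLE_THRESHOLD_SECS => WorkspaceStatus::Idle,
        other => other,
    }
}

fn main() {
    assert_eq!(next_status(WorkspaceStatus::Active, 3 * 60 * 60), WorkspaceStatus::Idle);
    assert_eq!(next_status(WorkspaceStatus::Active, 60), WorkspaceStatus::Active);
}
```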
4. License & Billing Models
// User's current license
pub struct UserLicense {
    pub user_id: Uuid,
    pub license_id: Uuid,
    pub assigned_at: DateTime<Utc>,
    pub previous_license_id: Option<Uuid>,
}

// License change audit trail
pub struct LicenseHistory {
    pub id: Uuid,
    pub user_id: Uuid,
    pub license_id: Uuid,
    pub action: LicenseAction, // Created | Upgraded | Downgraded | Renewed | Expired | Cancelled | PaymentFailed
    pub reason: String,
    pub metadata: serde_json::Value,
    pub created_at: DateTime<Utc>,
}

// Monthly usage tracking per tenant
pub struct TenantUsage {
    pub tenant_id: Uuid,
    pub month: String,       // "2025-10"
    pub projects_count: i32,
    pub agents_count: i32,
    pub storage_gb: f32,
    pub workspace_hours: f32, // Total pod runtime
    pub api_calls: i64,
    pub last_updated: DateTime<Utc>,
}

// License enforcement
pub struct TenantLicensePolicy {
    pub tenant_id: Uuid,
    pub owner_user_id: Uuid,
    pub license_type: LicenseType,         // Free | Starter | Pro | Enterprise
    pub enforcement_mode: EnforcementMode, // Strict | Warning | GracePeriod
    pub override_limits: Option<serde_json::Value>,
    pub updated_at: DateTime<Utc>,
}

// Real-time usage events
pub struct UsageEvent {
    pub id: Uuid,
    pub tenant_id: Uuid,
    pub user_id: Uuid,
    pub resource_type: ResourceType, // Projects | Agents | Storage | WorkspaceHours
    pub operation: UsageOperation,
    pub amount: f32,
    pub timestamp: DateTime<Utc>,
    pub metadata: Option<serde_json::Value>,
}
Billing Integration (from ADR-021):
- Stripe for payment processing
- License tiers: Free, Starter ($29/mo), Pro ($99/mo), Enterprise (custom)
- Quotas enforced via middleware (before LLM calls, file operations)
- Usage tracked in real-time → synced to Stripe monthly
- Grace period enforcement for upgrades
5. Authentication Models
// From ADR-021 (User Management & Authentication)
interface User {
  user_id: string;        // UUID
  email: string;          // Unique
  password_hash?: string; // bcrypt (null for OAuth-only users)
  name: string;
  avatar_url?: string;
  role: 'admin' | 'developer' | 'viewer';
  oauth_providers: Array<{
    provider: 'google' | 'github';
    provider_user_id: string;
    access_token?: string; // Encrypted
  }>;
  created_at: number;     // Timestamp
  updated_at: number;
  email_verified: boolean;
  is_active: boolean;
  metadata: Record<string, any>;
}

interface APIKey {
  key_id: string;         // UUID
  user_id: string;
  key_hash: string;       // bcrypt of actual key
  key_prefix: string;     // First 8 chars for display (e.g., "ak_12ab...")
  name: string;           // User-defined name
  permissions: string[];  // ['read:files', 'write:files', 'llm:chat']
  created_at: number;
  expires_at?: number;
  last_used_at?: number;
  is_active: boolean;
}

interface UserQuota {
  user_id: string;
  period: 'daily' | 'monthly';
  period_start: number;
  // Usage
  llm_requests: number;
  llm_tokens_used: number;
  llm_cost_usd: number;
  storage_bytes: number;
  api_requests: number;
  // Limits
  llm_requests_limit: number;
  llm_tokens_limit: number;
  llm_cost_limit_usd: number;
  storage_bytes_limit: number;
  api_requests_limit: number;
}
Auth Pattern (JWT + OAuth + API Keys):
- JWT: Stateless authentication (15-min access token, 7-day refresh token)
- OAuth 2.0: Google + GitHub social login (reduces password breach risk)
- API Keys: For programmatic access (CLI, CI/CD)
- RBAC: Role-based permissions (admin/developer/viewer)
- Quotas: Rate limiting and usage enforcement
ADR Summary
Critical ADRs for V5
| ADR | Decision | Impact on V5 | Status |
|---|---|---|---|
| ADR-004 | Use FoundationDB for Persistence | ✅ KEEP - Multi-tenant key schema proven | Reuse patterns |
| ADR-007 | Multi-Session Architecture | ✅ KEEP - Browser tab sessions with FDB persistence | Implement |
| ADR-014 | Use Eclipse Theia as Foundation | ✅ KEEP - VS Code-like IDE in browser | Active |
| ADR-017 | WebSocket Backend Architecture | ✅ KEEP - Sidecar pattern for Theia ↔ Backend | Implement |
| ADR-020 | GCP Deployment | ✅ KEEP - GKE + Cloud Run infrastructure | Active |
| ADR-021 | User Management & Authentication | ✅ KEEP - JWT + OAuth + Stripe billing | Implement |
| ADR-022 | Audit Logging Architecture | ✅ KEEP - Compliance tracking (SOC2/GDPR) | Implement |
ADR-004: FoundationDB (Detailed)
Why FoundationDB over alternatives:
- ✅ ACID Transactions: Serializable isolation (critical for multi-tenant)
- ✅ Sub-10ms Latency: Fast session/file operations
- ✅ Horizontal Scaling: Millions of ops/sec
- ✅ Multi-Model: Key-value core + document/graph layers
- ✅ Watch API: Real-time updates for collaborative features
- ✅ Fault Tolerance: Self-healing, automatic replication
Rejected Alternatives:
- ❌ PostgreSQL: Slower, less scalable for key-value workloads
- ❌ MongoDB: Eventual consistency too weak
- ❌ Redis: Not durable enough for primary storage
- ❌ IndexedDB: Browser-only, no multi-client sync
Data Model:
/az1ai-ide/
├── sessions/{session-id}/
│ ├── metadata # Session info, timestamps
│ ├── editor-tabs/ # Open files, positions
│ ├── llm-messages/ # Conversation history
│ └── config/ # Session-specific settings
├── files/{file-path}/
│ ├── content # File contents
│ ├── metadata # Language, encoding, timestamps
│ └── versions/ # Version history
├── settings/
│ ├── global/ # Global preferences
│ └── workspace/ # workspace settings
└── models/{model-id}/ # Model configurations
ADR-007: Multi-Session Architecture
Decision: Tab-based sessions (like browser tabs)
Benefits:
- ✅ Work on multiple projects simultaneously
- ✅ Separate LLM conversations per session
- ✅ Independent editor/terminal contexts
- ✅ Persistent across browser restarts
- ✅ Keyboard shortcuts (Cmd+1-9 for session switching)
Session Lifecycle:
- Create: Generate session ID, initialize state
- Auto-save: Debounced saves every 500ms
- Switch: Save current, load target session
- Close: Prompt for unsaved changes, delete from FDB
Session Isolation:
- Each session has isolated editor tabs, LLM messages, and terminal state
- Sessions saved to FDB with prefix: sessions/{session-id}/
- OPFS cache for offline mode (FDB is source of truth)
ADR-014: Eclipse Theia Foundation
Decision: Use the Theia framework (not build an IDE from scratch)
Why Theia:
- ✅ EPL 2.0 License: Free commercial use (no license fees)
- ✅ VS Code Compatible: Runs VS Code extensions
- ✅ Monaco Editor: Same editor as VS Code
- ✅ Saves 6-12 months: Don't rebuild file explorer, terminal, settings
- ✅ Dependency Injection: Clean architecture with InversifyJS
- ✅ Multi-Language Support: TypeScript, Python, Rust out of the box
Theia vs VS Code:
| Feature | VS Code | Theia |
|---|---|---|
| License | MIT | EPL 2.0 |
| Browser Support | ❌ | ✅ |
| Extension API | Full | Compatible |
| Customization | Limited | Full (framework) |
| Hosting | Desktop only | Cloud-native |
ADR-017: WebSocket Backend Architecture
Decision: Sidecar pattern for Theia ↔ Backend communication
Architecture:
┌─────────────────────────────────────────┐
│ Workspace Pod │
│ ┌──────────────┐ ┌─────────────────┐ │
│ │ Theia IDE │ │ WebSocket │ │
│ │ (Port 3000) │←→│ Sidecar │ │
│ │ │ │ (Port 8080) │ │
│ └──────────────┘ └─────────────────┘ │
│ ↓ ↓ │
│ localhost:3000 localhost:8080 │
└─────────────────────────────────────────┘
↓
┌─────────────┐
│ Auth Backend│
│ (JWT verify)│
└─────────────┘
↓
┌─────────────┐
│ FoundationDB│
└─────────────┘
Why Sidecar:
- ✅ Localhost Communication: No network latency
- ✅ Security: WebSocket gateway validates JWT before forwarding
- ✅ Simplicity: Theia doesn't need an FDB client
- ✅ Isolation: Each pod has dedicated sidecar
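The gateway's first step on connect is extracting the bearer token before JWT verification. A minimal sketch of that parsing step (helper name illustrative; verification itself omitted):

```rust
// Pull the token out of an `Authorization: Bearer <token>` header.
// Returns None for missing or malformed headers, so the connection can be
// rejected before any traffic is forwarded to Theia.
fn bearer_token(auth_header: &str) -> Option<&str> {
    auth_header
        .strip_prefix("Bearer ")
        .filter(|token| !token.is_empty())
}

fn main() {
    assert_eq!(bearer_token("Bearer abc.def.ghi"), Some("abc.def.ghi"));
    assert_eq!(bearer_token("Basic dXNlcg=="), None);
}
```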
ADR-020: GCP Deployment
Infrastructure (Already deployed):
- ✅ GKE Cluster: codi-poc-e2-cluster (us-central1-a)
- ✅ FoundationDB: 3-node StatefulSet (10.56.x.x:4500)
- ✅ Container Registry: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect
- ✅ Domain: coditect.ai (34.8.51.57, Google-managed SSL)
- ✅ CI/CD: Cloud Build pipelines
Deployment Strategy:
- Backend API: Cloud Run (stateless, auto-scaling)
- Workspace Pods: GKE (stateful, per-user pods)
- FoundationDB: GKE StatefulSet (persistent storage)
- Ingress: NGINX (load balancer + SSL termination)
ADR-021: User Management & Authentication
Authentication Methods:
- Email/Password: Argon2 password hashing
- Google OAuth: Social login (reduces friction)
- GitHub OAuth: Developer-friendly auth
- API Keys: For CLI/API access (bcrypt hashed)
JWT Strategy:
- Access Token: 15 minutes (short-lived for security)
- Refresh Token: 7 days (long-lived, revocable)
- Claims: user_id, email, role, tenant_id
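The claims and lifetimes above can be sketched as a struct (field names follow the listed claims plus the standard iat/exp registered claims; the struct itself is illustrative, not V4's actual type):

```rust
// Illustrative shape of the access-token payload.
#[allow(dead_code)]
struct AccessTokenClaims {
    user_id: String,
    email: String,
    role: String,      // "admin" | "developer" | "viewer"
    tenant_id: String,
    iat: i64,          // issued-at, Unix seconds
    exp: i64,          // iat + ACCESS_TOKEN_TTL_SECS
}

const ACCESS_TOKEN_TTL_SECS: i64 = 15 * 60;         // 15-minute access token
const REFRESH_TOKEN_TTL_SECS: i64 = 7 * 24 * 3600;  // 7-day refresh token

fn main() {
    assert_eq!(ACCESS_TOKEN_TTL_SECS, 900);
    assert_eq!(REFRESH_TOKEN_TTL_SECS, 604_800);
}
```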
RBAC Roles:
const ROLES = {
  admin: {
    permissions: [{ resource: '*', actions: ['*'] }] // Full access
  },
  developer: {
    permissions: [
      { resource: 'files', actions: ['read', 'write', 'delete'] },
      { resource: 'sessions', actions: ['read', 'write', 'delete'] },
      { resource: 'llm', actions: ['read', 'write'] },
      { resource: 'agents', actions: ['read', 'write'] },
      { resource: 'users', actions: ['read'] }, // Own profile only
    ]
  },
  viewer: {
    permissions: [
      { resource: 'files', actions: ['read'] },
      { resource: 'sessions', actions: ['read'] },
      { resource: 'llm', actions: ['read'] },
    ]
  }
};
Quota System:
// Example default limits for a user quota (values illustrative)
const DEFAULT_USER_QUOTA_LIMITS = {
  llm_requests_limit: 10000,                     // 10K requests/month
  llm_tokens_limit: 10000000,                    // 10M tokens/month
  llm_cost_limit_usd: 100,                       // $100/month
  storage_bytes_limit: 10 * 1024 * 1024 * 1024,  // 10GB
  api_requests_limit: 100000,                    // 100K API requests/month
};
Multi-Tenant Architecture
Tenant Isolation Strategy
Key Principle: Every data access is scoped by tenant_id
FDB Key Prefixes:
tenant_id/resource_type/resource_id/...
Examples:
- 123e4567/sessions/abc-session-id/metadata
- 123e4567/files/src/main.ts/content
- 123e4567/workspaces/user-789/assignment
Security Enforcement:
// Middleware extracts tenant_id from JWT
pub async fn require_tenant_access(
    req: &Request,
    required_tenant_id: &Uuid,
) -> Result<(), AuthError> {
    let user = req.user()?;
    // Check if user belongs to tenant
    let association = UserTenantRepository::get_association(
        &user.user_id,
        required_tenant_id,
    ).await?;
    if !association.is_active {
        return Err(AuthError::Forbidden);
    }
    Ok(())
}
Benefits:
- ✅ Perfect Isolation: Impossible to access other tenant's data
- ✅ Range Queries: Fetch all sessions/files for tenant
- ✅ Efficient: Single FDB transaction per operation
- ✅ Auditable: All access logged with tenant context
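The range-query benefit follows directly from the prefix layout. A minimal sketch of building a tenant-scoped prefix (the actual scan would go through the FDB client's range-read API):

```rust
// Every key for a tenant's resources shares this prefix, so
// "all sessions for tenant X" is a single range read.
fn tenant_prefix(tenant_id: &str, resource_type: &str) -> String {
    format!("{}/{}/", tenant_id, resource_type)
}

fn main() {
    // Matches the example keys above, e.g. 123e4567/sessions/...
    assert_eq!(tenant_prefix("123e4567", "sessions"), "123e4567/sessions/");
    assert_eq!(tenant_prefix("123e4567", "files"), "123e4567/files/");
}
```

Because the tenant_id is always the leading path segment, there is no key under which cross-tenant data can accidentally collide.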
Session Management
Session Types
1. Browser UI Sessions (ADR-007):
- Tab-based workspaces in the Theia frontend
- Persist to FDB: sessions/{session-id}/
- Auto-save every 500ms (debounced)
- Keyboard shortcuts: Cmd+1-9 for switching
2. Backend Auth Sessions (ADR-021):
- JWT-tracked authentication sessions
- Stored in FDB: user:sessions/{session-id}
- Track: IP, user agent, last activity
- Expire after 7 days or on forced logout
3. WebSocket Sessions (ADR-017):
- Real-time connection to backend
- Validated with JWT on connect
- Heartbeat every 30 seconds
- Auto-reconnect on disconnect
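With the 30-second heartbeat above, a connection can be declared dead after a couple of missed beats before triggering auto-reconnect. The two-beat grace period here is an assumption, not a documented V4 value:

```rust
const HEARTBEAT_INTERVAL_SECS: u64 = 30;

// Sketch: treat the WebSocket session as stale after two missed heartbeats
// (2 * 30s), at which point the client should auto-reconnect.
fn connection_is_stale(secs_since_last_heartbeat: u64) -> bool {
    secs_since_last_heartbeat > 2 * HEARTBEAT_INTERVAL_SECS
}

fn main() {
    assert!(!connection_is_stale(45)); // one beat late — still fine
    assert!(connection_is_stale(61));  // two beats missed — reconnect
}
```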
Session Lifecycle
// Create session
const createSession = async (name?: string) => {
  const session: Session = {
    id: nanoid(),
    name: name || `Session ${sessions.length + 1}`,
    createdAt: new Date(),
    updatedAt: new Date(),
    editorState: { tabs: [], activeTabId: null, scrollPosition: 0 },
    llmState: {
      messages: [],
      primaryModel: null,
      secondaryModel: null,
      mode: 'single',
      config: defaultLLMConfig
    },
    fileState: {
      workspaceRoot: '/',
      expandedFolders: [],
      selectedFile: null
    },
    terminalState: {
      cwd: '~',
      history: [],
      buffer: ''
    },
    isDirty: false,
    isActive: true,
    order: sessions.length
  };
  await fdbService.saveSession(session);
  return session;
};

// Auto-save on changes
useEffect(() => {
  if (!activeSession) return;
  const saveTimeout = setTimeout(() => {
    fdbService.saveSession(activeSession);
  }, 500); // Debounced 500ms
  return () => clearTimeout(saveTimeout);
}, [activeSession]);

// Load on mount
useEffect(() => {
  const restoreLastSession = async () => {
    const lastSessionId = localStorage.getItem('lastActiveSession');
    if (lastSessionId) {
      const session = await fdbService.loadSession(lastSessionId);
      setActiveSession(session);
    }
  };
  restoreLastSession();
}, []);
User Authentication & Authorization
JWT Token Flow
1. User Login (Email/Password or OAuth)
↓
2. Backend Validates Credentials
↓
3. Generate Tokens:
- Access Token (15 min, contains: user_id, email, role, tenant_id)
- Refresh Token (7 days, contains: user_id, type='refresh')
↓
4. Store Session in FDB:
- user:sessions/{session_id}
- Contains: IP, user agent, tokens, timestamps
↓
5. Return Tokens to Frontend
↓
6. Frontend Stores in Memory (NOT localStorage for security)
↓
7. All API Requests: Authorization: Bearer <access_token>
↓
8. Backend Middleware Validates JWT
↓
9. If Expired: Use Refresh Token to Get New Access Token
↓
10. If Refresh Expired: Force Re-login
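Steps 9-10 reduce to a small decision over the two token expiries. A sketch (function name illustrative):

```rust
// Decide what the client should do given the current time (Unix seconds)
// and the expiry timestamps of both tokens.
fn next_auth_action(now: i64, access_exp: i64, refresh_exp: i64) -> &'static str {
    if now < access_exp {
        "use_access_token"     // steps 7-8: normal authenticated request
    } else if now < refresh_exp {
        "refresh_access_token" // step 9: mint a new access token
    } else {
        "force_relogin"        // step 10: both tokens expired
    }
}

fn main() {
    assert_eq!(next_auth_action(100, 200, 1000), "use_access_token");
    assert_eq!(next_auth_action(300, 200, 1000), "refresh_access_token");
    assert_eq!(next_auth_action(2000, 200, 1000), "force_relogin");
}
```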
Permission Checking
// Middleware checks permissions before allowing an operation
async function checkPermission(
  user: User,
  resource: string,
  action: string
): Promise<boolean> {
  const role = ROLES[user.role];
  if (!role) return false;
  for (const perm of role.permissions) {
    if (perm.resource === '*' || perm.resource === resource) {
      if (perm.actions.includes('*') || perm.actions.includes(action)) {
        return true;
      }
    }
  }
  return false;
}

// Usage in route handler
app.post('/api/files/save', authenticate, async (req, res) => {
  const hasPermission = await checkPermission(req.user, 'files', 'write');
  if (!hasPermission) {
    return res.status(403).json({ error: 'Permission denied' });
  }
  // ... save file
});
License & Billing
Stripe Integration
License Tiers:
const LICENSE_TIERS = {
  free: {
    price: 0,
    limits: {
      llm_requests: 100,   // 100/month
      storage_gb: 1,       // 1GB
      workspace_hours: 10, // 10 hours/month
      agents: 1,
      projects: 1,
    }
  },
  starter: {
    price: 29, // $29/month
    stripe_price_id: 'price_abc123',
    limits: {
      llm_requests: 10000,
      storage_gb: 10,
      workspace_hours: 100,
      agents: 5,
      projects: 10,
    }
  },
  pro: {
    price: 99, // $99/month
    stripe_price_id: 'price_def456',
    limits: {
      llm_requests: 100000,
      storage_gb: 100,
      workspace_hours: 500,
      agents: -1,   // Unlimited
      projects: -1, // Unlimited
    }
  },
  enterprise: {
    price: null, // Custom pricing
    limits: {
      llm_requests: -1,
      storage_gb: -1,
      workspace_hours: -1,
      agents: -1,
      projects: -1,
    }
  }
};
Quota Enforcement:
// Before LLM call
async function checkLLMQuota(user_id: string): Promise<boolean> {
  const quota = await fdbService.get(`quota:${user_id}:monthly`);
  return quota.llm_requests < quota.llm_requests_limit;
}

// After LLM call
async function incrementLLMUsage(user_id: string, tokens: number): Promise<void> {
  const quota = await fdbService.get(`quota:${user_id}:monthly`);
  quota.llm_requests += 1;
  quota.llm_tokens_used += tokens;
  quota.llm_cost_usd += tokens * 0.00001; // Example: $0.01 per 1K tokens
  await fdbService.set(`quota:${user_id}:monthly`, quota);
}
Webhook Handler (Stripe → Backend):
app.post('/webhooks/stripe', async (req, res) => {
  const event = stripe.webhooks.constructEvent(
    req.body,
    req.headers['stripe-signature'],
    process.env.STRIPE_WEBHOOK_SECRET
  );
  switch (event.type) {
    case 'customer.subscription.created':
      await handleSubscriptionCreated(event.data.object);
      break;
    case 'customer.subscription.updated':
      await handleSubscriptionUpdated(event.data.object);
      break;
    case 'customer.subscription.deleted':
      await handleSubscriptionCancelled(event.data.object);
      break;
    case 'invoice.payment_failed':
      await handlePaymentFailed(event.data.object);
      break;
  }
  res.json({ received: true });
});
Workspace Pod Management
V4 Pattern (Manual Provisioning)
// User → Pod assignment (tracked in FDB)
pub struct WorkspaceAssignment {
    pub id: Uuid,
    pub user_id: Uuid,
    pub tenant_id: Uuid,
    pub pod_name: String,  // "workspace-abc123"
    pub namespace: String, // "user-{user_id}"
    pub assigned_at: DateTime<Utc>,
    pub last_active: DateTime<Utc>,
    pub status: WorkspaceStatus,
    pub resource_usage: ResourceUsage,
}

// Load balancing across pods
pub struct PodAllocation {
    pub pod_name: String,
    pub namespace: String,
    pub total_users: u32,
    pub active_users: u32,
    pub cpu_allocated: f32,
    pub memory_allocated_mb: u64,
    pub last_health_check: DateTime<Utc>,
    pub is_available: bool,
}
V4 Limitations:
- ❌ Pods were manually created via kubectl apply
- ❌ No automated provisioning on user signup
- ❌ No automated cleanup of idle pods
- ❌ No Helm charts or GitOps (ArgoCD)
Gaps for V5
What V4 Had (Keep)
- ✅ Multi-tenant FDB schema - Proven and scalable
- ✅ JWT authentication - Industry standard
- ✅ Stripe billing - Payment processing ready
- ✅ License/quota system - Usage tracking and enforcement
- ✅ Session management - Multi-session architecture
- ✅ Workspace tracking - User → Pod assignments in FDB
What V4 Lacked (Need for V5)
- ❌ Automated Pod Provisioning - V4 required manual kubectl apply
- ❌ Kubernetes Operator - No controller for watching user signups
- ❌ Helm Charts - No templated deployments
- ❌ ArgoCD/GitOps - No declarative deployment pipeline
- ❌ Auto-Scaling - Pods not auto-scaled based on load
- ❌ Idle Pod Cleanup - No automated termination of idle workspaces
- ❌ RBAC Automation - ServiceAccounts/Roles created manually
- ❌ PVC Provisioning - Persistent volumes not auto-created
- ❌ Blue-Green Deploys - No zero-downtime rollout strategy
Recommendations for V5
1. Automated Pod Provisioning System
Architecture:
User Registration → Backend API → Provisioning Controller → Kubernetes API
↓
Creates: Namespace + RBAC + PVC + Pod
↓
Stores: WorkspaceAssignment in FDB
↓
Returns: Pod URL to user
Provisioning Controller (Rust Kubernetes Operator):
pub struct ProvisioningController {
    k8s_client: Client,
    fdb_client: Database,
}

impl ProvisioningController {
    pub async fn provision_workspace(&self, user_id: &str, user_email: &str) -> Result<WorkspaceAssignment> {
        let ns_name = format!("user-{}", user_id);
        // 1. Create namespace
        self.create_namespace(&ns_name).await?;
        // 2. Create RBAC (ServiceAccount, Role, RoleBinding)
        self.create_rbac(&ns_name, user_email).await?;
        // 3. Create PVC (10GB default)
        self.create_pvc(&ns_name, "workspace-pvc", "10Gi").await?;
        // 4. Create workspace pod (Theia + sidecar)
        let pod_name = self.create_workspace_pod(&ns_name, user_id).await?;
        // 5. Wait for pod ready
        self.wait_for_pod_ready(&ns_name, &pod_name, Duration::from_secs(120)).await?;
        // 6. Save assignment to FDB
        let assignment = WorkspaceAssignment {
            id: Uuid::new_v4(),
            user_id: Uuid::parse_str(user_id)?,
            tenant_id: self.get_user_tenant(user_id).await?,
            pod_name: pod_name.clone(),
            namespace: ns_name.clone(),
            assigned_at: Utc::now(),
            last_active: Utc::now(),
            status: WorkspaceStatus::Active,
            resource_usage: ResourceUsage::default(),
        };
        self.save_assignment(&assignment).await?;
        Ok(assignment)
    }
}
2. Helm Charts for Deployment
Chart Structure:
helm/
├── Chart.yaml
├── values.yaml
├── values-prod.yaml
├── values-staging.yaml
└── templates/
    ├── backend-deployment.yaml
    ├── backend-service.yaml
    ├── workspace-pod-template.yaml   # Template for user pods
    ├── foundationdb-statefulset.yaml
    ├── ingress.yaml
    ├── secrets.yaml
    ├── rbac.yaml
    └── provisioning-controller.yaml
Install Commands:
# Deploy to staging
helm upgrade --install coditect-staging ./helm \
-f ./helm/values-staging.yaml \
--namespace coditect-staging \
--create-namespace
# Deploy to production
helm upgrade --install coditect-prod ./helm \
-f ./helm/values-prod.yaml \
--namespace coditect-app \
--create-namespace
3. ArgoCD GitOps Pipeline
Workflow:
1. Developer commits to main branch
↓
2. Cloud Build triggers:
- Build Docker images (backend, Theia, sidecar)
- Push to GCR
- Update Helm values.yaml with new image tags
↓
3. Commit new values.yaml to Git
↓
4. ArgoCD watches Git repo
↓
5. ArgoCD syncs changes to GKE cluster
↓
6. Rolling update (zero-downtime)
↓
7. Health checks pass → Deployment complete
ArgoCD Application:
# argocd/coditect-prod.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: coditect-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/coditect-ai/Coditect-v5-multiple-llm-IDE
    targetRevision: main
    path: helm
    helm:
      valueFiles:
        - values-prod.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: coditect-app
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
4. Idle Pod Cleanup Job
CronJob (runs every hour):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: idle-pod-cleanup
spec:
  schedule: "0 * * * *" # Every hour
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: cleanup
              image: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/pod-cleanup:latest
              env:
                - name: IDLE_THRESHOLD_HOURS
                  value: "2" # Terminate pods idle for 2+ hours
          restartPolicy: OnFailure # Required for Job pod templates
Cleanup Logic:
async fn cleanup_idle_pods() -> Result<()> {
    let idle_threshold = Duration::hours(2);
    let now = Utc::now();
    // Query FDB for all workspace assignments
    let assignments: Vec<WorkspaceAssignment> = fdb_client
        .scan("workspaces/")
        .await?;
    for mut assignment in assignments {
        if assignment.status == WorkspaceStatus::Active {
            let idle_duration = now - assignment.last_active;
            if idle_duration > idle_threshold {
                // Mark as idle before deleting the pod
                assignment.status = WorkspaceStatus::Idle;
                fdb_client.update_assignment(&assignment).await?;
                // Delete the pod
                k8s_client.delete_pod(&assignment.namespace, &assignment.pod_name).await?;
                println!("Terminated idle pod: {}/{}", assignment.namespace, assignment.pod_name);
            }
        }
    }
    Ok(())
}
5. CI/CD Pipeline (Cloud Build)
cloudbuild-v5.yaml:
steps:
  # Build all images in parallel
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/backend-api:$SHORT_SHA', './backend']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/theia-ide:$SHORT_SHA', './theia-app']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/ws-sidecar:$SHORT_SHA', './websocket-sidecar']
  # Push all images
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/backend-api:$SHORT_SHA']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/theia-ide:$SHORT_SHA']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/coditect/ws-sidecar:$SHORT_SHA']
  # Update Helm values with new image tags
  - name: 'gcr.io/cloud-builders/git'
    args: ['config', 'user.email', 'cloud-build@coditect.ai']
  - name: 'gcr.io/cloud-builders/git'
    args: ['config', 'user.name', 'Cloud Build']
  - name: 'gcr.io/cloud-builders/git'
    entrypoint: 'bash'
    args:
      - '-c'
      - |
        sed -i "s/backend-api:.*$/backend-api:$SHORT_SHA/" helm/values-prod.yaml
        sed -i "s/theia-ide:.*$/theia-ide:$SHORT_SHA/" helm/values-prod.yaml
        sed -i "s/ws-sidecar:.*$/ws-sidecar:$SHORT_SHA/" helm/values-prod.yaml
        git add helm/values-prod.yaml
        git commit -m "chore: Update production images to $SHORT_SHA"
        git push origin main
timeout: '3600s'
options:
  machineType: 'N1_HIGHCPU_8'
6. Blue-Green Deployment Strategy
Concept: Run two identical production environments (Blue = current, Green = new)
Steps:
- Deploy new version to "Green" environment
- Run health checks and smoke tests on Green
- If tests pass, switch Ingress to point to Green
- Monitor Green for 1 hour
- If stable, decommission Blue
- If issues, rollback Ingress to Blue
Implementation with ArgoCD:
# Blue deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coditect-api-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: coditect-api
      version: blue
  template:
    metadata:
      labels:
        app: coditect-api
        version: blue
    spec:
      containers:
        - name: api
          image: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/backend-api:v1.0.0
---
# Green deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coditect-api-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: coditect-api
      version: green
  template:
    metadata:
      labels:
        app: coditect-api
        version: green
    spec:
      containers:
        - name: api
          image: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/backend-api:v1.1.0
---
# Service (switch between blue/green by changing selector)
apiVersion: v1
kind: Service
metadata:
  name: coditect-api
spec:
  selector:
    app: coditect-api
    version: blue # Change to 'green' for cutover
  ports:
    - port: 8000
      targetPort: 8000
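With this layout, the step-3 cutover is a one-line change to the Service selector (assuming kubectl access to the cluster; verify Green's health before running it):

```shell
# Switch live traffic from blue to green by repointing the Service selector
kubectl patch service coditect-api \
  -p '{"spec":{"selector":{"app":"coditect-api","version":"green"}}}'
```

Rolling back is the same command with `"version":"blue"`.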
Implementation Priority
Phase 1: MVP (Current - Week 1)
- ✅ Backend API with JWT auth (DONE)
- ✅ FoundationDB integration (DONE)
- 🔲 Frontend wrapper (React + Theia embed)
- 🔲 Basic session management
- 🔲 Stripe payment integration (registration flow)
Phase 2: Automated Provisioning (Week 2)
- 🔲 Kubernetes operator (Rust)
- 🔲 Automated namespace creation
- 🔲 RBAC automation
- 🔲 PVC provisioning
- 🔲 Workspace pod deployment
Phase 3: CI/CD & GitOps (Week 3)
- 🔲 Helm charts for all components
- 🔲 ArgoCD setup
- 🔲 Cloud Build pipeline
- 🔲 Blue-green deployment
Phase 4: Production Ready (Week 4)
- 🔲 Idle pod cleanup job
- 🔲 Monitoring (Prometheus + Grafana)
- 🔲 Logging (Cloud Logging)
- 🔲 Alerting (PagerDuty integration)
- 🔲 Beta user onboarding
Conclusion
V4 Provides Solid Foundation:
- ✅ Multi-tenant FDB schema (proven, scalable)
- ✅ Authentication & authorization (JWT + OAuth + RBAC)
- ✅ Billing integration (Stripe + quotas)
- ✅ Session management patterns
- ✅ Workspace tracking (user → pod assignments)
V5 Needs Automation:
- ❌ Automated pod provisioning (Kubernetes operator)
- ❌ Helm charts + ArgoCD (GitOps)
- ❌ CI/CD pipeline (Cloud Build → ArgoCD)
- ❌ Idle pod cleanup (CronJob)
- ❌ Blue-green deployments (zero-downtime)
Next Steps:
- Build provisioning controller (Rust Kubernetes operator)
- Create Helm charts for all components
- Set up ArgoCD for GitOps
- Implement CI/CD pipeline (Cloud Build)
- Add idle pod cleanup CronJob
- Configure blue-green deployment strategy
Timeline: 4 weeks to production-ready MVP with full automation.
Document Status: ✅ Complete
Last Updated: 2025-10-07
Next Review: After Phase 1 completion