# Persistent Storage Strategy for Dynamic Pod Scaling

**Date:** 2025-10-28
**Status:** CRITICAL - MVP Blocker
**Issue:** User workspace data is tied to specific pods and lost on pod scale-down or failure
## Problem Statement

**Current Architecture:**
- StatefulSet with pod-local PVCs (50 GB workspace + 5 GB config per pod)
- Session affinity: ClientIP with 3-hour timeout
- HPA scaling: 3-30 pods based on CPU/memory

**Critical Issue:**
User creates files in pod-0 → pod-0 scales down → user data is LOST.

**Why This Blocks MVP:**
- ❌ Users lose work when pods scale down
- ❌ Users can't switch pods even if another has capacity
- ❌ Session affinity (3 hours) is only a temporary workaround
- ❌ Unacceptable data-loss risk for production
## Requirements
| Requirement | Priority | Rationale |
|---|---|---|
| Data Persistence | P0 | No data loss on pod lifecycle events |
| Pod Portability | P0 | Users can access data from any pod |
| Performance | P1 | IDE must be responsive (<100ms file ops) |
| Multi-User Isolation | P1 | Users can't access each other's data |
| Cost Efficiency | P2 | Minimize storage costs at MVP scale |
| Backup/Recovery | P2 | Point-in-time restore capability |
## Storage Options Analysis

### Option 1: Shared NFS (Google Filestore)
**Architecture:**

```
┌─────────────────────────────────────┐
│      Google Filestore (NFS)         │
│    /workspace/users/{user_id}/      │
└──────────────┬──────────────────────┘
               │
       ┌───────┴────────┐
       │                │
   ┌───▼───┐        ┌───▼───┐
   │ Pod-0 │        │ Pod-1 │  ... Pod-N
   └───────┘        └───────┘
         (mount /workspace)
```
**Implementation:**

```yaml
# filestore.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: filestore-workspace
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.2  # Filestore IP
    path: /workspace
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-shared
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""             # empty string disables dynamic provisioning
  volumeName: filestore-workspace  # bind explicitly to the PV above
  resources:
    requests:
      storage: 1Ti
```
**Pros:**
- ✅ POSIX-compliant (full filesystem semantics)
- ✅ ReadWriteMany: all pods mount simultaneously
- ✅ Simple implementation (standard NFS mount)
- ✅ Familiar mental model for users
- ✅ Works with existing IDE tools (no code changes)

**Cons:**
- ❌ Cost: $200-400/month for 1 TB (Basic tier: $0.20/GB/month)
- ❌ Performance: network latency on file ops (10-50ms)
- ❌ Single point of failure: a Filestore outage takes down all pods
- ❌ Scaling limits: max 100 TB, 60 MB/s per TB
- ❌ No built-in versioning: needs a separate backup solution

**Cost Breakdown (1 TB Filestore Basic):**
- Storage: $204.80/month ($0.20/GB × 1024 GB)
- Operations: included
- Total: ~$205/month

**Performance Expectations:**
- Read latency: 5-15ms (cached)
- Write latency: 10-30ms (sync)
- Throughput: 60-100 MB/s (Basic tier)
- IOPS: 1,000-3,000 (depending on file size)

**When to Use:**
- Need full POSIX compliance
- < 50 concurrent users
- Can tolerate 10-30ms latency
- Budget allows $200+/month for storage
### Option 2: Google Cloud Storage (GCS) with gcsfuse
**Architecture:**

```
┌─────────────────────────────────────┐
│  GCS Bucket: coditect-workspaces    │
│    users/{user_id}/files/           │
└──────────────┬──────────────────────┘
               │ (gcsfuse mount)
       ┌───────┴────────┐
       │                │
   ┌───▼───┐        ┌───▼───┐
   │ Pod-0 │        │ Pod-1 │  ... Pod-N
   └───────┘        └───────┘
     (mount /workspace via gcsfuse)
```
**Implementation:**

```yaml
# gcs-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gcs-fuse-mounter
spec:
  selector:              # required by apps/v1
    matchLabels:
      app: gcs-fuse-mounter
  template:
    metadata:
      labels:
        app: gcs-fuse-mounter
    spec:
      containers:
        - name: gcs-fuse
          image: gcr.io/gcs-fuse-csi-driver/gcs-fuse-csi-driver
          securityContext:
            privileged: true
          volumeMounts:
            - name: gcs-mount
              mountPath: /workspace
              mountPropagation: Bidirectional
      volumes:
        - name: gcs-mount
          csi:
            driver: gcsfuse.csi.storage.gke.io
            volumeAttributes:
              bucketName: coditect-workspaces
              mountOptions: "implicit-dirs,file-mode=644,dir-mode=755"
```
**Pros:**
- ✅ Cost-effective: $20-50/month for 1 TB (Standard: $0.020/GB/month)
- ✅ Scalable: unlimited capacity, auto-scales
- ✅ Durable: 99.999999999% durability (11 nines)
- ✅ Versioning: built-in object versioning
- ✅ No SPOF: highly available by design

**Cons:**
- ❌ Not POSIX-compliant: limited metadata, no hard links or atomic ops
- ❌ Eventual consistency: directory listings may lag
- ❌ Latency: 50-200ms for small file ops (network round trip)
- ❌ Cache complexity: needs aggressive caching for IDE performance
- ❌ Harder debugging: the FUSE layer adds complexity

**Cost Breakdown (1 TB GCS Standard):**
- Storage: $20.48/month ($0.020/GB × 1024 GB)
- Operations: ~$5/month (Class A: 100K ops, Class B: 1M ops)
- Network egress: minimal (same region)
- Total: ~$25-30/month

**Performance Expectations:**
- Read latency: 50-150ms (uncached)
- Write latency: 100-300ms (sync)
- Throughput: limited by network (typically 100-500 MB/s)
- IOPS: 1,000-5,000 (highly variable)
**POSIX Limitations:**

```shell
# These operations DON'T work properly with gcsfuse:
ln file1 file2   # hard links not supported
mv file1 file2   # rename is not atomic across "directories"
flock file       # file locking unreliable
stat file        # some metadata missing (ctime, etc.)
```
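The missing rename atomicity matters more than it looks: most IDEs save files with a write-then-rename pattern so a crash never leaves a half-written file. A minimal sketch of that pattern (the `safe_save` helper is illustrative, not an existing API):

```python
import os
import tempfile

def safe_save(path: str, data: bytes) -> None:
    """Write-then-rename save, the pattern editors use for crash safety.

    On a POSIX filesystem os.replace() is atomic, so readers see either the
    old or the new contents, never a partial write. Over gcsfuse the rename
    becomes an object copy + delete, so this guarantee is lost.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())       # persist the temp file first
        os.replace(tmp_path, path)     # atomic on POSIX, not on gcsfuse
    except BaseException:
        os.unlink(tmp_path)            # clean up the temp file on failure
        raise
```

On block storage (Options 3 and 4) this pattern works unmodified, which is one reason the POSIX gap is hard to paper over with caching alone.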
**When to Use:**
- Cost is the primary concern
- Can tolerate eventual consistency
- Workload is read-heavy (can cache aggressively)
- Budget allows $25-30/month for storage
### Option 3: User-Specific PVCs with Pod Affinity
Architecture:
┌─────────────────┐ ┌─────────────────┐
│ user-abc-pvc │ │ user-xyz-pvc │
│ (50 GB) │ │ (50 GB) │
└────────┬────────┘ └────────┬────────┘
│ (affinity) │ (affinity)
┌───▼───┐ ┌───▼───┐
│ Pod-0 │ │ Pod-1 │ ... Pod-N
└───────┘ └───────┘
(user-abc) (user-xyz)
**Implementation:**

```yaml
# user-pvc-template.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-{{ user_id }}
  labels:
    user: {{ user_id }}
    app: coditect-workspace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: standard-rwo
---
# Modified StatefulSet with dynamic PVC binding
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: coditect-combined
spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: user
                    operator: In
                    values: ["{{ user_id }}"]
              topologyKey: kubernetes.io/hostname
      volumes:
        - name: workspace
          persistentVolumeClaim:
            claimName: workspace-{{ user_id }}
```
**Pros:**
- ✅ Full POSIX: standard block storage (ext4/xfs)
- ✅ Per-user isolation: each user gets a dedicated PVC
- ✅ Performance: local SSD possible (sub-ms latency)
- ✅ Kubernetes-native: no external dependencies
- ✅ Snapshot support: GKE persistent disk snapshots

**Cons:**
- ❌ Complex scheduling: needs a custom controller for PVC→pod binding
- ❌ Scale challenges: 1,000 users = 1,000 PVCs (management overhead)
- ❌ Pod stickiness: user tied to a pod, can't easily switch
- ❌ Wasted capacity: idle user PVCs still consume storage
- ❌ Cold start: attaching a PVC to a new pod takes 30-60s

**Cost Breakdown (50 GB × 20 users):**
- Storage: $20/month (20 users × 50 GB × $0.020/GB)
- Snapshots: ~$5/month (weekly backups)
- Total: ~$25/month (at 20 users)

**Scaling Cost:**
- 100 users: $100/month (5 TB total)
- 500 users: $500/month (25 TB total)

**Performance Expectations:**
- Read latency: <1ms (SSD)
- Write latency: <5ms (SSD)
- Throughput: 240 MB/s (standard PD)
- IOPS: 15K-30K (SSD)

**When to Use:**
- Need maximum performance
- User count < 100
- Can implement custom PVC lifecycle management
- Budget scales linearly with users
### Option 4: Hybrid - Shared Base + User Overlays (RECOMMENDED)
**Architecture:**

```
┌─────────────────────────────────────┐
│   Shared Base Image (Read-Only)     │
│ /app/base/ - IDE, tools, templates  │
└──────────────┬──────────────────────┘
               │ (overlay mount)
       ┌───────┴────────┐
       │                │
┌──────▼────────┐  ┌────▼──────────┐
│ user-abc-pvc  │  │ user-xyz-pvc  │
│   (10 GB)     │  │   (10 GB)     │
│  /workspace/  │  │  /workspace/  │
└───────────────┘  └───────────────┘
```
**Implementation:**

```yaml
# hybrid-storage.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: shared-base-config
data:
  # IDE templates, default configs, tool binaries
  # NOTE: ConfigMaps are capped at 1 MiB; anything larger (tool binaries,
  # extensions) must ship in the container image or a read-only PV.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-{{ user_id }}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi  # user files only
---
# Pod mounts both shared base (RO) and user PVC (RW)
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: theia
      volumeMounts:
        - name: shared-base
          mountPath: /app/base
          readOnly: true
        - name: user-workspace
          mountPath: /workspace
  volumes:
    - name: shared-base
      configMap:
        name: shared-base-config
    - name: user-workspace
      persistentVolumeClaim:
        claimName: workspace-{{ user_id }}
```
**Pros:**
- ✅ Cost-efficient: 10 GB per user vs 50 GB (80% reduction)
- ✅ Fast provisioning: smaller PVCs attach faster (<10s)
- ✅ Shared updates: base image updated once, all pods benefit
- ✅ POSIX-compliant: user workspace is standard block storage
- ✅ Good performance: user files on SSD, base cached in memory

**Cons:**
- ⚠️ Two-tier complexity: must manage base + user storage separately
- ⚠️ Base image updates: require pod restarts to pick up changes
- ⚠️ Path management: apps need to know base vs workspace paths

**Cost Breakdown (10 GB × 20 users):**
- User PVCs: $4/month (20 users × 10 GB × $0.020/GB)
- Shared base: $1/month (50 GB shared base storage)
- Snapshots: $2/month
- Total: ~$7/month (at 20 users)

**Scaling Cost:**
- 100 users: $20/month (1 TB user data + 50 GB base)
- 500 users: $100/month (5 TB user data + 50 GB base)

**Performance Expectations:**
- Read latency: <1ms (user files), <0.1ms (base cached)
- Write latency: <5ms (user files only)
- Throughput: 240 MB/s (standard PD)
- IOPS: 15K-30K (SSD)

**When to Use:**
- Need cost efficiency + performance
- Have a clear separation: base (tools) vs user (files)
- Can implement the dual-mount architecture
- RECOMMENDED FOR MVP
## Cost Comparison Summary
| Option | 20 Users | 100 Users | 500 Users | Cost per User/Month |
|---|---|---|---|---|
| NFS (Filestore) | $205 | $410 | $2,050 | $10.25 (fixed overhead) |
| GCS (gcsfuse) | $30 | $60 | $300 | $1.50 |
| User PVCs (50 GB) | $25 | $100 | $500 | $1.25 |
| Hybrid (10 GB) | $7 | $20 | $100 | $0.35 |
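The table's storage figures follow directly from the unit prices quoted above ($0.20/GB for Filestore Basic with a 1 TiB minimum tier, $0.020/GB for GCS and persistent disk). A small sketch of that arithmetic; fixed overheads (operations, snapshots, shared base) are omitted, so totals land slightly below the table's all-in figures:

```python
# Storage-only monthly cost per option, using this document's unit prices.
def monthly_storage_cost(option: str, users: int) -> float:
    gb_and_rate = {
        "nfs_filestore": (1024, 0.20),    # shared 1 TiB tier (flat until outgrown)
        "gcs": (users * 50, 0.020),       # ~50 GB of objects per user
        "user_pvc": (users * 50, 0.020),  # dedicated 50 GB PVC per user
        "hybrid": (users * 10, 0.020),    # 10 GB user PVC, base shared
    }
    gb, rate = gb_and_rate[option]
    return round(gb * rate, 2)
```

For example, `monthly_storage_cost("hybrid", 20)` gives $4.00 of raw storage; adding the shared base ($1) and snapshots ($2) yields the $7/month figure in the table.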
## Performance Comparison
| Operation | NFS | GCS | User PVC | Hybrid |
|---|---|---|---|---|
| Small File Read | 10ms | 100ms | <1ms | <1ms |
| Large File Read | 50ms | 200ms | 5ms | 5ms |
| File Write | 30ms | 300ms | 5ms | 5ms |
| Directory List | 20ms | 500ms | 2ms | 2ms |
| Git Clone (100MB) | 5s | 30s | 2s | 2s |
| npm install (500 pkgs) | 45s | 5min | 20s | 20s |
## Recommendation: Hybrid Approach (Option 4)

**Why Hybrid is Best for MVP:**
- Cost: $7/month for 20 users vs $205/month (NFS): 96% savings
- Performance: SSD latency for user files (<1ms), so the IDE stays responsive
- Scalability: linear cost growth ($0.35/user/month), predictable
- Simplicity: standard Kubernetes PVCs, no external dependencies
- Isolation: per-user PVCs, security compliant
### Implementation Plan

#### Phase 1: Base Image Design (2-3 hours)

Create the shared base with:

```
/app/base/
├── .coditect/      # IDE configs, agents, skills
├── tools/          # Global tools (git, npm, etc.)
├── templates/      # Project templates
└── extensions/     # Theia extensions
```

**ConfigMap Creation:**

```shell
# Create base image tarball
tar -czf base-image.tar.gz .coditect/ tools/ templates/ extensions/

# Create ConfigMap from tarball
# NOTE: ConfigMaps are limited to 1 MiB; if the tarball is larger,
# bake the base into the container image or a read-only PV instead.
kubectl create configmap shared-base-config \
  --from-file=base-image.tar.gz \
  -n coditect-app

# Pods extract on startup
# (extract into a writable path; a configMap mount itself is read-only)
kubectl exec -it coditect-combined-0 -- \
  tar -xzf /app/base/base-image.tar.gz -C /app/base/
```
#### Phase 2: User PVC Provisioning (3-4 hours)

**Dynamic PVC Creation Script:**

```shell
#!/bin/bash
# scripts/create-user-workspace.sh
USER_ID=$1
PVC_SIZE=${2:-10Gi}

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-${USER_ID}
  namespace: coditect-app
  labels:
    user: ${USER_ID}
    app: coditect-workspace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${PVC_SIZE}
  storageClassName: standard-rwo
EOF

echo "Created PVC workspace-${USER_ID} (${PVC_SIZE})"
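One gotcha in the script above: `workspace-${USER_ID}` becomes a Kubernetes object name, which must be a DNS-1123 label (lowercase alphanumerics and hyphens, 63 characters max). A hedged Python sketch of the same templating with that validation up front (`render_pvc` is a hypothetical helper, not part of the codebase):

```python
import re

# Kubernetes object names must be DNS-1123 labels.
DNS1123 = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")

PVC_TEMPLATE = """\
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-{user_id}
  namespace: coditect-app
  labels:
    user: {user_id}
    app: coditect-workspace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: {size}
  storageClassName: standard-rwo
"""

def render_pvc(user_id: str, size: str = "10Gi") -> str:
    """Render the PVC manifest, rejecting user ids that would produce an
    invalid object name (non-DNS-1123 characters or names over 63 chars)."""
    if not DNS1123.match(user_id) or len(f"workspace-{user_id}") > 63:
        raise ValueError(f"invalid user id for PVC name: {user_id!r}")
    return PVC_TEMPLATE.format(user_id=user_id, size=size)
```

Validating here avoids PVC creation failing deep inside `kubectl apply` when a user id contains uppercase letters or underscores.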
#### Phase 3: StatefulSet Modification (4-5 hours)

**Updated StatefulSet:**

```yaml
# k8s/theia-statefulset-hybrid.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: coditect-combined
  namespace: coditect-app
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: combined
          image: us-central1-docker.pkg.dev/.../coditect-combined:latest
          volumeMounts:
            # Shared base (read-only)
            - name: shared-base
              mountPath: /app/base
              readOnly: true
            # User workspace (read-write, dynamically assigned)
            - name: user-workspace
              mountPath: /workspace
          env:
            - name: BASE_PATH
              value: "/app/base"
            - name: WORKSPACE_PATH
              value: "/workspace"
      volumes:
        - name: shared-base
          configMap:
            name: shared-base-config
        # User PVC assigned dynamically based on session
        - name: user-workspace
          persistentVolumeClaim:
            claimName: workspace-{{ user_id }}
```

**Challenge:** Dynamic PVC assignment requires custom logic. Options:
- **Pod mutation webhook**: intercept pod creation, inject the correct PVC name
- **Session-based routing**: load balancer routes each user to the pod holding their PVC
- **Pre-provisioned pod pools**: create pods with pre-attached PVCs, assign them to users

**Recommended:** Session-based routing (simplest for MVP)
#### Phase 4: Session-Based Routing (5-6 hours)

**Architecture:**

```
User Login → FDB Session → Get user_id → Find pod with workspace-{user_id} → Route to pod
```

**Implementation:**

```python
# backend/src/handlers/session.py

async def get_user_pod(user_id: str) -> str:
    """Find the pod with this user's PVC attached."""
    # Query the Kubernetes API for pods labeled with this user's id
    pods = await k8s_client.list_namespaced_pod(
        namespace="coditect-app",
        label_selector=f"user={user_id}",
    )
    if not pods.items:
        # No pod with the user's PVC - assign one
        return await assign_user_to_pod(user_id)

    # Return the first healthy pod
    for pod in pods.items:
        if pod.status.phase == "Running":
            return pod.metadata.name

    raise RuntimeError(f"No healthy pod for user {user_id}")


async def assign_user_to_pod(user_id: str) -> str:
    """Assign the user to an available pod and attach their PVC."""
    # Find a pod with capacity
    available_pods = await find_pods_with_capacity()
    if not available_pods:
        raise RuntimeError("No pods with capacity")

    pod_name = available_pods[0]

    # Label the pod with the user_id
    await k8s_client.patch_namespaced_pod(
        name=pod_name,
        namespace="coditect-app",
        body={"metadata": {"labels": {"user": user_id}}},
    )

    # Attach the user's PVC (requires pod restart)
    # ... implementation details ...
    return pod_name
```

**Challenge:** PVCs can't be attached dynamically to running pods. Options:
- **Pre-attach PVCs**: each pod has multiple PVC slots, mounted on demand
- **Pod restart**: restart the pod with the new PVC (30-60s downtime)
- **Dedicated user pods**: one pod per user (expensive at scale)

**Recommended for MVP:** Pre-attach PVC slots (mount 5-10 PVCs per pod, activate one on user login)
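The slot bookkeeping this implies is simple enough to sketch: each pod exposes a fixed number of pre-attached PVC slots, a login claims a free one (stickily, so repeat logins route back), and a logout releases it. Class and pod names below are illustrative, not existing code:

```python
class SlotPool:
    """Track pre-attached PVC slots across pods (in-memory sketch;
    production state would live in the session store, e.g. FDB)."""

    def __init__(self, pods: list, slots_per_pod: int = 5):
        self.free = {pod: list(range(slots_per_pod)) for pod in pods}
        self.assignments = {}  # user_id -> (pod, slot)

    def assign(self, user_id: str):
        if user_id in self.assignments:       # already placed: sticky routing
            return self.assignments[user_id]
        for pod, slots in self.free.items():  # first pod with a free slot
            if slots:
                slot = slots.pop(0)
                self.assignments[user_id] = (pod, slot)
                return pod, slot
        raise RuntimeError("no pods with free PVC slots - scale up")

    def release(self, user_id: str) -> None:
        pod, slot = self.assignments.pop(user_id)
        self.free[pod].append(slot)           # slot becomes reusable
```

The free-slot count per pod also gives the HPA a concrete signal: scale up when total free slots drops below the expected login rate.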
#### Phase 5: Testing & Validation (3-4 hours)

**Test Scenarios:**

```shell
# 1. User creates files, switches pods
kubectl exec -it coditect-combined-0 -- touch /workspace/test.txt
# User logs out, logs back in (routed to pod-1)
kubectl exec -it coditect-combined-1 -- ls /workspace/test.txt
# Expected: file exists

# 2. Pod scales down, user data persists
kubectl scale statefulset coditect-combined --replicas=2
# User logs in (routed to pod-0)
kubectl exec -it coditect-combined-0 -- ls /workspace/test.txt
# Expected: file exists

# 3. Base image update
kubectl delete configmap shared-base-config
kubectl create configmap shared-base-config --from-file=new-base.tar.gz
kubectl rollout restart statefulset coditect-combined
# Expected: all pods get the new base, user data intact
```
## Implementation Timeline
| Phase | Duration | Dependencies | Deliverable |
|---|---|---|---|
| Phase 1: Base Image | 2-3 hours | None | ConfigMap with shared tools |
| Phase 2: User PVCs | 3-4 hours | Phase 1 | PVC creation script |
| Phase 3: StatefulSet | 4-5 hours | Phase 2 | Hybrid storage manifest |
| Phase 4: Routing | 5-6 hours | Phase 3 | Session-based pod assignment |
| Phase 5: Testing | 3-4 hours | Phase 4 | Validated persistent storage |
| Total | 17-22 hours | - | Production-ready storage |
## Migration Path (From Current → Hybrid)

### Step 1: Backup Current Data (30 min)

```shell
# Tar up each pod's workspace and copy it off-cluster
for i in 0 1 2; do
  kubectl exec coditect-combined-$i -- tar -czf /tmp/backup-$i.tar.gz /workspace
  kubectl cp coditect-app/coditect-combined-$i:/tmp/backup-$i.tar.gz ./backup-pod-$i.tar.gz
done
```
### Step 2: Create Base Image (1 hour)

```shell
# Extract common files from pod-0
kubectl exec coditect-combined-0 -- tar -czf /tmp/base.tar.gz /app/.coditect /app/tools
kubectl cp coditect-app/coditect-combined-0:/tmp/base.tar.gz ./base-image.tar.gz

# Create ConfigMap (note the 1 MiB ConfigMap size limit)
kubectl create configmap shared-base-config \
  --from-file=base-image.tar.gz \
  -n coditect-app
```
### Step 3: Create User PVCs (1 hour)

```shell
# For each existing user, create a 10 GB PVC
./scripts/create-user-workspace.sh user-001 10Gi
./scripts/create-user-workspace.sh user-002 10Gi
# ... repeat for all users
```
### Step 4: Migrate User Data (2 hours)

```shell
# Copy user files from the old 50 GB PVC to the new 10 GB PVC
kubectl run migration-pod --image=busybox --restart=Never \
  --overrides='
{
  "spec": {
    "volumes": [
      {"name": "old-pvc", "persistentVolumeClaim": {"claimName": "workspace-old"}},
      {"name": "new-pvc", "persistentVolumeClaim": {"claimName": "workspace-user-001"}}
    ],
    "containers": [{
      "name": "migration",
      "image": "busybox",
      "command": ["sh", "-c", "cp -r /old/workspace/* /new/workspace/"],
      "volumeMounts": [
        {"name": "old-pvc", "mountPath": "/old"},
        {"name": "new-pvc", "mountPath": "/new"}
      ]
    }]
  }
}'

# Wait for completion (pods have no "complete" condition; wait on the phase)
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/migration-pod --timeout=600s
```
### Step 5: Deploy Hybrid StatefulSet (30 min)

```shell
# Apply the new manifest
kubectl apply -f k8s/theia-statefulset-hybrid.yaml

# Verify all pods are running
kubectl get pods -n coditect-app -l app=coditect-combined

# Check mounts
kubectl exec coditect-combined-0 -- df -h /app/base /workspace
```
### Step 6: Validate & Cleanup (1 hour)

```shell
# Test user login and file access
curl -X POST https://api.coditect.ai/login -d '{"username":"user-001","password":"..."}'
# Open the IDE, verify files are present

# Delete old PVCs (only after validation)
kubectl delete pvc workspace-old -n coditect-app
# Monitor for 24 hours before deleting backups
```

**Total Migration Time:** 6-7 hours (including validation)
## Monitoring & Alerts

**Critical Metrics:**

```shell
# Requested vs bound capacity per user PVC
# (PVC status reports provisioned capacity, not bytes used; actual usage
# comes from kubelet volume metrics, e.g. kubelet_volume_stats_used_bytes)
kubectl get pvc -n coditect-app -o json | \
  jq '.items[] | {name: .metadata.name, requested: .spec.resources.requests.storage, bound: .status.capacity.storage}'

# Pod-PVC binding status
kubectl get pods -n coditect-app -o json | \
  jq '.items[] | {pod: .metadata.name, pvcs: [.spec.volumes[] | select(.persistentVolumeClaim) | .persistentVolumeClaim.claimName]}'
```

**Recommended Alerts:**
- PVC >90% full → expand or warn the user
- PVC attach failure → user can't access their workspace
- Base ConfigMap update failed → pods running stale configs
- Orphaned PVCs → user deleted but PVC remains (wasted cost)
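The ">90% full" alert can be evaluated from those metrics with a few lines. A hedged sketch over pre-parsed PVC records; the record shape, the `used_gi` field (which would come from kubelet volume stats), and the helper names are assumptions for illustration:

```python
def parse_gi(quantity: str) -> float:
    """Convert a Kubernetes quantity like '10Gi' to GiB (binary units only)."""
    units = {"Gi": 1.0, "Mi": 1.0 / 1024, "Ti": 1024.0}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity}")

def pvcs_over_threshold(pvcs: list, threshold: float = 0.90) -> list:
    """Return names of PVCs whose used/capacity ratio exceeds the threshold.

    Each record: {"name": str, "capacity": "10Gi", "used_gi": float}.
    """
    return [
        p["name"]
        for p in pvcs
        if p["used_gi"] / parse_gi(p["capacity"]) > threshold
    ]
```

Running this on a schedule (or expressing the same ratio as a PromQL rule over `kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes`) covers the first alert in the list.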
## Conclusion

**Recommended Solution:** Hybrid Storage (Option 4)

**Why:**
- ✅ 96% cost savings vs NFS ($7/month vs $205/month for 20 users)
- ✅ <1ms latency for user files (SSD performance)
- ✅ Linear scaling ($0.35/user/month, predictable costs)
- ✅ No external dependencies (pure Kubernetes)
- ✅ Security compliant (per-user PVC isolation)

**Next Steps:**
- ✅ Review and approve this analysis
- Create base image ConfigMap (2-3 hours)
- Implement user PVC provisioning (3-4 hours)
- Update StatefulSet for hybrid mounts (4-5 hours)
- Implement session-based routing (5-6 hours)
- Test with 5 beta users (3-4 hours)
- Migrate existing users (6-7 hours)

**Timeline:** Ready for MVP in 24-30 hours (3-4 days)
**Cost Impact:** $7/month for 20 users (vs $500/month with the current 10-pod scaling plan)

**Questions for User:**
- Approve the Hybrid approach for MVP?
- Acceptable timeline (3-4 days)?
- Preferred migration strategy (all at once vs phased)?