
Persistent Storage Strategy for Dynamic Pod Scaling

Date: 2025-10-28
Status: CRITICAL - MVP Blocker
Issue: User workspace data is tied to specific pods and is lost on pod scale-down or failure

Problem Statement

Current Architecture:

  • StatefulSet with pod-local PVCs (50 GB workspace + 5 GB config per pod)
  • Session affinity: ClientIP with 3-hour timeout
  • HPA scaling: 3-30 pods based on CPU/Memory

Critical Issue:

User creates files in pod-0 → Pod-0 scales down → User data LOST

Why This Blocks MVP:

  • ❌ Users lose work when pods scale down
  • ❌ Can't switch pods even if another has capacity
  • ❌ Session affinity (3 hours) is temporary workaround
  • ❌ Unacceptable data loss risk for production

Requirements

| Requirement | Priority | Rationale |
|---|---|---|
| Data Persistence | P0 | No data loss on pod lifecycle events |
| Pod Portability | P0 | Users can access data from any pod |
| Performance | P1 | IDE must be responsive (<100ms file ops) |
| Multi-User Isolation | P1 | Users can't access each other's data |
| Cost Efficiency | P2 | Minimize storage costs at MVP scale |
| Backup/Recovery | P2 | Point-in-time restore capability |

Storage Options Analysis

Option 1: Shared NFS (Google Filestore)

Architecture:

┌─────────────────────────────────────┐
│ Google Filestore (NFS)              │
│ /workspace/users/{user_id}/         │
└──────────────┬──────────────────────┘
               │
       ┌───────┴────────┐
       │                │
   ┌───▼───┐        ┌───▼───┐
   │ Pod-0 │        │ Pod-1 │  ... Pod-N
   └───────┘        └───────┘
     (mount /workspace)

Implementation:

# filestore.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: filestore-workspace
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.2  # Filestore IP
    path: /workspace
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-shared
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""              # disable dynamic provisioning
  volumeName: filestore-workspace   # bind to the static PV above
  resources:
    requests:
      storage: 1Ti

Pros:

  • ✅ POSIX-compliant (full filesystem semantics)
  • ✅ ReadWriteMany - all pods access simultaneously
  • ✅ Simple implementation (standard NFS mount)
  • ✅ Familiar mental model for users
  • ✅ Works with existing IDE tools (no code changes)

Cons:

  • Cost: $200-400/month for 1TB (Basic tier: $0.20/GB/month)
  • Performance: Network latency on file ops (10-50ms)
  • Single Point of Failure: Filestore outage = all pods down
  • Scaling Limits: Max 100TB, 60MB/s per TB
  • No built-in versioning: Need separate backup solution

Cost Breakdown (1TB Filestore Basic):

  • Storage: $204.80/month ($0.20/GB × 1024 GB)
  • Operations: Included
  • Total: ~$205/month

Performance Expectations:

  • Read latency: 5-15ms (cached)
  • Write latency: 10-30ms (sync)
  • Throughput: 60-100 MB/s (Basic tier)
  • IOPS: 1000-3000 (depending on file size)

When to Use:

  • Need full POSIX compliance
  • < 50 concurrent users
  • Can tolerate 10-30ms latency
  • Budget allows $200+/month storage

Option 2: Google Cloud Storage (GCS) with gcsfuse

Architecture:

┌─────────────────────────────────────┐
│ GCS Bucket: coditect-workspaces     │
│ users/{user_id}/files/              │
└──────────────┬──────────────────────┘
               │ (gcsfuse mount)
       ┌───────┴────────┐
       │                │
   ┌───▼───┐        ┌───▼───┐
   │ Pod-0 │        │ Pod-1 │  ... Pod-N
   └───────┘        └───────┘
     (mount /workspace via gcsfuse)

Implementation:

# gcs-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gcs-fuse-mounter
spec:
  selector:
    matchLabels:
      app: gcs-fuse-mounter
  template:
    metadata:
      labels:
        app: gcs-fuse-mounter
    spec:
      containers:
        - name: gcs-fuse
          image: gcr.io/gcs-fuse-csi-driver/gcs-fuse-csi-driver
          securityContext:
            privileged: true
          volumeMounts:
            - name: gcs-mount
              mountPath: /workspace
              mountPropagation: Bidirectional
      volumes:
        - name: gcs-mount
          csi:
            driver: gcsfuse.csi.storage.gke.io
            volumeAttributes:
              bucketName: coditect-workspaces
              mountOptions: "implicit-dirs,file-mode=644,dir-mode=755"

Pros:

  • Cost-Effective: $20-50/month for 1TB (Standard: $0.020/GB/month)
  • Scalable: Unlimited capacity, auto-scales
  • Durable: 99.999999999% durability (11 nines)
  • Versioning: Built-in object versioning
  • No SPOF: Highly available by design

Cons:

  • Not POSIX-compliant: Limited metadata, no hard links, unreliable atomic operations
  • Stale reads possible: gcsfuse metadata and listing caches can lag behind writes from other pods
  • Latency: 50-200ms for small file ops (network roundtrip)
  • Cache Complexity: Need aggressive caching for IDE performance
  • Debugging Harder: FUSE layer adds complexity

Cost Breakdown (1TB GCS Standard):

  • Storage: $20.48/month ($0.020/GB × 1024 GB)
  • Operations: ~$5/month (Class A: 100K ops, Class B: 1M ops)
  • Network egress: Minimal (same region)
  • Total: ~$25-30/month

Performance Expectations:

  • Read latency: 50-150ms (uncached)
  • Write latency: 100-300ms (sync)
  • Throughput: Limited by network (typically 100-500 MB/s)
  • IOPS: 1000-5000 (highly variable)

POSIX Limitations:

# These operations DON'T work properly with gcsfuse:
ln file1 file2 # Hard links not supported
mv file1 file2 # Not atomic across "directories"
flock file # File locking unreliable
stat file # Some metadata missing (ctime, etc.)

When to Use:

  • Cost is primary concern
  • Can tolerate eventual consistency
  • Workload is read-heavy (can cache aggressively)
  • Budget allows $25-30/month storage

Option 3: User-Specific PVCs with Pod Affinity

Architecture:

┌─────────────────┐        ┌─────────────────┐
│  user-abc-pvc   │        │  user-xyz-pvc   │
│    (50 GB)      │        │    (50 GB)      │
└────────┬────────┘        └────────┬────────┘
         │ (affinity)               │ (affinity)
     ┌───▼───┐                  ┌───▼───┐
     │ Pod-0 │                  │ Pod-1 │  ... Pod-N
     └───────┘                  └───────┘
     (user-abc)                 (user-xyz)

Implementation:

# user-pvc-template.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-{{ user_id }}
  labels:
    user: {{ user_id }}
    app: coditect-workspace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: standard-rwo
---
# Modified StatefulSet with dynamic PVC binding
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: coditect-combined
spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: user
                    operator: In
                    values: ["{{ user_id }}"]
              topologyKey: kubernetes.io/hostname
      volumes:
        - name: workspace
          persistentVolumeClaim:
            claimName: workspace-{{ user_id }}

Pros:

  • Full POSIX: Standard block storage (ext4/xfs)
  • Per-User Isolation: Each user gets dedicated PVC
  • Performance: Local SSD possible (sub-ms latency)
  • Kubernetes-Native: No external dependencies
  • Snapshot Support: GKE persistent disk snapshots

Cons:

  • Complex Scheduling: Need custom controller for PVC→Pod binding
  • Scale Challenges: 1000 users = 1000 PVCs (management overhead)
  • Pod Stickiness: User tied to pod, can't easily switch
  • Wasted Capacity: Idle user PVCs still consume storage
  • Cold Start: Attaching PVC to new pod takes 30-60s

Cost Breakdown (50 GB × 20 users):

  • Storage: $20/month (20 users × 50 GB × $0.020/GB)
  • Snapshots: ~$5/month (weekly backups)
  • Total: ~$25/month (at 20 users)

Scaling Cost:

  • 100 users: $100/month (5 TB total)
  • 500 users: $500/month (25 TB total)

Performance Expectations:

  • Read latency: <1ms (SSD)
  • Write latency: <5ms (SSD)
  • Throughput: 240 MB/s (standard PD)
  • IOPS: 15K-30K (SSD)

When to Use:

  • Need maximum performance
  • User count < 100
  • Can implement custom PVC lifecycle management
  • Budget scales linearly with users

Option 4: Hybrid (Shared Base + Per-User PVCs)

Architecture:

┌─────────────────────────────────────┐
│ Shared Base Image (Read-Only)       │
│ /app/base/ - IDE, tools, templates  │
└──────────────┬──────────────────────┘
               │ (overlay mount)
        ┌──────┴─────────┐
        │                │
┌───────▼───────┐  ┌─────▼─────────┐
│ user-abc-pvc  │  │ user-xyz-pvc  │
│   (10 GB)     │  │   (10 GB)     │
│ /workspace/   │  │ /workspace/   │
└───────────────┘  └───────────────┘

Implementation:

# hybrid-storage.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: shared-base-config
data:
  # IDE templates, default configs, tool binaries
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-{{ user_id }}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi  # User files only
---
# Pod mounts both shared base (RO) and user PVC (RW)
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: theia
      volumeMounts:
        - name: shared-base
          mountPath: /app/base
          readOnly: true
        - name: user-workspace
          mountPath: /workspace
  volumes:
    - name: shared-base
      configMap:
        name: shared-base-config
    - name: user-workspace
      persistentVolumeClaim:
        claimName: workspace-{{ user_id }}

Pros:

  • Cost Efficient: 10 GB per user vs 50 GB (80% reduction)
  • Fast Provisioning: Smaller PVCs attach faster (<10s)
  • Shared Updates: Base image updated once, all pods benefit
  • POSIX Compliant: User workspace is standard block storage
  • Good Performance: User files on SSD, base cached in memory

Cons:

  • ⚠️ Two-Tier Complexity: Need to manage base + user storage
  • ⚠️ Base Image Updates: Require pod restarts to pick up changes
  • ⚠️ Path Management: Apps need to know base vs workspace paths
  • ⚠️ ConfigMap Size Limit: ConfigMaps cap at ~1 MiB, so a large shared base realistically needs a read-only PV or files baked into the container image

Cost Breakdown (10 GB × 20 users):

  • User PVCs: $4/month (20 users × 10 GB × $0.020/GB)
  • Shared base: $1/month (50 GB shared base, served as a read-only volume or image layer)
  • Snapshots: $2/month
  • Total: ~$7/month (at 20 users)

Scaling Cost:

  • 100 users: $20/month (1 TB user data + 50 GB base)
  • 500 users: $100/month (5 TB user data + 50 GB base)

Performance Expectations:

  • Read latency: <1ms (user files), <0.1ms (base cached)
  • Write latency: <5ms (user files only)
  • Throughput: 240 MB/s (standard PD)
  • IOPS: 15K-30K (SSD)

When to Use:

  • Need cost efficiency + performance
  • Have clear separation: base (tools) vs user (files)
  • Can implement dual-mount architecture
  • RECOMMENDED FOR MVP

Cost Comparison Summary

| Option | 20 Users | 100 Users | 500 Users | Cost per User/Month |
|---|---|---|---|---|
| NFS (Filestore) | $205 | $410 | $2,050 | $10.25 (fixed overhead) |
| GCS (gcsfuse) | $30 | $60 | $300 | $1.50 |
| User PVCs (50 GB) | $25 | $100 | $500 | $1.25 |
| Hybrid (10 GB) | $7 | $20 | $100 | $0.35 |
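The hybrid column can be reproduced from the per-GB rate assumed throughout this doc. A small sketch (rates and sizes are this doc's planning estimates, not a billing calculator):

```python
# Rough monthly storage cost for the hybrid option.
PD_RATE_PER_GB = 0.020   # $/GB/month, the rate assumed in this doc
USER_PVC_GB = 10         # per-user workspace PVC
BASE_GB = 50             # shared base volume, amortized across users

def hybrid_monthly_cost(users: int) -> float:
    """Estimated monthly storage cost: per-user PVCs plus shared base."""
    user_cost = users * USER_PVC_GB * PD_RATE_PER_GB
    base_cost = BASE_GB * PD_RATE_PER_GB
    return user_cost + base_cost

# 100 users: 100 * 10 GB * $0.020 + $1 base ≈ $21/month, in line
# with the ~$20/month figure in the table above.
```

The linear term dominates quickly, which is why per-user cost converges toward $0.20/user/month at scale while the small-fleet figure ($7 for 20 users, once snapshots are added) works out closer to $0.35.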

Performance Comparison

| Operation | NFS | GCS | User PVC | Hybrid |
|---|---|---|---|---|
| Small File Read | 10ms | 100ms | <1ms | <1ms |
| Large File Read | 50ms | 200ms | 5ms | 5ms |
| File Write | 30ms | 300ms | 5ms | 5ms |
| Directory List | 20ms | 500ms | 2ms | 2ms |
| Git Clone (100MB) | 5s | 30s | 2s | 2s |
| npm install (500 pkgs) | 45s | 5min | 20s | 20s |

Recommendation: Hybrid Approach (Option 4)

Why Hybrid is Best for MVP:

  1. Cost: $7/month for 20 users vs $205/month (NFS) - 96% savings
  2. Performance: SSD latency for user files (<1ms) - IDE responsive
  3. Scalability: Linear cost growth ($0.35/user/month) - Predictable
  4. Simplicity: Standard Kubernetes PVCs - No external dependencies
  5. Isolation: Per-user PVCs - Security compliant

Implementation Plan:

Phase 1: Base Image Design (2-3 hours)

Create shared base with:

/app/base/
├── .coditect/    # IDE configs, agents, skills
├── tools/        # Global tools (git, npm, etc.)
├── templates/    # Project templates
└── extensions/   # Theia extensions

ConfigMap Creation:

# Create base image tarball
tar -czf base-image.tar.gz .coditect/ tools/ templates/ extensions/

# Create ConfigMap from tarball
kubectl create configmap shared-base-config \
  --from-file=base-image.tar.gz \
  -n coditect-app

# Pods extract on startup
kubectl exec -it coditect-combined-0 -- \
  tar -xzf /app/base/base-image.tar.gz -C /app/base/

Phase 2: User PVC Provisioning (3-4 hours)

Dynamic PVC Creation Script:

#!/bin/bash
# scripts/create-user-workspace.sh

USER_ID=$1
PVC_SIZE=${2:-10Gi}

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-${USER_ID}
  namespace: coditect-app
  labels:
    user: ${USER_ID}
    app: coditect-workspace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${PVC_SIZE}
  storageClassName: standard-rwo
EOF

echo "Created PVC workspace-${USER_ID} (${PVC_SIZE})"
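The same manifest the shell script applies can be built programmatically when provisioning is triggered from the backend instead of a terminal. A hedged sketch (`user_pvc_manifest` is a hypothetical helper; the resulting dict is the standard shape accepted by the official Kubernetes Python client's `create_namespaced_persistent_volume_claim`):

```python
def user_pvc_manifest(user_id: str, size: str = "10Gi") -> dict:
    """Build the PVC manifest that create-user-workspace.sh applies.

    The returned dict mirrors the YAML in the script above and can be
    passed as `body=` to CoreV1Api().create_namespaced_persistent_volume_claim.
    """
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": f"workspace-{user_id}",
            "namespace": "coditect-app",
            "labels": {"user": user_id, "app": "coditect-workspace"},
        },
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            "storageClassName": "standard-rwo",
        },
    }
```

Keeping manifest construction in one function makes it easy to unit-test the naming and labeling conventions that the session-routing logic in Phase 4 depends on.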

Phase 3: StatefulSet Modification (4-5 hours)

Updated StatefulSet:

# k8s/theia-statefulset-hybrid.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: coditect-combined
  namespace: coditect-app
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: combined
          image: us-central1-docker.pkg.dev/.../coditect-combined:latest
          volumeMounts:
            # Shared base (read-only)
            - name: shared-base
              mountPath: /app/base
              readOnly: true
            # User workspace (read-write, dynamically assigned)
            - name: user-workspace
              mountPath: /workspace
          env:
            - name: BASE_PATH
              value: "/app/base"
            - name: WORKSPACE_PATH
              value: "/workspace"
      volumes:
        - name: shared-base
          configMap:
            name: shared-base-config
        # User PVC assigned dynamically based on session
        - name: user-workspace
          persistentVolumeClaim:
            claimName: workspace-{{ user_id }}

Challenge: Dynamic PVC assignment requires custom logic. Options:

  1. Pod Mutation Webhook - Intercept pod creation, inject correct PVC name
  2. Session-Based Routing - Load balancer routes user to pod with their PVC
  3. Pre-Provisioned Pod Pools - Create pods with pre-attached PVCs, assign to users

Recommended: Session-Based Routing (simplest for MVP)

Phase 4: Session-Based Routing (5-6 hours)

Architecture:

User Login → FDB Session → Get user_id → Find pod with workspace-{user_id} → Route to pod

Implementation:

# backend/src/handlers/session.py

async def get_user_pod(user_id: str) -> str:
    """Find pod with user's PVC attached."""
    # Query Kubernetes API for pods with volume workspace-{user_id}
    pods = await k8s_client.list_namespaced_pod(
        namespace="coditect-app",
        label_selector=f"user={user_id}",
    )

    if not pods.items:
        # No pod with user's PVC - need to assign one
        return await assign_user_to_pod(user_id)

    # Return first healthy pod
    for pod in pods.items:
        if pod.status.phase == "Running":
            return pod.metadata.name

    raise Exception(f"No healthy pod for user {user_id}")


async def assign_user_to_pod(user_id: str) -> str:
    """Assign user to available pod, attach PVC."""
    # Find pod with capacity
    available_pods = await find_pods_with_capacity()
    if not available_pods:
        raise Exception("No pods with capacity")

    pod_name = available_pods[0]

    # Label pod with user_id
    await k8s_client.patch_namespaced_pod(
        name=pod_name,
        namespace="coditect-app",
        body={"metadata": {"labels": {"user": user_id}}},
    )

    # Attach user's PVC (requires pod restart)
    # ... implementation details ...

    return pod_name

Challenge: Can't dynamically attach PVCs to running pods. Options:

  1. Pre-attach PVCs - Each pod has multiple PVC slots, mount on-demand
  2. Pod Restart - Restart pod with new PVC (30-60s downtime)
  3. Dedicated User Pods - 1 pod per user (expensive at scale)

Recommended for MVP: Pre-attach PVC slots (mount 5-10 PVCs per pod, activate on user login)
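The pre-attached-slot model boils down to bookkeeping: each pod exposes a fixed number of PVC slots, and a user is pinned to a pod that has one free. A minimal in-memory sketch of that bookkeeping (names and the slot count are illustrative; real state would live in FDB or pod labels):

```python
SLOTS_PER_POD = 5  # assumed number of pre-attached PVC slots per pod

class SlotAllocator:
    """Illustrative allocator: users -> pods with free PVC slots."""

    def __init__(self, pods: list[str]):
        self.assignments: dict[str, str] = {}          # user_id -> pod
        self.free = {pod: SLOTS_PER_POD for pod in pods}

    def assign(self, user_id: str) -> str:
        if user_id in self.assignments:
            return self.assignments[user_id]           # sticky: reuse pod
        # Pick the pod with the most free slots (simple load spreading)
        pod = max(self.free, key=self.free.get)
        if self.free[pod] == 0:
            raise RuntimeError("no free PVC slots; scale up pods")
        self.free[pod] -= 1
        self.assignments[user_id] = pod
        return pod

    def release(self, user_id: str) -> None:
        pod = self.assignments.pop(user_id, None)
        if pod is not None:
            self.free[pod] += 1
```

With this model, `get_user_pod` in Phase 4 reduces to a lookup plus `assign` on a cache miss, and the HPA's minimum replica count must cover peak concurrent users divided by slots per pod.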

Phase 5: Testing & Validation (3-4 hours)

Test Scenarios:

# 1. User creates files, switches pods
kubectl exec -it coditect-combined-0 -- touch /workspace/test.txt
# User logs out, logs back in (routed to pod-1)
kubectl exec -it coditect-combined-1 -- ls /workspace/test.txt
# Expected: File exists

# 2. Pod scales down, user data persists
kubectl scale statefulset coditect-combined --replicas=2
# User logs in (routed to pod-0)
kubectl exec -it coditect-combined-0 -- ls /workspace/test.txt
# Expected: File exists

# 3. Base image update
kubectl delete configmap shared-base-config
kubectl create configmap shared-base-config --from-file=new-base.tar.gz
kubectl rollout restart statefulset coditect-combined
# Expected: All pods get new base, user data intact

Implementation Timeline

| Phase | Duration | Dependencies | Deliverable |
|---|---|---|---|
| Phase 1: Base Image | 2-3 hours | None | ConfigMap with shared tools |
| Phase 2: User PVCs | 3-4 hours | Phase 1 | PVC creation script |
| Phase 3: StatefulSet | 4-5 hours | Phase 2 | Hybrid storage manifest |
| Phase 4: Routing | 5-6 hours | Phase 3 | Session-based pod assignment |
| Phase 5: Testing | 3-4 hours | Phase 4 | Validated persistent storage |
| Total | 17-22 hours | - | Production-ready storage |

Migration Path (From Current → Hybrid)

Step 1: Backup Current Data (30 min)

# Back up workspace data from each pod's PVC
for i in 0 1 2; do
  kubectl exec coditect-combined-$i -- tar -czf /tmp/backup-$i.tar.gz /workspace
  kubectl cp coditect-app/coditect-combined-$i:/tmp/backup-$i.tar.gz ./backup-pod-$i.tar.gz
done

Step 2: Create Base Image (1 hour)

# Extract common files from pod-0
kubectl exec coditect-combined-0 -- tar -czf /tmp/base.tar.gz /app/.coditect /app/tools
kubectl cp coditect-app/coditect-combined-0:/tmp/base.tar.gz ./base-image.tar.gz

# Create ConfigMap
kubectl create configmap shared-base-config \
  --from-file=base-image.tar.gz \
  -n coditect-app

Step 3: Create User PVCs (1 hour)

# For each existing user, create 10 GB PVC
./scripts/create-user-workspace.sh user-001 10Gi
./scripts/create-user-workspace.sh user-002 10Gi
# ... repeat for all users

Step 4: Migrate User Data (2 hours)

# Copy user files from old 50 GB PVC to new 10 GB PVC
kubectl run migration-pod --image=busybox --restart=Never \
  --overrides='
{
  "spec": {
    "volumes": [
      {"name": "old-pvc", "persistentVolumeClaim": {"claimName": "workspace-old"}},
      {"name": "new-pvc", "persistentVolumeClaim": {"claimName": "workspace-user-001"}}
    ],
    "containers": [{
      "name": "migration",
      "image": "busybox",
      "command": ["sh", "-c", "cp -r /old/workspace/* /new/workspace/"],
      "volumeMounts": [
        {"name": "old-pvc", "mountPath": "/old"},
        {"name": "new-pvc", "mountPath": "/new"}
      ]
    }]
  }
}'

# Wait for the one-off pod to finish (pods have no "complete" condition; wait on phase)
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/migration-pod --timeout=600s

Step 5: Deploy Hybrid StatefulSet (30 min)

# Apply new manifest
kubectl apply -f k8s/theia-statefulset-hybrid.yaml

# Verify all pods running
kubectl get pods -n coditect-app -l app=coditect-combined

# Check mounts
kubectl exec coditect-combined-0 -- df -h /app/base /workspace

Step 6: Validate & Cleanup (1 hour)

# Test user login and file access
curl -X POST https://api.coditect.ai/login -d '{"username":"user-001","password":"..."}'
# Open IDE, verify files present

# Delete old PVCs (after validation)
kubectl delete pvc workspace-old -n coditect-app

# Monitor for 24 hours before deleting backups

Total Migration Time: 6-7 hours (including validation)


Monitoring & Alerts

Critical Metrics:

# PVC capacity per user (actual byte usage requires kubelet volume-stats metrics)
kubectl get pvc -n coditect-app -o json | \
  jq '.items[] | {name: .metadata.name, requested: .spec.resources.requests.storage, allocated: .status.capacity.storage}'

# Pod-PVC binding status
kubectl get pods -n coditect-app -o json | \
  jq '.items[] | {pod: .metadata.name, pvcs: [.spec.volumes[] | select(.persistentVolumeClaim) | .persistentVolumeClaim.claimName]}'

Recommended Alerts:

  1. PVC >90% full → Expand or warn user
  2. PVC attach failure → User can't access workspace
  3. Base ConfigMap update failed → Pods running stale configs
  4. Orphaned PVCs → User deleted but PVC remains (cost waste)
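The first alert reduces to a threshold check over per-PVC usage numbers. A sketch of that check, assuming usage comes from kubelet volume stats (e.g. the `kubelet_volume_stats_used_bytes` Prometheus metric); the dict shape is illustrative:

```python
def pvcs_over_threshold(pvcs: list[dict], threshold: float = 0.90) -> list[str]:
    """Return names of PVCs whose used/capacity ratio exceeds threshold.

    Each dict is assumed to look like:
      {"name": "workspace-user-001", "used_bytes": ..., "capacity_bytes": ...}
    """
    full = []
    for pvc in pvcs:
        ratio = pvc["used_bytes"] / pvc["capacity_bytes"]
        if ratio > threshold:
            full.append(pvc["name"])
    return full
```

The same shape works for the orphaned-PVC alert: diff the set of PVC `user` labels against active users and flag the remainder.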

Conclusion

Recommended Solution: Hybrid Storage (Option 4)

Why:

  • 96% cost savings vs NFS ($7/month vs $205/month for 20 users)
  • <1ms latency for user files (SSD performance)
  • Linear scaling ($0.35/user/month - predictable costs)
  • No external dependencies (pure Kubernetes)
  • Security compliant (per-user PVC isolation)

Next Steps:

  1. ✅ Review and approve this analysis
  2. Create base image ConfigMap (2-3 hours)
  3. Implement user PVC provisioning (3-4 hours)
  4. Update StatefulSet for hybrid mounts (4-5 hours)
  5. Implement session-based routing (5-6 hours)
  6. Test with 5 beta users (3-4 hours)
  7. Migrate existing users (6-7 hours)

Timeline: Ready for MVP in 24-30 hours (3-4 days)

Cost Impact: $7/month for 20 users (vs $500/month with current 10-pod scaling plan)


Questions for User:

  1. Approve Hybrid approach for MVP?
  2. Acceptable timeline (3-4 days)?
  3. Preferred migration strategy (all at once vs phased)?