# Persistent Storage Strategy for Dynamic Pod Scaling

**Date:** 2025-10-28
**Status:** CRITICAL - MVP Blocker
**Issue:** User workspace data is tied to specific pods and lost on pod scale-down or failure
## Problem Statement

**Current Architecture:**
- StatefulSet with pod-local PVCs (50 GB workspace + 5 GB config per pod)
- Session affinity: ClientIP with 3-hour timeout
- HPA scaling: 3-30 pods based on CPU/memory

**Critical Issue:**
User creates files in pod-0 → pod-0 scales down → user data is LOST.

**Why This Blocks MVP:**
- ❌ Users lose work when pods scale down
- ❌ Users can't switch pods even if another has capacity
- ❌ Session affinity (3 hours) is only a temporary workaround
- ❌ Unacceptable data-loss risk for production
## Requirements
| Requirement | Priority | Rationale |
|---|---|---|
| Data Persistence | P0 | No data loss on pod lifecycle events |
| Pod Portability | P0 | Users can access data from any pod |
| Performance | P1 | IDE must be responsive (<100ms file ops) |
| Multi-User Isolation | P1 | Users can't access each other's data |
| Cost Efficiency | P2 | Minimize storage costs at MVP scale |
| Backup/Recovery | P2 | Point-in-time restore capability |
## Storage Options Analysis

### Option 1: Shared NFS (Google Filestore)
**Architecture:**

```
┌─────────────────────────────────────┐
│      Google Filestore (NFS)         │
│    /workspace/users/{user_id}/      │
└──────────────┬──────────────────────┘
               │
       ┌───────┴────────┐
       │                │
   ┌───▼───┐        ┌───▼───┐
   │ Pod-0 │        │ Pod-1 │  ... Pod-N
   └───────┘        └───────┘
         (mount /workspace)
```
**Implementation:**

```yaml
# filestore.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: filestore-workspace
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.2  # Filestore IP
    path: /workspace
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-shared
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""             # empty string disables dynamic provisioning
  volumeName: filestore-workspace  # bind explicitly to the PV above
  resources:
    requests:
      storage: 1Ti
```
**Pros:**
- ✅ POSIX-compliant (full filesystem semantics)
- ✅ ReadWriteMany: all pods mount simultaneously
- ✅ Simple implementation (standard NFS mount)
- ✅ Familiar mental model for users
- ✅ Works with existing IDE tools (no code changes)

**Cons:**
- ❌ Cost: $200-400/month for 1 TB (Basic tier: $0.20/GB/month)
- ❌ Performance: network latency on file ops (10-50ms)
- ❌ Single point of failure: a Filestore outage takes down all pods
- ❌ Scaling limits: max 100 TB, 60 MB/s per TB
- ❌ No built-in versioning: needs a separate backup solution

**Cost Breakdown (1 TB Filestore Basic):**
- Storage: $204.80/month ($0.20/GB × 1024 GB)
- Operations: included
- Total: ~$205/month

**Performance Expectations:**
- Read latency: 5-15ms (cached)
- Write latency: 10-30ms (sync)
- Throughput: 60-100 MB/s (Basic tier)
- IOPS: 1,000-3,000 (depending on file size)

**When to Use:**
- Need full POSIX compliance
- < 50 concurrent users
- Can tolerate 10-30ms latency
- Budget allows $200+/month for storage
### Option 2: Google Cloud Storage (GCS) with gcsfuse
**Architecture:**

```
┌─────────────────────────────────────┐
│  GCS Bucket: coditect-workspaces    │
│    users/{user_id}/files/           │
└──────────────┬──────────────────────┘
               │ (gcsfuse mount)
       ┌───────┴────────┐
       │                │
   ┌───▼───┐        ┌───▼───┐
   │ Pod-0 │        │ Pod-1 │  ... Pod-N
   └───────┘        └───────┘
     (mount /workspace via gcsfuse)
```
**Implementation:**

```yaml
# gcs-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gcs-fuse-mounter
spec:
  selector:              # required by apps/v1
    matchLabels:
      app: gcs-fuse-mounter
  template:
    metadata:
      labels:
        app: gcs-fuse-mounter
    spec:
      containers:
        - name: gcs-fuse
          image: gcr.io/gcs-fuse-csi-driver/gcs-fuse-csi-driver
          securityContext:
            privileged: true
          volumeMounts:
            - name: gcs-mount
              mountPath: /workspace
              mountPropagation: Bidirectional
      volumes:
        - name: gcs-mount
          csi:
            driver: gcsfuse.csi.storage.gke.io
            volumeAttributes:
              bucketName: coditect-workspaces
              mountOptions: "implicit-dirs,file-mode=644,dir-mode=755"
```
**Pros:**
- ✅ Cost-effective: $20-50/month for 1 TB (Standard: $0.020/GB/month)
- ✅ Scalable: unlimited capacity, auto-scales
- ✅ Durable: 99.999999999% durability (11 nines)
- ✅ Versioning: built-in object versioning
- ✅ No SPOF: highly available by design

**Cons:**
- ❌ Not POSIX-compliant: limited metadata, no hard links or atomic ops
- ❌ Eventual consistency: directory listings may lag
- ❌ Latency: 50-200ms for small file ops (network round trip)
- ❌ Cache complexity: needs aggressive caching for IDE performance
- ❌ Harder debugging: the FUSE layer adds complexity

**Cost Breakdown (1 TB GCS Standard):**
- Storage: $20.48/month ($0.020/GB × 1024 GB)
- Operations: ~$5/month (Class A: 100K ops, Class B: 1M ops)
- Network egress: minimal (same region)
- Total: ~$25-30/month

**Performance Expectations:**
- Read latency: 50-150ms (uncached)
- Write latency: 100-300ms (sync)
- Throughput: limited by network (typically 100-500 MB/s)
- IOPS: 1,000-5,000 (highly variable)
**POSIX Limitations:**

```shell
# These operations DON'T work properly with gcsfuse:
ln file1 file2   # hard links not supported
mv file1 file2   # rename is not atomic across "directories"
flock file       # file locking unreliable
stat file        # some metadata missing (ctime, etc.)
```
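The missing rename atomicity matters more than it looks: most IDEs save files with a write-then-rename pattern so a crash never leaves a half-written file. A minimal sketch of that pattern (the `safe_save` helper is illustrative, not an existing API):

```python
import os
import tempfile

def safe_save(path: str, data: bytes) -> None:
    """Write-then-rename save, the pattern editors use for crash safety.

    On a POSIX filesystem os.replace() is atomic, so readers see either the
    old or the new contents, never a partial write. Over gcsfuse the rename
    becomes an object copy + delete, so this guarantee is lost.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())       # persist the temp file first
        os.replace(tmp_path, path)     # atomic on POSIX, not on gcsfuse
    except BaseException:
        os.unlink(tmp_path)            # clean up the temp file on failure
        raise
```

On block storage (Options 3 and 4) this pattern works unmodified, which is one reason the POSIX gap is hard to paper over with caching alone.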
**When to Use:**
- Cost is the primary concern
- Can tolerate eventual consistency
- Workload is read-heavy (can cache aggressively)
- Budget allows $25-30/month for storage
### Option 3: User-Specific PVCs with Pod Affinity
Architecture:
┌─────────────────┐ ┌─────────────────┐
│ user-abc-pvc │ │ user-xyz-pvc │
│ (50 GB) │ │ (50 GB) │
└────────┬────────┘ └────────┬────────┘
│ (affinity) │ (affinity)
┌───▼───┐ ┌───▼───┐
│ Pod-0 │ │ Pod-1 │ ... Pod-N
└───────┘ └───────┘
(user-abc) (user-xyz)
**Implementation:**

```yaml
# user-pvc-template.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-{{ user_id }}
  labels:
    user: {{ user_id }}
    app: coditect-workspace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: standard-rwo
---
# Modified StatefulSet with dynamic PVC binding
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: coditect-combined
spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: user
                    operator: In
                    values: ["{{ user_id }}"]
              topologyKey: kubernetes.io/hostname
      volumes:
        - name: workspace
          persistentVolumeClaim:
            claimName: workspace-{{ user_id }}
```
**Pros:**
- ✅ Full POSIX: standard block storage (ext4/xfs)
- ✅ Per-user isolation: each user gets a dedicated PVC
- ✅ Performance: local SSD possible (sub-ms latency)
- ✅ Kubernetes-native: no external dependencies
- ✅ Snapshot support: GKE persistent disk snapshots

**Cons:**
- ❌ Complex scheduling: needs a custom controller for PVC→pod binding
- ❌ Scale challenges: 1,000 users = 1,000 PVCs (management overhead)
- ❌ Pod stickiness: user tied to a pod, can't easily switch
- ❌ Wasted capacity: idle user PVCs still consume storage
- ❌ Cold start: attaching a PVC to a new pod takes 30-60s

**Cost Breakdown (50 GB × 20 users):**
- Storage: $20/month (20 users × 50 GB × $0.020/GB)
- Snapshots: ~$5/month (weekly backups)
- Total: ~$25/month (at 20 users)

**Scaling Cost:**
- 100 users: $100/month (5 TB total)
- 500 users: $500/month (25 TB total)

**Performance Expectations:**
- Read latency: <1ms (SSD)
- Write latency: <5ms (SSD)
- Throughput: 240 MB/s (standard PD)
- IOPS: 15K-30K (SSD)

**When to Use:**
- Need maximum performance
- User count < 100
- Can implement custom PVC lifecycle management
- Budget scales linearly with users
### Option 4: Hybrid - Shared Base + User Overlays (RECOMMENDED)
**Architecture:**

```
┌─────────────────────────────────────┐
│   Shared Base Image (Read-Only)     │
│ /app/base/ - IDE, tools, templates  │
└──────────────┬──────────────────────┘
               │ (overlay mount)
       ┌───────┴────────┐
       │                │
┌──────▼────────┐  ┌────▼──────────┐
│ user-abc-pvc  │  │ user-xyz-pvc  │
│   (10 GB)     │  │   (10 GB)     │
│  /workspace/  │  │  /workspace/  │
└───────────────┘  └───────────────┘
```
**Implementation:**

```yaml
# hybrid-storage.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: shared-base-config
data:
  # IDE templates, default configs, tool binaries
  # NOTE: ConfigMaps are capped at 1 MiB; anything larger (tool binaries,
  # extensions) must ship in the container image or a read-only PV.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-{{ user_id }}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi  # user files only
---
# Pod mounts both shared base (RO) and user PVC (RW)
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: theia
      volumeMounts:
        - name: shared-base
          mountPath: /app/base
          readOnly: true
        - name: user-workspace
          mountPath: /workspace
  volumes:
    - name: shared-base
      configMap:
        name: shared-base-config
    - name: user-workspace
      persistentVolumeClaim:
        claimName: workspace-{{ user_id }}
```
**Pros:**
- ✅ Cost-efficient: 10 GB per user vs 50 GB (80% reduction)
- ✅ Fast provisioning: smaller PVCs attach faster (<10s)
- ✅ Shared updates: base image updated once, all pods benefit
- ✅ POSIX-compliant: user workspace is standard block storage
- ✅ Good performance: user files on SSD, base cached in memory

**Cons:**
- ⚠️ Two-tier complexity: must manage base + user storage separately
- ⚠️ Base image updates: require pod restarts to pick up changes
- ⚠️ Path management: apps need to know base vs workspace paths

**Cost Breakdown (10 GB × 20 users):**
- User PVCs: $4/month (20 users × 10 GB × $0.020/GB)
- Shared base: $1/month (50 GB shared base storage)
- Snapshots: $2/month
- Total: ~$7/month (at 20 users)

**Scaling Cost:**
- 100 users: $20/month (1 TB user data + 50 GB base)
- 500 users: $100/month (5 TB user data + 50 GB base)

**Performance Expectations:**
- Read latency: <1ms (user files), <0.1ms (base cached)
- Write latency: <5ms (user files only)
- Throughput: 240 MB/s (standard PD)
- IOPS: 15K-30K (SSD)

**When to Use:**
- Need cost efficiency + performance
- Have a clear separation: base (tools) vs user (files)
- Can implement the dual-mount architecture
- RECOMMENDED FOR MVP
## Cost Comparison Summary
| Option | 20 Users | 100 Users | 500 Users | Cost per User/Month |
|---|---|---|---|---|
| NFS (Filestore) | $205 | $410 | $2,050 | $10.25 (fixed overhead) |
| GCS (gcsfuse) | $30 | $60 | $300 | $1.50 |
| User PVCs (50 GB) | $25 | $100 | $500 | $1.25 |
| Hybrid (10 GB) | $7 | $20 | $100 | $0.35 |
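The table's storage figures follow directly from the unit prices quoted above ($0.20/GB for Filestore Basic with a 1 TiB minimum tier, $0.020/GB for GCS and persistent disk). A small sketch of that arithmetic; fixed overheads (operations, snapshots, shared base) are omitted, so totals land slightly below the table's all-in figures:

```python
# Storage-only monthly cost per option, using this document's unit prices.
def monthly_storage_cost(option: str, users: int) -> float:
    gb_and_rate = {
        "nfs_filestore": (1024, 0.20),    # shared 1 TiB tier (flat until outgrown)
        "gcs": (users * 50, 0.020),       # ~50 GB of objects per user
        "user_pvc": (users * 50, 0.020),  # dedicated 50 GB PVC per user
        "hybrid": (users * 10, 0.020),    # 10 GB user PVC, base shared
    }
    gb, rate = gb_and_rate[option]
    return round(gb * rate, 2)
```

For example, `monthly_storage_cost("hybrid", 20)` gives $4.00 of raw storage; adding the shared base ($1) and snapshots ($2) yields the $7/month figure in the table.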
## Performance Comparison
| Operation | NFS | GCS | User PVC | Hybrid |
|---|---|---|---|---|
| Small File Read | 10ms | 100ms | <1ms | <1ms |
| Large File Read | 50ms | 200ms | 5ms | 5ms |
| File Write | 30ms | 300ms | 5ms | 5ms |
| Directory List | 20ms | 500ms | 2ms | 2ms |
| Git Clone (100MB) | 5s | 30s | 2s | 2s |
| npm install (500 pkgs) | 45s | 5min | 20s | 20s |
## Recommendation: Hybrid Approach (Option 4)

**Why Hybrid is Best for MVP:**
- Cost: $7/month for 20 users vs $205/month (NFS): 96% savings
- Performance: SSD latency for user files (<1ms), so the IDE stays responsive
- Scalability: linear cost growth ($0.35/user/month), predictable
- Simplicity: standard Kubernetes PVCs, no external dependencies
- Isolation: per-user PVCs, security compliant
### Implementation Plan

#### Phase 1: Base Image Design (2-3 hours)

Create the shared base with:

```
/app/base/
├── .coditect/      # IDE configs, agents, skills
├── tools/          # Global tools (git, npm, etc.)
├── templates/      # Project templates
└── extensions/     # Theia extensions
```

**ConfigMap Creation:**

```shell
# Create base image tarball
tar -czf base-image.tar.gz .coditect/ tools/ templates/ extensions/

# Create ConfigMap from tarball
# NOTE: ConfigMaps are limited to 1 MiB; if the tarball is larger,
# bake the base into the container image or a read-only PV instead.
kubectl create configmap shared-base-config \
  --from-file=base-image.tar.gz \
  -n coditect-app

# Pods extract on startup
# (extract into a writable path; a configMap mount itself is read-only)
kubectl exec -it coditect-combined-0 -- \
  tar -xzf /app/base/base-image.tar.gz -C /app/base/
```
#### Phase 2: User PVC Provisioning (3-4 hours)

**Dynamic PVC Creation Script:**

```shell
#!/bin/bash
# scripts/create-user-workspace.sh
USER_ID=$1
PVC_SIZE=${2:-10Gi}

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-${USER_ID}
  namespace: coditect-app
  labels:
    user: ${USER_ID}
    app: coditect-workspace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: ${PVC_SIZE}
  storageClassName: standard-rwo
EOF

echo "Created PVC workspace-${USER_ID} (${PVC_SIZE})"
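One gotcha in the script above: `workspace-${USER_ID}` becomes a Kubernetes object name, which must be a DNS-1123 label (lowercase alphanumerics and hyphens, 63 characters max). A hedged Python sketch of the same templating with that validation up front (`render_pvc` is a hypothetical helper, not part of the codebase):

```python
import re

# Kubernetes object names must be DNS-1123 labels.
DNS1123 = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")

PVC_TEMPLATE = """\
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-{user_id}
  namespace: coditect-app
  labels:
    user: {user_id}
    app: coditect-workspace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: {size}
  storageClassName: standard-rwo
"""

def render_pvc(user_id: str, size: str = "10Gi") -> str:
    """Render the PVC manifest, rejecting user ids that would produce an
    invalid object name (non-DNS-1123 characters or names over 63 chars)."""
    if not DNS1123.match(user_id) or len(f"workspace-{user_id}") > 63:
        raise ValueError(f"invalid user id for PVC name: {user_id!r}")
    return PVC_TEMPLATE.format(user_id=user_id, size=size)
```

Validating here avoids PVC creation failing deep inside `kubectl apply` when a user id contains uppercase letters or underscores.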
#### Phase 3: StatefulSet Modification (4-5 hours)

**Updated StatefulSet:**

```yaml
# k8s/theia-statefulset-hybrid.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: coditect-combined
  namespace: coditect-app
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: combined
          image: us-central1-docker.pkg.dev/.../coditect-combined:latest
          volumeMounts:
            # Shared base (read-only)
            - name: shared-base
              mountPath: /app/base
              readOnly: true
            # User workspace (read-write, dynamically assigned)
            - name: user-workspace
              mountPath: /workspace
          env:
            - name: BASE_PATH
              value: "/app/base"
            - name: WORKSPACE_PATH
              value: "/workspace"
      volumes:
        - name: shared-base
          configMap:
            name: shared-base-config
        # User PVC assigned dynamically based on session
        - name: user-workspace
          persistentVolumeClaim:
            claimName: workspace-{{ user_id }}
```

**Challenge:** Dynamic PVC assignment requires custom logic. Options:
- **Pod mutation webhook**: intercept pod creation, inject the correct PVC name
- **Session-based routing**: load balancer routes each user to the pod holding their PVC
- **Pre-provisioned pod pools**: create pods with pre-attached PVCs, assign them to users

**Recommended:** Session-based routing (simplest for MVP)
#### Phase 4: Session-Based Routing (5-6 hours)

**Architecture:**

```
User Login → FDB Session → Get user_id → Find pod with workspace-{user_id} → Route to pod
```

**Implementation:**

```python
# backend/src/handlers/session.py

async def get_user_pod(user_id: str) -> str:
    """Find the pod with this user's PVC attached."""
    # Query the Kubernetes API for pods labeled with this user's id
    pods = await k8s_client.list_namespaced_pod(
        namespace="coditect-app",
        label_selector=f"user={user_id}",
    )
    if not pods.items:
        # No pod with the user's PVC - assign one
        return await assign_user_to_pod(user_id)

    # Return the first healthy pod
    for pod in pods.items:
        if pod.status.phase == "Running":
            return pod.metadata.name

    raise RuntimeError(f"No healthy pod for user {user_id}")


async def assign_user_to_pod(user_id: str) -> str:
    """Assign the user to an available pod and attach their PVC."""
    # Find a pod with capacity
    available_pods = await find_pods_with_capacity()
    if not available_pods:
        raise RuntimeError("No pods with capacity")

    pod_name = available_pods[0]

    # Label the pod with the user_id
    await k8s_client.patch_namespaced_pod(
        name=pod_name,
        namespace="coditect-app",
        body={"metadata": {"labels": {"user": user_id}}},
    )

    # Attach the user's PVC (requires pod restart)
    # ... implementation details ...
    return pod_name
```

**Challenge:** PVCs can't be attached dynamically to running pods. Options:
- **Pre-attach PVCs**: each pod has multiple PVC slots, mounted on demand
- **Pod restart**: restart the pod with the new PVC (30-60s downtime)
- **Dedicated user pods**: one pod per user (expensive at scale)

**Recommended for MVP:** Pre-attach PVC slots (mount 5-10 PVCs per pod, activate one on user login)
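The slot bookkeeping this implies is simple enough to sketch: each pod exposes a fixed number of pre-attached PVC slots, a login claims a free one (stickily, so repeat logins route back), and a logout releases it. Class and pod names below are illustrative, not existing code:

```python
class SlotPool:
    """Track pre-attached PVC slots across pods (in-memory sketch;
    production state would live in the session store, e.g. FDB)."""

    def __init__(self, pods: list, slots_per_pod: int = 5):
        self.free = {pod: list(range(slots_per_pod)) for pod in pods}
        self.assignments = {}  # user_id -> (pod, slot)

    def assign(self, user_id: str):
        if user_id in self.assignments:       # already placed: sticky routing
            return self.assignments[user_id]
        for pod, slots in self.free.items():  # first pod with a free slot
            if slots:
                slot = slots.pop(0)
                self.assignments[user_id] = (pod, slot)
                return pod, slot
        raise RuntimeError("no pods with free PVC slots - scale up")

    def release(self, user_id: str) -> None:
        pod, slot = self.assignments.pop(user_id)
        self.free[pod].append(slot)           # slot becomes reusable
```

The free-slot count per pod also gives the HPA a concrete signal: scale up when total free slots drops below the expected login rate.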
#### Phase 5: Testing & Validation (3-4 hours)

**Test Scenarios:**

```shell
# 1. User creates files, switches pods
kubectl exec -it coditect-combined-0 -- touch /workspace/test.txt
# User logs out, logs back in (routed to pod-1)
kubectl exec -it coditect-combined-1 -- ls /workspace/test.txt
# Expected: file exists

# 2. Pod scales down, user data persists
kubectl scale statefulset coditect-combined --replicas=2
# User logs in (routed to pod-0)
kubectl exec -it coditect-combined-0 -- ls /workspace/test.txt
# Expected: file exists

# 3. Base image update
kubectl delete configmap shared-base-config
kubectl create configmap shared-base-config --from-file=new-base.tar.gz
kubectl rollout restart statefulset coditect-combined
# Expected: all pods get the new base, user data intact
```
## Implementation Timeline
| Phase | Duration | Dependencies | Deliverable |
|---|---|---|---|
| Phase 1: Base Image | 2-3 hours | None | ConfigMap with shared tools |
| Phase 2: User PVCs | 3-4 hours | Phase 1 | PVC creation script |
| Phase 3: StatefulSet | 4-5 hours | Phase 2 | Hybrid storage manifest |
| Phase 4: Routing | 5-6 hours | Phase 3 | Session-based pod assignment |
| Phase 5: Testing | 3-4 hours | Phase 4 | Validated persistent storage |
| Total | 17-22 hours | - | Production-ready storage |
## Migration Path (From Current → Hybrid)

### Step 1: Backup Current Data (30 min)

```shell
# Tar up each pod's workspace and copy it off-cluster
for i in 0 1 2; do
  kubectl exec coditect-combined-$i -- tar -czf /tmp/backup-$i.tar.gz /workspace
  kubectl cp coditect-app/coditect-combined-$i:/tmp/backup-$i.tar.gz ./backup-pod-$i.tar.gz
done
```
### Step 2: Create Base Image (1 hour)

```shell
# Extract common files from pod-0
kubectl exec coditect-combined-0 -- tar -czf /tmp/base.tar.gz /app/.coditect /app/tools
kubectl cp coditect-app/coditect-combined-0:/tmp/base.tar.gz ./base-image.tar.gz

# Create ConfigMap (note the 1 MiB ConfigMap size limit)
kubectl create configmap shared-base-config \
  --from-file=base-image.tar.gz \
  -n coditect-app
```
### Step 3: Create User PVCs (1 hour)

```shell
# For each existing user, create a 10 GB PVC
./scripts/create-user-workspace.sh user-001 10Gi
./scripts/create-user-workspace.sh user-002 10Gi
# ... repeat for all users
```
### Step 4: Migrate User Data (2 hours)

```shell
# Copy user files from the old 50 GB PVC to the new 10 GB PVC
kubectl run migration-pod --image=busybox --restart=Never \
  --overrides='
{
  "spec": {
    "volumes": [
      {"name": "old-pvc", "persistentVolumeClaim": {"claimName": "workspace-old"}},
      {"name": "new-pvc", "persistentVolumeClaim": {"claimName": "workspace-user-001"}}
    ],
    "containers": [{
      "name": "migration",
      "image": "busybox",
      "command": ["sh", "-c", "cp -r /old/workspace/* /new/workspace/"],
      "volumeMounts": [
        {"name": "old-pvc", "mountPath": "/old"},
        {"name": "new-pvc", "mountPath": "/new"}
      ]
    }]
  }
}'

# Wait for completion (pods have no "complete" condition; wait on the phase)
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/migration-pod --timeout=600s
```
### Step 5: Deploy Hybrid StatefulSet (30 min)

```shell
# Apply the new manifest
kubectl apply -f k8s/theia-statefulset-hybrid.yaml

# Verify all pods are running
kubectl get pods -n coditect-app -l app=coditect-combined

# Check mounts
kubectl exec coditect-combined-0 -- df -h /app/base /workspace
```
### Step 6: Validate & Cleanup (1 hour)

```shell
# Test user login and file access
curl -X POST https://api.coditect.ai/login -d '{"username":"user-001","password":"..."}'
# Open the IDE, verify files are present

# Delete old PVCs (only after validation)
kubectl delete pvc workspace-old -n coditect-app
# Monitor for 24 hours before deleting backups
```

**Total Migration Time:** 6-7 hours (including validation)
## Monitoring & Alerts

**Critical Metrics:**

```shell
# Requested vs bound capacity per user PVC
# (PVC status reports provisioned capacity, not bytes used; actual usage
# comes from kubelet volume metrics, e.g. kubelet_volume_stats_used_bytes)
kubectl get pvc -n coditect-app -o json | \
  jq '.items[] | {name: .metadata.name, requested: .spec.resources.requests.storage, bound: .status.capacity.storage}'

# Pod-PVC binding status
kubectl get pods -n coditect-app -o json | \
  jq '.items[] | {pod: .metadata.name, pvcs: [.spec.volumes[] | select(.persistentVolumeClaim) | .persistentVolumeClaim.claimName]}'
```

**Recommended Alerts:**
- PVC >90% full → expand or warn the user
- PVC attach failure → user can't access their workspace
- Base ConfigMap update failed → pods running stale configs
- Orphaned PVCs → user deleted but PVC remains (wasted cost)
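The ">90% full" alert can be evaluated from those metrics with a few lines. A hedged sketch over pre-parsed PVC records; the record shape, the `used_gi` field (which would come from kubelet volume stats), and the helper names are assumptions for illustration:

```python
def parse_gi(quantity: str) -> float:
    """Convert a Kubernetes quantity like '10Gi' to GiB (binary units only)."""
    units = {"Gi": 1.0, "Mi": 1.0 / 1024, "Ti": 1024.0}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity}")

def pvcs_over_threshold(pvcs: list, threshold: float = 0.90) -> list:
    """Return names of PVCs whose used/capacity ratio exceeds the threshold.

    Each record: {"name": str, "capacity": "10Gi", "used_gi": float}.
    """
    return [
        p["name"]
        for p in pvcs
        if p["used_gi"] / parse_gi(p["capacity"]) > threshold
    ]
```

Running this on a schedule (or expressing the same ratio as a PromQL rule over `kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes`) covers the first alert in the list.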
## Conclusion

**Recommended Solution:** Hybrid Storage (Option 4)

**Why:**
- ✅ 96% cost savings vs NFS ($7/month vs $205/month for 20 users)
- ✅ <1ms latency for user files (SSD performance)
- ✅ Linear scaling ($0.35/user/month, predictable costs)
- ✅ No external dependencies (pure Kubernetes)
- ✅ Security compliant (per-user PVC isolation)

**Next Steps:**
- ✅ Review and approve this analysis
- Create base image ConfigMap (2-3 hours)
- Implement user PVC provisioning (3-4 hours)
- Update StatefulSet for hybrid mounts (4-5 hours)
- Implement session-based routing (5-6 hours)
- Test with 5 beta users (3-4 hours)
- Migrate existing users (6-7 hours)

**Timeline:** Ready for MVP in 24-30 hours (3-4 days)
**Cost Impact:** $7/month for 20 users (vs $500/month with the current 10-pod scaling plan)

**Questions for User:**
- Approve the Hybrid approach for MVP?
- Acceptable timeline (3-4 days)?
- Preferred migration strategy (all at once vs phased)?