
ADR-028 Part 1: Hybrid Storage Architecture - Problem & Analysis

Date: 2025-10-28
Status: Under Review → QA: ✅ CONDITIONAL PASS
Deciders: System Architect, Infrastructure Team
Related: ADR-029 (StatefulSet Migration), Analysis: docs/11-analysis/2025-10-28-persistent-storage-dynamic-pods.md


Context

The initial GKE deployment used StatefulSet volumeClaimTemplates with pod-local storage (50 GB per pod). This pattern is standard for databases and distributed systems where each pod manages unique data.

Example from Kubernetes documentation:

```yaml
# Standard StatefulSet pattern (databases, Kafka, Cassandra)
volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi
```

Works great for:

  • ✅ Databases (Postgres, MySQL) - each pod is a replica
  • ✅ Distributed systems (Kafka, Cassandra) - each pod has unique data

Fails for:

  • ❌ Multi-user IDEs - users need to access their workspace from ANY pod

Problem Statement

Issue Discovered: 2025-10-28

During MVP scaling analysis for 20 users with HPA (3-30 pods), a critical data loss scenario was identified:

User creates files in Pod-0 → Pod-0 scales down → User data PERMANENTLY LOST

Root Cause: Architecture Mismatch

What we built (database pattern):

```
Pod-0 → workspace-coditect-combined-0 (50 GB) → User A's files LOCKED to Pod-0
Pod-1 → workspace-coditect-combined-1 (50 GB) → User B's files LOCKED to Pod-1
Pod-2 → workspace-coditect-combined-2 (50 GB) → User C's files LOCKED to Pod-2
```

Scale-down event: Pod-0 deleted → workspace-coditect-combined-0 deleted → User A data LOST

What we SHOULD have built (multi-user pattern):

```
Pod-0 ┐
Pod-1 ├→ workspace-user-A (10 GB) ← User A can access from ANY pod
Pod-2 ┘  workspace-user-B (10 GB) ← User B can access from ANY pod
         workspace-user-C (10 GB) ← User C can access from ANY pod
```

Scale-down event: Pod-0 deleted → User A logs in → routed to Pod-1 → sees same files
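
The user-keyed pattern above boils down to claims named after users rather than pod ordinals. A minimal illustrative sketch (the name, size, and storage class here are assumptions, not values from the deployment):

```yaml
# Hypothetical user-keyed claim: storage follows the user, not the pod.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-user-a        # keyed by user ID, not pod ordinal
  namespace: coditect-app
spec:
  accessModes:
    - ReadWriteOnce             # attached to whichever pod the user lands on
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard-rwo
```

Because the claim is independent of any StatefulSet ordinal, deleting Pod-0 leaves `workspace-user-a` intact for the next pod to mount.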

Current Failure Modes

| Scenario | Current Behavior | Expected Behavior |
|----------|------------------|-------------------|
| User logs out, pod scales down | Data LOST | Data persists, accessible from any pod |
| User switches IP address | Routed to different pod, can't access files | Same files regardless of pod |
| Pod crashes | User loses all work since last save | User reconnects, sees exact same state |
| HPA scales 3→2 pods | 1/3 of users lose ALL data | All users unaffected |

Why This Wasn't Caught Earlier

1. Cargo Cult Kubernetes

What happened: Copied StatefulSet pattern from database examples without adapting for our use case.

Source: Kubernetes documentation, Helm charts for databases

Assumption: "StatefulSet = persistent storage = correct for all stateful apps"

Reality: StatefulSet is for pod-specific state, not user-portable state

2. Single-User Testing

Test coverage:

  • ✅ One user logs in, creates files, sees files
  • ❌ User switches between pods
  • ❌ User logs in after pod restart
  • ❌ User logs in after pod scale-down

Gap: Multi-user, multi-pod scenarios never tested

3. No Autoscaling Testing

Deployment history:

  • October 13: Deployed StatefulSet with 3 fixed replicas
  • October 19-26: Build/deploy iterations (no scaling changes)
  • October 28: First discussion of autoscaling → problem discovered

Gap: Scaling wasn't considered until MVP planning

4. Missing Architectural Review

What was missing:

  • ❌ No ADR for storage strategy
  • ❌ No question: "What happens to user data when pods scale?"
  • ❌ No multi-user access pattern analysis
  • ❌ No comparison of storage options

Result: Fundamental architectural flaw shipped to production


Requirements

Functional Requirements

| ID | Requirement | Priority | Acceptance Criteria |
|----|-------------|----------|---------------------|
| FR-1 | Data Persistence | P0 | User files survive pod deletion, scale-down, crashes |
| FR-2 | Pod Portability | P0 | User can access workspace from any pod (no pod stickiness) |
| FR-3 | Performance | P1 | File operations <100ms (IDE responsive) |
| FR-4 | Multi-User Isolation | P1 | Users cannot access each other's workspaces |
| FR-5 | Cost Efficiency | P2 | Storage costs scale linearly with users |
| FR-6 | Backup/Recovery | P2 | Daily snapshots, 7-day retention, point-in-time restore |

Non-Functional Requirements

| ID | Requirement | Target | Measurement |
|----|-------------|--------|-------------|
| NFR-1 | File Read Latency | <1ms | fio benchmark on SSD PVC |
| NFR-2 | File Write Latency | <5ms | fio benchmark on SSD PVC |
| NFR-3 | PVC Attach Time | <10s | kubectl PVC attach duration |
| NFR-4 | Storage IOPS | 15K-30K | GCE Persistent Disk SSD spec |
| NFR-5 | User Provisioning | <30s | PVC creation + pod assignment |
| NFR-6 | Cost per User | <$1/month | GCP billing (storage only) |

Options Analysis

Option 1: Shared NFS (Google Filestore)

Architecture:

```
┌─────────────────────────────────────┐
│  Google Filestore (NFS Server)      │
│  IP: 10.0.0.2                       │
│  /workspace/users/{user_id}/        │
└──────────────┬──────────────────────┘
               │ (NFS mount)
       ┌───────┴────────┐
       │                │
   ┌───▼───┐        ┌───▼───┐
   │ Pod-0 │        │ Pod-1 │  ... Pod-N
   └───────┘        └───────┘

All pods mount /workspace via NFS
```

Implementation:

```yaml
# PersistentVolume (NFS)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: filestore-workspace
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.2  # Filestore IP
    path: /workspace
---
# PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-shared
  namespace: coditect-app
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""  # No storage class (manual PV binding)
  resources:
    requests:
      storage: 1Ti
```

Pros:

  • POSIX-Compliant: Full filesystem semantics (hard links, file locking, atomic operations)
  • ReadWriteMany: All pods can mount simultaneously
  • Simple Implementation: Standard NFS mount (no custom code)
  • Familiar Model: Works like a traditional NAS
  • IDE Compatible: No code changes needed (Theia sees a local filesystem)

Cons:

  • Expensive: $204.80/month for 1TB (Basic tier: $0.20/GB/month)
  • Fixed Overhead: Minimum 1TB even for 3 users
  • Network Latency: 10-50ms for file operations (vs <1ms for local SSD)
  • Single Point of Failure: Filestore outage = all pods down
  • Scaling Limits: Max 100TB, 60MB/s per TB throughput
  • No Built-in Versioning: Need separate backup solution

Cost Breakdown (Filestore Basic):

  • Storage: 1024 GB × $0.20/GB = $204.80/month
  • Operations: Included
  • Total: $204.80/month (fixed cost regardless of users)
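
These figures can be sanity-checked directly; a quick sketch (the $0.20/GB/month Basic-tier rate is taken from the breakdown above and may change):

```bash
# Verify the Filestore Basic monthly cost: capacity × per-GB rate.
capacity_gb=1024
rate_per_gb=0.20
monthly=$(awk -v gb="$capacity_gb" -v r="$rate_per_gb" 'BEGIN { printf "%.2f", gb * r }')
echo "Filestore Basic, ${capacity_gb} GB: \$${monthly}/month"
```

The cost is fixed: the bill is the same whether 3 users or 30 are active, which is what makes this option hard to justify at MVP scale.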

Performance (Google spec + real-world):

  • Read latency: 5-15ms (cached), 10-50ms (uncached)
  • Write latency: 10-30ms (sync writes)
  • Throughput: 60-100 MB/s (Basic tier)
  • IOPS: 1000-3000 (file-size dependent)

When to use:

  • 50+ concurrent users
  • Can tolerate 10-30ms latency
  • Budget allows $200+/month storage
  • Need true multi-user file sharing (Git, collaborative editing)

Decision: ❌ REJECTED - Too expensive for 10-20 user MVP ($204/month vs $7/month for Hybrid)


Option 2: Google Cloud Storage (GCS) with gcsfuse

Architecture:

```
┌─────────────────────────────────────┐
│  GCS Bucket: coditect-workspaces    │
│  gs://coditect-workspaces/users/    │
└──────────────┬──────────────────────┘
               │ (gcsfuse FUSE mount)
       ┌───────┴────────┐
       │                │
   ┌───▼───┐        ┌───▼───┐
   │ Pod-0 │        │ Pod-1 │  ... Pod-N
   └───────┘        └───────┘

gcsfuse mounts the GCS bucket at /workspace
```

Implementation:

```yaml
# DaemonSet deploying the GCS FUSE CSI driver on every node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gcs-fuse-csi-driver
spec:
  selector:
    matchLabels:
      app: gcs-fuse-csi-driver
  template:
    metadata:
      labels:
        app: gcs-fuse-csi-driver
    spec:
      containers:
        - name: gcs-fuse
          image: gcr.io/gcs-fuse-csi-driver/gcs-fuse-csi-driver:latest
          securityContext:
            privileged: true
          volumeMounts:
            - name: gcs-mount
              mountPath: /workspace
              mountPropagation: Bidirectional
      volumes:
        - name: gcs-mount
          csi:
            driver: gcsfuse.csi.storage.gke.io
            volumeAttributes:
              bucketName: coditect-workspaces
              mountOptions: "implicit-dirs,file-mode=644,dir-mode=755"
```

Pros:

  • Cost-Effective: $20-30/month for 1TB (Standard: $0.020/GB/month)
  • Scalable: Unlimited capacity, auto-scales
  • Durable: 99.999999999% (11 nines) durability
  • Versioning: Built-in object versioning
  • No Single Point of Failure: Highly available by design
  • No Capacity Planning: No pre-provisioning needed

Cons:

  • Not POSIX-Compliant: Limited metadata, no hard links, no atomic rename across directories
  • Eventual Consistency: Directory listings may lag behind writes
  • High Latency: 50-200ms for small file ops (network roundtrip to GCS API)
  • Cache Complexity: Need aggressive caching for acceptable IDE performance
  • Debugging Harder: FUSE layer adds complexity to troubleshooting

POSIX Limitations (breaks IDE features):

```bash
# These operations DON'T work properly with gcsfuse:
ln file1 file2          # Hard links not supported
mv dir1/file dir2/file  # Not atomic across "directories" (objects with prefixes)
flock /workspace/file   # File locking unreliable
stat /workspace/file    # Missing metadata: ctime, inode number
```

Cost Breakdown (GCS Standard, 1TB):

  • Storage: 1024 GB × $0.020/GB = $20.48/month
  • Class A operations (writes): ~100K/month × $0.05/10K = $0.50
  • Class B operations (reads): ~1M/month × $0.004/10K = $0.40
  • Total: ~$21-30/month (variable with usage)

Performance (Google spec + gcsfuse overhead):

  • Read latency: 50-150ms (uncached), 5-10ms (cached)
  • Write latency: 100-300ms (API roundtrip + object creation)
  • Throughput: 100-500 MB/s (network-limited)
  • IOPS: 1000-5000 (highly variable, not guaranteed)

When to use:

  • Cost is primary concern
  • Workload is read-heavy (can cache aggressively)
  • Can tolerate eventual consistency
  • Don't need full POSIX (no Git, no file locking)

Decision: ❌ REJECTED - Latency (50-200ms) breaks IDE responsiveness, POSIX incompatibility breaks Git workflows


Option 3: User-Specific PVCs (50 GB per user)

Architecture:

```
┌─────────────────┐      ┌─────────────────┐
│ workspace-user-a│      │ workspace-user-b│
│   (50 GB PVC)   │      │   (50 GB PVC)   │
└────────┬────────┘      └────────┬────────┘
         │ (pod affinity)         │ (pod affinity)
     ┌───▼───┐                ┌───▼───┐
     │ Pod-0 │                │ Pod-1 │  ... Pod-N
     └───────┘                └───────┘
 (user-a assigned)        (user-b assigned)
```

Implementation:

```yaml
# User-specific PVC (created per user)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-user-{{ user_id }}
  namespace: coditect-app
  labels:
    user: "{{ user_id }}"
    app: coditect-workspace
spec:
  accessModes:
    - ReadWriteOnce  # Only one pod can mount
  resources:
    requests:
      storage: 50Gi
  storageClassName: standard-rwo
---
# StatefulSet with pod affinity (assigns user to pod with their PVC)
apiVersion: apps/v1
kind: StatefulSet
spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: user
                    operator: In
                    values: ["{{ user_id }}"]
              topologyKey: kubernetes.io/hostname
      volumes:
        - name: workspace
          persistentVolumeClaim:
            claimName: workspace-user-{{ user_id }}  # must match the PVC name above
```
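
The `{{ user_id }}` placeholders imply a render step before anything reaches the cluster. A minimal sketch using `sed` (the provisioning flow and names are assumptions; a real pipeline might use Helm or Kustomize instead):

```bash
# Render a per-user PVC manifest fragment from a template; the result could
# then be piped to `kubectl apply -f -`. The user ID here is illustrative.
user_id="alice"
manifest=$(sed "s/{{ user_id }}/${user_id}/g" <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: workspace-user-{{ user_id }}
  labels:
    user: "{{ user_id }}"
EOF
)
echo "$manifest"
```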

Pros:

  • Full POSIX: Standard block storage (ext4/xfs filesystem)
  • Per-User Isolation: Each user gets dedicated PVC (security compliant)
  • Performance: Local SSD possible (sub-ms latency)
  • Kubernetes-Native: No external dependencies (Filestore, GCS)
  • Snapshot Support: GKE persistent disk snapshots built-in

Cons:

  • Complex Scheduling: Need custom controller for PVC→Pod binding
  • Scale Challenges: 1000 users = 1000 PVCs (management overhead with kubectl)
  • Pod Stickiness: User tied to pod, can't easily switch
  • Wasted Capacity: Idle user PVCs still consume storage (50 GB × 100 idle users = 5 TB waste)
  • Cold Start: Attaching PVC to new pod takes 30-60s (user waits)

Cost Breakdown (50 GB × 20 users):

  • Storage: 20 users × 50 GB × $0.020/GB = $20.00/month
  • Snapshots: 20 users × 7 daily snapshots × 5 GB × $0.026/GB = $18.20/month
  • Total: ~$38/month (at 20 users)

Scaling Cost:

  • 100 users: $100/month storage + $91/month snapshots = $191/month
  • 500 users: $500/month storage + $455/month snapshots = $955/month

Performance (GCE Persistent Disk SSD):

  • Read latency: <1ms (local SSD)
  • Write latency: <5ms (SSD)
  • Throughput: 240 MB/s (standard PD)
  • IOPS: 15K-30K (SSD)

When to use:

  • Need maximum performance
  • User count < 100
  • Can implement custom PVC lifecycle management
  • Budget scales linearly with users

Decision: ⚠️ PARTIAL - Good performance, but wastes storage on duplicated base files (IDE tools, configs). Leads to Option 4 (Hybrid).


See ADR-028 Part 2 for full decision and implementation.


Comparative Summary

| Criteria | NFS (Filestore) | GCS (gcsfuse) | User PVCs (50 GB) | Hybrid (10 GB) |
|----------|-----------------|---------------|-------------------|----------------|
| Cost (20 users) | $205/month | $30/month | $38/month | $7/month |
| Cost per user | $10.25 | $1.50 | $1.90 | $0.35 |
| File Read Latency | 10-50ms | 50-200ms | <1ms | <1ms |
| File Write Latency | 10-30ms | 100-300ms | <5ms | <5ms |
| POSIX Compliant | ✅ Yes | ❌ No | ✅ Yes | ✅ Yes |
| Pod Portability | ✅ Yes | ✅ Yes | ⚠️ Limited | ✅ Yes |
| Storage Waste | ❌ 1TB min | ✅ None | ⚠️ Moderate | ✅ Minimal |
| Implementation | Simple | Medium | Complex | Medium |
| Scaling (500 users) | $2,050/month | $300/month | $955/month | $100/month |

Winner: Hybrid Storage - 96% cost savings vs NFS, <1ms performance, $0.35/user/month
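
The per-user row follows directly from the 20-user monthly totals; a quick check of the arithmetic:

```bash
# Reproduce the "Cost per user" row: monthly total at 20 users / 20.
users=20
for entry in "NFS:205" "GCS:30" "UserPVC:38" "Hybrid:7"; do
  name=${entry%%:*}    # option label
  total=${entry##*:}   # monthly total in dollars
  awk -v n="$name" -v t="$total" -v u="$users" \
    'BEGIN { printf "%-8s $%.2f/user/month\n", n, t / u }'
done
```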


Related Decisions

  • ADR-029: StatefulSet with Persistent Storage Migration (2025-10-27) - Documents initial StatefulSet migration
  • ADR-004: FoundationDB for Persistence - Session metadata storage (orthogonal to workspace files)
  • ADR-006: OPFS for Browser Storage - Browser-side cache (orthogonal to server-side storage)
  • ADR-020: GCP Deployment Strategy - OUTDATED (references Cloud Run, project uses GKE)
  • ADR-026: Wrapper Persistence Architecture - UI wrapper state (orthogonal to workspace storage)

Lessons Learned

What Went Wrong

  1. Cargo Cult Kubernetes: Copied StatefulSet pattern without adapting for multi-user access
  2. No Testing of Scaling Scenarios: Didn't test pod scale-down, user switching between pods
  3. Assumed "Working Deployment" = "Correct Architecture": Initial deployment worked for 1 user, shipped without multi-user validation
  4. No ADR for Storage Strategy: Skipped architectural review, missed fundamental flaw

What Should Have Happened

  1. Written ADR for storage strategy BEFORE implementation
  2. Asked: "What happens to user data when pods scale down?"
  3. Tested multi-user scenarios: User switches pods, pod failures, autoscaling
  4. Designed for 100 users from day 1 (not just 3 pods)

Takeaway

Don't cargo cult Kubernetes patterns. Understand YOUR access patterns first:

  • Databases: Pod-specific state (StatefulSet volumeClaimTemplates = correct)
  • Multi-user IDEs: User-portable state (StatefulSet volumeClaimTemplates = WRONG)

QA Review Status

Quality Gate: ✅ CONDITIONAL PASS (2025-10-28)

Issues Found: 5 (2 Critical, 2 Important, 1 Minor)

| Issue | Severity | Status |
|-------|----------|--------|
| ConfigMap size limit (1 MB max, not 50 GB) | Critical | Addressed in Part 2 |
| Dynamic PVC strategy unclear | Critical | Addressed in Part 2 |
| Timeline underestimated (17-22h → 30-38h) | Important | Corrected in Part 2 |
| Missing backup strategy | Important | Added as Phase 6 in Part 2 |
| Missing ADR-029 reference | Minor | ✅ Fixed above |

Next: See ADR-028 Part 2 for decision, implementation plan, and QA-validated architecture.


Document Status: ✅ Part 1 Complete
Next Step: Review Part 2 (Decision & Implementation)
Approval Required: System Architect sign-off before implementation