Scaling and HPA Strategy
Date: 2025-10-29
Status: ✅ DEPLOYED - HPA Active and Monitoring
Deployment: 2025-10-28 03:44 UTC (29 hours ago)
Executive Summary
HorizontalPodAutoscaler (HPA) is DEPLOYED and actively monitoring the coditect-combined-hybrid StatefulSet to provide automatic scaling based on CPU and memory usage.
Current Status:
NAME: coditect-combined-hpa
AGE: 29h
TARGETS: CPU 0% / 70%, Memory 18% / 75%
REPLICAS: 3 (min) - 30 (max)
SCALING: Ready to scale based on metrics
What is HPA?
HorizontalPodAutoscaler is a Kubernetes controller that automatically scales the number of pod replicas based on observed resource metrics.
Key Concepts:
- Watches: Resource utilization (CPU, memory) of StatefulSet pods
- Adjusts: Pod count up or down based on thresholds
- Protects: Application availability during traffic spikes
- Optimizes: Cost by scaling down during low usage
Architecture
How HPA Works with StatefulSet
User Traffic → NGINX Ingress → Service → Pods with PVCs (created by StatefulSet)
↓
HPA monitors: CPU 0% / 70%, Memory 18% / 75% (current / threshold)
↓
When a threshold is exceeded, HPA modifies StatefulSet.replicas
↓
StatefulSet creates new pods; each pod gets a 10Gi workspace PVC
Current Configuration
File: `k8s/coditect-combined-hpa.yaml` (Last modified: Oct 28, 03:44)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coditect-combined-hpa
  namespace: coditect-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: coditect-combined-hybrid
  minReplicas: 3    # Never go below 3 pods
  maxReplicas: 30   # Never go above 30 pods (60-user capacity)
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # Scale up when CPU >70%
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75   # Scale up when Memory >75%
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # Wait 1 min before scaling up
      policies:
      - type: Pods
        value: 3          # Add 3 pods at a time
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
      - type: Pods
        value: 1          # Remove 1 pod at a time
        periodSeconds: 60
```
Scaling Behavior
Scale-Up Triggers
HPA will scale UP (3→6→9...→30) when:
- CPU utilization exceeds 70% across pods
- OR Memory utilization exceeds 75% across pods
Stabilization: Waits 60 seconds to confirm sustained high usage before scaling
Rate: Adds 3 pods at a time every 60 seconds
Scale-Down Triggers
HPA will scale DOWN (30→29→28...→3) when:
- CPU AND Memory both fall below thresholds
- Sustained for 300 seconds (5 minutes)
Stabilization: Longer wait time (5 min) prevents flapping
Rate: Removes 1 pod at a time every 60 seconds (conservative)
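The worst-case ramp time from the floor to the ceiling follows directly from these policies. A back-of-envelope sketch (variable names are illustrative only):

```bash
# Worst-case ramp time from minReplicas to maxReplicas under the scale-up policy
MIN=3; MAX=30; STEP=3; PERIOD=60; STABILIZE=60
STEPS=$(( (MAX - MIN + STEP - 1) / STEP ))   # 9 scale-up events of 3 pods each
TOTAL=$(( STABILIZE + STEPS * PERIOD ))      # 60s confirmation + 9 x 60s
echo "worst-case ramp: ${STEPS} steps, $(( TOTAL / 60 )) minutes"
```

In practice each batch of new pods also needs roughly 2 minutes to become Ready, so actual time-to-capacity is somewhat longer than this lower bound.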
Example Scaling Scenario
Current State: 3 pods, CPU 40%, Memory 50%
↓
Traffic Spike: User surge (10 → 50 concurrent users)
↓
Metrics: CPU 75%, Memory 80% (both exceed thresholds)
↓
Stabilization: HPA waits 60 seconds to confirm
↓
Scale Up: HPA modifies StatefulSet.replicas from 3 → 6
↓
Kubernetes: Creates 3 new pods (coditect-combined-hybrid-3/4/5)
↓
Storage: Each new pod gets 10Gi workspace PVC
↓
Ready: New pods ready after ~2 minutes
↓
Result: 6 pods, CPU ~37%, Memory ~40% (below thresholds)
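As a sanity check on the scenario above: if total load stays constant and spreads evenly across pods, per-pod utilization scales inversely with replica count (a simplifying assumption; real traffic is lumpier):

```bash
# Per-pod utilization after doubling replicas, assuming an evenly spread, constant load
cpu_before=75; mem_before=80; before=3; after=6
cpu_after=$(( cpu_before * before / after ))   # 75% x 3/6 ≈ 37%
mem_after=$(( mem_before * before / after ))   # 80% x 3/6 = 40%
echo "after scale-up: CPU ~${cpu_after}%, Memory ~${mem_after}%"
```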
Capacity Planning
Current Configuration (Starter Tier)
Target: 10-20 users with headroom for 60 users
| Metric | Value | Calculation |
|---|---|---|
| Min Pods | 3 | Always available |
| Max Pods | 30 | Peak capacity |
| Users per Pod | ~2 | Conservative estimate |
| Total Capacity | 60 users | 30 pods × 2 users |
| Storage per Pod | 10 GB | Hybrid approach (system tools in image) |
| Total Storage | 300 GB | 30 pods × 10 GB |
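The capacity figures in the table are simple products; restated as a quick check:

```bash
# Capacity arithmetic from the table above
max_pods=30; users_per_pod=2; gb_per_pod=10
total_users=$(( max_pods * users_per_pod ))   # 60 users at peak
total_gb=$(( max_pods * gb_per_pod ))         # 300 GB workspace storage at peak
echo "peak capacity: ${total_users} users, ${total_gb} GB workspace storage"
```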
Thresholds
| Threshold | Value | Rationale |
|---|---|---|
| CPU | 70% | Leaves 30% headroom for spikes |
| Memory | 75% | More aggressive (memory pressure worse than CPU) |
| Scale-Up Wait | 60s | Fast response to traffic spikes |
| Scale-Down Wait | 300s | Conservative to prevent flapping |
Monitoring
Check HPA Status
```bash
# View current HPA status
kubectl get hpa -n coditect-app

# Detailed HPA information
kubectl describe hpa coditect-combined-hpa -n coditect-app

# Watch HPA in real-time
kubectl get hpa coditect-combined-hpa -n coditect-app --watch
```
Check Pod Metrics
```bash
# View pod resource usage
kubectl top pods -n coditect-app

# View StatefulSet replicas
kubectl get statefulset coditect-combined-hybrid -n coditect-app

# Check pod scaling events
kubectl get events -n coditect-app --field-selector involvedObject.name=coditect-combined-hpa
```
Metrics to Monitor
| Metric | Current | Threshold | Alert |
|---|---|---|---|
| CPU Utilization | 0% | 70% | >80% sustained |
| Memory Utilization | 18% | 75% | >85% sustained |
| Pod Count | 3 | 30 max | =30 (capacity limit) |
| Scaling Events | 0 | - | >10/hour (flapping) |
Cost Optimization
Current Cost Profile
Assumptions:
- GKE Standard tier: $0.10/hour per vCPU
- Memory: Included in compute cost
- Storage: $0.17/GB/month (Standard PD)
| Component | Cost Calculation | Monthly Cost |
|---|---|---|
| Min Pods (3) | 3 pods × 0.5 vCPU × 730h × $0.10 | $109.50 |
| Storage (30 GB) | 30 GB × $0.17 | $5.10 |
| Total Baseline | Min pods + min storage | $114.60 |
| Max Pods (30) | 30 pods × 0.5 vCPU × 730h × $0.10 | $1,095.00 |
| Storage (300 GB) | 300 GB × $0.17 | $51.00 |
| Total Peak | Max pods + max storage | $1,146.00 |
Average Cost (assuming 50% utilization):
- 15 pods average: $547.50/month (15 × 0.5 vCPU × 730h × $0.10)
- 150 GB storage: $25.50/month (150 GB × $0.17)
- Total: ~$573/month
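The baseline and peak compute rows can be reproduced with integer cent arithmetic (the rates are the assumptions stated above, not measured billing data):

```bash
# Reproduce the baseline and peak compute rows
# ($0.10 per vCPU-hour = 10 cents, 0.5 vCPU per pod, 730 hours/month)
rate_cents_per_vcpu_hour=10; hours=730
baseline_cents=$(( 3 * hours * rate_cents_per_vcpu_hour / 2 ))    # 3 pods x 0.5 vCPU
peak_cents=$(( 30 * hours * rate_cents_per_vcpu_hour / 2 ))       # 30 pods x 0.5 vCPU
printf 'baseline: $%d.%02d/month, peak: $%d.%02d/month\n' \
  $(( baseline_cents / 100 )) $(( baseline_cents % 100 )) \
  $(( peak_cents / 100 )) $(( peak_cents % 100 ))
```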
Cost Monitoring
```bash
# View resource requests/limits
kubectl describe statefulset coditect-combined-hybrid -n coditect-app | grep -A 5 "Requests\|Limits"

# Estimate current cost
kubectl top nodes
kubectl top pods -n coditect-app
```
GCP Cost Dashboard:
- Navigate to: https://console.cloud.google.com/billing
- Filter by: GKE cluster `codi-poc-e2-cluster`
- View: Compute Engine + Persistent Disk costs
HPA vs StatefulSet
Comparison
| Aspect | StatefulSet (theia-statefulset-hybrid.yaml) | HPA (coditect-combined-hpa.yaml) |
|---|---|---|
| Purpose | Manages WHAT runs (pods, storage, resources) | Manages HOW MANY run (scaling) |
| Type | Workload resource (creates pods) | Autoscaling controller (modifies workload) |
| Replicas | Fixed: replicas: 3 | Dynamic: 3-30 based on metrics |
| Storage | Defines PVCs: 10Gi workspace + 5Gi config | No storage (uses StatefulSet's PVCs) |
| Resources | Defines limits: 512Mi-2Gi RAM, 500m-2000m CPU | Monitors these resources for scaling |
| Pod Lifecycle | Creates/deletes pods with persistent storage | Triggers pod creation/deletion via StatefulSet |
| Modified | Oct 28, 02:41 | Oct 28, 03:44 (1 hour later) |
Relationship
Analogy:
- StatefulSet = Factory that builds cars (defines HOW to build pods)
- HPA = Production manager (decides HOW MANY cars to build based on demand)
Flow:
User Traffic Increases
↓
CPU/Memory Usage > 70-75%
↓
HPA detects high utilization
↓
HPA modifies StatefulSet replicas: 3 → 6
↓
StatefulSet creates 3 new pods with 10Gi PVCs
↓
New pods ready to serve users
Roadmap
Current State ✅ COMPLETE
- HPA deployed and active
- Monitoring CPU and memory metrics
- 3-30 pod range configured
- Scale-up: 3 pods at a time, 60s stabilization
- Scale-down: 1 pod at a time, 300s stabilization
Near-Term Improvements (Q1 2026)
1. Custom Metrics ⏳ PLANNED
Add application-specific metrics for smarter scaling:
```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: active_user_sessions   # Custom metric from app
    target:
      type: AverageValue
      averageValue: "10"           # Scale when >10 users per pod
```
Implementation:
- Instrument application to export Prometheus metrics
- Deploy Prometheus Adapter
- Configure HPA to use custom metrics
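A Prometheus Adapter rule for the second step might look like the sketch below; the metric and label names are assumptions carried over from the HPA snippet above, not an existing configuration:

```yaml
# Hypothetical prometheus-adapter rule exposing active_user_sessions as a pod metric
rules:
- seriesQuery: 'active_user_sessions{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "active_user_sessions"
  metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```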
2. Predictive Scaling ⏳ PLANNED
Use historical data to scale proactively:
```yaml
behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 100        # Double pods
      periodSeconds: 30
    selectPolicy: Max
```
Implementation:
- Analyze usage patterns (hourly, daily, weekly)
- Pre-scale before known traffic spikes
- Use historical metrics (e.g., from Cloud Monitoring) to drive forecasts
3. Cost Optimization ⏳ PLANNED
Idle Pod Cleanup CronJob:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: idle-pod-cleanup
  namespace: coditect-app
spec:
  schedule: "0 * * * *"   # Run every hour
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure   # Required for Job pods
          containers:
          - name: cleanup
            image: idle-cleanup:latest
            env:
            - name: IDLE_THRESHOLD_HOURS
              value: "2"
```
Logic:
- Query FoundationDB for last_active timestamps
- Identify pods with no activity > 2 hours
- Deprovision idle pods (delete namespace)
- Save costs by running only active pods
4. Multi-Region Scaling ⏳ FUTURE
Expand to multiple GKE clusters for global presence:
- US: `us-central1` (current)
- EU: `europe-west1`
- Asia: `asia-southeast1`
Benefits:
- Lower latency for international users
- High availability (region failures)
- Compliance (data residency)
Troubleshooting
HPA Not Scaling
Symptoms: Pod count stuck at 3 despite high CPU/memory
Checks:
```bash
# Check HPA status
kubectl describe hpa coditect-combined-hpa -n coditect-app

# Check metrics server
kubectl top pods -n coditect-app

# Check pod resource requests/limits
kubectl describe pod coditect-combined-0 -n coditect-app | grep -A 5 "Requests\|Limits"
```
Common Causes:
- Metrics server not running: check `kubectl get deployment metrics-server -n kube-system`
- No resource requests defined: HPA requires `resources.requests` in the pod spec
- Stabilization window: wait 60s (scale-up) or 300s (scale-down) before expecting a change
- Max replicas reached: already at 30 pods
Rapid Scaling (Flapping)
Symptoms: Pod count oscillating (3→6→3→6...)
Fix: Increase stabilization windows
```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 120   # Increase from 60s
  scaleDown:
    stabilizationWindowSeconds: 600   # Increase from 300s
```
Out of Resources
Symptoms: Pods stuck in Pending, events show FailedScheduling
Checks:
```bash
# Check node capacity
kubectl describe nodes | grep -A 5 "Allocated resources"

# Check pending pods
kubectl get pods -n coditect-app --field-selector=status.phase=Pending
```
Fix: Enable GKE cluster autoscaling
```bash
gcloud container clusters update codi-poc-e2-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=50 \
  --zone=us-central1-a
```
References
Files
- HPA Configuration: `k8s/coditect-combined-hpa.yaml`
- StatefulSet Configuration: `k8s/theia-statefulset-hybrid.yaml`
- Hybrid Storage Strategy: `docs/10-execution-plans/2025-10-29-hybrid-storage-migration.md`
Documentation
- Kubernetes HPA: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
- GKE Autoscaling: https://cloud.google.com/kubernetes-engine/docs/concepts/horizontalpodautoscaler
- StatefulSet Scaling: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#scaling-a-statefulset
Status: ✅ HPA ACTIVE - 29 hours uptime, 0 scaling events, monitoring healthy
Next Action: Monitor for 7 days to validate scaling behavior under various load patterns
Review: Weekly during initial deployment, monthly after stabilization