
Scaling and HPA Strategy

Date: 2025-10-29
Status: ✅ DEPLOYED - HPA Active and Monitoring
Deployment: 2025-10-28 03:44 UTC (29 hours ago)


Executive Summary

HorizontalPodAutoscaler (HPA) is DEPLOYED and actively monitoring the coditect-combined-hybrid StatefulSet to provide automatic scaling based on CPU and memory usage.

Current Status:

NAME: coditect-combined-hpa
AGE: 29h
TARGETS: CPU 0% / 70%, Memory 18% / 75%
REPLICAS: 3 (min) - 30 (max)
SCALING: Ready to scale based on metrics

What is HPA?

HorizontalPodAutoscaler is a Kubernetes controller that automatically scales the number of pod replicas based on observed resource metrics.

Key Concepts:

  • Watches: Resource utilization (CPU, memory) of StatefulSet pods
  • Adjusts: Pod count up or down based on thresholds
  • Protects: Application availability during traffic spikes
  • Optimizes: Cost by scaling down during low usage
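
Scaling decisions follow the standard autoscaling/v2 formula: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), evaluated per metric with the largest result winning. A minimal sketch, using hypothetical utilization numbers (3 pods at 90% CPU against this deployment's 70% target):

```shell
# HPA desired-replica formula: desired = ceil(current * usage / target)
current=3
usage=90    # hypothetical observed CPU utilization (%)
target=70   # configured averageUtilization target (%)
# Integer ceiling division: ceil(a/b) == (a + b - 1) / b
desired=$(( (current * usage + target - 1) / target ))
echo "$desired"
```

The result is then clamped to the minReplicas/maxReplicas range and rate-limited by the scaleUp/scaleDown policies described below.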

Architecture

How HPA Works with StatefulSet

User Traffic → NGINX Ingress → Service → Pods (with PVCs)

Alongside that data path, HPA runs a control loop against the StatefulSet:

CPU: 0% / 70% (threshold)
Memory: 18% / 75% (threshold)

When a threshold is exceeded:
HPA modifies StatefulSet.replicas

StatefulSet creates new pods
Each new pod gets a 10Gi workspace PVC

Current Configuration

File: k8s/coditect-combined-hpa.yaml (Last modified: Oct 28, 03:44)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coditect-combined-hpa
  namespace: coditect-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: coditect-combined-hybrid
  minReplicas: 3    # Never go below 3 pods
  maxReplicas: 30   # Never go above 30 pods (60 users capacity)
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # Scale up when CPU >70%
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75   # Scale up when Memory >75%
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # Wait 1 min before scaling up
      policies:
        - type: Pods
          value: 3          # Add up to 3 pods at a time
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
        - type: Pods
          value: 1          # Remove at most 1 pod at a time
          periodSeconds: 60

Scaling Behavior

Scale-Up Triggers

HPA will scale UP (3→6→9...→30) when:

  • CPU utilization exceeds 70% across pods
  • OR Memory utilization exceeds 75% across pods

Stabilization: Waits 60 seconds to confirm sustained high usage before scaling

Rate: Adds up to 3 pods per 60-second period (the policy is a cap; the actual step comes from the HPA recommendation)

Scale-Down Triggers

HPA will scale DOWN (30→29→28...→3) when:

  • CPU AND Memory both fall below thresholds
  • Sustained for 300 seconds (5 minutes)

Stabilization: Longer wait time (5 min) prevents flapping

Rate: Removes at most 1 pod per 60-second period (conservative)
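
Under these policies, the worst-case time to traverse the full replica range can be bounded directly (a sketch from the configured rates; pod startup time and the stabilization windows add to these figures):

```shell
# Worst-case minutes to traverse the replica range under the configured policies
min=3; max=30
up_rate=3     # pods added per 60s period (scale-up policy)
down_rate=1   # pods removed per 60s period (scale-down policy)
up_minutes=$(( (max - min + up_rate - 1) / up_rate ))       # ceil(27/3)
down_minutes=$(( (max - min + down_rate - 1) / down_rate )) # ceil(27/1)
echo "scale-up 3->30: ${up_minutes} min, scale-down 30->3: ${down_minutes} min"
```

So a full ramp from 3 to 30 pods takes at least 9 minutes of sustained load, and draining back down takes at least 27 minutes.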

Example Scaling Scenario

Current State: 3 pods, CPU 40%, Memory 50%

Traffic Spike: User surge (10 → 50 concurrent users)

Metrics: CPU 75%, Memory 80% (both exceed thresholds)

Stabilization: HPA waits 60 seconds to confirm

Scale Up: HPA computes desired replicas as ceil(3 × 80/75) = 4 and raises StatefulSet.replicas from 3 → 4 (repeating on later evaluations if utilization stays high)

Kubernetes: Creates a new pod (coditect-combined-3)

Storage: The new pod gets a 10Gi workspace PVC

Ready: New pod ready after ~2 minutes

Result: 4 pods, CPU ~56%, Memory ~60% (below thresholds)

Capacity Planning

Current Configuration (Starter Tier)

Target: 10-20 users with headroom for 60 users

| Metric | Value | Calculation |
|---|---|---|
| Min Pods | 3 | Always available |
| Max Pods | 30 | Peak capacity |
| Users per Pod | ~2 | Conservative estimate |
| Total Capacity | 60 users | 30 pods × 2 users |
| Storage per Pod | 10 GB | Hybrid approach (system tools in image) |
| Total Storage | 300 GB | 30 pods × 10 GB |
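
The totals follow directly from the per-pod assumptions (a sketch; the 2-users-per-pod figure is this document's own conservative estimate):

```shell
# Derive total capacity and storage from the per-pod figures
max_pods=30
users_per_pod=2
storage_per_pod_gb=10
total_users=$(( max_pods * users_per_pod ))
total_storage_gb=$(( max_pods * storage_per_pod_gb ))
echo "capacity: ${total_users} users, storage: ${total_storage_gb} GB"
```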

Thresholds

| Threshold | Value | Rationale |
|---|---|---|
| CPU | 70% | Leaves 30% headroom for spikes |
| Memory | 75% | More aggressive (memory pressure is worse than CPU pressure) |
| Scale-Up Wait | 60s | Fast response to traffic spikes |
| Scale-Down Wait | 300s | Conservative to prevent flapping |

Monitoring

Check HPA Status

# View current HPA status
kubectl get hpa -n coditect-app

# Detailed HPA information
kubectl describe hpa coditect-combined-hpa -n coditect-app

# Watch HPA in real-time
kubectl get hpa coditect-combined-hpa -n coditect-app --watch

Check Pod Metrics

# View pod resource usage
kubectl top pods -n coditect-app

# View StatefulSet replicas
kubectl get statefulset coditect-combined-hybrid -n coditect-app

# Check pod scaling events
kubectl get events -n coditect-app --field-selector involvedObject.name=coditect-combined-hpa

Metrics to Monitor

| Metric | Current | Threshold | Alert |
|---|---|---|---|
| CPU Utilization | 0% | 70% | >80% sustained |
| Memory Utilization | 18% | 75% | >85% sustained |
| Pod Count | 3 | 30 max | =30 (capacity limit) |
| Scaling Events | 0 | - | >10/hour (flapping) |

Cost Optimization

Current Cost Profile

Assumptions:

  • GKE Standard tier: $0.10/hour per vCPU
  • Memory: Included in compute cost
  • Storage: $0.17/GB/month (Standard PD)

| Component | Cost Calculation | Monthly Cost |
|---|---|---|
| Min Pods (3) | 3 pods × 0.5 vCPU × 730h × $0.10 | $109.50 |
| Storage (30 GB) | 30 GB × $0.17 | $5.10 |
| Total Baseline | Min pods + min storage | $114.60 |
| Max Pods (30) | 30 pods × 0.5 vCPU × 730h × $0.10 | $1,095.00 |
| Storage (300 GB) | 300 GB × $0.17 | $51.00 |
| Total Peak | Max pods + max storage | $1,146.00 |
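
The baseline and peak rows can be reproduced from the stated assumptions ($0.10 per vCPU-hour, $0.17 per GB-month, 0.5 vCPU per pod, 730 hours per month):

```shell
# Reproduce the baseline (3 pods) and peak (30 pods) monthly cost figures
baseline=$(awk 'BEGIN { printf "%.2f", 3*0.5*730*0.10 + 30*0.17 }')
peak=$(awk 'BEGIN { printf "%.2f", 30*0.5*730*0.10 + 300*0.17 }')
echo "baseline: \$${baseline}/month, peak: \$${peak}/month"
```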

Average Cost (assuming 50% utilization):

  • 15 pods average: ~$548/month (15 pods × 0.5 vCPU × 730h × $0.10)
  • 150 GB storage: ~$26/month (150 GB × $0.17)
  • Total: ~$573/month

Cost Monitoring

# View resource requests/limits
kubectl describe statefulset coditect-combined-hybrid -n coditect-app | grep -A 5 "Requests\|Limits"

# Estimate current cost
kubectl top nodes
kubectl top pods -n coditect-app

GCP Cost Dashboard:


HPA vs StatefulSet

Comparison

| Aspect | StatefulSet (theia-statefulset-hybrid.yaml) | HPA (coditect-combined-hpa.yaml) |
|---|---|---|
| Purpose | Manages WHAT runs (pods, storage, resources) | Manages HOW MANY run (scaling) |
| Type | Workload resource (creates pods) | Autoscaling controller (modifies workload) |
| Replicas | Fixed: replicas: 3 | Dynamic: 3-30 based on metrics |
| Storage | Defines PVCs: 10Gi workspace + 5Gi config | No storage (uses StatefulSet's PVCs) |
| Resources | Defines limits: 512Mi-2Gi RAM, 500m-2000m CPU | Monitors these resources for scaling |
| Pod Lifecycle | Creates/deletes pods with persistent storage | Triggers pod creation/deletion via StatefulSet |
| Modified | Oct 28, 02:41 | Oct 28, 03:44 (1 hour later) |

Relationship

Analogy:

  • StatefulSet = Factory that builds cars (defines HOW to build pods)
  • HPA = Production manager (decides HOW MANY cars to build based on demand)

Flow:

User Traffic Increases

CPU/Memory Usage > 70-75%

HPA detects high utilization

HPA modifies StatefulSet replicas: 3 → 6

StatefulSet creates 3 new pods with 10Gi PVCs

New pods ready to serve users

Roadmap

Current State ✅ COMPLETE

  • HPA deployed and active
  • Monitoring CPU and memory metrics
  • 3-30 pod range configured
  • Scale-up: up to 3 pods at a time, 60s stabilization
  • Scale-down: at most 1 pod at a time, 300s stabilization

Near-Term Improvements (Q1 2026)

1. Custom Metrics ⏳ PLANNED

Add application-specific metrics for smarter scaling:

metrics:
  - type: Pods
    pods:
      metric:
        name: active_user_sessions   # Custom metric from app
      target:
        type: AverageValue
        averageValue: "10"           # Scale when >10 users per pod

Implementation:

  • Instrument application to export Prometheus metrics
  • Deploy Prometheus Adapter
  • Configure HPA to use custom metrics

2. Predictive Scaling ⏳ PLANNED

Use historical data to scale proactively:

behavior:
  scaleUp:
    policies:
      - type: Percent
        value: 100          # Allow up to doubling the pod count
        periodSeconds: 30
    selectPolicy: Max
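
Unlike the current Pods policy, a Percent policy caps each step relative to the current replica count, so the permitted step grows as the deployment grows. A quick check of the cap for one period:

```shell
# Max replicas reachable after one 30s period under a 100-Percent scale-up policy
current=6
percent=100
cap=$(( current + current * percent / 100 ))
echo "$cap"
```

With selectPolicy: Max, HPA applies whichever configured policy allows the larger change.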

Implementation:

  • Analyze usage patterns (hourly, daily, weekly)
  • Pre-scale before known traffic spikes
  • Use historical Cloud Monitoring metrics to drive the forecast

3. Cost Optimization ⏳ PLANNED

Idle Pod Cleanup CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: idle-pod-cleanup
  namespace: coditect-app
spec:
  schedule: "0 * * * *"   # Run every hour
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure   # Required for Job pod templates
          containers:
            - name: cleanup
              image: idle-cleanup:latest
              env:
                - name: IDLE_THRESHOLD_HOURS
                  value: "2"

Logic:

  • Query FoundationDB for last_active timestamps
  • Identify pods with no activity > 2 hours
  • Deprovision idle pods (delete namespace)
  • Save costs by running only active pods
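
The job's core decision can be sketched in shell (a hypothetical sketch: in the real job, last_active would come from the FoundationDB query described above):

```shell
# Hypothetical idle-pod decision: deprovision when inactive past the threshold
IDLE_THRESHOLD_HOURS=2
now=$(date +%s)
last_active=$(( now - 3*3600 ))   # example: last activity 3 hours ago
idle_hours=$(( (now - last_active) / 3600 ))
if [ "$idle_hours" -ge "$IDLE_THRESHOLD_HOURS" ]; then
  decision="deprovision"
else
  decision="keep"
fi
echo "$decision"
```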

4. Multi-Region Scaling ⏳ FUTURE

Expand to multiple GKE clusters for global presence:

  • US: us-central1 (current)
  • EU: europe-west1
  • Asia: asia-southeast1

Benefits:

  • Lower latency for international users
  • High availability (region failures)
  • Compliance (data residency)

Troubleshooting

HPA Not Scaling

Symptoms: Pod count stuck at 3 despite high CPU/memory

Checks:

# Check HPA status
kubectl describe hpa coditect-combined-hpa -n coditect-app

# Check metrics server
kubectl top pods -n coditect-app

# Check pod resource requests/limits
kubectl describe pod coditect-combined-0 -n coditect-app | grep -A 5 "Requests\|Limits"

Common Causes:

  1. Metrics server not running: kubectl get deployment metrics-server -n kube-system
  2. No resource requests defined: HPA requires resources.requests in pod spec
  3. Stabilization window: Wait 60s (scale-up) or 300s (scale-down)
  4. Max replicas reached: Already at 30 pods

Rapid Scaling (Flapping)

Symptoms: Pod count oscillating (3→6→3→6...)

Fix: Increase stabilization windows

behavior:
  scaleUp:
    stabilizationWindowSeconds: 120   # Increase from 60s
  scaleDown:
    stabilizationWindowSeconds: 600   # Increase from 300s

Out of Resources

Symptoms: Pods stuck in Pending, events show FailedScheduling

Checks:

# Check node capacity
kubectl describe nodes | grep -A 5 "Allocated resources"

# Check pending pods
kubectl get pods -n coditect-app --field-selector=status.phase=Pending

Fix: Enable GKE cluster autoscaling

gcloud container clusters update codi-poc-e2-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=50 \
  --zone=us-central1-a

References

Files

  • HPA Configuration: k8s/coditect-combined-hpa.yaml
  • StatefulSet Configuration: k8s/theia-statefulset-hybrid.yaml
  • Hybrid Storage Strategy: docs/10-execution-plans/2025-10-29-hybrid-storage-migration.md

Documentation


Status: ✅ HPA ACTIVE - 29 hours uptime, 0 scaling events, monitoring healthy
Next Action: Monitor for 7 days to validate scaling behavior under various load patterns
Review: Weekly during initial deployment, monthly after stabilization