Scaling and HPA Strategy
Date: 2025-10-29
Status: ✅ DEPLOYED - HPA Active and Monitoring
Deployment: 2025-10-28 03:44 UTC (29 hours ago)
Executive Summary
HorizontalPodAutoscaler (HPA) is DEPLOYED and actively monitoring the coditect-combined-hybrid StatefulSet to provide automatic scaling based on CPU and memory usage.
Current Status:
NAME: coditect-combined-hpa
AGE: 29h
TARGETS: CPU 0% / 70%, Memory 18% / 75%
REPLICAS: 3 (min) - 30 (max)
SCALING: Ready to scale based on metrics
What is HPA?
HorizontalPodAutoscaler is a Kubernetes controller that automatically scales the number of pod replicas based on observed resource metrics.
Key Concepts:
- Watches: Resource utilization (CPU, memory) of StatefulSet pods
- Adjusts: Pod count up or down based on thresholds
- Protects: Application availability during traffic spikes
- Optimizes: Cost by scaling down during low usage
Architecture
How HPA Works with StatefulSet
User Traffic → NGINX Ingress → Service → Pods with PVCs (created by StatefulSet)
↓
HPA monitors: CPU 0% / 70%, Memory 18% / 75% (current / threshold)
↓
When a threshold is exceeded, HPA modifies StatefulSet.replicas
↓
StatefulSet creates new pods; each pod gets a 10Gi workspace PVC
Current Configuration
File: `k8s/coditect-combined-hpa.yaml` (Last modified: Oct 28, 03:44)

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coditect-combined-hpa
  namespace: coditect-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: coditect-combined-hybrid
  minReplicas: 3    # Never go below 3 pods
  maxReplicas: 30   # Never go above 30 pods (60-user capacity)
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # Scale up when CPU >70%
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75   # Scale up when Memory >75%
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # Wait 1 min before scaling up
      policies:
      - type: Pods
        value: 3          # Add 3 pods at a time
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
      - type: Pods
        value: 1          # Remove 1 pod at a time
        periodSeconds: 60
```
Scaling Behavior
Scale-Up Triggers
HPA will scale UP (3→6→9...→30) when:
- CPU utilization exceeds 70% across pods
- OR Memory utilization exceeds 75% across pods
Stabilization: Waits 60 seconds to confirm sustained high usage before scaling
Rate: Adds 3 pods at a time every 60 seconds
Scale-Down Triggers
HPA will scale DOWN (30→29→28...→3) when:
- CPU AND Memory both fall below thresholds
- Sustained for 300 seconds (5 minutes)
Stabilization: Longer wait time (5 min) prevents flapping
Rate: Removes 1 pod at a time every 60 seconds (conservative)
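The worst-case ramp time from the floor to the ceiling follows directly from these policies. A back-of-envelope sketch (variable names are illustrative only):

```bash
# Worst-case ramp time from minReplicas to maxReplicas under the scale-up policy
MIN=3; MAX=30; STEP=3; PERIOD=60; STABILIZE=60
STEPS=$(( (MAX - MIN + STEP - 1) / STEP ))   # 9 scale-up events of 3 pods each
TOTAL=$(( STABILIZE + STEPS * PERIOD ))      # 60s confirmation + 9 x 60s
echo "worst-case ramp: ${STEPS} steps, $(( TOTAL / 60 )) minutes"
```

In practice each batch of new pods also needs roughly 2 minutes to become Ready, so actual time-to-capacity is somewhat longer than this lower bound.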
Example Scaling Scenario
Current State: 3 pods, CPU 40%, Memory 50%
↓
Traffic Spike: User surge (10 → 50 concurrent users)
↓
Metrics: CPU 75%, Memory 80% (both exceed thresholds)
↓
Stabilization: HPA waits 60 seconds to confirm
↓
Scale Up: HPA modifies StatefulSet.replicas from 3 → 6
↓
Kubernetes: Creates 3 new pods (coditect-combined-hybrid-3/4/5)
↓
Storage: Each new pod gets 10Gi workspace PVC
↓
Ready: New pods ready after ~2 minutes
↓
Result: 6 pods, CPU ~37%, Memory ~40% (below thresholds)
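As a sanity check on the scenario above: if total load stays constant and spreads evenly across pods, per-pod utilization scales inversely with replica count (a simplifying assumption; real traffic is lumpier):

```bash
# Per-pod utilization after doubling replicas, assuming an evenly spread, constant load
cpu_before=75; mem_before=80; before=3; after=6
cpu_after=$(( cpu_before * before / after ))   # 75% x 3/6 ≈ 37%
mem_after=$(( mem_before * before / after ))   # 80% x 3/6 = 40%
echo "after scale-up: CPU ~${cpu_after}%, Memory ~${mem_after}%"
```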
Capacity Planning
Current Configuration (Starter Tier)
Target: 10-20 users with headroom for 60 users
| Metric | Value | Calculation |
|---|---|---|
| Min Pods | 3 | Always available |
| Max Pods | 30 | Peak capacity |
| Users per Pod | ~2 | Conservative estimate |
| Total Capacity | 60 users | 30 pods × 2 users |
| Storage per Pod | 10 GB | Hybrid approach (system tools in image) |
| Total Storage | 300 GB | 30 pods × 10 GB |
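The capacity figures in the table are simple products; restated as a quick check:

```bash
# Capacity arithmetic from the table above
max_pods=30; users_per_pod=2; gb_per_pod=10
total_users=$(( max_pods * users_per_pod ))   # 60 users at peak
total_gb=$(( max_pods * gb_per_pod ))         # 300 GB workspace storage at peak
echo "peak capacity: ${total_users} users, ${total_gb} GB workspace storage"
```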
Thresholds
| Threshold | Value | Rationale |
|---|---|---|
| CPU | 70% | Leaves 30% headroom for spikes |
| Memory | 75% | More aggressive (memory pressure worse than CPU) |
| Scale-Up Wait | 60s | Fast response to traffic spikes |
| Scale-Down Wait | 300s | Conservative to prevent flapping |
Monitoring
Check HPA Status
```bash
# View current HPA status
kubectl get hpa -n coditect-app

# Detailed HPA information
kubectl describe hpa coditect-combined-hpa -n coditect-app

# Watch HPA in real-time
kubectl get hpa coditect-combined-hpa -n coditect-app --watch
```
Check Pod Metrics
```bash
# View pod resource usage
kubectl top pods -n coditect-app

# View StatefulSet replicas
kubectl get statefulset coditect-combined-hybrid -n coditect-app

# Check pod scaling events
kubectl get events -n coditect-app --field-selector involvedObject.name=coditect-combined-hpa
```
Metrics to Monitor
| Metric | Current | Threshold | Alert |
|---|---|---|---|
| CPU Utilization | 0% | 70% | >80% sustained |
| Memory Utilization | 18% | 75% | >85% sustained |
| Pod Count | 3 | 30 max | =30 (capacity limit) |
| Scaling Events | 0 | - | >10/hour (flapping) |
Cost Optimization
Current Cost Profile
Assumptions:
- GKE Standard tier: $0.10/hour per vCPU
- Memory: Included in compute cost
- Storage: $0.17/GB/month (Standard PD)
| Component | Cost Calculation | Monthly Cost |
|---|---|---|
| Min Pods (3) | 3 pods × 0.5 vCPU × 730h × $0.10 | $109.50 |
| Storage (30 GB) | 30 GB × $0.17 | $5.10 |
| Total Baseline | Min pods + min storage | $114.60 |
| Max Pods (30) | 30 pods × 0.5 vCPU × 730h × $0.10 | $1,095.00 |
| Storage (300 GB) | 300 GB × $0.17 | $51.00 |
| Total Peak | Max pods + max storage | $1,146.00 |
Average Cost (assuming 50% utilization):
- 15 pods average: $547.50/month (15 × 0.5 vCPU × 730h × $0.10)
- 150 GB storage: $25.50/month (150 GB × $0.17)
- Total: ~$573/month
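The baseline and peak compute rows can be reproduced with integer cent arithmetic (the rates are the assumptions stated above, not measured billing data):

```bash
# Reproduce the baseline and peak compute rows
# ($0.10 per vCPU-hour = 10 cents, 0.5 vCPU per pod, 730 hours/month)
rate_cents_per_vcpu_hour=10; hours=730
baseline_cents=$(( 3 * hours * rate_cents_per_vcpu_hour / 2 ))    # 3 pods x 0.5 vCPU
peak_cents=$(( 30 * hours * rate_cents_per_vcpu_hour / 2 ))       # 30 pods x 0.5 vCPU
printf 'baseline: $%d.%02d/month, peak: $%d.%02d/month\n' \
  $(( baseline_cents / 100 )) $(( baseline_cents % 100 )) \
  $(( peak_cents / 100 )) $(( peak_cents % 100 ))
```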
Cost Monitoring
```bash
# View resource requests/limits
kubectl describe statefulset coditect-combined-hybrid -n coditect-app | grep -A 5 "Requests\|Limits"

# Estimate current cost
kubectl top nodes
kubectl top pods -n coditect-app
```
GCP Cost Dashboard:
- Navigate to: https://console.cloud.google.com/billing
- Filter by: GKE cluster `codi-poc-e2-cluster`
- View: Compute Engine + Persistent Disk costs
HPA vs StatefulSet
Comparison
| Aspect | StatefulSet (theia-statefulset-hybrid.yaml) | HPA (coditect-combined-hpa.yaml) |
|---|---|---|
| Purpose | Manages WHAT runs (pods, storage, resources) | Manages HOW MANY run (scaling) |
| Type | Workload resource (creates pods) | Autoscaling controller (modifies workload) |
| Replicas | Fixed: replicas: 3 | Dynamic: 3-30 based on metrics |
| Storage | Defines PVCs: 10Gi workspace + 5Gi config | No storage (uses StatefulSet's PVCs) |
| Resources | Defines limits: 512Mi-2Gi RAM, 500m-2000m CPU | Monitors these resources for scaling |
| Pod Lifecycle | Creates/deletes pods with persistent storage | Triggers pod creation/deletion via StatefulSet |
| Modified | Oct 28, 02:41 | Oct 28, 03:44 (1 hour later) |
Relationship
Analogy:
- StatefulSet = Factory that builds cars (defines HOW to build pods)
- HPA = Production manager (decides HOW MANY cars to build based on demand)
Flow:
User Traffic Increases
↓
CPU/Memory Usage > 70-75%
↓
HPA detects high utilization
↓
HPA modifies StatefulSet replicas: 3 → 6
↓
StatefulSet creates 3 new pods with 10Gi PVCs
↓
New pods ready to serve users
Roadmap
Current State ✅ COMPLETE
- HPA deployed and active
- Monitoring CPU and memory metrics
- 3-30 pod range configured
- Scale-up: 3 pods at a time, 60s stabilization
- Scale-down: 1 pod at a time, 300s stabilization
Near-Term Improvements (Q1 2026)
1. Custom Metrics ⏳ PLANNED
Add application-specific metrics for smarter scaling:
```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: active_user_sessions   # Custom metric from app
    target:
      type: AverageValue
      averageValue: "10"           # Scale when >10 users per pod
```
Implementation:
- Instrument application to export Prometheus metrics
- Deploy Prometheus Adapter
- Configure HPA to use custom metrics
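A Prometheus Adapter rule for the second step might look like the sketch below; the metric and label names are assumptions carried over from the HPA snippet above, not an existing configuration:

```yaml
# Hypothetical prometheus-adapter rule exposing active_user_sessions as a pod metric
rules:
- seriesQuery: 'active_user_sessions{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "active_user_sessions"
  metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```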
2. Predictive Scaling ⏳ PLANNED
Use historical data to scale proactively:
```yaml
behavior:
  scaleUp:
    policies:
    - type: Percent
      value: 100        # Double pods
      periodSeconds: 30
    selectPolicy: Max
```
Implementation:
- Analyze usage patterns (hourly, daily, weekly)
- Pre-scale before known traffic spikes
- Use historical metrics (e.g., from Cloud Monitoring) to drive forecasts
3. Cost Optimization ⏳ PLANNED
Idle Pod Cleanup CronJob:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: idle-pod-cleanup
  namespace: coditect-app
spec:
  schedule: "0 * * * *"   # Run every hour
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure   # Required for Job pods
          containers:
          - name: cleanup
            image: idle-cleanup:latest
            env:
            - name: IDLE_THRESHOLD_HOURS
              value: "2"
```
Logic:
- Query FoundationDB for last_active timestamps
- Identify pods with no activity > 2 hours
- Deprovision idle pods (delete namespace)
- Save costs by running only active pods
4. Multi-Region Scaling ⏳ FUTURE
Expand to multiple GKE clusters for global presence:
- US: `us-central1` (current)
- EU: `europe-west1`
- Asia: `asia-southeast1`
Benefits:
- Lower latency for international users
- High availability (region failures)
- Compliance (data residency)
Troubleshooting
HPA Not Scaling
Symptoms: Pod count stuck at 3 despite high CPU/memory
Checks:
```bash
# Check HPA status
kubectl describe hpa coditect-combined-hpa -n coditect-app

# Check metrics server
kubectl top pods -n coditect-app

# Check pod resource requests/limits
kubectl describe pod coditect-combined-0 -n coditect-app | grep -A 5 "Requests\|Limits"
```
Common Causes:
- Metrics server not running: check `kubectl get deployment metrics-server -n kube-system`
- No resource requests defined: HPA requires `resources.requests` in the pod spec
- Stabilization window: wait 60s (scale-up) or 300s (scale-down) before expecting a change
- Max replicas reached: already at 30 pods
Rapid Scaling (Flapping)
Symptoms: Pod count oscillating (3→6→3→6...)
Fix: Increase stabilization windows
```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 120   # Increase from 60s
  scaleDown:
    stabilizationWindowSeconds: 600   # Increase from 300s
```
Out of Resources
Symptoms: Pods stuck in Pending, events show FailedScheduling
Checks:
```bash
# Check node capacity
kubectl describe nodes | grep -A 5 "Allocated resources"

# Check pending pods
kubectl get pods -n coditect-app --field-selector=status.phase=Pending
```
Fix: Enable GKE cluster autoscaling
```bash
gcloud container clusters update codi-poc-e2-cluster \
  --enable-autoscaling \
  --min-nodes=3 \
  --max-nodes=50 \
  --zone=us-central1-a
```
References
Files
- HPA Configuration: `k8s/coditect-combined-hpa.yaml`
- StatefulSet Configuration: `k8s/theia-statefulset-hybrid.yaml`
- Hybrid Storage Strategy: `docs/10-execution-plans/2025-10-29-hybrid-storage-migration.md`
Documentation
- Kubernetes HPA: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
- GKE Autoscaling: https://cloud.google.com/kubernetes-engine/docs/concepts/horizontalpodautoscaler
- StatefulSet Scaling: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#scaling-a-statefulset
Status: ✅ HPA ACTIVE - 29 hours uptime, 0 scaling events, monitoring healthy
Next Action: Monitor for 7 days to validate scaling behavior under various load patterns
Review: Weekly during initial deployment, monthly after stabilization