V5 Deployment Guide

Version: 5.0.0 | Last Updated: 2025-10-14 | Status: Production Ready


🎯 Overview

This guide covers the complete end-to-end deployment process for Coditect V5, including the critical Kubernetes image caching issue and how to ensure your deployments actually update.

Architecture

V5 uses a combined container architecture:

  • V5 Frontend Wrapper (React 18 + Vite) on port 80
  • Theia IDE Backend (Eclipse Theia 1.65) on port 3000
  • NGINX routing layer that proxies between the two

A single Docker image, coditect-combined, contains all three components.


🚨 CRITICAL: Kubernetes Image Caching Issue

The Problem

Symptom: Cloud Build succeeds, but pods never update with new code.

Root Cause: Because the manifest still references the unchanged :latest tag, Kubernetes sees no spec change and never triggers a rollout; the pods keep running the previously resolved SHA256 digest until a restart forces a fresh pull.

Evidence:

# Pods show old age despite multiple deployments
kubectl get pods -n coditect-app
# NAME READY STATUS AGE
# coditect-combined-7f86d778c8-gsphv 1/1 Running 19h ⬅️ Should be minutes, not hours!

# Deployment uses cached SHA256
kubectl describe deployment coditect-combined -n coditect-app | grep Image:
# Image: us-central1-docker.pkg.dev/.../coditect-combined@sha256:65a4236222...
# ⬆️ Old SHA from 19 hours ago

Why This Happens

  1. Cloud Build pushes new image with :latest tag → SUCCESS
  2. gke-deploy updates deployment manifest → SUCCESS
  3. Kubernetes sees :latest tag hasn't changed → SKIPS PULL
  4. Pods keep running old cached image → NO UPDATE
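The core of the problem is that Kubernetes compares the tag string in the manifest, not the content behind it. A minimal shell sketch of that comparison (the digests below are hypothetical placeholders, not real image digests):

```shell
# Hypothetical values illustrating why ':latest' alone triggers no update.
deployed_tag="coditect-combined:latest"   # what the manifest says
pushed_tag="coditect-combined:latest"     # what Cloud Build just pushed

# Kubernetes compares the manifest string, not the registry content:
if [ "$deployed_tag" = "$pushed_tag" ]; then
  echo "spec unchanged -> no rollout triggered"
fi

# The registry content DID change, which only a digest comparison reveals:
deployed_digest="sha256:65a4236222aaaa"   # old digest (hypothetical)
pushed_digest="sha256:9f1c04d871bbbb"     # new digest (hypothetical)
if [ "$deployed_digest" != "$pushed_digest" ]; then
  echo "digests differ -> pods are running stale code"
fi
```

This is also why pinning deployments to a BUILD_ID tag or an explicit digest avoids the problem entirely: the manifest string changes on every build, so a rollout is always triggered.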

The Fix

After every Cloud Build deployment, you MUST manually force a rollout:

# Method 1: Restart deployment (recommended)
kubectl rollout restart deployment/coditect-combined -n coditect-app

# Method 2: Force image pull (alternative)
kubectl set image deployment/coditect-combined -n coditect-app \
combined=us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/coditect-combined:latest \
&& kubectl rollout restart deployment/coditect-combined -n coditect-app

Wait for rollout to complete:

kubectl rollout status deployment/coditect-combined -n coditect-app --timeout=5m
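The two commands above can be wrapped in a small helper so the post-build step is never forgotten. This is a sketch, not part of the repo; with DRY_RUN=1 (the default here) it only prints what it would run, so the sequence can be reviewed without touching the cluster:

```shell
# Post-build rollout helper (hypothetical script, not in the repo).
DEPLOY="deployment/coditect-combined"
NS="coditect-app"
DRY_RUN="${DRY_RUN:-1}"

run() {
  # In dry-run mode, echo the command instead of executing it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run kubectl rollout restart "$DEPLOY" -n "$NS"
run kubectl rollout status "$DEPLOY" -n "$NS" --timeout=5m
```

Run it with `DRY_RUN=0` after every successful Cloud Build.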

📋 Complete Deployment Process

Prerequisites

  1. Google Cloud SDK installed and configured
  2. kubectl configured for GKE cluster
  3. Git repository up to date
  4. Source code changes committed

Step 1: Pre-Deployment Checklist

# 1. Verify you're on the correct branch
git branch --show-current
# Should show: main

# 2. Ensure all changes are committed
git status
# Should show: nothing to commit, working tree clean

# 3. Verify environment files
cat .env.production | grep VITE_API_URL
# Should show: VITE_API_URL=/api/v5

# 4. Verify package.json has correct script
grep "prototype:build" package.json
# Should show: "prototype:build": "vite build --mode production"
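The env-file check above is easy to script as a reusable preflight function. The temp file below is created only to demonstrate the check with hypothetical contents:

```shell
# Preflight helper: verify an env file pins the expected API base path.
check_api_url() {
  # $1 = env file, $2 = expected value
  grep -q "^VITE_API_URL=$2\$" "$1"
}

# Demonstration against a throwaway file (hypothetical contents):
tmpenv=$(mktemp)
echo "VITE_API_URL=/api/v5" > "$tmpenv"

if check_api_url "$tmpenv" "/api/v5"; then
  echo "env OK"
else
  echo "env MISMATCH"
fi
rm -f "$tmpenv"
```

In practice you would call `check_api_url .env.production /api/v5` and abort the deploy on a non-zero exit code.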

Step 2: Build and Deploy

# Run Cloud Build (builds image + attempts deployment)
gcloud builds submit --config cloudbuild-combined.yaml --project=serene-voltage-464305-n2 2>&1 | tee cloud-build-$(date +%Y%m%d-%H%M%S).log

# Wait for build to complete (~12-15 minutes)
# Watch for: "STATUS: SUCCESS"

Step 3: Force Kubernetes Update (CRITICAL!)

This step is NOT optional - without it, your deployment didn't actually update!

# Force pods to restart with new image
kubectl rollout restart deployment/coditect-combined -n coditect-app

# Wait for rollout (2-3 minutes)
kubectl rollout status deployment/coditect-combined -n coditect-app --timeout=5m

# Expected output:
# Waiting for deployment "coditect-combined" rollout to finish: 1 old replicas are pending termination...
# Waiting for deployment "coditect-combined" rollout to finish: 2 old replicas are pending termination...
# Waiting for deployment "coditect-combined" rollout to finish: 1 old replicas are pending termination...
# deployment "coditect-combined" successfully rolled out

Step 4: Verify Deployment

# 1. Check pod ages (should be < 5 minutes)
kubectl get pods -n coditect-app | grep coditect-combined
# coditect-combined-xxxxx-xxxxx 1/1 Running 2m ⬅️ Age should be recent!

# 2. Check image SHA (should be different from before)
kubectl describe deployment coditect-combined -n coditect-app | grep Image:
# Image: us-central1-docker.pkg.dev/.../coditect-combined@sha256:NEW_SHA_HERE

# 3. Check frontend version (via browser)
curl -sI https://coditect.ai/ | grep -E "HTTP|Server"
# HTTP/1.1 200 OK
# Server: nginx/1.22.1

# 4. Check frontend JavaScript bundle name changed
curl -s https://coditect.ai/ | grep -oP 'index-[A-Za-z0-9_-]+\.js' | head -1
# index-NEW_HASH.js ⬅️ Should be different from previous deployment
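Check 4 boils down to extracting the hashed bundle filename from two responses and comparing them. A sketch with hypothetical HTML snippets standing in for the curl output before and after a deployment:

```shell
# Extract the hashed JS bundle name from an index.html body.
bundle_of() {
  echo "$1" | grep -o 'index-[A-Za-z0-9_-]*\.js' | head -1
}

# Hypothetical HTML from before and after a deployment:
old_html='<script src="/assets/index-Ab12Cd34.js"></script>'
new_html='<script src="/assets/index-Zx98Yw76.js"></script>'

if [ "$(bundle_of "$old_html")" != "$(bundle_of "$new_html")" ]; then
  echo "bundle changed -> new frontend is live"
fi
```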

Step 5: Functional Testing

# 1. Test frontend loads
curl -I https://coditect.ai/
# Expected: HTTP/1.1 200 OK

# 2. Test health endpoint
curl -I https://coditect.ai/health
# Expected: HTTP/1.1 200 OK

# 3. Test auth endpoint (expect error without credentials)
curl -s -X POST https://coditect.ai/api/v5/auth/login \
-H 'Content-Type: application/json' \
-d '{"email":"test@example.com","password":"wrong"}'
# Expected: {"success":false,"error":{"code":"AUTH_FAILED","message":"Invalid email or password"}}

# 4. Browser test
# - Open https://coditect.ai/
# - Open browser console (F12)
# - Verify no JavaScript errors
# - Test login flow
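The auth check in step 3 can be made machine-verifiable by pulling the error code out of the response with plain shell tools. The response string below is the expected body from the guide:

```shell
# Extract the error code from the auth response without a JSON parser.
resp='{"success":false,"error":{"code":"AUTH_FAILED","message":"Invalid email or password"}}'

code=$(echo "$resp" | sed -n 's/.*"code":"\([A-Z_]*\)".*/\1/p')
echo "error code: $code"
```

A scripted smoke test would compare `$code` against `AUTH_FAILED` and fail the deploy on anything else (including an empty string, which would indicate the endpoint is down or returning HTML).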

Step 6: Post-Deployment Monitoring

# Watch pod logs for errors
kubectl logs -f deployment/coditect-combined -n coditect-app --tail=50

# Check pod resource usage
kubectl top pods -n coditect-app | grep coditect-combined

# Monitor pod restarts (should be 0)
kubectl get pods -n coditect-app -o wide | grep coditect-combined

🔧 Troubleshooting

Issue: Cloud Build Succeeds But Nothing Changes

Symptoms:

  • Build shows SUCCESS
  • Pods still have old AGE (hours/days)
  • Code changes not visible in production

Root Cause: Kubernetes image caching (see Critical section above)

Fix:

kubectl rollout restart deployment/coditect-combined -n coditect-app
kubectl rollout status deployment/coditect-combined -n coditect-app --timeout=5m

Issue: Pods Crash After Rollout

Check logs:

kubectl logs deployment/coditect-combined -n coditect-app --tail=100

Common errors:

  1. NGINX fails to start - Check nginx-combined.conf syntax
  2. Node.js OOM - Increase memory limit in k8s-combined-deployment.yaml
  3. Missing files - Verify dockerfile.local-test COPY commands
  4. Port conflicts - Ensure NGINX listens on 80, Theia on 3000

Rollback to previous version:

kubectl rollout undo deployment/coditect-combined -n coditect-app

Issue: Build Takes Too Long (>20 minutes)

Possible causes:

  1. npm install downloading large packages
  2. Theia webpack build is slow
  3. Cloud Build machine type too small

Fix:

# In cloudbuild-combined.yaml, ensure:
options:
  machineType: 'E2_HIGHCPU_32'  # 32 CPUs
  diskSizeGb: 100
  env:
    - 'NODE_OPTIONS=--max_old_space_size=8192'  # 8GB heap

Issue: gke-deploy Step Fails

Error:

ERROR: (gcloud.run.deploy) Error parsing YAML spec: ...

Fix: Check k8s-combined-deployment.yaml syntax:

kubectl apply --dry-run=client -f k8s-combined-deployment.yaml

Issue: Old Pods Won't Terminate

Symptoms: Rollout hangs with "X old replicas are pending termination"

Causes:

  1. Open WebSocket connections
  2. Long-running terminal sessions
  3. High CPU/memory preventing graceful shutdown

Force termination:

# Get old pod names
kubectl get pods -n coditect-app -o wide | grep Terminating

# Force delete stuck pods
kubectl delete pod <pod-name> -n coditect-app --force --grace-period=0
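Finding the stuck pod names can be automated by filtering the STATUS column. The table below is a hypothetical sample of `kubectl get pods` output, not live cluster state:

```shell
# List pods stuck in Terminating from 'kubectl get pods' style output.
# Hypothetical sample output:
pods='NAME                                 READY  STATUS       AGE
coditect-combined-7f86d778c8-gsphv   1/1    Terminating  19h
coditect-combined-5b9d4c6f7a-q2xkz   1/1    Running      2m'

stuck=$(echo "$pods" | awk '$3 == "Terminating" {print $1}')
echo "stuck pods: $stuck"
```

In a live cluster, pipe `kubectl get pods -n coditect-app` through the same awk filter and feed the result to the force-delete command.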

📊 Deployment Verification Checklist

Use this checklist after every deployment:

Build Phase

  • Cloud Build status shows SUCCESS
  • Build log shows no errors in npm install
  • Build log shows Vite build completed
  • Docker image pushed to registry
  • Image tagged with both BUILD_ID and :latest

Deployment Phase

  • kubectl rollout restart executed
  • kubectl rollout status shows "successfully rolled out"
  • New pods created (AGE < 5 minutes)
  • Old pods terminated
  • All pods show READY 1/1
  • No CrashLoopBackOff errors

Functional Testing

Infrastructure

  • Pod resource usage normal (CPU < 1000m, Memory < 2Gi)
  • No pod restarts in last 10 minutes
  • NGINX routing working
  • SSL certificate valid

🚀 Production Deployment Best Practices

1. Always Use Feature Branches

# Create feature branch
git checkout -b fix/session-store-bug

# Make changes, test locally
npm run dev

# Commit and push
git add .
git commit -m "fix: Add defensive checks to sessionStore"
git push origin fix/session-store-bug

# Deploy from main after PR review
git checkout main
git pull origin main

2. Tag Releases

# After successful deployment
git tag -a v5.0.12 -m "Build #12: Fix sessionStore not iterable error"
git push origin v5.0.12

3. Keep Deployment Logs

# Always save build logs with timestamps
gcloud builds submit ... 2>&1 | tee cloud-build-$(date +%Y%m%d-%H%M%S).log

4. Document Changes

Create docs for each build:

# Example: docs/BUILD-12-RELEASE-NOTES.md
- Fixed: sessionStore "not iterable" error
- Changed: Added defensive Array.isArray checks
- Verified: All 18 tests passing

5. Monitor After Deployment

# Watch logs for 5-10 minutes after deployment
kubectl logs -f deployment/coditect-combined -n coditect-app --tail=100

# Check for errors
kubectl logs deployment/coditect-combined -n coditect-app --tail=500 | grep -i error
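Turning the error grep into a pass/fail signal just means counting matches. A sketch against hypothetical log lines:

```shell
# Count error lines in a log excerpt, case-insensitively.
# Hypothetical sample log lines:
log='[nginx] started on :80
[theia] ERROR failed to open workspace
[theia] listening on :3000
[auth] Error: invalid token'

errors=$(echo "$log" | grep -ci error)
echo "error lines: $errors"
```

A post-deploy check could alert when the count over the last 500 lines exceeds a threshold instead of eyeballing the output.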

📁 Key Files Reference

| File | Purpose | Location |
| --- | --- | --- |
| cloudbuild-combined.yaml | Cloud Build config | Root |
| dockerfile.local-test | Combined image Dockerfile | Root |
| k8s-combined-deployment.yaml | Kubernetes manifest | Root |
| nginx-combined.conf | NGINX routing config | Root |
| .env.production | Frontend environment vars | Root |
| package.json | NPM scripts and dependencies | Root |

🔐 Security Checklist

Before Deployment

  • No secrets in git (check .env files)
  • JWT secret configured in backend
  • CORS origins correct in nginx-combined.conf
  • SSL certificate valid (Google-managed)
  • Service account permissions minimal

After Deployment

  • Auth middleware enforcing JWT
  • Protected endpoints return 401 without token
  • No sensitive data in JavaScript bundles
  • HTTPS redirect working
  • Security headers present (X-Frame-Options, etc.)

📈 Performance Benchmarks

Expected Build Times

  • npm install: ~2-3 minutes
  • Vite frontend build: ~30 seconds
  • Theia webpack build: ~8-10 minutes
  • Docker build: ~1-2 minutes
  • Total Cloud Build: 12-15 minutes

Expected Deployment Times

  • kubectl rollout restart: 10 seconds
  • Pod termination: 30-60 seconds per pod
  • New pod startup: 60-90 seconds
  • Health check ready: 30-60 seconds
  • Total rollout: 2-4 minutes

Resource Usage (Per Pod)

  • CPU: 1-2 millicores idle, 500m under load
  • Memory: 74-76 Mi idle, up to 1Gi under load
  • Disk: ~5GB (Docker image + node_modules)

🆘 Emergency Procedures

Rollback to Previous Version

# Method 1: Undo last rollout
kubectl rollout undo deployment/coditect-combined -n coditect-app

# Method 2: Rollback to specific revision
kubectl rollout history deployment/coditect-combined -n coditect-app
kubectl rollout undo deployment/coditect-combined -n coditect-app --to-revision=<N>

# Method 3: Deploy old image by BUILD_ID
kubectl set image deployment/coditect-combined -n coditect-app \
combined=us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/coditect-combined:<OLD_BUILD_ID>
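For Method 3, the full image reference is just the registry path plus the tag, which is worth computing in a variable to avoid typos in an emergency. BUILD_ID below is a hypothetical example value:

```shell
# Build the full image reference for a given BUILD_ID.
REGISTRY="us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect"
BUILD_ID="4f2a9c1e"  # hypothetical example; use the real Cloud Build ID

image="$REGISTRY/coditect-combined:$BUILD_ID"
echo "$image"
```

Pass `combined=$image` to `kubectl set image` once you have verified the tag exists in the registry.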

Scale Down (Maintenance Mode)

# Reduce to 1 pod
kubectl scale deployment/coditect-combined -n coditect-app --replicas=1

# Scale to 0 (full maintenance)
kubectl scale deployment/coditect-combined -n coditect-app --replicas=0

# Restore to 3 pods
kubectl scale deployment/coditect-combined -n coditect-app --replicas=3

Emergency Log Dump

# Save all pod logs
kubectl logs deployment/coditect-combined -n coditect-app --all-containers=true \
--tail=1000 > emergency-logs-$(date +%Y%m%d-%H%M%S).log

# Get full cluster state
kubectl get all -n coditect-app > cluster-state-$(date +%Y%m%d-%H%M%S).txt
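The timestamped filenames above follow a fixed pattern, so a wrapper script can generate and sanity-check them before the dump runs:

```shell
# Generate the timestamped filenames used above and sanity-check the format.
stamp=$(date +%Y%m%d-%H%M%S)
logfile="emergency-logs-$stamp.log"
statefile="cluster-state-$stamp.txt"

echo "$logfile" | grep -qE '^emergency-logs-[0-9]{8}-[0-9]{6}\.log$' \
  && echo "filename format OK"
```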

📞 Support Contacts

  • GCP Project: serene-voltage-464305-n2
  • GKE Cluster: codi-poc-e2-cluster (us-central1-a)
  • Domain: coditect.ai (34.8.51.57)
  • Container Registry: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect

Monitoring:


🔄 Deployment Flow Diagram

┌─────────────────────────────────────────────────────────────┐
│ 1. DEVELOPER │
│ • git commit -m "fix: Bug description" │
│ • git push origin main │
└────────────────────────┬────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ 2. CLOUD BUILD (12-15 min) │
│ • npm install (3 min) │
│ • npm run prototype:build (30 sec) │
│ • docker build (10 min) │
│ • docker push (1 min) │
│ • gke-deploy (attempts update) ❌ DOESN'T WORK │
└────────────────────────┬────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ 3. KUBERNETES (WITHOUT MANUAL RESTART) │
│ • Sees :latest tag (no change) │
│ • Checks cached SHA256 │
│ • Skips image pull ❌ PODS NOT UPDATED │
│ • Result: Old code still running │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ 3b. MANUAL RESTART (REQUIRED!) │
│ kubectl rollout restart deployment/coditect-combined │
└────────────────────────┬────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ 4. KUBERNETES (WITH MANUAL RESTART) │
│ • Forces new ReplicaSet │
│ • Pulls :latest image (gets new SHA) │
│ • Creates new pods (1-3 min) ✅ UPDATED │
│ • Terminates old pods │
└────────────────────────┬────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────┐
│ 5. VERIFICATION │
│ • kubectl get pods (AGE < 5m) ✅ │
│ • curl https://coditect.ai/ (new bundle) ✅ │
│ • Browser test (no errors) ✅ │
│ • Production ready ✅ │
└─────────────────────────────────────────────────────────────┘

Last Verified: 2025-10-14 | Build Version: #12 | Status: ✅ Deployment process working with manual rollout restart