V5 Deployment Guide
Version: 5.0.0 Last Updated: 2025-10-14 Status: Production Ready
🎯 Overview
This guide covers the complete end-to-end deployment process for Coditect V5, including the critical Kubernetes image caching issue and how to ensure your deployments actually update.
Architecture
V5 uses a combined container architecture:
- V5 Frontend Wrapper (React 18 + Vite) on port 80
- Theia IDE Backend (Eclipse Theia 1.65) on port 3000
- NGINX routing that proxies between them
A single Docker image, coditect-combined, contains all three components.
🚨 CRITICAL: Kubernetes Image Caching Issue
The Problem
Symptom: Cloud Build succeeds, but pods never update with new code.
Root Cause: the deployment spec still references the :latest tag, whose SHA256 digest Kubernetes resolved once and cached; because the tag string itself never changes, Kubernetes sees no spec change and never pulls the new image unless explicitly told to.
Evidence:
# Pods show old age despite multiple deployments
kubectl get pods -n coditect-app
# NAME READY STATUS AGE
# coditect-combined-7f86d778c8-gsphv 1/1 Running 19h ⬅️ Should be minutes, not hours!
# Deployment uses cached SHA256
kubectl describe deployment coditect-combined -n coditect-app | grep Image:
# Image: us-central1-docker.pkg.dev/.../coditect-combined@sha256:65a4236222...
# ⬆️ Old SHA from 19 hours ago
Why This Happens
- Cloud Build pushes a new image with the :latest tag → SUCCESS
- gke-deploy updates the deployment manifest → SUCCESS
- Kubernetes sees the :latest tag hasn't changed → SKIPS PULL
- Pods keep running the old cached image → NO UPDATE
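The failure mode boils down to a digest comparison: a restart is needed exactly when the digest the pods are running no longer matches the digest just pushed. As a minimal sketch, with placeholder digests standing in for real pod and registry values:

```shell
# Decide whether a manual rollout restart is needed by comparing digests.
# The digests below are sample stand-ins; in practice they would come from
# `kubectl get pods -o jsonpath=...` and the container registry.
needs_restart() {
  running="$1"; pushed="$2"
  if [ "$running" = "$pushed" ]; then
    echo "up-to-date"
  else
    echo "restart-required"
  fi
}

# With :latest caching, the running digest lags behind the pushed one:
needs_restart "sha256:65a4236222" "sha256:9f1c0d77aa"   # → restart-required
```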
The Fix
After every Cloud Build deployment, you MUST manually force a rollout:
# Method 1: Restart deployment (recommended)
kubectl rollout restart deployment/coditect-combined -n coditect-app
# Method 2: Force image pull (alternative)
kubectl set image deployment/coditect-combined -n coditect-app \
combined=us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/coditect-combined:latest \
&& kubectl rollout restart deployment/coditect-combined -n coditect-app
Wait for rollout to complete:
kubectl rollout status deployment/coditect-combined -n coditect-app --timeout=5m
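The build-then-restart sequence can be wrapped in one script so the restart step is never forgotten. Here `run`, `deploy`, and the `DRY_RUN` switch are hypothetical helpers added for this sketch; the commands themselves are the ones from this guide:

```shell
# Hedged wrapper: build, then force the rollout that Kubernetes skips.
# Set DRY_RUN=1 to preview the commands without touching the cluster.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

deploy() {
  run gcloud builds submit --config cloudbuild-combined.yaml \
    --project=serene-voltage-464305-n2
  # The critical step: force pods to pull the freshly pushed :latest image
  run kubectl rollout restart deployment/coditect-combined -n coditect-app
  run kubectl rollout status deployment/coditect-combined -n coditect-app --timeout=5m
}

DRY_RUN=1 deploy   # preview mode; drop DRY_RUN for a real deployment
```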
📋 Complete Deployment Process
Prerequisites
- Google Cloud SDK installed and configured
- kubectl configured for GKE cluster
- Git repository up to date
- Source code changes committed
Step 1: Pre-Deployment Checklist
# 1. Verify you're on the correct branch
git branch --show-current
# Should show: main
# 2. Ensure all changes are committed
git status
# Should show: nothing to commit, working tree clean
# 3. Verify environment files
cat .env.production | grep VITE_API_URL
# Should show: VITE_API_URL=/api/v5
# 4. Verify package.json has correct script
grep "prototype:build" package.json
# Should show: "prototype:build": "vite build --mode production"
Step 2: Build and Deploy
# Run Cloud Build (builds image + attempts deployment)
gcloud builds submit --config cloudbuild-combined.yaml --project=serene-voltage-464305-n2 2>&1 | tee cloud-build-$(date +%Y%m%d-%H%M%S).log
# Wait for build to complete (~12-15 minutes)
# Watch for: "STATUS: SUCCESS"
Step 3: Force Kubernetes Update (CRITICAL!)
This step is NOT optional - without it, your deployment didn't actually update!
# Force pods to restart with new image
kubectl rollout restart deployment/coditect-combined -n coditect-app
# Wait for rollout (2-3 minutes)
kubectl rollout status deployment/coditect-combined -n coditect-app --timeout=5m
# Expected output:
# Waiting for deployment "coditect-combined" rollout to finish: 1 old replicas are pending termination...
# Waiting for deployment "coditect-combined" rollout to finish: 2 old replicas are pending termination...
# Waiting for deployment "coditect-combined" rollout to finish: 1 old replicas are pending termination...
# deployment "coditect-combined" successfully rolled out
Step 4: Verify Deployment
# 1. Check pod ages (should be < 5 minutes)
kubectl get pods -n coditect-app | grep coditect-combined
# coditect-combined-xxxxx-xxxxx 1/1 Running 2m ⬅️ Age should be recent!
# 2. Check image SHA (should be different from before)
kubectl describe deployment coditect-combined -n coditect-app | grep Image:
# Image: us-central1-docker.pkg.dev/.../coditect-combined@sha256:NEW_SHA_HERE
# 3. Check frontend version (via browser)
curl -sI https://coditect.ai/ | grep -E "HTTP|Server"
# HTTP/1.1 200 OK
# Server: nginx/1.22.1
# 4. Check frontend JavaScript bundle name changed
curl -s https://coditect.ai/ | grep -oP 'index-[A-Za-z0-9_-]+\.js' | head -1
# index-NEW_HASH.js ⬅️ Should be different from previous deployment
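The bundle-name check can be rehearsed offline. The HTML snippet below is a sample stand-in for the live page, and `grep -oE` is used instead of `-P` so the pattern also works where PCRE support is unavailable:

```shell
# Extract the hashed bundle name from a sample HTML snippet
# (stand-in for `curl -s https://coditect.ai/` output).
html='<script type="module" src="/assets/index-Ab3xYz12.js"></script>'
echo "$html" | grep -oE 'index-[A-Za-z0-9_-]+\.js' | head -1
# → index-Ab3xYz12.js
```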
Step 5: Functional Testing
# 1. Test frontend loads
curl -I https://coditect.ai/
# Expected: HTTP/1.1 200 OK
# 2. Test health endpoint
curl -I https://coditect.ai/health
# Expected: HTTP/1.1 200 OK
# 3. Test auth endpoint (expect error without credentials)
curl -s -X POST https://coditect.ai/api/v5/auth/login \
-H 'Content-Type: application/json' \
-d '{"email":"test@example.com","password":"wrong"}'
# Expected: {"success":false,"error":{"code":"AUTH_FAILED","message":"Invalid email or password"}}
# 4. Browser test
# - Open https://coditect.ai/
# - Open browser console (F12)
# - Verify no JavaScript errors
# - Test login flow
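If jq is unavailable, the error code from the auth test in step 3 can be pulled out with grep and cut. The response string below is the documented failure body, not live output:

```shell
# Sample auth failure body, matching the expected response above.
resp='{"success":false,"error":{"code":"AUTH_FAILED","message":"Invalid email or password"}}'
# Isolate the "code" field without jq:
echo "$resp" | grep -oE '"code":"[A-Z_]+"' | cut -d'"' -f4
# → AUTH_FAILED
```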
Step 6: Post-Deployment Monitoring
# Watch pod logs for errors
kubectl logs -f deployment/coditect-combined -n coditect-app --tail=50
# Check pod resource usage
kubectl top pods -n coditect-app | grep coditect-combined
# Monitor pod restarts (should be 0)
kubectl get pods -n coditect-app -o wide | grep coditect-combined
🔧 Troubleshooting
Issue: Cloud Build Succeeds But Nothing Changes
Symptoms:
- Build shows SUCCESS
- Pods still have old AGE (hours/days)
- Code changes not visible in production
Root Cause: Kubernetes image caching (see Critical section above)
Fix:
kubectl rollout restart deployment/coditect-combined -n coditect-app
kubectl rollout status deployment/coditect-combined -n coditect-app --timeout=5m
Issue: Pods Crash After Rollout
Check logs:
kubectl logs deployment/coditect-combined -n coditect-app --tail=100
Common errors:
- NGINX fails to start - Check nginx-combined.conf syntax
- Node.js OOM - Increase memory limit in k8s-combined-deployment.yaml
- Missing files - Verify dockerfile.local-test COPY commands
- Port conflicts - Ensure NGINX listens on 80, Theia on 3000
Rollback to previous version:
kubectl rollout undo deployment/coditect-combined -n coditect-app
Issue: Build Takes Too Long (>20 minutes)
Possible causes:
- npm install downloading large packages
- Theia webpack build is slow
- Cloud Build machine type too small
Fix:
# In cloudbuild-combined.yaml, ensure:
options:
  machineType: 'E2_HIGHCPU_32'  # 32 vCPUs
  diskSizeGb: 100
  env:
    - 'NODE_OPTIONS=--max_old_space_size=8192'  # 8GB heap
Issue: gke-deploy Step Fails
Error:
ERROR: (gcloud.run.deploy) Error parsing YAML spec: ...
Fix: Check k8s-combined-deployment.yaml syntax:
kubectl apply --dry-run=client -f k8s-combined-deployment.yaml
Issue: Old Pods Won't Terminate
Symptoms: Rollout hangs with "X old replicas are pending termination"
Causes:
- Open WebSocket connections
- Long-running terminal sessions
- High CPU/memory preventing graceful shutdown
Force termination:
# Get old pod names
kubectl get pods -n coditect-app -o wide | grep Terminating
# Force delete stuck pods
kubectl delete pod <pod-name> -n coditect-app --force --grace-period=0
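The two commands above can be chained. The `pods` variable below is sample `kubectl get pods` output so the parsing can be shown offline; the final, destructive pipe is left commented:

```shell
# Sample `kubectl get pods` output (NAME READY STATUS RESTARTS AGE):
pods='coditect-combined-7f86d778c8-gsphv   1/1   Terminating   0   19h
coditect-combined-5d9b4c6f7b-abcde   1/1   Running       0   2m'

# Select only the names of pods stuck in Terminating:
echo "$pods" | awk '$3 == "Terminating" {print $1}'
# → coditect-combined-7f86d778c8-gsphv

# Against the real cluster, feed those names into a force delete:
# kubectl get pods -n coditect-app --no-headers \
#   | awk '$3 == "Terminating" {print $1}' \
#   | xargs -r -I{} kubectl delete pod {} -n coditect-app --force --grace-period=0
```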
📊 Deployment Verification Checklist
Use this checklist after every deployment:
Build Phase
- Cloud Build status shows SUCCESS
- Build log shows no errors in npm install
- Build log shows Vite build completed
- Docker image pushed to registry
- Image tagged with both BUILD_ID and :latest
Deployment Phase
- kubectl rollout restart executed
- kubectl rollout status shows "successfully rolled out"
- New pods created (AGE < 5 minutes)
- Old pods terminated
- All pods show READY 1/1
- No CrashLoopBackOff errors
Functional Testing
- Frontend loads (https://coditect.ai/)
- Health endpoint responds (https://coditect.ai/health)
- JavaScript bundle name changed
- No console errors in browser
- Auth endpoints respond correctly
- Login flow works
Infrastructure
- Pod resource usage normal (CPU < 1000m, Memory < 2Gi)
- No pod restarts in last 10 minutes
- NGINX routing working
- SSL certificate valid
🚀 Production Deployment Best Practices
1. Always Use Feature Branches
# Create feature branch
git checkout -b fix/session-store-bug
# Make changes, test locally
npm run dev
# Commit and push
git add .
git commit -m "fix: Add defensive checks to sessionStore"
git push origin fix/session-store-bug
# Deploy from main after PR review
git checkout main
git pull origin main
2. Tag Releases
# After successful deployment
git tag -a v5.0.12 -m "Build #12: Fix sessionStore not iterable error"
git push origin v5.0.12
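A small helper can compute the next patch tag so releases stay sequential. `next_patch` is a hypothetical name, sketched here with awk:

```shell
# Bump the PATCH component of a vMAJOR.MINOR.PATCH tag.
next_patch() {
  echo "$1" | awk -F. '{sub(/^v/, "", $1); printf "v%d.%d.%d\n", $1, $2, $3 + 1}'
}

next_patch v5.0.12   # → v5.0.13
```

The latest existing tag could come from `git describe --tags --abbrev=0` before tagging the new release.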
3. Keep Deployment Logs
# Always save build logs with timestamps
gcloud builds submit ... 2>&1 | tee cloud-build-$(date +%Y%m%d-%H%M%S).log
4. Document Changes
Create docs for each build:
# Example: docs/BUILD-12-RELEASE-NOTES.md
- Fixed: sessionStore "not iterable" error
- Changed: Added defensive Array.isArray checks
- Verified: All 18 tests passing
5. Monitor After Deployment
# Watch logs for 5-10 minutes after deployment
kubectl logs -f deployment/coditect-combined -n coditect-app --tail=100
# Check for errors
kubectl logs deployment/coditect-combined -n coditect-app --tail=500 | grep -i error
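The error grep can be turned into a quick triage count. The `logs` variable below is a sample excerpt standing in for real `kubectl logs` output:

```shell
# Sample log excerpt (stand-in for `kubectl logs ... --tail=500`):
logs='INFO  server listening on 3000
ERROR sessionStore is not iterable
INFO  request completed'

# Case-insensitive count of lines mentioning errors:
echo "$logs" | grep -ci error
# → 1
```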
📁 Key Files Reference
| File | Purpose | Location |
|---|---|---|
| cloudbuild-combined.yaml | Cloud Build config | Root |
| dockerfile.local-test | Combined image Dockerfile | Root |
| k8s-combined-deployment.yaml | Kubernetes manifest | Root |
| nginx-combined.conf | NGINX routing config | Root |
| .env.production | Frontend environment vars | Root |
| package.json | NPM scripts and dependencies | Root |
🔐 Security Checklist
Before Deployment
- No secrets in git (check .env files)
- JWT secret configured in backend
- CORS origins correct in nginx-combined.conf
- SSL certificate valid (Google-managed)
- Service account permissions minimal
After Deployment
- Auth middleware enforcing JWT
- Protected endpoints return 401 without token
- No sensitive data in JavaScript bundles
- HTTPS redirect working
- Security headers present (X-Frame-Options, etc.)
📈 Performance Benchmarks
Expected Build Times
- npm install: ~2-3 minutes
- Vite frontend build: ~30 seconds
- Theia webpack build: ~8-10 minutes
- Docker build: ~1-2 minutes
- Total Cloud Build: 12-15 minutes
Expected Deployment Times
- kubectl rollout restart: 10 seconds
- Pod termination: 30-60 seconds per pod
- New pod startup: 60-90 seconds
- Health check ready: 30-60 seconds
- Total rollout: 2-4 minutes
Resource Usage (Per Pod)
- CPU: 1-2m idle, up to 500m under load
- Memory: 74-76 Mi idle, up to 1Gi under load
- Disk: ~5GB (Docker image + node_modules)
🆘 Emergency Procedures
Rollback to Previous Version
# Method 1: Undo last rollout
kubectl rollout undo deployment/coditect-combined -n coditect-app
# Method 2: Rollback to specific revision
kubectl rollout history deployment/coditect-combined -n coditect-app
kubectl rollout undo deployment/coditect-combined -n coditect-app --to-revision=<N>
# Method 3: Deploy old image by BUILD_ID
kubectl set image deployment/coditect-combined -n coditect-app \
combined=us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect/coditect-combined:<OLD_BUILD_ID>
Scale Down (Maintenance Mode)
# Reduce to 1 pod
kubectl scale deployment/coditect-combined -n coditect-app --replicas=1
# Scale to 0 (full maintenance)
kubectl scale deployment/coditect-combined -n coditect-app --replicas=0
# Restore to 3 pods
kubectl scale deployment/coditect-combined -n coditect-app --replicas=3
Emergency Log Dump
# Save all pod logs
kubectl logs deployment/coditect-combined -n coditect-app --all-containers=true \
--tail=1000 > emergency-logs-$(date +%Y%m%d-%H%M%S).log
# Get full cluster state
kubectl get all -n coditect-app > cluster-state-$(date +%Y%m%d-%H%M%S).txt
📞 Support Contacts
GCP Project: serene-voltage-464305-n2
GKE Cluster: codi-poc-e2-cluster (us-central1-a)
Domain: coditect.ai (34.8.51.57)
Container Registry: us-central1-docker.pkg.dev/serene-voltage-464305-n2/coditect
Monitoring:
- Cloud Build: https://console.cloud.google.com/cloud-build/builds
- GKE Workloads: https://console.cloud.google.com/kubernetes/workload
- Logs: https://console.cloud.google.com/logs
🔄 Deployment Flow Diagram
┌─────────────────────────────────────────────────────────────┐
│ 1. DEVELOPER │
│ • git commit -m "fix: Bug description" │
│ • git push origin main │
└────────────────────────┬────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────────┐
│ 2. CLOUD BUILD (12-15 min) │
│ • npm install (3 min) │
│ • npm run prototype:build (30 sec) │
│ • docker build (10 min) │
│ • docker push (1 min) │
│ • gke-deploy (attempts update) ❌ DOESN'T WORK │
└────────────────────────┬────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────────┐
│ 3. KUBERNETES (WITHOUT MANUAL RESTART) │
│ • Sees :latest tag (no change) │
│ • Checks cached SHA256 │
│ • Skips image pull ❌ PODS NOT UPDATED │
│ • Result: Old code still running │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 3b. MANUAL RESTART (REQUIRED!) │
│ kubectl rollout restart deployment/coditect-combined │
└────────────────────────┬────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────────┐
│ 4. KUBERNETES (WITH MANUAL RESTART) │
│ • Forces new ReplicaSet │
│ • Pulls :latest image (gets new SHA) │
│ • Creates new pods (1-3 min) ✅ UPDATED │
│ • Terminates old pods │
└────────────────────────┬────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────────┐
│ 5. VERIFICATION │
│ • kubectl get pods (AGE < 5m) ✅ │
│ • curl https://coditect.ai/ (new bundle) ✅ │
│ • Browser test (no errors) ✅ │
│ • Production ready ✅ │
└─────────────────────────────────────────────────────────────┘
Last Verified: 2025-10-14 Build Version: #12 Status: ✅ Deployment process working with manual rollout restart