Staging Deployment Status - CODITECT Cloud Backend
Date: November 30, 2025
Status: BLOCKED - GCR Permission Issue
Progress: 90% Complete
✅ Successfully Completed
- ✅ Docker image built (Python 3.12.12)
  - Image: coditect-cloud-backend:test-v1.0.0
  - Size: 737MB disk, 136MB content
- ✅ Image pushed to GCR
  - Repository: gcr.io/coditect-cloud-infra/coditect-cloud-backend
  - Tag: v1.0.0-staging
  - Digest: sha256:ebca8fb332ffcbcbb6125f6a2b121d5ece38ac47ea97a90872f0c9cbaa3baa69
- ✅ Kubernetes manifests created and configured
  - Namespace: coditect-staging (created)
  - Service Account: coditect-cloud-backend (created)
  - Secrets: backend-secrets (created with random values)
  - Services: LoadBalancer + ClusterIP (created)
- ✅ GKE cluster verified
  - Cluster: coditect-cluster in us-central1
  - Nodes: 3x n1-standard-2 (RUNNING)
  - Credentials: configured locally
❌ Current Blocker: GCR Image Pull Permissions
Issue Description
Kubernetes pods cannot pull the Docker image from Google Container Registry due to authentication failures.
Error Messages:
403 Forbidden: failed to authorize: failed to fetch oauth token
401 Unauthorized: failed to authorize: failed to fetch oauth token
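To see the full pull error behind these status codes, the pod events for the namespace can be listed. The following is a minimal sketch, assuming kubectl already targets the cluster; the function name pull_failure_events is hypothetical and not part of the deployment tooling.

```shell
# Sketch: list "Failed" events (which include image pull failures) in a namespace.
# Assumption: kubectl is installed and its current context points at the cluster.
pull_failure_events() {
  local namespace="$1"
  kubectl get events \
    --namespace "$namespace" \
    --field-selector reason=Failed \
    --sort-by=.lastTimestamp
}
```

For this deployment it would be called as: pull_failure_events coditect-staging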
Attempts Made
- Granted Storage Object Viewer role to Compute Engine service account
  gcloud projects add-iam-policy-binding coditect-cloud-infra \
    --member="serviceAccount:374018874256-compute@developer.gserviceaccount.com" \
    --role="roles/storage.objectViewer"
  Result: ❌ Failed - Still getting 403
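- Created imagePullSecret with user credentials
  kubectl create secret docker-registry gcr-json-key \
    --docker-server=gcr.io \
    --docker-username=_json_key \
    --docker-password="$(gcloud auth print-access-token)" \
    -n coditect-staging
  Result: ❌ Failed - Getting 401/403
  Note: the _json_key username expects a service account JSON key as the password. A short-lived access token from gcloud auth print-access-token is normally paired with the username oauth2accesstoken, so this credential pairing may itself explain the 401.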
Root Cause Analysis
The issue appears to be related to Container Registry API enablement or bucket-level permissions. Possible causes:
- Container Registry API not fully initialized for the coditect-cloud-infra project
- GCS bucket permissions not properly configured for GCR storage
- Service account scopes on GKE nodes may not include storage-ro or cloud-platform
🔧 Recommended Solutions
Option 1: Enable Container Registry API (Recommended)
# Enable Container Registry API
gcloud services enable containerregistry.googleapis.com --project=coditect-cloud-infra
# Verify API is enabled
gcloud services list --enabled --project=coditect-cloud-infra | grep containerregistry
# Wait 1-2 minutes for API propagation
# Delete existing pods to force retry
kubectl delete pods -n coditect-staging -l app=coditect-backend
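The grep check in Option 1 can be wrapped so that it returns a clean exit status for scripting. This is a minimal sketch; the function name api_enabled is hypothetical, and it assumes the config.name field carries the service name in the gcloud services list output.

```shell
# Sketch: succeed only if the given API is enabled in the project.
# Assumption: gcloud is installed and authenticated; config.name is the
# service name field in the "gcloud services list" output.
api_enabled() {
  local project="$1" api="$2"
  gcloud services list --enabled --project="$project" \
    --format='value(config.name)' | grep -qxF "$api"
}
```

Example use: api_enabled coditect-cloud-infra containerregistry.googleapis.com && echo "enabled"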
Option 2: Use Artifact Registry Instead of GCR
Artifact Registry is the newer, recommended solution:
# Enable Artifact Registry API
gcloud services enable artifactregistry.googleapis.com --project=coditect-cloud-infra
# Create Artifact Registry repository
gcloud artifacts repositories create coditect-backend \
--repository-format=docker \
--location=us-central1 \
--project=coditect-cloud-infra
# Configure Docker credentials for the Artifact Registry host
gcloud auth configure-docker us-central1-docker.pkg.dev
# Retag and push image to Artifact Registry
docker tag coditect-cloud-backend:test-v1.0.0 \
us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging
docker push us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging
# Update deployment manifest
# Change image from:
# gcr.io/coditect-cloud-infra/coditect-cloud-backend:v1.0.0-staging
# To:
# us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging
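The image path change described above follows a fixed pattern, so it can be derived rather than typed by hand. A minimal sketch, assuming references of the form gcr.io/PROJECT/IMAGE[:TAG]; the function name to_artifact_registry_ref is hypothetical.

```shell
# Sketch: rewrite a gcr.io image reference to its Artifact Registry form.
# Assumption: input looks like gcr.io/PROJECT/IMAGE[:TAG]; the location and
# repository name are supplied by the caller.
to_artifact_registry_ref() {
  local gcr_ref="$1" location="$2" repository="$3"
  local rest project image
  case "$gcr_ref" in
    gcr.io/*/*) ;;
    *) echo "not a gcr.io reference: $gcr_ref" >&2; return 1 ;;
  esac
  rest="${gcr_ref#gcr.io/}"   # PROJECT/IMAGE[:TAG]
  project="${rest%%/*}"       # text before the first slash
  image="${rest#*/}"          # text after the first slash
  printf '%s-docker.pkg.dev/%s/%s/%s\n' "$location" "$project" "$repository" "$image"
}
```

Called with gcr.io/coditect-cloud-infra/coditect-cloud-backend:v1.0.0-staging, us-central1, and coditect-backend, it prints the Artifact Registry path listed in Option 2.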
Option 3: Recreate GKE Node Pool with Correct Scopes
# Check current node scopes
gcloud container clusters describe coditect-cluster \
--region=us-central1 \
--format="value(nodeConfig.oauthScopes)"
# If scopes don't include cloud-platform or storage-ro:
# Create new node pool with correct scopes
gcloud container node-pools create coditect-pool-v2 \
--cluster=coditect-cluster \
--region=us-central1 \
--num-nodes=3 \
--machine-type=n1-standard-2 \
--scopes="https://www.googleapis.com/auth/cloud-platform"
# Cordon and drain old nodes
kubectl cordon -l cloud.google.com/gke-nodepool=default-pool
kubectl drain -l cloud.google.com/gke-nodepool=default-pool --force --ignore-daemonsets
# Delete old node pool
gcloud container node-pools delete default-pool \
--cluster=coditect-cluster \
--region=us-central1
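Before replacing the node pool, it helps to confirm that the scopes are actually the problem. The sketch below checks a scope list for any scope that permits reading registry storage; the function name has_pull_scope is hypothetical. The storage-ro alias corresponds to the devstorage.read_only scope URL.

```shell
# Sketch: succeed if a scope list contains a scope that allows image pulls.
# Input: scope URLs separated by ';', ',' or whitespace.
has_pull_scope() {
  local scope
  for scope in $(printf '%s' "$1" | tr ';,' '  '); do
    case "$scope" in
      */auth/cloud-platform | */auth/devstorage.read_only | \
      */auth/devstorage.read_write | */auth/devstorage.full_control)
        return 0 ;;
    esac
  done
  return 1
}
```

The output of the gcloud container clusters describe command above can be passed in as the single argument, for example: has_pull_scope "$SCOPES" || echo "node scopes do not allow registry pulls"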
Option 4: Use Service Account Key (Quick Fix for Testing)
# Create service account key
gcloud iam service-accounts keys create ~/gcr-key.json \
--iam-account=374018874256-compute@developer.gserviceaccount.com
# Create Kubernetes secret from key
kubectl create secret docker-registry gcr-sa-key \
--docker-server=gcr.io \
--docker-username=_json_key \
--docker-password="$(cat ~/gcr-key.json)" \
--docker-email=1@az1.ai \
-n coditect-staging
# Update deployment to use gcr-sa-key instead of gcr-json-key
kubectl patch deployment coditect-backend -n coditect-staging \
-p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"gcr-sa-key"}]}}}}'
# Clean up key file
rm ~/gcr-key.json
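In the sequence above, the key file is only removed if every earlier command succeeds. A minimal sketch of a variant that removes the local key file on both success and failure; the function name create_pull_secret is hypothetical, and the optional --docker-email flag is omitted.

```shell
# Sketch: create the pull secret from a key file held in a private temp
# directory, and delete that directory whether or not the steps succeed.
# Assumption: gcloud and kubectl are installed and authenticated.
create_pull_secret() {
  local sa_email="$1" namespace="$2" secret_name="$3"
  local key_dir key_file status
  key_dir="$(mktemp -d)" || return 1
  key_file="$key_dir/key.json"
  status=0
  {
    gcloud iam service-accounts keys create "$key_file" --iam-account="$sa_email" &&
    kubectl create secret docker-registry "$secret_name" \
      --docker-server=gcr.io \
      --docker-username=_json_key \
      --docker-password="$(cat "$key_file")" \
      --namespace="$namespace"
  } || status=$?
  rm -rf "$key_dir"
  return "$status"
}
```

Deleting the local file does not revoke the key in IAM. For a test-only fix, the key should also be deleted from the service account once it is no longer needed.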
📊 Current Deployment State
Namespace: coditect-staging
- Status: Created and active
- Service Account: coditect-cloud-backend (created)
- Secrets: backend-secrets (created with random Django key and DB password)
Deployment: coditect-backend
- Status: Created, but pods are failing to start
- Replicas: 0/2 ready
- Reason: ImagePullBackOff
Pods
NAME                                READY   STATUS             RESTARTS   AGE
coditect-backend-5c7689fcbc-mmj7x   0/1     ImagePullBackOff   0          5m
coditect-backend-69b6c8c6d6-92cmb   0/1     ImagePullBackOff   0          30m
coditect-backend-69b6c8c6d6-zr4md   0/1     ImagePullBackOff   0          30m
Services
NAME                        TYPE           EXTERNAL-IP   PORT(S)
coditect-backend            LoadBalancer   Pending       80:xxxxx/TCP, 443:xxxxx/TCP
coditect-backend-internal   ClusterIP      10.x.x.x      8000/TCP
⏭️ Immediate Next Steps
- Run Option 1 (Enable Container Registry API) - Fastest fix
- Wait 2 minutes for API propagation
- Delete pods to force image pull retry
- Verify pods start successfully
- Get LoadBalancer IP and run smoke tests
- Update todos and mark deployment complete
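The verification steps above could be scripted roughly as follows. This is a sketch under the assumption that the deployment and service names match the state listed earlier; the function names and the 180-second timeout are hypothetical choices.

```shell
# Sketch: wait for the deployment to become ready, then read the external IP.
# Assumption: kubectl targets the cluster; names match the current deployment state.
wait_for_backend() {
  kubectl rollout status deployment/coditect-backend \
    --namespace coditect-staging --timeout=180s
}

load_balancer_ip() {
  kubectl get service coditect-backend --namespace coditect-staging \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

A smoke test can then target the address printed by load_balancer_ip once wait_for_backend succeeds. The output is empty while the external IP is still pending.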
Expected Timeline
- Container Registry API enablement: 2-3 minutes
- Pod startup: 1-2 minutes
- LoadBalancer IP assignment: 1-2 minutes
- Total: ~5-7 minutes to working deployment
📝 Lessons Learned
- Always enable Container Registry API before first docker push
- Verify GKE node scopes include cloud-platform or storage-ro
- Test image pull locally before deploying to Kubernetes
- Use Artifact Registry for new projects (GCR is legacy)
🆘 If All Else Fails
Use Artifact Registry (modern, recommended):
# Full migration script
gcloud services enable artifactregistry.googleapis.com --project=coditect-cloud-infra
gcloud artifacts repositories create coditect-backend \
--repository-format=docker \
--location=us-central1 \
--project=coditect-cloud-infra
# Configure Docker credentials for the Artifact Registry host
gcloud auth configure-docker us-central1-docker.pkg.dev
docker tag coditect-cloud-backend:test-v1.0.0 \
us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging
docker push us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging
# Update deployment
kubectl set image deployment/coditect-backend \
backend=us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging \
-n coditect-staging
Status: Ready for resolution via Option 1 (recommended)
Estimated Time to Resolution: 5-7 minutes
Last Updated: November 30, 2025