ADR-001: Staging Deployment with Docker, Artifact Registry, and Managed GCP Services
Status: Implemented (95% Complete)
Date: December 1, 2025
Decision Makers: Hal Casteel, Claude Code
Context: Phase 2 completion - deploying Django backend to GKE staging for integration testing
Context and Problem Statement
After completing Phase 2 development (Django backend with 165+ tests, 72% coverage, 15+ API endpoints), we need to deploy to staging environment for integration testing before production launch. This requires containerization, registry setup, Kubernetes deployment, and database/cache infrastructure.
Key Requirements:
- Production-ready Docker image with Python 3.12
- Container registry for GKE image pull
- PostgreSQL database for license storage
- Redis cache for session/seat tracking
- Kubernetes manifests for deployment
- Managed services (not self-managed StatefulSets)
Decision Drivers
- GCR Deprecation - Google Container Registry shut down March 18, 2025
- Multi-Platform Builds - Local macOS builds (arm64) incompatible with GKE (linux/amd64)
- Infrastructure as Code - Existing OpenTofu modules for managed services
- Production Readiness - Managed services preferred over self-managed
- Time Constraints - 2-week timeline to production launch
Considered Options
Option 1: Kubernetes StatefulSets (Initial Approach)
- Pros: Fast deployment, full control
- Cons: Manual management, no backups, not production-ready
- Decision: ❌ Rejected after midnight pivot
Option 2: GCP Managed Services via OpenTofu (Chosen)
- Pros: Production-ready, automatic backups/HA, Infrastructure as Code
- Cons: Longer provisioning time (10-15 minutes)
- Decision: ✅ Chosen - proper approach
Option 3: GCP Managed Services via gcloud (Hybrid)
- Pros: Quick deployment, production-ready
- Cons: Not fully managed by IaC
- Decision: ✅ Temporary - will migrate to OpenTofu later
Decisions Made
Decision 1: Migrate from GCR to Artifact Registry
Status: ✅ Implemented
Context: GCR was shut down March 18, 2025. All image pull attempts failed with 401 Unauthorized/403 Forbidden errors.
Actions Taken:
# Enable Artifact Registry API
gcloud services enable artifactregistry.googleapis.com --project=coditect-cloud-infra
# Create repository
gcloud artifacts repositories create coditect-backend \
--repository-format=docker \
--location=us-central1 \
--project=coditect-cloud-infra
# Configure Docker authentication
gcloud auth configure-docker us-central1-docker.pkg.dev
# Grant IAM permissions to GKE compute service account
gcloud projects add-iam-policy-binding coditect-cloud-infra \
--member="serviceAccount:374018874256-compute@developer.gserviceaccount.com" \
--role="roles/artifactregistry.reader"
Outcome:
- Image successfully pushed: us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging
- Digest: sha256:ebca8fb332ffcbcbb6125f6a2b121d5ece38ac47ea97a90872f0c9cbaa3baa69
- GKE pods successfully pull images
Consequences:
- ✅ All future projects MUST use Artifact Registry
- ✅ Existing GCR documentation is obsolete
- ⚠️ Update all deployment guides to reference Artifact Registry
Decision 2: Multi-Platform Docker Builds for GKE
Status: ✅ Implemented
Context: Initial Docker builds on macOS (arm64) caused "no match for platform" errors when GKE (linux/amd64) tried to pull images.
Actions Taken:
# Build for linux/amd64 platform and push directly to registry
docker buildx build --platform linux/amd64 \
-t us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging \
--push .
Outcome:
- Image compatible with GKE nodes
- Build time: ~5-7 minutes (multi-stage with 75+ MB dependencies)
- Size: 737MB disk, 136MB content
Consequences:
- ✅ CI/CD must use the --platform linux/amd64 flag
- ✅ Update Dockerfile documentation with platform requirements
- ⚠️ Local testing requires emulation (slower)
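To guard against repeating the mismatch, a pre-build check can compare the host architecture to the target platform before invoking docker. A minimal sketch (the mapping table and helper name are illustrative, not part of the project):

```python
import platform

# Map common platform.machine() values to Docker platform strings.
DOCKER_PLATFORMS = {
    "arm64": "linux/arm64",    # Apple Silicon macOS
    "aarch64": "linux/arm64",  # ARM Linux
    "x86_64": "linux/amd64",   # Intel/AMD
    "amd64": "linux/amd64",
}

def needs_cross_build(host_machine: str, target_platform: str = "linux/amd64") -> bool:
    """True when `docker buildx build --platform` is required to produce
    an image runnable on the target (e.g. GKE amd64 nodes)."""
    return DOCKER_PLATFORMS.get(host_machine.lower()) != target_platform

print(f"host: {platform.machine()}, cross-build needed: {needs_cross_build(platform.machine())}")
```

On an Apple Silicon laptop `needs_cross_build("arm64")` is true; on a linux/amd64 CI runner no cross-build is needed, which is one argument for moving builds into CI.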
Decision 3: Use GCP Cloud SQL (Existing) + Memorystore (New)
Status: 🔄 In Progress - Cloud SQL ✅ Ready, Redis ⏳ Creating
Context: We attempted Kubernetes StatefulSets first, then pivoted to managed services after realizing that OpenTofu modules for them already existed.
Actions Taken:
# Discovered existing Cloud SQL instance (from earlier work)
gcloud sql instances list --project=coditect-cloud-infra
# NAME VERSION IP STATUS
# coditect-db POSTGRES_16 10.28.0.3 RUNNABLE
# Created Redis Memorystore instance
gcloud redis instances create coditect-redis-staging \
--size=1 \
--region=us-central1 \
--redis-version=redis_7_0 \
--project=coditect-cloud-infra
# Status: Creating (5-10 minutes)
# Cleaned up manual StatefulSets
kubectl delete statefulset postgresql redis -n coditect-staging
kubectl delete pvc postgres-storage-postgresql-0 redis-storage-redis-0 -n coditect-staging
Outcome:
- Cloud SQL: ✅ Ready with coditect database
- Redis: ⏳ Provisioning (async operation)
- Kubernetes: ✅ Manual resources cleaned up
Consequences:
- ✅ Production-grade infrastructure with automatic backups
- ✅ High availability and disaster recovery built-in
- ⚠️ Need to import Cloud SQL into OpenTofu state for IaC management
- ⚠️ Redis provisioning completes asynchronously (~10 minutes)
Decision 4: Dockerfile User Permissions Fix Required
Status: ⚠️ Blocked - Django Dependencies Not Accessible
Context: Docker image builds Python packages to /root/.local but runs as non-root user django (UID 1000), causing "ModuleNotFoundError: No module named 'django'".
Current Dockerfile (Lines 50-63):
# Copy Python packages from builder
COPY --from=builder /root/.local /root/.local
# Copy application code from builder
COPY --from=builder /app /app
# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH
# Create non-root user for security
RUN useradd -m -u 1000 django && \
chown -R django:django /app
# ❌ Can't access /root/.local (note: Dockerfile comments must start the line)
USER django
Required Fix:
# Copy Python packages to user-accessible location
COPY --from=builder /root/.local /home/django/.local
# Copy application code
COPY --from=builder /app /app
# Update PATH for user packages
ENV PATH=/home/django/.local/bin:$PATH
# Create non-root user and set ownership
RUN useradd -m -u 1000 django && \
chown -R django:django /app /home/django/.local
# ✅ Can access /home/django/.local
USER django
Consequences:
- ⚠️ BLOCKER: Application won't start until Dockerfile fixed
- ⚠️ Must rebuild and re-push image before deployment works
- ✅ Simple 3-line fix with immediate results
Implementation Summary
✅ Successfully Completed (95%)
- Docker Image Build
  - Python 3.12.12 (protobuf compatible)
  - Multi-stage build (builder + runtime)
  - Size optimized (136MB content)
  - Multi-platform (linux/amd64)
- Artifact Registry Migration
  - Repository created: us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend
  - IAM permissions configured
  - Docker authentication setup
  - Image successfully pushed
- Kubernetes Infrastructure
  - Namespace: coditect-staging
  - Service Account: coditect-cloud-backend
  - Secrets: backend-secrets (created with placeholder values)
  - Deployment: 2 replicas with health checks
  - Services: LoadBalancer + ClusterIP
- Database Infrastructure
  - Cloud SQL: ✅ Ready (coditect-db, 10.28.0.3)
  - Redis: ⏳ Creating (coditect-redis-staging)
⏸️ Pending Completion (5%)
- Dockerfile Fix - Change /root/.local → /home/django/.local
- Redis Provisioning - Wait for Memorystore creation
- Update Secrets - Add real Cloud SQL and Redis endpoints
- Rebuild Image - Push fixed Dockerfile to Artifact Registry
- Database Migrations - Run python manage.py migrate
- Restart Deployment - Apply updated secrets and image
- Smoke Tests - Verify health endpoints responding
Estimated Time to Completion: 30-45 minutes
Next Steps to Complete Staging
Immediate (Next Session - 30-45 minutes)
Step 1: Fix Dockerfile (5 minutes)
# Edit Dockerfile lines 50-63
# Change /root/.local → /home/django/.local
# Add chown for /home/django/.local
# Rebuild and push
docker buildx build --platform linux/amd64 \
-t us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging \
--push .
Step 2: Wait for Redis and Get Endpoint (5 minutes)
# Check Redis status
gcloud redis instances describe coditect-redis-staging \
--region=us-central1 \
--project=coditect-cloud-infra
# Get Redis host IP
REDIS_HOST=$(gcloud redis instances describe coditect-redis-staging \
--region=us-central1 \
--project=coditect-cloud-infra \
--format="value(host)")
Step 3: Update Kubernetes Secrets (2 minutes)
# Get DB password from existing secret
DB_PASSWORD=$(kubectl get secret backend-secrets -n coditect-staging \
-o jsonpath='{.data.db-password}' | base64 -d)
# Recreate secret with real endpoints
kubectl delete secret backend-secrets -n coditect-staging
kubectl create secret generic backend-secrets \
--namespace=coditect-staging \
--from-literal=django-secret-key="$(openssl rand -base64 64)" \
--from-literal=db-name="coditect" \
--from-literal=db-user="postgres" \
--from-literal=db-password="${DB_PASSWORD}" \
--from-literal=db-host="10.28.0.3" \
--from-literal=db-port="5432" \
--from-literal=redis-host="${REDIS_HOST}" \
--from-literal=redis-port="6379"
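These secret keys only take effect if the Django settings actually consume them. A hedged sketch of what that mapping might look like (the environment variable names, defaults, and helper are assumptions for illustration, not the project's actual settings.py):

```python
import os

def database_config(env=None):
    """Assemble Django's DATABASES['default'] from environment variables
    injected by the Kubernetes secret (names here are illustrative)."""
    env = os.environ if env is None else env
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": env.get("DB_NAME", "coditect"),
        "USER": env.get("DB_USER", "postgres"),
        "PASSWORD": env["DB_PASSWORD"],  # no default: fail fast if missing
        "HOST": env.get("DB_HOST", "127.0.0.1"),
        "PORT": int(env.get("DB_PORT", "5432")),
    }

# Example with values mirroring the staging secret above:
cfg = database_config({
    "DB_NAME": "coditect",
    "DB_USER": "postgres",
    "DB_PASSWORD": "example",
    "DB_HOST": "10.28.0.3",
    "DB_PORT": "5432",
})
```

One caveat worth checking: secret keys containing hyphens (db-host) do not make valid shell environment variable names, so the deployment likely maps them to underscore-style names via explicit `env` entries rather than `envFrom`.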
Step 4: Run Database Migrations (5 minutes)
# Create migration job
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: django-migrate
  namespace: coditect-staging
spec:
  template:
    spec:
      serviceAccountName: coditect-cloud-backend
      containers:
      - name: migrate
        image: us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging
        command: ["python", "manage.py", "migrate"]
        envFrom:
        - secretRef:
            name: backend-secrets
      restartPolicy: Never
  backoffLimit: 3
EOF
# Watch logs
kubectl logs -f job/django-migrate -n coditect-staging
Step 5: Restart Deployment (2 minutes)
# Restart to pick up new image and secrets
kubectl rollout restart deployment/coditect-backend -n coditect-staging
# Watch rollout
kubectl rollout status deployment/coditect-backend -n coditect-staging
# Verify pods running
kubectl get pods -n coditect-staging -l app=coditect-backend
Step 6: Run Smoke Tests (5 minutes)
# Get LoadBalancer IP
export STAGING_IP=$(kubectl get service coditect-backend -n coditect-staging \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# Test health endpoints
curl http://${STAGING_IP}/api/v1/health/live
# Expected: {"status": "healthy", "timestamp": "..."}
curl http://${STAGING_IP}/api/v1/health/ready
# Expected: {"status": "ready", "database": "connected", "redis": "connected"}
# Test authentication (should fail without token)
curl http://${STAGING_IP}/api/v1/licenses/
# Expected: 401 Unauthorized
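The expected responses above can be asserted programmatically rather than eyeballed in a terminal. A small sketch that validates the documented payload shapes (field names are taken from the expected output above; the helper names are illustrative):

```python
import json

def check_live(body: str) -> bool:
    """Liveness: the documented payload reports status == "healthy"."""
    return json.loads(body).get("status") == "healthy"

def check_ready(body: str) -> bool:
    """Readiness additionally requires database and redis connectivity."""
    payload = json.loads(body)
    return (
        payload.get("status") == "ready"
        and payload.get("database") == "connected"
        and payload.get("redis") == "connected"
    )
```

Feeding the curl output through checks like these (e.g. in the CI smoke-test step) turns "looks right" into a pass/fail signal.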
Production Readiness Issues and Challenges
1. Infrastructure as Code Gap
Issue: Infrastructure deployed via manual gcloud commands, not fully managed by OpenTofu.
Impact: Medium - Can drift from IaC definition, harder to reproduce environments
Remediation:
# Import existing Cloud SQL into OpenTofu state
cd opentofu/environments/staging
tofu init
tofu import module.cloudsql.google_sql_database_instance.main coditect-cloud-infra/coditect-db
# Import Redis when created
tofu import module.redis.google_redis_instance.main projects/coditect-cloud-infra/locations/us-central1/instances/coditect-redis-staging
# Apply full OpenTofu config to ensure consistency
tofu plan
tofu apply
Priority: P1 - Do before production deployment
2. Database User and Permissions
Issue: Using postgres superuser for application connection (discovered in secrets).
Impact: High - Security risk, violates least-privilege principle
Remediation:
# Create dedicated application user with limited permissions
gcloud sql users create license_api_staging \
--instance=coditect-db \
--password="$(openssl rand -base64 32)"
# Grant only required permissions
PGPASSWORD='...' psql -h 10.28.0.3 -U postgres -d coditect <<EOF
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO license_api_staging;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO license_api_staging;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO license_api_staging;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT USAGE, SELECT ON SEQUENCES TO license_api_staging;
EOF
# Update Kubernetes secret with new user
# Note: this replaces the entire secret - include every existing key, not just the two shown
kubectl create secret generic backend-secrets \
--namespace=coditect-staging \
--from-literal=db-user="license_api_staging" \
--from-literal=db-password="..." \
--dry-run=client -o yaml | kubectl apply -f -
Priority: P0 - Must fix before production
3. SSL/TLS for Cloud SQL Connections
Issue: Current connection to Cloud SQL uses private IP without SSL enforcement.
Impact: Medium - Data transmitted unencrypted within VPC (acceptable for staging, not production)
Remediation:
# Require SSL on Cloud SQL instance
gcloud sql instances patch coditect-db \
--require-ssl \
--project=coditect-cloud-infra
# Download the server CA certificate (clients verify the server against this)
gcloud sql instances describe coditect-db \
--project=coditect-cloud-infra \
--format="value(serverCaCert.cert)" > server-ca.pem
# Create Kubernetes secret with SSL cert
kubectl create secret generic cloudsql-ssl-cert \
--from-file=server-ca.pem \
--namespace=coditect-staging
# Update deployment to mount SSL cert
# Add to deployment.yaml:
# volumeMounts:
# - name: cloudsql-ssl-cert
#   mountPath: /etc/ssl/certs/cloudsql
# volumes:
# - name: cloudsql-ssl-cert
#   secret:
#     secretName: cloudsql-ssl-cert
# Update Django settings to use SSL
# settings.py (verify-ca makes the driver actually check the server cert):
# DATABASES['default']['OPTIONS'] = {
#     'sslmode': 'verify-ca',
#     'sslrootcert': '/etc/ssl/certs/cloudsql/server-ca.pem'
# }
Priority: P1 - Do before production
4. Redis Authentication and Encryption
Issue: Redis created without AUTH token or TLS encryption.
Impact: High - Unprotected cache accessible within VPC
Remediation:
# Recreate Redis with AUTH and TLS (transit encryption cannot be enabled on an existing instance)
gcloud redis instances create coditect-redis-staging-v2 \
--size=1 \
--region=us-central1 \
--redis-version=redis_7_0 \
--auth-enabled \
--transit-encryption-mode=SERVER_AUTH \
--project=coditect-cloud-infra
# Get AUTH string
REDIS_AUTH=$(gcloud redis instances describe coditect-redis-staging-v2 \
--region=us-central1 \
--format="value(authString)")
# Update Kubernetes secret
# Note: --dry-run + apply replaces the whole secret - include all existing keys
kubectl create secret generic backend-secrets \
--namespace=coditect-staging \
--from-literal=redis-password="${REDIS_AUTH}" \
--dry-run=client -o yaml | kubectl apply -f -
# Update Django settings
# settings.py:
# CACHES['default']['OPTIONS'] = {
#     'PASSWORD': os.getenv('REDIS_PASSWORD'),
#     'SSL': True,
#     'SSL_CA_CERTS': '/etc/ssl/certs/ca-certificates.crt'
# }
# Delete old insecure instance
gcloud redis instances delete coditect-redis-staging \
--region=us-central1 \
--async
Priority: P0 - Must fix before production
5. Secrets Management with Secret Manager
Issue: Secrets stored in Kubernetes secrets (base64 encoded, not encrypted at rest by default).
Impact: Medium - Not compliant with security best practices
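The "base64 is not encryption" point is easy to demonstrate: anyone with read access to the Secret object recovers plaintext in one call, with no key material involved.

```python
import base64

# A Kubernetes Secret stores values base64-encoded. Decoding requires
# no key, so this is encoding for transport, not encryption at rest.
encoded = base64.b64encode(b"s3cret-db-password").decode()
decoded = base64.b64decode(encoded).decode()
```

This is exactly what `kubectl get secret ... | base64 -d` does in Step 3 above, which is why RBAC on secrets plus Secret Manager (or application-layer encryption) matters.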
Remediation:
# Create secrets in GCP Secret Manager
echo -n "$(openssl rand -base64 64)" | gcloud secrets create django-secret-key \
--data-file=- \
--replication-policy=automatic \
--project=coditect-cloud-infra
echo -n "${DB_PASSWORD}" | gcloud secrets create db-password-staging \
--data-file=- \
--replication-policy=automatic
echo -n "${REDIS_AUTH}" | gcloud secrets create redis-password-staging \
--data-file=- \
--replication-policy=automatic
# Use Workload Identity for secret access
# 1. Enable Workload Identity on GKE cluster
# 2. Create GCP service account
# 3. Bind Kubernetes SA to GCP SA
# 4. Grant secretAccessor role
# Update deployment to use Secret Manager sidecar OR External Secrets Operator
# Preferred: External Secrets Operator
kubectl apply -f https://raw.githubusercontent.com/external-secrets/external-secrets/main/deploy/crds/bundle.yaml
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets \
external-secrets/external-secrets \
-n external-secrets-system \
--create-namespace
# Create SecretStore referencing Secret Manager
# Create ExternalSecret to sync to Kubernetes Secret
Priority: P1 - Do before production
6. Cloud KMS License Signing
Issue: License signing not implemented (no Cloud KMS integration yet).
Impact: Critical - Core functionality missing
Remediation:
# Create KMS keyring and key
gcloud kms keyrings create license-signing-keyring \
--location=us-central1 \
--project=coditect-cloud-infra
gcloud kms keys create license-signing-key \
--location=us-central1 \
--keyring=license-signing-keyring \
--purpose=asymmetric-signing \
--default-algorithm=rsa-sign-pkcs1-4096-sha512 \
--project=coditect-cloud-infra
# Grant signVerify permission to service account
gcloud kms keys add-iam-policy-binding license-signing-key \
--location=us-central1 \
--keyring=license-signing-keyring \
--member="serviceAccount:coditect-cloud-backend@coditect-cloud-infra.iam.gserviceaccount.com" \
--role="roles/cloudkms.signerVerifier"
# Update Django code to use Cloud KMS
# backend/services/license_signing.py:
# from google.cloud import kms
# def sign_license(license_data):
#     client = kms.KeyManagementServiceClient()
#     name = 'projects/.../locations/us-central1/keyRings/license-signing-keyring/cryptoKeys/license-signing-key/cryptoKeyVersions/1'
#     response = client.asymmetric_sign(name=name, digest=...)
#     return response.signature
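The `digest=...` placeholder above is where implementations commonly trip: for the rsa-sign-pkcs1-4096-sha512 algorithm, Cloud KMS expects the SHA-512 digest of the serialized license, not the raw payload. A hedged sketch (the payload shape and names are illustrative):

```python
import hashlib

def license_digest(license_bytes: bytes) -> bytes:
    """For rsa-sign-pkcs1-4096-sha512, Cloud KMS asymmetric_sign expects
    the SHA-512 digest of the payload rather than the payload itself."""
    return hashlib.sha512(license_bytes).digest()

digest = license_digest(b'{"customer": "acme", "seats": 10}')

# The digest is then handed to KMS (sketch only; requires google-cloud-kms):
# client.asymmetric_sign(
#     request={"name": key_version_name, "digest": {"sha512": digest}}
# )
```

Serializing the license deterministically (stable key order, no whitespace variance) before digesting is essential, or verification on the client will fail for byte-identical data.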
Priority: P0 - Must implement before production (core feature)
7. Identity Platform OAuth2
Issue: Authentication not implemented (no Firebase/Identity Platform setup).
Impact: Critical - Users can't authenticate
Remediation:
# Enable Identity Platform
gcloud services enable identitytoolkit.googleapis.com --project=coditect-cloud-infra
# Configure OAuth providers (Google, GitHub)
# Via Firebase Console OR gcloud (alpha)
gcloud alpha identity providers create google \
--client-id="..." \
--client-secret="..." \
--enabled
# Update Django settings for Firebase auth
# Install firebase-admin
# settings.py:
# FIREBASE_CONFIG = {
#     'apiKey': '...',
#     'authDomain': '...',
#     'projectId': 'coditect-cloud-infra'
# }
# Implement JWT verification middleware
# backend/middleware/firebase_auth.py
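The middleware at backend/middleware/firebase_auth.py will receive Firebase ID tokens, which are standard JWTs. A sketch of the token anatomy it has to handle (decoding only; real request handling must verify signatures, e.g. via firebase_admin.auth.verify_id_token(), and all names below are illustrative):

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """Decode (NOT verify) the claims segment of a JWT. Production
    middleware must verify the signature instead of trusting this."""
    _header, payload, _signature = token.split(".")
    padded = payload + "=" * (-len(payload) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))

# Build a toy token to show the three-segment structure (claims are made up):
claims = {"sub": "user-123", "email": "dev@example.com"}
segment = base64.urlsafe_b64encode(json.dumps(claims).encode()).rstrip(b"=").decode()
token = "eyJhbGciOiJSUzI1NiJ9." + segment + ".not-a-real-signature"
```

The middleware's real job is the part this sketch skips: checking the RS256 signature against Google's published keys and validating exp/aud/iss claims.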
Priority: P0 - Must implement before production (core feature)
8. Load Balancer and SSL Certificate
Issue: Using basic Kubernetes LoadBalancer, no SSL/HTTPS.
Impact: High - Production requires HTTPS
Remediation:
# Reserve static IP
# Reserve static IP (global, since external GCE Ingress uses a global load balancer)
gcloud compute addresses create coditect-backend-staging-ip \
--global \
--project=coditect-cloud-infra
# Create Google-managed SSL certificate
gcloud compute ssl-certificates create coditect-backend-staging-cert \
--domains=staging-api.coditect.ai \
--global \
--project=coditect-cloud-infra
# Update Service to use GCE Ingress + SSL
# Replace LoadBalancer service with:
# - ClusterIP service
# - GCE Ingress with SSL certificate
# - Cloud Armor for DDoS protection
# OR use GKE Ingress with managed certificate
# ingress.yaml:
# apiVersion: networking.k8s.io/v1
# kind: Ingress
# metadata:
#   annotations:
#     kubernetes.io/ingress.class: "gce"
#     networking.gke.io/managed-certificates: "coditect-backend-cert"
# spec:
#   rules:
#   - host: staging-api.coditect.ai
#     http:
#       paths:
#       - path: /*
#         pathType: ImplementationSpecific
#         backend:
#           service:
#             name: coditect-backend-internal
#             port:
#               number: 8000
Priority: P1 - Do before production
9. Monitoring and Alerting
Issue: No monitoring, logging, or alerting configured.
Impact: Medium - Can't detect/respond to issues
Remediation:
# Enable GKE monitoring (already enabled, verify config)
gcloud container clusters update coditect-cluster \
--logging=SYSTEM,WORKLOAD \
--monitoring=SYSTEM \
--region=us-central1 \
--project=coditect-cloud-infra
# Deploy Prometheus + Grafana (optional, supplement to GCP monitoring)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace=monitoring \
--create-namespace
# Create Cloud Monitoring dashboards
# - Request latency (p50, p95, p99)
# - Error rate
# - Active licenses
# - Database connection pool
# - Redis hit rate
# Create alerting policies
# - Error rate > 1%
# - Latency p99 > 500ms
# - Pod crash loop
# - Database connection errors
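The latency targets above (p50/p95/p99 dashboards, the "p99 > 500ms" alert) reduce to percentile math over request samples. A minimal nearest-rank sketch with made-up numbers, just to pin down what the thresholds mean:

```python
# Request latencies in milliseconds (fabricated sample for illustration).
latencies_ms = sorted([12, 15, 18, 22, 25, 31, 40, 55, 90, 480])

def percentile(sorted_samples, pct):
    """Nearest-rank percentile over an already-sorted sample list."""
    idx = max(0, round(pct / 100 * len(sorted_samples)) - 1)
    return sorted_samples[idx]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

Note how one 480ms outlier dominates p95/p99 while barely moving p50; that is why the alert is defined on p99 rather than on the average.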
Priority: P1 - Do before production
10. Disaster Recovery and Backups
Issue: Cloud SQL backups enabled but not tested. No Redis backup strategy.
Impact: High - Data loss risk if not verified
Remediation:
# Verify Cloud SQL automated backups
gcloud sql backups list --instance=coditect-db --project=coditect-cloud-infra
# Test Point-in-Time Recovery (PITR)
# Create test instance from backup
gcloud sql backups restore BACKUP_ID \
--restore-instance=coditect-db-pitr-test \
--backup-instance=coditect-db \
--project=coditect-cloud-infra
# Verify data integrity
# Delete test instance
# Redis backup strategy
# Option 1: Use Memorystore RDB persistence (verify it is enabled - it is opt-in, not default)
# Option 2: Export to Cloud Storage periodically
gcloud redis instances export gs://coditect-backups/redis/$(date +%Y%m%d).rdb \
coditect-redis-staging \
--region=us-central1
# Document disaster recovery runbook
# - RTO (Recovery Time Objective): 1 hour
# - RPO (Recovery Point Objective): 1 hour (hourly backups)
# - Automated failover for Cloud SQL HA
Priority: P1 - Test before production
11. CI/CD Pipeline
Issue: Manual Docker builds and deployments.
Impact: Medium - Slow deployment, human error risk
Remediation:
# .github/workflows/deploy-staging.yml
name: Deploy to Staging
on:
  push:
    branches: [main]
  workflow_dispatch:
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v1
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}
      - name: Set up Cloud SDK
        uses: google-github-actions/setup-gcloud@v1
      - name: Configure Docker for Artifact Registry
        run: gcloud auth configure-docker us-central1-docker.pkg.dev
      - name: Build and push Docker image
        run: |
          docker buildx build --platform linux/amd64 \
            -t us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:${{ github.sha }} \
            -t us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:staging-latest \
            --push .
      - name: Deploy to GKE
        run: |
          gcloud container clusters get-credentials coditect-cluster \
            --region=us-central1 \
            --project=coditect-cloud-infra
          kubectl set image deployment/coditect-backend \
            backend=us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:${{ github.sha }} \
            -n coditect-staging
          kubectl rollout status deployment/coditect-backend -n coditect-staging
      - name: Run smoke tests
        run: |
          export STAGING_IP=$(kubectl get service coditect-backend -n coditect-staging -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
          curl -f http://${STAGING_IP}/api/v1/health/live || exit 1
Priority: P2 - Nice to have for staging, P0 for production
Lessons Learned
1. Platform Deprecations Require Proactive Monitoring
Learning: GCR shutdown March 2025 caused deployment failure.
Action: Subscribe to GCP deprecation announcements, maintain technology inventory with EOL dates.
2. Multi-Platform Docker Builds Not Optional for GKE
Learning: Local macOS builds (arm64) don't work on GKE (linux/amd64).
Action: Always use --platform linux/amd64 flag. Consider GitHub Actions for CI builds to avoid platform issues.
3. Infrastructure as Code First, Not Afterthought
Learning: Manual gcloud commands created infrastructure not tracked in OpenTofu state.
Action: Start with OpenTofu/Terraform for all infrastructure. Import existing resources into state immediately.
4. Dockerfile User Permissions Critical for Security
Learning: Running as root works but is insecure. Running as non-root user requires careful PATH and ownership configuration.
Action: Test Dockerfile with non-root user from the start. Use docker run --user 1000 for local testing.
5. Managed Services Always Better Than StatefulSets
Learning: Initial approach used Kubernetes StatefulSets for PostgreSQL/Redis. Realized GCP managed services are production-ready with backups/HA.
Action: Default to managed services (Cloud SQL, Memorystore) unless there is a specific reason for self-managed. StatefulSets remain appropriate for stateful workloads without a managed equivalent, not for databases that have one.
References
Documentation
- Artifact Registry Documentation
- Cloud SQL for PostgreSQL
- Memorystore for Redis
- GKE Best Practices
- Docker Multi-Platform Builds
Related ADRs
- ADR-002: Cloud KMS for License Signing (pending)
- ADR-003: Identity Platform for Authentication (pending)
- ADR-004: Django REST Framework vs FastAPI (pending)
Deployment Artifacts
- Docker Image: us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging
- Cloud SQL: coditect-db (10.28.0.3)
- Redis: coditect-redis-staging (creating)
- Kubernetes Namespace: coditect-staging
Status: Implemented (95% Complete)
Next Review: After staging deployment complete
Last Updated: December 1, 2025, 1:30 AM EST