
ADR-001: Staging Deployment with Docker, Artifact Registry, and Managed GCP Services

Status: Implemented (95% Complete)
Date: December 1, 2025
Decision Makers: Hal Casteel, Claude Code
Context: Phase 2 completion - deploying Django backend to GKE staging for integration testing


Context and Problem Statement

After completing Phase 2 development (Django backend with 165+ tests, 72% coverage, 15+ API endpoints), we need to deploy to staging environment for integration testing before production launch. This requires containerization, registry setup, Kubernetes deployment, and database/cache infrastructure.

Key Requirements:

  • Production-ready Docker image with Python 3.12
  • Container registry for GKE image pull
  • PostgreSQL database for license storage
  • Redis cache for session/seat tracking
  • Kubernetes manifests for deployment
  • Managed services (not self-managed StatefulSets)

Decision Drivers

  1. GCR Deprecation - Google Container Registry shut down March 18, 2025
  2. Multi-Platform Builds - Local macOS builds (arm64) incompatible with GKE (linux/amd64)
  3. Infrastructure as Code - Existing OpenTofu modules for managed services
  4. Production Readiness - Managed services preferred over self-managed
  5. Time Constraints - 2-week timeline to production launch

Considered Options

Option 1: Kubernetes StatefulSets (Initial Approach)

  • Pros: Fast deployment, full control
  • Cons: Manual management, no backups, not production-ready
  • Decision: ❌ Rejected after midnight pivot

Option 2: GCP Managed Services via OpenTofu (Chosen)

  • Pros: Production-ready, automatic backups/HA, Infrastructure as Code
  • Cons: Longer provisioning time (10-15 minutes)
  • Decision: ✅ Chosen - proper approach

Option 3: GCP Managed Services via gcloud (Hybrid)

  • Pros: Quick deployment, production-ready
  • Cons: Not fully managed by IaC
  • Decision: ✅ Temporary - will migrate to OpenTofu later

Decisions Made

Decision 1: Migrate from GCR to Artifact Registry

Status: ✅ Implemented

Context: GCR was shut down March 18, 2025. All image pull attempts returned 403 Forbidden or 401 Unauthorized errors.

Actions Taken:

# Enable Artifact Registry API
gcloud services enable artifactregistry.googleapis.com --project=coditect-cloud-infra

# Create repository
gcloud artifacts repositories create coditect-backend \
--repository-format=docker \
--location=us-central1 \
--project=coditect-cloud-infra

# Configure Docker authentication
gcloud auth configure-docker us-central1-docker.pkg.dev

# Grant IAM permissions to GKE compute service account
gcloud projects add-iam-policy-binding coditect-cloud-infra \
--member="serviceAccount:374018874256-compute@developer.gserviceaccount.com" \
--role="roles/artifactregistry.reader"

Outcome:

  • Image successfully pushed: us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging
  • Digest: sha256:ebca8fb332ffcbcbb6125f6a2b121d5ece38ac47ea97a90872f0c9cbaa3baa69
  • GKE pods successfully pull images

Consequences:

  • ✅ All future projects MUST use Artifact Registry
  • ✅ Existing GCR documentation is obsolete
  • ⚠️ Update all deployment guides to reference Artifact Registry

Decision 2: Multi-Platform Docker Builds for GKE

Status: ✅ Implemented

Context: Initial Docker builds on macOS (arm64) caused "no match for platform" errors when GKE (linux/amd64) tried to pull images.

Actions Taken:

# Build for linux/amd64 platform and push directly to registry
docker buildx build --platform linux/amd64 \
-t us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging \
--push .

Outcome:

  • Image compatible with GKE nodes
  • Build time: ~5-7 minutes (multi-stage with 75+ MB dependencies)
  • Size: 737MB disk, 136MB content

Consequences:

  • ✅ CI/CD must use --platform linux/amd64 flag
  • ✅ Update Dockerfile documentation with platform requirements
  • ⚠️ Local testing requires emulation (slower)

Decision 3: Use GCP Cloud SQL (Existing) + Memorystore (New)

Status: 🔄 In Progress - Cloud SQL ✅ Ready, Redis ⏳ Creating

Context: We attempted Kubernetes StatefulSets first, then pivoted to managed services after realizing the existing OpenTofu modules already covered them.

Actions Taken:

# Discovered existing Cloud SQL instance (from earlier work)
gcloud sql instances list --project=coditect-cloud-infra
# NAME VERSION IP STATUS
# coditect-db POSTGRES_16 10.28.0.3 RUNNABLE

# Created Redis Memorystore instance
gcloud redis instances create coditect-redis-staging \
--size=1 \
--region=us-central1 \
--redis-version=redis_7_0 \
--project=coditect-cloud-infra
# Status: Creating (5-10 minutes)

# Cleaned up manual StatefulSets
kubectl delete statefulset postgresql redis -n coditect-staging
kubectl delete pvc postgres-storage-postgresql-0 redis-storage-redis-0 -n coditect-staging

Outcome:

  • Cloud SQL: ✅ Ready with coditect database
  • Redis: ⏳ Provisioning (async operation)
  • Kubernetes: ✅ Manual resources cleaned up

Consequences:

  • ✅ Production-grade infrastructure with automatic backups
  • ✅ High availability and disaster recovery built-in
  • ⚠️ Need to import Cloud SQL into OpenTofu state for IaC management
  • ⚠️ Redis provisioning completes asynchronously (~10 minutes)

Decision 4: Dockerfile User Permissions Fix Required

Status: ⚠️ Blocked - Django Dependencies Not Accessible

Context: Docker image builds Python packages to /root/.local but runs as non-root user django (UID 1000), causing "ModuleNotFoundError: No module named 'django'".

Current Dockerfile (Lines 50-63):

# Copy Python packages from builder
COPY --from=builder /root/.local /root/.local

# Copy application code from builder
COPY --from=builder /app /app

# Make sure scripts in .local are usable
ENV PATH=/root/.local/bin:$PATH

# Create non-root user for security
RUN useradd -m -u 1000 django && \
chown -R django:django /app

# ❌ django user cannot access /root/.local
USER django

Required Fix:

# Copy Python packages to user-accessible location
COPY --from=builder /root/.local /home/django/.local

# Copy application code
COPY --from=builder /app /app

# Update PATH for user packages
ENV PATH=/home/django/.local/bin:$PATH

# Create non-root user and set ownership
RUN useradd -m -u 1000 django && \
chown -R django:django /app /home/django/.local

# ✅ django user can access /home/django/.local
USER django

Consequences:

  • ⚠️ BLOCKER: Application won't start until Dockerfile fixed
  • ⚠️ Must rebuild and re-push image before deployment works
  • ✅ Simple 3-line fix with immediate results
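The failure mode can be reproduced from the Python side: packages installed with `pip install --user` land under the installing user's home, and the interpreter resolves that location from the current user's HOME. A minimal illustration:

```python
import site
import sys

# Per-user packages resolve relative to the current user's home directory,
# so packages installed under /root/.local are invisible to a process whose
# HOME is /home/django. The interpreter only searches these locations:
user_site = site.getusersitepackages()   # derived from the current HOME
search_paths = sys.path                  # /root/.local is absent for UID 1000
```

Running this as root vs. as the django user shows two different `user_site` values, which is why relocating the packages (or fixing HOME/PYTHONPATH) resolves the ModuleNotFoundError.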

Implementation Summary

✅ Successfully Completed (95%)

  1. Docker Image Build

    • Python 3.12.12 (protobuf compatible)
    • Multi-stage build (builder + runtime)
    • Size optimized (136MB content)
    • Multi-platform (linux/amd64)
  2. Artifact Registry Migration

    • Repository created: us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend
    • IAM permissions configured
    • Docker authentication setup
    • Image successfully pushed
  3. Kubernetes Infrastructure

    • Namespace: coditect-staging
    • Service Account: coditect-cloud-backend
    • Secrets: backend-secrets (created with placeholder values)
    • Deployment: 2 replicas with health checks
    • Services: LoadBalancer + ClusterIP
  4. Database Infrastructure

    • Cloud SQL: ✅ Ready (coditect-db, 10.28.0.3)
    • Redis: ⏳ Creating (coditect-redis-staging)

⏸️ Pending Completion (5%)

  1. Dockerfile Fix - Change /root/.local → /home/django/.local
  2. Redis Provisioning - Wait for Memorystore creation
  3. Update Secrets - Add real Cloud SQL and Redis endpoints
  4. Rebuild Image - Push fixed Dockerfile to Artifact Registry
  5. Database Migrations - Run python manage.py migrate
  6. Restart Deployment - Apply updated secrets and image
  7. Smoke Tests - Verify health endpoints responding

Estimated Time to Completion: 30-45 minutes


Next Steps to Complete Staging

Immediate (Next Session - 30-45 minutes)

Step 1: Fix Dockerfile (5 minutes)

# Edit Dockerfile lines 50-63
# Change /root/.local → /home/django/.local
# Add chown for /home/django/.local

# Rebuild and push
docker buildx build --platform linux/amd64 \
-t us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging \
--push .

Step 2: Wait for Redis and Get Endpoint (5 minutes)

# Check Redis status
gcloud redis instances describe coditect-redis-staging \
--region=us-central1 \
--project=coditect-cloud-infra

# Get Redis host IP
REDIS_HOST=$(gcloud redis instances describe coditect-redis-staging \
--region=us-central1 \
--project=coditect-cloud-infra \
--format="value(host)")

Step 3: Update Kubernetes Secrets (2 minutes)

# Get DB password from existing secret
DB_PASSWORD=$(kubectl get secret backend-secrets -n coditect-staging \
-o jsonpath='{.data.db-password}' | base64 -d)

# Recreate secret with real endpoints
kubectl delete secret backend-secrets -n coditect-staging
kubectl create secret generic backend-secrets \
--namespace=coditect-staging \
--from-literal=django-secret-key="$(openssl rand -base64 64)" \
--from-literal=db-name="coditect" \
--from-literal=db-user="postgres" \
--from-literal=db-password="${DB_PASSWORD}" \
--from-literal=db-host="10.28.0.3" \
--from-literal=db-port="5432" \
--from-literal=redis-host="${REDIS_HOST}" \
--from-literal=redis-port="6379"
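On the Django side these secret values arrive as environment variables. A hedged sketch of the corresponding settings.py plumbing — the variable names (DB_HOST, etc.) are assumptions, since envFrom exposes the secret keys verbatim and dashed keys like db-host may need remapping to valid env-var names:

```python
import os

# Hypothetical settings.py helper: map environment variables injected from
# the backend-secrets Kubernetes secret into Django's DATABASES entry.
def database_config(env=os.environ):
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": env.get("DB_NAME", "coditect"),
        "USER": env.get("DB_USER", "postgres"),
        "PASSWORD": env.get("DB_PASSWORD", ""),
        "HOST": env.get("DB_HOST", "127.0.0.1"),
        "PORT": env.get("DB_PORT", "5432"),
    }
```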

Step 4: Run Database Migrations (5 minutes)

# Create migration job
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: django-migrate
  namespace: coditect-staging
spec:
  backoffLimit: 3
  template:
    spec:
      serviceAccountName: coditect-cloud-backend
      containers:
        - name: migrate
          image: us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging
          command: ["python", "manage.py", "migrate"]
          envFrom:
            - secretRef:
                name: backend-secrets
      restartPolicy: Never
EOF

# Watch logs
kubectl logs -f job/django-migrate -n coditect-staging

Step 5: Restart Deployment (2 minutes)

# Restart to pick up new image and secrets
kubectl rollout restart deployment/coditect-backend -n coditect-staging

# Watch rollout
kubectl rollout status deployment/coditect-backend -n coditect-staging

# Verify pods running
kubectl get pods -n coditect-staging -l app=coditect-backend

Step 6: Run Smoke Tests (5 minutes)

# Get LoadBalancer IP
export STAGING_IP=$(kubectl get service coditect-backend -n coditect-staging \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Test health endpoints
curl http://${STAGING_IP}/api/v1/health/live
# Expected: {"status": "healthy", "timestamp": "..."}

curl http://${STAGING_IP}/api/v1/health/ready
# Expected: {"status": "ready", "database": "connected", "redis": "connected"}

# Test authentication (should fail without token)
curl http://${STAGING_IP}/api/v1/licenses/
# Expected: 401 Unauthorized
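The curls above can also be scripted for repeatable smoke tests. A minimal sketch, assuming the endpoint paths and response shapes shown above:

```python
import json
import urllib.error
import urllib.request

def check(url, timeout=10):
    """GET a URL; return (status_code, body), treating HTTP errors as results."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()

def is_healthy(body):
    """True if a health payload reports a healthy/ready status."""
    try:
        payload = json.loads(body)
    except (json.JSONDecodeError, TypeError):
        return False
    if not isinstance(payload, dict):
        return False
    return payload.get("status") in ("healthy", "ready")

# Intended use against the LoadBalancer IP, mirroring the curls above:
#   status, body = check(f"http://{STAGING_IP}/api/v1/health/live")
#   assert status == 200 and is_healthy(body)
#   assert check(f"http://{STAGING_IP}/api/v1/licenses/")[0] == 401
```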

Production Readiness Issues and Challenges

1. Infrastructure as Code Gap

Issue: Infrastructure deployed via manual gcloud commands, not fully managed by OpenTofu.

Impact: Medium - Can drift from IaC definition, harder to reproduce environments

Remediation:

# Import existing Cloud SQL into OpenTofu state
cd opentofu/environments/staging
tofu init
tofu import module.cloudsql.google_sql_database_instance.main coditect-cloud-infra/coditect-db

# Import Redis when created
tofu import module.redis.google_redis_instance.main projects/coditect-cloud-infra/locations/us-central1/instances/coditect-redis-staging

# Apply full OpenTofu config to ensure consistency
tofu plan
tofu apply

Priority: P1 - Do before production deployment


2. Database User and Permissions

Issue: Using postgres superuser for application connection (discovered in secrets).

Impact: High - Security risk, violates least-privilege principle

Remediation:

# Create dedicated application user with limited permissions
gcloud sql users create license_api_staging \
--instance=coditect-db \
--password="$(openssl rand -base64 32)"

# Grant only required permissions
PGPASSWORD='...' psql -h 10.28.0.3 -U postgres -d coditect <<EOF
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO license_api_staging;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO license_api_staging;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO license_api_staging;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT USAGE, SELECT ON SEQUENCES TO license_api_staging;
EOF

# Update Kubernetes secret with new user
kubectl create secret generic backend-secrets \
--from-literal=db-user="license_api_staging" \
--from-literal=db-password="..." \
--dry-run=client -o yaml | kubectl apply -f -

Priority: P0 - Must fix before production


3. SSL/TLS for Cloud SQL Connections

Issue: Current connection to Cloud SQL uses private IP without SSL enforcement.

Impact: Medium - Data transmitted unencrypted within VPC (acceptable for staging, not production)

Remediation:

# Require SSL on Cloud SQL instance
gcloud sql instances patch coditect-db \
--require-ssl \
--project=coditect-cloud-infra

# Download the server CA certificate (ssl-certs create produces a client
# certificate; the CA used to verify the server comes from the instance)
gcloud sql instances describe coditect-db \
--project=coditect-cloud-infra \
--format="value(serverCaCert.cert)" > server-ca.pem

# Create Kubernetes secret with SSL cert
kubectl create secret generic cloudsql-ssl-cert \
--from-file=server-ca.pem \
--namespace=coditect-staging

# Update deployment to mount SSL cert
# Add to deployment.yaml:
#   volumeMounts:
#     - name: cloudsql-ssl-cert
#       mountPath: /etc/ssl/certs/cloudsql
#   volumes:
#     - name: cloudsql-ssl-cert
#       secret:
#         secretName: cloudsql-ssl-cert

# Update Django settings to use SSL
# settings.py:
#   DATABASES['default']['OPTIONS'] = {
#       'sslmode': 'require',
#       'sslrootcert': '/etc/ssl/certs/cloudsql/server-ca.pem'
#   }
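In settings.py this can be wired up defensively so staging keeps working before the cert is mounted. A hedged sketch — the helper name and fallback behavior are assumptions, not the current implementation; `verify-ca` is used when the CA is present so libpq actually validates the server against it:

```python
import os

# Hypothetical helper: enable strict SSL options only when the CA bundle
# has been mounted (the path matches the cloudsql-ssl-cert volume above).
def postgres_ssl_options(ca_path="/etc/ssl/certs/cloudsql/server-ca.pem"):
    if os.path.exists(ca_path):
        return {"sslmode": "verify-ca", "sslrootcert": ca_path}
    return {"sslmode": "prefer"}
```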

Priority: P1 - Do before production


4. Redis Authentication and Encryption

Issue: Redis created without AUTH token or TLS encryption.

Impact: High - Unprotected cache accessible within VPC

Remediation:

# Recreate Redis with AUTH and TLS (transit encryption cannot be enabled on an existing instance)
gcloud redis instances create coditect-redis-staging-v2 \
--size=1 \
--region=us-central1 \
--redis-version=redis_7_0 \
--auth-enabled \
--transit-encryption-mode=SERVER_AUTH \
--project=coditect-cloud-infra

# Get AUTH string
REDIS_AUTH=$(gcloud redis instances describe coditect-redis-staging-v2 \
--region=us-central1 \
--format="value(authString)")

# Update Kubernetes secret
kubectl create secret generic backend-secrets \
--from-literal=redis-password="${REDIS_AUTH}" \
--dry-run=client -o yaml | kubectl apply -f -

# Update Django settings
# settings.py:
#   CACHES['default']['OPTIONS'] = {
#       'PASSWORD': os.getenv('REDIS_PASSWORD'),
#       'SSL': True,
#       'SSL_CA_CERTS': '/etc/ssl/certs/ca-certificates.crt'
#   }

# Delete old insecure instance
gcloud redis instances delete coditect-redis-staging \
--region=us-central1 \
--async
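One way to wire the AUTH string and TLS into Django is to assemble a rediss:// URL for the cache LOCATION. A minimal sketch — the helper name and URL layout are assumptions; with Memorystore transit encryption the scheme is rediss:// and the AUTH string travels as the password component:

```python
# Hypothetical helper: build the Redis connection URL consumed by
# Django / django-redis via CACHES['default']['LOCATION'].
def redis_url(host, auth, port=6379, db=0, tls=True):
    scheme = "rediss" if tls else "redis"
    cred = f":{auth}@" if auth else ""
    return f"{scheme}://{cred}{host}:{port}/{db}"
```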

Priority: P0 - Must fix before production


5. Secrets Management with Secret Manager

Issue: Secrets stored in Kubernetes secrets (base64 encoded, not encrypted at rest by default).

Impact: Medium - Not compliant with security best practices

Remediation:

# Create secrets in GCP Secret Manager
echo -n "$(openssl rand -base64 64)" | gcloud secrets create django-secret-key \
--data-file=- \
--replication-policy=automatic \
--project=coditect-cloud-infra

echo -n "${DB_PASSWORD}" | gcloud secrets create db-password-staging \
--data-file=- \
--replication-policy=automatic

echo -n "${REDIS_AUTH}" | gcloud secrets create redis-password-staging \
--data-file=- \
--replication-policy=automatic

# Use Workload Identity for secret access
# 1. Enable Workload Identity on GKE cluster
# 2. Create GCP service account
# 3. Bind Kubernetes SA to GCP SA
# 4. Grant secretAccessor role

# Update deployment to use Secret Manager sidecar OR External Secrets Operator
# Preferred: External Secrets Operator
kubectl apply -f https://raw.githubusercontent.com/external-secrets/external-secrets/main/deploy/crds/bundle.yaml
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets \
external-secrets/external-secrets \
-n external-secrets-system \
--create-namespace

# Create SecretStore referencing Secret Manager
# Create ExternalSecret to sync to Kubernetes Secret

Priority: P1 - Do before production


6. Cloud KMS License Signing

Issue: License signing not implemented (no Cloud KMS integration yet).

Impact: Critical - Core functionality missing

Remediation:

# Create KMS keyring and key
gcloud kms keyrings create license-signing-keyring \
--location=us-central1 \
--project=coditect-cloud-infra

gcloud kms keys create license-signing-key \
--location=us-central1 \
--keyring=license-signing-keyring \
--purpose=asymmetric-signing \
--default-algorithm=rsa-sign-pkcs1-4096-sha512 \
--project=coditect-cloud-infra

# Grant signVerify permission to service account
gcloud kms keys add-iam-policy-binding license-signing-key \
--location=us-central1 \
--keyring=license-signing-keyring \
--member="serviceAccount:coditect-cloud-backend@coditect-cloud-infra.iam.gserviceaccount.com" \
--role="roles/cloudkms.signerVerifier"

# Update Django code to use Cloud KMS
# backend/services/license_signing.py:
#   from google.cloud import kms
#
#   def sign_license(license_data):
#       client = kms.KeyManagementServiceClient()
#       name = 'projects/.../locations/us-central1/keyRings/license-signing-keyring/cryptoKeys/license-signing-key/cryptoKeyVersions/1'
#       response = client.asymmetric_sign(name=name, digest=...)
#       return response.signature
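Fleshing out that sketch — a hedged Python version of the signing flow, assuming the google-cloud-kms client library. For the RSA-PKCS1-4096-SHA512 algorithm chosen above, KMS signs a SHA-512 digest computed client-side:

```python
import hashlib

def key_version_name(project, location, keyring, key, version=1):
    """Compose the KMS crypto key version resource name."""
    return (f"projects/{project}/locations/{location}/keyRings/{keyring}"
            f"/cryptoKeys/{key}/cryptoKeyVersions/{version}")

def license_digest(license_data: bytes) -> bytes:
    """SHA-512 digest of the license payload, as KMS expects for this algorithm."""
    return hashlib.sha512(license_data).digest()

def sign_license(license_data: bytes, name: str) -> bytes:
    # Requires google-cloud-kms and GCP credentials; not runnable locally.
    from google.cloud import kms
    client = kms.KeyManagementServiceClient()
    response = client.asymmetric_sign(
        request={"name": name, "digest": {"sha512": license_digest(license_data)}}
    )
    return response.signature
```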

Priority: P0 - Must implement before production (core feature)


7. Identity Platform OAuth2

Issue: Authentication not implemented (no Firebase/Identity Platform setup).

Impact: Critical - Users can't authenticate

Remediation:

# Enable Identity Platform
gcloud services enable identitytoolkit.googleapis.com --project=coditect-cloud-infra

# Configure OAuth providers (Google, GitHub)
# Typically configured via the Firebase / Identity Platform console or the
# Identity Toolkit Admin API; gcloud CLI coverage for provider setup is limited

# Update Django settings for Firebase auth
# Install firebase-admin
# settings.py:
#   FIREBASE_CONFIG = {
#       'apiKey': '...',
#       'authDomain': '...',
#       'projectId': 'coditect-cloud-infra'
#   }

# Implement JWT verification middleware
# backend/middleware/firebase_auth.py
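A hedged sketch of what backend/middleware/firebase_auth.py could contain, assuming firebase-admin for token verification (the function names are illustrative, not the real module):

```python
def extract_bearer_token(authorization_header):
    """Return the token from 'Bearer <token>', or None if absent/malformed."""
    if not authorization_header:
        return None
    parts = authorization_header.split()
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return None
    return parts[1]

def verify_request(authorization_header):
    # Requires firebase-admin and Identity Platform configured; not runnable here.
    from firebase_admin import auth
    token = extract_bearer_token(authorization_header)
    if token is None:
        return None  # caller should respond 401 Unauthorized
    return auth.verify_id_token(token)  # decoded claims dict
```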

Priority: P0 - Must implement before production (core feature)


8. Load Balancer and SSL Certificate

Issue: Using basic Kubernetes LoadBalancer, no SSL/HTTPS.

Impact: High - Production requires HTTPS

Remediation:

# Reserve static IP
gcloud compute addresses create coditect-backend-staging-ip \
--region=us-central1 \
--project=coditect-cloud-infra

# Create Google-managed SSL certificate
gcloud compute ssl-certificates create coditect-backend-staging-cert \
--domains=staging-api.coditect.ai \
--global \
--project=coditect-cloud-infra

# Update Service to use GCE Ingress + SSL
# Replace LoadBalancer service with:
# - ClusterIP service
# - GCE Ingress with SSL certificate
# - Cloud Armor for DDoS protection

# OR use GKE Ingress with managed certificate
# ingress.yaml:
#   apiVersion: networking.k8s.io/v1
#   kind: Ingress
#   metadata:
#     annotations:
#       kubernetes.io/ingress.class: "gce"
#       networking.gke.io/managed-certificates: "coditect-backend-cert"
#   spec:
#     rules:
#       - host: staging-api.coditect.ai
#         http:
#           paths:
#             - path: /*
#               pathType: ImplementationSpecific
#               backend:
#                 service:
#                   name: coditect-backend-internal
#                   port:
#                     number: 8000

Priority: P1 - Do before production


9. Monitoring and Alerting

Issue: No monitoring, logging, or alerting configured.

Impact: Medium - Can't detect/respond to issues

Remediation:

# Enable GKE monitoring (already enabled, verify config)
gcloud container clusters update coditect-cluster \
--logging=SYSTEM,WORKLOAD \
--monitoring=SYSTEM \
--region=us-central1 \
--project=coditect-cloud-infra

# Deploy Prometheus + Grafana (optional, supplement to GCP monitoring)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace=monitoring \
--create-namespace

# Create Cloud Monitoring dashboards
# - Request latency (p50, p95, p99)
# - Error rate
# - Active licenses
# - Database connection pool
# - Redis hit rate

# Create alerting policies
# - Error rate > 1%
# - Latency p99 > 500ms
# - Pod crash loop
# - Database connection errors

Priority: P1 - Do before production


10. Disaster Recovery and Backups

Issue: Cloud SQL backups enabled but not tested. No Redis backup strategy.

Impact: High - Data loss risk if not verified

Remediation:

# Verify Cloud SQL automated backups
gcloud sql backups list --instance=coditect-db --project=coditect-cloud-infra

# Test Point-in-Time Recovery (PITR)
# Create test instance from backup
gcloud sql backups restore BACKUP_ID \
--backup-instance=coditect-db \
--restore-instance=coditect-db-pitr-test \
--project=coditect-cloud-infra

# Verify data integrity
# Delete test instance

# Redis backup strategy
# Option 1: Use Redis persistence (RDB snapshots) - enabled by default on Memorystore
# Option 2: Export to Cloud Storage periodically
gcloud redis instances export gs://coditect-backups/redis/$(date +%Y%m%d).rdb \
coditect-redis-staging \
--region=us-central1

# Document disaster recovery runbook
# - RTO (Recovery Time Objective): 1 hour
# - RPO (Recovery Point Objective): 1 hour (hourly backups)
# - Automated failover for Cloud SQL HA

Priority: P1 - Test before production


11. CI/CD Pipeline

Issue: Manual Docker builds and deployments.

Impact: Medium - Slow deployment, human error risk

Remediation:

# .github/workflows/deploy-staging.yml
name: Deploy to Staging

on:
  push:
    branches: [main]
  workflow_dispatch:

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v1
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}

      - name: Set up Cloud SDK
        uses: google-github-actions/setup-gcloud@v1

      - name: Configure Docker for Artifact Registry
        run: gcloud auth configure-docker us-central1-docker.pkg.dev

      - name: Build and push Docker image
        run: |
          docker buildx build --platform linux/amd64 \
            -t us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:${{ github.sha }} \
            -t us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:staging-latest \
            --push .

      - name: Deploy to GKE
        run: |
          gcloud container clusters get-credentials coditect-cluster \
            --region=us-central1 \
            --project=coditect-cloud-infra

          kubectl set image deployment/coditect-backend \
            backend=us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:${{ github.sha }} \
            -n coditect-staging

          kubectl rollout status deployment/coditect-backend -n coditect-staging

      - name: Run smoke tests
        run: |
          export STAGING_IP=$(kubectl get service coditect-backend -n coditect-staging -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
          curl -f http://${STAGING_IP}/api/v1/health/live || exit 1
Priority: P2 - Nice to have for staging, P0 for production


Lessons Learned

1. Platform Deprecations Require Proactive Monitoring

Learning: GCR shutdown March 2025 caused deployment failure.

Action: Subscribe to GCP deprecation announcements, maintain technology inventory with EOL dates.


2. Multi-Platform Docker Builds Not Optional for GKE

Learning: Local macOS builds (arm64) don't work on GKE (linux/amd64).

Action: Always use --platform linux/amd64 flag. Consider GitHub Actions for CI builds to avoid platform issues.


3. Infrastructure as Code First, Not Afterthought

Learning: Manual gcloud commands created infrastructure not tracked in OpenTofu state.

Action: Start with OpenTofu/Terraform for all infrastructure. Import existing resources into state immediately.


4. Dockerfile User Permissions Critical for Security

Learning: Running as root works but is insecure. Running as non-root user requires careful PATH and ownership configuration.

Action: Test Dockerfile with non-root user from the start. Use docker run --user 1000 for local testing.


5. Prefer Managed Services over StatefulSets for Databases

Learning: The initial approach used Kubernetes StatefulSets for PostgreSQL/Redis before we realized GCP managed services are production-ready with backups and HA built in.

Action: Default to managed services (Cloud SQL, Memorystore) unless there is a specific reason to self-manage. StatefulSets suit stateful application workloads, not databases with mature managed equivalents.


References

Documentation

  • ADR-002: Cloud KMS for License Signing (pending)
  • ADR-003: Identity Platform for Authentication (pending)
  • ADR-004: Django REST Framework vs FastAPI (pending)

Deployment Artifacts

  • Docker Image: us-central1-docker.pkg.dev/coditect-cloud-infra/coditect-backend/coditect-cloud-backend:v1.0.0-staging
  • Cloud SQL: coditect-db (10.28.0.3)
  • Redis: coditect-redis-staging (creating)
  • Kubernetes Namespace: coditect-staging

Status: Implemented (95% Complete)
Next Review: After staging deployment complete
Last Updated: December 1, 2025, 1:30 AM EST