ADR-020: GCP Cloud Run Deployment Strategy

Status: Accepted
Date: 2025-10-06
Deciders: Development Team, DevOps Team, Infrastructure Team
Related: ADR-016 (NGINX), ADR-017 (WebSocket), ADR-004 (FoundationDB)

Context

The AZ1.AI LLM IDE requires a cloud deployment strategy for production use. We need:

Scalability: Handle 1000+ concurrent users
Cost-Efficiency: Pay for actual usage, not idle capacity
Global Reach: Low latency worldwide
Easy Deployment: Simple CI/CD pipeline
WebSocket Support: For real-time communication
Stateful Services: FoundationDB, Redis, file storage

Current State

Local development with Docker Compose
No production deployment infrastructure
No CI/CD pipeline
Manual deployments

Requirements

Auto-Scaling: Scale from 0 to 1000+ instances
Global CDN: Fast content delivery worldwide
Managed Services: Minimize operational overhead
Cost Control: Budget-friendly for startup
Security: SSL/TLS, IAM, VPC
Monitoring: Metrics, logs, alerts
High Availability: 99.9% uptime SLA

Decision

We will deploy to Google Cloud Platform using Cloud Run with supporting managed services:

Architecture

┌────────────────────────────────────────────────────────────────┐
│                         Global CDN                             │
│                    (Cloud CDN + Cloud Armor)                   │
└──────────────────────────┬─────────────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────────────┐
│                  Global Load Balancer                          │
│              (Cloud Load Balancing - HTTPS)                    │
└──────────────────────────┬─────────────────────────────────────┘
                           │
                           ▼
┌────────────────────────────────────────────────────────────────┐
│                      Cloud Run Services                        │
│                                                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │    theia     │  │  WebSocket   │  │  MCP Gateway │        │
│  │   Frontend   │  │   Backend    │  │   Service    │        │
│  │ (Port 3000)  │  │ (Port 4000)  │  │              │        │
│  └──────────────┘  └──────────────┘  └──────────────┘        │
│         │                  │                  │                │
│         │                  │                  │                │
│         └──────────────────┼──────────────────┘                │
│                            │                                   │
└────────────────────────────┼───────────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────┐
│                    Managed Services                            │
│                                                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │FoundationDB  │  │  Memorystore │  │  Cloud       │        │
│  │   (VMs on    │  │   (Redis)    │  │  Storage     │        │
│  │  Compute     │  │              │  │  (Files)     │        │
│  │  Engine)     │  │              │  │              │        │
│  └──────────────┘  └──────────────┘  └──────────────┘        │
│                                                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐        │
│  │  Cloud SQL   │  │  Secret      │  │  Cloud       │        │
│  │ (PostgreSQL) │  │  Manager     │  │  Logging     │        │
│  │ (Metadata)   │  │              │  │              │        │
│  └──────────────┘  └──────────────┘  └──────────────┘        │
└────────────────────────────────────────────────────────────────┘

Service Breakdown

Compute:

Cloud Run: Theia frontend, WebSocket backend, MCP gateway
Compute Engine: FoundationDB cluster (3-5 VMs)

Storage:

Cloud Storage: User files, session data, static assets
Memorystore (Redis): Session cache, MCP response cache
Cloud SQL (PostgreSQL): User data, session metadata (alternative to FDB for some use cases)

Networking:

Cloud Load Balancing: Global HTTPS load balancer
Cloud CDN: Static asset caching
Cloud Armor: DDoS protection, WAF

Observability:

Cloud Logging: Centralized logs
Cloud Monitoring: Metrics, dashboards
Cloud Trace: Distributed tracing
Error Reporting: Exception tracking

Security:

Secret Manager: API keys, credentials
Identity Platform: User authentication
VPC: Private networking for services
Cloud IAM: Fine-grained access control

Implementation

1. Cloud Run Service Definitions

# cloud-run/theia-frontend.yaml

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: theia-frontend
  namespace: default
  annotations:
    run.googleapis.com/ingress: all
    run.googleapis.com/execution-environment: gen2
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: '1'
        autoscaling.knative.dev/maxScale: '100'
        run.googleapis.com/cpu-throttling: 'false'  # Important for WebSocket
        run.googleapis.com/startup-cpu-boost: 'true'
    spec:
      containerConcurrency: 80
      timeoutSeconds: 3600  # 1 hour for WebSocket connections
      containers:
      - name: theia
        image: gcr.io/PROJECT_ID/theia-frontend:latest
        ports:
        - name: http1
          containerPort: 3000
        env:
        - name: NODE_ENV
          value: production
        - name: PORT
          value: '3000'
        - name: REDIS_HOST
          valueFrom:
            secretKeyRef:
              name: redis-host  # assumed Secret Manager secret holding only the host
              key: latest       # on Cloud Run, key selects the secret version
        - name: FDB_CLUSTER_FILE
          value: /etc/foundationdb/fdb.cluster
        resources:
          limits:
            cpu: '2000m'
            memory: '4Gi'
        volumeMounts:
        - name: fdb-config
          mountPath: /etc/foundationdb
          readOnly: true
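      volumes:
      - name: fdb-config
        secret:
          secretName: fdb-cluster-file
          items:
          - key: latest        # secret version to mount
            path: fdb.cluster  # file name under /etc/foundationdb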

# cloud-run/websocket-backend.yaml

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: websocket-backend
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: '2'
        autoscaling.knative.dev/maxScale: '200'
        run.googleapis.com/cpu-throttling: 'false'
    spec:
      containerConcurrency: 100
      timeoutSeconds: 3600  # Cloud Run's maximum request timeout (60 minutes); clients must reconnect after that
      containers:
      - name: websocket
        image: gcr.io/PROJECT_ID/websocket-backend:latest
        ports:
        - name: h2c  # end-to-end HTTP/2 (cleartext); use http1 if the server only handles HTTP/1.1 WebSocket upgrades
          containerPort: 4000
        env:
        - name: PORT
          value: '4000'
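        - name: REDIS_URL
          valueFrom:
            secretKeyRef:
              name: redis-url  # matches the Secret Manager secret created in Terraform
              key: latest      # on Cloud Run, key selects the secret version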
        - name: GCS_BUCKET
          value: az1ai-user-files
        resources:
          limits:
            cpu: '4000m'
            memory: '8Gi'
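
The autoscaling annotations and containerConcurrency together bound how many simultaneous requests (including open WebSocket connections) each service can hold. The following is a rough sketch of that arithmetic, assuming each user holds one open connection per service; the function names are hypothetical and not part of any Cloud Run API.

```python
import math


def max_concurrent_requests(max_scale: int, container_concurrency: int) -> int:
    """Upper bound on simultaneous requests a service can hold at full scale."""
    return max_scale * container_concurrency


def instances_needed(concurrent_users: int, container_concurrency: int) -> int:
    """Instances required if every user holds exactly one open connection."""
    return math.ceil(concurrent_users / container_concurrency)


# Values taken from the two service definitions above.
theia_capacity = max_concurrent_requests(max_scale=100, container_concurrency=80)
websocket_capacity = max_concurrent_requests(max_scale=200, container_concurrency=100)

print(theia_capacity)               # 8000
print(websocket_capacity)           # 20000
print(instances_needed(1000, 80))   # 13
print(instances_needed(1000, 100))  # 10
```

Under these assumptions the 1000-user target needs about 13 frontend and 10 backend instances, well inside the configured maxScale values.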

2. Dockerfile for Cloud Run

# Dockerfile.cloudrun

FROM node:20-slim AS builder

WORKDIR /app

# Copy package files
COPY package*.json ./
COPY tsconfig*.json ./

# Install dependencies (including devDependencies needed for the build)
RUN npm ci --include=dev

# Copy source
COPY src ./src
COPY theia-app ./theia-app

# Build theia application
RUN npm run theia:build:prod

# Production image
FROM node:20-slim

WORKDIR /app

# Install production dependencies only
COPY package*.json ./
RUN npm ci --omit=dev

# Copy built application
COPY --from=builder /app/lib ./lib
COPY --from=builder /app/theia-app ./theia-app

# Expose port
EXPOSE 3000

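# Health check for local Docker runs (Cloud Run ignores HEALTHCHECK and relies on
# its own startup/liveness probes). Assumes healthcheck.js sits at the repo root.
COPY healthcheck.js ./
HEALTHCHECK --interval=30s --timeout=3s --start-period=40s \
  CMD node healthcheck.js || exit 1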

# Start application
CMD ["npm", "run", "theia:start"]

3. FoundationDB on Compute Engine

#!/bin/bash
# scripts/deploy-fdb.sh

# Create FoundationDB VMs (3-node cluster)
for i in {1..3}; do
  gcloud compute instances create fdb-node-$i \
    --zone=us-central1-a \
    --machine-type=n2-standard-8 \
    --boot-disk-size=100GB \
    --boot-disk-type=pd-ssd \
    --image-family=ubuntu-2204-lts \
    --image-project=ubuntu-os-cloud \
    --metadata-from-file startup-script=install-fdb.sh \
    --tags=fdb-cluster \
    --scopes=cloud-platform
done

# Create firewall rule for FDB cluster communication
gcloud compute firewall-rules create allow-fdb-internal \
  --network=default \
  --allow=tcp:4500-4520 \
  --source-tags=fdb-cluster \
  --target-tags=fdb-cluster

#!/bin/bash
# install-fdb.sh (startup script for FDB VMs)

# Download and install FoundationDB
wget https://github.com/apple/foundationdb/releases/download/7.1.27/foundationdb-server_7.1.27-1_amd64.deb
wget https://github.com/apple/foundationdb/releases/download/7.1.27/foundationdb-clients_7.1.27-1_amd64.deb

sudo dpkg -i foundationdb-clients_7.1.27-1_amd64.deb
sudo dpkg -i foundationdb-server_7.1.27-1_amd64.deb

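# Configure the database. `single` keeps one copy of the data and tolerates no
# machine failures; once all nodes share one cluster file, switch to `double`
# (or `triple` on five or more machines) from a single node.
sudo fdbcli --exec "configure new single ssd"

# Publish the cluster file to Cloud Storage so clients can fetch it.
# This copies connection info only; it is not a data backup.
sudo gsutil cp /etc/foundationdb/fdb.cluster gs://az1ai-fdb-backups/cluster/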
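
The scripts above create the VMs but do not show what the shared fdb.cluster file should contain. FoundationDB clients locate the cluster through that file, which holds a single line of the form description:ID@ip:port,ip:port. A minimal sketch of assembling that line follows; the description, ID, and internal IP addresses are hypothetical placeholders, and port 4500 is assumed from the firewall rule above.

```python
def build_cluster_file(description: str, cluster_id: str,
                       coordinators: list, port: int = 4500) -> str:
    """Return the single-line contents of an fdb.cluster file."""
    if not coordinators:
        raise ValueError("at least one coordinator address is required")
    addresses = ",".join(f"{ip}:{port}" for ip in coordinators)
    return f"{description}:{cluster_id}@{addresses}"


# Hypothetical internal IPs for fdb-node-1..3; substitute the real ones.
cluster_line = build_cluster_file(
    "az1ai", "a1b2c3d4", ["10.0.0.11", "10.0.0.12", "10.0.0.13"]
)
print(cluster_line)
# az1ai:a1b2c3d4@10.0.0.11:4500,10.0.0.12:4500,10.0.0.13:4500
```

The same line would be stored in the fdb-cluster-file secret that the Cloud Run service mounts at /etc/foundationdb.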

4. CI/CD Pipeline (Cloud Build)

# cloudbuild.yaml

steps:
  # Build theia frontend
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - 'gcr.io/$PROJECT_ID/theia-frontend:$COMMIT_SHA'
      - '-t'
      - 'gcr.io/$PROJECT_ID/theia-frontend:latest'
      - '-f'
      - 'Dockerfile.cloudrun'
      - '.'

  # Push images
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'push'
      - 'gcr.io/$PROJECT_ID/theia-frontend:$COMMIT_SHA'

  # Deploy to Cloud Run
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: gcloud
    args:
      - 'run'
      - 'deploy'
      - 'theia-frontend'
      - '--image=gcr.io/$PROJECT_ID/theia-frontend:$COMMIT_SHA'
      - '--region=us-central1'
      - '--platform=managed'
      - '--allow-unauthenticated'
      - '--max-instances=100'
      - '--min-instances=1'
      - '--memory=4Gi'
      - '--cpu=2'
      - '--timeout=3600'
      - '--concurrency=80'
      - '--set-env-vars=NODE_ENV=production'

  # Run database migrations
  - name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
    entrypoint: bash
    args:
      - '-c'
      - |
        gcloud run jobs execute fdb-migrate \
          --region=us-central1 \
          --wait

images:
  - 'gcr.io/$PROJECT_ID/theia-frontend:$COMMIT_SHA'
  - 'gcr.io/$PROJECT_ID/theia-frontend:latest'

options:
  machineType: 'E2_HIGHCPU_8'
  logging: CLOUD_LOGGING_ONLY

5. Infrastructure as Code (Terraform)

# infrastructure/main.tf

terraform {
  required_version = ">= 1.0"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }

  backend "gcs" {
    bucket = "az1ai-terraform-state"
    prefix = "prod"
  }
}

provider "google" {
  project = var.project_id
  region  = var.region
}

# Cloud Run Service
resource "google_cloud_run_service" "theia_frontend" {
  name     = "theia-frontend"
  location = var.region

  template {
    spec {
      containers {
        image = "gcr.io/${var.project_id}/theia-frontend:latest"

        ports {
          container_port = 3000
        }

        resources {
          limits = {
            cpu    = "2000m"
            memory = "4Gi"
          }
        }

        env {
          name  = "NODE_ENV"
          value = "production"
        }

        env {
          name = "REDIS_URL"
          value_from {
            secret_key_ref {
              name = google_secret_manager_secret.redis_url.secret_id
              key  = "latest"
            }
          }
        }
      }

      container_concurrency = 80
      timeout_seconds       = 3600

      service_account_name = google_service_account.cloud_run_sa.email
    }

    metadata {
      annotations = {
        "autoscaling.knative.dev/minScale"         = "1"
        "autoscaling.knative.dev/maxScale"         = "100"
        "run.googleapis.com/cpu-throttling"        = "false"
        "run.googleapis.com/startup-cpu-boost"     = "true"
        "run.googleapis.com/execution-environment" = "gen2"
      }
    }
  }

  traffic {
    percent         = 100
    latest_revision = true
  }
}

# Memorystore Redis
resource "google_redis_instance" "cache" {
  name           = "az1ai-cache"
  tier           = "STANDARD_HA"
  memory_size_gb = 5
  region         = var.region

  redis_version     = "REDIS_7_0"
  display_name      = "AZ1.AI Cache"

  authorized_network = google_compute_network.vpc.id

  redis_configs = {
    maxmemory-policy = "allkeys-lru"
  }
}

# Cloud Storage Bucket
resource "google_storage_bucket" "user_files" {
  name          = "az1ai-user-files"
  location      = "US"
  storage_class = "STANDARD"

  uniform_bucket_level_access = true

  lifecycle_rule {
    condition {
      age = 90
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }

  lifecycle_rule {
    condition {
      age = 365
    }
    action {
      type          = "SetStorageClass"
      storage_class = "COLDLINE"
    }
  }

  cors {
    origin          = ["https://ide.az1.ai"]
    method          = ["GET", "HEAD", "PUT", "POST", "DELETE"]
    response_header = ["*"]
    max_age_seconds = 3600
  }
}

# Cloud SQL (Alternative to FoundationDB for metadata)
resource "google_sql_database_instance" "metadata" {
  name             = "az1ai-metadata"
  database_version = "POSTGRES_15"
  region           = var.region

  settings {
    tier              = "db-custom-4-16384"  # 4 vCPU, 16GB RAM
    availability_type = "REGIONAL"  # HA

    disk_type = "PD_SSD"
    disk_size = 100

    backup_configuration {
      enabled                        = true
      point_in_time_recovery_enabled = true
      start_time                     = "03:00"
      transaction_log_retention_days = 7
    }

    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.vpc.id
    }

    database_flags {
      name  = "max_connections"
      value = "200"
    }
  }

  deletion_protection = true
}

# Load Balancer
resource "google_compute_global_address" "default" {
  name = "az1ai-lb-ip"
}

resource "google_compute_global_forwarding_rule" "https" {
  name       = "az1ai-https-lb"
  target     = google_compute_target_https_proxy.default.id
  port_range = "443"
  ip_address = google_compute_global_address.default.address
}

resource "google_compute_target_https_proxy" "default" {
  name             = "az1ai-https-proxy"
  url_map          = google_compute_url_map.default.id
  ssl_certificates = [google_compute_managed_ssl_certificate.default.id]
}

resource "google_compute_managed_ssl_certificate" "default" {
  name = "az1ai-ssl-cert"

  managed {
    domains = ["ide.az1.ai", "www.ide.az1.ai"]
  }
}

resource "google_compute_url_map" "default" {
  name            = "az1ai-url-map"
  default_service = google_compute_backend_service.cloud_run.id
}

resource "google_compute_backend_service" "cloud_run" {
  name                  = "az1ai-backend"
  protocol              = "HTTP"
  port_name             = "http"
  timeout_sec           = 3600
  enable_cdn            = true

  backend {
    group = google_compute_region_network_endpoint_group.cloud_run_neg.id
  }

  cdn_policy {
    cache_mode  = "CACHE_ALL_STATIC"
    default_ttl = 3600
    max_ttl     = 86400
  }

  log_config {
    enable      = true
    sample_rate = 1.0
  }
}

resource "google_compute_region_network_endpoint_group" "cloud_run_neg" {
  name                  = "cloud-run-neg"
  network_endpoint_type = "SERVERLESS"
  region                = var.region

  cloud_run {
    service = google_cloud_run_service.theia_frontend.name
  }
}

# VPC Network
resource "google_compute_network" "vpc" {
  name                    = "az1ai-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "private" {
  name          = "az1ai-private"
  ip_cidr_range = "10.0.0.0/24"
  region        = var.region
  network       = google_compute_network.vpc.id

  private_ip_google_access = true
}

# Service Account
resource "google_service_account" "cloud_run_sa" {
  account_id   = "cloud-run-sa"
  display_name = "Cloud Run Service Account"
}

resource "google_project_iam_member" "cloud_run_sa_roles" {
  for_each = toset([
    "roles/cloudsql.client",
    "roles/secretmanager.secretAccessor",
    "roles/storage.objectAdmin",
    "roles/logging.logWriter",
    "roles/cloudtrace.agent"
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.cloud_run_sa.email}"
}

# Secret Manager
resource "google_secret_manager_secret" "redis_url" {
  secret_id = "redis-url"

  replication {
    auto {}  # provider 5.x replaced `automatic = true` with the `auto` block
  }
}

# Monitoring
resource "google_monitoring_alert_policy" "cloud_run_errors" {
  display_name = "Cloud Run High Error Rate"
  combiner     = "OR"

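  conditions {
    display_name = "Error rate > 5%"

    condition_threshold {
      # Ratio of 5xx requests (filter) to all requests (denominator_filter);
      # without the denominator the threshold would be 0.05 requests/second.
      filter             = "resource.type=\"cloud_run_revision\" AND metric.type=\"run.googleapis.com/request_count\" AND metric.label.response_code_class=\"5xx\""
      denominator_filter = "resource.type=\"cloud_run_revision\" AND metric.type=\"run.googleapis.com/request_count\""
      duration           = "300s"
      comparison         = "COMPARISON_GT"
      threshold_value    = 0.05

      aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_RATE"
        cross_series_reducer = "REDUCE_SUM"
      }

      denominator_aggregations {
        alignment_period     = "60s"
        per_series_aligner   = "ALIGN_RATE"
        cross_series_reducer = "REDUCE_SUM"
      }
    }
  }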

  notification_channels = [google_monitoring_notification_channel.email.name]
}

resource "google_monitoring_notification_channel" "email" {
  display_name = "Email Notifications"
  type         = "email"

  labels = {
    email_address = "alerts@az1.ai"
  }
}

# Outputs
output "load_balancer_ip" {
  value = google_compute_global_address.default.address
}

output "cloud_run_url" {
  value = google_cloud_run_service.theia_frontend.status[0].url
}

output "redis_host" {
  value = google_redis_instance.cache.host
}
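
The two lifecycle_rule blocks on the user_files bucket move objects to cheaper storage classes as they age. As a sketch of the intended outcome, the function below maps an object's age in days to the class it should end up in under those rules. The function name is hypothetical, and Cloud Storage applies lifecycle actions asynchronously, so real transitions may lag the thresholds.

```python
def storage_class_for_age(age_days: int) -> str:
    """Storage class an object should reach under the 90-day and 365-day rules."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= 365:
        return "COLDLINE"
    if age_days >= 90:
        return "NEARLINE"
    return "STANDARD"


for age in (0, 89, 90, 364, 365):
    print(age, storage_class_for_age(age))
```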

Rationale

Why Cloud Run?

Serverless Benefits:

✅ Auto-scaling from 0 to 1000+ instances
✅ Pay per use (instances kept warm by minScale are still billed while idle)
✅ No server management
✅ Built-in load balancing

Performance:

✅ Gen 2 execution environment (faster cold starts)
✅ CPU boost on startup
✅ WebSocket and HTTP/2 support

Cost:

✅ Free tier: 2M requests/month
✅ $0.00002400/vCPU-second ($62.21/month for 1 vCPU running 24/7 over a 30-day month)
✅ $0.00000250/GiB-second ($6.48/month for 1 GiB running 24/7 over a 30-day month)
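
The monthly figures follow directly from the per-second rates. A small sketch of the calculation, assuming a 30-day month (2,592,000 seconds), an instance that runs continuously, and the rates quoted above; it ignores the free tier and per-request charges, and actual rates vary by region and billing mode.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month

VCPU_SECOND_USD = 0.000024   # rate quoted in this ADR
GIB_SECOND_USD = 0.0000025   # rate quoted in this ADR


def always_on_monthly_cost(vcpus: float, memory_gib: float, instances: int = 1) -> float:
    """Monthly cost in USD for instances that run 24/7."""
    per_second = vcpus * VCPU_SECOND_USD + memory_gib * GIB_SECOND_USD
    return round(per_second * SECONDS_PER_MONTH * instances, 2)


print(always_on_monthly_cost(1, 0))  # 62.21 (1 vCPU)
print(always_on_monthly_cost(0, 1))  # 6.48  (1 GiB)
print(always_on_monthly_cost(2, 4))  # 150.34 (one theia-frontend instance: 2 vCPU, 4 GiB)
```

Because theia-frontend sets minScale to 1 and websocket-backend sets it to 2, the always-on cost of those warm instances is a fixed floor under the usage-based bill.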

Why GCP vs AWS/Azure?

GCP Advantages:

✅ Cloud Run (best serverless containers)
✅ Global load balancer (built-in CDN)
✅ Generous free tier
✅ Simple pricing
✅ Excellent Kubernetes integration (GKE)

AWS Disadvantages:

❌ Fargate more expensive
❌ Complex networking setup
❌ More services to manage

Azure Disadvantages:

❌ Container Apps less mature
❌ Higher latency in some regions
❌ Less generous free tier

Why Memorystore vs Self-Hosted Redis?

Managed Service Benefits:

✅ High availability (HA tier)
✅ Automatic failover
✅ Automatic backups
✅ No maintenance

Cost:

✅ $0.053/GB-hour (~$38/month for 1GB HA)
✅ Cheaper than managing VMs
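
The quoted monthly figure is for a 1 GB instance, while the Terraform configuration provisions memory_size_gb = 5. A quick sketch of how the cost scales, assuming the HA rate quoted above and a 720-hour (30-day) month; the function name is hypothetical, and real pricing depends on region and capacity tier.

```python
HOURS_PER_MONTH = 24 * 30    # 720 hours in a 30-day month
HA_USD_PER_GB_HOUR = 0.053   # HA rate quoted in this ADR


def memorystore_monthly_cost(memory_size_gb: int) -> float:
    """Monthly cost in USD for an HA Memorystore instance of the given size."""
    return round(memory_size_gb * HA_USD_PER_GB_HOUR * HOURS_PER_MONTH, 2)


print(memorystore_monthly_cost(1))  # 38.16
print(memorystore_monthly_cost(5))  # 190.8
```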

Alternatives Considered

Alternative 1: GKE (Google Kubernetes Engine)

Pros:

More control
Better for complex deployments
Easier multi-cloud migration

Cons:

❌ More complex
❌ Higher cost (always-on nodes)
❌ More operational overhead

Deferred: Start with Cloud Run, migrate to GKE if needed

Alternative 2: AWS ECS Fargate

Pros:

Similar to Cloud Run
AWS ecosystem

Cons:

❌ More expensive
❌ More complex networking
❌ Slower cold starts

Rejected: Cloud Run is simpler and cheaper

Alternative 3: Vercel/Netlify

Pros:

Extremely simple
Great DX
Built-in CI/CD

Cons:

❌ Expensive at scale
❌ Vendor lock-in
❌ Limited backend capabilities

Rejected: Need more control for backend

Alternative 4: Self-Hosted (Bare Metal)

Pros:

Full control
Predictable cost

Cons:

❌ High upfront cost
❌ Operational burden
❌ No auto-scaling

Rejected: Too much overhead for startup

Consequences

Positive

✅ Cost-Efficient: Pay only for usage, generous free tier
✅ Auto-Scaling: Handle traffic spikes automatically
✅ Global: Low latency worldwide with Cloud CDN
✅ Managed: Less operational overhead
✅ Fast Deployment: CI/CD with Cloud Build
✅ Secure: IAM, VPC, Secret Manager built-in

Negative

❌ Vendor Lock-In: GCP-specific features (Cloud Run)
❌ Cold Starts: ~1-2s for first request (mitigated with min instances)
❌ Cost Unpredictability: Usage-based pricing can spike
❌ WebSocket Limits: Cloud Run closes any request after at most 60 minutes, so long-lived connections must reconnect

Mitigation

Vendor Lock-In:

Use Docker containers (portable)
Abstract cloud-specific code
Document migration path to GKE/other clouds

Cold Starts:

Set min instances = 1 for critical services
Use startup CPU boost
Optimize container image size

Cost Unpredictability:

Set budget alerts
Monitor usage dashboards
Use quotas to cap spending

WebSocket Limits:

Document timeout behavior
Implement reconnection logic
Consider GKE for long-lived connections if needed
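
Reconnection logic usually spaces out retries so that many clients dropped at the same moment do not reconnect in lockstep. The sketch below computes a full-jitter exponential backoff schedule; it is one common approach rather than the project's chosen implementation, and the function name, base delay, and cap are assumptions.

```python
import random
from typing import List, Optional


def backoff_delays(max_attempts: int, base: float = 1.0, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter backoff: delay i is uniform in [0, min(cap, base * 2**i)]."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


# A seeded generator makes the schedule reproducible for testing.
schedule = backoff_delays(6, rng=random.Random(42))
for attempt, delay in enumerate(schedule):
    print(f"attempt {attempt}: wait up to {delay:.2f}s before reconnecting")
```

A client would sleep for each delay in turn before reopening the WebSocket, and reset the attempt counter once a connection succeeds.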

Implementation Plan

Phase 1: Development Environment ✅

Dockerize application
Local Docker Compose setup
Development workflow

Phase 2: GCP Setup 🔲

Create GCP project
Enable required APIs
Set up billing
Configure IAM roles

Phase 3: Core Services 🔲

Deploy Cloud Run services
Configure Memorystore Redis
Set up Cloud Storage
Deploy FoundationDB cluster

Phase 4: Networking 🔲

Configure load balancer
Set up Cloud CDN
Configure SSL certificates
Set up Cloud Armor (WAF)

Phase 5: CI/CD 🔲

Set up Cloud Build
Create deployment pipeline
Automated testing
Rollback strategy

Phase 6: Observability 🔲

Cloud Logging integration
Cloud Monitoring dashboards
Alert policies
Error tracking

Phase 7: Production Hardening 🔲

Load testing (10K users)
Disaster recovery plan
Backup strategy
Security audit

Success Metrics

Performance:

< 2s cold start time
< 100ms p99 latency (warm)
1000+ concurrent users

Reliability:

99.9% uptime
< 1% error rate
Zero data loss
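
An uptime target translates into a concrete downtime budget. A short sketch of that conversion, assuming a 30-day month; the function name is hypothetical.

```python
def downtime_budget_minutes(slo: float, days: int = 30) -> float:
    """Minutes of downtime allowed over the period for a given availability target."""
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be in the range (0, 1]")
    return (1.0 - slo) * days * 24 * 60


print(round(downtime_budget_minutes(0.999), 1))  # 43.2
```

At 99.9% the service may be unavailable for roughly 43 minutes per month, which is the budget that deployments, FoundationDB maintenance, and incidents all draw from.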

Cost:

< $500/month for 100 active users
< $5000/month for 1000 active users

Deployment:

< 10 minutes deployment time
Zero-downtime deployments
Automated rollbacks

Related Decisions

ADR-016: NGINX Load Balancer - Frontend LB
ADR-017: WebSocket Backend - Backend architecture
ADR-004: FoundationDB - Database

Status: ✅ Accepted
Next Review: 2025-11-06 (1 month)
Last Updated: 2025-10-06
