GCP Infrastructure Topology - Complete Resource Inventory

Type: Infrastructure Diagram
Scope: All GCP resources across regions and services
Purpose: Understand complete infrastructure layout and dependencies
Last Updated: November 23, 2025


Overview

Complete inventory of all Google Cloud Platform resources provisioned for CODITECT cloud infrastructure, organized by region, service category, and deployment environment.

Environments:

  • Development: coditect-dev - Full infrastructure at reduced scale
  • Staging: coditect-staging (Planned) - Production-equivalent for testing
  • Production: coditect-prod (Planned) - Full-scale production deployment

Regions:

  • Primary: us-central1 (Iowa) - All development and future production workloads
  • DR/Backup: us-east1 (South Carolina) - Planned disaster recovery region

Infrastructure Topology Diagram


Resource Inventory by Service

Compute (GKE)

| Resource | Name | Configuration | Location | Status |
|---|---|---|---|---|
| GKE Cluster | coditect-dev | Kubernetes 1.28, 3-10 nodes | us-central1 | ✅ Active |
| Node Pool | default-pool | n1-standard-2, Preemptible | Multi-zone | ✅ Active |
| Deployment | license-api | FastAPI, 3-10 replicas | GKE namespace: default | ✅ Active |
| Service | license-api | ClusterIP, Port 8000 | GKE | ✅ Active |
| Ingress | license-api | NGINX Ingress, HTTPS | GKE | ✅ Active |

Resource Utilization:

  • CPU: 50% average across nodes
  • Memory: 60% average across nodes
  • Pods: 10-15 total (API + Ingress + system pods)
  • Auto-scaling: HPA targets 70% CPU, scales 3-10 replicas

Database (Cloud SQL)

| Resource | Name | Configuration | Location | Status |
|---|---|---|---|---|
| Cloud SQL Instance | coditect-dev | PostgreSQL 16, db-custom-2-7680 | us-central1 | ✅ Active |
| Databases | coditect_prod | License, user, tenant tables | Cloud SQL | ✅ Active |
| User | app_user | Read/write permissions | Cloud SQL | ✅ Active |
| User | readonly_user | Read-only for analytics | Cloud SQL | ✅ Active |
| Backup Schedule | Daily + PITR | 7-day retention | us-central1 + us-east1 | ✅ Active |

Storage:

  • Current Size: 15GB / 100GB SSD
  • Auto-resize: Enabled (max 200GB)
  • Backup Size: 2GB (compressed)

Cache (Redis Memorystore)

| Resource | Name | Configuration | Location | Status |
|---|---|---|---|---|
| Redis Instance | coditect-dev-redis | Redis 7.0, BASIC tier, 6GB | us-central1-a | ✅ Active |
| Auth | Redis AUTH token | Stored in Secret Manager | Global | ✅ Active |
| Persistence | RDB snapshots | Every 15 minutes | us-central1 | ✅ Active |

Usage:

  • Memory: 1.2GB / 6GB (20%)
  • Operations: 5,000 ops/sec average
  • Hit Rate: 92%

Networking

| Resource | Name | Configuration | Location | Status |
|---|---|---|---|---|
| VPC | coditect-dev-vpc | Custom mode, 10.0.0.0/16 | Global | ✅ Active |
| Subnet | gke-nodes | 10.0.0.0/20 (4,096 IPs) | us-central1 | ✅ Active |
| Pod Range | gke-pods | 10.1.0.0/16 (65,536 IPs) | us-central1 | ✅ Active |
| Service Range | gke-services | 10.2.0.0/16 (65,536 IPs) | us-central1 | ✅ Active |
| Cloud Router | coditect-dev-router | BGP ASN 64512 | us-central1 | ✅ Active |
| Cloud NAT | coditect-dev-nat | 2 static IPs, 64 ports/VM | us-central1 | ✅ Active |
| Load Balancer | license-api-lb | HTTPS, Global Anycast | Global | ✅ Active |
| Firewall Rules | 5 rules | Allow health, HTTPS, internal; Deny all | Global | ✅ Active |

VPC Peering:

  • Cloud SQL: servicenetworking.googleapis.com (10.67.0.0/16)
  • Redis: redis.googleapis.com (10.121.0.0/16)
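The address ranges listed above can be sanity-checked with Python's standard ipaddress module. The sketch below is only an illustration: the dictionary keys for the two peering ranges are labels chosen here, not resource names from the infrastructure.

```python
import ipaddress

# CIDR blocks copied from the networking table and the VPC peering list.
# "cloud-sql-peering" and "redis-peering" are illustrative labels only.
RANGES = {
    "gke-nodes": "10.0.0.0/20",
    "gke-pods": "10.1.0.0/16",
    "gke-services": "10.2.0.0/16",
    "cloud-sql-peering": "10.67.0.0/16",
    "redis-peering": "10.121.0.0/16",
}

networks = {name: ipaddress.ip_network(cidr) for name, cidr in RANGES.items()}


def address_counts(nets):
    """Return the number of addresses in each range."""
    return {name: net.num_addresses for name, net in nets.items()}


def overlapping_pairs(nets):
    """Return every pair of range names whose CIDR blocks overlap."""
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    for name, count in address_counts(networks).items():
        print(f"{name}: {count} addresses")
    print("overlaps:", overlapping_pairs(networks))
```

Running a check like this before adding a new subnet or peering range helps catch address collisions early.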

Security

| Resource | Name | Configuration | Location | Status |
|---|---|---|---|---|
| Cloud Armor Policy | coditect-waf | Rate limiting, geo-blocking, OWASP | Global | ✅ Active |
| Cloud KMS Key Ring | coditect-license-keys | RSA-4096, HSM-backed | us-central1 | ✅ Active |
| Crypto Key | license-signing-key | ASYMMETRIC_SIGN, manual rotation | us-central1 | ✅ Active |
| Secret Manager | 9 secrets | Passwords, API keys, tokens | Global | ✅ Active |
| Identity Platform | Firebase Auth | OAuth2 (Google, GitHub) | Global | ✅ Active |

Service Accounts:

  • license-api@coditect.iam.gserviceaccount.com - Workload Identity for API pods
  • github-actions@coditect.iam.gserviceaccount.com - CI/CD deployments
  • backup@coditect.iam.gserviceaccount.com - Backup operations

Monitoring & Logging

| Resource | Name | Configuration | Location | Status |
|---|---|---|---|---|
| Managed Prometheus | GKE monitoring | 15s scrape, 15-day retention | us-central1 | ✅ Active |
| Cloud Logging | Sink: BigQuery | Structured JSON, 30-day retention | Global | ✅ Active |
| Log Router | Error logs → PagerDuty | Real-time alerting | Global | ✅ Active |
| Cloud Trace | Distributed tracing | OpenTelemetry integration | Global | 🟡 Planned |
| Grafana | Dashboard | License API performance | GKE | 🟡 Planned |

Alert Policies:

  • Error rate >2% for 5 minutes → PagerDuty
  • Latency p95 >200ms for 10 minutes → Email
  • CPU >80% for 15 minutes → Slack notification
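Each alert policy above has the same shape: a metric must stay above a threshold for a sustained window before a channel is notified. The following is a minimal sketch of that evaluation logic, not the actual Cloud Monitoring configuration. The AlertPolicy class, the metric names, and the one-sample-per-minute input are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlertPolicy:
    """Hypothetical in-memory model of one alert policy."""
    metric: str
    threshold: float
    duration_minutes: int
    channel: str


# Thresholds, durations, and channels taken from the list above.
POLICIES = (
    AlertPolicy("error_rate_pct", 2.0, 5, "PagerDuty"),
    AlertPolicy("latency_p95_ms", 200.0, 10, "Email"),
    AlertPolicy("cpu_pct", 80.0, 15, "Slack"),
)


def should_fire(policy: AlertPolicy, samples: Sequence[float]) -> bool:
    """Return True if the metric exceeded the threshold for the full window.

    `samples` holds one value per minute, oldest first (an assumption).
    """
    window = list(samples)[-policy.duration_minutes:]
    if len(window) < policy.duration_minutes:
        return False  # not enough history to cover the whole window
    return all(value > policy.threshold for value in window)


if __name__ == "__main__":
    error_rates = [0.5, 3.1, 2.4, 2.2, 2.9, 3.0]
    print(should_fire(POLICIES[0], error_rates))
```

Requiring the whole window to breach the threshold is what keeps a single noisy sample from paging anyone.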

Storage

| Resource | Name | Configuration | Location | Status |
|---|---|---|---|---|
| Backup Bucket | gs://coditect-backups | Multi-region (us), 90-day lifecycle | Multi-region | ✅ Active |
| Static Assets | gs://coditect-static | CDN-enabled, public read | Multi-region | 🟡 Planned |
| Terraform State | gs://coditect-terraform-state | Versioned, locked | us-central1 | ✅ Active |

Storage Costs:

  • Backups: 5GB (compressed) = $0.10/month
  • Static assets: 2GB = $0.05/month (if used)
  • Terraform state: 100MB = $0.002/month

External Integrations

Stripe (Payments)

Integration Points:

  • Subscription Management: Create/update/cancel licenses
  • Webhook Events: customer.subscription.updated, invoice.payment_succeeded
  • Usage Metering: Report seat usage for overage billing
  • Checkout: Hosted payment pages for license purchases

Security:

  • Webhook signature verification (HMAC-SHA256)
  • API key stored in Secret Manager
  • Restricted API key scope (read licenses, write usage)
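To illustrate the webhook signature check, here is a minimal standard-library sketch of Stripe's documented scheme: the Stripe-Signature header carries a timestamp (t=) and one or more v1= signatures, each an HMAC-SHA256 over "timestamp.payload" keyed with the endpoint secret. The function name, the 300-second tolerance default, and the `now` parameter are assumptions for this example. The License API may instead rely on the official stripe library's Webhook.construct_event helper.

```python
import hashlib
import hmac
import time
from typing import Optional


def verify_stripe_signature(
    payload: bytes,
    header: str,
    secret: str,
    tolerance: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Return True if `header` carries a valid v1 signature for `payload`."""
    timestamp = None
    candidates = []
    # Header format: "t=<unix time>,v1=<hex digest>[,v1=<hex digest>...]"
    for item in header.split(","):
        key, _, value = item.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    # Reject stale events to limit replay attacks.
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > tolerance:
        return False

    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking digest prefixes via timing.
    return any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in candidates
    )
```

The raw request body must be passed in unmodified. Re-serializing parsed JSON before verification would change the bytes and invalidate the signature.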

SendGrid (Email)

Integration Points:

  • Transactional Emails: License activation, expiration warnings
  • Templates: Branded email templates for notifications
  • Analytics: Open rate, click-through rate tracking

Email Types:

  • License activated (on purchase)
  • Seat limit warning (at 80% capacity)
  • Expiration notice (30 days, 7 days, 1 day before)
  • Seat released (on graceful shutdown)
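The notification triggers above reduce to simple date and ratio arithmetic. The sketch below shows one way to compute them. The function names are hypothetical, and it assumes expiry is tracked as a calendar date.

```python
from datetime import date, timedelta
from typing import List

# Offsets and threshold taken from the email types listed above.
NOTICE_DAYS_BEFORE_EXPIRY = (30, 7, 1)
SEAT_WARNING_PERCENT = 80


def expiration_notice_dates(expires_on: date) -> List[date]:
    """Return the dates on which expiration notices should be sent."""
    return [expires_on - timedelta(days=days) for days in NOTICE_DAYS_BEFORE_EXPIRY]


def seat_warning_due(seats_used: int, seat_limit: int) -> bool:
    """Return True once seat usage reaches 80% of the limit."""
    if seat_limit <= 0:
        return False
    # Integer arithmetic avoids floating-point edge cases at exactly 80%.
    return seats_used * 100 >= seat_limit * SEAT_WARNING_PERCENT


if __name__ == "__main__":
    print(expiration_notice_dates(date(2026, 1, 31)))
    print(seat_warning_due(8, 10))
```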

GitHub (CI/CD)

Integration Points:

  • Source Code: github.com/coditect-ai/coditect-cloud-infra
  • Container Registry: ghcr.io/coditect-ai/license-api
  • GitHub Actions: Build, test, deploy workflows
  • Secrets: GCP service account key, database credentials

Workflows:

  • test.yml - Run pytest on pull requests
  • deploy-dev.yml - Auto-deploy to dev on main branch push
  • deploy-prod.yml - Manual deployment to production (approval required)

Cost Breakdown (Monthly)

Development Environment

| Service | Configuration | Monthly Cost |
|---|---|---|
| GKE | 3x n1-standard-2 (preemptible) | $100 |
| Cloud SQL | db-custom-2-7680, Regional HA | $150 |
| Redis | 6GB BASIC tier | $30 |
| Networking | Load Balancer, Cloud NAT, egress | $20 |
| KMS | HSM signing operations (~10K/month) | $10 |
| Monitoring | Cloud Logging, Prometheus | $5 |
| Storage | Backups, Terraform state | $1 |
| Identity Platform | Free tier (up to 50K MAU) | $0 |
| Secret Manager | 9 secrets, 100K accesses/month | $0.50 |
| Cloud Armor | 1 policy, minimal rules | $5 |
| Total | | ~$320/month |

Production Environment (Estimated at 10,000 users)

| Service | Configuration | Monthly Cost |
|---|---|---|
| GKE | 10x n1-standard-4 (standard), Committed Use | $500 |
| Cloud SQL | db-custom-8-30720, Regional HA | $600 |
| Redis | 16GB STANDARD HA tier | $150 |
| Networking | Premium tier, higher egress | $100 |
| KMS | HSM signing (~100K/month) | $50 |
| Monitoring | Full observability stack | $50 |
| Storage | Larger backups, CDN | $20 |
| Identity Platform | 10K MAU (within free tier) | $0 |
| Secret Manager | Same secrets, higher access | $1 |
| Cloud Armor | Advanced DDoS protection | $20 |
| Total | | ~$1,500/month |

Cost per User: $0.15/month per active user (at 10K users)
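The totals and the per-user figure can be reproduced from the line items in the two cost tables. The short sketch below just redoes that arithmetic. The dollar amounts are the estimates from the tables, not billing data.

```python
# Monthly estimates (USD) copied from the cost tables above.
DEV_COSTS = {
    "GKE": 100.0,
    "Cloud SQL": 150.0,
    "Redis": 30.0,
    "Networking": 20.0,
    "KMS": 10.0,
    "Monitoring": 5.0,
    "Storage": 1.0,
    "Identity Platform": 0.0,
    "Secret Manager": 0.5,
    "Cloud Armor": 5.0,
}

PROD_COSTS = {
    "GKE": 500.0,
    "Cloud SQL": 600.0,
    "Redis": 150.0,
    "Networking": 100.0,
    "KMS": 50.0,
    "Monitoring": 50.0,
    "Storage": 20.0,
    "Identity Platform": 0.0,
    "Secret Manager": 1.0,
    "Cloud Armor": 20.0,
}

PROD_USERS = 10_000

dev_total = sum(DEV_COSTS.values())
prod_total = sum(PROD_COSTS.values())
cost_per_user = prod_total / PROD_USERS

if __name__ == "__main__":
    print(f"dev total:     ${dev_total:.2f}/month")
    print(f"prod total:    ${prod_total:.2f}/month")
    print(f"cost per user: ${cost_per_user:.2f}/month")
```

The line items sum to $321.50 and $1,491, which the tables round to ~$320 and ~$1,500.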


Scaling Strategy

Auto-Scaling Configuration

GKE Horizontal Pod Autoscaler (HPA):

Metric: CPU Utilization
Target: 70%
Min Replicas: 3
Max Replicas: 10
Scale Up: 1 pod every 30 seconds
Scale Down: 1 pod every 5 minutes
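For intuition about how these settings interact, the Kubernetes documentation gives the HPA's core calculation as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within a tolerance (0.1 by default) of 1.0. The sketch below applies that formula with the target and replica bounds listed above. It is a simplified model: it ignores the scale-up and scale-down rate limits, and the function name is an assumption.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_pct: float,
    target_cpu_pct: float = 70.0,
    min_replicas: int = 3,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the replica count the HPA would aim for."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    for replicas, cpu in [(3, 90.0), (8, 140.0), (5, 20.0), (4, 72.0)]:
        print(replicas, cpu, "->", desired_replicas(replicas, cpu))
```

For example, 3 replicas averaging 90% CPU give ceil(3 × 90 / 70) = 4, while 8 replicas at 140% would want 16 and are capped at the maximum of 10.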

GKE Cluster Autoscaler:

Min Nodes: 3
Max Nodes: 20
Scale Up: Add node if pods pending for 30 seconds
Scale Down: Remove node if <50% utilized for 10 minutes

Cloud SQL:

  • Manual vertical scaling (increase vCPU/RAM)
  • Read replicas for analytics queries (future)
  • Connection pooling (max 100 connections)
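If the 100-connection ceiling is shared by all License API replicas (an assumption, since the limit could also be per pool), the per-pod pool size has to be budgeted against the HPA maximum of 10 replicas. A rough sketch of that calculation follows. The reserved-connection count for migrations and admin sessions is a hypothetical figure.

```python
def per_replica_pool_size(
    max_connections: int = 100,
    max_replicas: int = 10,
    reserved: int = 10,
) -> int:
    """Return the largest pool size each replica can use safely.

    `reserved` connections are held back for admin and maintenance
    sessions (a hypothetical allowance, not a documented setting).
    """
    usable = max_connections - reserved
    if usable <= 0 or max_replicas <= 0:
        raise ValueError("no connections available to distribute")
    return usable // max_replicas


if __name__ == "__main__":
    print(per_replica_pool_size())
```

Sizing against the maximum replica count rather than the current one keeps a scale-up event from exhausting the database's connection limit.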

Redis:

  • Upgrade from BASIC to STANDARD HA tier (16GB, scaling to 100GB)
  • Sharding by tenant_id for multi-tenant isolation

Disaster Recovery Strategy

Backup Procedures

Cloud SQL:

  • Automated Daily Backups: 03:00 UTC, 7-day retention
  • Point-in-Time Recovery (PITR): 7-day window, 5-minute RPO
  • Transaction Logs: Replicated to us-east1 for geo-redundancy

Redis:

  • RDB Snapshots: Every 15 minutes, stored in Cloud Storage
  • Manual Snapshot: Before major schema changes
  • Recovery: Restore from snapshot (~5 minutes)

Failover Procedures

Scenario 1: Zone Failure (us-central1-a)

  • Detection: GKE node NotReady status
  • Action: Pods automatically rescheduled to us-central1-b or us-central1-c
  • Recovery Time: <2 minutes
  • Data Loss: None (stateless pods)

Scenario 2: Cloud SQL Failure

  • Detection: Regional HA detects primary failure
  • Action: Automatic failover to standby instance (same region)
  • Recovery Time: <60 seconds
  • Data Loss: None (synchronous replication)

Scenario 3: Complete Region Failure (us-central1)

  • Detection: Manual (all services unreachable)
  • Action: Provision new infrastructure in us-east1, restore from backups
  • Recovery Time: ~4 hours (manual DR process)
  • Data Loss: Up to 1 hour (backup lag)

Infrastructure as Code

OpenTofu Modules

Repository: github.com/coditect-ai/coditect-cloud-infra

Modules:

  • opentofu/modules/gke/ - GKE cluster and node pools
  • opentofu/modules/cloudsql/ - Cloud SQL PostgreSQL instance
  • opentofu/modules/redis/ - Cloud Memorystore Redis
  • opentofu/modules/networking/ - VPC, subnets, Cloud NAT
  • opentofu/modules/firewall/ - Firewall rules
  • opentofu/modules/secrets/ - Secret Manager secrets

State Management:

Backend: GCS
Bucket: gs://coditect-terraform-state
Lock: Cloud Storage object locking
Encryption: Google-managed encryption keys

Deployment:

cd opentofu/environments/dev
tofu init
tofu plan -out=plan.tfplan
tofu apply plan.tfplan


Document Classification: Internal - Infrastructure Documentation
Review Cycle: Monthly or upon infrastructure changes
Next Review Date: 2025-12-23


Owner: Platform Engineering Team
Status: Production Ready (Development Environment)