GCP Infrastructure Topology - Complete Resource Inventory

Type: Infrastructure Diagram
Scope: All GCP resources across regions and services
Purpose: Understand complete infrastructure layout and dependencies
Last Updated: November 23, 2025

Overview

Complete inventory of all Google Cloud Platform resources provisioned for CODITECT cloud infrastructure, organized by region, service category, and deployment environment.

Environments:

Development: coditect-dev - Full infrastructure at reduced scale
Staging: coditect-staging (Planned) - Production-equivalent for testing
Production: coditect-prod (Planned) - Full-scale production deployment

Regions:

Primary: us-central1 (Iowa) - All development and future production workloads
DR/Backup: us-east1 (South Carolina) - Planned disaster recovery region

Infrastructure Topology Diagram

Resource Inventory by Service

Compute (GKE)

Resource	Name	Configuration	Location	Status
GKE Cluster	`coditect-dev`	Kubernetes 1.28, 3-10 nodes	us-central1	✅ Active
Node Pool	`default-pool`	n1-standard-2, Preemptible	Multi-zone	✅ Active
Deployment	`license-api`	FastAPI, 3-10 replicas	GKE namespace: default	✅ Active
Service	`license-api`	ClusterIP, Port 8000	GKE	✅ Active
Ingress	`license-api`	NGINX Ingress, HTTPS	GKE	✅ Active

Resource Utilization:

CPU: 50% average across nodes
Memory: 60% average across nodes
Pods: 10-15 total (API + Ingress + system pods)
Auto-scaling: HPA targets 70% CPU, scales 3-10 replicas

Database (Cloud SQL)

Resource	Name	Configuration	Location	Status
Cloud SQL Instance	`coditect-dev`	PostgreSQL 16, db-custom-2-7680	us-central1	✅ Active
Database	`coditect_prod`	License, user, tenant tables	Cloud SQL	✅ Active
User	`app_user`	Read/write permissions	Cloud SQL	✅ Active
User	`readonly_user`	Read-only for analytics	Cloud SQL	✅ Active
Backup Schedule	Daily + PITR	7-day retention	us-central1 + us-east1	✅ Active

Storage:

Current Size: 15GB / 100GB SSD
Auto-resize: Enabled (max 200GB)
Backup Size: 2GB (compressed)

Cache (Redis Memorystore)

Resource	Name	Configuration	Location	Status
Redis Instance	`coditect-dev-redis`	Redis 7.0, BASIC tier, 6GB	us-central1-a	✅ Active
Auth	Redis AUTH token	Stored in Secret Manager	Global	✅ Active
Persistence	RDB snapshots	Every 15 minutes	us-central1	✅ Active

Usage:

Memory: 1.2GB / 6GB (20%)
Operations: 5,000 ops/sec average
Hit Rate: 92%

Networking

Resource	Name	Configuration	Location	Status
VPC	`coditect-dev-vpc`	Custom mode, 10.0.0.0/16	Global	✅ Active
Subnet	`gke-nodes`	10.0.0.0/20 (4,096 IPs)	us-central1	✅ Active
Pod Range	`gke-pods`	10.1.0.0/16 (65,536 IPs)	us-central1	✅ Active
Service Range	`gke-services`	10.2.0.0/16 (65,536 IPs)	us-central1	✅ Active
Cloud Router	`coditect-dev-router`	BGP ASN 64512	us-central1	✅ Active
Cloud NAT	`coditect-dev-nat`	2 static IPs, 64 ports/VM	us-central1	✅ Active
Load Balancer	`license-api-lb`	HTTPS, Global Anycast	Global	✅ Active
Firewall Rules	5 rules	Allow health checks, HTTPS, internal; deny all	Global	✅ Active
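You can sanity-check the Cloud NAT row with some arithmetic. The sketch below assumes static port allocation with the 64-port per-VM minimum shown above, and 64,512 usable source ports per NAT IP address (ports 1024 to 65535). The function name is hypothetical.

```python
# Cloud NAT capacity sketch: how many VMs can the gateway serve?
# Assumptions: static allocation, a fixed minimum of ports per VM,
# and 64,512 usable source ports per NAT IP (ports 1024-65535).
PORTS_PER_NAT_IP = 64_512


def max_vms_served(nat_ips: int, min_ports_per_vm: int) -> int:
    """Number of VMs that can each be given min_ports_per_vm source ports."""
    if nat_ips <= 0 or min_ports_per_vm <= 0:
        raise ValueError("nat_ips and min_ports_per_vm must be positive")
    return (nat_ips * PORTS_PER_NAT_IP) // min_ports_per_vm


if __name__ == "__main__":
    capacity = max_vms_served(nat_ips=2, min_ports_per_vm=64)
    print(f"2 NAT IPs at 64 ports/VM serve up to {capacity} VMs")
```

With 2 static IPs this comes to 2,016 VMs, far more than the cluster's planned node count. The per-VM minimum also limits concurrency. 64 ports allow roughly 64 simultaneous connections from one VM to the same destination IP address and port. Revisit this figure if the API opens many connections to a single external endpoint such as Stripe.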

VPC Peering:

Cloud SQL: servicenetworking.googleapis.com (10.67.0.0/16)
Redis: redis.googleapis.com (10.121.0.0/16)
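The node, pod, service, and peering ranges must not overlap. A quick check with Python's standard `ipaddress` module can catch mistakes before they reach OpenTofu. The dictionary labels below are illustrative. The CIDR values are the ones listed in this section.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple

# CIDR ranges from the networking tables; the keys are illustrative labels.
RANGES: Dict[str, str] = {
    "gke-nodes": "10.0.0.0/20",
    "gke-pods": "10.1.0.0/16",
    "gke-services": "10.2.0.0/16",
    "cloudsql-peering": "10.67.0.0/16",
    "redis-peering": "10.121.0.0/16",
}


def range_sizes(ranges: Dict[str, str]) -> Dict[str, int]:
    """Total addresses in each range (before any provider reservations)."""
    return {name: ipaddress.ip_network(cidr).num_addresses
            for name, cidr in ranges.items()}


def find_overlaps(ranges: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of labels whose CIDR blocks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    return [(a, b) for a, b in combinations(nets, 2) if nets[a].overlaps(nets[b])]


if __name__ == "__main__":
    print(range_sizes(RANGES))
    print("overlaps:", find_overlaps(RANGES) or "none")
```

GCP reserves four addresses in each subnet's primary range. The `gke-nodes` subnet therefore offers 4,092 usable addresses rather than the 4,096 total shown in the table.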

Security

Resource	Name	Configuration	Location	Status
Cloud Armor Policy	`coditect-waf`	Rate limiting, geo-blocking, OWASP	Global	✅ Active
Cloud KMS Key Ring	`coditect-license-keys`	Container for license signing keys	us-central1	✅ Active
Crypto Key	`license-signing-key`	ASYMMETRIC_SIGN, RSA-4096, HSM-backed, manual rotation	us-central1	✅ Active
Secret Manager	9 secrets	Passwords, API keys, tokens	Global	✅ Active
Identity Platform	Firebase Auth	OAuth2 (Google, GitHub)	Global	✅ Active

Service Accounts:

license-api@coditect.iam.gserviceaccount.com - Workload Identity for API pods
github-actions@coditect.iam.gserviceaccount.com - CI/CD deployments
backup@coditect.iam.gserviceaccount.com - Backup operations

Monitoring & Logging

Resource	Name	Configuration	Location	Status
Managed Prometheus	GKE monitoring	15s scrape, 15-day retention	us-central1	✅ Active
Cloud Logging	Sink: BigQuery	Structured JSON, 30-day retention	Global	✅ Active
Log Router	Error logs → PagerDuty	Real-time alerting	Global	✅ Active
Cloud Trace	Distributed tracing	OpenTelemetry integration	Global	🟡 Planned
Grafana	Dashboard	License API performance	GKE	🟡 Planned

Alert Policies:

Error rate >2% for 5 minutes → PagerDuty
Latency p95 >200ms for 10 minutes → Email
CPU >80% for 15 minutes → Slack notification
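Each alert policy above fires only when its condition holds for a sustained period. The sketch below is a simplified model of that "for N minutes" behavior over timestamped samples, intended for reasoning about thresholds. It is not how Cloud Monitoring evaluates conditions internally. The `AlertRule` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AlertRule:
    name: str
    threshold: float   # the value must be strictly above this to count as a breach
    duration_s: int    # how long the breach must persist before alerting
    channel: str


# Restates the three policies listed above.
RULES = [
    AlertRule("error_rate", 0.02, 5 * 60, "PagerDuty"),
    AlertRule("latency_p95_ms", 200.0, 10 * 60, "Email"),
    AlertRule("cpu_utilization", 0.80, 15 * 60, "Slack"),
]


def breach_is_sustained(samples: List[Tuple[int, float]], rule: AlertRule) -> bool:
    """samples are (unix_seconds, value) pairs in time order. Returns True when
    the latest unbroken run of above-threshold samples spans duration_s."""
    run_start: Optional[int] = None
    for ts, value in samples:
        if value > rule.threshold:
            if run_start is None:
                run_start = ts
        else:
            run_start = None  # any healthy sample resets the clock
    return run_start is not None and samples[-1][0] - run_start >= rule.duration_s


if __name__ == "__main__":
    minute_samples = [(t, 0.03) for t in range(0, 301, 60)]  # 3% errors for 5 min
    print(breach_is_sustained(minute_samples, RULES[0]))      # True
```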

Storage

Resource	Name	Configuration	Location	Status
Backup Bucket	`gs://coditect-backups`	Multi-region (us), 90-day lifecycle	Multi-region	✅ Active
Static Assets	`gs://coditect-static`	CDN-enabled, public read	Multi-region	🟡 Planned
Terraform State	`gs://coditect-terraform-state`	Versioned, locked	us-central1	✅ Active

Storage Costs:

Backups: 5GB (compressed) = $0.10/month
Static assets: 2GB = $0.05/month (if used)
Terraform state: 100MB = $0.002/month

External Integrations

Stripe (Payments)

Integration Points:

Subscription Management: Create/update/cancel licenses
Webhook Events: customer.subscription.updated, invoice.payment_succeeded
Usage Metering: Report seat usage for overage billing
Checkout: Hosted payment pages for license purchases

Security:

Webhook signature verification (HMAC-SHA256)
API key stored in Secret Manager
Restricted API key scope (read licenses, write usage)
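Stripe signs each webhook with an HMAC-SHA256 over the string `{timestamp}.{raw_body}`, using the endpoint's signing secret. The result arrives in the `Stripe-Signature` header as `t=...,v1=...`. The official stripe Python library wraps this check in `stripe.Webhook.construct_event`. The standard-library sketch below performs the same check explicitly. Two things here are assumptions: the function name, and the 300-second tolerance (chosen to mirror the library default).

```python
import hashlib
import hmac
import time
from typing import List, Optional


def verify_stripe_signature(payload: bytes, sig_header: str, secret: str,
                            tolerance_s: int = 300,
                            now: Optional[int] = None) -> bool:
    """Check a Stripe-Signature header ("t=...,v1=...") against the raw body."""
    timestamp: Optional[str] = None
    signatures: List[str] = []
    for item in sig_header.split(","):
        key, _, value = item.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            signatures.append(value)
    if timestamp is None or not timestamp.isdigit() or not signatures:
        return False
    current = int(time.time()) if now is None else now
    if abs(current - int(timestamp)) > tolerance_s:
        return False  # too old (or too far in the future): possible replay
    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison against every v1 signature Stripe sent.
    return any(hmac.compare_digest(expected.encode(), sig.encode()) for sig in signatures)


if __name__ == "__main__":
    secret = "whsec_example"  # placeholder, not a real secret
    body = b'{"type": "invoice.payment_succeeded"}'
    ts = int(time.time())
    sig = hmac.new(secret.encode(), f"{ts}.".encode() + body, hashlib.sha256).hexdigest()
    print(verify_stripe_signature(body, f"t={ts},v1={sig}", secret))  # True
```

Verify against the raw request body before any JSON parsing. Re-serialized JSON will generally not match the signature.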

SendGrid (Email)

Integration Points:

Transactional Emails: License activation, expiration warnings
Templates: Branded email templates for notifications
Analytics: Open rate, click-through rate tracking

Email Types:

License activated (on purchase)
Seat limit warning (at 80% capacity)
Expiration notice (30 days, 7 days, 1 day before)
Seat released (on graceful shutdown)
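The sketch below shows one way to decide which of these emails is due on a given day. The function and the notice identifiers are hypothetical. Only the thresholds come from the list above: 80% seat usage, and 30, 7, and 1 days before expiration.

```python
from datetime import date
from typing import List

EXPIRATION_NOTICE_DAYS = (30, 7, 1)  # days before expiration, from the list above
SEAT_WARNING_PERCENT = 80            # warn at 80% seat capacity


def due_notifications(today: date, expires_on: date,
                      seats_used: int, seats_total: int) -> List[str]:
    """Return hypothetical notice identifiers that should be sent today."""
    notices: List[str] = []
    days_left = (expires_on - today).days
    if days_left in EXPIRATION_NOTICE_DAYS:
        notices.append(f"expiration_notice_{days_left}d")
    # Integer comparison avoids floating-point edge cases at exactly 80%.
    if seats_total > 0 and seats_used * 100 >= SEAT_WARNING_PERCENT * seats_total:
        notices.append("seat_limit_warning")
    return notices


if __name__ == "__main__":
    print(due_notifications(date(2025, 11, 23), date(2025, 11, 30), 8, 10))
```

In practice the seat warning also needs de-duplication, for example a sent flag per license. Otherwise it would be resent every day while usage stays at or above 80%.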

GitHub (CI/CD)

Integration Points:

Source Code: github.com/coditect-ai/coditect-cloud-infra
Container Registry: ghcr.io/coditect-ai/license-api
GitHub Actions: Build, test, deploy workflows
Secrets: GCP service account key, database credentials

Workflows:

test.yml - Run pytest on pull requests
deploy-dev.yml - Auto-deploy to dev on main branch push
deploy-prod.yml - Manual deployment to production (approval required)

Cost Breakdown (Monthly)

Development Environment

Service	Configuration	Monthly Cost
GKE	3x n1-standard-2 (preemptible)	$100
Cloud SQL	db-custom-2-7680, Regional HA	$150
Redis	6GB BASIC tier	$30
Networking	Load Balancer, Cloud NAT, egress	$20
KMS	HSM signing operations (~10K/month)	$10
Monitoring	Cloud Logging, Prometheus	$5
Storage	Backups, Terraform state	$1
Identity Platform	Free tier (up to 50K MAU)	$0
Secret Manager	9 secrets, 100K accesses/month	$0.50
Cloud Armor	1 policy, minimal rules	$5
Total		~$320/month

Production Environment (Estimated at 10,000 users)

Service	Configuration	Monthly Cost
GKE	10x n1-standard-4 (standard), Committed Use	$500
Cloud SQL	db-custom-8-30720, Regional HA	$600
Redis	16GB STANDARD HA tier	$150
Networking	Premium tier, higher egress	$100
KMS	HSM signing (~100K/month)	$50
Monitoring	Full observability stack	$50
Storage	Larger backups, CDN	$20
Identity Platform	10K MAU (within free tier)	$0
Secret Manager	Same secrets, higher access	$1
Cloud Armor	Advanced DDoS protection	$20
Total		~$1,500/month

Cost per User: $0.15/month per active user (at 10K users)
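You can reproduce the table totals and the per-user figure from the line items. The sketch uses `Decimal` to avoid floating-point rounding on currency. The dictionaries restate the two tables above.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict

# Line items restated from the two cost tables above (USD per month).
DEV_COSTS: Dict[str, Decimal] = {k: Decimal(v) for k, v in {
    "GKE": "100", "Cloud SQL": "150", "Redis": "30", "Networking": "20",
    "KMS": "10", "Monitoring": "5", "Storage": "1", "Identity Platform": "0",
    "Secret Manager": "0.50", "Cloud Armor": "5",
}.items()}
PROD_COSTS: Dict[str, Decimal] = {k: Decimal(v) for k, v in {
    "GKE": "500", "Cloud SQL": "600", "Redis": "150", "Networking": "100",
    "KMS": "50", "Monitoring": "50", "Storage": "20", "Identity Platform": "0",
    "Secret Manager": "1", "Cloud Armor": "20",
}.items()}


def monthly_total(costs: Dict[str, Decimal]) -> Decimal:
    """Sum of all line items."""
    return sum(costs.values(), Decimal("0"))


def cost_per_user(total: Decimal, active_users: int) -> Decimal:
    """Monthly cost per active user, rounded to whole cents."""
    return (total / active_users).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    dev, prod = monthly_total(DEV_COSTS), monthly_total(PROD_COSTS)
    print(dev, prod, cost_per_user(prod, 10_000))
```

The exact sums are $321.50 for development and $1,491 for production, which the tables round to ~$320 and ~$1,500. Spread across 10,000 users, $1,491 is about $0.149, or $0.15 per user per month.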

Scaling Strategy

Auto-Scaling Configuration

GKE Horizontal Pod Autoscaler (HPA):

Metric: CPU Utilization
Target: 70%
Min Replicas: 3
Max Replicas: 10
Scale Up: 1 pod every 30 seconds
Scale Down: 1 pod every 5 minutes
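The HPA picks a replica count using the documented Kubernetes formula desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). It skips any change within the default 10% tolerance and clamps the result to the min/max bounds. The sketch below applies that formula to the settings above, then models the one-pod-per-period rate limit. In a real manifest, those rate limits would be expressed through the autoscaling/v2 `behavior.scaleUp` and `behavior.scaleDown` policies. The function names are hypothetical.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_cpu_pct: float,
                         target_cpu_pct: float = 70.0, min_replicas: int = 3,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Replica count the HPA would request for the observed average CPU."""
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


def rate_limited_step(current: int, desired: int, max_change: int = 1) -> int:
    """Move toward desired by at most max_change pods (one scaling period)."""
    if desired > current:
        return min(desired, current + max_change)
    if desired < current:
        return max(desired, current - max_change)
    return current


if __name__ == "__main__":
    desired = hpa_desired_replicas(3, 90.0)  # 3 pods averaging 90% CPU
    print(desired, rate_limited_step(3, desired))
```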

GKE Cluster Autoscaler:

Min Nodes: 3
Max Nodes: 20
Scale Up: Add node if pods pending for 30 seconds
Scale Down: Remove node if <50% utilized for 10 minutes
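For the Cluster Autoscaler, "utilized" refers to resource requests, not live usage. Per the Cluster Autoscaler FAQ, a node becomes a scale-down candidate when its pods' summed CPU requests and summed memory requests are both below the threshold fraction of the node's allocatable capacity. Other conditions also apply, such as whether those pods can be rescheduled elsewhere. The sketch below computes that ratio. The allocatable figures in the example are hypothetical, not measured n1-standard-2 values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRequests:
    cpu_requested_m: int       # sum of pod CPU requests, millicores
    cpu_allocatable_m: int     # node allocatable CPU, millicores
    mem_requested_mib: int     # sum of pod memory requests, MiB
    mem_allocatable_mib: int   # node allocatable memory, MiB

    def utilization(self) -> float:
        """Larger of the CPU and memory request ratios."""
        return max(self.cpu_requested_m / self.cpu_allocatable_m,
                   self.mem_requested_mib / self.mem_allocatable_mib)


def below_scale_down_threshold(node: NodeRequests, threshold: float = 0.5) -> bool:
    """True if the node is under the threshold; the autoscaler also requires
    this to hold for the unneeded period (10 minutes here) before removal."""
    return node.utilization() < threshold


if __name__ == "__main__":
    node = NodeRequests(cpu_requested_m=700, cpu_allocatable_m=1930,
                        mem_requested_mib=2000, mem_allocatable_mib=5000)
    print(round(node.utilization(), 2), below_scale_down_threshold(node))
```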

Cloud SQL:

Manual vertical scaling (increase vCPU/RAM)
Read replicas for analytics queries (future)
Connection pooling (max 100 connections)
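Connection limits interact with the HPA. Each API replica keeps its own pool, so the per-pod pool size must fit the database limit at the maximum replica count, not the average. The minimal budget calculation below assumes the 100-connection cap is shared by all `license-api` pods. Reserved and other-client connections are hypothetical parameters.

```python
def per_pod_pool_size(max_connections: int, max_replicas: int,
                      reserved: int = 0, other_clients: int = 0) -> int:
    """Largest pool each replica can hold without exceeding max_connections
    when the deployment is fully scaled out."""
    available = max_connections - reserved - other_clients
    if max_replicas <= 0 or available < max_replicas:
        raise ValueError("not enough connections for one per replica")
    return available // max_replicas


if __name__ == "__main__":
    print(per_pod_pool_size(100, 10))                                  # 10
    print(per_pod_pool_size(100, 10, reserved=3, other_clients=5))     # 9
```

With the HPA ceiling of 10 replicas, each pod gets at most 10 connections. A rolling deployment that briefly runs extra pods would need additional headroom.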

Redis:

Upgrade to STANDARD HA tier, growing capacity from 16GB up to 100GB as load increases
Sharding by tenant_id for multi-tenant isolation
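Sharding by tenant_id needs a mapping that stays stable across processes and restarts. Python's built-in `hash()` is randomized per process for strings, so the hypothetical helper below hashes with SHA-256 instead. It also namespaces keys by tenant. Plain modulo placement remaps most tenants whenever the shard count changes. Consistent hashing would limit that movement if shards are added often.

```python
import hashlib


def shard_for_tenant(tenant_id: str, shard_count: int) -> int:
    """Map a tenant to a shard index in [0, shard_count), stable across runs."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def tenant_key(tenant_id: str, key: str) -> str:
    """Prefix keys with the tenant so tenants' keys never collide."""
    return f"tenant:{tenant_id}:{key}"


if __name__ == "__main__":
    for tenant in ("acme", "globex", "initech"):
        print(tenant, shard_for_tenant(tenant, 4), tenant_key(tenant, "seats"))
```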

Disaster Recovery Strategy

Backup Procedures

Cloud SQL:

Automated Daily Backups: 03:00 UTC, 7-day retention
Point-in-Time Recovery (PITR): 7-day window, 5-minute RPO
Transaction Logs: Replicated to us-east1 for geo-redundancy

Redis:

RDB Snapshots: Every 15 minutes, stored in Cloud Storage
Manual Snapshot: Before major schema changes
Recovery: Restore from snapshot (~5 minutes)

Failover Procedures

Scenario 1: Zone Failure (us-central1-a)

Detection: GKE node NotReady status
Action: Pods automatically rescheduled to us-central1-b or us-central1-c
Recovery Time: <2 minutes
Data Loss: None (stateless pods)

Scenario 2: Cloud SQL Failure

Detection: Regional HA detects primary failure
Action: Automatic failover to standby instance (same region)
Recovery Time: <60 seconds
Data Loss: None (synchronous replication)

Scenario 3: Complete Region Failure (us-central1)

Detection: Manual (all services unreachable)
Action: Provision new infrastructure in us-east1, restore from backups
Recovery Time: ~4 hours (manual DR process)
Data Loss: Up to 1 hour (backup lag)

Infrastructure as Code

OpenTofu Modules

Repository: github.com/coditect-ai/coditect-cloud-infra

Modules:

opentofu/modules/gke/ - GKE cluster and node pools
opentofu/modules/cloudsql/ - Cloud SQL PostgreSQL instance
opentofu/modules/redis/ - Cloud Memorystore Redis
opentofu/modules/networking/ - VPC, subnets, Cloud NAT
opentofu/modules/firewall/ - Firewall rules
opentofu/modules/secrets/ - Secret Manager secrets

State Management:

Backend: GCS
Bucket: gs://coditect-terraform-state
Lock: Built-in state locking of the gcs backend (a lock object stored alongside the state file)
Encryption: Google-managed encryption keys

Deployment:

```shell
# Run from the root of the coditect-cloud-infra repository
cd opentofu/environments/dev
tofu init
tofu plan -out=plan.tfplan
tofu apply plan.tfplan
```

Related Documents

C2: Container Diagram - Application architecture
Networking Components - VPC details
Blue/Green Deployment - Deployment strategy
Security Architecture - Defense-in-depth layers

Document Classification: Internal - Infrastructure Documentation
Review Cycle: Monthly or upon infrastructure changes
Next Review Date: 2025-12-23

Last Updated: November 23, 2025
Owner: Platform Engineering Team
Status: Production Ready (Development Environment)
