C2: Container Diagram - CODITECT Cloud Infrastructure
Level: Container (C4 Model Level 2) Scope: Internal Components of CODITECT Cloud Infrastructure Primary Audience: Technical Architects, Senior Developers, DevOps Engineers Last Updated: November 23, 2025
Overview
The Container Diagram shows the high-level technology choices in the CODITECT cloud infrastructure and how responsibilities are divided among its runtime containers.
Key Containers:
- Google Kubernetes Engine (GKE) cluster running FastAPI application
- Cloud SQL PostgreSQL for persistent data storage
- Cloud Memorystore Redis for session management
- Cloud Run for serverless background jobs
- Networking layer (VPC, Load Balancer, Cloud NAT)
- Security layer (Identity Platform, Cloud KMS, Secret Manager)
Container Diagram
Container Details
1. Google Cloud Load Balancer (Ingress)
Technology: Google Cloud HTTP(S) Load Balancer Purpose: SSL termination, DDoS protection, geographic routing Deployment: Global, multi-region (us-central1 primary)
Configuration:
Type: HTTPS Load Balancer
SSL Policy: TLS 1.3 only
Certificate: Let's Encrypt (auto-renewed)
Backend Service: NGINX Ingress Controller (GKE)
Health Check: /health endpoint (every 10s)
Session Affinity: Client IP (for WebSocket support)
Cloud Armor: Enabled (rate limiting, geo-blocking)
Responsibilities:
- Terminate SSL/TLS connections
- Distribute traffic across GKE ingress pods
- Protect against DDoS attacks (Cloud Armor)
- Rate limiting (100 req/min per IP)
- Geographic routing (future multi-region support)
Scalability:
- Auto-scales based on traffic
- Handles 1M+ req/sec globally
- 99.99% SLA for Premium Tier
2. GKE Cluster (Compute)
Technology: Google Kubernetes Engine (GKE) Purpose: Container orchestration for License API Deployment: Regional (us-central1), multi-zone
Configuration:
Cluster Name: coditect-dev
Kubernetes Version: 1.28 (auto-upgrade)
Node Count: 3-10 (auto-scaling)
Machine Type: n1-standard-2 (2 vCPU, 7.5GB RAM)
Node Pool: Preemptible (dev), Standard (prod)
Network: Custom VPC (10.0.0.0/16)
Pod CIDR: 10.1.0.0/16
Service CIDR: 10.2.0.0/16
Workload Identity: Enabled
Binary Authorization: Disabled (dev), Enabled (prod)
Responsibilities:
- Run License API pods (FastAPI)
- Auto-scaling based on CPU/memory
- Rolling updates with zero downtime
- Health monitoring and self-healing
- Workload Identity for GCP API access
Scalability:
- Horizontal Pod Autoscaler (HPA): 3-10 replicas
- Cluster Autoscaler: 3-20 nodes
- Handles 10,000 concurrent users at scale
3. License API Pods (Application)
Technology: FastAPI 0.104+ (Python 3.11) Purpose: License validation, session management, JWT generation Deployment: Kubernetes Deployment (3-10 replicas)
Container Specification:
Image: gcr.io/coditect/license-api:latest
Port: 8000 (HTTP)
CPU Request: 500m (0.5 vCPU)
CPU Limit: 2000m (2 vCPU)
Memory Request: 1GB
Memory Limit: 4GB
Liveness Probe: /health (every 30s)
Readiness Probe: /ready (every 10s)
Environment: Production
API Endpoints:
POST /api/v1/auth/register # User registration
POST /api/v1/auth/login # OAuth2 login
POST /api/v1/licenses/acquire # Acquire license seat
POST /api/v1/licenses/heartbeat # Extend session TTL
DELETE /api/v1/licenses/release # Release license seat
GET /api/v1/licenses/status # Check license status
GET /api/v1/analytics/usage # Usage analytics (admin)
Responsibilities:
- Validate license keys and hardware fingerprints
- Allocate/release seats atomically (Redis Lua scripts)
- Generate signed JWT tokens (Cloud KMS)
- Verify OAuth2 tokens (Identity Platform)
- Session heartbeat management (Redis TTL)
- Multi-tenant data isolation (PostgreSQL RLS)
Performance Characteristics:
- Request latency: <50ms p95 (local region)
- Throughput: 100 req/sec per pod
- Connection pooling: 10 DB connections per pod
- Async I/O: asyncio + aiohttp
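The pooling behaviour can be sketched without a database: the toy class below (hypothetical, stdlib-only) models the 10-connections-per-pod limit with an `asyncio.Semaphore`, which is the same back-pressure mechanism a real asyncpg or SQLAlchemy pool applies.

```python
import asyncio

class BoundedPool:
    """Toy stand-in for a per-pod DB connection pool (hypothetical; a real
    deployment would use e.g. asyncpg or SQLAlchemy pooling)."""

    def __init__(self, max_size: int = 10):
        self.max_size = max_size
        self._sem = asyncio.Semaphore(max_size)  # caps concurrent "connections"
        self.in_use = 0
        self.peak = 0

    async def query(self, delay: float = 0.001) -> None:
        async with self._sem:            # blocks when all 10 connections are busy
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
            await asyncio.sleep(delay)   # simulated query latency
            self.in_use -= 1

async def main() -> int:
    pool = BoundedPool(max_size=10)
    # 100 concurrent requests share 10 connections; excess requests queue
    await asyncio.gather(*(pool.query() for _ in range(100)))
    return pool.peak

peak = asyncio.run(main())
```

The semaphore is why a pod can accept far more concurrent requests than it holds database connections: requests above the pool size wait rather than opening new connections.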
4. Cloud SQL PostgreSQL (Database)
Technology: Cloud SQL PostgreSQL 16 Purpose: Persistent storage for licenses, tenants, users, audit logs Deployment: Regional HA (us-central1)
Configuration:
Instance Name: coditect-dev
Machine Type: db-custom-2-7680 (2 vCPU, 7.5GB RAM)
Storage: 100GB SSD (auto-resize to 200GB)
Version: PostgreSQL 16.1
Availability: Regional HA (auto-failover)
Private IP: 10.67.0.3 (VPC peering)
Public IP: Disabled
SSL/TLS: Required (self-signed CA)
Backup: Daily at 03:00 UTC
PITR: 7-day retention
Database Schema:
-- Multi-tenant architecture
CREATE TABLE tenants (
    id UUID PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE users (
    id UUID PRIMARY KEY,
    tenant_id UUID REFERENCES tenants(id),
    email VARCHAR(255) UNIQUE NOT NULL,
    oauth_provider VARCHAR(50),
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE licenses (
    id UUID PRIMARY KEY,
    tenant_id UUID REFERENCES tenants(id),
    license_key VARCHAR(64) UNIQUE NOT NULL,
    max_seats INTEGER NOT NULL,
    expires_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    hardware_fingerprints JSONB DEFAULT '[]'
);

CREATE TABLE audit_logs (
    id BIGSERIAL PRIMARY KEY,
    tenant_id UUID REFERENCES tenants(id),
    user_id UUID REFERENCES users(id),
    action VARCHAR(100) NOT NULL,
    details JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Row-Level Security for multi-tenant isolation
ALTER TABLE licenses ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON licenses
    USING (tenant_id = current_setting('app.current_tenant')::UUID);
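The policy only filters rows once the API tells PostgreSQL which tenant is active for the current request. A minimal sketch of that step, assuming asyncpg-style `$1` parameter binding (the helper name `tenant_scope_statement` is hypothetical):

```python
# Hypothetical helper: build the per-request statement that scopes RLS.
# With asyncpg this would run as:
#     await conn.execute(*tenant_scope_statement(tenant_id))
def tenant_scope_statement(tenant_id: str) -> tuple:
    # set_config(..., true) makes the setting transaction-local, so a single
    # pooled connection can safely serve requests from many tenants
    return ("SELECT set_config('app.current_tenant', $1, true)", tenant_id)

stmt, param = tenant_scope_statement("00000000-0000-0000-0000-000000000001")
```

The transaction-local flag is the important detail: without it, a tenant setting could leak between requests that share a pooled connection.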
Responsibilities:
- Store license records, tenant data, user accounts
- Enforce multi-tenant data isolation (RLS)
- Provide ACID guarantees for seat allocation
- Audit trail for compliance (SOC 2, GDPR)
- Backup and point-in-time recovery
Performance Characteristics:
- Max connections: 100 (configured)
- Query latency: <10ms p95
- Throughput: 500 QPS (current), 10K QPS (production)
- Automatic failover: <60 seconds
5. Cloud Memorystore Redis (Cache)
Technology: Cloud Memorystore Redis 7.0 Purpose: Session tracking, seat allocation, rate limiting, cache Deployment: Single-zone (BASIC tier, dev), Multi-zone (STANDARD HA, prod)
Configuration:
Instance Name: coditect-dev-redis
Version: Redis 7.0
Tier: BASIC (dev), STANDARD_HA (prod)
Memory: 6GB (dev), 16GB (prod)
Private IP: 10.121.42.67:6378
Network: VPC peering
Auth: Enabled (AUTH command)
Transit Encryption: SERVER_AUTHENTICATION
Persistence: RDB snapshots (automatic)
Data Structures:
# Active session tracking
SET license:{license_id}:session:{hardware_fp} "active" EX 360 # 6-minute TTL
# Seat allocation counter (atomic increment/decrement)
HSET license:{license_id}:seats count 0 max_seats 10
# Rate limiting (sliding window)
ZADD rate_limit:{user_id} {timestamp} {request_id}
ZREMRANGEBYSCORE rate_limit:{user_id} 0 {timestamp - 60}
# Cached license data (reduce DB queries)
HSET license:{license_id}:cache license_key "..." expires_at "..."
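The sliding-window rate limit above can be illustrated without Redis. This stdlib-only sketch (the `SlidingWindowLimiter` class is hypothetical; production uses Redis so all pods share one window) mirrors the ZREMRANGEBYSCORE-then-ZADD pattern:

```python
import time
from typing import Optional

class SlidingWindowLimiter:
    """In-process model of the Redis ZADD/ZREMRANGEBYSCORE pattern."""

    def __init__(self, limit: int = 100, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.events = {}  # user_id -> list of event timestamps

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        window = self.events.setdefault(user_id, [])
        # ZREMRANGEBYSCORE equivalent: drop events older than the window
        window[:] = [t for t in window if t > now - self.window_s]
        if len(window) >= self.limit:
            return False          # over the limit, reject
        window.append(now)        # ZADD equivalent: record this request
        return True

limiter = SlidingWindowLimiter(limit=3, window_s=60.0)
results = [limiter.allow("u1", now=t) for t in (0, 1, 2, 3)]  # 4th is denied
later = limiter.allow("u1", now=70.0)                          # window has slid
```

Because old entries are pruned by score (timestamp), the limit is continuous rather than resetting at fixed minute boundaries.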
Lua Scripts (Atomic Operations):
-- Atomic seat allocation
local key = KEYS[1]
local max_seats = tonumber(ARGV[1])
local current = tonumber(redis.call('HGET', key, 'count') or 0)
if current < max_seats then
    redis.call('HINCRBY', key, 'count', 1)
    return 1  -- Success
else
    return 0  -- No seats available
end
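Why a Lua script rather than separate HGET/HINCRBY calls: the check and the increment must be one indivisible step, or two racing requests could both observe a free seat and oversell. The model below (hypothetical, stdlib-only) reproduces those semantics with a lock standing in for Redis's single-threaded script execution:

```python
import threading

class SeatCounter:
    """Models the Lua script's check-and-increment; the lock plays the role
    of Redis executing the whole script atomically (hypothetical sketch)."""

    def __init__(self, max_seats: int):
        self.max_seats = max_seats
        self.count = 0
        self._lock = threading.Lock()

    def acquire(self) -> int:
        with self._lock:             # check + increment is one atomic step
            if self.count < self.max_seats:
                self.count += 1
                return 1             # success, seat allocated
            return 0                 # no seats available

    def release(self) -> None:
        with self._lock:
            self.count = max(0, self.count - 1)

seats = SeatCounter(max_seats=10)
threads = [threading.Thread(target=seats.acquire) for _ in range(25)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# 25 clients raced for 10 seats; the counter never exceeds max_seats
```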
Responsibilities:
- Track active sessions with TTL-based expiration
- Atomic seat allocation/deallocation (Lua scripts)
- Rate limiting for API endpoints
- Cache frequently accessed license data
- Heartbeat tracking (update TTL every 5 minutes)
Performance Characteristics:
- Latency: <1ms p95 (in-region)
- Throughput: 50K ops/sec (BASIC), 500K ops/sec (STANDARD HA)
- Persistence: RDB snapshots every 15 minutes
- Automatic failover: <30 seconds (STANDARD HA)
6. Identity Platform (Authentication)
Technology: Google Identity Platform (Firebase Auth) Purpose: OAuth2 authentication, JWT validation, user management Deployment: Managed service (global)
Configuration:
Providers:
- Google OAuth2
- GitHub OAuth2
- Email/Password (disabled for now)
Token Settings:
Algorithm: RS256
Expiration: 1 hour (access token), 7 days (refresh token)
Multi-Factor Auth:
Status: Optional (future)
Methods: TOTP, SMS
Responsibilities:
- User registration and authentication
- OAuth2 token generation and validation
- JWT signature verification (RS256)
- User session management
- Multi-factor authentication (future)
Integration:
# FastAPI dependency injection (firebase_admin verifies Identity Platform tokens)
from fastapi import Depends, HTTPException
from fastapi.security import OAuth2PasswordBearer
from firebase_admin import auth

oauth2_scheme = OAuth2PasswordBearer(tokenUrl="api/v1/auth/login")

async def verify_token(token: str = Depends(oauth2_scheme)) -> dict:
    try:
        # Checks the RS256 signature, expiry, and audience for the project
        return auth.verify_id_token(token)
    except auth.InvalidIdTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")
7. Cloud KMS (License Signing)
Technology: Google Cloud Key Management Service Purpose: RSA-4096 asymmetric key signing for license tokens Deployment: Regional (us-central1), HSM-backed
Configuration:
Key Ring: coditect-license-keys
Key Name: license-signing-key
Algorithm: RSA_SIGN_PKCS1_4096_SHA256
Purpose: ASYMMETRIC_SIGN
Protection Level: HSM (hardware security module)
Rotation: Manual (future: automatic annual rotation)
Usage:
# Sign license token (assumes the google-cloud-kms client library)
import base64
import hashlib
import json

from google.cloud.kms import KeyManagementServiceAsyncClient

async def sign_license(license_data: dict) -> str:
    kms_client = KeyManagementServiceAsyncClient()
    key_name = "projects/.../keyRings/.../cryptoKeys/.../cryptoKeyVersions/1"
    # Canonical serialization: sort_keys yields the same digest at verification
    message = json.dumps(license_data, sort_keys=True).encode()
    digest = hashlib.sha256(message).digest()
    # Sign the digest with the HSM-backed private key in Cloud KMS
    response = await kms_client.asymmetric_sign(
        request={"name": key_name, "digest": {"sha256": digest}}
    )
    # Return a base64-encoded signature for embedding in the license token
    return base64.b64encode(response.signature).decode()
Responsibilities:
- Sign license tokens with RSA-4096 private key
- Provide public key for offline verification
- Key rotation and versioning
- Audit trail for all signing operations
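Offline verification only works if the client recomputes exactly the bytes that were signed; the `sort_keys` canonical JSON serialization used at signing time is what makes that possible. A small stdlib sketch (the `license_digest` helper is hypothetical):

```python
import hashlib
import json

def license_digest(license_data: dict) -> bytes:
    """Canonical digest: identical bytes regardless of dict insertion order,
    so client-side verification recomputes what Cloud KMS actually signed."""
    message = json.dumps(license_data, sort_keys=True).encode()
    return hashlib.sha256(message).digest()

# Same fields in a different order still hash to the same digest
a = license_digest({"license_key": "ABC", "max_seats": 10})
b = license_digest({"max_seats": 10, "license_key": "ABC"})
```

If serialization were not canonical, a byte-for-byte identical digest could not be reproduced on the client, and every signature check would fail.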
8. Secret Manager (Secrets)
Technology: Google Cloud Secret Manager Purpose: Secure storage for API keys, credentials, tokens Deployment: Global (replicated)
Secrets Inventory:
Secrets (9 total):
1. db-password # Cloud SQL root password
2. db-app-user-password # Application database user
3. db-readonly-password # Read-only user for analytics
4. redis-auth-token # Redis AUTH password
5. django-secret-key # Django SECRET_KEY
6. stripe-api-key # Stripe API key
7. stripe-webhook-secret # Stripe webhook signature
8. jwt-secret-key # JWT signing key (backup)
9. sendgrid-api-key # SendGrid email API key
Access Control:
# Workload Identity binding
Service Account: license-api@coditect.iam.gserviceaccount.com
Role: roles/secretmanager.secretAccessor
Secrets: All (scoped by IAM policy)
Usage:
# Fetch secret at startup (assumes the google-cloud-secret-manager library)
import os

from google.cloud.secretmanager import SecretManagerServiceAsyncClient

async def get_secret(secret_id: str) -> str:
    client = SecretManagerServiceAsyncClient()
    name = f"projects/{PROJECT_ID}/secrets/{secret_id}/versions/latest"
    response = await client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")

# Inject into the environment during application startup (await is only
# valid inside an async function, e.g. a FastAPI lifespan handler)
async def load_secrets() -> None:
    os.environ["DATABASE_PASSWORD"] = await get_secret("db-password")
9. VPC Network (Networking)
Technology: Google Cloud VPC (Virtual Private Cloud) Purpose: Network isolation, private communication, security Deployment: Regional (us-central1)
Configuration:
VPC Name: coditect-dev-vpc
CIDR Ranges:
Primary Subnet: 10.0.0.0/16 (GKE nodes)
Pods Secondary: 10.1.0.0/16 (Kubernetes pods)
Services Secondary: 10.2.0.0/16 (Kubernetes services)
Private Google Access: Enabled
Flow Logs: Enabled (5-second interval, 100% sampling)
Routing Mode: Regional
MTU: 1460 (default)
Firewall Rules:
allow-health-checks:
Source: 35.191.0.0/16, 130.211.0.0/22 (Google health checks)
Target: gke-node
Ports: TCP 8000, 8080
allow-ingress-https:
Source: 0.0.0.0/0
Target: load-balancer
Ports: TCP 443
deny-all-ingress:
Priority: 65535 (lowest)
Action: Deny
VPC Peering:
- Cloud SQL: Private service connection (10.67.0.0/16)
- Redis: Private VPC peering (10.121.0.0/16)
10. Cloud NAT (Egress)
Technology: Google Cloud NAT Purpose: Egress traffic for private GKE nodes Deployment: Regional (us-central1)
Configuration:
NAT Name: coditect-dev-nat
Router: coditect-dev-router
NAT IP Allocation: AUTO_ONLY
Source Subnetworks: ALL_SUBNETWORKS_ALL_IP_RANGES
Min Ports per VM: 64
Logging: Enabled (ALL filter)
TCP Timeouts:
Established Idle: 1200s (20 minutes)
Transitory Idle: 30s
Time Wait: 120s
Responsibilities:
- Provide outbound internet access for private GKE nodes
- Static IP allocation for egress (rate limiting by external APIs)
- Logging and monitoring of egress traffic
11. Monitoring & Observability
Prometheus + Grafana:
Prometheus:
Deployment: Managed Prometheus (GKE)
Scrape Interval: 15 seconds
Retention: 15 days
Grafana:
Deployment: GKE pod (StatefulSet)
Data Source: Prometheus, Cloud SQL
Dashboards:
- License API performance (latency, throughput, errors)
- Database health (connections, query time, cache hit rate)
- Redis performance (ops/sec, memory usage, evictions)
- Seat utilization (per tenant, per license)
Cloud Logging:
Log Router:
- API request/response logs → BigQuery (analytics)
- Error logs → PagerDuty (alerting)
- Audit logs → Cloud Storage (compliance)
Log Format: Structured JSON
Retention: 30 days (logs), 7 years (audit)
Technology Decision Rationale
Why FastAPI over Django?
Decision: Use FastAPI for License API instead of Django REST Framework.
Rationale:
- Performance: Async/await support (3x faster than Django)
- Type Safety: Pydantic models prevent runtime errors
- Documentation: Auto-generated OpenAPI/Swagger docs
- Lightweight: Minimal overhead for microservices
- Modern: Built for Python 3.11+ with type hints
Trade-offs:
- Smaller ecosystem than Django
- Less built-in functionality (need to implement auth, admin)
- Fewer third-party packages
Why Cloud SQL over Self-Managed PostgreSQL?
Decision: Use Cloud SQL PostgreSQL instead of self-managed on GKE.
Rationale:
- Reliability: 99.95% SLA with automatic failover
- Backups: Automated daily backups + PITR (7 days)
- Maintenance: Automatic minor version upgrades
- Security: Encryption at rest, private IP only
- Cost: Comparable to self-managed Compute Engine + persistent disk + ops overhead
Trade-offs:
- Limited PostgreSQL extensions (no Citus)
- No direct filesystem access
- Higher cost than self-managed
Why Redis over Memcached?
Decision: Use Redis instead of Memcached for session storage.
Rationale:
- Data Structures: Supports hash, set, sorted set (not just key-value)
- Persistence: RDB snapshots prevent data loss on restart
- Atomic Operations: Lua scripting for seat allocation
- TTL: Built-in expiration for session management
- Pub/Sub: Future support for real-time notifications
Trade-offs:
- Single-threaded (lower throughput than Memcached)
- Higher memory overhead
- More complex to operate
Data Flow Examples
License Acquisition Flow
1. User starts CODITECT application
2. License Client SDK → Load Balancer (HTTPS POST /api/v1/licenses/acquire)
3. Load Balancer → NGINX Ingress Controller
4. Ingress → License API pod (FastAPI)
5. API validates JWT (Identity Platform)
6. API checks license in PostgreSQL (SELECT * FROM licenses WHERE license_key = ?)
7. API atomically allocates seat in Redis (Lua script HINCRBY)
8. API signs license token with Cloud KMS (RSA-4096)
9. API stores session in Redis (SET with 6-minute TTL)
10. API logs audit event (INSERT INTO audit_logs)
11. Return signed JWT to SDK
12. CODITECT stores JWT locally for offline mode
Total Latency: ~200ms (p95)
Heartbeat Flow
1. Background thread wakes every 5 minutes
2. License Client SDK → Load Balancer (HTTPS POST /api/v1/licenses/heartbeat)
3. Load Balancer → License API pod
4. API validates JWT signature (local verification, no KMS call)
5. API updates Redis TTL (EXPIRE license:{id}:session:{fp} 360)
6. Return 200 OK
Total Latency: ~20ms (p95)
Seat Release (Graceful Shutdown)
1. User closes CODITECT application
2. SDK → Load Balancer (HTTPS DELETE /api/v1/licenses/release)
3. Load Balancer → License API pod
4. API deletes Redis session (DEL license:{id}:session:{fp})
5. API decrements seat counter (HINCRBY license:{id}:seats count -1)
6. API logs audit event (INSERT INTO audit_logs)
7. Return 200 OK
Total Latency: ~50ms (p95)
Automatic Seat Release (Zombie Session Cleanup)
1. User crashes / network disconnects
2. Heartbeat stops sending
3. Redis TTL expires after 6 minutes (EXPIRE)
4. Seat automatically released (no API call needed)
5. Next heartbeat attempt returns 404 (session not found)
6. SDK re-acquires license on next startup
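The three session flows above reduce to one TTL state machine. This fake-clock sketch (the `SessionStore` class is hypothetical; Redis holds the real state) shows acquire, heartbeat refresh, and silent expiry:

```python
class SessionStore:
    """Fake-clock model of the Redis TTL lifecycle: acquire sets a 360s TTL,
    each heartbeat refreshes it, and silence lets it expire (hypothetical)."""

    TTL = 360  # seconds, matching the 6-minute Redis EX value

    def __init__(self):
        self.expiry = {}  # session key -> absolute expiry time

    def acquire(self, session: str, now: float) -> None:
        self.expiry[session] = now + self.TTL        # SET ... EX 360

    def heartbeat(self, session: str, now: float) -> bool:
        if session not in self.expiry or now >= self.expiry[session]:
            self.expiry.pop(session, None)
            return False                              # 404: SDK must re-acquire
        self.expiry[session] = now + self.TTL         # EXPIRE equivalent
        return True

store = SessionStore()
store.acquire("lic1:fp1", now=0)
alive = store.heartbeat("lic1:fp1", now=300)   # 5-minute heartbeat: refreshed
# client crashes; no heartbeat arrives for more than 6 minutes
dead = store.heartbeat("lic1:fp1", now=700)    # past 660s expiry: gone
```

The 5-minute heartbeat against a 6-minute TTL gives one minute of slack for network jitter before a live session is mistaken for a zombie.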
Scalability Analysis
Current Capacity (Development)
| Component | Configuration | Max Throughput |
|---|---|---|
| GKE Cluster | 3 nodes x 2 vCPU | ~100 concurrent users |
| License API | 3 pods x 500m CPU | ~300 req/sec |
| Cloud SQL | 2 vCPU, 100 connections | ~500 QPS |
| Redis | 6GB BASIC | ~50K ops/sec |
| Load Balancer | Auto-scaling | ~1M req/sec |
Production Target (10,000 users)
| Component | Configuration | Max Throughput |
|---|---|---|
| GKE Cluster | 10 nodes x 4 vCPU | ~10,000 concurrent users |
| License API | 10 pods x 2000m CPU | ~1,000 req/sec |
| Cloud SQL | 8 vCPU, 1000 connections | ~10K QPS |
| Redis | 16GB STANDARD HA | ~500K ops/sec |
| Load Balancer | Auto-scaling | ~1M req/sec |
Bottleneck Analysis:
- Current Bottleneck: Cloud SQL (100 connections)
- Mitigation: Connection pooling (10 per pod), read replicas
- Future Bottleneck: Redis (seat allocation contention)
- Mitigation: Sharding by tenant_id, Redis Cluster
Security Architecture
Defense in Depth
Layer 1: Network (VPC Firewall)
- Deny all ingress except HTTPS (443)
- Private GKE cluster (no public node IPs)
- Cloud NAT for egress (controlled IP ranges)
Layer 2: Application (API Gateway)
- Rate limiting (100 req/min per IP)
- Cloud Armor WAF (SQL injection, XSS prevention)
- HTTPS-only (TLS 1.3)
Layer 3: Authentication (Identity Platform)
- OAuth2 with JWT tokens (RS256)
- Hardware fingerprinting (device binding)
- Session expiration (1-hour access token, 7-day refresh)
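Hardware fingerprinting for device binding can be as simple as hashing stable machine attributes. A stdlib sketch (the function and its input attributes are hypothetical; the SDK's real attribute set is not specified here):

```python
import hashlib

def hardware_fingerprint(machine_id: str, mac: str, disk_serial: str) -> str:
    """Hypothetical fingerprint: hash stable machine attributes so a seat
    binds to one device without storing raw identifiers server-side."""
    raw = "|".join((machine_id, mac, disk_serial)).encode()
    return hashlib.sha256(raw).hexdigest()[:32]

fp1 = hardware_fingerprint("m-01", "aa:bb:cc:dd:ee:ff", "S123")
fp2 = hardware_fingerprint("m-01", "aa:bb:cc:dd:ee:ff", "S123")  # same device
fp3 = hardware_fingerprint("m-02", "aa:bb:cc:dd:ee:ff", "S123")  # different
```

Hashing keeps the fingerprint deterministic per device while revealing nothing about the underlying identifiers if the database leaks.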
Layer 4: Authorization (RBAC)
- Tenant-based isolation (PostgreSQL RLS)
- Role-based access control (admin, user, readonly)
- Workload Identity for GCP API access
Layer 5: Data (Encryption)
- Encryption at rest (Cloud SQL, Redis)
- Encryption in transit (TLS 1.3)
- Cloud KMS for key management (HSM-backed)
Cost Optimization Strategies
Current Costs (Development)
| Component | Monthly Cost | Optimization |
|---|---|---|
| GKE (3x preemptible) | $100 | Use preemptible nodes |
| Cloud SQL (Regional HA) | $150 | Committed use discount (57%) |
| Redis (6GB BASIC) | $30 | BASIC tier (no HA) |
| Networking | $20 | Cloud NAT (minimal egress) |
| Total | $300/month | $3,600/year |
Production Costs (10K users)
| Component | Monthly Cost | Optimization |
|---|---|---|
| GKE (10x standard) | $500 | 3-year committed use |
| Cloud SQL (8 vCPU HA) | $400 | Read replicas for analytics |
| Redis (16GB HA) | $150 | STANDARD HA for failover |
| Cloud KMS | $10 | Pay-per-use |
| Identity Platform | $50 | Free tier up to 50K MAU |
| Load Balancer | $50 | Premium tier for SLA |
| Monitoring | $40 | Managed Prometheus |
| Total | $1,200/month | $14,400/year |
Cost per User: $0.12/month per active user (at 10K users)
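Summing the itemized rows confirms the per-user figure (plain arithmetic over the table values above):

```python
# Monthly costs from the development and production tables
dev = {"gke": 100, "cloud_sql": 150, "redis": 30, "networking": 20}
prod = {"gke": 500, "cloud_sql": 400, "redis": 150, "kms": 10,
        "identity": 50, "lb": 50, "monitoring": 40}

dev_monthly = sum(dev.values())         # development total per month
prod_monthly = sum(prod.values())       # production total per month
cost_per_user = prod_monthly / 10_000   # per active user at 10K users
```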
Deployment Strategy
Blue/Green Deployment
# Blue environment (current production)
Namespace: coditect-prod-blue
Deployment: license-api-blue (version 1.2.3)
Service: license-api-blue
Ingress: Routes 100% traffic to blue
# Green environment (new version)
Namespace: coditect-prod-green
Deployment: license-api-green (version 1.3.0)
Service: license-api-green
Ingress: Routes 0% traffic initially
# Gradual rollout
1. Deploy green (0% traffic)
2. Health checks pass
3. Route 10% traffic to green (canary)
4. Monitor metrics for 30 minutes
5. Route 50% traffic to green
6. Route 100% traffic to green
7. Decommission blue after 24 hours
Rolling Update Strategy
Deployment Strategy: RollingUpdate
Max Surge: 1 (add 1 extra pod during update)
Max Unavailable: 0 (zero-downtime deployment)
Update Process:
1. Create new pod (version 1.3.0)
2. Wait for readiness probe (30s)
3. Add to service load balancer
4. Remove old pod (version 1.2.3)
5. Repeat until all pods updated
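The maxSurge/maxUnavailable invariant can be checked in a few lines: the simulation below (a hypothetical model, not Kubernetes API code) shows that with `maxSurge: 1` and `maxUnavailable: 0`, the total ready replica count never drops below the desired count.

```python
def rolling_update(replicas: int):
    """Simulate the RollingUpdate steps above: yield (ready_old, ready_new)
    after each step, surging one new pod before removing one old pod."""
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0:
        new += 1                   # maxSurge: 1 — add one new pod, wait ready
        history.append((old, new))
        old -= 1                   # maxUnavailable: 0 — only now remove old
        history.append((old, new))
    return history

history = rolling_update(replicas=3)
min_available = min(o + n for o, n in history)  # never below desired count
```

Surging before removing is what buys zero downtime; the price is one extra pod's worth of capacity during the rollout.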
Disaster Recovery
Backup Strategy
Cloud SQL:
- Automated daily backups (03:00 UTC)
- Point-in-time recovery (7-day retention)
- Transaction log backups (every 5 minutes)
- Backup location: us-central1 + us-east1 (geo-redundant)
Redis:
- RDB snapshots every 15 minutes
- AOF persistence (future, for STANDARD HA)
- Manual snapshot before major changes
Secrets:
- Secret Manager automatic replication (global)
- Manual backup to encrypted GCS bucket (quarterly)
Recovery Procedures
Scenario 1: Pod Failure
- Detection: Liveness probe fails (30s)
- Action: Kubernetes auto-restarts pod
- Recovery Time: <60 seconds
- Data Loss: None (stateless pods)
Scenario 2: Node Failure
- Detection: Node NotReady (60s)
- Action: Pods rescheduled to healthy nodes
- Recovery Time: <2 minutes
- Data Loss: None (stateful data in Cloud SQL/Redis)
Scenario 3: Cloud SQL Failover
- Detection: Regional HA detects primary failure
- Action: Automatic failover to standby
- Recovery Time: <60 seconds
- Data Loss: None (synchronous replication)
Scenario 4: Redis Failure (BASIC tier)
- Detection: Connection timeout
- Action: Manual intervention (restart instance)
- Recovery Time: ~5 minutes
- Data Loss: Up to 15 minutes (RDB snapshot interval)
Scenario 5: Complete Region Failure
- Detection: Manual (monitoring alerts)
- Action: Failover to disaster recovery region (future)
- Recovery Time: ~4 hours (manual DR)
- Data Loss: Up to 1 hour (backup lag)
Related Diagrams
- C1: System Context Diagram - External view and integrations
- C3: GKE Component Diagram - Kubernetes internals
- C3: Networking Components - VPC and firewall details
- C3: Security Components - Authentication and encryption
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-11-23 | SDD Architect | Initial C2 Container diagram with all components |
Document Classification: Internal - Architecture Documentation Review Cycle: Quarterly (or upon infrastructure changes) Next Review Date: 2026-02-23