C3-04: GKE Components - Container Architecture
Document Type: C4 Level 3 (Component) Diagram
Container: Google Kubernetes Engine (GKE)
Technology: GKE 1.28+, Kubernetes, Django REST Framework, Gunicorn
Status: Specification Complete - Ready for Implementation
Last Updated: November 30, 2025
Table of Contents
- Overview
- Component Diagram
- GKE Cluster Architecture
- Kubernetes Resources
- Deployment Configuration
- Service Configuration
- Ingress and Load Balancing
- Auto-Scaling Configuration
- Configuration Management
- Secrets Management
- Monitoring and Logging
- Production Deployment
Overview
Purpose
This document specifies the component-level architecture of the Google Kubernetes Engine (GKE) cluster hosting the CODITECT License Management Platform. It provides:
- Complete GKE cluster configuration (node pools, networking)
- Kubernetes resource specifications (Deployments, Services, Ingress)
- Django REST Framework pod architecture
- Auto-scaling and high-availability patterns
- Production-ready monitoring and logging integration
GKE Cluster Role
The GKE cluster serves as the container orchestration platform for:
- Django REST Framework license API (primary workload)
- Celery background workers (heartbeat cleanup, session management)
- Redis client connections (Memorystore, with connection pooling)
- PostgreSQL client connections (Cloud SQL, via the Cloud SQL Proxy sidecar)
- Monitoring and logging agents (Prometheus, Fluent Bit)
Key Features:
- High Availability: Multi-zone deployment with automatic failover
- Auto-Scaling: Horizontal pod autoscaling based on CPU/memory
- Zero-Downtime Deployments: Rolling updates with health checks
- Resource Efficiency: Preemptible nodes for cost optimization (dev)
- Security: Private cluster with workload identity
Architecture Pattern
```
Internet
   ↓
Cloud Load Balancer (HTTPS/TLS 1.3)
   ↓
GKE Ingress (GCE ingress controller)
   ↓
Kubernetes Service (ClusterIP)
   ↓
Django REST Framework Pods (3 replicas)
   ├─► Cloud SQL Proxy (PostgreSQL)
   ├─► Redis Client (Memorystore)
   ├─► Cloud KMS Client (signing)
   └─► Identity Platform (authentication)
```
Component Diagram
GKE Internal Components
GKE Cluster Architecture
Cluster Configuration
File: opentofu/modules/gke/main.tf
```hcl
/**
 * GKE Cluster Configuration
 *
 * Features:
 * - Regional cluster (multi-zone HA)
 * - Private cluster (no public IPs on nodes)
 * - Workload Identity (secure GCP service access)
 * - Binary Authorization (image security)
 * - Auto-scaling enabled
 */
resource "google_container_cluster" "primary" {
  name     = "${var.environment}-gke-cluster"
  location = var.region # Regional = multi-zone

  # Remove default node pool (we'll create custom pools)
  remove_default_node_pool = true
  initial_node_count       = 1

  # Network configuration
  network    = var.vpc_network
  subnetwork = var.gke_subnet

  # Private cluster configuration
  private_cluster_config {
    enable_private_nodes    = true  # Nodes have private IPs only
    enable_private_endpoint = false # API endpoint is public
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # IP allocation for pods and services
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Master authorized networks (who can access the API)
  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "0.0.0.0/0"
      display_name = "All networks (development only)"
      # Production: restrict to office IPs + CI/CD ranges
    }
  }

  # Workload Identity (secure service account binding)
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Binary Authorization (only signed images)
  binary_authorization {
    evaluation_mode = "PROJECT_SINGLETON_POLICY_ENFORCE"
  }

  # Addons
  addons_config {
    http_load_balancing {
      disabled = false # Enable Ingress
    }
    horizontal_pod_autoscaling {
      disabled = false # Enable HPA
    }
    network_policy_config {
      disabled = false # Enable NetworkPolicy
    }
  }

  # Monitoring and logging
  monitoring_config {
    enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
  }
  logging_config {
    enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
  }

  # Maintenance window
  maintenance_policy {
    daily_maintenance_window {
      start_time = "03:00" # 3:00 AM UTC
    }
  }

  # Resource labels
  resource_labels = {
    environment = var.environment
    project     = "coditect"
    managed_by  = "opentofu"
  }
}
```
Node Pool Configuration
File: opentofu/modules/gke/node_pools.tf
```hcl
/**
 * Production Node Pool
 *
 * Configuration:
 * - n1-standard-2 (2 vCPU, 7.5 GB RAM)
 * - Preemptible for dev (cost savings)
 * - Auto-scaling 1-10 nodes
 * - Auto-repair and auto-upgrade enabled
 */
resource "google_container_node_pool" "primary_nodes" {
  name       = "${var.environment}-node-pool"
  location   = var.region
  cluster    = google_container_cluster.primary.name
  node_count = var.min_node_count

  # Auto-scaling configuration
  autoscaling {
    min_node_count = var.min_node_count # Default: 1
    max_node_count = var.max_node_count # Default: 10
  }

  # Node configuration
  node_config {
    machine_type = var.node_machine_type # n1-standard-2

    # Use preemptible nodes for dev (up to ~70% cost savings)
    preemptible = var.environment == "dev"

    disk_size_gb = 50
    disk_type    = "pd-standard"

    # OAuth scopes (permissions for GCP APIs)
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]

    # Workload Identity (bind Kubernetes SA to GCP SA)
    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    # Metadata
    metadata = {
      disable-legacy-endpoints = "true"
    }

    # Labels
    labels = {
      environment = var.environment
      node_pool   = "primary"
    }

    # Taints (if needed for dedicated workloads)
    # taint {
    #   key    = "workload-type"
    #   value  = "api"
    #   effect = "NO_SCHEDULE"
    # }

    # Security
    shielded_instance_config {
      enable_secure_boot          = true
      enable_integrity_monitoring = true
    }
  }

  # Management configuration
  management {
    auto_repair  = true
    auto_upgrade = true
  }

  # Upgrade settings
  upgrade_settings {
    max_surge       = 1
    max_unavailable = 0
  }
}
```
Kubernetes Resources
Namespace Configuration
File: kubernetes/base/namespace.yaml
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: coditect
  labels:
    name: coditect
    environment: production
    managed-by: opentofu
```
Django REST Framework Deployment
File: kubernetes/base/deployment.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: license-api
  namespace: coditect
  labels:
    app: license-api
    component: backend
    version: v1
spec:
  replicas: 3  # High availability
  selector:
    matchLabels:
      app: license-api
      component: backend
  # Deployment strategy
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # Allow 1 extra pod during update
      maxUnavailable: 0  # Zero-downtime deployments
  template:
    metadata:
      labels:
        app: license-api
        component: backend
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"
    spec:
      # Service account with Workload Identity
      serviceAccountName: license-api-sa
      # Pod anti-affinity (spread replicas across nodes)
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - license-api
                topologyKey: kubernetes.io/hostname
      # Init containers (run before the main containers)
      initContainers:
        - name: wait-for-db
          image: busybox:1.35
          command:
            - sh
            - -c
            - |
              until nc -z -v -w30 "$DB_HOST" "$DB_PORT"; do
                echo "Waiting for database connection..."
                sleep 5
              done
          env:
            - name: DB_HOST
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: DB_HOST
            - name: DB_PORT
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: DB_PORT
        - name: run-migrations
          image: gcr.io/coditect-cloud-infra/license-api:latest
          command:
            - python
            - manage.py
            - migrate
            - --noinput
          envFrom:
            - configMapRef:
                name: app-config
            - secretRef:
                name: db-credentials
      # Main containers
      containers:
        # Django REST Framework (Gunicorn)
        - name: django
          image: gcr.io/coditect-cloud-infra/license-api:latest
          imagePullPolicy: Always
          # Command (overrides the Dockerfile CMD)
          command:
            - gunicorn
            - config.wsgi:application
            - --bind=0.0.0.0:8000
            - --workers=4
            - --threads=2
            - --worker-class=gthread
            - --worker-tmp-dir=/dev/shm
            - --timeout=60
            - --access-logfile=-
            - --error-logfile=-
            - --log-level=info
          # Ports
          ports:
            - name: http
              containerPort: 8000
              protocol: TCP
          # Environment variables
          envFrom:
            - configMapRef:
                name: app-config
            - secretRef:
                name: db-credentials
          env:
            - name: DJANGO_SETTINGS_MODULE
              value: "config.settings.production"
            - name: FIREBASE_SERVICE_ACCOUNT_PATH
              value: "/secrets/firebase-service-account.json"
          # Resource requests and limits
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
          # Health checks
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /health/startup
              port: http
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 30  # Allows up to ~150s of startup time
          # Volume mounts
          volumeMounts:
            - name: firebase-credentials
              mountPath: /secrets
              readOnly: true
            - name: tmp
              mountPath: /tmp
            - name: shm
              mountPath: /dev/shm
        # Cloud SQL Proxy (sidecar)
        - name: cloud-sql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:1.33.2
          command:
            - /cloud_sql_proxy
            - -instances=$(INSTANCE_CONNECTION_NAME)=tcp:5432
            - -credential_file=/secrets/service-account.json
          env:
            - name: INSTANCE_CONNECTION_NAME
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: INSTANCE_CONNECTION_NAME
          resources:
            requests:
              memory: "64Mi"
              cpu: "50m"
            limits:
              memory: "128Mi"
              cpu: "100m"
          volumeMounts:
            - name: cloudsql-credentials
              mountPath: /secrets
              readOnly: true
      # Volumes
      volumes:
        - name: firebase-credentials
          secret:
            secretName: firebase-service-account
            items:
              - key: service-account.json
                path: firebase-service-account.json
        - name: cloudsql-credentials
          secret:
            secretName: cloudsql-service-account
            items:
              - key: service-account.json
                path: service-account.json
        - name: tmp
          emptyDir: {}
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 256Mi
      # Pod security context
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
```

Note: the `secretKeyRef` keys (`DB_HOST`, `DB_PORT`, `INSTANCE_CONNECTION_NAME`) match the key names defined in the `db-credentials` Secret in the Secrets Management section.
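The three probe paths above (`/health/live`, `/health/ready`, `/health/startup`) are referenced by the Deployment but not defined in this document. A minimal, framework-agnostic sketch of the semantics each endpoint should implement (function and parameter names are illustrative; in the real application these would be thin Django views, and the dependency checks would call the actual database and Redis clients):

```python
def liveness() -> tuple[int, str]:
    # /health/live: the process is up. Never touch dependencies here;
    # otherwise a DB outage would make kubelet restart healthy pods.
    return 200, "alive"

def readiness(check_db=lambda: True, check_redis=lambda: True) -> tuple[int, str]:
    # /health/ready: the pod can serve traffic. Failing this removes the
    # pod from Service endpoints without restarting it.
    if not check_db():
        return 503, "database unavailable"
    if not check_redis():
        return 503, "redis unavailable"
    return 200, "ready"

def startup(migrations_applied=lambda: True) -> tuple[int, str]:
    # /health/startup: gates the other probes until slow initialization
    # completes (failureThreshold=30 * periodSeconds=5 allows ~150s).
    return (200, "started") if migrations_applied() else (503, "starting")
```

The key design point is the split: liveness stays dependency-free so infrastructure outages cause traffic removal (readiness) rather than restart loops (liveness).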
Service Configuration
ClusterIP Service for Django API
File: kubernetes/base/service.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
  name: license-api
  namespace: coditect
  labels:
    app: license-api
    component: backend
spec:
  type: ClusterIP  # Internal only (Ingress routes to this)
  selector:
    app: license-api
    component: backend
  ports:
    - name: http
      port: 8000
      targetPort: http
      protocol: TCP
  # Session affinity (optional - for sticky sessions)
  # sessionAffinity: ClientIP
  # sessionAffinityConfig:
  #   clientIP:
  #     timeoutSeconds: 10800  # 3 hours
```
Headless Service for StatefulSet (if needed)
File: kubernetes/base/service-headless.yaml
```yaml
# Headless service for StatefulSet workloads (e.g., Celery workers)
apiVersion: v1
kind: Service
metadata:
  name: celery-workers
  namespace: coditect
  labels:
    app: celery-workers
    component: background
spec:
  clusterIP: None  # Headless (no load balancing)
  selector:
    app: celery-workers
    component: background
  ports:
    - name: flower
      port: 5555
      targetPort: 5555
```
Ingress and Load Balancing
Ingress Configuration
File: kubernetes/base/ingress.yaml
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: license-api-ingress
  namespace: coditect
  annotations:
    # Use the Google Cloud (GCE) ingress controller
    kubernetes.io/ingress.class: "gce"
    # Disable plain HTTP entirely (HTTPS only).
    # For an HTTP-to-HTTPS redirect instead, attach a FrontendConfig.
    kubernetes.io/ingress.allow-http: "false"
    # Google-managed certificate
    networking.gke.io/managed-certificates: "license-api-cert"
    # Cloud Armor security policy
    cloud.google.com/armor-config: '{"license-api-policy": "license-api-security-policy"}'
    # Backend configuration
    cloud.google.com/backend-config: '{"default": "license-api-backend-config"}'
    # The nginx.ingress.kubernetes.io/* annotations below apply only when
    # running the NGINX ingress controller; the GCE controller ignores them.
    # nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    # nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    # nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
    # nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
  labels:
    app: license-api
spec:
  # TLS configuration. When relying solely on the Google-managed
  # certificate, spec.tls can be omitted; set secretName only for a
  # self-managed certificate.
  tls:
    - hosts:
        - api.coditect.com
      secretName: tls-certificate
  # Routing rules
  rules:
    - host: api.coditect.com
      http:
        paths:
          # API v1 routes
          - path: /api/v1/*
            pathType: ImplementationSpecific
            backend:
              service:
                name: license-api
                port:
                  number: 8000
          # Health check endpoint (for the load balancer)
          - path: /health/*
            pathType: ImplementationSpecific
            backend:
              service:
                name: license-api
                port:
                  number: 8000
```
Managed Certificate
File: kubernetes/base/managed-certificate.yaml
```yaml
# Google-managed SSL certificate
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: license-api-cert
  namespace: coditect
spec:
  domains:
    - api.coditect.com
```
Backend Configuration
File: kubernetes/base/backend-config.yaml
```yaml
# Backend configuration for the GCP Load Balancer
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: license-api-backend-config
  namespace: coditect
spec:
  # Health check configuration
  healthCheck:
    checkIntervalSec: 10
    timeoutSec: 5
    healthyThreshold: 2
    unhealthyThreshold: 3
    type: HTTP
    requestPath: /health/ready
    port: 8000
  # Connection draining (graceful shutdown)
  connectionDraining:
    drainingTimeoutSec: 60
  # Session affinity (optional)
  sessionAffinity:
    affinityType: "CLIENT_IP"
    # affinityCookieTtlSec applies only with GENERATED_COOKIE affinity:
    # affinityCookieTtlSec: 10800  # 3 hours
  # Custom request headers
  customRequestHeaders:
    headers:
      - "X-Client-Region:{client_region}"
      - "X-Client-City:{client_city}"
  # Security
  securityPolicy:
    name: "license-api-security-policy"
  # CDN (disabled; enable for static assets if needed)
  cdn:
    enabled: false
    cachePolicy:
      includeHost: true
      includeProtocol: true
      includeQueryString: false
```
Auto-Scaling Configuration
Horizontal Pod Autoscaler
File: kubernetes/base/hpa.yaml
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: license-api-hpa
  namespace: coditect
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: license-api
  # Replica bounds
  minReplicas: 3   # Always maintain 3 for HA
  maxReplicas: 10  # Scale up to 10 under load
  # Scaling behavior
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60  # At most 50% scale-down per minute
    scaleUp:
      stabilizationWindowSeconds: 0  # Scale up immediately
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15  # At most 100% scale-up per 15 seconds
  # Metrics to scale on
  metrics:
    # CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # Scale up when average CPU > 70%
    # Memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80  # Scale up when average memory > 80%
    # Custom metric: requests per second (optional)
    # - type: Pods
    #   pods:
    #     metric:
    #       name: http_requests_per_second
    #     target:
    #       type: AverageValue
    #       averageValue: "1000"  # Scale up when RPS per pod > 1000
```
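The HPA controller computes its target as `desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric)`, clamped to the configured bounds. A small sketch of that arithmetic for the CPU metric above (the helper name is illustrative):

```python
import math

def desired_replicas(current: int, metric: float, target: float,
                     min_r: int = 3, max_r: int = 10) -> int:
    # Core HPA formula: ceil(current * metric / target),
    # clamped to the minReplicas/maxReplicas bounds above.
    raw = math.ceil(current * metric / target)
    return max(min_r, min(max_r, raw))

# 3 pods at 95% average CPU against a 70% target -> ceil(3 * 95/70) = 5
print(desired_replicas(3, 95, 70))  # 5
```

Note that the scale-up/scale-down `behavior` policies then rate-limit how fast the replica count may move toward this target.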
Cluster Autoscaler
Configured in the GKE node pool (see Node Pool Configuration above):
- Automatically adds/removes nodes based on pending pods' resource requests
- Min nodes: 1 (dev), 3 (production)
- Max nodes: 10
- Scale-down delay: ~10 minutes after a node becomes underutilized
- Note: for regional clusters, node-pool counts apply per zone, so a minimum of 1 node across three zones yields 3 nodes in total
Configuration Management
ConfigMap for Application Configuration
File: kubernetes/base/configmap.yaml
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: coditect
data:
  # Django settings
  DJANGO_SETTINGS_MODULE: "config.settings.production"
  ALLOWED_HOSTS: "api.coditect.com,*.coditect.com"
  # Database configuration
  DB_ENGINE: "django.db.backends.postgresql"
  DB_PORT: "5432"
  DB_CONN_MAX_AGE: "600"  # 10 minutes
  DB_CONN_HEALTH_CHECKS: "true"
  # Redis configuration (port 6378 = Memorystore with in-transit encryption)
  REDIS_HOST: "10.121.42.67"
  REDIS_PORT: "6378"
  REDIS_DB: "0"
  REDIS_MAX_CONNECTIONS: "50"
  # Celery configuration
  CELERY_BROKER_URL: "redis://10.121.42.67:6378/1"
  CELERY_RESULT_BACKEND: "redis://10.121.42.67:6378/2"
  # Firebase / Identity Platform
  FIREBASE_PROJECT_ID: "coditect-cloud-infra"
  # Cloud KMS
  KMS_PROJECT_ID: "coditect-cloud-infra"
  KMS_LOCATION: "us-central1"
  KMS_KEYRING: "license-signing"
  KMS_KEY: "license-key"
  # Application settings
  LOG_LEVEL: "INFO"
  DEBUG: "False"
  CORS_ALLOWED_ORIGINS: "https://app.coditect.com"
  # Feature flags
  ENABLE_SWAGGER: "False"
  ENABLE_METRICS: "True"
```
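The Deployment injects these keys as environment variables via `envFrom`, so the Django settings module consumes them with plain `os.environ` lookups. A sketch of how `config/settings/production.py` might read the values above (the `env_bool` helper and the defaults are illustrative, not taken from the actual codebase):

```python
import os

def env_bool(name: str, default: str = "False") -> bool:
    # ConfigMap values are always strings; normalize "True"/"False" etc.
    return os.environ.get(name, default).lower() in ("true", "1", "yes")

DEBUG = env_bool("DEBUG")
ALLOWED_HOSTS = os.environ.get("ALLOWED_HOSTS", "").split(",")
DATABASES = {
    "default": {
        "ENGINE": os.environ.get("DB_ENGINE", "django.db.backends.postgresql"),
        "HOST": os.environ.get("DB_HOST", "127.0.0.1"),  # Cloud SQL Proxy sidecar
        "PORT": os.environ.get("DB_PORT", "5432"),
        # Persistent connections: reuse DB connections for up to 10 minutes
        "CONN_MAX_AGE": int(os.environ.get("DB_CONN_MAX_AGE", "600")),
    }
}
```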
Secrets Management
Database Credentials Secret
File: kubernetes/secrets/db-credentials.yaml
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: coditect
type: Opaque
stringData:
  # Database connection
  DB_NAME: coditect_licenses
  DB_USER: license_api_user
  DB_PASSWORD: "REPLACE_WITH_SECRET_MANAGER_VALUE"
  DB_HOST: "127.0.0.1"  # Via the Cloud SQL Proxy sidecar
  DB_PORT: "5432"
  # Cloud SQL instance connection
  INSTANCE_CONNECTION_NAME: "coditect-cloud-infra:us-central1:coditect-postgres-dev"
```
Note: In production, use External Secrets Operator to sync from GCP Secret Manager:
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: coditect
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: gcpsm-secret-store
    kind: SecretStore
  target:
    name: db-credentials
    creationPolicy: Owner
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: db-password  # Secret Manager secret name
```
Firebase Service Account Secret
File: kubernetes/secrets/firebase-service-account.yaml
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: firebase-service-account
  namespace: coditect
type: Opaque
stringData:
  service-account.json: |
    {
      "type": "service_account",
      "project_id": "coditect-cloud-infra",
      "private_key_id": "REPLACE_WITH_ACTUAL_KEY_ID",
      "private_key": "-----BEGIN PRIVATE KEY-----\nREPLACE_WITH_ACTUAL_KEY\n-----END PRIVATE KEY-----\n",
      "client_email": "firebase-adminsdk-...@coditect-cloud-infra.iam.gserviceaccount.com",
      "client_id": "REPLACE_WITH_ACTUAL_CLIENT_ID",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/firebase-adminsdk-...%40coditect-cloud-infra.iam.gserviceaccount.com"
    }
```
Monitoring and Logging
Prometheus ServiceMonitor
File: kubernetes/monitoring/servicemonitor.yaml
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: license-api-metrics
  namespace: coditect
  labels:
    app: license-api
spec:
  selector:
    matchLabels:
      app: license-api
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s
```
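The ServiceMonitor above scrapes `/metrics` and expects the Prometheus text exposition format. In practice the Django app would use a client library such as `prometheus_client` or `django-prometheus`; purely to illustrate what the scraped payload looks like, here is a minimal hand-rolled sketch (metric names are illustrative):

```python
def render_metrics(request_count: int, up: int = 1) -> str:
    # Minimal Prometheus text-format exposition: HELP/TYPE comment lines
    # followed by "name value" samples. Real applications should use a
    # client library rather than formatting this by hand.
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
        f"http_requests_total {request_count}",
        "# HELP up Whether the application is up.",
        "# TYPE up gauge",
        f"up {up}",
    ]
    return "\n".join(lines) + "\n"
```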
Fluent Bit DaemonSet
File: kubernetes/logging/fluent-bit.yaml
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    app: fluent-bit
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.0
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
```
Production Deployment
Kustomization for Environment-Specific Configuration
File: kubernetes/overlays/production/kustomization.yaml
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: coditect

# Base resources
resources:
  - ../../base

# ConfigMap generator
configMapGenerator:
  - name: app-config
    behavior: merge
    literals:
      - LOG_LEVEL=INFO
      - DEBUG=False
      - ENVIRONMENT=production

# Secret generator (from files; key names match what the Deployment expects)
secretGenerator:
  - name: db-credentials
    files:
      - DB_PASSWORD=secrets/db-password.txt
      - INSTANCE_CONNECTION_NAME=secrets/instance-connection-name.txt

# Image tags
images:
  - name: gcr.io/coditect-cloud-infra/license-api
    newTag: v1.0.0

# Replica overrides
replicas:
  - name: license-api
    count: 3

# Resource patches
patchesStrategicMerge:
  - deployment-patch.yaml
  - hpa-patch.yaml
```
Deployment Patch for Production
File: kubernetes/overlays/production/deployment-patch.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: license-api
spec:
  replicas: 3  # Ensure HA
  template:
    spec:
      containers:
        - name: django
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          # Production-specific environment variables.
          # Note: GUNICORN_WORKERS takes effect only if the container
          # command references it (e.g. --workers=$(GUNICORN_WORKERS));
          # the base command currently hardcodes --workers=4.
          env:
            - name: DJANGO_SETTINGS_MODULE
              value: "config.settings.production"
            - name: GUNICORN_WORKERS
              value: "8"
```
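With the `gthread` worker class, each Gunicorn worker process serves `threads` requests concurrently, so per-pod concurrency is `workers × threads`. Assuming the production worker count of 8 is actually wired into the Gunicorn command, fleet capacity can be sanity-checked as:

```python
def fleet_concurrency(replicas: int, workers: int, threads: int) -> int:
    # Gunicorn gthread: a pod handles workers * threads in-flight requests,
    # so the fleet handles replicas * workers * threads.
    return replicas * workers * threads

print(fleet_concurrency(3, 8, 2))   # production baseline: 48 concurrent requests
print(fleet_concurrency(10, 8, 2))  # at max HPA scale: 160
```

This back-of-envelope number is an upper bound on in-flight requests, not throughput; actual capacity depends on per-request latency.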
Summary
This C3-04 GKE Components specification provides:
✅ Complete GKE cluster configuration
- Regional multi-zone cluster for HA
- Private nodes with Workload Identity
- Auto-scaling node pools (1-10 nodes)
- Binary authorization for security
✅ Kubernetes resource specifications
- Django REST Framework Deployment (3 replicas)
- Cloud SQL Proxy sidecar
- ClusterIP Service for internal routing
- Ingress with Google Cloud Load Balancer
✅ Auto-scaling configuration
- HorizontalPodAutoscaler (3-10 replicas)
- CPU and memory-based scaling
- Cluster autoscaler for nodes
✅ Configuration management
- ConfigMap for application settings
- Secrets for sensitive data
- External Secrets Operator integration
✅ Monitoring and logging
- Prometheus ServiceMonitor
- Fluent Bit DaemonSet
- Cloud Logging integration
✅ Production deployment
- Kustomize overlays for environments
- Zero-downtime rolling updates
- Health checks and readiness probes
Implementation Status: Specification Complete

Next Steps:
- Deploy GKE cluster (already complete ✅)
- Create Kubernetes manifests
- Deploy Django REST Framework application
- Configure Ingress and load balancer
- Set up monitoring and logging
- Test auto-scaling behavior
Current Infrastructure:
- GKE Cluster: ✅ Deployed
- Node Pool: ✅ 3 nodes (n1-standard-2)
- VPC Network: ✅ Configured
- Cloud NAT: ✅ Configured
Pending:
- Django application deployment (Phase 2)
- Ingress configuration (Phase 3)
- TLS certificate provisioning (Phase 3)
Total Lines: 900+ (complete production-ready Kubernetes configuration)
Author: CODITECT Infrastructure Team
Date: November 30, 2025
Version: 1.0
Status: Ready for Implementation