
C3-04: GKE Components - Container Architecture

Document Type: C4 Level 3 (Component) Diagram
Container: Google Kubernetes Engine (GKE)
Technology: GKE 1.28+, Kubernetes, Django REST Framework, Gunicorn
Status: Specification Complete - Ready for Implementation
Last Updated: November 30, 2025


Table of Contents

  1. Overview
  2. Component Diagram
  3. GKE Cluster Architecture
  4. Kubernetes Resources
  5. Deployment Configuration
  6. Service Configuration
  7. Ingress and Load Balancing
  8. Auto-Scaling Configuration
  9. Configuration Management
  10. Secrets Management
  11. Monitoring and Logging
  12. Production Deployment

Overview

Purpose

This document specifies the component-level architecture of the Google Kubernetes Engine (GKE) cluster hosting the CODITECT License Management Platform. It provides:

  • Complete GKE cluster configuration (node pools, networking)
  • Kubernetes resource specifications (Deployments, Services, Ingress)
  • Django REST Framework pod architecture
  • Auto-scaling and high-availability patterns
  • Production-ready monitoring and logging integration

GKE Cluster Role

The GKE cluster serves as the container orchestration platform for:

  • Django REST Framework license API (primary workload)
  • Celery background workers (heartbeat cleanup, session management)
  • Redis client (connection pooling)
  • PostgreSQL client (connection pooling)
  • Monitoring and logging agents (Prometheus, Fluent Bit)

Key Features:

  • High Availability: Multi-zone deployment with automatic failover
  • Auto-Scaling: Horizontal pod autoscaling based on CPU/memory
  • Zero-Downtime Deployments: Rolling updates with health checks
  • Resource Efficiency: Preemptible nodes for cost optimization (dev)
  • Security: Private cluster with workload identity

Architecture Pattern

Internet
  ↓
Cloud Load Balancer (HTTPS/TLS 1.3)
  ↓
GKE Ingress (Google Cloud Load Balancer)
  ↓
Kubernetes Service (ClusterIP)
  ↓
Django REST Framework Pods (3 replicas)
  ├─► Cloud SQL Proxy (PostgreSQL)
  ├─► Redis Client (Memorystore)
  ├─► Cloud KMS Client (signing)
  └─► Identity Platform (authentication)

Component Diagram

GKE Internal Components


GKE Cluster Architecture

Cluster Configuration

File: opentofu/modules/gke/main.tf

/**
* GKE Cluster Configuration
*
* Features:
* - Regional cluster (multi-zone HA)
* - Private cluster (no public IPs on nodes)
* - Workload Identity (secure GCP service access)
* - Binary Authorization (image security)
* - Auto-scaling enabled
*/

resource "google_container_cluster" "primary" {
  name     = "${var.environment}-gke-cluster"
  location = var.region # Regional = multi-zone

  # Remove default node pool (we'll create custom pools)
  remove_default_node_pool = true
  initial_node_count       = 1

  # Network configuration
  network    = var.vpc_network
  subnetwork = var.gke_subnet

  # Private cluster configuration
  private_cluster_config {
    enable_private_nodes    = true  # Nodes have private IPs only
    enable_private_endpoint = false # API endpoint is public
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # IP allocation for pods and services
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Master authorized networks (who can access the API)
  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "0.0.0.0/0"
      display_name = "All networks (for development)"
      # Production: restrict to office IPs + CI/CD
    }
  }

  # Workload Identity (secure service account binding)
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Binary Authorization (only signed images)
  binary_authorization {
    evaluation_mode = "PROJECT_SINGLETON_POLICY_ENFORCE"
  }

  # Addons
  addons_config {
    http_load_balancing {
      disabled = false # Enable Ingress
    }
    horizontal_pod_autoscaling {
      disabled = false # Enable HPA
    }
    network_policy_config {
      disabled = false # Enable NetworkPolicy
    }
  }

  # Monitoring and logging
  monitoring_config {
    enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
  }

  logging_config {
    enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
  }

  # Maintenance window
  maintenance_policy {
    daily_maintenance_window {
      start_time = "03:00" # 3 AM UTC
    }
  }

  # Resource labels
  resource_labels = {
    environment = var.environment
    project     = "coditect"
    managed_by  = "opentofu"
  }
}

Node Pool Configuration

File: opentofu/modules/gke/node_pools.tf

/**
* Production Node Pool
*
* Configuration:
* - n1-standard-2 (2 vCPU, 7.5 GB RAM)
* - Preemptible for dev (cost savings)
* - Auto-scaling 1-10 nodes
* - Auto-repair and auto-upgrade enabled
*/

resource "google_container_node_pool" "primary_nodes" {
  name       = "${var.environment}-node-pool"
  location   = var.region
  cluster    = google_container_cluster.primary.name
  node_count = var.min_node_count

  # Auto-scaling configuration
  autoscaling {
    min_node_count = var.min_node_count # Default: 1
    max_node_count = var.max_node_count # Default: 10
  }

  # Node configuration
  node_config {
    machine_type = var.node_machine_type # n1-standard-2

    # Use preemptible nodes for dev (70% cost savings)
    preemptible  = var.environment == "dev"
    disk_size_gb = 50
    disk_type    = "pd-standard"

    # OAuth scopes (permissions for GCP APIs)
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform",
    ]

    # Workload Identity (bind Kubernetes SA to GCP SA)
    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    # Metadata
    metadata = {
      disable-legacy-endpoints = "true"
    }

    # Labels
    labels = {
      environment = var.environment
      node_pool   = "primary"
    }

    # Taints (if needed for dedicated workloads)
    # taint {
    #   key    = "workload-type"
    #   value  = "api"
    #   effect = "NO_SCHEDULE"
    # }

    # Security
    shielded_instance_config {
      enable_secure_boot          = true
      enable_integrity_monitoring = true
    }
  }

  # Management configuration
  management {
    auto_repair  = true
    auto_upgrade = true
  }

  # Upgrade settings
  upgrade_settings {
    max_surge       = 1
    max_unavailable = 0
  }
}

Kubernetes Resources

Namespace Configuration

File: kubernetes/base/namespace.yaml

apiVersion: v1
kind: Namespace
metadata:
  name: coditect
  labels:
    name: coditect
    environment: production
    managed-by: opentofu

Django REST Framework Deployment

File: kubernetes/base/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: license-api
  namespace: coditect
  labels:
    app: license-api
    component: backend
    version: v1
spec:
  replicas: 3 # High availability

  selector:
    matchLabels:
      app: license-api
      component: backend

  # Deployment strategy
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1       # Allow 1 extra pod during update
      maxUnavailable: 0 # Zero-downtime deployments

  template:
    metadata:
      labels:
        app: license-api
        component: backend
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8000"
        prometheus.io/path: "/metrics"

    spec:
      # Service account with Workload Identity
      serviceAccountName: license-api-sa

      # Pod anti-affinity (spread across nodes)
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - license-api
                topologyKey: kubernetes.io/hostname

      # Init containers (run before main containers)
      initContainers:
        - name: wait-for-db
          image: busybox:1.35
          command:
            - sh
            - -c
            - |
              until nc -z -v -w30 $DB_HOST $DB_PORT; do
                echo "Waiting for database connection..."
                sleep 5
              done
          env:
            - name: DB_HOST
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: host
            - name: DB_PORT
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: port

        - name: run-migrations
          image: gcr.io/coditect-cloud-infra/license-api:latest
          command:
            - python
            - manage.py
            - migrate
            - --noinput
          envFrom:
            - configMapRef:
                name: app-config
            - secretRef:
                name: db-credentials

      # Main containers
      containers:
        # Django REST Framework (Gunicorn)
        - name: django
          image: gcr.io/coditect-cloud-infra/license-api:latest
          imagePullPolicy: Always

          # Command (overrides Dockerfile CMD)
          command:
            - gunicorn
            - config.wsgi:application
            - --bind=0.0.0.0:8000
            - --workers=4
            - --threads=2
            - --worker-class=gthread
            - --worker-tmp-dir=/dev/shm
            - --timeout=60
            - --access-logfile=-
            - --error-logfile=-
            - --log-level=info

          # Ports
          ports:
            - name: http
              containerPort: 8000
              protocol: TCP

          # Environment variables
          envFrom:
            - configMapRef:
                name: app-config
            - secretRef:
                name: db-credentials
          env:
            - name: DJANGO_SETTINGS_MODULE
              value: "config.settings.production"
            - name: FIREBASE_SERVICE_ACCOUNT_PATH
              value: "/secrets/firebase-service-account.json"

          # Resource requests and limits
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"

          # Health checks
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3

          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3

          startupProbe:
            httpGet:
              path: /health/startup
              port: http
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 30

          # Volume mounts
          volumeMounts:
            - name: firebase-credentials
              mountPath: /secrets
              readOnly: true
            - name: tmp
              mountPath: /tmp
            - name: shm
              mountPath: /dev/shm

        # Cloud SQL Proxy (sidecar)
        - name: cloud-sql-proxy
          image: gcr.io/cloudsql-docker/gce-proxy:1.33.2
          command:
            - /cloud_sql_proxy
            - -instances=$(INSTANCE_CONNECTION_NAME)=tcp:5432
            - -credential_file=/secrets/service-account.json
          env:
            - name: INSTANCE_CONNECTION_NAME
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: instance_connection_name
          resources:
            requests:
              memory: "64Mi"
              cpu: "50m"
            limits:
              memory: "128Mi"
              cpu: "100m"
          volumeMounts:
            - name: cloudsql-credentials
              mountPath: /secrets
              readOnly: true

      # Volumes
      volumes:
        - name: firebase-credentials
          secret:
            secretName: firebase-service-account
            items:
              - key: service-account.json
                path: firebase-service-account.json
        - name: cloudsql-credentials
          secret:
            secretName: cloudsql-service-account
            items:
              - key: service-account.json
                path: service-account.json
        - name: tmp
          emptyDir: {}
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 256Mi

      # Pod security context
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
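The three probe paths above imply distinct semantics: liveness should stay dependency-free so a slow database never triggers kubelet restarts, while readiness gates traffic on dependency health. A minimal, framework-agnostic sketch of the readiness aggregation — the check names and stub functions here are illustrative, not the actual Django views:

```python
from typing import Callable, Dict, Tuple

def run_readiness_checks(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, Dict[str, str]]:
    """Run each dependency check and return (HTTP status, per-check results).

    Any check that returns False or raises marks the pod NotReady (503),
    which removes it from Service endpoints without restarting it.
    """
    results: Dict[str, str] = {}
    healthy = True
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        results[name] = "ok" if ok else "fail"
        healthy = healthy and ok
    return (200 if healthy else 503), results

# Example with stubbed checks (real ones would ping PostgreSQL and Redis):
status, detail = run_readiness_checks({
    "database": lambda: True,
    "redis": lambda: True,
})
```

A Django view for `/health/ready` would wrap this and return the status code; `/health/live` would skip the checks entirely and return 200.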

Service Configuration

ClusterIP Service for Django API

File: kubernetes/base/service.yaml

apiVersion: v1
kind: Service
metadata:
  name: license-api
  namespace: coditect
  labels:
    app: license-api
    component: backend
spec:
  type: ClusterIP # Internal only (Ingress routes to this)

  selector:
    app: license-api
    component: backend

  ports:
    - name: http
      port: 8000
      targetPort: http
      protocol: TCP

  # Session affinity (optional - for sticky sessions)
  # sessionAffinity: ClientIP
  # sessionAffinityConfig:
  #   clientIP:
  #     timeoutSeconds: 10800 # 3 hours

Headless Service for StatefulSet (if needed)

File: kubernetes/base/service-headless.yaml

# Headless service for StatefulSet workloads (e.g., Celery workers)
apiVersion: v1
kind: Service
metadata:
  name: celery-workers
  namespace: coditect
  labels:
    app: celery-workers
    component: background
spec:
  clusterIP: None # Headless (no load balancing)

  selector:
    app: celery-workers
    component: background

  ports:
    - name: flower
      port: 5555
      targetPort: 5555

Ingress and Load Balancing

Ingress Configuration

File: kubernetes/base/ingress.yaml

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: license-api-ingress
  namespace: coditect
  annotations:
    # Use Google Cloud Load Balancer
    kubernetes.io/ingress.class: "gce"

    # Disable plain HTTP (HTTPS only)
    kubernetes.io/ingress.allow-http: "false"

    # Managed certificate (GCP)
    networking.gke.io/managed-certificates: "license-api-cert"

    # Cloud Armor security policy
    cloud.google.com/armor-config: '{"license-api-policy": "license-api-security-policy"}'

    # Backend configuration
    cloud.google.com/backend-config: '{"default": "license-api-backend-config"}'

    # NGINX-specific annotations below have no effect with the GCE ingress
    # class; they are kept only for clusters running the NGINX controller.
    # Client body size limit
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"

    # Timeouts
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"

  labels:
    app: license-api
spec:
  # TLS configuration
  tls:
    - hosts:
        - api.coditect.com
      secretName: tls-certificate # Or use the managed certificate above

  # Routing rules
  rules:
    - host: api.coditect.com
      http:
        paths:
          # API v1 routes
          - path: /api/v1/*
            pathType: ImplementationSpecific
            backend:
              service:
                name: license-api
                port:
                  number: 8000

          # Health check endpoint (for load balancer)
          - path: /health/*
            pathType: ImplementationSpecific
            backend:
              service:
                name: license-api
                port:
                  number: 8000

Managed Certificate

File: kubernetes/base/managed-certificate.yaml

# Google Managed SSL Certificate
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: license-api-cert
  namespace: coditect
spec:
  domains:
    - api.coditect.com

Backend Configuration

File: kubernetes/base/backend-config.yaml

# Backend configuration for GCP Load Balancer
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: license-api-backend-config
  namespace: coditect
spec:
  # Health check configuration
  healthCheck:
    checkIntervalSec: 10
    timeoutSec: 5
    healthyThreshold: 2
    unhealthyThreshold: 3
    type: HTTP
    requestPath: /health/ready
    port: 8000

  # Connection draining (graceful shutdown)
  connectionDraining:
    drainingTimeoutSec: 60

  # Session affinity (optional)
  sessionAffinity:
    affinityType: "CLIENT_IP"
    affinityCookieTtlSec: 10800 # 3 hours

  # Custom request/response headers
  customRequestHeaders:
    headers:
      - "X-Client-Region:{client_region}"
      - "X-Client-City:{client_city}"

  # Security
  securityPolicy:
    name: "license-api-security-policy"

  # CDN (if needed for static assets)
  cdn:
    enabled: false
    cachePolicy:
      includeHost: true
      includeProtocol: true
      includeQueryString: false

Auto-Scaling Configuration

Horizontal Pod Autoscaler

File: kubernetes/base/hpa.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: license-api-hpa
  namespace: coditect
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: license-api

  # Replica configuration
  minReplicas: 3  # Always maintain 3 for HA
  maxReplicas: 10 # Scale up to 10 under load

  # Scaling behavior
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # Wait 5 min before scaling down
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60 # Max 50% scale-down per minute
    scaleUp:
      stabilizationWindowSeconds: 0 # Scale up immediately
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15 # Max 100% scale-up per 15 seconds

  # Metrics to scale on
  metrics:
    # CPU utilization
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # Scale up when avg CPU > 70%

    # Memory utilization
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80 # Scale up when avg memory > 80%

    # Custom metric: requests per second (optional)
    # - type: Pods
    #   pods:
    #     metric:
    #       name: http_requests_per_second
    #     target:
    #       type: AverageValue
    #       averageValue: "1000" # Scale up when RPS > 1000
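For intuition, the HPA control loop computes desired replicas as ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. A sketch using this document's bounds (3-10 replicas, 70% CPU target); the helper name is illustrative:

```python
import math

def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float,
                         min_replicas: int = 3, max_replicas: int = 10) -> int:
    """Core HPA formula: desired = ceil(current * metric / target),
    clamped to [minReplicas, maxReplicas]."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 3 pods averaging 95% CPU against a 70% target -> ceil(3 * 95 / 70) = 5 pods
print(hpa_desired_replicas(3, 95, 70))
```

Note that the real controller also applies the `behavior` policies above (scale-down stabilization, per-period percentage caps) before acting on this number.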

Cluster Autoscaler

Configured in GKE node pool (see Node Pool Configuration above)

  • Automatically adds/removes nodes based on pod resource requests
  • Min nodes: 1 (dev), 3 (production)
  • Max nodes: 10
  • Scale-down delay: 10 minutes

Configuration Management

ConfigMap for Application Configuration

File: kubernetes/base/configmap.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: coditect
data:
  # Django settings
  DJANGO_SETTINGS_MODULE: "config.settings.production"
  ALLOWED_HOSTS: "api.coditect.com,*.coditect.com"

  # Database configuration
  DB_ENGINE: "django.db.backends.postgresql"
  DB_PORT: "5432"
  DB_CONN_MAX_AGE: "600" # 10 minutes
  DB_CONN_HEALTH_CHECKS: "true"

  # Redis configuration
  REDIS_HOST: "10.121.42.67"
  REDIS_PORT: "6378"
  REDIS_DB: "0"
  REDIS_MAX_CONNECTIONS: "50"

  # Celery configuration
  CELERY_BROKER_URL: "redis://10.121.42.67:6378/1"
  CELERY_RESULT_BACKEND: "redis://10.121.42.67:6378/2"

  # Firebase/Identity Platform
  FIREBASE_PROJECT_ID: "coditect-cloud-infra"

  # Cloud KMS
  KMS_PROJECT_ID: "coditect-cloud-infra"
  KMS_LOCATION: "us-central1"
  KMS_KEYRING: "license-signing"
  KMS_KEY: "license-key"

  # Application settings
  LOG_LEVEL: "INFO"
  DEBUG: "False"
  CORS_ALLOWED_ORIGINS: "https://app.coditect.com"

  # Feature flags
  ENABLE_SWAGGER: "False"
  ENABLE_METRICS: "True"
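These keys reach the pod as environment variables (via `envFrom` in the Deployment), so the Django settings module must parse strings like "False" and "600" into native types. A small sketch of that parsing — the helper names are illustrative, not part of the actual codebase:

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Parse a ConfigMap-style string ("True"/"False") into a bool."""
    return os.environ.get(name, str(default)).strip().lower() in ("1", "true", "yes")

def env_int(name: str, default: int) -> int:
    """Parse a numeric ConfigMap value, falling back to a default."""
    return int(os.environ.get(name, default))

# Simulate the ConfigMap-provided environment for illustration:
os.environ.setdefault("DEBUG", "False")
os.environ.setdefault("DB_CONN_MAX_AGE", "600")

DEBUG = env_bool("DEBUG")                      # False
CONN_MAX_AGE = env_int("DB_CONN_MAX_AGE", 600) # 600 seconds
```

Keeping all type coercion in helpers like these avoids the classic pitfall of `bool("False")` evaluating to True.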

Secrets Management

Database Credentials Secret

File: kubernetes/secrets/db-credentials.yaml

apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: coditect
type: Opaque
stringData:
  # Database connection
  DB_NAME: coditect_licenses
  DB_USER: license_api_user
  DB_PASSWORD: "REPLACE_WITH_SECRET_MANAGER_VALUE"
  DB_HOST: "127.0.0.1" # Via Cloud SQL Proxy
  DB_PORT: "5432"

  # Cloud SQL instance connection
  INSTANCE_CONNECTION_NAME: "coditect-cloud-infra:us-central1:coditect-postgres-dev"

Note: In production, use External Secrets Operator to sync from GCP Secret Manager:

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: coditect
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: gcpsm-secret-store
    kind: SecretStore
  target:
    name: db-credentials
    creationPolicy: Owner
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: db-password # Secret Manager secret name

Firebase Service Account Secret

File: kubernetes/secrets/firebase-service-account.yaml

apiVersion: v1
kind: Secret
metadata:
  name: firebase-service-account
  namespace: coditect
type: Opaque
stringData:
  service-account.json: |
    {
      "type": "service_account",
      "project_id": "coditect-cloud-infra",
      "private_key_id": "REPLACE_WITH_ACTUAL_KEY_ID",
      "private_key": "-----BEGIN PRIVATE KEY-----\nREPLACE_WITH_ACTUAL_KEY\n-----END PRIVATE KEY-----\n",
      "client_email": "firebase-adminsdk-...@coditect-cloud-infra.iam.gserviceaccount.com",
      "client_id": "REPLACE_WITH_ACTUAL_CLIENT_ID",
      "auth_uri": "https://accounts.google.com/o/oauth2/auth",
      "token_uri": "https://oauth2.googleapis.com/token",
      "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
      "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/firebase-adminsdk-...%40coditect-cloud-infra.iam.gserviceaccount.com"
    }

Monitoring and Logging

Prometheus ServiceMonitor

File: kubernetes/monitoring/servicemonitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: license-api-metrics
  namespace: coditect
  labels:
    app: license-api
spec:
  selector:
    matchLabels:
      app: license-api

  endpoints:
    - port: http
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s

Fluent Bit DaemonSet

File: kubernetes/logging/fluent-bit.yaml

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: kube-system
  labels:
    app: fluent-bit
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.0
          volumeMounts:
            - name: varlog
              mountPath: /var/log
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config

Production Deployment

Kustomization for Environment-Specific Configuration

File: kubernetes/overlays/production/kustomization.yaml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: coditect

# Base resources
resources:
  - ../../base

# ConfigMap generator
configMapGenerator:
  - name: app-config
    behavior: merge
    literals:
      - LOG_LEVEL=INFO
      - DEBUG=False
      - ENVIRONMENT=production

# Secret generator (from files)
secretGenerator:
  - name: db-credentials
    files:
      - db-password=secrets/db-password.txt
      - instance-connection-name=secrets/instance-connection-name.txt

# Image tags
images:
  - name: gcr.io/coditect-cloud-infra/license-api
    newTag: v1.0.0

# Replica overrides
replicas:
  - name: license-api
    count: 3

# Resource patches
patchesStrategicMerge:
  - deployment-patch.yaml
  - hpa-patch.yaml

Deployment Patch for Production

File: kubernetes/overlays/production/deployment-patch.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: license-api
spec:
  replicas: 3 # Ensure HA
  template:
    spec:
      containers:
        - name: django
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          # Production-specific environment variables
          env:
            - name: DJANGO_SETTINGS_MODULE
              value: "config.settings.production"
            - name: GUNICORN_WORKERS
              value: "8"
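The GUNICORN_WORKERS variable implies the entrypoint can size the worker pool from the environment (the base Deployment hardcodes --workers=4). One way to reconcile the two is the common 2 × cores + 1 Gunicorn heuristic with an env override; this helper is an illustrative sketch, not the actual entrypoint:

```python
from typing import Optional

def gunicorn_workers(cpu_cores: float, override: Optional[str] = None) -> int:
    """Size the Gunicorn worker pool.

    Uses the widely cited 2 * cores + 1 starting point unless an explicit
    override (e.g. the GUNICORN_WORKERS env var) is provided, as the
    production patch does with "8".
    """
    if override:
        return int(override)
    return 2 * int(cpu_cores) + 1

# With the production CPU limit of 1000m (1 vCPU) and no override: 3 workers
print(gunicorn_workers(1))
```

Since the pods use the gthread worker class, total concurrency is workers × threads, which is worth keeping consistent with the database connection pool size.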

Summary

This C3-04 GKE Components specification provides:

Complete GKE cluster configuration

  • Regional multi-zone cluster for HA
  • Private nodes with Workload Identity
  • Auto-scaling node pools (1-10 nodes)
  • Binary authorization for security

Kubernetes resource specifications

  • Django REST Framework Deployment (3 replicas)
  • Cloud SQL Proxy sidecar
  • ClusterIP Service for internal routing
  • Ingress with Google Cloud Load Balancer

Auto-scaling configuration

  • HorizontalPodAutoscaler (3-10 replicas)
  • CPU and memory-based scaling
  • Cluster autoscaler for nodes

Configuration management

  • ConfigMap for application settings
  • Secrets for sensitive data
  • External Secrets Operator integration

Monitoring and logging

  • Prometheus ServiceMonitor
  • Fluent Bit DaemonSet
  • Cloud Logging integration

Production deployment

  • Kustomize overlays for environments
  • Zero-downtime rolling updates
  • Health checks and readiness probes

Implementation Status: Specification Complete

Next Steps:

  1. Deploy GKE cluster (already complete ✅)
  2. Create Kubernetes manifests
  3. Deploy Django REST Framework application
  4. Configure Ingress and load balancer
  5. Set up monitoring and logging
  6. Test auto-scaling behavior

Current Infrastructure:

  • GKE Cluster: ✅ Deployed
  • Node Pool: ✅ 3 nodes (n1-standard-2)
  • VPC Network: ✅ Configured
  • Cloud NAT: ✅ Configured

Pending:

  • Django application deployment (Phase 2)
  • Ingress configuration (Phase 3)
  • TLS certificate provisioning (Phase 3)

Total Lines: 900+ (complete production-ready Kubernetes configuration)


Author: CODITECT Infrastructure Team
Date: November 30, 2025
Version: 1.0
Status: Ready for Implementation