C3: GKE Component Diagram - Kubernetes Cluster Internals
Level: Component (C4 Model Level 3)
Scope: Google Kubernetes Engine Cluster Internal Architecture
Primary Audience: Platform Engineers, Kubernetes Administrators, Senior Developers
Last Updated: November 23, 2025
Overview
This diagram shows the internal components of the GKE cluster, including the Kubernetes control plane, node pools, networking, and application deployments.
Key Components:
- Kubernetes control plane (managed by Google)
- Multi-zone node pools with auto-scaling
- NGINX Ingress Controller for traffic routing
- License API deployment with HPA (Horizontal Pod Autoscaler)
- CoreDNS for service discovery
- Workload Identity for secure GCP access
GKE Component Diagram
Component Details
1. Kubernetes Control Plane (Managed by Google)
Components:
- kube-apiserver: REST API endpoint for all cluster operations
- kube-scheduler: Assigns pods to nodes based on resource requirements
- kube-controller-manager: Runs controllers (Deployment, ReplicaSet, Service, etc.)
- etcd: Distributed key-value store for cluster state
Management:
- Fully managed by Google (no direct access)
- Multi-zone HA (3 replicas across zones)
- Automatic version upgrades (release channel: REGULAR)
- 99.95% SLA for regional clusters
Configuration:
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <CA_CERT>
    server: https://34.72.XX.XX  # Master endpoint
  name: coditect-dev
2. Node Pool Configuration
Specifications:
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
  name: coditect-dev-node-pool
spec:
  autoscaling:
    minNodeCount: 3
    maxNodeCount: 10
  nodeConfig:
    machineType: n1-standard-2
    diskSizeGb: 100
    diskType: pd-ssd
    preemptible: true  # Cost savings for dev
    serviceAccount: license-api@coditect.iam.gserviceaccount.com
    oauthScopes:
    - https://www.googleapis.com/auth/cloud-platform
    labels:
      environment: dev
      component: compute
    tags:
    - gke-node
    - allow-health-check
  management:
    autoRepair: true
    autoUpgrade: true
  upgradeSettings:
    maxSurge: 1
    maxUnavailable: 0
Node Resources:
- CPU: 2 vCPU per node
- Memory: 7.5GB RAM per node
- Disk: 100GB SSD (OS + container images)
- Network: 10 Gbps (egress capped at 2 Gbps per node)
Node Zones (Multi-Zone HA):
- us-central1-a (Zone 1)
- us-central1-b (Zone 2)
- us-central1-c (Zone 3)
3. License API Deployment
Deployment Manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: license-api
  namespace: default
  labels:
    app: license-api
    version: v1.0.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: license-api
  template:
    metadata:
      labels:
        app: license-api
        version: v1.0.0
    spec:
      serviceAccountName: license-api-sa
      containers:
      - name: license-api
        image: gcr.io/coditect-citus-prod/license-api:latest
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: connection-string
        - name: REDIS_URL
          value: "redis://10.121.42.67:6378"
        - name: ENVIRONMENT
          value: "production"
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 4Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 30
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 2
        volumeMounts:
        - name: config
          mountPath: /app/config
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: license-api-config
Pod Lifecycle:
- Scheduler assigns pod to node with sufficient resources
- kubelet pulls container image from GCR
- kubelet starts container with environment variables
- Readiness probe checks /ready endpoint
- Pod added to service endpoints (receives traffic)
- Liveness probe monitors /health endpoint
- On liveness failure: kubelet restarts the container (3 consecutive failures, per failureThreshold)
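The restart decision in the last step can be modeled as a consecutive-failure counter matching failureThreshold — a simplified sketch of kubelet's behavior, not its actual implementation:

```python
def liveness_verdict(probe_results, failure_threshold=3):
    """Return 'restart' once failure_threshold consecutive liveness
    probes fail; any successful probe resets the counter."""
    consecutive = 0
    for ok in probe_results:
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failure_threshold:
            return "restart"
    return "healthy"

print(liveness_verdict([True, False, False, False]))  # restart
print(liveness_verdict([False, False, True, False]))  # healthy (counter reset)
```

Note that the readiness probe uses the same counting but only removes the pod from service endpoints; it never restarts the container.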
4. NGINX Ingress Controller
Purpose: HTTP(S) load balancing and routing for Kubernetes services
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  replicas: 2  # HA configuration
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx  # must match the selector
    spec:
      containers:
      - name: controller
        image: registry.k8s.io/ingress-nginx/controller:v1.9.4
        args:
        - /nginx-ingress-controller
        - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
        - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
        - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
        - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
Ingress Resource:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: license-api-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/limit-rps: "100"  # requests per second per client IP
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.coditect.ai
    secretName: api-tls-cert
  rules:
  - host: api.coditect.ai
    http:
      paths:
      - path: /api/v1
        pathType: Prefix
        backend:
          service:
            name: license-api
            port:
              number: 8000
Traffic Flow:
Google Cloud Network LB (L4 TCP passthrough)
↓
NGINX Ingress Controller (TLS termination via cert-manager certificate, path routing)
↓
license-api Service (ClusterIP)
↓
License API Pods (round-robin across endpoints)
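The fan-out in the final hop can be modeled as rotation over the Service's endpoint list. The pod IPs below are illustrative; kube-proxy's iptables mode actually picks a backend pseudo-randomly per connection, which approximates round-robin over many requests:

```python
from itertools import cycle

# Hypothetical pod IPs behind the license-api ClusterIP Service
pod_ips = ["10.1.0.11", "10.1.1.12", "10.1.2.13"]

backends = cycle(pod_ips)
picks = [next(backends) for _ in range(4)]
print(picks)  # ['10.1.0.11', '10.1.1.12', '10.1.2.13', '10.1.0.11']
```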
5. Horizontal Pod Autoscaler (HPA)
Configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: license-api-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: license-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
Scaling Logic:
- Scale Up: add up to 50% more pods per 60 seconds while CPU utilization is above the 70% target
- Scale Down: remove up to 25% of pods per 60 seconds once the desired count, ceil(currentReplicas × currentUtilization / target), falls below the current count
- Stabilization: wait 5 minutes before scaling down (prevents flapping)
Example Scenario:
Current: 3 pods @ 80% CPU
HPA calculates: ceil(3 × 80% / 70%) = ceil(3.43) → 4 pods
After 60 seconds: 4 pods deployed
CPU drops to 50%
HPA waits 5 minutes (stabilization window)
CPU still at 50%
HPA calculates: ceil(4 × 50% / 70%) = 3 → scales down to 3 pods (within the 25%-per-minute policy)
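The replica math follows the HPA formula desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the spec's bounds. A minimal sketch (the helper name is ours, not a Kubernetes API):

```python
import math

def hpa_desired(current_replicas, current_util, target_util,
                min_replicas=3, max_replicas=10):
    """desiredReplicas = ceil(currentReplicas * currentMetric / target),
    clamped to the HPA's minReplicas/maxReplicas."""
    desired = math.ceil(current_replicas * current_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

print(hpa_desired(3, 80, 70))   # 4 -- scale up from 3 pods at 80% CPU
print(hpa_desired(4, 50, 70))   # 3 -- scale down once load drops
print(hpa_desired(3, 300, 70))  # 10 -- capped at maxReplicas
```

Because of the ceil(), utilization must fall well below the target before a scale-down actually triggers; the behavior policies then limit how fast it happens.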
6. Cluster Autoscaler
Configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler
  namespace: kube-system
data:
  config.yaml: |
    scaleDownEnabled: true
    scaleDownDelayAfterAdd: 10m
    scaleDownUnneededTime: 10m
    scaleDownUtilizationThreshold: 0.5
    maxNodeProvisionTime: 15m
    maxGracefulTerminationSec: 600
Scaling Triggers:
- Scale Up: Pods in Pending state due to insufficient resources
- Scale Down: Node utilization < 50% for 10 minutes
Example Scenario:
Current: 3 nodes (n1-standard-2, roughly 1.9 vCPU allocatable each after system overhead)
HPA scales up: 10 pods × 500m CPU requests = 5 vCPU needed
Per-node fit: floor(1900m / 500m) = 3 pods per node → 9 pods scheduled, 1 Pending
Cluster Autoscaler: the Pending pod triggers scale-up → 4th node provisioned
Later: traffic drops, HPA scales down to 3 pods
Node utilization falls below the 50% threshold for 10 minutes
Cluster Autoscaler removes the now-empty node
New state: 3 nodes (minNodeCount: 3 prevents scaling lower)
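Per-node packing matters here: aggregate CPU can suffice while no single node fits another pod. A first-fit sketch (the ~1.9 vCPU allocatable figure is illustrative, not a GKE guarantee):

```python
def pods_fit(pod_cpu_m, pod_count, nodes, allocatable_m):
    """First-fit bin packing on CPU requests only (simplified: the real
    scheduler also weighs memory, affinity, taints, and spread)."""
    free = [allocatable_m] * nodes
    for _ in range(pod_count):
        for i, f in enumerate(free):
            if f >= pod_cpu_m:
                free[i] -= pod_cpu_m
                break
        else:
            return False  # a pod stays Pending -> autoscaler adds a node
    return True

print(pods_fit(500, 10, 3, 1900))  # False: only 3 pods fit per node (9 total)
print(pods_fit(500, 10, 4, 1900))  # True: a 4th node makes room
```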
7. CoreDNS (Service Discovery)
Purpose: DNS-based service discovery for Kubernetes
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns  # must match the selector
    spec:
      containers:
      - name: coredns
        image: registry.k8s.io/coredns/coredns:v1.10.1
        args:
        - -conf
        - /etc/coredns/Corefile
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
Corefile Configuration:
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
DNS Resolution Example:
# Inside the License API pod -- service names resolve via CoreDNS
import aiohttp

async def check_peer_health():
    # license-api.default.svc.cluster.local → ClusterIP
    async with aiohttp.ClientSession() as session:
        async with session.get("http://license-api:8000/health") as response:
            return await response.json()
DNS Records:
# Service DNS
license-api.default.svc.cluster.local → 10.2.0.50 (ClusterIP)
# Pod DNS
10-1-0-123.default.pod.cluster.local → 10.1.0.123 (Pod IP)
# Headless service (StatefulSet)
prometheus-0.prometheus.default.svc.cluster.local → 10.1.0.200
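These record shapes follow the Kubernetes DNS specification and can be generated with small helpers (function names are ours, for illustration):

```python
def service_fqdn(service, namespace="default", zone="cluster.local"):
    """DNS name CoreDNS serves for a Service's ClusterIP."""
    return f"{service}.{namespace}.svc.{zone}"

def pod_fqdn(pod_ip, namespace="default", zone="cluster.local"):
    """Pod A-records use the pod IP with dots replaced by dashes."""
    return f"{pod_ip.replace('.', '-')}.{namespace}.pod.{zone}"

print(service_fqdn("license-api"))  # license-api.default.svc.cluster.local
print(pod_fqdn("10.1.0.123"))       # 10-1-0-123.default.pod.cluster.local
```

Short names like `license-api` work too because each pod's resolv.conf lists `<namespace>.svc.cluster.local` and `svc.cluster.local` as search domains.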
8. Workload Identity (GCP Integration)
Purpose: Secure GCP API access without long-lived service account keys
Configuration:
# Kubernetes Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: license-api-sa
  namespace: default
  annotations:
    iam.gke.io/gcp-service-account: license-api@coditect-citus-prod.iam.gserviceaccount.com

# IAM policy binding (gcloud command, run separately -- not part of the manifest)
gcloud iam service-accounts add-iam-policy-binding \
  license-api@coditect-citus-prod.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:coditect-citus-prod.svc.id.goog[default/license-api-sa]"
GCP Permissions:
# Google Service Account: license-api@coditect-citus-prod.iam
Roles:
- roles/secretmanager.secretAccessor # Read secrets
- roles/cloudkms.signerVerifier # Sign licenses
- roles/cloudsql.client # Connect to Cloud SQL
- roles/monitoring.metricWriter # Write metrics
- roles/logging.logWriter # Write logs
Usage in Pod:
# Automatic authentication via Workload Identity -- no key file needed
from google.cloud import secretmanager

async def fetch_db_password(project_id: str) -> str:
    # No explicit credentials: the client picks them up from the
    # metadata server via Workload Identity
    client = secretmanager.SecretManagerServiceAsyncClient()
    name = f"projects/{project_id}/secrets/db-password/versions/latest"
    response = await client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")
9. Networking (CNI Plugin)
Technology: GKE VPC-native networking (alias IP ranges; replaces legacy routes-based kubenet)
Configuration:
# IP allocation
Node CIDR: 10.0.0.0/16 (65,536 IPs)
Pod CIDR: 10.1.0.0/16 (65,536 IPs - alias IPs)
Service CIDR: 10.2.0.0/16 (65,536 IPs - virtual IPs)
# IP assignment
Each node reserves /24 subnet for pods (256 IPs per node)
Node 1: 10.1.0.0/24
Node 2: 10.1.1.0/24
Node 3: 10.1.2.0/24
# Service IP allocation
ClusterIP services get IPs from 10.2.0.0/16
license-api Service: 10.2.0.50
kube-dns Service: 10.2.0.10
prometheus Service: 10.2.0.100
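The per-node /24 carving above can be reproduced with the standard `ipaddress` module:

```python
import ipaddress

# Carve per-node /24 pod ranges out of the 10.1.0.0/16 pod CIDR
pod_cidr = ipaddress.ip_network("10.1.0.0/16")
node_subnets = list(pod_cidr.subnets(new_prefix=24))

for i, subnet in enumerate(node_subnets[:3], start=1):
    print(f"Node {i}: {subnet} ({subnet.num_addresses} IPs)")
# Node 1: 10.1.0.0/24 (256 IPs)
# Node 2: 10.1.1.0/24 (256 IPs)
# Node 3: 10.1.2.0/24 (256 IPs)
```

A /16 yields 256 such /24 ranges, which is what bounds the cluster at 256 nodes under this allocation scheme.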
Network Policies:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: license-api-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: license-api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
      podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8000
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
  - to:
    - ipBlock:
        cidr: 10.0.0.0/8  # Private ranges: Cloud SQL / Redis are off-cluster, so pod selectors cannot match them
10. Monitoring (Prometheus)
Purpose: Metrics collection and alerting
Deployment:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceName: prometheus
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus  # must match the selector
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:v2.48.0
        args:
        - '--config.file=/etc/prometheus/prometheus.yml'
        - '--storage.tsdb.path=/prometheus'
        - '--storage.tsdb.retention.time=15d'
        ports:
        - containerPort: 9090
          name: http
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
        - name: storage
          mountPath: /prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus-config
  volumeClaimTemplates:
  - metadata:
      name: storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: pd-ssd
      resources:
        requests:
          storage: 100Gi
Scrape Configuration:
scrape_configs:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
- job_name: 'license-api'
  static_configs:
  - targets: ['license-api:8000']
  metrics_path: '/metrics'
Key Metrics:
# Request rate
rate(http_requests_total{job="license-api"}[5m])
# Request latency (p95)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Error rate
rate(http_requests_total{job="license-api",status=~"5.."}[5m])
# Pod CPU usage
rate(container_cpu_usage_seconds_total{pod=~"license-api.*"}[5m])
# Pod memory usage
container_memory_working_set_bytes{pod=~"license-api.*"}
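The rate() queries compute a per-second increase of a counter over the window. A simplified model of that calculation, ignoring counter resets and the extrapolation PromQL applies at window edges:

```python
def counter_rate(samples):
    """Per-second rate from (timestamp_sec, counter_value) samples,
    using the first and last points of the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# http_requests_total sampled over a 5-minute window
print(counter_rate([(0, 1000), (300, 2500)]))  # 5.0 requests/sec
```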
Deployment Workflow
1. Initial Deployment
# Build and push Docker image
docker build -t gcr.io/coditect-citus-prod/license-api:v1.0.0 .
docker push gcr.io/coditect-citus-prod/license-api:v1.0.0
# Apply Kubernetes manifests
kubectl apply -f kubernetes/namespace.yaml
kubectl apply -f kubernetes/configmap.yaml
kubectl apply -f kubernetes/secret.yaml
kubectl apply -f kubernetes/deployment.yaml
kubectl apply -f kubernetes/service.yaml
kubectl apply -f kubernetes/hpa.yaml
kubectl apply -f kubernetes/ingress.yaml
# Verify deployment
kubectl rollout status deployment/license-api
kubectl get pods -l app=license-api
kubectl get hpa license-api-hpa
kubectl get ingress license-api-ingress
2. Rolling Update
# Update image tag in deployment
kubectl set image deployment/license-api \
  license-api=gcr.io/coditect-citus-prod/license-api:v1.1.0
# Monitor rollout
kubectl rollout status deployment/license-api
# Verify new pods
kubectl get pods -l app=license-api -o wide
# Rollback if needed
kubectl rollout undo deployment/license-api
3. Scaling Operations
# Manual scaling
kubectl scale deployment license-api --replicas=5
# HPA status
kubectl get hpa license-api-hpa
# Cluster autoscaler status
kubectl get nodes
kubectl describe node <node-name>
Troubleshooting
Pod Not Starting
# Check pod status
kubectl get pods -l app=license-api
kubectl describe pod <pod-name>
# Check logs
kubectl logs <pod-name>
kubectl logs <pod-name> --previous # Previous container
# Check events
kubectl get events --sort-by='.lastTimestamp'
# Common issues:
# - ImagePullBackOff: Check GCR permissions
# - CrashLoopBackOff: Check application logs
# - Pending: Insufficient resources (check HPA/cluster autoscaler)
Service Not Reachable
# Check service endpoints
kubectl get endpoints license-api
# Test from within cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl http://license-api:8000/health
# Check ingress
kubectl describe ingress license-api-ingress
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
High Latency
# Check HPA
kubectl get hpa license-api-hpa
# Check resource usage
kubectl top pods -l app=license-api
kubectl top nodes
# Check Prometheus metrics
kubectl port-forward -n monitoring svc/prometheus 9090:9090
# Open http://localhost:9090 and query metrics
Security Best Practices
1. Pod Security Standards
apiVersion: v1
kind: Pod
metadata:
  name: license-api
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: license-api
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
2. Network Policies
# Deny all ingress by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
3. Resource Quotas
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-quota
  namespace: default
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
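A quick sanity check that a scaled Deployment stays inside this quota (the helper is illustrative; the API server enforces quotas itself at admission time):

```python
def fits_quota(cpu_per_pod, mem_gi_per_pod, replicas,
               quota_cpu=10, quota_mem_gi=20, quota_pods=50):
    """Compare aggregate pod requests against the ResourceQuota hard limits
    (CPU in cores, memory in Gi)."""
    return (cpu_per_pod * replicas <= quota_cpu
            and mem_gi_per_pod * replicas <= quota_mem_gi
            and replicas <= quota_pods)

# license-api requests 500m CPU / 1Gi memory per pod
print(fits_quota(0.5, 1, 10))  # True: 5 CPU and 10Gi fit the quota
print(fits_quota(0.5, 1, 25))  # False: 12.5 CPU and 25Gi both exceed it
```

With this quota, the HPA's maxReplicas of 10 leaves comfortable headroom for the license-api requests alone.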
Related Diagrams
- C1: System Context - External system view
- C2: Container Diagram - High-level containers
- C3: Networking Components - VPC and networking
- C3: Security Components - Security architecture
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-11-23 | SDD Architect | Initial GKE component diagram |
Document Classification: Internal - Architecture Documentation
Review Cycle: Quarterly
Next Review Date: 2026-02-23