C3: GKE Component Diagram - Kubernetes Cluster Internals

Level: Component (C4 Model Level 3)
Scope: Google Kubernetes Engine Cluster Internal Architecture
Primary Audience: Platform Engineers, Kubernetes Administrators, Senior Developers
Last Updated: November 23, 2025


Overview

This diagram shows the internal components of the GKE cluster, including the Kubernetes control plane, node pools, networking, and application deployments.

Key Components:

  • Kubernetes control plane (managed by Google)
  • Multi-zone node pools with auto-scaling
  • NGINX Ingress Controller for traffic routing
  • License API deployment with HPA (Horizontal Pod Autoscaler)
  • CoreDNS for service discovery
  • Workload Identity for secure GCP access

GKE Component Diagram


Component Details

1. Kubernetes Control Plane (Managed by Google)

Components:

  • kube-apiserver: REST API endpoint for all cluster operations
  • kube-scheduler: Assigns pods to nodes based on resource requirements
  • kube-controller-manager: Runs controllers (Deployment, ReplicaSet, Service, etc.)
  • etcd: Distributed key-value store for cluster state

Management:

  • Fully managed by Google (no direct access)
  • Multi-zone HA (3 replicas across zones)
  • Automatic version upgrades (release channel: REGULAR)
  • 99.95% SLA for regional clusters

Configuration:

apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <CA_CERT>
    server: https://34.72.XX.XX  # Master endpoint
  name: coditect-dev

2. Node Pool Configuration

Specifications:

apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
  name: coditect-dev-node-pool
spec:
  autoscaling:
    minNodeCount: 3
    maxNodeCount: 10
  nodeConfig:
    machineType: n1-standard-2
    diskSizeGb: 100
    diskType: pd-ssd
    preemptible: true  # Cost savings for dev
    serviceAccount: license-api@coditect.iam.gserviceaccount.com
    oauthScopes:
    - https://www.googleapis.com/auth/cloud-platform
    labels:
      environment: dev
      component: compute
    tags:
    - gke-node
    - allow-health-check
  management:
    autoRepair: true
    autoUpgrade: true
  upgradeSettings:
    maxSurge: 1
    maxUnavailable: 0

Node Resources:

  • CPU: 2 vCPU per node
  • Memory: 7.5GB RAM per node
  • Disk: 100GB SSD (OS + container images)
  • Network: 10 Gbps (egress capped at 2 Gbps per node)
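
The per-node figures above can be rolled up to pool-level capacity at the autoscaling bounds. A minimal sketch (the constants mirror the n1-standard-2 specs and autoscaling limits above; note that schedulable *allocatable* capacity is somewhat lower than these raw numbers because GKE reserves CPU and memory for system daemons):

```python
# Aggregate raw capacity of the node pool at its autoscaling bounds.
# n1-standard-2: 2 vCPU / 7.5 GB RAM; pool scales between 3 and 10 nodes.
VCPU_PER_NODE = 2
MEM_GB_PER_NODE = 7.5

def pool_capacity(node_count: int) -> tuple[int, float]:
    """Return (total vCPU, total memory in GB) for a given node count."""
    return node_count * VCPU_PER_NODE, node_count * MEM_GB_PER_NODE

min_capacity = pool_capacity(3)   # (6, 22.5)  -> minNodeCount
max_capacity = pool_capacity(10)  # (20, 75.0) -> maxNodeCount
```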

Node Zones (Multi-Zone HA):

  • us-central1-a (Zone 1)
  • us-central1-b (Zone 2)
  • us-central1-c (Zone 3)

3. License API Deployment

Deployment Manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: license-api
  namespace: default
  labels:
    app: license-api
    version: v1.0.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: license-api
  template:
    metadata:
      labels:
        app: license-api
        version: v1.0.0
    spec:
      serviceAccountName: license-api-sa
      containers:
      - name: license-api
        image: gcr.io/coditect-citus-prod/license-api:latest
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: connection-string
        - name: REDIS_URL
          value: "redis://10.121.42.67:6378"
        - name: ENVIRONMENT
          value: "production"
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 4Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 30
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 2
        volumeMounts:
        - name: config
          mountPath: /app/config
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: license-api-config

Pod Lifecycle:

  1. Scheduler assigns pod to node with sufficient resources
  2. kubelet pulls container image from GCR
  3. kubelet starts container with environment variables
  4. Readiness probe checks /ready endpoint
  5. Pod added to service endpoints (receives traffic)
  6. Liveness probe monitors /health endpoint
  7. On failure: after 3 consecutive liveness probe failures (failureThreshold: 3), kubelet restarts the container
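
The failure-threshold behavior in step 7 can be sketched as a simple consecutive-failure counter. This is an illustrative model, not kubelet source code:

```python
# Illustrative model of livenessProbe failureThreshold semantics:
# a container restart is triggered only by *consecutive* probe failures.
def should_restart(probe_results: list[bool], failure_threshold: int = 3) -> bool:
    """probe_results: probe outcomes oldest-first (True = probe passed)."""
    consecutive_failures = 0
    for passed in probe_results:
        consecutive_failures = 0 if passed else consecutive_failures + 1
        if consecutive_failures >= failure_threshold:
            return True
    return False

# Three failures in a row -> restart; a success resets the counter.
assert should_restart([True, False, False, False]) is True
assert should_restart([False, False, True, False, False]) is False
```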

4. NGINX Ingress Controller

Purpose: HTTP(S) load balancing and routing for Kubernetes services

Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  replicas: 2  # HA configuration
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
    spec:
      containers:
      - name: controller
        image: registry.k8s.io/ingress-nginx/controller:v1.9.4
        args:
        - /nginx-ingress-controller
        - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
        - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
        - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
        - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        resources:
          requests:
            cpu: 100m
            memory: 128Mi

Ingress Resource:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: license-api-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/rate-limit: "100"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.coditect.ai
    secretName: api-tls-cert
  rules:
  - host: api.coditect.ai
    http:
      paths:
      - path: /api/v1
        pathType: Prefix
        backend:
          service:
            name: license-api
            port:
              number: 8000

Traffic Flow:

Google Cloud LB (SSL termination)
        ↓
NGINX Ingress Controller (path routing)
        ↓
license-api Service (ClusterIP)
        ↓
License API Pods (round-robin)

5. Horizontal Pod Autoscaler (HPA)

Configuration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: license-api-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: license-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60

Scaling Logic:

  • Scale Up: add up to 50% more pods per 60-second window while CPU utilization exceeds the 70% target
  • Scale Down: remove up to 25% of pods per 60-second window once utilization falls below target
  • Stabilization: wait 5 minutes before scaling down (prevents flapping)

Example Scenario:

Current: 3 pods @ 80% CPU
HPA calculates: 3 * (80% / 70%) = 3.43 → 4 pods
After 60 seconds: 4 pods deployed
CPU drops to 60%
HPA waits 5 minutes (stabilization)
CPU still at 60%
HPA scales down: 4 * 0.75 = 3 pods
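
The arithmetic in this scenario follows the HPA core formula: desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped to the min/max replica bounds. A minimal sketch (function name is illustrative; the scaleDown behavior policies additionally rate-limit how fast the result is applied):

```python
import math

# HPA replica calculation: ceil(current * observed / target), clamped.
def desired_replicas(current: int, observed_util: float, target_util: float,
                     min_replicas: int = 3, max_replicas: int = 10) -> int:
    desired = math.ceil(current * observed_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

# The scenario above: 3 pods at 80% CPU against a 70% target.
assert desired_replicas(3, 80, 70) == 4  # 3 * (80/70) = 3.43, rounded up
```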

6. Cluster Autoscaler

Configuration:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler
  namespace: kube-system
data:
  config.yaml: |
    scaleDownEnabled: true
    scaleDownDelayAfterAdd: 10m
    scaleDownUnneededTime: 10m
    scaleDownUtilizationThreshold: 0.5
    maxNodeProvisionTime: 15m
    maxGracefulTerminationSec: 600

Scaling Triggers:

  • Scale Up: Pods in Pending state due to insufficient resources
  • Scale Down: Node utilization < 50% for 10 minutes

Example Scenario:

Current: 3 nodes (6 vCPU total)
HPA scales up: 10 pods × 500m CPU = 5 vCPU needed
Allocatable: ~5.5 vCPU after system overhead
Pods fit: all 10 pods scheduled
Later: traffic drops, HPA scales down to 3 pods
Node utilization: ~30% (below the 50% threshold)
After 10 minutes: a node becomes a scale-down candidate, but the
node pool's minNodeCount: 3 keeps the cluster at 3 nodes
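
The scale-down trigger can be sketched as a check over recent utilization samples, using the scaleDownUtilizationThreshold (0.5) and scaleDownUnneededTime (10m) values configured above. This is an illustrative model, not the Cluster Autoscaler's actual implementation:

```python
# Illustrative model of the Cluster Autoscaler scale-down check:
# a node is a removal candidate once utilization stays below the
# threshold for the full "unneeded" window.
def scale_down_candidate(util_samples: list[float], threshold: float = 0.5,
                         unneeded_minutes: int = 10,
                         sample_interval_min: int = 1) -> bool:
    """util_samples: per-interval utilization readings, oldest first."""
    needed = unneeded_minutes // sample_interval_min
    recent = util_samples[-needed:]
    return len(recent) >= needed and all(u < threshold for u in recent)

assert scale_down_candidate([0.3] * 10) is True            # 10 min below 50%
assert scale_down_candidate([0.3] * 5 + [0.6] + [0.3] * 4) is False  # spike resets
```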

7. CoreDNS (Service Discovery)

Purpose: DNS-based service discovery for Kubernetes

Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns
    spec:
      containers:
      - name: coredns
        image: registry.k8s.io/coredns/coredns:v1.10.1
        args:
        - -conf
        - /etc/coredns/Corefile
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 128Mi

Corefile Configuration:

.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}

DNS Resolution Example:

# Inside a License API pod
import aiohttp

async def check_peer_health() -> dict:
    # CoreDNS resolves the short service name:
    # license-api.default.svc.cluster.local -> ClusterIP
    async with aiohttp.ClientSession() as session:
        async with session.get("http://license-api:8000/health") as response:
            return await response.json()

DNS Records:

# Service DNS
license-api.default.svc.cluster.local → 10.2.0.50 (ClusterIP)

# Pod DNS
10-1-0-123.default.pod.cluster.local → 10.1.0.123 (Pod IP)

# Headless service (StatefulSet)
prometheus-0.prometheus.default.svc.cluster.local → 10.1.0.200
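
The record shapes above follow a mechanical naming scheme, which can be expressed as simple string construction (function names are illustrative):

```python
# Kubernetes DNS naming scheme served by CoreDNS (cluster.local zone).
def service_fqdn(name: str, namespace: str = "default",
                 zone: str = "cluster.local") -> str:
    """<service>.<namespace>.svc.<zone>"""
    return f"{name}.{namespace}.svc.{zone}"

def pod_fqdn(pod_ip: str, namespace: str = "default",
             zone: str = "cluster.local") -> str:
    """Pod IP with dots replaced by dashes: <a-b-c-d>.<namespace>.pod.<zone>"""
    return f"{pod_ip.replace('.', '-')}.{namespace}.pod.{zone}"

assert service_fqdn("license-api") == "license-api.default.svc.cluster.local"
assert pod_fqdn("10.1.0.123") == "10-1-0-123.default.pod.cluster.local"
```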

8. Workload Identity (GCP Integration)

Purpose: Secure GCP API access without long-lived service account keys

Configuration:

# Kubernetes Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: license-api-sa
  namespace: default
  annotations:
    iam.gke.io/gcp-service-account: license-api@coditect-citus-prod.iam.gserviceaccount.com

# IAM policy binding (gcloud command, not YAML)
gcloud iam service-accounts add-iam-policy-binding \
  license-api@coditect-citus-prod.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:coditect-citus-prod.svc.id.goog[default/license-api-sa]"

GCP Permissions:

# Google Service Account: license-api@coditect-citus-prod.iam
Roles:
- roles/secretmanager.secretAccessor # Read secrets
- roles/cloudkms.signerVerifier # Sign licenses
- roles/cloudsql.client # Connect to Cloud SQL
- roles/monitoring.metricWriter # Write metrics
- roles/logging.logWriter # Write logs

Usage in Pod:

# Automatic authentication via Workload Identity
from google.cloud import secretmanager

PROJECT_ID = "coditect-citus-prod"

async def fetch_db_password() -> str:
    # No explicit credentials needed - Workload Identity handles it
    client = secretmanager.SecretManagerServiceAsyncClient()
    name = f"projects/{PROJECT_ID}/secrets/db-password/versions/latest"
    response = await client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")

9. Networking (CNI Plugin)

Technology: GKE VPC-native networking (alias IPs; replaces routes-based kubenet networking)

Configuration:

# IP allocation
Node CIDR: 10.0.0.0/16 (65,536 IPs)
Pod CIDR: 10.1.0.0/16 (65,536 IPs - alias IPs)
Service CIDR: 10.2.0.0/16 (65,536 IPs - virtual IPs)

# IP assignment
Each node reserves /24 subnet for pods (256 IPs per node)
Node 1: 10.1.0.0/24
Node 2: 10.1.1.0/24
Node 3: 10.1.2.0/24

# Service IP allocation
ClusterIP services get IPs from 10.2.0.0/16
license-api Service: 10.2.0.50
kube-dns Service: 10.2.0.10
prometheus Service: 10.2.0.100
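
The per-node /24 carve-out from the pod CIDR can be computed with the standard library's ipaddress module; a sketch of the allocation shown above:

```python
import ipaddress

# VPC-native GKE assigns each node a /24 alias range out of the pod CIDR,
# giving 256 pod IPs per node.
pod_cidr = ipaddress.ip_network("10.1.0.0/16")
node_subnets = list(pod_cidr.subnets(new_prefix=24))

assert str(node_subnets[0]) == "10.1.0.0/24"   # Node 1
assert str(node_subnets[1]) == "10.1.1.0/24"   # Node 2
assert str(node_subnets[2]) == "10.1.2.0/24"   # Node 3
assert node_subnets[0].num_addresses == 256
```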

Network Policies:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: license-api-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: license-api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
      podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8000
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
  - to:
    - namespaceSelector: {}  # All in-cluster namespaces; external egress
                             # (Cloud SQL, Redis) requires an ipBlock rule

10. Monitoring (Prometheus)

Purpose: Metrics collection and alerting

Deployment:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceName: prometheus
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:v2.48.0
        args:
        - '--config.file=/etc/prometheus/prometheus.yml'
        - '--storage.tsdb.path=/prometheus'
        - '--storage.tsdb.retention.time=15d'
        ports:
        - containerPort: 9090
          name: http
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
        - name: storage
          mountPath: /prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus-config
  volumeClaimTemplates:
  - metadata:
      name: storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: pd-ssd
      resources:
        requests:
          storage: 100Gi

Scrape Configuration:

scrape_configs:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__

- job_name: 'license-api'
  static_configs:
  - targets: ['license-api:8000']
  metrics_path: '/metrics'

Key Metrics:

# Request rate
rate(http_requests_total{job="license-api"}[5m])

# Request latency (p95)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))

# Error rate
rate(http_requests_total{job="license-api",status=~"5.."}[5m])

# Pod CPU usage
rate(container_cpu_usage_seconds_total{pod=~"license-api.*"}[5m])

# Pod memory usage
container_memory_working_set_bytes{pod=~"license-api.*"}
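
These queries can also be run programmatically against Prometheus's HTTP API (`/api/v1/query`), for example through the port-forward shown in the Troubleshooting section. A sketch of building the request URL (the helper name is illustrative):

```python
from urllib.parse import urlencode

# Build an instant-query URL for the Prometheus HTTP API.
def instant_query_url(base: str, promql: str) -> str:
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"

url = instant_query_url(
    "http://localhost:9090",  # matches the port-forward in Troubleshooting
    'rate(http_requests_total{job="license-api"}[5m])',
)
# The PromQL expression is percent-encoded into the query string.
```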

Deployment Workflow

1. Initial Deployment

# Build and push Docker image
docker build -t gcr.io/coditect-citus-prod/license-api:v1.0.0 .
docker push gcr.io/coditect-citus-prod/license-api:v1.0.0

# Apply Kubernetes manifests
kubectl apply -f kubernetes/namespace.yaml
kubectl apply -f kubernetes/configmap.yaml
kubectl apply -f kubernetes/secret.yaml
kubectl apply -f kubernetes/deployment.yaml
kubectl apply -f kubernetes/service.yaml
kubectl apply -f kubernetes/hpa.yaml
kubectl apply -f kubernetes/ingress.yaml

# Verify deployment
kubectl rollout status deployment/license-api
kubectl get pods -l app=license-api
kubectl get hpa license-api-hpa
kubectl get ingress license-api-ingress

2. Rolling Update

# Update image tag in deployment
kubectl set image deployment/license-api \
license-api=gcr.io/coditect-citus-prod/license-api:v1.1.0

# Monitor rollout
kubectl rollout status deployment/license-api

# Verify new pods
kubectl get pods -l app=license-api -o wide

# Rollback if needed
kubectl rollout undo deployment/license-api

3. Scaling Operations

# Manual scaling
kubectl scale deployment license-api --replicas=5

# HPA status
kubectl get hpa license-api-hpa

# Cluster autoscaler status
kubectl get nodes
kubectl describe node <node-name>

Troubleshooting

Pod Not Starting

# Check pod status
kubectl get pods -l app=license-api
kubectl describe pod <pod-name>

# Check logs
kubectl logs <pod-name>
kubectl logs <pod-name> --previous # Previous container

# Check events
kubectl get events --sort-by='.lastTimestamp'

# Common issues:
# - ImagePullBackOff: Check GCR permissions
# - CrashLoopBackOff: Check application logs
# - Pending: Insufficient resources (check HPA/cluster autoscaler)

Service Not Reachable

# Check service endpoints
kubectl get endpoints license-api

# Test from within cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
curl http://license-api:8000/health

# Check ingress
kubectl describe ingress license-api-ingress
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller

High Latency

# Check HPA
kubectl get hpa license-api-hpa

# Check resource usage
kubectl top pods -l app=license-api
kubectl top nodes

# Check Prometheus metrics
kubectl port-forward -n monitoring svc/prometheus 9090:9090
# Open http://localhost:9090 and query metrics

Security Best Practices

1. Pod Security Standards

apiVersion: v1
kind: Pod
metadata:
  name: license-api
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: license-api
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true

2. Network Policies

# Deny all ingress by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress

3. Resource Quotas

apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-quota
  namespace: default
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"


Document History

Version  Date        Author         Changes
1.0      2025-11-23  SDD Architect  Initial GKE component diagram

Document Classification: Internal - Architecture Documentation
Review Cycle: Quarterly
Next Review Date: 2026-02-23