C3: GKE Component Diagram - Kubernetes Cluster Internals
Level: Component (C4 Model Level 3)
Scope: Google Kubernetes Engine Cluster Internal Architecture
Primary Audience: Platform Engineers, Kubernetes Administrators, Senior Developers
Last Updated: November 23, 2025
Overview
This diagram shows the internal components of the GKE cluster, including the Kubernetes control plane, node pools, networking, and application deployments.
Key Components:
- Kubernetes control plane (managed by Google)
- Multi-zone node pools with auto-scaling
- NGINX Ingress Controller for traffic routing
- License API deployment with HPA (Horizontal Pod Autoscaler)
- CoreDNS for service discovery
- Workload Identity for secure GCP access
GKE Component Diagram
Component Details
1. Kubernetes Control Plane (Managed by Google)
Components:
- kube-apiserver: REST API endpoint for all cluster operations
- kube-scheduler: Assigns pods to nodes based on resource requirements
- kube-controller-manager: Runs controllers (Deployment, ReplicaSet, Service, etc.)
- etcd: Distributed key-value store for cluster state
Management:
- Fully managed by Google (no direct access)
- Multi-zone HA (3 replicas across zones)
- Automatic version upgrades (release channel: REGULAR)
- 99.95% SLA for regional clusters
Configuration:
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority-data: <CA_CERT>
    server: https://34.72.XX.XX  # Master endpoint
  name: coditect-dev
2. Node Pool Configuration
Specifications:
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
  name: coditect-dev-node-pool
spec:
  autoscaling:
    minNodeCount: 3
    maxNodeCount: 10
  nodeConfig:
    machineType: n1-standard-2
    diskSizeGb: 100
    diskType: pd-ssd
    preemptible: true  # Cost savings for dev
    serviceAccount: license-api@coditect.iam.gserviceaccount.com
    oauthScopes:
    - https://www.googleapis.com/auth/cloud-platform
    labels:
      environment: dev
      component: compute
    tags:
    - gke-node
    - allow-health-check
  management:
    autoRepair: true
    autoUpgrade: true
  upgradeSettings:
    maxSurge: 1
    maxUnavailable: 0
Node Resources:
- CPU: 2 vCPU per node
- Memory: 7.5GB RAM per node
- Disk: 100GB SSD (OS + container images)
- Network: 10 Gbps (egress capped at 2 Gbps per node)
Node Zones (Multi-Zone HA):
- us-central1-a (Zone 1)
- us-central1-b (Zone 2)
- us-central1-c (Zone 3)
3. License API Deployment
Deployment Manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: license-api
  namespace: default
  labels:
    app: license-api
    version: v1.0.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: license-api
  template:
    metadata:
      labels:
        app: license-api
        version: v1.0.0
    spec:
      serviceAccountName: license-api-sa
      containers:
      - name: license-api
        image: gcr.io/coditect-citus-prod/license-api:latest
        ports:
        - containerPort: 8000
          name: http
          protocol: TCP
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: connection-string
        - name: REDIS_URL
          value: "redis://10.121.42.67:6378"
        - name: ENVIRONMENT
          value: "production"
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
          limits:
            cpu: 2000m
            memory: 4Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 30
          periodSeconds: 30
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 2
        volumeMounts:
        - name: config
          mountPath: /app/config
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: license-api-config
Pod Lifecycle:
- Scheduler assigns pod to node with sufficient resources
- kubelet pulls container image from GCR
- kubelet starts container with environment variables
- Readiness probe checks /ready endpoint
- Pod added to service endpoints (receives traffic)
- Liveness probe monitors /health endpoint
- On liveness failure: kubelet restarts the container (3 consecutive failures, per failureThreshold)
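The restart decision in the last step can be modeled as a consecutive-failure counter matching failureThreshold — a simplified sketch of kubelet's behavior, not its actual implementation:

```python
def liveness_verdict(probe_results, failure_threshold=3):
    """Return 'restart' once failure_threshold consecutive liveness
    probes fail; any successful probe resets the counter."""
    consecutive = 0
    for ok in probe_results:
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= failure_threshold:
            return "restart"
    return "healthy"

print(liveness_verdict([True, False, False, False]))  # restart
print(liveness_verdict([False, False, True, False]))  # healthy (counter reset)
```

Note that the readiness probe uses the same counting but only removes the pod from service endpoints; it never restarts the container.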
4. NGINX Ingress Controller
Purpose: HTTP(S) load balancing and routing for Kubernetes services
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  replicas: 2  # HA configuration
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx  # must match the selector
    spec:
      containers:
      - name: controller
        image: registry.k8s.io/ingress-nginx/controller:v1.9.4
        args:
        - /nginx-ingress-controller
        - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
        - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
        - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
        - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
        ports:
        - name: http
          containerPort: 80
        - name: https
          containerPort: 443
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
Ingress Resource:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: license-api-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/limit-rps: "100"  # requests per second per client IP
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.coditect.ai
    secretName: api-tls-cert
  rules:
  - host: api.coditect.ai
    http:
      paths:
      - path: /api/v1
        pathType: Prefix
        backend:
          service:
            name: license-api
            port:
              number: 8000
Traffic Flow:
Google Cloud Network LB (L4 TCP passthrough)
↓
NGINX Ingress Controller (TLS termination via cert-manager certificate, path routing)
↓
license-api Service (ClusterIP)
↓
License API Pods (round-robin across endpoints)
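The fan-out in the final hop can be modeled as rotation over the Service's endpoint list. The pod IPs below are illustrative; kube-proxy's iptables mode actually picks a backend pseudo-randomly per connection, which approximates round-robin over many requests:

```python
from itertools import cycle

# Hypothetical pod IPs behind the license-api ClusterIP Service
pod_ips = ["10.1.0.11", "10.1.1.12", "10.1.2.13"]

backends = cycle(pod_ips)
picks = [next(backends) for _ in range(4)]
print(picks)  # ['10.1.0.11', '10.1.1.12', '10.1.2.13', '10.1.0.11']
```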
5. Horizontal Pod Autoscaler (HPA)
Configuration:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: license-api-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: license-api
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 25
        periodSeconds: 60
Scaling Logic:
- Scale Up: add up to 50% more pods per 60 seconds while CPU utilization is above the 70% target
- Scale Down: remove up to 25% of pods per 60 seconds once the desired count, ceil(currentReplicas × currentUtilization / target), falls below the current count
- Stabilization: wait 5 minutes before scaling down (prevents flapping)
Example Scenario:
Current: 3 pods @ 80% CPU
HPA calculates: ceil(3 × 80% / 70%) = ceil(3.43) → 4 pods
After 60 seconds: 4 pods deployed
CPU drops to 50%
HPA waits 5 minutes (stabilization window)
CPU still at 50%
HPA calculates: ceil(4 × 50% / 70%) = 3 → scales down to 3 pods (within the 25%-per-minute policy)
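The replica math follows the HPA formula desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the spec's bounds. A minimal sketch (the helper name is ours, not a Kubernetes API):

```python
import math

def hpa_desired(current_replicas, current_util, target_util,
                min_replicas=3, max_replicas=10):
    """desiredReplicas = ceil(currentReplicas * currentMetric / target),
    clamped to the HPA's minReplicas/maxReplicas."""
    desired = math.ceil(current_replicas * current_util / target_util)
    return max(min_replicas, min(max_replicas, desired))

print(hpa_desired(3, 80, 70))   # 4 -- scale up from 3 pods at 80% CPU
print(hpa_desired(4, 50, 70))   # 3 -- scale down once load drops
print(hpa_desired(3, 300, 70))  # 10 -- capped at maxReplicas
```

Because of the ceil(), utilization must fall well below the target before a scale-down actually triggers; the behavior policies then limit how fast it happens.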
6. Cluster Autoscaler
Configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler
  namespace: kube-system
data:
  config.yaml: |
    scaleDownEnabled: true
    scaleDownDelayAfterAdd: 10m
    scaleDownUnneededTime: 10m
    scaleDownUtilizationThreshold: 0.5
    maxNodeProvisionTime: 15m
    maxGracefulTerminationSec: 600
Scaling Triggers:
- Scale Up: Pods in Pending state due to insufficient resources
- Scale Down: Node utilization < 50% for 10 minutes
Example Scenario:
Current: 3 nodes (n1-standard-2, roughly 1.9 vCPU allocatable each after system overhead)
HPA scales up: 10 pods × 500m CPU requests = 5 vCPU needed
Per-node fit: floor(1900m / 500m) = 3 pods per node → 9 pods scheduled, 1 Pending
Cluster Autoscaler: the Pending pod triggers scale-up → 4th node provisioned
Later: traffic drops, HPA scales down to 3 pods
Node utilization falls below the 50% threshold for 10 minutes
Cluster Autoscaler removes the now-empty node
New state: 3 nodes (minNodeCount: 3 prevents scaling lower)
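Per-node packing matters here: aggregate CPU can suffice while no single node fits another pod. A first-fit sketch (the ~1.9 vCPU allocatable figure is illustrative, not a GKE guarantee):

```python
def pods_fit(pod_cpu_m, pod_count, nodes, allocatable_m):
    """First-fit bin packing on CPU requests only (simplified: the real
    scheduler also weighs memory, affinity, taints, and spread)."""
    free = [allocatable_m] * nodes
    for _ in range(pod_count):
        for i, f in enumerate(free):
            if f >= pod_cpu_m:
                free[i] -= pod_cpu_m
                break
        else:
            return False  # a pod stays Pending -> autoscaler adds a node
    return True

print(pods_fit(500, 10, 3, 1900))  # False: only 3 pods fit per node (9 total)
print(pods_fit(500, 10, 4, 1900))  # True: a 4th node makes room
```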
7. CoreDNS (Service Discovery)
Purpose: DNS-based service discovery for Kubernetes
Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coredns
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: kube-dns
  template:
    metadata:
      labels:
        k8s-app: kube-dns  # must match the selector
    spec:
      containers:
      - name: coredns
        image: registry.k8s.io/coredns/coredns:v1.10.1
        args:
        - -conf
        - /etc/coredns/Corefile
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
Corefile Configuration:
.:53 {
    errors
    health {
        lameduck 5s
    }
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    prometheus :9153
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
DNS Resolution Example:
# Inside the License API pod -- service names resolve via CoreDNS
import aiohttp

async def check_peer_health():
    # license-api.default.svc.cluster.local → ClusterIP
    async with aiohttp.ClientSession() as session:
        async with session.get("http://license-api:8000/health") as response:
            return await response.json()
DNS Records:
# Service DNS
license-api.default.svc.cluster.local → 10.2.0.50 (ClusterIP)
# Pod DNS
10-1-0-123.default.pod.cluster.local → 10.1.0.123 (Pod IP)
# Headless service (StatefulSet)
prometheus-0.prometheus.default.svc.cluster.local → 10.1.0.200
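These record shapes follow the Kubernetes DNS specification and can be generated with small helpers (function names are ours, for illustration):

```python
def service_fqdn(service, namespace="default", zone="cluster.local"):
    """DNS name CoreDNS serves for a Service's ClusterIP."""
    return f"{service}.{namespace}.svc.{zone}"

def pod_fqdn(pod_ip, namespace="default", zone="cluster.local"):
    """Pod A-records use the pod IP with dots replaced by dashes."""
    return f"{pod_ip.replace('.', '-')}.{namespace}.pod.{zone}"

print(service_fqdn("license-api"))  # license-api.default.svc.cluster.local
print(pod_fqdn("10.1.0.123"))       # 10-1-0-123.default.pod.cluster.local
```

Short names like `license-api` work too because each pod's resolv.conf lists `<namespace>.svc.cluster.local` and `svc.cluster.local` as search domains.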
8. Workload Identity (GCP Integration)
Purpose: Secure GCP API access without long-lived service account keys
Configuration:
# Kubernetes Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: license-api-sa
  namespace: default
  annotations:
    iam.gke.io/gcp-service-account: license-api@coditect-citus-prod.iam.gserviceaccount.com

# IAM policy binding (gcloud command, run separately -- not part of the manifest)
gcloud iam service-accounts add-iam-policy-binding \
  license-api@coditect-citus-prod.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:coditect-citus-prod.svc.id.goog[default/license-api-sa]"
GCP Permissions:
# Google Service Account: license-api@coditect-citus-prod.iam
Roles:
- roles/secretmanager.secretAccessor # Read secrets
- roles/cloudkms.signerVerifier # Sign licenses
- roles/cloudsql.client # Connect to Cloud SQL
- roles/monitoring.metricWriter # Write metrics
- roles/logging.logWriter # Write logs
Usage in Pod:
# Automatic authentication via Workload Identity -- no key file needed
from google.cloud import secretmanager

async def fetch_db_password(project_id: str) -> str:
    # No explicit credentials: the client picks them up from the
    # metadata server via Workload Identity
    client = secretmanager.SecretManagerServiceAsyncClient()
    name = f"projects/{project_id}/secrets/db-password/versions/latest"
    response = await client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")
9. Networking (CNI Plugin)
Technology: GKE VPC-native networking (alias IP ranges; replaces legacy routes-based kubenet)
Configuration:
# IP allocation
Node CIDR: 10.0.0.0/16 (65,536 IPs)
Pod CIDR: 10.1.0.0/16 (65,536 IPs - alias IPs)
Service CIDR: 10.2.0.0/16 (65,536 IPs - virtual IPs)
# IP assignment
Each node reserves /24 subnet for pods (256 IPs per node)
Node 1: 10.1.0.0/24
Node 2: 10.1.1.0/24
Node 3: 10.1.2.0/24
# Service IP allocation
ClusterIP services get IPs from 10.2.0.0/16
license-api Service: 10.2.0.50
kube-dns Service: 10.2.0.10
prometheus Service: 10.2.0.100
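The per-node /24 carving above can be reproduced with the standard `ipaddress` module:

```python
import ipaddress

# Carve per-node /24 pod ranges out of the 10.1.0.0/16 pod CIDR
pod_cidr = ipaddress.ip_network("10.1.0.0/16")
node_subnets = list(pod_cidr.subnets(new_prefix=24))

for i, subnet in enumerate(node_subnets[:3], start=1):
    print(f"Node {i}: {subnet} ({subnet.num_addresses} IPs)")
# Node 1: 10.1.0.0/24 (256 IPs)
# Node 2: 10.1.1.0/24 (256 IPs)
# Node 3: 10.1.2.0/24 (256 IPs)
```

A /16 yields 256 such /24 ranges, which is what bounds the cluster at 256 nodes under this allocation scheme.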
Network Policies:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: license-api-network-policy
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: license-api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
      podSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8000
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
  - to:
    - ipBlock:
        cidr: 10.0.0.0/8  # Private ranges: Cloud SQL / Redis are off-cluster, so pod selectors cannot match them
10. Monitoring (Prometheus)
Purpose: Metrics collection and alerting
Deployment:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceName: prometheus
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus  # must match the selector
    spec:
      containers:
      - name: prometheus
        image: prom/prometheus:v2.48.0
        args:
        - '--config.file=/etc/prometheus/prometheus.yml'
        - '--storage.tsdb.path=/prometheus'
        - '--storage.tsdb.retention.time=15d'
        ports:
        - containerPort: 9090
          name: http
        resources:
          requests:
            cpu: 500m
            memory: 2Gi
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
        - name: storage
          mountPath: /prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus-config
  volumeClaimTemplates:
  - metadata:
      name: storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: pd-ssd
      resources:
        requests:
          storage: 100Gi
Scrape Configuration:
scrape_configs:
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
- job_name: 'license-api'
  static_configs:
  - targets: ['license-api:8000']
  metrics_path: '/metrics'
Key Metrics:
# Request rate
rate(http_requests_total{job="license-api"}[5m])
# Request latency (p95)
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# Error rate
rate(http_requests_total{job="license-api",status=~"5.."}[5m])
# Pod CPU usage
rate(container_cpu_usage_seconds_total{pod=~"license-api.*"}[5m])
# Pod memory usage
container_memory_working_set_bytes{pod=~"license-api.*"}
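The rate() queries compute a per-second increase of a counter over the window. A simplified model of that calculation, ignoring counter resets and the extrapolation PromQL applies at window edges:

```python
def counter_rate(samples):
    """Per-second rate from (timestamp_sec, counter_value) samples,
    using the first and last points of the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# http_requests_total sampled over a 5-minute window
print(counter_rate([(0, 1000), (300, 2500)]))  # 5.0 requests/sec
```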
Deployment Workflow
1. Initial Deployment
# Build and push Docker image
docker build -t gcr.io/coditect-citus-prod/license-api:v1.0.0 .
docker push gcr.io/coditect-citus-prod/license-api:v1.0.0
# Apply Kubernetes manifests
kubectl apply -f kubernetes/namespace.yaml
kubectl apply -f kubernetes/configmap.yaml
kubectl apply -f kubernetes/secret.yaml
kubectl apply -f kubernetes/deployment.yaml
kubectl apply -f kubernetes/service.yaml
kubectl apply -f kubernetes/hpa.yaml
kubectl apply -f kubernetes/ingress.yaml
# Verify deployment
kubectl rollout status deployment/license-api
kubectl get pods -l app=license-api
kubectl get hpa license-api-hpa
kubectl get ingress license-api-ingress
2. Rolling Update
# Update image tag in deployment
kubectl set image deployment/license-api \
  license-api=gcr.io/coditect-citus-prod/license-api:v1.1.0
# Monitor rollout
kubectl rollout status deployment/license-api
# Verify new pods
kubectl get pods -l app=license-api -o wide
# Rollback if needed
kubectl rollout undo deployment/license-api
3. Scaling Operations
# Manual scaling
kubectl scale deployment license-api --replicas=5
# HPA status
kubectl get hpa license-api-hpa
# Cluster autoscaler status
kubectl get nodes
kubectl describe node <node-name>
Troubleshooting
Pod Not Starting
# Check pod status
kubectl get pods -l app=license-api
kubectl describe pod <pod-name>
# Check logs
kubectl logs <pod-name>
kubectl logs <pod-name> --previous # Previous container
# Check events
kubectl get events --sort-by='.lastTimestamp'
# Common issues:
# - ImagePullBackOff: Check GCR permissions
# - CrashLoopBackOff: Check application logs
# - Pending: Insufficient resources (check HPA/cluster autoscaler)
Service Not Reachable
# Check service endpoints
kubectl get endpoints license-api
# Test from within cluster
kubectl run -it --rm debug --image=curlimages/curl --restart=Never -- \
  curl http://license-api:8000/health
# Check ingress
kubectl describe ingress license-api-ingress
kubectl logs -n ingress-nginx deployment/ingress-nginx-controller
High Latency
# Check HPA
kubectl get hpa license-api-hpa
# Check resource usage
kubectl top pods -l app=license-api
kubectl top nodes
# Check Prometheus metrics
kubectl port-forward -n monitoring svc/prometheus 9090:9090
# Open http://localhost:9090 and query metrics
Security Best Practices
1. Pod Security Standards
apiVersion: v1
kind: Pod
metadata:
  name: license-api
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: license-api
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
2. Network Policies
# Deny all ingress by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
3. Resource Quotas
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-quota
  namespace: default
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
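A quick sanity check that a scaled Deployment stays inside this quota (the helper is illustrative; the API server enforces quotas itself at admission time):

```python
def fits_quota(cpu_per_pod, mem_gi_per_pod, replicas,
               quota_cpu=10, quota_mem_gi=20, quota_pods=50):
    """Compare aggregate pod requests against the ResourceQuota hard limits
    (CPU in cores, memory in Gi)."""
    return (cpu_per_pod * replicas <= quota_cpu
            and mem_gi_per_pod * replicas <= quota_mem_gi
            and replicas <= quota_pods)

# license-api requests 500m CPU / 1Gi memory per pod
print(fits_quota(0.5, 1, 10))  # True: 5 CPU and 10Gi fit the quota
print(fits_quota(0.5, 1, 25))  # False: 12.5 CPU and 25Gi both exceed it
```

With this quota, the HPA's maxReplicas of 10 leaves comfortable headroom for the license-api requests alone.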
Related Diagrams
- C1: System Context - External system view
- C2: Container Diagram - High-level containers
- C3: Networking Components - VPC and networking
- C3: Security Components - Security architecture
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-11-23 | SDD Architect | Initial GKE component diagram |
Document Classification: Internal - Architecture Documentation
Review Cycle: Quarterly
Next Review Date: 2026-02-23