Agent Skills Framework Extension (Optional)

Container Orchestration Skill

When to Use This Skill

Use this skill when containerizing services and deploying them to Kubernetes: writing production Dockerfiles, authoring manifests or Helm charts, or configuring autoscaling and service mesh traffic.

How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

Kubernetes cluster management, Docker containerization, and Helm chart development for production-grade deployments.

Core Capabilities

  1. Kubernetes Management - Cluster provisioning, pod orchestration
  2. Docker Optimization - Multi-stage builds, image optimization
  3. Helm Charts - Chart development, dependency management
  4. Service Mesh - Istio configuration, traffic management
  5. Resource Management - Requests, limits, autoscaling

Optimized Dockerfile

# Multi-stage build for production
FROM node:20-alpine AS builder

WORKDIR /app

# Copy dependency files first (layer-cache optimization)
COPY package*.json ./
# Install all dependencies: the build step typically needs devDependencies
RUN npm ci

# Copy source and build
COPY . .
RUN npm run build

# Remove dev dependencies so only production modules are copied forward
RUN npm prune --omit=dev

# Production image
FROM node:20-alpine AS production

# Security: non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

WORKDIR /app

# Copy only production dependencies and build output
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/package.json ./

USER nodejs

EXPOSE 8080

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node -e "require('http').get('http://localhost:8080/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"

CMD ["node", "dist/main.js"]
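Assuming the Dockerfile above sits at the repository root, a typical build-and-inspect sequence looks like the following (the image name `api` and version tag are illustrative):

```shell
# Build the production stage and tag it with a pinned version
docker build --target production -t api:v1.0.0 .

# Check the resulting image size (alpine-based images should stay well under 500MB)
docker image ls api:v1.0.0

# Override CMD to confirm the container runs as the non-root user (uid 1001, not 0)
docker run --rm api:v1.0.0 id -u
```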

Kubernetes Deployment

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    app: api
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: api
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: api
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001
      containers:
        - name: api
          image: gcr.io/project/api:v1.0.0  # pin a version; avoid :latest in production
          ports:
            - containerPort: 8080
              name: http
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 10
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database-url
          volumeMounts:
            - name: config
              mountPath: /app/config
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: api-config
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: api
                topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: api
---
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: api
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
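The manifests above can be applied and verified with standard kubectl commands (the namespace `prod` is an assumption):

```shell
# Apply the Deployment, Service, and HPA
kubectl apply -f deployment.yaml -n prod

# Wait for the rolling update to complete (exits non-zero on timeout)
kubectl rollout status deployment/api -n prod --timeout=120s

# Confirm the HPA sees metrics and the pods spread across nodes
kubectl get hpa api -n prod
kubectl get pods -l app=api -n prod -o wide
```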

Helm Chart Structure

charts/api/
├── Chart.yaml
├── values.yaml
├── values-staging.yaml
├── values-production.yaml
├── templates/
│   ├── _helpers.tpl
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── hpa.yaml
│   ├── serviceaccount.yaml
│   ├── configmap.yaml
│   └── secret.yaml
└── charts/

Chart.yaml

apiVersion: v2
name: api
description: API service Helm chart
type: application
version: 1.0.0
appVersion: "1.0.0"
dependencies:
  - name: redis
    version: "17.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
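With a dependency declared in Chart.yaml, Helm must fetch it before the chart can be rendered or packaged; a minimal sketch:

```shell
# Download the redis chart into charts/ and record resolved versions in Chart.lock
helm dependency update charts/api

# List dependencies and their resolution status
helm dependency list charts/api
```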

values.yaml

replicaCount: 3

image:
  repository: gcr.io/project/api
  tag: latest  # override with a pinned version per environment
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: api-tls
      hosts:
        - api.example.com

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

nodeSelector: {}

tolerations: []

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: api
          topologyKey: kubernetes.io/hostname

env:
  - name: NODE_ENV
    value: production

redis:
  enabled: false

templates/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "api.fullname" . }}
  labels:
    {{- include "api.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "api.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "api.selectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "api.serviceAccountName" . }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          env:
            {{- toYaml .Values.env | nindent 12 }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
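A typical render-and-deploy loop for the chart (the release name `api` and the `prod` namespace are illustrative):

```shell
# Catch template and schema errors before deploying
helm lint charts/api -f charts/api/values-staging.yaml

# Render manifests locally to inspect the generated YAML
helm template api charts/api -f charts/api/values-staging.yaml

# Install or upgrade with environment-specific values; --atomic rolls back on failure
helm upgrade --install api charts/api \
  -f charts/api/values-production.yaml \
  -n prod --create-namespace --atomic --timeout 5m
```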

GKE Cluster Configuration

# gke-cluster.tf (Terraform)
resource "google_container_cluster" "primary" {
  name     = "production-cluster"
  location = "us-central1"

  # Autopilot mode for simplified management
  enable_autopilot = true

  # Or Standard mode with self-managed node pools:
  # remove_default_node_pool = true
  # initial_node_count       = 1

  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  release_channel {
    channel = "STABLE"
  }
}

# Node pool for Standard mode only -- Autopilot clusters manage nodes
# themselves and do not accept user-defined node pools.
resource "google_container_node_pool" "primary" {
  name       = "primary-pool"
  location   = google_container_cluster.primary.location
  cluster    = google_container_cluster.primary.name
  node_count = 3

  autoscaling {
    min_node_count = 3
    max_node_count = 10
  }

  node_config {
    machine_type = "e2-standard-4"
    disk_size_gb = 100

    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    workload_metadata_config {
      mode = "GKE_METADATA"
    }
  }
}
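Provisioning the cluster and wiring kubectl to it follows the standard Terraform workflow (the project ID below is an assumption):

```shell
# Initialize providers, preview, then apply the cluster configuration
terraform init
terraform plan -var "project_id=my-project"
terraform apply -var "project_id=my-project"

# Fetch credentials so kubectl and helm can reach the new cluster
gcloud container clusters get-credentials production-cluster \
  --region us-central1 --project my-project
```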

Service Mesh (Istio)

# istio-gateway.yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: api-tls
      hosts:
        - api.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api
spec:
  hosts:
    - api.example.com
  gateways:
    - api-gateway
  http:
    - match:
        - uri:
            prefix: /api/v1
      route:
        - destination:
            host: api
            port:
              number: 80
      retries:
        attempts: 3
        perTryTimeout: 5s
        retryOn: 5xx,reset,connect-failure
      timeout: 30s
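The Gateway and VirtualService can be applied and sanity-checked as follows (assumes istioctl is installed and the workloads live in a `prod` namespace):

```shell
# Apply the gateway and routing rules
kubectl apply -f istio-gateway.yaml -n prod

# Static analysis for common Istio misconfigurations
istioctl analyze -n prod

# Confirm the resources were created
kubectl get gateway,virtualservice -n prod
```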

Usage Examples

Create Kubernetes Deployment

Apply container-orchestration skill to create a production-ready Kubernetes deployment with HPA and pod anti-affinity

Develop Helm Chart

Apply container-orchestration skill to create a Helm chart for the API service with staging and production values

Optimize Docker Image

Apply container-orchestration skill to create a multi-stage Dockerfile with security hardening and health checks

Success Output

When this skill is successfully applied, you MUST output:

✅ SKILL COMPLETE: container-orchestration

Completed:
- [x] Optimized Dockerfile created with multi-stage build
- [x] Kubernetes deployment configured with proper resource limits
- [x] Health checks implemented (liveness and readiness probes)
- [x] Horizontal Pod Autoscaler (HPA) configured
- [x] Helm chart created with environment-specific values

Outputs:
- Dockerfile: [file path] (optimized, security-hardened)
- Deployment manifest: [file path] (K8s deployment, service, HPA)
- Helm chart: [directory path] (Chart.yaml, values, templates)
- CI/CD integration: [file path] (deployment pipeline)

Completion Checklist

Before marking this skill as complete, verify:

  • Dockerfile uses multi-stage build for size optimization
  • Container runs as non-root user (security)
  • Health check endpoint implemented and configured
  • Resource requests and limits defined (CPU, memory)
  • Liveness and readiness probes configured with appropriate thresholds
  • HPA configured with CPU/memory targets
  • Pod anti-affinity rules set for high availability
  • ConfigMap and Secret management implemented
  • Helm chart tested with staging and production values
  • Deployment tested in cluster (successful rollout)

Failure Indicators

This skill has FAILED if:

  • ❌ Dockerfile produces a bloated image (>500MB for a typical app)
  • ❌ Container runs as root user (security vulnerability)
  • ❌ No health checks configured (can't detect unhealthy pods)
  • ❌ No resource limits set (can consume all cluster resources)
  • ❌ Probes misconfigured (pods restart unnecessarily or stay unhealthy)
  • ❌ HPA not working (pods don't scale under load)
  • ❌ Single pod deployment (no high availability)
  • ❌ Secrets stored in plain text (ConfigMap instead of Secret)
  • ❌ Deployment fails or pods crash on startup

When NOT to Use

Do NOT use this skill when:

  • Simple application with no scalability requirements (single VM suffices)
  • Development/local environment only (Docker Compose simpler)
  • Serverless architecture more appropriate (use Cloud Functions/Lambda instead)
  • Batch job or cron task (use Kubernetes CronJob, not Deployment)
  • Stateful application requiring persistent storage (use StatefulSet, not Deployment)

Use alternatives:

  • For local dev: Docker Compose
  • For serverless: Cloud Functions, AWS Lambda, Cloud Run
  • For batch jobs: Kubernetes CronJob or Job
  • For stateful apps: StatefulSet with PersistentVolumeClaim
  • For simple VMs: Traditional deployment tools (Ansible, Terraform)

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Running as root | Security vulnerability | Create non-root user in Dockerfile |
| No resource limits | Can crash entire cluster | Always set requests and limits |
| Missing health checks | Can't detect/recover from failures | Implement /health endpoints and probes |
| Storing secrets in ConfigMap | Secrets visible in plain text | Use Kubernetes Secret with encryption at rest |
| Single replica | No high availability | Deploy ≥3 replicas with anti-affinity |
| Latest tag in production | Unpredictable deployments | Use semantic versioning (v1.2.3) |
| No HPA | Manual scaling required | Configure HPA with appropriate metrics |
| Large base images | Slow builds, large attack surface | Use alpine or distroless images |

Principles

This skill embodies:

  • #3 Keep It Simple - Use managed services (GKE Autopilot) when possible
  • #4 Separation of Concerns - Separate config (ConfigMap) from secrets (Secret)
  • #8 No Assumptions - Verify deployment health with probes and monitoring
  • #11 Resilience and Robustness - HPA, anti-affinity, health checks ensure stability
  • Security by Default - Non-root user, minimal base images, secret management

Full Standard: CODITECT-STANDARD-AUTOMATION.md

Integration Points

  • cicd-pipeline-design - Deploy via CI/CD pipelines
  • monitoring-observability - Prometheus metrics, distributed tracing
  • infrastructure-as-code - Terraform cluster provisioning
  • multi-tenant-security - Network policies, pod security