Agent Skills Framework Extension (Optional)
Container Orchestration Skill
This skill covers Kubernetes cluster management, Docker containerization, and Helm chart development for production-grade deployments.
When to Use This Skill
Use this skill when implementing container orchestration patterns in your codebase.
How to Use This Skill
- Review the patterns and examples below
- Apply the relevant patterns to your implementation
- Follow the best practices outlined in this skill
Core Capabilities
- Kubernetes Management - Cluster provisioning, pod orchestration
- Docker Optimization - Multi-stage builds, image optimization
- Helm Charts - Chart development, dependency management
- Service Mesh - Istio configuration, traffic management
- Resource Management - Requests, limits, autoscaling
Optimized Dockerfile
# Multi-stage build for production
FROM node:20-alpine AS builder
WORKDIR /app
# Copy dependency files first (cache optimization)
COPY package*.json ./
# Install all dependencies; dev dependencies are needed for the build step
RUN npm ci
# Copy source and build
COPY . .
RUN npm run build
# Drop dev dependencies so only production deps are copied to the final image
RUN npm prune --omit=dev
# Production image
FROM node:20-alpine AS production
# Security: non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001
WORKDIR /app
# Copy only production dependencies and build output
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/package.json ./
USER nodejs
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node -e "require('http').get('http://localhost:8080/health', (r) => r.statusCode === 200 ? process.exit(0) : process.exit(1))"
CMD ["node", "dist/main.js"]
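Image size and cache behavior also depend on what enters the build context. A `.dockerignore` file complements the multi-stage build; this is a minimal sketch, and the exact entries depend on your repository layout:

```text
# .dockerignore — keep local artifacts out of the build context
node_modules
dist
.git
.env*
*.md
Dockerfile
```

Without it, `COPY . .` would pull host `node_modules` and build output into the builder stage and invalidate layer caches on unrelated changes.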
Kubernetes Deployment
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    app: api
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: api
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: api
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001
      containers:
        - name: api
          image: gcr.io/project/api:v1.0.0  # pin a version; avoid :latest in production
          ports:
            - containerPort: 8080
              name: http
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 10
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database-url
          volumeMounts:
            - name: config
              mountPath: /app/config
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: api-config
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: api
                topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: api
---
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: api
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
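The anti-affinity and multi-replica settings above protect against node failure, but voluntary disruptions (node drains, cluster upgrades) can still evict too many pods at once. A PodDisruptionBudget is a common companion; this is a sketch assuming the same `app: api` selector:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api
spec:
  minAvailable: 2        # with 3 replicas, at most one pod may be evicted at a time
  selector:
    matchLabels:
      app: api
```

With `minReplicas: 3` in the HPA, `minAvailable: 2` guarantees quorum-like availability during rolling node maintenance.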
Helm Chart Structure
charts/api/
├── Chart.yaml
├── values.yaml
├── values-staging.yaml
├── values-production.yaml
├── templates/
│   ├── _helpers.tpl
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── hpa.yaml
│   ├── serviceaccount.yaml
│   ├── configmap.yaml
│   └── secret.yaml
└── charts/
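The templates in this chart call named helpers such as `api.fullname`, `api.labels`, `api.selectorLabels`, and `api.serviceAccountName`. These are conventionally defined in `templates/_helpers.tpl`; a minimal sketch (the definitions below are illustrative and assumed to match the includes used later in this chart):

```yaml
{{/* templates/_helpers.tpl */}}
{{- define "api.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end }}

{{- define "api.labels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/instance: {{ .Release.Name }}
helm.sh/chart: {{ printf "%s-%s" .Chart.Name .Chart.Version }}
{{- end }}

{{- define "api.selectorLabels" -}}
app: api
{{- end }}

{{- define "api.serviceAccountName" -}}
{{ include "api.fullname" . }}
{{- end }}
```

Keeping names and labels in one place prevents selector drift between the Deployment, Service, and HPA templates.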
Chart.yaml
apiVersion: v2
name: api
description: API service Helm chart
type: application
version: 1.0.0
appVersion: "1.0.0"
dependencies:
  - name: redis
    version: "17.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
values.yaml
replicaCount: 3
image:
  repository: gcr.io/project/api
  tag: "1.0.0"  # pin a version; avoid latest in production
  pullPolicy: IfNotPresent
service:
  type: ClusterIP
  port: 80
ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: api-tls
      hosts:
        - api.example.com
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 512Mi
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
nodeSelector: {}
tolerations: []
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: api
          topologyKey: kubernetes.io/hostname
env:
  - name: NODE_ENV
    value: production
redis:
  enabled: false
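The `values-staging.yaml` and `values-production.yaml` files listed in the chart structure typically contain only the keys that differ from `values.yaml`; Helm deep-merges them over the defaults. A hypothetical production override might look like:

```yaml
# values-production.yaml — overrides only, merged over values.yaml
replicaCount: 5
image:
  tag: "1.0.0"          # always an explicit, immutable version in production
autoscaling:
  minReplicas: 5
  maxReplicas: 20
resources:
  limits:
    cpu: 2000m
    memory: 1Gi
```

Applied with, for example, `helm upgrade --install api charts/api -f charts/api/values-production.yaml`.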
templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "api.fullname" . }}
  labels:
    {{- include "api.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "api.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "api.selectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "api.serviceAccountName" . }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          env:
            {{- toYaml .Values.env | nindent 12 }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
GKE Cluster Configuration
# gke-cluster.tf (Terraform)
resource "google_container_cluster" "primary" {
  name     = "production-cluster"
  location = "us-central1"

  # Autopilot mode for simplified management
  enable_autopilot = true

  # Or Standard mode with self-managed node pools:
  # remove_default_node_pool = true
  # initial_node_count       = 1

  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  release_channel {
    channel = "STABLE"
  }
}

# Node pool for Standard mode only — Autopilot clusters manage their own
# nodes, so omit this resource when enable_autopilot is true.
resource "google_container_node_pool" "primary" {
  name       = "primary-pool"
  location   = google_container_cluster.primary.location
  cluster    = google_container_cluster.primary.name
  node_count = 3

  autoscaling {
    min_node_count = 3
    max_node_count = 10
  }

  node_config {
    machine_type = "e2-standard-4"
    disk_size_gb = 100
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]
    workload_metadata_config {
      mode = "GKE_METADATA"
    }
  }
}
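Enabling `workload_identity_config` on the cluster is only half of the setup: pods authenticate as a Google service account through an IAM binding between that GSA and a Kubernetes service account. A sketch, assuming the `api` service account from the earlier manifests lives in the `default` namespace (account names here are illustrative):

```hcl
resource "google_service_account" "api" {
  account_id   = "api-workload"
  display_name = "API workload identity service account"
}

# Allow the Kubernetes service account default/api to impersonate the GSA
resource "google_service_account_iam_member" "api_workload_identity" {
  service_account_id = google_service_account.api.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "serviceAccount:${var.project_id}.svc.id.goog[default/api]"
}
```

The Kubernetes service account must also carry the `iam.gke.io/gcp-service-account` annotation pointing at the GSA's email for the binding to take effect.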
Service Mesh (Istio)
# istio-gateway.yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: api-tls
      hosts:
        - api.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api
spec:
  hosts:
    - api.example.com
  gateways:
    - api-gateway
  http:
    - match:
        - uri:
            prefix: /api/v1
      route:
        - destination:
            host: api
            port:
              number: 80
      retries:
        attempts: 3
        perTryTimeout: 5s
        retryOn: 5xx,reset,connect-failure
      timeout: 30s
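Retries pair naturally with outlier detection, which temporarily ejects consistently failing endpoints from the load-balancing pool so retries land on healthy pods. A sketch of a DestinationRule for the same `api` host (thresholds are illustrative starting points, not tuned values):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: api
spec:
  host: api
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5    # eject after 5 consecutive 5xx responses
      interval: 30s              # analysis sweep interval
      baseEjectionTime: 30s      # minimum ejection duration
      maxEjectionPercent: 50     # never eject more than half the pool
```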
Usage Examples
Create Kubernetes Deployment
Apply container-orchestration skill to create a production-ready Kubernetes deployment with HPA and pod anti-affinity
Develop Helm Chart
Apply container-orchestration skill to create a Helm chart for the API service with staging and production values
Optimize Docker Image
Apply container-orchestration skill to create a multi-stage Dockerfile with security hardening and health checks
Success Output
When this skill is successfully applied, you MUST output:
✅ SKILL COMPLETE: container-orchestration
Completed:
- [x] Optimized Dockerfile created with multi-stage build
- [x] Kubernetes deployment configured with proper resource limits
- [x] Health checks implemented (liveness and readiness probes)
- [x] Horizontal Pod Autoscaler (HPA) configured
- [x] Helm chart created with environment-specific values
Outputs:
- Dockerfile: [file path] (optimized, security-hardened)
- Deployment manifest: [file path] (K8s deployment, service, HPA)
- Helm chart: [directory path] (Chart.yaml, values, templates)
- CI/CD integration: [file path] (deployment pipeline)
Completion Checklist
Before marking this skill as complete, verify:
- Dockerfile uses multi-stage build for size optimization
- Container runs as non-root user (security)
- Health check endpoint implemented and configured
- Resource requests and limits defined (CPU, memory)
- Liveness and readiness probes configured with appropriate thresholds
- HPA configured with CPU/memory targets
- Pod anti-affinity rules set for high availability
- ConfigMap and Secret management implemented
- Helm chart tested with staging and production values
- Deployment tested in cluster (successful rollout)
Failure Indicators
This skill has FAILED if:
- ❌ Dockerfile produces a bloated image (>500 MB for a typical app)
- ❌ Container runs as root user (security vulnerability)
- ❌ No health checks configured (can't detect unhealthy pods)
- ❌ No resource limits set (can consume all cluster resources)
- ❌ Probes misconfigured (pods restart unnecessarily or stay unhealthy)
- ❌ HPA not working (pods don't scale under load)
- ❌ Single pod deployment (no high availability)
- ❌ Secrets stored in plain text (ConfigMap instead of Secret)
- ❌ Deployment fails or pods crash on startup
When NOT to Use
Do NOT use this skill when:
- Simple application with no scalability requirements (single VM suffices)
- Development/local environment only (Docker Compose simpler)
- Serverless architecture more appropriate (use Cloud Functions/Lambda instead)
- Batch job or cron task (use Kubernetes CronJob, not Deployment)
- Stateful application requiring persistent storage (use StatefulSet, not Deployment)
Use alternatives:
- For local dev: Docker Compose
- For serverless: Cloud Functions, AWS Lambda, Cloud Run
- For batch jobs: Kubernetes CronJob or Job
- For stateful apps: StatefulSet with PersistentVolumeClaim
- For simple VMs: Traditional deployment tools (Ansible, Terraform)
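For the batch-job case above, the Deployment patterns in this skill do not apply; a minimal CronJob looks like the following (name, schedule, and image are illustrative):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 2 * * *"        # every day at 02:00
  concurrencyPolicy: Forbid    # skip a run if the previous one is still active
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: report
              image: gcr.io/project/report:1.0.0
```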
Anti-Patterns (Avoid)
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Running as root | Security vulnerability | Create non-root user in Dockerfile |
| No resource limits | Can crash entire cluster | Always set requests and limits |
| Missing health checks | Can't detect/recover from failures | Implement /health endpoints and probes |
| Storing secrets in ConfigMap | Secrets visible in plain text | Use Kubernetes Secret with encryption at rest |
| Single replica | No high availability | Deploy ≥3 replicas with anti-affinity |
| No HPA | Manual scaling required | Configure HPA with appropriate metrics |
| Latest tag in production | Unpredictable deployments | Use semantic versioning (v1.2.3) |
| Large base images | Slow builds, large attack surface | Use alpine or distroless images |
Principles
This skill embodies:
- #3 Keep It Simple - Use managed services (GKE Autopilot) when possible
- #4 Separation of Concerns - Separate config (ConfigMap) from secrets (Secret)
- #8 No Assumptions - Verify deployment health with probes and monitoring
- #11 Resilience and Robustness - HPA, anti-affinity, health checks ensure stability
- Security by Default - Non-root user, minimal base images, secret management
Full Standard: CODITECT-STANDARD-AUTOMATION.md
Integration Points
- cicd-pipeline-design - Deploy via CI/CD pipelines
- monitoring-observability - Prometheus metrics, distributed tracing
- infrastructure-as-code - Terraform cluster provisioning
- multi-tenant-security - Network policies, pod security