Agent Skills Framework Extension (Optional)

Container Orchestration Skill

When to Use This Skill

Use this skill when containerizing services and deploying them to Kubernetes: writing production Dockerfiles, authoring manifests or Helm charts, or configuring autoscaling and service mesh traffic.

How to Use This Skill

  1. Review the patterns and examples below
  2. Apply the relevant patterns to your implementation
  3. Follow the best practices outlined in this skill

Kubernetes cluster management, Docker containerization, and Helm chart development for production-grade deployments.

Core Capabilities

  1. Kubernetes Management - Cluster provisioning, pod orchestration
  2. Docker Optimization - Multi-stage builds, image optimization
  3. Helm Charts - Chart development, dependency management
  4. Service Mesh - Istio configuration, traffic management
  5. Resource Management - Requests, limits, autoscaling

Optimized Dockerfile

# Multi-stage build for production
FROM node:20-alpine AS builder

WORKDIR /app

# Copy dependency files first (layer-cache optimization)
COPY package*.json ./
# Install all dependencies: the build step typically needs devDependencies
RUN npm ci

# Copy source and build
COPY . .
RUN npm run build

# Remove dev dependencies so only production modules are copied forward
RUN npm prune --omit=dev

# Production image
FROM node:20-alpine AS production

# Security: non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

WORKDIR /app

# Copy only production dependencies and build output
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/package.json ./

USER nodejs

EXPOSE 8080

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node -e "require('http').get('http://localhost:8080/health', (r) => process.exit(r.statusCode === 200 ? 0 : 1))"

CMD ["node", "dist/main.js"]
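Assuming the Dockerfile above sits at the repository root, a typical build-and-inspect sequence looks like the following (the image name `api` and version tag are illustrative):

```shell
# Build the production stage and tag it with a pinned version
docker build --target production -t api:v1.0.0 .

# Check the resulting image size (alpine-based images should stay well under 500MB)
docker image ls api:v1.0.0

# Override CMD to confirm the container runs as the non-root user (uid 1001, not 0)
docker run --rm api:v1.0.0 id -u
```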

Kubernetes Deployment

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  labels:
    app: api
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: api
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: api
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        fsGroup: 1001
      containers:
        - name: api
          image: gcr.io/project/api:v1.0.0  # pin a version; avoid :latest in production
          ports:
            - containerPort: 8080
              name: http
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "512Mi"
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
            initialDelaySeconds: 10
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database-url
          volumeMounts:
            - name: config
              mountPath: /app/config
              readOnly: true
      volumes:
        - name: config
          configMap:
            name: api-config
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: api
                topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: api
---
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: api
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
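The manifests above can be applied and verified with standard kubectl commands (the namespace `prod` is an assumption):

```shell
# Apply the Deployment, Service, and HPA
kubectl apply -f deployment.yaml -n prod

# Wait for the rolling update to complete (exits non-zero on timeout)
kubectl rollout status deployment/api -n prod --timeout=120s

# Confirm the HPA sees metrics and the pods spread across nodes
kubectl get hpa api -n prod
kubectl get pods -l app=api -n prod -o wide
```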

Helm Chart Structure

charts/api/
├── Chart.yaml
├── values.yaml
├── values-staging.yaml
├── values-production.yaml
├── templates/
│   ├── _helpers.tpl
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── hpa.yaml
│   ├── serviceaccount.yaml
│   ├── configmap.yaml
│   └── secret.yaml
└── charts/

Chart.yaml

apiVersion: v2
name: api
description: API service Helm chart
type: application
version: 1.0.0
appVersion: "1.0.0"
dependencies:
  - name: redis
    version: "17.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
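With a dependency declared in Chart.yaml, Helm must fetch it before the chart can be rendered or packaged; a minimal sketch:

```shell
# Download the redis chart into charts/ and record resolved versions in Chart.lock
helm dependency update charts/api

# List dependencies and their resolution status
helm dependency list charts/api
```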

values.yaml

replicaCount: 3

image:
  repository: gcr.io/project/api
  tag: latest  # override with a pinned version per environment
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: api-tls
      hosts:
        - api.example.com

resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

nodeSelector: {}

tolerations: []

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: api
          topologyKey: kubernetes.io/hostname

env:
  - name: NODE_ENV
    value: production

redis:
  enabled: false

templates/deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "api.fullname" . }}
  labels:
    {{- include "api.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "api.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "api.selectorLabels" . | nindent 8 }}
    spec:
      serviceAccountName: {{ include "api.serviceAccountName" . }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
          livenessProbe:
            httpGet:
              path: /health/live
              port: http
          readinessProbe:
            httpGet:
              path: /health/ready
              port: http
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          env:
            {{- toYaml .Values.env | nindent 12 }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
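A typical render-and-deploy loop for the chart (the release name `api` and the `prod` namespace are illustrative):

```shell
# Catch template and schema errors before deploying
helm lint charts/api -f charts/api/values-staging.yaml

# Render manifests locally to inspect the generated YAML
helm template api charts/api -f charts/api/values-staging.yaml

# Install or upgrade with environment-specific values; --atomic rolls back on failure
helm upgrade --install api charts/api \
  -f charts/api/values-production.yaml \
  -n prod --create-namespace --atomic --timeout 5m
```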

GKE Cluster Configuration

# gke-cluster.tf (Terraform)
resource "google_container_cluster" "primary" {
  name     = "production-cluster"
  location = "us-central1"

  # Autopilot mode for simplified management
  enable_autopilot = true

  # Or Standard mode with self-managed node pools:
  # remove_default_node_pool = true
  # initial_node_count       = 1

  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  release_channel {
    channel = "STABLE"
  }
}

# Node pool for Standard mode only -- Autopilot clusters manage nodes
# themselves and do not accept user-defined node pools.
resource "google_container_node_pool" "primary" {
  name       = "primary-pool"
  location   = google_container_cluster.primary.location
  cluster    = google_container_cluster.primary.name
  node_count = 3

  autoscaling {
    min_node_count = 3
    max_node_count = 10
  }

  node_config {
    machine_type = "e2-standard-4"
    disk_size_gb = 100

    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    workload_metadata_config {
      mode = "GKE_METADATA"
    }
  }
}
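Provisioning the cluster and wiring kubectl to it follows the standard Terraform workflow (the project ID below is an assumption):

```shell
# Initialize providers, preview, then apply the cluster configuration
terraform init
terraform plan -var "project_id=my-project"
terraform apply -var "project_id=my-project"

# Fetch credentials so kubectl and helm can reach the new cluster
gcloud container clusters get-credentials production-cluster \
  --region us-central1 --project my-project
```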

Service Mesh (Istio)

# istio-gateway.yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: api-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 443
        name: https
        protocol: HTTPS
      tls:
        mode: SIMPLE
        credentialName: api-tls
      hosts:
        - api.example.com
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: api
spec:
  hosts:
    - api.example.com
  gateways:
    - api-gateway
  http:
    - match:
        - uri:
            prefix: /api/v1
      route:
        - destination:
            host: api
            port:
              number: 80
      retries:
        attempts: 3
        perTryTimeout: 5s
        retryOn: 5xx,reset,connect-failure
      timeout: 30s
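The Gateway and VirtualService can be applied and sanity-checked as follows (assumes istioctl is installed and the workloads live in a `prod` namespace):

```shell
# Apply the gateway and routing rules
kubectl apply -f istio-gateway.yaml -n prod

# Static analysis for common Istio misconfigurations
istioctl analyze -n prod

# Confirm the resources were created
kubectl get gateway,virtualservice -n prod
```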

Usage Examples

Create Kubernetes Deployment

Apply container-orchestration skill to create a production-ready Kubernetes deployment with HPA and pod anti-affinity

Develop Helm Chart

Apply container-orchestration skill to create a Helm chart for the API service with staging and production values

Optimize Docker Image

Apply container-orchestration skill to create a multi-stage Dockerfile with security hardening and health checks

Success Output

When this skill is successfully applied, you MUST output:

✅ SKILL COMPLETE: container-orchestration

Completed:
- [x] Optimized Dockerfile created with multi-stage build
- [x] Kubernetes deployment configured with proper resource limits
- [x] Health checks implemented (liveness and readiness probes)
- [x] Horizontal Pod Autoscaler (HPA) configured
- [x] Helm chart created with environment-specific values

Outputs:
- Dockerfile: [file path] (optimized, security-hardened)
- Deployment manifest: [file path] (K8s deployment, service, HPA)
- Helm chart: [directory path] (Chart.yaml, values, templates)
- CI/CD integration: [file path] (deployment pipeline)

Completion Checklist

Before marking this skill as complete, verify:

  • Dockerfile uses multi-stage build for size optimization
  • Container runs as non-root user (security)
  • Health check endpoint implemented and configured
  • Resource requests and limits defined (CPU, memory)
  • Liveness and readiness probes configured with appropriate thresholds
  • HPA configured with CPU/memory targets
  • Pod anti-affinity rules set for high availability
  • ConfigMap and Secret management implemented
  • Helm chart tested with staging and production values
  • Deployment tested in cluster (successful rollout)

Failure Indicators

This skill has FAILED if:

  • ❌ Dockerfile produces a bloated image (>500MB for a typical app)
  • ❌ Container runs as root user (security vulnerability)
  • ❌ No health checks configured (can't detect unhealthy pods)
  • ❌ No resource limits set (can consume all cluster resources)
  • ❌ Probes misconfigured (pods restart unnecessarily or stay unhealthy)
  • ❌ HPA not working (pods don't scale under load)
  • ❌ Single pod deployment (no high availability)
  • ❌ Secrets stored in plain text (ConfigMap instead of Secret)
  • ❌ Deployment fails or pods crash on startup

When NOT to Use

Do NOT use this skill when:

  • Simple application with no scalability requirements (single VM suffices)
  • Development/local environment only (Docker Compose simpler)
  • Serverless architecture more appropriate (use Cloud Functions/Lambda instead)
  • Batch job or cron task (use Kubernetes CronJob, not Deployment)
  • Stateful application requiring persistent storage (use StatefulSet, not Deployment)

Use alternatives:

  • For local dev: Docker Compose
  • For serverless: Cloud Functions, AWS Lambda, Cloud Run
  • For batch jobs: Kubernetes CronJob or Job
  • For stateful apps: StatefulSet with PersistentVolumeClaim
  • For simple VMs: Traditional deployment tools (Ansible, Terraform)

Anti-Patterns (Avoid)

| Anti-Pattern | Problem | Solution |
| --- | --- | --- |
| Running as root | Security vulnerability | Create non-root user in Dockerfile |
| No resource limits | Can crash entire cluster | Always set requests and limits |
| Missing health checks | Can't detect/recover from failures | Implement /health endpoints and probes |
| Storing secrets in ConfigMap | Secrets visible in plain text | Use Kubernetes Secret with encryption at rest |
| Single replica | No high availability | Deploy ≥3 replicas with anti-affinity |
| Latest tag in production | Unpredictable deployments | Use semantic versioning (v1.2.3) |
| No HPA | Manual scaling required | Configure HPA with appropriate metrics |
| Large base images | Slow builds, large attack surface | Use alpine or distroless images |

Principles

This skill embodies:

  • #3 Keep It Simple - Use managed services (GKE Autopilot) when possible
  • #4 Separation of Concerns - Separate config (ConfigMap) from secrets (Secret)
  • #8 No Assumptions - Verify deployment health with probes and monitoring
  • #11 Resilience and Robustness - HPA, anti-affinity, health checks ensure stability
  • Security by Default - Non-root user, minimal base images, secret management

Full Standard: CODITECT-STANDARD-AUTOMATION.md

Integration Points

  • cicd-pipeline-design - Deploy via CI/CD pipelines
  • monitoring-observability - Prometheus metrics, distributed tracing
  • infrastructure-as-code - Terraform cluster provisioning
  • multi-tenant-security - Network policies, pod security