/chaos-test - Chaos Engineering Experiment Execution

Execute controlled chaos engineering experiments to validate system resilience. Defines steady-state hypotheses, injects faults across network, resource, application, and infrastructure layers, monitors impact, and generates resilience reports.

System Prompt

EXECUTION DIRECTIVE: When the user invokes this command, you MUST:

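IMMEDIATELY execute - ask no clarifying questions first; the only prompt allowed is the explicit confirmation required before fault injection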
Load the skill at skills/chaos-engineering-patterns/SKILL.md
Verify confirmation - this command injects faults and requires explicit user confirmation
Define steady-state hypothesis - establish baseline metrics and success criteria
Select fault type - network, resource, application, or infrastructure
Inject controlled fault - apply fault to target service/pod for specified duration
Monitor impact - track metrics, logs, and system behavior during fault injection
Generate resilience report - document experiment results, blast radius, and recovery behavior
Abort mechanism - provide emergency stop if fault causes unexpected cascade
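
The directive above can be read as a small lifecycle: confirm, check the baseline, inject, monitor with an abort check, roll back, and verify recovery. The sketch below is one possible shape for that loop, not the command's actual implementation. `Experiment`, `run_experiment`, and the callables they take are hypothetical names introduced only for illustration.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Experiment:
    """Hypothetical container for one chaos experiment (illustrative only)."""
    name: str
    inject: Callable[[], None]        # applies the fault
    rollback: Callable[[], None]      # removes the fault
    steady_state: Callable[[], bool]  # True while the hypothesis holds
    duration_s: float = 60.0
    poll_interval_s: float = 1.0
    log: List[str] = field(default_factory=list)


def run_experiment(exp: Experiment, confirmed: bool, abort: threading.Event) -> str:
    """Confirm -> baseline check -> inject -> monitor -> rollback -> verify."""
    if not confirmed:
        return "refused: explicit confirmation required"
    if not exp.steady_state():
        return "skipped: system not in steady state before injection"

    exp.inject()
    exp.log.append("fault injected")
    outcome = "completed"
    try:
        elapsed = 0.0
        while elapsed < exp.duration_s:
            # Event.wait acts as both the sleep and the emergency-stop check.
            if abort.wait(timeout=exp.poll_interval_s):
                outcome = "aborted"
                break
            elapsed += exp.poll_interval_s
    finally:
        # The fault is always removed, even if monitoring raises.
        exp.rollback()
        exp.log.append("fault removed")

    if not exp.steady_state():
        return outcome + ": no recovery to steady state"
    return outcome
```

Putting the rollback in a `finally` block is a deliberate choice in this sketch: the fault is removed whether the experiment finishes, is aborted, or fails partway through monitoring.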

Usage

# Basic experiment
/chaos-test --experiment network-partition --target api-service --duration 60s

# Specific fault type
/chaos-test --fault-type network --target payments-pod --duration 120s

# Preview without injection
/chaos-test --experiment cpu-stress --dry-run

# Emergency abort
/chaos-test --abort

# Full GameDay mode
/chaos-test --gameday --experiment black-friday-simulation

Options

| Option | Description | Default |
|--------|-------------|---------|
| `--experiment` | Experiment name or file path | Interactive selection |
| `--fault-type` | Fault category: network\|resource\|application\|infrastructure | Auto-detect from experiment |
| `--target` | Target service/pod/deployment | All services |
| `--duration` | Fault injection duration (e.g., 60s, 5m) | 60s |
| `--dry-run` | Preview experiment without fault injection | false |
| `--abort` | Emergency stop ongoing experiment | N/A |
| `--gameday` | Full GameDay mode with team coordination | false |
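
To make the option semantics concrete, here is a minimal sketch of how the flags and the duration format (`60s`, `5m`) could be parsed. It assumes a plain `argparse` front end, which the source does not specify. `parse_duration` and `build_parser` are hypothetical helpers, and only the `s` and `m` suffixes shown in the options are handled.

```python
import argparse
import re

# Only the suffixes that appear in the documented examples are supported.
_UNIT_SECONDS = {"s": 1, "m": 60}


def parse_duration(text: str) -> int:
    """Convert strings such as '60s' or '5m' to whole seconds."""
    match = re.fullmatch(r"(\d+)([sm])", text.strip())
    if match is None:
        raise argparse.ArgumentTypeError(f"invalid duration: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented options; defaults follow the Options section."""
    parser = argparse.ArgumentParser(prog="/chaos-test")
    parser.add_argument("--experiment")
    parser.add_argument(
        "--fault-type",
        choices=["network", "resource", "application", "infrastructure"],
    )
    parser.add_argument("--target")
    parser.add_argument("--duration", type=parse_duration, default=60)
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--abort", action="store_true")
    parser.add_argument("--gameday", action="store_true")
    return parser
```

For example, `build_parser().parse_args(["--target", "api-service", "--duration", "5m"])` yields a namespace whose `duration` is 300 seconds.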

Related Commands

/canary-check - Validate canary deployments before chaos testing
/smoke-test - Run smoke tests to establish baseline before chaos
/postmortem - Generate postmortem after chaos experiment reveals issues

Success Output

✅ Chaos Experiment Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━
Experiment: network-partition-api
Target: payments-service
Duration: 120s
Fault Type: Network (latency +500ms)

📊 Steady-State Metrics
- Pre-fault: 99.95% success rate, p99 200ms
- During fault: 98.2% success rate, p99 850ms
- Post-fault: 99.94% success rate, p99 210ms

🔍 Resilience Findings
✅ Circuit breaker activated after 15s
✅ Failover to backup service successful
⚠️ Recovery took 45s (target: 30s)
❌ 3 customer transactions failed (retry failed)

📝 Report: chaos-reports/network-partition-api-2026-02-01.md
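
The findings in the sample output come down to comparing metric snapshots against the baseline and a recovery target. The sketch below shows one way to express that check. The `Snapshot` type, both function names, and the tolerance values (0.05 percentage points, 10% p99 headroom) are assumptions chosen for illustration, not thresholds defined by this command.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    success_rate: float  # percent, e.g. 99.95
    p99_ms: float        # 99th percentile latency in milliseconds


def within_steady_state(pre: Snapshot, current: Snapshot,
                        max_rate_drop: float = 0.05,
                        max_p99_increase: float = 0.10) -> bool:
    """True if `current` is within the assumed tolerances of the baseline."""
    rate_ok = pre.success_rate - current.success_rate <= max_rate_drop
    p99_ok = current.p99_ms <= pre.p99_ms * (1 + max_p99_increase)
    return rate_ok and p99_ok


def recovery_within_target(recovery_s: float, target_s: float) -> bool:
    """True if the measured recovery time meets the target."""
    return recovery_s <= target_s


# Values taken from the sample output above.
pre = Snapshot(99.95, 200)
during = Snapshot(98.2, 850)
post = Snapshot(99.94, 210)
```

With these sample values, `within_steady_state(pre, post)` holds while `within_steady_state(pre, during)` does not, and `recovery_within_target(45, 30)` is false, matching the warning about the 45s recovery.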

Completion Checklist

Steady-state hypothesis defined
Baseline metrics captured
Fault injected to target service
Impact monitored in real-time
Blast radius contained
System recovered to steady-state
Resilience report generated
Action items identified for improvement

Failure Indicators

⛔ Uncontrolled cascade - fault spreads beyond expected blast radius
⛔ No recovery - system does not return to steady-state after fault removal
⛔ Monitoring gap - unable to measure impact metrics
⛔ Abort failed - emergency stop mechanism does not work
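
The uncontrolled-cascade indicator can be checked mechanically: compare the services currently degraded against the set the experiment was expected to affect, and trigger the abort when anything else shows up. This is a sketch under the assumption that degraded services are available as a collection of names. The function names are hypothetical.

```python
import threading
from typing import Iterable, Set


def blast_radius_breach(expected: Iterable[str], degraded: Iterable[str]) -> Set[str]:
    """Return services that are degraded but were not expected to be affected."""
    return set(degraded) - set(expected)


def check_and_abort(expected: Iterable[str], degraded: Iterable[str],
                    abort: threading.Event) -> Set[str]:
    """Set the abort event when the fault has spread beyond the expected set."""
    breach = blast_radius_breach(expected, degraded)
    if breach:
        abort.set()
    return breach
```

An empty result means the blast radius is contained; a non-empty result names the services that should appear in the resilience report as unexpected impact.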

When NOT to Use

❌ Production systems without GameDay approval and stakeholder notification
❌ Systems with known critical issues or ongoing incidents
❌ Services without circuit breakers or resilience patterns
❌ Environments lacking monitoring and observability

Anti-Patterns

❌ Running chaos experiments without steady-state hypothesis
❌ Injecting faults without abort mechanism
❌ Testing in production without runbook or incident response plan
❌ Ignoring blast radius - causing unintended system-wide outages

Principles

#3 Complete Execution - Execute full experiment lifecycle (hypothesis → inject → monitor → report)
#9 Based on Facts - Use metrics and observability data to measure resilience

Full Standard: CODITECT-STANDARD-AUTOMATION.md
