/chaos-test - Chaos Engineering Experiment Execution
Execute controlled chaos engineering experiments to validate system resilience. Defines steady-state hypotheses, injects faults across network, resource, application, and infrastructure layers, monitors impact, and generates resilience reports.
System Prompt
EXECUTION DIRECTIVE: When the user invokes this command, you MUST:
- IMMEDIATELY execute - no questions first (the fault-injection confirmation below is the only exception)
- Load the skill at skills/chaos-engineering-patterns/SKILL.md
- Verify confirmation - this command injects faults and requires explicit user confirmation
- Define steady-state hypothesis - establish baseline metrics and success criteria
- Select fault type - network, resource, application, or infrastructure
- Inject controlled fault - apply fault to target service/pod for specified duration
- Monitor impact - track metrics, logs, and system behavior during fault injection
- Generate resilience report - document experiment results, blast radius, and recovery behavior
- Abort mechanism - provide emergency stop if fault causes unexpected cascade
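The lifecycle above can be sketched in Python. This is a minimal, hypothetical illustration, not the skill's actual implementation: `Experiment`, `run`, and the callables it takes (`inject`, `remove`, `observe`, `should_abort`) are assumed names, and metrics are reduced to a success-rate and p99 snapshot. It shows the ordering the directive implies: refuse without confirmation, check the steady-state hypothesis before injecting, poll while the fault is active, and always remove the fault in a `finally` block so that an abort or an exception cannot leave it in place. The 99.9% and 300 ms thresholds in the usage part are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical metrics snapshot, e.g. {"success_rate": 99.95, "p99_ms": 200.0}.
Metrics = Dict[str, float]


@dataclass
class Experiment:
    name: str
    target: str
    hypothesis: Callable[[Metrics], bool]    # steady-state hypothesis
    inject: Callable[[], None]               # apply the fault
    remove: Callable[[], None]               # remove the fault
    observe: Callable[[], Metrics]           # sample current metrics
    should_abort: Callable[[Metrics], bool]  # emergency-stop condition


def run(exp: Experiment, confirmed: bool, samples: int = 3) -> Dict[str, object]:
    if not confirmed:
        # Fault injection requires explicit user confirmation.
        return {"status": "not-confirmed"}
    baseline = exp.observe()
    if not exp.hypothesis(baseline):
        # Never inject into a system that is not in steady state to begin with.
        return {"status": "no-steady-state", "baseline": baseline}
    exp.inject()
    during: List[Metrics] = []
    aborted = False
    try:
        for _ in range(samples):
            metrics = exp.observe()
            during.append(metrics)
            if exp.should_abort(metrics):
                aborted = True
                break
    finally:
        # Remove the fault whether the run completed, aborted, or raised.
        exp.remove()
    after = exp.observe()
    return {
        "status": "aborted" if aborted else "complete",
        "baseline": baseline,
        "during": during,
        "after": after,
        "recovered": exp.hypothesis(after),
    }


# Usage against a simulated service whose metrics degrade while the fault is active.
state = {"faulty": False}


def observe() -> Metrics:
    if state["faulty"]:
        return {"success_rate": 98.2, "p99_ms": 850.0}
    return {"success_rate": 99.95, "p99_ms": 200.0}


exp = Experiment(
    name="latency-demo",
    target="api-service",
    hypothesis=lambda m: m["success_rate"] >= 99.9 and m["p99_ms"] <= 300.0,
    inject=lambda: state.update(faulty=True),
    remove=lambda: state.update(faulty=False),
    observe=observe,
    should_abort=lambda m: m["success_rate"] < 95.0,
)
report = run(exp, confirmed=True)
```

Here the simulated degradation (98.2%) stays above the assumed 95% abort line, so the run completes all three samples and the service returns to steady state once the fault is removed.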
Usage
# Basic experiment
/chaos-test --experiment network-partition --target api-service --duration 60s
# Specific fault type
/chaos-test --fault-type network --target payments-pod --duration 120s
# Preview without injection
/chaos-test --experiment cpu-stress --dry-run
# Emergency abort
/chaos-test --abort
# Full GameDay mode
/chaos-test --gameday --experiment black-friday-simulation
Options
| Option | Description | Default |
|---|---|---|
| --experiment | Experiment name or file path | Interactive selection |
| --fault-type | Fault category: network, resource, application, or infrastructure | Auto-detect from experiment |
| --target | Target service/pod/deployment | All services |
| --duration | Fault injection duration (e.g., 60s, 5m) | 60s |
| --dry-run | Preview experiment without fault injection | false |
| --abort | Emergency stop ongoing experiment | N/A |
| --gameday | Full GameDay mode with team coordination | false |
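As a sketch of how these options might be parsed and validated, the following uses Python's standard `argparse` module. The command's real parser is not shown in this document, so `build_parser` and `parse_duration` are hypothetical names. The duration parser accepts only the `s` and `m` suffixes shown in the table, and an omitted `--target` is represented as `None`, meaning all services.

```python
import argparse
import re

FAULT_TYPES = ("network", "resource", "application", "infrastructure")
_UNIT_SECONDS = {"s": 1, "m": 60}


def parse_duration(text: str) -> int:
    """Convert a duration such as '60s' or '5m' into whole seconds."""
    match = re.fullmatch(r"(\d+)([sm])", text.strip())
    if match is None:
        raise argparse.ArgumentTypeError(
            f"invalid duration: {text!r} (expected e.g. 60s or 5m)")
    value, unit = match.groups()
    seconds = int(value) * _UNIT_SECONDS[unit]
    if seconds <= 0:
        raise argparse.ArgumentTypeError("duration must be positive")
    return seconds


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="chaos-test")
    p.add_argument("--experiment", help="experiment name or file path")
    p.add_argument("--fault-type", choices=FAULT_TYPES,
                   help="defaults to the experiment's own fault type")
    p.add_argument("--target", help="service/pod/deployment; omitted means all services")
    # argparse runs string defaults through `type`, so "60s" becomes 60.
    p.add_argument("--duration", type=parse_duration, default="60s")
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--abort", action="store_true")
    p.add_argument("--gameday", action="store_true")
    return p
```

For example, `build_parser().parse_args(["--fault-type", "network", "--target", "payments-pod", "--duration", "120s"])` yields `fault_type="network"` and `duration=120`, while an unknown fault type or a value like `5h` is rejected before any fault is injected.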
Related Commands
- /canary-check - Validate canary deployments before chaos testing
- /smoke-test - Run smoke tests to establish a baseline before chaos testing
- /postmortem - Generate a postmortem after a chaos experiment reveals issues
Success Output
✅ Chaos Experiment Complete
━━━━━━━━━━━━━━━━━━━━━━━━━━
Experiment: network-partition-api
Target: payments-service
Duration: 120s
Fault Type: Network (latency +500ms)
📊 Steady-State Metrics
- Pre-fault: 99.95% success rate, p99 200ms
- During fault: 98.2% success rate, p99 850ms
- Post-fault: 99.94% success rate, p99 210ms
🔍 Resilience Findings
✅ Circuit breaker activated after 15s
✅ Failover to backup service successful
⚠️ Recovery took 45s (target: 30s)
❌ 3 customer transactions failed (retry failed)
📝 Report: chaos-reports/network-partition-api-2026-02-01.md
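The steady-state metrics in the sample output above can be checked mechanically. The sketch below encodes a hypothetical hypothesis (success rate of at least 99.9% and p99 latency of at most 300 ms, both assumed values rather than a real SLO) and evaluates the sample's pre-fault, during-fault, and post-fault snapshots, plus the 45 s recovery against its 30 s target. `Snapshot`, `SteadyState`, and `summarize` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Snapshot:
    success_rate: float  # percentage of successful requests
    p99_ms: float        # 99th-percentile latency in milliseconds


@dataclass(frozen=True)
class SteadyState:
    # Illustrative thresholds; real values should come from the service's SLOs.
    min_success_rate: float
    max_p99_ms: float

    def holds(self, s: Snapshot) -> bool:
        return s.success_rate >= self.min_success_rate and s.p99_ms <= self.max_p99_ms


def summarize(hyp: SteadyState, pre: Snapshot, during: Snapshot, post: Snapshot,
              recovery_s: float, recovery_target_s: float) -> Dict[str, Union[bool, float]]:
    return {
        "baseline_ok": hyp.holds(pre),
        "degraded_during_fault": not hyp.holds(during),
        "recovered": hyp.holds(post),
        "success_rate_drop_pts": round(pre.success_rate - during.success_rate, 2),
        "p99_increase_ms": during.p99_ms - pre.p99_ms,
        "recovery_within_target": recovery_s <= recovery_target_s,
    }


# Values taken from the sample "Success Output" above.
result = summarize(
    SteadyState(min_success_rate=99.9, max_p99_ms=300.0),
    pre=Snapshot(99.95, 200.0),
    during=Snapshot(98.2, 850.0),
    post=Snapshot(99.94, 210.0),
    recovery_s=45.0,
    recovery_target_s=30.0,
)
```

Under these assumed thresholds the summary matches the sample report: the baseline holds, the fault pushes the service out of steady state (a 1.75-point success-rate drop and +650 ms p99), the service recovers, but recovery misses its 30 s target.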
Completion Checklist
- Steady-state hypothesis defined
- Baseline metrics captured
- Fault injected to target service
- Impact monitored in real-time
- Blast radius contained
- System recovered to steady-state
- Resilience report generated
- Action items identified for improvement
Failure Indicators
- ⛔ Uncontrolled cascade - fault spreads beyond expected blast radius
- ⛔ No recovery - system does not return to steady-state after fault removal
- ⛔ Monitoring gap - unable to measure impact metrics
- ⛔ Abort failed - emergency stop mechanism does not work
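The failure indicators above can be turned into an automated watchdog that feeds the abort mechanism. This sketch is hypothetical: `Observation`, `tripped_indicators`, the 30 s metric-staleness limit, and the 300 s recovery timeout are all assumptions, and a real implementation would populate these fields from its monitoring stack. "Abort failed" is approximated as the fault still being active after an abort was issued and its grace period has elapsed.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Observation:
    degraded_services: FrozenSet[str]    # services currently outside steady state
    seconds_since_last_metric: float     # age of the freshest metrics sample
    fault_active: bool                   # is the injected fault still in place?
    seconds_since_fault_removed: Optional[float]  # None while the fault is active
    steady_state: bool                   # does the hypothesis currently hold?


def tripped_indicators(obs: Observation, blast_radius: FrozenSet[str],
                       abort_issued: bool = False,
                       max_metric_age_s: float = 30.0,
                       recovery_timeout_s: float = 300.0) -> List[str]:
    """Return the failure indicators that currently apply (empty list if none)."""
    found: List[str] = []
    # Uncontrolled cascade: degradation outside the expected blast radius.
    escaped = obs.degraded_services - blast_radius
    if escaped:
        found.append("uncontrolled-cascade: " + ",".join(sorted(escaped)))
    # Monitoring gap: metrics are too stale to measure impact.
    if obs.seconds_since_last_metric > max_metric_age_s:
        found.append("monitoring-gap")
    # No recovery: fault removed long ago, yet steady state has not returned.
    if (obs.seconds_since_fault_removed is not None
            and obs.seconds_since_fault_removed > recovery_timeout_s
            and not obs.steady_state):
        found.append("no-recovery")
    # Abort failed: only meaningful once the abort's grace period has elapsed.
    if abort_issued and obs.fault_active:
        found.append("abort-failed")
    return found
```

In this sketch, any non-empty result would be the signal to invoke the emergency stop (`/chaos-test --abort`).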
When NOT to Use
- ❌ Production systems without GameDay approval and stakeholder notification
- ❌ Systems with known critical issues or ongoing incidents
- ❌ Services without circuit breakers or resilience patterns
- ❌ Environments lacking monitoring and observability
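A pre-flight gate can refuse to start an experiment in the situations listed above. This is a hypothetical sketch: `TargetContext` and `preflight_blockers` are illustrative names, and how each field is determined (approval records, incident tracker, service configuration, monitoring) is left to the real integration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TargetContext:
    environment: str               # e.g. "staging" or "production"
    gameday_approved: bool
    stakeholders_notified: bool
    open_incidents: int
    known_critical_issues: int
    has_resilience_patterns: bool  # circuit breakers, retries, timeouts
    has_observability: bool        # metrics/logs needed to measure impact


def preflight_blockers(ctx: TargetContext) -> List[str]:
    """Return the reasons not to run; an empty list means the gate passes."""
    blockers: List[str] = []
    if ctx.environment == "production" and not (ctx.gameday_approved and ctx.stakeholders_notified):
        blockers.append("production requires GameDay approval and stakeholder notification")
    if ctx.open_incidents > 0 or ctx.known_critical_issues > 0:
        blockers.append("known critical issues or ongoing incident")
    if not ctx.has_resilience_patterns:
        blockers.append("no circuit breakers or resilience patterns")
    if not ctx.has_observability:
        blockers.append("missing monitoring and observability")
    return blockers
```

Returning every blocker, rather than stopping at the first, lets the user fix all preconditions in one pass.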
Anti-Patterns
- ❌ Running chaos experiments without steady-state hypothesis
- ❌ Injecting faults without abort mechanism
- ❌ Testing in production without runbook or incident response plan
- ❌ Ignoring blast radius - causing unintended system-wide outages
Principles
- #3 Complete Execution - Execute full experiment lifecycle (hypothesis → inject → monitor → report)
- #9 Based on Facts - Use metrics and observability data to measure resilience
Full Standard: CODITECT-STANDARD-AUTOMATION.md