# /canary-check - Canary Deployment Analysis
Compares canary instance metrics against the baseline using statistical tests (Mann-Whitney U, Kolmogorov-Smirnov) to determine whether to advance, hold, or roll back a progressive deployment.
## System Prompt
EXECUTION DIRECTIVE: When the user invokes this command, you MUST:
- IMMEDIATELY execute - no questions first
- Load the skill at skills/canary-analysis-patterns/SKILL.md
- Identify the target service and current traffic step
- Query metrics from observability backend (Prometheus, Datadog, CloudWatch)
- Execute statistical comparison (Mann-Whitney U test, KS test)
- Apply threshold rules (error rate, latency p99, saturation)
- Render verdict: ADVANCE, HOLD, or ROLLBACK
- Optionally trigger auto-advance if configured
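The directive steps above culminate in a three-way verdict. A minimal sketch of how per-metric p-values and threshold checks could be combined (the field names and the p < 0.10 HOLD cutoff are illustrative assumptions, not part of the command spec — the doc only defines the p < 0.05 rollback rule and hard threshold breaches):

```python
ALPHA = 0.05  # significance level for the ROLLBACK decision

def render_verdict(metrics):
    """metrics: list of dicts with hypothetical keys:
    'name', 'p_value' (two-sided test vs. baseline),
    'worse' (True if the canary side degraded),
    'breach' (True if an absolute threshold was exceeded).
    """
    # A hard threshold breach overrides the statistical comparison.
    if any(m["breach"] for m in metrics):
        return "ROLLBACK"
    # Canary significantly worse than baseline (p < 0.05).
    if any(m["worse"] and m["p_value"] < ALPHA for m in metrics):
        return "ROLLBACK"
    # Borderline degradation: hold the current traffic step (assumed cutoff).
    if any(m["worse"] and m["p_value"] < 0.10 for m in metrics):
        return "HOLD"
    return "ADVANCE"
```

Evaluating breaches before p-values ensures a fast-failing metric triggers rollback even when traffic is too low for statistical significance.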
## Usage

```bash
# Basic canary check for current deployment
/canary-check --service api-gateway

# Check specific traffic step
/canary-check --service api-gateway --step 10

# Override default metrics
/canary-check --service checkout --metrics error_rate,latency_p99,cpu_utilization

# Custom thresholds
/canary-check --service auth --threshold error_rate=0.01,latency_p99=500

# Auto-advance on pass
/canary-check --service payments --auto-advance

# CI integration with JSON output
/canary-check --service billing --json --ci
```
## Options
| Option | Values | Description |
|---|---|---|
| --service | service name | Target service to analyze (required) |
| --step | percentage | Current traffic step (default: auto-detect from deployment) |
| --metrics | comma-separated | Custom metric names (default: error_rate, latency_p99, saturation) |
| --threshold | key=value pairs | Override thresholds (e.g., error_rate=0.005) |
| --auto-advance | flag | Automatically advance traffic on an ADVANCE verdict |
| --window | duration | Metrics observation window (default: 10m) |
| --json | flag | Output results as JSON |
| --ci | flag | Exit with a non-zero code on a ROLLBACK verdict |
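The --threshold option takes comma-separated key=value pairs. A sketch of how such a spec might be parsed (the helper name is hypothetical):

```python
def parse_thresholds(spec):
    """Parse a --threshold value like 'error_rate=0.01,latency_p99=500'
    into a dict mapping metric names to float thresholds."""
    thresholds = {}
    for pair in spec.split(","):
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        thresholds[key.strip()] = float(value)
    return thresholds
```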
## Related Commands

- /release-gate - Combined quality gates for release readiness
- /smoke-test - Quick functional validation after deployment
- /flag-audit - Feature flag hygiene audit
- /load-test - Performance testing under load
## Success Output

```
✓ Canary Analysis Complete

Service: api-gateway
Traffic Step: 10% → canary, 90% → baseline
Observation Window: 10 minutes
Statistical Test: Mann-Whitney U (α=0.05)

Metrics Comparison:
  ✓ error_rate: canary=0.003 baseline=0.004 (p=0.42) PASS
  ✓ latency_p99: canary=245ms baseline=280ms (p=0.18) PASS
  ✓ saturation: canary=42% baseline=45% (p=0.31) PASS

Verdict: ADVANCE
Recommendation: Increase traffic to 25%

Next Steps:
  1. Advance traffic: kubectl argo rollouts promote api-gateway
  2. Wait 10 minutes for stabilization
  3. Run /canary-check --service api-gateway --step 25
```
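The per-metric p-values in the output above come from a Mann-Whitney U test. A self-contained sketch using the normal approximation, which is adequate for the n ≥ 30 samples this command requires (the tie correction to the variance is omitted for brevity; a production analysis would use a library implementation such as scipy.stats.mannwhitneyu):

```python
import math

def mann_whitney_u(canary, baseline):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U1, p_value) for the canary sample.
    """
    n1, n2 = len(canary), len(baseline)
    # Tag each observation with its group, then rank the pooled sample.
    combined = sorted([(v, 0) for v in canary] + [(v, 1) for v in baseline])
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(combined):
        j = i
        # Extend over a run of tied values and assign the average rank.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    # Two-sided p-value from the standard normal CDF.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return u1, p
```

Being rank-based, the test makes no normality assumption about the metric distributions, which is why it suits heavy-tailed latency data better than a t-test.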
## Completion Checklist
- Target service identified
- Current traffic step detected or specified
- Metrics queried from observability backend
- Canary and baseline populations sampled
- Statistical tests executed (Mann-Whitney U, KS test)
- Thresholds applied to each metric
- Verdict calculated: ADVANCE / HOLD / ROLLBACK
- Recommendation generated with next steps
- Auto-advance triggered if enabled and verdict is ADVANCE
## Failure Indicators
- Service not found: Specified service does not exist in deployment
- No canary detected: Service is not in progressive deployment state
- Insufficient data: Observation window too short or no traffic
- Metrics query failure: Observability backend unavailable or query syntax error
- ROLLBACK verdict: Canary metrics significantly worse than baseline (p < 0.05)
- Threshold breach: One or more metrics exceed configured thresholds
## When NOT to Use

- Initial deployment: No baseline to compare against - use /smoke-test instead
- Blue-green deployments: Use /smoke-test or /integration-test for validation
- Load testing: Use /load-test for synthetic load validation
- Security validation: Use /security-scan for vulnerability detection
- Functional regression: Use /regression-test for behavior validation
## Anti-Patterns
- ❌ Running without sufficient traffic - statistical tests need minimum sample size (n ≥ 30)
- ❌ Ignoring HOLD verdict - proceeding without investigation masks issues
- ❌ Setting thresholds too loose - defeats purpose of gradual rollout
- ❌ Auto-advancing without monitoring - removes human oversight from critical path
- ❌ Using single metric - error rate alone misses latency degradation
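The first anti-pattern, running without sufficient traffic, can be guarded against before any test is executed. A minimal sketch (the constant reflects the n ≥ 30 rule of thumb above; the function name is hypothetical):

```python
MIN_SAMPLES = 30  # rule-of-thumb minimum for rank-based tests

def sample_size_ok(canary, baseline):
    """Return True only when both populations are large enough to compare;
    otherwise the command should report 'insufficient data' and HOLD."""
    return len(canary) >= MIN_SAMPLES and len(baseline) >= MIN_SAMPLES
```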
## Principles
- #3 Complete Execution - Queries metrics, runs tests, renders verdict - no manual analysis
- #9 Based on Facts - Uses statistical hypothesis testing, not eyeball comparison
- #10 Self-Provisioning - Auto-installs metrics query libraries (prometheus-api-client) if missing
- #11 Confirm Destructive Only - Auto-advance is opt-in; rollback requires confirmation
Full Standard: CODITECT-STANDARD-AUTOMATION.md