
/canary-check - Canary Deployment Analysis

Compares canary instance metrics against baseline using statistical tests (Mann-Whitney U, Kolmogorov-Smirnov) to determine whether to advance, hold, or roll back a progressive deployment.

System Prompt

EXECUTION DIRECTIVE: When the user invokes this command, you MUST:

  1. IMMEDIATELY execute - no questions first
  2. Load the skill at skills/canary-analysis-patterns/SKILL.md
  3. Identify target service and current traffic step
  4. Query metrics from observability backend (Prometheus, Datadog, CloudWatch)
  5. Execute statistical comparison (Mann-Whitney U test, KS test)
  6. Apply threshold rules (error rate, latency p99, saturation)
  7. Render verdict: ADVANCE, HOLD, or ROLLBACK
  8. Optionally trigger auto-advance if configured

Usage

# Basic canary check for current deployment
/canary-check --service api-gateway

# Check specific traffic step
/canary-check --service api-gateway --step 10

# Override default metrics
/canary-check --service checkout --metrics error_rate,latency_p99,cpu_utilization

# Custom thresholds
/canary-check --service auth --threshold error_rate=0.01,latency_p99=500

# Auto-advance on pass
/canary-check --service payments --auto-advance

# CI integration with JSON output
/canary-check --service billing --json --ci

Options

  • --service <service-name>: Target service to analyze (required)
  • --step <percentage>: Current traffic step (default: auto-detect from deployment)
  • --metrics <comma-separated>: Custom metric names (default: error_rate, latency_p99, saturation)
  • --threshold <key=value pairs>: Override thresholds (e.g., error_rate=0.005)
  • --auto-advance: Automatically advance traffic on an ADVANCE verdict
  • --window <duration>: Metrics observation window (default: 10m)
  • --json: Output results as JSON
  • --ci: Exit with a non-zero code on a ROLLBACK verdict
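
A --threshold argument of the form key=value,key=value could be parsed along these lines; parse_thresholds is a hypothetical helper for illustration, not part of the tool:

```python
# Hypothetical parser for a `--threshold error_rate=0.01,latency_p99=500`
# style argument: split on commas, then split each pair on "=".
def parse_thresholds(arg: str) -> dict[str, float]:
    """Parse 'error_rate=0.01,latency_p99=500' into {name: limit}."""
    thresholds = {}
    for pair in arg.split(","):
        name, _, value = pair.partition("=")
        if not value:
            raise ValueError(f"malformed threshold: {pair!r}")
        thresholds[name.strip()] = float(value)
    return thresholds

parse_thresholds("error_rate=0.01,latency_p99=500")
# → {'error_rate': 0.01, 'latency_p99': 500.0}
```
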
Related Commands

  • /release-gate - Combined quality gates for release readiness
  • /smoke-test - Quick functional validation after deployment
  • /flag-audit - Feature flag hygiene audit
  • /load-test - Performance testing under load

Success Output

✓ Canary Analysis Complete

Service: api-gateway
Traffic Step: 10% → canary, 90% → baseline
Observation Window: 10 minutes
Statistical Test: Mann-Whitney U (α=0.05)

Metrics Comparison:
✓ error_rate: canary=0.003 baseline=0.004 (p=0.42) PASS
✓ latency_p99: canary=245ms baseline=280ms (p=0.18) PASS
✓ saturation: canary=42% baseline=45% (p=0.31) PASS

Verdict: ADVANCE
Recommendation: Increase traffic to 25%

Next Steps:
1. Advance traffic: kubectl argo rollouts promote api-gateway
2. Wait 10 minutes for stabilization
3. Run /canary-check --service api-gateway --step 25
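
The ADVANCE / HOLD / ROLLBACK verdict could be combined from per-metric results roughly as follows; the exact rules shown are an assumption for illustration, not the command's documented algorithm:

```python
# Illustrative verdict logic: each metric contributes a p-value and a
# flag for whether it breached its configured hard threshold.
def verdict(results, alpha=0.05):
    """results: list of (p_value, threshold_breached) per metric."""
    if any(p < alpha and breached for p, breached in results):
        return "ROLLBACK"  # statistically worse AND past a hard limit
    if any(p < alpha for p, _ in results) or any(b for _, b in results):
        return "HOLD"      # one warning sign: pause and investigate
    return "ADVANCE"       # no significant regression detected

verdict([(0.42, False), (0.18, False), (0.31, False)])  # → 'ADVANCE'
```

The example call mirrors the p-values in the success output above, where all three metrics pass and the verdict is ADVANCE.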

Completion Checklist

  • Target service identified
  • Current traffic step detected or specified
  • Metrics queried from observability backend
  • Canary and baseline populations sampled
  • Statistical tests executed (Mann-Whitney U, KS test)
  • Thresholds applied to each metric
  • Verdict calculated: ADVANCE / HOLD / ROLLBACK
  • Recommendation generated with next steps
  • Auto-advance triggered if enabled and verdict is ADVANCE

Failure Indicators

  • Service not found: Specified service does not exist in deployment
  • No canary detected: Service is not in progressive deployment state
  • Insufficient data: Observation window too short or no traffic
  • Metrics query failure: Observability backend unavailable or query syntax error
  • ROLLBACK verdict: Canary metrics significantly worse than baseline (p < 0.05)
  • Threshold breach: One or more metrics exceed configured thresholds

When NOT to Use

  • Initial deployment: No baseline to compare against - use /smoke-test instead
  • Blue-green deployments: Use /smoke-test or /integration-test for validation
  • Load testing: Use /load-test for synthetic load validation
  • Security validation: Use /security-scan for vulnerability detection
  • Functional regression: Use /regression-test for behavior validation

Anti-Patterns

  • ❌ Running without sufficient traffic - statistical tests need minimum sample size (n ≥ 30)
  • ❌ Ignoring HOLD verdict - proceeding without investigation masks issues
  • ❌ Setting thresholds too loose - defeats purpose of gradual rollout
  • ❌ Auto-advancing without monitoring - removes human oversight from critical path
  • ❌ Using single metric - error rate alone misses latency degradation
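
The first anti-pattern can be guarded against mechanically before any test runs; the helper below is a hypothetical sketch of the n ≥ 30 rule of thumb:

```python
# Refuse to run the statistical tests when either population is below
# the minimum sample size (n >= 30, matching the rule of thumb above).
MIN_SAMPLES = 30

def has_sufficient_data(canary, baseline, minimum=MIN_SAMPLES):
    return len(canary) >= minimum and len(baseline) >= minimum

has_sufficient_data(list(range(10)), list(range(100)))  # → False
```
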

Principles

  • #3 Complete Execution - Queries metrics, runs tests, renders verdict - no manual analysis
  • #9 Based on Facts - Uses statistical hypothesis testing, not eyeball comparison
  • #10 Self-Provisioning - Auto-installs metrics query libraries (prometheus-api-client) if missing
  • #11 Confirm Destructive Only - Auto-advance is opt-in; rollback requires confirmation

Full Standard: CODITECT-STANDARD-AUTOMATION.md