
/canary-check - Canary Deployment Analysis

Compares canary instance metrics against baseline using statistical tests (Mann-Whitney U, Kolmogorov-Smirnov) to determine whether to advance, hold, or roll back a progressive deployment.

System Prompt

EXECUTION DIRECTIVE: When the user invokes this command, you MUST:

  1. IMMEDIATELY execute - no questions first
  2. Load the skill at skills/canary-analysis-patterns/SKILL.md
  3. Identify target service and current traffic step
  4. Query metrics from observability backend (Prometheus, Datadog, CloudWatch)
  5. Execute statistical comparison (Mann-Whitney U test, KS test)
  6. Apply threshold rules (error rate, latency p99, saturation)
  7. Render verdict: ADVANCE, HOLD, or ROLLBACK
  8. Optionally trigger auto-advance if configured

Usage

# Basic canary check for current deployment
/canary-check --service api-gateway

# Check specific traffic step
/canary-check --service api-gateway --step 10

# Override default metrics
/canary-check --service checkout --metrics error_rate,latency_p99,cpu_utilization

# Custom thresholds
/canary-check --service auth --threshold error_rate=0.01,latency_p99=500

# Auto-advance on pass
/canary-check --service payments --auto-advance

# CI integration with JSON output
/canary-check --service billing --json --ci

Options

  • --service <service-name>: Target service to analyze (required)
  • --step <percentage>: Current traffic step (default: auto-detect from deployment)
  • --metrics <comma-separated>: Custom metric names (default: error_rate, latency_p99, saturation)
  • --threshold <key=value pairs>: Override thresholds (e.g., error_rate=0.005)
  • --auto-advance: Automatically advance traffic on an ADVANCE verdict
  • --window <duration>: Metrics observation window (default: 10m)
  • --json: Output results as JSON
  • --ci: Exit with a non-zero code on a ROLLBACK verdict
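
A --threshold argument of the form key=value,key=value could be parsed along these lines; parse_thresholds is a hypothetical helper for illustration, not part of the tool:

```python
# Hypothetical parser for a `--threshold error_rate=0.01,latency_p99=500`
# style argument: split on commas, then split each pair on "=".
def parse_thresholds(arg: str) -> dict[str, float]:
    """Parse 'error_rate=0.01,latency_p99=500' into {name: limit}."""
    thresholds = {}
    for pair in arg.split(","):
        name, _, value = pair.partition("=")
        if not value:
            raise ValueError(f"malformed threshold: {pair!r}")
        thresholds[name.strip()] = float(value)
    return thresholds

parse_thresholds("error_rate=0.01,latency_p99=500")
# → {'error_rate': 0.01, 'latency_p99': 500.0}
```
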
Related Commands

  • /release-gate - Combined quality gates for release readiness
  • /smoke-test - Quick functional validation after deployment
  • /flag-audit - Feature flag hygiene audit
  • /load-test - Performance testing under load

Success Output

✓ Canary Analysis Complete

Service: api-gateway
Traffic Step: 10% → canary, 90% → baseline
Observation Window: 10 minutes
Statistical Test: Mann-Whitney U (α=0.05)

Metrics Comparison:
✓ error_rate: canary=0.003 baseline=0.004 (p=0.42) PASS
✓ latency_p99: canary=245ms baseline=280ms (p=0.18) PASS
✓ saturation: canary=42% baseline=45% (p=0.31) PASS

Verdict: ADVANCE
Recommendation: Increase traffic to 25%

Next Steps:
1. Advance traffic: kubectl argo rollouts promote api-gateway
2. Wait 10 minutes for stabilization
3. Run /canary-check --service api-gateway --step 25
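
The ADVANCE / HOLD / ROLLBACK verdict could be combined from per-metric results roughly as follows; the exact rules shown are an assumption for illustration, not the command's documented algorithm:

```python
# Illustrative verdict logic: each metric contributes a p-value and a
# flag for whether it breached its configured hard threshold.
def verdict(results, alpha=0.05):
    """results: list of (p_value, threshold_breached) per metric."""
    if any(p < alpha and breached for p, breached in results):
        return "ROLLBACK"  # statistically worse AND past a hard limit
    if any(p < alpha for p, _ in results) or any(b for _, b in results):
        return "HOLD"      # one warning sign: pause and investigate
    return "ADVANCE"       # no significant regression detected

verdict([(0.42, False), (0.18, False), (0.31, False)])  # → 'ADVANCE'
```

The example call mirrors the p-values in the success output above, where all three metrics pass and the verdict is ADVANCE.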

Completion Checklist

  • Target service identified
  • Current traffic step detected or specified
  • Metrics queried from observability backend
  • Canary and baseline populations sampled
  • Statistical tests executed (Mann-Whitney U, KS test)
  • Thresholds applied to each metric
  • Verdict calculated: ADVANCE / HOLD / ROLLBACK
  • Recommendation generated with next steps
  • Auto-advance triggered if enabled and verdict is ADVANCE

Failure Indicators

  • Service not found: Specified service does not exist in deployment
  • No canary detected: Service is not in progressive deployment state
  • Insufficient data: Observation window too short or no traffic
  • Metrics query failure: Observability backend unavailable or query syntax error
  • ROLLBACK verdict: Canary metrics significantly worse than baseline (p < 0.05)
  • Threshold breach: One or more metrics exceed configured thresholds

When NOT to Use

  • Initial deployment: No baseline to compare against - use /smoke-test instead
  • Blue-green deployments: Use /smoke-test or /integration-test for validation
  • Load testing: Use /load-test for synthetic load validation
  • Security validation: Use /security-scan for vulnerability detection
  • Functional regression: Use /regression-test for behavior validation

Anti-Patterns

  • ❌ Running without sufficient traffic - statistical tests need minimum sample size (n ≥ 30)
  • ❌ Ignoring HOLD verdict - proceeding without investigation masks issues
  • ❌ Setting thresholds too loose - defeats purpose of gradual rollout
  • ❌ Auto-advancing without monitoring - removes human oversight from critical path
  • ❌ Using single metric - error rate alone misses latency degradation
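
The first anti-pattern can be guarded against mechanically before any test runs; the helper below is a hypothetical sketch of the n ≥ 30 rule of thumb:

```python
# Refuse to run the statistical tests when either population is below
# the minimum sample size (n >= 30, matching the rule of thumb above).
MIN_SAMPLES = 30

def has_sufficient_data(canary, baseline, minimum=MIN_SAMPLES):
    return len(canary) >= minimum and len(baseline) >= minimum

has_sufficient_data(list(range(10)), list(range(100)))  # → False
```
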

Principles

  • #3 Complete Execution - Queries metrics, runs tests, renders verdict - no manual analysis
  • #9 Based on Facts - Uses statistical hypothesis testing, not eyeball comparison
  • #10 Self-Provisioning - Auto-installs metrics query libraries (prometheus-api-client) if missing
  • #11 Confirm Destructive Only - Auto-advance is opt-in; rollback requires confirmation

Full Standard: CODITECT-STANDARD-AUTOMATION.md