Post-Deploy Smoke Test
Purpose
- Execute health endpoint checks immediately after deployment to validate basic service availability
- Run critical path verification (user signup, auth flow, core API operations)
- Detect deployment issues within the first 5 minutes, before users are impacted
- Trigger an automatic rollback if critical failures are detected (the hook itself is non-blocking)
- Generate smoke test report with detailed failure analysis and remediation steps
Trigger
| Property | Value |
|---|---|
| Event | post-deploy |
| Blocking | No (can trigger rollback via side effect) |
| Timeout | 300s (5 minutes) |
| Failure Mode | On critical failure: auto-rollback (requires ops team confirmation) |
| Rollback Command | kubectl set image deployment/service image=previous-tag -n prod |
Behavior
When Triggered
The hook executes immediately after deployment completes. It performs:
- Health Checks: GET endpoints that verify service is responsive
  - /health - Basic readiness check
  - /health/ready - Full dependency readiness
  - /health/live - Liveness probe
- Dependency Verification: Confirms external dependencies are accessible
  - Database connectivity: SELECT 1 query with timeout
  - Redis/cache layer: PING command
  - Auth service: Token validation endpoint
  - Message queue: Connection test
- Critical Path Testing: Execute key user workflows
  - User authentication (login/logout)
  - Create/read operations on primary resources
  - API rate limiting verification
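As an illustration of the health check phase, here is a minimal sketch of how one entry from the `health_checks` configuration might be evaluated. It is not the hook's actual implementation: `CheckResult`, `http_status`, and `run_health_check` are hypothetical names, and the sketch assumes a plain GET compared against `expected_status`, using only the Python standard library.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Outcome of one check (hypothetical structure, not the hook's own)."""
    name: str
    ok: bool
    required: bool
    detail: str


def http_status(url: str, timeout: float) -> int:
    """Return the HTTP status code of a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses raise HTTPError but still carry a status code.
        return exc.code


def run_health_check(base_url: str, check: dict, get_status=http_status) -> CheckResult:
    """Evaluate one entry shaped like the items under `health_checks`."""
    url = base_url.rstrip("/") + check["endpoint"]
    try:
        status = get_status(url, check["timeout_seconds"])
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses.
        return CheckResult(check["name"], False, check["required"], f"error: {exc}")
    ok = status == check["expected_status"]
    return CheckResult(check["name"], ok, check["required"], f"status {status}")
```

Passing `get_status` as a parameter keeps the sketch testable without a live service; a real runner would call it once per configured check and collect the results.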
Configuration
Create .coditect/config/smoke-test-hook.json:
{
  "enabled": true,
  "timeout_seconds": 300,
  "auto_rollback": {
    "enabled": true,
    "critical_failure_threshold": 1,
    "requires_ops_confirmation": true,
    "rollback_timeout_seconds": 60
  },
  "health_checks": [
    {
      "name": "basic-health",
      "endpoint": "/health",
      "method": "GET",
      "expected_status": 200,
      "timeout_seconds": 5,
      "required": true
    },
    {
      "name": "readiness",
      "endpoint": "/health/ready",
      "method": "GET",
      "expected_status": 200,
      "timeout_seconds": 10,
      "required": true
    },
    {
      "name": "liveness",
      "endpoint": "/health/live",
      "method": "GET",
      "expected_status": 200,
      "timeout_seconds": 5,
      "required": false
    }
  ],
  "dependency_checks": [
    {
      "name": "database",
      "type": "postgres",
      "query": "SELECT 1",
      "timeout_seconds": 5,
      "required": true
    },
    {
      "name": "redis-cache",
      "type": "redis",
      "command": "PING",
      "timeout_seconds": 3,
      "required": false
    }
  ],
  "critical_paths": [
    {
      "name": "user-login",
      "steps": [
        {"method": "POST", "endpoint": "/api/auth/login", "payload": {"email": "test@example.com", "password": "test"}},
        {"method": "GET", "endpoint": "/api/user/profile", "required_status": 200}
      ],
      "timeout_seconds": 15,
      "required": true
    },
    {
      "name": "core-api",
      "steps": [
        {"method": "GET", "endpoint": "/api/resources", "required_status": 200}
      ],
      "timeout_seconds": 10,
      "required": true
    }
  ],
  "rollback_strategy": {
    "type": "previous-image",
    "max_rollback_attempts": 1,
    "notify_channels": ["slack-ops", "email-on-call"]
  }
}
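To make the `auto_rollback` settings concrete, the following is a rough sketch of how a rollback decision could be derived from check results. It rests on two assumptions not stated in this document: that `critical_failure_threshold` is the minimum number of failed `required` checks that triggers a rollback, and that results arrive as dictionaries with `name`, `required`, and `ok` keys. `should_rollback` is a hypothetical helper, not part of the hook.

```python
def should_rollback(config: dict, results: list) -> dict:
    """Decide on rollback from check results (illustrative logic only).

    Assumption: a failed check counts as critical only when it is `required`;
    failed optional checks are reported as warnings.
    """
    auto = config["auto_rollback"]
    critical = [r["name"] for r in results if r["required"] and not r["ok"]]
    warnings = [r["name"] for r in results if not r["required"] and not r["ok"]]
    triggered = bool(auto["enabled"]) and len(critical) >= auto["critical_failure_threshold"]
    return {
        "rollback": triggered,
        # Rollback still waits for ops when the config asks for confirmation.
        "needs_confirmation": triggered and bool(auto["requires_ops_confirmation"]),
        "critical": critical,
        "warnings": warnings,
    }
```

With the configuration above, a single failed required check (for example `readiness`) would mark the deployment for rollback pending ops confirmation, while a failed `redis-cache` check alone would only produce a warning.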
Integration
The hook integrates with:
- Skill: smoke-testing-patterns - Health check and test scenarios
- Deployment Tool: Executes after K8s deployment completes
- Rollback System: Can trigger auto-rollback on critical failures
- Monitoring: Feeds smoke test results to monitoring dashboard
- Notifications: Alerts on-call team if rollback triggered
Output
All Tests Passing
✓ Smoke tests passed (all health checks OK)
- Health check: OK
- Dependencies: OK (db, redis)
- Critical paths: OK (login, core-api)
Deployment stable. Monitoring active.
Warnings, Tests Continue
⚠ WARNING: Non-critical test failed (deployment continues)
- Liveness check: Timeout (non-critical)
- Redis cache: Unavailable (optional dependency)
Impact: Minor feature degradation
Monitoring: Alert if pattern persists
Action: Check redis-cache status in next 30min
Critical Failure Detected
✗ CRITICAL: Smoke tests failed - Triggering rollback
CRITICAL FAILURES:
1. Health Check: /health/ready returned 503
Issue: Database connection failed
Impact: CRITICAL - Service cannot handle requests
2. Critical Path: User login failed (500 error)
Issue: Auth service timeout during authentication
Impact: CRITICAL - Core functionality broken
Automatic rollback initiated:
Current: django:v1.22.0-context-api
Rollback to: django:v1.21.8-stable
Status: Rolling back 3/3 pods...
Timeline:
14:32:15 - Deployment completed
14:32:17 - Health check started
14:32:25 - Critical failures detected
14:32:35 - Rollback started
14:33:15 - Rollback complete
Notification sent to @on-call team
Incident ticket created: INC-2026-0542
Failure Handling
| Scenario | Action | Rollback |
|---|---|---|
| Critical failure detected | Auto-rollback (needs ops confirm) | Yes |
| Multiple failures detected | Immediate rollback | Yes |
| Timeout during test | Timeout recorded, continue other tests | Conditional |
| Dependency unavailable | Mark as failure (required) or warning (optional) | Conditional |
| Test execution error | Log error, mark as warning | No |
| Rollback fails | Alert ops team immediately | N/A |
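One way to read the table above is as a classification step applied to each test outcome. The sketch below is an interpretation, not the hook's code: it assumes "Conditional" means the result depends on whether the check is marked `required`, and it uses Python's built-in `TimeoutError` and `ConnectionError` as stand-ins for timeouts and unavailable dependencies. `classify` and `run_checks` are hypothetical names.

```python
def classify(outcome: str, required: bool) -> str:
    """Map a raw outcome to a severity, following the table's rows."""
    if outcome == "ok":
        return "pass"
    if outcome == "error":
        # Test execution error: logged and treated as a warning, no rollback.
        return "warning"
    # Timeout or unavailable dependency: critical only for required checks.
    return "critical" if required else "warning"


def run_checks(checks):
    """Run (name, required, callable) checks, continuing after any failure."""
    report = []
    for name, required, fn in checks:
        try:
            fn()
            outcome = "ok"
        except TimeoutError:
            outcome = "timeout"      # recorded; remaining tests still run
        except ConnectionError:
            outcome = "unavailable"  # dependency could not be reached
        except Exception:
            outcome = "error"        # bug in the test itself, not the service
        report.append((name, outcome, classify(outcome, required)))
    return report
```

Catching the specific exceptions before the generic `Exception` matters here, since both `TimeoutError` and `ConnectionError` would otherwise be swallowed as plain execution errors and downgraded to warnings.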
Error Recovery:
# Check smoke test results
kubectl logs -f deploy/django -c smoke-tests -n prod
# Manual rollback if auto-rollback failed
kubectl set image deployment/django django=django:v1.21.8-stable -n prod
# Re-run smoke tests manually (assumes the Makefile reads an ENVIRONMENT variable)
make smoke-tests ENVIRONMENT=prod
# Review failure in monitoring dashboard
kubectl port-forward svc/monitoring 3000:3000 -n monitoring
# Visit http://localhost:3000/deployments/django
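The manual rollback above could also be scripted. The following sketch builds the same `kubectl set image` command as an argument list and follows it with `kubectl rollout status`, using a timeout taken from `rollback_timeout_seconds`. The helper names `rollback_commands` and `run_rollback` are hypothetical, and the dry-run default is a design choice of this example rather than documented hook behavior.

```python
import subprocess


def rollback_commands(deployment, container, image, namespace, timeout_seconds=60):
    """Build kubectl argument lists for an image rollback and its verification."""
    target = f"deployment/{deployment}"
    return [
        ["kubectl", "set", "image", target, f"{container}={image}", "-n", namespace],
        # Wait for the rollout to finish, giving up after the configured timeout.
        ["kubectl", "rollout", "status", target, "-n", namespace,
         f"--timeout={timeout_seconds}s"],
    ]


def run_rollback(commands, execute=False, runner=subprocess.run):
    """Print the commands by default; run them only when execute=True."""
    for argv in commands:
        if not execute:
            print("DRY RUN:", " ".join(argv))
            continue
        runner(argv, check=True)  # check=True raises if kubectl exits non-zero


if __name__ == "__main__":
    run_rollback(rollback_commands("django", "django", "django:v1.21.8-stable", "prod"))
```

Passing arguments as a list avoids shell quoting problems, and defaulting to a dry run means the script only touches the cluster when `execute=True` is set explicitly.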
Related Hooks
| Hook | Timing | Relationship | Purpose |
|---|---|---|---|
| pre-deploy-release-gate.md | Pre-deploy | Upstream | Gates deployment; smoke test validates gate didn't miss issues |
| post-deploy-canary-monitor.md | Post-deploy (continuous) | Parallel | Continuous monitoring after smoke tests pass |
| post-deploy-metric-dashboard.md | Post-deploy | Downstream | Displays metrics after smoke tests confirm stability |
| ci-integration-hook.md | CI pipeline | Integration | Provides deployment artifact info to smoke tests |
Principles
- Fast Feedback: The 5-minute timeout ensures issues are detected before they reach most users
- Critical-First: Stops on critical failures; warnings don't block
- Auto-Recovery: Automatic rollback on critical failures, gated by ops confirmation to preserve human oversight
- Transparent Results: All tests logged; full report available for postmortem
- Dependency Awareness: Differentiates required vs optional dependencies
- Test Variety: Health checks + dependencies + critical paths = comprehensive coverage
- Skill-Driven: Test scenarios and health check logic managed by smoke-testing-patterns skill
Related Documentation:
- ADR-183 - Governance hook architecture
- ADR-060 - MoE verification layer
- skills/smoke-testing-patterns/SKILL.md - Test patterns