Post-Deploy Smoke Test

Purpose

  1. Execute health endpoint checks immediately after deployment to validate basic service availability
  2. Run critical path verification (user signup, auth flow, core API operations)
  3. Detect deployment issues within the first 5 minutes, before users are impacted
  4. Trigger automatic rollback if critical failures are detected (the hook itself is non-blocking)
  5. Generate smoke test report with detailed failure analysis and remediation steps

Trigger

| Property | Value |
|----------|-------|
| Event | post-deploy |
| Blocking | No (can trigger rollback via side effect) |
| Timeout | 300s (5 minutes) |
| Failure Mode | If critical failures: Auto-rollback (requires ops team confirmation) |
| Rollback Command | kubectl set image deployment/service image=previous-tag -n prod |
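
The rollback command above uses placeholder names. As a minimal sketch of how a hook might assemble that command and hold it until ops confirms, consider the following. The helper names `build_rollback_command` and `maybe_rollback` and the injected `run` callable are hypothetical, not part of the hook's actual implementation.

```python
from typing import Callable, List, Optional


def build_rollback_command(deployment: str, container: str,
                           previous_image: str, namespace: str = "prod") -> List[str]:
    """Return argv for `kubectl set image`, which takes CONTAINER=IMAGE pairs."""
    return [
        "kubectl", "set", "image",
        f"deployment/{deployment}",
        f"{container}={previous_image}",
        "-n", namespace,
    ]


def maybe_rollback(command: List[str], ops_confirmed: bool,
                   run: Callable[[List[str]], int]) -> Optional[int]:
    """Run the rollback only after ops confirmation; return the exit code or None."""
    if not ops_confirmed:
        return None  # Hold the rollback until a human approves it.
    return run(command)


# Example values mirror the deployment and image names used in this document.
cmd = build_rollback_command("django", "django", "django:v1.21.8-stable")
```

In real use, `run` could wrap `subprocess.run(argv).returncode`. Passing it in as a parameter keeps the confirmation gate testable without a cluster.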

Behavior

When Triggered

The hook executes immediately after deployment completes. It performs:

  • Health Checks: GET endpoints that verify service is responsive

    • /health - Basic readiness check
    • /health/ready - Full dependency readiness
    • /health/live - Liveness probe
  • Dependency Verification: Confirms external dependencies are accessible

    • Database connectivity: SELECT 1 query with timeout
    • Redis/cache layer: PING command
    • Auth service: Token validation endpoint
    • Message queue: Connection test
  • Critical Path Testing: Execute key user workflows

    • User authentication (login/logout)
    • Create/read operations on primary resources
    • API rate limiting verification
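
The health check step above can be sketched roughly as follows, using only the Python standard library. The function names (`http_status`, `run_health_checks`) and the result dictionary shape are assumptions made for illustration. The check fields (`endpoint`, `method`, `expected_status`, `timeout_seconds`, `required`) follow the configuration format described in this document.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List


def http_status(url: str, method: str, timeout: float) -> int:
    """Return the HTTP status code for a request, including 4xx/5xx responses."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urlopen raises on 4xx/5xx; the code is still the answer.


def run_health_checks(base_url: str, checks: List[Dict],
                      fetch: Callable[[str, str, float], int] = http_status) -> List[Dict]:
    """Run each check and record whether it matched its expected status."""
    results = []
    for check in checks:
        url = base_url.rstrip("/") + check["endpoint"]
        try:
            status = fetch(url, check.get("method", "GET"), check["timeout_seconds"])
            passed = status == check["expected_status"]
            detail = f"status {status}"
        except OSError as err:  # URLError and socket timeouts are OSError subclasses.
            passed, detail = False, f"error: {err}"
        results.append({"name": check["name"], "passed": passed,
                        "required": check.get("required", True), "detail": detail})
    return results
```

A failed check is recorded and the loop continues. This matches the idea that a timeout in one test should not stop the others from running.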

Configuration

Create .coditect/config/smoke-test-hook.json:

{
  "enabled": true,
  "timeout_seconds": 300,
  "auto_rollback": {
    "enabled": true,
    "critical_failure_threshold": 1,
    "requires_ops_confirmation": true,
    "rollback_timeout_seconds": 60
  },
  "health_checks": [
    {
      "name": "basic-health",
      "endpoint": "/health",
      "method": "GET",
      "expected_status": 200,
      "timeout_seconds": 5,
      "required": true
    },
    {
      "name": "readiness",
      "endpoint": "/health/ready",
      "method": "GET",
      "expected_status": 200,
      "timeout_seconds": 10,
      "required": true
    },
    {
      "name": "liveness",
      "endpoint": "/health/live",
      "method": "GET",
      "expected_status": 200,
      "timeout_seconds": 5,
      "required": false
    }
  ],
  "dependency_checks": [
    {
      "name": "database",
      "type": "postgres",
      "query": "SELECT 1",
      "timeout_seconds": 5,
      "required": true
    },
    {
      "name": "redis-cache",
      "type": "redis",
      "command": "PING",
      "timeout_seconds": 3,
      "required": false
    }
  ],
  "critical_paths": [
    {
      "name": "user-login",
      "steps": [
        {"method": "POST", "endpoint": "/api/auth/login", "payload": {"email": "test@example.com", "password": "test"}},
        {"method": "GET", "endpoint": "/api/user/profile", "required_status": 200}
      ],
      "timeout_seconds": 15,
      "required": true
    },
    {
      "name": "core-api",
      "steps": [
        {"method": "GET", "endpoint": "/api/resources", "required_status": 200}
      ],
      "timeout_seconds": 10,
      "required": true
    }
  ],
  "rollback_strategy": {
    "type": "previous-image",
    "max_rollback_attempts": 1,
    "notify_channels": ["slack-ops", "email-on-call"]
  }
}
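
To illustrate how the `critical_paths` entries in this configuration might be executed, here is a minimal sketch. The function `run_critical_path` and the injected `send` callable are hypothetical. The sketch also assumes that a step without `required_status` (such as the login step) passes on any 2xx status, which the configuration does not state. Enforcing the per-path `timeout_seconds` is left out for brevity.

```python
from typing import Callable, Dict, Optional, Tuple

# (method, endpoint, payload) -> HTTP status code
SendFn = Callable[[str, str, Optional[dict]], int]


def run_critical_path(path: Dict, send: SendFn) -> Tuple[bool, str]:
    """Run the steps of one critical path in order, stopping at the first failure."""
    for index, step in enumerate(path["steps"], start=1):
        status = send(step["method"], step["endpoint"], step.get("payload"))
        expected = step.get("required_status")
        if expected is not None:
            ok = status == expected
        else:
            # Assumption: a step with no "required_status" passes on any 2xx status.
            ok = 200 <= status < 300
        if not ok:
            return False, f"{path['name']}: step {index} {step['endpoint']} returned {status}"
    return True, f"{path['name']}: OK"
```

Stopping at the first failed step avoids reporting follow-on errors, such as a profile fetch failing only because the login before it failed.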

Integration

The hook integrates with:

  • Skill: smoke-testing-patterns - Health check and test scenarios
  • Deployment Tool: Executes after K8s deployment completes
  • Rollback System: Can trigger auto-rollback on critical failures
  • Monitoring: Feeds smoke test results to monitoring dashboard
  • Notifications: Alerts on-call team if rollback triggered

Output

All Tests Passing

✓ Smoke tests passed (all health checks OK)
- Health check: OK
- Dependencies: OK (db, redis)
- Critical paths: OK (login, core-api)
Deployment stable. Monitoring active.

Warnings, Tests Continue

⚠ WARNING: Non-critical test failed (deployment continues)
- Liveness check: Timeout (non-critical)
- Redis cache: Unavailable (optional dependency)

Impact: Minor feature degradation
Monitoring: Alert if pattern persists
Action: Check redis-cache status in next 30min

Critical Failure Detected

✗ CRITICAL: Smoke tests failed - Triggering rollback

CRITICAL FAILURES:
1. Health Check: /health/ready returned 503
Issue: Database connection failed
Impact: CRITICAL - Service cannot handle requests

2. Critical Path: User login failed (500 error)
Issue: Auth service timeout during authentication
Impact: CRITICAL - Core functionality broken

Automatic rollback initiated:
Current: django:v1.22.0-context-api
Rollback to: django:v1.21.8-stable
Status: Rolling back 3/3 pods...

Timeline:
14:32:15 - Deployment completed
14:32:17 - Health check started
14:32:25 - Critical failures detected
14:32:35 - Rollback started
14:33:15 - Rollback complete

Notification sent to @on-call team
Incident ticket created: INC-2026-0542

Failure Handling

| Scenario | Action | Rollback |
|----------|--------|----------|
| Critical failure detected | Auto-rollback (needs ops confirm) | Yes |
| Multiple failures detected | Immediate rollback | Yes |
| Timeout during test | Timeout recorded, continue other tests | Conditional |
| Dependency unavailable | Mark as failure (required) or warning (optional) | Conditional |
| Test execution error | Log error, mark as warning | No |
| Rollback fails | Alert ops team immediately | N/A |
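
The required-versus-optional distinction in the table can be sketched as a small decision function. The name `decide_action`, the result dictionary shape, and the returned labels are assumptions made for illustration. The parameters mirror the `critical_failure_threshold` and `requires_ops_confirmation` settings from the configuration. The "multiple failures" and "rollback fails" rows are not modeled here.

```python
from typing import Dict, List


def decide_action(results: List[Dict], critical_failure_threshold: int = 1,
                  requires_ops_confirmation: bool = True) -> str:
    """Map check results to "pass", "warn", "rollback", or "rollback-pending-confirmation"."""
    failed = [r for r in results if not r["passed"]]
    critical = [r for r in failed if r["required"]]
    if len(critical) >= critical_failure_threshold:
        # Required checks failed often enough to warrant a rollback.
        return "rollback-pending-confirmation" if requires_ops_confirmation else "rollback"
    if failed:
        # Assumption: failures below the threshold are reported as warnings only.
        return "warn"
    return "pass"
```

With the default threshold of 1, a single failed required check requests a rollback, while a failed optional dependency such as the Redis cache only produces a warning.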

Error Recovery:

# Check smoke test results
kubectl logs -f deploy/django -c smoke-tests -n prod

# Manual rollback if auto-rollback failed
kubectl set image deployment/django django=django:v1.21.8-stable -n prod

# Re-run smoke tests manually
# (make has no --environment option; this assumes the Makefile reads an ENVIRONMENT variable)
make smoke-tests ENVIRONMENT=prod

# Review failure in monitoring dashboard
kubectl port-forward svc/monitoring 3000:3000 -n monitoring
# Visit http://localhost:3000/deployments/django

Related Hooks

| Hook | Timing | Relationship | Purpose |
|------|--------|--------------|---------|
| pre-deploy-release-gate.md | Pre-deploy | Upstream | Gates deployment; smoke test validates gate didn't miss issues |
| post-deploy-canary-monitor.md | Post-deploy (continuous) | Parallel | Continuous monitoring after smoke tests pass |
| post-deploy-metric-dashboard.md | Post-deploy | Downstream | Displays metrics after smoke tests confirm stability |
| ci-integration-hook.md | CI pipeline | Integration | Provides deployment artifact info to smoke tests |

Principles

  1. Fast Feedback: The 5-minute timeout surfaces issues early, before most users are impacted
  2. Critical-First: Stops on critical failures; warnings don't block
  3. Auto-Recovery: Rollback is initiated automatically on critical failures but waits for ops team confirmation, preserving human oversight
  4. Transparent Results: All tests logged; full report available for postmortem
  5. Dependency Awareness: Differentiates required vs optional dependencies
  6. Test Variety: Health checks + dependencies + critical paths = comprehensive coverage
  7. Skill-Driven: Test scenarios and health check logic managed by smoke-testing-patterns skill

Related Documentation: