Post-Deploy Smoke Test

Purpose

Execute health endpoint checks immediately after deployment to validate basic service availability
Run critical path verification (user signup, auth flow, core API operations)
Detect deployment issues within the first 5 minutes, before users are impacted
Trigger automatic rollback if critical failures are detected (the hook itself is non-blocking)
Generate smoke test report with detailed failure analysis and remediation steps

Trigger

Property	Value
Event	`post-deploy`
Blocking	No (can trigger rollback via side effect)
Timeout	300s (5 minutes)
Failure Mode	On critical failure: auto-rollback (requires ops team confirmation)
Rollback Command	`kubectl set image deployment/<deployment> <container>=<previous-image> -n prod`
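
The rollback command in the table can also be assembled programmatically. The following is a minimal sketch, assuming Python drives `kubectl`; `rollback_argv` and `rollout_status_argv` are hypothetical helper names, not part of the hook. The sketch only builds argument lists and leaves execution to the caller. The sample values are taken from the Error Recovery commands later in this document.

```python
import shlex


def rollback_argv(deployment: str, container: str, image: str, namespace: str) -> list[str]:
    """Build `kubectl set image` arguments that point a container at a previous image."""
    return [
        "kubectl", "set", "image",
        f"deployment/{deployment}",
        f"{container}={image}",
        "-n", namespace,
    ]


def rollout_status_argv(deployment: str, namespace: str, timeout_seconds: int) -> list[str]:
    """Build `kubectl rollout status` arguments that wait for the rollback to settle."""
    return [
        "kubectl", "rollout", "status",
        f"deployment/{deployment}",
        "-n", namespace,
        f"--timeout={timeout_seconds}s",
    ]


if __name__ == "__main__":
    # Nothing is executed here; the commands are only printed for review.
    print(shlex.join(rollback_argv("django", "django", "django:v1.21.8-stable", "prod")))
    print(shlex.join(rollout_status_argv("django", "prod", 60)))
```

Building a list rather than a shell string avoids quoting problems if the caller passes it to `subprocess.run`.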

Behavior

When Triggered

The hook executes immediately after deployment completes. It performs three groups of checks:

Health Checks: GET requests to endpoints that verify the service is responsive
- /health - Basic readiness check
- /health/ready - Full dependency readiness
- /health/live - Liveness probe
Dependency Verification: Confirms external dependencies are accessible
- Database connectivity: SELECT 1 query with timeout
- Redis/cache layer: PING command
- Auth service: Token validation endpoint
- Message queue: Connection test
Critical Path Testing: Executes key user workflows
- User authentication (login/logout)
- Create/read operations on primary resources
- API rate limiting verification
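
As an illustration of the health-check step, here is a minimal sketch using only the Python standard library. It is not the hook's actual implementation: `CheckResult`, `fetch_status`, and `run_health_check` are hypothetical names, and the `check` dictionary is assumed to follow the shape of the `health_checks` entries in the configuration file.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    required: bool
    detail: str


def fetch_status(url: str, timeout: float) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError; the code is still a valid answer.
        return err.code


def run_health_check(base_url: str, check: dict, fetch=fetch_status) -> CheckResult:
    """Run one health check entry and compare the status to expected_status."""
    url = base_url.rstrip("/") + check["endpoint"]
    expected = check.get("expected_status", 200)
    required = check.get("required", True)
    try:
        status = fetch(url, check.get("timeout_seconds", 5))
    except OSError as err:
        # URLError and socket timeouts are both OSError subclasses.
        return CheckResult(check["name"], False, required, f"error: {err}")
    return CheckResult(check["name"], status == expected, required, f"status {status}")
```

The `fetch` parameter is injectable so the logic can be exercised without a live service, for example `run_health_check("http://svc", check, fetch=lambda url, timeout: 503)`.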

Configuration

Create .coditect/config/smoke-test-hook.json:

{
  "enabled": true,
  "timeout_seconds": 300,
  "auto_rollback": {
    "enabled": true,
    "critical_failure_threshold": 1,
    "requires_ops_confirmation": true,
    "rollback_timeout_seconds": 60
  },
  "health_checks": [
    {
      "name": "basic-health",
      "endpoint": "/health",
      "method": "GET",
      "expected_status": 200,
      "timeout_seconds": 5,
      "required": true
    },
    {
      "name": "readiness",
      "endpoint": "/health/ready",
      "method": "GET",
      "expected_status": 200,
      "timeout_seconds": 10,
      "required": true
    },
    {
      "name": "liveness",
      "endpoint": "/health/live",
      "method": "GET",
      "expected_status": 200,
      "timeout_seconds": 5,
      "required": false
    }
  ],
  "dependency_checks": [
    {
      "name": "database",
      "type": "postgres",
      "query": "SELECT 1",
      "timeout_seconds": 5,
      "required": true
    },
    {
      "name": "redis-cache",
      "type": "redis",
      "command": "PING",
      "timeout_seconds": 3,
      "required": false
    }
  ],
  "critical_paths": [
    {
      "name": "user-login",
      "steps": [
        {"method": "POST", "endpoint": "/api/auth/login", "payload": {"email": "test@example.com", "password": "test"}},
        {"method": "GET", "endpoint": "/api/user/profile", "required_status": 200}
      ],
      "timeout_seconds": 15,
      "required": true
    },
    {
      "name": "core-api",
      "steps": [
        {"method": "GET", "endpoint": "/api/resources", "required_status": 200}
      ],
      "timeout_seconds": 10,
      "required": true
    }
  ],
  "rollback_strategy": {
    "type": "previous-image",
    "max_rollback_attempts": 1,
    "notify_channels": ["slack-ops", "email-on-call"]
  }
}
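
Before enabling the hook, it can help to sanity-check the configuration. The sketch below is a hypothetical helper (`summarize_config` is not part of the hook) that lists required and optional checks and compares the sum of per-check timeouts against the overall `timeout_seconds`. It assumes checks run sequentially, which the configuration itself does not state.

```python
import json

SECTIONS = ("health_checks", "dependency_checks", "critical_paths")


def summarize_config(text: str) -> dict:
    """Parse the hook config and report required checks and the timeout budget."""
    config = json.loads(text)
    checks = [c for section in SECTIONS for c in config.get(section, [])]
    # Worst case if every check runs sequentially and hits its own timeout.
    worst_case = sum(c.get("timeout_seconds", 0) for c in checks)
    return {
        "required": [c["name"] for c in checks if c.get("required", False)],
        "optional": [c["name"] for c in checks if not c.get("required", False)],
        "worst_case_seconds": worst_case,
        "fits_budget": worst_case <= config.get("timeout_seconds", 0),
    }


if __name__ == "__main__":
    # Path taken from this section; adjust if the file lives elsewhere.
    with open(".coditect/config/smoke-test-hook.json", encoding="utf-8") as fh:
        print(summarize_config(fh.read()))
```

For the configuration above, the per-check timeouts add up to 53 seconds (5 + 10 + 5 + 5 + 3 + 15 + 10), well inside the 300-second budget.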

Integration

The hook integrates with:

Skill: smoke-testing-patterns - Health check and test scenarios
Deployment Tool: Executes after K8s deployment completes
Rollback System: Can trigger auto-rollback on critical failures
Monitoring: Feeds smoke test results to monitoring dashboard
Notifications: Alerts on-call team if rollback triggered

Output

All Tests Passing

✓ Smoke tests passed (all health checks OK)
  - Health check: OK
  - Dependencies: OK (db, redis)
  - Critical paths: OK (login, core-api)
  Deployment stable. Monitoring active.

Warnings, Tests Continue

⚠ WARNING: Non-critical test failed (deployment continues)
  - Liveness check: Timeout (non-critical)
  - Redis cache: Unavailable (optional dependency)
  
  Impact: Minor feature degradation
  Monitoring: Alert if pattern persists
  Action: Check redis-cache status in next 30min

Critical Failure Detected

✗ CRITICAL: Smoke tests failed - Triggering rollback

CRITICAL FAILURES:
  1. Health Check: /health/ready returned 503
     Issue: Database connection failed
     Impact: CRITICAL - Service cannot handle requests
     
  2. Critical Path: User login failed (500 error)
     Issue: Auth service timeout during authentication
     Impact: CRITICAL - Core functionality broken
     
Automatic rollback initiated:
  Current: django:v1.22.0-context-api
  Rollback to: django:v1.21.8-stable
  Status: Rolling back 3/3 pods...
  
Timeline:
  14:32:15 - Deployment completed
  14:32:17 - Health check started
  14:32:25 - Critical failures detected
  14:32:35 - Rollback started
  14:33:15 - Rollback complete

Notification sent to @on-call team
Incident ticket created: INC-2026-0542

Failure Handling

Scenario	Action	Rollback
Critical failure detected	Auto-rollback (needs ops confirm)	Yes
Multiple failures detected	Immediate rollback	Yes
Timeout during test	Timeout recorded, continue other tests	Conditional
Dependency unavailable	Mark as failure (required) or warning (optional)	Conditional
Test execution error	Log error, mark as warning	No
Rollback fails	Alert ops team immediately	N/A
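
The table can be read as a small decision function. The following is a sketch under stated assumptions, not the hook's real logic: `Action` and `decide` are hypothetical names, results are modeled as `(name, passed, required)` tuples, and the rule that more than one critical failure skips ops confirmation is an interpretation of the "Multiple failures detected" row.

```python
from enum import Enum


class Action(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL_NO_ROLLBACK = "fail-no-rollback"
    ROLLBACK_AWAITING_CONFIRMATION = "rollback-awaiting-confirmation"
    ROLLBACK_IMMEDIATE = "rollback-immediate"


def decide(results, auto_rollback: dict) -> Action:
    """Map check results to an action.

    results: iterable of (name, passed, required) tuples.
    auto_rollback: the "auto_rollback" object from the hook config.
    """
    results = list(results)
    critical = [name for name, passed, required in results if not passed and required]
    warnings = [name for name, passed, required in results if not passed and not required]

    if critical:
        threshold = auto_rollback.get("critical_failure_threshold", 1)
        if not auto_rollback.get("enabled", False) or len(critical) < threshold:
            return Action.FAIL_NO_ROLLBACK
        # Assumption: more than one critical failure skips ops confirmation,
        # mirroring the "Multiple failures detected" row of the table.
        if len(critical) > 1 or not auto_rollback.get("requires_ops_confirmation", True):
            return Action.ROLLBACK_IMMEDIATE
        return Action.ROLLBACK_AWAITING_CONFIRMATION
    return Action.WARN if warnings else Action.PASS
```

With the sample configuration, a single failed required check yields `ROLLBACK_AWAITING_CONFIRMATION`, while a failed optional check alone yields `WARN`.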

Error Recovery:

# Check smoke test results
kubectl logs -f deploy/django -c smoke-tests -n prod

# Manual rollback if auto-rollback failed
kubectl set image deployment/django django=django:v1.21.8-stable -n prod

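# Re-run smoke tests manually (variables are passed to make as NAME=value;
# this assumes the Makefile reads an ENVIRONMENT variable)
make smoke-tests ENVIRONMENT=prod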

# Review failure in monitoring dashboard
kubectl port-forward svc/monitoring 3000:3000 -n monitoring
# Visit http://localhost:3000/deployments/django

Related Hooks

Hook	Timing	Relationship	Purpose
`pre-deploy-release-gate.md`	Pre-deploy	Upstream	Gates deployment; smoke test validates gate didn't miss issues
`post-deploy-canary-monitor.md`	Post-deploy (continuous)	Parallel	Continuous monitoring after smoke tests pass
`post-deploy-metric-dashboard.md`	Post-deploy	Downstream	Displays metrics after smoke tests confirm stability
`ci-integration-hook.md`	CI pipeline	Integration	Provides deployment artifact info to smoke tests

Principles

Fast Feedback: 5-minute timeout ensures issues are detected before user impact
Critical-First: Stops on critical failures; warnings don't block
Auto-Recovery: Automatic rollback on critical failures, gated by ops confirmation to preserve human oversight
Transparent Results: All tests logged; full report available for postmortem
Dependency Awareness: Differentiates required vs optional dependencies
Test Variety: Health checks + dependencies + critical paths = comprehensive coverage
Skill-Driven: Test scenarios and health check logic managed by smoke-testing-patterns skill

Related Documentation:

ADR-183 - Governance hook architecture
ADR-060 - MoE verification layer
skills/smoke-testing-patterns/SKILL.md - Test patterns
