/postmortem - Blameless Incident Postmortem
Generate structured, blameless incident postmortems. The command guides the user through timeline construction, 5-Whys root cause analysis, and corrective action categorization (detect/prevent/mitigate/process), then produces a professional postmortem document.
System Prompt
EXECUTION DIRECTIVE: When the user invokes this command, you MUST:
- IMMEDIATELY execute - no questions first
- Load the skill at skills/incident-postmortem-patterns/SKILL.md
- Gather incident details - ID, description, timeline start, impact metrics
- Construct timeline - chronological sequence of events from detection to resolution
- Perform 5-Whys analysis - iterative root cause investigation
- Categorize corrective actions - detect (monitoring), prevent (safeguards), mitigate (response), process (improvements)
- Apply blameless principle - focus on systems and processes, not individuals
- Generate postmortem document - structured markdown, Confluence, or Notion format
- Extract action items - assign owners and due dates for follow-up
Usage
```
# Basic postmortem
/postmortem --incident "Database outage on 2026-02-01"

# With specific template
/postmortem --incident INC-1234 --template security

# From timeline start
/postmortem --incident "API degradation" --timeline-from "2026-02-01T14:30:00Z"

# Output format selection
/postmortem --incident INC-5678 --output confluence

# Action items only (for tracking)
/postmortem --incident INC-9999 --action-items-only
```
Options
| Option | Description | Default |
|---|---|---|
| --incident | Incident ID or description | Interactive prompt |
| --template | Template: standard, abbreviated, or security | standard |
| --timeline-from | Timeline start timestamp (ISO 8601) | Auto-detect |
| --output | Output format: markdown, confluence, or notion | markdown |
| --action-items-only | Generate only corrective actions (no full postmortem) | false |
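
To make the option semantics concrete, the table can be modeled with Python's standard `argparse` module. This is only an illustrative sketch, not how the command is actually implemented: `build_parser` is a hypothetical helper, and representing "Interactive prompt" and "Auto-detect" as `None` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the Options table (illustration only)."""
    parser = argparse.ArgumentParser(prog="/postmortem")
    # None stands in for "prompt interactively" (assumption).
    parser.add_argument("--incident", default=None,
                        help="Incident ID or description")
    parser.add_argument("--template", default="standard",
                        choices=["standard", "abbreviated", "security"])
    # None stands in for "auto-detect the timeline start" (assumption).
    parser.add_argument("--timeline-from", default=None,
                        help="Timeline start timestamp (ISO 8601)")
    parser.add_argument("--output", default="markdown",
                        choices=["markdown", "confluence", "notion"])
    parser.add_argument("--action-items-only", action="store_true",
                        help="Generate only corrective actions")
    return parser


args = build_parser().parse_args(["--incident", "INC-1234", "--template", "security"])
print(args.incident, args.template, args.output, args.action_items_only)
# prints: INC-1234 security markdown False
```

Restricting `--template` and `--output` with `choices` rejects unsupported values at parse time instead of partway through document generation.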
Related Commands
- /incident-response - Real-time incident response workflow
- /triage - Defect triage for bugs discovered during incident
- /chaos-test - Validate fixes with chaos engineering
Success Output
✅ Postmortem Generated
━━━━━━━━━━━━━━━━━━━━━━━━━━
📋 Incident Summary
ID: INC-1234
Title: Database Connection Pool Exhaustion
Date: 2026-02-01
Duration: 2h 15m
Severity: S1 (Critical)
Impact: 100% of API requests failed, ~$45K revenue loss
⏱️ Timeline
14:30 - First alert: API error rate spike (PagerDuty)
14:32 - On-call engineer acknowledged
14:45 - Identified: Database connection pool exhausted
15:00 - Mitigation attempted: Restart application pods
15:15 - Mitigation failed: Pool exhausted again within 5m
15:30 - Root cause found: Connection leak in payment service
15:45 - Fix deployed: Patch connection handling + increase pool size
16:00 - Monitoring: Error rate dropping
16:45 - Resolved: All metrics returned to baseline
🔍 5-Whys Root Cause Analysis
1. Why did API fail? → Database connections exhausted
2. Why were connections exhausted? → Connection pool too small
3. Why was pool too small? → Payment service leaking connections
4. Why was service leaking? → Missing connection.close() in error path
5. Why was leak not detected? → No connection pool monitoring
Root Cause: Missing connection cleanup in error handling path, undetected due to lack of connection pool monitoring.
📝 Corrective Actions
🔍 DETECT (Monitoring & Alerting)
- [ ] Add connection pool utilization metrics (@alice, Due: 2026-02-08)
- [ ] Alert on pool >80% utilization (@bob, Due: 2026-02-08)
- [ ] Dashboard: connection pool health per service (@charlie, Due: 2026-02-15)
🛡️ PREVENT (Safeguards & Design)
- [ ] Code review checklist: verify connection cleanup (@dave, Due: 2026-02-05)
- [ ] Linter rule: flag missing close() in try-catch (@eve, Due: 2026-02-12)
- [ ] Connection pool sizing: calculate based on pod count (@alice, Due: 2026-02-15)
🚨 MITIGATE (Incident Response)
- [ ] Runbook: connection pool exhaustion diagnosis (@bob, Due: 2026-02-10)
- [ ] Auto-scaling: trigger on connection pool metric (@frank, Due: 2026-02-20)
📋 PROCESS (Organizational)
- [ ] Load testing: mandate connection pool stress test (@charlie, Due: 2026-02-12)
- [ ] Postmortem review: share learnings in engineering all-hands (@dave, Due: 2026-02-05)
📄 Document: postmortems/INC-1234-database-connection-pool-2026-02-01.md
🔗 Action Items: Exported to Linear/Jira
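
The Duration field in the sample output can be cross-checked against its timeline. The sketch below copies the timestamps from the sample and assumes the whole incident falls on a single day (2026-02-01); the helper names `parse`, `is_chronological`, and `format_duration` are hypothetical and not part of the command.

```python
from datetime import datetime, timedelta

# (time, event) pairs copied from the sample timeline.
timeline = [
    ("14:30", "First alert: API error rate spike (PagerDuty)"),
    ("14:32", "On-call engineer acknowledged"),
    ("14:45", "Identified: Database connection pool exhausted"),
    ("15:00", "Mitigation attempted: Restart application pods"),
    ("15:15", "Mitigation failed: Pool exhausted again within 5m"),
    ("15:30", "Root cause found: Connection leak in payment service"),
    ("15:45", "Fix deployed: Patch connection handling + increase pool size"),
    ("16:00", "Monitoring: Error rate dropping"),
    ("16:45", "Resolved: All metrics returned to baseline"),
]


def parse(hhmm: str, day: str = "2026-02-01") -> datetime:
    """Combine an HH:MM entry with the incident date (single-day assumption)."""
    return datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H:%M")


def is_chronological(events) -> bool:
    """True when every event is at or after the one before it."""
    times = [parse(t) for t, _ in events]
    return all(a <= b for a, b in zip(times, times[1:]))


def format_duration(delta: timedelta) -> str:
    """Render a timedelta in the 'Xh YYm' style used by the sample output."""
    hours, minutes = divmod(int(delta.total_seconds() // 60), 60)
    return f"{hours}h {minutes:02d}m"


duration = parse(timeline[-1][0]) - parse(timeline[0][0])
print(is_chronological(timeline), format_duration(duration))
# prints: True 2h 15m
```

An incident that crosses midnight would need full ISO 8601 timestamps, as accepted by `--timeline-from`, rather than bare HH:MM values.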
Completion Checklist
- Incident details gathered
- Timeline constructed (detection → resolution)
- 5-Whys analysis completed
- Root cause identified
- Corrective actions categorized (detect/prevent/mitigate/process)
- Action items assigned with owners and due dates
- Blameless language verified
- Postmortem document generated
- Stakeholders notified
- Action items tracked in project management tool
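
Two of the checklist items (categorized corrective actions, owners and due dates assigned) lend themselves to a mechanical check. The following is a minimal sketch under the assumption that action items are held as simple records; `ActionItem` and `problems` are hypothetical names, not part of the command.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# The four categories named in this document.
CATEGORIES = ("detect", "prevent", "mitigate", "process")


@dataclass
class ActionItem:
    category: str
    description: str
    owner: Optional[str] = None
    due: Optional[date] = None


def problems(item: ActionItem) -> List[str]:
    """List what is missing before the item can be tracked."""
    issues = []
    if item.category not in CATEGORIES:
        issues.append(f"unknown category: {item.category}")
    if not item.owner:
        issues.append("missing owner")
    if item.due is None:
        issues.append("missing due date")
    return issues


items = [
    ActionItem("detect", "Alert on pool >80% utilization", "@bob", date(2026, 2, 8)),
    # Deliberately incomplete, to show what the check reports.
    ActionItem("mitigate", "Runbook: connection pool exhaustion diagnosis"),
]

for item in items:
    print(item.description, "->", problems(item) or "ok")
```

Returning every problem at once, rather than stopping at the first, lets an incomplete item be fixed in a single pass.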
Failure Indicators
- ⛔ Blame language - postmortem targets individuals instead of systems
- ⛔ Shallow root cause - stopped at first "why" instead of iterating to systemic cause
- ⛔ No action items - postmortem documents the incident but drives no improvement
- ⛔ Missing timeline - events described but not chronologically ordered
When NOT to Use
- ❌ Minor incidents that don't require a formal postmortem (use the incident log instead)
- ❌ Ongoing incidents (use /incident-response first, postmortem after resolution)
- ❌ Security incidents requiring confidentiality (use --template security with limited distribution)
Anti-Patterns
- ❌ Blame-focused language - "Engineer X forgot to..." instead of "Process lacks safeguard..."
- ❌ Skipping 5-Whys - jumping to solution without understanding root cause
- ❌ Action items without owners - corrective actions never get completed
- ❌ Postmortem without review - document written but never discussed with team
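
The blame-focused anti-pattern can be partly screened for with a crude text heuristic. The sketch below is an assumption-laden illustration, not the command's actual blameless check: the word lists in `BLAME_PATTERN` are hypothetical, and a regular expression will produce both false positives and misses, so it can only supplement a human review.

```python
import re

# Hypothetical heuristic: a person reference (role word or @handle)
# followed, within the same sentence, by a fault-assigning phrase.
BLAME_PATTERN = re.compile(
    r"(?:\b(?:engineer|developer|operator)\b|@\w+)"
    r"[^.]*?\b(?:forgot|failed to|neglected|should have)\b",
    re.IGNORECASE,
)


def flag_blame(text: str) -> list:
    """Return the sentences that match the blame heuristic."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if BLAME_PATTERN.search(s)]


draft = (
    "Engineer X forgot to close the connection. "
    "The process lacks a safeguard for connection cleanup."
)
print(flag_blame(draft))
# prints: ['Engineer X forgot to close the connection.']
```

Flagged sentences are candidates for rewording toward the system or process gap, in the spirit of "Process lacks safeguard..." above.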
Principles
- #3 Complete Execution - Full postmortem lifecycle (timeline → analysis → actions → follow-up)
- #9 Based on Facts - Use objective timeline data and metrics, avoid speculation
Full Standard: CODITECT-STANDARD-AUTOMATION.md