
Post-Incident Postmortem Initiator

Purpose

  1. Automatically starts the postmortem workflow when an incident is resolved or closed
  2. Creates a postmortem document from a standardized template
  3. Assigns a postmortem owner (typically the incident commander or a senior engineer)
  4. Schedules a review meeting within a severity-based window after resolution (24 hours for S1, longer for lower severities)
  5. Creates corrective action (CAP) tickets in four categories: detection, prevention, mitigation, and process
  6. Sends notifications to incident participants and stakeholders
  7. Promotes blameless organizational learning from every incident

Trigger

Event Type: Incident Resolution Event

Source: Incident management system (PagerDuty, Opsgenie, custom webhook)

Blocking: No (non-blocking background job)

Timeout: 60 seconds

Trigger Condition: Incident status changes to resolved or closed after being in the active or acknowledged state
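The trigger condition is a check on the previous and new status. Here is a minimal Python sketch. It assumes the provider's statuses have already been normalized to the lowercase names used here, since real providers use their own vocabularies. `should_initiate_postmortem` is a hypothetical helper name.

```python
# Hypothetical normalized status names; map your provider's own states
# onto these before calling the check.
ACTIVE_STATES = frozenset({"active", "acknowledged"})
TERMINAL_STATES = frozenset({"resolved", "closed"})


def should_initiate_postmortem(previous_status: str, new_status: str) -> bool:
    """True only for a transition from an active/acknowledged state into
    resolved/closed; duplicate or resolved -> closed events are ignored."""
    prev = previous_status.strip().lower()
    new = new_status.strip().lower()
    return prev in ACTIVE_STATES and new in TERMINAL_STATES


print(should_initiate_postmortem("acknowledged", "resolved"))  # True
print(should_initiate_postmortem("resolved", "closed"))        # False
```

Under this reading, a resolved incident that is later closed does not start a second postmortem, because `resolved` is not an active state.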

Behavior

When Triggered

  1. Receives incident webhook with:
    • Incident ID, title, severity (S1-S4), duration
    • Incident commander, responders list
    • Start time, detection time, resolution time
    • Brief summary/description
  2. Creates the postmortem document in the shared wiki/docs platform:
    • Title: [POSTMORTEM-YYYY-MM-DD] {incident_title}
    • Sections: Executive Summary, Timeline, Root Cause Analysis, Impact Assessment, Action Items
    • Pre-fills it with incident metadata and responders
    • Sets document permissions for team access
  3. Identifies postmortem owner:
    • Priority 1: Incident commander (if available)
    • Priority 2: Most senior responder
    • Priority 3: On-call manager
  4. Records the owner in the postmortem document and notifies them
  5. Schedules the postmortem review meeting:
    • S1: within 24 hours
    • S2: within 48 hours
    • S3: within 72 hours
    • S4: within 120 hours (5 days)
    • Invites all responders and stakeholders
    • Adds the meeting to the team calendar with reminders 24 hours and 1 hour before
  6. Creates CAP tickets for four areas:
    • Detection: How can we detect this earlier next time?
    • Prevention: How can we prevent this from happening?
    • Mitigation: How can we reduce impact if it does happen?
    • Process: What process gaps contributed to this?
  7. Links all CAP tickets to the postmortem document and the original incident
  8. Sends a notification to the #incidents and #incident-postmortems Slack channels
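The owner selection in step 3 is a simple priority chain. The sketch below shows one way to express it. The `Person` type, its integer `seniority` rank, and the `available` flag are illustrative assumptions, because the hook description does not say how seniority or availability is determined.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Person:
    name: str
    seniority: int = 0      # hypothetical rank: higher means more senior
    available: bool = True  # hypothetical availability flag


def select_postmortem_owner(commander: Optional[Person],
                            responders: Sequence[Person],
                            on_call_manager: Person) -> Person:
    # Priority 1: the incident commander, if there is one and they are available.
    if commander is not None and commander.available:
        return commander
    # Priority 2: the most senior available responder (first one wins ties).
    candidates = [r for r in responders if r.available]
    if candidates:
        return max(candidates, key=lambda r: r.seniority)
    # Priority 3: the on-call manager, which is also the fallback
    # when owner identification fails.
    return on_call_manager
```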

Configuration

File: .coditect/config/postmortem-hook.json

```json
{
  "enabled": true,
  "timeout_seconds": 60,
  "incident_provider": "pagerduty",
  "incident_api_url": "https://api.pagerduty.com/v2/",
  "postmortem_creation": {
    "enabled": true,
    "platform": "confluence",
    "space_key": "INCIDENTS",
    "template_key": "postmortem-standard",
    "auto_assign": true
  },
  "meeting_scheduling": {
    "enabled": true,
    "calendar_provider": "google_calendar",
    "calendar_id": "team-incidents@company.com",
    "timezone": "UTC",
    "timing_by_severity": {
      "S1": 24,
      "S2": 48,
      "S3": 72,
      "S4": 120
    },
    "default_duration_minutes": 60
  },
  "cap_ticket_creation": {
    "enabled": true,
    "project_key": "INFRA",
    "issue_type": "Task",
    "labels": ["postmortem", "corrective-action"],
    "priority_mapping": {
      "S1": "HIGH",
      "S2": "MEDIUM",
      "S3": "LOW",
      "S4": "LOWEST"
    }
  },
  "notifications": {
    "slack_channel": "#incident-postmortems",
    "include_summary": true,
    "include_cap_tickets": true,
    "include_meeting_link": true
  }
}
```
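The `timing_by_severity` values are hours (S1: 24 matches the 24-hour S1 window). The following minimal sketch turns them into the latest acceptable review time. It assumes the hours are counted as calendar hours from resolution, so skipping weekends or holidays would need extra logic. It also uses a trimmed inline copy of the config in place of reading the real file, and `review_deadline` is a hypothetical helper.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the "meeting_scheduling" section; a real hook would load
# the full postmortem-hook.json file instead.
CONFIG_JSON = """
{
  "meeting_scheduling": {
    "timezone": "UTC",
    "timing_by_severity": {"S1": 24, "S2": 48, "S3": 72, "S4": 120},
    "default_duration_minutes": 60
  }
}
"""


def review_deadline(config: dict, severity: str, resolved_at: datetime) -> datetime:
    """Resolution time plus the severity's window, treated as calendar hours."""
    hours = config["meeting_scheduling"]["timing_by_severity"][severity]
    return resolved_at + timedelta(hours=hours)


config = json.loads(CONFIG_JSON)
resolved = datetime(2024, 5, 10, 14, 30, tzinfo=timezone.utc)
print(review_deadline(config, "S2", resolved).isoformat())
# 2024-05-12T14:30:00+00:00
```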

Integration

Integrates with:

  • Incident management platform (PagerDuty, Opsgenie, custom)
  • Documentation platform (Confluence, Google Docs, Notion)
  • Calendar system (Google Calendar, Outlook)
  • Issue tracking system (Jira, GitHub Issues)
  • Slack API for notifications
  • Internal database for incident history tracking

Output

Postmortem Document:

  • Title: [POSTMORTEM-YYYY-MM-DD] {incident_title}
  • Sections: Executive Summary, Timeline, Root Cause Analysis, Impact Assessment, Action Items
  • Pre-filled fields: Incident severity, duration, responders, detection/resolution times
  • Owner assigned and tagged in doc
  • Permissions: Team view, responders can edit
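Building the title and pre-filled header is plain string formatting. The sketch below assumes a hypothetical normalized `incident` dict (keys listed in the docstring). It takes the title date from the resolution time in UTC, which the spec does not pin down. It fills severity, detection and resolution times, and responders, and leaves each template section empty for the owner. Converting the plain-text body into the docs platform's own format is out of scope.

```python
from datetime import datetime, timezone

# Section names from the postmortem document description above.
SECTIONS = ["Executive Summary", "Timeline", "Root Cause Analysis",
            "Impact Assessment", "Action Items"]


def build_postmortem(incident: dict) -> tuple[str, str]:
    """Return (title, body) for a new postmortem page.

    `incident` is a hypothetical payload with keys: title, severity (int),
    detected_at, resolved_at (timezone-aware datetimes), responders.
    """
    resolved = incident["resolved_at"]
    # Assumption: the title date is the resolution date in UTC.
    date_str = resolved.astimezone(timezone.utc).date().isoformat()
    title = f"[POSTMORTEM-{date_str}] {incident['title']}"
    header = "\n".join([
        f"Severity: S{incident['severity']}",
        f"Detected: {incident['detected_at'].isoformat()}",
        f"Resolved: {resolved.isoformat()}",
        "Responders: " + ", ".join(incident["responders"]),
    ])
    # Each template section starts empty for the owner to fill in.
    sections = "\n\n".join(f"{name}\n" for name in SECTIONS)
    return title, header + "\n\n" + sections


incident = {
    "title": "Checkout API errors",
    "severity": 2,
    "detected_at": datetime(2024, 5, 10, 13, 5, tzinfo=timezone.utc),
    "resolved_at": datetime(2024, 5, 10, 14, 35, tzinfo=timezone.utc),
    "responders": ["alice", "bob"],
}
title, body = build_postmortem(incident)
print(title)  # [POSTMORTEM-2024-05-10] Checkout API errors
```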

CAP Tickets Created (4 tickets):

  1. Detection CAP: [S{severity}] [Detection] {incident_title}
  2. Prevention CAP: [S{severity}] [Prevention] {incident_title}
  3. Mitigation CAP: [S{severity}] [Mitigation] {incident_title}
  4. Process CAP: [S{severity}] [Process] {incident_title}

Each CAP ticket includes:

  • Description template with guiding questions
  • Links to postmortem and original incident
  • Assigned to incident commander or team lead
  • Priority mapped from incident severity
  • Labels: postmortem, corrective-action, {cap_type}
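The four tickets differ only in category, guiding question, and extra label. The tracker-agnostic sketch below reuses the `priority_mapping` and `labels` values from the configuration. The dict field names and the `build_cap_tickets` helper are illustrative and do not follow the Jira or GitHub Issues schema.

```python
# Guiding questions per CAP category, taken from the Behavior section.
CAP_TYPES = {
    "detection": "How can we detect this earlier next time?",
    "prevention": "How can we prevent this from happening?",
    "mitigation": "How can we reduce impact if it does happen?",
    "process": "What process gaps contributed to this?",
}
PRIORITY_MAPPING = {"S1": "HIGH", "S2": "MEDIUM", "S3": "LOW", "S4": "LOWEST"}
BASE_LABELS = ["postmortem", "corrective-action"]


def build_cap_tickets(severity: int, incident_title: str, assignee: str,
                      postmortem_url: str, incident_url: str) -> list[dict]:
    """Return one ticket spec per CAP category (illustrative field names)."""
    tickets = []
    for cap_type, question in CAP_TYPES.items():
        tickets.append({
            "summary": f"[S{severity}] [{cap_type.capitalize()}] {incident_title}",
            "description": (f"{question}\n\nPostmortem: {postmortem_url}\n"
                            f"Incident: {incident_url}"),
            "assignee": assignee,
            "priority": PRIORITY_MAPPING[f"S{severity}"],
            # A new list per ticket, so BASE_LABELS itself is never mutated.
            "labels": BASE_LABELS + [cap_type],
        })
    return tickets
```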

Calendar Event:

  • Title: Postmortem Review: {incident_title}
  • Duration: 60 minutes
  • Attendees: Incident commander, responders, team lead, stakeholders
  • Description: Links to postmortem doc and CAP tickets
  • Reminders: 24 hours and 1 hour before the meeting
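With Google Calendar as the provider, the event body could look like the sketch below. The field names (`summary`, `description`, `start`/`end` with `dateTime` and `timeZone`, `attendees`, `reminders.overrides`) follow the Calendar API v3 Events resource, and the result could be passed to an `events.insert` call. The helper name, the reminder methods, and the decision to deduplicate attendees are assumptions. Picking a free slot before the severity deadline is not shown.

```python
from datetime import datetime, timedelta, timezone


def build_review_event(incident_title: str, start: datetime,
                       attendee_emails: list[str], doc_url: str,
                       cap_urls: list[str], duration_minutes: int = 60,
                       tz: str = "UTC") -> dict:
    """Event body shaped like a Google Calendar API v3 Events resource."""
    end = start + timedelta(minutes=duration_minutes)
    return {
        "summary": f"Postmortem Review: {incident_title}",
        "description": "\n".join([f"Postmortem doc: {doc_url}", *cap_urls]),
        "start": {"dateTime": start.isoformat(), "timeZone": tz},
        "end": {"dateTime": end.isoformat(), "timeZone": tz},
        # Responders may also be stakeholders, so deduplicate addresses.
        "attendees": [{"email": e} for e in sorted(set(attendee_emails))],
        "reminders": {
            "useDefault": False,
            # Reminders 24 hours and 1 hour before the meeting.
            "overrides": [
                {"method": "email", "minutes": 24 * 60},
                {"method": "popup", "minutes": 60},
            ],
        },
    }


event = build_review_event(
    "DB failover",
    datetime(2024, 5, 11, 15, 0, tzinfo=timezone.utc),
    ["bob@example.com", "alice@example.com", "bob@example.com"],
    "https://wiki.example.com/pm/123",
    ["https://jira.example.com/browse/INFRA-1"],
)
```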

Slack Notification:

```text
Incident Postmortem Initiated

Incident: {title} (Severity: S{severity})
Duration: {hours}h {minutes}m
Owner: {owner_name}

Postmortem doc: {doc_link}
Review meeting: {calendar_link}

CAP Tickets:
- Detection: {ticket_link}
- Prevention: {ticket_link}
- Mitigation: {ticket_link}
- Process: {ticket_link}
```
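Rendering this message is template substitution plus splitting the incident duration into whole hours and leftover minutes. The sketch below uses a hypothetical `render_slack_message` helper and passes the CAP links as a label-to-URL mapping. Posting the result (for example with Slack's `chat.postMessage`) is not shown.

```python
from datetime import timedelta

TEMPLATE = """Incident Postmortem Initiated

Incident: {title} (Severity: S{severity})
Duration: {hours}h {minutes}m
Owner: {owner_name}

Postmortem doc: {doc_link}
Review meeting: {calendar_link}

CAP Tickets:
{cap_lines}"""


def render_slack_message(title: str, severity: int, duration: timedelta,
                         owner_name: str, doc_link: str, calendar_link: str,
                         cap_links: dict[str, str]) -> str:
    # Whole hours plus leftover minutes; seconds are dropped.
    hours, minutes = divmod(int(duration.total_seconds() // 60), 60)
    cap_lines = "\n".join(f"- {kind}: {link}" for kind, link in cap_links.items())
    return TEMPLATE.format(title=title, severity=severity, hours=hours,
                           minutes=minutes, owner_name=owner_name,
                           doc_link=doc_link, calendar_link=calendar_link,
                           cap_lines=cap_lines)
```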

Failure Handling

| Failure Scenario | Handling |
|---|---|
| Incident provider API unreachable | Retry with exponential backoff (3 retries); if all fail, alert #platform-oncall |
| Postmortem doc creation fails | Log the error and notify the postmortem owner by email; manual creation required |
| Calendar API unavailable | Log the error and email meeting details to participants with a manual scheduling request |
| CAP ticket creation fails | Create the remaining tickets and alert the team lead to create the missing ones manually |
| Owner identification fails | Assign to the on-call manager and notify them for reassignment |
| Slack notification fails | Log the error and continue (non-fatal) |
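The CAP-ticket row implies that each ticket is created independently, so one failure does not block the others. In the sketch below, `create` and `alert_team_lead` are hypothetical callables that stand in for the issue-tracker and alerting clients.

```python
from typing import Callable


def create_all_caps(specs: list[dict], create: Callable[[dict], str],
                    alert_team_lead: Callable[[list[str]], None]) -> list[str]:
    """Create every CAP ticket independently and return the created keys."""
    created, missing = [], []
    for spec in specs:
        try:
            created.append(create(spec))
        except Exception:  # any tracker error marks this ticket as missing
            missing.append(spec["summary"])
    if missing:
        # Ask the team lead to create the missing tickets by hand.
        alert_team_lead(missing)
    return created
```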

  • Retry Logic: 3 retries with exponential backoff (2s, 5s, 10s)
  • Alert Channel: #platform-oncall for critical failures
  • Manual Fallback: Alert the incident commander if all automation fails
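The retry policy can be sketched as a small wrapper. This assumes "3 retries" means one initial attempt plus three retries, with waits of 2s, 5s, and 10s before each retry. That adds up to 17 seconds of waiting in the worst case, which fits the 60-second hook timeout only if the attempts themselves are fast. `sleep` is injectable so the schedule can be tested without real delays, and `call_with_retries` is a hypothetical name.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T],
                      delays: Sequence[float] = (2, 5, 10),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn; after each failure wait the next delay and try again.

    With the default delays this is 1 initial attempt + 3 retries. The final
    retry's exception propagates so the caller can alert #platform-oncall.
    """
    for delay in delays:
        try:
            return fn()
        except Exception:
            sleep(delay)
    return fn()  # last retry: no except, so its error reaches the caller
```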

Related Hooks

| Hook | Relationship | Purpose |
|---|---|---|
| incident-severity-classifier | Upstream | Determines incident severity for hook trigger |
| incident-notification-dispatcher | Parallel | Sends real-time incident notifications |
| cap-ticket-tracker | Downstream | Tracks corrective action completion |
| incident-metrics-aggregator | Related | Aggregates postmortem metrics for trends |

Principles

  1. Blameless Culture: Focus on systems and processes, not individuals
  2. Timeliness: Review meetings are scheduled soon after resolution (within 24 hours for S1 and 48 hours for S2) while context is fresh
  3. Structure: Standard templates ensure consistency and completeness
  4. Actionable: CAP tickets translate learnings into concrete improvements
  5. Learning Loop: Systematic review reduces the chance of repeat incidents
  6. Transparency: All incidents documented and accessible to team
  7. Follow-Through: CAP ticket tracking ensures improvements are completed
  8. Continuous Improvement: Postmortem metrics used for trend analysis and capability building