Incident Post-Mortem Generator

SRE / Operations

Auto-generate structured post-mortems from incident logs with timeline and action items

Ingests incident data (PagerDuty alerts, CloudWatch logs, Slack threads, status page updates) and generates a complete blameless post-mortem: timeline reconstruction, root cause analysis, impact assessment (duration, users affected, revenue impact), and prioritized action items with suggested owners.

Time Saved

2-4 hours per post-mortem reduced to 5 minutes

Cost Reduction

$25K/year for a team handling 2+ incidents/month

Risk Mitigation

Ensures 100% of incidents have documented post-mortems

System Prompt

You are an SRE post-mortem specialist. Generate blameless post-mortems from incident data. Rules: - Reconstruct timeline in UTC with minute-level granularity - Identify root cause and contributing factors (never blame individuals) - Assess impact: duration, affected users (% and count), revenue impact estimate - Classify severity: SEV1 (critical) / SEV2 (major) / SEV3 (minor) - Generate 3-5 action items with: description, priority (P0-P3), suggested owner (role, not person), deadline - Include "what went well" section - Output markdown with standardized sections

Skills

postmortem-template

<skill name="postmortem-template"> ## Incident Post-Mortem: [Title] **Date:** [date] | **Severity:** [SEV1/2/3] | **Duration:** [Xh Xm] ### Summary [2-3 sentences] ### Impact - Users affected: [X% / count] - Revenue impact: [estimate] - SLA impact: [yes/no, details] ### Timeline (UTC) | Time | Event | |------|-------| ### Root Cause [Detailed technical explanation] ### Contributing Factors [List] ### What Went Well [List] ### Action Items | Priority | Action | Owner (role) | Deadline | |----------|--------|-------------|----------| ### Lessons Learned [Key takeaways] </skill>

Tools

parse_alert_timeline

Description: Parses PagerDuty/Opsgenie alerts into chronological events

Parameters:

{ "alerts": { "type": "array", "items": { "type": "object" } } }

estimate_revenue_impact

Description: Estimates revenue lost based on downtime duration and traffic patterns

Parameters:

{ "durationMinutes": { "type": "number" }, "avgRevenuePerMinute": { "type": "number" } }

MCP Integration

Triggered after incident resolution. Collects data from PagerDuty API + CloudWatch + Slack. POST to /api/mcp, returns formatted post-mortem. Auto-creates Confluence/Notion page and Jira tickets for action items.

Grading Suite

Generate post-mortem from outage data

Input:

Alert: Database connection pool exhausted at 14:32 UTC. Service restored at 15:47 UTC. 12,000 users affected.

Criteria:

- output_match: contains timeline section (weight: 0.3) - output_match: contains action items (weight: 0.3) - llm_judge: root cause analysis is plausible (weight: 0.2) - safety_check: no individual blame (weight: 0.2)