Playbook: Blameless Post-Mortems Playbook

🌐 Purpose

To establish a consistent, psychologically safe, and learning-focused approach to conducting post-incident reviews. This playbook ensures that teams turn failures into fuel for improvement, without fear, blame, or defensiveness.


⚖️ Principles

  • Focus on systems and processes, not individuals
  • Assume everyone did their best with the information available
  • Surface and share learnings across teams
  • Prioritise transparency, humility, and continuous improvement
  • Translate insights into actionable change

✅ Outcomes We Expect

  • Stronger systems through shared learning
  • Reduced recurrence of similar incidents
  • A culture of psychological safety and openness
  • Better cross-team alignment on risk, ownership, and design

⚡ When to Run a Post-Mortem

  • Severity 1 or 2 incidents (customer/business impact)
  • Near misses or high-risk bugs caught before production
  • Any event that uncovered a systemic failure or process gap
  • At team discretion when learning potential is high
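
The criteria above can be expressed as a simple check. This is a minimal sketch, not part of any incident tool: the `Incident` class, its field names, and the numeric severity scale (1 is most severe) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical incident record; field names are illustrative."""
    severity: int                 # assumed scale: 1 is most severe
    near_miss: bool = False       # caught before reaching production
    systemic_gap: bool = False    # exposed a systemic failure or process gap
    team_requested: bool = False  # team judges learning potential to be high


def needs_post_mortem(incident: Incident) -> bool:
    """Return True if any of the playbook's trigger criteria apply."""
    return (
        incident.severity in (1, 2)
        or incident.near_miss
        or incident.systemic_gap
        or incident.team_requested
    )
```

Any single criterion is enough, so a low-severity near miss still qualifies for a review.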

📊 Post-Mortem Process

1. Incident Review Triggered

  • Auto-trigger from the incident management tool (e.g., when a Sev1 incident is closed)
  • Assign a facilitator who was not one of the incident responders
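
The facilitator rule can be sketched as a small selection function. The function name, the candidate list, and the idea of a facilitator rota are assumptions for illustration; the playbook only requires that the facilitator was not responding to the incident.

```python
def pick_facilitator(candidates: list[str], responders: set[str]) -> str:
    """Return the first candidate who did not respond to the incident."""
    for person in candidates:
        if person not in responders:
            return person
    # Every candidate was involved, so neutrality cannot be guaranteed.
    raise ValueError("no neutral facilitator available; ask another team")
```

Raising an error when nobody qualifies makes the gap visible instead of silently assigning a responder.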

2. Data Collection & Timeline

  • Gather logs, monitoring data, Slack threads, call transcripts
  • Build a shared timeline of the incident events (who, what, when)
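
One way to build the shared timeline is to normalise events from each source into a common shape and sort them by time. The sketch below assumes a hypothetical `Event` record with who, what, and when fields plus a source label; it does not call any real logging or chat API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """Hypothetical timeline entry; field names are illustrative."""
    when: datetime  # use a single timezone across all sources
    who: str        # person or system involved
    what: str       # what was observed or done
    source: str     # e.g. "logs", "monitoring", "chat"


def build_timeline(*sources: list[Event]) -> list[Event]:
    """Merge events from every source into one chronological list."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: e.when)


def format_timeline(events: list[Event]) -> str:
    """Render one line per event for pasting into the review document."""
    return "\n".join(
        f"{e.when:%H:%M} | {e.who} | {e.what} ({e.source})" for e in events
    )
```

Keeping the source label on each entry lets participants trace a timeline line back to the log, dashboard, or thread it came from.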

3. Facilitated Review Meeting

  • Invite relevant participants (including those outside the team)
  • Create a psychologically safe space (restate blameless principle)
  • Walk through timeline collaboratively, ask open questions

4. Identify Contributing Factors

  • Focus on the conditions and decision-making context, not on assigning blame
  • Use techniques like “5 whys” or causal loop diagrams
  • Capture gaps in process, tooling, communication, or design

5. Define Follow-Up Actions

  • Identify both tactical remediations and systemic improvements
  • Assign owners and due dates for each action
  • Add to backlog, OKRs, or work tracker
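
Before the actions go into a tracker, a quick completeness check can catch items without an owner or due date. This is a sketch under assumptions: the `Action` class, the "tactical" and "systemic" labels as string values, and the wording of the messages are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Action:
    """Hypothetical follow-up action; field names are illustrative."""
    description: str
    kind: str                    # "tactical" or "systemic"
    owner: Optional[str] = None
    due: Optional[date] = None


def find_gaps(actions: list[Action]) -> list[str]:
    """List what is missing before the actions can be tracked."""
    problems = []
    for a in actions:
        if not a.owner:
            problems.append(f"'{a.description}' has no owner")
        if a.due is None:
            problems.append(f"'{a.description}' has no due date")
    # The playbook asks for both tactical and systemic follow-ups.
    kinds = {a.kind for a in actions}
    for required in ("tactical", "systemic"):
        if required not in kinds:
            problems.append(f"no {required} action defined")
    return problems
```

An empty result means every action has an owner and a due date, and both kinds of follow-up are represented.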

6. Publish & Share Learnings

  • Write a short, structured incident review (template below)
  • Share in cross-team channels, learning repositories, or show & tells

🔹 Structured Review Template

  • Title: Clear, descriptive title
  • Summary: What happened, impact, response
  • Timeline: Sequence of events
  • Contributing Factors: The conditions that combined to cause the incident, rather than a single root cause
  • What Went Well: Acknowledge effective response elements
  • Areas for Improvement: Systemic insights
  • Actions: Concrete remediations and improvements
  • Links: Related logs, dashboards, tickets, etc.
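
The template above can be enforced with a small renderer that refuses to produce a review when a section is empty. The section names come from the template; the `render_review` function and the plain-text output layout are assumptions for illustration.

```python
# Section names taken from the structured review template, in order.
SECTIONS = [
    "Title",
    "Summary",
    "Timeline",
    "Contributing Factors",
    "What Went Well",
    "Areas for Improvement",
    "Actions",
    "Links",
]


def render_review(content: dict[str, str]) -> str:
    """Render a review as plain text, failing if any section is empty."""
    missing = [s for s in SECTIONS if not content.get(s, "").strip()]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    return "\n\n".join(f"{s}:\n{content[s].strip()}" for s in SECTIONS)
```

Failing on an empty "What Went Well" section is deliberate in this sketch, since it is the part most often skipped.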

⚙️ Tooling & Automation

  • Incident tools: PagerDuty, Opsgenie, FireHydrant
  • Documentation: Confluence, Notion, Google Docs
  • Workflow tracking: Jira, Linear, Trello
  • Communication: Slack/Teams integrations to nudge post-mortem creation

🔄 Continuous Improvement

  • Review follow-up actions in retrospectives
  • Track action closure rate and time-to-learn (the time from incident to published post-mortem)
  • Regularly review themes and patterns across post-mortems
  • Create summary digests or "quarterly incident learning" sessions
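
Recurring themes can be surfaced by tagging each post-mortem and counting how many reviews share a tag. The sketch assumes each post-mortem has been labelled with free-text theme tags; the tag values and the `repeat_themes` function are hypothetical.

```python
from collections import Counter


def repeat_themes(post_mortems: list[list[str]], min_count: int = 2) -> dict[str, int]:
    """Count themes that appear in at least `min_count` post-mortems.

    Each inner list holds the theme tags of one post-mortem. A theme is
    counted once per post-mortem even if it was tagged twice.
    """
    counts = Counter(
        theme for themes in post_mortems for theme in set(themes)
    )
    return {t: n for t, n in counts.most_common() if n >= min_count}
```

For example, three reviews tagged `["alerting gap", "manual deploy"]`, `["alerting gap"]`, and `["unclear ownership", "alerting gap", "manual deploy"]` would report "alerting gap" three times and "manual deploy" twice.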

🔧 Key Roles

  • Facilitator: Guides the session neutrally
  • Incident Responder(s): Share their first-hand experience of the incident
  • Engineering Owner: Accountable for follow-up
  • Scribe: Captures notes, actions, and decisions

📈 Metrics to Monitor

  • % of incidents with completed post-mortems
  • % of actions closed within 30 days
  • Number of repeat incident themes
  • Average time from incident to published post-mortem
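
Three of these metrics can be computed from two simple record types. This is a sketch under stated assumptions: the `IncidentRecord` and `ActionRecord` classes are hypothetical, the closure percentage is taken over all actions (open ones count as not closed in time), and the average ignores incidents with no published post-mortem.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class IncidentRecord:
    occurred: date
    published: Optional[date] = None  # None until the post-mortem is published


@dataclass
class ActionRecord:
    opened: date
    closed: Optional[date] = None  # None while the action is still open


def pct_with_post_mortem(incidents: list[IncidentRecord]) -> float:
    """Percentage of incidents that have a published post-mortem."""
    if not incidents:
        return 0.0
    done = sum(1 for i in incidents if i.published is not None)
    return 100.0 * done / len(incidents)


def pct_actions_closed_within(actions: list[ActionRecord], days: int = 30) -> float:
    """Percentage of all actions closed within `days` of being opened."""
    if not actions:
        return 0.0
    on_time = sum(
        1 for a in actions
        if a.closed is not None and (a.closed - a.opened).days <= days
    )
    return 100.0 * on_time / len(actions)


def avg_days_to_publish(incidents: list[IncidentRecord]) -> Optional[float]:
    """Mean days from incident to publication, or None if none are published."""
    gaps = [(i.published - i.occurred).days for i in incidents if i.published]
    return sum(gaps) / len(gaps) if gaps else None
```

Whether open actions belong in the denominator is a design choice; including them keeps the number from looking better simply because slow actions have not been closed yet.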

🔑 Governance Link

This playbook supports:

  • Policy: Post-Incident Learning Culture, Psychological Safety First
  • Standards: Conduct Blameless Post-Mortems, Classify Incidents, Ensure Every Post-Mortem Results in Concrete Actions, Share Learnings Across Teams

📖 Further Reading

  • "Blameless PostMortems and a Just Culture" – Etsy
  • "Site Reliability Engineering" – Google SRE Book (Chapter 15)
  • "How to Run a Post-Mortem" – Incident.io Guide