
Why Most Postmortems Are Useless (And How to Fix Them)
Every engineering team claims to do blameless postmortems. Most of them don't, even if they think they do.
Postmortems fail in predictable ways. They turn into blame sessions where everyone defends their decisions. They produce 20-page documents nobody reads. They generate long lists of action items that never get done. And then the same incident happens again six months later.
If your postmortems feel like a checkbox exercise rather than actual learning, here's what's probably going wrong and how to fix it.
The Problem With "5 Whys"
A lot of postmortem guides recommend the "5 Whys" technique. You ask "why" five times to get to the root cause. In theory, this works. In practice, it often doesn't.
The problem is that complex systems don't have a single root cause. There are contributing factors, cascading failures, and decisions that made sense at the time but look questionable in hindsight. Forcing everything into a linear chain of whys creates false certainty.
Use the 5 Whys if it's helping you understand the incident. But if you find yourself forcing it or if the answers feel contrived, skip it. Focus on building an accurate timeline instead.
Timeline Accuracy Matters More Than You Think
The single most valuable part of a postmortem is the timeline. Not the root cause analysis, not the action items, but the boring chronological list of what happened when.
A good timeline includes:
- When the issue started (based on monitoring, not when someone noticed)
- When alerts fired
- When engineers responded
- What hypotheses they tested and what they found
- When the issue was mitigated
- When it was fully resolved
This sounds obvious, but most postmortems get the timeline wrong. People reconstruct events from memory instead of checking logs. They skip over dead ends or failed hypotheses because they seem embarrassing. And they treat "when we noticed" as "when it started," which hides monitoring gaps.
Building an accurate timeline takes effort. You need to check logs, correlate timestamps, and interview everyone involved. But it's worth it, because the timeline reveals patterns that narrative explanations miss.
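If your monitoring, paging, and chat tools all record timestamps, a lot of that correlation work can be scripted. Here's a minimal Python sketch, not a real tool: the `Event` record, the source labels ("monitoring", "pager", "chat"), and the sample events are all made up for illustration. It normalizes every timestamp to UTC, since mixed timezones are an easy way to get the order wrong. It then merges everything into one chronological list and measures the gap between when monitoring first showed impact and when the alert fired. That gap is exactly the monitoring hole that "when we noticed" hides.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Event:
    source: str   # where the event came from (hypothetical labels, e.g. "monitoring")
    ts: datetime  # timezone-aware, normalized to UTC
    note: str


def parse(source: str, raw_ts: str, note: str) -> Event:
    """Parse an ISO-8601 timestamp and normalize it to UTC."""
    ts = datetime.fromisoformat(raw_ts)
    if ts.tzinfo is None:
        # Refuse to guess a timezone: that's how timelines end up off by hours.
        raise ValueError(f"timestamp from {source} has no UTC offset: {raw_ts}")
    return Event(source, ts.astimezone(timezone.utc), note)


def build_timeline(events: List[Event]) -> List[Event]:
    """Merge events from every source into one chronological list."""
    return sorted(events, key=lambda e: e.ts)


def gap(timeline: List[Event], from_source: str, to_source: str) -> timedelta:
    """Time from the first event of one source to the first event of another."""
    start = next(e.ts for e in timeline if e.source == from_source)
    end = next(e.ts for e in timeline if e.source == to_source)
    return end - start


# Made-up events from three sources, deliberately in mixed timezones and out of order.
events = [
    parse("chat", "2024-09-12T14:20:00+00:00", "On-call acknowledges the page"),
    parse("monitoring", "2024-09-12T09:58:00-04:00", "Checkout error rate rises above baseline"),
    parse("pager", "2024-09-12T14:15:00+00:00", "High error rate alert fires"),
]

timeline = build_timeline(events)
for e in timeline:
    print(e.ts.isoformat(), e.source, e.note)

print("impact to alert:", gap(timeline, "monitoring", "pager"))  # 0:17:00
print("alert to response:", gap(timeline, "pager", "chat"))      # 0:05:00
```

Rejecting timestamps without a UTC offset is deliberate. It's better for the script to fail loudly than to quietly put the alert before the impact.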
Keep It Short
Nobody reads 15-page postmortem documents. They're too long, too detailed, and too boring. If you want your postmortems to actually get read, keep them short.
A good postmortem fits on two pages. One page for the timeline and key facts, one page for what you learned and what you're going to do differently. That's it.
If you feel like you need more space, you're probably over-explaining. Postmortems are not the place to document every technical detail or justify every decision. Save that for your incident chat logs. The postmortem should be a distilled summary.
Action Items With Owners and Deadlines
Most postmortems end with a list of action items like "improve monitoring" or "add better error handling." These items are vague, have no owner, and never get done.
If you want action items to actually happen, make them specific and assign them to someone with a deadline.
Bad action item: "Improve database monitoring"
Good action item: "Add query latency alerting to the orders database (Owner: Sarah, Due: Sept 30)"
And here's the critical part: review action items from past postmortems regularly. If you discover that 80% of your action items never get done, stop writing so many. Better to have two action items that get completed than ten that get ignored.
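That review is much easier if action items live somewhere structured instead of at the bottom of old documents. Here's a minimal Python sketch, assuming a hypothetical `ActionItem` record (in practice these would more likely be tickets in your issue tracker). It flags items with no owner or deadline, computes the completion rate across past postmortems, and lists overdue items. The year on Sarah's deadline and the third item are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ActionItem:
    description: str
    owner: Optional[str] = None
    due: Optional[date] = None
    done: bool = False


def is_trackable(item: ActionItem) -> bool:
    """An item can only be followed up on if someone owns it and it has a deadline."""
    return item.owner is not None and item.due is not None


def completion_rate(items: List[ActionItem]) -> float:
    """Fraction of action items that actually got done (0.0 if there are none)."""
    if not items:
        return 0.0
    return sum(item.done for item in items) / len(items)


def overdue(items: List[ActionItem], today: date) -> List[ActionItem]:
    """Open items whose deadline has already passed."""
    return [i for i in items if not i.done and i.due is not None and i.due < today]


# Illustrative data: the year and the third item are invented.
items = [
    ActionItem("Improve database monitoring"),
    ActionItem("Add query latency alerting to the orders database",
               owner="Sarah", due=date(2024, 9, 30)),
    ActionItem("Add automated pre-deployment check",
               owner="Platform team", due=date(2024, 9, 15), done=True),
]

print([i.description for i in items if not is_trackable(i)])
print(f"completion rate: {completion_rate(items):.0%}")  # 33%
print([i.description for i in overdue(items, today=date(2024, 10, 1))])
```

If `completion_rate` keeps coming back low, that's your cue to write fewer action items.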
Blame Happens Even in Blameless Postmortems
The idea of a blameless postmortem is that you focus on systemic issues, not individual mistakes. But in practice, blame sneaks in through tone and framing.
Saying "The engineer did not check the deployment checklist" is blame. Saying "We don't have an automated pre-deployment check, so manual steps get skipped during urgent deploys" is systemic.
One way to keep postmortems actually blameless is to avoid naming individuals wherever possible. Instead of "John deployed the bad config," write "A bad config was deployed." This shifts focus from who did it to why the system allowed it to happen.
When to Skip the Postmortem
Not every incident needs a postmortem. If the issue was trivial, well-understood, and quickly resolved, writing a postmortem is busywork.
A good rule of thumb: if the incident taught you something new or exposed a gap in your systems, write a postmortem. If it was routine and boring, skip it.
Some teams do "postmortem lite" for minor incidents, which is just a quick Slack summary with a timeline and one or two takeaways. That works too. The format doesn't matter as much as whether you're actually learning.
Make Them Readable
Postmortems should be written for someone who wasn't there. That means avoiding jargon, explaining acronyms, and not assuming context.
Also, make them easy to find. Don't bury postmortems in a forgotten corner of a wiki or a Google Drive folder nobody checks. Keep them in a central, searchable location. When a similar incident happens later, being able to quickly pull up the old postmortem saves a ton of time.
The Goal Is Learning, Not Process
A postmortem is a tool, not a ritual. If your postmortems are teaching your team how to prevent future incidents and improving your systems, you're doing them right. If they're just checking a box, you're wasting time.
Focus on accuracy, clarity, and actionable takeaways. Skip the fluff. And if an action item isn't going to get done, don't write it down. Better to be honest about what you will and won't fix than to pretend you're going to fix everything.