NearIRM Team
5 min read

Incident Communication Templates: What to Say and When

Most incident playbooks cover detection, triage, and resolution. What they skip is the part that makes everyone nervous when a P1 hits: what do you actually say, to whom, and how often?

Good incident communication isn't just courtesy. It cuts down on the number of people pinging you for status, keeps leadership from escalating unnecessarily, and gives your customers enough to work with so they stop refreshing the status page every 30 seconds. Bad communication (or silence) does the opposite.

Here are templates that work in practice, with notes on timing and audience.

The First Message (Within 5 Minutes of Acknowledgment)

This goes to your internal incident channel, tagged to the alert or ticket. Don't wait until you know what's wrong. The goal is to show that a human is on it.

🚨 Incident Declared: [Service/Feature] is degraded
Severity: P1 / P2
Incident Commander: @you
Status: Investigating

We've detected an issue with [service]. Impact is [brief description, e.g., "elevated error rates on checkout"]. We're investigating now.

Next update in 15 minutes.

Two things that people get wrong here: they wait too long (trying to include a root cause they don't have yet), and they forget to name an incident commander. Name yourself or whoever is leading. It tells everyone else where to direct questions.
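To make the first message a one-liner under pressure, the template above can be rendered programmatically. A minimal sketch in Python using the standard library's `string.Template` (the field names and function are illustrative, not from any specific tool):

```python
from string import Template

# Hypothetical first-message template mirroring the one above.
FIRST_MESSAGE = Template(
    "🚨 Incident Declared: $service is degraded\n"
    "Severity: $severity\n"
    "Incident Commander: @$commander\n"
    "Status: Investigating\n\n"
    "We've detected an issue with $service. Impact is $impact. "
    "We're investigating now.\n\n"
    "Next update in 15 minutes."
)

def render_first_message(service: str, severity: str,
                         commander: str, impact: str) -> str:
    """Fill in the template; substitute() raises KeyError on a missing field,
    so a half-filled message never gets posted."""
    return FIRST_MESSAGE.substitute(
        service=service, severity=severity,
        commander=commander, impact=impact,
    )

print(render_first_message(
    "checkout", "P1", "alice", "elevated error rates on checkout"
))
```

Wiring this to a `/incident` slash command or alert hook means the responder only supplies the blanks, which is about all anyone can manage in the first five minutes.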

Stakeholder Notification (Within 10-15 Minutes for P1)

This goes to your manager, the affected team leads, and whoever owns the customer relationship. Keep it short. Executives and account managers don't need technical detail at this stage.

Subject: [INCIDENT] Checkout errors affecting [X% of users / enterprise customers]

We're currently experiencing elevated error rates on [feature/service]. 
Our team is actively investigating and working toward a fix.

- Started: [time]
- Current impact: [plain English description]
- Next update: [time]

We'll send a full update in 30 minutes or sooner if we have a resolution.

If you have a status page, link it here. If you don't have one yet, the incident status page guide explains how to set one up quickly.

The 30-Minute Update

By now you probably have a theory, even if you don't have a fix. Give an honest read on progress.

🔄 Incident Update: [Service] degradation
Status: Investigating / Mitigation in progress

What we know:
- [Symptom, e.g., "Error rates on /checkout are running at 18%, up from baseline of 0.2%"]
- [What you've ruled out, if relevant]
- [Current hypothesis, if confident]

What we're doing:
- [Action 1, e.g., "Rolling back the 14:32 deployment"]
- [Action 2, e.g., "Monitoring DB replica lag"]

Next update in 30 minutes or on resolution.

The "what we've ruled out" section is optional but underrated. It shows forward progress even when you haven't fixed anything yet, and it prevents people from suggesting things you've already tried.

Customer-Facing Status Page Update

This is different from internal communication. Customers don't know your stack, don't care about your deployment pipeline, and shouldn't have to.

Investigating (early):

We are investigating reports of errors affecting [product feature]. 
Our team is working to identify the cause. We will provide an update shortly.

Identified / Mitigation in progress:

We've identified the cause of [issue] and are working to resolve it. 
Some users may continue to experience [symptom] while we apply a fix.
Last updated: [time]

Resolved:

This incident has been resolved. [Feature] is operating normally.

Affected period: [start time] to [end time]
Root cause: [1-2 sentences, plain English]

We'll publish a full postmortem within [3-5 business days].

Keep each status page entry factual and tense-appropriate: present tense ("We are investigating") during the incident, past tense after.

Resolution Message (Internal)

When the incident is resolved, post to the incident channel before closing the call. This is the official end of the incident and kicks off the postmortem process.

✅ Incident Resolved: [Service] degradation
Duration: [start time] to [end time] ([X hours Y minutes])
Impact: [What was affected and how many users/requests]

Resolution: [What fixed it, in one or two sentences]

Action items:
- [ ] Write postmortem by [date]
- [ ] File ticket for [follow-up work]
- [ ] Update runbook: [link]

Incident Commander: @you
Thanks everyone who jumped in.

The action items section matters. Without it, the fixes and improvements discussed on the call evaporate. Someone has to own each one.

Escalation Request

Sometimes you're stuck and need to bring in someone who isn't on the current call. This is awkward to write on the fly, so having a template helps.

Hey @[person], sorry to pull you in. We're in a P1 for [service] and need your help.

Current situation: [2 sentences max]
What we've tried: [bullet list]
What we think you can help with: [specific ask]

Incident channel: #incident-[id]

Be specific about what you need from them. "Can you look at this?" is harder to act on than "Can you check the Redis cluster? We're seeing connection timeouts and want to rule out memory pressure."

After Hours / Paging Escalation

When you need to page someone outside business hours, the message they wake up to matters. Too much context and they're reading for two minutes before doing anything. Too little and they don't know what they're walking into.

P1 ALERT: [Service] is down / degraded
Impact: [one line, e.g., "All API requests returning 503"]
Started: [time]
On-call lead: @[name] in #incident-[id]

Please join if you have [specific expertise or access].

If your on-call tool supports it, attach a direct link to the incident. The fewer steps between getting paged and getting context, the better.

Timing Guidelines

Here's a practical reference for how often to communicate based on severity:

| Severity | First message | Internal updates | Stakeholder updates | Status page  |
|----------|---------------|------------------|---------------------|--------------|
| P1       | Within 5 min  | Every 15-30 min  | Every 30-60 min     | Every 30 min |
| P2       | Within 15 min | Every 30-60 min  | Every 60 min        | Every 60 min |
| P3       | Within 30 min | Every 60 min     | On resolution       | Optional     |

These are starting points, not rules. If you have meaningful progress to report, post sooner. If nothing has changed, don't post a "no update" update just to hit a cadence; that said, a brief "still investigating, no change" at the 30-minute mark does reassure people that someone is still working on it.
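The cadence table can also live in code, so your tooling can nudge the incident commander when an update is due. A small sketch under assumptions of my own: the intervals match the table (using the upper bound of each range), and the reminder scheduling is hypothetical, not a feature of any particular platform:

```python
from datetime import datetime, timedelta
from typing import Optional

# Update cadences from the table, in minutes (upper bound of each range).
# None means no scheduled update for that audience (e.g. P3 stakeholders
# hear from us on resolution; P3 status page updates are optional).
CADENCE = {
    "P1": {"internal": 30, "stakeholder": 60, "status_page": 30},
    "P2": {"internal": 60, "stakeholder": 60, "status_page": 60},
    "P3": {"internal": 60, "stakeholder": None, "status_page": None},
}

def next_update_due(severity: str, audience: str,
                    last_update: datetime) -> Optional[datetime]:
    """Return when the next update to this audience is due,
    or None if no cadence applies."""
    minutes = CADENCE[severity][audience]
    if minutes is None:
        return None
    return last_update + timedelta(minutes=minutes)
```

A bot that calls this after every posted update and sets a reminder removes the "wait, when did we last say something?" problem entirely.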

Making Templates Actually Stick

Templates only help if people use them under pressure. A few things that make that more likely:

Pin them somewhere findable. A templates doc in your runbook repository or incident wiki is fine. A Notion page that nobody can find under stress is not.

Make your on-call tool do the first message for you. NearIRM and most modern incident platforms can auto-post an acknowledgment message to Slack when an incident is declared. One less thing to think about when you're already triaging.
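If your platform can't auto-post, a thin script wired to your alerting can. A sketch using Slack's incoming-webhook API with only the standard library (the webhook URL is a placeholder you'd provision in Slack; the payload's `text` field follows Slack's documented webhook format, but everything else here is an assumption about your setup):

```python
import json
import urllib.request

def build_ack_payload(service: str, severity: str, commander: str) -> dict:
    """Build the incoming-webhook payload for the acknowledgment message."""
    return {
        "text": (
            f"🚨 Incident Declared: {service} is degraded\n"
            f"Severity: {severity}\n"
            f"Incident Commander: @{commander}\n"
            f"Status: Investigating\n"
            f"Next update in 15 minutes."
        )
    }

def post_ack(webhook_url: str, payload: dict) -> None:
    """POST the payload to a Slack incoming webhook.
    urlopen raises HTTPError on a non-2xx response, which your
    alert handler should log rather than swallow."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Triggering `post_ack` from the same hook that pages the on-call means the channel gets its first message before anyone has even opened a laptop.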

Run a practice incident. A game day or chaos engineering exercise is good for this. Run it like a real incident, including the communications. You'll find out fast where the templates break down or feel unnatural.

Review comms in your postmortem. Most postmortems focus on the technical timeline. Spend five minutes on the communication timeline too. Were stakeholders notified in time? Was the first status page update clear? Small adjustments here pay off over multiple incidents.

The goal isn't perfect prose during an outage. It's getting the right information to the right people fast enough that they can make decisions, while you focus on fixing the thing.
