# Incident Management Template
*Generated on February 28, 2026*
## Roles & Responsibilities
| Role | Assigned To |
| --- | --- |
| Incident Commander | TBD |
| Communications Lead | TBD |
| Technical Lead | TBD |
## Phase 1: Detection
### Monitoring Tools
- [x] APM
- [x] NearIRM Alerts
- [ ] Log Aggregation
- [ ] Synthetic Monitoring
- [ ] User Reports
### Alert Thresholds
Alert when the error rate stays above 1%, or p99 latency stays above 2 s, for 5 consecutive minutes.
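The threshold above can be expressed as a small check over per-minute samples. This is a minimal sketch, not a monitoring integration: the `MinuteSample` shape and the `should_alert` name are hypothetical, and it assumes each metric must breach its own limit for the full 5-minute window.

```python
from dataclasses import dataclass
from typing import Sequence

ERROR_RATE_LIMIT = 0.01     # 1% error rate, from the threshold above
P99_LATENCY_LIMIT_S = 2.0   # 2 s p99 latency, from the threshold above
WINDOW_MINUTES = 5          # consecutive minutes required before alerting


@dataclass(frozen=True)
class MinuteSample:
    """One per-minute metrics sample (hypothetical shape)."""
    error_rate: float     # fraction of failed requests, 0.0 to 1.0
    p99_latency_s: float  # 99th percentile latency in seconds


def should_alert(samples: Sequence[MinuteSample]) -> bool:
    """Return True if either limit was exceeded in every one of the last WINDOW_MINUTES samples."""
    if len(samples) < WINDOW_MINUTES:
        return False  # not enough history to call the breach sustained
    recent = samples[-WINDOW_MINUTES:]
    errors_breached = all(s.error_rate > ERROR_RATE_LIMIT for s in recent)
    latency_breached = all(s.p99_latency_s > P99_LATENCY_LIMIT_S for s in recent)
    return errors_breached or latency_breached
```

Requiring every sample in the window to breach keeps a single noisy minute from paging the on-call engineer.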
## Phase 2: Triage
**Triaged by:** On-call engineer
**Default severity:** P3
### Triage Criteria
Assess user impact, number of affected services, and whether revenue-generating flows are disrupted.
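One way to make these criteria repeatable is to encode them as a function that suggests a severity. The sketch below is illustrative only: the `TriageInput` fields and every numeric cut-off are assumptions that each team should replace with its own definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriageInput:
    """Answers to the triage criteria above (hypothetical shape)."""
    users_affected_pct: float     # estimated share of users impacted, 0 to 100
    services_affected: int        # number of services showing symptoms
    revenue_flow_disrupted: bool  # True if a revenue-generating flow is failing


def suggest_severity(t: TriageInput) -> str:
    """Map triage answers to P1..P4. All cut-offs here are assumed, not prescribed."""
    if t.revenue_flow_disrupted and t.users_affected_pct >= 50:
        return "P1"
    if t.revenue_flow_disrupted or t.users_affected_pct >= 25 or t.services_affected >= 3:
        return "P2"
    if t.users_affected_pct > 0 or t.services_affected >= 1:
        return "P3"  # same level as the template's default severity
    return "P4"
```

The result is a starting point for the on-call engineer, who can still override it during triage.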
## Phase 3: Response
**Communication Channel:** Slack
**Status Page Updates:** Yes
**Customer Communication:** No
## Phase 4: Resolution
### Resolution Steps
1. Identify the failing component
2. Apply mitigation (rollback, feature flag, scaling)
3. Verify metrics return to normal
4. Monitor for 15 minutes before declaring resolved
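Step 4 can be enforced with a simple gate. The following is a sketch under stated assumptions: `can_declare_resolved` and the `(timestamp, healthy)` check format are hypothetical, and how the health checks are collected is left to your monitoring stack.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple

QUIET_PERIOD = timedelta(minutes=15)  # monitoring window from step 4 above


def can_declare_resolved(
    mitigated_at: datetime,
    now: datetime,
    health_checks: Sequence[Tuple[datetime, bool]],
) -> bool:
    """True once QUIET_PERIOD has elapsed since mitigation and every check since then was healthy."""
    if now - mitigated_at < QUIET_PERIOD:
        return False
    since_mitigation = [ok for ts, ok in health_checks if ts >= mitigated_at]
    # Require at least one observation so that "no data" is never read as "healthy".
    return bool(since_mitigation) and all(since_mitigation)
```

If a check fails during the window, a reasonable policy is to treat it as a new mitigation attempt and restart the 15 minutes from the next fix.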
### Verification Checklist
- Error rates back to baseline
- Latency within SLA thresholds
- No ongoing alerts firing
- Customer-facing functionality verified
## Phase 5: Postmortem
**Timeline Required:** Yes
**Blameless:** Yes
### Template Sections
- Timeline
- Root Cause
- Impact
- Action Items
- Lessons Learned
## Communication Templates
### Internal Notification
```
[INCIDENT - {{severity}}] {{title}}
Impact: {{impact}}
Status: Investigating
Incident Commander: {{commander}}
Channel: {{channel}}
Please join the incident channel for updates.
```
### Customer Notification
```
We are currently investigating an issue affecting {{service}}. Some users may experience {{impact}}. Our team is actively working on a resolution and we will provide updates as they become available.
```
### Status Page Update
```
Identified: We have identified an issue affecting {{service}}. Our engineering team is working on a fix. We will provide an update within 30 minutes.
```
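The `{{placeholder}}` fields in the templates above can be filled in with a few lines of code. This is a minimal sketch using only the standard library; the `render` helper is hypothetical and the sample values are made up for illustration.

```python
import re
from typing import Mapping

# Matches {{name}}, tolerating optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: Mapping[str, str]) -> str:
    """Replace every {{name}} placeholder; raise KeyError if a value is missing."""
    missing = sorted({name for name in PLACEHOLDER.findall(template) if name not in values})
    if missing:
        raise KeyError(f"missing template values: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


# Example with made-up incident details.
message = render(
    "[INCIDENT - {{severity}}] {{title}}\nImpact: {{impact}}",
    {"severity": "P2", "title": "Checkout errors", "impact": "Elevated 5xx on checkout"},
)
```

Failing on a missing value, rather than sending the message with a raw `{{impact}}` in it, avoids publishing a half-filled notification during an incident.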
## Frequently Asked Questions
### What is incident management?
Incident management is the process of identifying, analyzing, and resolving unplanned interruptions to a service or reductions in its quality. It spans initial detection and triage, response coordination, resolution, and post-incident review. The goal is to restore normal operations as quickly as possible and minimize business impact.
### What are the stages of incident management?
Incident management typically follows five stages:
1. **Detection:** identifying that something is wrong via monitoring, alerts, or user reports
2. **Triage:** assessing severity and impact to prioritize the response
3. **Response:** coordinating the team, communicating with stakeholders, and working toward a fix
4. **Resolution:** implementing the fix and verifying that the service is restored
5. **Postmortem:** reviewing what happened, identifying root causes, and creating action items to prevent recurrence
### How do you set up an incident management process?
1. Define severity levels (P1 through P4) with clear criteria for each.
2. Assign roles such as Incident Commander, Communications Lead, and Technical Lead.
3. Set up monitoring and alerting to detect issues early.
4. Create communication templates for internal teams, customers, and status pages.
5. Establish escalation policies with timeouts.
6. Make postmortems a mandatory, blameless practice after every significant incident.
### What is a postmortem?
A postmortem (also called a post-incident review) is a structured analysis conducted after an incident is resolved. It documents the timeline of events, identifies root causes, quantifies impact, and produces concrete action items to prevent similar incidents. Best practice is to run blameless postmortems, which focus on systemic improvements rather than individual fault and so encourage honest, thorough analysis.