
Incident Escalation Policy Examples and Templates for Every Team Size
An escalation policy is the backbone of incident response. Without one, alerts go unanswered. With a bad one, the wrong people get woken up at 3am for something that could have waited until morning, or worse, a real outage gets ignored because everyone assumed someone else was handling it.
Most teams know they need an escalation policy. The hard part is knowing what it should look like. The answer depends on your team size, how many services you run, and whether your engineers are in one timezone or spread across the world.
Here are four escalation policy templates you can copy and adapt. Pick the one closest to your situation and adjust from there.
What Makes a Good Escalation Policy
Before jumping into templates, here's what separates a useful escalation policy from one that exists on paper but fails in practice:
- Clear ownership at every level. Each escalation tier should map to a specific person or schedule. "The team" is not a valid escalation target.
- Reasonable timeouts. People need time to see an alert, read it, and decide what to do. Five minutes is a good floor for most tiers.
- A defined endpoint. Every policy needs a last resort. If the alert makes it through all tiers without acknowledgment, what happens? Someone has to be the backstop.
- Regular testing. An untested escalation policy is a guess. Run drills. Confirm that alerts actually reach the right people through the right channels.
Template 1: Small Team (3-5 Engineers)
This is the simplest setup that works. If you have a handful of engineers and everyone is reasonably familiar with the full stack, two levels are enough.
Level 1: Primary on-call
- Timeout: 5 minutes
- Who: One engineer, rotating weekly
- Notification: Push notification + SMS
Level 2: Team lead or CTO
- Timeout: 10 minutes
- Who: The most senior technical person on the team
- Notification: Phone call
Best for: Early-stage startups, small product teams, or any group where everyone can fix most things.
At this size, you don't need complex routing or multiple schedules. The primary on-call handles everything. If they don't respond, the team lead picks it up. Keep it simple.
One thing to watch out for: if your team lead is always the Level 2 backstop, they will burn out. Even on a small team, rotate the backup role if you can.
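To make the timeout semantics concrete, here's a minimal sketch of Template 1 as plain data, with a helper that answers "who should be getting paged right now?". The field names (`level`, `timeout_minutes`, `target`, `channels`) are illustrative, not any specific tool's schema:

```python
# Template 1 expressed as plain data. Field names are made up
# for illustration, not a real tool's config format.
SMALL_TEAM_POLICY = [
    {"level": 1, "timeout_minutes": 5,  "target": "primary-oncall",
     "channels": ["push", "sms"]},
    {"level": 2, "timeout_minutes": 10, "target": "team-lead",
     "channels": ["phone"]},
]

def active_tier(policy, minutes_since_alert):
    """Return the tier that should be notified, given how long
    the alert has gone unacknowledged."""
    elapsed = 0
    for tier in policy:
        elapsed += tier["timeout_minutes"]
        if minutes_since_alert < elapsed:
            return tier
    # Past every timeout: the last tier is the permanent backstop.
    return policy[-1]
```

So an alert unacknowledged for 7 minutes has already escalated past the primary on-call to the team lead, which is exactly the behavior the timeouts above describe.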
Template 2: Mid-Size Team (10-20 Engineers)
Once your team grows past a handful of people, you need a proper rotation schedule and a third escalation tier. Two levels isn't enough because the "team lead" can't be the permanent backstop for 15 engineers.
Level 1: On-call schedule rotation
- Timeout: 5 minutes
- Who: Primary on-call engineer (weekly rotation across the team)
- Notification: Push notification + SMS
Level 2: Team lead on-call
- Timeout: 10 minutes
- Who: Secondary schedule rotating among senior engineers or team leads
- Notification: Push notification + SMS + phone call
Level 3: Engineering manager
- Timeout: 15 minutes
- Who: Engineering manager or director
- Notification: Phone call
Best for: Growing teams with defined on-call schedules, teams that have split into sub-teams but still share infrastructure.
The key difference from Template 1 is that Level 2 is a schedule, not a single person. You should have two or three senior engineers rotating through the backup role. This prevents burnout and means there's always someone with enough context to help.
Level 3 exists as a safety net. In practice, it should almost never fire. If your engineering manager is getting paged regularly, your Level 1 and Level 2 coverage has gaps.
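The "Level 2 is a schedule, not a person" idea can be sketched as a simple weekly rotation. The names and anchor date are placeholder values, assuming a plain round-robin with no overrides:

```python
from datetime import date

# Level 2 as a rotation, not one person. Names and the anchor
# date are example values.
SECONDARY_ROTATION = ["alice", "bob", "carol"]

def secondary_oncall(today, anchor=date(2024, 1, 1)):
    """Weekly round-robin: who holds the Level 2 pager this week."""
    weeks_elapsed = (today - anchor).days // 7
    return SECONDARY_ROTATION[weeks_elapsed % len(SECONDARY_ROTATION)]
```

Any real scheduling tool also needs overrides and swap handling, but the core idea is just this: the backstop duty moves, so no single senior engineer carries it forever.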
Template 3: Multiple Teams (20+ Engineers)
When you have multiple teams owning different services, a single escalation policy doesn't cut it. Alerts from the payments service should go to the payments team, not to the team that builds the search feature.
Per-service routing: Each service or domain gets its own escalation policy. Alert matching rules route incoming alerts to the correct policy based on the source, labels, or service name.
Level 1: Team's primary on-call schedule
- Timeout: 5 minutes
- Who: The owning team's primary on-call rotation
- Notification: Push notification + SMS
Level 2: Team's secondary on-call schedule
- Timeout: 10 minutes
- Who: A senior engineer on the same team (separate rotation from primary)
- Notification: Push notification + SMS + phone call
Level 3: Cross-team incident commander rotation
- Timeout: 15 minutes
- Who: A rotation of senior engineers or team leads from across the org
- Notification: Phone call + Slack channel alert
Best for: Organizations with multiple services, microservices architectures, or any setup where different teams own different parts of the system.
The incident commander at Level 3 serves two purposes. They're the final backstop if the owning team is completely unresponsive. And for major incidents that span multiple services, they can coordinate across teams.
Make sure your alert matching rules are accurate. Mis-routed alerts are one of the top reasons escalation policies fail at this scale. If a database alert goes to the frontend team, your policy is technically working but practically useless.
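A sketch of what per-service routing looks like in logic form. The rule shape, label names, and policy names here are assumptions for illustration; the important part is the explicit catch-all, so an unmatched alert still lands somewhere:

```python
# Route an alert to the owning team's policy by matching on its
# labels. Rule fields and policy names are illustrative.
ROUTING_RULES = [
    {"match": {"service": "payments"}, "policy": "payments-escalation"},
    {"match": {"service": "search"},   "policy": "search-escalation"},
]
DEFAULT_POLICY = "incident-commander-escalation"  # catch-all backstop

def route_alert(alert):
    """Return the name of the first policy whose match rules
    all fit the alert's labels."""
    for rule in ROUTING_RULES:
        if all(alert.get(k) == v for k, v in rule["match"].items()):
            return rule["policy"]
    # An unmatched alert must never be dropped silently.
    return DEFAULT_POLICY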
Template 4: Follow-the-Sun (Distributed Teams)
If your engineers are spread across multiple timezones, you can take advantage of that. Instead of waking someone up at 3am, route alerts to the team that's currently in business hours.
Level 1: Regional on-call (business hours team)
- Timeout: 5 minutes
- Who: On-call engineer in the region currently in business hours
- Notification: Push notification + SMS
Level 2: Next timezone region
- Timeout: 10 minutes
- Who: On-call engineer in the next closest active timezone
- Notification: Push notification + SMS + phone call
Level 3: Global incident lead
- Timeout: 15 minutes
- Who: A rotation of senior engineers across all regions
- Notification: Phone call
Best for: Companies with engineering teams in two or more timezones (e.g., US + Europe, or US + Asia-Pacific).
Follow-the-sun only works well if each region has enough engineers to staff an on-call rotation. If your "Europe team" is two people, they'll burn out fast. You need at least three or four engineers per region for a sustainable rotation.
The tricky part is defining handoff windows. There's usually a gap between when one region's business hours end and the next region's begin. Decide in advance who covers those gaps. Some teams split the difference; others assign the gap to whichever region is closer to its business hours.
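The region-selection and handoff-gap problem can be sketched like this, assuming fixed UTC offsets and a 09:00-18:00 business-hours window (both are example values; real regions have DST to deal with). Note the `None` return: that's the handoff gap the paragraph above warns about, and it has to be assigned to someone explicitly:

```python
from datetime import datetime, timedelta

# Follow-the-sun sketch: pick the region currently in business hours.
# Regions and UTC offsets are example values (US East + Europe).
REGIONS = [
    {"name": "us-east", "utc_offset": -5},
    {"name": "europe",  "utc_offset": 1},
]

def on_duty_region(now_utc, start_hour=9, end_hour=18):
    """Return the first region inside its business hours,
    or None if the alert falls into a handoff gap."""
    for region in REGIONS:
        local = now_utc + timedelta(hours=region["utc_offset"])
        if start_hour <= local.hour < end_hour:
            return region["name"]
    return None  # gap: coverage must be decided in advance
```

With these two example regions, alerts between 23:00 and 08:00 UTC hit the gap, which is exactly the window you need to assign by policy rather than leave to chance.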
Common Mistakes
These mistakes show up in escalation policies at every team size:
Timeouts that are too short. A 1-minute timeout sounds aggressive and responsive. In reality, it means you escalate before anyone has a chance to read the alert, open their laptop, and check what's going on. Five minutes is a reasonable minimum for most situations.
Too many escalation levels. If you have five or six tiers, something is structurally wrong. Either your on-call engineers don't have enough access to fix issues, or you're routing alerts to people who can't actually help. Three tiers handles almost every scenario. Four is occasionally justified. Five is a red flag.
No final escalation. Every policy needs an endpoint. If alerts can pass through every tier without anyone acknowledging them, they simply vanish. Your last tier should be someone who will always respond, even if that means calling a phone number that rings loudly.
Never testing the policy. Escalation policies rot. People leave the company, phone numbers change, notification preferences get misconfigured. Test your policies quarterly at minimum. Send a test alert through the full chain and verify that the right people get notified at the right times.
Using the same policy for everything. A disk space warning and a full production outage shouldn't follow the same escalation path. At minimum, have a "standard" policy and a "critical" policy. Critical alerts should escalate faster and notify more people.
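The standard-versus-critical split from the last point is just the same policy shape at two speeds. The numbers below are illustrative; the helper shows the metric worth checking, namely how long an unacknowledged alert takes to reach the final backstop:

```python
# Same structure, different speed. Timeout values are illustrative.
STANDARD_POLICY = [
    {"target": "primary-oncall",      "timeout_minutes": 5},
    {"target": "secondary-oncall",    "timeout_minutes": 10},
    {"target": "engineering-manager", "timeout_minutes": 15},
]
CRITICAL_POLICY = [
    {"target": "primary-oncall",      "timeout_minutes": 3},
    {"target": "secondary-oncall",    "timeout_minutes": 5},
    {"target": "engineering-manager", "timeout_minutes": 5},
]

def minutes_to_final_tier(policy):
    """Worst case: minutes until the last tier is paged
    if nobody acknowledges the alert."""
    return sum(t["timeout_minutes"] for t in policy[:-1])
```

Here a standard alert takes 15 minutes to reach the final tier, a critical one 8. Knowing those two numbers for your own policies is a quick sanity check on whether the split actually buys you faster response.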
How to Implement These in NearIRM
NearIRM's policy builder lets you set up any of these templates in a few minutes. The workflow is straightforward:
- Create matching rules to route alerts to the right policy based on source, severity, labels, or service name.
- Add escalation steps with the timeout duration you want at each level.
- Assign a schedule or team at each step. You can point to an existing on-call rotation or a specific list of people.
- Set notification channels per step. Push notifications for Level 1, phone calls for the final tier, or whatever combination fits your team.
Check the escalation policy docs for a step-by-step walkthrough of configuring each template.
Start Simple and Iterate
Don't try to design the perfect escalation policy on day one. Start with the simplest template that matches your team size. Run it for a few weeks. Pay attention to what happens during real incidents: Did the right person get alerted? Was the timeout too long or too short? Did alerts reach the final tier too often?
Adjust based on what you learn. The best escalation policies aren't designed in a meeting room. They're shaped by real incidents over time. Get something reasonable in place, then make it better.