
Building On-Call Rotations That Don't Burn People Out
On-call rotations are never going to be anyone's favorite part of the job. Getting woken up at 3am because a service crashed is not fun. But there's a difference between on-call being occasionally inconvenient and on-call being a constant source of stress and burnout.
The teams that handle on-call well aren't the ones with zero incidents. They're the ones that have built rotations and processes that distribute the load fairly, compensate people appropriately, and don't leave anyone feeling like they're on-call all the time even when they're not.
Rotation Length Matters
The most common rotation lengths are weekly and daily. Both have trade-offs.
Weekly rotations mean you're on-call for seven days straight, which sounds rough but has a big advantage: predictability. You know when your week is coming up, you can plan around it, and once it's over, you're off for a while.
Daily rotations spread the pain out, so nobody is on-call for too long at once. But they also mean you're in the rotation more frequently, and it's harder to mentally disconnect when you know you might be on-call again tomorrow.
For most teams, weekly rotations work better. People can adjust their schedules, cancel plans if needed, and have a clear boundary when their shift ends. Daily rotations can feel like you're always kind of on-call, which is exhausting.
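To make the weekly option concrete, here is a minimal sketch of a round-robin weekly schedule. The function name, the engineer names, and the start date are all made up for illustration; a real paging tool would handle this for you.

```python
from datetime import date, timedelta

def weekly_rotation(engineers, start, weeks):
    """Return (shift_start, shift_end, engineer) tuples for consecutive
    seven-day shifts, cycling through the team in order.
    shift_end is exclusive: it is the day the next shift begins."""
    schedule = []
    for i in range(weeks):
        shift_start = start + timedelta(weeks=i)
        shift_end = shift_start + timedelta(days=7)
        schedule.append((shift_start, shift_end, engineers[i % len(engineers)]))
    return schedule

# Hypothetical four-person team, rotation starting on a Monday.
team = ["ana", "bo", "chen", "dee"]
for shift_start, shift_end, engineer in weekly_rotation(team, date(2024, 1, 1), 6):
    print(shift_start, shift_end, engineer)
```

Because the order is fixed, everyone can see months ahead when their week is coming, which is the predictability argument in a nutshell.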
Time Zones Are Tricky
If your team is distributed across time zones, on-call gets complicated. You can either run a follow-the-sun rotation, where coverage hands off from region to region so each one covers its own daytime, or have everyone share the pain of off-hours alerts.
Follow-the-sun is ideal if you have enough people in each region to sustain a rotation. You avoid waking people up in the middle of the night, and you don't burn out engineers in a single time zone.
But if your team is small or heavily concentrated in one region, follow-the-sun might not be realistic. In that case, you need to be honest about the trade-offs. If someone in Europe is regularly taking pages in the middle of their own night to cover US hours, that's a tough ask and should be compensated accordingly.
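As a rough illustration of follow-the-sun, the sketch below routes a page to whichever region is "awake" based on the current UTC hour. The three regions and their eight-hour windows are assumptions picked for the example, not a recommendation; real handoff times depend on where your people actually are.

```python
from datetime import datetime, timezone

# Hypothetical coverage windows in UTC hours: [start_hour, end_hour).
COVERAGE_UTC = {
    "apac": (0, 8),
    "emea": (8, 16),
    "amer": (16, 24),
}

def region_on_call(now):
    """Return the region whose window contains `now`.
    `now` should be a timezone-aware datetime; it is converted to UTC first."""
    hour = now.astimezone(timezone.utc).hour
    for region, (start, end) in COVERAGE_UTC.items():
        if start <= hour < end:
            return region
    raise ValueError("coverage windows leave a gap at hour %d" % hour)

print(region_on_call(datetime.now(timezone.utc)))
```

If you can't staff one of those windows with a sustainable rotation, that is the signal that follow-the-sun isn't realistic yet.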
Compensate On-Call Engineers
Some companies pay a stipend for on-call shifts. Some offer comp time. Some do nothing and treat it as part of the job.
If you want to retain good engineers, you need to compensate on-call in some way. Even if your on-call is quiet most of the time, someone still has to stay near their computer and be ready to respond, and that availability is a real cost to them.
A common model is a flat stipend for being on-call (regardless of whether you get paged) plus additional pay or time off if you actually have to respond to incidents. This acknowledges both the availability cost and the interruption cost.
Comp time is another option. If you get paged at 2am and spend three hours fixing an issue, you should be able to take that time back during normal work hours. Don't expect people to pull an all-nighter and then show up for meetings the next morning.
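Here is a small sketch of how the stipend-plus-response model and comp time could be tallied for one shift. The dollar amounts are placeholders, not suggested rates, and the function reports both incident pay and comp hours so a team can offer whichever one it prefers.

```python
from dataclasses import dataclass

@dataclass
class ShiftComp:
    stipend: float          # paid for availability, even on a quiet shift
    incident_pay: float     # paid for time actually spent responding
    comp_time_hours: float  # hours to take back during normal work time

def shift_compensation(incident_hours, flat_stipend=500.0, hourly_incident_rate=50.0):
    """incident_hours: hours spent on each page during the shift.
    flat_stipend and hourly_incident_rate are made-up example values."""
    total_hours = sum(incident_hours)
    return ShiftComp(
        stipend=flat_stipend,
        incident_pay=total_hours * hourly_incident_rate,
        comp_time_hours=total_hours,
    )

# A shift with a single three-hour incident in the middle of the night.
print(shift_compensation([3.0]))
```

The stipend is paid even when the list of incidents is empty, which is the whole point: availability has a cost on its own.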
Handling Holidays and Vacations
Holidays are a frequent source of on-call tension. Nobody wants to be on-call over Thanksgiving or New Year's, but someone has to do it.
The fairest approach is to rotate holiday coverage and keep track of who covered which holidays. If you were on-call for Christmas this year, you shouldn't be on-call for Christmas next year. Some teams even offer extra pay or comp time for holiday shifts.
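Tracking who covered what doesn't need to be fancy. The sketch below assumes a simple history of (year, holiday, engineer) records, which is a made-up format, and picks the person who covered this holiday least recently, breaking ties by who has covered the fewest holidays overall.

```python
def pick_holiday_cover(engineers, history, holiday):
    """history: list of (year, holiday, engineer) tuples (hypothetical format).
    Returns the engineer who is 'most owed' a break from covering `holiday`."""
    def last_covered(engineer):
        years = [y for (y, h, e) in history if h == holiday and e == engineer]
        return max(years, default=0)  # 0 means they have never covered it

    def total_holidays(engineer):
        return sum(1 for (_, _, e) in history if e == engineer)

    # Least recent cover first; ties go to whoever has covered fewer holidays.
    return min(engineers, key=lambda e: (last_covered(e), total_holidays(e)))

history = [
    (2022, "christmas", "bo"),
    (2023, "christmas", "ana"),
    (2023, "thanksgiving", "chen"),
]
print(pick_holiday_cover(["ana", "bo", "chen"], history, "christmas"))
```

Even a spreadsheet with the same three columns does the job. What matters is that the record exists and people can see it.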
For vacations, use an override schedule. If someone is scheduled for on-call but has a vacation planned, they should be able to swap with someone else or designate a replacement. Don't make people cancel vacations because they're in the rotation.
And if someone is on vacation, they should actually be on vacation. No "just in case" pings, no expectation that they'll check Slack. If you can't let people disconnect, your on-call process is broken.
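An override schedule is just a second layer that wins over the base rotation for the dates it covers. A minimal sketch, assuming both layers are lists of (start, end, engineer) entries with an exclusive end date:

```python
from datetime import date

def who_is_on_call(day, base_schedule, overrides):
    """Overrides take priority; fall back to the base rotation otherwise.
    Both arguments are lists of (start, end, engineer), end exclusive.
    Returns None if nobody is scheduled for `day`."""
    for start, end, engineer in overrides:
        if start <= day < end:
            return engineer
    for start, end, engineer in base_schedule:
        if start <= day < end:
            return engineer
    return None

# Hypothetical example: ana has the week, bo covers her three vacation days.
base = [(date(2024, 7, 1), date(2024, 7, 8), "ana")]
overrides = [(date(2024, 7, 3), date(2024, 7, 6), "bo")]
print(who_is_on_call(date(2024, 7, 4), base, overrides))
```

The base rotation never has to be rewritten for a swap, so the long-term schedule stays predictable.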
What to Do When Someone Is Overwhelmed
Sometimes an on-call shift is brutal. Alerts fire constantly, multiple incidents overlap, and the person on-call is drowning. In these situations, escalate early and bring in help.
If the on-call engineer is dealing with an active incident and a new alert fires, someone else should take the new one. Don't force one person to juggle multiple critical issues at once.
Some teams have a secondary on-call role, a backup who can step in when things get overwhelming. This is especially useful for teams with high alert volume or complex systems where incidents can take hours to resolve.
Solo On-Call Is Risky
If you only have one person on-call at a time and no backup, what happens if they can't respond? Maybe they're sick, maybe their phone died, maybe they're in a dead zone with no service. Single points of failure are bad in systems, and they're bad in on-call rotations too.
At a minimum, have a backup contact who gets escalated to if the primary on-call engineer doesn't respond within a reasonable timeframe. Better yet, have a secondary on-call person who is available for escalations or overflow.
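The escalation rule can be sketched in a few lines. The ten-minute acknowledgement timeout and the three-step chain below are assumptions for the example; pick whatever "reasonable timeframe" fits your service.

```python
def escalation_target(minutes_unacked, chain, ack_timeout_minutes=10):
    """Return who should be paged for an alert that has gone unacknowledged
    for `minutes_unacked` minutes. Each contact in `chain` gets one timeout
    window before the page moves on; the last contact keeps it indefinitely."""
    step = int(minutes_unacked // ack_timeout_minutes)
    return chain[min(step, len(chain) - 1)]

chain = ["primary", "secondary", "manager"]  # hypothetical escalation chain
for minutes in (0, 10, 25):
    print(minutes, escalation_target(minutes, chain))
```

Most paging tools implement this for you. The point is to make sure the chain has more than one entry.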
Make On-Call Manageable
The best way to make on-call less painful is to reduce the number of alerts. If your on-call engineers are getting paged 10 times a night for stuff that doesn't matter, fix that first. No rotation structure or compensation model will make up for noisy, useless alerts.
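One way to find out which alerts to fix first is to count the pages that needed no action. This sketch assumes you can export pages as records with an alert name and an "actionable" flag filled in by whoever handled them; both field names are hypothetical.

```python
from collections import Counter

def noisiest_alerts(pages, top=3):
    """pages: list of dicts with hypothetical keys "alert" and "actionable".
    Returns the alerts that most often paged someone for nothing,
    as (alert_name, count) pairs, highest count first."""
    noise = Counter(p["alert"] for p in pages if not p["actionable"])
    return noise.most_common(top)

pages = [
    {"alert": "disk-usage-warning", "actionable": False},
    {"alert": "disk-usage-warning", "actionable": False},
    {"alert": "cpu-spike", "actionable": False},
    {"alert": "database-down", "actionable": True},
]
print(noisiest_alerts(pages))
```

Whatever lands at the top of that list is a candidate for tuning, downgrading to a ticket, or deleting.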
Similarly, if alerts regularly require deep expertise that only one person has, that's a knowledge-sharing problem. Your on-call engineer should be able to handle most issues with the help of runbooks and documentation. If every alert requires paging a specific person, you're creating burnout and a bus factor problem.
On-Call Should Be Temporary
Nobody should be on-call indefinitely. Rotations exist to spread the load. If you find yourself constantly in the rotation with no break, either your team is too small or the rotation isn't structured fairly.
A good rotation gives people enough time off between shifts to recover. If you're on-call one week out of every four, that's sustainable. If you're on-call every other week, that's getting rough. If you're on-call more often than you're off, something is wrong.
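Those thresholds are easy to check against your team size. In the sketch below, the cutoffs at one week in four and one week in two come from the rules of thumb above; treating everything in between as "rough" is an assumption made for the example.

```python
def on_call_fraction(team_size, shifts_per_cycle=1):
    """Fraction of shifts one person carries when `team_size` people share
    the rotation evenly and each takes `shifts_per_cycle` shifts per cycle."""
    return shifts_per_cycle / team_size

def load_label(fraction):
    if fraction <= 0.25:   # one week in four, or better
        return "sustainable"
    if fraction <= 0.5:    # up to every other week
        return "rough"
    return "something is wrong"  # on-call more often than off

for size in (6, 4, 2, 1):
    print(size, load_label(on_call_fraction(size)))
```

By this measure a weekly rotation needs at least four people before it reaches the sustainable range.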
Set Expectations Clearly
When someone joins an on-call rotation, they should know what they're signing up for. What's the expected alert volume? What types of issues will they handle? Who do they escalate to if they're stuck? What's the compensation?
Surprises are bad. If someone expects a quiet week and gets paged 20 times, they'll be frustrated and burned out. If they know it's a high-traffic rotation and are compensated accordingly, they can prepare.
On-call is part of the job for most engineering teams, but it doesn't have to be miserable. Fair rotations, clear expectations, proper compensation, and a focus on reducing unnecessary alerts make a huge difference. Treat your on-call engineers well, and they'll stick around. Burn them out, and they'll leave.