An on-call responder is the first line of defence when something breaks. They assess the situation and take appropriate action. This guide walks you through what that actually looks like. You’ll see how on-call responders think through an incident and figure out what needs to be done.
Assessing and resolving incidents
On-call responders keep watch over systems, much as castle guards once kept watch over the gates. The guards stood high on the walls and watched the horizon.
When they spotted a lone rider approaching, they probably wouldn't ring the alarm bell right away. That would cause panic over what was likely just a merchant or traveller. Instead, they'd question the rider and check their identification. If everything seemed fine, they'd let the rider through and go back to watching the horizon.
Many on-call shifts look like that. When an incident occurs, the responder's first job is to work out what it is and what it affects. If it's a familiar issue, they usually handle it alone, and the rest of the team doesn't need to get involved.
Escalating incidents for help
Not every incident can be handled alone. Sometimes, bringing in more people makes sense.
For example, in the film Don’t Look Up, an astronomy student discovers a comet during a routine shift. When she runs the calculations, she realises it’s heading straight for Earth. She doesn’t try to manage it by herself. She calls her supervisor, who runs the numbers himself. They check multiple times to be certain before raising the alarm. Once they’ve confirmed it’s real, they escalate to NASA. NASA verifies it independently and briefs the White House.
The student had already run the calculations and was fairly certain of what she'd found. She escalated anyway, because some decisions are too important to make alone.
On-call responders face similar moments. Sometimes an incident needs more people: a Subject Matter Expert (SME) to double-check the diagnosis, or an extra set of hands to work through the fix more quickly. In those situations, on-call responders escalate the incident.
Spotting patterns in incidents
Incidents don't always happen in isolation. Sometimes they arrive in clusters that point to a bigger problem.
Consider a fire station receiving three emergency calls from the same street within ten minutes. The firefighters would want to check whether the reports are connected, because three fires on one street could share a common cause.
A similar pattern can show up when you're on call. Perhaps the checkout page fails, and order confirmation emails stop sending around the same time. That timing suggests a single root cause. If the payment service goes down, the checkout page can't process transactions, and the email system has no orders to confirm. It looks like two separate problems, but it's one issue cascading through the system.
When that's the case, fixing the payment service resolves the other incidents as well. Other times, incidents are spread across 40 minutes with no clear pattern. That usually means they're separate issues that each need individual attention.
On-call responders need to work out whether the incidents are connected or whether they're dealing with separate issues.
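For teams that want to automate a first pass at this, one simple heuristic is to group incidents whose start times fall close together. The sketch below is a minimal illustration that assumes a hypothetical `Incident` record and a hypothetical `group_by_time` helper. The 10-minute window echoes the fire-station example and is also an assumption. It doesn't reflect any particular alerting tool, most of which have their own grouping features.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical incident record: which service alerted, and when.
    service: str
    started_at: datetime


def group_by_time(incidents, window=timedelta(minutes=10)):
    """Group incidents that start within `window` of the previous one.

    A cluster is a candidate for a shared root cause, not proof of one.
    The 10-minute default is an assumed value, not a recommendation.
    """
    clusters = []
    for incident in sorted(incidents, key=lambda i: i.started_at):
        if clusters and incident.started_at - clusters[-1][-1].started_at <= window:
            clusters[-1].append(incident)  # close in time: same cluster
        else:
            clusters.append([incident])  # gap too large: start a new cluster
    return clusters


if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    incidents = [
        Incident("checkout", day.replace(hour=14, minute=0)),
        Incident("order-emails", day.replace(hour=14, minute=3)),
        Incident("search", day.replace(hour=9, minute=0)),
        Incident("login", day.replace(hour=9, minute=40)),
    ]
    for cluster in group_by_time(incidents):
        print([i.service for i in cluster])
```

Here the checkout and order-email failures, three minutes apart, land in one cluster. The two morning incidents, 40 minutes apart, stay separate. Timing alone can still group unrelated incidents or split related ones, so a cluster is a prompt to look for a shared cause, such as the payment service, rather than a diagnosis.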
Documenting what happened
Once an incident is resolved, on-call responders usually document what broke and how it was fixed.
This documentation often helps the next person on call if a similar issue comes up during their shift. They'll have a sense of what worked, what didn't, and who might be able to help. It can save them from starting from scratch or repeating approaches that didn't work the first time.
This kind of context-sharing typically happens during handoffs. If you’d like to learn more about making handoffs smoother, we’ve written a guide on handoff best practices for on-call teams.
FAQs
What happens if an on-call responder can’t resolve an incident?
They escalate to someone who can help. That might be a subject matter expert or another responder. Escalating is a normal part of on-call work, and it’s often the right call when an incident needs expertise or verification.
Are on-call responders expected to prevent incidents or just respond to them?
Mostly respond. On-call responders are there to handle incidents when they happen. If they spot something risky during their shift, they'll often flag it. Prevention, though, is typically a team-wide responsibility handled during regular working hours.
