When a critical incident triggers, there’s no time to figure out who to call. That decision needs to be made well before the incident arrives. A dedicated escalation policy for critical incidents gives your team a clear path to follow the moment things go wrong, rather than leaving it to whoever happens to be around. This guide covers the key decisions involved in building that policy.
What counts as a critical incident
Before you set up an escalation policy for critical incidents, you need a clear definition of what critical actually means for your team. Without that, two engineers will classify the same incident differently and your policy will behave inconsistently.
A useful question to start with: which incident would wake you up at 3am? That’s your critical incident.
A payment API going down, an authentication service failing during peak hours, or a database becoming unreachable are good examples. Incidents that wouldn’t pull you out of bed at 3am (low-priority incidents) belong in a default escalation policy instead.
Once that line exists, the rest follows naturally. Your critical incident policy has a clear entry point and your default policy handles everything else.
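That routing decision can be sketched as a simple rule. This is an illustrative sketch only — the service names, policy labels, and the `route_incident` function are hypothetical examples, not any tool’s actual API:

```python
# Route incidents to the right escalation policy based on a pre-agreed
# "would this wake you at 3am?" list. Service and policy names are
# hypothetical examples.
CRITICAL_SERVICES = {"payment-api", "auth-service", "primary-database"}

def route_incident(service: str) -> str:
    """Return which escalation policy should handle an incident."""
    if service in CRITICAL_SERVICES:
        return "critical-incident-policy"
    return "default-policy"
```

With a list like this agreed on in advance, `route_incident("payment-api")` always lands in the critical policy, and anything else falls through to the default one — no in-the-moment judgment call required.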
Why critical incidents need their own escalation policy
A default escalation policy works well for most incidents, but critical incidents are different. The response window is tight and the cost of nobody picking up is high. Routing them through the same policy as everything else is a risk.
A dedicated critical incident policy removes the guesswork. It gives your team a clear, pre-decided path to follow the moment a critical incident triggers. Nobody has to decide in the moment who to call or how long to wait. That decision was already made.
It also sets the right expectations. When your team knows a critical incident triggers a phone call within five minutes, they treat it with a different level of urgency than a Slack message that might get seen an hour later.
Who to call when a critical incident triggers
Critical incidents are situations where speed and collective awareness matter more than anything else. Rather than alerting one person and waiting to see if they respond, many teams call the entire engineering team the moment a critical incident triggers. Everyone is looped in from the start and the team works on the incident together rather than sequentially.
A policy structured this way might look something like this:
- Step 1: Phone call to the entire engineering team
- Step 2: Phone call to the CTO
- Step 3: Phone call to the incident communicator

You can set up such an escalation policy on Spike in a few minutes. Try now →
The engineering team usually handles the resolution. The steps that follow are about keeping the right people informed as the incident unfolds. The CTO needs to know that a critical incident is active. Stakeholders may need to communicate with customers or coordinate a response outside the engineering team. Each step serves a different purpose and together they make sure the right people are aware at the right time.
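A policy like the one above is really just ordered data: who to reach, over which channel, in what sequence. A minimal sketch of that structure — the field names here are assumptions for illustration, not Spike’s actual configuration format:

```python
# Illustrative representation of the three-step policy described above.
# Field names ("target", "channel") are hypothetical, not a real schema.
CRITICAL_POLICY = [
    {"target": "engineering-team",      "channel": "phone"},
    {"target": "cto",                   "channel": "phone"},
    {"target": "incident-communicator", "channel": "phone"},
]
```

Writing the policy down this explicitly makes the earlier point concrete: every step uses the phone, and the order of targets is decided before any incident fires.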
One thing worth deciding early is who can classify an incident as critical in the first place. Calling the entire engineering team is a high-signal action and it’s worth keeping that reserved for incidents that genuinely need it.
Choosing the right alert channel
For critical incidents, a phone call is the right choice. It’s the only channel that demands attention the moment it comes in, regardless of what the responder is doing or where they are.
Other channels like Slack or email work well for lower-priority incidents where awareness matters more than an immediate response. For a critical incident though, waiting for someone to notice a message is a risk your team probably doesn’t want to take.
There’s also a broader reason to keep phone calls reserved for critical incidents. When phone calls go out for every incident regardless of severity, your team gradually starts to treat them as routine. Keeping them exclusive to critical incidents means the signal stays strong and everyone on the team knows exactly what it means when their phone rings.
How long to wait between steps
Taking the earlier example as a reference, when a critical incident triggers, the engineering team gets called right away. There’s no delay at that point. The wait time only comes into play for the steps that follow.
The second step, calling the CTO, is purely about keeping them informed. A 10 to 15-minute gap before that call usually works well. It gives the engineering team a reasonable window to acknowledge and start working on the incident before the next step kicks in. The same thinking applies to any steps after that. Critical incidents move fast and the people who need to be informed should hear about it reasonably early.
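Under those numbers, the escalation timeline works out as follows. A small sketch — the 10-minute delay is an assumed value within the 10 to 15-minute range suggested above, not a prescribed standard:

```python
# Compute when each escalation step fires, relative to the incident trigger.
# Delays mirror the example policy: an immediate call to engineering, then
# informational calls after an assumed 10-minute gap each.
steps = [
    ("engineering-team", 0),
    ("cto", 10),
    ("incident-communicator", 10),
]

def escalation_timeline(steps):
    """Return (target, minutes_after_trigger) for each step."""
    elapsed = 0
    timeline = []
    for target, wait in steps:
        elapsed += wait
        timeline.append((target, elapsed))
    return timeline
```

Running this over the example policy shows engineering called at minute 0, the CTO at minute 10, and the incident communicator at minute 20 — each informational call landing only after the previous step has had a window to acknowledge.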
Your escalation policy will change over time
Your first critical incident escalation policy probably won’t be your last. Teams change, systems grow, and what counts as critical today might look different a year from now.
The important thing is to have something in place rather than waiting for the perfect version. Set up an escalation policy that reflects where your team is today and edit it as you learn. The policies that work best are usually the ones that have been adjusted a few times based on real incidents rather than built once and left untouched.
FAQs
What’s the difference between a critical incident escalation policy and a default escalation policy?
A default escalation policy is built for most incidents where a reasonable response time is acceptable. A critical incident policy is built around speed and awareness. The steps, wait times, and alert channels are all calibrated for situations where the cost of a slow response is high.
Should a small team have a critical incident escalation policy?
Yes, and it’s usually simpler. In a small team, a single step often covers everything. One phone call goes out to the whole team, which already includes the founder, senior leads, and anyone else who needs to know. Everyone is aware from the start and the team takes it from there.
Should every service have its own critical incident policy?
Not necessarily. Many teams run a single critical incident policy that covers all their services. A service-specific policy makes more sense when different services have genuinely different response requirements or separate teams responsible for them.
