How to set up Alert Routing rules effectively

A well-configured alert routing system means your team only sees what actually needs attention. This guide covers the three things an effective routing setup should do and how to get there.

Sreekar

12th March, 2026

Different incidents need different levels of attention. Some need a phone call at 3 AM and others can wait until morning. Alert Routing rules are what let you act on that understanding without doing it manually every time.

An effective routing setup does three things:

Triage incidents so the right context is attached
Route incidents to the right escalation policies
Reduce noise so your team only sees what matters

Getting all three of these working is what makes a routing setup useful.

Triaging incidents

Triaging an incident involves answering three questions:

What is the incident severity?
What is the incident priority?
Who owns the incident?

Alert Routing rules can answer all three automatically, so that when the on-call responder picks up the incident, it already has the right context attached.

You can set rules based on incident payload, time of occurrence, and frequency to triage incidents.

Incident payload

The incident title and details carry significant information. A keyword like “prod” or “database down” in the title is a strong severity signal. An incident that mentions an enterprise customer in the details probably warrants a higher priority than a routine background job failure.

A simple setup in Spike would look like this: Set the condition to “Incident title contains prod” and the action to “Mark severity as SEV-1”. Any incident with “prod” in the title gets classified automatically.

Triaging an incident based on payload (created on Spike)
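
To make the condition and action pairing concrete, here is a minimal Python sketch of the same rule. It is an illustrative model only and not Spike's API: the `Incident` class and `apply_title_rule` function are hypothetical names, and the case-insensitive match is an assumption, since the behavior of "contains" is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Incident:
    """Hypothetical incident model used only for this sketch."""
    title: str
    details: str = ""
    severity: Optional[str] = None


def apply_title_rule(incident: Incident, keyword: str, severity: str) -> Incident:
    """Condition: title contains keyword. Action: mark the given severity."""
    # Assumption: matching is case-insensitive.
    if keyword.lower() in incident.title.lower():
        incident.severity = severity
    return incident


incident = apply_title_rule(Incident(title="prod-db-01 unreachable"), "prod", "SEV-1")
print(incident.severity)  # SEV-1
```

A plain substring match like this would also match a title containing "product", so a more specific keyword is often the safer choice.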

💡 Spike tip: Use Title Remapper

Some monitoring tools send raw error codes as incident titles. To set up a routing rule against a title like ERR_5023, you first need to know what that code actually means. That’s extra knowledge your team has to carry. Spike’s Title Remapper converts those codes into readable titles, so ERR_5023 becomes “Payment API timeout”. Once your titles are readable, setting up routing rules becomes straightforward.
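
Conceptually, remapping a title is a lookup from a raw code to a readable name. The sketch below illustrates that idea in Python. It does not show how Title Remapper is implemented or configured: `TITLE_MAP` and `remap_title` are hypothetical names, and the only mapping used is the one from the example above.

```python
from typing import Mapping

# Hypothetical lookup table; the single entry comes from the example above.
TITLE_MAP = {"ERR_5023": "Payment API timeout"}


def remap_title(raw_title: str, mapping: Mapping[str, str] = TITLE_MAP) -> str:
    """Return the readable title for a known code, else the title unchanged."""
    return mapping.get(raw_title.strip(), raw_title)


print(remap_title("ERR_5023"))  # Payment API timeout
```

Unknown codes pass through unchanged in this sketch, so an unmapped title still reaches the routing rules as it was sent.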

Time of occurrence

A database failure at 2 AM on Tuesday is a different situation from the same failure at 11 AM on Wednesday. At 11 AM, your team is probably already online and can respond through Slack. At 2 AM, someone needs to be woken up.

In Spike, the Time of day and Day of week conditions are built for exactly this. A rule that marks an incident as P1 when it triggers between 10 PM and 8 AM is a good place to start. During business hours, the same incident could be P3. Based on that priority, different escalation policies load and your team gets alerted differently. More on that in the next section.

Triaging an incident based on time of day (created on Spike)
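
The one subtle part of a rule like this is that the 10 PM to 8 AM window crosses midnight. The Python sketch below shows one way to reason about that check. It is not Spike's implementation: `in_window` and `priority_for` are hypothetical names, and treating 10 PM as inside the window and 8 AM as outside it is an assumption.

```python
from datetime import time


def in_window(t: time, start: time, end: time) -> bool:
    """True if t falls in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= t < end
    # Window wraps past midnight, e.g. 22:00 to 08:00.
    return t >= start or t < end


def priority_for(t: time) -> str:
    """P1 during the off-hours window from the example, P3 otherwise."""
    return "P1" if in_window(t, time(22, 0), time(8, 0)) else "P3"


print(priority_for(time(2, 0)))   # P1
print(priority_for(time(11, 0)))  # P3
```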

Frequency

If a background job hits its retry limit once, it probably does not need urgent attention. If it triggers ten times within thirty minutes, that is a different situation altogether.

In Spike, the “incident has occurred within” condition handles this. Set the condition to fire when an incident triggers more than five times within thirty minutes. Add the action “Mark severity as SEV-1” and that incident moves up the queue automatically. It is a useful way to catch something that is quietly getting worse before it becomes a bigger problem.

Triaging an incident based on frequency (created on Spike)
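
A frequency condition can be pictured as a sliding window over recent occurrences. The following Python sketch models "more than five times within thirty minutes" under stated assumptions. It is not how Spike evaluates the condition: `FrequencyRule` is a hypothetical class, and the sketch assumes occurrences are recorded in chronological order.

```python
from collections import deque
from datetime import datetime, timedelta


class FrequencyRule:
    """Hypothetical sliding-window counter for one kind of incident."""

    def __init__(self, threshold: int = 5, window: timedelta = timedelta(minutes=30)):
        self.threshold = threshold
        self.window = window
        self._seen = deque()  # timestamps still inside the window

    def record(self, at: datetime) -> bool:
        """Record an occurrence; True if the window now holds more than threshold."""
        self._seen.append(at)
        # Drop occurrences older than the window, measured from the newest one.
        while at - self._seen[0] > self.window:
            self._seen.popleft()
        return len(self._seen) > self.threshold


rule = FrequencyRule()
start = datetime(2026, 3, 12, 1, 0)
results = [rule.record(start + timedelta(minutes=i)) for i in range(6)]
print(results[-1])  # True: the sixth occurrence within thirty minutes
```

When `record` returns True, the matching action in this model would be "Mark severity as SEV-1".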

Routing incidents

Once an incident is triaged, routing is fairly simple. It is about loading the right escalation policy based on what the incident is.

You probably already have a sense of your critical incidents from the triage section. Those same patterns in your titles and details are what your routing rules should be built around. A payment service going down or a production database becoming unreachable are typical examples.

Once you have that list, a simple two-policy setup usually covers a lot of ground: one policy for critical incidents, with phone call alerts and short wait times, and one default policy for everything else. Routes that match your critical patterns load the first policy. Everything else falls through to the default. As your incident patterns evolve, you can add more policies and refine the rules.
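
The fall-through behavior can be sketched as an ordered list of routes with a default at the end. This Python example is an illustrative model and not Spike's rule engine: the dictionary-based incident, the `ROUTES` list, the policy names, and the first-match-wins evaluation order are all assumptions made for the sketch.

```python
from typing import Callable, List, Tuple

# Each route pairs a condition with the name of the policy it loads.
Route = Tuple[Callable[[dict], bool], str]

ROUTES: List[Route] = [
    (lambda i: i.get("severity") == "SEV-1", "critical"),
    (lambda i: i.get("priority") == "P1", "critical"),
    (lambda i: "payment" in i.get("title", "").lower(), "critical"),
]


def choose_policy(incident: dict, routes: List[Route] = ROUTES,
                  default: str = "default") -> str:
    """Return the policy of the first matching route, else the default policy."""
    for matches, policy in routes:
        if matches(incident):
            return policy
    return default


print(choose_policy({"title": "Payment service down"}))  # critical
print(choose_policy({"title": "Nightly report slow"}))   # default
```

Adding a third policy later only means adding routes to the list; the default still catches anything that no route matches.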

💡 Spike tip: Use time-based routing

You can take your routing setup further with time. As we saw in the triage section, a P1 incident at 2 AM carries a different urgency than the same incident at 11 AM. In Spike, you can set up separate escalation policies for business hours and off-hours and use the Time of day condition to load the right one automatically.

Routing SEV-1 and P1 incidents to the critical escalation policy (created on Spike)

Noise reduction

Not every incident that triggers needs a human to act on it. An effective routing setup handles a portion of your queue automatically so your team’s attention stays focused on what actually matters.

There are four actions worth building into your rules for noise reduction:

Auto-acknowledge: This stops the escalation policy from running. A nightly database backup job that always throws a warning on completion is a typical example. Your team knows about it and there is nothing to act on.
Auto-resolve: This works well for known false positives. A CPU usage spike that fires every time your batch job runs at midnight but always drops back to normal within two minutes does not need anyone’s attention.
Resolve by timer: This waits for a set period and resolves the incident if nothing has changed. A memory usage warning that occasionally self-corrects works well here. You still catch it if it persists beyond the timer.
Do not create incident: This suppresses the incident before it enters your queue. A health check ping from a third-party uptime monitor that fires every thirty seconds is a reasonable candidate. It is worth using this action carefully and only for signals you are completely confident are irrelevant.

The same conditions that drive triage and routing work here too: payload, time of occurrence, and frequency. A low disk space warning on a dev server could be suppressed on weekends when nobody is working. A test environment going offline during off-hours could resolve by timer automatically since nobody needs it until the next morning.
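
The two examples above can be expressed as a small decision function that combines payload and time conditions. The Python sketch below is a hypothetical model and not Spike's configuration format: `NoiseAction` and `noise_action` are invented names, the keyword checks are illustrative, and the off-hours window of 10 PM to 8 AM is borrowed from the triage example as an assumption.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class NoiseAction(Enum):
    """The four noise reduction actions described above."""
    AUTO_ACKNOWLEDGE = "auto-acknowledge"
    AUTO_RESOLVE = "auto-resolve"
    RESOLVE_BY_TIMER = "resolve by timer"
    DO_NOT_CREATE = "do not create incident"


def noise_action(title: str, at: datetime) -> Optional[NoiseAction]:
    """Pick a noise reduction action, or None if a human should see the incident."""
    text = title.lower()
    is_weekend = at.weekday() >= 5            # Saturday is 5, Sunday is 6
    off_hours = at.hour >= 22 or at.hour < 8  # assumed off-hours window
    if "low disk space" in text and "dev" in text and is_weekend:
        return NoiseAction.DO_NOT_CREATE
    if "test environment" in text and off_hours:
        return NoiseAction.RESOLVE_BY_TIMER
    return None


# 14 March 2026 is a Saturday.
print(noise_action("Low disk space on dev-server-2", datetime(2026, 3, 14, 10, 0)))
```

Returning None for anything unmatched keeps suppression opt-in in this model, which fits the advice to use "Do not create incident" carefully.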

The real value of an effective Alert Routing setup shows up over time. Your team responds faster because the context is already there. Incidents that do not matter do not distract anyone. The right people get called at the right time. What starts as a handful of rules gradually becomes a setup that reflects how your team actually works.

FAQs

Where should I start if I’ve never set up routing rules before?

Routing is usually the most natural place to begin. Start by identifying your most critical incidents and pointing them at a dedicated escalation policy. Even one rule that separates critical incidents from the rest is a meaningful first step. Triage and noise reduction can follow once you have a better picture of your incident patterns.

Do routing rules replace escalation policies?

No. Routing rules work alongside escalation policies. An escalation policy decides who gets paged and in what order. Routing rules decide which escalation policy gets loaded for a given incident. The two complement each other.

What happens if a routing rule misclassifies an incident?

The on-call responder can always update the severity or priority manually. Routing rules are there to reduce manual work, but they are not the last line of defence. Regularly reviewing misclassified incidents helps you spot routing rules that need tightening.