What is the Incident Response Glossary?

It's a curated collection of 500+ terms to help teams understand key concepts in incident management, monitoring, on-call response, and DevOps.

How can I use this glossary?

You can browse terms alphabetically, use the search, or explore related terms to learn incident response more effectively.

Fault Injection Testing (Chaos Engineering)

Fault injection testing, also known as chaos engineering, is a disciplined approach to improving system resilience by deliberately introducing failures in controlled environments.

← Glossary

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

What Is Fault Injection Testing (Chaos Engineering)

Fault injection testing, also known as chaos engineering, is a disciplined approach to improving system resilience by deliberately introducing failures in controlled environments. This practice helps teams understand how systems respond to disruptions and identify weaknesses before they cause real incidents.

Why Is Fault Injection Testing Important

Fault injection testing builds confidence in system resilience by revealing hidden vulnerabilities and response gaps. It helps teams develop better incident response procedures, create more robust systems, and reduce the impact of actual failures when they occur.

Example of Fault Injection Testing

A payment processing company runs a controlled experiment where they simulate the failure of a primary database server during low-traffic hours. They observe how their systems handle the failover process and identify a delay in routing traffic to backup servers that they can fix before it affects customers.

How to Do Fault Injection Testing

Start with a hypothesis about how your system should respond to specific failures
Define the scope and blast radius of experiments
Create a controlled testing environment that mimics production
Monitor system behavior during experiments
Document findings and implement improvements

Best Practices

Begin with small, contained experiments and gradually increase complexity
Run tests in production-like environments before moving to actual production
Establish abort conditions to stop experiments if they cause unexpected damage

Fault Injection Testing (Chaos Engineering)

What Is Fault Injection Testing (Chaos Engineering)

Why Is Fault Injection Testing Important

Example of Fault Injection Testing

How to Do Fault Injection Testing

Best Practices

What's the Root Cause?

Our take on PagerDuty's Pricing breakdown

Further reading:

Fault Isolation Dashboard

Fault Prediction with AI/ML

Fault Tolerance