What is Chaos Engineering?

Chaos Engineering is the discipline of experimenting on a system to build confidence in its ability to withstand turbulent conditions in production. The aim is to surface systemic weaknesses before they affect users.

Who should implement Chaos Engineering?

It is primarily intended for engineers and organizations that run large-scale, distributed software systems. It helps them gain confidence in complex production deployments, even though such systems behave unpredictably by nature.

How does Chaos Engineering approach system reliability differently from traditional testing?

Traditional testing validates expected functionality. Chaos Engineering instead learns about system behavior empirically, by observing the system during controlled experiments. The focus is on how the system behaves under unexpected real-world events, not only on confirming that it works as designed.

When should Chaos Engineering experiments be run?

Experiments should run continuously and, ideally, directly on production traffic. Real traffic exercises the system authentically, with real-world utilization patterns, and keeps the results relevant to the system as it is currently deployed.

What is the concept of "steady state" in Chaos Engineering?

"Steady state" refers to a measurable output of a system that indicates normal behavior, such as throughput, error rate, or latency percentiles. An experiment hypothesizes that this steady state will persist, then tries to disprove that hypothesis by introducing disruptions.

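The steady-state idea can be sketched in code. The following is a minimal illustration, not an implementation from principlesofchaos.org: the names `SteadyState`, `make_service`, and `run_experiment`, the bounds (1% error rate, 250 ms p99), the 300 ms client timeout, and the simulated 500 ms fault are all assumptions chosen for the example. It defines the steady state as bounds on error rate and 99th-percentile latency, then checks whether those bounds still hold once a disruption is introduced.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (latency in milliseconds, whether the request succeeded)
CallResult = Tuple[float, bool]


@dataclass(frozen=True)
class SteadyState:
    """Hypothetical bounds that define 'normal' for the service under test."""
    max_error_rate: float      # highest acceptable fraction of failed requests
    max_p99_latency_ms: float  # highest acceptable 99th-percentile latency


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list; pct is in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def steady_state_holds(bounds: SteadyState,
                       latencies_ms: List[float],
                       failures: int) -> bool:
    """True if the observed metrics stay within the steady-state bounds."""
    error_rate = failures / len(latencies_ms)
    return (error_rate <= bounds.max_error_rate
            and percentile(latencies_ms, 99) <= bounds.max_p99_latency_ms)


def make_service(fault_active: bool, seed: int = 0) -> Callable[[], CallResult]:
    """Simulated service; the injected fault adds 500 ms to every request."""
    rng = random.Random(seed)

    def call() -> CallResult:
        latency_ms = rng.uniform(10.0, 50.0)
        if fault_active:
            latency_ms += 500.0  # stand-in for a slow or failing dependency
        # Simulated client timeout: requests slower than 300 ms count as failed.
        return latency_ms, latency_ms < 300.0

    return call


def run_experiment(bounds: SteadyState,
                   call: Callable[[], CallResult],
                   requests: int = 200) -> bool:
    """Send requests, collect metrics, and test the steady-state hypothesis."""
    latencies: List[float] = []
    failures = 0
    for _ in range(requests):
        latency_ms, ok = call()
        latencies.append(latency_ms)
        if not ok:
            failures += 1
    return steady_state_holds(bounds, latencies, failures)


if __name__ == "__main__":
    bounds = SteadyState(max_error_rate=0.01, max_p99_latency_ms=250.0)
    print("control:", run_experiment(bounds, make_service(fault_active=False)))
    print("with fault:", run_experiment(bounds, make_service(fault_active=True)))
```

In this simulation the control run stays within bounds and the faulted run does not, so the hypothesis is disproved and a weakness is exposed. A real experiment would read these metrics from production monitoring instead of a simulated service.
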
principlesofchaos.org · 11 JAN '21

Principles of chaos engineering

Item: Principles of chaos engineering
Rating: 5
Author: Simon Frey

This site defines Chaos Engineering, an empirical approach to building confidence in distributed systems by actively experimenting on them in production. It outlines core principles for proactively identifying systemic weaknesses.

