Actions

Actions are the building blocks of Chaos Engineering experiments and allow you to attack, check, or load test your systems during an experiment run.

For instance, a simple experiment may consist of a single attack, such as increasing CPU usage within a container. In more advanced cases, you may combine several actions, for example, to inject traffic, increase memory usage, and check whether your observability monitor notices anything. Check out Steadybit's Recipes to learn more about how to combine actions into valuable experiments.
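To make the idea of composing actions more concrete, here is a minimal sketch that models the two experiments described above as plain data. It is purely illustrative: the class names, action names, and parameters are hypothetical and do not reflect Steadybit's actual experiment format or API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class ActionKind(Enum):
    ATTACK = "attack"
    CHECK = "check"
    LOAD_TEST = "load_test"


@dataclass
class Action:
    kind: ActionKind
    name: str  # hypothetical label, not a real Steadybit action ID
    parameters: Dict[str, object] = field(default_factory=dict)


@dataclass
class Experiment:
    name: str
    actions: List[Action] = field(default_factory=list)

    def kinds_used(self) -> Set[ActionKind]:
        """Return the distinct kinds of actions this experiment contains."""
        return {action.kind for action in self.actions}


# Simple case: a single attack that increases CPU usage in a container.
simple = Experiment(
    name="cpu-stress",
    actions=[Action(ActionKind.ATTACK, "stress-cpu", {"load_percent": 80})],
)

# Advanced case: inject traffic, increase memory usage, and check a monitor.
advanced = Experiment(
    name="memory-pressure-under-load",
    actions=[
        Action(ActionKind.LOAD_TEST, "inject-traffic", {"requests_per_second": 50}),
        Action(ActionKind.ATTACK, "stress-memory", {"fill_percent": 90}),
        Action(ActionKind.CHECK, "monitor-status", {"expected": "alert"}),
    ],
)

print(sorted(kind.value for kind in advanced.kinds_used()))
```

In practice you would design experiments in the Steadybit platform itself; the sketch only shows that an experiment is a composition of actions of different kinds.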

An action always comes from a Steadybit extension and is submitted to the platform via the Steadybit Agent.

If you haven't installed any extensions, only one action is available: a simple wait action.

Kinds of Actions

The following sections will explain the supported kinds of actions in more detail.

Attacks

Attacks, as the name implies, have a deliberate negative impact on your system. For example, you may decide to:

Kill a set of containers or simulate a Kubernetes deployment rollout (state attack).
Increase the memory or CPU usage on a host (resource attack).
Artificially increase network latency or reconfigure AWS VPC configurations to simulate an availability zone outage (network attack).

In the end, every attack serves the same purpose: to help you understand how a system behaves under adverse or turbulent conditions. An attack introduces these conditions only for as long as necessary, e.g., until you have observed the effect. Once done, the attack is rolled back (whenever possible) to move the system back into an operational state.
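The apply, observe, and roll back lifecycle described above can be sketched as a context manager. This is an assumption-laden illustration, not how Steadybit extensions are implemented: the `attack` helper and the in-memory `system` dictionary are hypothetical stand-ins so the example runs without touching real infrastructure.

```python
import time
from contextlib import contextmanager


@contextmanager
def attack(apply, rollback):
    """Apply an adverse condition and always attempt to roll it back."""
    apply()
    try:
        yield
    finally:
        # Runs even if the observation step raises an exception.
        rollback()


# Hypothetical stand-in for a real system under test.
system = {"extra_latency_ms": 0}
observations = []


def inject_latency():
    system["extra_latency_ms"] = 500


def remove_latency():
    system["extra_latency_ms"] = 0


with attack(inject_latency, remove_latency):
    # Hold the condition only as long as needed to observe the effect.
    time.sleep(0.1)
    observations.append(system["extra_latency_ms"])

print(observations, system)
```

Putting the rollback in a `finally` block mirrors the intent that the system returns to an operational state even when something goes wrong while the attack is active.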

You can learn about all supported attacks within our Reliability Hub.

Checks

Checks (sometimes called probes) act as verifications or assertions within experiments. Once you progress past one-off executions of attacks, you should verify your expectations automatically. This not only makes execution easier but also improves consistency and knowledge sharing with team members. On top of this, automatic verification is faster, which reduces how long attacks need to run and how long your system is affected by them.

You can incorporate various checks into your experiments. Some examples:

Ensure that no Kubernetes rollout is in progress before starting an attack (see Kubernetes Deployment Rollout Status check in Reliability Hub)
Verify that your Datadog monitors are reporting issues when there are issues (injected by attacks, see Datadog Monitor Status check in Reliability Hub)
Run Prometheus PromQL queries to check internal system metrics (see Prometheus Metrics check in Reliability Hub)
Leverage your Postman API tests to check that your API is still working as intended while a part of your system is under attack (see Postman Collection check in Reliability Hub)

When they fail, most checks immediately abort the experiment run and mark it as failed. You can learn about all supported checks within our Reliability Hub.
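As a rough sketch of this fail-fast behavior, the following example polls a check for a fixed duration and stops at the first unmet expectation. The function `run_with_check`, the result strings, and the simulated monitor are all hypothetical and only illustrate the idea; they are not part of Steadybit.

```python
import time


def run_with_check(check, duration_s, interval_s=0.01):
    """Poll `check` until `duration_s` elapses; stop at the first failure."""
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        if not check():
            # Abort immediately instead of waiting for the full duration.
            return "failed"
        time.sleep(interval_s)
    return "completed"


# Simulated monitor that reports a problem on the third poll.
statuses = iter(["ok", "ok", "alert"])


def monitor_is_ok():
    return next(statuses, "ok") == "ok"


aborted_run = run_with_check(monitor_is_ok, duration_s=0.2)
healthy_run = run_with_check(lambda: True, duration_s=0.05)

print(aborted_run, healthy_run)
```

Aborting on the first failed poll is what keeps the run short: the experiment does not keep attacking a system that has already violated the expectation.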

Load Tests

Within a chaos engineering experiment, load tests help you introduce a baseline of expected traffic on your system, primarily when operating in non-production environments. Without any load on the system, some effects of attacks may not be observable, e.g., your observability solution might not identify problems for end-users because there are no end-users.

You can learn about all supported load tests within our Reliability Hub.

Supported Actions

Check out our Reliability Hub to learn about actions you can leverage with Steadybit.

Missing an action? Extend Steadybit!

Are you missing support for a particular action? No problem, Steadybit is extensible, allowing you to use open- and closed-source extensions to enhance its capabilities. Learn more within our extension documentation!
