We spend hours writing code to avoid errors. Today, I wrote code to cause one. On Day 39, I ran a "Game Day" simulation to verify that my observability stack actually works when things go wrong.
The Logic
I added a simple "Kill Switch" to my Python Lambda:
```python
if event.get('chaos') == 'true':
    raise Exception("🔥 Chaos Test Triggered")
```
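In context, the kill switch sits at the top of the handler so nothing else runs first. A minimal sketch (the handler name and the success payload are illustrative placeholders, not my actual function):

```python
import json

def lambda_handler(event, context):
    # Kill switch: fail on demand so the alerting pipeline can be exercised.
    if event.get('chaos') == 'true':
        raise Exception("🔥 Chaos Test Triggered")

    # Normal path (placeholder business logic).
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "ok"}),
    }
```

Because the flag is read from the event payload, the switch costs nothing in production until you deliberately flip it.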
The Drill
1. Deploy: Pushed the sabotaged code.
2. Attack: Invoked the function with `{"chaos": "true"}`.
3. Observation:
   - The execution failed immediately.
   - The CloudWatch `Errors` metric spiked to 1.
   - My `FinanceAgent-Critical-Error` alarm transitioned to the ALARM state.
   - Amazon SNS delivered the payload to my inbox.
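The "attack" step can be scripted rather than clicked through the console. A sketch using a boto3 Lambda client, assuming the function name is passed in (the helper name `run_chaos_drill` is mine, not from the original post):

```python
import json

def run_chaos_drill(lambda_client, function_name):
    """Invoke the function with the chaos payload and report whether it failed."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        Payload=json.dumps({"chaos": "true"}),
    )
    # Lambda surfaces an unhandled runtime exception via the FunctionError field.
    return response.get("FunctionError") == "Unhandled"
```

With real credentials you would call it as `run_chaos_drill(boto3.client("lambda"), "FinanceAgent")`; injecting the client also makes the drill easy to rehearse against a stub.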
Why do this?
Configuration drift is real. Maybe you deleted the SNS subscription by mistake, or the alarm threshold is set too high to ever fire. By manually triggering a failure, you validate the entire pipeline, from the code exception all the way to the notification in your inbox.
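Those drift scenarios can also be audited directly. A sketch that checks one alarm's notification path, assuming CloudWatch and SNS clients are passed in (the helper name `audit_alarm_wiring` is illustrative):

```python
def audit_alarm_wiring(cloudwatch, sns, alarm_name):
    """Return a list of drift problems found on the alarm's notification path."""
    problems = []
    alarms = cloudwatch.describe_alarms(AlarmNames=[alarm_name])["MetricAlarms"]
    if not alarms:
        return [f"alarm {alarm_name} does not exist"]

    topic_arns = alarms[0].get("AlarmActions", [])
    if not topic_arns:
        problems.append("alarm has no SNS action attached")

    for arn in topic_arns:
        subs = sns.list_subscriptions_by_topic(TopicArn=arn)["Subscriptions"]
        if not subs:
            problems.append(f"{arn} has no subscribers")
        for sub in subs:
            # Unconfirmed email subscriptions silently drop notifications.
            if sub["SubscriptionArn"] == "PendingConfirmation":
                problems.append(f"{sub['Endpoint']} never confirmed its subscription")
    return problems
```

A passive audit like this catches missing wiring, but only the live Game Day proves the whole chain actually fires, so treat it as a complement, not a replacement.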
Takeaway: Don't trust your alarms until you've seen them ring.
