AWS won't stop charging you when your budget runs out. A leaked key, a forgotten GPU instance, a runaway Lambda - and the bill arrives 30 days later. Here's how to deploy budget alerts, Slack notifications, anomaly detection, and an automatic kill switch with Terraform.
A dev team got an $89K bill overnight after committing API keys to GitHub - bots found them in 4 minutes and spun up 500 GPU instances for crypto mining. A dev left a SageMaker notebook running over the holidays - $4,800 gone. A misconfigured Auto Scaling group spun up 200 instances overnight. Setting a budget in the console takes 5 minutes, but nobody does it. Here's how to make it impossible to forget.
AWS does not cap your spending by default. There's no hard limit. No guardrails. Your account is an open credit line to Amazon, and if something goes wrong - a runaway Lambda, a misconfigured Auto Scaling group, a leaked IAM access key - you won't know until the invoice lands in your inbox.
The fix? Budgets, alerts, anomaly detection, and automated actions - all deployed as code so every account gets them from day one.
The 4 Layers of Cost Protection
Most teams stop at Layer 1. That's why they still get surprised.
| Layer | What It Does | Response Time |
|---|---|---|
| Budget Alerts (Email) | Emails billing admins at thresholds | Hours (someone reads the email) |
| SNS → Slack | Posts to your team channel instantly | Minutes (someone sees Slack) |
| Cost Anomaly Detection | ML-powered spike detection | Hours (catches the weird stuff) |
| Budget Actions | Auto-applies deny policies or stops instances | Seconds once the threshold is detected (fully automated) |
Let's deploy all four.
Layer 1: Budget Alerts with Terraform
The aws_budgets_budget resource is the foundation. This sets up email alerts at 50%, 80%, and 100% of actual monthly spend, plus a forecast-based alert at 90%:
resource "aws_budgets_budget" "monthly" {
name = "${var.account_alias}-monthly-budget"
budget_type = "COST"
limit_amount = var.monthly_budget
limit_unit = "USD"
time_unit = "MONTHLY"
cost_types {
include_tax = true
include_subscription = true
include_support = true
include_discount = false
include_refund = false
include_credit = false
use_blended = false
}
# Alert at 50% actual spend
notification {
comparison_operator = "GREATER_THAN"
threshold = 50
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.alert_emails
}
# Alert at 80% actual spend
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.alert_emails
}
# Alert at 100% actual spend
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.alert_emails
}
# Alert at 90% FORECASTED spend (early warning!)
notification {
comparison_operator = "GREATER_THAN"
threshold = 90
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = var.alert_emails
}
}
variable "monthly_budget" {
type = string
description = "Monthly budget in USD"
default = "1000"
}
variable "account_alias" {
type = string
description = "Account alias for naming"
}
variable "alert_emails" {
type = list(string)
description = "Email addresses for budget alerts"
}
⚠️ Critical gotcha: FORECASTED alerts warn you before you hit the limit by projecting current trends to the end of the month. Most teams only set ACTUAL alerts and get notified after the money is already gone. AWS needs ~5 weeks of usage data to generate forecasts, so set these up early. Always include at least one forecasted threshold.
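Cost: At the time of writing, the first two budgets per account are free and additional budgets cost $0.02/day (~$0.62/month). AWS has adjusted Budgets pricing over time, so confirm current rates on the AWS Budgets pricing page. Either way, there's basically zero excuse not to have them.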
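To make the difference between ACTUAL and FORECASTED notifications concrete, here is a minimal Python sketch of the threshold logic that the four notification blocks above describe. It is an illustration only: fired_alerts and its arguments are hypothetical names, and AWS evaluates budgets on its own schedule with its own forecasting model.

```python
# Illustrative only: mirrors the four notification blocks in the budget above.
THRESHOLDS = [
    ("ACTUAL", 50),
    ("ACTUAL", 80),
    ("ACTUAL", 100),
    ("FORECASTED", 90),
]


def fired_alerts(limit, actual, forecast):
    """Return the (type, threshold) pairs whose GREATER_THAN condition holds."""
    fired = []
    for kind, pct in THRESHOLDS:
        # ACTUAL compares money already spent; FORECASTED compares the projection.
        spend = actual if kind == "ACTUAL" else forecast
        if spend / limit * 100 > pct:
            fired.append((kind, pct))
    return fired


# Mid-month: $450 spent so far, projected to reach $950 by month end.
print(fired_alerts(limit=1000, actual=450, forecast=950))
```

In this scenario only the forecasted alert fires: actual spend is still under 50%, but the projection already exceeds 90% of the limit, which is exactly the early warning the ACTUAL thresholds cannot give.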
Layer 2: SNS → Slack Notifications
Email alerts get buried. Slack alerts get seen. Here's the full pipeline:
Budget → SNS Topic → Lambda → Slack Webhook
Step 1: Create the SNS topic and wire it to the budget
resource "aws_sns_topic" "budget_alerts" {
name = "budget-alerts"
tags = {
Environment = "shared"
ManagedBy = "terraform"
}
}
# SNS topic policy - allow AWS Budgets to publish
resource "aws_sns_topic_policy" "budget_alerts" {
arn = aws_sns_topic.budget_alerts.arn
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "AllowBudgetsPublish"
Effect = "Allow"
Principal = { Service = "budgets.amazonaws.com" }
Action = "SNS:Publish"
Resource = aws_sns_topic.budget_alerts.arn
Condition = {
StringEquals = {
"aws:SourceAccount" = data.aws_caller_identity.current.account_id
}
}
}
]
})
}
data "aws_caller_identity" "current" {}
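# Update the budget to publish to SNS. This block replaces the Layer 1
# "monthly" resource: budget names must be unique within an account, so
# keep only one of the two definitions.
resource "aws_budgets_budget" "monthly_with_sns" {
name = "${var.account_alias}-monthly-budget"
budget_type = "COST"
limit_amount = var.monthly_budget
limit_unit = "USD"
time_unit = "MONTHLY"
cost_types {
include_tax = true
include_subscription = true
include_support = true
include_discount = false
include_refund = false
include_credit = false
use_blended = false
}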
notification {
comparison_operator = "GREATER_THAN"
threshold = 50
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.alert_emails
subscriber_sns_topic_arns = [aws_sns_topic.budget_alerts.arn]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.alert_emails
subscriber_sns_topic_arns = [aws_sns_topic.budget_alerts.arn]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.alert_emails
subscriber_sns_topic_arns = [aws_sns_topic.budget_alerts.arn]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 90
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = var.alert_emails
subscriber_sns_topic_arns = [aws_sns_topic.budget_alerts.arn]
}
}
Step 2: Deploy a Lambda function that posts to Slack
# IAM role for Lambda
data "aws_iam_policy_document" "lambda_assume" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["lambda.amazonaws.com"]
}
}
}
resource "aws_iam_role" "budget_slack_lambda" {
name = "budget-alert-slack-lambda"
assume_role_policy = data.aws_iam_policy_document.lambda_assume.json
}
resource "aws_iam_role_policy_attachment" "lambda_basic" {
role = aws_iam_role.budget_slack_lambda.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
}
# Lambda function
data "archive_file" "budget_slack" {
type = "zip"
source_file = "${path.module}/functions/budget_slack.py"
output_path = "${path.module}/functions/budget_slack.zip"
}
resource "aws_lambda_function" "budget_slack" {
filename = data.archive_file.budget_slack.output_path
source_code_hash = data.archive_file.budget_slack.output_base64sha256
function_name = "budget-alert-to-slack"
role = aws_iam_role.budget_slack_lambda.arn
handler = "budget_slack.lambda_handler"
runtime = "python3.12"
timeout = 30
memory_size = 128
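environment {
variables = {
# var.slack_webhook_url is not declared in this post; declare it as a string variable with sensitive = true
SLACK_WEBHOOK_URL = var.slack_webhook_url
}
}
}
# SNS → Lambda subscription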
resource "aws_sns_topic_subscription" "budget_to_slack" {
topic_arn = aws_sns_topic.budget_alerts.arn
protocol = "lambda"
endpoint = aws_lambda_function.budget_slack.arn
}
resource "aws_lambda_permission" "sns_invoke" {
statement_id = "AllowSNSInvoke"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.budget_slack.function_name
principal = "sns.amazonaws.com"
source_arn = aws_sns_topic.budget_alerts.arn
}
The Lambda function code (Python):
# budget_slack.py
import json
import os
import urllib.request


def lambda_handler(event, context):
    """Triggered by an SNS budget notification; forwards it to Slack."""
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    for record in event["Records"]:
        raw = record["Sns"]["Message"]
        # Budget notifications can arrive as plain text rather than JSON,
        # so fall back to forwarding the raw body if parsing fails.
        try:
            message = json.loads(raw)
        except json.JSONDecodeError:
            message = None
        if isinstance(message, dict):
            account = message.get("account", "Unknown")
            budget = message.get("budgetName", "Unknown")
            threshold = message.get("threshold", "?")
            actual = message.get("actualAmount", "?")
            limit = message.get("budgetLimit", "?")
            unit = message.get("unit", "USD")
            # Pick emoji based on threshold
            try:
                pct = float(threshold)
            except (TypeError, ValueError):
                pct = 0.0
            if pct >= 100:
                emoji = "🔴"
            elif pct >= 80:
                emoji = "🟠"
            else:
                emoji = "🟡"
            text = (
                f"{emoji} *AWS Budget Alert*\n"
                f"• Budget: *{budget}*\n"
                f"• Spent: *{actual} {unit}* of {limit} {unit} "
                f"(*{threshold}%* threshold crossed)\n"
                f"• Account: {account}"
            )
        else:
            text = f"*AWS Budget Alert*\n{raw}"
        req = urllib.request.Request(
            webhook_url,
            data=json.dumps({"text": text}).encode(),
            headers={"Content-Type": "application/json"},
        )
        # Time out rather than hang until the Lambda itself is killed
        urllib.request.urlopen(req, timeout=10)
Now your team sees this in Slack as soon as AWS Budgets registers a crossed threshold:
🟠 AWS Budget Alert
• Budget: payment-api-prod-monthly-budget
• Spent: $812.50 USD of $1,000.00 USD (80% threshold crossed)
• Account: 123456789012
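To try the message formatting without deploying anything, the formatting step can be pulled into a pure function and fed a hand-built SNS event. The sketch below assumes the message body is JSON with the same keys the handler above reads (budgetName, actualAmount, and so on); format_alert and fake_sns_event are hypothetical helpers, not part of any AWS SDK, and plain-text severity labels stand in for the emoji.

```python
import json


def format_alert(message):
    """Build the Slack text for one parsed budget message (a dict)."""
    threshold = message.get("threshold", "?")
    try:
        pct = float(threshold)
    except (TypeError, ValueError):
        pct = 0.0
    if pct >= 100:
        level = "CRITICAL"
    elif pct >= 80:
        level = "WARNING"
    else:
        level = "INFO"
    unit = message.get("unit", "USD")
    return (
        f"[{level}] AWS Budget Alert\n"
        f"Budget: {message.get('budgetName', 'Unknown')}\n"
        f"Spent: {message.get('actualAmount', '?')} {unit} "
        f"of {message.get('budgetLimit', '?')} {unit} "
        f"({threshold}% threshold crossed)\n"
        f"Account: {message.get('account', 'Unknown')}"
    )


def fake_sns_event(payload):
    """Wrap a dict in the Records/Sns/Message envelope Lambda receives from SNS."""
    return {"Records": [{"Sns": {"Message": json.dumps(payload)}}]}


# Hand-written sample values, not real AWS output.
event = fake_sns_event({
    "account": "123456789012",
    "budgetName": "payment-api-prod-monthly-budget",
    "threshold": "80",
    "actualAmount": "812.50",
    "budgetLimit": "1000.00",
})
for record in event["Records"]:
    print(format_alert(json.loads(record["Sns"]["Message"])))
```

Keeping the formatting separate from the HTTP call means the severity mapping can be unit-tested locally, and the webhook is only exercised once the function is deployed.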
Layer 3: Cost Anomaly Detection (Catch the Weird Stuff)
Budgets catch gradual overspend. But what about sudden spikes? A developer accidentally launches 8 p4d.24xlarge GPU instances instead of t3.medium? That's about $250/hour instead of $0.04/hour for a single t3.medium, and a budget alert might not fire until the damage is done.
AWS Cost Anomaly Detection uses ML to spot unusual patterns. Deploy it in a few resource blocks:
# Monitor all services for anomalies
resource "aws_ce_anomaly_monitor" "service_monitor" {
name = "all-services-anomaly-monitor"
monitor_type = "DIMENSIONAL"
monitor_dimension = "SERVICE"
tags = {
ManagedBy = "terraform"
}
}
# SNS topic for anomaly alerts
resource "aws_sns_topic" "anomaly_alerts" {
name = "cost-anomaly-alerts"
}
resource "aws_sns_topic_policy" "anomaly_alerts" {
arn = aws_sns_topic.anomaly_alerts.arn
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "AllowCostAnomalyPublish"
Effect = "Allow"
Principal = { Service = "costalerts.amazonaws.com" }
Action = "SNS:Publish"
Resource = aws_sns_topic.anomaly_alerts.arn
}
]
})
}
# Subscribe to anomaly alerts - notify when impact >= $100 AND >= 20%
resource "aws_ce_anomaly_subscription" "alerts" {
name = "cost-anomaly-alerts"
frequency = "IMMEDIATE"
monitor_arn_list = [
aws_ce_anomaly_monitor.service_monitor.arn
]
subscriber {
type = "SNS"
address = aws_sns_topic.anomaly_alerts.arn
}
# Alert when BOTH conditions are true:
# - Anomaly cost impact ≥ $100 (ignore trivial spikes)
# - Anomaly cost impact ≥ 20% above baseline (catch real anomalies)
threshold_expression {
and {
dimension {
key = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
match_options = ["GREATER_THAN_OR_EQUAL"]
values = ["100"]
}
}
and {
dimension {
key = "ANOMALY_TOTAL_IMPACT_PERCENTAGE"
match_options = ["GREATER_THAN_OR_EQUAL"]
values = ["20"]
}
}
}
}
💡 Pro tip: Use BOTH ANOMALY_TOTAL_IMPACT_ABSOLUTE AND ANOMALY_TOTAL_IMPACT_PERCENTAGE together. Percentage-only alerts fire on a $5 → $10 spike (100% increase, but who cares). Absolute-only alerts fire on routine swings in a high-spend account, where a $10,000 → $10,150 day is only a 1.5% change. Combining both cuts the noise and still catches real problems.
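The effect of requiring both thresholds can be sketched in a few lines of Python. This is a simplified model, not how AWS computes impact internally: should_alert is a hypothetical helper, the defaults match the $100 and 20% values in the subscription above, and the dollar figures are illustrative.

```python
def should_alert(impact_usd, impact_pct, min_usd=100.0, min_pct=20.0):
    """True only when BOTH thresholds are met, like the two `and` blocks above."""
    return impact_usd >= min_usd and impact_pct >= min_pct


# (absolute impact in USD, percentage impact) for three illustrative spikes.
cases = {
    "small account, $5 -> $10": (5.0, 100.0),
    "large account, $10,000 -> $10,150": (150.0, 1.5),
    "real spike, $1,000 -> $1,200": (200.0, 20.0),
}
for label, (usd, pct) in cases.items():
    print(f"{label}: alert={should_alert(usd, pct)}")
```

Only the third case alerts. The first passes the percentage test but not the dollar test, and the second passes the dollar test but not the percentage test.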
Now you'll get alerts when AWS detects anomalies like:
⚠️ Cost Anomaly Detected
Service: Amazon Elastic Compute Cloud
Impact: +$847/day above baseline
Root Cause: Unusual number of running instances in us-east-1
Severity: High
Cost: Anomaly Detection is free. Zero excuse not to have it.
Layer 4: Budget Actions - The Kill Switch
This is AWS's native automated response system. When a budget threshold is breached, AWS can automatically apply a deny IAM policy to prevent further resource creation, or stop specific EC2/RDS instances. No Lambda required.
Option A: Auto-apply a deny policy (block new resource creation)
# The deny policy - prevents launching new EC2 and RDS instances
resource "aws_iam_policy" "deny_ec2_rds_create" {
name = "budget-deny-ec2-rds-create"
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "DenyNewCompute"
Effect = "Deny"
Action = [
"ec2:RunInstances",
"ec2:StartInstances",
"ec2:CreateVolume",
"rds:CreateDBInstance",
"rds:StartDBInstance",
"sagemaker:CreateNotebookInstance",
"sagemaker:StartNotebookInstance"
]
Resource = "*"
}
]
})
}
# IAM role for AWS Budgets to execute actions
data "aws_iam_policy_document" "budgets_assume" {
statement {
actions = ["sts:AssumeRole"]
principals {
type = "Service"
identifiers = ["budgets.amazonaws.com"]
}
condition {
test = "StringEquals"
variable = "aws:SourceAccount"
values = [data.aws_caller_identity.current.account_id]
}
condition {
test = "ArnLike"
variable = "aws:SourceArn"
values = ["arn:aws:budgets::${data.aws_caller_identity.current.account_id}:budget/*"]
}
}
}
resource "aws_iam_role" "budgets_action" {
name = "budgets-action-execution-role"
assume_role_policy = data.aws_iam_policy_document.budgets_assume.json
}
resource "aws_iam_role_policy" "budgets_action" {
name = "budgets-action-permissions"
role = aws_iam_role.budgets_action.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"iam:AttachGroupPolicy",
"iam:AttachRolePolicy",
"iam:AttachUserPolicy",
"iam:DetachGroupPolicy",
"iam:DetachRolePolicy",
"iam:DetachUserPolicy"
]
Resource = "*"
}
]
})
}
# The Budget Action - auto-applies deny policy at 100% spend
resource "aws_budgets_budget_action" "deny_on_exceed" {
count = var.environment == "prod" ? 0 : 1 # Never in prod!
budget_name = aws_budgets_budget.monthly_with_sns.name
action_type = "APPLY_IAM_POLICY"
approval_model = "AUTOMATIC"
notification_type = "ACTUAL"
execution_role_arn = aws_iam_role.budgets_action.arn
action_threshold {
action_threshold_type = "PERCENTAGE"
action_threshold_value = 100
}
definition {
iam_action_definition {
policy_arn = aws_iam_policy.deny_ec2_rds_create.arn
roles = var.dev_role_names # Roles to restrict
}
}
subscriber {
subscription_type = "EMAIL"
address = var.alert_emails[0]
}
}
⚠️ WARNING: Budget Actions that apply IAM policies block the targeted roles from creating new resources until the next budget period. This is intentional for dev/staging. NEVER use AUTOMATIC approval on production - use MANUAL instead so a human reviews the action before it executes.
The count = var.environment == "prod" ? 0 : 1 is your safety net: as long as var.environment is set correctly, this action cannot exist in a production account.
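It helps to be clear about what the deny policy does and does not cover. The Python sketch below checks a few action names against the same list used in the policy above. It is a deliberate simplification: is_blocked is a hypothetical helper that does exact string matching only, whereas real IAM evaluation also handles wildcards, conditions, and every other policy attached to the principal.

```python
# Same action list as the deny policy above.
DENIED_ACTIONS = {
    "ec2:RunInstances",
    "ec2:StartInstances",
    "ec2:CreateVolume",
    "rds:CreateDBInstance",
    "rds:StartDBInstance",
    "sagemaker:CreateNotebookInstance",
    "sagemaker:StartNotebookInstance",
}


def is_blocked(action):
    """Exact-match check against the deny list (no wildcard handling)."""
    return action in DENIED_ACTIONS


for action in (
    "ec2:RunInstances",
    "ec2:StopInstances",
    "ec2:TerminateInstances",
    "lambda:InvokeFunction",
):
    status = "blocked" if is_blocked(action) else "not covered by this policy"
    print(f"{action}: {status}")
```

Two things stand out. Stopping and terminating instances stays possible, so the team can still clean up after the policy is applied. Services outside the list, such as Lambda here, are untouched, so extend the action list if your spend risk lives elsewhere.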
Option B: Auto-stop specific EC2/RDS instances (surgical kill switch)
resource "aws_budgets_budget_action" "stop_instances" {
count = var.environment == "prod" ? 0 : 1
budget_name = aws_budgets_budget.monthly_with_sns.name
action_type = "RUN_SSM_DOCUMENTS"
approval_model = "AUTOMATIC"
notification_type = "ACTUAL"
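# NOTE: aws_iam_role.budgets_action_ssm is not defined in this post. It must trust
# budgets.amazonaws.com and be allowed to run the SSM automation that stops instances.
execution_role_arn = aws_iam_role.budgets_action_ssm.arn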
action_threshold {
action_threshold_type = "PERCENTAGE"
action_threshold_value = 100
}
definition {
ssm_action_definition {
action_sub_type = "STOP_EC2_INSTANCES"
region = var.region
instance_ids = var.killable_instance_ids
}
}
subscriber {
subscription_type = "EMAIL"
address = var.alert_emails[0]
}
}
💡 Key difference from GCP/Azure: AWS Budget Actions are native, with no Lambda or Cloud Function required. AWS can directly apply IAM policies, SCPs (in Organizations), or stop EC2/RDS instances. Policy-based actions auto-reset at the start of the next budget period. Instance stop actions do NOT auto-reset: instances stay stopped until manually restarted.
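The Multi-Account Budget Matrix
Real companies don't have one account. They have dozens. Here's how to budget all of them from a single Terraform module, applied from the Organizations management (payer) account, where the LinkedAccount filter is available: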
variable "account_budgets" {
type = map(object({
account_id = string
monthly_budget = string
environment = string
alert_emails = list(string)
}))
default = {
"payment-api-prod" = {
account_id = "111111111111"
monthly_budget = "5000"
environment = "prod"
alert_emails = ["finops@company.com", "payments-lead@company.com"]
}
"payment-api-dev" = {
account_id = "222222222222"
monthly_budget = "500"
environment = "dev"
alert_emails = ["eng-lead@company.com"]
}
"ml-pipeline-staging" = {
account_id = "333333333333"
monthly_budget = "2000"
environment = "staging"
alert_emails = ["ml-team@company.com"]
}
}
}
resource "aws_budgets_budget" "per_account" {
for_each = var.account_budgets
name = "${each.key}-budget"
budget_type = "COST"
limit_amount = each.value.monthly_budget
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "LinkedAccount"
values = [each.value.account_id]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 50
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = each.value.alert_emails
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = each.value.alert_emails
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = each.value.alert_emails
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 90
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = each.value.alert_emails
}
}
Add a new account? Add one entry to the map. terraform apply. Done. Every account gets identical protection.
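Because every account's protection now hangs off one map, a typo in that map silently weakens the guardrail. One option is a small pre-flight check before terraform apply. The Python sketch below is an assumption-laden example: validate_account_budgets and VALID_ENVIRONMENTS are hypothetical, and the allowed environments are simply the three values used in the example map above.

```python
import json
import re

# Assumed set of environments, taken from the example map above.
VALID_ENVIRONMENTS = {"prod", "staging", "dev"}


def validate_account_budgets(budgets):
    """Return a list of human-readable problems; an empty list means the map looks sane."""
    problems = []
    for name, cfg in budgets.items():
        if not re.fullmatch(r"\d{12}", str(cfg.get("account_id", ""))):
            problems.append(f"{name}: account_id must be 12 digits")
        try:
            if float(cfg.get("monthly_budget", "")) <= 0:
                problems.append(f"{name}: monthly_budget must be positive")
        except (TypeError, ValueError):
            problems.append(f"{name}: monthly_budget is not a number")
        if cfg.get("environment") not in VALID_ENVIRONMENTS:
            problems.append(f"{name}: unknown environment")
        if not cfg.get("alert_emails"):
            problems.append(f"{name}: no alert_emails")
    return problems


account_budgets = {
    "payment-api-dev": {
        "account_id": "222222222222",
        "monthly_budget": "500",
        "environment": "dev",
        "alert_emails": ["eng-lead@company.com"],
    },
}
assert not validate_account_budgets(account_budgets)
# Terraform can load variable values from a JSON file such as terraform.tfvars.json.
print(json.dumps({"account_budgets": account_budgets}, indent=2))
```

Writing the printed JSON to terraform.tfvars.json keeps the map in a reviewable file rather than in variable defaults, and the check can run in CI before the plan step.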
Per-Service Budgets (Catch the Expensive Outliers)
Some AWS services are particularly dangerous. EC2, SageMaker, and RDS can rack up thousands overnight. Create targeted budgets for your riskiest services:
variable "service_budgets" {
type = map(object({
service_name = string
monthly_budget = string
}))
default = {
ec2 = {
service_name = "Amazon Elastic Compute Cloud - Compute"
monthly_budget = "3000"
}
rds = {
service_name = "Amazon Relational Database Service"
monthly_budget = "2000"
}
sagemaker = {
service_name = "Amazon SageMaker"
monthly_budget = "1000"
}
}
}
resource "aws_budgets_budget" "per_service" {
for_each = var.service_budgets
name = "service-${each.key}-monthly"
budget_type = "COST"
limit_amount = each.value.monthly_budget
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "Service"
values = [each.value.service_name]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.alert_emails
subscriber_sns_topic_arns = [aws_sns_topic.budget_alerts.arn]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = var.alert_emails
subscriber_sns_topic_arns = [aws_sns_topic.budget_alerts.arn]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 90
threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = var.alert_emails
subscriber_sns_topic_arns = [aws_sns_topic.budget_alerts.arn]
}
}
Quick Audit: Check Your Current Budget Coverage
# List ALL existing budgets on your account
aws budgets describe-budgets \
--account-id $(aws sts get-caller-identity --query Account --output text) \
--output table
# Quick check - do you have ANY budgets?
aws budgets describe-budgets \
--account-id $(aws sts get-caller-identity --query Account --output text) \
--query "Budgets | length(@)" \
--output text
# Check anomaly detection monitors
aws ce get-anomaly-monitors --output table
If describe-budgets returns zero results, that account has zero cost guardrails. Fix it today.
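For a slightly richer audit than a raw count, the JSON output of describe-budgets can be summarized into percent-of-limit per budget. The Python sketch below works on a hand-written, trimmed sample in the shape the CLI returns with --output json (BudgetName, BudgetLimit.Amount, CalculatedSpend.ActualSpend.Amount). summarize is a hypothetical helper and the sample values are made up.

```python
import json

# Trimmed, hand-written sample; real output contains more fields per budget.
SAMPLE = """
{
  "Budgets": [
    {
      "BudgetName": "payment-api-dev-monthly-budget",
      "BudgetLimit": {"Amount": "500.0", "Unit": "USD"},
      "CalculatedSpend": {"ActualSpend": {"Amount": "412.5", "Unit": "USD"}}
    }
  ]
}
"""


def summarize(describe_budgets_json):
    """Return (name, percent_used) for each budget in the CLI's JSON output."""
    budgets = json.loads(describe_budgets_json).get("Budgets", [])
    rows = []
    for b in budgets:
        limit = float(b["BudgetLimit"]["Amount"])
        spent = float(
            b.get("CalculatedSpend", {}).get("ActualSpend", {}).get("Amount", 0)
        )
        pct = round(spent / limit * 100, 1) if limit else 0.0
        rows.append((b["BudgetName"], pct))
    return rows


rows = summarize(SAMPLE)
if not rows:
    print("No budgets found: this account has no cost guardrails.")
for name, pct in rows:
    print(f"{name}: {pct}% of limit used")
```

To run it against a real account, save the output of the first describe-budgets command above with --output json instead of table, and pass the file contents to summarize.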
Architect Pro Tips
Always use FORECASTED alerts - ACTUAL alerts tell you money is already spent. FORECASTED alerts tell you money will be spent. The forecasted alert at 90% is the single most valuable budget notification you can deploy. Note: AWS needs ~5 weeks of data to generate forecasts.
Don't over-alert - Alerting at 10%, 20%, 30%... creates noise and gets ignored. Stick to 3-4 strategic thresholds (50% info, 80% warning, 100% critical, 90% forecasted). Alert fatigue kills FinOps.
Billing data is delayed - AWS billing data updates up to 3 times per day, and there can be 24+ hours of lag. Set your budgets slightly BELOW your true pain threshold to account for this. If $1,000 is your real limit, set the budget to $800.
First 2 budgets are free - AWS gives you 2 free budgets per account. Each additional budget is ~$0.62/month. For most teams, you need 3-5 budgets (account-level + per-service). That's roughly $2/month for complete cost visibility.
Budget Actions reset differently - IAM/SCP policy actions auto-reset at the start of the next budget period. Instance stop actions do NOT: stopped instances stay stopped. Plan your automation accordingly.
Combine Budget Actions with Organizations SCPs - For multi-account setups, Budget Actions can apply Service Control Policies from the management account to member accounts. This is the most powerful kill switch available, because it can block ALL resource creation org-wide.
Use Cost Allocation Tags - Per-team budgets filtered by CostCenter or Team tags only work if resources are tagged and the tags are activated as cost allocation tags in the Billing console. Enforce tagging with AWS Organizations Tag Policies first, then build tag-filtered budgets on top.
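The "set the budget below your real limit" advice from the billing-delay tip comes down to simple arithmetic: whatever you burn per hour keeps accruing for as long as billing data lags. A rough Python sketch follows, where overshoot_usd and padded_budget are hypothetical helpers and the burn rates are illustrative numbers.

```python
def overshoot_usd(hourly_burn, lag_hours):
    """Spend that can accrue before delayed billing data reflects it."""
    return hourly_burn * lag_hours


def padded_budget(real_limit, hourly_burn, lag_hours):
    """Budget to configure so the 100% alert lands before the real limit."""
    return max(real_limit - overshoot_usd(hourly_burn, lag_hours), 0.0)


# A steady $8/hour workload with roughly a day of billing lag.
print(padded_budget(real_limit=1000, hourly_burn=8, lag_hours=24))
# The GPU mistake from Layer 3: about $250/hour for the same 24 hours.
print(overshoot_usd(hourly_burn=250, lag_hours=24))
```

The first call prints 808, close to the $800 suggested above for a $1,000 limit. The second prints 6000, which shows why padding alone cannot protect against a runaway spike: this is the gap anomaly detection and Budget Actions are meant to cover.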
Quick Reference: What to Deploy First
| Layer | Effort | Cost | Impact |
|---|---|---|---|
| Budget alerts (email) | 5 min | Free (first 2) | Baseline visibility |
| Forecasted spend alert | 2 min | Free | Early warning before overspend |
| SNS → Slack | 20 min | ~$0/month (free tier) | Team-wide awareness |
| Cost Anomaly Detection | 5 min | Free | ML-powered spike detection |
| Budget Actions (non-prod) | 15 min | Free | Automatic cost cap |
| Multi-account budget map | 10 min | ~$0.62/budget/month | Org-wide protection |
| Per-service budgets | 10 min | ~$0.62/budget/month | Targeted monitoring |
Start with budget alerts and anomaly detection. They're both free, they take 10 minutes total, and together they're the best thing you can do to avoid a surprise bill.
TL;DR
Budget alerts = FREE (first 2), takes 5 min, no excuse to skip
FORECASTED alerts = warns you BEFORE you hit the limit (needs ~5 weeks data)
SNS → Slack = real-time team visibility, pennies/month
Cost Anomaly Detection = FREE, ML-powered, catches spikes budgets miss
Budget Actions = native kill switch - deny policies or stop instances
for_each budgets = one Terraform map protects every account
Billing delay = 24+ hours of lag, so set budgets BELOW your true limit
No hard cap exists = AWS will never stop charging you automatically
Bottom line: AWS will happily charge you $72K while you sleep. Budget alerts are free, take 5 minutes, and are the only thing standing between you and a career-ending invoice. Deploy them now.
Your dev account doesn't have a budget alert yet, does it? Run aws budgets describe-budgets right now - if it returns zero, go deploy that aws_budgets_budget resource. It's free and takes 5 minutes. Your future self (and your CFO) will thank you.