Suhas Mallesh

Posted on Feb 18

Your Azure Bill Hit $20K and Nobody Noticed 😱 Deploy Budget Alerts as Code with Terraform

#azure #terraform #devops #cloud

By the time you check Azure Cost Management manually, the damage is done. Here's how to deploy budget alerts, anomaly detection, and Slack/Teams notifications as code with Terraform, so overspend gets caught within a day instead of when the invoice arrives.

Last month, a dev team spun up 12 GPU VMs for a weekend ML experiment. They forgot to tear them down. Cost: $6,400 in 9 days. Nobody knew until the invoice arrived. 💀

Sound familiar? Here's the thing — Azure Cost Management has budgets and alerts built in. But if they're not deployed, they don't exist. And if they're configured manually through the portal, they drift, get missed during new subscription provisioning, and nobody remembers who set them up.

Budget alerts belong in code. Every subscription, every resource group, every environment — provisioned automatically alongside the infrastructure it monitors. Zero manual clicks. Zero gaps.

Let's wire it up. ⚡

💸 The Real Cost of "We'll Check It Later"

Azure evaluates budgets every 24 hours, against cost data that typically lags 8 to 24 hours. But here's the catch: plenty of subscriptions don't have budgets deployed at all, and the ones that do often have gaps:

❌ No budget alerts = Overspend discovered at invoice (30 days too late)
❌ Portal-only budgets = Forgotten when new subscriptions are created
❌ Email-only alerts = Buried in inboxes, never acted on
❌ No forecasted alerts = Spike hits 100% before anyone reacts
❌ No anomaly detection = Rogue resources run for weeks unnoticed

A proper cost alerting strategy has three layers:

Layer	What It Catches	How Fast
Budget alerts (actual)	Spend crossing thresholds	Usually within a day
Budget alerts (forecasted)	Projected overspend before it happens	Days to weeks before the spend lands
Anomaly detection	Unusual spending patterns	Daily, a day or two after the spike

Let's deploy all three with Terraform. 🎯

🛡️ Step 1: Subscription-Level Budget with Smart Thresholds

Don't set one alert at 100%. By then the money is spent. Use progressive thresholds that escalate notifications:

# budgets/subscription-budget/main.tf

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 4.0"
    }
  }
}

provider "azurerm" {
  features {}
}

data "azurerm_subscription" "current" {}

# --- Action Group: Where alerts go ---
resource "azurerm_resource_group" "cost_mgmt" {
  name     = "rg-cost-management"
  location = "eastus"

  tags = {
    Environment = "shared"
    CostCenter  = "platform"
    Owner       = "team-finops"
    Project     = "cost-governance"
    ManagedBy   = "terraform"
  }
}

resource "azurerm_monitor_action_group" "cost_alerts" {
  name                = "ag-cost-alerts"
  resource_group_name = azurerm_resource_group.cost_mgmt.name
  short_name          = "CostAlert"

  email_receiver {
    name          = "FinOps Team"
    email_address = "finops@company.com"
  }

  email_receiver {
    name          = "Engineering Lead"
    email_address = "eng-lead@company.com"
  }

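  # Optional: Slack/Teams via webhook. Their incoming webhooks expect their own
  # JSON format, so point this at a relay (e.g. a Logic App) that reformats the
  # alert rather than at the raw Slack/Teams URL.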
  webhook_receiver {
    name        = "SlackNotification"
    service_uri = var.slack_webhook_url
  }
}

# --- Subscription Budget with Progressive Alerts ---
resource "azurerm_consumption_budget_subscription" "monthly" {
  name            = "budget-monthly-subscription"
  subscription_id = data.azurerm_subscription.current.id
  amount          = var.monthly_budget_amount
  time_grain      = "Monthly"

  time_period {
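    # Must be the 1st of the current month or of a future month
    start_date = "2026-02-01T00:00:00Z"
    # No end_date: Azure defaults it to 10 years after start_date.
    # time_grain = "Monthly" resets the tracked spend every month ✅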
  }

  # 🟢 50% — Informational (heads up, halfway there)
  notification {
    enabled        = true
    threshold      = 50.0
    operator       = "GreaterThan"
    threshold_type = "Actual"
    contact_emails = ["finops@company.com"]
  }

  # 🟡 80% — Warning (time to investigate)
  notification {
    enabled        = true
    threshold      = 80.0
    operator       = "GreaterThan"
    threshold_type = "Actual"
    contact_emails = ["finops@company.com", "eng-lead@company.com"]
    contact_groups = [azurerm_monitor_action_group.cost_alerts.id]
  }

  # 🔴 100% — Budget exceeded (action required)
  notification {
    enabled        = true
    threshold      = 100.0
    operator       = "GreaterThan"
    threshold_type = "Actual"
    contact_emails = ["finops@company.com", "eng-lead@company.com", "vp-eng@company.com"]
    contact_groups = [azurerm_monitor_action_group.cost_alerts.id]
    contact_roles  = ["Owner", "Contributor"]
  }

  # 🔮 90% FORECASTED — Predict overspend BEFORE it happens
  notification {
    enabled        = true
    threshold      = 90.0
    operator       = "GreaterThan"
    threshold_type = "Forecasted"
    contact_emails = ["finops@company.com", "eng-lead@company.com"]
    contact_groups = [azurerm_monitor_action_group.cost_alerts.id]
  }
}

Why this works: The forecasted alert at 90% is your early warning system. Azure projects your month-end spend based on current burn rate — so you get notified before you actually overspend. That's the difference between preventing a $5K overrun and explaining one. 📬
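Step 1 references `var.monthly_budget_amount` and `var.slack_webhook_url` but never declares them, so `terraform plan` fails with "Reference to undeclared input variable" as written. A minimal sketch of the missing declarations; the file name, descriptions, and the validation rule are assumptions, not part of the original setup:

```hcl
# budgets/subscription-budget/variables.tf (file name is an assumption)

variable "monthly_budget_amount" {
  type        = number
  description = "Monthly budget for the subscription, in the billing currency"

  # Reject zero or negative budgets at plan time
  validation {
    condition     = var.monthly_budget_amount > 0
    error_message = "monthly_budget_amount must be greater than zero."
  }
}

variable "slack_webhook_url" {
  type        = string
  description = "HTTPS endpoint the action group's webhook_receiver posts to"
  sensitive   = true # redacts the URL from plan/apply output
}
```

Passing the URL through the `TF_VAR_slack_webhook_url` environment variable keeps it out of `.tfvars` files in version control.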

🔬 Step 2: Per-Team Budgets Using Tag Filters

Subscription-level budgets catch the big picture. But you also need team-level accountability. Use tag-based filtering (built on the tagging from Part 1) to create budgets per cost center:

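# budgets/per-team-budgets/main.tf
# Reuses data.azurerm_subscription.current and the cost_alerts action group
# from Step 1, so it must be applied in the same root module.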

variable "team_budgets" {
  description = "Budget configuration per team"
  type = map(object({
    amount       = number
    cost_center  = string
    alert_emails = list(string)
  }))
  default = {
    payments = {
      amount       = 8000
      cost_center  = "CC-1042"
      alert_emails = ["payments-team@company.com"]
    }
    data-platform = {
      amount       = 15000
      cost_center  = "CC-2001"
      alert_emails = ["data-team@company.com"]
    }
    frontend = {
      amount       = 3000
      cost_center  = "CC-3005"
      alert_emails = ["frontend-team@company.com"]
    }
  }
}

resource "azurerm_consumption_budget_subscription" "per_team" {
  for_each = var.team_budgets

  name            = "budget-team-${each.key}"
  subscription_id = data.azurerm_subscription.current.id
  amount          = each.value.amount
  time_grain      = "Monthly"

  time_period {
    start_date = "2026-02-01T00:00:00Z"
  }

  # Filter by CostCenter tag — only counts resources tagged to this team
  filter {
    tag {
      name   = "CostCenter"
      values = [each.value.cost_center]
    }
  }

  notification {
    enabled        = true
    threshold      = 80.0
    operator       = "GreaterThan"
    threshold_type = "Actual"
    contact_emails = each.value.alert_emails
  }

  notification {
    enabled        = true
    threshold      = 100.0
    operator       = "GreaterThan"
    threshold_type = "Actual"
    contact_emails = concat(each.value.alert_emails, ["finops@company.com"])
    contact_groups = [azurerm_monitor_action_group.cost_alerts.id]
  }

  notification {
    enabled        = true
    threshold      = 90.0
    operator       = "GreaterThan"
    threshold_type = "Forecasted"
    contact_emails = each.value.alert_emails
  }
}

Now each team sees their own spend and gets alerted when their resources are trending over budget — not the entire subscription's noise. This is where the mandatory tagging from Part 1 pays off. 🏷️ → 💰
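Tag filters only count resources that actually carry the tag. Where a team owns entire resource groups, a budget scoped to the resource group itself doesn't depend on tag coverage at all. A sketch using `azurerm_consumption_budget_resource_group`, assuming a hypothetical `rg-payments-prod` resource group and mirroring the payments entry from `var.team_budgets`:

```hcl
# Hypothetical: the payments team owns the whole resource group "rg-payments-prod"
data "azurerm_resource_group" "payments" {
  name = "rg-payments-prod"
}

resource "azurerm_consumption_budget_resource_group" "payments" {
  name              = "budget-rg-payments-prod"
  resource_group_id = data.azurerm_resource_group.payments.id
  amount            = 8000
  time_grain        = "Monthly"

  # Same start_date rules as the subscription budgets
  time_period {
    start_date = "2026-02-01T00:00:00Z"
  }

  # Every resource in the group counts, tagged or not
  notification {
    enabled        = true
    threshold      = 80.0
    operator       = "GreaterThan"
    threshold_type = "Actual"
    contact_emails = ["payments-team@company.com"]
  }

  notification {
    enabled        = true
    threshold      = 90.0
    operator       = "GreaterThan"
    threshold_type = "Forecasted"
    contact_emails = ["payments-team@company.com"]
  }
}
```

Tag-filtered budgets follow a team across resource groups; resource-group budgets are simpler but only fit teams whose resources stay inside their own groups.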

🤖 Step 3: Anomaly Detection (Catch the Weird Stuff)

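Budgets catch gradual overspend. But what about sudden spikes? Say a developer accidentally provisions 8 Standard_E64s_v5 VMs instead of Standard_E4s_v5. Each E64s_v5 has 16x the vCPUs and memory of an E4s_v5 (64 vs 4 vCPUs), and Ev5 pricing scales roughly linearly with size, so the daily compute bill jumps about 16x. Against a monthly budget, a threshold alert might not fire for days.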

Azure's cost anomaly detection uses ML to spot unusual patterns. Deploy it in one resource block:

# budgets/anomaly-detection/main.tf

resource "azurerm_cost_anomaly_alert" "subscription" {
  name            = "cost-anomaly-alert"
  display_name    = "Cost Anomaly Detection"
  subscription_id = "/subscriptions/${data.azurerm_subscription.current.subscription_id}"
  email_subject   = "⚠️ Cost Anomaly Detected - ${data.azurerm_subscription.current.display_name}"
  email_addresses = ["finops@company.com", "eng-lead@company.com"]
}

That's it. Five lines. Azure runs anomaly detection on your usage daily and emails the listed addresses when it finds something unusual, along these lines:

⚠️ Cost Anomaly Detected - Production Subscription

Anomaly: Daily run rate UP 340% on Feb 15
Resource Group: rg-ml-experiments
Estimated Impact: +$847/day above normal
Top Contributor: Microsoft.Compute/virtualMachines

Now your team catches runaway resources within a day or two instead of at month-end. 🚨

🧱 Step 4: Reusable Budget Module (The Architect's Way)

In a real organization, you're managing 5-20+ subscriptions. You need a module that every team can drop in:

# modules/cost-alerting/main.tf

variable "subscription_id" {
  type = string
}

variable "monthly_budget" {
  type        = number
  description = "Monthly budget in the billing currency (USD in these examples)"
}

variable "alert_emails" {
  type = list(string)
}

variable "action_group_id" {
  type    = string
  default = ""
}

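variable "environment" {
  type        = string
  description = "Selects the threshold set in local.thresholds"

  # local.thresholds[var.environment] fails for any other value
  validation {
    condition     = contains(["prod", "staging", "dev"], var.environment)
    error_message = "environment must be one of: prod, staging, dev."
  }
}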

variable "enable_anomaly_detection" {
  type    = bool
  default = true
}

locals {
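  # First day of the current month (UTC). timestamp() changes on every run,
  # which is why the budget below ignores changes to time_period.
  budget_start = formatdate("YYYY-MM-01'T'00:00:00'Z'", timestamp())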

  # Scale alert urgency by environment
  thresholds = {
    prod    = { warn = 70, critical = 90, forecast = 80 }
    staging = { warn = 80, critical = 100, forecast = 90 }
    dev     = { warn = 90, critical = 110, forecast = 100 }
  }

  env_thresholds = local.thresholds[var.environment]
}

resource "azurerm_consumption_budget_subscription" "this" {
  name            = "budget-${var.environment}-monthly"
  subscription_id = var.subscription_id
  amount          = var.monthly_budget
  time_grain      = "Monthly"

  time_period {
    start_date = local.budget_start
  }

  notification {
    enabled        = true
    threshold      = local.env_thresholds.warn
    operator       = "GreaterThan"
    threshold_type = "Actual"
    contact_emails = var.alert_emails
  }

  notification {
    enabled        = true
    threshold      = local.env_thresholds.critical
    operator       = "GreaterThan"
    threshold_type = "Actual"
    contact_emails = var.alert_emails
    contact_groups = var.action_group_id != "" ? [var.action_group_id] : []
    contact_roles  = ["Owner"]
  }

  notification {
    enabled        = true
    threshold      = local.env_thresholds.forecast
    operator       = "GreaterThan"
    threshold_type = "Forecasted"
    contact_emails = var.alert_emails
  }

  lifecycle {
    ignore_changes = [time_period]
  }
}

resource "azurerm_cost_anomaly_alert" "this" {
  count = var.enable_anomaly_detection ? 1 : 0

  name            = "anomaly-${var.environment}"
  display_name    = "Cost Anomaly - ${var.environment}"
  subscription_id = var.subscription_id
  email_subject   = "⚠️ Anomaly in ${var.environment}"
  email_addresses = var.alert_emails
}
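The module declares inputs but no outputs, so callers can't reference what it created. A small `outputs.tf` sketch (the output names are hypothetical); `one()` returns the single anomaly alert ID, or `null` when `enable_anomaly_detection = false` leaves `count` at zero:

```hcl
# modules/cost-alerting/outputs.tf (hypothetical)

output "budget_id" {
  description = "Resource ID of the subscription budget"
  value       = azurerm_consumption_budget_subscription.this.id
}

output "anomaly_alert_id" {
  description = "Resource ID of the anomaly alert, or null when disabled"
  # Splat over a count-based resource yields [] or a one-element list
  value       = one(azurerm_cost_anomaly_alert.this[*].id)
}
```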

Usage across all your subscriptions:

module "prod_cost_alerts" {
  source          = "./modules/cost-alerting"
  subscription_id = "/subscriptions/aaaa-bbbb-cccc"
  monthly_budget  = 25000
  environment     = "prod"
  alert_emails    = ["finops@company.com", "sre@company.com"]
  action_group_id = azurerm_monitor_action_group.cost_alerts.id
  # → Warns at 70%, critical at 90%, forecast at 80%
}

module "dev_cost_alerts" {
  source          = "./modules/cost-alerting"
  subscription_id = "/subscriptions/dddd-eeee-ffff"
  monthly_budget  = 5000
  environment     = "dev"
  alert_emails    = ["eng-lead@company.com"]
  # → More relaxed: warns at 90%, critical at 110%, forecast at 100%
}

Production gets tighter thresholds than dev because a 20% overspend on a $25K prod subscription is $5,000, while on a $5K dev subscription it's only $1,000. Prioritize accordingly. 🎯
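With more than a couple of subscriptions, one `module` block per subscription gets repetitive. Module `for_each` (Terraform 0.13+) can drive the same module from a single map. A sketch; the staging entry, its subscription ID, and its budget are hypothetical placeholders, and each map key must be `prod`, `staging`, or `dev` because the module looks up its thresholds by environment name:

```hcl
locals {
  # Placeholder IDs: replace with real /subscriptions/<guid> values
  cost_alert_subscriptions = {
    prod = {
      subscription_id = "/subscriptions/aaaa-bbbb-cccc"
      monthly_budget  = 25000
      alert_emails    = ["finops@company.com", "sre@company.com"]
    }
    staging = {
      subscription_id = "/subscriptions/gggg-hhhh-iiii" # hypothetical
      monthly_budget  = 10000                           # hypothetical
      alert_emails    = ["eng-lead@company.com"]
    }
    dev = {
      subscription_id = "/subscriptions/dddd-eeee-ffff"
      monthly_budget  = 5000
      alert_emails    = ["eng-lead@company.com"]
    }
  }
}

module "cost_alerts" {
  source   = "./modules/cost-alerting"
  for_each = local.cost_alert_subscriptions

  subscription_id = each.value.subscription_id
  monthly_budget  = each.value.monthly_budget
  environment     = each.key # also selects the threshold set
  alert_emails    = each.value.alert_emails
  action_group_id = azurerm_monitor_action_group.cost_alerts.id
}
```

This replaces the two explicit module blocks rather than sitting beside them: keeping both would try to create budgets with the same names on the same subscriptions.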

⚡ Quick Audit: Check Your Current Budget Coverage

# List ALL existing budgets on your subscription
az consumption budget list --output table

# Check if you have ANY budgets (if this returns empty — you're flying blind)
az consumption budget list --query "length(@)" --output tsv

# View current spend vs budget for a specific budget
az consumption budget show \
  --budget-name "budget-monthly-subscription" \
  --query "{Name:name, Amount:amount, CurrentSpend:currentSpend.amount, Currency:currentSpend.unit}" \
  --output table

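If az consumption budget list returns zero results on a subscription, that subscription has no subscription-level budget, and unless one exists at a broader scope (management group or billing account), zero cost guardrails. Fix it today. 🚨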

💡 Architect Pro Tips

Always use Forecasted alerts: Actual alerts tell you money is already spent. Forecasted alerts tell you money will be spent. The forecasted alert at 80-90% is the single most valuable budget notification you can deploy.
Don't over-alert: alerts at 10%, 20%, 30%... create noise and get ignored. Stick to 3-4 strategic thresholds (50% info, 80% warning, 100% critical, 90% forecasted). Alert fatigue kills FinOps.
Budget start dates matter: start_date must be the 1st of the current month or of a future month no more than 12 months out; an earlier date makes the apply fail. Changing start_date also forces Terraform to replace the budget, so when you compute it with timestamp(), add lifecycle { ignore_changes = [time_period] } to stop Terraform from updating it every run.
Action Groups are the multiplier: email alone isn't enough. Action groups can call webhooks, Azure Functions, Logic Apps, and Automation runbooks, so alerts can reach Slack, Teams, or PagerDuty, or trigger automation that shuts down non-production resources when critical thresholds are breached. Slack and Teams incoming webhooks expect their own message format, so route alerts through a Logic App or Function that reformats them.
Combine with tags from Part 1: Per-team budgets filtered by CostCenter tag only work if resources are actually tagged. The tagging policies from Part 1 are a prerequisite for this to deliver real value.
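Slack and Teams incoming webhooks only accept their own JSON message formats, so Azure's alert payload posted to them directly won't arrive as a readable message. One way to bridge that is an action group that calls a relay Logic App, which reformats the payload and forwards it. A sketch, assuming it sits in the same root module as Step 1 and that a relay Logic App (hypothetical, not shown) already exists, with its resource ID and HTTP trigger callback URL passed in as variables:

```hcl
# Hypothetical relay Logic App: reshapes the alert and posts it to Slack/Teams
variable "slack_relay_logic_app_id" {
  type        = string
  description = "Resource ID of the relay Logic App"
}

variable "slack_relay_callback_url" {
  type        = string
  description = "HTTP trigger callback URL of the relay Logic App"
  sensitive   = true
}

resource "azurerm_monitor_action_group" "cost_alerts_relay" {
  name                = "ag-cost-alerts-relay"
  resource_group_name = azurerm_resource_group.cost_mgmt.name
  short_name          = "CostRelay" # max 12 characters

  email_receiver {
    name          = "FinOps Team"
    email_address = "finops@company.com"
  }

  # Azure calls the Logic App; the Logic App talks to Slack/Teams
  logic_app_receiver {
    name         = "SlackRelay"
    resource_id  = var.slack_relay_logic_app_id
    callback_url = var.slack_relay_callback_url
  }
}
```

Budgets then reference `azurerm_monitor_action_group.cost_alerts_relay.id` in `contact_groups`, exactly as they do with the original action group.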

📊 TL;DR

Action	Impact	Effort
Subscription budget with 4 thresholds	Catch overspend within a day, not at invoice time	10 minutes
Per-team budgets filtered by CostCenter	Team-level cost accountability	15 minutes
Cost anomaly detection	Catch rogue resources within a day or two	2 minutes
Reusable module across subscriptions	Consistent coverage, zero gaps	20 minutes
Action Group with Slack/Teams webhook	Alerts where your team actually looks	10 minutes

Bottom line: Every Azure subscription should have budget alerts deployed before any workloads go on it. Budgets as code means every new subscription gets cost guardrails automatically — same as it gets networking, RBAC, and policies. Deploy this alongside your tagging strategy from Part 1 and you'll have full cost visibility and proactive alerting in under an hour. 📊

Run az consumption budget list right now. If it returns zero — you've been flying without instruments. Fix it before the next invoice arrives. 😏

This is Part 2 of the "Save on Azure with Terraform" series. Next up: Lights Out! 🌙 — Auto Shutdown & Start Schedules for Azure VMs that can cut your dev/test compute bill by 65%. 💬
