Most EBS volumes are wildly over-provisioned. Here's how to find the bloated ones, safely right-size them, and automate the fix with Terraform.
Here's a question nobody asks often enough: How much of your EBS storage are you actually using?
In most AWS accounts, the answer is terrifyingly low. Teams provision 500GB "just in case" and use 40GB. They request io2 when gp3 would be fine. They set 10,000 IOPS when the volume barely hits 200.
You're paying for every unused gigabyte, every idle IOP, and every spare megabyte of throughput, every second of every day.
Let's find the waste and kill it.
💸 Where the Money Hides
EBS pricing has three dimensions, and most teams overspend on all of them:
EBS Cost = Storage (GB) + IOPS + Throughput
gp3 pricing:
Storage: $0.08/GB/month
IOPS: Free up to 3,000, then $0.005 per provisioned IOPS/month
Throughput: Free up to 125 MB/s, then $0.04 per MB/s/month
io2 pricing:
Storage: $0.125/GB/month
IOPS: $0.065 per provisioned IOPS/month (this is what gets expensive FAST)
A real example from a production account:
| Volume | Provisioned Cost | Actual Usage | Monthly Waste |
|---|---|---|---|
| 500GB gp3, 5000 IOPS | $50/mo | 45GB used, 200 IOPS peak | $35/mo |
| 200GB io2, 10000 IOPS | $675/mo | 80GB used, 1500 IOPS peak | $537/mo |
| 1TB gp3, 3000 IOPS | $80/mo | 120GB used, 500 IOPS peak | $70/mo |
| Total waste | | | $642/mo = $7,704/yr 🤯 |
Three volumes. Nearly $8K/year wasted. And most accounts have dozens.
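Where do those "Provisioned Cost" numbers come from? Here's the arithmetic as a tiny Python sketch, using the list prices quoted above (us-east-1; a back-of-the-envelope helper, not a billing tool):

def monthly_cost(vol_type, size_gb, iops, throughput_mbs=125):
    """Rough monthly EBS cost from the gp3/io2 list prices quoted above."""
    if vol_type == "gp3":
        return (size_gb * 0.08
                + max(0, iops - 3000) * 0.005           # IOPS above the free 3,000
                + max(0, throughput_mbs - 125) * 0.04)  # MB/s above the free 125
    if vol_type == "io2":
        return size_gb * 0.125 + iops * 0.065           # every provisioned IOPS is billed
    raise ValueError(f"no pricing rule for {vol_type}")

print(monthly_cost("gp3", 500, 5000))    # 50.0  -> the $50/mo volume
print(monthly_cost("io2", 200, 10000))   # 675.0 -> the $675/mo volume
print(monthly_cost("gp3", 1000, 3000))   # 80.0  -> the "1TB" volume (1,000 GB)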
🔍 Step 1: Find the Bloated Volumes (Terraform + CloudWatch)
Deploy this monitoring module to identify over-provisioned volumes:
# modules/ebs-monitor/main.tf
resource "aws_lambda_function" "ebs_analyzer" {
filename = data.archive_file.analyzer.output_path
function_name = "ebs-rightsizing-analyzer"
role = aws_iam_role.analyzer.arn
handler = "index.handler"
runtime = "python3.12"
timeout = 300
source_code_hash = data.archive_file.analyzer.output_base64sha256
environment {
variables = {
SNS_TOPIC_ARN = aws_sns_topic.ebs_alerts.arn
LOOKBACK_DAYS = "14"
USAGE_THRESHOLD = "50" # Flag if <50% utilized
}
}
}
data "archive_file" "analyzer" {
type = "zip"
output_path = "${path.module}/analyzer.zip"
source {
content = <<-PYTHON
import boto3
import os
from datetime import datetime, timedelta

ec2 = boto3.client('ec2')
cw = boto3.client('cloudwatch')
sns = boto3.client('sns')

def get_max_hourly_sum(volume_id, metric_name, days):
    """Max hourly total (Sum) of a CloudWatch EBS metric over N days.

    EBS volume metrics are counts per reporting period, so the busiest
    hour's Sum divided by 3600 gives that hour's average per-second rate.
    """
    response = cw.get_metric_statistics(
        Namespace='AWS/EBS',
        MetricName=metric_name,
        Dimensions=[{'Name': 'VolumeId', 'Value': volume_id}],
        StartTime=datetime.utcnow() - timedelta(days=days),
        EndTime=datetime.utcnow(),
        Period=3600,
        Statistics=['Sum']
    )
    points = response.get('Datapoints', [])
    return max((p['Sum'] for p in points), default=0)

def handler(event, context):
    days = int(os.environ['LOOKBACK_DAYS'])
    threshold = int(os.environ['USAGE_THRESHOLD'])
    volumes = ec2.describe_volumes(
        Filters=[{'Name': 'status', 'Values': ['in-use']}]
    )['Volumes']
    recommendations = []
    for vol in volumes:
        vol_id = vol['VolumeId']
        vol_type = vol['VolumeType']
        size_gb = vol['Size']
        provisioned_iops = vol.get('Iops', 0)
        provisioned_tp = vol.get('Throughput', 0)
        # Sustained peak over the lookback period: busiest hour, averaged per second.
        # Short bursts above this aren't captured, hence the 30% headroom below.
        peak_read_ops = get_max_hourly_sum(vol_id, 'VolumeReadOps', days)
        peak_write_ops = get_max_hourly_sum(vol_id, 'VolumeWriteOps', days)
        peak_iops = (peak_read_ops + peak_write_ops) / 3600
        peak_read_bytes = get_max_hourly_sum(vol_id, 'VolumeReadBytes', days)
        peak_write_bytes = get_max_hourly_sum(vol_id, 'VolumeWriteBytes', days)
        peak_throughput = (peak_read_bytes + peak_write_bytes) / 3600 / 1024 / 1024  # MB/s
        savings = []
        # Check IOPS utilization (only types where IOPS are billed separately;
        # gp2 reports Iops too, but they aren't a separate line item)
        if vol_type in ('gp3', 'io1', 'io2') and provisioned_iops > 3000 and peak_iops < provisioned_iops * (threshold / 100):
            recommended_iops = max(3000, int(peak_iops * 1.3))  # 30% headroom
            iops_rate = 0.065 if vol_type in ('io1', 'io2') else 0.005
            iops_savings = (provisioned_iops - recommended_iops) * iops_rate
            # "$$" keeps Terraform from treating ${...} as template interpolation
            savings.append(f"  IOPS: {provisioned_iops} → {recommended_iops} (save $${iops_savings:.2f}/mo)")
        # Check if io1/io2 can downgrade to gp3
        if vol_type in ('io1', 'io2') and peak_iops < 16000 and peak_throughput < 1000:
            current_cost = size_gb * 0.125 + provisioned_iops * 0.065
            gp3_iops = max(3000, int(peak_iops * 1.3))
            gp3_cost = size_gb * 0.08 + max(0, gp3_iops - 3000) * 0.005
            type_savings = current_cost - gp3_cost
            if type_savings > 5:
                savings.append(f"  Type: {vol_type} → gp3 (save $${type_savings:.2f}/mo)")
        # Check throughput utilization (gp3 only)
        if vol_type == 'gp3' and provisioned_tp > 125:
            if peak_throughput < provisioned_tp * (threshold / 100):
                recommended_tp = max(125, int(peak_throughput * 1.3))
                tp_savings = (provisioned_tp - recommended_tp) * 0.04
                savings.append(f"  Throughput: {provisioned_tp} → {recommended_tp} MB/s (save $${tp_savings:.2f}/mo)")
        if savings:
            # Label the finding with the volume's Name tag and attached instance
            attachments = vol.get('Attachments', [])
            instance_id = attachments[0]['InstanceId'] if attachments else 'detached'
            tags = {t['Key']: t['Value'] for t in vol.get('Tags', [])}
            name = tags.get('Name', vol_id)
            recommendations.append(
                f"{name} ({vol_id}) - attached to {instance_id}\n"
                f"  Current: {size_gb}GB {vol_type}, {provisioned_iops} IOPS\n"
                f"  Peak IOPS: {peak_iops:.0f}, Peak Throughput: {peak_throughput:.1f} MB/s\n"
                + "\n".join(savings)
            )
    if recommendations:
        total_recs = len(recommendations)
        message = (
            f"EBS Right-Sizing Report ({total_recs} volumes need attention)\n"
            f"Lookback period: {days} days\n\n"
            + "\n\n".join(recommendations)
        )
        sns.publish(
            TopicArn=os.environ['SNS_TOPIC_ARN'],
            Subject=f'EBS Right-Sizing: {total_recs} volumes over-provisioned',
            Message=message
        )
    return {'volumes_analyzed': len(volumes), 'recommendations': len(recommendations)}
PYTHON
filename = "index.py"
}
}
# Run weekly
resource "aws_cloudwatch_event_rule" "weekly_ebs_check" {
name = "ebs-rightsizing-check"
schedule_expression = "rate(7 days)"
}
resource "aws_cloudwatch_event_target" "ebs_analyzer" {
rule = aws_cloudwatch_event_rule.weekly_ebs_check.name
arn = aws_lambda_function.ebs_analyzer.arn
}
resource "aws_lambda_permission" "allow_eventbridge" {
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.ebs_analyzer.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.weekly_ebs_check.arn
}
resource "aws_sns_topic" "ebs_alerts" {
name = "ebs-rightsizing-alerts"
}
resource "aws_iam_role" "analyzer" {
name = "ebs-rightsizing-analyzer-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = { Service = "lambda.amazonaws.com" }
}]
})
}
resource "aws_iam_role_policy" "analyzer" {
name = "ebs-rightsizing-analyzer-policy"
role = aws_iam_role.analyzer.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = ["ec2:DescribeVolumes"]
Resource = "*"
},
{
Effect = "Allow"
Action = [
"cloudwatch:GetMetricStatistics"
]
Resource = "*"
},
{
Effect = "Allow"
Action = ["sns:Publish"]
Resource = aws_sns_topic.ebs_alerts.arn
},
{
Effect = "Allow"
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
]
Resource = "arn:aws:logs:*:*:*"
}
]
})
}
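Once it's applied, you don't have to wait a week for the first report. A quick boto3 call (a minimal sketch, assuming your local AWS credentials can invoke the function) kicks off a run immediately:

import boto3

# Invoke the analyzer deployed by the Terraform above and print its summary
lambda_client = boto3.client("lambda")
response = lambda_client.invoke(FunctionName="ebs-rightsizing-analyzer")
print(response["Payload"].read().decode())  # e.g. {"volumes_analyzed": 42, "recommendations": 3}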
You'll get a weekly email like this:
EBS Right-Sizing Report (3 volumes need attention)
Lookback period: 14 days

app-server-data (vol-0abc123) - attached to i-0def456
  Current: 500GB gp3, 5000 IOPS
  Peak IOPS: 180, Peak Throughput: 12.3 MB/s
  IOPS: 5000 → 3000 (save $10.00/mo)

database-logs (vol-0xyz789) - attached to i-0ghi012
  Current: 200GB io2, 10000 IOPS
  Peak IOPS: 1420, Peak Throughput: 45.2 MB/s
  IOPS: 10000 → 3000 (save $455.00/mo)
  Type: io2 → gp3 (save $659.00/mo)
Actionable, specific, dollar amounts. No guessing.
🛠️ Step 2: Apply the Right-Sizing (Terraform)
Downgrade io2 → gp3 (biggest savings)
# Before: io2 with expensive provisioned IOPS 💸
resource "aws_ebs_volume" "database_logs" {
availability_zone = "us-east-1a"
size = 200
type = "io2"
iops = 10000 # Paying $650/mo for IOPS alone
tags = { Name = "database-logs" }
}
# After: gp3 with free baseline IOPS ✅
resource "aws_ebs_volume" "database_logs" {
availability_zone = "us-east-1a"
size = 200
type = "gp3"
iops = 3000 # Free baseline
throughput = 125 # Free baseline
tags = { Name = "database-logs" }
}
# Savings: $675/mo → $16/mo = $659/mo saved 🤯
Reduce over-provisioned IOPS
# Before: 5000 IOPS but peaks at 180
resource "aws_ebs_volume" "app_data" {
size = 500
type = "gp3"
iops = 5000 # $10/mo for IOPS you don't use
throughput = 250 # $5/mo for throughput you don't use
tags = { Name = "app-data" }
}
# After: Use free baselines ✅
resource "aws_ebs_volume" "app_data" {
size = 500
type = "gp3"
iops = 3000 # Free! Covers 180 peak with 16x headroom
throughput = 125 # Free! Covers 12 MB/s peak easily
tags = { Name = "app-data" }
}
# Savings: $15/mo β $0 extra = $15/mo saved
Right-Size with Environment-Aware Defaults
# modules/ebs-rightsized/main.tf
variable "environment" {
type = string
}
variable "size_gb" {
type = number
}
variable "workload_type" {
type = string
default = "general" # general, database, logging
validation {
condition = contains(["general", "database", "logging"], var.workload_type)
error_message = "Must be: general, database, or logging."
}
}
locals {
# Smart defaults based on workload + environment
volume_configs = {
general = {
type = "gp3"
iops = 3000 # Free baseline is enough for most workloads
throughput = 125
}
database = {
      type       = "gp3"  # gp3 in every environment; prod just buys extra IOPS/throughput
iops = var.environment == "prod" ? 6000 : 3000
throughput = var.environment == "prod" ? 250 : 125
}
logging = {
type = "gp3"
iops = 3000 # Logs are sequential writes, don't need high IOPS
throughput = var.environment == "prod" ? 250 : 125
}
}
config = local.volume_configs[var.workload_type]
}
resource "aws_ebs_volume" "this" {
availability_zone = var.availability_zone
size = var.size_gb
type = local.config.type
iops = local.config.iops
throughput = local.config.throughput
tags = {
Name = var.name
Environment = var.environment
Workload = var.workload_type
ManagedBy = "terraform"
}
}
Usage:
module "app_volume" {
source = "./modules/ebs-rightsized"
name = "app-data"
environment = "dev"
size_gb = 100
workload_type = "general"
# β gp3, 3000 IOPS (free), 125 MB/s (free) β
}
module "db_volume" {
source = "./modules/ebs-rightsized"
name = "postgres-data"
environment = "prod"
size_gb = 500
workload_type = "database"
# β gp3, 6000 IOPS, 250 MB/s (only pays for extra) β
}
⚡ Quick Audit: Run This Right Now
# Find your most expensive EBS volumes
aws ec2 describe-volumes \
--query 'Volumes[?State==`in-use`].{
ID:VolumeId,
Type:VolumeType,
Size:Size,
IOPS:Iops,
Throughput:Throughput,
Instance:Attachments[0].InstanceId
}' \
--output table
# Find io1/io2 volumes (biggest savings targets)
aws ec2 describe-volumes \
--filters "Name=volume-type,Values=io1,io2" \
--query 'Volumes[].{ID:VolumeId,Size:Size,IOPS:Iops,Instance:Attachments[0].InstanceId}' \
--output table
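The CLI shows attributes, not dollars. If you want a rough cost ranking, here's a boto3 sketch using the gp3/io1/io2 prices from earlier (the gp2/st1/sc1 storage rates below are us-east-1 list prices I'm assuming; adjust for your region):

import boto3

# Assumed us-east-1 storage prices per GB-month; tweak for your region
STORAGE_PRICE = {"gp3": 0.08, "gp2": 0.10, "io1": 0.125, "io2": 0.125, "st1": 0.045, "sc1": 0.015}

def est_monthly_cost(vol):
    """Rough monthly cost of one volume from describe_volumes output."""
    vtype, size = vol["VolumeType"], vol["Size"]
    iops, tput = vol.get("Iops", 0), vol.get("Throughput", 0)
    cost = size * STORAGE_PRICE.get(vtype, 0.08)
    if vtype == "gp3":
        cost += max(0, iops - 3000) * 0.005 + max(0, tput - 125) * 0.04
    elif vtype in ("io1", "io2"):
        cost += iops * 0.065
    return cost

ec2 = boto3.client("ec2")
vols = ec2.describe_volumes(Filters=[{"Name": "status", "Values": ["in-use"]}])["Volumes"]
for v in sorted(vols, key=est_monthly_cost, reverse=True)[:10]:
    print(f"{v['VolumeId']}  {v['VolumeType']:>4}  {v['Size']:>5} GB  ~${est_monthly_cost(v):,.2f}/mo")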
If you see any io1 or io2 volumes, that's where the money is. 🎯
💡 Pro Tips
- Never right-size blind: always check 14+ days of CloudWatch metrics before changing anything
- Add 30% headroom: if peak IOPS is 1,500, set 2,000, not 1,500. Traffic spikes happen
- io2 → gp3 is the biggest win: io2 IOPS cost 13x more than gp3 ($0.065 vs $0.005)
- gp3 baseline is generous: 3,000 IOPS and 125 MB/s are free. Most workloads never exceed this
- You can modify live volumes: AWS supports online volume modification, so type/IOPS/throughput changes need no downtime (see the boto3 sketch after this list) ✅
- Size can only go up: you can't shrink an EBS volume. For oversized storage, you need to create a new, smaller volume and migrate the data
- Combine with gp2 → gp3 migration: if you haven't moved off gp2 yet, do that first for an automatic 20% storage saving
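That "modify live volumes" tip deserves a concrete example. Terraform does this for you when you apply the changes above, but if you want to see the raw operation (or watch its progress), here's a boto3 sketch; the volume ID is a placeholder:

import time
import boto3

ec2 = boto3.client("ec2")
vol_id = "vol-0abc123"  # placeholder - use your own volume ID

# Online modification: the volume stays attached and serving I/O throughout
ec2.modify_volume(VolumeId=vol_id, VolumeType="gp3", Iops=3000, Throughput=125)

# Poll until the modification finishes ('optimizing' can take a while on large volumes)
while True:
    mod = ec2.describe_volumes_modifications(VolumeIds=[vol_id])["VolumesModifications"][0]
    print(mod["ModificationState"], mod.get("Progress", 0), "%")
    if mod["ModificationState"] in ("completed", "failed"):
        break
    time.sleep(30)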
⚠️ Important Gotcha
You can change volume type, IOPS, and throughput online, but you CANNOT shrink volume size: EBS only allows increasing it. If a volume is 500GB and you only use 50GB, you'd need to create a new 100GB volume, copy the data over, and swap. That's why the monitoring Lambda focuses on IOPS and type optimization, since those are zero-downtime changes.
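One related measurement gap: EBS has no "space used" metric, so the used-GB numbers in the table at the top have to come from inside the OS. If you run the CloudWatch agent (an assumption; it isn't part of the Terraform above), its disk_used_percent metric will show which volumes are mostly empty. A rough sketch:

import boto3
from datetime import datetime, timedelta

cw = boto3.client("cloudwatch")

# Discover whatever dimension sets the agent actually publishes, then query each one
metrics = cw.list_metrics(Namespace="CWAgent", MetricName="disk_used_percent")["Metrics"]
for m in metrics:
    stats = cw.get_metric_statistics(
        Namespace="CWAgent",
        MetricName="disk_used_percent",
        Dimensions=m["Dimensions"],
        StartTime=datetime.utcnow() - timedelta(days=14),
        EndTime=datetime.utcnow(),
        Period=86400,
        Statistics=["Maximum"],
    )
    peak = max((p["Maximum"] for p in stats["Datapoints"]), default=None)
    if peak is not None and peak < 20:  # filesystem is at most 20% full
        dims = {d["Name"]: d["Value"] for d in m["Dimensions"]}
        print(f"{dims.get('InstanceId', '?')} {dims.get('path', '?')}: peak {peak:.0f}% used")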
📋 TL;DR
| Action | Savings | Effort |
|---|---|---|
| io2 → gp3 downgrade | 60-90% | 10 minutes |
| Remove excess IOPS (gp3) | $5/1000 IOPS/mo | 5 minutes |
| Remove excess throughput (gp3) | $0.04/MB/s/mo | 5 minutes |
| Deploy weekly monitoring | Ongoing alerts | 15 minutes |
| Environment-aware module | Prevent future waste | 20 minutes |
Bottom line: EBS is the silent budget killer. You can't see unused IOPS or throughput in the console; they just quietly drain your wallet. Deploy the analyzer, check the report, and stop paying for air. 🚨
Run the audit CLI command above. I bet you'll find at least one io2 volume that should be gp3. Go on, I'll wait.
Found this helpful? Follow for more AWS cost optimization with Terraform!