Suhas Mallesh

Posted on Feb 12 • Edited on Feb 16

Running Redis 24/7? You're Leaving 40% on the Table Without Reserved Nodes 🔥

#aws #terraform #redis #devops

If your ElastiCache Redis or Memcached runs around the clock, you're overpaying by 40%. Here's how to automate Reserved Node purchases and tracking with Terraform.

Here's a painful truth: If your ElastiCache cluster has been running for more than a month, you've already overpaid.

Most teams deploy Redis or Memcached, set it, forget it — and never think about reserved pricing.

Let's fix that.

💸 The On-Demand Tax

Here's what a typical ElastiCache setup costs on-demand:

cache.r7g.large (Redis, Multi-AZ)

On-Demand:  $0.252/hour × 730 hours = $184/month
1-Year RI:  $0.150/hour × 730 hours = $110/month
3-Year RI:  $0.102/hour × 730 hours =  $74/month

That's 40% savings (1-year) or 60% savings (3-year) — for the exact same cluster doing the exact same thing. 💰
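The percentages follow directly from the hourly rates. Here is a minimal Python sketch of that arithmetic, using this article's example rates; treat them as illustrative and check the current AWS price list for your region. The function names are made up for this example.

```python
HOURS_PER_MONTH = 730  # same monthly-hours convention as the figures above

# Example hourly rates from this article, in USD
RATES = {
    "on_demand": 0.252,
    "ri_1yr": 0.150,
    "ri_3yr": 0.102,
}

def monthly_cost(hourly_rate: float) -> float:
    """Cost of one node running for a whole month."""
    return hourly_rate * HOURS_PER_MONTH

def savings_pct(on_demand_rate: float, reserved_rate: float) -> float:
    """Percentage saved by paying the reserved rate instead of on-demand."""
    return (1 - reserved_rate / on_demand_rate) * 100

for name, rate in RATES.items():
    pct = savings_pct(RATES["on_demand"], rate)
    print(f"{name:>10}: ${monthly_cost(rate):7.2f}/month, {pct:4.1f}% saved")
```

The exact results are about 40.5% and 59.5%, which the article rounds to 40% and 60%.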

Scale that across a real environment:

Setup	On-Demand	1-Year RI	3-Year RI
1 node (cache.r7g.large)	$2,208/yr	$1,320/yr	$888/yr
3-node cluster	$6,624/yr	$3,960/yr	$2,664/yr
3-node + 2 replicas	$11,040/yr	$6,600/yr	$4,440/yr

A 3-node cluster with replicas saves $4,440/year with 1-year RIs. No changes to your application. Zero downtime. Just cheaper. ✅
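The table scales linearly with node count, so it is easy to plug in your own fleet size. A rough sketch using the rounded per-node monthly figures from above ($184, $110, $74); the names `MONTHLY_PER_NODE`, `annual_cost` and `annual_savings` are illustrative only.

```python
# Rounded monthly cost per cache.r7g.large node, from the figures above (USD)
MONTHLY_PER_NODE = {"on_demand": 184, "ri_1yr": 110, "ri_3yr": 74}

def annual_cost(node_count: int, pricing: str) -> int:
    """Yearly cost for `node_count` identical nodes under one pricing model."""
    return MONTHLY_PER_NODE[pricing] * 12 * node_count

def annual_savings(node_count: int, pricing: str) -> int:
    """Yearly savings versus on-demand for the same fleet."""
    return annual_cost(node_count, "on_demand") - annual_cost(node_count, pricing)

for nodes in (1, 3, 5):  # 5 = 3 nodes + 2 replicas
    print(
        f"{nodes} node(s): on-demand ${annual_cost(nodes, 'on_demand'):,}/yr, "
        f"1-year RI saves ${annual_savings(nodes, 'ri_1yr'):,}/yr"
    )
```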

🤔 When Should You Reserve?

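With a 1-year No Upfront RI at roughly 40% off, the full term costs about as much as 7-8 months of on-demand usage (12 × 0.60 ≈ 7.2), so the reservation pays for itself as long as the node keeps running past that point. If your cluster has already run unreserved for 8+ months and isn't going away, you're burning money.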
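One way to arrive at a break-even in this range: a No Upfront reservation bills the discounted rate for every month of the term, whether or not the node is running, so the question is how many on-demand months cost the same. A small sketch under that assumption (`break_even_months` is a made-up helper name):

```python
def break_even_months(discount: float, term_months: int = 12) -> float:
    """Months of on-demand usage that cost the same as the full reserved term.

    Assumes a No Upfront reservation billed at (1 - discount) of the
    on-demand rate for every month of the term, used or not.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return term_months * (1 - discount)

print(f"40% discount: {break_even_months(0.40):.1f} months")
print(f"36% discount: {break_even_months(0.36):.1f} months")
```

At discounts between 36% and 40%, the break-even lands between roughly 7.2 and 7.7 months of a 12-month term.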

Reserve when:

✅ Cluster has been stable for 3+ months
✅ You don't plan to change node types soon
✅ It's a production workload running 24/7
✅ You're using consistent node families (e.g., r7g, m7g)

Don't reserve when:

❌ Dev/test clusters that get torn down
❌ You're actively testing different node sizes
❌ Cluster is less than 3 months old
❌ Planning a migration to a different engine or service
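If you have many clusters to triage, the two lists above can be turned into a quick screening function. This is only a sketch: the `Cluster` fields and the `reservation_blockers` helper are hypothetical, and you would populate them from your own inventory.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    age_months: float
    is_production: bool
    runs_24x7: bool
    node_type_change_planned: bool
    migration_planned: bool

def reservation_blockers(c: Cluster, min_age_months: float = 3) -> list[str]:
    """Return the reasons not to reserve yet; an empty list means go ahead."""
    blockers = []
    if c.age_months < min_age_months:
        blockers.append(f"younger than {min_age_months} months")
    if not (c.is_production and c.runs_24x7):
        blockers.append("not a 24/7 production workload")
    if c.node_type_change_planned:
        blockers.append("node type change planned")
    if c.migration_planned:
        blockers.append("engine or service migration planned")
    return blockers

prod = Cluster("prod-redis", 8, True, True, False, False)
dev = Cluster("dev-redis", 1, False, False, False, True)
for cluster in (prod, dev):
    reasons = reservation_blockers(cluster)
    print(cluster.name, "-> reserve" if not reasons else f"-> wait: {', '.join(reasons)}")
```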

🏗️ Terraform Implementation

Step 1: Deploy Your ElastiCache Cluster

# modules/elasticache/main.tf

variable "environment" {
  type = string
}

variable "node_type" {
  type    = string
  default = "cache.r7g.large"
}

variable "num_cache_clusters" {
  type    = number
  default = 3
}

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id = "${var.environment}-redis"
  description          = "${var.environment} Redis cluster"

  node_type            = var.node_type
  num_cache_clusters   = var.num_cache_clusters
  engine               = "redis"
  engine_version       = "7.1"
  port                 = 6379
  parameter_group_name = "default.redis7"

  # Multi-AZ for production
  automatic_failover_enabled = var.environment == "prod"
  multi_az_enabled           = var.environment == "prod"

  # Encryption
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true

  # Maintenance
  maintenance_window       = "sun:05:00-sun:07:00"
  snapshot_retention_limit = var.environment == "prod" ? 7 : 0
  snapshot_window          = "03:00-05:00"

  tags = {
    Environment  = var.environment
    ManagedBy    = "terraform"
    ReserveReady = "true"  # 👈 Tag for RI tracking
  }
}

Step 2: Purchase Reserved Nodes with Terraform

# reserved-instances/elasticache.tf

resource "aws_elasticache_reserved_cache_node" "redis_prod" {
  reserved_cache_nodes_offering_id = data.aws_elasticache_reserved_cache_node_offering.redis.offering_id
  cache_node_count                 = 3  # Match your cluster size
}

data "aws_elasticache_reserved_cache_node_offering" "redis" {
  cache_node_type     = "cache.r7g.large"
  duration            = "P1Y"           # 1 year (P3Y for 3-year)
  offering_type       = "No Upfront"    # or "Partial Upfront", "All Upfront"
  product_description = "redis"
}

⚠️ Important: Running terraform apply on reserved node resources commits you to a purchase. There's no undo. Always run terraform plan first and review carefully.

Step 3: Payment Options Compared

# Option A: No Upfront (most flexible, least savings)
# Pay monthly with nothing due upfront; it can't be cancelled, so you're committed for the full term
offering_type = "No Upfront"
# Savings: ~33-36%

# Option B: Partial Upfront (balanced)
# Pay some upfront + reduced monthly
offering_type = "Partial Upfront"
# Savings: ~38-41%

# Option C: All Upfront (maximum savings)
# Pay everything upfront, nothing monthly
offering_type = "All Upfront"
# Savings: ~40-44%

My recommendation: Start with No Upfront 1-Year. You get most of the savings with maximum flexibility. Graduate to Partial/All Upfront once you're confident in your setup. 🎯
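The three options differ only in how the cost is split between an upfront fee and an hourly rate, so the fair comparison is the effective hourly rate over the term. A sketch of that calculation follows. The offer figures in `OFFERS` are hypothetical placeholders chosen to land in the ranges above, not real AWS prices; pull actual numbers from the offering you are considering.

```python
HOURS_PER_YEAR = 8760

def effective_hourly(upfront_fee: float, hourly_rate: float, term_years: int = 1) -> float:
    """Spread the upfront fee across every hour of the term, then add the hourly rate."""
    return upfront_fee / (term_years * HOURS_PER_YEAR) + hourly_rate

ON_DEMAND = 0.252  # the article's example on-demand rate

# HYPOTHETICAL 1-year offers as (upfront fee, hourly rate); not real AWS prices
OFFERS = {
    "No Upfront": (0.0, 0.164),
    "Partial Upfront": (650.0, 0.076),
    "All Upfront": (1280.0, 0.0),
}

for name, (upfront, hourly) in OFFERS.items():
    rate = effective_hourly(upfront, hourly)
    saved = (1 - rate / ON_DEMAND) * 100
    print(f"{name:>15}: ${rate:.4f}/hour effective, {saved:.1f}% saved")
```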

📊 Automated RI Coverage Monitoring

Don't let reservations expire silently. This Lambda checks coverage and alerts you:

# monitoring/ri-coverage.tf

resource "aws_lambda_function" "ri_monitor" {
  filename         = data.archive_file.ri_monitor.output_path
  function_name    = "elasticache-ri-monitor"
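  # NOTE: aws_iam_role.ri_monitor is not defined in this snippet. It needs
  # elasticache:DescribeCacheClusters, elasticache:DescribeReservedCacheNodes,
  # sns:Publish on the topic below, and the usual CloudWatch Logs permissions.
  role             = aws_iam_role.ri_monitor.arn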
  handler          = "index.handler"
  runtime          = "python3.12"
  timeout          = 30
  source_code_hash = data.archive_file.ri_monitor.output_base64sha256

  environment {
    variables = {
      SNS_TOPIC_ARN = aws_sns_topic.cost_alerts.arn
    }
  }
}

data "archive_file" "ri_monitor" {
  type        = "zip"
  output_path = "${path.module}/ri_monitor.zip"

  source {
    content  = <<-PYTHON
import boto3
import os
from datetime import datetime, timedelta

def handler(event, context):
    ec = boto3.client('elasticache')
    sns = boto3.client('sns')

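    # Get all running nodes (paginated, so large accounts are not truncated)
    running_nodes = {}
    for page in ec.get_paginator('describe_cache_clusters').paginate():
        for c in page['CacheClusters']:
            key = f"{c['CacheNodeType']}|{c['Engine']}"
            running_nodes[key] = running_nodes.get(key, 0) + c['NumCacheNodes']

    # Get reservations (also paginated); filtered to active ones below
    reservations = [
        r
        for page in ec.get_paginator('describe_reserved_cache_nodes').paginate()
        for r in page['ReservedCacheNodes']
    ]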
    reserved = {}
    expiring_soon = []

    for r in reservations:
        if r['State'] == 'active':
            key = f"{r['CacheNodeType']}|{r['ProductDescription']}"
            reserved[key] = reserved.get(key, 0) + r['CacheNodeCount']

            # Check if expiring within 30 days
            end_time = r['StartTime'] + timedelta(seconds=r['Duration'])
            if end_time - datetime.now(end_time.tzinfo) < timedelta(days=30):
                expiring_soon.append({
                    'id': r['ReservedCacheNodeId'],
                    'type': r['CacheNodeType'],
                    'expires': end_time.strftime('%Y-%m-%d')
                })

    # Find unreserved nodes
    unreserved = []
    for key, count in running_nodes.items():
        reserved_count = reserved.get(key, 0)
        if count > reserved_count:
            node_type, engine = key.split('|')
            unreserved.append(
                f"  {node_type} ({engine}): "
                f"{count - reserved_count} unreserved of {count} total"
            )

    # Build alert
    alerts = []
    if unreserved:
        alerts.append("UNRESERVED NODES (wasting money!):\n" 
                      + "\n".join(unreserved))
    if expiring_soon:
        alerts.append("EXPIRING WITHIN 30 DAYS:\n" + "\n".join(
            f"  {e['id']} ({e['type']}) expires {e['expires']}"
            for e in expiring_soon
        ))

    if alerts:
        sns.publish(
            TopicArn=os.environ['SNS_TOPIC_ARN'],
            Subject='ElastiCache RI Coverage Alert',
            Message="\n\n".join(alerts)
        )

    return {'unreserved': len(unreserved), 'expiring': len(expiring_soon)}
    PYTHON
    filename = "index.py"
  }
}

# Run weekly
resource "aws_cloudwatch_event_rule" "weekly_ri_check" {
  name                = "elasticache-ri-check"
  schedule_expression = "rate(7 days)"
}

resource "aws_cloudwatch_event_target" "ri_monitor" {
  rule = aws_cloudwatch_event_rule.weekly_ri_check.name
  arn  = aws_lambda_function.ri_monitor.arn
}

resource "aws_lambda_permission" "allow_eventbridge" {
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.ri_monitor.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.weekly_ri_check.arn
}

resource "aws_sns_topic" "cost_alerts" {
  name = "elasticache-cost-alerts"
}

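Subscribe an email address to the cost_alerts SNS topic (the snippet above creates the topic but no subscription) and you'll get an alert whenever nodes are unreserved or reservations are about to expire. No more surprise bills. 📬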
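The coverage comparison at the heart of the Lambda can be tried out without any AWS access. Below is a sketch of the same logic as a pure function; the field names mirror the boto3 responses used in the Lambda above, while `coverage_gaps` and the sample data are invented for illustration.

```python
from collections import Counter

def coverage_gaps(clusters, reservations):
    """Return {(node_type, engine): unreserved_count} for under-covered node types."""
    running = Counter()
    for c in clusters:
        running[(c["CacheNodeType"], c["Engine"])] += c["NumCacheNodes"]

    reserved = Counter()
    for r in reservations:
        if r["State"] == "active":  # ignore retired or pending reservations
            reserved[(r["CacheNodeType"], r["ProductDescription"])] += r["CacheNodeCount"]

    return {
        key: count - reserved[key]
        for key, count in running.items()
        if count > reserved[key]
    }

# Invented sample data, shaped like trimmed boto3 responses
sample_clusters = [
    {"CacheNodeType": "cache.r7g.large", "Engine": "redis", "NumCacheNodes": 1},
    {"CacheNodeType": "cache.r7g.large", "Engine": "redis", "NumCacheNodes": 1},
    {"CacheNodeType": "cache.r7g.large", "Engine": "redis", "NumCacheNodes": 1},
    {"CacheNodeType": "cache.m7g.large", "Engine": "memcached", "NumCacheNodes": 2},
]
sample_reservations = [
    {"CacheNodeType": "cache.r7g.large", "ProductDescription": "redis",
     "CacheNodeCount": 2, "State": "active"},
    {"CacheNodeType": "cache.m7g.large", "ProductDescription": "memcached",
     "CacheNodeCount": 2, "State": "retired"},
]

print(coverage_gaps(sample_clusters, sample_reservations))
```

With this sample data, one of the three Redis nodes and both Memcached nodes are reported as unreserved, because the retired reservation no longer counts.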

⚡ Quick Audit: Are You Wasting Money Right Now?

Run these CLI commands to check your current RI coverage:

# List all running ElastiCache nodes
aws elasticache describe-cache-clusters \
  --query 'CacheClusters[].{ID:CacheClusterId,Type:CacheNodeType,Engine:Engine,Nodes:NumCacheNodes}' \
  --output table

# List active reservations (expiry = Start + DurationSec seconds)
aws elasticache describe-reserved-cache-nodes \
  --query 'ReservedCacheNodes[?State==`active`].{Type:CacheNodeType,Count:CacheNodeCount,Start:StartTime,DurationSec:Duration}' \
  --output table

If the first table has more nodes than the second — you're overpaying. 🚨
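One thing the reservation listing does not give you directly is an expiry date: the API reports each reservation's StartTime and its Duration in seconds. A small sketch for deriving the expiry and the days remaining (the helper names are illustrative):

```python
from datetime import datetime, timedelta, timezone

def reservation_expiry(start_time: datetime, duration_seconds: int) -> datetime:
    """Expiry is the reservation's start time plus its duration in seconds."""
    return start_time + timedelta(seconds=duration_seconds)

def days_until_expiry(start_time: datetime, duration_seconds: int, now=None) -> int:
    """Whole days left before the reservation lapses (negative if already expired)."""
    now = now or datetime.now(timezone.utc)
    return (reservation_expiry(start_time, duration_seconds) - now).days

# Example: a 1-year reservation (31,536,000 seconds) started on 2025-01-01 UTC
start = datetime(2025, 1, 1, tzinfo=timezone.utc)
one_year = 365 * 24 * 3600
print(reservation_expiry(start, one_year).date())
print(days_until_expiry(start, one_year, now=datetime(2025, 12, 15, tzinfo=timezone.utc)))
```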

🎯 Implementation Checklist

Audit — Run the CLI commands above to find unreserved nodes
Identify stable clusters — Production clusters running 3+ months
Start conservative — 1-Year, No Upfront for your first reservation
Deploy monitoring — Set up the Lambda to catch gaps and expirations
Review quarterly — Reassess node types and reservation coverage

💡 Pro Tips

Reservations are region-specific — A reservation in us-east-1 won't cover nodes in eu-west-1
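Node family must match — a cache.r7g.large RI won't cover m7g or r6g nodes. AWS has announced size flexibility for ElastiCache reserved nodes within a node family, so check the current reserved-node docs for how a reservation applies to other sizes such as cache.r7g.xlarge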
Reservations apply automatically — Once purchased, billing adjusts immediately. No cluster changes needed
Combine with Graviton — If you haven't migrated to r7g/m7g yet, do that first (20% cheaper), then reserve the Graviton nodes for compounding savings 🔥
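A note on "compounding": stacked discounts multiply rather than add, because the second discount applies to the already-reduced price. A quick sketch using the article's round figures of 20% for Graviton and 40% for a 1-year RI (`combined_savings` is a made-up helper):

```python
def combined_savings(*discounts: float) -> float:
    """Total fractional savings when each discount applies to the already-reduced price."""
    remaining = 1.0
    for d in discounts:
        remaining *= 1 - d
    return 1 - remaining

total = combined_savings(0.20, 0.40)
print(f"Graviton + 1-year RI: {total:.0%} off the original on-demand bill")
```

So 20% and 40% combine to about 52% off the original bill, not 60%.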

📊 TL;DR

Action	Savings	Effort
1-Year No Upfront RI	~36%	5 minutes
1-Year All Upfront RI	~42%	5 minutes
3-Year All Upfront RI	~60%	5 minutes
+ Graviton migration	+20% on top	5 minutes

Bottom line: If your Redis has been running for 8+ months and you haven't reserved, you're throwing away 40% of that bill. Fix it today. ⚡

Running ElastiCache without Reserved Nodes is like paying rent monthly when the landlord offers 40% off for signing a lease. Same apartment, just cheaper. 🏠

Found this helpful? Follow for more AWS cost optimization with Terraform! 💬
