Suhas Mallesh

Running Redis 24/7? You're Leaving 40% on the Table Without Reserved Nodes 🔥

If your ElastiCache Redis or Memcached runs around the clock, you're overpaying by 40%. Here's how to automate Reserved Node purchases and tracking with Terraform.

Here's a painful truth: If your ElastiCache cluster has been running for more than a month, you've already overpaid.

Most teams deploy Redis or Memcached, set it, forget it, and never think about reserved pricing.

Let's fix that.

💸 The On-Demand Tax

Here's what a typical ElastiCache setup costs on-demand:

cache.r7g.large (Redis, Multi-AZ)

On-Demand:  $0.252/hour × 730 hours = $184/month
1-Year RI:  $0.150/hour × 730 hours = $110/month
3-Year RI:  $0.102/hour × 730 hours =  $74/month

That's 40% savings (1-year) or 60% savings (3-year), for the exact same cluster doing the exact same thing. 💰
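The arithmetic above is easy to reproduce. Here is a minimal Python sketch using the hourly rates quoted in this post; the function names are illustrative, and real prices vary by region and change over time, so treat the numbers as placeholders.

```python
HOURS_PER_MONTH = 730  # same approximation used in the breakdown above

# Hourly rates quoted in this post for cache.r7g.large (verify current pricing)
RATES = {"on_demand": 0.252, "ri_1yr": 0.150, "ri_3yr": 0.102}


def monthly_cost(hourly_rate: float) -> float:
    """Cost of one node running all month at the given hourly rate."""
    return hourly_rate * HOURS_PER_MONTH


def savings_pct(on_demand_rate: float, reserved_rate: float) -> float:
    """Percentage saved by paying the reserved rate instead of on-demand."""
    return (1 - reserved_rate / on_demand_rate) * 100


for name, rate in RATES.items():
    pct = savings_pct(RATES["on_demand"], rate)
    print(f"{name}: ${monthly_cost(rate):.2f}/month, {pct:.1f}% below on-demand")
```

Swap in the rates for your own node type and region to see what the gap looks like for your cluster.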

Scale that across a real environment:

Setup                    | On-Demand  | 1-Year RI | 3-Year RI
1 node (cache.r7g.large) | $2,208/yr  | $1,320/yr | $888/yr
3-node cluster           | $6,624/yr  | $3,960/yr | $2,664/yr
3-node + 2 replicas      | $11,040/yr | $6,600/yr | $4,440/yr

A 3-node cluster with 2 replicas (5 nodes) saves $4,440/year with 1-year RIs ($11,040 on-demand vs. $6,600 reserved). No changes to your application. Zero downtime. Just cheaper. ✅
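To run the same comparison for a different fleet size, a small helper is enough. This sketch starts from the rounded per-node monthly figures in this post; `annual_cost` and `annual_savings` are hypothetical names, not part of any AWS tooling.

```python
# Rounded monthly cost per node, taken from the pricing breakdown in this post
MONTHLY_PER_NODE = {"on_demand": 184, "ri_1yr": 110, "ri_3yr": 74}


def annual_cost(plan: str, nodes: int) -> int:
    """Yearly cost in dollars for `nodes` nodes on the given plan."""
    return MONTHLY_PER_NODE[plan] * 12 * nodes


def annual_savings(plan: str, nodes: int) -> int:
    """Dollars saved per year versus on-demand for the same node count."""
    return annual_cost("on_demand", nodes) - annual_cost(plan, nodes)


for nodes in (1, 3, 5):  # 5 = 3 primaries + 2 replicas
    print(nodes, annual_cost("on_demand", nodes), annual_savings("ri_1yr", nodes))
```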

🤔 When Should You Reserve?

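The break-even point for a 1-year No Upfront RI is roughly 7-8 months: the full term costs about $110 × 12 = $1,320 per node, which is what 7-8 months of on-demand usage at $184/month would cost. So the reservation pays off as long as the cluster keeps running for more than about 7-8 of the next 12 months. If yours has already been running for 8+ months and you haven't reserved, you've been burning money.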
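The break-even reasoning can be written down as a one-line formula. The sketch below assumes the rounded monthly figures from this post ($184 on-demand, $110 reserved) and a No Upfront reservation that is billed for the whole term whether or not the node is running; the function names are illustrative.

```python
def break_even_months(on_demand_monthly: float, reserved_monthly: float,
                      term_months: int = 12) -> float:
    """Months of on-demand usage that cost the same as the full reserved term."""
    return reserved_monthly * term_months / on_demand_monthly


def reservation_pays_off(expected_months_running: float,
                         on_demand_monthly: float = 184,
                         reserved_monthly: float = 110) -> bool:
    """True if the node is expected to run longer than the break-even point."""
    return expected_months_running > break_even_months(
        on_demand_monthly, reserved_monthly
    )


print(round(break_even_months(184, 110), 1))
print(reservation_pays_off(6), reservation_pays_off(12))
```

A cluster you expect to decommission in six months falls below the break-even point, which is why short-lived environments are poor candidates.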

Reserve when:

  • ✅ Cluster has been stable for 3+ months
  • ✅ You don't plan to change node types soon
  • ✅ It's a production workload running 24/7
  • ✅ You're using consistent node families (e.g., r7g, m7g)

Don't reserve when:

  • ❌ Dev/test clusters that get torn down
  • ❌ You're actively testing different node sizes
  • ❌ Cluster is less than 3 months old
  • ❌ Planning a migration to a different engine or service
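If you want to apply the two lists above consistently across many clusters, they translate into a simple rule check. This is only a sketch: the `ClusterProfile` fields are hypothetical inputs you would fill in yourself, and the 3-month threshold is the one suggested in this post.

```python
from dataclasses import dataclass


@dataclass
class ClusterProfile:
    """Facts about a cluster; all field names here are hypothetical."""
    age_months: float
    runs_24x7_in_prod: bool
    node_type_change_planned: bool
    migration_planned: bool
    ephemeral: bool  # dev/test cluster that gets torn down


def reservation_blockers(c: ClusterProfile, min_age_months: float = 3) -> list:
    """Return the reasons not to reserve yet; an empty list means 'go ahead'."""
    blockers = []
    if c.age_months < min_age_months:
        blockers.append(f"cluster is younger than {min_age_months} months")
    if not c.runs_24x7_in_prod:
        blockers.append("not a 24/7 production workload")
    if c.ephemeral:
        blockers.append("dev/test cluster that gets torn down")
    if c.node_type_change_planned:
        blockers.append("node type or size is still being changed")
    if c.migration_planned:
        blockers.append("migration to another engine or service is planned")
    return blockers


stable = ClusterProfile(14, True, False, False, False)
print(reservation_blockers(stable))  # prints []
```

Returning the list of reasons, rather than a bare yes/no, makes it easier to explain to a team why a given cluster was skipped.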

🏗️ Terraform Implementation

Step 1: Deploy Your ElastiCache Cluster

# modules/elasticache/main.tf

variable "environment" {
  type = string
}

variable "node_type" {
  type    = string
  default = "cache.r7g.large"
}

variable "num_cache_clusters" {
  type    = number
  default = 3
}

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id = "${var.environment}-redis"
  description          = "${var.environment} Redis cluster"

  node_type            = var.node_type
  num_cache_clusters   = var.num_cache_clusters
  engine               = "redis"
  engine_version       = "7.1"
  port                 = 6379
  parameter_group_name = "default.redis7"

  # Multi-AZ for production
  automatic_failover_enabled = var.environment == "prod"
  multi_az_enabled           = var.environment == "prod"

  # Encryption
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true

  # Maintenance
  maintenance_window       = "sun:05:00-sun:07:00"
  snapshot_retention_limit = var.environment == "prod" ? 7 : 0
  snapshot_window          = "03:00-05:00"

  tags = {
    Environment  = var.environment
    ManagedBy    = "terraform"
    ReserveReady = "true"  # 👈 Tag for RI tracking
  }
}

Step 2: Purchase Reserved Nodes with Terraform

# reserved-instances/elasticache.tf

resource "aws_elasticache_reserved_cache_node" "redis_prod" {
  reserved_cache_nodes_offering_id = data.aws_elasticache_reserved_cache_node_offering.redis.offering_id
  cache_node_count                 = 3  # Match your cluster size
}

data "aws_elasticache_reserved_cache_node_offering" "redis" {
  cache_node_type     = "cache.r7g.large"
  duration            = "P1Y"           # 1 year (P3Y for 3-year)
  offering_type       = "No Upfront"    # or "Partial Upfront", "All Upfront"
  product_description = "redis"
}

⚠️ Important: Running terraform apply on reserved node resources commits you to a purchase. There's no undo. Always run terraform plan first and review carefully.
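Because the purchase can't be undone, it can help to look at the offering and total up the commitment before applying. The sketch below uses boto3's `describe_reserved_cache_nodes_offerings` call for the lookup and a pure function for the math. The sample record is hypothetical (it reuses the 1-year hourly rate quoted in this post), and the assumption that recurring charges are billed hourly should be checked against the `RecurringChargeFrequency` field in real responses.

```python
SECONDS_PER_HOUR = 3600


def summarize_offering(offering: dict, node_count: int = 1) -> dict:
    """Compute the total commitment for one offering record.

    `offering` follows the shape of an item in the
    `ReservedCacheNodesOfferings` list returned by boto3.
    """
    hours = offering["Duration"] / SECONDS_PER_HOUR
    # Assumption: recurring charges are billed hourly.
    recurring = sum(
        charge["RecurringChargeAmount"]
        for charge in offering.get("RecurringCharges", [])
    )
    hourly = offering.get("UsagePrice", 0.0) + recurring
    upfront = offering.get("FixedPrice", 0.0) * node_count
    total = upfront + hourly * hours * node_count
    return {
        "offering_id": offering["ReservedCacheNodesOfferingId"],
        "upfront": upfront,
        "total_commitment": total,
    }


def fetch_offerings(node_type: str, offering_type: str = "No Upfront") -> list:
    """Look up 1-year Redis offerings; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the summary logic runs without AWS access

    client = boto3.client("elasticache")
    response = client.describe_reserved_cache_nodes_offerings(
        CacheNodeType=node_type,
        Duration="31536000",  # one year, in seconds
        ProductDescription="redis",
        OfferingType=offering_type,
    )
    return response["ReservedCacheNodesOfferings"]


# Hypothetical record using the 1-year hourly rate quoted in this post
sample = {
    "ReservedCacheNodesOfferingId": "example-offering-id",
    "Duration": 31536000,
    "FixedPrice": 0.0,
    "UsagePrice": 0.0,
    "RecurringCharges": [
        {"RecurringChargeAmount": 0.150, "RecurringChargeFrequency": "Hourly"}
    ],
}
print(summarize_offering(sample, node_count=3))
```

With real credentials you would call `fetch_offerings("cache.r7g.large")` and pass each result to `summarize_offering` to see the dollar amount you are about to commit to.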

Step 3: Payment Options Compared

# Option A: No Upfront (most flexible, least savings)
# Pay monthly with no money down; you are still committed for the full term
offering_type = "No Upfront"
# Savings: ~33-36%

# Option B: Partial Upfront (balanced)
# Pay some upfront + reduced monthly
offering_type = "Partial Upfront"
# Savings: ~38-41%

# Option C: All Upfront (maximum savings)
# Pay everything upfront, nothing monthly
offering_type = "All Upfront"
# Savings: ~40-44%

My recommendation: Start with No Upfront 1-Year. You get most of the savings with maximum flexibility. Graduate to Partial/All Upfront once you're confident in your setup. 🎯

📊 Automated RI Coverage Monitoring

Don't let reservations expire silently. This Lambda checks coverage and alerts you:

# monitoring/ri-coverage.tf

resource "aws_lambda_function" "ri_monitor" {
  filename         = data.archive_file.ri_monitor.output_path
  function_name    = "elasticache-ri-monitor"
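  # Role is defined elsewhere; it needs elasticache:DescribeCacheClusters,
  # elasticache:DescribeReservedCacheNodes, and sns:Publish
  role             = aws_iam_role.ri_monitor.arn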
  handler          = "index.handler"
  runtime          = "python3.12"
  timeout          = 30
  source_code_hash = data.archive_file.ri_monitor.output_base64sha256

  environment {
    variables = {
      SNS_TOPIC_ARN = aws_sns_topic.cost_alerts.arn
    }
  }
}

data "archive_file" "ri_monitor" {
  type        = "zip"
  output_path = "${path.module}/ri_monitor.zip"

  source {
    content  = <<-PYTHON
import boto3
import os
from datetime import datetime, timedelta

def handler(event, context):
    ec = boto3.client('elasticache')
    sns = boto3.client('sns')

    # Get all running nodes (paginated so large accounts aren't truncated)
    running_nodes = {}
    for page in ec.get_paginator('describe_cache_clusters').paginate():
        for c in page['CacheClusters']:
            key = f"{c['CacheNodeType']}|{c['Engine']}"
            running_nodes[key] = running_nodes.get(key, 0) + c['NumCacheNodes']

    # Get active reservations (paginated so results aren't truncated)
    reservations = [
        r
        for page in ec.get_paginator('describe_reserved_cache_nodes').paginate()
        for r in page['ReservedCacheNodes']
    ]
    reserved = {}
    expiring_soon = []

    for r in reservations:
        if r['State'] == 'active':
            key = f"{r['CacheNodeType']}|{r['ProductDescription']}"
            reserved[key] = reserved.get(key, 0) + r['CacheNodeCount']

            # Check if expiring within 30 days
            end_time = r['StartTime'] + timedelta(seconds=r['Duration'])
            if end_time - datetime.now(end_time.tzinfo) < timedelta(days=30):
                expiring_soon.append({
                    'id': r['ReservedCacheNodeId'],
                    'type': r['CacheNodeType'],
                    'expires': end_time.strftime('%Y-%m-%d')
                })

    # Find unreserved nodes
    unreserved = []
    for key, count in running_nodes.items():
        reserved_count = reserved.get(key, 0)
        if count > reserved_count:
            node_type, engine = key.split('|')
            unreserved.append(
                f"  {node_type} ({engine}): "
                f"{count - reserved_count} unreserved of {count} total"
            )

    # Build alert
    alerts = []
    if unreserved:
        alerts.append("UNRESERVED NODES (wasting money!):\n" 
                      + "\n".join(unreserved))
    if expiring_soon:
        alerts.append("EXPIRING WITHIN 30 DAYS:\n" + "\n".join(
            f"  {e['id']} ({e['type']}) expires {e['expires']}"
            for e in expiring_soon
        ))

    if alerts:
        sns.publish(
            TopicArn=os.environ['SNS_TOPIC_ARN'],
            Subject='ElastiCache RI Coverage Alert',
            Message="\n\n".join(alerts)
        )

    return {'unreserved': len(unreserved), 'expiring': len(expiring_soon)}
    PYTHON
    filename = "index.py"
  }
}

# Run weekly
resource "aws_cloudwatch_event_rule" "weekly_ri_check" {
  name                = "elasticache-ri-check"
  schedule_expression = "rate(7 days)"
}

resource "aws_cloudwatch_event_target" "ri_monitor" {
  rule = aws_cloudwatch_event_rule.weekly_ri_check.name
  arn  = aws_lambda_function.ri_monitor.arn
}

resource "aws_lambda_permission" "allow_eventbridge" {
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.ri_monitor.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.weekly_ri_check.arn
}

resource "aws_sns_topic" "cost_alerts" {
  name = "elasticache-cost-alerts"
}

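Subscribe an email address to the elasticache-cost-alerts SNS topic (the Terraform above creates the topic but no subscription), and you'll get an alert whenever nodes are unreserved or reservations are about to expire. No more surprise bills. 📬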
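The heart of the Lambda is the comparison between running and reserved node counts, and that part can be tested without touching AWS. Here is a minimal sketch of that comparison as a pure function; `coverage_gaps` is a hypothetical helper name, and the sample counts are made up for illustration.

```python
def coverage_gaps(running: dict, reserved: dict) -> dict:
    """Return {key: unreserved_count} for keys with more running than reserved nodes.

    Keys use the same "node_type|engine" format as the Lambda above.
    """
    gaps = {}
    for key, count in running.items():
        shortfall = count - reserved.get(key, 0)
        if shortfall > 0:
            gaps[key] = shortfall
    return gaps


# Made-up counts: 5 Redis nodes with 3 reserved, 2 Memcached nodes with none
running = {"cache.r7g.large|redis": 5, "cache.m7g.large|memcached": 2}
reserved = {"cache.r7g.large|redis": 3}
print(coverage_gaps(running, reserved))
```

Keeping this logic separate from the boto3 calls means you can check edge cases, such as over-reserved node types, before deploying anything.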

⚑ Quick Audit: Are You Wasting Money Right Now?

Run these CLI commands to check your current RI coverage:

# List all running ElastiCache nodes
aws elasticache describe-cache-clusters \
  --query 'CacheClusters[].{ID:CacheClusterId,Type:CacheNodeType,Engine:Engine,Nodes:NumCacheNodes}' \
  --output table

# List active reservations (expiry = Start + DurationSec seconds)
aws elasticache describe-reserved-cache-nodes \
  --query 'ReservedCacheNodes[?State==`active`].{Type:CacheNodeType,Count:CacheNodeCount,Start:StartTime,DurationSec:Duration}' \
  --output table

If the first table has more nodes than the second, you're overpaying. 🚨
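The reservation API reports a start time and a duration in seconds rather than an expiry date, so the end date has to be derived. A small sketch of that calculation, assuming a timezone-aware start time; the function names and the sample date are illustrative.

```python
from datetime import datetime, timedelta, timezone


def reservation_end(start_time: datetime, duration_seconds: int) -> datetime:
    """Expiry timestamp: the start time plus the term length in seconds."""
    return start_time + timedelta(seconds=duration_seconds)


def days_remaining(start_time: datetime, duration_seconds: int, now=None) -> int:
    """Whole days left on the reservation (negative once it has expired)."""
    now = now or datetime.now(timezone.utc)
    return (reservation_end(start_time, duration_seconds) - now).days


# Illustrative values: a 1-year term (31,536,000 seconds) starting 2024-01-01 UTC
start = datetime.fromisoformat("2024-01-01T00:00:00+00:00")
print(reservation_end(start, 31536000).date())  # 2024-12-31
```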

🎯 Implementation Checklist

  1. Audit: Run the CLI commands above to find unreserved nodes
  2. Identify stable clusters: Production clusters running 3+ months
  3. Start conservative: 1-Year, No Upfront for your first reservation
  4. Deploy monitoring: Set up the Lambda to catch gaps and expirations
  5. Review quarterly: Reassess node types and reservation coverage

💡 Pro Tips

  • Reservations are region-specific: A reservation in us-east-1 won't cover nodes in eu-west-1
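  • Check how node sizes match: This post assumes a cache.r7g.large RI won't cover cache.r7g.xlarge, but AWS documents size-flexible reserved nodes within a node family, so confirm the current rules in the ElastiCache docs before buying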
  • Reservations apply automatically: Once purchased, billing adjusts immediately. No cluster changes needed
  • Combine with Graviton: If you haven't migrated to r7g/m7g yet, do that first (20% cheaper), then reserve the Graviton nodes for compounding savings 🔥
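"Compounding" here means the discounts multiply rather than add. A short sketch of that arithmetic, assuming the 20% Graviton and 40% 1-year RI figures quoted in this post and assuming the RI discount on Graviton nodes is the same percentage (check actual pricing for your node types):

```python
def combined_savings(*discounts: float) -> float:
    """Total fractional savings when each discount applies to what is left."""
    remaining = 1.0
    for discount in discounts:
        remaining *= 1 - discount
    return 1 - remaining


# 20% from Graviton, then 40% off the already-reduced bill
total = combined_savings(0.20, 0.40)
print(f"{total:.0%} total savings, not 60%")
```

The second discount only applies to the 80% of the bill that is left, so the combined figure comes out at 52% rather than the 60% you would get by adding the two.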

📊 TL;DR

Action                | Savings     | Effort
1-Year No Upfront RI  | ~36%        | 5 minutes
1-Year All Upfront RI | ~42%        | 5 minutes
3-Year All Upfront RI | ~60%        | 5 minutes
+ Graviton migration  | +20% on top | 5 minutes

Bottom line: If your Redis has been running for 8+ months and you haven't reserved, you're throwing away 40% of that bill. Fix it today. ⚑


Running ElastiCache without Reserved Nodes is like paying rent monthly when the landlord offers 40% off for signing a lease. Same apartment, just cheaper. 🏠

Found this helpful? Follow for more AWS cost optimization with Terraform! 💬
