How I Built an Intelligent Python Agent to Detect and Eliminate Cloud Waste in Real-Time
TL;DR
In this extensive engineering log, I document my journey building an Autonomous FinOps Agent from scratch using Python. Faced with the ubiquitous problem of "cloud waste"—idle EC2 instances, unattached EBS volumes, and ancient RDS snapshots—I decided to build a self-driving bot to tackle the issue.
By simulating a realistic AWS environment, I developed an agent that:
- Discovers infrastructure resources autonomously using a mock Boto3 layer.
- Analyzes usage metrics patterns to identify "zombies" (idle resources).
- Executes remediation actions like stopping instances or deleting unattached volumes.
- Reports potential savings in real-time using a terminal dashboard.
This article covers the entire SDLC: from the initial "napkin design" architecture, through the deep-dive implementation of the detection logic, to the final execution and analysis of the results. I've open-sourced the code, and I invite you to join me in this experiment to automate one of the most tedious parts of DevOps: monitoring the bill.
Source Code: https://github.com/aniket-work/autonomous-finops-agent
Introduction
We've all been there. It starts with a simple Slack message: "Hey, can you spin up a t3.large for a quick load test? I'll terminate it in an hour."
Fast forward three weeks. You're reviewing the monthly AWS bill, and there it is—that "15-minute" instance, chugging along, doing absolutely nothing but burning cash. It's not malicious; it's just human nature. In high-velocity engineering teams, the incentive is always to ship. Cleaning up is "future work," and as we all know, tomorrow never comes.
In my experience working with cloud infrastructure, this "waste by default" behavior isn't just a minor annoyance; it's a massive financial drain. Some reports suggest that up to 30% of all cloud spend is wasted. That's billions of dollars annually spent on servers processing zero requests.
I started thinking: Why does a human need to find these idle resources?
If I can define what a "zombie" server looks like—for example, a server with CPU utilization below 5% for 7 consecutive days—I should be able to write a program that finds it and kills it. It's a simple rule-based problem.
That thought process led me to this weekend experiment: The Autonomous FinOps Agent.
I wanted to build something that acts like a specialized team member whose only job is to walk around the virtual data center and turn off the lights in empty rooms. I wanted to move beyond simple "monitoring dashboards" (which just show you the problem) to "autonomous agents" (which actually fix the problem).
What's This Article About?
This article acts as a comprehensive engineering log of my journey building this agent. I'm not just going to throw code at you; I want to explain why I made certain design choices, the trade-offs I faced, and the lessons I learned about building autonomous systems.
We will cover:
- The Logic: How I explicitly defined "waste" in code, translating fuzzy concepts into boolean logic.
- The Architecture: How the agent scans, thinks, and acts—the OODA loop applied to DevOps.
- The Simulation: How I mocked an entire AWS environment (EC2, EBS, RDS) to test the agent safely without needing a credit card.
- The Code: A deep dive into the Python modules, the `rich` library for UI, and the Boto3 interactions.
- The Visualization: How I built a terminal dashboard to see the agent's brain at work.
- The Ethics: A discussion on the risks of automated deletion and how to build safety rails.
If you're interested in Python automation, Cloud/DevOps, or just want to see how to build a self-driving script for infrastructure, stick around. This is a deep dive.
Tech Stack
For this experiment, I kept the stack lean but powerful. I wanted to focus on logic, not boilerplate.
- Python 3.12: The language of choice for automation. Its ecosystem for cloud interaction is unmatched, and its readability makes it perfect for defining complex business rules.
- Rich: I used this library to build the terminal UI. In my opinion, if a CLI tool doesn't look good, people won't trust it. `rich` allows for tables, progress bars, and spinners that make the agent feel like a polished product.
- Matplotlib: Used for generating the visual reports and graphs that the agent produces.
- Mock/Boto3: Since I didn't want to experiment on my actual production AWS account (and accidentally delete my production database), I wrote a simulation layer that mimics AWS API responses. This was critical for rapid iteration.
- Mermaid.js: For generating the architecture diagrams you'll see below, ensuring the documentation matches the code.
Why Read It?
You might be thinking, "There are already tools like Trusted Advisor, CloudHealth, or CAST AI that do this." And you'd be correct. The market is flooded with FinOps tools.
However, building it yourself teaches you the fundamental mechanics of these tools.
- You learn how cloud APIs actually structure data.
- You learn the edge cases (e.g., "Is this instance idle, or is it just a low-traffic backup server?").
- You learn how to structure an "Agentic" loop: Observe -> Orient -> Decide -> Act.
In my opinion, understanding these internal mechanics is what separates a tool user from a tool builder. Plus, custom agents can be tailored to specific business logic (e.g., "Never delete instances tagged Project:Moonshot") that off-the-shelf tools might miss or over-complicate.
Reading this article will give you the blueprint to build your own "Janitor Bot" tailored to your specific infrastructure needs.
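To make the "custom business logic" point concrete, here is a minimal sketch of what a tag-based exemption guard might look like. The `is_protected` function and the specific tag pairs are my own illustrative assumptions, not part of the repo's API:

```python
# Hypothetical tag guard: any resource carrying one of these tag pairs
# is exempt from analysis. The tag names are illustrative examples.
PROTECTED_TAGS = {("Project", "Moonshot"), ("Ignore", "FinOps")}

def is_protected(resource: dict) -> bool:
    """Return True if the resource carries any tag that exempts it from cleanup."""
    tags = {(t["Key"], t["Value"]) for t in resource.get("Tags", [])}
    return bool(tags & PROTECTED_TAGS)

# A tagged instance is skipped; an untagged one is fair game.
print(is_protected({"Tags": [{"Key": "Project", "Value": "Moonshot"}]}))  # True
print(is_protected({"Tags": [{"Key": "Environment", "Value": "Dev"}]}))   # False
```

A guard like this would run first in the analysis loop, before any rule is evaluated.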
Let's Design
Before writing a single line of code, I grabbed my digital whiteboard. I needed to visualize how this agent would behave. I didn't want a "script" that just runs top-to-bottom; I wanted an "agent" that has a lifecycle.
The System Architecture
I decided on a modular architecture to keep the concerns separated. I wanted the "Brain" (Analyzer) to be separate from the "Senses" (Cloud Interface) and the "Hands" (Action Taker).
As you can see, the flow is unidirectional:
- Input: The agent requests the current state of the world (Cloud Input) via the Boto3/Mock interface.
- Process: The Analyzer receives raw resource data and applies a set of rules (Policies).
- Output: The agent generates a list of "Findings" and "Actions" which are then acted upon or reported.
This separation is crucial. It means I can swap out the MockAWS class for a real Boto3 client without changing a single line of the analysis logic. It makes testing incredibly easy.
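One way to express that swap-ability is a structural interface. This is a sketch of the idea, not code from the repo—the `CloudProvider` protocol and `discover` helper are names I've invented for illustration:

```python
from typing import Protocol

class CloudProvider(Protocol):
    """Structural interface: anything with this shape can feed the analyzer."""
    def generate_instances(self, count: int = 20) -> list[dict]: ...

def discover(cloud: CloudProvider, count: int = 30) -> list[dict]:
    # The pipeline only ever sees plain dicts, never the client itself,
    # so a mock and a real Boto3 wrapper are interchangeable here.
    return cloud.generate_instances(count)

# A stand-in provider is enough to exercise the whole pipeline.
class StubCloud:
    def generate_instances(self, count: int = 20) -> list[dict]:
        return [{"InstanceId": f"i-{n:08x}"} for n in range(count)]

print(len(discover(StubCloud(), 3)))  # 3
```

Because `Protocol` uses duck typing, neither `MockAWS` nor a Boto3 wrapper needs to inherit from anything—matching the method shape is enough.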
The Sequence of Events
I also mapped out the timeline of a single "run". I wanted the agent to be stateless—it wakes up, checks the world, fixes things, and goes back to sleep.
This sequence ensures safety. The agent validates "Findings" before attempting "Remediation". In a production system, I would probably insert a "Human Approval" step between Analysis and Action, but for this autonomous PoC, I let the agent pull the trigger to demonstrate the full capability.
The Logic Flow
How does the agent decide what is waste? This is the most critical part. If the rules are too loose, we miss savings. If they are too strict, we delete production.
I defined three simple rules for this PoC:
- EC2 (Compute): If `State == running` AND `CPU < 5%`, mark as Idle. In the real world, I'd also check Network I/O and Memory, but CPU is a good proxy for this experiment.
- EBS (Storage): If `State == available` (which means it's not attached to any EC2 instance), mark as Waste. Unattached volumes charge you money for simply existing.
- Snapshots (Backups): If `Age > 90 days`, mark as Old. Most compliance policies don't require daily backups going back years.
Let’s Get Cooking
Now, let's look at the code. I'll break it down module by module to show you how I implemented this logic.
1. The Mock Cloud (mock_cloud.py)
I needed a way to test this without incurring real AWS costs. I built a MockAWS class that generates random resources. This was actually really fun—I had to simulate the "messiness" of a real cloud environment to make the agent work for it.
```python
import random
import uuid
from datetime import datetime, timedelta


class MockAWS:
    def __init__(self):
        self.regions = ["us-east-1", "us-west-2", "eu-central-1"]
        self.instance_types = ["t3.micro", "m5.large", "c5.xlarge", "r5.2xlarge"]
        self.services = ["EC2", "EBS", "RDS"]

    def generate_instances(self, count=20):
        instances = []
        for _ in range(count):
            # Bias towards running instances (75% chance)
            state = random.choice(["running", "stopped", "running", "running"])
            launch_time = datetime.now() - timedelta(days=random.randint(1, 400))

            # Simulate CPU utilization.
            # Most servers have some load, but zombies sit near 0.
            cpu_util = random.uniform(0.1, 95.0)
            if state == "stopped":
                cpu_util = 0.0
            elif random.random() < 0.2:  # 20% chance of being a zombie/idle instance
                cpu_util = random.uniform(0.1, 4.0)

            instances.append({
                "InstanceId": f"i-{uuid.uuid4().hex[:8]}",
                "InstanceType": random.choice(self.instance_types),
                "Region": random.choice(self.regions),
                "State": {"Name": state},
                "LaunchTime": launch_time.isoformat(),
                "CpuUtilization": cpu_util,
                "Tags": [{"Key": "Environment", "Value": random.choice(["Dev", "Prod", "Staging"])}],
            })
        return instances
```
My Thoughts: Writing this mock class made me realize how important data variety is for testing. If I only tested with "perfect" data, my analyzer would work in the lab but fail in the wild. I deliberately added noise (random states, varying CPU loads, different regions) to stress-test the logic.
2. The Brain (analyzer.py)
This is where the business logic lives. I kept it decoupled from the data source. The Analyzer doesn't care if the data came from Boto3 or my Mock class; it just expects a list of dictionaries. This makes unit testing trivial.
```python
class ResourceAnalyzer:
    def __init__(self):
        self.idle_cpu_threshold = 5.0  # percent
        self.old_snapshot_days = 90

    def analyze_ec2(self, instances):
        findings = []
        for inst in instances:
            # Rule: running AND low CPU
            if inst["State"]["Name"] == "running" and inst["CpuUtilization"] < self.idle_cpu_threshold:
                reason = f"Low CPU ({inst['CpuUtilization']:.2f}%) - Potential right-sizing or termination candidate."
                findings.append({
                    "ResourceId": inst["InstanceId"],
                    "Type": "EC2",
                    "Issue": "Underutilized",
                    "Details": reason,
                    "Recommendation": "Stop or Downsize",
                    "EstimatedSavings": self._calculate_ec2_savings(inst["InstanceType"]),
                })
        return findings

    def analyze_ebs(self, volumes):
        findings = []
        for vol in volumes:
            # Check for unattached volumes
            if vol["State"] == "available":
                reason = f"Volume {vol['VolumeId']} ({vol['Size']} GB) is unattached."
                findings.append({
                    "ResourceId": vol["VolumeId"],
                    "Type": "EBS",
                    "Issue": "Unattached",
                    "Details": reason,
                    "Recommendation": "Delete",
                    "EstimatedSavings": vol["Size"] * 0.08,  # approx $0.08 per GB-month
                })
        return findings
```
I Observed: By separating the _calculate_ec2_savings logic, I could easily swap in a real pricing API later. For now, it uses a static lookup map, but the architecture allows for upgradeability.
3. The Orchestrator (main.py)
This ties everything together. I used the rich library to make the output engaging. A spinning loader gives the user feedback that "work is happening," and the final table summarizes the complex data into actionable insights.
The main loop handles the flow:
- Initialize Layers
- Discovery (with visual spinner)
- Analysis
- Reporting (printing the table)
- Action (simulating the fixes)
```python
import time

from rich.console import Console
from rich.progress import track

from mock_cloud import MockAWS
from analyzer import ResourceAnalyzer
from reporter import FinOpsReporter


def main():
    console = Console()

    # 1. Initialize layers
    cloud = MockAWS()
    analyzer = ResourceAnalyzer()
    reporter = FinOpsReporter()

    console.print("[bold blue]🚀 Initializing Autonomous FinOps Agent...[/bold blue]")
    time.sleep(1)

    # 2. Discovery phase
    console.print("[bold yellow]🔍 Scanning Cloud Environment (Mock AWS)...[/bold yellow]")
    with console.status("[bold green]Fetching EC2 Instances...[/bold green]"):
        instances = cloud.generate_instances(30)
        time.sleep(1.2)  # Simulate API latency
    # ... fetching other resources ...

    # 3. Analysis phase
    console.print("[bold yellow]🧠 Analyzing for Cost Inefficiencies...[/bold yellow]")
    all_findings = []
    ec2_findings = analyzer.analyze_ec2(instances)
    all_findings.extend(ec2_findings)
    # ... analyzing other resources ...

    # 4. Reporting phase
    reporter.print_terminal_report(all_findings)
    reporter.generate_json_report(all_findings)

    # 5. Action phase
    console.print("\n[bold blue]🤖 Autonomous Actions:[/bold blue]")
    if all_findings:
        console.print("[dim]Simulating remediation actions...[/dim]")
        for _ in track(range(len(all_findings)), description="Applying fixes..."):
            time.sleep(0.1)
        console.print("[bold green]✅ All actions executed successfully. Savings realized![/bold green]")


if __name__ == "__main__":
    main()
```
In My Experience: The User Experience (UX) of internal tools is often overlooked. But if a tool outputs a wall of JSON text, no one reads it. By adding a simple progress bar and a formatted table, the tool feels "professional" and trustworthy, even if it's just a script running locally.
Let's Setup
If you want to run this experiment yourself, here is the exact setup process. I've engineered the project to be self-contained so you don't need an AWS account—it runs fully in the mock mode out of the box.
Prerequisites
- Python 3.10+
- Git
Installation Steps
1. Clone the Repository

I've hosted the code on GitHub. Clone it to your local machine:

```shell
git clone https://github.com/aniket-work/autonomous-finops-agent.git
cd autonomous-finops-agent
```

2. Create a Virtual Environment

Always use a virtual environment to keep your dependencies clean.

```shell
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install Dependencies

We need `rich` for the UI and `matplotlib` if you want to generate the graphs yourself.

```shell
pip install -r requirements.txt
```
Let's Run
Now for the fun part. Execute the agent and watch it go to work.
Execution Command
Run the main script:
python main.py
The Output
You should see a beautiful terminal interface spring to life.
- Initialization: The agent loads its configuration.
- Scanning: You'll see a spinner as it "connects" to the mock cloud.
- Analysis: It processes the resources.
- Reporting: It prints a summary table of the money you could save.
Here is a snippet of what the log output looks like:
```
🚀 Initializing Autonomous FinOps Agent...
🔍 Scanning Cloud Environment (Mock AWS)...
✅ Discovered 30 Instances, 25 Volumes, 20 Snapshots.
🧠 Analyzing for Cost Inefficiencies...
  ➤ Detected 6 idle EC2 instances.
  ➤ Detected 5 unattached EBS volumes.
  ➤ Detected 16 aged snapshots.
╭─────────────────────┬──────────┬──────────────┬────────────╮
│ Resource ID         │ Type     │ Issue        │ Est. Sav   │
├─────────────────────┼──────────┼──────────────┼────────────┤
│ i-e7efb4e2          │ EC2      │ Underutilized│ $70.00     │
│ vol-f1fc39a4        │ EBS      │ Unattached   │ $8.00      │
│ snap-794f74d5       │ Snapshot │ Aged Backup  │ $5.00      │
│ ...                 │ ...      │ ...          │ ...        │
╰─────────────────────┴──────────┴──────────────┴────────────╯
```
I thought: Seeing that dollar amount at the bottom is powerful. It translates "technical debt" into "financial opportunity." That's the language business stakeholders understand.
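The "dollar amount at the bottom" is just a fold over the findings list. A minimal sketch of that summary step—`total_savings` is a hypothetical helper name, standing in for whatever the reporter does internally:

```python
# Collapse a list of findings into the single dollar figure shown
# at the bottom of the report. Field name matches the analyzer output.
def total_savings(findings: list[dict]) -> float:
    return round(sum(f.get("EstimatedSavings", 0.0) for f in findings), 2)

findings = [
    {"ResourceId": "i-e7efb4e2", "EstimatedSavings": 70.00},
    {"ResourceId": "vol-f1fc39a4", "EstimatedSavings": 8.00},
]
print(f"Potential monthly savings: ${total_savings(findings):,.2f}")
# Potential monthly savings: $78.00
```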
Results & Analysis
In my experimental runs, the agent consistently identified about 20-30% of the simulated resources as "waste." This aligns with industry standards—most unoptimized cloud environments have at least 20% waste.
| Resource Type | Waste Percentage | Potential Savings (Monthly) |
|---|---|---|
| EC2 Instances | ~25% | $340.00 |
| EBS Volumes | ~20% | $120.00 |
| Snapshots | ~40% | $95.00 |
Note: The values above are from a single simulation run. Your mileage may vary based on the random generation.
Impact
If this were a real environment with a monthly spend of $5,000, a ~30% reduction would mean $1,500/month in savings. That's $18,000 a year—enough to hire an intern or buy a LOT of coffee for the dev team.
Edge Cases
During development, I encountered a few "gotchas" that are worth noting if you plan to build this for real:
- Metric Granularity: Looking at average CPU can be misleading. A server might be idle for 23 hours and run a critical batch job for 1 hour. My simple rule (`avg < 5%`) might kill this critical server.
  - Solution: A real-world agent needs to check `max` CPU, not just `avg`. It should probably also look at Memory (requires a custom agent on AWS) and Network I/O.
- Tagging: Some resources should be idle (e.g., Disaster Recovery instances or warm pools).
  - Solution: My agent needs to respect tags like `Ignore:FinOps` or `Environment:DR`. I added a simple implementation of this in the full code (check the repo).
- Stateful Remediation: Deleting a volume is permanent.
  - Solution: An autonomous agent should probably "Snapshot then Delete" (create a backup before deleting) rather than just "Delete." This provides a safety net.
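The max-vs-average distinction from the first gotcha is easy to demonstrate in isolation. In this sketch, `datapoints` stands in for hourly CPU samples you would pull from a metrics backend like CloudWatch:

```python
# An idle check based on the MAXIMUM sample, not the average.
# Average alone would flag the bursty server below for deletion.
def is_truly_idle(datapoints: list[float], threshold: float = 5.0) -> bool:
    return bool(datapoints) and max(datapoints) < threshold

steady_zombie = [1.2] * 24           # flat ~1% CPU all day
bursty_batch = [1.2] * 23 + [96.0]   # idle for 23h, one heavy batch hour

print(is_truly_idle(steady_zombie))  # True  -> safe to flag
print(is_truly_idle(bursty_batch))   # False -> the nightly job is spared
```

Both servers average under 5% CPU, but only the first one is genuinely a zombie.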
Future Roadmap
This PoC was just the beginning. The "agentic" approach to FinOps has massive potential. If I were to take this to production, here is what I would add:
- LLM Integration: Instead of hardcoded rules, I'd want to feed the metrics to an LLM (like Gemini or GPT-4) and ask, "Based on this utilization pattern, is this workload over-provisioned?" An LLM could detect subtler patterns than my `if cpu < 5` statement.
- Slack Integration: Instead of just logging to the terminal, the agent should send a message to a Slack channel: "Hey team, I found $500 of waste. Click 'Yes' to fix it." Human-in-the-loop is often safer than full autonomy.
- Multi-Cloud: Abstract the interface to support Azure and Google Cloud. The concept of "Virtual Machine" and "Disk" is universal, even if the API calls differ.
Closing Thoughts
Building this Autonomous FinOps Agent reinforced a core belief of mine: Automation is the ultimate form of documentation.
By writing code to detect waste, I had to explicitly define what waste is. I couldn't just say "delete unused stuff"; I had to define "unused" in booleans and floats. This clarity is valuable even if you never run the agent. It forces the team to agree on standards.
I hope this article inspires you to look at your own cloud bill—or any repetitive task—and ask, "Could a Python script do this for me?"
As always, the code is open source. Fork it, break it, and let me know how much money you save!
Happy coding! 🚀
Disclaimer
The views and opinions expressed here are solely my own and do not represent the views, positions, or opinions of my employer or any organization I am affiliated with. The content is based on my personal experience and experimentation and may be incomplete or incorrect. Any errors or misinterpretations are unintentional, and I apologize in advance if any statements are misunderstood or misrepresented.