Kubernetes does not magically know when to scale your workloads.
It relies on metrics exposed through dedicated APIs to make scaling decisions.
There are three types of metrics you need to understand:
Resource Metrics
Custom Metrics
External Metrics
Each one represents a different kind of pressure on your system.
Why Metrics Matter for Autoscaling
Autoscaling in Kubernetes is mainly handled by the Horizontal Pod Autoscaler (HPA).
The HPA keeps asking:
“Hey… are my Pods struggling?”
The answer comes from metrics. Without them, Kubernetes is basically guessing.
1. Resource Metrics = Pod Health Signals
These are the native Kubernetes metrics.
They come from the Metrics Server and only cover:
CPU usage
Memory usage
They are exposed via:
metrics.k8s.io
What they represent
They describe resource consumption, not business traffic.
Your app might be slow because of a database… but CPU could still be chill.
Example: HPA based on CPU
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
If average CPU across Pods goes above 70 percent → more replicas.
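One detail that is easy to miss: a Utilization target is measured against the CPU requests declared on the Pod's containers, so the target Deployment must set them. A minimal sketch of what that could look like (the request values here are assumptions for illustration, not from the original manifest):

```yaml
# Container resources on the "api" Deployment targeted by the HPA above.
# averageUtilization: 70 means roughly 70% of the requested CPU per Pod.
resources:
  requests:
    cpu: "500m"       # assumed request; a 70% target is ~350m average usage per Pod
    memory: "256Mi"   # assumed request
```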
Pros
Super simple
Works out of the box
Limits
CPU usage is not always a good proxy for user traffic
Memory often reacts too late
2. Custom Metrics = Your App Talking to Kubernetes
Custom metrics come from applications inside your cluster.
They describe business or application load.
They are exposed through:
custom.metrics.k8s.io
Typical data flow
App exposes /metrics
Prometheus scrapes
Prometheus Adapter maps metrics → Kubernetes API
Now Kubernetes can ask:
“How busy is this Deployment really?”
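To give a feel for that mapping step, here is a minimal Prometheus Adapter rule sketch that would turn a counter such as http_requests_total into the http_requests_per_second metric used below. The series and label names are assumptions about your instrumentation, not a drop-in config:

```yaml
# prometheus-adapter rules (sketch): expose a per-Pod requests-per-second metric
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```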
Example 1: HTTP requests per second
Metric in Prometheus:
http_requests_per_second
What it means
Real traffic handled by each Pod
In Kubernetes
A metric attached to Pods
```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: 200
```
If each Pod handles more than 200 requests per second → scale out.
Example 2: Request duration
Metric:
request_duration_seconds
What it means
Application performance and saturation
Used as an Object metric:
```yaml
- type: Object
  object:
    metric:
      name: avg_request_duration
    describedObject:
      apiVersion: apps/v1
      kind: Deployment
      name: api
    target:
      type: Value
      value: 0.5
```
If average latency goes above 500 ms → time to add Pods.
Example 3: Active background jobs
Metric:
active_background_jobs
What it means
Internal workload of a worker
Each Pod reports its own load, and HPA scales when workers are overloaded
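As a rough sketch, the HPA stanza would mirror the requests-per-second example above; the threshold of 5 jobs per Pod is just an assumed value:

```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: active_background_jobs
    target:
      type: AverageValue
      averageValue: "5"   # assumed threshold: add workers once each Pod averages more than 5 active jobs
```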
Pros
Scaling reflects real app behavior
Way smarter than CPU only
Limits
Requires Prometheus + Prometheus Adapter
More components to maintain
3. External Metrics = Work Waiting Outside the Cluster
External metrics come from systems outside Kubernetes.
They are exposed via:
external.metrics.k8s.io
These metrics describe work your Pods must process, even if it lives elsewhere.
Example 1: SQS queue length
Metric:
ApproximateNumberOfMessagesVisible
What it means
Number of messages waiting in the queue
In Kubernetes
A global metric not tied to specific Pods
If the queue grows → spawn more workers
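For reference, an External-type metrics stanza for this could look roughly like the sketch below. The metric name and selector depend entirely on which adapter exposes the SQS data, so treat them as placeholders:

```yaml
metrics:
- type: External
  external:
    metric:
      name: sqs_messages_visible     # placeholder; the real name comes from your metrics adapter
      selector:
        matchLabels:
          queue: my-queue            # placeholder label
    target:
      type: AverageValue
      averageValue: "30"             # assumed: about 30 waiting messages per worker Pod
```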
Example 2: Kafka consumer lag
Metric:
kafka_consumer_lag
What it means
Delay between producers and consumers
More lag = your consumers are falling behind
Scale them up!
Example 3: Redis job queue size
Metric:
redis_list_length
What it means
Number of jobs waiting in Redis queues
Perfect for worker autoscaling
Example 4: Time-based scaling
Scale more Pods during office hours
Scale down at night
This is also treated as an external signal, because it's not tied to Pod resource usage.
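KEDA (introduced below) has a cron trigger that expresses exactly this; a minimal sketch with assumed hours and replica counts:

```yaml
triggers:
- type: cron
  metadata:
    timezone: Europe/Paris     # assumed timezone
    start: "0 8 * * *"         # scale up at 08:00
    end: "0 19 * * *"          # scale back down at 19:00
    desiredReplicas: "10"      # assumed office-hours replica count
```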
How HPA Uses These Metrics
HPA periodically queries:
| Metric Type | API | What it measures |
|---|---|---|
| Resource | metrics.k8s.io | CPU and memory |
| Custom | custom.metrics.k8s.io | App level load |
| External | external.metrics.k8s.io | Event or system load |
Then it calculates how many replicas you need.
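The core idea, straight from the HPA algorithm, is: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). For example, 4 Pods averaging 90% CPU against a 70% target gives ceil(4 × 90 / 70) = 6 replicas.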
KEDA vs Prometheus Adapter
Here comes the game changer.
KEDA (Kubernetes Event-driven Autoscaling) focuses on event-driven autoscaling and makes custom and external metrics way easier to use.
With Prometheus Adapter
You must:
Run Prometheus
Install and configure the Adapter
Write mapping rules
Manage RBAC and certs
It works, but it's heavy.
With KEDA
You define a ScaledObject and KEDA does the magic.
Example SQS scaler:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123/my-queue
      queueLength: "10"
      awsRegion: eu-west-1
```
KEDA fetches the metric
Exposes it to Kubernetes
Creates and manages the HPA
Handles provider auth
All without the Prometheus Adapter
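For the auth part, KEDA uses its own TriggerAuthentication resource. A minimal sketch using pod identity on EKS (the provider value is an assumption about your setup):

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: sqs-auth
spec:
  podIdentity:
    provider: aws-eks    # assumes IAM Roles for Service Accounts on EKS
```

The SQS trigger above would then reference it through an authenticationRef pointing at sqs-auth.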
When to Use What
| If you want to scale on… | Use… |
|---|---|
| CPU or memory | Resource metrics |
| HTTP traffic or app load | Custom metrics |
| Queues, streams, SaaS | External metrics + KEDA |
| Events or scale to zero | KEDA |
Final Takeaway
Kubernetes becomes truly powerful when scaling is driven by real workload signals, not just CPU.
Resource metrics are the starting point
Custom metrics bring application awareness
External metrics unlock event-driven architectures
KEDA makes advanced autoscaling simple and production friendly
Once you understand these three metric types, autoscaling stops being magic and becomes a design tool you control.
Happy clustering :)