Kubernetes does not magically know when to scale your workloads.
It relies on metrics exposed through dedicated APIs to make scaling decisions.
There are three types of metrics you need to understand:
Resource Metrics
Custom Metrics
External Metrics
Each one represents a different kind of pressure on your system.
Why Metrics Matter for Autoscaling
Autoscaling in Kubernetes is mainly handled by the Horizontal Pod Autoscaler (HPA).
The HPA keeps asking:
“Hey… are my Pods struggling?”
The answer comes from metrics. Without them, Kubernetes is basically guessing.
1. Resource Metrics = Pod Health Signals
These are the native Kubernetes metrics.
They come from the Metrics Server and only cover:
CPU usage
Memory usage
They are exposed via:
metrics.k8s.io
What they represent
They describe resource consumption, not business traffic.
Your app might be slow because of a database… but CPU could still be chill.
Example: HPA based on CPU
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
If average CPU across Pods goes above 70 percent → more replicas.
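One detail that is easy to miss: a Utilization target is measured against the CPU requests declared on the Pod's containers, so the target Deployment must set them. A minimal sketch of what that could look like (the request values here are assumptions for illustration, not from the original manifest):

```yaml
# Container resources on the "api" Deployment targeted by the HPA above.
# averageUtilization: 70 means roughly 70% of the requested CPU per Pod.
resources:
  requests:
    cpu: "500m"       # assumed request; a 70% target is ~350m average usage per Pod
    memory: "256Mi"   # assumed request
```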
Pros
Super simple
Works out of the box
Limits
CPU usage is not always a good proxy for user traffic
Memory often reacts too late
2. Custom Metrics = Your App Talking to Kubernetes
Custom metrics come from applications inside your cluster.
They describe business or application load.
They are exposed through:
custom.metrics.k8s.io
Typical data flow
App exposes /metrics
Prometheus scrapes
Prometheus Adapter maps metrics → Kubernetes API
Now Kubernetes can ask:
“How busy is this Deployment really?”
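To give a feel for that mapping step, here is a minimal Prometheus Adapter rule sketch that would turn a counter such as http_requests_total into the http_requests_per_second metric used below. The series and label names are assumptions about your instrumentation, not a drop-in config:

```yaml
# prometheus-adapter rules (sketch): expose a per-Pod requests-per-second metric
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```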
Example 1: HTTP requests per second
Metric in Prometheus:
http_requests_per_second
What it means
Real traffic handled by each Pod
In Kubernetes
A metric attached to Pods
```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: 200
```
If each Pod handles more than 200 requests per second → scale out.
Example 2: Request duration
Metric:
request_duration_seconds
What it means
Application performance and saturation
Used as an Object metric:
```yaml
- type: Object
  object:
    metric:
      name: avg_request_duration
    describedObject:
      apiVersion: apps/v1
      kind: Deployment
      name: api
    target:
      type: Value
      value: 0.5
```
If average latency goes above 500 ms → time to add Pods.
Example 3: Active background jobs
Metric:
active_background_jobs
What it means
Internal workload of a worker
Each Pod reports its own load, and HPA scales when workers are overloaded
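As a rough sketch, the HPA stanza would mirror the requests-per-second example above; the threshold of 5 jobs per Pod is just an assumed value:

```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: active_background_jobs
    target:
      type: AverageValue
      averageValue: "5"   # assumed threshold: add workers once each Pod averages more than 5 active jobs
```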
Pros
Scaling reflects real app behavior
Way smarter than CPU only
Limits
Requires Prometheus + Prometheus Adapter
More components to maintain
3. External Metrics = Work Waiting Outside the Cluster
External metrics come from systems outside Kubernetes.
They are exposed via:
external.metrics.k8s.io
These metrics describe work your Pods must process, even if it lives elsewhere.
Example 1: SQS queue length
Metric:
ApproximateNumberOfMessagesVisible
What it means
Number of messages waiting in the queue
In Kubernetes
A global metric not tied to specific Pods
If the queue grows → spawn more workers
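For reference, an External-type metrics stanza for this could look roughly like the sketch below. The metric name and selector depend entirely on which adapter exposes the SQS data, so treat them as placeholders:

```yaml
metrics:
- type: External
  external:
    metric:
      name: sqs_messages_visible     # placeholder; the real name comes from your metrics adapter
      selector:
        matchLabels:
          queue: my-queue            # placeholder label
    target:
      type: AverageValue
      averageValue: "30"             # assumed: about 30 waiting messages per worker Pod
```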
Example 2: Kafka consumer lag
Metric:
kafka_consumer_lag
What it means
Delay between producers and consumers
More lag = your consumers are falling behind
Scale them up!
Example 3: Redis job queue size
Metric:
redis_list_length
What it means
Number of jobs waiting in Redis queues
Perfect for worker autoscaling
Example 4: Time-based scaling
Scale more Pods during office hours
Scale down at night
This is also treated as an external signal, because it's not tied to Pod resource usage.
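KEDA (introduced below) has a cron trigger that expresses exactly this; a minimal sketch with assumed hours and replica counts:

```yaml
triggers:
- type: cron
  metadata:
    timezone: Europe/Paris     # assumed timezone
    start: "0 8 * * *"         # scale up at 08:00
    end: "0 19 * * *"          # scale back down at 19:00
    desiredReplicas: "10"      # assumed office-hours replica count
```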
How HPA Uses These Metrics
HPA periodically queries:
| Metric Type | API | What it measures |
|---|---|---|
| Resource | metrics.k8s.io | CPU and memory |
| Custom | custom.metrics.k8s.io | App level load |
| External | external.metrics.k8s.io | Event or system load |
Then it calculates how many replicas you need.
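The core idea, straight from the HPA algorithm, is: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). For example, 4 Pods averaging 90% CPU against a 70% target gives ceil(4 × 90 / 70) = 6 replicas.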
KEDA vs Prometheus Adapter
Here comes the game changer.
KEDA (Kubernetes Event-driven Autoscaling) focuses on event-driven autoscaling and makes custom and external metrics way easier to use.
With Prometheus Adapter
You must:
Run Prometheus
Install and configure the Adapter
Write mapping rules
Manage RBAC and certs
It works, but it's heavy.
With KEDA
You define a ScaledObject and KEDA does the magic.
Example SQS scaler:
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker
  minReplicaCount: 0
  maxReplicaCount: 20
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.eu-west-1.amazonaws.com/123/my-queue
      queueLength: "10"
      awsRegion: eu-west-1
```
KEDA fetches the metric
Exposes it to Kubernetes
Creates and manages the HPA
Handles provider auth
All without the Prometheus Adapter
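For the auth part, KEDA uses its own TriggerAuthentication resource. A minimal sketch using pod identity on EKS (the provider value is an assumption about your setup):

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: sqs-auth
spec:
  podIdentity:
    provider: aws-eks    # assumes IAM Roles for Service Accounts on EKS
```

The SQS trigger above would then reference it through an authenticationRef pointing at sqs-auth.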
When to Use What
| If you want to scale on… | Use… |
|---|---|
| CPU or memory | Resource metrics |
| HTTP traffic or app load | Custom metrics |
| Queues, streams, SaaS | External metrics + KEDA |
| Events or scale to zero | KEDA |
Final Takeaway
Kubernetes becomes truly powerful when scaling is driven by real workload signals, not just CPU.
Resource metrics are the starting point
Custom metrics bring application awareness
External metrics unlock event-driven architectures
KEDA makes advanced autoscaling simple and production friendly
Once you understand these three metric types, autoscaling stops being magic and becomes a design tool you control.
Happy clustering :)