March 24, 2026

Why Your Kubernetes Cluster Costs 3× Too Much

Most Kubernetes clusters waste 50–70% of their compute budget. Here are the three root causes — and what to do about them.


You open your cloud bill. The number is higher than last month. Again.

Your traffic didn't double. You didn't launch a new service. Nothing major changed. And yet the bill keeps climbing.

This is one of the most common problems we hear from engineering teams running Kubernetes on any cloud provider — Scaleway, GCP, AWS, it doesn't matter. The platform is powerful, the autoscaler is running, everything looks fine. But the money is bleeding out quietly, month after month.

The short answer: Most Kubernetes clusters waste between 50% and 70% of their actual compute budget. Not because of one big mistake — because of three small ones that compound. Oversized nodes sitting mostly idle. An autoscaler that scales up fast and almost never scales back down. And workload resource requests set by guesswork. When all three co-exist, a cluster that should cost €500/month ends up costing €1,500.

Here's how it happens — and how to diagnose it on your own cluster.


Root Cause #1: Your Nodes Are Oversized and Mostly Empty

When you provision a Kubernetes cluster, you pick a node size. Most teams default to something "comfortable" — a dev1-l on Scaleway, an n2-standard-8 on GCP, whatever feels safe. Then they forget about it.

The problem: node utilization on typical production clusters averages 15–25% of allocated capacity. That means on a node with 8 vCPUs and 32 GB of RAM, you're actively using 2 vCPUs and maybe 6–8 GB. The rest is paid for and idle.

This isn't a Kubernetes problem specifically — it's a provisioning defaults problem. Nodes get sized for peak capacity (or imagined peak capacity), but workloads run at average load most of the time. The gap between peak and average becomes dead compute you're paying for.

The compounding factor: when you have multiple nodes all sitting at 20% utilization, bin-packing them down to fewer, better-used nodes could cut your node count in half with zero impact on performance.

Quick diagnostic: Run this on your cluster and look at the REQUESTS column vs actual node capacity:

kubectl top nodes
kubectl describe nodes | grep -A5 "Allocated resources"

If your requested CPU is under 40% of allocatable CPU on most nodes, you have a packing problem.
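As a sanity check on the arithmetic, here's a minimal sketch of that ratio. The figures are hypothetical stand-ins; on a live cluster you'd take them from the "Allocated resources" output above rather than hard-coding them:

```shell
# Hypothetical figures for one node; on a real cluster, read these from
# the "Allocated resources" section of `kubectl describe nodes`.
requested_mcpu=1600    # total CPU requested by pods on the node (millicores)
allocatable_mcpu=7910  # allocatable CPU on an 8-vCPU node (millicores)

pct=$(( requested_mcpu * 100 / allocatable_mcpu ))
echo "requested: ${pct}% of allocatable CPU"

if [ "$pct" -lt 40 ]; then
  echo "under 40%: likely a packing problem"
fi
```

With these sample numbers the node is at 20% requested, half the 40% threshold, so it would flag.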


Root Cause #2: Your Autoscaler Scales Up Perfectly and Almost Never Scales Down

The Kubernetes Cluster Autoscaler is good at one thing: adding nodes when pods are Pending. It does that reliably. The problem is the other direction.

Scale-down is conservative by design. The default --scale-down-unneeded-time is 10 minutes — a node has to be underutilized for 10 consecutive minutes before it's even considered for removal. In practice, with fluctuating workloads, that 10-minute window resets constantly. Nodes that should be removed stay up for hours.
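These timings are tunable. A sketch of the relevant Cluster Autoscaler container args, with illustrative values (tune them to your own workload's tolerance for churn, and note that faster scale-down means more pod rescheduling):

```yaml
# Illustrative args for the cluster-autoscaler deployment
- --scale-down-unneeded-time=5m          # default 10m: how long a node must be underutilized first
- --scale-down-utilization-threshold=0.6 # default 0.5: below this request ratio a node counts as unneeded
- --scale-down-delay-after-add=5m        # default 10m: cooldown after any scale-up
```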

There are also several conditions that block scale-down entirely:

  • Any pod with a local volume attached

  • Any pod not managed by a controller (Deployment, ReplicaSet, StatefulSet, Job, etc.)

  • Pods with PodDisruptionBudgets configured aggressively

  • System pods (kube-proxy, CNI agents, etc.)

In a typical cluster, 20–40% of nodes have at least one of these blocking conditions at any given time. The autoscaler sees them, decides it can't drain the node safely, and leaves the whole node running.
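Some of these blockers can be lifted explicitly. For pods you know are safe to reschedule, the Cluster Autoscaler honors an opt-in annotation; a sketch on a hypothetical pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker            # hypothetical pod
  annotations:
    # Tells the Cluster Autoscaler this pod may be evicted during
    # scale-down, even if it would otherwise block node removal.
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
    - name: worker
      image: example/worker:latest   # hypothetical image
```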

The result: your cluster scales up to handle a traffic spike on a Tuesday afternoon and never fully recovers to baseline. You pay for the spike capacity indefinitely.

Quick diagnostic: Check whether your autoscaler has actually removed any nodes recently. (Kubernetes retains events for only about an hour by default, so for a multi-day view check the cluster-autoscaler logs or your metrics backend instead.)

kubectl get events --field-selector reason=ScaleDown -n kube-system --sort-by='.lastTimestamp' | tail -20

If that list is short or empty and you have more than 5 nodes, your autoscaler is effectively a one-way door.


Root Cause #3: Resource Requests Are Set by Guesswork

Every pod in Kubernetes has (or should have) a resources.requests block — the CPU and memory the scheduler uses to decide where to place the pod. Here's the uncomfortable truth: in most codebases, those numbers were written by a developer who guessed.

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "1"
    memory: "1Gi"

Why 500m? Why 512Mi? Usually because it looked reasonable, or because someone copied it from another service, or because the docs had it as an example.

The consequences cut both ways:

Over-requesting is the more common problem. The pod claims 500m CPU but only uses 50m at runtime. The scheduler sees the node as "full" at 60% actual utilization. New pods can't schedule there. A new node gets provisioned. You've just paid for an extra node to host empty capacity.

Under-requesting causes a different problem: pods get scheduled onto nodes that can't actually handle their real load, leading to CPU throttling and out-of-memory kills. Teams respond by bumping requests higher — often way too high — and the cycle continues.

The fix requires actual data: measuring real pod CPU and memory consumption over time (p50, p90, p99) and setting requests to something close to the p90 observed value. Tools like the Kubernetes Vertical Pod Autoscaler can recommend values automatically.
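A minimal sketch of a VPA in recommendation-only mode, assuming the VPA components are installed in your cluster and targeting a hypothetical Deployment named api:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa                 # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api                   # hypothetical workload
  updatePolicy:
    updateMode: "Off"           # recommend only; never evicts pods
```

With updateMode "Off", `kubectl describe vpa api-vpa` shows the recommended requests without VPA ever mutating your pods, which makes it a low-risk first step.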

Quick diagnostic:

kubectl top pods -A --sort-by=cpu | head -20

Compare the CPU(cores) column against what your pods actually request in their specs. A 5–10× gap is common. A 20× gap is not rare.
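The gap itself is just requested-over-used. A quick sketch with hypothetical figures matching the spec above (500m requested, 50m observed):

```shell
requested_m=500  # from the pod spec (millicores)
used_m=50        # hypothetical reading from `kubectl top pods` (millicores)

echo "over-requested by $(( requested_m / used_m ))x"
```

Here the pod is over-requested by 10x, squarely in the "common" range.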


How These Three Problems Stack Into a 3× Bill

Each root cause alone wastes 20–30% of compute spend. Together, they don't add — they multiply.

Consider a simple example:

  Scenario                                    Node count   Monthly cost
  Baseline (well-tuned)                       4 nodes      €400
  + Oversized nodes                           6 nodes      €600
  + Autoscaler not scaling down               8 nodes      €800
  + Bad resource requests (packing blocked)   12 nodes     €1,200
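Note that the layers multiply rather than add: 4 to 6 nodes is ×1.5, 6 to 8 is ×1.33, and 8 to 12 is ×1.5 again, which compounds to 3× the baseline. A one-liner to check the arithmetic:

```shell
# Baseline €400/month, with each layer's multiplier taken from the table
awk 'BEGIN { printf "%.0f\n", 400 * (6/4) * (8/6) * (12/8) }'
```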

That's not a contrived example. It's a pattern we see repeatedly. The cluster looks healthy — all pods running, no alerts firing, uptime 99.9%. Meanwhile, €800/month is funding idle compute.

The frustrating part is that none of this is visible from the outside. Your monitoring shows green. Your autoscaler is "working". Your pods are scheduled. The waste is structural, not symptomatic.


Where to Start

Kubernetes cost optimization doesn't require a dedicated FinOps team or a six-week project. It requires three things:

  1. Visibility — per-namespace, per-workload cost breakdown so you know where the money is actually going

  2. Right-sizing recommendations — based on real observed usage, not guesswork

  3. Autoscaler tuning — particularly around scale-down aggressiveness and node consolidation

The tooling options range from open-source (Kubecost, OpenCost) to managed (cast.ai, Costlyst). What matters most at the start isn't the tool — it's getting actual numbers in front of you so you can make decisions with data instead of estimates.


This Is Exactly Why We Built Costlyst

We built Costlyst after running into all three of these problems on our own infrastructure. The observability was missing. The autoscaler was creating new nodepools for every Pending pod. The resource requests were copied from Stack Overflow examples from 2019.

Costlyst is a Kubernetes cost optimization platform built specifically for SMEs and scale-ups — teams that need the results of a FinOps practice without the headcount. It connects to your cluster, maps spend to workloads, and either suggests or automatically executes the right-sizing and consolidation changes.

No 6-month enterprise onboarding. No dedicated FinOps consultant required.

See how it works →