Kubernetes cost optimization in 2025: Best practices from Dysnix engineers


8 min read
Olha Diachuk
April 30, 2025

Cost optimization in Kubernetes has moved well past “set your requests and turn on the HPA”. Five years ago, that might have covered most projects. But now, the appetite for resources (and for their economy) is monstrous.

2024–25 has seen three clear shifts: 

  • Custom and automatic node‑level economics; 
  • FinOps‑style cost accountability fused with platform tooling;
  • A move to architecture patterns that assume heterogeneous compute, storage and network pricing from day one.

Have questions about any of these? Here, my fellow DevOps engineers from Dysnix and I will take you on a pleasant trip, explaining where you can save a dollar or two on your Kubernetes (K8s, for short) infrastructure using best practices, advanced techniques, and tools from our own experience.

What is Kubernetes cost optimization?

At its core, Kubernetes cost optimization is about cutting everything that isn’t used and aligning infrastructure consumption with business value, workload patterns, and operational efficiency.

It’s not a task for the DevOps engineer alone, and it would be strange to expect the effect of Kubernetes cost reduction from just one person. It’s a continuous project that requires observability, automation, and best-fit architectural choices.

| Stakeholder category | Role | Key responsibilities |
| --- | --- | --- |
| Technical | DevOps/Platform Engineers | Configure resources, implement quotas, manage autoscaling |
| Technical | Development Teams | Define resource requirements, optimize container practices |
| Technical | SRE Teams | Balance costs with reliability, implement efficient scaling |
| Financial | Finance/Procurement | Manage cloud budgets, approve spending, implement chargeback |
| Financial | Cloud Cost Analysts | Analyze costs, implement tagging, identify optimization opportunities |
| Financial | Business Unit Leaders | Align tech investments with outcomes, make cost trade-offs |
| Leadership & Governance | IT Management/CTO | Set cloud strategy, balance innovation with costs |
| Leadership & Governance | Security Teams | Ensure optimization doesn't compromise security |
| Leadership & Governance | Governance Teams | Establish resource policies and cost frameworks |

Cutting costs is a primary task for many stakeholders

Why Kubernetes cost optimization matters

If you've ever looked at your cloud bill and felt uneasy about the next one, you understand how your infrastructure’s appetite can financially ruin a project. Kubernetes cost optimization is a business-critical method, not just a technical tweak. Overspending, poor visibility, and risks to both financial and operational health can erode even the best projects.

Here are more reasons, and perhaps some of your personal ones, for implementing better Kubernetes cost management:

  • Most companies have unchecked Kubernetes costs

A 2025 Forbes Technology Council report found that 93% of enterprise platform teams struggle with cloud cost management, especially in Kubernetes environments. Nearly half of organizations (49%) actually saw increased cloud spending after adopting Kubernetes, often due to overprovisioning, idle resources, and lack of granular visibility. But it doesn’t make K8s bad.

  • Performance and reliability are at stake

Cost optimization helps to reveal your technical imperfections. For example, overprovisioned clusters can mask performance issues, while underprovisioned ones risk outages. Poor resource management leads to production downtime, erodes customer trust, and causes revenue loss. 

The most successful teams balance cost with performance by continuously monitoring, rightsizing, and automating scaling, never treating optimization as a one-off project or a goal that can be reached at once.

  • Automation and rightsizing are a good start for cutting costs

The 2025 Cast AI Kubernetes Cost Benchmark Report, based on data from over 2,100 organizations, found that clusters running automated rightsizing and spot instance automation achieved a cost reduction of up to 50% compared to static, manually managed clusters.

The report also notes that the average CPU utilization in production clusters is only 37%, meaning most organizations are paying for nearly twice the compute they actually use.
Insightful stats on spot instances from the same report

  • Visibility and accountability drive savings

The CNCF’s 2024 FinOps for Kubernetes analysis highlights that the biggest challenge is that teams lack clear ownership and cost attribution. Mature organizations are now using cost models that break down spend by team, service, and even feature, enabling targeted optimization and accountability.

Visibility and accountability are not about “suspecting” your colleagues; this approach in Kubernetes optimization provides the most detailed and illuminating helicopter view of your inner money flow, including all those nice nodes and pods of your K8s infrastructure. 

Summing up, the trends from both CNCF and Cast AI reports highlight that the most effective cost optimization happens when engineering and finance (FinOps) teams work together. 

This means sharing real-time cost data, setting shared goals (like $/request or $/customer), and making cost a first-class metric alongside latency and uptime.

Best practices in Kubernetes cost reduction

For mature companies to reach the best results in cost optimization, FinOps tooling, scheduling algorithms, and architectural patterns should now interlock:

In the platform dimension, it can be implemented like this:

  • Prometheus or eBPF‑based collectors feed Kubecost, CloudZero, or native GKE/EKS cost APIs; 
  • Policy engines (OpenCost, Fairwinds Insights, in‑house Rego bundles) translate “unit cost” limits into scheduling intents; 
  • Actuators such as Karpenter, Cluster Autoscaler, CAST AI, or Azure Cost‑Optimised Autoscale mutate the cluster to meet both SLOs and $/request targets.

But not all projects need the described approach, as there’s no single remedy for all. In most cases, you need just:

  1. Analyze your architecture decisions for potential cost savings.
  2. Identify over-provisioning, under-provisioning, and the most resource-intensive processes, focusing specifically on the infrastructure side.
  3. Cut all deprecated paid services and unused resources.
  4. Check your scaling approach or existing automation.

With all that on your plate, you’ll be able to draft an action plan that improves your situation.

Architectural methods

Microservices vs. monoliths

Choosing between microservices and monolithic architectures affects cost. Microservices can be scaled independently, allowing you to allocate resources precisely where needed, but they may introduce overhead in networking, service mesh, and inter-service communication. Monoliths might be simpler and cheaper to run at a small scale, but can lead to over-provisioning as you must scale the entire application for a single bottleneck.

Micro vs. mono, multi‑tenant vs. silo, managed vs. self‑hosted — the trade‑off space is now quantifiable.

Pro tip: The real lever is the scaling dimension. A well‑partitioned monolith that exposes a single concurrency hotspot is still cheaper to run than a swarm of nanoservices that thrash the network. Architecture, therefore, needs to be tied to variability, not to size.

Multi-tenancy and namespace design

Deciding whether to run multiple teams or applications in a single cluster (multi-tenancy) or to isolate them in separate clusters impacts cost. Multi-tenancy can improve resource utilization and reduce overhead, but requires careful security and quota management. Poorly designed namespaces or a lack of quotas can lead to resource contention and unexpected cost spikes.

Example of K8s resource quotas | Source
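A minimal ResourceQuota sketch along those lines (the `team-a` namespace and the figures are illustrative, not recommendations):

```yaml
# Caps aggregate requests/limits for one tenant namespace;
# pods that would push the namespace over the quota are rejected at admission.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a        # hypothetical tenant namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "50"
```

In practice, quotas like this are usually paired with a LimitRange so that containers without explicit requests/limits still get sane defaults and count against the quota.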

Cluster sizing and federation

Architecting for a single large cluster versus multiple smaller clusters (possibly federated across regions or clouds) affects cost. Multiple clusters can help with fault isolation and compliance, but may increase management overhead and reduce resource sharing efficiency. Federation can enable workload placement in the most cost-effective region or cloud.

Pro tip: A single shared cluster with strict network policies stays cheaper up to ~80 nodes. Beyond that, control-plane costs and noisy-neighbor risk tilt the balance toward a hub-and-spoke model: one “cash register” production cluster per P&L unit, plus a shared dev/staging area.

Use of spot/preemptible instances

Architecting your workloads to tolerate interruptions (stateless, easily restartable jobs) allows you to use spot or preemptible VMs, which are significantly cheaper than on-demand instances. This requires designing for resilience and rapid recovery, often using Kubernetes features like PodDisruptionBudgets and node affinity.
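As a sketch of those two features working together, here is a PodDisruptionBudget plus a pod-template fragment that prefers spot nodes. Note the label and taint keys shown are GKE’s (`cloud.google.com/gke-spot`); EKS, AKS, and Karpenter-managed clusters use different keys, and the workload names are made up:

```yaml
# Keep at least 2 workers alive during voluntary disruptions (e.g., spot reclaim drains).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-worker-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: batch-worker
---
# Pod-template fragment: prefer spot capacity, tolerate the spot taint.
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: cloud.google.com/gke-spot   # GKE-specific label
              operator: In
              values: ["true"]
tolerations:
  - key: cloud.google.com/gke-spot
    operator: Equal
    value: "true"
    effect: NoSchedule
```

Using `preferred` rather than `required` affinity lets the scheduler fall back to on-demand nodes when spot capacity dries up.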

We at Dysnix had a great story involving spot instances that skyrocketed the efficiency of a zk-powered project:

zkSync

zkSync is a Layer 2 scaling solution for Ethereum that leverages zero-knowledge proofs (zk-Rollups) to address Ethereum's scalability limitations.

Before:
  • Idea of the zk-powered product
  • Request for scalability and cost-efficiency
  • Requirement for high load handling
  • Security concerns

After:
  • Multiple server clusters powered by Kubernetes that scale according to business metrics
  • The validating core for off-chain packing of the unlimited flow of transactions
  • Secure connections with the applications and the blockchain


Storage and data management choices

Selecting between persistent volumes, object storage, or ephemeral storage impacts both performance and cost. For example, using object storage for logs or backups can be much cheaper than block storage, but may require changes in application logic.

| Data type | Best storage type in K8s | Why so? |
| --- | --- | --- |
| Container logs | Ephemeral (emptyDir, hostPath) | Use ephemeral storage for logs; aggregate with a logging agent (e.g., Fluentd) to persistent external storage. |
| Application state (cache) | In-memory (emptyDir, tmpfs, Redis) | Use in-memory or ephemeral storage for cache; do not rely on persistence across pod restarts. |
| Relational databases | PersistentVolume (block/file, SSD) | Use high-performance PersistentVolumes (e.g., SSD-backed) with StatefulSets; enable backups and replication. |
| NoSQL databases | PersistentVolume (block/file, SSD) | Use PersistentVolumes with fast storage; ensure anti-affinity and backup policies for resilience. |
| Object storage (Blobs, Media) | External (S3, GCS, MinIO) | Use external object storage for scalability and durability; mount via CSI or access via API. |
| Configuration/Secrets | Kubernetes Secrets/ConfigMaps | Store sensitive data in Secrets, configs in ConfigMaps; avoid mounting as files unless necessary. |
| User uploads (Documents) | PersistentVolume (NFS, block/file) | Use networked PersistentVolumes (e.g., NFS, CSI drivers) for shared access; back up regularly. |
| Streaming data (Kafka, Pulsar) | PersistentVolume (block, SSD) | Use fast block storage for brokers; ensure high IOPS and redundancy; consider external managed services. |
| Backups/Snapshots | External (Cloud Storage, NFS) | Store backups outside the cluster for disaster recovery; automate snapshotting and retention policies. |
| Metrics/Monitoring data | PersistentVolume (SSD, local PV) | Use fast local or networked storage for short-term retention; offload to long-term storage externally. |
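For the relational-database row above, a PersistentVolumeClaim might be sketched like this (the `fast-ssd` StorageClass name is an assumption; actual class names depend on your cluster and cloud provider):

```yaml
# Claims SSD-backed block storage for a database pod (typically via a StatefulSet
# volumeClaimTemplate); the StorageClass determines price/performance.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd   # hypothetical SSD-backed class
  resources:
    requests:
      storage: 100Gi
```

Cost-wise, the lever here is matching the StorageClass to the data type from the table: paying SSD prices for logs or backups is exactly the kind of waste this section is about.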


Service mesh and add-ons

Deciding whether to use a service mesh (like Istio or Linkerd) or other add-ons (monitoring, logging, security) introduces both operational benefits and resource overhead. Over-provisioning these components or running them at a cluster-wide scope can inflate costs.

Pro tip: Recent 2025 benchmarks and official documentation show that Istio’s ambient (sidecarless) mode reduces CPU and memory overhead by over 70–90% compared to sidecar mode, with a single ztunnel proxy consuming as little as 0.06 vCPU at 1000 HTTP requests/sec. 

This makes the mesh’s resource consumption negligible relative to application workloads, and ambient mode is now the lowest-overhead way to achieve secure zero-trust networking in Kubernetes.

Autoscaling strategies

Architectural decisions about how and when to scale—using HPA, VPA, Cluster Autoscaler, or third-party tools—determine how efficiently you match resource supply to demand. For example, event-driven architectures can leverage KEDA for fine-grained, metric-based scaling.
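A minimal HPA v2 sketch showing the basic shape (the Deployment name and thresholds are placeholders); the `behavior` stanza slows scale-down, which avoids thrashing on bursty traffic:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api              # hypothetical target workload
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before shrinking
```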

PredictKube

PredictKube is one of the most prominent predictive autoscalers for KEDA, made by Dysnix. Enjoy!


Application design for cloud-native patterns

Designing stateless applications, using sidecar containers judiciously, and leveraging managed services (like cloud databases or queues) can offload operational burden and optimize costs, but may introduce vendor lock-in or data transfer costs.

Network topology and traffic management

Architecting for minimal cross-zone or cross-region traffic, using internal load balancers, and optimizing ingress/egress rules can reduce data transfer and networking costs, which are often overlooked.
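One way this looks in manifests, as a sketch: an internal LoadBalancer plus topology-aware routing so traffic stays in-zone where possible. The annotations are provider- and version-specific (the internal-LB one shown is GKE’s; AWS and Azure use different keys), and the service names are assumptions:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: internal-api
  annotations:
    networking.gke.io/load-balancer-type: "Internal"  # GKE-specific
    service.kubernetes.io/topology-mode: "Auto"       # prefer same-zone endpoints
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
```

Keeping pod-to-pod traffic in-zone matters because most clouds bill cross-zone traffic but not intra-zone traffic.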

Pro tip: Use microfrontends + edge rendering for high-traffic global applications. When latency budgets force workloads into multiple points of presence (PoPs), the extra cost of additional clusters is dwarfed by egress savings and CDN origin shielding.

Microfrontend installation | Source

Non-architectural methods

  • Let’s start with granular resource allocation

Kubernetes allows you to set CPU and memory requests/limits per container. However, deep cost optimization means continuously profiling workloads, using tools like Vertical Pod Autoscaler (VPA) or Goldilocks, to dynamically adjust these values based on real usage patterns, not just static estimates. This prevents both over-provisioning (waste) and under-provisioning (performance issues).
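As a pod-template fragment, that looks roughly like this (the service name, image, and numbers are illustrative; in the workflow described above they would come from observed usage, not guesses):

```yaml
# Deployment pod-template fragment: requests sized from observed usage
# (e.g., VPA/Goldilocks recommendations), limits capping the burst.
containers:
  - name: api                # hypothetical service
    image: example/api:1.0
    resources:
      requests:
        cpu: 250m            # what the scheduler reserves (what you pay for)
        memory: 256Mi
      limits:
        cpu: 500m            # hard ceiling for bursts
        memory: 512Mi
```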

  • Another core component is multi-dimensional autoscaling

Beyond the basic Horizontal Pod Autoscaler (HPA), advanced setups use custom metrics (like queue length, latency, or business KPIs) to drive scaling decisions. Implementation might look like this: CPU, latency, and business-KPI targets in the same ScaledObject, with queue depth as a hard floor and cost as a soft ceiling.

This ensures that scaling is tightly coupled to actual demand, not just resource consumption, which can be misleading for bursty or event-driven workloads.
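A hedged KEDA ScaledObject sketch in that spirit, combining CPU utilization with Prometheus-measured queue depth; the Prometheus address, query, and workload names are assumptions, and the “soft cost ceiling” is approximated here simply by `maxReplicaCount`:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker               # hypothetical Deployment
  minReplicaCount: 2
  maxReplicaCount: 50          # crude cost ceiling
  triggers:
    - type: cpu                # resource-based signal
      metricType: Utilization
      metadata:
        value: "70"
    - type: prometheus         # demand-based signal (queue depth)
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rabbitmq_queue_messages{queue="jobs"})
        threshold: "100"
```

KEDA takes the maximum replica count across triggers, so queue depth can act as the hard floor the paragraph above describes.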

  • Node and workload bin-packing works well as a cost optimization method. 

Kubernetes’ scheduler tries to fit pods onto nodes efficiently, but deep cost optimization involves using taints, tolerations, and affinity/anti-affinity rules to ensure high-density packing of compatible workloads. 

This reduces the number of underutilized nodes, directly lowering costs. More mature organizations can use custom schedulers or third-party tools to optimize node lifecycles and beneficial spot/preemptible instance usage.

Example of a mentioned third-party tool, Karpenter | Source
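For illustration, a Karpenter NodePool with consolidation enabled might be sketched as follows. This assumes Karpenter’s v1 API on AWS (the `EC2NodeClass` reference and names are placeholders), and field names have shifted between Karpenter versions, so treat this as a shape, not a copy-paste config:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: general
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # repack pods, kill idle nodes
    consolidateAfter: 1m
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]            # spot first when available
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                              # hypothetical node class
```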

Pro tip: Dynamic pod rightsizing loops work well! Goldilocks and VPA are helpful, but the new pattern is “observe → propose → apply during next GitOps run.” Teams export VPA recommendations nightly, surface them in pull‑request comments, and let the app owner approve or override. Fairwinds’ controller and CloudZero’s PR bot are the popular implementations.
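The “propose, don’t apply” half of that loop maps to running VPA in recommendation-only mode; a minimal sketch (target name is a placeholder):

```yaml
# updateMode "Off" makes VPA compute recommendations without evicting pods;
# the nightly export described above reads them from this object's status.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api          # hypothetical workload
  updatePolicy:
    updateMode: "Off"  # recommend only; humans/GitOps apply the change
```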

  • We won’t get tired of mentioning cost visibility and chargeback

By integrating tools like Kubecost or Cloud provider billing APIs, teams can attribute costs down to the namespace, deployment, or even label level. This enables proper cost accountability and empowers teams to make data-driven decisions about architecture and scaling.

Pro tip #1: Make a FinOps show‑back to teams via OpenCost. Labels “team=foo,feature=bar” are propagated by admission webhooks; OpenCost emits per‑team cost reports consumed by Jira Automation to comment “your feature cost $X last sprint”. This cultural tooling turns cost optimization into a daily engineering reflex, rather than an end-of-quarter panic.

OpenCost in action | Source

Pro tip #2: QoS‑aware logging budgets with custom scripting. Logging price often eclipses compute, especially with high-traffic microservices and verbose logs. Platform teams now inject a Fluent Bit filter that drops DEBUG below a dynamic “budget” expressed in GiB/hour per namespace. When usage exceeds quota, the log level is coerced to WARN. This turns observability into a predictable line item.
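A simplified sketch of the static half of that idea, using Fluent Bit’s grep filter inside a ConfigMap; the namespace, match pattern, and file name are assumptions, and the dynamic budget/coercion logic would live in an external controller that rewrites this config, which is not shown:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-filters
  namespace: logging
data:
  filters.conf: |
    # Drop DEBUG records from a namespace that has exceeded its GiB/hour budget.
    [FILTER]
        Name     grep
        Match    kube.noisy-namespace.*
        Exclude  log DEBUG
```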

  • Finally, intelligent workload placement across multiple clusters, regions, or even clouds can unlock savings. 

For example, non-critical batch jobs might be scheduled on cheaper spot instances or in regions with lower pricing, while latency-sensitive services stay on premium nodes.

Instead of goodbye: Your K8s cost optimization chores

  1. Enable full cost visibility (OpenCost/Kubecost, cloud provider API, or request custom Dysnix solutions).
  2. Enforce labels and namespaces that map to business units.
  3. Feed real usage into a policy engine that expresses both SLO and unit‑cost budgets.
  4. If you’re into automation, let Karpenter or CAST AI act on those policies at node granularity; HPA/KEDA can act at pod granularity, and VPA or Goldilocks can propose at container granularity.
  5. Review nightly diff reports in Git; treat cost deltas like failed tests.
  6. Iterate architecture: Choose ARM nodes, spot pools, regional placement, ambient mesh, or serverless control‑planes where they move the $‑per‑request needle.
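The label scheme from chore 2 above can be sketched on a workload like this (team, feature, and image names are illustrative); with labels present on both the Deployment and the pod template, OpenCost/Kubecost can attribute spend per team and feature:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
  labels:
    team: payments       # maps to a business unit
    feature: checkout    # enables per-feature cost reports
spec:
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
        team: payments   # labels must also be on pods to be picked up
        feature: checkout
    spec:
      containers:
        - name: checkout
          image: example/checkout:1.0
```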

Despite the general trend of automation and delegating optimization tasks to AI agents and other innovative tools, in practice, our architects and engineers, along with their expertise and analysis, still bring the most value in most cases.

Olha Diachuk
Writer at Dysnix
10+ years in tech writing. Trained researcher and tech enthusiast.