Recently, we celebrated a decade of Kubernetes supremacy. Without this open-source kingdom of containerization, we would never have reached this stage of our development. Frankly, as any modern DevOps engineer would say: we love K8s and its ecosystem.
Dysnix appeared around the same time, so we’ve been managing Kubernetes clusters ever since, observing every stage of K8s development as a tool, a platform, and even a philosophy. Kubernetes was one of the solutions that made orchestration available en masse, bringing cloud and other “enterprise” features within reach of solo developers and SMEs.
Dysnix itself is a picturesque example of this effect. Thanks to our dedication to the community and our everyday efforts to level up our technical skills in various fields, we remain a boutique-sized company, yet one that powers giants like the following:
With this article, we want to recall and celebrate the features and mechanisms of Kubernetes management, mixed with our own experience. Hang on, and let’s go!
K8s management involves orchestrating Kubernetes clusters, deploying applications, managing workloads, and maintaining overall system health. In simple terms, it’s everything DevOps and cloud engineers do to ensure your infrastructure uses resources as productively as possible, maintains security, and stays viable under any workload and with the latest updates.
Kubernetes v1.31 marks the first release after the project successfully celebrated its first ten years. It introduces a number of enhancements to Kubernetes management, including significant updates like the general availability of the pod failure policy for Jobs and CPU distribution across cores.
It also improves persistent volume handling, adds autoconfiguration of the node cgroup driver, and brings read-only volumes based on OCI artifacts in alpha.
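The now-stable pod failure policy, for instance, lets a Job distinguish retriable failures from fatal ones. A minimal sketch, with illustrative names and an assumed “unrecoverable” exit code:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-job              # illustrative name
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
      # Fail the whole Job immediately on a known-fatal exit code
      - action: FailJob
        onExitCodes:
          containerName: main
          operator: In
          values: [42]         # hypothetical "unrecoverable" code
      # Don't count pod disruptions (e.g., node drain) against backoffLimit
      - action: Ignore
        onPodConditions:
          - type: DisruptionTarget
            status: "True"
  template:
    spec:
      restartPolicy: Never     # required when podFailurePolicy is set
      containers:
        - name: main
          image: busybox:1.36
          command: ["sh", "-c", "do-work"]   # placeholder workload
```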
We find the most exciting chapter of K8s history to be the past “Orchestration Wars” with Docker and other tools. Why did the K-tool win the title of industry standard at its dawn? There were a couple of good reasons for that:
Kubernetes cluster management was a community child (and, as it seems, one of Google’s favorite offspring). It was much closer to users, listening to their needs and answering them with features. Massive feedback and pull requests from people who used the tool daily made its benefits flourish compared to other instruments.
In 2015-2016, the main competitors of K8s were specialized tools by Docker and Apache. Here’s the difference between them, described briefly:
As part of the K8s community, we are proud to have remained passionately invested in our favorite tool for ten years straight 🙂
Today, managing Kubernetes is a more than fascinating task that includes:
Below, we take a deeper look at these activities and how we perform them.
Kubernetes cluster lifecycle management
This activity makes up most of our routine: a continuous process that includes provisioning, upgrading, scaling, and decommissioning clusters as needed.
Kubernetes has improved its lifecycle management through tools like kubeadm and enhanced APIs that simplify cluster upgrades and version management. For instance, Kubernetes v1.31 introduces improved APIs for upgrading clusters without downtime, a handy feature for enterprises running mission-critical applications. We love it!
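For a self-managed cluster, kubeadm drives much of this lifecycle from a declarative config. A minimal sketch you’d pass to `kubeadm init --config` (the endpoint and subnet are placeholders; the subnet must match your CNI plugin):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.31.0
controlPlaneEndpoint: "cp.example.com:6443"   # placeholder load-balanced endpoint
networking:
  podSubnet: "10.244.0.0/16"                  # must match your CNI plugin
```

Later upgrades then follow the familiar `kubeadm upgrade plan` / `kubeadm upgrade apply` cycle, node by node, without taking the cluster down.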
Control plane management and automation
We automate everything that should be automated and remove unnecessary complexity from the infrastructure’s design and core to save resources and optimize costs.
The control plane is the core of any Kubernetes cluster, responsible for maintaining the desired state of the cluster. Recent releases have simplified the management of the control plane with features like API server autoscaling and improved scheduler performance.
Kubernetes v1.30 introduced optimizations that reduce control plane latency and improve the handling of large numbers of concurrent API requests, which is essential for high-traffic environments.
Networking and traffic management
Effective Kubernetes management also includes robust networking capabilities. Kubernetes offers native tools like kube-proxy and CNI (Container Network Interface) plugins for networking, but newer enhancements focus on better traffic management and ingress control.
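As a concrete example of ingress control, a single Ingress object can route external HTTP traffic to an internal Service. A sketch assuming an ingress controller such as ingress-nginx is installed (hostname and Service name are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  ingressClassName: nginx          # assumes ingress-nginx is deployed
  rules:
    - host: shop.example.com       # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend     # hypothetical Service
                port:
                  number: 80
```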
Storage management and data persistence
Kubernetes has advanced features like Dynamic Volume Provisioning, Persistent Volume Claims (PVCs), and Storage Classes. Recent improvements in Kubernetes include more robust volume management features, such as VolumeManager reconstruction after kubelet restarts, ensuring data integrity and reducing the risk of data loss during node failures.
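In practice, dynamic provisioning ties these pieces together: a StorageClass describes the disk type, a PVC requests it, and a matching volume is created on demand. A sketch assuming the AWS EBS CSI driver is installed (names and sizes are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com   # assumes the AWS EBS CSI driver
parameters:
  type: gp3
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi            # illustrative size
```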
Security and compliance management
Security is a critical aspect of Kubernetes management. Kubernetes offers built-in security features such as Role-Based Access Control (RBAC), Pod Security Admission, Hierarchical Namespace Controller (HNC), and Network Policies. Recent updates have shifted towards more comprehensive security models, including Structured Authentication Configuration and Modular Authorization, which provide more granular security controls and help meet compliance requirements more effectively.
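RBAC is the control you’ll touch most often. A minimal sketch granting a hypothetical CI service account read-only access to Pods in a single namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: staging
rules:
  - apiGroups: [""]            # "" means the core API group
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-pod-reader
  namespace: staging
subjects:
  - kind: ServiceAccount
    name: ci-bot               # hypothetical CI service account
    namespace: staging
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```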
As you can see, the deeper you get into cluster management, the more possibilities Kubernetes offers to orchestrate it your way.
Thanks to the variety of these tools, K8s has become the most universal yet comprehensible instrument for orchestrating projects in various environments.
The main answer to this “why” is that K8s is so popular, universal, and feature-stuffed that people don’t even bother to look beyond the industry standard for orchestration. Besides, the uncertainty and insecurity of using a less popular tool can be daunting for big projects that can’t risk their infrastructure. Other reasons for having your cluster in Kubernetes might be as follows:
Large-scale applications like Tinder and Pinterest have leveraged Kubernetes to scale to hundreds of nodes and thousands of containers to handle millions of users daily without compromising performance. This level of scalability is hardly imaginable outside the K8s management system.
The Kubernetes cluster manager abstracts the underlying infrastructure, which means you can deploy your applications across different environments, whether on-premises, in the cloud, or in a hybrid setup. Airbnb, for instance, uses Kubernetes to facilitate continuous delivery, enabling hundreds of engineers to seamlessly deploy new services across various environments.
Its ability to automate deployments, scaling, updates, and other actions with Kubernetes objects allows development teams to push changes faster and more reliably. This has been a game-changer for companies like The New York Times, where deployment times dropped from 45 minutes to just a few minutes after adopting Kubernetes.
Many organizations using microservices architectures benefit from Kubernetes' ability to manage complex service-to-service communications, handle failovers, and distribute resources effectively. It is also well-suited for data-intensive workloads, including AI, machine learning (ML), and big data processing.
This is why organizations like CERN use Kubernetes for scientific computing, where they need to scale resources and optimize infrastructure dynamically for high data throughput.
So, if you have voluminous, shifting workloads and non-linear infrastructure, need to set things up your way, and don’t mind a bit of complexity that will simplify your life down the road, K8s is the right choice.
No matter how much we love Kubernetes, there are cases when it brings more problems than benefits. We’ll briefly go through a few examples:
1. Simple monolithic application
As you know, Kubernetes cluster management tools are quite time-consuming to adopt, implement, optimize, and maintain, which might make them unprofitable for small-scale or simple applications with no gigantic plans for the future. If your project doesn't require features like autoscaling, extreme fault tolerance, or multi-cloud portability, simpler solutions like Docker Swarm or AWS Lambda may suffice.
2. No resources for learning and maintaining K8s
Kubernetes has a steep learning curve and can require significant engineering effort to manage effectively, especially in complex environments. This includes configuring networking, storage, and security policies, which can become cumbersome without experienced DevOps teams. The more resources K8s adoption demands, the longer you wait for a steady ROI.
3. Your workloads need fewer resources than K8s itself
Kubernetes requires substantial infrastructure to run efficiently. The control plane and associated components consume resources, even when the workload is minimal. This makes Kubernetes less suitable for smaller applications where resource optimization and minimal footprint are more important than scalability.
4. You’re already serverless, and it’s enough for your productivity
In some use cases, serverless architectures offer a simpler and more cost-effective alternative to Kubernetes. With serverless computing, developers don’t have to manage servers, as the cloud provider handles the scaling and provisioning of resources automatically.
Kubernetes operates through a master-worker architecture, where the Control Plane manages the cluster's state, and Worker Nodes run the containerized applications.
The Control Plane manages the cluster's overall state, scheduling workloads (Pods) to the nodes and handling updates and scaling.
Worker Nodes are machines (virtual or physical) where the workloads run. Each node contains the components needed to run containerized applications: the kubelet, a container runtime, and kube-proxy.
Pods are the smallest deployable units that represent a group of one or more containers that share the same network namespace and storage volumes.
Kubernetes orchestrates these Pods through various controllers, such as Deployments (for mostly stateless applications), StatefulSets (for stateful applications), and DaemonSets (to ensure a copy of a Pod runs on every node).
You can deploy a web application using a Deployment object, which allows K8s to manage multiple application replicas across different nodes, ensuring high availability and load balancing. If you’re feeling adventurous enough to try advanced controllers, check OpenKruise.io.
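A minimal Deployment sketch for such a web application (name, image, and replica count are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 3                      # replicas spread across nodes for availability
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      containers:
        - name: web
          image: nginx:1.27        # placeholder application image
          ports:
            - containerPort: 80
```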
Kubernetes uses Services to expose Pods to other Pods within the cluster or to external traffic. Services provide a stable IP address and DNS name for Pods, even as the underlying Pods might be replaced or scaled up/down.
A combination of Cluster IPs, NodePorts, LoadBalancers, and Ingress Controllers manages networking in Kubernetes. These components help manage internal and external traffic, ensuring secure and reliable communication between microservices.
An online store might use a LoadBalancer Service to expose the frontend web application to external customers while using ClusterIP Services for internal communication between the frontend and backend microservices. Integrating a service mesh like Istio or Linkerd can further enhance the setup with traffic management features such as automatic retries, circuit breaking, and secure communication through mutual TLS between microservices, creating a resilient and efficient architecture.
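The two Service flavors from that example look like this (names and ports are hypothetical); the frontend gets a cloud load balancer, while the backend stays cluster-internal:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  type: LoadBalancer      # provisions an external load balancer in the cloud
  selector:
    app: web-frontend
  ports:
    - port: 80
      targetPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: ClusterIP         # internal-only, stable virtual IP and DNS name
  selector:
    app: backend
  ports:
    - port: 8080
```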
The Kubernetes Scheduler is responsible for placing Pods on nodes based on resource requirements, policies, and constraints. Kubernetes supports both manual and automatic application scaling.
Horizontal Pod Autoscaling adjusts the number of Pods based on observed CPU utilization or other custom metrics, ensuring optimal resource usage.
A backend service handling user transactions might have a Horizontal Pod Autoscaler configured to maintain CPU usage at 70%, scaling the number of Pods up or down to handle traffic changes without manual intervention.
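That policy translates almost one-to-one into an autoscaling/v2 manifest (the Deployment name and replica bounds are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: transactions-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: transactions         # hypothetical backend Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # keep average CPU at ~70%
```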
Kubernetes supports stateful applications through Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). Storage can be dynamically provisioned using Storage Classes, which define the types of storage (e.g., SSD, HDD) that should be used for persistent volumes.
Recent Kubernetes versions have enhanced storage management with robust volume handling and snapshot functionalities, which help in backup and disaster recovery scenarios.
To ensure data persistence, you may deploy a database service like MySQL using a StatefulSet with PVCs. Even if the Pod is rescheduled to another node, the data remains intact and accessible.
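A trimmed-down sketch of such a MySQL StatefulSet; each replica gets its own PVC from volumeClaimTemplates, so the data follows the Pod across reschedules (the Secret name is hypothetical):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret   # hypothetical Secret
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:        # one PVC per replica, survives rescheduling
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```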
Kubernetes provides multiple layers of security, including Role-Based Access Control (RBAC), Network Policies, and, more recently, Pod Security Admission.
These features ensure that only authorized users and services can access resources within the cluster and that Pods are secured according to defined policies.
A Network Policy can be used to restrict access so that only specific microservices can communicate with, for example, the database, thereby minimizing the attack surface within the cluster.
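A sketch of that database policy (labels and port are illustrative, and a CNI plugin that enforces NetworkPolicies is assumed): only Pods labeled app: backend may reach the database on its MySQL port.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-backend-only
spec:
  podSelector:
    matchLabels:
      app: database            # the Pods being protected
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend     # the only allowed client
      ports:
        - protocol: TCP
          port: 3306
```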
Kubernetes integrates well with monitoring tools like Prometheus and Grafana for real-time metrics and alerting.
The ELK Stack (Elasticsearch, Logstash, Kibana) is commonly used for centralized logging, helping teams troubleshoot issues and understand application performance.
Using Prometheus, a DevOps team can set up alerts to notify them if a critical application’s response time exceeds a certain threshold, allowing for quick remediation actions.
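With the Prometheus Operator installed, such an alert can itself be declared as a Kubernetes object. A sketch in which the metric name and threshold are assumptions about the application:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-latency-alerts
spec:
  groups:
    - name: app.rules
      rules:
        - alert: HighResponseTime
          # assumes the app exports an http_request_duration_seconds histogram
          expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 0.5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "95th percentile latency above 500ms"
```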
We’re mostly done with the components and how they work, so we’re ready to answer the next question: how to manage a Kubernetes cluster and make it all work for you.
Amazon EKS, Google GKE, and Microsoft AKS are popular managed services for those projects that prefer to offload the operational burden of managing Kubernetes infrastructure. These services handle critical aspects such as:
Leveraging managed Kubernetes services for simplified operations is extremely seductive. Still, it also has downsides: you’re limited to what your provider offers (and what it prefers to keep out of your sight, like hidden costs), your project won’t be insulated from the shared control planes of cloud vendors, etc.
For organizations that need full control over their environments or have unique security and compliance requirements, self-managing clusters is a viable option. This approach involves using tools like:
While self-management provides flexibility, it requires deeper expertise and ongoing operational investment to maintain and secure the cluster infrastructure. And depending on your project’s size and complexity, you’ll need Kubernetes-native DevOps engineers, perhaps more than one.
Tools that support cluster management across different platforms, such as Rancher, VMware Tanzu, and Red Hat OpenShift, enable centralized management, policy enforcement, and unified monitoring across clusters, providing a consistent operational experience regardless of the underlying infrastructure. Some features to consider:
Adopting GitOps for continuous deployment and infrastructure management is what shows the guts of any DevOps engineer!
By using Git repositories as the source of truth for declarative configurations, teams can automate the deployment and management of any resources. Tools like ArgoCD and Flux continuously monitor Git repositories and apply changes to the clusters, ensuring consistency and reducing the risk of manual errors. This approach features the following:
This approach is similar to the self-managed approach but depends on more “ops” tools.
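With ArgoCD, for example, pointing the cluster at a Git repository is itself a manifest. A sketch in which the repository URL, path, and namespaces are placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-frontend
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/app-config.git   # placeholder repo
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true       # delete resources removed from Git
      selfHeal: true    # revert manual drift back to the Git state
```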
To manage Kubernetes clusters effectively, you’ll need critical insights into cluster performance, resource utilization, and potential bottlenecks. This data-driven approach enables proactive management and faster issue troubleshooting.
The history of Dysnix is tightly coupled with Kubernetes; it shaped us from system administrators into DevOps and Cloud Architects, and then we partly continued our evolution in the Web3 realm but never let our favorite tool out of our hands.
K8s helped us implement infrastructures the way we see them and according to our principles: perfection in details, solving issues by preventing them in design, and thinking about the future of the project we are working on. We improved it as part of a caring community, and there was plenty of satisfaction when our ideas got attention there.