AI-based predictive Kubernetes autoscaling tool

Get free access
Get a personalized 20-minute tech consultation on how your infrastructure can benefit from unlimited free access to PredictKube.

Feed in 1+ week of traffic data and get proactive Kubernetes autoscaling with up to a 6-hour prediction horizon, based on AI forecasts

Get the most out of AI-based Kubernetes autoscaler

Quick start
Our AI model can begin working with as little as two weeks of traffic data to provide you with a reliable prediction for autoscaling Kubernetes nodes.
Proactive scaling
With PredictKube, you can complete autoscaling in Kubernetes before the load rises, thanks to forecasts from the AI model built into the cluster autoscaling tool.
Scaling automation
The predictive autoscaling Kubernetes tool optimizes the number of active nodes preemptively, so when the traffic increases, all your nodes are ready.
PredictKube x Google Cloud Case Study: How we accurately forecasted 90% of PancakeSwap's traffic spikes, cut costs by 30%, and reduced peak response time by 62.5x.
Read the Case

Problems PredictKube solves

Overprovisioning and high cloud bills

You overpay to cover any traffic spike you might face, just to avoid losing users. That's inefficient.

Downtime and high latency

Infrastructure gets overloaded, and your users can’t connect to your product/service. You lose traffic.

Problematic project growth

Keep your infrastructure transparent and visible as the project grows, manage it efficiently, and prevent errors.
Power up your K8s infrastructure with a game-changing AI autoscaler and solve your infra challenges during a free call with Dysnix engineers
Daniel Yavorovych
Co-Founder & CTO at Dysnix

It's easy to start using the Kubernetes cluster autoscaler right now

Install PredictKube and solve the overprovisioning problem. Get your smartest Kubernetes cluster autoscaler in a few steps:
1. Add the KEDA Helm repo
helm repo add kedacore https://kedacore.github.io/charts
2. Update the Helm repo
helm repo update
3. Install the KEDA Helm chart
kubectl create namespace keda
helm install keda kedacore/keda --namespace keda
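Before moving on, you can confirm the KEDA operator is up with a quick check (a standard kubectl step, not part of PredictKube's docs):
kubectl get pods --namespace keda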
4. Create the PredictKube credentials secret
API_KEY="<change-me>"
kubectl create secret generic predictkube-secrets --from-literal=apiKey=${API_KEY}
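Optionally, verify the secret holds the key you expect; this decode step is our suggestion and assumes the secret was created in your current namespace:
kubectl get secret predictkube-secrets -o jsonpath='{.data.apiKey}' | base64 -d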
5. Get your API key
To let our AI model access your data and make a prediction based on it, please use the API key we'll send to your e-mail.
6. Configure predictive autoscaling
tee scaleobject.yaml << EOF
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-trigger-auth-predictkube-secret
spec:
  secretTargetRef:
  - parameter: apiKey
    name: predictkube-secrets
    key: apiKey
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-app-scaler
spec:
  scaleTargetRef:
    name: example-app
  pollingInterval: 60
  cooldownPeriod: 300
  minReplicaCount: 3
  maxReplicaCount: 50
  triggers:
  - type: predictkube
    metadata:
      predictHorizon: "2h"
      historyTimeWindow: "7d" # We recommend using a minimum of a 7-14 day time window as historical data
      prometheusAddress: http://kube-prometheus-stack-prometheus.monitoring:9090
      query: sum(irate(http_requests_total{pod=~"example-app-.*"}[2m]))
      queryStep: "2m" # Note: query step duration for range prometheus queries
      threshold: '2000' # Value to start scaling for
    authenticationRef:
      name: keda-trigger-auth-predictkube-secret
EOF
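Finally, apply the manifest and confirm the scaler was created; the second command assumes KEDA's CRDs from step 3 are installed:
kubectl apply -f scaleobject.yaml
kubectl get scaledobject example-app-scaler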

Under the hood: Tools inside

PredictKube is officially recognized as a KEDA scaler
View the KEDA article

FAQ: All you need to know about Kubernetes autoscaling and our autoscaler

What is Kubernetes autoscaling?

Kubernetes autoscaling automatically adjusts resources for your applications based on demand. 

Imagine an e-commerce site like Amazon. During peak hours, it experiences a surge in traffic. With autoscaling, Kubernetes can automatically spin up more resources (like additional pods) to handle the increased load, just as Amazon scales up its servers. This ensures a smooth user experience and keeps latency, delays, and errors to a minimum.

Conversely, during slower periods, autoscaling scales resources down to cut costs. But not every project needs autoscaling: manual scaling still works fine for many companies with a simple traffic pattern.

What are the types of autoscaling available in Kubernetes?

Kubernetes supports three main types of autoscaling:

  • Horizontal Pod Autoscaler (HPA): Adds or removes pods based on metrics like CPU or memory usage. (e.g., automatically scaling web servers during a traffic spike)
  • Vertical Pod Autoscaler (VPA): Adjusts resource requests and limits for individual pods. (e.g., dynamically allocating more CPU to a data processing pod)
  • Cluster Autoscaler: Adds or removes entire nodes in the cluster based on overall resource demands. (e.g., spinning up more servers during peak season for a retail company)

We at Dysnix have developed one more type: a predictive autoscaler for K8s, an AI-based product named PredictKube that scales resources in advance based on historical data and business metrics.

How does Horizontal Pod Autoscaling (HPA) work?

With HPA, you can ensure that, for example, your game servers keep pace with fluctuating player counts. HPA automatically scales your servers up by adding pods when a surge of players hits, such as during a new game release, maintaining a smooth experience. However, it can over-scale if configured with overly aggressive metrics, wasting resources. Additionally, HPA may not react quickly enough to sudden spikes, causing temporary lag until new pods spin up.

For these reasons, it's crucial to carefully configure HPA metrics and consider it as one part of a comprehensive autoscaling strategy.
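For illustration, here is a minimal HPA manifest sketch for the game-server scenario above. The Deployment name game-server, the replica bounds, and the 70% CPU target are all assumptions made for the example, not recommended values:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: game-server-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: game-server # assumed Deployment name
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # add pods when average CPU crosses 70%

With this in place, Kubernetes adds pods as average CPU climbs past the target and removes them as it falls, within the 2-20 replica bounds.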

What metrics can be used with HPA?

HPA can use several types of metrics, including:

Resource Indicators

  • CPU usage: Measures how much processing power your pods are consuming.
  • Memory usage: Tracks how much memory your pods are utilizing.

Custom Metrics

You can define custom application-specific metrics that provide a more granular picture of your workload's health.

  • Request queue length: Monitors the number of requests waiting to be processed by your application.
  • Concurrency level: Tracks the number of concurrent requests your application is handling.

External Indicators

If your application interacts with external services, you can use metrics from those services to trigger scaling events.

  • Database connections: Scales based on the number of connections to your database.
  • External API calls: Adjusts pods based on the volume of calls to an external API.
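As a sketch of the custom-metrics case: assuming a metrics adapter (for example, prometheus-adapter) already exposes an http_requests_per_second pods metric (both the adapter setup and the metric name are assumptions here), an HPA could target request rate like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app-requests-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second # assumed metric served by a metrics adapter
      target:
        type: AverageValue
        averageValue: "100" # aim for roughly 100 requests per second per pod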

How does Vertical Pod Autoscaling (VPA) work?

Vertical Pod Autoscaler (VPA) is like having an auto-adjusting budget for your cloud resources. 

Imagine your marketing team runs ad campaigns. With HPA (Horizontal Pod Autoscaler), you'd simply add more servers (like renting more ad slots) if traffic spikes. VPA is more precise: it monitors each campaign's resource usage (like ad spend) and allocates more resources (budget) to high-performing campaigns while scaling back on less effective ones.

This optimizes your spending and ensures each campaign gets the resources it needs to succeed, without unnecessary waste.
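In manifest form, a minimal VPA sketch could look like the following. It assumes the VPA components are installed in the cluster and that a Deployment named campaign-worker exists; the resource bounds are illustrative only:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: campaign-worker-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: campaign-worker # assumed Deployment name
  updatePolicy:
    updateMode: "Auto" # let VPA evict pods and recreate them with updated requests
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi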

What is Cluster Autoscaling?

Cluster Autoscaler acts as a dynamic resource manager, ensuring your Kubernetes cluster has the right amount of muscle (nodes) to handle fluctuating workloads cost-effectively. Here's an example:

A company runs a machine learning application on Kubernetes. During the training phase, the application requires a lot of CPU and memory resources. Cluster Autoscaler automatically detects this surge in demand and scales the cluster up by adding new nodes. This ensures the training process finishes quickly without resource bottlenecks.

Once the training is complete, the application goes into a prediction phase where resource usage drops significantly. Cluster Autoscaler recognizes this and scales the cluster down by removing unnecessary nodes. This saves the company money on cloud costs associated with unused resources.
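Cluster Autoscaler is typically enabled through the cloud provider rather than a manifest. As one example, on GKE node-pool autoscaling can be switched on with a command like this (the cluster and node-pool names are placeholders):

gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool default-pool \
  --min-nodes 1 --max-nodes 10

Other providers expose the same idea through their own tooling, for example EKS managed node group scaling settings or AKS's --enable-cluster-autoscaler flag.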

What is Kubernetes predictive autoscaling?

Kubernetes itself doesn't have built-in predictive autoscaling, but it can be achieved through external tools and integrations. PredictKube is recognized as one of the most efficient predictive autoscaling tools available on the market today. Here's how it works, explained with a simplified example:

Imagine a news website that experiences a surge in traffic every election day. A predictive autoscaling tool like PredictKube would analyze past election-day data and scale the website's resources according to the estimated volume of the upcoming traffic spike. Thus, it ensures smooth performance during the event, preventing lags or crashes.

This proactive approach helps handle traffic spikes efficiently and avoids potential bottlenecks. However, it requires additional setup and integration with external tools like KEDA (Kubernetes Event-driven Autoscaling) or specialized prediction engines. But we at Dysnix can take care of it for you. Contact us for more.

How does predictive autoscaling differ from traditional autoscaling in Kubernetes?

Traditional autoscaling in Kubernetes, like the Horizontal Pod Autoscaler (HPA), is reactive. HPA monitors resource metrics like CPU or memory usage and scales pods (application instances) up or down based on predefined thresholds. This ensures your application has enough resources to run smoothly, but it doesn't anticipate future needs: there may be a moment when the need is extreme right now, but the extra resources haven't arrived yet.

Predictive autoscaling, on the other hand, is proactive. It leverages machine learning or statistical models to analyze historical data, such as traffic patterns or seasonal trends. By identifying these patterns, it can predict future resource demands and take action before a surge hits. 
