Over the past few months, we’ve been adding more and more scalers to KEDA, making it easier for users to scale on what they need. Today, KEDA ships with more than 30 scalers out of the box, supporting all major cloud providers and industry-standard tools such as Prometheus, and it can scale any Kubernetes resource.
But we are missing a major feature that many modern, distributed applications need: the ability to scale based on HTTP traffic.
It’s time to change this.
Note: You can build your own custom HTTP autoscaling system using the Prometheus scaler, as described in our FAQ. The new HTTP add-on adds first-class, end-to-end support for HTTP.
Introducing our HTTP add-on for KEDA
Autoscaling HTTP is often not as straightforward as other event sources. You don’t know how much traffic will be coming and, given its synchronous nature, supporting scale-to-zero HTTP applications requires an intelligent intermediate routing layer to “hold” the incoming request(s) until new instances of the backend application are created and running.
We’re happy to announce our experimental HTTP add-on for KEDA which is purely focused on solving this problem. This new project introduces an HTTPScaledObject CRD, which you use to autoscale a Kubernetes Deployment, including scale-to-zero.
We’ve taken a “batteries included, with reasonable defaults” approach to designing and building the HTTP add-on. This means you don’t have to run other tools such as Prometheus. At the same time, the system is made up of well-defined, reusable components that can run independently, so you can opt out of the defaults and customize each component as you see fit.
Scaling based on incoming HTTP traffic is different from core KEDA triggers for two reasons:
There is no standard existing API you can call to get a counter or other metric to scale on. For example, we cannot call the Kafka API to get the length of a queue.
You need to set up the infrastructure to route HTTP requests to the server you’re autoscaling.
You can see the HTTP add-on as just another trigger in your scaling toolbox.
⚠ Given the experimental state of this project, breaking changes can occur, and the HTTP add-on is not yet supported for production workloads.
How does it work?
Since KEDA already does an excellent job listening for events and autoscaling, the HTTP add-on simply adds an HTTP-aware layer on top.
The system is made up of 3 components:
Interceptor: This component accepts HTTP requests into the system, reports pending HTTP request queue metrics to the external scaler, and forwards requests to the target application.
If the application is currently scaled to zero replicas, the interceptor holds incoming requests until the application scales back up.
External scaler: This is an external push scaler that regularly pings the interceptor for pending HTTP queue metrics. It transforms this data and sends it directly to KEDA, which then makes a scaling decision.
Operator: This component runs for the convenience of the user. It watches for new CRD resources, called HTTPScaledObjects, and creates and configures interceptors and scalers so that your existing Deployment begins autoscaling according to incoming HTTP traffic.
Seeing it in action
We’ve aimed to make the HTTP add-on approximately as simple to operate and use as KEDA.
There are two major steps to using it:
Install the KEDA HTTP add-on operator
Create a new HTTPScaledObject
Installing the Operator
This step is optional, but we highly recommend it because the operator makes interacting with the HTTP add-on more convenient. We’ve packaged the operator as a Helm chart, which you can install by following the instructions below.
The commands below have been tested on macOS and Linux (Ubuntu). They likely work on Windows under WSL2, but won’t work in PowerShell. You’ll also need access to a Kubernetes cluster with a properly configured Kubernetes configuration file.
First, make sure you’ve set up an environment variable for the namespace you’d like to install everything into:
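For example, the setup might look like the following (the namespace name is just an example, and the chart names assume the standard kedacore Helm repository; adjust both for your environment):

```shell
# Namespace to install KEDA and the HTTP add-on into
# ("kedahttp" is an example name -- pick your own).
export NAMESPACE=kedahttp
kubectl create namespace ${NAMESPACE}

# Add the KEDA Helm repository, then install KEDA itself
# followed by the HTTP add-on chart.
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace ${NAMESPACE}
helm install http-add-on kedacore/keda-add-ons-http --namespace ${NAMESPACE}
```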
The HTTPScaledObject CRD instructs the operator to install and configure the interceptor and scaler for a specified Deployment. For this to work, you’ll need to have a Deployment running in the same $NAMESPACE and a Service configured to route traffic to the pods in that Deployment.
We’ve created a sample application here, packaged as a Helm chart. It comes complete with a properly configured HTTPScaledObject, so when you install the chart, everything is set up automatically and you can skip the rest of this section.
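Installing the sample might look roughly like this (the release name and chart location below are illustrative; use whatever the sample chart’s README specifies):

```shell
# Install the sample application chart into the same namespace
# as KEDA and the HTTP add-on.
# "xkcd" and "./charts/xkcd" are placeholder names for the
# sample release and chart path.
helm install xkcd ./charts/xkcd --namespace ${NAMESPACE}
```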
After you have your application set up, copy the following YAML into a file called myautoscaledapp.yaml:
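A minimal HTTPScaledObject might look like the following sketch. The Deployment name, Service name, and port are placeholders for your own application, and the field layout reflects the experimental v1alpha1 API, so check the add-on’s documentation for the exact schema:

```yaml
kind: HTTPScaledObject
apiVersion: http.keda.sh/v1alpha1
metadata:
  name: myautoscaledapp
spec:
  scaleTargetRef:
    # Placeholders: point these at your own Deployment
    # and the Service that routes traffic to its pods.
    deployment: myapp
    service: myapp-svc
    port: 8080
```

Then apply it to your cluster with `kubectl apply -n ${NAMESPACE} -f myautoscaledapp.yaml`.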
The operator will pick up the new resource and, once it’s done installing and configuring everything, you’ll see a new Service (among other things!) that’s ready to route HTTP traffic to your Deployment. Send all of your HTTP traffic to that Service, and the Deployment will begin autoscaling!
To clean up, remove the HTTPScaledObject you created, then delete the two Helm charts you installed:
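The cleanup might look like this (the resource and release names below assume the example install commands shown earlier in this post; substitute your own):

```shell
# Remove the HTTPScaledObject first so the operator can
# tear down the interceptor and scaler it created.
kubectl delete -n ${NAMESPACE} httpscaledobject myautoscaledapp

# Then uninstall the two Helm releases.
helm uninstall -n ${NAMESPACE} http-add-on
helm uninstall -n ${NAMESPACE} keda
```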
With this release, we’ve built a solid foundation and are now focusing on testing, quality, and expanding the feature set appropriately.
We encourage you to try the software out in your own non-production Kubernetes clusters and share your feedback. We’re also open to contributions to make it even better.
Today we support autoscaling any HTTP traffic sent to a Kubernetes Service, but we are planning to support more specific traffic patterns, including “north-south” traffic with Ingress or the Gateway API and “east-west” traffic with service-to-service communication or service meshes.
To achieve this, we are working with the community to support as many existing products as possible. For example, we will rely on the Service Mesh Interface (SMI) specification, which has become a solid industry standard for service-mesh workloads.
But why stop there? We are talking to various SIGs to determine whether we need a general Traffic Metrics specification that takes the learnings from the SMI spec and applies them to all traffic components, for a unified traffic-metrics approach in Kubernetes.