Configure Scaling Metrics
Autoscaling metric configuration on an InterceptorRoute
The scalingMetric field on an InterceptorRoute determines what metric drives autoscaling.
You can scale based on concurrent request count, request rate, or both.
At least one metric must be set.
Concurrency metric
Scale based on the number of in-flight requests per replica:

    apiVersion: http.keda.sh/v1beta1
    kind: InterceptorRoute
    metadata:
      name: my-app
    spec:
      target:
        service: <your-service>
        port: <your-port>
      scalingMetric:
        concurrency:
          targetValue: 100

The add-on targets targetValue concurrent requests per replica.
When the total number of concurrent requests across all replicas exceeds replicas * targetValue, KEDA scales up.
| Field | Required | Description |
|---|---|---|
| targetValue | Yes | Target concurrent request count per replica. |
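
The scale-up condition above can be illustrated with a little arithmetic. The following is a minimal sketch, not the add-on's code: it assumes the replica count follows the usual Horizontal Pod Autoscaler average-value rule (total divided by target, rounded up) and ignores replica limits and any tolerance applied by the autoscaler. The function name `desired_replicas_for_concurrency` is hypothetical.

```python
import math


def desired_replicas_for_concurrency(total_in_flight: int, target_value: int) -> int:
    """Replicas needed so that each one handles at most target_value in-flight requests.

    Assumption: ceiling division, as in the HPA average-value rule.
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return math.ceil(total_in_flight / target_value)


# With targetValue: 100, three replicas cover up to 300 in-flight requests.
print(desired_replicas_for_concurrency(300, 100))  # 3
# One more request exceeds replicas * targetValue, so a fourth replica is needed.
print(desired_replicas_for_concurrency(301, 100))  # 4
```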
Request rate metric
Scale based on requests per second, averaged over a sliding window:

    apiVersion: http.keda.sh/v1beta1
    kind: InterceptorRoute
    metadata:
      name: my-app
    spec:
      target:
        service: <your-service>
        port: <your-port>
      scalingMetric:
        requestRate:
          targetValue: 100
          window: 1m
          granularity: 1s

| Field | Required | Description |
|---|---|---|
| targetValue | Yes | Target requests per second per replica. |
| window | No (default: 1m) | Sliding time window over which the average request rate is calculated. |
| granularity | No (default: 1s) | Bucket size within the window. Smaller granularity gives more responsive scaling at the cost of higher sensitivity to bursts. |
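
To make the relationship between window and granularity concrete, here is a minimal sketch of a bucketed sliding-window rate counter. It is an illustration only and does not reproduce the add-on's implementation; the class `SlidingWindowRate` and its methods are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SlidingWindowRate:
    """Counts requests in fixed-size buckets and averages them over a window."""

    def __init__(self, window_s: float = 60, granularity_s: float = 1):
        self.window_s = window_s            # corresponds to `window`
        self.granularity_s = granularity_s  # corresponds to `granularity`
        self.buckets = deque()              # (bucket_index, count), oldest first

    def record(self, now_s: float, count: int = 1) -> None:
        """Add requests to the bucket that contains now_s."""
        idx = int(now_s // self.granularity_s)
        if self.buckets and self.buckets[-1][0] == idx:
            self.buckets[-1] = (idx, self.buckets[-1][1] + count)
        else:
            self.buckets.append((idx, count))
        self._evict(idx)

    def _evict(self, current_idx: int) -> None:
        """Drop buckets that have aged out of the window."""
        max_buckets = int(self.window_s // self.granularity_s)
        while self.buckets and self.buckets[0][0] <= current_idx - max_buckets:
            self.buckets.popleft()

    def rate(self, now_s: float) -> float:
        """Average requests per second over the window ending at now_s."""
        self._evict(int(now_s // self.granularity_s))
        return sum(count for _, count in self.buckets) / self.window_s


counter = SlidingWindowRate(window_s=60, granularity_s=1)
for second in range(60):
    counter.record(second, count=10)  # steady 10 requests per second
print(counter.rate(59))   # 10.0
print(counter.rate(119))  # 0.0, every bucket has aged out of the window
```

In this model the window length sets how much history is averaged, while the bucket size sets how finely old traffic is dropped from that average.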
Using both metrics
An InterceptorRoute can set both concurrency and requestRate.
KEDA scales to whichever metric demands more replicas.

    apiVersion: http.keda.sh/v1beta1
    kind: InterceptorRoute
    metadata:
      name: my-app
    spec:
      target:
        service: <your-service>
        port: <your-port>
      scalingMetric:
        concurrency:
          targetValue: 50
        requestRate:
          targetValue: 200

This is useful when you want to handle both sustained throughput (rate) and bursty traffic (concurrency).
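
The "whichever metric demands more replicas" rule can be sketched as taking the maximum of the two per-metric results. This is an illustrative model under the same ceiling-division assumption the Horizontal Pod Autoscaler uses for average-value targets; the function `desired_replicas` is hypothetical and its defaults mirror the example values above.

```python
import math


def desired_replicas(in_flight: int, requests_per_second: float,
                     concurrency_target: int = 50, rate_target: int = 200) -> int:
    """Return the larger of the replica counts demanded by each metric."""
    by_concurrency = math.ceil(in_flight / concurrency_target)
    by_rate = math.ceil(requests_per_second / rate_target)
    return max(by_concurrency, by_rate)


# Many fast requests: few are in flight at once, so the rate metric dominates.
print(desired_replicas(in_flight=40, requests_per_second=900))   # 5
# Few slow requests: the rate is low, so the concurrency metric dominates.
print(desired_replicas(in_flight=400, requests_per_second=100))  # 8
```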
Scaling boundaries and cooldown
Minimum and maximum replica counts and cooldown are set on the KEDA ScaledObject, not the InterceptorRoute:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: my-app
    spec:
      scaleTargetRef:
        name: <your-deployment>
      minReplicaCount: 0 # 0 enables scale-to-zero
      maxReplicaCount: 10
      cooldownPeriod: 300 # seconds before scaling to zero after traffic stops

Setting minReplicaCount: 0 enables scale-to-zero.
The cooldownPeriod controls how long KEDA waits after the last request before scaling the workload down to zero replicas.
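
As a rough mental model of how these three settings interact, the sketch below clamps a metric-driven replica count to the configured bounds and only allows zero replicas once the cooldown has elapsed. It is a simplification and an assumption, not KEDA's actual control loop: it ignores the Horizontal Pod Autoscaler's own scale-down behavior, and the function `bounded_replicas` is hypothetical.

```python
def bounded_replicas(desired: int, seconds_since_last_request: float,
                     min_replicas: int = 0, max_replicas: int = 10,
                     cooldown_s: float = 300.0) -> int:
    """Clamp desired replicas to [min_replicas, max_replicas], honoring the cooldown."""
    clamped = max(min_replicas, min(desired, max_replicas))
    # Assumption: keep one replica alive until the cooldown period has passed.
    if clamped == 0 and seconds_since_last_request < cooldown_s:
        return 1
    return clamped


print(bounded_replicas(25, seconds_since_last_request=0))    # 10, capped by maxReplicaCount
print(bounded_replicas(0, seconds_since_last_request=120))   # 1, still inside the cooldown
print(bounded_replicas(0, seconds_since_last_request=300))   # 0, cooldown elapsed
```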
What’s Next
- How Scaling Works: the full scaling mechanics, including scale-to-zero and cold starts.
- InterceptorRoute Reference: field details for scalingMetric, concurrency, and requestRate.