Warning
You are currently viewing v2.15 of the documentation, which is not the latest version.
Troubleshooting
How to address commonly encountered KEDA issues
Why is the Kubernetes control plane unable to communicate with the metrics server?
If, while setting up KEDA, you get the error (v1beta1.external.metrics.k8s.io) status FailedDiscoveryCheck with the following message:
failing or missing response from https://POD-IP:6443/apis/external.metrics.k8s.io/v1beta1: Get "https://POD-IP:6443/apis/external.metrics.k8s.io/v1beta1": Address is not allowed
One possible cause is the cluster's CNI plugin, such as Cilium.
Find the API service name for the service keda/keda-metrics-apiserver:
kubectl get apiservice --all-namespaces
Check the status of the API service found in the previous step:
kubectl get apiservice <apiservicename> -o yaml
Example:
kubectl get apiservice v1beta1.external.metrics.k8s.io -o yaml
If the status is False, there is an issue, and the network is the most likely cause.
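The status check above can also be scripted. The following is a minimal sketch, assuming kubectl is configured for the affected cluster. The function names apiservice_available and apiservice_reason are hypothetical helpers, not part of KEDA or kubectl. They read the Available condition of the APIService with a JSONPath expression instead of scanning the full YAML output.

```shell
# Print the status ("True" or "False") of the Available condition of an
# APIService. The default name is the one used in the example above.
apiservice_available() {
  name="${1:-v1beta1.external.metrics.k8s.io}"
  kubectl get apiservice "$name" \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
}

# Print the reason and message attached to the Available condition,
# for example FailedDiscoveryCheck followed by the failing URL.
apiservice_reason() {
  name="${1:-v1beta1.external.metrics.k8s.io}"
  kubectl get apiservice "$name" \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].reason}{": "}{.status.conditions[?(@.type=="Available")].message}{"\n"}'
}

# Usage (requires access to the cluster):
#   apiservice_available
#   apiservice_reason
```

If apiservice_available prints False, the output of apiservice_reason should contain the same message quoted at the start of this section.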
In managed Kubernetes services, you might solve the issue by updating the deployment manifest of the metrics API server as shown below.
dnsPolicy: ClusterFirst
hostNetwork: true
E.g. modify useHostNetwork in the values file.
Why is Kubernetes unable to get metrics from KEDA?
If, while setting up KEDA, you get the error (v1beta1.external.metrics.k8s.io) status FailedDiscoveryCheck with the following message:
no response from https://ip:443: Get https://ip:443: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
One possible cause is that you are behind a proxy network.
Find the API service name for the service keda/keda-metrics-apiserver:
kubectl get apiservice --all-namespaces
Check the status of the API service found in the previous step:
kubectl get apiservice <apiservicename> -o yaml
Example:
kubectl get apiservice v1beta1.external.metrics.k8s.io -o yaml
If the status is False, there is an issue, and the proxy network is the most likely cause.
Find the cluster IPs of the keda-metrics-apiserver and keda-operator-metrics services:
kubectl get services --all-namespaces
In /etc/kubernetes/manifests/kube-apiserver.yaml, add the cluster IPs found in the previous step to the no_proxy variable.
Reload systemd manager configuration:
sudo systemctl daemon-reload
Restart kubelet:
sudo systemctl restart kubelet
Check the API service status and the pods again. They should now be working.
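The cluster IP lookup in the steps above can be scripted so that the values are ready to paste into no_proxy. This is a minimal sketch: service_cluster_ips is a hypothetical helper, and the namespace keda in the usage line is an assumption. Adjust the service names to whatever kubectl get services --all-namespaces reports in your cluster.

```shell
# Print the cluster IPs of the given services as a comma-separated list,
# ready to append to the no_proxy value.
# $1 is the namespace; the remaining arguments are service names.
service_cluster_ips() {
  ns="$1"
  shift
  ips=""
  for svc in "$@"; do
    ip="$(kubectl get service "$svc" -n "$ns" -o jsonpath='{.spec.clusterIP}')"
    # Add a comma only when the list already has an entry.
    ips="${ips:+$ips,}$ip"
  done
  printf '%s\n' "$ips"
}

# Usage (requires access to the cluster):
#   service_cluster_ips keda keda-metrics-apiserver keda-operator-metrics
```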
In managed Kubernetes services, you might solve the issue by updating the firewall rules of your cluster.
E.g. in a GKE private cluster, add port 6443 (kube-apiserver) to the allowed ports in the master node firewall rules.
Also, if you are using Network Policies in your kube-system namespace, make sure they don't block access for the konnectivity agent via port 6443. You can read more about the konnectivity service.
In that case, you need to add a NetworkPolicy similar to the following in the kube-system namespace:
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-from-konnectivity-agent-to-keda
  namespace: kube-system
spec:
  egress:
  - ports:
    - port: 6443
      protocol: TCP
    to:
    - ipBlock:
        cidr: ${KUBE_POD_IP_CIDR}
  podSelector:
    matchLabels:
      k8s-app: konnectivity-agent
  policyTypes:
  - Egress
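The ${KUBE_POD_IP_CIDR} value in the policy above is a placeholder that has to be replaced with the pod IP range of your cluster before the manifest is applied. One way to do that is sketched below. The helper render_pod_cidr, the file name policy.yaml and the CIDR in the usage line are all hypothetical.

```shell
# Replace the ${KUBE_POD_IP_CIDR} placeholder in a manifest read from stdin
# with the CIDR given as $1, and write the result to stdout.
render_pod_cidr() {
  # "|" is the sed delimiter because a CIDR contains "/".
  # [$] matches a literal dollar sign.
  sed 's|[$]{KUBE_POD_IP_CIDR}|'"$1"'|g'
}

# Usage (requires access to the cluster):
#   render_pod_cidr 10.244.0.0/16 < policy.yaml | kubectl apply -f -
```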
E.g. make sure the cluster security group can reach the node groups on TCP 6443. For example, using the terraform eks module, this is achievable through the additional node group rules:
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.5.1"
  ...
  create_node_security_group = true
  node_security_group_additional_rules = {
    keda_metrics_server_access = {
      description                   = "Cluster access to keda metrics"
      protocol                      = "tcp"
      from_port                     = 6443
      to_port                       = 6443
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }
}
As of version 19.6.0 of the terraform-aws-modules/eks/aws module, it is enough to have the node_security_group_enable_recommended_rules option enabled (the default) to get the necessary security group ingress rule.
Why is my ScaledObject paused?
When KEDA hits upstream errors while fetching scaler source information, it keeps the current instance count of the workload unless the fallback section is defined.
This behavior might feel like the autoscaling is not happening, but in reality, it is because of problems related to the scaler source.
You can check whether this is your case by reviewing the logs of the KEDA pods, where you should see errors in both the Operator and the Metrics server. You can also check the status of the ScaledObject (READY and ACTIVE conditions) by running the following command:
$ kubectl get scaledobject MY-SCALED-OBJECT
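Both checks described above can be wrapped in small helpers. This is a sketch under assumptions: scaledobject_conditions and keda_errors are hypothetical names, and the deployment name keda-operator and namespace keda in the usage lines may differ in your installation.

```shell
# Print each status condition of a ScaledObject as TYPE=STATUS (REASON).
# $1 is the ScaledObject name, $2 the namespace (defaults to "default").
scaledobject_conditions() {
  kubectl get scaledobject "$1" -n "${2:-default}" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{" ("}{.reason}{")"}{"\n"}{end}'
}

# Print the lines containing "error" from the last 200 log lines of a
# KEDA deployment. $1 is the deployment name, $2 the namespace
# (defaults to "keda").
keda_errors() {
  kubectl logs "deployment/$1" -n "${2:-keda}" --tail=200 | grep -i 'error'
}

# Usage (requires access to the cluster):
#   scaledobject_conditions MY-SCALED-OBJECT default
#   keda_errors keda-operator keda
```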
Troubleshoot KEDA errors using profiling
Go lets us profile specific actions in order to determine what causes an issue.
For example, if our keda-operator pod keeps getting OOM-killed after a specific time, we can use profiling to inspect the heap and see which operations are taking up all of this space.
Go supports many profiling options such as heap, CPU, goroutines and more (for more info check https://pkg.go.dev/net/http/pprof).
In KEDA we provide the option to enable profiling on each component separately by enabling it using the Helm chart and providing a port (if not enabled then it won’t work).
profiling:
  operator:
    enabled: false
    port: 8082
  metricsServer:
    enabled: false
    port: 8083
  webhooks:
    enabled: false
    port: 8084
If you are not using the Helm chart, you can enable profiling on each of the components by specifying the following argument in the respective container:
--profiling-bind-address=":8082"
Profiling will then be exposed on the port you specified.
After enabling it, you can port-forward or expose the service and use a tool like go tool pprof to get profiling data.
For more info look at this document https://go.dev/blog/pprof.
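As an illustration of the port-forward approach, the sketch below builds the profile URL that go tool pprof expects. It assumes the profiling endpoint serves the default net/http/pprof paths under /debug/pprof/, which the text above does not state explicitly. The helper pprof_url is hypothetical, as are the deployment name and namespace in the usage lines.

```shell
# Build the URL of a profile served on a locally forwarded port.
# $1 is the local port, $2 the profile name (heap by default; other
# net/http/pprof profiles include profile for CPU and goroutine).
pprof_url() {
  printf 'http://localhost:%s/debug/pprof/%s\n' "$1" "${2:-heap}"
}

# Usage (requires access to the cluster and a local Go toolchain):
#   kubectl port-forward -n keda deployment/keda-operator 8082:8082 &
#   go tool pprof "$(pprof_url 8082 heap)"
```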
Why does Google Kubernetes Engine (GKE) 1.16 fail to fetch external metrics?
If you are running Google Kubernetes Engine (GKE) version 1.16, and are receiving the following error:
unable to fetch metrics from external metrics API: <METRIC>.external.metrics.k8s.io is forbidden: User "system:vpa-recommender" cannot list resource "<METRIC>" in API group "external.metrics.k8s.io" in the namespace "<NAMESPACE>": RBAC: clusterrole.rbac.authorization.k8s.io "external-metrics-reader" not found
You are almost certainly running into a known issue.
The workaround is to recreate the external-metrics-reader role using the following YAML:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: external-metrics-reader
rules:
- apiGroups:
  - "external.metrics.k8s.io"
  resources:
  - "*"
  verbs:
  - list
  - get
  - watch
The GKE team is currently working on a fix that they expect to have out in version >= 1.16.13.
Why does the KEDA operator error with NoCredentialProviders?
If you are running KEDA on AWS using IRSA or KIAM for pod identity and seeing the following error messages:
Events:
  Type     Reason                      Age                From           Message
  ----     ------                      ----               ----           -------
  Normal   KEDAScalersStarted          31s                keda-operator  Started scalers watch
  Normal   KEDAScaleTargetDeactivated  31s                keda-operator  Deactivated apps/v1.Deployment default/my-event-based-deployment from 1 to 0
  Normal   ScaledObjectReady           13s (x2 over 31s)  keda-operator  ScaledObject is ready for scaling
  Warning  KEDAScalerFailed            1s (x2 over 31s)   keda-operator  NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
And the operator logs:
2021-11-02T23:50:29.688Z ERROR controller Reconciler error {"reconcilerGroup": "keda.sh", "reconcilerKind": "ScaledObject", "controller": "scaledobject", "name": "my-event-based-deployment-scaledobject", "namespace": "default", "error": "error getting scaler for trigger #0: error parsing SQS queue metadata: awsAccessKeyID not found"}
This means that the KEDA operator is not receiving valid credentials, even before attempting to assume the IAM role associated with the scaleTargetRef.
Some things to check:
If using KIAM: the keda-operator deployment has the iam.amazonaws.com/role annotation under deployment.spec.template.metadata, not deployment.metadata.
If using IRSA: the keda-operator serviceAccount is annotated with eks.amazonaws.com/role-arn.
In the kiam-server logs, successful provisioning of credentials looks like:
kube-system kiam-server-6bb67587bd-2f47p kiam-server {"level":"info","msg":"found role","pod.iam.role":"arn:aws:iam::1234567890:role/my-service-role","pod.ip":"100.64.7.52","time":"2021-11-05T03:13:34Z"}
Use grep to filter the kiam-server logs, searching for the keda-operator pod IP.
Why is Helm not able to upgrade to v2.2.1 or above?
Our initial approach to manage CRDs through Helm was not ideal given it didn’t update existing CRDs.
This is a known limitation of Helm:
There is no support at this time for upgrading or deleting CRDs using Helm. This was an explicit decision after much community discussion due to the danger for unintentional data loss. Furthermore, there is currently no community consensus around how to handle CRDs and their lifecycle. As this evolves, Helm will add support for those use cases.
As of v2.2.1 of our Helm chart, we have changed our approach so that the CRDs are managed automatically through our Helm chart.
If you started using KEDA before v2.2.1, this transition can cause upgrades to fail with errors such as the following:
Error: UPGRADE FAILED: rendered manifests contain a resource that already exists. Unable to continue with update: CustomResourceDefinition "scaledobjects.keda.sh" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "keda"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "keda"
In order to fix this, you will need to manually add the following attributes to our CRDs:
app.kubernetes.io/managed-by: Helm label
meta.helm.sh/release-name: keda annotation
meta.helm.sh/release-namespace: keda annotation
Future upgrades should work seamlessly.
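The label and annotations can be added with kubectl label and kubectl annotate. The following is a minimal sketch rather than an official migration script. adopt_crds and the KEDA_RELEASE and KEDA_NAMESPACE variables are hypothetical names, and the defaults of keda are taken from the error message above. Pass every KEDA CRD present in your cluster, not only the one shown in the usage line.

```shell
# Add the Helm ownership label and annotations to each CRD passed as an
# argument. Release name and namespace default to "keda".
adopt_crds() {
  release="${KEDA_RELEASE:-keda}"
  namespace="${KEDA_NAMESPACE:-keda}"
  for crd in "$@"; do
    kubectl label crd "$crd" app.kubernetes.io/managed-by=Helm --overwrite
    kubectl annotate crd "$crd" \
      "meta.helm.sh/release-name=$release" \
      "meta.helm.sh/release-namespace=$namespace" --overwrite
  done
}

# Usage (requires access to the cluster):
#   adopt_crds scaledobjects.keda.sh
```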
Why is KEDA API metrics server failing when Istio is installed?
While setting up KEDA, you get the error (v1beta1.external.metrics.k8s.io) status FailedDiscoveryCheck, and you have Istio installed as the service mesh in your cluster.
This can lead to side effects like not being able to delete namespaces in your cluster. You will see an error like:
NamespaceDeletionDiscoveryFailure - Discovery failed for some groups, 1 failing: unable to retrieve the complete list of server APIs: external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request
In the following, we assume that KEDA is installed in the namespace keda.
Find the API service name for the service keda/keda-metrics-apiserver:
kubectl get apiservice --all-namespaces
Check the status of the API service found in the previous step:
kubectl get apiservice <apiservicename> -o yaml
Example:
kubectl get apiservice v1beta1.external.metrics.k8s.io -o yaml
If the status is False, then there seems to be an issue with the KEDA installation.
Check if Istio is installed in your cluster:
kubectl get svc -n istio-system
If Istio is installed you get a result like:
NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                                      AGE
istio-ingressgateway   LoadBalancer   100.65.18.245    34.159.50.243   15021:31585/TCP,80:31669/TCP,443:30464/TCP   3d
istiod                 ClusterIP      100.65.146.141   <none>          15010/TCP,15012/TCP,443/TCP,15014/TCP        3d
Check the KEDA namespace labels:
kubectl describe ns keda
If Istio injection is enabled, there is no label stating istio-injection=disabled.
In this setup, Istio's sidecar injection prevents the KEDA API service from working properly.
To prevent Istio's sidecar injection, we must label the namespace accordingly by setting the label istio-injection=disabled on the namespace:
kubectl label namespace keda istio-injection=disabled
Check that the label is set via kubectl describe ns keda.
Install KEDA into the namespace keda and re-check the status of the API service, which should now be True.
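The label check and the presence of an injected sidecar can also be verified from the command line. This is a sketch with hypothetical helper names (istio_injection_label, pod_containers, has_istio_sidecar). It relies on istio-proxy being the name of the sidecar container that Istio injects.

```shell
# Print the value of the istio-injection label on a namespace.
# Empty output means the label is not set.
istio_injection_label() {
  kubectl get namespace "${1:-keda}" \
    -o jsonpath='{.metadata.labels.istio-injection}'
}

# List each pod in the namespace followed by its container names.
pod_containers() {
  kubectl get pods -n "${1:-keda}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].name}{"\n"}{end}'
}

# Succeed if any pod in the namespace has an istio-proxy container.
has_istio_sidecar() {
  pod_containers "$1" | grep -q 'istio-proxy'
}

# Usage (requires access to the cluster):
#   istio_injection_label keda
#   has_istio_sidecar keda && echo "sidecar still injected"
```

If has_istio_sidecar still succeeds after the namespace has been labelled, the running pods were most likely created before the label was set and need to be recreated.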