Rancher v2.5 provisioned Kubernetes clusters, without a worker role node, display "Cluster health check failed: cluster agent is not ready" error


In Rancher v2.5.x, provisioning or updating a Rancher-provisioned Kubernetes cluster so that it has no node with the worker role causes the cluster to enter an Error status, displaying the message "Cluster health check failed: cluster agent is not ready". By comparison, in Rancher v2.4.x, the cluster status would show Active in this scenario.


Root cause

Rancher v2.5.x adds a cluster health check that verifies the Pod of the cluster-agent Deployment, in the cattle-system namespace of the downstream cluster, is ready and connected to the Rancher server. The cluster-agent Pod uses cluster DNS to resolve the Rancher server hostname. When the cluster has no node with the worker role, the CoreDNS Pods cannot be scheduled, so the cluster-agent cannot resolve the Rancher hostname and the check fails.
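The failure chain described above can be confirmed from the Pod states in the downstream cluster. The following is a minimal sketch, not an official Rancher tool: it inspects a Pod list in the JSON format produced by `kubectl get pods --all-namespaces -o json`. The helper names `pod_is_ready` and `summarize` are hypothetical, and the Pod name prefixes `coredns` (in kube-system) and `cattle-cluster-agent` (in cattle-system) are assumptions about a typical Rancher-provisioned cluster.

```python
def pod_is_ready(pod: dict) -> bool:
    """Return True if the Pod reports a Ready condition with status "True"."""
    conditions = pod.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def summarize(pod_list: dict) -> dict:
    """Collect Pending CoreDNS Pods and cluster-agent readiness.

    Assumption: CoreDNS Pod names start with "coredns" in kube-system and
    cluster-agent Pod names start with "cattle-cluster-agent" in cattle-system.
    """
    summary = {"coredns_pending": [], "cluster_agent_ready": False}
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        namespace = meta.get("namespace", "")
        name = meta.get("name", "")
        phase = pod.get("status", {}).get("phase")
        # A Pod that cannot be scheduled stays in the Pending phase.
        if namespace == "kube-system" and name.startswith("coredns") and phase == "Pending":
            summary["coredns_pending"].append(name)
        if (
            namespace == "cattle-system"
            and name.startswith("cattle-cluster-agent")
            and pod_is_ready(pod)
        ):
            summary["cluster_agent_ready"] = True
    return summary
```

To use it, save the output of `kubectl get pods --all-namespaces -o json` to a file, load it with `json.load`, and pass the resulting dictionary to `summarize`. A non-empty `coredns_pending` list together with `cluster_agent_ready` being `False` is consistent with the root cause described here.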


Resolution

Provision at least one node with the worker role in the cluster, so that the CoreDNS Pods can be scheduled.
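Whether the cluster currently has a worker node can be checked from the node labels. The sketch below assumes that worker nodes in a Rancher-provisioned cluster carry the label `node-role.kubernetes.io/worker` with the value `"true"`; verify this against your own cluster before relying on it. The function name `worker_node_names` is hypothetical, and its input is a node list in the JSON format produced by `kubectl get nodes -o json`.

```python
# Assumption: worker nodes are labeled node-role.kubernetes.io/worker="true".
WORKER_LABEL = "node-role.kubernetes.io/worker"


def worker_node_names(node_list: dict) -> list:
    """Return the names of all nodes carrying the worker role label."""
    names = []
    for node in node_list.get("items", []):
        meta = node.get("metadata", {})
        labels = meta.get("labels", {})
        if labels.get(WORKER_LABEL) == "true":
            names.append(meta.get("name", ""))
    return names
```

An empty result indicates that no node has the worker role, which matches the condition that triggers the health check failure. After a worker node is added, the list should contain its name.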

