Rancher v2.5 provisioned Kubernetes clusters, without a worker role node, display "Cluster health check failed: cluster agent is not ready" error

Issue

When a Rancher-provisioned Kubernetes cluster in Rancher v2.5.x is provisioned or updated so that no node has the worker role, the cluster enters an Error status and displays the message "Cluster health check failed: cluster agent is not ready". By comparison, in Rancher v2.4.x the cluster status would show Active in this scenario.

Pre-requisites

Root cause

Rancher v2.5.x implements an additional cluster health check to ensure that the Pod for the cluster-agent Deployment in the cattle-system namespace of the downstream cluster is ready and successfully connected to the Rancher server. The cluster-agent Pod uses cluster DNS to resolve the Rancher server hostname. When the cluster has no node with the worker role, the CoreDNS Pods cannot be scheduled, so the cluster-agent cannot resolve the Rancher hostname and the health check fails.
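To confirm that unscheduled DNS Pods are the cause, the Pod list can be inspected for a PodScheduled condition of False. The following is a minimal sketch, not an official Rancher tool. It assumes kubectl is configured for the downstream cluster and that the CoreDNS Pods run in the kube-system namespace with the label k8s-app=kube-dns; the function names are illustrative only.

```python
import json
import subprocess
from typing import List


def unschedulable_pods(pod_list: dict) -> List[str]:
    """Return names of Pods whose PodScheduled condition is "False"."""
    names = []
    for pod in pod_list.get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        for cond in conditions:
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                names.append(pod["metadata"]["name"])
                break
    return names


def fetch_dns_pods() -> dict:
    """Fetch the DNS Pods as JSON via kubectl.

    Assumption: namespace kube-system and label k8s-app=kube-dns;
    adjust both if the cluster deploys DNS differently.
    """
    result = subprocess.run(
        ["kubectl", "-n", "kube-system", "get", "pods",
         "-l", "k8s-app=kube-dns", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling unschedulable_pods(fetch_dns_pods()) returns the names of DNS Pods that the scheduler could not place. A non-empty result on a cluster with no worker node is consistent with the root cause described above.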

Resolution

Add a node with the worker role to the cluster so that the CoreDNS Pods can be scheduled.
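Whether the cluster has a worker node can be checked from the node labels. The sketch below assumes that worker nodes carry the label node-role.kubernetes.io/worker with the value "true", as applied to nodes in RKE-provisioned clusters, and that kubectl is configured for the downstream cluster. The function names are illustrative only.

```python
import json
import subprocess
from typing import List

# Assumption: role label applied to worker nodes in RKE-provisioned clusters.
WORKER_LABEL = "node-role.kubernetes.io/worker"


def worker_nodes(node_list: dict) -> List[str]:
    """Return names of nodes labelled with the worker role."""
    names = []
    for node in node_list.get("items", []):
        labels = node.get("metadata", {}).get("labels", {})
        if labels.get(WORKER_LABEL) == "true":
            names.append(node["metadata"]["name"])
    return names


def fetch_nodes() -> dict:
    """Fetch all cluster nodes as JSON via kubectl."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

An empty list from worker_nodes(fetch_nodes()) indicates that no worker node is present yet. After a worker node has been added, the list should contain its name.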
