Upgrade to Rancher v2.2.4 fails for instances managing OpenStack CloudProvider enabled clusters with a Loadbalancer config: 'cannot unmarshal number into Go value of type string'

Issue

When upgrading to Rancher v2.2.4, the Rancher server fails to start if the Rancher instance manages a Kubernetes cluster that has the OpenStack Cloud Provider enabled with a Loadbalancer config. Logs for the Rancher pods show error messages of the following format:

E0606 07:39:20.296926       8 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v3.Cluster: json: cannot unmarshal number into Go value of type string

Pre-requisites

  • Upgrading Rancher to v2.2.4
  • A Rancher-launched Kubernetes cluster with the OpenStack Cloud Provider enabled and a Loadbalancer config.

Root cause

In order to resolve Rancher/14577, the type of the monitor-delay and monitor-timeout parameters for OpenStack cluster loadbalancer healthchecks was changed from integer to string in Rancher v2.2.4.

Because the Rancher API framework had defaulted these values to 0, the upgrade to Rancher v2.2.4 fails when Rancher attempts to unmarshal the integer value 0 into a string type. If the values had been manually set to a non-zero integer, which previously caused kubelet failures in the OpenStack cluster itself, they now cause the Rancher pods themselves to fail.
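To make the type mismatch concrete, the following is a minimal Python sketch, not Rancher code, that reports which of the two fields are stored as JSON numbers in a cluster document. The field path is the one named in this article; the function name numeric_monitor_fields and the sample cluster c-example are hypothetical and used for illustration only.

```python
import json

# Field path as named in this article; the sample document below is hypothetical.
LB_PATH = ("spec", "rancherKubernetesEngineConfig", "cloudProvider",
           "openstackCloudProvider", "loadBalancer")
MONITOR_FIELDS = ("monitor-delay", "monitor-timeout")


def numeric_monitor_fields(cluster):
    """Return the monitor fields that are stored as JSON numbers."""
    node = cluster
    for key in LB_PATH:
        if not isinstance(node, dict) or key not in node:
            return []  # no loadBalancer definition: nothing to report
        node = node[key]
    if not isinstance(node, dict):
        return []
    return [
        name for name in MONITOR_FIELDS
        if isinstance(node.get(name), (int, float))
        and not isinstance(node.get(name), bool)
    ]


# Hypothetical cluster object carrying the old integer defaults of 0.
sample = json.loads("""
{
  "metadata": {"name": "c-example"},
  "spec": {"rancherKubernetesEngineConfig": {"cloudProvider": {
    "openstackCloudProvider": {"loadBalancer": {
      "monitor-delay": 0, "monitor-timeout": 0}}}}}
}
""")

print(numeric_monitor_fields(sample))  # ['monitor-delay', 'monitor-timeout']
```

Any field this check reports is a number where Rancher v2.2.4 expects a string, which is the condition that produces the unmarshal error shown above.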

Resolution

You can apply a one-time fix to work around this issue by manually editing the monitor-delay and monitor-timeout values in the cluster Custom Resource of each affected cluster, using kubectl against the Rancher management cluster.

Using your RKE-generated kube config, perform the following operations:

  1. Identify affected clusters by running kubectl get clusters and checking for those with a spec.rancherKubernetesEngineConfig.cloudProvider.openstackCloudProvider.loadBalancer definition.

  2. For each affected cluster, run kubectl edit clusters <cluster name>, where <cluster name> is the metadata.name value for the cluster, and update the spec.rancherKubernetesEngineConfig.cloudProvider.openstackCloudProvider.loadBalancer.monitor-delay and spec.rancherKubernetesEngineConfig.cloudProvider.openstackCloudProvider.loadBalancer.monitor-timeout fields to a quoted string. For example, if the value was 30, change it to "30s"; if it was 0, change it to "".
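The value mapping in step 2 can be sketched in Python as follows. This is an illustrative helper rather than an official tool: the function names to_duration_string and fix_load_balancer are hypothetical, and the "s" suffix simply follows the 30 to "30s" example given in step 2.

```python
def to_duration_string(value):
    """Map an old integer setting to the quoted-string form described in step 2."""
    if isinstance(value, str):
        return value  # already a string, nothing to change
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"unexpected value: {value!r}")
    # 0 becomes the empty string; any other integer gets an "s" suffix.
    return "" if value == 0 else f"{value}s"


def fix_load_balancer(load_balancer):
    """Return a copy of a loadBalancer mapping with both monitor fields as strings."""
    fixed = dict(load_balancer)
    for name in ("monitor-delay", "monitor-timeout"):
        if name in fixed:
            fixed[name] = to_duration_string(fixed[name])
    return fixed


print(fix_load_balancer({"monitor-delay": 30, "monitor-timeout": 0}))
# {'monitor-delay': '30s', 'monitor-timeout': ''}
```

The sketch only computes the replacement values. It does not write anything back to the cluster, so the edit itself is still made with kubectl as described above.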
