When upgrading to Rancher v2.2.4, the Rancher server fails to start if the Rancher instance manages a Kubernetes cluster that has the OpenStack Cloud Provider enabled and a load balancer configuration. Logs for the Rancher pods show error messages of the following format:
E0606 07:39:20.296926 8 reflector.go:134] github.com/rancher/norman/controller/generic_controller.go:175: Failed to list *v3.Cluster: json: cannot unmarshal number into Go value of type string
- Upgrading Rancher to v2.2.4
- A Rancher-launched Kubernetes cluster with the OpenStack Cloud Provider enabled and a load balancer configuration.
To resolve Rancher/14577, the monitor-timeout parameters for OpenStack cluster load balancer health checks were changed from an integer type to a string type in Rancher v2.2.4.
Because the Rancher API framework had defaulted these values to 0, upgrading to Rancher v2.2.4 produces an error when Rancher attempts to unmarshal the integer value 0 into a string type. If the values had been manually set to a non-zero integer, which previously caused kubelet failures in the OpenStack cluster itself, they now cause the Rancher pods themselves to fail.
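The failure mode can be reproduced outside Rancher with Go's standard encoding/json package. The following is a minimal sketch, not Rancher's actual code: the loadBalancerOpts struct and the decode helper are hypothetical stand-ins that model only the monitor-timeout field as a string. The exact error wording may differ from the log line above depending on the Go version and JSON library in use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// loadBalancerOpts is a hypothetical, simplified stand-in for the load
// balancer options. Only the field relevant to this issue is modeled,
// and it uses the string type introduced in Rancher v2.2.4.
type loadBalancerOpts struct {
	MonitorTimeout string `json:"monitor-timeout"`
}

// decode attempts to unmarshal a stored JSON document into the
// string-typed struct and returns any unmarshalling error.
func decode(doc string) (loadBalancerOpts, error) {
	var opts loadBalancerOpts
	err := json.Unmarshal([]byte(doc), &opts)
	return opts, err
}

func main() {
	// Value stored before the upgrade: an integer default of 0.
	// Decoding a JSON number into a string field returns an error.
	_, err := decode(`{"monitor-timeout": 0}`)
	fmt.Println("integer value:", err)

	// Value after the manual fix: a quoted string decodes cleanly.
	opts, err := decode(`{"monitor-timeout": "30s"}`)
	fmt.Println("string value:", opts.MonitorTimeout, err)
}
```

An empty quoted string ("") also decodes without error, which is why replacing 0 with "" is sufficient for clusters that only carried the default.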
You can apply a one-time fix to work around this issue by manually editing the monitor-timeout values of the cluster Custom Resource of affected clusters, via kubectl run against the Rancher management cluster.
Using your RKE-generated kubeconfig, perform the following operations:
Identify affected clusters by running kubectl get clusters and checking for those with a
For affected clusters, run kubectl edit cluster <cluster name>, where <cluster name> is the metadata.name value for the cluster, and update the spec.rancherKubernetesEngineConfig.cloudProvider.openstackCloudProvider.loadBalancer.monitor-timeout fields to a quoted string. Example: if it was 30, change it to "30s"; if it was 0, change it to "".
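The conversion rule in the example above can be written down as a small Go function, which may help when checking several clusters by hand. This is an illustrative sketch only: toMonitorTimeout is a hypothetical helper, not part of Rancher or kubectl, and the assumption that the legacy integer is a number of seconds comes solely from the 30 to "30s" example.

```go
package main

import (
	"fmt"
	"strconv"
)

// toMonitorTimeout converts a legacy integer value into the quoted-string
// form used after the upgrade: 0 becomes "", and any other value n
// becomes "<n>s". Interpreting the integer as seconds is an assumption
// based on the 30 -> "30s" example.
func toMonitorTimeout(legacy int) string {
	if legacy == 0 {
		return ""
	}
	return strconv.Itoa(legacy) + "s"
}

func main() {
	// Print the replacement value for each legacy value from the example.
	for _, v := range []int{0, 30} {
		fmt.Printf("%d -> %q\n", v, toMonitorTimeout(v))
	}
}
```

Running it prints `0 -> ""` and `30 -> "30s"`, matching the replacements described in the step above.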