Attempting to provision a Kubernetes cluster with the vSphere node driver in a Rancher v2.x environment prior to v2.3.3, with an HTTP proxy configured, results in an error of the following format:
Error creating machine: Error in driver during machine creation: Put https://172.16.2.13:443/guestFile?id=1600&token=528090dd-cf9d-3973-b08b-d1782fd80bd21600: Unable to connect
In addition, the Rancher logs show an error message of the following format:
...
2019/12/06 10:23:51 [INFO] [node-controller-docker-machine] (vsphere-all1) Waiting for VMware Tools to come online...
2019/12/06 10:25:49 [INFO] [node-controller-docker-machine] (vsphere-all1) Provisioning certs and ssh keys...
2019/12/06 10:27:35 http: TLS handshake error from 127.0.0.1:41746: EOF
2019/12/06 10:28:03 [INFO] [node-controller-docker-machine] The default lines below are for a sh/bash shell, you can specify the shell you're using, with the --shell flag.
2019/12/06 10:28:03 [INFO] [node-controller-docker-machine]
2019/12/06 10:28:04 [INFO] Generating and uploading node config vsphere-all1
2019/12/06 10:28:04 [ERROR] NodeController c-f6xbs/m-fsl6t [node-controller] failed with : Error creating machine: Error in driver during machine creation: Put https://172.16.2.13:443/guestFile?id=1600&token=528090dd-cf9d-3973-b08b-d1782fd80bd21600: Unable to connect
...
- A Rancher v2.x instance, prior to Rancher v2.3.3.
- An HTTP proxy configured on Rancher, per the documentation for a single node or High Availability (HA) install of Rancher, in which the vSphere datacenter ESXi hosts are not reachable via the proxy.
- A Rancher-provisioned Kubernetes cluster, using the vSphere node driver.
- The IP range containing the ESXi hosts within the vSphere datacenter, configured in CIDR notation within the Rancher NO_PROXY configuration.
This issue was caused by the Go version used to build the docker-machine driver that provides the Rancher node driver capabilities, including the vSphere node driver.
Support for NO_PROXY entries in CIDR notation was introduced in Go v1.11; however, the docker-machine version in Rancher v2.x, prior to v2.3.3, was built using an earlier version of Go.
As a result, NO_PROXY entries in CIDR notation did not take effect during cluster provisioning via node drivers, even though these same NO_PROXY entries were observed by the Rancher server itself, built with a later version of Go.
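The difference in behaviour can be illustrated with a minimal Go sketch. This is not the code from Go's standard library or from docker-machine: `legacyBypass` and `cidrAwareBypass` are hypothetical helpers written only for this illustration, and the `172.16.2.0/24` range is an assumed example built around the ESXi host address shown in the error above. The first matcher only understands exact hosts and domain suffixes, so a CIDR entry is silently ignored; the second also treats CIDR entries as IP ranges.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// legacyBypass is a simplified, hypothetical matcher that only understands
// exact host names/IPs and domain suffixes. A CIDR entry never matches here.
func legacyBypass(host, noProxy string) bool {
	host = strings.ToLower(host)
	for _, entry := range strings.Split(noProxy, ",") {
		entry = strings.ToLower(strings.TrimSpace(entry))
		if entry == "" {
			continue
		}
		suffix := "." + strings.TrimPrefix(entry, ".")
		if host == entry || strings.HasSuffix(host, suffix) {
			return true
		}
	}
	return false
}

// cidrAwareBypass additionally interprets entries that parse as CIDR blocks
// and bypasses the proxy when the target IP falls inside one of them.
func cidrAwareBypass(host, noProxy string) bool {
	if legacyBypass(host, noProxy) {
		return true
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false // not an IP literal, so CIDR entries cannot apply
	}
	for _, entry := range strings.Split(noProxy, ",") {
		_, block, err := net.ParseCIDR(strings.TrimSpace(entry))
		if err == nil && block.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	noProxy := "localhost,127.0.0.1,172.16.2.0/24" // assumed example value
	host := "172.16.2.13"                          // ESXi host from the error message

	fmt.Println("legacy bypass:", legacyBypass(host, noProxy))
	fmt.Println("CIDR-aware bypass:", cidrAwareBypass(host, noProxy))
}
```

With the assumed values, the legacy matcher reports `false` (the request is sent to the proxy, which cannot reach the ESXi host) and the CIDR-aware matcher reports `true`.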
To work around this issue in Rancher v2.x versions before v2.3.3, ensure that the vSphere server address, and all ESXi hosts within the vSphere datacenter in which you are provisioning the cluster, are listed as individual IPs within the Rancher NO_PROXY configuration.
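For larger ranges, the list of individual IPs can be generated rather than typed by hand. The following Go sketch (requires Go 1.18 or later for `net/netip`) is one possible approach, not a Rancher-provided tool: `expandCIDR` is a hypothetical helper, and the `172.16.2.12/30` range and the `localhost,127.0.0.1` prefix are assumed example values. The vSphere server address still needs to be added to the result separately.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// expandCIDR returns every address in the given prefix as a string.
// maxAddrs is a safety limit so that a large range (for example a /8)
// does not produce an unusably long NO_PROXY value.
func expandCIDR(cidr string, maxAddrs int) ([]string, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	prefix = prefix.Masked() // start from the first address of the range

	var ips []string
	for addr := prefix.Addr(); addr.IsValid() && prefix.Contains(addr); addr = addr.Next() {
		if len(ips) >= maxAddrs {
			return nil, fmt.Errorf("prefix %s has more than %d addresses", prefix, maxAddrs)
		}
		ips = append(ips, addr.String())
	}
	return ips, nil
}

func main() {
	// Assumed example range containing the ESXi host 172.16.2.13.
	ips, err := expandCIDR("172.16.2.12/30", 1024)
	if err != nil {
		panic(err)
	}
	entries := append([]string{"localhost", "127.0.0.1"}, ips...)
	fmt.Println("NO_PROXY=" + strings.Join(entries, ","))
}
```

The output includes the network and broadcast addresses of the range; listing them is harmless, and they can be dropped if a shorter value is preferred.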
This issue was tracked in Rancher GitHub issue #21674 and a fix, bumping the Go version of the docker-machine driver to v1.12.9, was released in Rancher v2.3.3. Users can therefore upgrade to Rancher v2.3.3, or above, to resolve this issue.