Issue
Upon launching a vSphere Node Driver cluster in Rancher v2.x, nodes within the cluster are stuck in provisioning with the message "Waiting for SSH to be available". Logging into the nodes via SSH and checking the auth log reveals failed SSH connection attempts for a missing docker user.
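As a rough way to confirm the symptom, the sketch below counts sshd "Invalid user docker" entries in an auth log. The function name count_invalid_docker is hypothetical, and the default path /var/log/auth.log is an assumption that fits Ubuntu; other distributions may log to /var/log/secure or to the systemd journal instead.

```shell
# Count sshd entries that reject the "docker" user.
# Assumption: Ubuntu-style auth log; pass another path as the first argument.
count_invalid_docker() {
  log_file="${1:-/var/log/auth.log}"
  # grep -c prints the number of matching lines (0 if there are none).
  grep -c 'Invalid user docker' "$log_file"
}
```

Reading the auth log usually requires root or membership in the adm group, so run this from a root shell on the affected node. A non-zero count is consistent with the issue described here.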
Pre-requisites
- A Rancher v2.x provisioned vSphere cluster, using the vSphere Node Driver.
Root cause
When provisioning a vSphere Node Driver cluster, Rancher v2.x uses cloud-init to generate an SSH key pair for the user docker and copy it into the Virtual Machine on initial boot.
In some Linux distributions, including Ubuntu Server 18.04, the standard OS installation process generates a cloud-init configuration. The OS is installed during the initial setup of the VM Template, before the cluster is provisioned via Rancher, and this existing cloud-init configuration within the Template can interfere with Rancher's ability to insert its own cloud-init configuration.
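To see whether a Template carries installer-generated cloud-init configuration, one option is to list the drop-in files on the VM. This is only a sketch: the function name list_cloud_init_dropins is hypothetical, and /etc/cloud/cloud.cfg.d is assumed to be the drop-in directory, which is the usual cloud-init location but may differ on a customised image.

```shell
# List cloud-init drop-in configuration files, one name per line.
# Assumption: drop-ins live in /etc/cloud/cloud.cfg.d unless another
# directory is passed as the first argument.
list_cloud_init_dropins() {
  cfg_dir="${1:-/etc/cloud/cloud.cfg.d}"
  # Print nothing if the directory does not exist.
  [ -d "$cfg_dir" ] || return 0
  ls -1 "$cfg_dir"
}
```

Files left behind by the OS installer are candidates for the kind of pre-existing configuration described above. Which file names appear depends on the distribution and installer.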
Resolution
Convert the Template back to a VM and run:
sudo cloud-init clean
This command cleans the Template of any existing cloud-init state. Once it completes, convert the VM back to a Template and provision the cluster again.
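The clean-up can be wrapped in a small script, sketched below under a few assumptions. The names run_cmd, reset_cloud_init and DRY_RUN are hypothetical, the --logs flag (which also removes cloud-init's log files) goes beyond the plain command above, and the final power-off is included only because vSphere needs the VM powered off before it can be converted to a Template.

```shell
# Print the command when DRY_RUN=1 (the default in this sketch), otherwise run it.
run_cmd() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Reset cloud-init on the VM, then power it off for conversion to a Template.
reset_cloud_init() {
  # Remove cloud-init artifacts and logs so it runs from scratch on next boot.
  run_cmd sudo cloud-init clean --logs
  # Assumption: the VM should be shut down straight after cleaning.
  run_cmd sudo shutdown -h now
}
```

Calling reset_cloud_init with DRY_RUN unset only prints the commands. Set DRY_RUN=0 on the VM to execute them.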
Comments
I'm experiencing this exact problem but the recommended solution isn't working for me. Is the node driver cloud-init meant to add the docker user as well as copy over the keys? I'm using a template built on Ubuntu 20.04 LTS
Same. This does not resolve the issue for me. 20.04 packer template built using the autoinstall method instead of preseed files.