Rancher v2.x provisioned vSphere cluster nodes stuck in provisioning, with "Waiting for SSH to be available", as a result of pre-existing cloud-init configuration in VM Template


Issue

Upon launching a vSphere Node Driver cluster in Rancher v2.x, nodes within the cluster are stuck in provisioning with the message "Waiting for SSH to be available". Logging into a node via SSH and checking the auth log reveals failed SSH connection attempts for a missing docker user.
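The auth log check described above can be sketched as follows. This is a minimal example, not an official diagnostic: the helper name count_invalid_docker_logins is hypothetical, the default path /var/log/auth.log assumes an Ubuntu/Debian node, and the match relies on the "Invalid user <name>" wording that sshd normally logs for unknown accounts.

```shell
# Hypothetical helper: count auth log lines in which sshd rejected the
# "docker" user as unknown. The default path is the Ubuntu/Debian location
# (an assumption; other distributions may use /var/log/secure or the journal).
count_invalid_docker_logins() {
    log_file="${1:-/var/log/auth.log}"
    # -c prints the number of matching lines, -i ignores case so both
    # "Invalid user docker" and "invalid user docker" are counted.
    grep -c -i 'invalid user docker' "$log_file"
}
```

On an affected node, running count_invalid_docker_logins (with sudo if the log is not readable by your user) should print a non-zero count, and id docker should report that the user does not exist.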

Pre-requisites

Root cause

When provisioning a vSphere Node Driver cluster, Rancher v2.x uses cloud-init to generate an SSH key pair for the docker user and copy it into the Virtual Machine on initial boot.

In some Linux distributions, including Ubuntu Server 18.04, the standard OS installation process generates a cloud-init configuration. The OS is installed during the initial setup of the VM Template, prior to cluster provisioning via Rancher, and this existing cloud-init configuration within the Template can interfere with Rancher's ability to insert its own cloud-init configuration.

Resolution

Convert the Template back to a VM and run:

sudo cloud-init clean

This command cleans the Template of any existing cloud-init state. Once it completes, convert the VM back to a Template and try provisioning again.
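Before converting the VM back to a Template, it may help to confirm that the clean actually removed cloud-init's per-instance state. The sketch below assumes cloud-init's default state directory /var/lib/cloud; the helper name cloud_init_state_present is hypothetical, and its optional argument exists only so the check can be pointed at another directory.

```shell
# Hypothetical helper: succeeds (exit 0) if cloud-init per-instance state is
# still present. /var/lib/cloud is cloud-init's default state directory.
cloud_init_state_present() {
    state_dir="${1:-/var/lib/cloud}"
    # "instance" is normally a symlink to the current instance directory;
    # -L also catches a dangling symlink that -e would miss.
    [ -e "$state_dir/instance" ] || [ -L "$state_dir/instance" ]
}

# Intended use inside the VM (shown as comments, not run here):
#   sudo cloud-init clean --logs   # --logs also removes cloud-init's log files
#   if cloud_init_state_present; then echo "state still present"; else echo "clean"; fi
```

If the helper still reports state after the clean, the Template is unlikely to behave like a first boot when Rancher provisions from it.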


Comments

2 comments
  • I'm experiencing this exact problem but the recommended solution isn't working for me. Is the node driver cloud-init meant to add the docker user as well as copy over the keys? I'm using a template built on Ubuntu 20.04 LTS

  • Same. This does not resolve the issue for me. 20.04 packer template built using the autoinstall method instead of preseed files.

