Dear Rancher Customer,
Recently our team has been made aware of PLEG health issues caused by a recent containerd.io package version.
This can manifest in a few ways. We usually see all Docker commands failing, but in recent cases only Docker commands run with a specific security flag fail and hang. We have isolated this to containerd.io package version 1.4.4-x, and it has been logged in this GitHub issue.
How do I know if I am impacted?
Customers running RKE could be impacted. The issue affects customers running Docker 19.03 and 20.10 where the containerd.io package is at version 1.4.4-x. Nodes running containerd.io 1.4.4 may experience containers hanging on initialization after a certain number of containers with no-new-privileges are started.
Often this has come as a result of upgrading Docker with Rancher 2.5.6. The symptoms include PLEG timeout errors in the Rancher UI, CoreDNS pods failing to start, and docker inspect commands hanging on certain containers.
As the issue relates to the specific runc version (1.0.0-rc93) bundled with containerd.io, the following can be a basic test to identify if the node is running the affected runc build:
runc --version | grep -q 1.0.0-rc93 && echo "AFFECTED" || echo "NOT AFFECTED"
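The installed containerd.io package version can also be checked directly. The following is a minimal sketch rather than an official tool: the function names are hypothetical, and it assumes the package reports a version of the form 1.4.4-1 (apt) or 1.4.4-3.1.el7 (yum), matching the version strings used elsewhere in this article.

```shell
#!/bin/sh
# Classify a containerd.io package version string.
# Assumption: affected builds are exactly the 1.4.4-x series named in this article.
classify_containerd_version() {
    case "$1" in
        1.4.4|1.4.4-*) echo "AFFECTED" ;;
        "")            echo "UNKNOWN" ;;
        *)             echo "NOT AFFECTED" ;;
    esac
}

# Print the installed containerd.io version using whichever package
# database is present; prints nothing if the package is not found.
installed_containerd_version() {
    if command -v dpkg-query >/dev/null 2>&1; then
        dpkg-query -W -f='${Version}' containerd.io 2>/dev/null || true
    elif command -v rpm >/dev/null 2>&1; then
        version=$(rpm -q --qf '%{VERSION}-%{RELEASE}' containerd.io 2>/dev/null) \
            && printf '%s\n' "$version"
    fi
    return 0
}

classify_containerd_version "$(installed_containerd_version)"
```

A result of UNKNOWN means the package was not found through dpkg or rpm, in which case the runc check above remains the more reliable test.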
Is there a workaround?
Yes, currently our team recommends that you take the following step:
Downgrade the containerd.io package to a 1.4.3-x version, or install it at that version. There is no need to modify privileges on CoreDNS pods. Once downgraded to 1.4.3, you should pin that version so that it does not auto-update. Please ensure your team is aware of CVE-2021-21334 in 1.4.3-x.
The following are examples of downgrading the containerd.io package on affected nodes:
apt install containerd.io=1.4.3-1
yum downgrade containerd.io-1.4.3-3.1.el7
As needed, drain and cordon the node, followed by restarting the Docker daemon.
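The drain and restart sequence could look like the sketch below. It is illustrative only: the helper names and the node name worker-1 are hypothetical, and it assumes kubectl is configured in the same shell that can restart Docker on the node. In practice the kubectl steps are often run from a separate workstation. Note that kubectl drain also cordons the node. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1: Kubernetes node name (hypothetical example: worker-1)
restart_docker_on_node() {
    node="$1"
    # drain cordons the node, then evicts its pods (DaemonSet pods are skipped)
    run kubectl drain "$node" --ignore-daemonsets
    # restart Docker so containers start under the downgraded containerd.io
    run systemctl restart docker
    # allow pods to be scheduled on the node again
    run kubectl uncordon "$node"
}

restart_docker_on_node "${1:-worker-1}"
```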
For the most accurate steps, we recommend you consult the documentation for your OS on downgrading and version pinning for the specific package manager and Linux distribution.
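As an illustration of version pinning, the sketch below prints the commands commonly used to hold a package at its installed version. It is not a substitute for your distribution's documentation: the function name is hypothetical, the yum variant assumes an EL7-style system where the versionlock plugin is packaged as yum-plugin-versionlock, and the commands are printed rather than executed so they can be reviewed first.

```shell
#!/bin/sh
# Print (not run) the commands that would hold containerd.io at its
# currently installed version.
# $1: package manager name, "apt" or "yum".
print_pin_commands() {
    case "$1" in
        apt)
            echo "apt-mark hold containerd.io"
            ;;
        yum)
            # versionlock comes from a plugin that is not installed by default
            echo "yum install -y yum-plugin-versionlock"
            echo "yum versionlock add containerd.io"
            ;;
        *)
            echo "unsupported package manager: $1" >&2
            return 1
            ;;
    esac
}

print_pin_commands apt
print_pin_commands yum
```

The pin should be applied after the downgrade to 1.4.3-x so that the held version is the unaffected one. Remember to remove the hold once a fixed containerd.io release is available.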
For customers who have not upgraded their Rancher clusters to 2.5.6+, we recommend that you hold off on upgrading until this is resolved upstream. If you need to upgrade to Rancher 2.5.6+, it should be safe to do so provided you use the above process to install and pin the containerd.io package to a 1.4.3-x version.
In the meantime, if you have any questions, please reach out to your Customer Success Manager or Rancher Support via a Support Ticket.
Simply submit a request via this portal referencing this article and we will track and respond to your question as a Support Ticket.
Rancher Support Team