Overlay connectivity broken after a node reboot until flannel is restarted

Issue

After rebooting a Kubernetes node, you may notice that pod-to-pod network connectivity (via the overlay network) does not work correctly until you restart the canal workload on that node.

Pre-requisites

  • Kubernetes cluster running canal or flannel as the CNI
  • Linux nodes running systemd v242 or later
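
To check whether a node meets the systemd prerequisite, something like the following sketch can be run on the node. It assumes that the first line of `systemctl --version` has the form "systemd <number> (...)"; the function name is_affected_version is only illustrative.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given `systemctl --version` output
# reports systemd v242 or later, 1 if older, 2 if no version can be parsed.
is_affected_version() {
    # Assumed first-line format: "systemd 245 (245.4-4ubuntu3.22)"
    version=$(printf '%s\n' "$1" | awk 'NR==1 {print $2}')
    # Keep only the leading digits in case the field carries a suffix.
    version=${version%%[!0-9]*}
    [ -n "$version" ] || return 2
    [ "$version" -ge 242 ]
}

# Only query systemd when systemctl is actually available on this host.
if command -v systemctl >/dev/null 2>&1; then
    if is_affected_version "$(systemctl --version)"; then
        echo "systemd is v242 or later: this node may be affected"
    else
        echo "systemd is older than v242 or could not be parsed"
    fi
fi
```

A node that reports v242 or later is only potentially affected; the issue also depends on the CNI in use and on how the distribution manages networking.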

Root Cause

This is caused by a race condition between flannel and systemd-networkd, which is being tracked in an upstream issue.
Ubuntu 20.04 does not appear to be affected, because it uses netplan to manage its networking configuration.

Workaround

Either restart canal on the affected node as needed (kubectl delete pod -n kube-system canal-XXXX), or set the MACAddressPolicy for the flannel interfaces on your nodes to none by creating the following link file on each node:

cat <<'EOF' > /etc/systemd/network/10-flannel.link
[Match]
OriginalName=flannel*

[Link]
MACAddressPolicy=none
EOF
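
After writing the file, a quick check along these lines can confirm that it is present and carries the expected setting. This is a minimal sketch: the function name link_policy_is_none and the LINK_FILE variable are illustrative, and the check only inspects the file contents, not whether the policy has been applied to a running interface. Since .link files are normally applied when an interface is created, the change is likely to take effect only after the flannel interface is recreated, for example on the next reboot.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given .link file is readable and
# contains a MACAddressPolicy=none line (in any section).
link_policy_is_none() {
    [ -r "$1" ] && grep -Eq \
        '^[[:space:]]*MACAddressPolicy[[:space:]]*=[[:space:]]*none[[:space:]]*$' "$1"
}

# Defaults to the path used in the workaround; override via LINK_FILE.
LINK_FILE=${LINK_FILE:-/etc/systemd/network/10-flannel.link}

if link_policy_is_none "$LINK_FILE"; then
    echo "$LINK_FILE sets MACAddressPolicy=none"
else
    echo "$LINK_FILE is missing or does not set MACAddressPolicy=none"
fi
```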

Resolution

At present there is no resolution and this bug is still open upstream.
