How to migrate etcd data directory to a dedicated filesystem?


Task

When running large Rancher installations or large clusters, it may be necessary to reduce IO contention on the disks used by etcd. By default, etcd data is stored under /var/lib/etcd, which usually lives on the root file system. Migrating the etcd data directory to a dedicated file system keeps etcd from competing for disk IOPS with other system components and can improve performance.

Prerequisites

  • RKE cluster
  • Root access to all etcd nodes.
  • A new file system with at least 2 GB free; 8 GB or more is recommended. Please work with your systems team to create and mount the file system.
  • Etcd backups should be configured and verified.
  • Schedule at least an hour of downtime during your change management maintenance window.
  • It is highly recommended to pause/halt any new deployments and CI/CD jobs during this change window.
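Before the maintenance window opens, the free-space prerequisite can be checked with a short script. This is a sketch: `check_etcd_fs` is a hypothetical helper, and the thresholds (2 GB minimum, 8 GB recommended) come from the list above.

```shell
# Pre-flight sketch: check that a mounted file system meets the space
# prerequisites (2 GB minimum, 8 GB recommended). Pass the mount point
# your systems team prepared.
check_etcd_fs() {
  mountpoint=$1
  avail_kb=$(df -Pk "$mountpoint" | awk 'NR==2 {print $4}')
  if [ "$avail_kb" -ge 8388608 ]; then
    echo "OK: $mountpoint has the recommended 8 GB or more free"
  elif [ "$avail_kb" -ge 2097152 ]; then
    echo "WARN: $mountpoint meets the 2 GB minimum but is below 8 GB"
  else
    echo "FAIL: $mountpoint has less than 2 GB free"
  fi
}

check_etcd_fs /
```

After mounting the new file system, run it against the real target (for example `check_etcd_fs /var/lib/etcd`); `/` above is only an illustration.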

Resolution

Before making any changes, please take an etcd snapshot and verify that it is usable.

For new clusters

For a new cluster, please see our installation documentation

NOTE: Please make sure you have a file system mounted to "/var/lib/etcd/" before creating the cluster.
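To keep that mount persistent across reboots, an /etc/fstab entry along these lines is typical. This is only an example: the device path /dev/sdb1 and the ext4 file system type are placeholders for whatever your systems team provisions.

```
# /etc/fstab entry (example; /dev/sdb1 and ext4 are placeholders)
/dev/sdb1  /var/lib/etcd  ext4  defaults  0  2
```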

For existing clusters

Option A - In-place migration

  • SSH into the first etcd node and become root.
  • Stop etcd container
    docker update --restart=no etcd && docker stop etcd
  • Verify etcd is stopped and that no process holds files open under /var/lib/etcd; the following command should return no output.
    lsof | grep '/var/lib/etcd/'
  • Move etcd data to a temporary location
    mv /var/lib/etcd /var/lib/etcd_tmp
  • Create a new file system and mount it to "/var/lib/etcd." Please work with your systems team for this step.
  • Verify the new file system is mounted
    df -H /var/lib/etcd
  • Move etcd data from temporary location to new file system
    rsync -av --progress /var/lib/etcd_tmp/ /var/lib/etcd/
  • Restart etcd
    docker update --restart=yes etcd && docker start etcd
  • Verify etcd health
    docker exec etcd etcdctl member list
  • Repeat the process until all etcd nodes have been updated.
  • Once all nodes have been updated, please clean up the temporary data.
    rm -rf /var/lib/etcd_tmp/
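The move/rsync portion of the steps above can be rehearsed on scratch directories before touching a real node. This is a sketch, not the literal runbook: `migrate_dir` is a hypothetical helper, the docker stop/start steps are deliberately left out, and it falls back to `cp -a` when rsync is not installed.

```shell
# Sketch of the Option A data move, parameterized so it can be rehearsed on
# scratch directories first. Stop etcd before running this against real data.
migrate_dir() {
  src=$1; dst=$2
  tmp="${src}_tmp"
  mv "$src" "$tmp"                       # move etcd data aside
  mkdir -p "$dst"                        # in the real migration: mount the new FS here
  if command -v rsync >/dev/null 2>&1; then
    rsync -a "$tmp/" "$dst/"             # copy data onto the new file system
  else
    cp -a "$tmp/." "$dst/"               # fallback when rsync is unavailable
  fi
  # Sanity check: file counts must match before the temp copy is deleted.
  n_tmp=$(find "$tmp" -type f | wc -l)
  n_dst=$(find "$dst" -type f | wc -l)
  if [ "$n_tmp" -eq "$n_dst" ]; then
    echo "copied $n_dst files; keep $tmp until etcd reports healthy"
  else
    echo "file count mismatch ($n_tmp vs $n_dst); do not delete $tmp" >&2
    return 1
  fi
}
```

On a real node the call would be `migrate_dir /var/lib/etcd /var/lib/etcd` (move aside, mount the new file system, copy back), mirroring the steps above.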

Option B - Rolling replacement

  • Create a new node with the dedicated file system mounted at "/var/lib/etcd/."
  • Join the new node to the existing cluster.
  • Wait for the cluster update to finish.
  • Verify etcd health
    docker exec etcd etcdctl member list
  • Remove one of the old nodes from the cluster, following the node removal documentation
  • Repeat the process until all etcd nodes have been replaced.
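As a quick gate between replacements, the member list output can be checked mechanically. A sketch, assuming the comma-separated etcdctl v3 listing format (ID, status, name, peer URLs, client URLs, ...); `started_count` is a hypothetical helper.

```shell
# Count members that etcdctl reports as "started". Feed it the output of:
#   docker exec etcd etcdctl member list
# Assumes the comma-separated etcdctl v3 listing (status is the second field).
started_count() {
  awk -F', ' '$2 == "started"' | wc -l
}
```

Proceed to removing the next old node only when the started count matches the expected cluster size.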

Further reading

For additional disk tuning guidance, please see the etcd documentation
