Time drift between nodes in a Kubernetes cluster can cause a range of issues, from difficulty correlating application log timestamps across nodes to loss of etcd quorum (the consensus algorithm used in etcd is time-sensitive).
Using Rancher, you can monitor the state and processes of your cluster nodes, Kubernetes components, and software deployments through integration with Prometheus, a leading open-source monitoring solution.
This article details how to monitor time drift, via the Network Time Protocol (NTP), on Linux nodes within Rancher Kubernetes Engine (RKE) or Rancher v2.x provisioned clusters.
- A Rancher v2.x instance, v2.2.0 or above
- A Rancher Kubernetes Engine (RKE) CLI or Rancher v2.x provisioned Kubernetes cluster with Cluster Monitoring enabled, with Monitoring Version 0.2.0+
- ntp configured on Linux nodes in the cluster (refer to the documentation for your Linux distribution on enabling and configuring ntp)
Enable the NTP collector on the Node Exporter DaemonSet
- Within the Rancher UI cluster view for the relevant cluster, navigate to Tools -> Monitoring
- In the bottom-right corner of the form, click Show advanced options
- Configure the variable
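
Once the collector is enabled, one way to check that it is reporting is to look for an NTP offset sample in the Node Exporter metrics output. The sketch below parses Prometheus text-format output and extracts those samples. It assumes the metric is named `node_ntp_offset_seconds`, as in the upstream node_exporter NTP collector; this article does not state the metric name, and the `SAMPLE` text is illustrative, not captured output. It also assumes samples carry no trailing timestamp.

```python
from typing import List


def parse_ntp_offsets(metrics_text: str,
                      metric: str = "node_ntp_offset_seconds") -> List[float]:
    """Return every sample value for `metric` in Prometheus text-format output."""
    values = []
    for line in metrics_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: each sample line is "<name>[{labels}] <value>".
        name_and_labels, _, value = line.rpartition(" ")
        # The metric name ends at the first "{" when labels are present.
        name = name_and_labels.split("{", 1)[0]
        if name == metric:
            values.append(float(value))
    return values


# Illustrative sample only (hypothetical values).
SAMPLE = """\
# TYPE node_ntp_offset_seconds gauge
node_ntp_offset_seconds 0.000123
node_ntp_stratum 2
"""

offsets = parse_ntp_offsets(SAMPLE)
print(offsets if offsets else "NTP collector metric not found")
```

An empty result would suggest that the collector is not enabled or that the metric name differs in your Monitoring version.
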
Configure an alert for NTP time drift
- Within the Rancher UI cluster view for the relevant cluster, navigate to Tools -> Alerts
- On the "A set of alerts for node" Alert Group, click Add Alert Rule
- Set Name to "Node NTP time drift equal to or greater than 1 second"
- Configure a Notifier for the "A set of alerts for node" Alert Group by clicking the ellipsis for this Alert Group and configuring the desired notifier in the Alert section at the bottom of the form.
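
The threshold behind the rule name above (drift of 1 second or more, in either direction) can be sketched outside the Rancher UI as follows. This is a minimal illustration, not the rule Rancher generates: it assumes the offset is exposed as `node_ntp_offset_seconds` (the upstream node_exporter metric name, not stated in this article), that a Prometheus endpoint is reachable at a URL you supply, and that the response follows the standard Prometheus HTTP API instant-query format. The `SAMPLE_RESPONSE` values and instance names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

THRESHOLD_SECONDS = 1.0  # matches "equal to or greater than 1 second"


def fetch_ntp_offsets(prometheus_url: str) -> dict:
    """Run an instant query against the Prometheus HTTP API (/api/v1/query)."""
    query = urllib.parse.urlencode({"query": "node_ntp_offset_seconds"})
    url = f"{prometheus_url}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def drifting_instances(response: dict,
                       threshold: float = THRESHOLD_SECONDS) -> Dict[str, float]:
    """Return {instance: offset} for nodes whose absolute offset >= threshold."""
    drifting = {}
    for sample in response["data"]["result"]:
        instance = sample["metric"].get("instance", "unknown")
        # Instant-query values arrive as [timestamp, "<value as string>"].
        offset = float(sample["value"][1])
        if abs(offset) >= threshold:
            drifting[instance] = offset
    return drifting


# Hypothetical response in the instant-query vector format.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"instance": "10.0.0.1:9100"}, "value": [1700000000, "0.002"]},
            {"metric": {"instance": "10.0.0.2:9100"}, "value": [1700000000, "-1.5"]},
            {"metric": {"instance": "10.0.0.3:9100"}, "value": [1700000000, "1"]},
        ],
    },
}

print(drifting_instances(SAMPLE_RESPONSE))
```

The absolute value matters because a node's clock can run behind as well as ahead; comparing the raw offset against 1 would miss nodes with a negative drift.
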