How to set up Nodelocal DNS cache on Rancher 2.x


Why use Nodelocal DNS cache?

Like many applications in a containerised architecture, CoreDNS or kube-dns runs in a distributed fashion. In certain circumstances, DNS reliability and latency can be impacted with this approach. The causes of this relate notably to conntrack race conditions or exhaustion, cloud provider limits, and the unreliable nature of the UDP protocol.

A number of workarounds exist; however, long-term mitigation of these and other issues led to a redesign of the Kubernetes DNS architecture, resulting in the Nodelocal DNS cache project.

The Nodelocal DNS cache is currently a beta feature, and should be used with caution. It is highly recommended to perform these install steps in a development environment, with adequate testing before installing in other environments.

Requirements

  • A Kubernetes cluster of v1.15 or greater created by Rancher v2.x or RKE
  • A Linux cluster; Windows is currently not supported
  • Access to the cluster

Installing

Installing involves two main steps. Both should be non-invasive: pods that are currently running will not be modified, and the DNS configuration takes effect only for pods started after the install is complete.

Using a Rancher version after v2.4.x, or RKE version after v1.1.0

Update the cluster using 'Edit as YAML' in the Rancher UI. With RKE, edit the cluster.yml file instead.

Note: Updating the cluster using the below will deploy the node-local-dns DaemonSet, and restart the kubelet container on each node.

As described in the documentation, add or update the dns.nodelocal setting, using the following as an example:

  dns:
  [..]
    nodelocal:
      ip_address: "169.254.20.10"

New pods created after the change will configure the node-local-dns link-local address as the nameserver in /etc/resolv.conf.

Using a Rancher version before v2.4.x, or RKE version before v1.1.0

Install the YAML manifest by navigating to the cluster and clicking the Launch kubectl button in the Rancher UI. The command can also be run from any terminal where a kubeconfig for the cluster is configured.

The placeholders in the manifest are replaced before it is applied. The command assumes that the cluster service discovery domain name is cluster.local (the default); adjust it if needed.

curl -sL https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml \
  | sed -e 's/__PILLAR__DNS__DOMAIN__/cluster.local/g' \
  | sed -e "s/__PILLAR__DNS__SERVER__/$(kubectl get service --namespace kube-system kube-dns -o jsonpath='{.spec.clusterIP}')/g" \
  | sed -e 's/__PILLAR__LOCAL__DNS__/169.254.20.10/g' \
  | kubectl apply -f -
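
If the kube-dns lookup in the pipeline above returns nothing, the placeholder is silently replaced with an empty string. A minimal sketch of a guarded variant follows; render_nodelocaldns is a hypothetical helper name introduced here for illustration, not part of kubectl or the upstream addon.

```shell
# render_nodelocaldns (hypothetical helper): reads the manifest on stdin and
# substitutes the three placeholders, refusing to run if any value is empty.
render_nodelocaldns() {
  nld_domain="$1"
  nld_dns_server="$2"
  nld_local_dns="$3"
  if [ -z "$nld_domain" ] || [ -z "$nld_dns_server" ] || [ -z "$nld_local_dns" ]; then
    echo "render_nodelocaldns: domain, DNS server and local DNS address are all required" >&2
    return 1
  fi
  sed -e "s/__PILLAR__DNS__DOMAIN__/${nld_domain}/g" \
      -e "s/__PILLAR__DNS__SERVER__/${nld_dns_server}/g" \
      -e "s/__PILLAR__LOCAL__DNS__/${nld_local_dns}/g"
}

# Usage (requires kubectl access to the cluster):
#   KUBE_DNS_IP=$(kubectl get service --namespace kube-system kube-dns -o jsonpath='{.spec.clusterIP}')
#   curl -sL https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml \
#     | render_nodelocaldns cluster.local "$KUBE_DNS_IP" 169.254.20.10 \
#     | kubectl apply -f -
```

Rendering to a file first and reviewing it before running kubectl apply is a reasonable extra precaution.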

Ensure the node-local-dns pods start successfully; a pod should start on each control plane and worker node.

kubectl get -n kube-system pod -l k8s-app=node-local-dns

Option A - Configure the Kubelet

By default, the Kubelet will configure the /etc/resolv.conf of pods with the kube-dns Service ClusterIP as the nameserver. Configuring all new pods to query node-local-dns will require updating the Kubelet arguments.

Note: Updating the arguments using the below will restart the kubelet container on each node.

  • If the cluster was provisioned by Rancher, edit the cluster in the UI and click on Edit as YAML.
  • If the cluster was provisioned by RKE, edit the cluster.yml file directly.

Update the kubelet service with the cluster-dns argument and IP address. Click Save, or run rke up, to put this change into effect.

services:
  kubelet:
    extra_args:
      cluster-dns: "169.254.20.10"

New pods created after the change will configure the node-local-dns link-local address as the nameserver in /etc/resolv.conf.

Option B - Configure Workloads

Alternatively, node-local-dns can be configured on a per-workload basis by updating the workload with a dnsConfig.

  • If using the Rancher UI, edit the workload, navigate to Show advanced options > Networking > DNS Name Server IP Addresses and add 169.254.20.10.
  • If configuring by YAML, add the following to the pod spec:
    spec:
      dnsConfig:
        nameservers:
        - 169.254.20.10

Testing

Once implemented, start a new pod to test DNS queries.

kubectl run --restart=Never --rm -it --image=tutum/dnsutils dns-test -- dig google.com

You should expect to see 169.254.20.10 as the server, and a successful answer to the query.
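
The server check can also be scripted. The sketch below assumes the usual ";; SERVER:" footer line printed by dig; dig_server is a hypothetical helper name used only for illustration.

```shell
# dig_server (hypothetical helper): reads dig output on stdin and prints the
# address of the server that answered, taken from the ";; SERVER:" line.
dig_server() {
  sed -n 's/^;; SERVER: \([^#]*\)#.*/\1/p'
}

# Usage (requires kubectl access to the cluster):
#   kubectl run --restart=Never --rm -i --image=tutum/dnsutils dns-test -- dig google.com | dig_server
```

If node-local-dns is in use, the printed address should be 169.254.20.10.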

To verify that a pod or container is using node-local-dns, check the /etc/resolv.conf file, for example:

kubectl exec -it <pod name> -- grep nameserver /etc/resolv.conf
nameserver 169.254.20.10

Removing Nodelocal DNS cache

To remove Nodelocal DNS cache from a cluster, perform the install steps in reverse. Note: pods created with the node-local-dns nameserver in /etc/resolv.conf will need to be restarted after removal to use the kube-dns Service as a nameserver again.

Using a Rancher version after v2.4.x, or RKE version after v1.1.0

Remove the dns.nodelocal configuration from the cluster YAML.

Using a Rancher version before v2.4.x, or RKE version before v1.1.0

  1. Remove the Kubelet configuration (Option A), or remove the dnsConfig from workloads (Option B).

  2. If Option A was taken, delete any pods in workloads that were started since the Kubelet configuration change so that they are started with the kube-dns ClusterIP again.

  3. Remove the node-local-dns objects with the following command:

    curl -sL https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml | kubectl delete -f -

Note: it is important to perform these steps in order, and only complete step 3 once the pods using node-local-dns have been started with the kube-dns ClusterIP configured in /etc/resolv.conf again.
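
Before completing step 3, it can help to list the pods that still have the node-local-dns address in /etc/resolv.conf. The following is a rough sketch; has_nameserver and list_pods_on_nodelocal are hypothetical helper names, and pods whose image has no cat binary are silently skipped.

```shell
# Link-local address used throughout this article.
NODELOCAL_IP="169.254.20.10"

# has_nameserver (hypothetical helper): reads resolv.conf content on stdin and
# succeeds only if a nameserver line matches the given address exactly.
has_nameserver() {
  awk -v ip="$1" '$1 == "nameserver" && $2 == ip { found = 1 } END { exit !found }'
}

# list_pods_on_nodelocal (hypothetical helper): prints namespace/name for each
# running pod whose /etc/resolv.conf still references NODELOCAL_IP.
# Requires kubectl access to the cluster.
list_pods_on_nodelocal() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Running \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
  while read -r ns name; do
    # </dev/null keeps kubectl exec from consuming the pod list on stdin.
    if kubectl exec -n "$ns" "$name" -- cat /etc/resolv.conf 2>/dev/null </dev/null |
       has_nameserver "$NODELOCAL_IP"; then
      echo "$ns/$name"
    fi
  done
}

# Usage:
#   list_pods_on_nodelocal
```

An empty result suggests that no inspected pod still depends on node-local-dns, although skipped pods should be checked by other means.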

Troubleshooting

In no particular order, the following steps can help to understand a DNS issue further.

Check all kube-dns and node-local-dns objects

Ensure there are no obvious issues with scheduling CoreDNS and node-local-dns pods in the cluster.

kubectl get all -n kube-system -l k8s-app=node-local-dns
kubectl get all -n kube-system -l k8s-app=kube-dns

All node-local-dns and kube-dns pods should be ready and running, and the kube-dns Service should exist. If needed, check the events to locate any warning or failed event messages.

kubectl describe ds -n kube-system -l k8s-app=node-local-dns
kubectl describe rs -n kube-system -l k8s-app=kube-dns

Check the logs and ConfigMap of kube-dns and node-local-dns pods

kubectl logs -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=node-local-dns
kubectl get configmap -n kube-system coredns -o yaml
kubectl get configmap -n kube-system node-local-dns -o yaml

Enable logging and perform a DNS test

Note: query logging can increase the log output from CoreDNS, so enabling it only temporarily while investigating is suggested.

  • Run a DaemonSet to perform queries from a pod running on each node in the cluster
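
One possible shape for such a DaemonSet is sketched below. The name dns-test, the default namespace, the busybox:1.28 image and the 10 second interval are all assumptions chosen for illustration; adjust them to suit the cluster.

```shell
# dns_test_manifest (hypothetical helper): prints a DaemonSet manifest that
# runs one pod per node, each looping over an internal and an external lookup.
dns_test_manifest() {
  cat <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dns-test
  namespace: default
spec:
  selector:
    matchLabels:
      name: dns-test
  template:
    metadata:
      labels:
        name: dns-test
    spec:
      tolerations:
      - operator: Exists
      containers:
      - name: dns-test
        image: busybox:1.28
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        command:
        - sh
        - -c
        - |
          while true; do
            echo "--- $(date) node=${NODE_NAME}"
            nslookup kubernetes.default
            nslookup google.com
            sleep 10
          done
EOF
}

# Usage (requires kubectl access to the cluster):
#   dns_test_manifest | kubectl apply -f -
#   kubectl logs -n default -l name=dns-test --tail=20
#   dns_test_manifest | kubectl delete -f -
```

Comparing the output between nodes can show whether failures are limited to a particular node.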

Ask questions to further eliminate the issue

  • Is it only DNS that is affected, or is all connectivity affected?
  • Are internal, external or all DNS queries failing?
      • Does the issue occur when resolving outside of the cluster?
      • node-local-dns will perform external lookups on behalf of pods
      • kube-dns will be used to resolve internal lookups
      • node-local-dns will cache successful queries (30s), and negative queries (5s) by default
  • Are all nodes and workloads experiencing the issue, or a specific node or workload?
      • Nodes use the upstream DNS configured in /etc/resolv.conf; queries failing from a node could indicate the issue is with upstream DNS
  • What is the error reported by applications?
      • If logs are aggregated, queries can be performed on the logs to identify timelines and impact
  • Is the issue intermittent or constantly occurring?
      • If the issue is intermittent, configure monitoring or a loop to identify when the issue occurs. When it does, are internal, external or all queries affected?
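
For intermittent issues, a simple loop can record when lookups fail and whether internal or external names are affected. This is a minimal sketch; probe_once is a hypothetical helper name, and it assumes the lookup tool in use exits non-zero on failure.

```shell
# probe_once (hypothetical helper): runs the given lookup command and prints a
# timestamped OK or FAIL line tagged with the supplied label.
probe_once() {
  probe_label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    probe_status="OK"
  else
    probe_status="FAIL"
  fi
  echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $probe_label $probe_status"
}

# Usage, from inside a pod that has nslookup available:
#   while true; do
#     probe_once internal nslookup kubernetes.default.svc.cluster.local
#     probe_once external nslookup google.com
#     sleep 5
#   done
```

Redirecting the loop output to a file gives a timeline that can be matched against application errors.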