kube-apiserver "socket: too many open files" error messages

Issue

During normal operation of a Kubernetes cluster, you may experience intermittent stability issues and the kube-apiserver logs may contain messages of the following format:

  • clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://x.x.x.x:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp x.x.x.x:2379: socket: too many open files". Reconnecting...
  • clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://x.x.x.x:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: context canceled". Reconnecting...
  • clientconn.go:1208] grpc: addrConn.createTransport failed to connect to {https://x.x.x.x:2379 <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: context deadline exceeded". Reconnecting...

Root Cause

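These symptoms can occur when the kube-apiserver reaches a configured limit on the number of files a process can have open. Sockets count as open files, so once the limit is reached the process can no longer open new connections to etcd. The same limit can also affect other components and OS services.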

This is typically a result of restrictive ulimits, or a high number of open connections.

Below is a non-exhaustive list of places where the number of open files ulimit can be set for a Docker container.

System ulimits (/etc/security/limits.conf):

This file defines the persisted configuration for the system-wide ulimits, such as file size limits, and how much memory can be used by the different components of the process, including the stack, data and text segments.

The limit of interest is the nofile limit, which defines the number of files a process can have open at any given time. This can be set per user or for all users (*), and there are two limits to define:

  • Soft limit - The limit that is actually enforced for a process. A user can raise or lower it within the range permitted by the hard limit by running the command ulimit -n X, where X is the desired new value.
  • Hard limit - The ceiling for the soft limit. It is set by the superuser and enforced by the kernel. Users cannot exceed it.

The nofile hard limit for the current user can be seen by running ulimit -Hn and the soft limit can be seen by running ulimit -Sn.
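The values printed by ulimit describe the current shell and the processes it starts. A process that is already running, such as kube-apiserver, keeps the limits it had when it was started. The following is a minimal sketch, assuming a Linux host with /proc mounted, that reads the limits the kernel holds for a given process. The helper name nofile_limits is illustrative and not part of any tool.

```shell
# Print the soft and hard "Max open files" values from a file in the
# /proc/<pid>/limits format. nofile_limits is a hypothetical helper.
nofile_limits() {
  awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "$1"
}

# Inspect the current shell as an example (skipped if /proc is unavailable).
if [ -r "/proc/$$/limits" ]; then
  nofile_limits "/proc/$$/limits"
  # Each entry in /proc/<pid>/fd is one open file descriptor.
  ls "/proc/$$/fd" | wc -l
fi
```

To inspect the kube-apiserver instead, replace $$ with its process ID. Comparing the number of open descriptors with the soft limit shows how close the process is to the error condition.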

More information is available in the limits.conf(5) man page.

Systemd configuration

By design, systemd does not apply the ulimits set in /etc/security/limits.conf to the services it starts, because that file is read by pam_limits for user sessions. Instead, systemd applies its own limits, which can be configured per service or system-wide.
The system-wide systemd nofile limit is defined in /etc/systemd/system.conf as DefaultLimitNOFILE=X:Y, where X is the soft limit and Y is the hard limit.
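As a quick way to see what the system-wide default is set to, the sketch below extracts DefaultLimitNOFILE from a systemd configuration file and splits it into its soft and hard parts. The function name default_limit_nofile is hypothetical, and the sketch only reads the single file it is given. It assumes that any drop-in files under /etc/systemd/system.conf.d/ are checked separately.

```shell
# Print the soft and hard values of DefaultLimitNOFILE from a systemd
# configuration file. default_limit_nofile is a hypothetical helper.
default_limit_nofile() {
  conf="${1:-/etc/systemd/system.conf}"
  # Commented-out lines (starting with '#') do not match this pattern.
  # If the setting appears more than once, this sketch uses the last one.
  value=$(sed -n 's/^[[:space:]]*DefaultLimitNOFILE=//p' "$conf" | tail -n 1)
  if [ -z "$value" ]; then
    echo "DefaultLimitNOFILE is not set in $conf"
    return 1
  fi
  # "X:Y" means soft:hard; a single value is used for both.
  echo "soft=${value%%:*} hard=${value##*:}"
}

# Example against the live configuration, if it is readable.
if [ -r /etc/systemd/system.conf ]; then
  default_limit_nofile /etc/systemd/system.conf || true
fi
```

If the setting is absent or commented out, systemd falls back to its built-in default, which varies between systemd versions.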

It is possible to set nofile for a specific service, either by defining LimitNOFILE within the service file itself or creating an override file. For example, defining it directly within the docker systemd service file (/lib/systemd/system/docker.service):

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
BindsTo=containerd.service
After=network-online.target firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket

[Service]
Type=notify
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always

LimitNOFILE=infinity

Or creating a systemd override file (/etc/systemd/system/docker.service.d/override.conf):

[Service]
LimitNOFILE=infinity

Note: The drop-in directory is named after the unit, so it is docker.service.d for docker.service. The location of the packaged unit file itself may differ slightly between Linux distributions. Creating an override is usually recommended, as it persists through system updates.
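An override only takes effect once systemd has reloaded its configuration and the service has been restarted. The sketch below writes the drop-in file and lists the follow-up commands as comments. The helper write_nofile_override, its optional root prefix, and the value 1048576 are assumptions made for illustration. systemd names drop-in directories after the unit, which is why the path uses docker.service.d.

```shell
# Write a LimitNOFILE drop-in for a systemd unit and print its path.
# write_nofile_override is a hypothetical helper. The optional third argument
# is a root prefix, so the function can be tried in a scratch directory first.
write_nofile_override() {
  unit="$1"
  limit="$2"
  root="${3:-}"
  dir="$root/etc/systemd/system/$unit.d"
  mkdir -p "$dir" || return 1
  printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/override.conf" || return 1
  echo "$dir/override.conf"
}

# Dry run into a scratch directory, then show the generated file.
scratch=$(mktemp -d)
cat "$(write_nofile_override docker.service 1048576 "$scratch")"

# On a real host, as root, apply and verify with:
#   write_nofile_override docker.service 1048576
#   systemctl daemon-reload
#   systemctl restart docker
#   systemctl show docker --property=LimitNOFILE
```

Restarting the Docker service may also restart running containers, so schedule the change accordingly.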

Note: On older versions of systemd, LimitNOFILE=infinity results in a limit of 65535. This was fixed by a commit that was merged in systemd v234.

Docker daemon configuration

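Docker can also enforce its own open file limits. For a specific container, pass --ulimit nofile=X:Y to docker run, where X is the soft limit and Y is the hard limit. To set a default for all containers, start the Docker daemon with the command line flag --default-ulimit nofile=X:Y.

The same default can be applied to all containers by specifying the limit within the /etc/docker/daemon.json configuration file: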

{
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 64000,
      "Soft": 64000
    }
  }
}
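To confirm which limits containers actually receive, it can help to read them from inside a throwaway container. The sketch below assumes Docker is installed and that an image with a POSIX shell is available. The busybox image and the helper name check_container_nofile are illustrative choices, not requirements.

```shell
# Print the soft and hard nofile limits seen inside a fresh container.
# check_container_nofile is a hypothetical helper; busybox is an assumed image.
check_container_nofile() {
  image="${1:-busybox}"
  # The single quotes keep the ulimit calls from running on the host;
  # they are evaluated by the shell inside the container.
  docker run --rm "$image" sh -c 'echo "soft=$(ulimit -Sn) hard=$(ulimit -Hn)"'
}

# Usage, on a host with Docker installed:
#   check_container_nofile
#       shows the limits containers get by default
#   docker run --rm --ulimit nofile=1024:2048 busybox sh -c 'ulimit -Sn; ulimit -Hn'
#       shows a per-container limit overriding that default
```

Running the check before and after changing daemon.json, with a restart of the Docker daemon in between, shows whether the new default has been picked up.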

Resolution

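If any non-default configuration applies nofile restrictions to Docker or to containers, revert it to the default configuration, or increase the limits, and re-test. New limits only apply to processes started after the change, so restart the affected services or containers before re-testing.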
