Missing cert when using additionalTrustedCAs

Issue

When the option additionalTrustedCAs=true is set on the Rancher Helm chart installation, the Rancher Deployment can get stuck while updating if the tls-ca-additional secret is missing from the cattle-system namespace.

Note: Certificates signed by a public root certificate authority (CA) such as Comodo, DigiCert, or GeoTrust do not need this setting. You only need it if you want Rancher to trust a private certificate authority, for example a Windows Enterprise Root CA server. The setting also applies only to the Rancher server pods; other services deployed by Rancher, such as Prometheus, are unaffected.

Example error message

Warning  FailedMount  8m19s                kubelet            Unable to attach or mount volumes: unmounted volumes=[tls-ca-additional-volume], unattached volumes=[rancher-token-zxwjx tls-ca-additional-volume audit-log]: timed out waiting for the condition
Warning  FailedMount  6m4s                 kubelet            Unable to attach or mount volumes: unmounted volumes=[tls-ca-additional-volume], unattached volumes=[audit-log rancher-token-zxwjx tls-ca-additional-volume]: timed out waiting for the condition
Warning  FailedMount  96s (x2 over 3m50s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[tls-ca-additional-volume], unattached volumes=[tls-ca-additional-volume audit-log rancher-token-zxwjx]: timed out waiting for the condition
Warning  FailedMount  6s (x13 over 10m)    kubelet            MountVolume.SetUp failed for volume "tls-ca-additional-volume" : secret "tls-ca-additional" not found
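
When several volumes are listed, the line that matters is the one naming the secret that was not found. As a minimal sketch, the helper below pulls that name out of event text; the function name missing_secrets is hypothetical and is not part of Rancher or kubectl.

```bash
# Print the unique secret names that kubelet reports as "not found" in
# FailedMount events. Reads event text on stdin.
missing_secrets() {
  sed -n 's/.*secret "\([^"]*\)" not found.*/\1/p' | sort -u
}
```

One way to use it, assuming kubectl access to the local cluster, is to pipe in the pod description: `kubectl -n cattle-system describe pod -l app=rancher | missing_secrets`.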

Pre-requisites

  • Running Rancher v2.x in HA
  • kubectl access to the Rancher local cluster
  • The additional root certificate authorities should be in PEM format.

Resolution

1) Verify secret is missing.

```bash
kubectl -n cattle-system get secret tls-ca-additional
```

Example output:

```bash
Error from server (NotFound): secrets "tls-ca-additional" not found
```

2) If the secret exists, back it up with the following commands, then delete it.

```bash
kubectl -n cattle-system get secret tls-ca-additional -o yaml > tls-ca-additional-bak.yaml
kubectl -n cattle-system delete secret tls-ca-additional
```
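
Steps 1 and 2 can be combined so that the backup and delete only run when the secret is actually present. The following is a sketch under that assumption; the function name backup_and_delete_secret and its optional backup-file argument are illustrative, not part of any Rancher tooling.

```bash
# Back up and delete the secret only if it exists; do nothing otherwise.
# Optional first argument: path of the backup file (hypothetical parameter).
backup_and_delete_secret() {
  ns="cattle-system"
  name="tls-ca-additional"
  backup_file="${1:-tls-ca-additional-bak.yaml}"

  if kubectl -n "$ns" get secret "$name" >/dev/null 2>&1; then
    # Stop before deleting if the backup could not be written.
    kubectl -n "$ns" get secret "$name" -o yaml > "$backup_file" || return 1
    kubectl -n "$ns" delete secret "$name"
  else
    echo "secret $name not found in $ns; nothing to back up"
  fi
}
```

Calling `backup_and_delete_secret` with no argument writes the backup to tls-ca-additional-bak.yaml in the current directory, matching the file name used above.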

3) Verify ca-additional.pem is formatted correctly. It should look like the following.

Example:

```bash
-----BEGIN CERTIFICATE-----
...
...
(Your Root certificate in PEM format)
...
...
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
...
...
(Additional Root certificate(s) in PEM format)
...
...
-----END CERTIFICATE-----
```
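
A quick structural check can catch a truncated or badly concatenated bundle before the secret is created. The sketch below only counts the BEGIN and END markers; it does not validate the certificates themselves, and the function name check_pem_bundle is hypothetical.

```bash
# Succeed only if the file has at least one certificate block and the
# BEGIN and END markers are balanced. Structure check only.
check_pem_bundle() {
  file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
  begins=$(grep -c -e '-----BEGIN CERTIFICATE-----' "$file" || true)
  ends=$(grep -c -e '-----END CERTIFICATE-----' "$file" || true)
  if [ "$begins" -ge 1 ] && [ "$begins" -eq "$ends" ]; then
    echo "$begins certificate block(s) found"
  else
    echo "malformed bundle: $begins BEGIN vs $ends END markers" >&2
    return 1
  fi
}
```

For example, `check_pem_bundle ca-additional.pem` reports the number of certificate blocks, or fails if a marker is missing.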

4) Create or re-create the secret. Run the command from the directory that contains ca-additional.pem.

```bash
kubectl -n cattle-system create secret generic tls-ca-additional --from-file=ca-additional.pem
```
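
After creating the secret, it can be worth confirming that it holds the same bundle as the local file. This sketch assumes the data key is ca-additional.pem, which is the key that `--from-file=ca-additional.pem` produces from the file name, and that a `base64` command supporting `-d` is available; the function name secret_matches_file is hypothetical.

```bash
# Compare the certificate bundle stored in the secret with a local file.
# Optional first argument: path to the local bundle.
secret_matches_file() {
  file="${1:-ca-additional.pem}"
  # The dot in the key name must be escaped in the jsonpath expression.
  stored=$(kubectl -n cattle-system get secret tls-ca-additional \
    -o jsonpath='{.data.ca-additional\.pem}' | base64 -d) || return 1
  if [ "$stored" = "$(cat "$file")" ]; then
    echo "secret matches $file"
  else
    echo "secret differs from $file" >&2
    return 1
  fi
}
```

Running `secret_matches_file` from the directory that holds ca-additional.pem prints a confirmation, or returns a non-zero status if the contents differ.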

5) Delete the failing Rancher pods. NOTE: The Rancher pods will recover on their own, but this can take some time (5 to 15 minutes).

```bash
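# List Rancher pods whose STATUS column is not "Running" and delete each one.
# The default output is used because -o name omits the STATUS column.
for pod in $(kubectl -n cattle-system get pod --no-headers -l app=rancher | awk '$3 != "Running" {print $1}')
do
  kubectl -n cattle-system delete pod "$pod"
done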
```

6) Verify new Rancher pods come online successfully.

```bash
kubectl -n cattle-system rollout status deploy/rancher
kubectl -n cattle-system get pod -l app=rancher
```
