CronJobs fail to run in a Kubernetes v1.14 cluster with more than 500 Job resources: "expected type *batchv1.JobList, got type *internalversion.List"


Issue

In a Kubernetes v1.14 cluster, CronJobs fail to run when there are more than 500 Job resources in the cluster. The Kubernetes controller manager logs show errors of the following form:

{"log":"E0818 18:25:50.081946 1 cronjob_controller.go:117] expected type *batchv1.JobList, got type *internalversion.List\n","stream":"stderr","time":"2019-08-18T18:25:50.082127727Z"}
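To confirm that a cluster is hitting this issue, the controller manager logs can be scanned for the error text quoted above. The following is a minimal sketch, assuming the log lines are either Docker json-file records (as in the sample) or plain text; the function names are hypothetical and not part of any Kubernetes or Rancher tooling.

```python
import json

# Error text quoted in the article's sample controller manager log line.
ERROR_TEXT = "expected type *batchv1.JobList, got type *internalversion.List"


def is_joblist_type_error(raw_line: str) -> bool:
    """Return True if a log line carries the CronJob controller error.

    Accepts a Docker json-file record (a JSON object with a "log" key,
    as in the article's sample) or a plain text log line.
    """
    try:
        record = json.loads(raw_line)
    except ValueError:
        # Not JSON: treat the whole line as the message.
        message = raw_line
    else:
        if isinstance(record, dict):
            message = str(record.get("log", ""))
        else:
            message = raw_line
    return "cronjob_controller.go" in message and ERROR_TEXT in message


def count_matching_lines(lines) -> int:
    """Count how many lines in an iterable of strings carry the error."""
    return sum(1 for line in lines if is_joblist_type_error(line))
```

For example, passing an open log file to count_matching_lines gives the number of times the error was logged; a non-zero count on an affected Kubernetes version points to this issue.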

Pre-requisites

  • A Kubernetes cluster running a v1.14 release from v1.14.0 through v1.14.6
  • More than 500 Job resources in the cluster
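The version condition above can be checked programmatically. This is a rough sketch, assuming the version string looks like v1.14.6, optionally followed by a suffix; parse_version and is_affected are hypothetical helper names, and the check only covers the v1.14.0 through v1.14.6 range that this article lists.

```python
import re

AFFECTED_MINOR = (1, 14)
LAST_AFFECTED_PATCH = 6  # the article lists v1.14.0 through v1.14.6 as affected


def parse_version(version: str) -> tuple:
    """Parse 'v1.14.6' or '1.14.6-suffix' into (major, minor, patch)."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True if the version falls in the range the article lists."""
    major, minor, patch = parse_version(version)
    return (major, minor) == AFFECTED_MINOR and patch <= LAST_AFFECTED_PATCH
```

The server version reported by kubectl version can be passed to is_affected; any suffix after the patch number is ignored.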

Workaround

To mitigate the issue, ensure that there are fewer than 500 Job resources in the cluster. You can view all Jobs with kubectl get jobs --all-namespaces -o wide.
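Counting the Jobs by hand from the kubectl output is tedious on a busy cluster. The sketch below is one possible way to automate it, assuming kubectl is on the PATH and configured for the cluster; it uses -o json rather than -o wide so the output can be parsed. The function names and the JOB_THRESHOLD constant are illustrative only.

```python
import json
import subprocess

JOB_THRESHOLD = 500  # figure taken from this article


def count_jobs(job_list: dict) -> int:
    """Count the items in a parsed 'kubectl get jobs -o json' document."""
    return len(job_list.get("items", []))


def exceeds_threshold(job_list: dict, threshold: int = JOB_THRESHOLD) -> bool:
    """Return True when the cluster holds more Jobs than the threshold."""
    return count_jobs(job_list) > threshold


def fetch_job_list() -> dict:
    """Fetch all Jobs in all namespaces via kubectl and parse the JSON.

    Assumes kubectl is installed and has access to the cluster.
    """
    result = subprocess.run(
        ["kubectl", "get", "jobs", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling exceeds_threshold(fetch_job_list()) against a live cluster reports whether the Job count is above the figure at which the issue appears.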

Delete completed Jobs to bring the total below 500. You can also check and adjust the job history limits configured on each CronJob (.spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit) to reduce the number of finished Jobs that each CronJob retains, per the Kubernetes documentation.
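The cleanup described above could be scripted along the following lines. This is a sketch under stated assumptions: the input is a parsed kubectl get jobs --all-namespaces -o json document, a Job counts as completed when it has a Complete condition with status True, and the history limit values of 1 are illustrative rather than recommended. All function names are hypothetical.

```python
import json


def is_complete(job: dict) -> bool:
    """Return True if the Job has a 'Complete' condition with status 'True'."""
    conditions = job.get("status", {}).get("conditions") or []
    return any(
        cond.get("type") == "Complete" and cond.get("status") == "True"
        for cond in conditions
    )


def completed_jobs(job_list: dict) -> list:
    """List (namespace, name) pairs for every completed Job."""
    return [
        (job["metadata"]["namespace"], job["metadata"]["name"])
        for job in job_list.get("items", [])
        if is_complete(job)
    ]


def delete_commands(job_list: dict) -> list:
    """Build one 'kubectl delete job' argument list per completed Job.

    The commands are returned, not executed, so they can be reviewed first.
    """
    return [
        ["kubectl", "delete", "job", name, "--namespace", namespace]
        for namespace, name in completed_jobs(job_list)
    ]


def history_limit_patch(successful: int = 1, failed: int = 1) -> str:
    """Build a JSON patch body that sets both CronJob history limits.

    The default values here are examples only.
    """
    return json.dumps(
        {
            "spec": {
                "successfulJobsHistoryLimit": successful,
                "failedJobsHistoryLimit": failed,
            }
        }
    )
```

Each list from delete_commands can be passed to subprocess.run once reviewed, and the string from history_limit_patch can be supplied to kubectl patch cronjob with the -p flag for each CronJob whose limits should be lowered.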

Resolution

The issue was tracked in Kubernetes GitHub Issue #77465 and a fix was released in Kubernetes v1.14.7.

Kubernetes v1.14 patch releases that include this fix (v1.14.7 or above) are available in Rancher v2.2, starting with v2.2.9 (which provides v1.14.8), and in Rancher v2.3, starting with v2.3.0 (which provides v1.14.7). Similarly, the fix is available via the RKE CLI in v0.2, starting with v0.2.9 (v1.14.8), and in v0.3, starting with v0.3.0 (v1.14.7).
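The release information above can be captured in a small lookup. The sketch below encodes only the Rancher and RKE release lines named in this article and returns None for any other line instead of guessing; offers_fixed_kubernetes and FIRST_FIXED are hypothetical names.

```python
import re

# First patch release on each minor line that, per this article,
# offers a Kubernetes v1.14 version containing the fix.
FIRST_FIXED = {
    "rancher": {(2, 2): 9, (2, 3): 0},
    "rke": {(0, 2): 9, (0, 3): 0},
}


def offers_fixed_kubernetes(product: str, version: str):
    """Return True/False for release lines the article names, else None."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    first_fixed_patch = FIRST_FIXED[product.lower()].get((major, minor))
    if first_fixed_patch is None:
        return None  # release line not covered by the article
    return patch >= first_fixed_patch
```

A True result only means that a fixed Kubernetes version is available in that release; the cluster itself still has to be upgraded to v1.14.7 or above.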
