Maintenance task pods - juancamilocc/virtual_resources GitHub Wiki

Cleanup Pods Maintenance Task

In this guide, you will learn how to implement a CronJob to clean up evicted, succeeded, and failed pods in a Kubernetes environment. This task can be included in your daily checks and contributes to automated cluster health verification processes. All files can be found in this repository.

Bash executable

The CronJob runs the following script. You can adjust the age threshold used to filter succeeded pods (300 seconds, that is 5 minutes, in this example). I recommend keeping this value low.

#!/bin/bash
now=$(date +%s)

# Select failed or evicted pods of any age, plus succeeded pods created more than 5 minutes ago
kubectl get pods -A -o json | jq -r --argjson now "$now" '
  .items[]
  | select(
      .status.phase == "Failed" or
      .status.reason == "Evicted" or
      (
        .status.phase == "Succeeded" and
        (($now - ((.metadata.creationTimestamp | fromdateiso8601) | tonumber)) > 300)
      )
    )
  | "\(.metadata.namespace) \(.metadata.name)"
' | while read -r namespace name; do
    kubectl -n "$namespace" delete pod "$name"
  done
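
Before letting the script delete anything, it can help to preview what it would do. The following is a minimal sketch, not part of the original repository: `MAX_AGE_SECONDS`, `DRY_RUN`, `is_expired`, and `delete_pod` are hypothetical names introduced here for illustration. `is_expired` mirrors the age comparison made in the jq filter, and `delete_pod` only prints the intended action unless dry-run mode is switched off.

```shell
#!/bin/bash
# MAX_AGE_SECONDS and DRY_RUN are hypothetical settings, not part of the original script.
MAX_AGE_SECONDS="${MAX_AGE_SECONDS:-300}"
DRY_RUN="${DRY_RUN:-true}"

# Succeed when a pod created at epoch second $1 is older than the
# threshold when evaluated at epoch second $2 (same test as the jq filter).
is_expired() {
  local created="$1" now="$2"
  [ $(( now - created )) -gt "$MAX_AGE_SECONDS" ]
}

# Print the intended deletion in dry-run mode; otherwise delete the pod.
delete_pod() {
  local namespace="$1" name="$2"
  if [ "$DRY_RUN" = "true" ]; then
    echo "would delete pod $name in namespace $namespace"
  else
    kubectl -n "$namespace" delete pod "$name"
  fi
}
```

Under these assumptions, the loop body of the script would call `delete_pod "$namespace" "$name"`, and the threshold could be handed to jq with `--argjson max_age "$MAX_AGE_SECONDS"` in place of the literal 300.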

RBAC Permissions

Define the RBAC permissions required for the CronJob to list and delete pod resources.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: cleanup-pods-sa
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cleanup-pods-role  
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "get", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cleanup-pods-rolebinding
subjects:
- kind: ServiceAccount
  name: cleanup-pods-sa
  namespace: default 
roleRef:
  kind: ClusterRole
  name: cleanup-pods-role
  apiGroup: rbac.authorization.k8s.io

Apply the configuration.

kubectl apply -f rbac.yaml
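
To confirm that the binding took effect, the ServiceAccount's permissions can be checked with `kubectl auth can-i` and impersonation. This is a minimal sketch: the function names `sa_subject` and `check_cleanup_permissions` are illustrative, and it assumes your own user is allowed to impersonate ServiceAccounts.

```shell
#!/bin/bash
# Build the impersonation subject for a ServiceAccount: namespace $1, name $2.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Ask the API server whether the ServiceAccount may perform each verb
# on pods in all namespaces; kubectl prints "yes" or "no" per verb.
check_cleanup_permissions() {
  local subject verb
  subject="$(sa_subject default cleanup-pods-sa)"
  for verb in list get delete; do
    printf '%s pods: ' "$verb"
    kubectl auth can-i "$verb" pods --all-namespaces --as="$subject"
  done
}
```

Calling `check_cleanup_permissions` against the cluster should report "yes" for each of the three verbs granted by the ClusterRole.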

CronJob Manifest

apiVersion: batch/v1
kind: CronJob
metadata:
  name: cleanup-pods
  namespace: default
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 0
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: cleanup-pods-sa
          containers:
          - command:
            - /bin/bash
            - -c
            - |
              now=$(date +%s)
              kubectl get pods -A -o json | jq -r --argjson now "$now" '
                .items[]
                | select(
                    .status.phase == "Failed" or
                    .status.reason == "Evicted" or
                    (
                      .status.phase == "Succeeded" and
                      (($now - ((.metadata.creationTimestamp | fromdateiso8601) | tonumber)) > 300)
                    )
                  )
                | "\(.metadata.namespace) \(.metadata.name)"
              ' | while read -r namespace name; do
                  kubectl -n "$namespace" delete pod "$name"
                done
              sleep 60s 
            image: bitnami/kubectl:latest
            imagePullPolicy: IfNotPresent
            name: kubectl
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: "30 7 * * *"
  successfulJobsHistoryLimit: 0
  suspend: false

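NOTE: Adjust the schedule field according to your needs. The default configuration runs every day at 7:30 AM, interpreted in the time zone of the kube-controller-manager (commonly UTC) unless spec.timeZone is set.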
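
The schedule uses the standard five-field cron format. As a small illustration of how the fields are read, the sketch below splits an expression into its named parts; `describe_schedule` is a hypothetical helper written for this page, not a Kubernetes tool.

```shell
#!/bin/bash
# Split a five-field cron expression into its named parts.
describe_schedule() {
  local minute hour day_of_month month day_of_week
  read -r minute hour day_of_month month day_of_week <<< "$1"
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$day_of_month" "$month" "$day_of_week"
}

describe_schedule "30 7 * * *"
# minute=30 hour=7 day-of-month=* month=* day-of-week=*
```

Changing the expression to `*/30 * * * *`, for example, would run the cleanup every 30 minutes instead of once a day.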

Apply it.

kubectl apply -f cronjob.yaml

Testing the CronJob

To test it, create a one-off Job from the CronJob and check that its pod starts:

kubectl create job --from=cronjobs/cleanup-pods cleanup-pods
kubectl get pods
# cleanup-pods-57b74           1/1     Running   0          4s

Inspect the logs.

kubectl logs cleanup-pods-57b74 -f
# pod "pod-7d6f8b76bc-6tdmk" deleted
# pod "pod-7d6f8b76bc-kpsmw" deleted
# pod "pod-7d6f8b76bc-ms6zx" deleted
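
The manual test can also be wrapped into a single helper that creates the Job, waits for it, prints its logs, and removes it afterwards. This is a sketch under assumptions: `run_manual_cleanup` and the `KUBECTL` override are hypothetical names, the default Job name `cleanup-pods-manual` is arbitrary, and the 180 second timeout is a guess that allows for the `sleep 60s` at the end of the container command.

```shell
#!/bin/bash
# KUBECTL is a hypothetical override; set KUBECTL=echo to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# Create a one-off Job from the CronJob, wait for completion,
# show its logs, then delete the Job regardless of the outcome.
run_manual_cleanup() {
  local job_name="${1:-cleanup-pods-manual}"
  "$KUBECTL" create job "$job_name" --from=cronjob/cleanup-pods -n default &&
    "$KUBECTL" wait --for=condition=complete "job/$job_name" -n default --timeout=180s &&
    "$KUBECTL" logs "job/$job_name" -n default
  "$KUBECTL" delete job "$job_name" -n default
}
```

Running `KUBECTL=echo run_manual_cleanup` prints the four kubectl invocations without contacting a cluster, which is a cheap way to review them first.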

Conclusions

This process enables the automated cleanup of evicted, failed, and old succeeded pods through a Kubernetes CronJob, helping maintain a cleaner and healthier cluster without manual intervention. It relies on a script that identifies and removes pods based on their status and age, supported by the required RBAC permissions that allow the CronJob to list and delete pod resources. Once configured, the CronJob can be scheduled according to operational needs and manually tested to ensure proper functionality before being used in production.