How to configure liveness and readiness - cniackz/public GitHub Wiki

Objective:

To be able to talk to /minio/health/live and check whether the MinIO Server is alive.

User Story:

This is especially useful when a node goes down and the MinIO Server is dead. If a Pod is dead we want to know it, so that the other servers can work without it.

  • If no liveness probe is configured this won't work: bringing down a pod would still route requests to the downed pod.
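
To check the endpoint by hand, a quick sketch (assuming MinIO is reachable on localhost:9000; adjust the host and port for your tenant, and use https if TLS is enabled):

```sh
# Returns HTTP 200 when the MinIO server is alive
curl -i http://localhost:9000/minio/health/live
```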

Tenant Specification:

spec:
  liveness:
    initialDelaySeconds: 60
    periodSeconds: 20
    tcpSocket:
      port: 9000
  readiness:
    tcpSocket:
      port: 9000
    initialDelaySeconds: 60
    periodSeconds: 20
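
The tcpSocket probes above only verify that port 9000 accepts TCP connections. To actually hit the /minio/health/live endpoint from the objective, an httpGet probe can be used instead — a sketch, assuming plain HTTP on port 9000 (use `scheme: HTTPS` if TLS is enabled on the tenant):

```yaml
spec:
  liveness:
    httpGet:
      path: /minio/health/live
      port: 9000
    initialDelaySeconds: 60
    periodSeconds: 20
```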

StatefulSet Values:

          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 20
            successThreshold: 1
            tcpSocket:
              port: 9000
            timeoutSeconds: 1
          readinessProbe:
            failureThreshold: 3
            initialDelaySeconds: 60
            periodSeconds: 20
            successThreshold: 1
            tcpSocket:
              port: 9000
            timeoutSeconds: 1

MinIO Pods:

      livenessProbe:
        tcpSocket:
          port: 9000
        initialDelaySeconds: 60
        timeoutSeconds: 1
        periodSeconds: 20
        successThreshold: 1
        failureThreshold: 3
      readinessProbe:
        tcpSocket:
          port: 9000
        initialDelaySeconds: 60
        timeoutSeconds: 1
        periodSeconds: 20
        successThreshold: 1
        failureThreshold: 3
  • As per Harsha, we shouldn't be using readiness anymore, just liveness; and sometimes the cure is to remove all probes to get pods back.
  • We observed an issue with a StatefulSet where a single node going down for maintenance and then coming back put all pods into crashloop, because the liveness probe kept restarting all pods over and over. The fix was to remove the probes and let MinIO run; once MinIO was running we put the liveness probe back.
  • I believe proper values for readiness and liveness should also work, which is why I recorded the values above: if I find a similar case again where people are not using the Operator, I know the Kubernetes values for the StatefulSet with regard to MinIO.
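
Removing the probes from a StatefulSet, as described above, can be done with a JSON patch — a sketch, assuming a StatefulSet named myminio-pool-0 in namespace tenant-lite with MinIO as the first container (both the names and the container index are assumptions; note that when the Operator manages the tenant it may reconcile the probes back):

```sh
kubectl patch statefulset myminio-pool-0 -n tenant-lite --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/livenessProbe"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/readinessProbe"}
]'
```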

Verification:

  • the cordoned pod must not have an entry
  • a pod whose liveness probe failed must not have an entry
$ kubectl get endpoints myminio-hl -n tenant-lite -o json | jq -r .subsets[].addresses[].nodeName
kind-worker2
kind-worker
kind-worker4
kind-worker3

Taint a node:

kubectl taint nodes kind-worker key1=value1:NoSchedule    # add the taint
kubectl taint nodes kind-worker key1=value1:NoSchedule-   # remove the taint (note the trailing dash)
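
To confirm whether the taint is currently set on the node, a sketch:

```sh
kubectl describe node kind-worker | grep -i taint
```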

Conclusion:

  • mc can get frozen if no readiness and liveness probes are added; we saw that already in OpenShift and it can be reproduced in kind/k8s
  • I think Readiness does the trick:
// Readiness Probe for container readiness. Container will be removed from service endpoints if the probe fails.