Push MinIO to a breaking point and make sure things recover appropriately

Objective:

To push MinIO to a breaking point and make sure things recover appropriately.

Steps:

  1. In a real Kubernetes cluster, deploy a Tenant, for example:
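A minimal sketch of one way to do this, assuming the MinIO Operator is already installed and using the tenant-lite kustomize example from the minio/operator repository (which matches the `tenant-lite` namespace seen in the outputs below):

```
kubectl apply -k github.com/minio/operator/examples/kustomization/tenant-lite
```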
  2. Once deployed, limit the tenant's resources so that it can be broken with fewer transactions (optional, for testing):
```yaml
apiVersion: minio.min.io/v2
kind: Tenant
spec:
  pools:
  - name: pool-0
    resources:
      limits:
        cpu: "1"
        memory: 1Gi
      requests:
        cpu: "1"
        memory: 1Gi
```
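A sketch of applying these limits to a running tenant, assuming it is named `myminio` in the `tenant-lite` namespace as in the outputs below; the operator will roll the pods after the change:

```
# Edit the Tenant and set the pool resources shown above
kubectl -n tenant-lite edit tenant myminio

# Verify that the limits landed on the pool
kubectl -n tenant-lite get tenant myminio -o jsonpath='{.spec.pools[0].resources}'
```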
  3. Deploy as many Ubuntu pods as needed in the tenant's namespace:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
  labels:
    app: ubuntu
spec:
  volumes:
    - name: socket
      hostPath:
        path: /run/containerd/containerd.sock
  containers:
  - volumeMounts:
      - mountPath: /run/containerd/containerd.sock
        name: socket
        readOnly: false
    image: ubuntu
    command:
      - "sleep"
      - "604800"
    imagePullPolicy: IfNotPresent
    name: ubuntu
  restartPolicy: Always
```
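A sketch for creating the pods, assuming the manifest above is saved as `ubuntu.yaml`; the second command just renames the pod so several copies can run side by side:

```
kubectl -n tenant-lite apply -f ubuntu.yaml
sed 's/name: ubuntu/name: ubuntu2/' ubuntu.yaml | kubectl -n tenant-lite apply -f -

# Open a shell in a pod to run the stress commands from the next step
kubectl -n tenant-lite exec -it ubuntu -- bash
```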
  4. On each Ubuntu pod, install the needed packages and then stress the system by copying several files at the same time:

Use the IP address of a single MinIO pod to break that one in particular, or use the API (service) URL to spread the load and break them all.

```
# Install the MinIO client (mc)
apt update
apt install -y wget
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
mv mc /usr/local/bin/mc

# Point mc at a single MinIO pod by IP (or at the service URL)
mc alias set myminio https://192.168.93.73:9000 minio minio123 --insecure
mc mb myminio/testing --insecure

# Spawn 1000 background loops, each copying the same small file forever
echo "a" > a.txt
for i in {1..1000}
do
  while true; do mc cp a.txt myminio/testing/a.txt --insecure; done &
done
```
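Later, to stop the stress load, you can kill the background loops from the same shell; a minimal sketch:

```
# Kill all background jobs started by the loop above
kill $(jobs -p)
```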
  5. Notice how the container restarts due to the load coming from the Ubuntu pods:
```
$ k get pods -n tenant-lite -o wide
NAME               READY   STATUS    RESTARTS      AGE     IP                NODE          NOMINATED NODE   READINESS GATES
myminio-pool-0-0   2/2     Running   2 (35s ago)   15m     192.168.93.88     minio-k8s17   <none>           <none>
myminio-pool-0-1   2/2     Running   0             41m     192.168.239.250   minio-k8s18   <none>           <none>
myminio-pool-0-2   2/2     Running   0             41m     192.168.13.242    minio-k8s20   <none>           <none>
myminio-pool-0-3   2/2     Running   0             41m     192.168.177.29    minio-k8s19   <none>           <none>
ubuntu             1/1     Running   0             4m35s   192.168.93.86     minio-k8s17   <none>           <none>
ubuntu2            1/1     Running   0             4m28s   192.168.93.81     minio-k8s17   <none>           <none>
```
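To confirm why the container restarted (with a 1Gi memory limit, OOMKilled is the typical reason), you can inspect the last terminated state; a sketch assuming the pod names above:

```
# Prints the last termination reason for each container in the pod
kubectl -n tenant-lite get pod myminio-pool-0-0 \
  -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'
```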
  6. And check how the CPU and memory usage approach the configured limits:
```
$ kubectl top pod myminio-pool-0-0 -n tenant-lite
NAME               CPU(cores)   MEMORY(bytes)
myminio-pool-0-0   997m         643Mi
```
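To watch usage climb toward the limits while the load runs, per-container metrics are one option; a sketch:

```
# Per-container CPU/memory for the whole tenant namespace
kubectl top pods -n tenant-lite --containers
```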
  7. After the load is stopped, the container will restart one last time and get back to normal, with no further restarts since the stress load has stopped.

At this point, you can delete the pod to clear the restart count or just keep it as is.

```
$ k get pods -n tenant-lite -o wide
NAME               READY   STATUS    RESTARTS      AGE   IP                NODE          NOMINATED NODE   READINESS GATES
myminio-pool-0-0   2/2     Running   3 (56s ago)   17m   192.168.93.88     minio-k8s17   <none>           <none>
myminio-pool-0-1   2/2     Running   0             43m   192.168.239.250   minio-k8s18   <none>           <none>
myminio-pool-0-2   2/2     Running   0             43m   192.168.13.242    minio-k8s20   <none>           <none>
myminio-pool-0-3   2/2     Running   0             44m   192.168.177.29    minio-k8s19   <none>           <none>
```
  - The Kubernetes error message for this will be: `Back-off restarting failed container`
  - While the pod is restarting and not reachable, applications will see errors like: `dial tcp 192.168.93.73:9000: connect: connection refused`
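One way to see those back-off events in the cluster itself, as a sketch (assuming the pod name from above):

```
kubectl -n tenant-lite get events \
  --field-selector involvedObject.name=myminio-pool-0-0 \
  --sort-by=.lastTimestamp
```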
  8. So we delete the pod to clear the container restart count and then make sure we can still communicate normally:
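The restart counter belongs to the pod instance, so deleting the pod lets the StatefulSet recreate it with a clean counter; a minimal sketch:

```
kubectl -n tenant-lite delete pod myminio-pool-0-0
```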
```
$ k get pods -n tenant-lite -o wide
NAME               READY   STATUS    RESTARTS   AGE   IP                NODE          NOMINATED NODE   READINESS GATES
myminio-pool-0-0   2/2     Running   0          43s   192.168.93.87     minio-k8s17   <none>           <none>
myminio-pool-0-1   2/2     Running   0          46m   192.168.239.250   minio-k8s18   <none>           <none>
myminio-pool-0-2   2/2     Running   0          46m   192.168.13.242    minio-k8s20   <none>           <none>
myminio-pool-0-3   2/2     Running   0          46m   192.168.177.29    minio-k8s19   <none>           <none>
```

This time, point `mc` at the tenant's service instead of a single pod IP:

```
apt update
apt install -y wget
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
mv mc /usr/local/bin/mc
mc alias set myminio https://minio.tenant-lite.svc.cluster.local minio minio123 --insecure
```

And indeed we can:

```
root@ubuntu:/# mc admin info myminio
●  myminio-pool-0-0.myminio-hl.tenant-lite.svc.cluster.local:9000
   Uptime: 2 minutes
   Version: 2023-03-24T21:41:23Z
   Network: 4/4 OK
   Drives: 2/2 OK
   Pool: 1

●  myminio-pool-0-1.myminio-hl.tenant-lite.svc.cluster.local:9000
   Uptime: 47 minutes
   Version: 2023-03-24T21:41:23Z
   Network: 4/4 OK
   Drives: 2/2 OK
   Pool: 1

●  myminio-pool-0-2.myminio-hl.tenant-lite.svc.cluster.local:9000
   Uptime: 48 minutes
   Version: 2023-03-24T21:41:23Z
   Network: 4/4 OK
   Drives: 2/2 OK
   Pool: 1

●  myminio-pool-0-3.myminio-hl.tenant-lite.svc.cluster.local:9000
   Uptime: 48 minutes
   Version: 2023-03-24T21:41:23Z
   Network: 4/4 OK
   Drives: 2/2 OK
   Pool: 1

Pools:
   1st, Erasure sets: 1, Drives per erasure set: 8

2 B Used, 1 Bucket, 1 Object
8 drives online, 0 drives offline
```
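As a final sanity check that reads and writes still work end to end, a sketch (the object name is just an example):

```
echo "ok" > recovery-check.txt
mc cp recovery-check.txt myminio/testing/ --insecure
mc ls myminio/testing --insecure
```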

Conclusion:

You can push MinIO to its limit simply by performing many transactions at the same time. Under enough load, the container will restart, and once the load stops, things get back to normal.
