Push MinIO to a breaking point and make sure things recover appropriately
The objective is to push MinIO to a breaking point and make sure it recovers appropriately.
- In a real Kubernetes cluster, deploy a Tenant; one possible way is sketched below.
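If you do not already have a Tenant, one convenient option (an assumption here, not a requirement of this test) is the Operator's tenant-lite kustomization, which creates a Tenant named myminio in the tenant-lite namespace, matching the names used in the outputs below:
# One possible way; the path is assumed from the MinIO Operator examples and requires the Operator to be installed already.
kubectl apply -k "github.com/minio/operator/examples/kustomization/tenant-lite"
# Watch the Tenant pods come up:
kubectl get pods -n tenant-lite -w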
- Once deployed, limit the resources of the Tenant so that we can break it more easily with fewer transactions (optional, for testing); one way to apply the change is sketched after the snippet:
apiVersion: minio.min.io/v2
kind: Tenant
spec:
  pools:
    - name: pool-0
      resources:
        limits:
          cpu: "1"
          memory: 1Gi
        requests:
          cpu: "1"
          memory: 1Gi
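One way to apply these limits (a sketch; it assumes the Tenant is named myminio in the tenant-lite namespace, as in the outputs below) is to edit the Tenant resource and let the Operator roll the pool pods:
# Opens the Tenant in your editor; set spec.pools[0].resources as shown above.
kubectl edit tenant myminio -n tenant-lite
# The Operator restarts the pool pods with the new limits; watch them roll:
kubectl get pods -n tenant-lite -w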
- Deploy as many Ubuntu Pods as needed in the tenant's namespace:
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu
  labels:
    app: ubuntu
spec:
  volumes:
    - name: socket
      hostPath:
        path: /run/containerd/containerd.sock
  containers:
    - volumeMounts:
        - mountPath: /run/containerd/containerd.sock
          name: socket
          readOnly: false
      image: ubuntu
      command:
        - "sleep"
        - "604800"
      imagePullPolicy: IfNotPresent
      name: ubuntu
  restartPolicy: Always
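To create several copies of this Pod (the walkthrough below uses ubuntu and ubuntu2), a small loop over the manifest is enough; this sketch assumes the YAML above is saved as ubuntu-pod.yaml (a hypothetical filename):
# Create the first pod as-is, then a second copy under a different name.
kubectl apply -f ubuntu-pod.yaml -n tenant-lite
sed 's/name: ubuntu/name: ubuntu2/' ubuntu-pod.yaml | kubectl apply -n tenant-lite -f -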
- On each Ubuntu pod, install the needed packages and then stress the system by copying several files at the same time. Use the IP address of one MinIO pod to break that pod in particular, or use the Tenant's API URL to spread the load across all of them.
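To get a shell inside one of the Ubuntu pods (assuming the pod names above), something like this works:
# Open an interactive shell in the stress pod; repeat for ubuntu2, etc.
kubectl exec -it ubuntu -n tenant-lite -- bash
From that shell, run: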
apt update
apt install -y wget
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
mv mc /usr/local/bin/mc
mc alias set myminio https://192.168.93.73:9000 minio minio123 --insecure
mc mb myminio/testing --insecure
touch a.txt
echo "a" > a.txt
for i in {1..1000}
do
while true; do mc cp a.txt myminio/testing/a.txt --insecure; done &
done
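The loop above leaves many background copy jobs running; when you want the load to stop, a minimal way (from the same shell where the loops were started, or from outside by removing the stress pods) is:
# Inside the Ubuntu pod: kill all background copy loops started above.
kill $(jobs -p) 2>/dev/null
# Or from outside: delete the stress pods entirely.
kubectl delete pod ubuntu ubuntu2 -n tenant-lite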
- Notice how the container restarts due to the load coming from the ubuntu pods:
$ k get pods -n tenant-lite -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myminio-pool-0-0 2/2 Running 2 (35s ago) 15m 192.168.93.88 minio-k8s17 <none> <none>
myminio-pool-0-1 2/2 Running 0 41m 192.168.239.250 minio-k8s18 <none> <none>
myminio-pool-0-2 2/2 Running 0 41m 192.168.13.242 minio-k8s20 <none> <none>
myminio-pool-0-3 2/2 Running 0 41m 192.168.177.29 minio-k8s19 <none> <none>
ubuntu 1/1 Running 0 4m35s 192.168.93.86 minio-k8s17 <none> <none>
ubuntu2 1/1 Running 0 4m28s 192.168.93.81 minio-k8s17 <none> <none>
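To see why the container restarted (for example an OOMKill against the 1Gi memory limit, or a failed liveness probe), kubectl describe shows the last state and recent events; a sketch, assuming the pod name from the output above:
# Inspect the restarting pod: last state, exit reason, and recent events.
kubectl describe pod myminio-pool-0-0 -n tenant-lite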
- And check how CPU and memory usage are pushed up to the configured limits:
$ kubectl top pod myminio-pool-0-0 -n tenant-lite
NAME CPU(cores) MEMORY(bytes)
myminio-pool-0-0 997m 643Mi
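To follow the usage continuously while the copy loops run (this assumes metrics-server is installed, which kubectl top requires in any case):
# Per-container breakdown of the pod's usage.
kubectl top pod myminio-pool-0-0 -n tenant-lite --containers
# Refresh the whole namespace every few seconds while the load runs.
watch -n 5 kubectl top pod -n tenant-lite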
- After the load is stopped, the container restarts one more time and then returns to normal, with no further restarts since the stress load is gone.
At this point, you can delete the pod to clear the restart count or just keep it as is.
$ k get pods -n tenant-lite -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myminio-pool-0-0 2/2 Running 3 (56s ago) 17m 192.168.93.88 minio-k8s17 <none> <none>
myminio-pool-0-1 2/2 Running 0 43m 192.168.239.250 minio-k8s18 <none> <none>
myminio-pool-0-2 2/2 Running 0 43m 192.168.13.242 minio-k8s20 <none> <none>
myminio-pool-0-3 2/2 Running 0 44m 192.168.177.29 minio-k8s19 <none> <none>
- The Kubernetes error message for this is:
Back-off restarting failed container
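You can confirm this from the pod's events; one way, assuming the same pod name as above:
# List recent events for the affected pod; look for the back-off message.
kubectl get events -n tenant-lite --field-selector involvedObject.name=myminio-pool-0-0 --sort-by=.lastTimestamp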
- While the pod is restarting and not reachable, the application errors will look like:
dial tcp 192.168.93.73:9000: connect: connection refused
- So we delete the pod to clear the container restart count (the StatefulSet recreates it right away) and then make sure we can still communicate normally:
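A minimal sketch of that step, assuming the same pod and namespace names as above:
# Delete the pod; it is recreated automatically with a fresh restart count.
kubectl delete pod myminio-pool-0-0 -n tenant-lite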
$ k get pods -n tenant-lite -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
myminio-pool-0-0 2/2 Running 0 43s 192.168.93.87 minio-k8s17 <none> <none>
myminio-pool-0-1 2/2 Running 0 46m 192.168.239.250 minio-k8s18 <none> <none>
myminio-pool-0-2 2/2 Running 0 46m 192.168.13.242 minio-k8s20 <none> <none>
myminio-pool-0-3 2/2 Running 0 46m 192.168.177.29 minio-k8s19 <none> <none>
apt update
apt install -y wget
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
mv mc /usr/local/bin/mc
mc alias set myminio https://minio.tenant-lite.svc.cluster.local minio minio123 --insecure
And indeed we can:
root@ubuntu:/# mc admin info myminio
● myminio-pool-0-0.myminio-hl.tenant-lite.svc.cluster.local:9000
Uptime: 2 minutes
Version: 2023-03-24T21:41:23Z
Network: 4/4 OK
Drives: 2/2 OK
Pool: 1
● myminio-pool-0-1.myminio-hl.tenant-lite.svc.cluster.local:9000
Uptime: 47 minutes
Version: 2023-03-24T21:41:23Z
Network: 4/4 OK
Drives: 2/2 OK
Pool: 1
● myminio-pool-0-2.myminio-hl.tenant-lite.svc.cluster.local:9000
Uptime: 48 minutes
Version: 2023-03-24T21:41:23Z
Network: 4/4 OK
Drives: 2/2 OK
Pool: 1
● myminio-pool-0-3.myminio-hl.tenant-lite.svc.cluster.local:9000
Uptime: 48 minutes
Version: 2023-03-24T21:41:23Z
Network: 4/4 OK
Drives: 2/2 OK
Pool: 1
Pools:
1st, Erasure sets: 1, Drives per erasure set: 8
2 B Used, 1 Bucket, 1 Object
8 drives online, 0 drives offline
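As an extra, optional check (not part of the original steps), you can confirm that the object written during the stress run is still readable through the service alias:
# Confirm the bucket and the object survived the restarts.
mc ls myminio/testing --insecure
mc stat myminio/testing/a.txt --insecure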
You can push MinIO to the limit simply by performing a lot of transactions at the same time; when enough load is applied, the container restarts, and once the load stops everything goes back to normal.