503 when node is down - cniackz/public GitHub Wiki

Objective:

Verify MinIO behavior when a node is down: a put operation should not return 503 while one node is unavailable. To do that, we reproduce the failure in version RELEASE.2023-01-02T09-40-09Z.

Links:

Steps:

  1. Deploy tenant:
createcluster
installoperator
installtenant
k apply -f ~/ubuntu.yaml -n tenant-lite
  1. Change image to RELEASE.2023-01-02T09-40-09Z in the tenant spec

  2. Make sure you are using RELEASE.2023-01-02T09-40-09Z by checking the pod image. I normally delete the StatefulSet so that the servers are recreated with the correct image.
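
The image change and the check can also be done from the terminal. This is only a sketch under assumptions: the helper names are hypothetical, it assumes the Tenant resource exposes the image under `spec.image`, and the tenant name and image repository are whatever your deployment uses.

```shell
# Hypothetical helpers; tenant name, namespace and image are supplied by the caller.

# Set spec.image on a Tenant resource (assumes the operator's Tenant CRD is installed).
set_tenant_image() {
  # $1 = tenant name, $2 = namespace, $3 = full image reference
  kubectl patch tenant "$1" -n "$2" --type merge \
    -p "$(printf '{"spec":{"image":"%s"}}' "$3")"
}

# Print "<pod><TAB><images>" for every pod in the namespace (default: tenant-lite).
list_pod_images() {
  kubectl get pods -n "${1:-tenant-lite}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].image}{"\n"}{end}'
}

# Print only the pods whose image line does not contain the wanted tag.
pods_missing_tag() {
  # $1 = release tag, $2 = namespace (optional)
  list_pod_images "${2:-tenant-lite}" | grep -v -F -- "$1"
}
```

For example, `pods_missing_tag RELEASE.2023-01-02T09-40-09Z` should list no MinIO server pods once the new image is in place. The ubuntu pod will still show up because it lives in the same namespace and does not run MinIO.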

  3. In the Ubuntu pod, install mc and perform some puts, which should succeed. Also register the cluster:

apt update
apt install -y wget
apt install -y curl
apt install -y vim
wget https://dl.min.io/client/mc/release/linux-amd64/mc
chmod +x mc
mv mc /usr/local/bin
mc alias set monmar1323508pm/ https://minio.tenant-lite.svc.cluster.local console console123
touch a.txt
echo "a" > a.txt
mc mb monmar1323508pm/bucket
mc cp a.txt monmar1323508pm/bucket
mc license register --api-key <token> monmar1323508pm

Expected to see this working:

root@ubuntu:/# mc cp a.txt myminio/bucket
/a.txt:                   2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 59 B/s 0s
root@ubuntu:/# 
  1. Then cordon one MinIO node via Lens. Select a node where the ubuntu pod is not running, so that you can still use the ubuntu pod for experimenting:

In terminal:

Cesars-MacBook-Pro:~ cniackz$ kubectl cordon kind-worker
node/kind-worker cordoned
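
To pick the node without Lens, it helps to see which node hosts the ubuntu pod and rule it out. A small sketch, assuming the pod is literally named `ubuntu` (as the shell prompt suggests) and lives in `tenant-lite`; the helper names are hypothetical.

```shell
# Print the node that hosts a given pod.
pod_node() {
  # $1 = pod name, $2 = namespace (optional)
  kubectl get pod "$1" -n "${2:-tenant-lite}" -o jsonpath='{.spec.nodeName}'
}

# Print every node name except the one hosting the given pod, one per line.
nodes_without_pod() {
  skip="$(pod_node "$1" "${2:-tenant-lite}")"
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' |
    grep -v -x -F -- "$skip"
}
```

`nodes_without_pod ubuntu` prints the candidates. The list includes the control-plane node, so choose a worker that actually runs a MinIO server.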
  1. Verify that one MinIO server is down because its node is unavailable. The scheduling message for the affected pod looks like this:
0/4 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 4 node(s) had volume node affinity conflict. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
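
The same check can be done with kubectl by listing Pending pods and the scheduler's FailedScheduling events. A sketch with hypothetical helper names, assuming the `tenant-lite` namespace:

```shell
# Print the names of pods stuck in Pending, one per line.
pending_pods() {
  kubectl get pods -n "${1:-tenant-lite}" --field-selector=status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Show scheduler events that explain why a pod could not be placed.
scheduling_failures() {
  kubectl get events -n "${1:-tenant-lite}" --field-selector=reason=FailedScheduling
}
```

If `pending_pods` prints one MinIO server pod, `scheduling_failures` should show a message like the one above for it.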
  1. From the Ubuntu pod, try the put again:
root@ubuntu:/# mc cp a.txt myminio/bucket
/a.txt:                   2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt:                   2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt:                   2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt:                   2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt:                   2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt:                   2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt:                   2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
/a.txt:                   2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0 B/s
  • As a result, I can't put files; the operation gets stuck. But am I getting a 503? Well, this is what the trace shows:
root@ubuntu:/# mc admin trace myminio
2023-03-13T16:53:26.366 [200 OK] s3.GetBucketLocation storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/bucket/?location=  10.244.4.9        771µs       ↑ 77 B ↓ 128 B
2023-03-13T16:53:26.376 [200 OK] s3.HeadBucket storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/bucket/ 10.244.4.9        879µs       ↑ 77 B ↓ 0 B
2023-03-13T16:53:26.380 [200 OK] s3.HeadBucket storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/bucket/ 10.244.4.9        594µs       ↑ 77 B ↓ 0 B
2023-03-13T16:53:26.387 [404 Not Found] s3.GetBucketObjectLockConfig storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/bucket/?object-lock=  10.244.4.9        172µs       ↑ 77 B ↓ 330 B
2023-03-13T16:53:26.393 [200 OK] s3.HeadBucket storage-lite-pool-0-3.storage-lite-hl.tenant-lite.svc.cluster.local:9000/bucket/ 10.244.4.9 
  • But I am not getting the 503, so how can I reproduce it?

  • By the way, I cannot put, but I can still read:

root@ubuntu:/# mc ls myminio/bucket
[2023-03-13 17:03:48 UTC]     2B STANDARD a.txt
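
Going back to the trace output above: rather than reading it by eye, the status codes can be tallied so a 503 stands out. A minimal sketch, assuming each trace line carries the status as `[<code> <text>]` like the lines shown; `count_trace_status` is a hypothetical helper.

```shell
# Read `mc admin trace` lines on stdin and print "<status code> <count>" per code.
count_trace_status() {
  awk 'match($0, /\[[0-9][0-9][0-9] /) {
         code = substr($0, RSTART + 1, 3)   # the three digits after "["
         seen[code]++
       }
       END { for (c in seen) print c, seen[c] }' | sort
}
```

Since the trace streams until interrupted, one option is to save it to a file while the put is stuck and then run `count_trace_status < trace.log`. A line starting with `503` in the result would mean the failure was reproduced.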
  1. In affected versions, you will see locks on the files you are trying to cp while the node is down:
root@ubuntu:/# mc support top locks monmar1323508pm2
Time                  Type    Resource
12 minutes            WRITE   .minio.sys/leader.lock
1 minutes             WRITE   bucket/e.txt
1 minutes             WRITE   .minio.sys/buckets/bucket/.usage-cache.bin
1 minutes             WRITE   bucket/c.txt
38 seconds            WRITE   bucket/a.txt <-------------------------------------- A lock here
30 seconds            WRITE   .minio.sys/buckets/.usage-cache.bin
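
To pull only the object locks out of that listing, the output can be filtered by bucket. A sketch that assumes the column layout shown above (`<age> <unit> <type> <resource>`); `bucket_write_locks` is a hypothetical helper.

```shell
# Read `mc support top locks` output on stdin and print the WRITE-locked
# resources whose path starts with "<bucket>/".
bucket_write_locks() {
  # $1 = bucket name
  awk -v prefix="$1/" '$3 == "WRITE" && index($4, prefix) == 1 { print $4 }'
}
```

For instance, `mc support top locks monmar1323508pm2 | bucket_write_locks bucket` would print only the `bucket/...` entries and skip the internal `.minio.sys` locks.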
  1. Uncordon the node and repeat the cp:
$ kubectl uncordon kind-worker
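
Before repeating the cp, it may be worth confirming that the node is schedulable again and that the MinIO pods are back. A sketch with hypothetical helper names, assuming the `tenant-lite` namespace:

```shell
# Succeed if the node is no longer marked unschedulable (i.e. not cordoned).
node_is_schedulable() {
  flag="$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}')"
  [ "$flag" != "true" ]
}

# Block until every pod in the namespace reports Ready, or the timeout expires.
wait_for_pods() {
  # $1 = namespace (optional), $2 = timeout (optional)
  kubectl wait --for=condition=Ready pod --all -n "${1:-tenant-lite}" --timeout="${2:-300s}"
}
```

Something like `node_is_schedulable kind-worker && wait_for_pods` gives a clean starting point for the next put.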

As a result:

a. In affected versions, the cp will fail due to the lock.
b. With the February version, there will be no lock and the cp will go through.

root@ubuntu:/# mc cp a.txt monmar1323508pm2/bucket/z.txt
/a.txt:                   2 B / 2 B ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47 B/s 0s
root@ubuntu:/# 

Conclusion:

Newer versions no longer create a lock on the files you copy during the cordon, so cp works as expected without getting stuck or returning 503.
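
When scripting this check, a quick way to tell whether a deployment is past the affected release is to compare release tags. This is only a sketch: `release_is_newer` is a hypothetical helper, and being newer than RELEASE.2023-01-02T09-40-09Z does not by itself prove the fix is present, since these notes do not pin down the exact release that removed the lock.

```shell
# Succeed if release tag $1 is strictly newer than release tag $2.
# MinIO tags (RELEASE.YYYY-MM-DDTHH-MM-SSZ) are fixed-width timestamps, so a
# plain string comparison orders them chronologically.
release_is_newer() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a > b) }'
}
```

Usage: `release_is_newer "$deployed_tag" RELEASE.2023-01-02T09-40-09Z && echo "newer than the affected release"`, where `deployed_tag` is the tag read from the pod image.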

Actions:

  • AR Cesar: Tomorrow, Tue Mar 14, I will talk to Kannappan and walk through these manually tested scenarios; a test will then be created accordingly.