Healing experiments SNMD | MNMD - allanrogerr/public GitHub Wiki
Local SNMD
Start server
CI=on minio server /tmp/xl{1...4} &
Setup alias
mc alias set local http://127.0.0.1:9000 minioadmin minioadmin
Make bucket
mc mb local/test
Copy file
mc cp logfile local/test
Setup trace
mc admin trace --call heal local &
Corrupt file
echo "1234" > /tmp/xl3/test/logfile/e9ee25f4-c1ab-4bea-8b39-1bb4c1c91ae8/part.1
Immediately observe trace
mc cat local/test/logfile | md5sum
687618d6ba27d4914092a3bb181ddbef -
2024-03-29T16:01:37.727 [HEALING] heal.Object 127.0.0.1:9000 test/logfile 3.646916ms
tail -1 /tmp/xl3/test/logfile/e9ee25f4-c1ab-4bea-8b39-1bb4c1c91ae8/part.1
2023-10-15 06:32:33.117 PDT [63830] LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.105 s, sync=0.002 s, total=0.114 s; sync files=2, longest=0.002 s, average=0.001 s; distanc%
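The checksum above is compared by eye against the original file. A minimal sketch of automating that comparison, assuming `md5sum` from GNU coreutils is available; `same_md5` is a hypothetical helper name, not part of `mc`:

```shell
# same_md5 REFERENCE_FILE: read a stream on stdin and compare its MD5
# with the MD5 of REFERENCE_FILE. Hypothetical helper, not part of mc.
same_md5() {
    # Hash the reference file first (explicit redirect, so stdin is untouched).
    expected=$(md5sum < "$1" | cut -d' ' -f1)
    # Then hash whatever arrives on stdin.
    actual=$(md5sum | cut -d' ' -f1)
    if [ "$expected" = "$actual" ]; then
        echo "match $actual"
    else
        echo "MISMATCH expected=$expected actual=$actual"
        return 1
    fi
}

# Intended use against the local deployment from this section:
#   mc cat local/test/logfile | same_md5 logfile
```

A match only shows that the read path returned correct data; whether the shard on the drive was rewritten still has to be checked on the drive itself, as done with `tail` above.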
Datacenter MNMD
Server started with systemd:
MINIO_VOLUMES="https://node{5...7}.lab.min.dev:19000/home/allan/disk{0...1}/minio"
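The `{x...y}` ellipsis notation (three dots) is expanded by the MinIO server, not by the shell, so the single `MINIO_VOLUMES` value above names 3 nodes with 2 drives each. A sketch that enumerates the six resulting endpoints with plain loops, purely as an illustration; `list_endpoints` is a hypothetical name:

```shell
# Enumerate the endpoints that node{5...7} x disk{0...1} stands for.
# Illustration only; the server performs this expansion itself.
list_endpoints() {
    for node in 5 6 7; do
        for disk in 0 1; do
            echo "https://node${node}.lab.min.dev:19000/home/allan/disk${disk}/minio"
        done
    done
}

list_endpoints
```

The corruption tests below touch only `disk0` on one node, i.e. one of these six drives.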
Setup alias
mc alias set myminio https://node5.lab.min.dev:19000 minioadmin minio-secret-key-minioadmin
Make bucket
mc mb myminio/test
Copy file
mc cp out.log myminio/test
Setup trace
mc admin trace --call heal myminio &
Multiple tests
Attempt 1.- Corrupt object. Observe no trace. Object remains corrupted on the drive
echo "123" | sudo tee /home/allan/disk0/minio/test/out.log/972648ef-f79f-4835-8462-e35d8f302f85/part.1 > /dev/null
mc cat myminio/test/out.log | md5sum
tail -1 /home/allan/disk0/minio/test/out.log/972648ef-f79f-4835-8462-e35d8f302f85/part.1
123
Attempt 2.- Delete the part file. Observe no trace. Object remains corrupted on the drive (part.1 is still missing)
sudo rm /home/allan/disk0/minio/test/out.log/972648ef-f79f-4835-8462-e35d8f302f85/part.1
mc cat myminio/test/out.log | md5sum
tail -1 /home/allan/disk0/minio/test/out.log/972648ef-f79f-4835-8462-e35d8f302f85/part.1
tail: cannot open '/home/allan/disk0/minio/test/out.log/972648ef-f79f-4835-8462-e35d8f302f85/part.1' for reading: No such file or directory
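Attempts 1 and 2 each inspect one hard-coded path. A sketch for listing every `part.*` file of an object under a drive directory together with its size in bytes, so a truncated or missing shard stands out. It assumes the on-disk layout seen in the paths above (drive/bucket/object/uuid/part.N); `list_parts` is a hypothetical helper name:

```shell
# list_parts OBJECT_DIR: print "<size-in-bytes> <path>" for each part file.
# Prints nothing if the directory does not exist or holds no part files.
list_parts() {
    find "$1" -type f -name 'part.*' 2>/dev/null | sort | while IFS= read -r f; do
        echo "$(wc -c < "$f" | tr -d ' ') $f"
    done
}

# Intended use on a node from this section, once per drive
# (prefix with sudo if the drive directories are not readable):
#   list_parts /home/allan/disk0/minio/test/out.log
#   list_parts /home/allan/disk1/minio/test/out.log
```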
Attempt 3.- Delete entire object. Observe trace. Object finally healed in drive
sudo rm -rf /home/allan/disk0/minio/test/out.log
mc cat myminio/test/out.log | md5sum
2024-03-29T23:31:38.265 [HEALING] heal.Object node7.lab.min.dev:19000 test/out.log 81.861269ms
tail -1 /home/allan/disk0/minio/test/out.log/972648ef-f79f-4835-8462-e35d8f302f85/part.1
?p?J?)?^?Di??`?A˴??ܔk%A?c??ߎo螻?S????B/3\)6CedU?????nC???allan@minio-pg6:~$
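Because the trace runs in the background, heal lines get interleaved with prompts and other output. A sketch that filters a trace stream down to the healed object paths. The field layout (timestamp, `[HEALING]`, `heal.Object`, node, path, duration) is inferred from the two trace lines on this page, not from a documented format, and `healed_objects` is a hypothetical name:

```shell
# healed_objects: read trace output on stdin, print the object path of
# every "[HEALING] heal.Object" entry. Scans for the marker instead of
# using fixed columns, so a shell prompt glued to the line is tolerated.
healed_objects() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "[HEALING]" && $(i + 1) == "heal.Object")
                print $(i + 3)
    }'
}

# Intended use:
#   mc admin trace --call heal myminio | healed_objects
```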
Attempt 4.- Run inspect and xlmeta. Enable bitrot, see https://gist.github.com/allanrogerr/6c5fb33ef96310c3e80995cf3bc60824 and https://github.com/allanrogerr/public/wiki/mc-support-inspect-testing
Identify distribution and data disks
a.- append character to part.1
echo "123" | sudo tee -a /home/allan/disk0/minio/test/log.out/78dd295c-0b0c-4ecc-b7be-97508a664693/part.1 > /dev/null
Observe no heal
mc cat myminio/test/log.out | md5sum
b.- prepend a line above the contents of part.1
printf '%s\n%s\n' "123" "$(cat /home/allan/disk0/minio/test/log.out/78dd295c-0b0c-4ecc-b7be-97508a664693/part.1)" > /home/allan/disk0/minio/test/log.out/78dd295c-0b0c-4ecc-b7be-97508a664693/part.1
Observe no heal
mc cat myminio/test/log.out | md5sum
c.- prepend character at start of part.1
printf '%s%s' "123" "$(cat /home/allan/disk0/minio/test/log.out/78dd295c-0b0c-4ecc-b7be-97508a664693/part.1)" > /home/allan/disk0/minio/test/log.out/78dd295c-0b0c-4ecc-b7be-97508a664693/part.1
Observe no heal
mc cat myminio/test/log.out | md5sum
d.- add character within part.1
vi /home/allan/disk0/minio/test/log.out/78dd295c-0b0c-4ecc-b7be-97508a664693/part.1
Observe no heal
mc cat myminio/test/log.out | md5sum
e.- Overwrite part.1
echo "123" > /home/allan/disk0/minio/test/log.out/78dd295c-0b0c-4ecc-b7be-97508a664693/part.1
Observe no heal
mc cat myminio/test/log.out | md5sum
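The variants in Attempt 4 can be collected into one helper so each run is repeatable. A sketch under stated assumptions: `corrupt` is a hypothetical function, and its prepend case deliberately swaps the `$(cat ...)` approach used above for a temporary file, because command substitution drops trailing newlines and NUL bytes and would alter binary shard data beyond the bytes being added:

```shell
# corrupt MODE FILE: apply one corruption variant to FILE in place.
# MODE is append, prepend or overwrite. Hypothetical helper.
corrupt() {
    mode=$1
    file=$2
    case "$mode" in
        append)
            printf '123\n' >> "$file"
            ;;
        prepend)
            # Build the new content in a temp file, then copy it back so
            # the original inode and permissions are kept.
            tmp=$(mktemp) || return 1
            { printf '123'; cat "$file"; } > "$tmp" &&
                cat "$tmp" > "$file"
            rc=$?
            rm -f "$tmp"
            return "$rc"
            ;;
        overwrite)
            printf '123\n' > "$file"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 2
            ;;
    esac
}

# Intended use: try it on a scratch copy first, then on a shard, e.g.
#   cp part.1 /tmp/part.1.scratch && corrupt prepend /tmp/part.1.scratch
```

Running it against a real `part.1` needs write access to the drive directory, for example from a root shell.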