Create Raid Array and SSD Health Check

The dgx-station uses software RAID, managed mainly by mdadm.

Currently there are 5 identical SSDs in the dgx-station. No1 is the OS disk (/ and /boot/efi), and disks No2-4 are the storage disks (/raid); these latter 3 form a Raid-0 array (/dev/md0). Disk No5 is set up as the mirror of No1:

  • /sda1 <=> /sde1 (UEFI boot path)
  • /sda2 + /sde2 -> /dev/md123 (root path)
  • /sdb + /sdc + /sdd -> /dev/md0 (raid storage)
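
A quick, read-only way to confirm this layout on the running system is to list the block devices and the assembled md arrays:

$lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
$cat /proc/mdstat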

How to create a new RAID 1 with an existing disk

I (Xuechun) decided to have one RAID 1 on / and two identical EFI boot partitions, so that the server remains bootable without data loss if one of the root disks fails. This favorable guarantee comes at the price of a series of grub2 configuration steps (which doubles the admin workload). Luckily I found some "guides" here: https://www.howtoforge.com/how-to-set-up-software-raid1-on-a-running-system-incl-grub2-configuration-ubuntu-10.04 and https://askubuntu.com/questions/1066028/install-ubuntu-18-04-desktop-with-raid-1-and-lvm-on-machine-with-uefi-bios/

The first thing is to shut down the station and get the spare SSD physically mounted (don't forget the 4 tiny screws in the package, and bring a screwdriver!).

(current root disk /dev/sda and new disk /dev/sde)

1. Partition the new disk. Create partitions on /sde identical to those on /sda and set the EFI and RAID partition types

This step will erase all data on /sde

$lsblk (check if the new disk is here)
$sudo sfdisk -d /dev/sda | sudo sfdisk --force /dev/sde (copy partition table)
$sudo fdisk /dev/sde

The last line is usually not needed. But anyway, it opens a dialog where you press 't' to modify the partition type, then choose '1' (and later '2') for the partition index, then choose type 1 (EFI) for the first partition (and later the Linux RAID type, 'fd00', for the second). Finally, type 'w' to write the partition table and exit. You can then use $lsblk -f to verify the partitions.
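
If the gdisk package happens to be installed, the same type changes can also be done non-interactively with sgdisk (a sketch; EF00 is the EFI System type code, FD00 is Linux RAID):

$sudo sgdisk -t 1:EF00 -t 2:FD00 /dev/sde
$lsblk -f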

2. Create RAID 1 with the pre-existing disk set as 'missing' to prevent data loss

$sudo mdadm --create /dev/md123 --level=1 --raid-devices=2 missing /dev/sde2 #(no need to use metadata=0.9)
$sudo mkfs.ext4 /dev/md123
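
To confirm the degraded array came up as expected (it should show one active device and one missing), a quick check:

$cat /proc/mdstat
$sudo mdadm --detail /dev/md123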

Now you should see /dev/md123 existing. So update the mdadm config by

$sudo bash
$root@dgx: mdadm --examine --scan >> /etc/mdadm/mdadm.conf 
$^D

Remember to delete duplicate entries in mdadm.conf (e.g. with vim).
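
For reference, each array should end up with exactly one ARRAY line; the entries produced by --examine --scan look roughly like the following (the UUID and name here are placeholders, yours will differ):

ARRAY /dev/md/123 metadata=1.2 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx name=dgx-station:123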

You might see the symbolic link /dev/md/123 instead of /dev/md123; their UUIDs should be consistent with the disk UUID. The block/raid UUID is the one used in /etc/fstab. To make sure every raid array is assembled at boot time, you probably will need:

$sudo dpkg-reconfigure mdadm    # Choose "all" disks to start at boot
$sudo update-initramfs -u       # Updates the existing initramfs
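
For the fstab entry mentioned above, the filesystem UUID of the new array can be found with blkid (a sketch):

$sudo blkid /dev/md123
$sudo blkid /dev/sde1

The / line in /etc/fstab should then refer to the array's filesystem UUID, roughly like (UUID is a placeholder):

UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  ext4  errors=remount-ro  0  1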

3. Reconfigure Grub2 and / mount point (DO NOT REBOOT AT THIS STEP)

Back up /boot/grub/ and /boot/efi to a local machine in case you run into grub-rescue.

There are two grub.cfg files: one is located at EFI/ubuntu/grub.cfg and it points to /boot/grub/grub.cfg on the root filesystem. So the key point is to set the new root path to /dev/md123 instead of the old /dev/sda2 (all referenced by UUID).

For the EFI grub.cfg we can just edit it in a text editor. But we cannot edit the latter .cfg directly; instead, add a menuentry through /etc/grub.d/09_swraid1_setup (essentially any number before 10 works) with the following content (see https://wiki.gentoo.org/wiki/GRUB2/Advanced_storage for an explanation):

#!/bin/sh
exec tail -n +3 $0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.
menuentry 'DGX OS Desktop GNU/Linux Raid' --class dgx --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-4383e10f-3612-417e-99c4-bcd9b160b834' {
        recordfail
        insmod gzio
        insmod part_gpt
        insmod mdraid1x
        insmod ext2
        set root='(md123)'
        if [ x$feature_platform_search_hint = xy ]; then
                search --no-floppy --fs-uuid --set=root --hint='mduuid/e66dc590fc6bdd490b5ae2e27c468e38' 4383e10f-3612-417e-99c4-bcd9b160b834 
        else
                search --no-floppy --fs-uuid --set=root 4383e10f-3612-417e-99c4-bcd9b160b834
        fi
        linux   /boot/vmlinuz-4.15.0-124-generic domdadm root=UUID=4383e10f-3612-417e-99c4-bcd9b160b834 ro cgroup_enable=memory swapaccount=1 rd.driver.blacklist=nouveau nouveau.modeset=0 elevator=deadline transparent_hugepage=madvise quiet splash $vt_handoff
        initrd  /boot/initrd.img-4.15.0-124-generic
}

where the 'mduuid/...' and fs_uuid values can be found with $sudo grub-probe -d /dev/md123 -t bios_hints (or -t fs_uuid).
And then

$sudo update-grub
$sudo update-initramfs -u

4. Mount new raid and copy the old data

The next step will erase all data in the raid array/new card

$lsblk -f (this lists the filesystem type on every disk, which in our case is vfat and ext4)
$sudo mkfs.vfat /dev/sde1
$sudo mkfs.ext4 /dev/md123

Now temporarily mount them and copy the pre-existing disk into the new root (remember to exclude /raid/*!), using 'rsync' or 'cp' (or 'dd'?)

$sudo mkdir /mnt/new_root
$sudo mount /dev/md123 /mnt/new_root
$sudo cp -dpRx / /mnt/new_root
or $sudo rsync -aAXHv --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found","/raid/*"} / /mnt/new_root
  • This will take 12h++ since we are talking about 2TB... go home and relax!

5. Reboot the system and hope it works

Before $sudo reboot, think hard: the edited fstab, grub.cfg and mdadm.conf must already exist in the new root raid as well.
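
A quick pre-reboot sanity check (a sketch, assuming the new root is still mounted at /mnt/new_root):

$sudo grep UUID /mnt/new_root/etc/fstab
$sudo grep ARRAY /mnt/new_root/etc/mdadm/mdadm.conf
$ls /mnt/new_root/etc/grub.d/09_swraid1_setup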

It might show some weird failure warnings since it is now booting from a degraded Raid array.

6. Add the original disk and wait again for mdadm to finish syncing

First, use fdisk to change /dev/sda2 to the Linux Raid type (fd00), then

$sudo mdadm --add /dev/md123 /dev/sda2
  • ! Syncing the root could take ~5 hours, silently. Remember to monitor with cat /proc/mdstat until it is finished.
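
One convenient way to follow the resync progress (a sketch):

$watch -n 60 cat /proc/mdstat
$sudo mdadm --detail /dev/md123 | grep -i 'rebuild\|state'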

Finally, you can finish the root raid1 setup by deleting the 09_swraid1_setup file and running

$sudo update-grub
$sudo update-initramfs -u
$sudo reboot

This avoids the waiting time when grub asks you to choose the customized Linux entry (since the default is now the raid1 setup).

7. Add new entry to the boot chain

Copy /dev/sda1 to /dev/sde1
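
One way to do this copy (a sketch; dd works here because the partition layout was cloned with sfdisk earlier, so sda1 and sde1 have identical sizes):

$sudo dd if=/dev/sda1 of=/dev/sde1 bs=4M status=progress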
Check the current boot entries and then add /dev/sde as new entry "ubuntu2"

$efibootmgr -v
$sudo efibootmgr -c -d /dev/sde -p 1 -L "ubuntu2" -l \\EFI\\ubuntu\\grubx64.efi

Then you can modify the boot order as you wish.
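
For example, assuming $efibootmgr -v shows the new entry as Boot0003 and the original as Boot0000 (hypothetical entry numbers), the order could be set with:

$sudo efibootmgr -o 0000,0003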

Somehow the new entries always disappear... need to check next time.

If the bootloader fails and the system is no longer bootable (this has happened twice already)

If the system drops into the BIOS, this means grub is not properly installed. Boot from a usb-linux-drive (yes, a good admin should always have a spare linux drive) and then properly reinstall grub (since the UUID will change, you will probably need to run the grub shell anyway).
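
A sketch of reinstalling grub from the live USB, assuming the root array assembles as /dev/md123 and the EFI partition is /dev/sda1:

$sudo mdadm --assemble --scan
$sudo mount /dev/md123 /mnt
$sudo mount /dev/sda1 /mnt/boot/efi
$for d in dev proc sys; do sudo mount --bind /$d /mnt/$d; done
$sudo chroot /mnt grub-install --target=x86_64-efi --efi-directory=/boot/efi
$sudo chroot /mnt update-grub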

If the system gets into the grub shell, there are only a few limited commands you can use:
Use 'ls' to list all visible blocks (this will show the md devices as well, but do not use them). Usually, you should find that (hd0,2) is the original root. Verify by displaying its contents with $ls (hd0,2)/ and then you can boot the kernel by

$set prefix=(hd0,2)/boot/grub
$set root=(hd0,2)
$insmod linux
$insmod normal
$normal

Then you should be able to choose boot from the DGX OS. Welcome back.

Command-line tools to check SSD health and recover

$lsblk
$sudo mdadm -D /dev/md0

These two commands show useful information about the current Raid array. In rare cases, you will need to stop the raid and completely rebuild it (data will be lost if operating under Raid0)

$sudo mdadm --stop /dev/md0
$sudo configure_raid_array.py -r

Most commonly, the Raid array is broken due to SSD failure. If this is the case, do not hesitate to contact the supplier for a replacement; Joakim will happily pay for it.

$sudo hdparm -I /dev/*
$sudo smartctl -a /dev/*

smartctl is also useful for monitoring the general health status of the SSDs, e.g. temperature.
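
A sketch of a quick health pass over all SATA disks, plus pulling out the most interesting attributes for one of them:

$for d in /dev/sd?; do echo "== $d =="; sudo smartctl -H $d; done
$sudo smartctl -a /dev/sda | grep -i -e temperature -e wear -e reallocated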

Resurrect SSD with ERRORMODE firmware (ONLY IF NO WARRANTY)

Now with the SAMSUNG MZ7LM1T9HMJP-00005: if you can still communicate with it via SATA and it shows 'ERRORMODE' as the firmware and a 1GB size (instead of 1.9TB) in the SMART data, there is a way to save the SSD, but the data will be lost. Given that the SSD is worth 6000+ SEK, it is worth doing.

Basically, it follows the guide here: https://blog.muwave.de/2019/09/samsung-ssd-resurrection/