Filesystem and Storage Setup
Abstract:
This tutorial describes the filesystem and storage setup for the LCAS computing cluster. The goal is to give every node equal access to the various storage types by providing them as network storage.
Overview
All nodes have a small local SSD drive for the operating system, libraries etc. Furthermore, the goal is to have three types of storage mapped at different locations:
- /home/ - slower storage for large amounts of data - replicated across all nodes
- /work/scratch/ - short-term, faster storage - replicated across all nodes
- /work/local/ - local, very fast storage - not replicated, deleted daily (a possible cleanup job is sketched below)

Storage and user accounts are replicated across all nodes.
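The daily deletion of /work/local/ is not described in detail above. A minimal sketch, assuming a simple cron job on each compute node (the 03:00 schedule and the delete-everything policy are assumptions, not the cluster's actual configuration):

# /etc/cron.d/work-local-cleanup (hypothetical file): wipe /work/local/ every night at 03:00
0 3 * * * root find /work/local/ -mindepth 1 -delete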
Implementation
- we use a single server with four 2 TB HDDs in a RAID 5 for /home/
- and four 256 GB SSDs in a logical volume (no RAID redundancy) for /work/scratch/; both are then made available on the compute nodes via NFS (a sketch of the export/mount configuration follows below)
- /work/local/ is just a folder on each compute node
- for filesystem permissions we need consistent user accounts across all nodes. Instead of an LDAP setup, we simply replicate user accounts by manually copying files in /etc from the main node to all other nodes
- there are a couple of scripts for automating things (user folder setup, account replication) in /home/root/ of the root node.
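The NFS wiring itself is not spelled out above. As a rough sketch (the subnet, the export options, and the assumption that lch01 is the storage server are placeholders, not the actual configuration), the server could export the two volumes and each compute node could mount them like this:

# on the storage server (assumed here to be lch01): /etc/exports
/home 10.0.0.0/24(rw,sync,no_subtree_check)
/work/scratch 10.0.0.0/24(rw,sync,no_subtree_check)
sudo exportfs -ra # reload the export table

# on each compute node: /etc/fstab entries for the two network mounts
lch01:/home /home nfs defaults,_netdev 0 0
lch01:/work/scratch /work/scratch nfs defaults,_netdev 0 0
sudo mount -a # mount everything listed in /etc/fstab

# /work/local/ is just a local directory on every compute node
sudo mkdir -p /work/local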
RAID setup:
Note: The following assumes a RAID setup using Intel Rapid Storage Technology, which is not really a hardware RAID but a firmware RAID. It is basically a software RAID where the configuration happens in the BIOS, but it still needs a Linux driver (mdadm) to work.
- Go to the BIOS -> set the SATA controller to RAID mode, reboot
- Enter the BIOS again, go to Intel Rapid Storage, configure and create the RAID 5 (the default options are fine, although I used 64k blocks)
- Boot into Ubuntu and install mdadm so that it picks up the RAID volume
sudo apt install mdadm
- After this, the gnome-disks utility should show the RAID and allow you to create and format the partition. Instead of doing this, we could also use Linux LVM to set up a software RAID similarly to the method described below (which might be easier if we ever need to do this again in the future).
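Assuming mdadm assembles the firmware RAID automatically, the array can also be checked, formatted and mounted from the command line roughly as follows (the /dev/md126 device name is typical for IMSM volumes but is an assumption; it may differ on this machine):

cat /proc/mdstat # should list the RAID 5 array, e.g. md126
sudo mdadm --detail /dev/md126 # device name is an assumption; adjust as needed
sudo mkfs.ext4 /dev/md126 # format the array
sudo mkdir -p /home
sudo mount /dev/md126 /home # add an /etc/fstab entry to make this permanent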
SSD Setup as a Logical Volume (LVM)
- Create the partition layout:
- For each of /dev/sda /dev/sdb /dev/sdc /dev/sdd, run fdisk on the disk and enter the following:
g # create a new (empty) GPT partition table
n # create a new partition (accept the defaults: partition 1, spanning the whole disk)
t # change the partition type
30 # select "Linux LVM" (the number can differ between fdisk versions; press L at the prompt to list types)
w # write changes and exit
- Initialize the partitions for use with LVM:
pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
pvdisplay
should now list these partitions.
- Create a volume group called scratch:
vgcreate scratch /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
- Create a logical volume called scratch:
lvcreate -l 100%FREE -n scratch scratch
The volume/partition should now show up under /dev/scratch/scratch and can be formatted as ext4 (a scripted version of the whole procedure is sketched below).
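If the scratch volume ever needs to be rebuilt, the whole procedure above can also be scripted. A minimal sketch, run as root (the sfdisk type GUID is the standard GPT "Linux LVM" type; everything else mirrors the steps above and is not the cluster's canonical script):

#!/bin/bash
# WARNING: destroys all data on the listed disks
set -euo pipefail
for d in /dev/sda /dev/sdb /dev/sdc /dev/sdd; do
    # one GPT partition of type "Linux LVM" spanning each SSD
    printf 'label: gpt\ntype=E6D6D379-F507-44C2-A23C-238F2A3DF928\n' | sfdisk "$d"
done
pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1          # physical volumes
vgcreate scratch /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1  # volume group
lvcreate -l 100%FREE -n scratch scratch                   # logical volume
mkfs.ext4 /dev/scratch/scratch                            # filesystem
mkdir -p /work/scratch
mount /dev/scratch/scratch /work/scratch                  # add to /etc/fstab to persist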
Setup users:
- Create the user on the main node (currently lch01) and set up group memberships (e.g. adduser mike sudo)
- copy user configurations to other nodes:
scp /etc/{passwd,group,subuid,gshadow,shadow} lch02:/etc/
scp /etc/{passwd,group,subuid,gshadow,shadow} lch03:/etc/
scp /etc/{passwd,group,subuid,gshadow,shadow} lch04:/etc/
There's also a script for that: /root/replicate_user_permissions.sh. In the end we only want to allow user logins on the login nodes; we can still achieve that by limiting SSH access on the compute nodes to root only and blocking all other incoming and all outgoing network traffic.
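The exact contents of /root/replicate_user_permissions.sh are not reproduced in this wiki. A minimal sketch of what such a script might do (node and file lists taken from the scp commands above; the real script may differ):

#!/bin/bash
# sketch of an account-replication script; not necessarily identical to /root/replicate_user_permissions.sh
set -euo pipefail
for node in lch02 lch03 lch04; do
    # copy the account databases from the main node to each compute node
    scp /etc/{passwd,group,subuid,gshadow,shadow} "$node":/etc/
done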
TODO
- the module system isn't set up yet on all nodes; we should try to replicate that setup from the TU Darmstadt cluster
- currently all users can access all nodes via SSH (since all user accounts are replicated); in the end we want Slurm to take care of job scheduling and not allow users to access the compute nodes manually
- backup filesystem for the login node and the compute nodes
- implement a way to manage installed libraries and packages on all nodes (modules only loads libraries, it doesn't install them) ...