
Google Cloud Platform (GCP)

Instructions for launching a Slurm cluster on GCP: https://cloud.google.com/solutions/deploying-slurm-cluster-compute-engine. This requires activating the free Cloud Shell (upper right corner). Additional tools such as Midnight Commander and custom key bindings have to be configured in the Cloud Shell, and appropriate SSH keys may have to be available.

  1. Clone the slurm-gcp GitHub repository from the Google Cloud Shell
git clone https://github.com/SchedMD/slurm-gcp.git

>ls -l
>drwxr-xr-x 6 tkind tkind   4096 May  5 13:15 slurm-gcp
  2. Define the configuration for the Slurm deployment. Zones matter both for pricing and for service availability; not every zone offers every machine type or the full computational capacity. California, for example, has higher energy prices and is therefore more expensive on a per-day basis. For the US, us-central1-a and us-central1-b are recommended. Check the available zones (see the sketch after the exports below; a region/zone overview is also listed in step 4).
# a unique name for your cluster deployment, ie "small-deployment"
export CLUSTER_DEPLOY_NAME="may-deployment"

# a unique name for your cluster, ie "small-cluster"
export CLUSTER_NAME="small-cluster"

# the region where you deploy the cluster, ie "us-central1"
export CLUSTER_REGION="us-central1"

#  the zone where you deploy the cluster, ie "a", "b", "c"
export CLUSTER_ZONE="b"
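
One way to check which zones are currently available in the selected region (a sketch; assumes gcloud is already authenticated in the Cloud Shell and the variables above are set):
# list the zones of the region and their status
gcloud compute zones list --filter="region:${CLUSTER_REGION}"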
  3. Switch to the slurm-gcp directory and copy the YAML configuration file to a new file (here may-deployment.yaml)
cd slurm-gcp
cp slurm-cluster.yaml ${CLUSTER_DEPLOY_NAME}.yaml

>ls
>CONTRIBUTING.md  etc  LICENSE  may-deployment.yaml  README.md  
>scripts  slurm-cluster.yaml  slurm.jinja  slurm.jinja.schema  tf
  4. Check the quotas from the Cloud Shell. For example, with a limit of 10 high-memory nodes, that limit cannot be exceeded unless a quota increase is requested; depending on the account details or the limits of the specific region, such a request may be declined. The US regions and zones are listed below, and project-level quotas can be checked as sketched after the region commands.
# us-central1	a, b, c, f	Council Bluffs, Iowa, USA
# us-east1	b, c, d	Moncks Corner, South Carolina, USA
# us-east4	a, b, c	Ashburn, Northern Virginia, USA
# us-west1	a, b, c	The Dalles, Oregon, USA
# us-west2	a, b, c	Los Angeles, California, USA
# us-west3	a, b, c	Salt Lake City, Utah, USA
# us-west4	a, b, c	Las Vegas, Nevada, USA

# output current quotas to the console
gcloud compute regions describe us-central1

# save current quotas for regions of interest in date stamped file
gcloud compute regions describe us-west1 > us-west1-may5-2020.txt
gcloud compute regions describe us-central1 > us-central1-may5-2020.txt
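
Project-level quotas (for example the total CPU quota) can be inspected as well; a sketch, assuming the active project is the one the cluster will be deployed into:
# show project-wide quotas such as CPUS_ALL_REGIONS
gcloud compute project-info describe --project "$(gcloud config get-value core/project)"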
  5. Edit the new YAML file (here may-deployment.yaml) for your environment. The available fields and their types are defined in slurm.jinja.schema. For cost savings and definitions, see the preemptible instance guide: https://cloud.google.com/compute/docs/instances/preemptible. The default_users field may also have to be modified. A sketch for making these edits from the Cloud Shell follows the example values below.
# show current edits
cat may-deployment.yaml

# edit the following variables
cluster_name            : small-cluster
zone                    : us-central1-b
controller_machine_type : n1-standard-4
login_machine_type      : n1-standard-4
compute_image_machine_type  : n1-highcpu-96
controller_disk_size_gb   : 20

partitions :
  - name              : work
    machine_type      : n1-highcpu-96
    max_node_count    : 10
    zone              : us-central1-b
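
These values can be edited directly in the Cloud Shell; a minimal sketch (nano is available there, and the sed pattern assumes the default field layout of slurm-cluster.yaml):
# open the copied configuration for interactive editing
nano ${CLUSTER_DEPLOY_NAME}.yaml

# or change a single field non-interactively, e.g. the maximum node count of the partition
sed -i 's/max_node_count.*/max_node_count    : 10/' ${CLUSTER_DEPLOY_NAME}.yaml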
  6. Check in the Cloud Shell that the correct project is currently configured
gcloud config get-value core/project
  7. Deploy the modified YAML configuration from the Cloud Shell. Verify that the environment variables are set correctly (env).
gcloud deployment-manager deployments \
    --project="$(gcloud config get-value core/project)" \
    create $CLUSTER_DEPLOY_NAME \
    --config ${CLUSTER_DEPLOY_NAME}.yaml 

which should bring up a message similar to:

Your active configuration is: [cloudshell-XXX]
The fingerprint of the deployment is 'XXX'
Waiting for create [operation-XXX]...done.
Create operation operation-XXX completed successfully.
NAME                                      TYPE                   STATE       ERRORS  INTENT
small-cluster-all-internal-firewall-rule  compute.v1.firewall    IN_PREVIEW  []      CREATE_OR_ACQUIRE
small-cluster-allow-iap                   compute.v1.firewall    IN_PREVIEW  []      CREATE_OR_ACQUIRE
small-cluster-compute-0-image             compute.v1.instance    IN_PREVIEW  []      CREATE_OR_ACQUIRE
small-cluster-controller                  compute.v1.instance    IN_PREVIEW  []      CREATE_OR_ACQUIRE
small-cluster-login0                      compute.v1.instance    IN_PREVIEW  []      CREATE_OR_ACQUIRE
small-cluster-network                     compute.v1.network     IN_PREVIEW  []      CREATE_OR_ACQUIRE
small-cluster-us-central1                 compute.v1.subnetwork  IN_PREVIEW  []      CREATE_OR_ACQUIRE
small-cluster-us-central1-router          compute.v1.router      IN_PREVIEW  []      CREATE_OR_ACQUIRE
  8. Check the current deployment in the Deployment Manager. Be aware that a cluster deployment takes a while and can also fail silently, so select the deployed Slurm setup in the Deployment Manager and inspect the individual resources. If the cluster has only been previewed (--preview) and is not deployed yet, it has to be committed either in the Deployment Manager or from the Cloud Shell (see the sketch after the list of imports below).
In the Deployment Manager, the deployment's Config view lists the imported templates:
etc/cgroup.conf.tpl
etc/compute-fluentd.conf.tpl
etc/controller-fluentd.conf.tpl
etc/slurm.conf.tpl
etc/slurmdbd.conf.tpl
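
The deployment state can also be checked, and a previewed deployment committed, from the Cloud Shell (a sketch using standard Deployment Manager commands):
# show the current state of the deployment and its resources
gcloud deployment-manager deployments describe $CLUSTER_DEPLOY_NAME

# commit a deployment that was only created with --preview
gcloud deployment-manager deployments update $CLUSTER_DEPLOY_NAME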

  9. Log in to the login node (check the instance name, region, and zone; the sketch below lists the cluster's instances)
gcloud compute ssh ${CLUSTER_NAME}-login0 --zone ${CLUSTER_REGION}-${CLUSTER_ZONE}
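
To confirm the instance names and their zones before connecting (a sketch; assumes the deployment finished successfully):
# list the cluster's instances (controller, login and compute image nodes)
gcloud compute instances list --filter="name ~ ${CLUSTER_NAME}"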
  10. Perform work on the cluster, e.g. check the node and partition status with sinfo (a job submission sketch follows)
sinfo
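
A minimal test job, assuming the partition is named "work" as configured above (compute nodes are created on demand, so the first job may take a few minutes to start):
# run a trivial command on two compute nodes of the work partition
srun -p work -N 2 hostname

# or submit a small batch job and watch the queue
sbatch -p work --wrap="hostname; sleep 60"
squeue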
  11. Delete the cluster when it is no longer needed; the deployment name is the one chosen above ($CLUSTER_DEPLOY_NAME)
gcloud deployment-manager deployments delete $CLUSTER_DEPLOY_NAME
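
If the deployment name is no longer at hand, it can be looked up first (a sketch):
# list all Deployment Manager deployments in the current project
gcloud deployment-manager deployments list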

Mounting local SSDs on VMs
Mounting local NVMe or SCSI SSDs on VMs requires some legwork (https://cloud.google.com/compute/docs/disks/local-ssd), including formatting and mounting the disks before they can be used. Local SSDs are scratch space: their contents are lost when the VM is stopped or deleted, so important data should be backed up elsewhere.

  1. Show the existing drives and mounted filesystems
>df -h
tkind@n1-highcpu-96:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             43G     0   43G   0% /dev
tmpfs           8.5G  1.3M  8.5G   1% /run
/dev/sda1       9.6G  1.5G  8.1G  15% /
tmpfs            43G     0   43G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            43G     0   43G   0% /sys/fs/cgroup
/dev/sda15      105M  3.6M  101M   4% /boot/efi
/dev/loop0       94M   94M     0 100% /snap/core/9066
/dev/loop1       55M   55M     0 100% /snap/core18/1754
/dev/loop2       99M   99M     0 100% /snap/google-cloud-sdk/129
tmpfs           8.5G     0  8.5G   0% /run/user/1001
  2. Show all block devices
>lsblk
tkind@n1-highcpu-96:~$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0     7:0    0 93.9M  1 loop /snap/core/9066
loop1     7:1    0   55M  1 loop /snap/core18/1754
loop2     7:2    0 98.4M  1 loop /snap/google-cloud-sdk/129
sda       8:0    0   10G  0 disk 
├─sda1    8:1    0  9.9G  0 part /
├─sda14   8:14   0    4M  0 part 
└─sda15   8:15   0  106M  0 part /boot/efi
nvme0n1 259:0    0  375G  0 disk 
  3. Format the disk with the ZFS or ext4 filesystem (see benchmark); ext4 is used below
tkind@n1-highcpu-96:~$ sudo mkfs.ext4 -F /dev/nvme0n1
mke2fs 1.44.1 (24-Mar-2018)
Discarding device blocks: done                            
Creating filesystem with 98304000 4k blocks and 24576000 inodes
Filesystem UUID: 1c52a58d-19bd-4dc8-a69b-40dabb08e0eb
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968
Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done     
  4. Create a directory to mount the disk on; here the name is "local-ssd"
sudo mkdir -p /mnt/disks/local-ssd
  5. Mount the disk (here nvme0n1) on the directory (/mnt/disks/local-ssd) and confirm with df -H or df -h
sudo mount /dev/nvme0n1 /mnt/disks/local-ssd

tkind@n1-highcpu-96:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             43G     0   43G   0% /dev
tmpfs           8.5G  1.3M  8.5G   1% /run
/dev/sda1       9.6G  1.5G  8.1G  15% /
tmpfs            43G     0   43G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            43G     0   43G   0% /sys/fs/cgroup
/dev/sda15      105M  3.6M  101M   4% /boot/efi
/dev/loop0       94M   94M     0 100% /snap/core/9066
/dev/loop1       55M   55M     0 100% /snap/core18/1754
/dev/loop2       99M   99M     0 100% /snap/google-cloud-sdk/129
tmpfs           8.5G     0  8.5G   0% /run/user/1001
/dev/nvme0n1    369G   69M  350G   1% /mnt/disks/local-ssd
  6. Check the NVMe SSD read speed using hdparm (requires sudo)
tkind@n1-highcpu-96:/mnt/disks/local-ssd$ sudo hdparm -t /dev/nvme0n1
/dev/nvme0n1:
 Timing buffered disk reads: 2114 MB in  3.00 seconds = 704.12 MB/sec
  7. Check disk throughput and latency with dd (a fio sketch follows the dd examples)
# check throughput for standard tmp drive
>dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.92531 s, 108 MB/s

# check latency for standard tmp drive
>dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync
512000 bytes (512 kB, 500 KiB) copied, 1.66345 s, 308 kB/s 

# check throughput for NVME SSD 
tkind@n1-highcpu-96:/mnt/disks/local-ssd$ sudo  dd if=/dev/zero of=/mnt/disks/local-ssd/test1.img bs=1G count=1 oflag=dsync
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.76973 s, 285 MB/s

# check latency for NVMe SSD (small synchronous writes are, surprisingly, slower here than on the boot disk)
tkind@n1-highcpu-96:/mnt/disks/local-ssd$ sudo dd if=/dev/zero of=/mnt/disks/local-ssd/test2.img bs=512 count=1000 oflag=dsync
512000 bytes (512 kB, 500 KiB) copied, 7.3838 s, 69.3 kB/s

# remove the test files
rm /tmp/test1.img /tmp/test2.img
rm /mnt/disks/local-ssd/test1.img /mnt/disks/local-ssd/test2.img
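
dd with oflag=dsync is only a coarse measure; fio gives more realistic numbers for random I/O (a sketch assuming the mount point from above; fio may need to be installed with apt first):
# 4k random writes with direct I/O against the local SSD
sudo fio --name=randwrite --filename=/mnt/disks/local-ssd/fio-test --rw=randwrite \
    --bs=4k --size=1G --direct=1 --ioengine=libaio --iodepth=32 --runtime=30 --time_based
sudo rm /mnt/disks/local-ssd/fio-test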
  8. Most importantly, allow write access for all users on the mount point
sudo chmod a+w /mnt/disks/local-ssd
  9. Create an /etc/fstab entry so the SSD is remounted automatically after a restart without failing the boot (nofail); a verification sketch follows
tkind@n1-highcpu-96:/mnt/disks/local-ssd$ cat /etc/fstab
LABEL=cloudimg-rootfs   /        ext4   defaults        0 0
LABEL=UEFI      /boot/efi       vfat    defaults        0 0

# append an entry for the NVMe SSD (use the device formatted above, here /dev/nvme0n1, not /dev/md0)
echo UUID=`sudo blkid -s UUID -o value /dev/nvme0n1` /mnt/disks/local-ssd ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab

tkind@n1-highcpu-96:/mnt/disks/local-ssd$ cat /etc/fstab
LABEL=cloudimg-rootfs   /        ext4   defaults        0 0
LABEL=UEFI      /boot/efi       vfat    defaults        0 0
UUID=<uuid of /dev/nvme0n1> /mnt/disks/local-ssd ext4 discard,defaults,nofail 0 2
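
The new entry can be verified without a reboot (a sketch):
# verify the entry: unmount (from outside the mount point), remount everything in /etc/fstab, and check
cd ~
sudo umount /mnt/disks/local-ssd
sudo mount -a
findmnt /mnt/disks/local-ssd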

Links:
GCP Showcase - Drug virtual screening with 16,000 CPU cores

Slurm on Google Cloud
Google Code Labs - Deploy an Auto-Scaling HPC Cluster with Slurm
Google Code Labs - Building Federated HPC Clusters with Slurm
Easy Slurm on GCP - GCP SLURM deployment blog post
SchedMD GCP - The developer of SLURM for GCP on GitHub
SLURM GCP - Discussion of SLURM on Google Cloud
Fluid Dynamics Slurm - Fluid Dynamics supported SLURM deployment on GCP
CloudyCluster - System for deployment of millions of CPU cores on the Google Cloud
CloudyCluster Guide - Deployment guide for Omnibond CloudyCluster

SLURM and SLURM array jobs
SLURM array - submitting a large number of jobs to slurm on FASRC cluster
Array jobs at PITT.edu - Submitting multiple jobs with arrays and wraps
Array jobs at Tntech - Submitting Groups of HPC Jobs with Job Arrays
Array job at SchedMD - Slurm Array job definitions at SchedMD
Array jobs at KU - SLURM array job examples at KU.edu
Array jobs at UFL - SLURM array job examples at UFL

Videos: