Google Cloud Platform (GCP)
Instructions for launching a SLURM cluster on GCP: https://cloud.google.com/solutions/deploying-slurm-cluster-compute-engine. This requires activating the free cloud shell (upper right corner of the console). Additional tools such as Midnight Commander and custom key bindings have to be configured inside the cloud shell, and appropriate ssh keys may have to be available.
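The cloud shell is a Debian-based VM, so the extra tools can be installed with apt; a minimal sketch, assuming the stock Debian package names:
# install Midnight Commander inside the cloud shell
sudo apt-get update && sudo apt-get install -y mc
# generate an ssh key pair if none exists yet
ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ""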
- Clone the SchedMD slurm-gcp repository from within the cloud shell
git clone https://github.com/SchedMD/slurm-gcp.git
>ls -l
>drwxr-xr-x 6 tkind tkind 4096 May 5 13:15 slurm-gcp
- Define the configuration for the SLURM deployment. Zones matter for price as well as for availability of services; not every zone offers every machine type or the full computational capacity. California, for example, has higher energy prices and is therefore more expensive on a daily basis. Recommended US zones are us-central1-a and us-central1-b. Available zones can be checked as shown below.
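To see which zones a region offers and whether they are up, the zone list can be filtered by region; a quick check (the region name is just an example):
# list all zones in us-central1 together with their status
gcloud compute zones list --filter="region:us-central1"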
# a unique name for your cluster deployment, e.g. "small-deployment"
export CLUSTER_DEPLOY_NAME="may-deployment"
# a unique name for your cluster, e.g. "small-cluster"
export CLUSTER_NAME="small-cluster"
# the region where the cluster is deployed, e.g. "us-central1"
export CLUSTER_REGION="us-central1"
# the zone suffix within that region, e.g. "a", "b", "c"
export CLUSTER_ZONE="b"
- Switch into the slurm-gcp directory and copy the YAML configuration file to a new file (here may-deployment.yaml)
cd slurm-gcp
cp slurm-cluster.yaml ${CLUSTER_DEPLOY_NAME}.yaml
>ls
>CONTRIBUTING.md etc LICENSE may-deployment.yaml README.md
>scripts slurm-cluster.yaml slurm.jinja slurm.jinja.schema tf
- Check the quotas in the Google Cloud console. With a limit of, for example, 10 high-memory nodes, the cluster cannot exceed that limit unless a quota increase is requested; depending on the account details and the limits of the specific region, such a request may be declined. US regions and their zones:
# us-central1 a, b, c, f Council Bluffs, Iowa, USA
# us-east1 b, c, d Moncks Corner, South Carolina, USA
# us-east4 a, b, c Ashburn, Northern Virginia, USA
# us-west1 a, b, c The Dalles, Oregon, USA
# us-west2 a, b, c Los Angeles, California, USA
# us-west3 a, b, c Salt Lake City, Utah, USA
# us-west4 a, b, c Las Vegas, Nevada, USA
# output current quotas to the console
gcloud compute regions describe us-central1
# save current quotas for regions of interest in date stamped file
gcloud compute regions describe us-west1 > us-west1-may5-2020.txt
gcloud compute regions describe us-central1 > us-central1-may5-2020.txt
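The raw describe output is long; gcloud's generic --flatten and --format flags can condense it into a quota table. A sketch, assuming the standard gcloud formatting syntax:
# tabulate quota metric, current usage and limit for one region
gcloud compute regions describe us-central1 --flatten=quotas \
  --format="table(quotas.metric,quotas.usage,quotas.limit)"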
- Edit the new YAML file (here may-deployment.yaml) for the environment. The available types and defaults are defined in the file slurm.jinja.schema. For cost savings and definitions, check the preemptible instance guide: https://cloud.google.com/compute/docs/instances/preemptible. The default_users field may also have to be modified.
# show current edits
cat may-deployment.yaml
# edit the following variables
cluster_name : small-cluster
zone : us-central1-b
controller_machine_type : n1-standard-4
login_machine_type : n1-standard-4
compute_image_machine_type : n1-highcpu-96
controller_disk_size_gb : 20
partitions :
- name : work
  machine_type : n1-highcpu-96
  max_node_count : 10
  zone : us-central1-b
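A quick way to confirm that the edits took effect is to grep the new file for the customized keys (file name taken from the example above):
# show only the customized lines
grep -E "cluster_name|zone|machine_type|max_node_count|disk_size" may-deployment.yaml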
- Check in the cloud shell that the current configuration is OK.
gcloud config get-value core/project
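If this returns (unset) or the wrong project, point gcloud at the right one first; <your-project-id> is a placeholder:
# select the project that will be billed for the cluster
gcloud config set project <your-project-id>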
- Deploy the modified YAML configuration from the cloud shell. Verify first that the environment variables are set correctly (check with env).
gcloud deployment-manager deployments create $CLUSTER_DEPLOY_NAME \
    --project="$(gcloud config get-value core/project)" \
    --config ${CLUSTER_DEPLOY_NAME}.yaml
which should bring up a message similar to:
Your active configuration is: [cloudshell-XXX]
The fingerprint of the deployment is 'XXX'
Waiting for create [operation-XXX]...done.
Create operation operation-XXX completed successfully.
NAME TYPE STATE ERRORS INTENT
small-cluster-all-internal-firewall-rule compute.v1.firewall IN_PREVIEW [] CREATE_OR_ACQUIRE
small-cluster-allow-iap compute.v1.firewall IN_PREVIEW [] CREATE_OR_ACQUIRE
small-cluster-compute-0-image compute.v1.instance IN_PREVIEW [] CREATE_OR_ACQUIRE
small-cluster-controller compute.v1.instance IN_PREVIEW [] CREATE_OR_ACQUIRE
small-cluster-login0 compute.v1.instance IN_PREVIEW [] CREATE_OR_ACQUIRE
small-cluster-network compute.v1.network IN_PREVIEW [] CREATE_OR_ACQUIRE
small-cluster-us-central1 compute.v1.subnetwork IN_PREVIEW [] CREATE_OR_ACQUIRE
small-cluster-us-central1-router compute.v1.router IN_PREVIEW [] CREATE_OR_ACQUIRE
- Check the current deployment in the Deployment Manager. Be aware that a cluster deployment takes a while and can also fail silently, so select the SLURM deployment inside the Deployment Manager and investigate the individual resources. If the cluster was only previewed (--preview) and is not deployed yet, it has to be deployed either in the Deployment Manager or via the cloud shell (see the commands after the import list below). The config view of the deployment shows the following imports:
etc/cgroup.conf.tpl
etc/compute-fluentd.conf.tpl
etc/controller-fluentd.conf.tpl
etc/slurm.conf.tpl
etc/slurmdbd.conf.tpl
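A previewed deployment (the IN_PREVIEW states above) can also be inspected and committed from the cloud shell; a sketch using the standard Deployment Manager commands:
# show the state of the deployment and its resources
gcloud deployment-manager deployments describe $CLUSTER_DEPLOY_NAME
# commit a previewed deployment (update without --config executes the preview)
gcloud deployment-manager deployments update $CLUSTER_DEPLOY_NAME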
- Log into the login node (check the instance name as well as the region and zone; the controller node can be reached the same way)
gcloud compute ssh ${CLUSTER_NAME}-login0 --zone ${CLUSTER_REGION}-${CLUSTER_ZONE}
- Perform work, e.g. check the cluster and node state with sinfo
sinfo
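Beyond sinfo, a short smoke test confirms that compute nodes actually spin up; a minimal sketch with standard SLURM commands (the node count is an example, and the first call can take minutes while nodes boot):
# run hostname on two compute nodes
srun -N2 hostname
# watch pending and running jobs
squeue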
- Delete the cluster when done. The name to delete is the deployment name chosen at creation time, not the cluster name:
gcloud deployment-manager deployments delete $CLUSTER_DEPLOY_NAME
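If the deployment name is unclear, list the existing deployments first and delete by the name shown there:
# list all Deployment Manager deployments in the current project
gcloud deployment-manager deployments list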
Mounting local SSDs on VMs
Mounting local NVMe or SCSI SSDs on VMs requires some legwork (https://cloud.google.com/compute/docs/disks/local-ssd), including formatting and mounting the disks before they can be used. Local SSDs serve as scratch space, so their contents may be erased when the VM is powered down; backup options may help here.
- Show existing drives and folders
tkind@n1-highcpu-96:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 43G 0 43G 0% /dev
tmpfs 8.5G 1.3M 8.5G 1% /run
/dev/sda1 9.6G 1.5G 8.1G 15% /
tmpfs 43G 0 43G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 43G 0 43G 0% /sys/fs/cgroup
/dev/sda15 105M 3.6M 101M 4% /boot/efi
/dev/loop0 94M 94M 0 100% /snap/core/9066
/dev/loop1 55M 55M 0 100% /snap/core18/1754
/dev/loop2 99M 99M 0 100% /snap/google-cloud-sdk/129
tmpfs 8.5G 0 8.5G 0% /run/user/1001
- Show all block devices
tkind@n1-highcpu-96:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 93.9M 1 loop /snap/core/9066
loop1 7:1 0 55M 1 loop /snap/core18/1754
loop2 7:2 0 98.4M 1 loop /snap/google-cloud-sdk/129
sda 8:0 0 10G 0 disk
├─sda1 8:1 0 9.9G 0 part /
├─sda14 8:14 0 4M 0 part
└─sda15 8:15 0 106M 0 part /boot/efi
nvme0n1 259:0 0 375G 0 disk
- Format the disk with the ZFS or ext4 filesystem (see the filesystem benchmark for a comparison); here ext4 is used
tkind@n1-highcpu-96:~$ sudo mkfs.ext4 -F /dev/nvme0n1
mke2fs 1.44.1 (24-Mar-2018)
Discarding device blocks: done
Creating filesystem with 98304000 4k blocks and 24576000 inodes
Filesystem UUID: 1c52a58d-19bd-4dc8-a69b-40dabb08e0eb
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
- Create a directory to mount, here the name is "local-ssd"
sudo mkdir -p /mnt/disks/local-ssd
- Mount the disk (here nvme0n1) to the directory (/mnt/disks/local-ssd) and confirm with df -H or df -h
sudo mount /dev/nvme0n1 /mnt/disks/local-ssd
tkind@n1-highcpu-96:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 43G 0 43G 0% /dev
tmpfs 8.5G 1.3M 8.5G 1% /run
/dev/sda1 9.6G 1.5G 8.1G 15% /
tmpfs 43G 0 43G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 43G 0 43G 0% /sys/fs/cgroup
/dev/sda15 105M 3.6M 101M 4% /boot/efi
/dev/loop0 94M 94M 0 100% /snap/core/9066
/dev/loop1 55M 55M 0 100% /snap/core18/1754
/dev/loop2 99M 99M 0 100% /snap/google-cloud-sdk/129
tmpfs 8.5G 0 8.5G 0% /run/user/1001
/dev/nvme0n1 369G 69M 350G 1% /mnt/disks/local-ssd
- Check the NVMe SSD read speed using hdparm (requires sudo)
tkind@n1-highcpu-96:/mnt/disks/local-ssd$ sudo hdparm -t /dev/nvme0n1
/dev/nvme0n1:
Timing buffered disk reads: 2114 MB in 3.00 seconds = 704.12 MB/sec
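hdparm only measures buffered sequential reads. For random-I/O numbers closer to a real workload, fio can be used; a sketch (fio may have to be installed first, and the test file name is arbitrary):
# install fio
sudo apt-get install -y fio
# 30-second 4k random-read test against the mounted SSD
sudo fio --name=randread --filename=/mnt/disks/local-ssd/fio.test \
  --size=1G --bs=4k --rw=randread --ioengine=libaio --iodepth=32 \
  --direct=1 --runtime=30 --time_based --group_reporting
# remove the test file afterwards
sudo rm /mnt/disks/local-ssd/fio.test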
- Check disk throughput and latency
# check throughput for standard tmp drive
>dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.92531 s, 108 MB/s
# check latency for standard tmp drive
>dd if=/dev/zero of=/tmp/test2.img bs=512 count=1000 oflag=dsync
512000 bytes (512 kB, 500 KiB) copied, 1.66345 s, 308 kB/s
# check throughput for the NVMe SSD
tkind@n1-highcpu-96:/mnt/disks/local-ssd$ sudo dd if=/dev/zero of=/mnt/disks/local-ssd/test1.img bs=1G count=1 oflag=dsync
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 3.76973 s, 285 MB/s
# check latency for the NVMe SSD (surprisingly, these small synchronous writes are slower than on the standard disk)
tkind@n1-highcpu-96:/mnt/disks/local-ssd$ sudo dd if=/dev/zero of=/mnt/disks/local-ssd/test2.img bs=512 count=1000 oflag=dsync
512000 bytes (512 kB, 500 KiB) copied, 7.3838 s, 69.3 kB/s
# remove the test files from both locations
rm /tmp/test1.img /tmp/test2.img
sudo rm /mnt/disks/local-ssd/test1.img /mnt/disks/local-ssd/test2.img
- Most importantly, allow access for all users
sudo chmod a+w /mnt/disks/local-ssd
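If multiple users share the scratch space, mimicking the /tmp permissions may be preferable, so everyone can write but nobody can delete another user's files; a possible alternative:
# world-writable with the sticky bit set, like /tmp
sudo chmod 1777 /mnt/disks/local-ssd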
- Create an /etc/fstab entry so the SSD is mounted again after a restart (nofail keeps the boot from failing if the disk is missing). The device queried with blkid must be the one that was formatted, here /dev/nvme0n1 (the GCP docs use /dev/md0 only for a RAID of several local SSDs)
tkind@n1-highcpu-96:/mnt/disks/local-ssd$ cat /etc/fstab
LABEL=cloudimg-rootfs / ext4 defaults 0 0
LABEL=UEFI /boot/efi vfat defaults 0 0
tkind@n1-highcpu-96:/mnt/disks/local-ssd$
echo UUID=`sudo blkid -s UUID -o value /dev/nvme0n1` /mnt/disks/local-ssd ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab
tkind@n1-highcpu-96:/mnt/disks/local-ssd$ cat /etc/fstab
LABEL=cloudimg-rootfs / ext4 defaults 0 0
LABEL=UEFI /boot/efi vfat defaults 0 0
UUID=1c52a58d-19bd-4dc8-a69b-40dabb08e0eb /mnt/disks/local-ssd ext4 discard,defaults,nofail 0 2
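The new entry can be sanity-checked without a reboot; if mount -a returns without errors, the fstab line parses and any unmounted entries were mounted:
# re-read /etc/fstab and mount anything not mounted yet
sudo mount -a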
Links:
GCP Showcase - Drug virtual screening with 16,000 CPU cores
Slurm on Google Cloud
Google Code Labs - Deploy an Auto-Scaling HPC Cluster with Slurm
Google Code Labs - Building Federated HPC Clusters with Slurm
Easy SLURM on GCP - GCP SLURM deployment blog post
SchedMD GCP - The developers of SLURM for GCP on GitHub
SLURM GCP - Discussion of SLURM on Google Cloud
Fluid Dynamics Slurm - Fluid Dynamics supported SLURM deployment on GCP
CloudyCluster - System for deployment of millions of CPU cores on the Google Cloud
CloudyCluster Guide - Deployment guide for Omnibond CloudyCluster
SLURM and SLURM array jobs
SLURM array - submitting a large number of jobs to slurm on FASRC cluster
Array jobs at PITT.edu - Submitting multiple jobs with arrays and wraps
Array jobs at Tntech - Submitting Groups of HPC Jobs with Job Arrays
Array job at SchedMD - Slurm Array job definitions at SchedMD
Array jobs at KU - SLURM array job examples at KU.edu
Array jobs at UFL - SLURM array job examples at UFL
Videos: