Switching to Slurm - statnet/computing GitHub Wiki
This page describes how to transition from ikt to mox, the newer generation of Hyak, which uses the Slurm scheduler. Most of this information is taken from:
Mox scheduler on Hyak wiki
https://wiki.cac.washington.edu/display/hyakusers/Mox_scheduler
Mox overview
https://wiki.cac.washington.edu/display/hyakusers/Hyak+HOWTO
https://wiki.cac.washington.edu/display/hyakusers/Hyak+mox+Overview
Using R on Slurm
http://www.arc.ox.ac.uk/content/running-r
https://wiki.cac.washington.edu/display/hyakusers/Hyak+R+programming
https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html
PBS to Slurm
https://hpc.nih.gov/docs/pbs2slurm.html
https://www.glue.umd.edu/hpcc/help/slurm-vs-moab.html
Here are 6 main differences from ikt:
- Mox is an entirely separate cluster. Mox and ikt share nothing with one another.
- You only get what you ask for, regardless of the resources available on the node. If you ask for 1 CPU, you'll only get one. If you ask for 1GB of RAM, you'll only get 1GB.
- An allocation won't get the same set of nodes all the time, just access to the number of nodes to which it is entitled.
- No occasional preemption in ckpt (formerly bf queue) for the moment.
- Preempted jobs get 10s to do something smart before being killed and requeued.
- Please report any problems to [email protected] with Hyak as the first word in the subject. Please also let us know you're using mox not ikt.
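The short grace period before a preempted job is killed can be used to save progress. The following is a minimal sketch of that pattern, not a confirmed Hyak recipe: it assumes the scheduler delivers SIGTERM at the start of the grace period (check the Hyak documentation for the actual signal), and `checkpoint.txt` and `run_job` are hypothetical names. The last few lines simulate a preemption locally by sending the signal by hand.

```shell
#!/bin/bash
# Sketch of a checkpoint-on-signal pattern.
# ASSUMPTION: the scheduler sends SIGTERM at the start of the grace period.
checkpoint_file="checkpoint.txt"   # hypothetical file name

run_job() {
  step=0
  # On SIGTERM, record progress and exit cleanly. A requeued run could
  # read this file to resume; that part is not shown here.
  trap 'echo "last_completed_step=$step" > "$checkpoint_file"; exit 0' TERM
  while [ "$step" -lt 1000 ]; do
    step=$((step + 1))
    sleep 1            # stand-in for one unit of real work
  done
}

# Local simulation of a preemption: start the job, then send it SIGTERM.
run_job &
job_pid=$!
sleep 2
kill -TERM "$job_pid"
wait "$job_pid"
echo "saved: $(cat "$checkpoint_file")"
```

The handler only runs between units of work, so each unit needs to be much shorter than the grace period for the checkpoint to be written in time.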
Logging in to Hyak
Old: ssh [email protected]
New: ssh [email protected]
Common functions
https://slurm.schedmd.com/rosetta.pdf
See jobs running
All
Old: showq
New: squeue
Allocation
squeue -p csde
Personal
squeue -u kweiss2
Backfill
squeue -p ckpt
Full Hyak allocation
hyakalloc
hyakalloc xyz
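When the queue listing gets long, it can help to summarize it. The sketch below counts your own jobs by state using `squeue -h -u <user> -o %T`. The helper name `count_by_state` is made up for this example, and the fallback branch uses invented sample states so the snippet also runs on a machine without Slurm.

```shell
# Turn a list of job states (one per line) into "STATE count" lines.
count_by_state() {
  sort | uniq -c | awk '{print $2, $1}'
}

me="${USER:-$(id -un)}"
if command -v squeue >/dev/null 2>&1; then
  # -h drops the header line, -o %T prints only the job state
  squeue -h -u "$me" -o %T | count_by_state
else
  # No Slurm here: demonstrate on made-up sample states instead
  printf 'RUNNING\nPENDING\nRUNNING\n' | count_by_state
fi
```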
Exit a mode
Old: logout
New: exit
Cancel a job
Single job (1234): scancel 1234
All jobs: scancel -u kweiss2
Copying files from ikt (old) to mox (new) on Hyak
You can copy files at high speed without a password between the Hyak systems using commands like the ones below. Here ikt is Hyak classic and mox is Hyak nextgen. Below, xyz is your group name (csde for me) and abc is your userid (kweiss2 for me). (If you are using a non-default PATH environment variable, you can find hyakbbcp at /sw/local/bin/hyakbbcp.)
From ikt to mox:
File: ikt1$ hyakbbcp myfile mox1.hyak.uw.edu:/gscratch/xyz/abc/mydir
Directory: ikt1$ hyakbbcp -r mydirectory mox1.hyak.uw.edu:/gscratch/xyz/abc/mydir
For me, this would be:
ikt1$ hyakbbcp myfile mox1.hyak.uw.edu:/gscratch/csde/kweiss2/sti
ikt1$ hyakbbcp -r sti mox1.hyak.uw.edu:/gscratch/csde/kweiss2/sti
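If you have several files and directories to move, a small dry-run helper can build the commands for you. This is only a sketch: `copy_cmd` is a hypothetical helper, `xyz` and `abc` are the placeholders used above, and the script prints the `hyakbbcp` commands rather than running them, so you can check them first.

```shell
group="xyz"      # placeholder group name
user="abc"       # placeholder userid
dest="mox1.hyak.uw.edu:/gscratch/${group}/${user}/mydir"

# Print the hyakbbcp command for one path, adding -r for directories.
copy_cmd() {
  if [ -d "$1" ]; then
    echo "hyakbbcp -r $1 $dest"
  else
    echo "hyakbbcp $1 $dest"
  fi
}

# Dry run: print one command per path given on the command line.
for path in "$@"; do
  copy_cmd "$path"
done
```

Saved as a script on ikt and called with a list of paths, this prints one command per path. Run the printed commands once they look right.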
Submit Jobs
Build
Interactive build node: srun -p build --time=2:00:00 --mem=100G --pty /bin/bash
Interactive node in own group: srun -p xyz -A xyz --time=2:00:00 --mem=100G --pty /bin/bash
Multiple nodes: srun -N 2 -p xyz -A xyz --time=2:00:00 --mem=100G --pty /bin/bash
Find names of allocated nodes: scontrol show hostnames
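Inside a multi-node session you may want to loop over the allocated nodes. A minimal sketch, assuming the standard `SLURM_JOB_NODELIST` environment variable is set inside the job; `list_nodes` is a made-up helper that falls back to the local host name when run outside Slurm.

```shell
# Print one allocated node name per line.
list_nodes() {
  if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    # Expand the compact node list into individual host names
    scontrol show hostnames "$SLURM_JOB_NODELIST"
  else
    # Outside a Slurm job: fall back to this machine's name
    uname -n
  fi
}

for node in $(list_nodes); do
  echo "allocated node: $node"
done
```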
Batch
sbatch -p xyz -A xyz myscript.slurm
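For reference, here is a sketch of what `myscript.slurm` might contain. The job name, output file pattern, and `myanalysis.R` are hypothetical, and `xyz` is the placeholder group name. Because you only get what you ask for, the nodes, time, and memory are requested explicitly. The snippet writes the script to disk and submits it only if `sbatch` is available.

```shell
# Write a hypothetical batch script; "xyz" is a placeholder group name.
cat > myscript.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --partition=xyz
#SBATCH --account=xyz
#SBATCH --nodes=1
#SBATCH --time=2:00:00
#SBATCH --mem=100G
#SBATCH --output=myjob_%j.out

module load r_3.3.3
Rscript myanalysis.R
EOF

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
  sbatch myscript.slurm
else
  echo "sbatch not found; wrote myscript.slurm without submitting"
fi
```

With the partition and account set inside the script, the `-p` and `-A` flags on the command line become optional. `%j` in the output file name is replaced by the job ID.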
Using R
Open up an interactive build node:
srun -p build --time=2:00:00 --mem=100G --pty /bin/bash
Find available modules and load:
module avail
module load r_3.3.3
Access R: R
Update packages: run update.packages(), choose a CRAN mirror, and then say yes to all of the options
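The package update can also be done without the prompts. The sketch below writes a one-line R script. The file name `update_packages.R` and the choice of the CRAN cloud mirror are assumptions made for this example. The snippet only creates the file and prints the command to run it.

```shell
# Write a small R script that updates packages non-interactively.
cat > update_packages.R <<'EOF'
# ask = FALSE accepts every update without prompting;
# repos picks the CRAN mirror up front (assumed: the cloud mirror).
update.packages(ask = FALSE, repos = "https://cloud.r-project.org")
EOF

echo "Run on the build node with: Rscript update_packages.R"
```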