Using the Cluster GPU - dkoes/docs GitHub Wiki

Cluster Etiquette

When running on the cluster, remember that this is a SHARED resource. As such, you don't want to be hogging the queue.

This is typically accomplished by using a % modifier on job arrays, which sets the maximum number of jobs from said array that can be running on the cluster. Typical restrictions are 20 for GPU jobs.

Another snag concerns network load. As an example lots of rsync commands for large files will bog down the server (there is only so much bandwidth afterall). Try not to have more than 20 jobs rsyncing large files.

Launch an interactive session

srun --gres=gpu:1 --pty -p dept_gpu /bin/bash -i

This will request one GPU from the dept_gpu queue.

Launch a batch job

You should only use interactive jobs for debugging in testing. For actually running time consuming jobs you should launch a batch job as a bash script.

sbatch --gres=gpu:1 -p dept_gpu myscript.slurm

Additional configuration can be done by setting special variables within the slurm script (see below and the department cluster documentation for examples).

Queues

dept_gpu All departmental GPU nodes. Anyone can use.

any_gpu Job will be scheduled on any available GPU (including nodes in group queues), but has a time limit of 24 hours. Jobs can be preempted (killed and restarted) if the node is needed by a job in the regular queue. Use this is you have a lot of short jobs or jobs with frequent checkpoints.

Node Features

The following aren't queues, but node features that can be specified as constraints, even using boolean operators (e.g. -C "C6&M12")

M12: GPU memory >=12 GB (excludes 11GB cards)

gtx1080Ti, Volta, TitanV, etc: Specific class of video card.

Resource Requests

You can specify a specific node. Make sure that node is in the queue you are submitting to. If you specify a list of nodes your job will be allocated to all of them (not just one of them - so don't do this).

-w n154

You can also exclude hosts:

-x n079,n072

If your job requires a lot of memory, you should request a memory reservation. This prevents it from being scheduled on a node without enough memory and getting killed by the out of memory (oom) killer. If you job uses more than the requested amount of memory, it will get killed (even if there is enough memory available on the machine), so be conservative in your estimate of how much is needed.

--mem 4G

Array Jobs

Array jobs launch many jobs using the same script, changing only the SLURM_ARRAY_TASK_ID variable. These are good ways to launch large numbers of jobs without overburdening the queue system. You might use this to run three replicates of an MD simulation using the same run script. It also supports an easy pattern from running many commands listed in a text file (one per a line):

SLURM Script

#!/bin/bash
#SBATCH --job I_forgot_to_name_my_job 
#SBATCH --partition=dept_gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:1

echo Running on `hostname`
echo workdir $SLURM_SUBMIT_DIR
echo ld_library_path $LD_LIBRARY_PATH

cd $SLURM_SUBMIT_DIR

#the following sets up your environment for running caffe/gnina
export PATH=/net/pulsar/home/koes/dkoes/local/bin:$PATH
export LD_LIBRARY_PATH=/net/pulsar/home/koes/dkoes/local/lib
export PYTHONPATH=/net/pulsar/home/koes/dkoes/local/python

module load cuda

#if necessary make a scr drive and copy files to it

cmd=`sed -n "${SLURM_ARRAY_TASK_ID}p" your_cmds.txt`
eval $cmd

Command file, `your_cmds.txt`:

One command per line. Line numbers are 1-indexed.

python train.py --do-stuff
python train.py --do-stuff
...

Launch jobs 1-24 while only letting a maximum of 5 run at a time of an array job script

sbatch -a 1-24%5 your_array_job_script.slurm

Can specify individual jobs sbatch -a 12,14 your_array_job_script.slurm

Updating the maximum number of running jobs from an array

scontrol update ArrayTaskThrottle=<my maximum number of jobs> JobId=<my array id>

MD Simulation script

The following script assumes a simulation was prepared using prepareamber.py conventions and the base name of the simulation is identical to the directory it is stored in.

#!/bin/bash

#SBATCH --job I_forgot_to_name_my_jobs
#SBATCH --nodes=1
#SBATCH --partition=dept_gpu
#SBATCH --gres=gpu:1


echo Running on `hostname`
echo workdir $SLURM_SUBMIT_DIR
echo ld_library_path $LD_LIBRARY_PATH

cd $SLURM_SUBMIT_DIR

#scratch drive folder to work in
SCRDIR=/scr/${SLURM_JOB_ID}

module load amber/22

#if the scratch drive doesn't exist (it shouldn't) make it.
if [[ ! -e $SCRDIR ]]; then
        mkdir $SCRDIR
fi

chmod +rX $SCRDIR

echo scratch drive ${SCRDIR}

cp $SLURM_SUBMIT_DIR/*.in ${SCRDIR}
cp $SLURM_SUBMIT_DIR/*.prmtop ${SCRDIR}
cp $SLURM_SUBMIT_DIR/*_md2.rst ${SCRDIR}
cp $SLURM_SUBMIT_DIR/*.inpcrd ${SCRDIR}

cd ${SCRDIR}

#setup to copy files back to working dir on exit
trap "mv *md3.nc $SLURM_SUBMIT_DIR" EXIT

#run the MD, default to name of directory  CHANGE THIS IF YOUR DIRECTORY ISN'T YOUR PREFIX
prefix=${SLURM_SUBMIT_DIR##*/}
pmemd.cuda -O -i ${prefix}_md3.in -o $SLURM_SUBMIT_DIR/${prefix}_md3.out -p ${prefix}.prmtop -c ${prefix}_md2.rst -r ${prefix}_md3.rst -x ${prefix}_md3.nc -inf $SLURM_SUBMIT_DIR/mdinfo

It is common to want to run multiple simulations of the same system. This can be easily accomplished by using an array job (sbatch -a 1-3) and modifying the last line above to:

pmemd.cuda -O -i ${prefix}_md3.in -o $SLURM_SUBMIT_DIR/${prefix}_${SLURM_ARRAY_TASK_ID}_md3.out -p ${prefix}.prmtop -c ${prefix}_md2.rst -r ${prefix}_${SLURM_ARRAY_TASK_ID}_md3.rst -x ${prefix}_${SLURM_ARRAY_TASK_ID}_md3.nc -inf $SLURM_SUBMIT_DIR/mdinfo.${SLURM_ARRAY_TASK_ID}

Note this only has the desired result if the amber configuration sets initial velocities randomly (irest=0).

Checking job status

alias q='squeue -o "%10i %11P %12j %8u %2t %10M %9l %3D %3C %R"'
alias qd='squeue -u dkoes -o "%10i %11P %12j %8u %2t %10M %9l %3D %3C %R"'
alias qg='squeue -o "%10i %11P %12j %8u %2t %10M %9l %3D %3C %R" -p dept_gpu,any_gpu'
alias g='~dkoes/git/scripts/slurm_gpus.py'
alias gd='~dkoes/git/scripts/slurm_gpus.py dept_gpu'

These are my aliases to get nicely formatted and filtered queue info. q shows all the jobs in the q, qd shows only mine, and qg shows only the GPU jobs.


/net/pulsar/home/koes/dkoes/git/scripts/slurm_gpus.py

Shows per-GPU status. Can limit output to a provided queue (e.g. slurm_gpus.py dept_gpu)

Checking output

Upon completion anything printed to stdout/stderr should be copied into a file slurm-<jobid>.out. This is the first place to look for errors when jobs end abruptly.

Running Caffe

You will need to setup your local environment:

module load cuda
export PYTHONPATH=$PYTHONPATH:/net/pulsar/home/koes/dkoes/local/lib/python3.6/site-packages
export LD_LIBRARY_PATH=/net/pulsar/home/koes/dkoes/local/lib:$LD_LIBRARY_PATH

Running pytorch on g019 (A100 node)

Node g019 contains four NVIDIA A100 GPUs that are very fast and have 40gb of memory each. However, they use a newer CUDA architecture (sm_80) that requires CUDA 11. If you try to run pytorch using these GPUs when you have an older version of CUDA loaded, you will get the following error message:

A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

Then, get the approprate command from the pytorch website to update to the latest stable version for CUDA 11.1. As of writing this, that command is:

pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html