Running Jobs - calab-ntu/gpu-cluster GitHub Wiki
This page covers the following topics:
Common Commands
- Submit a job: `qsub YOUR_JOB_SCRIPT`, which will return a unique `JOB_ID` (e.g., `7235`). See qsub documentation for details.
- Query jobs: `qstat [-n]`, `showq`, `node`
- Delete a job: `qdel JOB_ID`
- Remove a job dependency: `qalter -W depend=afterany JOB_ID`
- Interactive queue:
  - Eureka: `qsub -X -I -lnodes=1:ppn=16 -lwalltime=4:0:0`
  - Spock: `qsub -X -I -lnodes=1:ppn=32 -lwalltime=4:0:0`
- Query node status: `pbsnodes NODE_NAME`
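
The submit and query commands can be combined in a small script. The following is a minimal sketch, assuming that `qsub` may print the job ID with a server suffix (e.g., `7235.SERVER_NAME`); the suffix format and the helper name `job_number` are assumptions for illustration, not something this page specifies.

```shell
# Hypothetical helper: strip an optional ".SERVER_NAME" suffix from a job ID.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Only contact the scheduler when qsub and the job script are available.
if command -v qsub >/dev/null 2>&1 && [ -f YOUR_JOB_SCRIPT ]; then
    JOB_ID=$(qsub YOUR_JOB_SCRIPT)      # submit and capture the returned ID
    echo "Submitted job $(job_number "$JOB_ID")"
    qstat -n "$JOB_ID"                  # show the job and its assigned nodes
    # qdel "$JOB_ID"                    # uncomment to delete the job
fi
```

Capturing the ID at submission time avoids having to look it up again with `qstat` before calling `qdel` or `qalter`.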
Example Job Scripts
Eureka
#!/bin/bash
#PBS -N YOUR_JOB_NAME
#PBS -M YOUR_EMAIL
#PBS -m abe
#PBS -q workq
#PBS -k n
#PBS -l walltime=4:00:00
#PBS -l nodes=1:ppn=16
cd $PBS_O_WORKDIR
mpirun -map-by ppr:16:socket:pe=1 ./YOUR_EXECUTABLE 1>>stdout 2>>stderr
Spock
#!/bin/bash
#PBS -N YOUR_JOB_NAME
#PBS -M YOUR_EMAIL
#PBS -m abe
#PBS -q workq
#PBS -k n
#PBS -l walltime=4:00:00
#PBS -l nodes=1:ppn=32
cd $PBS_O_WORKDIR
mpirun -map-by ppr:32:socket:pe=1 ./YOUR_EXECUTABLE 1>>stdout 2>>stderr
echo "Terminating CUDA MPS server"
mpirun -map-by ppr:1:node:pe=1 kill_nvidia_MPS_local.sh
- `#PBS -N` → Your job name
- `#PBS -M` → Your email address
- `#PBS -m` → Email notification
- `#PBS -q` → Job queue (see Job Queues)
- `#PBS -k` → Whether or not to retain standard output and/or standard error on the execution host
- `#PBS -l walltime` → Total wall time
- `#PBS -l nodes=X:ppn=Y` → Number of nodes (`nodes=X`) and processors per node (`ppn=Y`)
- `ppr:n` → Launch `n` MPI processes on each node
- `pe=m` → Bind `m` threads per MPI process
Caution:
- Always use `ppn=16` on eureka and `ppn=32` on spock to request the entire node and avoid job conflicts.
- Ensure `n*m=16` on eureka and `n*m=32` on spock.
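
The constraint `n*m=16` (eureka) or `n*m=32` (spock) can be checked before launching. Below is a minimal sketch, assuming the same `ppr:n:socket:pe=m` form used in the example scripts; the function name `map_by_arg` is hypothetical.

```shell
# Hypothetical helper: given the cores per node and the number of MPI
# processes n, print a -map-by argument with pe=m chosen so that n*m = cores.
map_by_arg() (
    cores=$1
    n=$2
    if [ "$n" -le 0 ] || [ $((cores % n)) -ne 0 ]; then
        echo "error: $n MPI processes do not evenly divide $cores cores" >&2
        exit 1
    fi
    echo "ppr:${n}:socket:pe=$((cores / n))"
)

map_by_arg 16 2    # eureka: 2 processes x 8 threads
map_by_arg 32 4    # spock:  4 processes x 8 threads
```

In a job script the result could be passed on directly, e.g. `mpirun -map-by "$(map_by_arg 16 2)" ./YOUR_EXECUTABLE`. The helper refuses combinations that would leave cores unused or oversubscribed.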
Example job scripts for running GAMER:
Job Queues
Use `#PBS -q queue_name` to specify a queue.
| Queue Name | Property |
|---|---|
| workq | Daily usage (default queue) |
| stableq | Simulations requiring high stability |
| unstableq | Unstable nodes (use with care) |
| allq | All nodes |
| rtx3080tiq | A single node (eureka32) with an NVIDIA GeForce RTX 3080 Ti GPU (set GPU_ARCH=AMPERE in GAMER) |
| gtx1080tiq | A single node (eureka33) with two NVIDIA GeForce GTX 1080 Ti GPUs (set GPU_ARCH=PASCAL in GAMER) |
| maintenanceq | Maintenance only |
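
The two GPU-specific queues each imply a `GPU_ARCH` setting in GAMER. The following is a minimal sketch of that mapping, taken only from the table above; the function name `gpu_arch_for_queue` is hypothetical, and queues without a stated architecture make it return a non-zero status.

```shell
# Hypothetical helper: print the GAMER GPU_ARCH value listed for a queue.
gpu_arch_for_queue() {
    case "$1" in
        rtx3080tiq) echo "AMPERE" ;;   # eureka32, RTX 3080 Ti
        gtx1080tiq) echo "PASCAL" ;;   # eureka33, two GTX 1080 Ti
        *)          return 1 ;;        # architecture not listed on this page
    esac
}

gpu_arch_for_queue rtx3080tiq
```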
Monitoring Jobs
node: Monitor node