How to Submit Jobs

Any non-trivial computational task should be submitted to the queueing system as a job. The job scheduler on Wynton will automatically find and allocate compute nodes for your job. Note: whether you submit from a login node or a development node, jobs always run on compute nodes.

There are a number of parameters that you should specify when submitting a job, including the maximum memory and maximum runtime. Sometimes it is necessary to specify the current working directory as well. Note: if your lab or institute has purchased dedicated compute nodes, your account should automatically be associated with a dedicated queue called member.q, which will ensure that your jobs run promptly.

Example job submission

qsub -l h_rt=00:01:00 -l mem_free=1G my_job.sge
  • -l h_rt = maximum runtime (hh:mm:ss or seconds)
  • -l mem_free = maximum memory (K|M|G)
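
Equivalently, those options can be embedded in the job script itself using #$ directives (see the scheduler tables below). A minimal sketch of what my_job.sge might contain; the echo and sleep lines are placeholders for real work:

#!/bin/bash
#$ -l h_rt=00:01:00   # maximum runtime (hh:mm:ss)
#$ -l mem_free=1G     # maximum memory
#$ -cwd               # run in the current working directory
echo "Running on $(hostname)"   # placeholder workload
sleep 30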

Checking on a job

  • Current status: qstat or qstat -j 191442 (replace the job id with your own)
  • After a successful job: grep "usage" my_job.sge.o2854740 (replace the output file name with your own)
  • After a failed job: tail -100000 /opt/sge/wynton/common/accounting | qacct -f - -j 191442 (replace the job id with your own)

Additional tips...

  • For intensive jobs during busy times, you can ask the scheduler to reserve resources for your job as they become available by adding the parameter -R y.
  • Compute nodes DO NOT have access to the internet, i.e., you cannot run jobs that include steps like downloading files from online resources. Dev nodes DO have access to the internet. If your script or pipeline requires internet access, consider splitting up the work (this is how Nextflow works, by the way): run a script on a dev node that retrieves the online files and then submits jobs to be run on compute nodes; see the sketch after this list. Voilà!
  • Similar to the previous point, you can run cron jobs on a dev node to periodically download files separately from the compute-heavy jobs that you submit to compute nodes.
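
A minimal sketch of the dev-node pattern described above; the URL and the job script name are hypothetical placeholders:

# Run this on a dev node, where internet access is available.
wget -O input.tar.gz https://example.org/dataset.tar.gz   # hypothetical URL
tar -xzf input.tar.gz
# The compute-heavy step then runs on a compute node.
qsub -l h_rt=01:00:00 -l mem_free=4G process_data.sge     # hypothetical script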

Resource Availability

Check the status of Wynton HPC and the availability of compute nodes: Wynton Status Charts

To check dedicated compute resources...

  • Availability of cores for your member.q, e.g., gladstone
qquota -u "*" | grep gladstone
  • Availability of disk space for your group members, e.g., gladstone
beegfs-ctl --getquota --gid gladstone

Scheduling Jobs on Gladstone Clusters

There are two job schedulers you might come across on Wynton:

  • SGE/UGE (until Q3 2020) ← current option on Wynton
  • Slurm (starting Q3 2020)

Basic scheduler commands:

Action                     SGE/UGE                  Slurm
Job submission             qsub [script_file]       sbatch [script_file]
Job deletion               qdel [job_id]            scancel [job_id]
Job status by job          qstat -j [job_id]        squeue -j [job_id]
Job status by user         qstat [-u user_name]     squeue -u [user_name]
Job hold                   qhold [job_id]           scontrol hold [job_id]
Job release (from hold)    qrls [job_id]            scontrol release [job_id]
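
To make the mapping concrete, a quick sketch of equivalent commands under each scheduler (the script name and job id are placeholders):

qsub my_script.sh     # SGE/UGE: submit a job
sbatch my_script.sh   # Slurm: submit a job
qstat -j 191442       # SGE/UGE: check a job's status
squeue -j 191442      # Slurm: check a job's status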

Job-specific commands:

Action                          SGE/UGE                             Slurm
Script directive                #$                                  #SBATCH
Queue                           -q [queue]                          -p [queue]
Count of nodes                  N/A                                 -N [min[-max]]
CPU count                       -pe [PE] [count]                    -n [count]
Wall clock limit                -l h_rt=[seconds]                   -t [min] OR -t [days-hh:mm:ss]
Standard out file               -o [file_name]                      -o [file_name]
Standard error file             -e [file_name]                      -e [file_name]
Combine STDOUT & STDERR files   -j yes                              (use -o without -e)
Copy environment                -V                                  --export=[ALL|NONE|variables]
Event notification              -m abe                              --mail-type=[events]
Notification email              -M [address]                        --mail-user=[address]
Job name                        -N [name]                           --job-name=[name]
Restart job                     -r [yes|no]                         --requeue OR --no-requeue (note: the default is configurable)
Working directory               -wd [directory] OR -cwd (current)   --workdir=[dir_name]
Resource sharing                -l exclusive                        --exclusive OR --shared
Memory size                     -l mem_free=[memory][K|M|G]         --mem=[mem][M|G|T] OR --mem-per-cpu=[mem][M|G|T]
Job dependency                  -hold_jid [job_id | job_name]       --dependency=[state:job_id]
Generic resources               -l [resource]=[value]               --gres=[resource_spec]
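
As an illustration of the script-directive row, here is how the same queue and runtime requests might be embedded at the top of a job script for each scheduler (the queue name and limit are examples only):

SGE/UGE:
#!/bin/bash
#$ -q my_lab
#$ -l h_rt=00:20:00

Slurm:
#!/bin/bash
#SBATCH -p my_lab
#SBATCH -t 0-00:20:00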

Example Submissions

I want to submit a script, "my_script.sh", to the queue "my_lab", requesting 4GB of memory and 10 cores. I expect my job to run for no more than 20 minutes, and I need 5GB of scratch space.

login_server $ qsub my_script.sh
login_server $ qsub -q my_lab -l mem_free=4g -l h_rt=00:20:00 -l scratch=5g -pe smp 10 my_script.sh

The first command submits with the scheduler's defaults; the second makes every request explicit. Breaking that down:

  • -q my_lab: submit to the queue "my_lab"
  • -l mem_free=4g: request 4GB of memory
  • -l h_rt=00:20:00: request a maximum runtime of 20 minutes
  • -l scratch=5g: request 5GB of local scratch space
  • -pe smp 10: request 10 cores via the "smp" parallel environment
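
Alternatively, those same requests can be embedded in the script itself, after which the bare qsub my_script.sh shown first is sufficient. A sketch of what my_script.sh might look like; the final line is a placeholder for the actual workload:

#!/bin/bash
#$ -q my_lab          # queue
#$ -l mem_free=4g     # 4GB of memory
#$ -l h_rt=00:20:00   # 20-minute runtime limit
#$ -l scratch=5g      # 5GB of scratch space
#$ -pe smp 10         # 10 cores in the "smp" parallel environment
./run_analysis        # placeholder for the actual workload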

How to troubleshoot

  • Check the error log: unless otherwise specified, this will be in the directory you launched your job from and will be named after the job script followed by .e and the job id, e.g., my_job.sge.e2854740
  • Check the output log: unless otherwise specified, this will be in the directory you launched your job from and will be named after the job script followed by .o and the job id, e.g., my_job.sge.o2854740
  • Send an email to [email protected] for quick responses to Wynton questions.