How to Submit Jobs
For any non-trivial computational task, you will want to submit it to the queueing system as a job. The job scheduler on Wynton will automatically find and allocate compute nodes for your job. Note: whether you submit from a login node or a development node, jobs always run on compute nodes.
There are a number of parameters that you should specify when submitting a job, including the maximum memory and maximum runtime. Sometimes it is also necessary to specify the current working directory. Note: if your lab or institute has purchased dedicated compute nodes, your account should automatically be associated with a dedicated queue called member.q, which will ensure that your jobs run promptly.
qsub -l h_rt=00:01:00 -l mem_free=1G my_job.sge
- `-l h_rt` = maximum runtime (hh:mm:ss or seconds)
- `-l mem_free` = maximum memory (K|M|G)
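As a minimal sketch, the same resource requests can also be embedded in the job script itself using `#$` directives (the SGE script-directive prefix listed in the table further down), so that a plain `qsub my_job.sge` picks them up; the workload shown here is just a placeholder:

```bash
#!/bin/bash
#$ -S /bin/bash          # interpret the job script with bash
#$ -cwd                  # run from the directory the job was submitted from
#$ -l h_rt=00:01:00      # maximum runtime (hh:mm:ss)
#$ -l mem_free=1G        # maximum memory

# Placeholder workload: report where the job actually ran.
echo "Running on $(hostname)"
```

Options given on the qsub command line generally override the directives embedded in the script.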
- Current status: `qstat` or `qstat -j 191442` (replace the job id)
- After a successful job: `grep "usage" my_job.sge.o2854740` (replace the output file name)
- After a failed job: `tail -100000 /opt/sge/wynton/common/accounting | qacct -f - -j 191442` (replace the job id)
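If you want to script these checks, one hedged sketch (assuming the standard SGE `-terse` option, which makes qsub print only the job id) is to capture the id at submission time and reuse it:

```bash
# Submit and keep only the job id.
JOB_ID=$(qsub -terse -l h_rt=00:01:00 -l mem_free=1G my_job.sge)

# While the job is queued or running:
qstat -j "$JOB_ID"

# After the job has finished and left the queue:
tail -100000 /opt/sge/wynton/common/accounting | qacct -f - -j "$JOB_ID"
```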
- For intensive jobs during busy times, you can reserve resources for your job as soon as they become available by including the parameter `-R y`
- Compute nodes DO NOT have access to the internet, i.e., you cannot run jobs that include steps like downloading files from online resources. Dev nodes DO have access to the internet. If your script or pipeline requires internet access, consider splitting up the work (this is how Nextflow works, by the way): run a script on a dev node that retrieves the online files and then submits jobs to be run on compute nodes (see the sketch after this list). Voila!
- Similar to the previous point, you can run cron jobs on a dev node to periodically download files, separate from the compute-heavy jobs that you submit to compute nodes.
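A minimal sketch of that split, with a placeholder URL, file names, and job script; the dev node fetches the input, then hands the heavy work to the scheduler:

```bash
#!/bin/bash
# Run this on a dev node: compute nodes cannot reach the internet.

# 1. Download the input while internet access is available.
#    The URL and file names are placeholders.
wget -O input_data.txt https://example.org/input_data.txt

# 2. Hand the compute-heavy step to the scheduler; it will run on a compute node.
qsub -l h_rt=01:00:00 -l mem_free=4G process_input.sge
```

For the cron variant, the same script could be scheduled from a dev node crontab, e.g. `0 2 * * * /path/to/fetch_and_submit.sh` (the path and timing are placeholders).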
Check the status of Wynton HPC and the availability of compute nodes: Wynton Status Charts
To check dedicated compute resources...
- Availability of cores for your member.q, e.g., for `gladstone`: `qquota -u "*" | grep gladstone`
- Availability of disk space for your group members, e.g., for `gladstone`: `beegfs-ctl --getquota --gid gladstone`
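Both checks can be wrapped in a small sketch like the following; `gladstone` is just the example group from above, so substitute your own group name:

```bash
#!/bin/bash
# Quick look at dedicated resources for one group (example group: gladstone).
GROUP=gladstone

# Cores available/used in the group's member.q.
qquota -u "*" | grep "$GROUP"

# Disk quota for the group on BeeGFS.
beegfs-ctl --getquota --gid "$GROUP"
```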
There are two job schedulers you might come across on Wynton:
- SGE/UGE (until Q3 2020) ← current option on Wynton
- Slurm (starting Q3 2020)
Basic scheduler commands:
Action | SGE/UGE | Slurm |
---|---|---|
Job submission | qsub [script_file] | sbatch [script_file] |
Job deletion | qdel [job_id] | scancel [job_id] |
Job status by job | qstat -j [job_id] | squeue -j [job_id] |
Job status by user | qstat [-u user_name] | squeue -u [user_name] |
Job hold | qhold [job_id] | scontrol hold [job_id] |
Job release (from job on hold) | qrls [job_id] | scontrol release [job_id] |
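To make the mapping concrete, here is a sketch of the same submit/check/cancel cycle in each scheduler; the script names and job id (191442) are placeholders:

```bash
# SGE/UGE (the current scheduler on Wynton)
qsub my_job.sge        # submit
qstat -j 191442        # status of a single job
qdel 191442            # delete the job

# Slurm equivalents (once the cluster switches over)
sbatch my_job.sh       # submit
squeue -j 191442       # status of a single job
scancel 191442         # delete the job
```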
Job specification options:
Action | SGE/UGE | Slurm |
---|---|---|
Script directive | #$ | #SBATCH |
Queue | -q [queue] | -p [queue] |
Node count | N/A | -N [min[-max]] |
CPU count | -pe [PE] [count] | -n [count] |
Wall clock limit | -l h_rt=[seconds] | -t [min] OR -t [days-hh:mm:ss] |
Standard out file | -o [file_name] | -o [file_name] |
Standard error file | -e [file_name] | -e [file_name] |
Combine STDOUT & STDERR files | -j yes | (use -o without -e) |
Copy environment | -V | --export=[ALL\|NONE\|variables] |
Event notification | -m abe | --mail-type=[events] |
Send notification email | -M [address] | --mail-user=[address] |
Job name | -N [name] | --job-name=[name] |
Restart job | -r [yes\|no] | --requeue OR --no-requeue (NOTE: configurable default) |
Set working directory | -wd [directory] (or -cwd to use the current working directory) | --workdir=[dir_name] |
Resource sharing | -l exclusive | --exclusive OR --shared |
Memory size | -l mem_free=[memory][K\|M\|G] | --mem=[mem][M\|G\|T] OR --mem-per-cpu=[mem][M\|G\|T] |
Job dependency | -hold_jid [job_id \| job_name] | --depend=[state:job_id] |
Generic Resources | -l [resource]=[value] | --gres=[resource_spec] |
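As an illustration of the script-directive row, here is a sketch of the same job header written for each scheduler; the job name, queue name, and resource values are arbitrary examples:

```bash
#!/bin/bash
# --- SGE/UGE version: directives start with #$ ---
#$ -N example_job              # job name
#$ -q my_lab                   # queue
#$ -l h_rt=00:20:00            # wall clock limit
#$ -l mem_free=4G              # memory
#$ -pe smp 10                  # 10 cores in the smp parallel environment
#$ -o example_job.out          # standard out file
#$ -e example_job.err          # standard error file
```

```bash
#!/bin/bash
# --- Slurm version: directives start with #SBATCH ---
#SBATCH --job-name=example_job    # job name
#SBATCH -p my_lab                 # queue/partition
#SBATCH -t 00:20:00               # wall clock limit
#SBATCH --mem=4G                  # memory
#SBATCH -n 10                     # CPU count
#SBATCH -o example_job.out        # standard out file
#SBATCH -e example_job.err        # standard error file
```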
I want to submit a script, "my_script.sh", to the queue "my_lab", requesting 4GB of memory and 10 cores. I expect my job to run for no more than 20 minutes, and I need 5GB of scratch space.
login_server $ qsub my_script.sh
login_server $ qsub -q my_lab -l mem_free=4g -l h_rt=00:20:00 -l scratch=5g -pe smp 10 my_script.sh
Breaking that down:
- `qsub` = submit jobs
- `-q my_lab` = selects the specified queue: 'my_lab'
- `-l [resources]` = specific resources requested: 4G of memory available, a 20-minute run time, and 5G of /scratch available on the node
- `-pe [parallel environment] [core count]` = requesting 10 cores. Note: Wynton has multiple parallel environments; read more about them here: https://ucsf-hpc.github.io/wynton/scheduler/submit-jobs.html#parallel-processing-on-a-single-machine
- `my_script.sh` = your job
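For completeness, a sketch of what `my_script.sh` itself might look like, assuming the standard SGE environment variables `$NSLOTS` (the core count granted via `-pe smp 10`) and `$TMPDIR` (which typically points to a job-specific directory on the node's local scratch); the actual analysis command is a placeholder:

```bash
#!/bin/bash
#$ -S /bin/bash
#$ -cwd

# $NSLOTS is set by the scheduler to the number of cores requested with -pe.
# $TMPDIR typically points to a job-specific directory on local scratch.
echo "Using $NSLOTS cores; scratch directory: $TMPDIR"

# Placeholder for the real work, e.g. a multi-threaded tool:
# my_analysis --threads "$NSLOTS" --tmp "$TMPDIR" input_data.txt
```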
- Check the error log: unless otherwise specified, this will be in the directory you launched your job from and will be named after the job script, followed by .e and the job id (e.g., my_script.sh.e2854740); see the sketch at the end of this list
- Check the output log: unless otherwise specified, this will be in the same directory and will be named after the job script, followed by .o and the job id (e.g., my_script.sh.o2854740)
- Send an email to [email protected] for quick responses to Wynton questions.
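A quick way to locate and inspect those logs from the submission directory; the job id in the file names is just an example:

```bash
# List the scheduler-generated logs for my_script.sh.
ls my_script.sh.o* my_script.sh.e*

# Inspect the end of the output and error logs.
tail my_script.sh.o2854740
tail my_script.sh.e2854740
```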