2.3 Run Batch Jobs
Interactive jobs are suitable for testing and playing around with your code and software. But submitting batch jobs is the right way to go when you want to:
- run code that exceeds workstation capability (RAM, network, disk)
- run code for long periods of time (hours, days, weeks)
- run code in highly parallelized form (e.g., using 100 machines simultaneously)
- do any of those things 1000 times over
Note: The limitations below apply to the shared SCC resources. Limitations on buy-in nodes are defined by their owners.
Limit | Description |
---|---|
12 hours default wall clock | Jobs on the batch nodes have a default wall-clock limit of 12 hours, but this can be increased depending on the type of job. Use the qsub option -l h_rt=HH:MM:SS to request a higher limit. |
720 hours - serial job<br>720 hours - omp (1 node) job<br>120 hours - mpi job<br>48 hours - GPU job | Single-processor (serial) and omp (multiple cores, all on one node) jobs can run for up to 720 hours, MPI jobs for up to 120 hours, and jobs using GPUs are limited to 48 hours. |
1000 cores | An individual user is allowed at most 1000 shared cores simultaneously in the run state. This limit does not affect job submission; for example, you can have 1000 1-core jobs or 250 4-core jobs running simultaneously. Use -pe omp N to request multiple cores (slots). |
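For example, a hypothetical submission that requests a 24-hour wall-clock limit and 4 cores on one node might look like this (the script name and the values are illustrative):

scc % qsub -l h_rt=24:00:00 -pe omp 4 myscript.sh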
Non-interactive batch jobs are submitted with the qsub command. The general form of the command is:
scc % qsub [options] command [arguments]
For example, to submit the printenv command to the batch system, execute:
scc % qsub printenv
Your job #jobID ("printenv") has been submitted
The output message of the qsub command prints the job ID (#jobID), which you can use to monitor the job's status within the queue. To check that status, use the qstat command.
scc % qstat -u userID
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1000001 0.10000 printenv userID r 05/19/2014 09:17:53 [email protected] 1
1000002 0.00000 printenv userID qw 05/19/2014 09:14:53 16
While the job is running, the batch system creates stdout and stderr files in the job's working directory. These are named after the job, with extensions ending in the job ID; for the example above, printenv.o#jobID and printenv.e#jobID.
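Once the job finishes, you can inspect the captured output directly; for example (replace #jobID with the actual job ID):

scc % cat printenv.o#jobID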
When a program requires arguments, or when you need to pass additional options to the batch system, it quickly becomes useful to save them in a script file and submit that script as an argument to the qsub command. For example, using the following qsub_regular.sh script together with the Rcode_regular.R script will execute a simple R job:
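A minimal sketch of what such a submission script might look like (the wiki's actual qsub_regular.sh may differ; the job name, wall-clock request, and R module load below are assumptions):

```bash
#!/bin/bash -l

# Scheduler directives: lines beginning with #$ are read by the batch system.
#$ -N jobname            # job name, shown by qstat and used for output file names
#$ -j y                  # merge stderr into stdout
#$ -l h_rt=12:00:00      # requested wall-clock limit (illustrative value)

# Load an R module; the exact module name/version depends on the SCC configuration.
module load R

# Run the R script in batch mode.
Rscript Rcode_regular.R
```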
To submit this qsub_regular.sh file to the batch system, execute:
scc % qsub qsub_regular.sh
Your job #jobID ("jobname") has been submitted
If you need to submit many largely identical jobs at the same time, you should submit them as an array job. An array job executes multiple independent copies of the same job script. These copies are referred to as "tasks" and are scheduled independently as resources become available, i.e. the tasks are not necessarily scheduled all at once. The number of tasks is set using the -t start-end[:step] option to the qsub command, where start is the index of the first task (it must be 1 or more; it cannot be 0), end is the index of the last task, and step is an optional step size (defaulting to 1 if unspecified). Here is an example of using this option:
scc % qsub -t 1-25 myscript.sh
The above command will submit an array job consisting of 25 tasks, numbered from 1 to 25. Since the step size was not specified, the default step size of 1 will be used. Each task will independently execute the myscript.sh job file. The batch system sets the SGE_TASK_ID environment variable, which can be used inside the script to pass the task ID to the program.
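The qsub_arrayjob.sh script used below is not reproduced here; a minimal sketch of what such an array-job script might look like, assuming each task runs an R script on one column of the iris dataset (the name Rcode_array.R and the module load are hypothetical):

```bash
#!/bin/bash -l

# Scheduler directives for an array job with 5 tasks (one per iris column).
#$ -N jobname            # job name, as shown in the qstat output below
#$ -j y                  # merge stderr into stdout
#$ -t 1-5                # task indices 1 through 5; SGE_TASK_ID takes each value

# Load an R module; the exact name/version depends on the SCC configuration.
module load R

# Pass the task ID to the R script so each task summarizes a different column.
# Rcode_array.R is a hypothetical name; substitute the actual R script.
Rscript Rcode_array.R $SGE_TASK_ID
```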
Running the qsub_arrayjob.sh script:
[aaa@scc4 ~]$ qsub qsub_arrayjob.sh
Your job-array 9586347.1-5:1 ("jobname") has been submitted
You can see that we have submitted 5 tasks and that they run separately:
[aaa@scc4 ~]$ qstat -u userID
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
9586347 0.30000 jobname userID r 09/26/2021 21:38:27 [email protected] 1 1
9586347 0.30000 jobname userID r 09/26/2021 21:38:27 [email protected] 1 2
9586347 0.30000 jobname userID r 09/26/2021 21:38:27 [email protected] 1 3
9586347 0.30000 jobname userID r 09/26/2021 21:38:27 [email protected] 1 4
9586347 0.30000 jobname userID r 09/26/2021 21:38:27 [email protected] 1 5
Finally, you will get 5 separate output files; each file contains summary statistics for one of the columns in the iris dataset.
Attention: Windows users may need to run the dos2unix command on a .sh file before submitting it, in order to convert DOS/Mac line endings to UNIX format; otherwise the job will fail. Text files on Windows end each line with two hidden characters (CR and LF), while text files on Linux and macOS end each line with one. You can check whether a text file you transferred from Windows has the correct line endings with the file command:
# CRLF line terminators indicate that it's a Windows-formatted text file.
[aaa@scc4 install]$ file my_script.sh
my_script.sh: ASCII text, with CRLF line terminators
[aaa@scc4 install]$ dos2unix my_script.sh
dos2unix: converting file my_script.sh to Unix format ...
# If file reports plain ASCII text, it's in Linux format:
[aaa@scc4 install]$ file my_script.sh
my_script.sh: ASCII text
Ref: https://www.bu.edu/tech/support/research/system-usage/running-jobs/submitting-jobs/
Ref: https://www.bu.edu/tech/support/research/system-usage/running-jobs/tracking-jobs/
Ref: https://www.bu.edu/tech/support/research/system-usage/running-jobs/advanced-batch/
Ref: https://www.bu.edu/tech/support/research/system-usage/running-jobs/batch-script-examples/