2.3 Run Batch Jobs - bu-rcs/SA-Biostatistics GitHub Wiki

Why run batch jobs?

Interactive jobs are suitable for testing and experimenting with your code and software. Submitting batch jobs is the right way to go when you want to:

  • Run code that exceeds workstation capability (RAM, network, disk)
  • Run code that runs for long periods of time (hours, days, weeks)
  • Run code in highly parallelized formats (for example, on 100 machines simultaneously)
  • Do all of those things 1000 times

Job run time limits

Note: The limits below apply to the shared SCC resources. Limits on buy-in nodes are defined by their owners.

Limits and their descriptions:

  • 12 hours (default wall clock): Jobs on the batch nodes have a default wall clock limit of 12 hours, but this can be increased depending on the type of job. Use the qsub option -l h_rt=HH:MM:SS to ask for a higher limit.
  • 720 hours (serial job), 720 hours (omp job, 1 node), 120 hours (mpi job), 48 hours (GPU job): Single-processor (serial) jobs and omp jobs (multiple cores, all on one node) can run for up to 720 hours, MPI jobs for up to 120 hours, and jobs using GPUs for up to 48 hours.
  • 1000 cores: An individual user can have at most 1000 shared cores in the run state at the same time. This limit does not affect job submission. For example, you can have 1000 1-core jobs or 250 4-core jobs running simultaneously. Use -pe omp N to request multiple cores (slots).
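
The limits above map onto qsub options roughly as follows. This is a minimal sketch: the helper names hours_to_h_rt and max_concurrent_jobs are made up for illustration, and the qsub command is only printed, not submitted.

```shell
# Format a whole number of hours as the HH:MM:SS string that -l h_rt expects.
hours_to_h_rt() {
    printf '%02d:00:00' "$1"
}

# Number of N-core jobs that fit under the 1000-core per-user limit.
max_concurrent_jobs() {
    echo $(( 1000 / $1 ))
}

h_rt=$(hours_to_h_rt 24)
cores=4

# Print the command that would request 24 hours and 4 cores on one node.
echo "qsub -l h_rt=${h_rt} -pe omp ${cores} myscript.sh"
echo "At most $(max_concurrent_jobs "$cores") such jobs can run at once"
```

The 250 reported for 4-core jobs matches the example in the limits list; a request above the limit for the job type (for example, more than 48 hours for a GPU job) should not be expected to be honored.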

How to submit a batch job

Non-interactive batch jobs are submitted with the qsub command. The general form of the command is:

scc % qsub [options] command [arguments]

For example, to submit the printenv command to the batch system, execute:

scc % qsub printenv
Your job #jobID ("printenv") has been submitted

The output message of the qsub command will print the #jobID, which you can use to monitor the job's status within the queue.

You can use the qstat command to monitor the job status.

scc % qstat -u userID
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
1000001 0.10000 printenv userID       r     05/19/2014 09:17:53 [email protected]               1
1000002 0.00000 printenv userID       qw    05/19/2014 09:14:53                                    16
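
The state column of a listing like the one above can be summarized with a short filter. The sketch below works on a pasted sample rather than live output; the queue value queue@node is a made-up placeholder, and the assumption is that the state is always the fifth whitespace-separated field.

```shell
# Sample qstat output modeled on the listing above.
# "queue@node" is a hypothetical placeholder for the real queue instance.
qstat_output='job-ID  prior   name       user         state submit/start at     queue       slots ja-task-ID
------------------------------------------------------------------------------------------
1000001 0.10000 printenv userID       r     05/19/2014 09:17:53 queue@node  1
1000002 0.00000 printenv userID       qw    05/19/2014 09:14:53             16'

# Skip the two header lines, then count jobs by state (field 5):
# r = running, qw = queued and waiting.
running=$(printf '%s\n' "$qstat_output" | awk 'NR > 2 && $5 == "r" {n++} END {print n + 0}')
waiting=$(printf '%s\n' "$qstat_output" | awk 'NR > 2 && $5 == "qw" {n++} END {print n + 0}')

echo "running: ${running}, waiting: ${waiting}"
```

On the cluster, the same awk filters could be fed from qstat -u userID instead of the sample text.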

While the job is running, the batch system creates stdout and stderr files in the job's working directory. They are named after the job, with an extension ending in the job ID. For the above example, these are printenv.o#jobID and printenv.e#jobID.
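
As an illustration of the naming rule, the sketch below takes a sample qsub message, extracts the job ID and job name, and builds the two file names. The job ID 1000001 is borrowed from the qstat example and stands in for a real ID; the parsing assumes the message has exactly the shape shown.

```shell
# Sample qsub message; 1000001 stands in for the real job ID.
qsub_message='Your job 1000001 ("printenv") has been submitted'

# The job ID is the third whitespace-separated word of the message.
job_id=$(printf '%s\n' "$qsub_message" | awk '{print $3}')

# The job name sits between the double quotes.
job_name=$(printf '%s\n' "$qsub_message" | sed 's/.*("\(.*\)").*/\1/')

# Output files are named <job name>.o<job ID> and <job name>.e<job ID>.
stdout_file="${job_name}.o${job_id}"
stderr_file="${job_name}.e${job_id}"

echo "stdout: ${stdout_file}"
echo "stderr: ${stderr_file}"
```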

Example of submitting a regular R job

When running a program that requires arguments and passes additional options to the batch system, it quickly becomes useful to save them in a script file and submit this script as an argument to the qsub command. For example, using the following qsub_regular.sh script together with the Rcode_regular.R script will execute a simple R job:

Rcode_regular.R

qsub_regular.sh
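
The contents of the two files are not reproduced on this page. As a rough guide only, a job script of this kind might look like the sketch below. It is a hypothetical stand-in, not the original qsub_regular.sh: the job name and time limit are taken from the surrounding text, and the guards around module and Rscript are added so the sketch can also be tried on a machine without them.

```shell
#!/bin/bash -l
# Hypothetical stand-in for qsub_regular.sh; the original file is not shown.

# Job name, reported by qsub and used for the output file names.
#$ -N jobname

# Wall clock limit in HH:MM:SS (12 hours is the default).
#$ -l h_rt=12:00:00

r_script="Rcode_regular.R"

# Load R if the module command is available (assumed to be the case on the SCC).
if type module >/dev/null 2>&1; then
    module load R
fi

# Run the R script, or report why it could not be run.
if command -v Rscript >/dev/null 2>&1 && [ -f "$r_script" ]; then
    Rscript "$r_script"
    status=$?
else
    echo "Cannot run ${r_script}: Rscript or the file is missing" >&2
    status=1
fi

echo "Finished with status ${status}"
```

Lines starting with #$ are read by the batch system as qsub options, so they do not have to be repeated on the command line.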

To submit this qsub_regular.sh file to the batch system, execute:

scc % qsub qsub_regular.sh
Your job #jobID ("jobname") has been submitted

How to submit an array job

If you submit many largely identical jobs at the same time, you should submit them as an array job. An array job executes multiple independent copies of the same job script. These copies are referred to as "tasks" and are scheduled independently as resources become available, i.e. the tasks are not all scheduled at once. The number of tasks to be executed is set using the -t start-end[:step] option to the qsub command, where start is the index of the first task (it must be 1 or greater; it cannot be 0), end is the index of the last task, and step is an optional step size (the default is 1 if unspecified). Here's an example of using this option:

scc % qsub -t 1-25 myscript.sh

The above command will submit an array job consisting of 25 tasks, numbered from 1 to 25. Since the step size was not specified, the default step size of 1 will be used. Each task will independently execute the myscript.sh job file. The batch system sets the SGE_TASK_ID environment variable, which can be used inside the script to pass the task ID to the program.
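
To see which task IDs a given -t value produces, the range can be expanded by hand. The function below is a small sketch for that purpose; expand_task_range is a made-up name, and it only mimics the start-end[:step] rule described above rather than calling the batch system.

```shell
# Expand a start-end[:step] range, as passed to qsub -t, into task IDs
# (one per line). The step defaults to 1 when it is omitted.
expand_task_range() {
    local range="$1" step=1
    if [[ "$range" == *:* ]]; then
        step="${range##*:}"
        range="${range%%:*}"
    fi
    local start="${range%-*}" end="${range#*-}"
    seq "$start" "$step" "$end"
}

echo "Tasks for -t 1-25:   $(expand_task_range 1-25 | wc -l | tr -d ' ') tasks"
echo "Tasks for -t 1-25:5: $(expand_task_range 1-25:5 | xargs)"
```

With a step of 5, the range 1-25 yields the task IDs 1, 6, 11, 16 and 21, so only five tasks are created.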

Example of passing SGE_TASK_ID to R script

Rcode_arrayjob.R

qsub_arrayjob.sh
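
The contents of the two files are not reproduced on this page. The sketch below is a hypothetical stand-in for qsub_arrayjob.sh, not the original: it assumes five tasks (one per column of the iris dataset) and passes the task ID to the R script as a command-line argument, which the R code could read with commandArgs(trailingOnly = TRUE).

```shell
#!/bin/bash -l
# Hypothetical stand-in for qsub_arrayjob.sh; the original file is not shown.

# Job name and task range: five tasks, numbered 1 to 5.
#$ -N jobname
#$ -t 1-5

# SGE_TASK_ID is set by the batch system. Fall back to 1 when it is unset
# or "undefined" (a non-array job), so the script can be tried by hand.
if [ -z "${SGE_TASK_ID:-}" ] || [ "$SGE_TASK_ID" = "undefined" ]; then
    task_id=1
else
    task_id="$SGE_TASK_ID"
fi

# Illustrative mapping from task ID to an iris column (bash arrays are 0-based).
columns=(Sepal.Length Sepal.Width Petal.Length Petal.Width Species)
column="${columns[$((task_id - 1))]}"
echo "Task ${task_id} handles column ${column}"

# Hand the task ID to the R script when R and the script are available.
if command -v Rscript >/dev/null 2>&1 && [ -f Rcode_arrayjob.R ]; then
    Rscript Rcode_arrayjob.R "$task_id"
fi
```

An alternative design is to let the R code read the variable itself with Sys.getenv("SGE_TASK_ID"), which avoids the extra argument.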

Running qsub_arrayjob.sh

[aaa@scc4 ~]$ qsub qsub_arrayjob.sh
Your job-array 9586347.1-5:1 ("jobname") has been submitted

The message shows that 5 tasks have been submitted; they will run independently of each other.

[aaa@scc4 ~]$ qstat -u userID
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
9586347 0.30000 jobname    userID         r     09/26/2021 21:38:27 [email protected]             1 1
9586347 0.30000 jobname    userID         r     09/26/2021 21:38:27 [email protected]             1 2
9586347 0.30000 jobname    userID         r     09/26/2021 21:38:27 [email protected]             1 3
9586347 0.30000 jobname    userID         r     09/26/2021 21:38:27 [email protected]             1 4
9586347 0.30000 jobname    userID         r     09/26/2021 21:38:27 [email protected]             1 5

Finally, you will get 5 separate output files, each containing summary statistics for one of the columns in the iris dataset.

Attention: Windows users may need to run the dos2unix command on the .sh file before submitting it, in order to convert it from DOS/Mac to Unix text format; otherwise the job will fail. Text files on Windows end each line with two hidden characters (carriage return and line feed), while text files on Linux and macOS end each line with one (line feed). You can check whether a text file transferred from Windows has the correct line endings with the file command:

# CRLF line terminators indicates it's a Windows formatted text file.
[aaa@scc4 install]$ file my_script.sh 
my_script.sh: ASCII text, with CRLF line terminators
[aaa@scc4 install]$ dos2unix my_script.sh 
dos2unix: converting file my_script.sh to Unix format ...
# If file says it's ASCII text it's in Linux format:
[aaa@scc4 install]$ file my_script.sh 
my_script.sh: ASCII text
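
The same check and conversion can be sketched with standard tools only, which may help where file or dos2unix is not installed. The function names line_endings and strip_cr are made up for this example, and the demo file is a temporary file created on the spot.

```shell
# Report whether a file has Windows (CRLF) or Unix (LF) line endings.
line_endings() {
    if grep -q $'\r$' "$1"; then
        echo "crlf"
    else
        echo "unix"
    fi
}

# Remove every carriage return from the file, keeping its permissions.
strip_cr() {
    local tmp
    tmp=$(mktemp) && tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1" && rm -f "$tmp"
}

# Demo: create a small script with Windows line endings, then convert it.
demo=$(mktemp)
printf '#!/bin/bash\r\necho hello\r\n' > "$demo"

before=$(line_endings "$demo")
strip_cr "$demo"
after=$(line_endings "$demo")

echo "before: ${before}, after: ${after}"
rm -f "$demo"
```

Note that strip_cr deletes all carriage returns, not only those at line ends, which is fine for shell scripts but is a simplification compared with dos2unix.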





Ref: https://www.bu.edu/tech/support/research/system-usage/running-jobs/submitting-jobs/

Ref: https://www.bu.edu/tech/support/research/system-usage/running-jobs/tracking-jobs/

Ref: https://www.bu.edu/tech/support/research/system-usage/running-jobs/advanced-batch/

Ref: https://www.bu.edu/tech/support/research/system-usage/running-jobs/batch-script-examples/
