Slurm: Job Arrays - nthu-ioa/cluster GitHub Wiki

(See the SLURM page for general information about the Slurm batch queue system)


Slurm job arrays are useful for running many similar instances of a single program. Each instance (an array element) has a unique index, which is available inside the job through environment variables and in #SBATCH directives through filename wildcards.

Grouping jobs into arrays is more efficient and easier to control than using your own loop to submit multiple individual instances of a jobscript. For example, you can easily cancel all the jobs at once.

The basic syntax for a 10-element array job is:

#SBATCH --array=0-9

Alternatively, pass the option on the command line (sbatch --array=0-9) instead of putting this line in the jobscript. The indices do not have to be contiguous: any comma-separated list of non-negative integers works, for example:

sbatch --array=88,99,888,999

The job ID and array index are available in other #SBATCH directives through filename wildcards. For example, to write a separate log file for each element of the array, give the logs unique names as follows:

#SBATCH -o job.%A.%a.out
#SBATCH -e job.%A.%a.err

%A expands to the job array's master job allocation number (the same for all elements). %a expands to the array index (different for each element).
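To make the resulting names concrete, the sketch below reproduces by hand the names that the job.%A.%a.out pattern would give. It is only an illustration: Slurm performs this expansion itself, and the helper log_name and the job ID 12345 are hypothetical.

```shell
# Illustration only: mimic how Slurm would expand job.%A.%a.<ext>.
# log_name is a hypothetical helper, not a Slurm command.
log_name() {
    # $1 = master job ID (%A), $2 = array index (%a), $3 = extension
    echo "job.$1.$2.$3"
}

log_name 12345 0 out   # job.12345.0.out
log_name 12345 0 err   # job.12345.0.err
log_name 12345 9 out   # job.12345.9.out
```

Every element shares the 12345 part, so all logs from one array sort together, and the last number identifies the element.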

For the definitions of other wildcards, see "filename pattern" on this page: https://slurm.schedmd.com/sbatch.html.

Environment variables (e.g. SLURM_ARRAY_JOB_ID, SLURM_ARRAY_TASK_ID) provide the same numbers inside the running job, so you can pass them, for example, as arguments to your code. See https://slurm.schedmd.com/job_array.html for details, especially the section "Job ID and Environment Variables".
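One common way to use the index is to pick one line of a parameter file per array element. The jobscript below is a minimal sketch of that idea, not a recommended layout for this cluster: the file name params.txt, the PARAM_FILE override, the helper params_for_task and the final echo (standing in for your real program) are all assumptions made for illustration.

```shell
#!/bin/bash
#SBATCH --array=0-9
#SBATCH -o job.%A.%a.out
#SBATCH -e job.%A.%a.err

# Slurm sets SLURM_ARRAY_TASK_ID for each element. The fallback to 0 is
# only there so the script can be tried by hand outside Slurm.
task_id=${SLURM_ARRAY_TASK_ID:-0}

# Hypothetical parameter file with one set of arguments per line.
param_file=${PARAM_FILE:-params.txt}

# Print the parameter line that belongs to a given array index.
# File lines are numbered from 1 but this array starts at 0, hence the +1.
params_for_task() {
    sed -n "$(( $1 + 1 ))p" "$2"
}

if [ -r "${param_file}" ]; then
    params=$(params_for_task "${task_id}" "${param_file}")
else
    params=""
fi

# Stand-in for the real work; replace with a call to your own program.
echo "task ${task_id}: ${params}"
```

With --array=0-9 the parameter file needs at least ten lines. If the array started at 1 instead, the +1 offset would have to be dropped.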

TIP: You can limit how many elements of an array job run simultaneously (other users will appreciate this if you are spawning thousands of small, short-running jobs that would otherwise fill the queue). Add %N to the end of the array range, where N is the maximum number of elements allowed to run at once. For example:

--array=100-200%4

allows only 4 jobs in the array to run simultaneously.

See also: