Slurm specific options - MorrellLAB/sequence_handling GitHub Wiki

Slurm Specific Features

sequence_handling is now compatible with the Slurm workload manager. Many of the handlers (e.g., Adapter_Trimming, Read_Mapping, SAM_Processing, Haplotype_Caller, Genomics_DB_Import, and Genotype_GVCFs) use job arrays on HPC systems that support Slurm (e.g., MSI) to reduce overall processing time. For these handlers, if some job array indices fail or time out, you can identify which ones they were, increase the resource request in the Config file accordingly, and resubmit only those indices using the -t flag.

The get_re-run_array_indices.sh helper script takes a Slurm JobID and generates a formatted list of job array indices to re-run. For example, suppose we ran the Read_Mapping handler on 50 samples (array indices 0-49), but some of them timed out and need to be re-run with increased walltime. We can get a formatted list of the array indices to re-run by passing the JobID to the helper script:

# Run from inside the main sequence_handling directory
# 146744177 is the Slurm JobID for this example
./HelperScripts/get_re-run_array_indices.sh 146744177

If array indices 1 2 3 5 6 7 9 11 12 13 timed out, the command above would output the following formatted "re-run" array indices:

1-3,5-7,9,11-13
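
To make the formatting step concrete, here is a minimal bash sketch of how a list like this could be produced. It is not the actual contents of get_re-run_array_indices.sh. The function names (failed_indices, collapse_indices, format_range) are hypothetical, and the sketch assumes the job has already finished and that task states come from sacct in its pipe-delimited "JobID|State" form.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only; the real helper script may work differently.

# Read "JobID|State" lines on stdin and print the array index of every
# task whose state is anything other than COMPLETED (TIMEOUT, FAILED, ...).
# Job steps such as "146744177_1.batch" are skipped by the anchored pattern.
failed_indices() {
    awk -F'|' '$1 ~ /^[0-9]+_[0-9]+$/ && $2 != "COMPLETED" {
        sub(/^[0-9]+_/, "", $1)   # strip the "<JobID>_" prefix
        print $1
    }'
}

# Print a single index as "N" and a run of indices as "N-M".
format_range() {
    if [ "$1" -eq "$2" ]; then
        printf '%s' "$1"
    else
        printf '%s-%s' "$1" "$2"
    fi
}

# Collapse integer arguments into a compact list such as 1-3,5-7,9.
collapse_indices() {
    local sorted idx start="" prev="" out=""
    sorted="$(printf '%s\n' "$@" | sort -n -u)"
    for idx in $sorted; do
        if [ -z "$start" ]; then
            start="$idx"; prev="$idx"
        elif [ "$idx" -eq $((prev + 1)) ]; then
            prev="$idx"                       # extend the current run
        else
            out+="$(format_range "$start" "$prev"),"
            start="$idx"; prev="$idx"         # begin a new run
        fi
    done
    if [ -n "$start" ]; then
        out+="$(format_range "$start" "$prev")"
    fi
    printf '%s\n' "$out"
}

# On a Slurm system the two steps could be chained like this (left commented
# out so the sketch also runs where sacct is not installed):
#   collapse_indices $(sacct -X -j 146744177 --noheader --parsable2 \
#       --format=JobID,State | failed_indices)

collapse_indices 1 2 3 5 6 7 9 11 12 13   # prints 1-3,5-7,9,11-13
```

Treating every non-COMPLETED state as a candidate for re-running is a simplification. Tasks that are still pending or running should be allowed to finish before relying on a list built this way.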

The user can then take these formatted array indices, increase the walltime in their Config file, and re-submit the handler, adding the -t flag followed by the formatted "re-run" array indices:

./sequence_handling Read_Mapping /path/to/Config -t 1-3,5-7,9,11-13
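
Before re-submitting, it can be useful to double-check which samples a formatted list actually covers. The following bash sketch expands a list such as 1-3,5-7,9,11-13 back into individual indices. The function name expand_indices is hypothetical and not part of sequence_handling, and the sketch assumes the list contains only comma-separated non-negative integers and N-M ranges.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of sequence_handling.

# Expand a comma-separated list of indices and N-M ranges into one index per line.
expand_indices() {
    local spec="$1" part lo hi
    local -a parts
    IFS=',' read -r -a parts <<< "$spec"
    for part in "${parts[@]}"; do
        if [[ "$part" == *-* ]]; then
            lo="${part%-*}"    # text before the dash
            hi="${part#*-}"    # text after the dash
            seq "$lo" "$hi"
        else
            printf '%s\n' "$part"
        fi
    done
}

expand_indices "1-3,5-7,9,11-13"
```

For the example list, this prints the ten indices 1, 2, 3, 5, 6, 7, 9, 11, 12, and 13, one per line, which can be compared against the sample list used for the original run.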