Submitting DBM jobs efficiently with Niagara running on reduced capacity
Purpose: This page is a general guide on how to test, time, and schedule DBM jobs on Niagara. The gist is, by breaking your jobs into smaller “chunks”, you can often take advantage of backfill scheduling to reduce queue wait times. This applies all the time (and is especially useful while Niagara is running on reduced capacity).
- Start an Interactive Debug Session
Command:
debugjob
This will allocate you one machine for one hour in interactive mode. You can experiment with time and memory usage without waiting in the normal job queue. Use this to run quick tests to see how long a single registration step might take. It eliminates guesswork when setting walltime or memory limits for your real jobs.
- Set Up Your Environment
Load modules or activate your environment using commands like:
module load cobralab/2019b
or
source /path/to/your/env/bin/activate
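Once the environment is loaded, a quick way to confirm that the tools you need resolve on your PATH is sketched below. The helper name check_tools is hypothetical, and the tool list in the example call is an assumption based on the commands used on this page; edit it for your own pipeline.

```shell
# Report which of the given commands can be found on PATH.
# Returns 0 if every command is found, 1 if at least one is missing.
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (assumed tool list, adjust as needed):
#   check_tools antsRegistration_affine_SyN.sh qbatch
```

If anything is reported as MISSING, the module or environment probably did not load as expected.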
If needed, configure any paths or environment variables for CoBrALab tools, and check that the scripts your DBM needs exist.
- Manually Run One Registration
First, identify your:
- Fixed image (fixed_image.nii.gz or your template)
- Moving image (subject_image.nii.gz)
- Masks (e.g., starting_target_mask.nii.gz, subject_mask.nii.gz), if you use masks to refine registration.
Then run the registration (example command; adjust to suit your DBM pipeline):
/usr/bin/time -v \
  antsRegistration_affine_SyN.sh \
  --fixed-mask starting_target_mask.nii.gz \
  --moving-mask subject_mask.nii.gz \
  subject.nii.gz dorr.nii.gz \
  /tmp/prefix
After the run, the output of /usr/bin/time -v reports the elapsed (wall-clock) time and the peak memory usage.
- Check Elapsed Time & Memory
Review the time -v output:
  - Elapsed (wall clock) time: approximately how long one registration takes.
  - Maximum resident set size: peak memory usage.
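If you save the timing report to a file (GNU time accepts -o FILE for this, e.g. /usr/bin/time -v -o time.log followed by your command), the two fields can be pulled out with a small script. This is a minimal sketch: the function names and the time.log filename are hypothetical, and it assumes the GNU time -v report format, where the elapsed time is printed as h:mm:ss or m:ss and the memory in kbytes.

```shell
# Print the elapsed wall-clock time from a GNU time -v report, in whole seconds (rounded up).
elapsed_seconds() {
    awk -F': ' 'index($0, "Elapsed (wall clock) time") {
        n = split($NF, t, ":")
        if (n == 3) secs = t[1] * 3600 + t[2] * 60 + t[3]   # h:mm:ss
        else        secs = t[1] * 60 + t[2]                 # m:ss
        s = int(secs)
        if (s < secs) s++                                   # round up fractional seconds
        print s
    }' "$1"
}

# Print the peak memory (maximum resident set size) from the same report, in MB (rounded up).
max_rss_mb() {
    awk -F': ' 'index($0, "Maximum resident set size") {
        printf "%d\n", ($NF + 1023) / 1024                  # kbytes -> MB
    }' "$1"
}

# Example calls (time.log is an assumed filename):
#   elapsed_seconds time.log
#   max_rss_mb time.log
```

Rounding up rather than down keeps the numbers on the safe side when they are later turned into walltime and memory requests.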
- Adjust Qbatch Environment Variables & Chunksize
Qbatch can submit multiple registrations in batches. Environment variables (like QBATCH_CHUNKSIZE) control how many subjects go into each chunk. Smaller chunks → shorter individual jobs → better chance of fitting into backfill. Setting QBATCH_CHUNKSIZE=1 makes Qbatch split your work into the smallest indivisible jobs, each being a single registration/DBM run. Niagara accepts at most 1000 submitted jobs, so make sure your total number of jobs does not exceed 1000.
e.g., Let's say Elapsed (wall clock) time tells you 10 minutes, you have 15 subjects, and you're running 4 rounds of nonlinear registration. That is 15 * 4 = 60 jobs, far fewer than 1000, so each registration can be its own job with a wall time of around 10 minutes. To leave some room for uncertainty, request 15 minutes. When submitting your DBM job, set QBATCH_CHUNKSIZE=1 and walltime-nonlinear 00:15:00.
e.g., Let's say Elapsed (wall clock) time tells you 15 minutes, you have 1500 subjects, and you're running 6 rounds of registration. Your total number of jobs would be 1500 * 6 = 9000, which is 9 times Niagara's limit, so you have to increase your chunksize to 9. Each job then contains 9 registrations, so multiply the elapsed time of a single registration by the new chunksize: 15 * 9 = 135 minutes. Remember to leave some room for uncertainty here too, for example 170 minutes. When submitting your DBM job, set QBATCH_CHUNKSIZE=9 and walltime-nonlinear 02:50:00.
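The arithmetic in the two examples can be sketched as a pair of shell functions. The function names are hypothetical, the 1000-job limit is taken from the text above, and the default 25% safety margin is an arbitrary assumption; pick whatever margin matches how variable your registrations are.

```shell
# Smallest chunk size that keeps the number of submitted jobs at or below the limit.
# Usage: chunksize SUBJECTS ROUNDS [MAX_JOBS]   (MAX_JOBS defaults to 1000)
chunksize() {
    local total=$(( $1 * $2 ))
    local max_jobs=${3:-1000}
    echo $(( (total + max_jobs - 1) / max_jobs ))   # integer ceiling of total / max_jobs
}

# Walltime (HH:MM:SS) for one chunk: minutes per registration times chunk size,
# padded by a safety margin in percent and rounded up to a whole minute.
# Usage: walltime MINUTES_PER_REGISTRATION CHUNKSIZE [MARGIN_PERCENT]   (margin defaults to 25)
walltime() {
    local margin=${3:-25}
    local minutes=$(( ($1 * $2 * (100 + margin) + 99) / 100 ))
    printf '%02d:%02d:00\n' $(( minutes / 60 )) $(( minutes % 60 ))
}

# Example calls:
#   chunksize 1500 6     # 1500 subjects, 6 rounds
#   walltime 15 9        # 15 minutes per registration, chunk size 9
#   walltime 10 1 50     # 10 minutes per registration, chunk size 1, 50% margin
```

With these definitions, chunksize 1500 6 prints 9 and walltime 15 9 prints 02:49:00, close to the 02:50:00 chosen in the second example; walltime 10 1 50 prints 00:15:00, matching the first.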
Always request the resources you actually need: too little and your jobs fail, too much and you may be queued longer.