Running arbitrary voxelwise functions in parallel (qMincApply) - CoBrALab/documentation GitHub Wiki
Defining and running arbitrary functions across all voxels in parallel
There may come a time when you want to run a function across all the voxels in your data and there is no built-in function in RMINC to accomplish this. Some functions, such as mean and standard deviation are already built-in statistics that can be applied (see here: https://mouse-imaging-centre.github.io/RMINC/reference/mincSummary.html), but what if you want to calculate the mean, add one, and then raise it to the power of 15 or something silly like that?
The RMINC function qMincApply (https://mouse-imaging-centre.github.io/RMINC/reference/qMincApply.html) is the perfect function for you! It allows you to pass in a list of filenames (which point to minc files), perform a function on each voxel in parallel, and output the results. This output can be used in R or saved as a minc file.
For this function, you will need:
library(RMINC)
library(batchtools)
Set up your csv as you would for running voxelwise linear models or linear mixed effects models in R with RMINC. Specifically you need, at the very least a csv with one column that includes paths to each minc file you want to use. For this example, the dataframe is called data and the column with the mincs is called file. You'll also need to define a variable that points to a mask in minc format.
# load in data
data <- read.csv("mncl_full_smooth_jacs.csv")
averagemask = "manual_mask.mnc"
The next thing you have to do is define your function. the function below, called my_fun takes as input a list of voxels, calculates the mean, raises it to the 15th power, then returns the output.
my_fun = function(vox_dat){
mew = mean(vox_dat)
out = (mew+1)^15
return(out)
}
After defining the function, you need to call it in qMincApply. Here we pass qMincApply several important variables:
data$file : this is the data frame specifying the column that has the paths to the minc files.
fun = my_fun : This points to the function we previously defines
mask = averagemask : This specifies the mask within which we would like to include the voxels in our analysis.
wait = FALSE : This is an important thing to include when running the function on Trillium/scinet. If you exclude it, jobs will be submitted to the cluster, then qMincApply will try to reduce the outputs of these jobs to a collated object in R, even if the jobs haven't finished running yet! by setting wait = FALSE, we allow the jobs to be queued and run. By default, this command will create 4 jobs of 15 minutes each. For bigger jobs, see the section below on Timing and Scheduling qMincApply Jobs.
qMincApply(data$file, fun = my_fun, mask = averagemask, wait = FALSE)
If you are on Trillium/scinet, after you submit the jobs with qMincApply, you can check their status in a terminal with sq. Once the jobs have finished, run the following code to automatically reduce and collate the output into an object:
qminc_test <- qMincReduce(batchtools::getDefaultRegistry(), collate = simplify2minc)
This code creates an object called qminc_test. If you look at its structure, you should see something like the following:
str(qminc_test)
'mincSingleDim' Named num [1:5429424] 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "names")= chr [1:5429424] "inds" "vals" NA NA ...
- attr(*, "filenames")= chr [1:165] "resample_dbm_full_smooth_mncl/MCH_NEO_THC_01_01_10_1-2-1_fix_lsq6_manual_extracted_fwhm_4vox.mnc" "resample_dbm_full_smooth_mncl/MCH_NEO_THC_01_01_03_1-2-1_fix_lsq6_manual_extracted_fwhm_4vox.mnc" "resample_dbm_full_smooth_mncl/MCH_NEO_THC_01_01_05_1-2-1_fix_lsq6_manual_extracted_fwhm_4vox.mnc" "resample_dbm_full_smooth_mncl/MCH_NEO_THC_01_01_07_1-2-1_fix_lsq6_manual_extracted_fwhm_4vox.mnc" ...
- attr(*, "likeVolume")= chr "resample_dbm_full_smooth_mncl/MCH_NEO_THC_01_01_10_1-2-1_fix_lsq6_manual_extracted_fwhm_4vox.mnc"
- attr(*, "mask")= chr "/scratch/lanicupo/neonate_power/mncl2_investigation/voxelwise/data/pMincMask429ee7588d27c.mnc"
It is a minc object and can be written out into a minc like this:
mincWriteVolume(qminc_test, "my_fun_minc.mnc", like.filename = averagemask)
If you are having any problems with qMincReduce, see the section below on some useful tools.
Now, we can visualize it in tools like Display or register from the terminal:
Timing and Scheduling qMincApply Jobs on Trillium
By default, the walltime requested by qMincApply is 15 minutes, but if you want to increase it, you can do so by setting the resources parameter.
qMincApply(data$file,
fun = my_fun,
mask = averagemask,
# Set walltime as 1 hours (3600 seconds)
resources = list(walltime = 3600),
wait = FALSE)
As noted in the comment, the time is measured in seconds, so if you want to give your job an hour, give it 3600 seconds (60 minutes X 60 seconds = 3600). If you want to give it 1:30, set it to 5400, and so forth.
As usual, each computer has 192 cores that can be used to run commands (in this case voxels) in parallel. You can explicitly set this with the cores command, as follows:
qMincApply(data$file,
fun = my_fun,
mask = averagemask,
#set number of cores
cores = 192,
resources = list(walltime = 3600),
wait = FALSE)
The number of batches equates to number of jobs, to submit. But how do you know how many jobs you need? This will depend on the dimensions of the mask you are using in the function.
In a terminal, load the relevant packages (cobralab on Compute Canada or minc-toolkit-v2 on the CIC). Then run:
mincbbox -minccrop mask.mnc
Your output will look something like this:
-xlim 30 123 -ylim 51 173 -zlim 36 130
This tells us the extent of the contents of our mask, so to know its length in each dimension, we need to subtract the smaller from the larger number to get:
x-length: 93
y-length: 122
z-length: 94
qMincApply will slice the data up along one of these dimensions and then package all of the voxels in this slice into jobs for processing. It's important to know what the size of the dimensions are for timing your jobs. For example, you may want to tell it to slice the dimensions along the y-axis so that each slice comprises at most 94 x 93 voxels (at most because some areas will have fewer voxels, such as at the olfactory bulbs). You can control which dimension qMincApply slices along with the slab_sizes argument. This argument wants a vector specifying the dimensions to be taken for each direction. You should set the dimension you want it to slice along as 1 and the others as the extent of the contents of the mask, as calculated before. You can also explicitly control the number of batches (jobs) to be equal to the number of slices in the dimension you are slicing along (122) to ensure that each job will include only the voxels from that slice. See the example below:
qMincApply(data$file,
fun = my_fun,
mask = averagemask,
#set the number of batches to the dimension of the data you are slicing along.
batches = 122,
#set slab sizes to slice along the y-dimension (second)
slab_sizes = c(93, 1, 94),
cores = 192,
resources = list(walltime = 3600),
wait = FALSE)
So, now we know that each slice will have 93x94 (8742) voxels and we can time it appropriately. Each job will be submitted to a computer in the Trillium cluster to run, and each computer has 192 cores. That means that 192 voxels can run in parallel on a computer. Take the total number of voxels in the job (8742) and divide them by the 192 cores (assuming the function you wrote has no implicit parallelization). For us, this is 45.5. Rounded up to 46, this means that each job needs at most 46 x T, where T is the amount of time it takes to run one (or 192) voxels.
But how do you know how long it will take for one voxel to run?
One option is to create a mask that is a single voxel, set your batches to 1, and run the function. You can create a 1-voxel mask in R as follows:
library(RMINC)
#load in your mask
mask_array = mincArray(mincGetVolume("manual_mask.mnc"))
# in the mask array, identify all the locations where the mask is 1
ones <- which(mask_array == 1)
# choose however many you want to keep in the testing (in this case, 1)
# This will set it randomly
keep <- sample(ones, 1)
# set all others to 0
mask_array[-keep] <- 0
#write it out.
mincWriteVolume(mask_array , "mask_1_vox.mnc")
Then, you can call this mask in your qMincApply function:
mask_1 = "mask_1_vox.mnc"
qMincApply(data$file,
fun = my_fun,
mask = mask_1,
batches = 1,
wait = FALSE)
Now you can see how long that single voxel runs.
Some useful tools.
You can manually set the registry you want to load as follows:
reg <- batchtools::loadRegistry("/PATH/TO/REGISTRY/qMincApply_registry001")
You should see the name of the registry when you make your initial qMincApply call.
Then you can check the status like this:
batchtools::getStatus(reg = reg)
Status for 4 jobs at 2026-04-14 14:40:52:
Submitted : 4 (100.0%)
-- Queued : 0 ( 0.0%)
-- Started : 4 (100.0%)
---- Running : 0 ( 0.0%)
---- Done : 0 ( 0.0%)
---- Error : 4 (100.0%)
---- Expired : 0 ( 0.0%)
In this case, you can see that 4 jobs were submitted and all of them experienced errors.
You can read the errors like this:
batchtools::showLog(reg, id = 1)