FMRIPREP preprocessing - GarzaLab/Documentation GitHub Wiki
Tutorial
How to run a minimal preprocessing pipeline and extract the nuisance confounds with FMRIPREP on the LAVIS ADA HPCC.
Requirements
Singularity image
In order to run fmriprep inside ADA, a singularity image of fmriprep must be available for use inside the cluster. Because the version of Singularity installed in ADA is old (<2.5), the user needs to have created and uploaded an image to an accessible location. This can be done with the tool docker2singularity.
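As a sketch, the conversion can be done on any machine with Docker installed, using the docker2singularity container; the fmriprep version tag and output directory below are illustrative, not the exact ones used for this tutorial:

```shell
# Convert the fmriprep Docker image into a Singularity image file.
# The resulting .img can then be copied to the cluster (e.g. with scp).
docker run --privileged -t --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /tmp/singimages:/output \
    singularityware/docker2singularity \
    poldracklab/fmriprep:1.4.1
```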
For this tutorial, the path of the singularity image is:
/mnt/MD1200B/egarza/sfernandezl/singimages/fmriprep20190725.img
This image should be available to any member of the egarza group of ADA.
BIDS
FMRIPREP
requires that the input data be organized according to the BIDS standard. Even though fmriprep comes with its own validator, it is recommended to run a BIDS validator separately before running the script.
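For instance, assuming Node.js is available on your workstation, the official bids-validator CLI can be pointed at the dataset root (the path below is the one used in this tutorial):

```shell
# Install the validator once, then check the dataset for BIDS compliance.
# Any errors it reports should be fixed before running fmriprep.
npm install -g bids-validator
bids-validator /mnt/MD1200B/egarza/sfernandezl/Data
```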
Data
For this tutorial, I'll be using a BIDS dataset of three subjects with two sessions of T1w, BOLD and fieldmap images:
Data
├── participants.tsv
├── sub-001
│ ├── ses-t0
│ │ ├── anat
│ │ │ ├── sub-001_ses-t0_T1w.json
│ │ │ └── sub-001_ses-t0_T1w.nii.gz
│ │ ├── fmap
│ │ │ ├── sub-001_ses-t0_epi.json
│ │ │ └── sub-001_ses-t0_epi.nii.gz
│ │ └── func
│ │ ├── sub-001_ses-t0_task-rest_bold.json
│ │ └── sub-001_ses-t0_task-rest_bold.nii.gz
│ └── ses-t1
│ ├── anat
│ │ ├── sub-001_ses-t1_T1w.json
│ │ └── sub-001_ses-t1_T1w.nii.gz
│ ├── fmap
│ │ ├── sub-001_ses-t1_epi.json
│ │ └── sub-001_ses-t1_epi.nii.gz
│ └── func
│ ├── sub-001_ses-t1_task-rest_bold.json
│ └── sub-001_ses-t1_task-rest_bold.nii.gz
├── sub-002
│ ├── ses-t0
│ │ ├── anat
│ │ │ ├── sub-002_ses-t0_T1w.json
│ │ │ └── sub-002_ses-t0_T1w.nii.gz
│ │ ├── fmap
│ │ │ ├── sub-002_ses-t0_epi.json
│ │ │ └── sub-002_ses-t0_epi.nii.gz
│ │ └── func
│ │ ├── sub-002_ses-t0_task-rest_bold.json
│ │ └── sub-002_ses-t0_task-rest_bold.nii.gz
│ └── ses-t1
│ ├── anat
│ │ ├── sub-002_ses-t1_T1w.json
│ │ └── sub-002_ses-t1_T1w.nii.gz
│ ├── fmap
│ │ ├── sub-002_ses-t1_epi.json
│ │ └── sub-002_ses-t1_epi.nii.gz
│ └── func
│ ├── sub-002_ses-t1_task-rest_bold.json
│ └── sub-002_ses-t1_task-rest_bold.nii.gz
└── sub-003
├── ses-t0
│ ├── anat
│ │ ├── sub-003_ses-t0_T1w.json
│ │ └── sub-003_ses-t0_T1w.nii.gz
│ ├── fmap
│ │ ├── sub-003_ses-t0_epi.json
│ │ └── sub-003_ses-t0_epi.nii.gz
│ └── func
│ ├── sub-003_ses-t0_task-rest_bold.json
│ └── sub-003_ses-t0_task-rest_bold.nii.gz
└── ses-t1
├── anat
│ ├── sub-003_ses-t1_T1w.json
│ └── sub-003_ses-t1_T1w.nii.gz
├── fmap
│ ├── sub-003_ses-t1_epi.json
│ └── sub-003_ses-t1_epi.nii.gz
└── func
├── sub-003_ses-t1_task-rest_bold.json
└── sub-003_ses-t1_task-rest_bold.nii.gz
SGE script
We will be using the array-job functionality of the Sun Grid Engine environment of the ADA cluster. For this, a master bash script will be used to call the FMRIPREP singularity image in parallel jobs, one for each subject.
Shebang
It's good practice to always start any script with a shebang. In this case our script will be written in Bourne Again Shell (bash), so the first line should be:
#! /bin/bash
qsub arguments
SGE allows us to insert the qsub arguments inside the script on lines prefixed with #$.
The next part of the script should contain the job parameters we want to use, for example:
- The shell used:
#$ -S /bin/bash
- The name of the job:
#$ -N FMRIPREP
- Export environment variables:
#$ -V
- Set the amount of memory requested:
#$ -l mem_free=16G
- Set the parallel environment and number of slots:
#$ -pe openmp 3
- Join stdout and stderr into the same file:
#$ -j y
- Set the working directory, where the logs will be saved:
#$ -wd /mnt/MD1200B/egarza/sfernandezl/logs
Modules
In order to use Singularity, we have to load the module in the script. The version installed in ADA is 2.2:
module load singularity/2.2
Freesurfer license
FMRIPREP requires the user to have a valid FreeSurfer license. For the program to run correctly, the path to the license file should be exported as an environment variable.
export FS_LICENSE=/mnt/MD1200B/egarza/sfernandezl/freesurferLicense/license.txt
Environment variables
I'm going to save the path to the singularity image and the participants list as environment variables. For the job parallelization, an awk one-liner extracts the first column of the row corresponding to each job from the participants list, and that value is used as the participant label. Note that 1 is added to SGE_TASK_ID so that task 1 reads the second line of participants.tsv, skipping the header row.
IMAGE=/mnt/MD1200B/egarza/sfernandezl/singimages/fmriprep20190725.img
SUB_LIST=/mnt/MD1200B/egarza/sfernandezl/Data/participants.tsv
SGE_INDEX=$(awk -F '\t' -v SUB_INDEX=$((SGE_TASK_ID + 1)) 'NR==SUB_INDEX {print $1}' "$SUB_LIST")
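As a quick sanity check of this extraction logic, here is a self-contained sketch using a toy participants file (the file name and its contents are made up for illustration):

```shell
# Create a toy participants list with a header row, as in a BIDS dataset
printf 'participant_id\tage\nsub-001\t25\nsub-002\t31\nsub-003\t28\n' > participants_demo.tsv

# Simulate array task 1: SGE_TASK_ID + 1 = 2, i.e. the first data row after the header
SGE_TASK_ID=1
awk -F '\t' -v SUB_INDEX=$((SGE_TASK_ID + 1)) 'NR==SUB_INDEX {print $1}' participants_demo.tsv
# prints sub-001
```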
Random sleep
In our lab we're accustomed to inserting a small sleep so that jobs don't start at exactly the same time:
sleep $(($SGE_TASK_ID % 10))
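The modulo keeps every delay under ten seconds no matter how large the task ID gets; a small sketch of the resulting delays:

```shell
# Each array task sleeps (SGE_TASK_ID mod 10) seconds,
# spreading job starts over a 10-second window
for id in 1 5 10 11 23; do
    echo "task $id sleeps $((id % 10)) s"
done
# task 10 sleeps 0 s, task 11 sleeps 1 s, task 23 sleeps 3 s, and so on
```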
Singularity call
Finally, with everything in place, at the end of the script we insert the call to the singularity image with all of the FMRIPREP arguments.
The singularity container created for the jobs needs access to our files, so we have to bind-mount a location of the cluster inside the container. This can be done with the -B option.
In this case, we have all of our data in Data, the output will be saved in a derivatives subdirectory according to BIDS, and the temporary files in tmp. All of these directories ought to be created beforehand.
singularity run -B /mnt:/mnt \
$IMAGE \
/mnt/MD1200B/egarza/sfernandezl/Data \
/mnt/MD1200B/egarza/sfernandezl/Data/derivatives \
participant \
--participant-label ${SGE_INDEX} \
--resource-monitor \
--write-graph \
--work-dir /mnt/MD1200B/egarza/sfernandezl/tmp \
--output-spaces T1w \
--longitudinal
For more information on each of these arguments, see the FMRIPREP documentation.
Saving the script
According to BIDS, all of the scripts can be kept in a subdirectory called code; nonetheless, you are free to save the script wherever you want inside the cluster. I like to distinguish my SGE scripts with the .SGE extension, but you're free to save it as a .sh file.
This script is saved in /Data/code/fmriprepADA.SGE
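Putting the pieces above together, the complete fmriprepADA.SGE would look roughly like this (a sketch assembled from the fragments in this tutorial, not a verbatim copy of the lab's script):

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -N FMRIPREP
#$ -V
#$ -l mem_free=16G
#$ -pe openmp 3
#$ -j y
#$ -wd /mnt/MD1200B/egarza/sfernandezl/logs

module load singularity/2.2
export FS_LICENSE=/mnt/MD1200B/egarza/sfernandezl/freesurferLicense/license.txt

IMAGE=/mnt/MD1200B/egarza/sfernandezl/singimages/fmriprep20190725.img
SUB_LIST=/mnt/MD1200B/egarza/sfernandezl/Data/participants.tsv

# Map the array task ID to a participant label, skipping the header row
SGE_INDEX=$(awk -F '\t' -v SUB_INDEX=$((SGE_TASK_ID + 1)) 'NR==SUB_INDEX {print $1}' "$SUB_LIST")

# Stagger the job starts
sleep $((SGE_TASK_ID % 10))

singularity run -B /mnt:/mnt \
    $IMAGE \
    /mnt/MD1200B/egarza/sfernandezl/Data \
    /mnt/MD1200B/egarza/sfernandezl/Data/derivatives \
    participant \
    --participant-label ${SGE_INDEX} \
    --resource-monitor \
    --write-graph \
    --work-dir /mnt/MD1200B/egarza/sfernandezl/tmp \
    --output-spaces T1w \
    --longitudinal
```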
Submitting the job
Once we have the script finished and saved, we're ready to submit the job. For this, we're going to use the qsub program.
In our call to the program, we're going to ask for three different jobs (one for each participant, independently of the number of sessions) with the option -t 1-3. If we later wanted to preprocess participants 4 through 27, as long as they were in the participants list, we would only need to submit the job again, specifying the option as -t 4-27.
qsub -t 1-3 /mnt/MD1200B/egarza/sfernandezl/Data/code/fmriprepADA.SGE
Monitoring the job
qstat
We can see how the jobs are doing with the program qstat.
qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
14815 0.50817 fmriprep sfernandezl r 12/03/2019 22:28:19 [email protected] 3 1
14815 0.50817 fmriprep sfernandezl r 12/03/2019 22:28:19 [email protected] 3 2
14815 0.50817 fmriprep sfernandezl r 12/03/2019 22:28:19 [email protected] 3 3
A useful tip is to use the program watch to monitor the jobs in real time. The -n option specifies the refresh interval in seconds. So, in order to see the jobs every 5 seconds, we would run:
watch -n 5 qstat
tail
Sometimes, when debugging, it is useful to monitor the stdout of a specific job in real time. This can be done with the log files saved in the path specified by the #$ -wd option of the script.
To print the log file to the terminal as it is being updated, we only need to run tail -f with the path of the log we want to monitor; so, for the case of our second subject, we would use:
tail -f /mnt/MD1200B/egarza/sfernandezl/logs/fmriprep.o14815.2