FMRIPREP preprocessing - GarzaLab/Documentation GitHub Wiki

Tutorial

How to run a minimal preprocessing pipeline and extract the nuisance confounds with FMRIPREP on LAVIS' ADA HPCC.

Requirements

Singularity image

In order to run fmriprep inside ADA, a Singularity image of fmriprep must be available for use inside the cluster. Because the version of Singularity installed on ADA is old (< 2.5), the user needs to create the image elsewhere and upload it to an accessible location. This can be done with the tool docker2singularity.
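
For reference, a typical docker2singularity call looks like the following; it must be run on a machine with Docker installed, and the output folder and the fmriprep version tag are placeholders you would adapt:

docker run --privileged -t --rm \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /absolute/path/to/output:/output \
singularityware/docker2singularity \
poldracklab/fmriprep:<version>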

For this tutorial, the path of the Singularity image is /mnt/MD1200B/egarza/sfernandezl/singimages/fmriprep20190725.img. This image should be available to any member of the egarza group on ADA.

BIDS

FMRIPREP requires that the input data are organized according to the BIDS standard. Even though fmriprep comes with its own validator, it's recommended to run a BIDS validator separately before running the script.
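
For example, if the bids-validator command-line tool is installed (e.g., through npm), a quick check of the dataset used in this tutorial would be:

bids-validator /mnt/MD1200B/egarza/sfernandezl/Data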

Data

For this tutorial, I'll be using a BIDS dataset of three subjects with two sessions of T1w, BOLD and fieldmap images:

Data
├── participants.tsv
├── sub-001
│   ├── ses-t0
│   │   ├── anat
│   │   │   ├── sub-001_ses-t0_T1w.json
│   │   │   └── sub-001_ses-t0_T1w.nii.gz
│   │   ├── fmap
│   │   │   ├── sub-001_ses-t0_epi.json
│   │   │   └── sub-001_ses-t0_epi.nii.gz
│   │   └── func
│   │       ├── sub-001_ses-t0_task-rest_bold.json
│   │       └── sub-001_ses-t0_task-rest_bold.nii.gz
│   └── ses-t1
│       ├── anat
│       │   ├── sub-001_ses-t1_T1w.json
│       │   └── sub-001_ses-t1_T1w.nii.gz
│       ├── fmap
│       │   ├── sub-001_ses-t1_epi.json
│       │   └── sub-001_ses-t1_epi.nii.gz
│       └── func
│           ├── sub-001_ses-t1_task-rest_bold.json
│           └── sub-001_ses-t1_task-rest_bold.nii.gz
├── sub-002
│   ├── ses-t0
│   │   ├── anat
│   │   │   ├── sub-002_ses-t0_T1w.json
│   │   │   └── sub-002_ses-t0_T1w.nii.gz
│   │   ├── fmap
│   │   │   ├── sub-002_ses-t0_epi.json
│   │   │   └── sub-002_ses-t0_epi.nii.gz
│   │   └── func
│   │       ├── sub-002_ses-t0_task-rest_bold.json
│   │       └── sub-002_ses-t0_task-rest_bold.nii.gz
│   └── ses-t1
│       ├── anat
│       │   ├── sub-002_ses-t1_T1w.json
│       │   └── sub-002_ses-t1_T1w.nii.gz
│       ├── fmap
│       │   ├── sub-002_ses-t1_epi.json
│       │   └── sub-002_ses-t1_epi.nii.gz
│       └── func
│           ├── sub-002_ses-t1_task-rest_bold.json
│           └── sub-002_ses-t1_task-rest_bold.nii.gz
└── sub-003
    ├── ses-t0
    │   ├── anat
    │   │   ├── sub-003_ses-t0_T1w.json
    │   │   └── sub-003_ses-t0_T1w.nii.gz
    │   ├── fmap
    │   │   ├── sub-003_ses-t0_epi.json
    │   │   └── sub-003_ses-t0_epi.nii.gz
    │   └── func
    │       ├── sub-003_ses-t0_task-rest_bold.json
    │       └── sub-003_ses-t0_task-rest_bold.nii.gz
    └── ses-t1
        ├── anat
        │   ├── sub-003_ses-t1_T1w.json
        │   └── sub-003_ses-t1_T1w.nii.gz
        ├── fmap
        │   ├── sub-003_ses-t1_epi.json
        │   └── sub-003_ses-t1_epi.nii.gz
        └── func
            ├── sub-003_ses-t1_task-rest_bold.json
            └── sub-003_ses-t1_task-rest_bold.nii.gz
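
The awk extraction used later in the SGE script assumes that the first column of participants.tsv holds the participant labels, preceded by a header row. A minimal participants.tsv consistent with this dataset would look like:

participant_id
sub-001
sub-002
sub-003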

SGE script

We will be using the array-job functionality of the Sun Grid Engine (SGE) environment of the ADA cluster. For this, a master bash script will call the FMRIPREP singularity image in parallel jobs, one for each subject.

Shebang

It's good practice to always start a script with a shebang. In this case our script will be written in Bourne Again Shell (bash), so the first line should be:

#! /bin/bash

qsub arguments

SGE allows us to insert the qsub arguments inside the script by prefixing them with #$. The next part of the script should contain the job parameters we want to use, for example (the assembled header is shown after the list):

  • The shell used: #$ -S /bin/bash
  • The name of the job: #$ -N FMRIPREP
  • Export environment variables to the job: #$ -V
  • Request the amount of memory needed: #$ -l mem_free=16G
  • Set the parallel environment and the number of slots: #$ -pe openmp 3
  • Join stdout and stderr into the same file: #$ -j y
  • Set the working directory, which is where the log files will be written: #$ -wd /mnt/MD1200B/egarza/sfernandezl/logs
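
Put together with the shebang, the header of the script would look like this (using the example values above):

#! /bin/bash
#$ -S /bin/bash
#$ -N FMRIPREP
#$ -V
#$ -l mem_free=16G
#$ -pe openmp 3
#$ -j y
#$ -wd /mnt/MD1200B/egarza/sfernandezl/logs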

Modules

In order to use Singularity, we have to load its module in the script. The version installed on ADA is 2.2:

module load singularity/2.2

Freesurfer license

FMRIPREP requires the user to have a valid FreeSurfer license. For the program to run correctly, the path to the license file should be exported as an environment variable.

export FS_LICENSE=/mnt/MD1200B/egarza/sfernandezl/freesurferLicense/license.txt

Environment variables

I'm going to save the path to the Singularity image and the participants list as environment variables. For the job parallelization, an awk one-liner extracts the first column of the row of the participants list that corresponds to each task ID and uses it as the participant label for that job.

IMAGE=/mnt/MD1200B/egarza/sfernandezl/singimages/fmriprep20190725.img
SUB_LIST=/mnt/MD1200B/egarza/sfernandezl/Data/participants.tsv
# row 1 of participants.tsv is the header, so task N reads row N+1
SGE_INDEX=$(awk -v FS='\t' -v SUB_INDEX=$(($SGE_TASK_ID + 1)) 'NR==SUB_INDEX {print $1}' $SUB_LIST)
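
As a quick sanity check, assuming participants.tsv looks like the sketch shown earlier, the extraction can be reproduced by hand for the first task (SGE_TASK_ID=1 reads row 2 and should print sub-001):

SGE_TASK_ID=1
awk -v FS='\t' -v SUB_INDEX=$(($SGE_TASK_ID + 1)) 'NR==SUB_INDEX {print $1}' $SUB_LIST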

Random sleep

In our lab we're accustomed to inserting a small sleep so that jobs don't start at exactly the same time:

sleep $(($SGE_TASK_ID % 10))

Singularity call

Finally, with everything in place, at the end of the script we insert the call to the singularity image with all of the FMRIPREP arguments.

The singularity container created for the jobs needs to have access to our files, so we have to bind-mount a location of the cluster inside the container. This can be done with the -B option.

In this case, we have all of our data in the Data directory, the output will be saved in a derivatives subdirectory (following BIDS), and the temporary working files will go in tmp. All of these directories must be created beforehand.
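
For instance, the derivatives, temporary and log directories used in this tutorial can be created ahead of time with mkdir -p (paths taken from the rest of the script):

mkdir -p /mnt/MD1200B/egarza/sfernandezl/Data/derivatives \
/mnt/MD1200B/egarza/sfernandezl/tmp \
/mnt/MD1200B/egarza/sfernandezl/logs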

singularity run -B /mnt:/mnt \
$IMAGE \
/mnt/MD1200B/egarza/sfernandezl/Data \
/mnt/MD1200B/egarza/sfernandezl/Data/derivatives \
participant \
--participant-label ${SGE_INDEX} \
--resource-monitor \
--write-graph \
--work-dir /mnt/MD1200B/egarza/sfernandezl/tmp \
--output-spaces T1w \
--longitudinal;

For more information on each of these arguments, see FMRIPREP's documentation.

Saving the script

According to BIDS, scripts can be kept in a code subdirectory; nonetheless, you are free to save the script wherever you want inside the cluster. I like to distinguish my SGE scripts with a .SGE extension, but a plain .sh file works just as well.

This script is saved in /Data/code/fmriprepADA.SGE

Submitting the job

Once we have the script finished and saved, we're ready to submit the job. For this, we're going to use the qsub program.

In our call to qsub, we're going to ask for three different jobs (one for each participant, independently of the number of sessions) with the option -t 1-3. If we later wanted to preprocess the next 24 participants (4 through 27), as long as they were in the participants list, we would only need to submit the job again, specifying the option as -t 4-27.

qsub -t 1-3 /mnt/MD1200B/egarza/sfernandezl/Data/code/fmriprepADA.SGE
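
For example, the hypothetical second batch mentioned above, covering participants 4 through 27, would be submitted as:

qsub -t 4-27 /mnt/MD1200B/egarza/sfernandezl/Data/code/fmriprepADA.SGE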

Monitoring the job

qstat

We can see how the jobs are doing with the program qstat.

qstat

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  14815 0.50817 fmriprep   sfernandezl  r     12/03/2019 22:28:19 [email protected]     3 1
  14815 0.50817 fmriprep   sfernandezl  r     12/03/2019 22:28:19 [email protected]     3 2
  14815 0.50817 fmriprep   sfernandezl  r     12/03/2019 22:28:19 [email protected]     3 3

A useful tip is to use the program watch to monitor the jobs in real time. The option -n specifies the refresh interval in seconds. So, to see the jobs every 5 seconds, we would run:

watch -n 5 qstat

tail

Sometimes, when debugging, it is useful to monitor the stdout of a specific job in real time. This can be done with the log files saved in the path specified by the #$ -wd option of the script.

To print the log file to the terminal as it is being updated, we only need to run tail -f with the path of the log we want to monitor; for our second subject, we would use:

tail -f /mnt/MD1200B/egarza/sfernandezl/logs/fmriprep.o14815.2