FMRIPREP preprocessing - neuropsytox/Documentation GitHub Wiki

Tutorial

How to run a minimal preprocessing pipeline and extract the nuisance confounds with FMRIPREP on LAVIS' ADA HPCC.

Requirements

Singularity image

In order to run fmriprep on ADA, a Singularity image of fmriprep must be available inside the cluster. Because the version of Singularity installed on ADA is old (<2.5), the user needs to build the image elsewhere and upload it to an accessible location. This can be done with the tool docker2singularity.

For this tutorial, the path of the Singularity image is /mnt/MD1200B/egarza/sfernandezl/singimages/fmriprep20190725.img. This image should be available to any member of the egarza group on ADA.

BIDS

FMRIPREP requires that the input data are organized according to the BIDS standard. Although fmriprep comes with its own validator, it's recommended to run a BIDS validator separately before running the script.
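For example, with the Node.js bids-validator command-line tool installed (a separate tool, not part of the fmriprep image; its installation is not covered here), the check for the dataset used in this tutorial would be:

bids-validator /mnt/MD1200B/egarza/sfernandezl/Data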

Data

For this tutorial, I'll be using a BIDS dataset of three subjects with two sessions of T1w, BOLD and fieldmap images:

Data
├── participants.tsv
├── sub-001
│   ├── ses-t0
│   │   ├── anat
│   │   │   ├── sub-001_ses-t0_T1w.json
│   │   │   └── sub-001_ses-t0_T1w.nii.gz
│   │   ├── fmap
│   │   │   ├── sub-001_ses-t0_epi.json
│   │   │   └── sub-001_ses-t0_epi.nii.gz
│   │   └── func
│   │       ├── sub-001_ses-t0_task-rest_bold.json
│   │       └── sub-001_ses-t0_task-rest_bold.nii.gz
│   └── ses-t1
│       ├── anat
│       │   ├── sub-001_ses-t1_T1w.json
│       │   └── sub-001_ses-t1_T1w.nii.gz
│       ├── fmap
│       │   ├── sub-001_ses-t1_epi.json
│       │   └── sub-001_ses-t1_epi.nii.gz
│       └── func
│           ├── sub-001_ses-t1_task-rest_bold.json
│           └── sub-001_ses-t1_task-rest_bold.nii.gz
├── sub-002
│   ├── ses-t0
│   │   ├── anat
│   │   │   ├── sub-002_ses-t0_T1w.json
│   │   │   └── sub-002_ses-t0_T1w.nii.gz
│   │   ├── fmap
│   │   │   ├── sub-002_ses-t0_epi.json
│   │   │   └── sub-002_ses-t0_epi.nii.gz
│   │   └── func
│   │       ├── sub-002_ses-t0_task-rest_bold.json
│   │       └── sub-002_ses-t0_task-rest_bold.nii.gz
│   └── ses-t1
│       ├── anat
│       │   ├── sub-002_ses-t1_T1w.json
│       │   └── sub-002_ses-t1_T1w.nii.gz
│       ├── fmap
│       │   ├── sub-002_ses-t1_epi.json
│       │   └── sub-002_ses-t1_epi.nii.gz
│       └── func
│           ├── sub-002_ses-t1_task-rest_bold.json
│           └── sub-002_ses-t1_task-rest_bold.nii.gz
└── sub-003
    ├── ses-t0
    │   ├── anat
    │   │   ├── sub-003_ses-t0_T1w.json
    │   │   └── sub-003_ses-t0_T1w.nii.gz
    │   ├── fmap
    │   │   ├── sub-003_ses-t0_epi.json
    │   │   └── sub-003_ses-t0_epi.nii.gz
    │   └── func
    │       ├── sub-003_ses-t0_task-rest_bold.json
    │       └── sub-003_ses-t0_task-rest_bold.nii.gz
    └── ses-t1
        ├── anat
        │   ├── sub-003_ses-t1_T1w.json
        │   └── sub-003_ses-t1_T1w.nii.gz
        ├── fmap
        │   ├── sub-003_ses-t1_epi.json
        │   └── sub-003_ses-t1_epi.nii.gz
        └── func
            ├── sub-003_ses-t1_task-rest_bold.json
            └── sub-003_ses-t1_task-rest_bold.nii.gz

SGE script

We will be using the multi-job functionality of the Sun Grid Engine environment of the ADA cluster. For this, a master bash script will be used in order to call the FMRIPREP singularity image in parallel jobs, one for each subject.

Shebang

It's good practice to always start any script with a shebang. In this case our script will be written in Bourne Again Shell (bash), so the first line should be:

#! /bin/bash

qsub arguments

SGE allows us to insert the qsub arguments inside the script by prefixing them with #$. The next part of the script should contain the job parameters we want to use, for example (an assembled header is sketched after the list):

  • The shell used: #$ -S /bin/bash
  • The name of the job: #$ -N FMRIPREP
  • Export environment variables: #$ -V
  • Set the amount of memory used: #$ -l mem_free=16G
  • Set the parallel environment: #$ -pe openmp 3
  • Join stdout and stderr in the same file: #$ -j y
  • Set the working directory, where the log files will be written: #$ -wd /mnt/MD1200B/egarza/sfernandezl/logs
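Put together, the top of the script would look something like this (a sketch assembled from the directives above; adjust the resources and the log path to your own setup):

#! /bin/bash
#$ -S /bin/bash
#$ -N FMRIPREP
#$ -V
#$ -l mem_free=16G
#$ -pe openmp 3
#$ -j y
#$ -wd /mnt/MD1200B/egarza/sfernandezl/logs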

Modules

In order to use Singularity, we have to load the module in the script. The version installed in ADA is 2.2:

module load singularity/2.2

Freesurfer license

FMRIPREP requires the user to have a valid FreeSurfer license. For the program to run correctly, the path to the license file should be exported as an environment variable.

export FS_LICENSE=/mnt/MD1200B/egarza/sfernandezl/freesurferLicense/license.txt
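As an optional sanity check, you can verify that the exported path actually points to a readable file (the conditional below is just an illustration):

[ -r "$FS_LICENSE" ] && echo "FreeSurfer license found" || echo "FreeSurfer license missing"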

Environment variables

I'm going to save the path to the singularity image and the participants list as environment variables. For the job parallelization, an awk one-liner extracts the first column of the row of the participants list that corresponds to each job's task ID (offset by one to skip the header row), and that value is used as the participant label in each case.

IMAGE=/mnt/MD1200B/egarza/sfernandezl/singimages/fmriprep20190725.img
SUB_LIST=/mnt/MD1200B/egarza/sfernandezl/Data/participants.tsv
SGE_INDEX=$(awk -v FS='\t' -v OFS='\t' -v SUB_INDEX=$(($SGE_TASK_ID + 1)) 'NR==SUB_INDEX {print $1}' $SUB_LIST)
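To make the indexing concrete, assume a tab-separated participants.tsv like the following (the age column is hypothetical; only the first column matters here):

participant_id	age
sub-001	25
sub-002	31
sub-003	28

The job with SGE_TASK_ID=2 then reads row 3 (header plus an offset of one) and sets SGE_INDEX to sub-002, which is passed to --participant-label below.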

Random sleep

In our lab we're accustomed to inserting a small sleep so that jobs don't start at exactly the same time:

sleep $(($SGE_TASK_ID % 10))

Singularity call

Finally, with everything in place, at the end of the script we insert the call to the singularity image with all of the FMRIPREP arguments.

The singularity container created for the jobs needs to have access to our files, so we have to bind a location of the cluster inside the container. This can be done with the -B option.

In this case, we have all of our data in /Data, the output will be saved in a derivatives subdirectory according to BIDS, and the temporary files in /tmp. All of these directories ought to be created beforehand.
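If they don't exist yet, they can be created in one go, for example (paths taken from the call below):

mkdir -p /mnt/MD1200B/egarza/sfernandezl/Data/derivatives /mnt/MD1200B/egarza/sfernandezl/tmp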

singularity run -B /mnt:/mnt \
$IMAGE \
/mnt/MD1200B/egarza/sfernandezl/Data \
/mnt/MD1200B/egarza/sfernandezl/Data/derivatives \
participant \
--participant-label ${SGE_INDEX} \
--resource-monitor \
--write-graph \
--work-dir /mnt/MD1200B/egarza/sfernandezl/tmp \
--output-spaces T1w \
--longitudinal;

For more information on each of these arguments, feel free to read the FMRIPREP documentation.

Saving the script

According to BIDS, all of the scripts can be kept in a subdirectory called code; nonetheless, you are free to save the script wherever you want inside the cluster. I like to distinguish my SGE scripts with the .SGE extension, but you're free to save it as a .sh file.

This script is saved in /Data/code/fmriprepADA.SGE

Submitting the job

Once we have the script finished and saved, we're ready to submit the job. For this, we're going to use the qsub program.

In our call to our program, we're going to ask for three different jobs (one for each participant, independently of the number of sessions) with the option -t 1-3. If we later wanted to preprocess participants 4 through 27 of the participants list, we would only need to submit the job again, but specifying the option as -t 4-27.

qsub -t 1-3 /mnt/MD1200B/egarza/sfernandezl/Data/code/fmriprepADA.SGE

Monitoring the job

qstat

We can see how the jobs are doing with the program qstat.

qstat

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  14815 0.50817 fmriprep   sfernandezl  r     12/03/2019 22:28:19 [email protected]     3 1
  14815 0.50817 fmriprep   sfernandezl  r     12/03/2019 22:28:19 [email protected]     3 2
  14815 0.50817 fmriprep   sfernandezl  r     12/03/2019 22:28:19 [email protected]     3 3

A useful tip is to use the program watch to monitor the jobs in real time. The option -n specifies the refresh interval in seconds. So, in order to see the jobs refreshed every 5 seconds, we would run:

watch -n 5 qstat

tail

Sometimes, when debugging, it is useful to monitor the stdout of a specific job in real time. This can be done with the log files saved in the path specified by the #$ -wd option of the script.

To print the log file to the terminal as it is being updated, we only need to run tail -f with the path of the log we want to monitor; for our second subject, we would use:

tail -f /mnt/MD1200B/egarza/sfernandezl/logs/fmriprep.o14815.2

Niagara Compute Canada SciNet Consortium

This is how we manage to run FMRIPREP using Singularity and TemplateFlow on Niagara.

Niagara has Singularity available by default, but you can load the module explicitly if you like:

module load singularity/3.6.0

How to get templateflow in Niagara

As users we cannot use pip install directly; we must create a virtualenv as follows.

Load the version of Python you need. You can check which versions are available with module avail python:

module load python/3.7.9 

Create the virtual environment at path ENV

ENV=path/to/env/to/create
virtualenv --no-download --python=python3.7 $ENV 

Activate it

source $ENV/bin/activate 

Now you can use pip; make sure it is up to date:

(ENV) user@nia: pip install --upgrade pip 

Next, with your ENV activated, fetch the templates that you'll need. Niagara compute nodes don't have internet access, so it's necessary to download beforehand all the files that you are going to use.

(ENV) user@nia: wget https://raw.githubusercontent.com/mgxd/fmriprep/enh/fetch-tf-templates/scripts/fetch_templates.py 
(ENV) user@nia: python -m pip install --upgrade templateflow  
(ENV) user@nia: mkdir <path-to-save-templateflow-templates> 
(ENV) user@nia: python fetch_templates.py --tf-dir <path-to-save-templateflow-templates>
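To confirm the templates were fetched, list the TemplateFlow directory; each downloaded template should appear as its own tpl-* subdirectory (the exact set depends on what the fetch script downloaded):

(ENV) user@nia: ls <path-to-save-templateflow-templates>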

To create a script, use the nano editor; the file must have a .sh extension. To save changes, press Ctrl+X, answer Yes (save changes), and then press Enter to confirm the file name and exit.

nano test_script.sh

This is an example of the script (do not copy and paste it as-is; substitute the paths with your own, i.e. the paths that you are using on Niagara).

SLURM script

#!/bin/bash
#SBATCH --time=3:00:00
#SBATCH --nodes=1
#SBATCH --job-name=fmriprep_job
#SBATCH --account=rrg-sponsor-ab
# Optional: receive an email notification if the job fails
#SBATCH --mail-type=FAIL

principal_path=/scratch/working/path/directory
input_data=${principal_path}/data
output_data=${principal_path}/outputfmriprep
container_singularity=/scratch/user/containers/fmriprep-22.0.1.simg

export SINGULARITYENV_TEMPLATEFLOW_HOME=/scratch/user/path/where/you/put/the/templateflow 
singularity run --cleanenv -B /scratch:/scratch ${container_singularity} \
${input_data} ${output_data} participant \
 --participant_label 001 \
 --skip_bids_validation \
 --use-aroma \
 --output-spaces MNI152NLin2009cAsym:res-1 \
 --fs-license-file /scratch/user/principal_path/licence.txt  

Run the script with sbatch:

sbatch test_script.sh 

To learn more commands for monitoring, or to run the script for multiple subjects via a SLURM array, see the link below; a minimal array sketch is also included after it.

For monitoring jobs in SLURM
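As a starting point, here is a minimal sketch of how the single-subject script above could be turned into a SLURM array job, reusing the participants.tsv idea from the ADA section. The job name, the array range, and the participants.tsv location are assumptions to adapt to your own data; carry over any other fmriprep options you need from the script above.

#!/bin/bash
#SBATCH --time=3:00:00
#SBATCH --nodes=1
#SBATCH --job-name=fmriprep_array
#SBATCH --account=rrg-sponsor-ab
#SBATCH --array=1-3

principal_path=/scratch/working/path/directory
input_data=${principal_path}/data
output_data=${principal_path}/outputfmriprep
container_singularity=/scratch/user/containers/fmriprep-22.0.1.simg

# Pick the participant label for this array task from participants.tsv,
# skipping the header row (same logic as the awk line in the SGE script).
SUB_LIST=${input_data}/participants.tsv
SUBJECT=$(awk -v FS='\t' -v ROW=$(($SLURM_ARRAY_TASK_ID + 1)) 'NR==ROW {print $1}' $SUB_LIST)

export SINGULARITYENV_TEMPLATEFLOW_HOME=/scratch/user/path/where/you/put/the/templateflow
singularity run --cleanenv -B /scratch:/scratch ${container_singularity} \
 ${input_data} ${output_data} participant \
 --participant-label ${SUBJECT} \
 --skip_bids_validation \
 --output-spaces MNI152NLin2009cAsym:res-1 \
 --fs-license-file /scratch/user/principal_path/licence.txt

Submitting this file with sbatch launches one task per row of the participants list.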

FMRIPREP on Cluster C13

To run it, a script is recommended. Remember to have the FreeSurfer license text file available.

fmriprep.test

#!/bin/bash

module load freesurfer/7.4.1
module load singularity

export FS_LICENSE=/misc/tezca/egarza/license.txt

home=/misc/tezca/egarza/Practica

singularity run --cleanenv -B $home:/home fmriprep.sif /home/bids /home/fmriprepout participant \
--participant-label 020 --resource-monitor --write-graph --ignore fieldmaps --fd-spike-threshold 0.5 \
--fs-license-file /root/license.txt -w /home/fmriprepout/work

The simplest way to run it is using fsl_sub:

fsl_sub -N fmriprep bash fmriprep.test