FMRIPREP preprocessing
Tutorial
How to run a minimal preprocessing pipeline and extract the nuisance confounds with FMRIPREP in LAVIS's ADA HPCC.
Requirements
Singularity image
In order to run fmriprep inside ADA, a singularity image of fmriprep must be available for use inside the cluster. Because the version of Singularity installed in ADA is old (<2.5), the user needs to have created the image elsewhere and uploaded it to an accessible location. This can be done with the tool docker2singularity.
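As a rough sketch, building such an image on any machine that has Docker looks like this (the fmriprep tag and output directory are placeholders, not the exact ones used for this tutorial's image):
docker run --privileged -t --rm \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /absolute/path/for/output:/output \
    singularityware/docker2singularity \
    poldracklab/fmriprep:<version>
The resulting .img file in the output directory is what gets uploaded to the cluster.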
For this tutorial, the path of the singularity image is:
/mnt/MD1200B/egarza/sfernandezl/singimages/fmriprep20190725.img
This image should be available to any member of the egarza group of ADA.
BIDS
FMRIPREP requires that the input data be organized according to the BIDS standard. Even though fmriprep comes with its own validator, it's recommended to run a BIDS validator separately before running the script.
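For example, assuming Node.js is available on a machine with access to the data, the dataset can be checked with the bids-validator package (the validator can also be run from its web page):
npx bids-validator /mnt/MD1200B/egarza/sfernandezl/Data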
Data
For this tutorial, I'll be using a BIDS dataset of three subjects with two sessions of T1w, BOLD and fieldmap images:
Data
├── participants.tsv
├── sub-001
│ ├── ses-t0
│ │ ├── anat
│ │ │ ├── sub-001_ses-t0_T1w.json
│ │ │ └── sub-001_ses-t0_T1w.nii.gz
│ │ ├── fmap
│ │ │ ├── sub-001_ses-t0_epi.json
│ │ │ └── sub-001_ses-t0_epi.nii.gz
│ │ └── func
│ │ ├── sub-001_ses-t0_task-rest_bold.json
│ │ └── sub-001_ses-t0_task-rest_bold.nii.gz
│ └── ses-t1
│ ├── anat
│ │ ├── sub-001_ses-t1_T1w.json
│ │ └── sub-001_ses-t1_T1w.nii.gz
│ ├── fmap
│ │ ├── sub-001_ses-t1_epi.json
│ │ └── sub-001_ses-t1_epi.nii.gz
│ └── func
│ ├── sub-001_ses-t1_task-rest_bold.json
│ └── sub-001_ses-t1_task-rest_bold.nii.gz
├── sub-002
│ ├── ses-t0
│ │ ├── anat
│ │ │ ├── sub-002_ses-t0_T1w.json
│ │ │ └── sub-002_ses-t0_T1w.nii.gz
│ │ ├── fmap
│ │ │ ├── sub-002_ses-t0_epi.json
│ │ │ └── sub-002_ses-t0_epi.nii.gz
│ │ └── func
│ │ ├── sub-002_ses-t0_task-rest_bold.json
│ │ └── sub-002_ses-t0_task-rest_bold.nii.gz
│ └── ses-t1
│ ├── anat
│ │ ├── sub-002_ses-t1_T1w.json
│ │ └── sub-002_ses-t1_T1w.nii.gz
│ ├── fmap
│ │ ├── sub-002_ses-t1_epi.json
│ │ └── sub-002_ses-t1_epi.nii.gz
│ └── func
│ ├── sub-002_ses-t1_task-rest_bold.json
│ └── sub-002_ses-t1_task-rest_bold.nii.gz
└── sub-003
├── ses-t0
│ ├── anat
│ │ ├── sub-003_ses-t0_T1w.json
│ │ └── sub-003_ses-t0_T1w.nii.gz
│ ├── fmap
│ │ ├── sub-003_ses-t0_epi.json
│ │ └── sub-003_ses-t0_epi.nii.gz
│ └── func
│ ├── sub-003_ses-t0_task-rest_bold.json
│ └── sub-003_ses-t0_task-rest_bold.nii.gz
└── ses-t1
├── anat
│ ├── sub-003_ses-t1_T1w.json
│ └── sub-003_ses-t1_T1w.nii.gz
├── fmap
│ ├── sub-003_ses-t1_epi.json
│ └── sub-003_ses-t1_epi.nii.gz
└── func
├── sub-003_ses-t1_task-rest_bold.json
└── sub-003_ses-t1_task-rest_bold.nii.gz
SGE script
We will be using the multi-job (array) functionality of the Sun Grid Engine (SGE) environment of the ADA cluster. For this, a master bash script will be used to call the FMRIPREP singularity image in parallel jobs, one for each subject.
Shebang
It's good practice to always start any script with a shebang. In this case our script will be written in Bourne Again Shell (bash), so the first line should be:
#!/bin/bash
qsub arguments
SGE allows us to embed the qsub arguments inside the script by prefixing lines with #$.
The next part of the script should contain the job parameters we want to use, for example:
- The shell used:
#$ -S /bin/bash
- The name of the job:
#$ -N FMRIPREP
- Export environment variables:
#$ -V
- Set the amount of memory used:
#$ -l mem_free=16G
- Set the parallel environment:
#$ -pe openmp 3
- Join stdout and stderr to the same file:
#$ -j y
- Set the working directory (the job logs will be written there):
#$ -wd /mnt/MD1200B/egarza/sfernandezl/logs
Modules
In order to use Singularity, we have to load the module in the script. The version installed in ADA is 2.2:
module load singularity/2.2
FreeSurfer license
FMRIPREP requires the user to have a valid FreeSurfer license. For the program to run correctly, the path to the license file should be exported as an environment variable.
export FS_LICENSE=/mnt/MD1200B/egarza/sfernandezl/freesurferLicense/license.txt
Environment variables
I'm going to save the path to the singularity image and the participants list as environment variables. For the job parallelization, an awk one-liner will extract, for each job, the first column of the corresponding row of the participants list, and use that as the participant label.
IMAGE=/mnt/MD1200B/egarza/sfernandezl/singimages/fmriprep20190725.img
SUB_LIST=/mnt/MD1200B/egarza/sfernandezl/Data/participants.tsv
# Add 1 to SGE_TASK_ID to skip the header row of participants.tsv
SGE_INDEX=$(awk -v FS='\t' -v SUB_INDEX=$(($SGE_TASK_ID + 1)) 'NR==SUB_INDEX {print $1}' $SUB_LIST)
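As an illustration, for this tutorial's three subjects, participants.tsv starts like this (the BIDS-mandated participant_id column in the first row, one subject per row):
participant_id
sub-001
sub-002
sub-003
So for the job with SGE_TASK_ID=1, SUB_INDEX is 2 and SGE_INDEX resolves to sub-001.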
Random sleep
In our lab we're accustomed to inserting a small sleep so that jobs don't start at exactly the same time:
sleep $(($SGE_TASK_ID % 10))
Singularity call
Finally, with everything set, at the end of the script we insert the call to the singularity image with all of the FMRIPREP arguments.
The singularity container created for the jobs needs to have access to our files, so we have to bind a location of the cluster inside the container. This can be done with the -B option.
In this case, we have all of our data in /Data, the output will be saved in a derivatives subdirectory according to BIDS, and the temporary files in /tmp. All of the directories ought to be created beforehand.
singularity run -B /mnt:/mnt \
$IMAGE \
/mnt/MD1200B/egarza/sfernandezl/Data \
/mnt/MD1200B/egarza/sfernandezl/Data/derivatives \
participant \
--participant-label ${SGE_INDEX} \
--resource-monitor \
--write-graph \
--work-dir /mnt/MD1200B/egarza/sfernandezl/tmp \
--output-spaces T1w \
--longitudinal;
For more information on each of these arguments, feel free to read FMRIPREP's documentation.
Saving the script
According to BIDS, all of the scripts can be in a subdirectory called code; nonetheless, you are free to save it wherever you want inside the cluster. I like to distinguish my scripts for SGE with the .SGE extension, but you're free to save it as a .sh file.
This script is saved in /Data/code/fmriprepADA.SGE
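Putting the pieces from this section together, the complete fmriprepADA.SGE script reads:
#!/bin/bash
#$ -S /bin/bash
#$ -N FMRIPREP
#$ -V
#$ -l mem_free=16G
#$ -pe openmp 3
#$ -j y
#$ -wd /mnt/MD1200B/egarza/sfernandezl/logs
module load singularity/2.2
export FS_LICENSE=/mnt/MD1200B/egarza/sfernandezl/freesurferLicense/license.txt
IMAGE=/mnt/MD1200B/egarza/sfernandezl/singimages/fmriprep20190725.img
SUB_LIST=/mnt/MD1200B/egarza/sfernandezl/Data/participants.tsv
# Add 1 to SGE_TASK_ID to skip the header row of participants.tsv
SGE_INDEX=$(awk -v FS='\t' -v SUB_INDEX=$(($SGE_TASK_ID + 1)) 'NR==SUB_INDEX {print $1}' $SUB_LIST)
# Stagger the job start times
sleep $(($SGE_TASK_ID % 10))
singularity run -B /mnt:/mnt \
$IMAGE \
/mnt/MD1200B/egarza/sfernandezl/Data \
/mnt/MD1200B/egarza/sfernandezl/Data/derivatives \
participant \
--participant-label ${SGE_INDEX} \
--resource-monitor \
--write-graph \
--work-dir /mnt/MD1200B/egarza/sfernandezl/tmp \
--output-spaces T1w \
--longitudinal;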
Submitting the job
Once we have the script finished and saved, we're ready to submit the job. For this, we're going to use the qsub program.
In our call to the program, we're going to ask for three different jobs (one for each participant, independently of the number of sessions) with the option -t 1-3. If we later wanted to preprocess the next 27 participants, as long as they were in the participants list, we would only need to submit the job again, but specifying the option as -t 4-30.
qsub -t 1-3 /mnt/MD1200B/egarza/sfernandezl/Data/code/fmriprepADA.SGE
Monitoring the job
qstat
We can see how the jobs are doing with the program qstat.
qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
14815 0.50817 fmriprep sfernandezl r 12/03/2019 22:28:19 [email protected] 3 1
14815 0.50817 fmriprep sfernandezl r 12/03/2019 22:28:19 [email protected] 3 2
14815 0.50817 fmriprep sfernandezl r 12/03/2019 22:28:19 [email protected] 3 3
A useful tip is to use the program watch to monitor the jobs in real time. The option -n specifies the refresh interval in seconds. So, in order to see the jobs every 5 seconds, we would run:
watch -n 5 qstat
tail
Sometimes, when debugging, it is useful to monitor the stdout of a specific job in real time. This can be done with the log files saved in the path specified by the #$ -wd option of the script.
To output the log file to the terminal as it's being updated by the program, we only need to run tail -f with the path of the log we want to monitor; for the case of our second subject, we would use:
tail -f /mnt/MD1200B/egarza/sfernandezl/logs/fmriprep.o14815.2
Niagara (Compute Canada SciNet Consortium)
This is how we manage to run FMRIPREP using singularity and templateflow.
Niagara has Singularity available by default, but you can load the module explicitly if you like:
module load singularity/3.6.0
How to get templateflow in Niagara
As users we cannot use pip install directly; we must create a virtualenv as follows:
Load the version of Python you need. You can check which versions are available with module avail python
module load python/3.7.9
Create the virtual environment at path ENV
ENV=path/to/env/to/create
virtualenv --no-download --python=python3.7 $ENV
Activate it
source $ENV/bin/activate
Now you can use pip; check that it is up to date:
(ENV) user@nia: pip install --upgrade pip
Next, with your ENV activated, fetch the templates that you'll need. Niagara compute nodes don't have access to the internet, so it's necessary to fetch beforehand all the files that you are going to use.
(ENV) user@nia: wget https://raw.githubusercontent.com/mgxd/fmriprep/enh/fetch-tf-templates/scripts/fetch_templates.py
(ENV) user@nia: python -m pip install --upgrade templateflow
(ENV) user@nia: mkdir <path-to-save-templateflow-templates>
(ENV) user@nia: python fetch_templates.py --tf-dir <path-to-save-templateflow-templates>
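To confirm the fetch worked, you can list the target directory; templateflow keeps each downloaded template in its own tpl-* subdirectory:
(ENV) user@nia: ls <path-to-save-templateflow-templates>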
To create the script, use the nano editor; the file must have a .sh extension. To save changes press Ctrl+X, then answer Y (save changes), and then ENTER to confirm the name and exit.
nano test_script.sh
This is an example of the script (pay attention: do not copy and paste it verbatim; you must put in your own paths, the ones you are using in Niagara).
SLURM script
#!/bin/bash
#SBATCH --time=3:00:00
#SBATCH --nodes=1
#SBATCH --job-name=fmriprep_job
#SBATCH --account=rrg-sponsor-ab
# Optional: receive an email notification if the job fails
#SBATCH --mail-type=FAIL
principal_path=/scratch/working/path/directory
input_data=${principal_path}/data
output_data=${principal_path}/outputfmriprep
container_singularity=/scratch/user/containers/fmriprep-22.0.1.simg
export SINGULARITYENV_TEMPLATEFLOW_HOME=/scratch/user/path/where/you/put/the/templateflow
singularity run --cleanenv -B /scratch:/scratch ${container_singularity} \
${input_data} ${output_data} participant \
--participant_label 001 \
--skip_bids_validation \
--use-aroma \
--output-spaces MNI152NLin2009cAsym:res-1 \
--fs-license-file /scratch/user/principal_path/licence.txt
Run the script with sbatch
sbatch test_script.sh
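To check on the job, the standard SLURM commands apply; for example, to list your queued and running jobs, and to query a specific job by the ID that sbatch printed:
squeue -u $USER
sacct -j <jobID>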
To learn more commands to monitor the jobs, or to run the script for multiple subjects via a SLURM array, see the link below.
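As a minimal sketch of the array approach (subjects.txt is a hypothetical helper file with one participant label per line; the variables are the same as in the script above):
#SBATCH --array=1-3
# Hypothetical: pick this task's participant label from subjects.txt
subject=$(sed -n "${SLURM_ARRAY_TASK_ID}p" ${principal_path}/subjects.txt)
singularity run --cleanenv -B /scratch:/scratch ${container_singularity} \
${input_data} ${output_data} participant \
--participant_label ${subject} \
--skip_bids_validation \
--output-spaces MNI152NLin2009cAsym:res-1 \
--fs-license-file /scratch/user/principal_path/licence.txt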
FMRIPREP on Cluster C13
To run it, using a script is recommended. Remember to have the FreeSurfer license text file.
fmriprep.test
#!/bin/bash
# Load FreeSurfer and Singularity on the cluster
module load freesurfer/7.4.1
module load singularity
# Path to the FreeSurfer license
export FS_LICENSE=/misc/tezca/egarza/license.txt
home=/misc/tezca/egarza/Practica
singularity run --cleanenv -B $home:/home fmriprep.sif /home/bids /home/fmriprepout participant \
--participant-label 020 --resource-monitor --write-graph --ignore fieldmaps --fd-spike-threshold 0.5 \
--fs-license-file /root/license.txt -w /home/fmriprepout/work
The simplest way to run it is using fsl_sub:
fsl_sub -N fmriprep bash fmriprep.test