BIDS on the Cluster - cogcommscience-lab/lab-docs GitHub Wiki

Running BIDS Apps on the Peloton Cluster (scom-marr)

BIDS (Gorgolewski et al., 2016, Nature Scientific Data) is a relatively new data structure designed to increase data sharing and replication among fMRI datasets. This data structure also streamlines data analysis, and there are many new BIDS Apps that use the latest and best tools for data processing and analysis. Our lab is committed to using BIDS for all new datasets. This wiki page explains how to use BIDS Apps on your local workstation or the lab cluster.

How to run MRIQC on Peloton

  • IMPORTANT: This assumes you have already read Esteban et al., 2017, PLOS ONE and the MRIQC Documentation. If you haven't done that yet, read those materials first.

  • Your local system needs Docker. If you are on a lab workstation, this is already installed and configured.

  • Get mriqc locally by running: $ docker run -it nipreps/mriqc:latest --version

  • Our cluster cannot run Docker directly; it uses Singularity. You will need to create a Singularity image and upload it to our cluster. It helps to review the directions for creating a Singularity < 2.5 image. Our cluster has a more modern Singularity install, but the directions for versions > 2.5 don't work. Once you've reviewed those directions, you can make a Singularity image of mriqc by running this syntax from your terminal:

     docker run --privileged -t --rm \
     -v /var/run/docker.sock:/var/run/docker.sock \
     -v /absolute/path/to/output/folder:/output \
     singularityware/docker2singularity \
     nipreps/mriqc:latest
    
    

    NB: Be sure to update the output path. Also, you'll get the latest mriqc version by running the local docker command.

  • Copy your singularity mriqc image file to the lab directory on the Peloton server. Follow these instructions for scp and/or rsync.
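
    A minimal rsync sketch (the hostname is a placeholder for the Peloton login node; adjust the image filename to whatever docker2singularity produced, and create the destination dir first if it doesn't exist):

     rsync -avP /absolute/path/to/output/folder/nipreps_mriqc_latest-*.simg \
         your_username@peloton.example.edu:/group/rwhuskeygrp/{study_name}/singularity_images/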

  • Make sure your dataset is BIDS compliant
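
    One way to check is with the official BIDS validator Docker image (a sketch; adjust the dataset path):

     docker run -ti --rm \
         -v /absolute/path/to/bids_nii:/data:ro \
         bids/validator /data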

  • Now, ssh into Peloton. Be SURE to establish a tmux session.
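
    For example (the hostname is a placeholder; use the actual Peloton login node):

     ssh your_username@peloton.example.edu
     tmux new -s mriqc            # start a named session that survives dropped connections
     # reattach later with: tmux attach -t mriqc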

  • Create an output log dir for slurm using something like (update the file path for your project): $ mkdir -p /group/rwhuskeygrp/{study_name}/slurm_logs/mriqc

    • These logs are super useful for debugging. Monitor them as your code runs to see if your code crashed, and why.
  • Create a .sh file that looks something like what you see below. Be sure to update the paths (particularly {study_name}) and preamble. You should read the NiPreps docs for running mriqc via singularity on HPC. Many thanks to this helpful guide, which helped Richard realize that his earlier slurm scripts were failing because of a binding issue; that guide gives nice code for resolving it.

IMPORTANT: You may want to add the following flags: --no-sub (prevents sending anonymous image quality metrics to the MRIQC developers) and -w /group/rwhuskeygrp/{study_name}/mriqc/mriqc_work (path where intermediate results should be stored). A sketch of the srun line with these flags added appears right after the script below.

#!/bin/bash -l
  
#SBATCH --job-name=rwhuskeygrp_mriqc
#SBATCH --output=/group/rwhuskeygrp/{study_name}/slurm_logs/mriqc/mriqc.out
#SBATCH --time=12:00:00
#SBATCH --partition=high2
#SBATCH --mem=200G
#SBATCH --cpus-per-task=32

mkdir -p /group/rwhuskeygrp/{study_name}/mriqc/mriqc_output/logs
mkdir -p /group/rwhuskeygrp/{study_name}/mriqc/mriqc_work

module load deprecated/singularity/3.5.2
srun singularity run --cleanenv --bind /group/rwhuskeygrp/{study_name}/bids_nii:/data --bind /group/rwhuskeygrp/{study_name}/mriqc/mriqc_output:/out \
/group/rwhuskeygrp/{study_name}/singularity_images/nipreps_mriqc_latest-2025-01-13-6c3daacf1801.simg \
/data /out participant \
--n_procs $SLURM_CPUS_PER_TASK
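
A sketch of the srun line with the flags from the IMPORTANT note above added. One way to handle the work directory is to bind it into the container and point -w at the bound path:

srun singularity run --cleanenv \
    --bind /group/rwhuskeygrp/{study_name}/bids_nii:/data \
    --bind /group/rwhuskeygrp/{study_name}/mriqc/mriqc_output:/out \
    --bind /group/rwhuskeygrp/{study_name}/mriqc/mriqc_work:/work \
    /group/rwhuskeygrp/{study_name}/singularity_images/nipreps_mriqc_latest-2025-01-13-6c3daacf1801.simg \
    /data /out participant \
    --no-sub -w /work \
    --n_procs $SLURM_CPUS_PER_TASK
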
  • submit your job with: $ sbatch new_sh_file_i_just_made.sh

  • run $ squeue -u $(whoami) to check if your job is running appropriately

  • run $ scancel -u <my_user_name> to cancel all your slurm jobs

  • run $ squeue --format="%.18i" --me -h | grep -w "26699.*" | xargs scancel to cancel slurm jobs by jobid. In this case, all jobs whose jobid starts with 26699 will be cancelled.

How to run FMRIPREP on Peloton

  • IMPORTANT: This assumes you have already read Esteban et al., 2019, Nature Methods and the fMRIPrep Documentation. If you haven't done that yet, read those materials first.

  • As with mriqc, your local system needs Docker. If you are on a lab workstation, this is already installed and configured.

  • Get fmriprep locally by running: $ docker run -it nipreps/fmriprep:latest --version

  • Make a Singularity <2.5 image of fmriprep by running this syntax from your terminal:

     docker run --privileged -t --rm \
     -v /var/run/docker.sock:/var/run/docker.sock \
     -v /absolute/path/to/output/folder:/output \
     singularityware/docker2singularity \
     nipreps/fmriprep:latest
    
    

    NB: Be sure to update the output path. Also, you'll get the latest fmriprep version by running the local docker command.

  • Make sure your dataset is BIDS compliant

  • If you haven't already, copy your BIDS compliant dataset to the Peloton server. Also copy your singularity fmriprep image file to the lab directory on the Peloton server. Follow these instructions for scp and/or rsync.

  • Now, ssh into Peloton. Be SURE to establish a tmux session.

  • Create an output log dir for slurm logs using something like (update the file path for your project): $ mkdir -p /group/rwhuskeygrp/{study_name}/slurm_logs/fmriprep

  • Create a dir for your slurm scripts using something like (update the file path for your project): $ mkdir -p /group/rwhuskeygrp/{study_name}/slurm_scripts/fmriprep

    • These logs are super useful for debugging. Monitor them as your code runs to see if your code crashed, and why.
  • Create a .sh file that looks something like what you see below and save it in the newly created slurm_scripts/fmriprep directory. Be sure to update the paths (particularly {study_name}) and preamble. You should read the NiPreps docs for running fmriprep via singularity on HPC. However, the examples there make assumptions about the configuration of your HPC that do not hold for Peloton, so some pretty substantial rewrites to the slurm script are necessary relative to their examples. The script below worked with fmriprep v24.

  • You may want to run fmriprep on just one subject at a time. The code to do that is less complex, and this helpful guide might give you some clues for how to do that; a sketch follows.
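
    A minimal single-subject sketch, assuming the same image, bind mounts, and FreeSurfer license setup as the full array script below (sub-005, {study_name}, and the thread/memory values are placeholders; run it inside tmux or wrap it in a small sbatch script):

     module load deprecated/singularity/3.5.2    # as in the scripts on this page
     export SINGULARITYENV_FS_LICENSE=$HOME/license.txt
     singularity run --cleanenv \
         -B /group/rwhuskeygrp/{study_name}/bids_nii:/data \
         -B /group/rwhuskeygrp/{study_name}/fmriprep_work:/work \
         /group/rwhuskeygrp/{study_name}/singularity_images/nipreps_fmriprep_latest-2024-10-10-c1b24d34568d.simg \
         /data /data/derivatives/fmriprep participant \
         --participant-label 005 \
         -w /work --skip_bids_validation \
         --nthreads 8 --mem_mb 32000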

IMPORTANT:

  • In fMRIPrep v24.1.1, there is an issue where assets (template brains) need to be downloaded and saved to the templateflow directory specified in the script below. This is fine if you run fmriprep on a single subject. But if you run it on multiple subjects at the same time, there is a known issue that elicits a race condition: the same files attempt to download simultaneously to the same directory for each participant, and that causes a crash. The current workaround is to download the templateflow assets before starting fmriprep, copy them to the templateflow directory specified in the script below, and then start fmriprep. You can find those assets at: /group/rwhuskeygrp/templateflow_fmriprep_v24.1.1_compliant
  • Be sure to include the --skip_bids_validation flag; running the built-in validator causes a crash. So long as you have previously validated your BIDS dataset, you'll be fine.
  • This code was for the bad news bias study. Be sure to update relevant paths accordingly.
  • Name this file fmriprep.slurm
fmriprep.slurm
#!/bin/bash
#
#SBATCH -J fmriprep
#SBATCH --time=48:00:00
#SBATCH -n 1
#SBATCH --cpus-per-task=12
#SBATCH --mem-per-cpu=4G
#SBATCH -p high2  # Queue names you can submit to
# Outputs ----------------------------------
#SBATCH -o /group/rwhuskeygrp/bad_news_bias/slurm_logs/fmriprep/%x-%A-%a.out
#SBATCH -e /group/rwhuskeygrp/bad_news_bias/slurm_logs/fmriprep/%x-%A-%a.err
# ------------------------------------------


# Specify where the data live
# Note: $STUDY gets defined when you submit your jobs to HPC (see the export command below)
# results will be saved in the bids_nii directory, under fmriprep
BIDS_DIR="$STUDY/bids_nii"
DERIVS_DIR="derivatives/fmriprep"
#LOCAL_FREESURFER_DIR="$STUDY/bids_nii/derivatives/freesurfer"

# Prepare some writeable bind-mount points
# These will be saved as hidden directories in your home directory
TEMPLATEFLOW_HOST_HOME=$HOME/.cache/templateflow
FMRIPREP_HOST_CACHE=$HOME/.cache/fmriprep
mkdir -p ${TEMPLATEFLOW_HOST_HOME}
mkdir -p ${FMRIPREP_HOST_CACHE}

# Prepare derivatives folder
mkdir -p ${BIDS_DIR}/${DERIVS_DIR}
mkdir -p /group/rwhuskeygrp/bad_news_bias/fmriprep_work

# Make sure FS_LICENSE is defined in the container
export SINGULARITYENV_FS_LICENSE=$HOME/license.txt

# Make sure the right cert is in the container to allow downloading Templateflow
# see https://fmriprep.org/en/20.2.0/singularity.html#internet-access-problems
export SINGULARITYENV_REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

# Designate a templateflow bind-mount point
export SINGULARITYENV_TEMPLATEFLOW_HOME="/templateflow"

# Make the command
SINGULARITY_CMD="singularity run --cleanenv \
-B $BIDS_DIR:/data \
-B ${TEMPLATEFLOW_HOST_HOME}:${SINGULARITYENV_TEMPLATEFLOW_HOME} \
-B /group/rwhuskeygrp/bad_news_bias/fmriprep_work:/work \
$STUDY/singularity_images/nipreps_fmriprep_latest-2024-10-10-c1b24d34568d.simg"

# Parse the participants.tsv file and extract one subject ID from the line corresponding to this SLURM task
subject=$( sed -n -E "$((${SLURM_ARRAY_TASK_ID} + 1))s/sub-(\S*)\>.*/\1/gp" ${BIDS_DIR}/participants.tsv )

# Compose the command line
cmd="${SINGULARITY_CMD} /data /data/${DERIVS_DIR} \
participant --participant-label $subject \
-w /work/ \
-vv --omp-nthreads 8 --nthreads 12 --mem_mb 48000 \
--output-spaces MNI152NLin6Asym:res-2 MNI152NLin2009cAsym \
--skip_bids_validation \
--use-syn-sdc \
--fs-license-file /data/license.txt"


# Setup done, run the command
echo Running task ${SLURM_ARRAY_TASK_ID}
echo Commandline: $cmd
module load deprecated/singularity/3.5.2
eval $cmd
exitcode=$?

# Output results to a table
echo "sub-$subject   ${SLURM_ARRAY_TASK_ID}    $exitcode" \
      >> ${SLURM_JOB_NAME}.${SLURM_ARRAY_JOB_ID}.tsv
echo Finished tasks ${SLURM_ARRAY_TASK_ID} with exit code $exitcode
exit $exitcode
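
If you need the templateflow workaround described in the IMPORTANT note above, seed your host cache before submitting. A minimal sketch (the source is the lab copy noted above; the destination matches TEMPLATEFLOW_HOST_HOME in the script; adjust if the lab copy is organized differently):

mkdir -p $HOME/.cache/templateflow
cp -r /group/rwhuskeygrp/templateflow_fmriprep_v24.1.1_compliant/* $HOME/.cache/templateflow/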

To get this code running, you'll need to execute two commands:

export STUDY=/group/rwhuskeygrp/bad_news_bias

And then:

sbatch --array=1-$(( $( wc -l $STUDY/bids_nii/participants.tsv | cut -f1 -d' ' ) - 1 )) fmriprep.slurm
  • run $ squeue -u $(whoami) to check if your job is running appropriately
  • run $ scancel -u <my_user_name> to cancel all your slurm jobs
  • run $ squeue --format="%.18i" --me -h | grep -w "26699.*" | xargs scancel to cancel slurm jobs by jobid. In this case, all jobs whose jobid starts with 26699 will be cancelled.

How much RAM do you need? Short answer is, we don't know just yet. But this helpful guide points out how we might test this.
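
One way to start measuring is to check how much memory finished jobs actually used via Slurm accounting (sacct is standard Slurm; seff is a convenience wrapper that may or may not be installed on Peloton):

# Peak memory (MaxRSS), requested memory, and runtime for a completed job; replace <jobid>
sacct -j <jobid> --format=JobID,JobName,Elapsed,ReqMem,MaxRSS,State
# If available, seff prints a one-screen CPU/memory efficiency summary
seff <jobid>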

How to run xcpEngine on Peloton

  • Make xcpEngine singularity image using $ singularity build xcpEngine.simg docker://pennbbl/xcpengine:latest
  • Copy your singularity xcpEngine image file to the Peloton server
  • Create a csv file that contains the fMRI image information (subject, run number, and file location): touch /group/rwhuskeygrp/derivative/fmriprep-latest/fmriprep/cohort.csv
  • Add the fMRI image information to the csv file, such as:
  id0,id1,img
  sub-005,run-01,sub-005/func/sub-005_task-game_run-01_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz
  sub-005,run-02,sub-005/func/sub-005_task-game_run-02_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz
  sub-005,run-03,sub-005/func/sub-005_task-game_run-03_space-MNI152NLin2009cAsym_desc-preproc_bold.nii.gz 
  • Create a pipeline design file in the data dir. We can simply download the design file and copy it to our data dir using cp <the downloaded dsn file location> /group/rwhuskeygrp/derivative/fmriprep-latest/fmriprep/pipline_design.dsn
  • Settings in the design file that we should specifically look at are the following:

    Item                      Suggestion
    regress_lopass[3]=0.08    Change to your desired low-pass filter, or keep the default
    regress_hipass[3]=0.01    Change to your desired high-pass filter, or keep the default
    confound2_gsr[2]=1        Change to 0 if you do not want to correct for motion with GSR
    confound2_aroma[2]=1      Change to 0 if you do not want to use AROMA for motion correction
    regress_sptf[3]=susan     For the ICA denoising pipeline spatial smoothing is recommended; otherwise set this to none
    regress_smo[3]=1          For the ICA denoising pipeline spatial smoothing is recommended; otherwise set smoothing to none
  • Create a log output dir using $ mkdir -p /group/rwhuskeygrp/slurm_logs/xcpengine

  • Create a .sh file that looks something like what you see below. Be sure to update the paths and preamble.

    #!/bin/bash
    #
    #SBATCH -J xcpengine
    #SBATCH --time=120:00:00
    #SBATCH -n 1
    #SBATCH --cpus-per-task=16
    #SBATCH --mem=40G
    #SBATCH -p high2 # Queue names you can submit to
    # Outputs ----------------------------------
    #SBATCH -o /group/rwhuskeygrp/slurm_logs/xcpengine/%x-%A-%a.out
    # ------------------------------------------
    DATA_DIR="/group/rwhuskeygrp/bids_nii/derivatives/fmriprep-latest/fmriprep"
    FULL_COHORT="$DATA_DIR/cohort.csv"
    
    SINGULARITY_IMG="/group/rwhuskeygrp/singularity_images/xcpEngine.simg"
    LINE_NUM=$( expr ${SLURM_ARRAY_TASK_ID} + 1 )
    LINE=$(awk "NR==$LINE_NUM" $FULL_COHORT)
    mkdir -p ${DATA_DIR}/tempcohort
    TEMP_COHORT=$DATA_DIR/tempcohort/${SLURM_ARRAY_TASK_ID}.csv
    HEADER=$(head -n 1 $FULL_COHORT)
    echo $HEADER > $TEMP_COHORT
    echo $LINE >> $TEMP_COHORT
    
    # Compose the command line
    cmd="singularity run -B $DATA_DIR:/home/user/data $SINGULARITY_IMG -d /home/user/data/pipline_design.dsn -c \ 
    /home/user/data/tempcohort/${SLURM_ARRAY_TASK_ID}.csv \ 
    -o /home/user/data/output -r /home/user/data/ -i $TMPDIR"
    
    # Setup done, run the command
    echo Running task ${SLURM_ARRAY_TASK_ID}
    echo Commandline: $cmd
    module load singularity
    eval $cmd
    exitcode=$?
    
    # Output results to a table
    echo " ${SLURM_ARRAY_TASK_ID}    $exitcode" \
       >> ${SLURM_JOB_NAME}.${SLURM_ARRAY_JOB_ID}.tsv
    echo Finished tasks ${SLURM_ARRAY_TASK_ID} with exit code $exitcode
    exit $exitcode
    
    
  • submit your job with: $ sbatch --array=1-<the number of files that need to be processed> xcpengine.sh (a sketch for computing the array size automatically appears after this list)

  • run $ squeue -u $(whoami) to check if your job is running appropriately
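
For the --array range above, you can compute the number of cohort rows instead of counting by hand. A sketch, assuming the cohort.csv path used in the script:

# Number of data rows = total lines in cohort.csv minus the header
N=$(( $(wc -l < /group/rwhuskeygrp/bids_nii/derivatives/fmriprep-latest/fmriprep/cohort.csv) - 1 ))
sbatch --array=1-$N xcpengine.sh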

How to run trimming on Peloton

  • The trimming process separates a 4-D image into a set of 3-D images along the time dimension. We will use fslsplit to do this.
  • Trimming on Peloton with parallel processing can save roughly half an hour or more.
  • Create a .sh file (touch trimming_parallel.sh) that looks something like what you see below. Be sure to update the paths and preamble.
#!/bin/bash
# 
#SBATCH -J trimming
#SBATCH --time=120:00:00
#SBATCH -n 1
#SBATCH --cpus-per-task=4
#SBATCH --mem=20G
#SBATCH -p high2 # Queue names you can submit to
# Outputs ----------------------------------
#SBATCH -o /group/rwhuskeygrp/slurm_logs/trimming/%x-%A-%a.out
#SBATCH --array=1-139
# ------------------------------------------
DATA_DIR="/group/rwhuskeygrp/bids_nii/derivatives/fmriprep-latest"

# We will use metadata in the cohort.csv file that we used in the denoising processing using xcpengine
FULL_COHORT="$DATA_DIR/fmriprep/cohort.csv"
XCP_DATA_DIR="$DATA_DIR/output"

# Map this array task to a cohort row (row 1 is the header)
LINE_NUM=$( expr ${SLURM_ARRAY_TASK_ID} + 1 )
SUB=$(awk "NR==$LINE_NUM" $FULL_COHORT | cut -d',' -f1)
TASK=$(awk "NR==$LINE_NUM" $FULL_COHORT | cut -d',' -f2)

FILE="${XCP_DATA_DIR}/${SUB}/${TASK}/${SUB}_${TASK}.nii.gz"
OUT_DIR="/group/rwhuskeygrp/trimming/${SUB}/${TASK}"

echo "FILE is $FILE"
echo "mkdir -p ${OUT_DIR}"
mkdir -p $OUT_DIR

cmd="fslsplit ${FILE} ${OUT_DIR}/"

# Setup done, run the command
echo Running task ${SLURM_ARRAY_TASK_ID}
echo Commandline: $cmd
module load FSL
eval $cmd
exitcode=$?

# Output results to a table
echo " ${SLURM_ARRAY_TASK_ID}    $exitcode" \
      >> ${SLURM_JOB_NAME}.${SLURM_ARRAY_JOB_ID}.tsv
echo Finished tasks ${SLURM_ARRAY_TASK_ID} with exit code $exitcode
exit $exitcode
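
fslsplit appends a four-digit volume index to the output basename, so with the trailing slash used above each run's directory should fill with files named 0000.nii.gz, 0001.nii.gz, and so on. A quick check after a task finishes (sub-005/run-01 is just an example):

ls /group/rwhuskeygrp/trimming/sub-005/run-01/ | head -n 3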
  • make a dir to save your log files: mkdir -p /group/rwhuskeygrp/slurm_logs/trimming/
  • submit your job with: $ sbatch trimming_parallel.sh
  • run $ squeue -u $(whoami) to check if your job is running appropriately

How to run merging on Peloton

  • The merging process combines a set of 3-D images into a single 4-D image with a specified TR. We will use fslmerge to do this.
  • Merging on Peloton with parallel processing can save time as well.
  • Create a .sh file (touch merging_parallel.sh) that looks something like what you see below. Be sure to update the paths and preamble.
#!/bin/bash
# 
#SBATCH -J merging
#SBATCH --time=120:00:00
#SBATCH -n 1
#SBATCH --cpus-per-task=4
#SBATCH --mem=20G
#SBATCH -p high2 # Queue names you can submit to
# Outputs ----------------------------------
#SBATCH -o /group/rwhuskeygrp/slurm_logs/merging/%x-%A-%a.out
#SBATCH --array=1-139
# ------------------------------------------
DATA_DIR="/group/rwhuskeygrp/bids_nii/derivatives/fmriprep-latest"

# We will use metadata stored in cohort.csv file that we created for denoising processing using xcpengine
FULL_COHORT="$DATA_DIR/fmriprep/cohort.csv"

# Map this array task to a cohort row (row 1 is the header)
LINE_NUM=$( expr ${SLURM_ARRAY_TASK_ID} + 1 )
SUB=$(awk "NR==$LINE_NUM" $FULL_COHORT | cut -d',' -f1)
TASK=$(awk "NR==$LINE_NUM" $FULL_COHORT | cut -d',' -f2)

TRIM_DIR="/group/rwhuskeygrp/trimming/${SUB}/${TASK}"
OUT_DIR="/group/rwhuskeygrp/merging"
mkdir -p ${OUT_DIR}
cmd="fslmerge -tr ${OUT_DIR}/${SUB}_task-game_${TASK}_bold_space-MNI152NLin2009aAsym_preproc_merged.nii.gz ${TRIM_DIR}/{0010..0129}.nii.gz 2.0 "

# Setup done, run the command
echo Running task ${SLURM_ARRAY_TASK_ID}
echo Commandline: $cmd
module load FSL
eval $cmd
exitcode=$?

# Output results to a table
echo " ${SLURM_ARRAY_TASK_ID}    $exitcode" \
      >> ${SLURM_JOB_NAME}.${SLURM_ARRAY_JOB_ID}.tsv
echo Finished tasks ${SLURM_ARRAY_TASK_ID} with exit code $exitcode
exit $exitcode
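
After a merge finishes, a quick sanity check is to confirm the merged file has the expected number of volumes; keeping volumes 0010 through 0129 as above should give 120 (fslnvols is part of FSL):

module load FSL
fslnvols /group/rwhuskeygrp/merging/<your_merged_file>.nii.gz   # expect 120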
  • make a dir to save your log files: mkdir -p /group/rwhuskeygrp/slurm_logs/merging/
  • submit your job with: $ sbatch merging_parallel.sh
  • run $ squeue -u $(whoami) to check if your job is running appropriately

How to compute QCFC correlation using XcpEngine

  • Find the network files in the xcpengine output dir and then copy them to a new dir: find <your xcpengine output dir> -name "*_power264_network.txt" | xargs -I % cp -p % <your target dir>/network/

  • Find the files describing quality control indices in the xcpengine output dir and then copy them to a new dir: find <your xcpengine output dir> -name "*_relMeanRMS.txt" | xargs -I % cp -p % <your target dir>/qc/

  • Navigate to the fcqc dir: cd <your target dir>

  • Define the run variable: run=$(ls network | cut -d '_' -f 2 | sort | uniq)

  • Define the sub variable: sub=$(ls network | cut -d '_' -f 1 | sort | uniq)

  • Make a cohort file for XcpEngine processing: echo "id0,id1,motion,connectivity" > cohort.csv

  • Extract information from the quality control files and network files, and export it to the cohort file: for i in $sub; do for j in $run; do echo "${i},${j},$(find -name "${i}_${j}_relMeanRMS.txt" -exec cat {} +),/data/network/${i}_${j}_power264_network.txt">>cohort.csv; done; done

  • Make qcfc derivatives using the XcpEngine utility qcfc.R: docker run --rm -it --entrypoint /xcpEngine/utils/qcfc.R -v <your target dir>:/data pennbbl/xcpengine:latest -c /data/cohort.csv -o /data/qcfc

  • Copy the atlas file (power264MNI.nii.gz) from Power et al. (2011). It can be found here

  • Final step: make the plot and derivatives using the XcpEngine utility qcfcDistanceDependence: docker run --rm -it -v <your target dir>:/data --entrypoint /xcpEngine/utils/qcfcDistanceDependence pennbbl/xcpengine:latest -a /data/power264MNI.nii.gz -q /data/qcfc.txt -o /data/fcdistance -d /data/distanceMatrix.txt -f /data/figureOutput.png -i /data/temp

How to deface your brain images using PyDeface

  • Make sure your workstation has the dependencies:

    Package     Tested version
    FSL         6.0.2
    Python 3    3.7.3
    NumPy       1.17.1
    NiBabel     2.5.1
    Nipype      1.3.0-rc1
  • Create a tmux session if you are ssh'd into a remote workstation: tmux

  • Navigate to your data dir cd <your data dir> and make a files_list variable using command files_list=$(ls */anat/*.nii.gz)

    • NB you only need to deface the high resolution scans (hence anat in the command above)
  • For each file in your files_list, run the PyDeface function: for file in $files_list; do echo $file; pydeface $file; done

How to clean identifying information from BIDS json files

  • Navigate to your data dir cd <your data dir> and make a files_list variable using command files_list=$(ls */*/*.json)

  • Check your work echo $files_list

  • To delete PerformedProcedureStepStartTime run the following command: for file in $files_list; do echo $file; jq 'del(recurse|.PerformedProcedureStepStartTime?)' $file > $file.edit; done

  • Now, remove all the old json files that contained identifying information by running for file in $files_list; do echo $file; rm $file; done

  • Time to rename all the files so that they are not appended with .edit. Start with: files_list=$(ls */*/*.edit)

  • Then run echo $files_list

  • Then for file in $files_list; do echo $file; mv -- "$file" "${file%.edit}"; done

  • For more details on that last step, see https://www.howtogeek.com/423214/how-to-use-the-rename-command-on-linux/

  • Repeat for other information in the json files (e.g., AcquisitionTime, StudyTime, PerformedProcedureStepStartTime)
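
One way to repeat this for several fields in a single pass (a sketch using the same jq idiom as above; add or remove keys as needed, and note it overwrites the json files in place):

files_list=$(ls */*/*.json)
for key in AcquisitionTime StudyTime PerformedProcedureStepStartTime; do
    for file in $files_list; do
        echo "$file: removing $key"
        jq "del(recurse|.${key}?)" "$file" > "$file.edit" && mv -- "$file.edit" "$file"
    done
done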