Defacing 3D anatomical data - GarzaLab/Documentation GitHub Wiki

Anonymization

When working with human participants, especially in a clinical setting, it is of utmost importance to ensure data privacy when storing, analyzing, and sharing the data.

Even though the clinical and behavioral data go through a process of pseudonymization, the identity of the participants may still be revealed by reconstructing the skin surface from the 3D anatomical MRI. One way to avoid this is defacing, that is, removing the facial structure from the MR images.

Pydeface

We perform this pre-processing step with pydeface, a tool developed by the Poldrack Lab.

Local computer

Pydeface is a lightweight Python program that can easily be installed and run on a local machine with an FSL installation, which is convenient when processing only a handful of participants.

Installation

To install pydeface and its dependencies, run pip with the following command:

pip install pydeface

Usage

pydeface path/to/file.nii.gz

The command is easy to script so that it runs on several subjects. If the data are already organized according to BIDS, a viable script would be:

for file in data/sub-???/ses-*/anat/*.nii.gz; do
    # Quote the path so that spaces or special characters do not split it
    pydeface "$file"
done
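Re-running the loop above would also pick up images that pydeface has already written, because the outputs sit in the same anat directory and match the same glob. The following is a minimal sketch of a loop that skips them, assuming pydeface's default output naming (the input name with `_defaced` appended). The names `defaced_name`, `deface_all`, and `DEFACE_CMD` are hypothetical helpers introduced for illustration, not part of pydeface.

```shell
# Sketch: deface every anatomical image under a BIDS root, skipping the
# outputs of earlier runs. Assumes the default naming <name>_defaced.nii.gz.

# Print the path pydeface would write for a given input image.
defaced_name() {
    printf '%s_defaced.nii.gz\n' "${1%.nii.gz}"
}

# Deface all images under a BIDS root (first argument, default: data).
# DEFACE_CMD is a hypothetical override, handy for a dry run with "echo".
deface_all() (
    root=${1:-data}
    for file in "$root"/sub-???/ses-*/anat/*.nii.gz; do
        [ -e "$file" ] || continue                          # glob matched nothing
        case $file in *_defaced.nii.gz) continue ;; esac    # already an output
        [ -e "$(defaced_name "$file")" ] && continue        # already processed
        ${DEFACE_CMD:-pydeface} "$file"
    done
)
```

Calling `DEFACE_CMD=echo deface_all data` only prints the images that would be processed; `deface_all data` runs pydeface on them.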

BIDS compliance

pydeface writes the defaced NIfTI file to the same directory as the input, with the _defaced suffix appended to the file name. This name does not comply with the BIDS specification and causes an error when the dataset is validated.

Although pydeface can be forced to overwrite the original data with the --force and --outfile flags, it is never a good idea to overwrite original data before reviewing the output of a procedure.

Our preferred approach is to review the defaced images for errors or loss of important data first, and only then replace the original files with the defaced ones by renaming them:

for file in data/sub-???/ses-*/anat/*defaced.nii.gz; do
    # Drop the _defaced suffix, overwriting the original image
    mv --force "$file" "${file/_defaced/}"
done
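The rename above replaces each original unconditionally. A slightly more defensive variant is sketched below: it only replaces an original when the defaced file is non-empty, and it can print the planned renames first. The function name `promote_defaced` and the `DRY_RUN` variable are hypothetical and introduced only for this example.

```shell
# Sketch: replace each original image with its reviewed, defaced version.
# Set DRY_RUN=1 to print the planned renames without touching any file.
promote_defaced() (
    root=${1:-data}
    for file in "$root"/sub-???/ses-*/anat/*_defaced.nii.gz; do
        [ -s "$file" ] || continue      # skip an unmatched glob and empty files
        target=${file%_defaced.nii.gz}.nii.gz
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s -> %s\n' "$file" "$target"
        else
            mv -f -- "$file" "$target"
        fi
    done
)
```

A cautious sequence would be `DRY_RUN=1 promote_defaced data` to inspect the list, followed by `promote_defaced data` once it looks right.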

ADA HPC

When defacing a dataset with a large number of subjects and sessions, running the images serially on a local computer can take a very long time. The best option, when available, is to make use of the computing power of ADA and its multiprocessing capabilities.

Installation

Because of the nature of HPCs, we don't have superuser permissions in ADA and thus are unable to install software. Fortunately, FSL is already installed as a module and pydeface can be installed locally for the user:

module load python37/3.7.6
pip install --user pydeface
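With `pip install --user`, the pydeface executable goes into the user's script directory, which on Linux is `~/.local/bin` by default and is not always on `PATH`. If the shell cannot find the pydeface command after installation, something along these lines may help. The helper `prepend_path` is hypothetical, and the directory is an assumption that should be checked on ADA.

```shell
# Sketch: make user-installed scripts visible to the shell.
# Assumes pip's default user script directory on Linux, ~/.local/bin.

# Prepend a directory to PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

prepend_path "$HOME/.local/bin"

# Report where pydeface was found, or warn if it is still missing.
command -v pydeface || echo "pydeface not found on PATH" >&2
```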

Usage

There are two ways to run pydeface, or any other program or script, on ADA: interactively with qlogin or as a batch job with qsub.

Qlogin

When there is no need to run several subjects in parallel and/or the user wants to run the program interactively, they can do so by qlogin.

ADA will prompt the user for their password and allocate a session on a compute node. After loading both FSL and Python with module load, the usage is the same as when running on a local machine.

Qsub

Although running interactively does make use of the high computing power of ADA and will reduce the processing time considerably, the subjects are still run serially.

To take advantage of multiprocessing and run pydeface on several subjects in parallel, the user must follow these steps.

List of subjects

First, a text file with the paths to all the images to be processed must be created so that the different processes can be managed and scheduled. To deface only a subset of participants, we recommend creating a full list of the subjects and selecting the subset in later steps.

ls data/sub-???/ses-*/anat/*.nii.gz > pydeface_list.txt
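Because the batch script in the next step sets its working directory to the log directory with `-wd`, relative paths in the list would be resolved from there rather than from the project folder. One way around this is to store absolute paths and leave out images that were already defaced. The sketch below assumes the data live in a BIDS folder named `data`; `make_deface_list` is a hypothetical helper.

```shell
# Sketch: write one absolute image path per line, skipping defaced outputs.
make_deface_list() (
    root=$(cd "${1:-data}" && pwd) || exit 1    # absolute path of the BIDS root
    for file in "$root"/sub-???/ses-*/anat/*.nii.gz; do
        [ -e "$file" ] || continue                          # glob matched nothing
        case $file in *_defaced.nii.gz) continue ;; esac    # skip earlier outputs
        printf '%s\n' "$file"
    done
)
```

For example, `make_deface_list data > pydeface_list.txt` produces a list under the file name that the batch script reads.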

Script

The processes or tasks are submitted as a batch job with a bash script. Writing these scripts is beyond the scope of this wiki, but here is an example of a working batch job script that loads the modules, reads the list, and runs each task, saving a log in a specified directory.

#! /bin/bash
#$ -S /bin/bash
#$ -N pydeface
#$ -V
#$ -l mem_free=20G
#$ -pe openmp 8
#$ -j y
# This is the path where the logs are saved. It must be an existing directory.
#$ -wd /mnt/MD1200B/egarza/user/logs

# Modules
module load python37/3.7.6
module load fsl/6.0.3

# Subject list
subs=/home/user/pydeface_list.txt
# With awk, the line of the list matching the task's number is extracted
sge_index=$(awk -v idx="$SGE_TASK_ID" 'NR==idx {print $1}' "$subs")

# Staggered sleep (0-9 s, from the task number) so that tasks don't start at _exactly_ the same time
sleep $(( $SGE_TASK_ID % 10 ))

# Main
pydeface "$sge_index"

Following BIDS, the script would be saved under data/code/pydeface.SGE. Personally, I use the .SGE extension to distinguish scripts meant to be run by qsub from conventional bash scripts, but the extension has no effect on how the script runs.
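If the range given to -t extends past the end of the list, the awk lookup in the script returns an empty string and pydeface is called without an input. A small guard could catch this before the main call. The sketch below uses a hypothetical helper, `nth_path`, and prints the whole line rather than only its first field, so paths containing spaces survive.

```shell
# Sketch: fetch the path for one array task and fail loudly if it is missing.

# Print line number $2 of file $1; fail if that line is absent or empty.
nth_path() (
    path=$(awk -v idx="$2" 'NR == idx { print; exit }' "$1")
    [ -n "$path" ] || exit 1
    printf '%s\n' "$path"
)

# Inside the batch script this might read (subs as defined in the script):
#   input=$(nth_path "$subs" "$SGE_TASK_ID") || {
#       echo "no list entry for task $SGE_TASK_ID" >&2
#       exit 1
#   }
#   pydeface "$input"
```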

Running

The user can now submit the batch job with the qsub command. The -t flag must be used to specify which lines of the list (image paths) are run as tasks. This can be either the complete list or a subset of it:

qsub -t 1-100 data/code/pydeface.SGE
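To run the complete list without counting its lines by hand, the upper bound of the range can be taken from the list itself. The following is a sketch with a hypothetical helper, `task_range`; the list path is the one used in the batch script and may differ on your account.

```shell
# Sketch: derive the -t range from the length of the subject list.
task_range() {
    # wc -l counts newline-terminated lines; tr strips the padding that
    # some implementations add around the number.
    printf '1-%s\n' "$(wc -l < "$1" | tr -d ' ')"
}

# Example submission (commented out so the snippet has no side effects):
#   qsub -t "$(task_range /home/user/pydeface_list.txt)" data/code/pydeface.SGE
```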

Job supervision

SGE will receive the batch job and assign an ID to it. The tasks will be managed and scheduled according to the availability of resources. Each task listed by the -t flag in the qsub submission corresponds to one line of the list, that is, one image to deface. To see the status of the job and its tasks, the user can use the qstat command.

The log of a specific task can be followed in real time using the job ID and the task number:

tail -f logs/pydeface.o23455.1
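With many tasks, reading each log by hand quickly becomes impractical. The sketch below lists the logs of one job that contain a Python traceback or the word "error". The search pattern is an assumption about what a failed run prints, not documented pydeface output, and `failed_logs` is a hypothetical helper. It relies on the log naming shown above (job name, `.o`, job ID, task number).

```shell
# Sketch: list the task logs of one job that look like failures.
# $1 = log directory, $2 = job ID reported by qsub or qstat.
failed_logs() {
    # -l prints only file names, -i ignores case, -E enables alternation.
    grep -l -i -E 'traceback|error' "$1"/pydeface.o"$2".* 2>/dev/null
}
```

For the job in the example above, this would be called as `failed_logs logs 23455`.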

BIDS compliance

After all the jobs are completed, the user can review the logs and/or outputs for any errors and proceed with the overwriting of the files as previously mentioned.
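A complementary check is to confirm that every image in the list has a defaced counterpart before anything is renamed. The following sketch assumes pydeface's default `_defaced` output naming; `missing_outputs` is a hypothetical helper.

```shell
# Sketch: print every listed input whose defaced output is missing or empty.
# $1 = list file with one image path per line.
missing_outputs() {
    while IFS= read -r input; do
        [ -n "$input" ] || continue                     # ignore blank lines
        [ -s "${input%.nii.gz}_defaced.nii.gz" ] || printf '%s\n' "$input"
    done < "$1"
}
```

An empty result from `missing_outputs /home/user/pydeface_list.txt` suggests that every task wrote its output; any path it prints can be resubmitted with the matching task number.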