Defacing 3D anatomical data - GarzaLab/Documentation GitHub Wiki
Anonymization
When working with human participants, especially in a clinical setting, it is of the utmost importance to ensure data privacy when storing, analyzing, and sharing the data.
Even though the clinical and behavioral data go through a process of pseudonymization, the identity of the participants may still be revealed by reconstructing the skin surface from the 3D anatomical MRI. One way to avoid this is defacing, that is, removing the facial structure from the MR images.
Pydeface
We perform this preprocessing step with pydeface, a tool developed by the Poldrack Lab.
Local computer
Pydeface is a lightweight Python program that can easily be installed and run on a local machine with an FSL installation, which is convenient when processing only a handful of participants.
Installation
To install it and its dependencies, run the following pip command:
pip install pydeface
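Before running anything, it can help to confirm that the required tools are on the PATH. The sketch below is only an illustration: missing_tools is a hypothetical helper name of ours, and treating the flirt command as the sign of a usable FSL installation is an assumption.

```shell
# Print every command from the argument list that is not found on the PATH.
# missing_tools is a hypothetical helper, not part of pydeface or FSL.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example (assumes FSL exposes the flirt command):
# missing_tools pydeface flirt
```

An empty output means every listed tool was found.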
Usage
pydeface path/to/file.nii.gz
The command is easy to script across several subjects. If the data is already organized according to BIDS, a viable script would be:
for file in data/sub-???/ses-*/anat/*.nii.gz; do
    pydeface "$file"
done
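Note that the *.nii.gz pattern also matches any _defaced files left by an earlier run. A minimal sketch of a loop that skips those, and skips inputs that already have a defaced sibling, could look like the following. The helper names needs_defacing and deface_pending are hypothetical.

```shell
# Succeed when FILE is an existing input that has no defaced sibling yet.
# needs_defacing is a hypothetical helper name.
needs_defacing() {
    case "$1" in
        *_defaced.nii.gz) return 1 ;;   # this is already a pydeface output
    esac
    [ -e "$1" ] && [ ! -e "${1%.nii.gz}_defaced.nii.gz" ]
}

# Deface every pending image under ROOT (a BIDS dataset directory).
deface_pending() {
    for file in "$1"/sub-???/ses-*/anat/*.nii.gz; do
        if needs_defacing "$file"; then
            pydeface "$file"
        fi
    done
}

# Example: deface_pending data
```

This makes the loop safe to rerun after an interruption, since finished images are not processed twice.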
BIDS compliance
The output of pydeface is a defaced NIfTI file in the same directory, with the _defaced suffix added to the file name. This does not comply with the BIDS specification and will throw an error when trying to validate the data.
Although there is a way of forcing pydeface to overwrite the original data with the --force and --outfile flags, it is never a good idea to overwrite the original data before reviewing the output files of any procedure.
Our preferred approach is to review the defaced images for errors or loss of important data, and only then replace the original files by renaming the defaced ones:
for file in data/sub-???/ses-*/anat/*defaced.nii.gz; do
    mv --force "$file" "${file/_defaced/}"
done
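Because the rename overwrites the originals, previewing it first can be worthwhile. The sketch below prints the planned renames by default and only moves files when asked to. The function names bids_name and rename_defaced and the "apply" argument are hypothetical choices for this illustration.

```shell
# Map a pydeface output name back to its BIDS name.
bids_name() {
    printf '%s\n' "${1%_defaced.nii.gz}.nii.gz"
}

# Print the planned renames under ROOT; pass "apply" as the second
# argument to actually overwrite the originals.
rename_defaced() {
    root=$1
    mode=${2:-dry-run}
    for file in "$root"/sub-*/ses-*/anat/*_defaced.nii.gz; do
        [ -e "$file" ] || continue      # the pattern matched nothing
        target=$(bids_name "$file")
        if [ "$mode" = apply ]; then
            mv -f "$file" "$target"
        else
            printf '%s -> %s\n' "$file" "$target"
        fi
    done
}

# Example: rename_defaced data          (preview only)
#          rename_defaced data apply    (overwrite the originals)
```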
ADA HPC
When defacing a dataset with a large number of subjects and sessions, running serially on a local computer can take a very long time. The best option, when available, is to use the computing power of ADA and its multiprocessing capabilities.
Installation
Because of the nature of HPCs, we don't have superuser permissions in ADA and thus are unable to install software system-wide. Fortunately, FSL is already installed as a module and pydeface can be installed locally for the user:
module load python37/3.7.6
pip install --user pydeface
Usage
There are two ways to run pydeface, or any program/script, in ADA: by qlogin or qsub.
Qlogin
When there is no need to run several subjects in parallel and/or the user wants to run the program interactively, they can do so with qlogin.
ADA will interactively prompt the user for their password and allocate a session on a compute node.
After loading both FSL and Python with module, the usage is the same as when running on a local machine.
Qsub
Although running interactively does make use of the high computing power of ADA and will reduce the processing time considerably, the subjects are still run serially.
To take advantage of the multiprocessing and run pydeface on several subjects in parallel, the user must follow these steps.
List of subjects
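First, a text file with the paths to all the images to be processed must be created so the different processes can be managed and scheduled. When defacing only a subset of participants, we recommend creating a full list of the subjects and selecting the subset in later steps. The list should hold absolute paths, because the batch job runs from the log directory rather than from the dataset root, and its file name must match the one the batch script reads.
ls "$PWD"/data/sub-???/ses-*/anat/*.nii.gz > pydeface_list.txt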
Script
The processes or tasks are submitted as a batch job with a bash script. Writing such scripts is beyond the scope of this wiki, but here is a working example that loads the modules, parses the list, and runs each task, saving a log in a specified directory.
#! /bin/bash
#$ -S /bin/bash
#$ -N pydeface
#$ -V
#$ -l mem_free=20G
#$ -pe openmp 8
#$ -j y
# This is the path where the logs are saved. It must be an existing directory.
#$ -wd /mnt/MD1200B/egarza/user/logs
# Modules
module load python37/3.7.6
module load fsl/6.0.3
# Subject list
subs=/home/user/pydeface_list.txt
# With awk, the line of the list matching the task's number is extracted
sge_index=$(awk -v idx="$SGE_TASK_ID" 'NR==idx {print $1}' "$subs")
# Staggered sleep so that tasks don't start at _exactly_ the same time
sleep $(( SGE_TASK_ID % 10 ))
# Main
pydeface "$sge_index"
Following BIDS, the script would be saved under data/code/pydeface.SGE. Personally, I use the .SGE extension to distinguish scripts meant for qsub from conventional bash scripts, but the extension itself has no effect.
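The line-selection step of the batch script can be tried locally before submitting anything. The sketch below reuses the same awk call and prints which path each task number would receive. The helper names task_path and preview_tasks are hypothetical, and pydeface_list.txt stands for whatever list file the script reads.

```shell
# Print the first field of line N of LIST, mirroring the awk call
# in the batch script (N plays the role of SGE_TASK_ID).
task_path() {
    awk -v idx="$2" 'NR==idx {print $1}' "$1"
}

# Show which path each task ID from FIRST to LAST would receive.
preview_tasks() {
    id=$2
    while [ "$id" -le "$3" ]; do
        printf 'task %d -> %s\n' "$id" "$(task_path "$1" "$id")"
        id=$((id + 1))
    done
}

# Example: preview_tasks pydeface_list.txt 1 3
```

Since awk prints only the first whitespace-separated field, a path containing spaces would be cut short, so paths in the list should not contain spaces. A task number beyond the end of the list yields an empty path.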
Running
The user can now submit the batch job with the qsub command. The -t flag must be used to specify which lines of the list (subject paths) are run as tasks. This can be either the complete list or a subset of it:
qsub -t 1-100 data/code/pydeface.SGE
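To cover the complete list without counting lines by hand, the range can be derived from the list itself. This is a minimal sketch: task_range is a hypothetical helper, and it assumes one path per line with a trailing newline, as ls produces.

```shell
# Print an SGE task range covering every line of LIST, e.g. "1-100".
# Fails (returns 1) when the list is empty.
task_range() {
    n=$(($(wc -l < "$1")))      # arithmetic expansion strips any padding
    [ "$n" -gt 0 ] || return 1
    printf '1-%d\n' "$n"
}

# Example:
# qsub -t "$(task_range pydeface_list.txt)" data/code/pydeface.SGE
```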
Job supervision
SGE will receive the batch job and assign an ID to it. The tasks will be managed and scheduled according to the availability of resources. Each task listed by the -t flag in the qsub submission corresponds to one line of the list, that is, one participant's image. To see the status of the job and its tasks, the user can use the qstat command.
The logs of specific tasks can be followed in real time using the job ID and the task number:
tail -f logs/pydeface.o23455.1
BIDS compliance
After all the tasks are completed, the user can review the logs and/or outputs for any errors and proceed with renaming the defaced files as previously described for the local workflow.
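One way to spot failed tasks, besides reading the logs, is to check which entries of the list still lack a defaced output. The sketch below is an illustration under the assumption that outputs follow the _defaced naming described earlier. The helper name missing_outputs is hypothetical.

```shell
# For every path in LIST without a _defaced sibling, print its line
# number (which equals its task number) followed by the path.
missing_outputs() {
    n=0
    while IFS= read -r path; do
        n=$((n + 1))
        [ -e "${path%.nii.gz}_defaced.nii.gz" ] || printf '%d %s\n' "$n" "$path"
    done < "$1"
}

# Example: missing_outputs pydeface_list.txt
```

The printed numbers can then be passed to the -t flag of qsub to resubmit only the tasks that did not finish.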