Scientific Computing ‐ Step 3: Docker, Singularity, and BIDS - cogcommscience-lab/lab-docs GitHub Wiki

Why This Matters

Docker is a platform that allows users to create, deploy, and run applications in lightweight, portable containers that package software and its dependencies together. This is especially valuable for scientific computing because it ensures reproducibility, enabling researchers to share and execute code across different environments without worrying about software conflicts or system differences. By streamlining deployment and scalability, Docker also makes it easier to run complex simulations, manage dependencies, and leverage high-performance computing resources efficiently.

Docker’s ability to create self-contained, reproducible environments is particularly valuable in fields like neuroimaging, where complex software dependencies and standardized data formats are essential for reliable analysis. One prominent example is the Brain Imaging Data Structure (BIDS), a community standard for organizing and describing neuroimaging datasets such as fMRI and EEG. Because analysis tools built around BIDS are commonly distributed as Docker containers, researchers can run the same pipeline on different systems, which improves reproducibility and simplifies data sharing in neuroimaging research.

Our lab relies on both Docker and BIDS, so it is essential that you understand both.

Step 1: Docker Training (~4 Hours)

This self-paced lesson will introduce you to the basics of Docker. Please complete this training in the order listed below, remembering that the training is self-paced and has multiple spots where you can pause and return later.

NOTE: You can do this training on your personal computer, but I strongly recommend against this. Docker is already installed and configured on the lab workstations. I would encourage you to complete this training on the lab workstation.

Reproducible Computational Environments Using Containers
- Introduction to Docker Training
- NOTE: If you choose to do this on a lab workstation (you should), start at the verify installation instructions. Alternatively, if you do this on your own computer, start at the summary and setup instructions.
- Why? Our lab uses Docker (and Singularity, which is essentially Docker for HPC) extensively, particularly for fMRI analyses. The field is increasingly moving toward standardized and reproducible pre-processing and analysis pipelines, and those pipelines rely on Docker, so you need to know how Docker works.
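
If you prefer to script the "verify installation" check rather than type it by hand, the sketch below is one way to do it. It assumes the container tool is on your PATH and that it supports a `--version` flag (true for the `docker` and `singularity` command-line clients). The helper name `cli_version` is made up for this example.

```python
import shutil
import subprocess


def cli_version(tool="docker"):
    """Return the text printed by `<tool> --version`, or None if the tool is unusable.

    `cli_version` is a hypothetical helper for this wiki, not part of Docker.
    """
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(tool) is None:
        return None
    try:
        result = subprocess.run(
            [tool, "--version"],
            capture_output=True,
            text=True,
            timeout=30,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    for tool in ("docker", "singularity"):
        version = cli_version(tool)
        print(f"{tool}: {version if version else 'not found'}")
```

A return value of `None` only tells you the command-line client is missing or failed. It does not check whether your account is allowed to run containers, so still follow the training's own verification steps.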

Step 2: Do Some Academic Reading (~5 Hours)

Now it is time to learn about analyses you'll need to do, and why they matter. These analyses rely on standardized workflows and Docker.

The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments
- Gorgolewski et al., 2016, Scientific Data
- Why? If we are going to use standardized workflows and procedures that simplify our data analysis via Docker, we need to organize our data in a standardized way. The BIDS structure does that for us. This article explains what the BIDS structure is, why it matters, and how it works.
BIDS apps: Improving ease of use, accessibility, and reproducibility of neuroimaging data analysis methods
- Gorgolewski et al., 2017, PLOS Computational Biology
- Why? Now that you know what BIDS is, it's time to get a sense of how powerful standardization is. This article gives a nice overview of the BIDS ecosystem of analytical apps. It also sets you up to better understand the next two readings.
MRIQC: Advancing the automatic prediction of image quality in MRI from unseen sites
- Esteban et al., 2017, PLOS ONE
- Why? We need to examine the quality of our fMRI data. There are a lot of ways to examine the data's quality, and they are computationally expensive to implement. Historically, these data quality control routines were difficult to implement, but MRIQC + Docker make them easy!
fMRIPrep: a robust preprocessing pipeline for functional MRI
- Esteban et al., 2019, Nature Methods
- Why? Once the data's quality is examined, we need to pre-process the data. This is a complicated series of steps necessary to clean the data and prepare it for subsequent analysis. This also used to be a massively complex task to implement, and one that each lab implemented differently, which made it hard to compare results from different studies. Thankfully, fMRIPrep helps standardize the process, thereby making research more comparable and reproducible. fMRIPrep also makes what used to be a complicated and difficult task easier thanks to Docker.
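
To make the readings above more concrete, here is a rough sketch of two ideas they describe: the BIDS file-naming pattern, and the BIDS Apps calling convention of `bids_dir output_dir analysis_level`. Both function names are hypothetical. The image tag `nipreps/mriqc:latest`, the host paths, and the container mount points `/data` and `/out` are assumptions chosen for illustration. The code only builds a command list and does not run anything, so check each tool's documentation for the exact options it needs.

```python
from pathlib import PurePosixPath


def bids_bold_path(subject, task, session=None, run=None):
    """Build the relative path of a functional run using BIDS-style naming.

    Hypothetical helper; entities appear in the order sub, ses, task, run.
    """
    folder = PurePosixPath(f"sub-{subject}")
    parts = [f"sub-{subject}"]
    if session is not None:
        folder = folder / f"ses-{session}"
        parts.append(f"ses-{session}")
    parts.append(f"task-{task}")
    if run is not None:
        parts.append(f"run-{run}")
    parts.append("bold")
    return str(folder / "func" / ("_".join(parts) + ".nii.gz"))


def bids_app_command(image, bids_dir, output_dir, level="participant", labels=()):
    """Assemble a `docker run` argument list for a BIDS App (not executed here)."""
    cmd = [
        "docker", "run", "--rm",
        "-v", f"{bids_dir}:/data:ro",   # mount the dataset read-only (assumed mount point)
        "-v", f"{output_dir}:/out",     # mount a writable output folder (assumed mount point)
        image,
        "/data", "/out", level,         # the three positional BIDS App arguments
    ]
    if labels:
        cmd += ["--participant_label", *labels]
    return cmd


if __name__ == "__main__":
    print(bids_bold_path("01", "rest"))
    print(" ".join(bids_app_command(
        "nipreps/mriqc:latest",          # assumed image tag
        "/data/study",                   # hypothetical host paths
        "/data/study/derivatives/mriqc",
        labels=["01"],
    )))
```

Returning a list rather than a single string means it can be passed directly to `subprocess.run` without shell quoting problems once you are ready to run a pipeline for real.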

Step 3: Bookmark These Tabs (~5 Minutes)

You now understand how we combine Docker, standardization via BIDS, and scientific computing. You'll need to start using these tools soon, and to do that you'll need to read their documentation as you work on your analyses. Reading package documentation and using it to inform your work is standard practice in scientific computing, so it is important that you start developing this skill.

HeuDiConv:
- Software that converts raw fMRI data (in the medical imaging DICOM standard) to 4-dimensional (4D) .nii (NIfTI) files, the fMRI data analysis standard. HeuDiConv is a pain to use, but it is also powerful, flexible, and worth learning.
- https://heudiconv.readthedocs.io/en/latest/index.html
MRIQC:
- MRIQC extracts no-reference IQMs (image quality metrics) from structural (T1w and T2w) and functional MRI (magnetic resonance imaging) data.
- https://mriqc.readthedocs.io/en/latest/
fMRIPrep:
- fMRIPrep is a NiPreps (NeuroImaging PREProcessing toolS) application (www.nipreps.org) for the preprocessing of task-based and resting-state functional MRI (fMRI).
- https://fmriprep.org/en/stable/#
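
Much of HeuDiConv's learning curve comes from its "heuristic" file, a small Python module that decides which scanner series become which output files. The sketch below follows the general `create_key` / `infotodict` shape used in HeuDiConv's documentation, but the matching rules are assumptions. The protocol names (`mprage`, `rest`), the `dim4 > 1` check, and the `FakeSeries` stand-in are invented for illustration and will not match our scanner's actual sequence names.

```python
from collections import namedtuple


def create_key(template, outtype=("nii.gz",), annotation_classes=None):
    """Bundle an output filename template with its output type."""
    if not template:
        raise ValueError("Template must be a valid format string")
    return template, outtype, annotation_classes


def infotodict(seqinfo):
    """Map each scanner series to an output template.

    `seqinfo` is a sequence of records describing the DICOM series;
    only protocol_name, series_id, and dim4 are used in this sketch.
    """
    t1w = create_key("sub-{subject}/anat/sub-{subject}_T1w")
    rest = create_key(
        "sub-{subject}/func/sub-{subject}_task-rest_run-{item:02d}_bold"
    )
    info = {t1w: [], rest: []}
    for s in seqinfo:
        name = s.protocol_name.lower()
        if "mprage" in name:                  # assumed anatomical protocol name
            info[t1w].append(s.series_id)
        elif "rest" in name and s.dim4 > 1:   # assumed functional protocol name, 4D only
            info[rest].append(s.series_id)
    return info


# Stand-in record for trying the heuristic without real DICOM data (hypothetical).
FakeSeries = namedtuple("FakeSeries", ["protocol_name", "series_id", "dim4"])

if __name__ == "__main__":
    demo = [
        FakeSeries("localizer", "1-localizer", 1),
        FakeSeries("t1_mprage_sag", "2-t1_mprage_sag", 1),
        FakeSeries("ep2d_bold_rest", "5-ep2d_bold_rest", 200),
    ]
    for key, series in infotodict(demo).items():
        print(key[0], "->", series)
```

Series that match no rule (the localizer here) are left out of the result. Treat this as a starting point and consult the HeuDiConv documentation linked above before writing a heuristic for a real study.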

Helpful Tips:

Take Your Time: The time estimates are just guides—don’t worry if it takes longer. This isn’t about getting it perfect, just building a solid starting foundation.
Feel Stuck? That’s completely normal when learning to program! If you run into issues, message Richard on Slack or by email, ask a lab-mate, or try Googling (it’s a skill you’ll use often). You can also ask ChatGPT to explain how code works. It's really good at that!