Organizing a project folder with BIDS - GlascherLab/LabWiki GitHub Wiki
Your project folder should adhere to the Brain Imaging Data Structure (BIDS) format. BIDS is a standardized way to organize and name files and folders and to store metadata in a machine-readable format (mostly JSON files). You can familiarize yourself with the BIDS format in the official BIDS documentation.
The BIDS folder tree specifies certain folders and their contents. Here is one example for the top-level folders (and the description of their contents) from an fMRI study (matchpennies)
.
├── code project-specific scripts/programs etc.
├── containers docker/singularity container files (e.g. fmriprep)
├── derivatives preprocessed data, modeling results, statistical analysis
├── doc general documentation for the project
├── plots data plots and results figures
├── sourcedata the original datafiles (e.g. DICOM, BDF etc.)
├── sub-01 raw (unprocessed) data files for each subject
├── <sub-02 until sub-43> omitted
└── task all files for running the experimental task
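When setting up a new project, this top-level skeleton can be created with a few lines of Python. This is a minimal sketch, not an existing lab tool: the function name `make_bids_skeleton` is hypothetical, the folder list simply copies the matchpennies example, and subject labels are assumed to be zero-padded to two digits (sub-01, sub-02, ...).

```python
from pathlib import Path

# Top-level folders from the matchpennies example (adjust per project).
TOP_LEVEL = ["code", "containers", "derivatives", "doc", "plots",
             "sourcedata", "task"]

def make_bids_skeleton(bids_root, n_subjects):
    """Create the top-level folders and empty sub-XX folders (hypothetical helper)."""
    root = Path(bids_root)
    for name in TOP_LEVEL:
        (root / name).mkdir(parents=True, exist_ok=True)
    for i in range(1, n_subjects + 1):
        # Zero-padded two-digit labels: sub-01, sub-02, ...
        (root / f"sub-{i:02d}").mkdir(parents=True, exist_ok=True)
    return root
```

For example, `make_bids_skeleton("/path/to/project", 43)` would create the folders above plus sub-01 to sub-43; folders that already exist are left untouched because of `exist_ok=True`.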
The subject-specific folders (e.g. sub-01) have a subfolder for each data modality:
sub-01
├── anat anatomical image (structural T1)
├── beh behavioral data
├── eeg EEG data
├── eye eye-tracking data (non-standard name)
├── loc electrode location coordinates (non-standard name)
├── fmap field map (phase/magn images)
└── func functional MRI data (EPI)
If there are multiple data collection sessions (e.g. on separate days), you can insert a session level (ses-01, ses-02, etc.) between the subject folder and the modality folders.
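To illustrate where the optional session level goes, the helper below builds the path to a modality folder. It is a sketch; `modality_dir` is a hypothetical name, and it only encodes the nesting described above (subject, optional session, modality).

```python
from pathlib import Path

def modality_dir(bids_root, subject, modality, session=None):
    """Return <root>/sub-<subject>[/ses-<session>]/<modality> (hypothetical helper)."""
    path = Path(bids_root) / f"sub-{subject}"
    if session is not None:
        # The session level sits between the subject and the modality folders.
        path = path / f"ses-{session}"
    return path / modality
```

For example, `modality_dir("/path/to/project", "01", "func", session="02")` gives `/path/to/project/sub-01/ses-02/func`, and without `session` it gives `/path/to/project/sub-01/func`.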
The derivatives folder contains all derived data and analyses (e.g. behavioral modeling, image preprocessing, statistical analyses etc.):
derivatives/
├── analysis statistical analyses
│ ├── first_level subject-specific analyses
│ │ ├── sub-01
│ │ │ └── model_onset name of analysis
│ │ ├── sub-02 (omitted)
│ └── second_level group-wise analyses
│ └── model_onset same name as in first_level
├── modeling computational modeling
│ ├── ActiveInference modeling approach 1
│ ├── kToM modeling approach 2
│ └── stan
└── preprocessed preprocessed neuroimaging data (parallel pipelines at this level)
├── sub-01 for each subject (these are the inputs to the statistical analyses)
│ ├── anat preprocessed T1
│ ├── fmap preprocessed fieldmap data (and derived images, e.g. VDM)
│ └── func preprocessed EPI
├── sub-02 (omitted)
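Since the second-level folder reuses the model name of the first-level analysis, one option is to derive both paths from a single model name so they cannot drift apart. A minimal sketch with a hypothetical `analysis_dirs` helper, following the layout of the tree above:

```python
from pathlib import Path

def analysis_dirs(bids_root, model, subjects):
    """Return ({subject: first-level dir}, second-level dir) for one model.

    Layout (from the derivatives tree):
      derivatives/analysis/first_level/sub-XX/<model>
      derivatives/analysis/second_level/<model>
    """
    analysis = Path(bids_root) / "derivatives" / "analysis"
    first = {s: analysis / "first_level" / f"sub-{s}" / model for s in subjects}
    # Same model name at the group level as at the subject level.
    second = analysis / "second_level" / model
    return first, second
```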
The code folder contains general scripts at the top level and subfolders with the scripts for each part of the derivatives:
code general scripts (e.g. in R or Matlab); further subfolders correspond to subfolders in derivatives
├── R e.g. all scripts in R
├── first_level scripts for 1st level analyses (e.g. SPM batches)
├── preproc scripts for preprocessing (different pipelines in different folders)
└── second_level scripts for 2nd level analyses
The doc folder is for all kinds of documentation about the project. Useful subfolders are:
doc
├── grant grant proposal for the project (no read permission for "other", e.g. chmod o-rwx)
├── ethics ethics proposal for the project (no read permission for "other", e.g. chmod o-rwx)
├── papers useful papers
└── summary results summaries and presentations
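The permission change for the grant and ethics folders can be done with `chmod -R o-rwx` in the shell; if you prefer to do it from a setup script, the sketch below is a Python equivalent using the standard `os` and `stat` modules. The function name `revoke_other_access` is hypothetical.

```python
import os
import stat
from pathlib import Path

def revoke_other_access(folder):
    """Clear the read/write/execute bits for "other" on folder and everything
    below it, like `chmod -R o-rwx folder` (hypothetical helper)."""
    folder = Path(folder)
    # Collect all paths first, then change their modes.
    for path in [folder, *folder.rglob("*")]:
        mode = stat.S_IMODE(path.stat().st_mode)
        # Keep the user/group bits, drop the three "other" bits.
        os.chmod(path, mode & ~stat.S_IRWXO)
```

For example, calling it on `doc/grant` and `doc/ethics` below the project root removes all access for "other" while leaving owner and group permissions unchanged.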
For more information, please consult the BIDS documentation.
According to the BIDS specification, each file and its position in the BIDS tree should be identifiable from its filename alone. This seems a bit over the top to me (because the location of a file already documents its provenance), but in some cases I have seen its value, especially when rearranging parts of the BIDS tree. Although it makes scripting file names a bit awkward, please try to adhere to the BIDS file naming convention.
The general template for a file name is:
sub-XXX_task-XXX_run-XXX_<name>-<value>_<modality>.<extension> (e.g. sub-01_task-sft_run-01_eeg.xdf)
The <name>-<value> components are separated from each other with underscores (_). A filename starts with the subject (e.g. sub-01), should contain the task (e.g. task-sft) and the run number (e.g. run-01), and ends with the modality, again separated by an underscore (e.g. _eeg), followed by the file extension.
More information can be found in the BIDS specification.
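Building file names from their name-value pairs avoids typos when scripting. The sketch below assembles a name following the template above and splits it back apart; `bids_filename` and `parse_bids_filename` are hypothetical helpers, and they do not check the order or validity of the keys.

```python
def bids_filename(entities, modality, extension):
    """Join name-value pairs with '_' and append the modality and extension.

    entities: dict in the desired order, e.g. {"sub": "01", "task": "sft"}.
    """
    parts = [f"{key}-{value}" for key, value in entities.items()]
    return "_".join(parts + [modality]) + extension

def parse_bids_filename(name):
    """Split a file name back into (entities dict, modality, extension)."""
    # Split at the first dot so that double extensions like .nii.gz stay whole.
    stem, dot, ext = name.partition(".")
    *pairs, modality = stem.split("_")
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, modality, dot + ext
```

For example, `bids_filename({"sub": "01", "task": "sft", "run": "01"}, "eeg", ".xdf")` returns `"sub-01_task-sft_run-01_eeg.xdf"`.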
There are two types of metadata files: JSON files (.json), which store name-value pairs in a structured format, and tab-separated value files (.tsv), which are tables with subjects/trials/events as rows and variables as columns, separated by tabs for better readability by humans. In addition, there is an unstructured README.md file (with Markdown formatting) at the top level of the project folder that explains the project in general terms. This is the first point of entry into the dataset for a researcher unfamiliar with the project (e.g. when the data are published in a public repository).
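Both kinds of metadata files can be written with Python's standard library. The sketch below writes a table as a .tsv file plus a JSON side-car with the same base name that describes each column. The function name is hypothetical; the `{"column": {"Description": ...}}` layout follows the style of BIDS column descriptions, and empty cells are filled with `n/a`, the BIDS convention for missing values.

```python
import csv
import json
from pathlib import Path

def write_tsv_with_sidecar(tsv_path, rows, descriptions):
    """Write rows (list of dicts) to a .tsv file and a same-named .json side-car.

    descriptions maps each column name (in output order) to a short text.
    """
    tsv_path = Path(tsv_path)
    columns = list(descriptions)
    with tsv_path.open("w", newline="") as f:
        # restval fills missing cells with "n/a"; extra keys raise ValueError.
        writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t",
                                lineterminator="\n", restval="n/a")
        writer.writeheader()
        writer.writerows(rows)
    sidecar = {col: {"Description": text} for col, text in descriptions.items()}
    tsv_path.with_suffix(".json").write_text(json.dumps(sidecar, indent=2))
```

Because `csv.DictWriter` raises an error for keys that are not listed in `descriptions`, a column cannot end up in the table without a description in the side-car.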
These metadata contain important information that is also useful when writing up the paper (e.g. imaging parameters). So even though collecting this information and creating the corresponding JSON files may seem like a waste of time, it will become very useful later on. Please take care to enter the information in the metadata files early and as you go through your project. That way the time investment stays limited, and you (and possibly others) will benefit from this effort later on.
Here is a list of some of the required and recommended BIDS metadata files. Most of these are located in the top-level project folder (aka $BIDSROOT):
- $BIDSROOT/dataset_description.json: a specific JSON file with general information about the project (e.g. title, authors, acknowledgments, grant number etc.)
- $BIDSROOT/participants.tsv: a table with subject-specific information (e.g. demographic information, experimental condition, subject ID of the partner in social interaction experiments)
- $BIDSROOT/participants.json: an accompanying side-car with a longer description of the variables (columns) in participants.tsv
- _events.tsv: a table with experimental events (e.g. trials) in rows and variables (e.g. CUE, CHOICE, OUTCOME) in columns. The table contains the event onsets for fMRI/EEG and accompanies every data file in the func/eeg subfolder of the subject folder (e.g. sub-01/func/sub-01_task-sft_run-01_events.tsv)
- $BIDSROOT/events.json: a JSON side-car file with a longer description of the events in a sub-XX..._events.tsv file.
- _beh.tsv: a table with behavioral data for each run of the experiment, with trials as rows and variables as columns (e.g. stimulus configuration, choices, outcomes, RTs etc.). These files reside in the subject-specific folders sub-XX/beh. For convenience, I usually also keep a binary .mat file (or the native format of some other analysis software, e.g. .Rdata) in the same folder. But the .tsv file should be the master reference file, created from the original logfiles of the presentation software (e.g. PTB).
- $BIDSROOT/beh.json: a side-car JSON file with a long description (and the possible values) of the variables in the _beh.tsv files.
- $BIDSROOT/{anat,epi,fmap,eeg}.json etc.: JSON files with the parameters for each imaging modality, very useful for the Methods section of the paper.
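As an example for the first file in this list, the sketch below writes a minimal dataset_description.json. In the BIDS specification, `Name` and `BIDSVersion` are required; `Authors`, `Funding` and `Acknowledgements` are optional fields that hold the information mentioned above. The function name, the default version string and all values are placeholders.

```python
import json
from pathlib import Path

def write_dataset_description(bids_root, name, authors, funding=(),
                              acknowledgements="", bids_version="1.8.0"):
    """Write $BIDSROOT/dataset_description.json (hypothetical helper).

    bids_version is a placeholder: use the BIDS version your dataset follows.
    """
    description = {
        "Name": name,                  # required
        "BIDSVersion": bids_version,   # required
        "Authors": list(authors),
        "Funding": list(funding),      # e.g. grant numbers
        "Acknowledgements": acknowledgements,
    }
    path = Path(bids_root) / "dataset_description.json"
    path.write_text(json.dumps(description, indent=2) + "\n")
    return path
```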
NOTE: events.json and beh.json are really important for understanding the (behavioral) data in the experiment. Please make sure that the descriptions in these JSON files are accurate and informative!
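One way to act on this note is to check automatically that every column of an events or behavioral table is described in its JSON side-car. The sketch below uses only the standard library; `undescribed_columns` is a hypothetical helper. It also reports whether the `onset` and `duration` columns, which BIDS requires in every _events.tsv file, are present (that part does not apply to _beh.tsv files).

```python
import csv
import json
from pathlib import Path

REQUIRED_EVENT_COLUMNS = ("onset", "duration")

def undescribed_columns(tsv_path, json_path):
    """Return (missing_required, undescribed) for a TSV table and its side-car.

    missing_required: required events columns absent from the table.
    undescribed: table columns without an entry in the JSON side-car
    (onset and duration are exempt, since BIDS itself defines them).
    """
    with Path(tsv_path).open(newline="") as f:
        header = next(csv.reader(f, delimiter="\t"))
    sidecar = json.loads(Path(json_path).read_text())
    missing_required = [c for c in REQUIRED_EVENT_COLUMNS if c not in header]
    undescribed = [c for c in header
                   if c not in sidecar and c not in REQUIRED_EVENT_COLUMNS]
    return missing_required, undescribed
```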
More information and template JSON files can be found in the BIDS documentation.
For examples of these files, please consult the matchpennies project (fMRI) and the tiger project (EEG hyperscanning). Both are on dendrite in /projects/crunchie/glaescher. The matchpennies BIDS project is almost complete (including the README.md file); the tiger project is still in progress (especially the metadata files).