Deformation Based Morphometry: MBM.py with atlas

By Gabriel A. Devenyi (CobraLab)

Are you working with rats and mice? Keep reading.

MICe-build-model is a toolkit that allows you to perform image registration on mouse/rat brains. It works by fitting two or more images into an average. Differences between the average image and the individual images allow us to calculate differences between 2 or more groups of mice/rats. For detailed information on how the pipeline works at every stage, please consult: https://wiki.mouseimaging.ca/display/MICePub/MiceBuildModelAlgorithm.

This is run only on the ADA-LAVIS cluster.

In order to run MICe-build-model, first create a run directory, e.g. mbm_run_date.

First, you must create a .csv file with a list of absolute paths to all of your input files. The first row of your .csv file should be "file", followed by the list of scan paths.
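For example, a minimal input csv (the paths here are hypothetical placeholders) would look like this:

file
/mnt/MD1200B/mygroup/myproject/scans/subject01.mnc
/mnt/MD1200B/mygroup/myproject/scans/subject02.mnc
/mnt/MD1200B/mygroup/myproject/scans/subject03.mnc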

Create a config file (e.g. ADA.cfg) to override the defaults, as follows:

[ADA]
queue-type=sge
queue-name=all.q
time-to-accept-jobs=180
mem-request-attribute=mem_free
mem=46
num-executors=240
#resolution=0.1
init-model=/mnt/MD1200A/QUARANTINE/resources/Fischer344/formbm/Fischer344.mnc
lsq6-simple=True
lsq12-protocol=/mnt/MD1200A/QUARANTINE/resources/protocols/linear/Pydpiper_default_lsq12_protocol.csv
no-nuc=True
maget-atlas-library=/mnt/MD1200A/QUARANTINE/resources/Fischer344/Fischer344_minc
maget-masking-method=ANTS
maget-registration-method=ANTS
maget-masking-nlin-protocol=/mnt/MD1200A/QUARANTINE/resources/protocols/nonlinear/Pydpiper_mincANTS_SyN_0.1_Gauss_2_1_40_micron_MAGeT_one_level_MASKING.pl
maget-nlin-protocol=/mnt/MD1200A/QUARANTINE/resources/protocols/nonlinear/Pydpiper_mincANTS_SyN_0.1_Gauss_2_1_40_micron_MAGeT_one_level.pl

You can see this configuration runs MAGeT brain. Load the modules (listed in the run script below) and check MBM.py --help for the options to disable it if you don't need it.

Next, you'll create your run script, which will look something like this:

module load imagemagick/7.0.5.4 minc-toolkit-v2/1.9.17 torch7/29jul2019 minc2-simple/2.2.30 minc-toolkit-extras/24abr2019 minc-stuffs/0.1.24 python36/3.6.8 pydpiper/2.0.13 parallel/20190522
module load qbatch/2.1.3 ants/9may2019

export PYDPIPER_CONFIG_FILE=/mnt/MD1200B/egarza/egarza/fatrat/mbm/ADA.cfg

MBM.py \
--verbose \
--pipeline-name=fatrat_04JUL2019 \
--num-executors 48 \
--csv-file /mnt/MD1200B/egarza/egarza/fatrat/mbm/input.csv \
--no-nuc \
--lsq6-simple \
--run-maget \
--output-dir /mnt/MD1200B/egarza/egarza/fatrat/mbm/output

MBM.py: the pipeline that will run the registrations. Running MBM.py --help in your terminal will give you a list of all the pipeline options.

--verbose: will print jobs to screen

--pipeline-name: whatever name makes sense to you

--num-executors: number of processing units you will use

--csv-file: your input csv-file. You can also just use --files and point to a folder with your input files.

--bootstrap: uses the first scan as the target scan (instead of the atlas) for the registrations

--no-nuc: turns off non-uniformity correction

--lsq6-simple: this is useful if your scans are already well aligned (it will cut down on computation time). The default is --lsq6-large-rotations, which you may want to use if your scans are not aligned (more than 15 degrees off-center)

--no-run-maget: turns off the MAGeT run.

--no-common-space-registration: skips registering the final average to a common-space model.
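Once the config, input csv, and run script are ready, launch the pipeline from your run directory. A minimal sketch, assuming you saved the module loads and the MBM.py call above as run-script.sh (the pipeline submits its own executor jobs to the queue, so it helps to keep the main process alive in a screen or tmux session):

screen -S mbm          # optional: keeps the pipeline server alive if you disconnect
bash run-script.sh     # the script holding the module loads and MBM.py call shown above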

Quality Control

Did it do a good job? QC'ing is a must before moving on to look at your results. Your results are meaningless if the process failed at some point.

Inside the folder where you conducted your 1 or 2-level run, there should now be a few more folders. You'll notice a first level folder (where each subject's different timepoints were registered together) and a second level folder (where each subject's average was then itself registered to other subjects in one final big average). There are also a ton of logs, notably a pipeline.log which you can open in a text editor if you'd like.

Within the first level folder, you will find a "processed" directory for each subject. Each of these contains various directories, two of which are the subject's lsq6 and lsq12 directories. In each of these there should be lsq[6 12]_montage.jpg files. Use display (lowercase, from ImageMagick) to open the montage. Three slices at different orientations appear for each scan. So if subject 1 has 3 timepoints, three scans (with 3 orientations each) show up; for the lsq12 montage you'll also see the final average (also 3 orientations), which is the one on the bottom right. Check to make sure the orientations and sizes are similar and in the same place. If all scans look aligned in the same way you're good, but if one or more scans look to be oriented differently, registration failed.

Note the distinction: lowercase display (ImageMagick) opens the .jpg montages, while capital-D Display (part of the minc-toolkit library) opens the underlying .mnc volumes.
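For example (filenames are hypothetical; substitute your own subject and pipeline names):

display lsq6_montage.jpg                 # ImageMagick viewer, for the jpg montage
Display subject01_lsq6_resampled.mnc     # minc-toolkit viewer, for a hypothetical .mnc volume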

Now it's time to QC the second level registrations. In the second level _lsq12 folder, there will be an "LSQ12_montage.jpg" that you can use to check whether the LSQ12 stage worked for your second level.

Mask for nlin-3

One more thing for you to check: how good is the final mask that your one- or two-level run produced? This will be important when you start analyses.

In your run folder, there should be a directory called "[pipeline-name]_second_level-nlin". Go there. Inside you should find multiple mncs that are the averages of the images ([pipeline-name]-nlin-[1 2 3].mnc) and the masks ([pipeline-name]-nlin-[1 2 3]_mask.mnc), differentiated by numbers. We will focus on the last one (with the highest number). To check how good your mask is, type this in your terminal (remember to module load minc-toolkit first):

Display [pipeline-name]-nlin-3.mnc -label [pipeline-name]-nlin-3_mask.mnc

In the Display menu, press "d" on your keyboard followed by another "d". This will grayscale your background average.

You should see a semi-transparent red mask displayed on a gray background average. Here is mine:

This mask is over-segmented, meaning it includes a lot of non-brain voxels as brain. This is especially apparent in the ventral regions; check out those olfactory bulbs! If your mask lines up well, you're good to move on to the next section, but if it looks more like mine, don't worry! There's a fix!

The problem is that the mask is built with a very generous rule for deciding which voxels belong to the brain: max (voxel-wise maximum). Any voxel labelled as brain in any individual scan's mask is also considered brain in the final average mask. A more conservative approach is voxel voting, where a voxel must be part of a majority of the individual masks for it to end up in the final mask.

So, to fix your over-segmented mask, follow these steps:

Load the same modules you used to run pydpiper (check run-script.sh).

If you run minchistory on your final mask:

minchistory [pipeline-name]-nlin-3_mask.mnc

you will see the command used to generate the original nlin-3 mask. It should look something like:

mincmath -clobber -2 -max <a long list of files> nlin-3_mask.mnc

To generate a better mask, we're going to use the voxel_vote command instead of mincmath, applied to the same files listed by the minchistory command. The command works as follows:

voxel_vote <list of input masks.mnc> output_mask.mnc

To do so, move up one directory to your derivatives directory, and run:

voxel_vote <paste long list of filenames> statsmask.mnc
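For instance, with hypothetical mask filenames pasted in from the minchistory output, the call would look like:

voxel_vote subject01_mask.mnc subject02_mask.mnc subject03_mask.mnc statsmask.mnc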

Now you can display your new mask over your old average. Looks much better, eh? See an example below.
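For example, reusing the Display call from before but pointing -label at the new mask (adjust the paths if the average and the new mask live in different directories):

Display [pipeline-name]-nlin-3.mnc -label statsmask.mnc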

Go ahead and use your improved mask in your analysis!

Looking at Results

Assuming your run has passed QC, the next step should be to look at your actual results and make fancy brain maps to impress the boss.

As always, make a folder to run your statistical analysis in.

Make a csv with your subject numbers, whatever information you might want to include in your model (timepoint will be present in a 2-level model since it is longitudinal; then sex, genotype, etc.), and whatever Jacobians you're interested in.

For R to do statistics on your csv, it's easiest to put all these categories into their own labelled columns. Try to avoid special characters in your column headers. Underscores are fine however. Have subject number in one column, sex in another, etc.

In 2-level data, however, you'll have multiple rows per subject when it comes to scan timepoint (because the data are longitudinal). R likes to have one row per scan, so the csv simply runs through all the subjects for timepoint 1, then repeats all subjects (and all their other info) for timepoint 2, and then again for timepoint 3. R doesn't mind redundancy and doesn't care whether subject 1a is listed as female 3 separate times or just once. Things are organized this way so that every scan can have an associated Jacobian (deformation value) file listed with it.
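Here is a sketch of what such a long-format csv could look like; the column names match the R example further down, but the values and paths are hypothetical placeholders:

subject,timepoint,sex,treat_group,mouse_batch,reljacob02
1a,1,f,sham,batch1,/abs/path/to/1a_tp1_relative_fwhm0.2.mnc
1a,2,f,sham,batch1,/abs/path/to/1a_tp2_relative_fwhm0.2.mnc
2b,1,m,treated,batch2,/abs/path/to/2b_tp1_relative_fwhm0.2.mnc
2b,2,m,treated,batch2,/abs/path/to/2b_tp2_relative_fwhm0.2.mnc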

The csv should contain columns with the Jacobians you're interested in, listed as absolute paths. So where do you find them? The twolevel_model_building.py run will output two .csv files: resampled_determinants.csv and overall_determinants.csv, which contain absolute paths to the Jacobian determinants created from your analysis, as well as the corresponding group ID (the same one you specified in your input csv). For your statistics, you'll want to use the Jacobians from resampled_determinants.csv, as these are resampled into the study average space.

In both of these .csv files, you'll have columns with paths to both the absolute Jacobians (log_full_det) and the relative Jacobians (log_nlin_det).

Relative Jacobian determinants explicitly model only the non-linear part of the deformations and remove residual global linear transformations (attributable to differences in total brain size). Absolute Jacobians include the overall linear transformations in addition to the nonlinear ones, so they give you information on overall brain-size differences. Choosing which ones to look at depends on your population of interest. For a more developmental cohort, you may choose the absolute Jacobians; for a study in which you don't expect large differences in overall brain volume, relative Jacobians may be more appropriate. As a rule of thumb, it's always good to look at both.

The next thing to look at is the blurring factor (0 or 0.2), which shows up near the end of the Jacobian file names. Blurring allows some amount of deformation "bleeding" from one voxel to the nearby voxels. Some influence on neighbouring voxels is necessary: it's unlikely that a single voxel would grow or shrink in isolation; it must "tug" on surrounding tissue. Blurring smooths out the deformation field, which you can imagine as a 3D grid embedded in your scans. Generally, 0.2 is what you want to use.

In the terminal, navigate to the folder you made for the statistical analysis. Load up the following modules in this order, then start rstudio:

module load R/3.4.0
module load rstudio
module load R-extras/3.4.0
module load minc-toolkit/1.9.16
module load RMINC/1.5.1.0^minc-toolkit-1.9.16^R-3.4.0
rstudio

Set the working directory for rstudio to the folder that you made for this analysis.
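You can do this from the R console; the path below is a hypothetical placeholder:

> setwd("/mnt/MD1200B/mygroup/myproject/stats")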

Write and run a script to model your Jacobians.

For 1-level data (group): mincLm is the place to start. For 2-level data (group x timepoint): the linear mixed-effects model (lmer) formula will be different depending on how you want to build your model and what you want to test. "squeak" is the data frame the model pulls its data from. Note that you can pass in a mask, which greatly speeds things up because it tells R where NOT to compute the model. Parallel processing can also speed things up; you can use snowfall (or the sge batch configuration shown below) to dedicate more cores.
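For the 1-level case, a minimal sketch of what the call could look like (hypothetical column names; it reuses the squeak data frame and mask object defined in the 2-level example below):

> vs <- mincLm(reljacob02 ~ treat_group + sex, squeak, mask = mask)
#voxel-wise linear model; no random effects, so no lmer machinery needed
> mincFDR(vs, mask = mask)
#FDR-corrected thresholds for each term, analogous to the lmer output further down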

Degrees of freedom are so messy for lmer models that a separate function (mincLmerEstimateDF, used below) is called to estimate how many there are for the purposes of statistical testing.

FDR or "false discovery rate" corrects for multiple comparisons and the chance of finding significant results when doing the same test over and over and over again. You expect some things to pass by chance. You can see if any voxels pass significance after correction (have different slopes, aka getting bigger or smaller over timepoints on average by group), and at what t-values are necessary to pass at what thresholds you set.

Example below of 2-level model with commentary:

> library(ggplot2)
> library(RMINC)
> library(lme4)
#load necessary libraries

#in a separate R file in the folder where you run your analyses, called "sge_resources.R", put the following content:
#cluster.functions <- makeClusterFunctionsSGE(system.file("parallel/sge_script.tmpl", package = "RMINC"))
#default.resources <-
# list(nodes = 1,
#       cores = 2, ##if the model run fails due to "error in qMincRegistry", then increase the number of cores, save "sge_resources.R", then close and rerun your analysis R file, until it works. 
#       walltime = "01:00:00")

> options(RMINC_BATCH_CONF = "sge_resources.R")  

> squeak <- read.csv(file="jacobian.csv", header=TRUE, sep=",")
#read your csv with the jacobian paths into a data frame called squeak

> mask = "/path/tomask.mnc"
# create an object that is the absolute path of the mask you created running your two-level
# the mask can be found in the nlin directory, which includes nlin3 average and a corresponding mask  
    
> squeak$sex = relevel(squeak$sex, ref = "f")
#set female as default for comparison
    
> squeak$treat_group = relevel(squeak$treat_group, ref = "sham")
#set sham as default for comparisons
    
> filtered.squeak = subset(squeak, ! squeak$treat_group == "none")
#subset the data to exclude the skullhole controls; "!" means do not include
    
> cheese<-mincLmer(reljacob02 ~ treat_group + timepoint + (1|mouse_batch), squeak, mask = mask, parallel = c("sge", 200))
#the actual model. Terms after ~ are fixed effects (treat_group, timepoint), (1|...) terms are random effects to be taken out, and reljacob02 is the column header that contains the absolute paths to the jacobian files. Note this call uses squeak; substitute filtered.squeak if you want the subset from above applied.
    
> cheese = mincLmerEstimateDF(cheese)
#degrees of freedom estimator
    
> FDR = mincFDR(cheese, mask= mask)
#do FDR correction on cheese
    
> FDR
#displays FDR corrected stats for your run

This will generate output that looks like this:

Degrees of Freedom: 188.002164322792 188.002164322789 188.002164322792 188.002164322795 188.002164322789 
FDR Thresholds:
     tvalue-(Intercept) tvalue-genotype2 tvalue-timepoint2 tvalue-timepoint3
0.01           2.801642               NA          2.929002          2.807175
0.05           2.148982               NA          2.259712          2.157496
0.1            1.816631               NA          1.917197          1.825371
0.15           1.599764               NA          1.693466          1.608211
0.2            1.432213               NA          1.520069          1.440741
     tvalue-timepoint4
0.01          2.789115
0.05          2.141490
0.1           1.810780
0.15          1.594782
0.2           1.428045

Here are displayed the various components of the formula you constructed with lmer, and the t-values needed to pass FDR at the thresholds listed here. Reference groups set that condition as 0, or baseline. Timepoint 1 was the reference in this case, which means the model tested whether timepoints 2, 3, and 4 were different from it. All timepoints tested contained voxels which were significantly different at each threshold, from the less stringent 0.2 (20% FDR) to the more stringent 0.01 (1% FDR). For instance, at timepoint 3 there were voxels that passed 0.01 FDR; the t-values of these significant voxels will be greater than 2.929002. A positive t-score indicates greater than the reference group, a negative one smaller. In this case, timepoint 3 scans were significantly larger than timepoint 1, simply because these young mice aged 1.5 months in between. NAs indicate no voxels were significantly different at those thresholds.

You can then map the t-values in one of these columns by writing them into a minc file. To write timepoint 3 to mnc format, we use the mincWriteVolume command.

The first argument is the object containing our model and t-stat info, here called "cheese". It is followed by a chosen name for the new t-stat mnc file, and the column of t-stat data you'd like to write. The new file is saved in whatever your current working directory is.

> mincWriteVolume(cheese, "myt-statsmap.mnc", column = 'tvalue-timepoint3')
#makes a minc file that puts your t values onto voxels, choose column of tstats from the FDR output

Once you've written your t-values into a minc, you can pull it up in display, mapped on top of the final, average brain.

Hunt for the final average. Go back to your 2-level run folder and look in the "*_secondlevel" folder this time. Then choose the second_level_nlin folder, then the second_level-nlin-3.mnc file. Copy it into your R-stats analysis folder, where your t-stats map minc file is living.

Navigate your terminal back to the R-stats folder and open both files in Display. For example:

Display second_level-nlin-3.mnc myt-statsmap.mnc

This should conjure your t-stats map on top of your final average.

Keep the FDR output summary handy. Using the t-value cut-off for whatever threshold you choose, you can slide the max and min for the displayed t-values around on the left-hand scale of Display. If you want to display t-values at the 0.05 FDR cutoff, for instance, you'd set the minimum to 2.259712 and the max as high as the values actually go. For a more complete guide on changing Display's options to make the map visualizations more visually appealing, see:

https://en.wikibooks.org/wiki/File:Display_MINC_toolkit_overlay_images_input.png