Preprocessing - theunissenlab/lab-documentation GitHub Wiki

Original Page: https://github.com/gallantlab/lab-knowledge/blob/master/old_wiki/python_fmri_preprocessing.md

Instructions for setting up couchdb: https://github.com/gallantlab/lab-knowledge/blob/master/old_wiki/computers/sysadmin_stuff/adding_a_couchdb_database.md

Overview

Getting the code

In the directory you want the code to live in:

git clone https://github.com/alexhuth/docdb

You will be on the master branch by default.

Viewing documentation

The code is lightly documented and reasonably clean. An epydoc-generated version of the documentation is available here: /auto/k1/huth/python-docs/docdb-module.html

Setting up a database

Go to the CouchDB HTML interface at peyote:5984/_utils/ and create the database there.

ALTERNATIVE: To create a database from the command line, Mark has an IPython notebook with commented code that can help.

How it works

The two important classes that you need to know about in order to roll your own preprocessing pipeline are the Action and the ActionGraph. As you might guess from the name, an Action is a verb that does something like motion correct, detrend, or slice timing correct a run.

Actions

An example of an Action. This is an instance of a MotionCorrect Action, which has one input (a 4D image to be motion corrected), two outputs (a set of transformations that carry out the motion correction, and the transformed image), and two parameters shown here (the number of degrees of freedom of the correcting transformation and the reference volume number).

An Action consists of three important data structures: (1) a set of named inputs, (2) a set of configuration parameters, and (3) a set of named outputs.

When you construct an Action you give it two things: parameters and inputs. The Action automatically creates its outputs when it needs to. The parameters are given as a dictionary with no remarkable structure. The inputs are a bit more complicated: they are given as a dictionary of 2-tuples (pairs) with the following structure:

input_name:(originating_action, action_output_name)

For example:

"reference":(loaddicom, "image")

In this example, we want our Action to have an input called "reference", defined as the output named "image" from the action "loaddicom". It is a bit like connecting a wire from the "image" output of "loaddicom" to the "reference" input of our action.
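The wiring idea can be tried out without docdb at all. The sketch below uses a hypothetical stand-in class, ToyAction, which is not part of docdb. It only mimics the inputs dictionary of (originating_action, action_output_name) pairs described above, and shows one way such a dictionary could be resolved into concrete values.

```python
class ToyAction:
    """Hypothetical stand-in for an Action; NOT the docdb class."""

    def __init__(self, name, inputs, outputs=None):
        self.name = name
        # inputs: {input_name: (originating_action, action_output_name)}
        self.inputs = inputs
        # outputs: {output_name: value}; filled in by hand for this toy
        self.outputs = outputs or {}

    def resolve_inputs(self):
        """Follow each wire and fetch the value at the other end."""
        return {
            input_name: source.outputs[output_name]
            for input_name, (source, output_name) in self.inputs.items()
        }


# "loaddicom" exposes an output called "image"
loaddicom = ToyAction("loaddicom", inputs={}, outputs={"image": "raw_4d_image"})

# our action wires its "reference" input to that output
myaction = ToyAction("myaction", inputs={"reference": (loaddicom, "image")})

resolved = myaction.resolve_inputs()
print(resolved)
```

In the real system the values at the end of each wire are database-mapped objects rather than strings, and the outputs are created by the Action itself.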

Here is an example of some code to construct the Action shown in the nearby figure and its predecessor:

# Start the database client; it needs to be attached to each action
import docdb
from docdb.actions import ImportDicom, MotionCorrectFSL
from glab_constants import *
docdbi = docdb.DocDBClient(SERVERADDRESS, SERVERPORT)

# Construct the action that imports the DICOM files
# (the parameter dictionary is built separately just to keep things clean)
mydicomdir = "/path/to/dicom/dir"
iaparams = dict(dicomdir=mydicomdir,
                experiment_name="test", block_number=0)
importaction = ImportDicom(inputs={}, params=iaparams, dbinterface=docdbi)

# Construct the action that runs motion correction
mcaction = MotionCorrectFSL(inputs=dict(image=(importaction, "image")),
                            params={}, dbinterface=docdbi)

This short script demonstrates everything you need to know about how to construct actions. Step by step:

1. Start the database client. This allows our actions to construct and fetch database-mapped objects.
2. Construct the ImportDicom action. This is the only preprocessing action that has no inputs (and so we pass it an empty dict for its inputs). We also pass it a dictionary of parameters, including the path to the dicom directory to be imported, the name of the experiment, and the block number. These metadata will be very useful in the future.
3. Construct the MotionCorrectFSL action. This action takes one input called "image", so we pass it a dict with an item called "image" that points to a pair, (importaction, "image"). This tells the MotionCorrectFSL action that it should grab its "image" input from the "image" output of importaction. Notice that we pass MotionCorrectFSL an empty parameter dict, even though we know that it takes parameters. It just so happens that every configuration parameter for MotionCorrectFSL has a default value, so we can pass it no parameters and it assumes all defaults.

To check what inputs and parameters any Action takes, you can either look directly at its source or, once it's loaded into your interpreter namespace, inspect its class attributes, for example MotionCorrectFSL.param_dict or MotionCorrectFSL.input_form. These dictionaries are defined in the class itself, and have a very stereotyped structure.

input_form is always a dictionary of input_name:input_type pairs, where the input_type is one of the database-mapped types (discussed elsewhere?). param_dict is a dictionary of dictionaries and is quite descriptive.
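Because every parameter can carry a default, an empty params dict can be enough to configure an action. The sketch below illustrates only that idea. The layout of TOY_PARAM_DICT (the "default" and "doc" keys) and the parameter names "dof" and "refvol" are assumptions made for illustration; inspect the real MotionCorrectFSL.param_dict to see the actual structure.

```python
# Hypothetical param_dict layout; the real docdb structure may differ.
TOY_PARAM_DICT = {
    "dof": {"default": 6, "doc": "degrees of freedom of the transformation"},
    "refvol": {"default": 0, "doc": "index of the reference volume"},
}


def fill_defaults(param_dict, params):
    """Return a copy of params with missing entries set to their defaults."""
    unknown = set(params) - set(param_dict)
    if unknown:
        # Reject misspelled or unsupported parameter names early
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    filled = {name: spec["default"] for name, spec in param_dict.items()}
    filled.update(params)  # user-supplied values win over defaults
    return filled


all_defaults = fill_defaults(TOY_PARAM_DICT, {})
overridden = fill_defaults(TOY_PARAM_DICT, {"dof": 12})
print(all_defaults)
print(overridden)
```

Passing an empty dict yields every default, which mirrors the params={} call in the MotionCorrectFSL example above.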

An example of a few Actions connected together into a typical preprocessing pipeline. Data is first imported, then motion corrected, then registered to an outside image (actually skipping a step of taking the temporal mean). The motion correcting and registration transformations are concatenated and then applied to the original imported image.

The ActionGraph

Once you have defined some actions you need to define an ActionGraph. The ActionGraph holds references to all your actions and lets you do useful things like clone them and run them. Given the code snippet above, here is a snippet that will construct a corresponding ActionGraph and then run the actions:

actgraph = docdb.actions.ActionGraph()
actgraph.add([importaction, mcaction])
actgraph.run(docdbi, local=False)
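Whatever runs a set of wired-up actions has to execute each one only after the actions it reads from. The sketch below is not how docdb's ActionGraph is implemented (that is not shown on this page). It only illustrates the dependency ordering implied by the inputs dictionaries, using plain action names and the standard-library graphlib module.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Toy description of the two actions above:
# {action_name: {input_name: (source_action_name, output_name)}}
toy_inputs = {
    "importaction": {},
    "mcaction": {"image": ("importaction", "image")},
}


def execution_order(inputs_by_action):
    """Order actions so each one comes after the actions it reads from."""
    dependencies = {
        action: {source for source, _output in inputs.values()}
        for action, inputs in inputs_by_action.items()
    }
    # static_order() yields predecessors before the nodes that depend on them
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(toy_inputs)
print(order)
```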

Running a preprocessing pipeline

In the current paradigm there are two distinct pipelines that are run for each experiment: one for the "root" image that will serve as the registration template, and one for every other image. Using this system you only need to define these two pipelines, and then you simply replicate the second pipeline across all the other runs. For reference, see the file actionfmridatabase/scripts/run_shinji_wholehead_movie_1.py or scripts/run_testdata.py, which demonstrate all the important steps in preprocessing.
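The replication step can be pictured as copying a template and changing only the run-specific parameters. The sketch below uses plain dictionaries rather than docdb objects. The template layout, the replicate_pipeline helper, and the choice to number the non-root runs from 1 are all assumptions for illustration; the reference scripts named above show how it is actually done.

```python
import copy


def replicate_pipeline(template, run_dirs):
    """Make one copy of the template per run, changing only run-specific params."""
    pipelines = []
    # Assumption: block 0 is the root image, so other runs start at 1
    for block_number, dicomdir in enumerate(run_dirs, start=1):
        pipeline = copy.deepcopy(template)  # never mutate the shared template
        pipeline["import"]["params"].update(
            dicomdir=dicomdir, block_number=block_number
        )
        pipelines.append(pipeline)
    return pipelines


# Hypothetical description of the second (non-root) pipeline
template = {
    "import": {
        "params": {"dicomdir": None, "experiment_name": "test", "block_number": None}
    },
    "motion_correct": {"params": {}},
}

pipelines = replicate_pipeline(template, ["/path/to/run1", "/path/to/run2"])
print([p["import"]["params"]["block_number"] for p in pipelines])
```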

Fixing mistakes

Sooner or later you will screw something up in one of your pipelines, or want to redo a stage of processing with different options. (Note that if your screw-up is bad enough, the whole pipeline will crash and not run to completion, and the code will clean up after itself as if nothing ever ran. The following discussion only deals with minor bugs, which have not prevented the code from running to completion, but whose results you nonetheless want to redo.) The great advantage of this system is that it leaves a record (in the database documents) of every action that has been performed on a given data set, but this also means you need to take some care when undoing or redoing things.

To re-visit an action or the result of an action (a document in the database), use

docdbi = docdb.getclient()
D = docdbi.query(experiment_name='My_Experiment_Name',generated_by_name='<whatever action created some document>')

Note that this method of querying cannot be used to recover actions, only documents (actions don't have generating actions). Also (AH: is this a bug??) it seems to be impossible to query for actions directly by searching for (name='...', type='action', experiment_name='...').
Instead, you have to query for a document, and then traverse the action tree up or down to find the branch of the action graph that you want to clip or redo:

A = docdbi.query(experiment_name='My_Experiment_Name',generated_by_name='<whatever action created some document>')
Branch = A.get_all_children()
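Collecting a branch amounts to walking downstream from one node and gathering everything that depends on it. The sketch below shows that traversal on a hypothetical ToyNode class. It is not the docdb implementation, and the real get_all_children may return its results in a different form or order.

```python
class ToyNode:
    """Hypothetical stand-in for an action or document in the graph."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def get_all_children(self):
        """Return every node downstream of this one, depth first."""
        found = []
        for child in self.children:
            found.append(child)
            found.extend(child.get_all_children())
        return found


# Toy branch: imported image -> motion corrected -> registered,
# plus a temporal mean computed directly from the imported image
imported = ToyNode("imported_image")
corrected = imported.add_child(ToyNode("motion_corrected_image"))
corrected.add_child(ToyNode("registered_image"))
imported.add_child(ToyNode("temporal_mean"))

branch_names = [node.name for node in imported.get_all_children()]
print(branch_names)
```

Everything in the returned list was derived from the starting node, so it is the set of results that would become stale if that node were redone.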