Preprocessing - theunissenlab/lab-documentation GitHub Wiki
Original Page: https://github.com/gallantlab/lab-knowledge/blob/master/old_wiki/python_fmri_preprocessing.md
Instructions for setting up couchdb: https://github.com/gallantlab/lab-knowledge/blob/master/old_wiki/computers/sysadmin_stuff/adding_a_couchdb_database.md
Overview
Getting the code
In the directory you want the code to live in:
git clone https://github.com/alexhuth/docdb
You will be on the master branch by default.
Viewing documentation
The code is vaguely documented and reasonably clean. You can see a nice
epydoc version here: /auto/k1/huth/python-docs/docdb-module.html
Setting up a database
Go to the HTML interface at peyote:5984/_utils/ and create the database there.
ALTERNATIVE: To create a database from the command line, Mark has an IPython notebook with commented code that can help.
How it works
The two important classes that you need to know about in order to roll your own preprocessing pipeline are the Action and the ActionGraph. As you might guess from the name, an Action is a verb that does something like motion correct, detrend, or slice timing correct a run.
Actions
Figure: An example of an Action. This is an instance of a MotionCorrect Action, which has one input (a 4D image to be motion corrected), two outputs (a set of transformations that carry out the motion correction, and the transformed image), and two parameters shown here (the number of degrees of freedom of the correcting transformation and the reference volume number).
An Action consists of three important data structures: (1) a set of named inputs, (2) a set of configuration parameters, and (3) a set of named outputs.
When you construct an Action you give it two things: parameters and inputs. The Action automatically creates its outputs when it needs to. The parameters are given as a dictionary with no remarkable structure. The inputs are a bit more complicated: they are given as a dictionary of 2-tuples (pairs) with the following structure:
input_name:(originating_action, action_output_name)
For example:
"reference":(loaddicom, "image")
In this example, we want our Action to have an input called "reference" which we want to define as the output named "image" from the action "loaddicom". This is kind of like we're connecting a wire from the "image" output of "loaddicom" to the "reference" input of our action.
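The wiring idea can be tried out without docdb installed. The sketch below uses a hypothetical stand-in class, ToyAction, which is not part of docdb and whose method names are invented for illustration. It only mimics the input_name:(originating_action, action_output_name) convention described above.

```python
class ToyAction:
    """Hypothetical stand-in for a docdb Action (not the real class)."""

    def __init__(self, name, inputs, outputs=None):
        self.name = name
        # inputs: {input_name: (originating_action, action_output_name)}
        self.inputs = inputs
        # outputs: {output_name: value}; here they are supplied by hand
        self.outputs = outputs or {}

    def resolve_inputs(self):
        """Follow each 'wire' back to the output it is connected to."""
        return {
            input_name: source.outputs[output_name]
            for input_name, (source, output_name) in self.inputs.items()
        }


# "loaddicom" exposes an output named "image" (the file name is made up)
loaddicom = ToyAction("loaddicom", inputs={}, outputs={"image": "run01.nii"})

# Our action has an input named "reference" wired to that output
myaction = ToyAction("myaction", inputs={"reference": (loaddicom, "image")})

resolved = myaction.resolve_inputs()
print(resolved)
```

The point of the pair is that an input never holds data directly; it holds a pointer to another action plus the name of one of that action's outputs.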
Here is an example of some code to construct the Action shown in the nearby figure and its predecessor:
# Start the database client; it needs to be attached to each action
import docdb
from glab_constants import *  # presumably defines SERVERADDRESS and SERVERPORT
docdbi = docdb.DocDBClient(SERVERADDRESS, SERVERPORT)
from docdb.actions import ImportDicom, MotionCorrectFSL
# Construct the action to import dicom files (the parameter dictionary is built separately just to keep things clean)
mydicomdir = "/path/to/dicom/dir"
iaparams = dict(dicomdir=mydicomdir,
                experiment_name="test", block_number=0)
importaction = ImportDicom(inputs={}, params=iaparams, dbinterface=docdbi)
# Construct the action to run motion correction
mcaction = MotionCorrectFSL(inputs=dict(image=(importaction, "image")),
                            params={}, dbinterface=docdbi)
This short script demonstrates everything you need to know about how to construct actions. Step by step:
- Start the database client. This allows our actions to construct and fetch database-mapped objects.
- Construct the ImportDicom action. This is the only preprocessing action that has no inputs (and so we pass it an empty dict for its inputs). We also pass it a dictionary of parameters, including the path to the dicom directory to be imported, the name of the experiment, and the block number. These metadata will be very useful in the future.
- Construct the MotionCorrectFSL action. This action takes one input called "image", so we pass it a dict with an item called "image" that points to a pair, (importaction, "image"). This tells the MotionCorrectFSL action that it should grab its "image" input from the "image" output of importaction. Notice that we pass MotionCorrectFSL an empty parameter dict, but we know that it takes parameters. It just so happens that every configuration parameter for MotionCorrectFSL has a default value, so we can pass it no parameters and it assumes all defaults.
To check what inputs and parameters any Action takes, you can either look directly at its source or, once it's loaded into your interpreter namespace, take a look at, e.g., MotionCorrectFSL.param_dict or MotionCorrectFSL.input_form. These dictionaries are defined in the class itself, and have a very stereotypical structure. input_form is always a dictionary of input_name:input_type pairs, where the input_type is one of the database-mapped types (discussed elsewhere?). param_dict is a dictionary of dictionaries and is quite descriptive.
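As a rough illustration of how a dictionary-of-dictionaries param_dict could supply defaults when an empty parameter dict is passed, here is a minimal sketch. The parameter names ("dof", "refvol"), the inner keys ("default", "doc"), the type name "Image4D", and the helper fill_defaults are all assumptions made for this example; check the real class attributes for the actual layout.

```python
# Hypothetical layout; the real docdb param_dict keys may differ.
param_dict = {
    "dof": {"default": 6, "doc": "degrees of freedom of the correcting transformation"},
    "refvol": {"default": 0, "doc": "reference volume number"},
}

# input_name: input_type (the type name here is an assumption)
input_form = {"image": "Image4D"}


def fill_defaults(param_dict, params):
    """Merge user-supplied params over the defaults declared in param_dict."""
    unknown = set(params) - set(param_dict)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    filled = {name: spec["default"] for name, spec in param_dict.items()}
    filled.update(params)
    return filled


all_defaults = fill_defaults(param_dict, {})         # like passing params={}
overridden = fill_defaults(param_dict, {"dof": 12})  # override one value
print(all_defaults)
print(overridden)
```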
Figure: An example of a few Actions connected together into a typical preprocessing pipeline. Data is first imported, then motion corrected, then registered to an outside image (actually skipping a step of taking the temporal mean). The motion correcting and registration transformations are concatenated and then applied to the original imported image.
The ActionGraph
Once you have defined some actions you need to define an ActionGraph. The ActionGraph holds references to all your actions and lets you do useful things like clone them and run them. Given the code snippet above, here is a snippet that will construct a corresponding ActionGraph and then run the actions:
actgraph = docdb.actions.ActionGraph()
actgraph.add([importaction, mcaction])
actgraph.run(docdbi, local=False)
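This page does not describe how ActionGraph.run decides the order in which actions execute. One plausible approach, sketched here with the standard-library graphlib module rather than docdb itself, is to sort actions so that every action runs after the actions that feed its inputs. The function run_order and the plain-dict representation are assumptions for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_order(inputs_by_action):
    """Return action names ordered so producers come before consumers.

    inputs_by_action: {action_name: {input_name: (source_action_name, output_name)}}
    """
    # For each action, collect the set of actions it depends on
    dependencies = {
        name: {source for source, _output in inputs.values()}
        for name, inputs in inputs_by_action.items()
    }
    return list(TopologicalSorter(dependencies).static_order())


# Same wiring as the snippet above, deliberately listed out of order
pipeline = {
    "mcaction": {"image": ("importaction", "image")},
    "importaction": {},
}

order = run_order(pipeline)
print(order)
```

A dependency-ordered run means the order in which actions are added to the graph should not matter, only the wiring between them.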
Running a preprocessing pipeline
In the current paradigm there are two unique pipelines that are run for each experiment: one for the "root" image that will serve as the registration template, and another that is run on every other image. Using this system you only need to define these two pipelines, and then you simply replicate the second pipeline across all the other runs. For reference, see the files actionfmridatabase/scripts/run_shinji_wholehead_movie_1.py or scripts/run_testdata.py, which demonstrate all the important steps in preprocessing.
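The "define once, replicate across runs" idea can be sketched with plain dictionaries. This is not how the referenced scripts do it; the function make_run_pipeline, the key names, and the run directories are hypothetical, and a real script would construct docdb actions instead of dicts.

```python
def make_run_pipeline(experiment_name, block_number, dicomdir):
    """Describe the per-run pipeline (import, then motion correct) as plain dicts."""
    import_name = f"import_{block_number}"
    mc_name = f"mc_{block_number}"
    return {
        import_name: {
            "action": "ImportDicom",
            "inputs": {},
            "params": {
                "dicomdir": dicomdir,
                "experiment_name": experiment_name,
                "block_number": block_number,
            },
        },
        mc_name: {
            "action": "MotionCorrectFSL",
            # Wire this run's motion correction to this run's import
            "inputs": {"image": (import_name, "image")},
            "params": {},
        },
    }


# Placeholder directories for the non-root runs
run_dirs = ["/path/to/run1", "/path/to/run2", "/path/to/run3"]

pipelines = {}
for block_number, dicomdir in enumerate(run_dirs, start=1):
    pipelines.update(make_run_pipeline("test", block_number, dicomdir))

print(sorted(pipelines))
```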
Fixing mistakes
Sooner or later you will screw something up in one of your pipelines, or want to re-do a stage of processing with different options, or whatever. (Note that if your screw-up is bad enough, the whole pipeline will crash and not run to completion, and the code will clean up after itself as if nothing ever ran. The following discussion only deals with minor bugs that did not prevent the code from running to completion, but whose results you nonetheless want to re-do for some reason.) The great advantage of this system is that it leaves a record (in the database documents) of every action that has been performed on a given data set, but this also requires some care when you are un-doing or re-doing things.
To re-visit an action or the result of an action (a document in the database), use
docdbi = docdb.getclient()
D = docdbi.query(experiment_name='My_Experiment_Name',generated_by_name='<whatever action created some document>')
- Note that this method of querying can NOT be used to recover actions, but only to recover documents (actions don't have generating actions). Also (AH- Is this a bug??) it seems that it's impossible to query for actions directly by searching for ( name='...', type='action', experiment_name='...')
- Instead, you have to query for a document, and then you can traverse the action tree up or down to find the branch of the action graph that you want to clip / redo:
A = docdbi.query(experiment_name='My_Experiment_Name',generated_by_name='<whatever action created some document>')
Branch = A.get_all_children()
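To make the traversal concrete, here is a minimal sketch of what collecting "all children" of a node could look like. ToyNode is a hypothetical stand-in, not a docdb class, and its get_all_children only shares a name with the docdb method; the real return type and ordering may differ.

```python
class ToyNode:
    """Hypothetical stand-in for a document or action in the graph."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def get_all_children(self):
        """Return every descendant, depth-first, each listed once."""
        seen = set()
        found = []
        stack = list(reversed(self.children))
        while stack:
            node = stack.pop()
            if id(node) in seen:
                continue  # a node reachable by two paths is reported once
            seen.add(id(node))
            found.append(node)
            stack.extend(reversed(node.children))
        return found


imported = ToyNode("imported image")
corrected = imported.add_child(ToyNode("motion-corrected image"))
registered = corrected.add_child(ToyNode("registered image"))
xfm = imported.add_child(ToyNode("motion transformations"))

# Everything downstream of the imported image, i.e. the branch to redo
branch_names = [node.name for node in imported.get_all_children()]
print(branch_names)
```

Everything returned by such a traversal depends on the starting node, so it is the set of results that would need to be redone if that node changed.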