KEfED OBI Model - SciKnowEngine/SciKnowGraph GitHub Wiki

Modeling KEfED in OBI - First Attempt

Following feedback about disconnects between our model, the base OBI design, and IEDB's implementation of elements of the model, we sought to redefine the KEfED formalism so that it aligns as closely as possible with existing OBI data.

To express the KEfED methodology fully, we added some terms to OBI in this extension: obi_kefed.ttl. The base OBI ontology is obi.ttl. James Overton's example, derived from the Panepinto 2002 experiment, is Panepinto2002.ttl.

As a worked example, we drew out the Richardson paper in diagrams to show how the model defined above can be expressed in this formalism.

Elements

Here we show the existing ontological elements we will be using for this example. These are all existing OBI classes.

Modeling the Protocol

Again, here are the classes and object properties we will be using to model the skeleton of the workflow.

We provide two extensions to the existing OBI model here:

  1. A planned process can provide input to and receive input from another planned process.
  2. We introduce the notion that a planned process can have a first part, meaning that the parts of a process can have a starting point defined.
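The two extensions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the step names are hypothetical, the triples are plain tuples rather than real RDF, and the property labels paraphrase the text above instead of quoting obi_kefed.ttl.

```python
# Hypothetical workflow. Property labels paraphrase the two extensions
# described above; they are not copied from obi_kefed.ttl.
triples = [
    ("workflow", "has part", "step_a"),
    ("workflow", "has part", "step_b"),
    ("workflow", "has part", "step_c"),
    ("step_a", "is first part of", "workflow"),
    ("step_a", "provides input to", "step_b"),
    ("step_b", "provides input to", "step_c"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def first_part(process):
    """Return the single step declared as the first part of `process`."""
    starts = [s for s, p, o in triples if p == "is first part of" and o == process]
    if len(starts) != 1:
        raise ValueError(f"expected exactly one first part, found {len(starts)}")
    return starts[0]


def walk(process):
    """Visit steps breadth-first from the first part along 'provides input to'."""
    order, queue = [], [first_part(process)]
    while queue:
        step = queue.pop(0)
        if step in order:
            continue
        order.append(step)
        queue.extend(objects(step, "provides input to"))
    return order


print(walk("workflow"))  # ['step_a', 'step_b', 'step_c']
```

Without the first-part link, the has part relation alone gives an unordered set of steps, so a traversal would have no defined place to begin.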

Modeling how variables parameterize the Protocol Steps

Following James' work with Panepinto, we use the three types of variable in a study design and define a new class, study design variable, as the single parent of all three. We also define the parameterizes object property so that such variables can be connected to elements in the workflow.

Importantly, the relationship between value specifications and the values they can take is left open. We will specify it in more detail within our worked example.
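As a rough sketch of this class structure, the following Python mirrors the single parent class and its three children. The variable and step names are hypothetical, and representing a value specification as an optional string is an assumption made only to show that it is left unspecified at this stage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StudyDesignVariable:
    """Single parent of the three variable types."""
    label: str
    parameterizes: str                         # label of a workflow element
    value_specification: Optional[str] = None  # deliberately left open here


class StudyDesignConstantVariable(StudyDesignVariable):
    pass


class StudyDesignIndependentVariable(StudyDesignVariable):
    pass


class StudyDesignDependentVariable(StudyDesignVariable):
    pass


# Hypothetical variables attached to hypothetical workflow steps.
variables = [
    StudyDesignConstantVariable("constant_1", "step_a"),
    StudyDesignIndependentVariable("independent_1", "step_a"),
    StudyDesignDependentVariable("dependent_1", "step_b"),
]


def variables_for(element):
    """List the labels of the variables that parameterize `element`."""
    return [v.label for v in variables if v.parameterizes == element]


print(variables_for("step_a"))  # ['constant_1', 'independent_1']
```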

Rewriting the Richardson et al. example

Here we define the elements needed for the simple top level Richardson example.

Elements

There are new elements:

  1. name measurement datum and nominal value specification classes denote data that correspond to named elements. This could be extended to denote naming conventions in various domains. These differ from categorical measurement datum and categorical value specification because the data are simply names of things rather than groups of things. (Note: 'Nominal' might need to be changed, since it has a prescribed meaning that corresponds closely to the categorical value specification.)
  2. ontological term value specification is designed to accommodate the common situation where we want to use an ontological data structure as the value of a given variable.
  3. natural language measurement datum and natural language value specification classes denote data that corresponds to simple free text in data fields.

Protocol

The representation links a study design instance to a protocol instance. The protocol is realized and concretized by a single high-level planned process that denotes the entire workflow. This process is then decomposed by has part properties into constituent processes that use the has specified input and has specified output properties to denote interactions with lower-level planned process instances.

We also use the is first part of property to denote the starting point of the process (which is crucial for metadata propagation).

Variables

Here, we define several instances of the study design constant variable, study design independent variable, and study design dependent variable classes and link them to elements of the protocol using our new parameterizes object property.

Value Specifications

Finally, we specify various value specifications, with some informal attempts to provide details of specifications for specific data types. At present, we specify the immunogen and antigen variables by pointing to an appropriate instance of an immunogen or antigen class as specified in OBI.

Note that this representation describes only the set of possible experiments that could be performed: it lays out the study design and protocol and states what potential values each constant, parameter, and measurement could take. Under KEfED, this would then be instantiated by performing metadata propagation from the start of the protocol to any dependent variables that provide measurements.
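The propagation step can be illustrated with a minimal Python sketch. It rests on one assumed reading of metadata propagation: the context of a dependent variable is taken to be every constant and independent variable that parameterizes a step upstream of (or at) the step where the measurement is made. The step names, variable names, and dictionary layout are all hypothetical.

```python
from collections import deque

# Hypothetical protocol: each key provides input to the steps in its list.
edges = {"step_a": ["step_b", "step_c"], "step_b": [], "step_c": []}
first = "step_a"  # the first part of the workflow

# Constants and independent variables that parameterize each step.
parameters = {
    "step_a": {"constant_1"},
    "step_b": {"independent_1"},
    "step_c": {"independent_2"},
}

# Dependent variables and the step that produces each measurement.
measurements = {"dependent_1": "step_b", "dependent_2": "step_c"}


def propagate(first, edges, parameters):
    """Accumulate, for every step, the variables set at or upstream of it."""
    context = {first: set(parameters.get(first, ()))}
    queue = deque([first])
    while queue:
        step = queue.popleft()
        for nxt in edges.get(step, ()):
            inherited = context[step] | parameters.get(nxt, set())
            known = context.get(nxt)
            # Revisit a step only when it gains new upstream variables.
            if known is None or not inherited <= known:
                context[nxt] = inherited | (known or set())
                queue.append(nxt)
    return context


context = propagate(first, edges, parameters)
for variable, step in measurements.items():
    print(variable, sorted(context[step]))
```

In this sketch the two branches do not share their independent variables, so each measurement ends up depending only on the variables along its own path from the first step.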

The end product of this work is this model: pmid9499101_f3+4.ttl

Schema Diagrams for KEfED-OBI model

This is the basic design of the data model in the software. Ideally, I would like to store the data as JSON-LD formatted as 'kefed-extended-OBI' data, based on the core OBI ontology with a relatively small number of extensions defined for KEfED.
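To give a feel for what such a JSON-LD document might look like, here is a small Python sketch that builds and serializes one. Everything in the `ex:` namespace is a placeholder invented for illustration (it is not the real OBI or obi_kefed.ttl vocabulary), and the step names are hypothetical; only the JSON-LD keywords and the rdfs:label IRI are standard.

```python
import json

doc = {
    "@context": {
        # Placeholder namespace; a real document would use OBI and KEfED IRIs.
        "ex": "http://example.org/kefed-obi#",
        "label": "http://www.w3.org/2000/01/rdf-schema#label",
        "hasPart": {"@id": "ex:hasPart", "@type": "@id"},
        "isFirstPartOf": {"@id": "ex:isFirstPartOf", "@type": "@id"},
        "providesInputTo": {"@id": "ex:providesInputTo", "@type": "@id"},
    },
    "@graph": [
        {
            "@id": "ex:workflow",
            "@type": "ex:PlannedProcess",
            "label": "workflow",
            "hasPart": ["ex:step_a", "ex:step_b"],
        },
        {
            "@id": "ex:step_a",
            "@type": "ex:PlannedProcess",
            "label": "step a",
            "isFirstPartOf": "ex:workflow",
            "providesInputTo": "ex:step_b",
        },
        {"@id": "ex:step_b", "@type": "ex:PlannedProcess", "label": "step b"},
    ],
}

text = json.dumps(doc, indent=2)
print(text)
```

Declaring the link properties with `"@type": "@id"` in the context tells a JSON-LD processor to read their values as node identifiers rather than plain strings, which keeps the stored data compact while still mapping onto RDF.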

Data Model

To make this representation as explicit as possible, when instantiated for the example from Richardson et al. 1998, the data look like this (for a single line in the data table). Note also that this reflects a calculation that aggregates the data over all entries in the table. The data in the papers themselves are much more complex and have a more detailed data structure.

Example

This is essentially an extension of the initial work on modeling the four study papers (with some key differences). It forms the basis for our Karma-based modeling of actual data from the IEDB publicly released database.