KEfED OBI Model - SciKnowEngine/SciKnowGraph GitHub Wiki
Modeling KEfED in OBI - First Attempt
Following feedback about disconnects between our model, the base OBI design, and IEDB's implementation of elements of the model, we sought to redefine the KEfED formalism so that it aligns as closely as possible with existing OBI data.
To fully express the KEfED methodology, we added some terms to OBI. The relevant files are:
- KEfED extension to OBI: obi_kefed.ttl
- Base OBI ontology: obi.ttl
- James Overton's example, derived from the Panepinto 2002 experiment: Panepinto2002.ttl
As a worked example, we fleshed out the Richardson paper in diagrams to show how the model defined above can be expressed in this formalism.
Elements
Here we show the existing ontological elements we will be using for this example. These are all existing OBI classes.
Modeling the Protocol
Again, here are the classes and object properties we will use to model the skeleton of the workflow.
We provide two extensions to the existing OBI model here:
- A planned process can provide input to, and receive input from, another planned process.
- We introduce the notion that a planned process can have a first part, meaning that a starting point can be defined among the parts of a process.
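As a rough illustration of the two extensions, the sketch below stores planned processes as plain (subject, predicate, object) triples. It assumes, for illustration only, that provides input to and receives input from behave as inverse properties; the property labels stand in for the IRIs actually defined in obi_kefed.ttl, and the process names (immunize, collect serum, assay) are hypothetical.

```python
from typing import Optional, Set, Tuple

# A triple is (subject, predicate, object). Property labels stand in for
# ontology IRIs; this is an illustration, not the obi_kefed.ttl encoding.
Triple = Tuple[str, str, str]

# Assumption: the two input-flow properties are treated as inverses.
INVERSES = {
    "provides input to": "receives input from",
    "receives input from": "provides input to",
}


def add_triple(graph: Set[Triple], s: str, p: str, o: str) -> None:
    """Add a triple; for the input-flow properties, also add the inverse."""
    graph.add((s, p, o))
    if p in INVERSES:
        graph.add((o, INVERSES[p], s))


def first_part(graph: Set[Triple], process: str) -> Optional[str]:
    """Return the part asserted as the starting point of `process`, if any."""
    for s, p, o in graph:
        if s == process and p == "has first part":
            return o
    return None


# Hypothetical three-step workflow.
g: Set[Triple] = set()
add_triple(g, "workflow", "has first part", "immunize")
add_triple(g, "immunize", "provides input to", "collect serum")
add_triple(g, "collect serum", "provides input to", "assay")
```

Asserting only the forward direction is enough here; the inverse triple is added automatically, so either direction can be queried.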
Modeling how variables parameterize the Protocol Steps
Following James' work with Panepinto, we use the three types of variables in a study design (constant, independent and dependent), while also defining a new class, study design variable, as the single parent of all three. We also define the parameterizes object property so that such variables can be connected to elements in the workflow.
Importantly, the relationship between a value specification and the values it can take is left open. We specify it in more detail within our worked example.
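A minimal Python sketch of the variable hierarchy, assuming one class per variable type under a shared StudyDesignVariable parent. The class names, the StudyDesign container and its link/variables_for helpers are hypothetical; the value_specification field is deliberately untyped to mirror the open relationship described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StudyDesignVariable:
    """Single parent class of the three variable types."""
    label: str
    # Left open, as in the model: any object describing the allowed values.
    value_specification: Optional[object] = None


class StudyDesignConstantVariable(StudyDesignVariable):
    pass


class StudyDesignIndependentVariable(StudyDesignVariable):
    pass


class StudyDesignDependentVariable(StudyDesignVariable):
    pass


@dataclass
class StudyDesign:
    # (variable, protocol step) pairs asserted with 'parameterizes'.
    parameterizes: List[Tuple[StudyDesignVariable, str]] = field(default_factory=list)

    def link(self, variable: StudyDesignVariable, step: str) -> None:
        """Record that `variable` parameterizes the protocol step `step`."""
        self.parameterizes.append((variable, step))

    def variables_for(self, step: str) -> List[StudyDesignVariable]:
        """All variables that parameterize the given step."""
        return [v for v, s in self.parameterizes if s == step]
```

Keeping parameterizes as explicit pairs, rather than a field on each variable, lets one variable parameterize several steps.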
Rewriting the Richardson et al. example
Here we define the elements needed for the simple top-level Richardson example.
Elements
We introduce the following new elements:
- name measurement datum and nominal value specification classes denote data that correspond to named elements. This could be extended to denote naming conventions in various domains. This differs from categorical measurement datum and categorical value specification, since here the data are simply names of things rather than groups of things. (Note: 'nominal' might need to be changed, since it has a prescribed meaning that corresponds closely to the categorical value specification.)
- ontological term value specification is designed to accommodate the common situation where we want to use an ontological data structure as the value of a given variable.
- natural language measurement datum and natural language value specification classes denote data that correspond to simple free text in data fields.
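One possible reading of how these value specifications differ, as a hypothetical Python sketch: a categorical specification restricts values to a fixed set of groups, a nominal one accepts any name, an ontological term specification accepts identifiers from an allowed set of terms, and a natural language one accepts free text. The accepts function and the EX:0001 term identifier are illustrative assumptions, not part of the model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class NominalValueSpecification:
    """Values are simply names of things (any non-blank string)."""


@dataclass(frozen=True)
class CategoricalValueSpecification:
    """Values are one of a fixed set of groups."""
    categories: FrozenSet[str]


@dataclass(frozen=True)
class OntologicalTermValueSpecification:
    """Values are ontology term identifiers drawn from an allowed set."""
    allowed_terms: FrozenSet[str]


@dataclass(frozen=True)
class NaturalLanguageValueSpecification:
    """Values are free text."""


ValueSpecification = Union[
    NominalValueSpecification,
    CategoricalValueSpecification,
    OntologicalTermValueSpecification,
    NaturalLanguageValueSpecification,
]


def accepts(spec: ValueSpecification, value: str) -> bool:
    """Check whether `value` is allowed by `spec` (illustrative rules only)."""
    if isinstance(spec, CategoricalValueSpecification):
        return value in spec.categories
    if isinstance(spec, OntologicalTermValueSpecification):
        return value in spec.allowed_terms
    if isinstance(spec, NominalValueSpecification):
        return bool(value.strip())
    if isinstance(spec, NaturalLanguageValueSpecification):
        return True
    raise TypeError(f"unknown value specification: {spec!r}")
```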
Protocol
The representation links a study design instance to a protocol instance. This is realized and concretized by a single high-level planned process that denotes the entire workflow. This process is then decomposed by has part properties into constituent processes that use the has specified input and has specified output properties to denote interactions with lower-level planned process instances.
We also use the is first part of property to denote the starting point of the process, which is crucial for metadata propagation.
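The decomposition can be sketched in Python as follows, assuming (hypothetically) that one part feeds another whenever one of its specified outputs is among the other's specified inputs. Starting from the first part, a breadth-first walk recovers the order of the constituent processes; the class names and step names are illustrative only.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PlannedProcess:
    name: str
    specified_inputs: Set[str] = field(default_factory=set)
    specified_outputs: Set[str] = field(default_factory=set)


@dataclass
class Workflow:
    """Top-level planned process decomposed into parts, with a first part."""
    parts: Dict[str, PlannedProcess]
    first_part: str

    def downstream(self, name: str) -> List[str]:
        """Parts whose specified inputs include an output of `name`."""
        outputs = self.parts[name].specified_outputs
        return sorted(p.name for p in self.parts.values()
                      if p.name != name and outputs & p.specified_inputs)

    def order_from_start(self) -> List[str]:
        """Breadth-first order of the parts reachable from the first part."""
        seen, order = {self.first_part}, []
        queue = deque([self.first_part])
        while queue:
            step = queue.popleft()
            order.append(step)
            for nxt in self.downstream(step):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return order


# Hypothetical workflow; 'calibrate' is not reachable from the first part.
wf = Workflow(
    parts={p.name: p for p in [
        PlannedProcess("immunize", {"mouse"}, {"immunized mouse"}),
        PlannedProcess("collect serum", {"immunized mouse"}, {"serum"}),
        PlannedProcess("assay", {"serum"}, {"titer datum"}),
        PlannedProcess("calibrate", set(), {"standard curve"}),
    ]},
    first_part="immunize",
)
```

Without a declared first part there is no principled place to begin the walk, which is why the starting point matters for propagation.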
Variables
Here, we define several instances of the study design constant variable, study design independent variable, and study design dependent variable classes and link them to elements of the protocol using our new parameterizes object property.
Value Specifications
Finally, we define various value specifications, with some informal attempts to specify details for particular data types. At present, we specify the immunogen and antigen variables by pointing to an appropriate instance of an immunogen or antigen class as defined in OBI.
Note that this representation describes only a set of possible experiments: it lays out the study design and protocol and states what potential values each constant, parameter and measurement could take. Under KEfED, it would then be instantiated by performing metadata propagation from the start of the protocol to any dependent variables that provide measurements.
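A minimal sketch of metadata propagation, under stated assumptions: the protocol steps reachable from the first part form a directed acyclic graph, each step maps its constant and independent variables to chosen values, and each measurement step lists its dependent variables. Values are pushed downstream in topological order (Kahn's algorithm), so every measurement record carries the values of all upstream parameters, including those from branches that merge. The propagate function, its data layout and the example values are hypothetical, not the KEfED implementation.

```python
from collections import deque
from typing import Dict, List


def propagate(first: str,
              successors: Dict[str, List[str]],
              parameters: Dict[str, Dict[str, str]],
              measurements: Dict[str, List[str]]) -> List[dict]:
    """Push parameter values forward from `first` to every measurement step.

    Assumes the steps reachable from `first` form a directed acyclic graph.
    """
    # 1. Find every step reachable from the start of the protocol.
    reachable, queue = {first}, deque([first])
    while queue:
        for nxt in successors.get(queue.popleft(), []):
            if nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)

    # 2. Count incoming links so a step is processed only after all of
    #    its upstream steps have been processed.
    pending = {step: 0 for step in reachable}
    for step in reachable:
        for nxt in successors.get(step, []):
            pending[nxt] += 1

    # 3. Merge upstream metadata into each step, then record measurements.
    context: Dict[str, Dict[str, str]] = {step: {} for step in reachable}
    ready, records = deque([first]), []
    while ready:
        step = ready.popleft()
        context[step].update(parameters.get(step, {}))
        for dv in measurements.get(step, []):
            records.append({"measurement": dv, "step": step,
                            "metadata": dict(context[step])})
        for nxt in successors.get(step, []):
            context[nxt].update(context[step])
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)
    return records
```

For example, with hypothetical placeholder values, `propagate("immunize", {"immunize": ["collect serum"], "collect serum": ["assay"]}, {"immunize": {"immunogen": "immunogen-A"}, "assay": {"antigen": "antigen-B"}}, {"assay": ["titer"]})` yields one record for titer whose metadata holds both the immunogen and the antigen values.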
The end product of this work is this model: pmid9499101_f3+4.ttl
Schema Diagrams for KEfED-OBI model
Basic design of the data model in the software. Ideally, I would like to store the data as JSON-LD, formatted as 'kefed-extended-OBI' data: the core OBI ontology plus a relatively small number of extensions defined for KEfED.
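A sketch of what a 'kefed-extended-OBI' JSON-LD export might look like, using only Python's json module. The http://example.org namespaces, the camelCase term names in the @context, and the workflow_to_jsonld helper are placeholders; a real export would map the terms to the IRIs in obi.ttl and obi_kefed.ttl.

```python
import json
from typing import Dict, List

# Hypothetical namespaces and term names, not the real OBI IRIs.
CONTEXT = {
    "kefed": "http://example.org/kefed#",
    "ex": "http://example.org/data/",
    "PlannedProcess": "kefed:PlannedProcess",
    # "@type": "@id" tells JSON-LD to read these values as IRIs.
    "hasPart": {"@id": "kefed:hasPart", "@type": "@id"},
    "hasFirstPart": {"@id": "kefed:hasFirstPart", "@type": "@id"},
    "providesInputTo": {"@id": "kefed:providesInputTo", "@type": "@id"},
}


def workflow_to_jsonld(workflow: str, parts: List[str], first: str,
                       links: Dict[str, List[str]]) -> str:
    """Serialize a decomposed workflow as a JSON-LD document string."""
    graph = [{"@id": workflow, "@type": "PlannedProcess",
              "hasPart": parts, "hasFirstPart": first}]
    for part in parts:
        node = {"@id": part, "@type": "PlannedProcess"}
        if links.get(part):
            node["providesInputTo"] = links[part]
        graph.append(node)
    return json.dumps({"@context": CONTEXT, "@graph": graph}, indent=2)
```

Because the output is ordinary JSON, it can be stored and exchanged as-is, while JSON-LD tooling can expand it into RDF triples using the @context.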
To make this representation as explicit as possible, the data look like this when instantiated for the example from Richardson et al. (1998), for a single line in the data table. Note that this reflects a calculation that aggregates the data over all entries in the table. The data in the papers themselves are much more complex and have a more detailed structure.
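The page does not say which aggregate is computed. As a hypothetical sketch, assuming the aggregate is the mean of a measurement grouped by the values of selected variables, the calculation could look like this; the aggregate function and the column names in the usage are placeholders.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def aggregate(rows: List[Dict[str, object]], keys: List[str],
              measure: str) -> Dict[Tuple[object, ...], float]:
    """Average `measure` over all table rows sharing the same `keys` values."""
    groups: Dict[Tuple[object, ...], List[float]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(float(row[measure]))
    return {group: mean(values) for group, values in groups.items()}
```

For instance, `aggregate(rows, ["antigen"], "titer")` collapses every row with the same antigen value into a single averaged line.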
This is essentially an extension of the initial work on modeling the four study papers, with some key differences. It forms the basis for our Karma-based modeling of actual data from the publicly released IEDB database.