How to use the ROO to create a project specific linked data model - RadiationOncologyOntology/ROO GitHub Wiki

The ROO is an ontology, but an ontology alone is not sufficient to create linked data. The ROO can be implemented in many equally valid ways, but if two data providers choose different implementations, the data they provide are not (properly) linked. This page describes one approach to creating a project-specific data model from the ROO.

What is being delivered

A project specific data model in OWL
A visualization of the project specific data model for non OWL experts
A mapping file for D2R for a CSV data file

The running example here is a request by a collaborator to use the Varian Learning Portal to learn and validate a model for non-small-cell-lung-cancer.

Prerequisites

JDBC driver for csv: (csvjdbc)
Database to RDF mapping engine (D2RQ)
The csvjdbc jar, placed in the lib\db-drivers folder of D2RQ
Protégé 5.0.0 (Protégé). Note that Protégé 5.2.0 has a bug when saving ttl files, so it cannot be used
A public SPARQL endpoint available for uploading

1. Understand the data and get a dummy set / sample

The user(s) need to define which data they want to have converted to linked data. Data elements, units, ranges and context (e.g. that the research is done in a selected patient group) need to be defined. In our example the request was for the following data elements:

For patients with non-small-cell-lung-cancer:

- Mean Heart Dose (gray)
- Volume of the PTV (ml)
- Survival at one year (true or false)
- ECOG performance status (ECOG 0-4)

Make or ask for a CSV file with some (fake) data that covers the ranges of each data element. This file is used later to develop and demonstrate the mapping. An example can be found here: sample data.
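If no sample file is available, a dummy set can be generated. The following is a minimal sketch using only the Python standard library. The column names and the numeric ranges are assumptions made for illustration (the request above only names the data elements and their units), so adjust them to what the data provider actually delivers.

```python
import csv
import io
import random

# Hypothetical column names: the request only lists the data elements and units.
FIELDS = ["patient_id", "mean_heart_dose_gy", "ptv_volume_ml", "survival_1y", "ecog"]


def make_dummy_rows(n, seed=0):
    """Return n fake patient rows that cover the range of each data element."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        rows.append({
            "patient_id": f"P{i + 1:03d}",
            "mean_heart_dose_gy": round(rng.uniform(0.0, 40.0), 1),  # assumed range
            "ptv_volume_ml": round(rng.uniform(10.0, 1000.0), 1),    # assumed range
            "survival_1y": rng.choice(["true", "false"]),
            "ecog": i % 5,  # cycles through ECOG 0-4 so every value appears
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(make_dummy_rows(10)))
```

Cycling the ECOG value guarantees that every sibling class (ECOG 0-4) occurs at least once when five or more rows are generated, which makes gaps in the mapping easier to spot later.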

2. Create a copy of the ROO, add project specific entities

Using Protégé load the ROO
Save the ROO as a new ontology in the Turtle format, e.g. [myproject].ttl (Protégé seems to handle individuals better that way)
Change the ontology IRI to a project-specific one, e.g. http://www.cancerdata.org/roo/datamodel/project/[myproject]. Do not change the IRIs of the ROO entities (answer no when Protégé asks about this).
In the File->Preferences menu select Active ontology IRI and User Supplied Name (this makes visualization later easier)
Add project specific entities (ones that are not yet covered by the ROO)

3. Create data model through asserting individuals in Protégé

Create your data model:

Add for one patient all concepts you need as individuals
For sibling classes (e.g. ECOG0-4 are all subclasses of ECOG Performance Status): make sure that individuals of a class with siblings are also asserted as members of the parent class (otherwise the parent class is not visible later)
Assert individuals for all sibling classes (e.g. ECOG0-4) and assert again that each sibling class is a subClassOf the parent class
Link the individuals up with the properties you selected
State data properties and data types
OntoGraf might help with visualization at some point, but its output cannot be saved or exported, and it fails if you have too many properties. The image below was made after reducing the data model to only the properties under consideration: images/ontograf.PNG
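To make the typing pattern from the list above concrete, here is a small sketch that prints the kind of Turtle these assertions produce. The `ex:` namespace and every local name (`ECOG0`, `ECOGPerformanceStatus`, `hasPerformanceStatus`, `patient_example`, ...) are hypothetical placeholders, not the actual ROO IRIs. In practice Protégé writes these triples for you, using the real ROO and project IRIs.

```python
# Hypothetical namespace and local names; replace with the real ROO / project IRIs.
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix ex: <http://example.org/myproject#> .\n"
)


def typed_individual(name, sibling_class, parent_class):
    """Turtle for one individual asserted as both the sibling class and its parent."""
    return f"ex:{name} rdf:type ex:{sibling_class} , ex:{parent_class} .\n"


def link(subject, prop, obj):
    """Turtle for one object property assertion between two individuals."""
    return f"ex:{subject} ex:{prop} ex:{obj} .\n"


def build_model():
    """Assemble a tiny example model: one patient plus one individual per ECOG class."""
    parts = [PREFIXES, "\n"]
    for grade in range(5):
        parts.append(
            typed_individual(f"ecog{grade}_example", f"ECOG{grade}", "ECOGPerformanceStatus")
        )
    parts.append("ex:patient_example rdf:type ex:Patient .\n")
    parts.append(link("patient_example", "hasPerformanceStatus", "ecog0_example"))
    return "".join(parts)


if __name__ == "__main__":
    print(build_model())
```

Each ECOG individual carries two types, the specific sibling class and the parent class, which is what keeps the parent class visible in the later visualization.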

4. Load data model in a SPARQL endpoint and visualize using LD-VOWL

Upload your data model turtle file to a SPARQL endpoint (at the moment I can't get LD-VOWL to work on localhost)
Go to http://sparql.cancerdata.org/ld-vowl
Fill in your endpoint (e.g. http://sparql.cancerdata.org/namespace/gilles/sparql)
Click visualize
In settings:
- make sure you have number of classes at 100 (or more)
- include http://www.w3.org/2002/07/owl#NamedIndividual as a blacklisted Class (otherwise these will dominate in the visualization)
- include http://www.w3.org/2002/07/owl#Axiom as a blacklisted Class
Click stop, reload and check your data model
Alternatively, you can use LD-VOWL locally: (LD-VOWL)
Something like the picture below should appear: images/ld-vowl.PNG
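As a textual cross-check of the visualization, the classes in the endpoint can also be listed with a SPARQL query that skips the same two blacklisted classes. The sketch below only builds the query and the HTTP request, and does not send anything. The function names are made up for this example, and it assumes the endpoint supports queries over GET with a `query` parameter, as described in the SPARQL 1.1 Protocol.

```python
import urllib.parse
import urllib.request

# The two classes blacklisted in the LD-VOWL settings above.
BLACKLIST = [
    "http://www.w3.org/2002/07/owl#NamedIndividual",
    "http://www.w3.org/2002/07/owl#Axiom",
]


def class_count_query(blacklist=BLACKLIST):
    """SPARQL query counting instances per class, skipping blacklisted classes."""
    excluded = ", ".join(f"<{iri}>" for iri in blacklist)
    return (
        "SELECT ?class (COUNT(?s) AS ?instances)\n"
        "WHERE {\n"
        "  ?s a ?class .\n"
        f"  FILTER (?class NOT IN ({excluded}))\n"
        "}\n"
        "GROUP BY ?class\n"
        "ORDER BY DESC(?instances)\n"
    )


def build_query_request(endpoint, query):
    """Prepare (but do not send) a GET request carrying the query."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


if __name__ == "__main__":
    # Endpoint taken from the example above; replace with your own.
    request = build_query_request(
        "http://sparql.cancerdata.org/namespace/gilles/sparql", class_count_query()
    )
    print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would execute the query. Every class of the data model should then appear in the result with at least one instance.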

5. Make the D2RQ mapping and upload additional triples

Using the VOWL visualization, make a mapping file. An example can be found here: sample mapping
Dump the triples using D2RQ as Turtle (e.g. dump-rdf -o data.ttl -f TURTLE mapping.ttl) and upload them to the SPARQL endpoint. Example results can be found here: target triples.
Re-visualize and check (step 4) the data model with LD-VOWL
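The dump and upload can be scripted. The sketch below wraps the dump-rdf call shown above and prepares an HTTP POST of the resulting Turtle file. It rests on two assumptions: that dump-rdf is on the PATH (on Windows the D2RQ script may need to be called as dump-rdf.bat), and that your endpoint accepts a raw Turtle body posted with Content-Type text/turtle. Check the documentation of your triple store for its upload method. The function names are made up for this example.

```python
import subprocess
import urllib.request


def dump_rdf_command(mapping_file, output_file, executable="dump-rdf"):
    """Argument list equivalent to: dump-rdf -o data.ttl -f TURTLE mapping.ttl"""
    return [executable, "-o", output_file, "-f", "TURTLE", mapping_file]


def build_upload_request(endpoint, turtle_bytes):
    """Prepare a POST of Turtle data.

    Assumption: the endpoint accepts a raw text/turtle request body.
    """
    return urllib.request.Request(
        endpoint,
        data=turtle_bytes,
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


def dump_and_upload(mapping_file, output_file, endpoint):
    """Run D2RQ, then upload the dumped triples; returns the HTTP status code."""
    # check=True raises CalledProcessError if dump-rdf exits with an error.
    subprocess.run(dump_rdf_command(mapping_file, output_file), check=True)
    with open(output_file, "rb") as handle:
        request = build_upload_request(endpoint, handle.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call such as `dump_and_upload("mapping.ttl", "data.ttl", endpoint)` with your own endpoint URL would then cover the dump and upload in one go, after which the LD-VOWL check from step 4 can be repeated.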

6. Communicate artefacts with user and data provider

OWL: A project specific data model in OWL
LD-VOWL: A visualization of the project specific data model for non OWL experts
D2RQ, CSV and TURTLE: Source data, mapping, target triples