How to use the ROO to create a project specific linked data model - RadiationOncologyOntology/ROO GitHub Wiki
The ROO is an ontology, but an ontology alone is not sufficient to create linked data. The ROO can be implemented in many equally valid ways, but if two data providers choose different implementations, the data they provide is not (properly) linked data. This page describes one approach to creating a project specific data model from the ROO.
What is being delivered
- A project specific data model in OWL
- A visualization of the project specific data model for non OWL experts
- A mapping file for D2R for a CSV data file
The running example here is a request by a collaborator to use the Varian Learning Portal to learn and validate a model for non-small-cell-lung-cancer.
Prerequisites
- JDBC driver for csv: (csvjdbc)
- Database to RDF mapping engine (D2RQ)
- JDBC driver for csv jar included in the lib\db-drivers folder of D2RQ
- Protégé 5.0.0 (Protégé). Note that Protégé 5.2.0 has a bug when saving ttl files and cannot be used.
- A public SPARQL endpoint available for uploading
1. Understand the data and get a dummy set / sample
The user(s) need to define which data they want to make available as linked data. Data elements, units, ranges and context (e.g. that the research is done in a selected patient group) need to be defined. In our example the request was for the following data elements:
For patients with non-small-cell-lung-cancer:
- Mean Heart Dose (gray)
- Volume of the PTV (ml)
- Survival at one year (true or false)
- ECOG performance status (ECOG 0-4)
Make or ask for a csv file with some (fake) data that covers the agreed ranges. This file is used later to develop and demonstrate the mapping. An example can be found here: sample data.
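If no sample file is available, a small script can generate one. The following is a minimal sketch, not the sample file referred to above: the column names and the numeric ranges are assumptions made for illustration and should be replaced by whatever was agreed with the user.

```python
import csv
import io
import random

# Hypothetical column names; the real sample file may use different ones.
FIELDS = ["patient_id", "mean_heart_dose_gy", "ptv_volume_ml", "survival_1y", "ecog"]


def make_fake_rows(n=10, seed=42):
    """Return n fake patient rows that cover the value ranges."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    rows = []
    for i in range(n):
        rows.append({
            "patient_id": f"P{i + 1:03d}",
            # Assumed ranges, for illustration only.
            "mean_heart_dose_gy": round(rng.uniform(0.0, 40.0), 1),
            "ptv_volume_ml": round(rng.uniform(10.0, 1000.0), 1),
            "survival_1y": rng.choice(["true", "false"]),
            # Cycle through 0-4 so every ECOG class appears at least once.
            "ecog": i % 5,
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(make_fake_rows()))
```

Cycling the ECOG value instead of drawing it at random guarantees that each of the five classes occurs in even a very small file, which matters later when the mapping for each class is checked.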
2. Create a copy of the ROO, add project specific entities
- Using Protégé load the ROO
- Save the ROO as a new ontology using the Turtle format (Protégé seems to handle individuals better that way) e.g. [myproject].ttl
- Change the ontology IRI to a project specific one e.g. http://www.cancerdata.org/roo/datamodel/project/[myproject]. Do not change the ROO entities (say no to this question that Protégé asks).
- In the File->Preferences menu select Active ontology IRI and User Supplied Name (this makes visualization later easier)
- Add project specific entities (ones that are not yet covered by the ROO)
3. Create data model through asserting individuals in Protégé
Create your data model:
- Add for one patient all concepts you need as individuals
- For sibling classes (e.g. ECOG0-4 are all subclasses of ECOG Performance Status): for every individual whose class has siblings, also assert the parent class as a type (otherwise the parent class is not visible later)
- Assert individuals for all sibling classes (e.g. ECOG0-4) and re-assert that each sibling class is a subClassOf the parent class
- Link the individuals up with the properties you selected
- State data properties and data types
- OntoGraf can help with visualization, but its output cannot be saved or exported, and it fails when there are too many properties. The image below was made after reducing the data model to only the properties under consideration: images/ontograf.PNG
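The assertions for sibling classes described above can be written out as plain triples to make them concrete. This is only a sketch: every IRI below is a hypothetical placeholder and not a real ROO identifier, and in practice the assertions are made in Protégé, not in code.

```python
# All IRIs are hypothetical placeholders, NOT real ROO identifiers.
EX = "http://example.org/myproject#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def ecog_triples():
    """Build the triples for the five ECOG sibling classes.

    For each sibling class we state:
      1. the class is a subClassOf the parent class,
      2. one example individual is a member of the sibling class,
      3. the same individual is also a member of the parent class.
    """
    parent = EX + "ECOGPerformanceStatus"
    triples = []
    for grade in range(5):
        cls = f"{EX}ECOG{grade}"
        individual = f"{EX}ecog{grade}_example"
        triples.append((cls, RDFS_SUBCLASS_OF, parent))
        triples.append((individual, RDF_TYPE, cls))
        triples.append((individual, RDF_TYPE, parent))
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(ecog_triples()))
```

Typing each individual with both its own class and the parent class is what keeps the parent class visible in the later visualization.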
4. Load data model in a SPARQL endpoint and visualize using LD-VOWL
- Upload your data model turtle file to a SPARQL endpoint (at the moment I can't get LD-VOWL to work on localhost)
- Go to http://sparql.cancerdata.org/ld-vowl
- Fill in your endpoint (e.g. http://sparql.cancerdata.org/namespace/gilles/sparql)
- Click visualize
- In settings:
- Set the number of classes to 100 (or more)
- Add http://www.w3.org/2002/07/owl#NamedIndividual as a blacklisted class (otherwise these will dominate the visualization)
- Add http://www.w3.org/2002/07/owl#Axiom as a blacklisted class
- Click stop, reload and check your data model
- Alternatively, you can run LD-VOWL locally (LD-VOWL). Something like the picture below should appear: images/ld-vowl.PNG
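If the visualization looks incomplete, it can help to ask the endpoint directly which classes have instances. The sketch below is an optional cross-check and not part of LD-VOWL. It assumes that the endpoint accepts a standard SPARQL protocol GET request with a `query` parameter and can return JSON results; the function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Count instances per class, skipping the two classes blacklisted in LD-VOWL.
QUERY = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?class (COUNT(?s) AS ?n)
WHERE {
  ?s a ?class .
  FILTER (?class != owl:NamedIndividual && ?class != owl:Axiom)
}
GROUP BY ?class
ORDER BY DESC(?n)
"""


def build_query_url(endpoint, query=QUERY):
    """Return the GET URL for a SPARQL protocol query request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def class_counts(endpoint):
    """Run the query and return {class IRI: instance count}.

    Needs network access to a live endpoint, so it is not called here.
    """
    request = urllib.request.Request(
        build_query_url(endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return {
        row["class"]["value"]: int(row["n"]["value"])
        for row in data["results"]["bindings"]
    }


if __name__ == "__main__":
    # Print the request URL only; replace the endpoint with your own.
    print(build_query_url("http://example.org/sparql"))
```

A parent class such as ECOG Performance Status that is missing from the result of `class_counts` points to a missing type assertion in step 3.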
5. Make the D2RQ mapping and upload additional triples
- Using the VOWL visualization make a mapping file. Example can be found here: sample mapping
- Dump the triples as Turtle using D2RQ (e.g. dump-rdf -o data.ttl -f TURTLE mapping.ttl) and upload them to the SPARQL endpoint. Example results can be found here: target triples.
- Re-visualize and check (step 4) the data model with LD-VOWL
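To reason about what the mapping should produce, it can be useful to write out the expected triples for one CSV row by hand. The sketch below does this in plain Python. It is not D2RQ and does not use the D2RQ mapping language; the base IRI, the property names and the column names are all hypothetical and stand in for the entities chosen in the project data model.

```python
import csv
import io

# Hypothetical base IRI and property names, for illustration only.
BASE = "http://example.org/myproject/"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# One-row sample with assumed column names.
SAMPLE = "patient_id,mean_heart_dose_gy,ecog\nP001,12.5,1\n"


def row_to_triples(row):
    """Return the N-Triples lines expected for a single CSV row."""
    patient = f"<{BASE}patient/{row['patient_id']}>"
    dose = f"<{BASE}patient/{row['patient_id']}/mean_heart_dose>"
    ecog = f"<{BASE}ecog/{row['ecog']}>"
    return [
        f"{patient} <{BASE}has_dose> {dose} .",
        f'{dose} <{BASE}has_value> "{row["mean_heart_dose_gy"]}"^^<{XSD_DOUBLE}> .',
        f"{patient} <{BASE}has_ecog> {ecog} .",
    ]


def csv_to_triples(text):
    """Expand every row of the CSV text into its expected triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        triples.extend(row_to_triples(row))
    return triples


if __name__ == "__main__":
    print("\n".join(csv_to_triples(SAMPLE)))
```

Comparing a hand-written expectation like this with the D2RQ dump for the same row is a quick way to spot a wrong column name or a missing datatype in the mapping file.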
6. Communicate artefacts with user and data provider
- OWL: A project specific data model in OWL
- LD-VOWL: A visualization of the project specific data model for non OWL experts
- D2RQ, CSV and TURTLE: Source data, mapping, target triples