Example RDF Dump File Reconciliation

This example reconciles the Guardian Data Blog's list of the top 100 universities against an RDF dump file from data.nytimes.com.

Create an OpenRefine project from the CSV file exported from the Google spreadsheet that the Guardian Data Blog provides. The figure below shows a snippet of the data.

https://github.com/stkenny/grefine-rdf-extension/blob/main/website/files/dump/screenshots/universities.png

Define a new reconciliation service based on the RDF dump of organizations provided by the NY Times. Select "Based on RDF file..." from the RDF menu, as shown in the figure below.

https://github.com/stkenny/grefine-rdf-extension/blob/main/website/files/dump/screenshots/add-service.png

Enter the details of the new service. Pick a name for the service (this example uses "NYT organizations"), then choose the file and its format. Finally, select the properties that are used to label resources in the RDF data. The NYT organizations dump uses skos:prefLabel, so that property is selected in the figure below.

https://github.com/stkenny/grefine-rdf-extension/blob/main/website/files/dump/screenshots/details.png
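To illustrate what the label property is for, here is a minimal Python sketch that builds a label-to-URI lookup from skos:prefLabel triples. It is not the extension's implementation. The sample N-Triples lines, the example.org URIs, and the build_label_index helper are all hypothetical stand-ins for the real NYT dump.

```python
import re

SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Hypothetical N-Triples lines standing in for the NYT organizations dump.
SAMPLE_NT = """
<http://example.org/org/1> <http://www.w3.org/2004/02/skos/core#prefLabel> "Harvard University"@en .
<http://example.org/org/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> .
<http://example.org/org/2> <http://www.w3.org/2004/02/skos/core#prefLabel> "Yale University"@en .
"""

# Matches only triples whose object is a plain or language-tagged literal.
TRIPLE = re.compile(r'^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"(?:@[\w-]+)?\s*\.$')


def build_label_index(ntriples, label_property=SKOS_PREF_LABEL):
    """Map each lower-cased label to the URIs of the resources that carry it."""
    index = {}
    for line in ntriples.splitlines():
        m = TRIPLE.match(line.strip())
        if m and m.group(2) == label_property:
            subject, _, label = m.groups()
            index.setdefault(label.lower(), []).append(subject)
    return index


label_index = build_label_index(SAMPLE_NT)
print(label_index["harvard university"])
```

Only triples that use the chosen label property contribute to the lookup, which is why picking the property that the dump actually uses (here skos:prefLabel) matters.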

Choose "Start reconciling..." from the drop-down menu of the "University" column, then select the "NYT organizations" service that was just added. As shown below, type guessing suggests a list of types that includes skos:Concept.

https://github.com/stkenny/grefine-rdf-extension/blob/main/website/files/dump/screenshots/reconciliation.png

Click the "Start reconciling" button. After a while, OpenRefine presents the reconciliation results, along with facets for the reconciliation decisions and the top candidates' scores (see figure below).

https://github.com/stkenny/grefine-rdf-extension/blob/main/website/files/dump/screenshots/results.png
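The idea of candidate scores and automatic matches can be sketched in Python as follows. This is an illustration only: it uses difflib string similarity from the standard library, which is an assumption and not necessarily how the extension scores candidates. The candidate labels, the example.org URIs, and the auto_match_threshold parameter are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical candidate labels and URIs; not taken from the NYT dump.
CANDIDATES = {
    "Harvard University": "http://example.org/org/1",
    "Yale University": "http://example.org/org/2",
    "Princeton University": "http://example.org/org/3",
}


def score(query, label):
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, query.lower(), label.lower()).ratio()


def reconcile(query, candidates=CANDIDATES, auto_match_threshold=1.0):
    """Rank candidates by score; auto-match only if the best score reaches the threshold."""
    ranked = sorted(
        ((score(query, label), label, uri) for label, uri in candidates.items()),
        reverse=True,
    )
    best_score, _, best_uri = ranked[0]
    matched = best_uri if best_score >= auto_match_threshold else None
    return matched, ranked


exact_match, _ = reconcile("Harvard University")
partial_match, partial_ranked = reconcile("Harvard Univ")
print(exact_match)           # matched automatically
print(partial_match)         # None: needs manual confirmation
print(partial_ranked[0][1])  # best suggestion for the partial query
```

In this sketch, an exact label is matched automatically, while an abbreviated one is left unmatched with a ranked list of suggestions to review by hand.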

The remaining task is to go through the results and confirm the correct suggestions that were not matched automatically. You can preview each reconciliation suggestion to inform your decision (see figure below).

https://github.com/stkenny/grefine-rdf-extension/blob/main/website/files/dump/screenshots/preview.png

To get the reconciled URIs in the RDF exporter, use the expression cell.recon.match.id.
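The expression walks from a cell to its reconciliation record, then to the confirmed match, then to that match's identifier. The Python sketch below models that path with hypothetical Cell, Recon, and Match classes (they are not OpenRefine's own classes) to show why only matched cells yield a URI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    id: str  # URI of the matched resource


@dataclass
class Recon:
    match: Optional[Match] = None  # set only once a candidate is confirmed


@dataclass
class Cell:
    value: str
    recon: Optional[Recon] = None  # None if the cell was never reconciled


def reconciled_uri(cell):
    """Python analogue of cell.recon.match.id, returning None for unmatched cells."""
    if cell.recon is None or cell.recon.match is None:
        return None
    return cell.recon.match.id


# Hypothetical cells: one matched, one reconciled but not yet confirmed.
matched_cell = Cell("Harvard University", Recon(Match("http://example.org/org/1")))
unmatched_cell = Cell("Some Other University", Recon())

print(reconciled_uri(matched_cell))
print(reconciled_uri(unmatched_cell))
```

Cells whose suggestions have not been confirmed carry no match, so confirm them first if their URIs should appear in the exported RDF.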