Annotating Files with Metadata using MOSS - LOD-GEOSS/databus-snippets GitHub Wiki

Annotating Databus Files with additional Metadata

During deployment, files are annotated only with minimal metadata, e.g. the author, the license, and file-specific information such as the shasum. To allow more extensive metadata about the content, the Databus MOSS was introduced. MOSS (Metadata Overlay Search System) allows files (identified by their Databus File Identifier, DFI) and other Databus entities, such as Collections and Groups, to be annotated with simple resources, and allows searching for those annotations. Additionally, you can submit (nearly) arbitrarily large graphs of metadata for a given Databus file. This page is a step-by-step tutorial on how to do that.

Annotating a simple resource tag (URI)

On the annotation page, any DFI can be annotated with one or more classes. The page contains a search across multiple ontologies to find a fitting class for the data. To annotate a DFI, paste it into the left text field and search for a fitting class in the right field. Clicking the refresh button shows which classes the file is currently annotated with. To make use of the annotation, head over to the search page, search for a class (as on the annotation page), and add it to the list. Then select Annotation instead of VOID under Search Type and hit search. This returns the file you just annotated.

Annotating an RDF graph

Step 1: Generating the Metadata

Although it is possible to submit metadata as raw JSON (MOSS transforms it to RDF), this is not recommended; the JSON should instead be transformed to JSON-LD. For this purpose, a JSON-LD context was written for the OEP metadata format. With a few simple steps (see this commit) and with the help of the context, the JSON metadata can be transformed to JSON-LD and therefore to RDF. A helpful tool for understanding what is happening is the JSON-LD playground.
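As a minimal sketch of the idea, the following Python snippet attaches a context reference to a plain JSON document so that it can be interpreted as JSON-LD. The context URL and the sample keys are placeholders invented for illustration; the actual OEP context and the exact steps are the ones in the linked commit.

```python
import json

# Hypothetical placeholder; replace with the URL of the real OEP JSON-LD context.
CONTEXT_URL = "https://example.org/oep-context.jsonld"


def to_jsonld(metadata: dict, context_url: str = CONTEXT_URL) -> dict:
    """Return a copy of the metadata with an "@context" entry added.

    If the input already has an "@context" key, its value is kept.
    """
    # Put "@context" first so it is easy to spot in the serialized file.
    return {"@context": context_url, **metadata}


# Made-up sample metadata, only for illustration.
sample = {"name": "example_table", "title": "Example table"}
jsonld_doc = to_jsonld(sample)

# Serialized form, e.g. for writing to ./metadata.jsonld.
jsonld_text = json.dumps(jsonld_doc, indent=2)
```

Pasting the resulting text into the JSON-LD playground is a quick way to check which keys the context actually maps to RDF properties.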

Step 2: Submitting the Metadata

Step 2A: Doing it manually

The first step is to get the file identifier of the Databus file. This URI can be found on a version page (example) by clicking on List or by copying the URI of the Download button next to the file. Then paste the metadata generated in Step 1 into the large text field on the MOSS submission page and enter the Databus File Identifier in the single-line field. (A click on the refresh button shows the RDF content already available for the given identifier.) A click on submit finishes the process and shows whether the submission worked.

Step 2B: Submission via API

Note: These URIs may change, so check this wiki again if something stops working.

The same process can be automated with an API PUT request. It requires the percent-encoded Databus identifier (File, Collection, Group, etc.) and the metadata content as RDF. The API can be called from any programming language; the examples below use curl and python3:

curl

This assumes that the metadata is stored in a file at ./metadata.jsonld. The Content-Type header must be set according to the RDF format, which for JSON-LD is application/ld+json.

curl -X PUT "http://moss.tools.dbpedia.org/annotation-api-demo/submit?id=https%3A%2F%2Fenergy.databus.dbpedia.org%2Fdenis%2Fcollections%2Fexample-collection" -H "Content-Type: application/ld+json" --upload-file metadata.jsonld

python3

import requests
from urllib.parse import quote

# it is assumed that the data is present as a python dict (or list)
metadata = {}

# the identifier to annotate
databus_identifier = "https://energy.databus.dbpedia.org/denis/collections/example-collection"

# generate the URI for the request with the encoded identifier
# (safe="" also encodes "/", matching the %2F in the curl example)
api_uri = f"http://moss.tools.dbpedia.org/annotation-api-demo/submit?id={quote(databus_identifier, safe='')}"

requests.put(api_uri, headers={"Content-Type": "application/ld+json"}, json=metadata)
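The PUT request can also be wrapped in a small helper that surfaces failures. The sketch below is a variant that needs only the Python standard library; the function names are made up for illustration, and the endpoint is the one used in the examples above, which may change.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URI taken from the examples on this page; it may change over time.
SUBMIT_ENDPOINT = "http://moss.tools.dbpedia.org/annotation-api-demo/submit"


def build_submit_request(databus_identifier: str, metadata) -> urllib.request.Request:
    """Build the PUT request without sending it."""
    # safe="" also encodes "/" so the identifier is a single query value.
    uri = f"{SUBMIT_ENDPOINT}?id={quote(databus_identifier, safe='')}"
    body = json.dumps(metadata).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/ld+json"},
    )


def submit_metadata(databus_identifier: str, metadata, timeout: float = 30.0) -> int:
    """Send the metadata and return the HTTP status code."""
    request = build_submit_request(databus_identifier, metadata)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Splitting request construction from sending makes it possible to inspect the encoded URI and headers before anything is submitted.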

Step 3: Accessing the Metadata

The metadata can then be accessed using the Databus Mods SPARQL endpoint. A simple query that retrieves all triples for a DFI (replace $DFI with the identifier) would be:

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>


SELECT DISTINCT ?s ?p ?o WHERE {
  
  GRAPH ?g {
    ?s ?p ?o .
    ?activity a <http://mods.tools.dbpedia.org/ns/demo#ApiDemoMod>; 
       prov:used <$DFI> .
  }
} 

A working example can be found here; a more practical example is a query that retrieves all the column information of a table.
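The query can also be filled in and sent programmatically. The following is a sketch under assumptions: the endpoint URL is not given on this page and must be passed in by the caller, the helper names are invented, and the request follows the standard SPARQL protocol (a `query` parameter via HTTP GET with JSON results), which the Mods endpoint is assumed to support.

```python
import json
import urllib.request
from string import Template
from urllib.parse import urlencode

# Same query as above, with $DFI as the placeholder to substitute.
QUERY_TEMPLATE = Template("""\
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT DISTINCT ?s ?p ?o WHERE {
  GRAPH ?g {
    ?s ?p ?o .
    ?activity a <http://mods.tools.dbpedia.org/ns/demo#ApiDemoMod> ;
       prov:used <$DFI> .
  }
}
""")


def build_query(dfi: str) -> str:
    """Insert the DFI into the query template."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in dfi for ch in '<>"{}|^`\\ '):
        raise ValueError(f"not a valid IRI: {dfi!r}")
    return QUERY_TEMPLATE.substitute(DFI=dfi)


def run_query(endpoint_url: str, query: str, timeout: float = 30.0) -> list:
    """Send the query via HTTP GET and return the result bindings."""
    url = f"{endpoint_url}?{urlencode({'query': query})}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]
```

Each returned binding is a dictionary keyed by variable name (`s`, `p`, `o`), with the actual term under its `value` key.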

Step 3.1: Searching the Metadata by String

It is also possible to search the Mods SPARQL endpoint for a certain string in the data. The query below is an example that searches for the term wind.

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>


SELECT DISTINCT ?di ?p ?o WHERE {
  
  GRAPH ?g {
    ?s ?p ?o .
    filter contains(lcase(str(?o)), "wind")
    ?activity a <http://mods.tools.dbpedia.org/ns/demo#ApiDemoMod>; 
       prov:used ?di .
  }
} 
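To search for other terms, the query can be generated from a template. The sketch below uses an invented helper name; it lowercases the term because the query compares against `lcase(...)`, so an upper-case search term would never match, and it escapes characters that would otherwise break the string literal.

```python
from string import Template

# Same search query as above, with $TERM as the placeholder.
SEARCH_TEMPLATE = Template("""\
PREFIX prov: <http://www.w3.org/ns/prov#>

SELECT DISTINCT ?di ?p ?o WHERE {
  GRAPH ?g {
    ?s ?p ?o .
    filter contains(lcase(str(?o)), "$TERM")
    ?activity a <http://mods.tools.dbpedia.org/ns/demo#ApiDemoMod> ;
       prov:used ?di .
  }
}
""")


def build_search_query(term: str) -> str:
    """Return the search query for the given term."""
    if "\n" in term or "\r" in term:
        raise ValueError("search term must be a single line")
    # The query lowercases ?o, so the term has to be lower case as well.
    needle = term.lower()
    # Escape backslashes and double quotes for a SPARQL string literal.
    needle = needle.replace("\\", "\\\\").replace('"', '\\"')
    return SEARCH_TEMPLATE.substitute(TERM=needle)


query = build_search_query("Wind")
```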