Getting Started with Inscription Data in the CKG and Wikidata

The first part of the tutorial introduced you to the data provided by the Culture Knowledge Graph (CKG). We dived into the domain of inscriptions, working with data from the project "Die Deutschen Inschriften" (DIO), and created a plot with freely available online tools. In this second part, we show how to work with authority data using a Facepager tool chain. We will get labels from Wikidata to compare inscriptions from different regions. Look at the following plot: can you spot where the oldest objects are found?

Inscriptions by date and region, three examples

Each object in the Culture Knowledge Graph is enriched with keywords. The objects from the DIO project all contain at least the aat:300028702 identifier from the Art & Architecture Thesaurus (AAT), which classifies the object as an inscription. In addition, Wikidata identifiers are used for object types such as facade and for locating the object, for example in Schaumburg.

Locations can be retrieved by following the predicates cto:CTO_0001010 (has related organization) and cto:CTO_0001011 (has related location). Both link to external identifiers represented by nfdicore:NFDI_0001006 (has external identifier).

In the following examples, we will follow the predicates to the external identifiers, query the locations and finally get labels for a visualisation from Wikidata.

How old are the inscriptions in different urban municipalities?

The distribution of creation periods of all inscriptions in the DIO project was determined in the first part of this tutorial. In a next step we will break that down further by differentiating between the urban municipalities covered by the project.

To run the queries from this part of the tutorial, we will use Facepager. With Facepager you can fetch publicly available data from YouTube, Twitter and other websites and APIs such as Wikidata or the Culture Knowledge Graph. See the Facepager page for download instructions and help materials. The wiki contains a getting started tutorial for using Facepager for SPARQL queries. Check it out!

Get data from the CKG

First, after opening Facepager, create a database (New Database in Menu Bar). Next, add a node named dio (Add Nodes in Menu Bar). Now, go to the presets section (Presets in Menu Bar) and select the first preset from Knowledge Graph>Culture Knowledge Graph (Get GND IDs of Ferdinand Gregorovius's addressees).

Replace the example query in the Query View with the following query:

PREFIX cto: \<https://nfdi4culture.de/ontology/\> 
PREFIX nfdicore: \<https://nfdi.fiz-karlsruhe.de/ontology/\>  
PREFIX xsd: \<http://www.w3.org/2001/XMLSchema#\> 
PREFIX n4c: \<https://nfdi4culture.de/id/\>

SELECT ?beginDate25 ?category ( COUNT(?object) AS ?numberOfObjects ) WHERE {

  ?object a cto:CTO_0001005 ;                        # limit to source items
          cto:CTO_0001006 n4c:E6569 ;                # limit to the DIO project
          cto:CTO_0001073 ?creationPeriod ;          # get the creation period
          ?predicate ?node .                         # match any predicate and its target

  # retrieve associated areas
  FILTER(?predicate IN (cto:CTO_0001010, cto:CTO_0001011)) .  # limit to organizations and locations
  ?node nfdicore:NFDI_0001006 ?category .                     # get external identifier

  # transform dates
  BIND(STRBEFORE(STR(?creationPeriod), "-") AS ?beginDate)  # extract the begin date
  BIND(xsd:integer(?beginDate) AS ?beginDateInt)            # convert to a number
  BIND(ROUND(?beginDateInt / 25) * 25 AS ?beginDate25)      # round to 25 years

}
GROUP BY ?beginDate25 ?category 
ORDER BY ASC(?beginDate25)
LIMIT 5000

Instead of simply retrieving dates, we added a column ?category to the SELECT clause and group by it. As a consequence, we get the number of objects for each period within each category. The category, in this case, is filled with locations. The locations are assembled in the WHERE clause by a FILTER expression that combines the two paths to location data explained above. You could use other categories, such as object types, as well.
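To see what the date transformation and the grouping compute, the steps can be mirrored outside SPARQL. The following is a minimal Python sketch, not part of the tutorial's tool chain. The sample rows, the placeholder location names and the assumption that a creation period starts with the begin year followed by a hyphen are made up for illustration.

```python
import math
from collections import Counter


def begin_date_25(creation_period: str) -> int:
    """Mirror the three BIND steps of the query for one creation period."""
    begin_date = creation_period.split("-", 1)[0]  # STRBEFORE(..., "-")
    begin_date_int = int(begin_date)               # xsd:integer(?beginDate)
    # ROUND(?beginDateInt / 25) * 25, with halves rounded up
    return math.floor(begin_date_int / 25 + 0.5) * 25


# Hypothetical sample rows: (creation period, location identifier)
rows = [
    ("1480-1500", "location-A"),
    ("1485-1500", "location-A"),
    ("1490-1510", "location-A"),
    ("1480-1500", "location-B"),
]

# GROUP BY ?beginDate25 ?category with COUNT(?object)
counts = Counter((begin_date_25(period), category) for period, category in rows)

for (period_start, category), number_of_objects in sorted(counts.items()):
    print(period_start, category, number_of_objects)
```

In this sample, the first two rows fall into the same 25-year bin (1475) for the same location, so they are counted together, just as the GROUP BY clause does in the query.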

If no limit is set, Facepager automatically adds a LIMIT clause (LIMIT 100) at the end of a SPARQL query when fetching data. As we expect more than 100 result rows, we have to set a higher limit manually.

One strength of Facepager is that queries can dynamically include data of nodes from the Nodes View during fetching. We will use this feature in the next step. For now, please note: such placeholders are enclosed in angle brackets (e.g. <Object ID>). Thus, angle brackets are reserved characters when working with Facepager. All angle brackets in the query that are not part of a placeholder must be escaped with a backslash. This is why the angle brackets in the prefix declarations are preceded by backslashes.
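The interplay of placeholders and escaped angle brackets can be illustrated with a small Python sketch. This is a simplified model for illustration only, not Facepager's actual implementation: the function name render_query, the flat node dictionary and the regular expression are assumptions.

```python
import re


def render_query(template, node):
    """Fill <key> placeholders from a node, then unescape \\< and \\>."""
    def substitute(match):
        return str(node[match.group(1)])

    # A placeholder is <key> whose opening bracket is not preceded by a backslash
    filled = re.sub(r"(?<!\\)<([^<>\\]+)>", substitute, template)
    # Escaped brackets become literal angle brackets in the query that is sent
    return filled.replace("\\<", "<").replace("\\>", ">")


template = (
    r"PREFIX rdfs: \<http://www.w3.org/2000/01/rdf-schema#\>" "\n"
    r"SELECT * WHERE { \<<category.value>\> rdfs:label ?categoryLabel }"
)

# Hypothetical node data, keyed by the placeholder name
node = {"category.value": "http://www.wikidata.org/wiki/Q3852"}

query = render_query(template, node)
print(query)
```

The escaped brackets around the prefix IRI survive as plain angle brackets, while the unescaped pair around category.value is replaced by the node's value.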

To fire up the request, select the dio node in the Nodes View and press the "Fetch data" button (in the Query Setup section).

The resulting table contains links to Wikidata identifiers for locations attached to the inscriptions in the DIO project. Click a row and see the Data View to the right. Revealing what's behind a Wikidata identifier requires following the link in the category.value entry, which leads you to the corresponding Wikidata entry. But this gets tiresome as soon as you have many Wikidata identifiers in the table. It would be helpful to have an additional column in the resulting table with the labels for the Wikidata identifiers. For example, Worms reads better than http://www.wikidata.org/wiki/Q3852, right? This is where Facepager comes into play.

Get labels of Wikidata identifiers with Facepager

In the next step, we will get the labels for the Wikidata identifiers returned above by querying the Wikidata SPARQL endpoint with Facepager.

Load the first Wikidata preset ("Get gender of solo artist/band members") from the Knowledge Graph presets. Replace the query in the Query View with the following query:

PREFIX rdfs: \<http://www.w3.org/2000/01/rdf-schema#\>

# wd: and wdt: are predefined by the Wikidata Query Service
SELECT * WHERE {
  \<<category.value>\> rdfs:label ?categoryLabel .
  FILTER(LANG(?categoryLabel) = "en") .
  \<<category.value>\> wdt:P31 wd:Q42744322 . # limit to urban municipality in Germany
}

Instead of fetching the labels from Wikidata row by row, Facepager can run the query over all rows at once. Select the first row of the results from the previous query in the Nodes View. Then, select Select all nodes in the Query Setup settings and fetch data. This might take a while.

If the query was successful, you should see a new level of rows in the Nodes View of Facepager and the text "fetched (200)" in the Query status column.

The query above fetches the labels of all Wikidata identifiers that are instances of an urban municipality in Germany. It uses a placeholder (<category.value>) as described in the previous section. As a result, Facepager sends the query to Wikidata multiple times, once for each row returned by the earlier CKG query. Each time the query is sent, <category.value> is replaced by the category.value entry of the respective row in the Nodes View (e.g. http://www.wikidata.org/wiki/Q3852). The placeholder has to be wrapped in escaped angle brackets so that Wikidata interprets the inserted value as an IRI.

Click one result and inspect the data in the top right pane. Wikidata returns the labels and additional information, e.g. the type of values like "literal".

To see just the values in the Nodes View, adjust the column settings. Here is a suggestion:

beginDate25=beginDate25.value
numberOfObjects=numberOfObjects.value
category=category.value
categoryLabel=categoryLabel.value

This will change which data from the Wikidata results will be displayed in the Nodes View. You can now see the labels for the Wikidata identifiers relating to urban municipalities. Next we will visualise the result.
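How a column setting such as categoryLabel=categoryLabel.value picks a value from the returned data can be sketched in a few lines of Python. The sample binding below follows the general shape of SPARQL JSON results (each variable maps to an object with type and value), but its values are hypothetical. The function extract_columns is an illustrative simplification, and Facepager's column settings may support more than this sketch covers.

```python
def extract_columns(node_data, column_specs):
    """Resolve 'name=dotted.path' specs against a nested dictionary."""
    row = {}
    for spec in column_specs:
        name, _, path = spec.partition("=")
        value = node_data
        for key in path.split("."):
            # Walk one level down; give up with None if the path does not exist
            value = value.get(key) if isinstance(value, dict) else None
        row[name] = value
    return row


# Hypothetical data of one row, shaped like a SPARQL JSON result binding
node_data = {
    "beginDate25": {"type": "literal", "value": "1475"},
    "numberOfObjects": {"type": "literal", "value": "12"},
    "category": {"type": "uri", "value": "http://www.wikidata.org/wiki/Q3852"},
    "categoryLabel": {"type": "literal", "xml:lang": "en", "value": "Worms"},
}

column_specs = [
    "beginDate25=beginDate25.value",
    "numberOfObjects=numberOfObjects.value",
    "category=category.value",
    "categoryLabel=categoryLabel.value",
]

row = extract_columns(node_data, column_specs)
print(row)
```

Each column name on the left of the equals sign ends up holding only the value part of the binding, which is what makes the table readable.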

Side note: Why Facepager?

Instead of using Facepager, we could have run the queries on the endpoints directly. If a SPARQL endpoint supports federated queries, you can directly retrieve data from external knowledge graphs such as Wikidata with just one query. So why use Facepager?

Federated queries over large amounts of data require additional resources on the server and are likely to time out. So we need another tool chain.

Facepager helps to chain data retrieval. We send one query to the CKG SPARQL endpoint, followed by multiple queries to the Wikidata SPARQL endpoint.

Look at the Wikidata query from above. If we ran this query directly on the Wikidata SPARQL endpoint, we would have to run it over 1000 times, each time replacing the URL with the respective Wikidata identifier. Facepager saves us that work by dynamically replacing placeholders with values from the specified nodes.

As the CKG query from this tutorial doesn't contain placeholders and is only executed once, it is not strictly necessary to use Facepager for it. But the CKG query's results are used as input for the Wikidata query. By using Facepager from the beginning, everything is handled in one place, which simplifies data documentation and makes the data collection reproducible.

Visualisation with RAWGraphs

In Facepager select the "dio" node on top of the Nodes View. Export the data (Export Data in the Menu Bar) with the following settings:

  • Export mode: Selected nodes in wide format (...)
  • Object types: data
  • Level: 2

Go to RAWGraphs, click Upload your data and load the exported CSV file. Set the column separator to semicolon.

When looking at the file preview in RAWGraphs, you will notice a lot of columns, many of them empty. This is an effect of exporting in wide format. Nodes in Facepager are structured hierarchically. The labels of interest are contained in level 2 nodes. The corresponding dates and the number of objects are contained in their parent nodes (level 1 nodes). For the visualisation we need all that information in one line per node. The wide format ensures that all exported nodes (in our case the level 2 nodes) contain all data from their ancestor nodes. Hence all columns are repeated for all parent levels.
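A small Python sketch may make the wide format more concrete. It is a simplified model, not Facepager's export code: the nested parents structure and its values are hypothetical, and only the lvl_1_ and lvl_2_ column prefixes and the semicolon separator are taken from this tutorial.

```python
import csv
import io


def to_wide_rows(parents):
    """Emit one row per level 2 node, repeating the data of its level 1 parent."""
    rows = []
    for parent in parents:
        for child in parent["children"]:
            row = {f"lvl_1_{key}": value for key, value in parent["data"].items()}
            row.update({f"lvl_2_{key}": value for key, value in child["data"].items()})
            rows.append(row)
    return rows


# Hypothetical nodes: level 1 holds dates and counts, level 2 holds the labels
parents = [
    {
        "data": {"beginDate25": "1475", "numberOfObjects": "12"},
        "children": [{"data": {"categoryLabel": "Worms"}}],
    },
    {
        "data": {"beginDate25": "1500", "numberOfObjects": "7"},
        "children": [{"data": {"categoryLabel": "Worms"}}],
    },
]

wide_rows = to_wide_rows(parents)

# Write the rows as semicolon-separated CSV into an in-memory buffer
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(wide_rows[0].keys()), delimiter=";")
writer.writeheader()
writer.writerows(wide_rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Each output line now carries the date and count from level 1 next to the label from level 2, which is the one-line-per-node shape the chart needs.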

Fair enough, it's not super easy to understand. Hopefully, you will see that it makes sense once you have created the graph.

  1. Choose Bar chart as plot type.
  2. Adjust the mapping by dragging the columns into the aesthetics. Set Bars to lvl_1_beginDate25, Size to lvl_1_numberOfObjects (Sum) and Series to lvl_2_categoryLabel.
  3. Customize the plot. In the artboard section, increase the width to 1200px and height to 10000px. In the series section, disable the fixed scale and set number of columns to 1. Play around with the other options.
  4. Export: Choose a filename and export the plot as an SVG file.

When interpreting the figure, keep in mind that the subplots are differently scaled (compare e.g. Landshut and Bamberg).

What's next?

You can adapt the category and analyse, for example, the object types instead of locations. To get you started, note that identifiers for object types of DIO items are stored in the CKG in two ways:

a) directly, with the predicate cto:CTO_0001026 (has external classifier) or
b) in a nested structure, with the predicate cto:CTO_0001012 (has related event), which in turn is connected to nfdicore:NFDI_0001006 (has external identifier).

We suggest you first focus on one of them. Later you can combine them with a FILTER expression as laid out in the above tutorial or, alternatively, with a UNION query.

Funded by the German Research Foundation (DFG) - 441958017.

