Getting Started with Inscription Data in the CKG
This first part of the tutorial introduces you to the data and guides you to creating plots with freely available online tools. In the second part, we show how to get additional data from Wikidata using Facepager.
What is the Culture Knowledge Graph?
The Culture Knowledge Graph (CKG) represents a network of metadata from cultural heritage projects. It "aims to be a connector for all research data produced within the NFDI4Culture research landscape, improving the findability, accessibility, interoperability and reusability of cultural heritage data within the 4Culture Domain" (Source: NFDI4Culture).
The CKG contains Linked Open Data, so anyone can query it. For this purpose, NFDI4Culture provides a SPARQL endpoint, which we will use for some of the following queries. Don't know what SPARQL is, or want to refresh your knowledge? Have a look at the Getting Started with SPARQL tutorial from Facepager. You only need to read up to the part where Facepager comes into play. We will come to Facepager in the second part of this tutorial.
Already familiar with SPARQL? Then let's head over to the CKG SPARQL endpoint and start exploring the CKG. All of the following queries are to be pasted into the text field. Run a query by clicking the play symbol in the upper right corner of the text field, and download the results with the download button (second symbol from the right, below the text field).
Which projects are represented in the CKG?
Let's start our CKG journey by finding out which projects have data in the CKG. Just replace the example query on the CKG SPARQL endpoint with the following query and run it.
PREFIX schema: <http://schema.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?sub ?subLabel WHERE {
  ?sub a schema:DataFeed .            # every data feed is one project
  OPTIONAL {
    ?sub rdfs:label ?subLabel .       # fetch the label if there is one
    OPTIONAL { FILTER(LANG(?subLabel) = "en") . }
  }
}
The query returns all schema:DataFeed items in the CKG. Those are our projects. Each project has an identifier and we get labels using the OPTIONAL statement. Have a look at the resulting table. One project you will see is "Inscriptions metadata from Deutsche Inschriften Online/Epigraf". That's the DIO project. In the following we will find out more about the data in this particular project.
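If the table of projects is long, a label filter can narrow it down. The following is a minimal sketch, not part of the original tutorial: it assumes that the search term "inschriften" occurs in the project label quoted above and uses the standard SPARQL functions STR, LCASE and CONTAINS for a case-insensitive match.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?sub ?subLabel WHERE {
  ?sub a schema:DataFeed ;            # only projects
       rdfs:label ?subLabel .         # here the label is required, not optional
  # keep labels that contain the search term, ignoring upper and lower case
  FILTER(CONTAINS(LCASE(STR(?subLabel)), "inschriften"))
}
```

The ?sub column then holds the identifier of each matching project. Replace the search term to look up other projects.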
Which data does the CKG store from the DIO project?
The German Inscriptions project collects all Latin and German inscriptions from the Middle Ages up to 1650 in Germany, Austria and South Tyrol. The inscriptions are published in volumes. Each volume covers a specific geographical area, e.g. Mainz. The DIO project digitizes those volumes and publishes the data online.
The data of the DIO project included in the CKG is a selection of all data available on the DIO website. This is a design choice made by the CKG developers. The CKG doesn't aim to contain as much data as possible about items like inscriptions. It merely stores metadata and contains links to the original items, where more information might be found.
So which data does the CKG store for inscriptions from the DIO project? Let's look at an inscription on the facade of the city church of Bückeburg as an example. The following data can be found in the CKG:
- The name of the inscription (Fassade)
- The date the inscription was created (1613)
- The AAT identifier aat:300028702, which means that it's an inscription
- The Wikidata identifier wd:Q183061 for the object type (facade)
- The Wikidata identifier wd:Q5966 for the region (Schaumburg)
Content data, such as the text of the inscription and its translation, are not included in the CKG. They can be retrieved by following the IRIs in the triples.
To query the CKG one needs vocabulary from NFDI4Culture ontologies. The queries for the first part of this tutorial will use the following vocabulary from those ontologies:
| Predicate/Object | Meaning | Comment |
|---|---|---|
| cto:CTO_0001005 | source item | All inscription articles of the DIO project are labeled as source items in the NFDI4Culture Ontology. |
| cto:CTO_0001006 | is referenced in | refers to the project, e.g. DIO project |
| cto:CTO_0001073 | has creation period | time span during which the inscription was created |
| n4c:E6569 | Inscriptions metadata from Deutsche Inschriften Online/Epigraf | NFDI4Culture identifier for the DIO project in the NFDI4Culture Metadata API. |
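As a minimal sketch of how the entries in this table combine, the following query counts the source items that are referenced in the DIO project. It uses only the identifiers listed above together with the cto: and n4c: prefixes from the queries in this tutorial. The variable name ?numberOfArticles is chosen for illustration, and the count depends on the current state of the CKG.

```sparql
PREFIX cto: <https://nfdi4culture.de/ontology/>
PREFIX n4c: <https://nfdi4culture.de/id/>
SELECT (COUNT(DISTINCT ?sub) AS ?numberOfArticles) WHERE {
  ?sub a cto:CTO_0001005 ;            # source item
       cto:CTO_0001006 n4c:E6569 .    # is referenced in the DIO project
}
```

COUNT(DISTINCT ...) is used so that an article is counted only once, even if the pattern matches it several times.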
How old are the inscriptions?
Run the following query on the CKG SPARQL endpoint:
PREFIX cto: <https://nfdi4culture.de/ontology/>
PREFIX n4c: <https://nfdi4culture.de/id/>
SELECT ?sub ?creationPeriod WHERE {
  ?sub a cto:CTO_0001005 ;                # limit results to source item/ information content entity
       cto:CTO_0001006 n4c:E6569 ;        # limit results to the DIO project
       cto:CTO_0001073 ?creationPeriod .  # get the creation period
}
LIMIT 100
The result should be a list of 100 DIO articles with their respective creation periods. Creation periods in the DIO project are given in ISO 8601 time interval format. As there are over 20,000 articles with creation periods in the DIO project, we recommend using a LIMIT clause.
In the next step, we query all the objects and group them into 25-year slices. Because the query then returns only one row per slice, the LIMIT clause is no longer needed.
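If you want to look at more than the first 100 rows without requesting everything at once, one common approach is paging with ORDER BY, LIMIT and OFFSET. The following sketch is not part of the original tutorial; the page size of 100 is an arbitrary choice.

```sparql
PREFIX cto: <https://nfdi4culture.de/ontology/>
PREFIX n4c: <https://nfdi4culture.de/id/>
SELECT ?sub ?creationPeriod WHERE {
  ?sub a cto:CTO_0001005 ;
       cto:CTO_0001006 n4c:E6569 ;
       cto:CTO_0001073 ?creationPeriod .
}
ORDER BY ?sub    # a fixed order keeps the pages from overlapping
LIMIT 100        # page size
OFFSET 100       # skip the first page; use 200, 300, ... for later pages
```

Without ORDER BY, the endpoint is free to return rows in any order, so consecutive pages could repeat or miss articles.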
PREFIX cto: <https://nfdi4culture.de/ontology/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX n4c: <https://nfdi4culture.de/id/>
SELECT ?beginDate25 (COUNT(?object) AS ?numberOfObjects) WHERE {
  ?object a cto:CTO_0001005 ;
          cto:CTO_0001006 n4c:E6569 ;
          cto:CTO_0001073 ?creationPeriod .
  BIND(STRBEFORE(STR(?creationPeriod), "-") AS ?beginDate)  # extract begin date string before hyphen
  BIND(xsd:integer(?beginDate) AS ?beginDateInt)            # convert to number
  BIND(ROUND(?beginDateInt / 25) * 25 AS ?beginDate25)      # round to nearest 25 years
}
GROUP BY ?beginDate25
ORDER BY ASC(?beginDate25)
Compared to the first query the following changes were made:
- The begin dates of the creation periods are extracted. End dates are not considered in this query.
- The begin dates are rounded to 25-year bins. This leads to fewer result rows which can be visualized in a cleaner way.
- The number of articles per 25-year period is counted in the SELECT clause of the query in combination with the GROUP BY clause.
- The result is ordered by bin. The ordering is numeric because the begin dates are converted to numbers within the WHERE clause of the query.
The resulting table shows how many articles fall within each 25-year period, derived from the starting points of their creation periods.
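To see how the rounding assigns years to bins, the following sketch applies the same BIND expression to a few hand-picked years. The years are illustrative values, not data from the CKG, and the query needs no data from the endpoint because the values are supplied inline with VALUES.

```sparql
SELECT ?beginDateInt ?beginDate25 WHERE {
  # illustrative begin years, not taken from the CKG
  VALUES ?beginDateInt { 1601 1612 1613 1625 1640 }
  # same expression as in the query above: round to the nearest multiple of 25
  BIND(ROUND(?beginDateInt / 25) * 25 AS ?beginDate25)
}
ORDER BY ?beginDateInt
```

Because ROUND picks the nearest multiple of 25, the years 1601 and 1612 land in the 1600 bin, 1613 and 1625 in the 1625 bin, and 1640 in the 1650 bin. A bin labeled 1625 therefore covers roughly 1613 to 1637 rather than 1625 to 1649. If you prefer bins that start at their label, replacing ROUND with FLOOR is one alternative.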
How to create a timeline plot from the data?
RAWGraphs provides a versatile interface to visualize such data. Download the data by pressing the download button above the table. Then go to RAWGraphs and create a plot:
- Click "Upload your data" and paste the downloaded csv file there. Set the column separator to comma.
- Choose "Line chart" as plot type.
- Adjust the mapping by dragging the columns into the aesthetics. Set Bars to "beginDate25" and Size to "numberOfObjects (Sum)".
- Customize the plot. In the artboard section, increase the width to 1000px and height to 500px. Play around with the other options.
- Export: Choose a filename and export the plot as svg file.
The resulting figure gives an overview of the creation periods of the inscriptions covered by the DIO project. As you can see, most inscriptions were created in the 17th century.
What's next?
Group the data by locations or object types: Fetch the data using Facepager and get labels from Wikidata in the second tutorial (coming soon).
Funded by the German Research Foundation (DFG) - 441958017.
