Reading time: 11 minutes

This Getting Started guide offers a broadly applicable, beginner-friendly introduction to SPARQL (pronounced "sparkle"), a semantic query language used to access and retrieve data stored in graph-like RDF (Resource Description Framework) databases such as the Culture Knowledge Graph. Learning the basics of SPARQL and its syntax will enable you to read graph database queries and, more crucially, to write your own queries to fetch data from all kinds of public knowledge graphs.

Now, you've already learned how to pronounce SPARQL. Great! Yet simply looking at a SPARQL query can be daunting. The structure and the vocabulary seem cryptic at first, not to mention the headache of writing a query from scratch. That's no surprise! The way the data is stored was designed to give it meaning in a machine-readable form. While SPARQL allows us humans to query RDF databases of all kinds, it still follows a syntax tailored to machines, not humans. In the following sections, we will break down the logic behind SPARQL step by step, so that you will eventually be able to read its syntax as easily as this introduction.

Getting the hang of RDF Triples and SPARQL

"What now are RDF triples? Didn't you want to teach me about SPARQL?!" you ask. Well, yes, but to do so, let's first take a step back. Imagine you had an appointment with the authorities, but not only is every door labelled in a different language, which makes it hard to find the correct room, you also have to speak different languages when applying for a passport versus registering a new place of residence. That would not be particularly efficient, would it? The same applies to huge databases storing lots of different information. To spare users from learning a new language every time they want to retrieve or store data, a common standard for communicating with databases was needed. The Resource Description Framework, or RDF for short, is just that: a standard model for data interchange on the web.

RDF has a simple syntax based on triples consisting of a subject, a predicate, and an object (e.g. "Susan has age 42"). Many such triples linked together form a network, that is, a knowledge graph. Note how the syntax does not follow natural-language grammar ("has age" is the predicate, whereas "42" is the object). Don't worry, this peculiarity becomes second nature after working with triples for a short while.

SPARQL, in turn, is the semantic query language commonly used to query data from RDF triple stores. Because RDF stores data as triples, we can retrieve it using triple patterns as well. Thus, the basic building block of every SPARQL query is made up of the same three elements: a subject, a predicate, and an object. A person's age, for instance, can be expressed as follows: Susan (subject) has age (predicate) 42 (object). In principle, any piece of data can be formalised as such a triple.

?sub ?pred ?obj .

Susan has age 42 .

RDF was developed to bring many different data models down to a lowest common denominator, yet it does not prescribe a single format in which data is stored. Formats range from JSON-LD and RDF/XML to text formats such as Turtle. Often several formats are available to choose from, as RDF makes them interchangeable. For a detailed introduction to RDF, see Jünger and Gärtner (2023), Computational Methods for the Social Sciences and Humanities (Chapter 3.7).
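To see what "interchangeable formats" means in practice, here is a small Python sketch that writes the example triple "Susan has age 42" in N-Triples, Turtle, and JSON-LD. The example.org IRIs are hypothetical stand-ins; real data would use IRIs from an actual vocabulary. All three lines describe exactly the same triple.

```python
import json

# Hypothetical IRIs for the "Susan has age 42" example (example.org is reserved for examples).
SUBJECT = "http://example.org/Susan"
PREDICATE = "http://example.org/hasAge"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def to_ntriples(subject, predicate, age):
    """N-Triples: full IRIs in angle brackets, an explicitly typed literal, a final dot."""
    return f'<{subject}> <{predicate}> "{age}"^^<{XSD_INTEGER}> .'


def to_turtle(subject, predicate, age):
    """Turtle accepts a bare integer and reads it as xsd:integer."""
    return f"<{subject}> <{predicate}> {age} ."


def to_jsonld(subject, predicate, age):
    """JSON-LD: '@id' names the subject; an integer JSON number maps to xsd:integer."""
    return json.dumps({"@id": subject, predicate: age})


for serialise in (to_ntriples, to_turtle, to_jsonld):
    print(serialise(SUBJECT, PREDICATE, 42))
```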

SPARQL Endpoints

Many free online databases let users run queries via so-called SPARQL endpoints. These endpoints act as an interface between the user and the underlying RDF data, enabling queries to be submitted over the web. Well-known SPARQL endpoints are provided by DBpedia and Wikidata. NFDI4Culture also provides a public SPARQL endpoint. You will get to use them soon!
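Under the hood, an endpoint is simply a web address that accepts a query as an HTTP parameter. The following minimal Python sketch (standard library only) builds such a request following the SPARQL 1.1 Protocol: the query travels URL-encoded in the query parameter, and the Accept header asks for JSON results. It only builds the request; actually sending it requires network access, and public endpoints such as DBpedia's may rate-limit or time out heavy queries.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint

QUERY = """
SELECT ?sub ?pred ?obj
WHERE { ?sub ?pred ?obj . }
LIMIT 10
"""


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.full_url)

# To actually run it (requires network access):
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(response.read()[:200])
```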

SPARQL Queries

Let's now turn to the heart of it all. SPARQL queries, as already established, allow you to search and retrieve data from RDF databases or knowledge graphs. While a SPARQL query at its core consists of triple patterns, it can include the following basic elements:

PREFIX: Defines short abbreviations (prefixes) for namespaces. This makes queries more readable and shows which vocabularies a query draws on. Don't worry about it for now; we will turn to namespaces and vocabularies shortly.
SELECT: Determines which variables are returned as the results of the query.
WHERE: Defines the patterns that are searched for in the RDF graph. Remember, all data is stored as triples, and this is where the triple patterns go. Because we want to request information, we work with placeholders, or variables, that are embedded in the triple structure. Variables can be recognised by the ? in front of them.
OPTIONAL: Adds patterns that do not necessarily have to be present in the graph. If the optional data is missing, the result is still returned, just without a value for that variable. This is especially helpful if you query for data that may not be available for every entry.
FILTER: Filters results based on conditions you define, e.g. FILTER(?age > 40).
SORTING PARAMETERS: Use ORDER BY to sort, GROUP BY to group, and LIMIT to cap the number of returned results. Without an explicit ORDER BY, results are returned in whatever order the RDF database or SPARQL endpoint produces them.
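To see what OPTIONAL, FILTER, ORDER BY, and LIMIT do to the rows a query returns, here is a toy Python sketch. The data and the ex:birthYear predicate are invented, and the rows are plain dictionaries keyed by variable name; a real SPARQL engine works very differently internally, and only the effect on the results is meant to match.

```python
# Invented rows matched by a required pattern such as ?person foaf:name ?name.
names = [
    {"?person": "ex:p1", "?name": "Cleo"},
    {"?person": "ex:p2", "?name": "Anna"},
    {"?person": "ex:p3", "?name": "Ben"},
]
# Invented data for a hypothetical optional pattern; ex:p2 has no birth year.
birth_years = {"ex:p1": 1950, "ex:p3": 1985}

# OPTIONAL { ?person ex:birthYear ?year }: every row is kept,
# and ?year is only added where a value exists (similar to a left join).
rows = []
for row in names:
    extended = dict(row)
    if row["?person"] in birth_years:
        extended["?year"] = birth_years[row["?person"]]
    rows.append(extended)

# FILTER(?year > 1960): in SPARQL, comparing an unbound ?year is an error,
# which counts as false, so rows without a year are dropped as well.
filtered = [r for r in rows if "?year" in r and r["?year"] > 1960]

# ORDER BY ?name LIMIT 2: sort alphabetically, then keep the first two rows.
ordered = sorted(rows, key=lambda r: r["?name"])[:2]

print(rows)
print(filtered)
print(ordered)
```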

For starters, we can create a simple SPARQL query by asking for an unspecified triple of subject, predicate, and object. Remember to set a low LIMIT: the less specific your query, the higher the chance that an astronomical number of results will bring the query to its knees. Let's try querying the DBpedia SPARQL endpoint. Simply paste the query below and hit Execute Query.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?sub ?pred ?obj 
WHERE {
    ?sub ?pred ?obj .
} LIMIT 10

This pattern matches every triple in the RDF graph; the LIMIT clause restricts the output to the first 10 results. Congratulations! You have successfully created your first SPARQL query. Now, let's try something a little more ... insightful that we can actually make sense of. Let's try to fetch the names of the first ten persons stored in DBpedia. Again, paste the query below and hit Execute Query.

PREFIX foaf: <http://xmlns.com/foaf/0.1/> 

SELECT ?name 
WHERE {

?person foaf:name ?name .

} LIMIT 10

That worked! But notice how the returned names do not all seem to belong to humans but to all kinds of entities? That's because ?person is just a variable: its name carries no meaning for the database, so ?person foaf:name ?name matches anything that has a name. To fetch the names of actual persons, we have to add a triple pattern telling the database that we are only looking for entries of the type person. Notice how we can use ; to list multiple predicates that refer to the same subject:

PREFIX foaf: <http://xmlns.com/foaf/0.1/> 

SELECT ?name 
WHERE { 

?entity a foaf:Person ;
          foaf:name ?name . 

} LIMIT 10

Far better! With the first triple pattern, we state that our variable ?entity (subject) is a (predicate) foaf:Person (object). The keyword a is SPARQL shorthand for rdf:type. By doing so, we tell the database that we are only looking for entities defined as a person using the FOAF (Friend of a Friend) vocabulary. FOAF provides terms for describing people (note how we had to define its prefix first!). Using ; we then added another triple pattern that refers to ?entity again. Here we applied the same predicate (foaf:name) as in our previous query; it points to the entity's name. Finally, we stored the resulting object in another variable (?name). As we only selected ?name to be returned, that is all we see after running the query. Try selecting ?entity as well and see what happens.

SELECT ?name ?entity

Literally, the query can be read aloud as follows: ?entity is a person (as defined by FOAF) and has the name ?name.
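Why does the extra pattern weed out non-persons? Both patterns share the variable ?entity, so a result survives only if the same entity satisfies both, much like a join. The ; shorthand simply repeats the subject for the second pattern. Below is a toy Python sketch with invented data (the ex: names are hypothetical; rdf:type is what the keyword a stands for). Real triple stores use indexes rather than scanning a list.

```python
# Toy data: two entities have a foaf:name, but only one is typed as a person.
TRIPLES = [
    ("ex:susan", "rdf:type", "foaf:Person"),
    ("ex:susan", "foaf:name", "Susan"),
    ("ex:acme", "rdf:type", "foaf:Organization"),
    ("ex:acme", "foaf:name", "ACME Ltd."),
]


def match(pattern, triples, bindings):
    """Yield extended bindings for every triple that fits the pattern."""
    for triple in triples:
        new = dict(bindings)
        for p_term, t_term in zip(pattern, triple):
            if p_term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break  # a fixed term does not match
        else:
            yield new


def query(patterns, triples):
    """Evaluate a list of triple patterns as a join, left to right."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [s for b in solutions for s in match(pattern, triples, b)]
    return solutions


# ?person foaf:name ?name .  -> every named thing
all_names = query([("?person", "foaf:name", "?name")], TRIPLES)
# ?entity a foaf:Person ; foaf:name ?name .  -> only persons
person_names = query(
    [("?entity", "rdf:type", "foaf:Person"), ("?entity", "foaf:name", "?name")],
    TRIPLES,
)
print([s["?name"] for s in all_names])     # ['Susan', 'ACME Ltd.']
print([s["?name"] for s in person_names])  # ['Susan']
```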

Prefixes, Namespaces, and Vocabularies

Now that you know the basic syntax, let's talk about prefixes, namespaces, and vocabularies. You have already encountered a common namespace, namely Friend of a Friend (FOAF). A namespace is the common base URI under which a vocabulary's terms are defined, so that the elements of a triple are labelled unambiguously. Vocabularies contain defined terms for certain categories such as names, birthdays, and all kinds of other information. FOAF, for example, provides predicates for names, e-mail addresses, or acquaintances in a standardised manner. Other common vocabularies come from schema.org or DBpedia itself. Within a SPARQL query, we call on these vocabularies by declaring a prefix. Prefixes are abbreviations of their namespaces' full URIs:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX schema: <http://schema.org/>
PREFIX dbo: <http://dbpedia.org/ontology/> 
PREFIX dbp: <http://dbpedia.org/property/>
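Prefixed names are just shorthand: the query processor glues the namespace URI and the local name together. A minimal Python sketch of that expansion, using the four declarations above. In SPARQL, using a prefix that was never declared is a syntax error, which the sketch mimics by raising an error.

```python
# The prefix declarations above, as a lookup table.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "schema": "http://schema.org/",
    "dbo": "http://dbpedia.org/ontology/",
    "dbp": "http://dbpedia.org/property/",
}


def expand(prefixed_name, prefixes=PREFIXES):
    """Turn a prefixed name like 'foaf:name' into a full IRI in angle brackets."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {prefixed_name!r}")
    return "<" + prefixes[prefix] + local + ">"


print(expand("foaf:name"))      # <http://xmlns.com/foaf/0.1/name>
print(expand("dbo:birthDate"))  # <http://dbpedia.org/ontology/birthDate>
```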

A formal description of related terms and how they relate to each other, for example that people have a name or a date of birth, or that a book has a title and a year of publication, is called an ontology. Ontologies are formalised in languages such as the Web Ontology Language (OWL).

Adjusting SPARQL to Facepager

Facepager supports querying graph databases with SPARQL via the Generic module and via a dedicated SPARQL module that lets you build and test queries in a more streamlined, practical manner. However, there is one central peculiarity to keep in mind whenever you issue a SPARQL query from Facepager. Facepager marks placeholders for seed node values, such as the Object ID, with angle brackets: <Object ID>. As you may have noticed, the namespace URIs in SPARQL queries are also enclosed in angle brackets. For Facepager to resolve a SPARQL query correctly, every angle bracket must therefore be escaped with a backslash, as in \<LINK TO ONTOLOGY\>, except, of course, the ones marking <Object ID> (or any other intentional placeholder). As an illustration, here is the same SPARQL query as before, adapted for Facepager:

PREFIX foaf: \<http://xmlns.com/foaf/0.1/\> 

SELECT ?name 
WHERE { 

?entity a foaf:Person ;
          foaf:name ?name . 

} LIMIT 10
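Escaping by hand is error-prone in longer queries. The small Python helper below (hypothetical, not part of Facepager) adds the backslashes for you while leaving known placeholders such as <Object ID> intact. Note that it escapes every < and >, including comparison operators inside a FILTER; whether Facepager requires those to be escaped as well is an assumption you should check in your own setup.

```python
import re

# Placeholders that must stay untouched so Facepager can fill them in.
# <Object ID> comes from the text above; add others you use intentionally.
PLACEHOLDERS = ("<Object ID>",)


def escape_for_facepager(query, placeholders=PLACEHOLDERS):
    """Backslash-escape every < and > except those of known placeholders."""
    # Placeholders are listed first so they win over the single-bracket match.
    alternatives = [re.escape(p) for p in placeholders] + ["[<>]"]
    pattern = re.compile("|".join(alternatives))

    def replace(match):
        text = match.group(0)
        return text if text in placeholders else "\\" + text

    # Caution: brackets that are already escaped would be escaped twice.
    return pattern.sub(replace, query)


query = (
    "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
    'SELECT ?entity WHERE { ?entity foaf:name "<Object ID>"@en . }'
)
print(escape_for_facepager(query))
```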

Finding your way around Facepager's SPARQL module

Now that you know how to use SPARQL and how to adjust it for Facepager, let's explore the reason behind going through the hassle of querying from within Facepager. The main appeal stems from Facepager's powerful placeholder function. Consider the following already optimised SPARQL query.

PREFIX dbo: \<http://dbpedia.org/ontology/\>  
PREFIX rdfs: \<http://www.w3.org/2000/01/rdf-schema#\>  

SELECT ?entity ?birthDate  
WHERE {  
  ?entity rdfs:label "<Object ID>"@en ;
          dbo:birthDate ?birthDate .  
}

Calling on the DBpedia ontology (DBO) gives quick access to commonly used properties of DBpedia entries (such as dbo:birthDate), while the foundational RDF Schema (RDFS) lets us identify an entity by its human-readable label (instead of an abstract ID) via rdfs:label. Note that the label has to match exactly, including the English language tag @en. Combined with Facepager, this lets us automate querying with the help of the application's standard placeholder <Object ID>.

When fetching data, Facepager replaces <Object ID> with the Object ID of each selected node in turn. Follow the steps below to walk through a typical workflow.
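Conceptually, each fetch turns the template into a separate query. The Python sketch below imitates this under two assumptions: that Facepager inserts the Object ID text verbatim, and that it removes the escaping backslashes before sending the query. The function name fill_template is hypothetical and does not reflect Facepager's actual code.

```python
# Template as entered in Facepager: literal brackets escaped, placeholder kept.
TEMPLATE = r'''PREFIX dbo: \<http://dbpedia.org/ontology/\>
PREFIX rdfs: \<http://www.w3.org/2000/01/rdf-schema#\>

SELECT ?entity ?birthDate
WHERE {
  ?entity rdfs:label "<Object ID>"@en ;
          dbo:birthDate ?birthDate .
}'''


def fill_template(template, object_id):
    """Hypothetical sketch: insert one seed node's Object ID, then drop the escapes."""
    # Note: an Object ID containing a double quote would break the string literal.
    query = template.replace("<Object ID>", object_id)
    return query.replace(r"\<", "<").replace(r"\>", ">")


seed_nodes = ["Kimberlé Crenshaw", "Judith Butler"]
queries = [fill_template(TEMPLATE, node) for node in seed_nodes]
print(queries[1])
```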

Set up the query using Facepager's SPARQL module. Start by specifying DBpedia's SPARQL Endpoint (https://dbpedia.org/sparql) in the designated Endpoint field. Then copy the query above into the Query field. Also make sure to set Response to JSON and to enter results.bindings as the Key to extract.

Create a list of seed nodes containing the names of influential academics (e.g., Kimberlé Crenshaw or Judith Butler) or of any other person who interests you.

Select all seed nodes and fetch data.

Find the birth dates for every node in the data view and add the key to your column setup to have it shown in the node view as well.
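The Key to extract, results.bindings, refers to the standard W3C JSON format for SPARQL results: results.bindings holds a list with one entry per result row, and each entry maps a variable name to an object with its type and value. The Python sketch below, independent of Facepager, flattens such a response. The sample response is invented and uses placeholder values rather than real DBpedia data.

```python
import json

# Invented response in the W3C SPARQL 1.1 JSON results format (placeholder values).
RAW = """
{
  "head": {"vars": ["entity", "birthDate"]},
  "results": {
    "bindings": [
      {
        "entity": {"type": "uri", "value": "http://dbpedia.org/resource/Example_Person"},
        "birthDate": {
          "type": "literal",
          "datatype": "http://www.w3.org/2001/XMLSchema#date",
          "value": "1900-01-01"
        }
      }
    ]
  }
}
"""


def extract_rows(raw_json):
    """Follow the results.bindings key and flatten each binding to plain values."""
    data = json.loads(raw_json)
    rows = []
    for binding in data["results"]["bindings"]:
        # Variables left unbound (e.g. by OPTIONAL) are simply absent from a binding.
        rows.append({var: binding[var]["value"] if var in binding else None
                     for var in data["head"]["vars"]})
    return rows


print(extract_rows(RAW))
```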

Voilà! Instead of querying each person's birth date manually and noting it down in your own Excel file, you let Facepager iterate through the nodes automatically and store the query results however you choose. While this example is fairly simple, it demonstrates how placeholders work in Facepager in conjunction with SPARQL queries. And now that you're familiar with the basics, the only limit, apart from your own creativity, is the availability of data.

What's next?

A great resource for learning more about SPARQL and some of its more advanced syntax quirks is Wikidata's introduction to SPARQL.
Check out our Getting Started with Knowledge Graph to behold the combined might of SPARQL and Facepager in a hands-on tutorial.

Getting Started with SPARQL - strohne/Facepager GitHub Wiki
