Linked Data and SPARQL - bounswe/bounswe2024group4 GitHub Wiki

SPARQL

What is SPARQL?

SPARQL is a query language, standardized by the W3C, for data stored in RDF format.

What is W3C?

The W3C (World Wide Web Consortium) is the main international standards organization for the World Wide Web.

What is RDF?

RDF stands for Resource Description Framework. It is a data model in which, instead of the usual key-value format, data is stored as triplets (more commonly called triples): subject - predicate - object.

The RDF Triplet

Subject: Denotes the resource.

Predicate: The relation between the subject and object.

Object: Another resource, or a literal value such as a string or number, that is related to the subject.

For example, consider the following Wikidata query:

SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

To make it more human-readable, replace the Wikidata identifiers with the names they stand for (P31 is "instance of" and Q146 is "house cat"):

SELECT ?item ?itemLabel
WHERE {
  ?item instance_of house_cat .
  ...

Here ?item is the subject, a variable that can match any item on Wikidata.

instance_of is the predicate, denoting the relation between the subject and the object. (The item is an instance of house cat.)

house_cat is the object, which is the destination of this relation.

The powerful aspect of this way of storing data is that the object of one relation can be the subject of another, forming a graph of triplets. The following toy examples illustrate this:

LeBron James is a player of the Los Angeles Lakers.

The Los Angeles Lakers are a participant in the NBA.

In the first relation, LeBron James is the subject, "is a player of" is the predicate, and the Los Angeles Lakers is the object. In the second relation, the Los Angeles Lakers is the subject, "is a participant in" is the predicate, and the NBA is the object.
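The two relations above can be sketched in a few lines of Python to show how an object becomes the subject of the next relation. This is only an illustration: the plain-string names and the helper functions `objects_of` and `leagues_of` are assumptions made for this example, not Wikidata identifiers or part of any RDF library.

```python
# Each triple is a (subject, predicate, object) tuple.
# The names are illustrative labels, not real Wikidata identifiers.
triples = [
    ("LeBron James", "player_of", "Los Angeles Lakers"),
    ("Los Angeles Lakers", "participant_in", "NBA"),
]


def objects_of(subject, predicate):
    """Return every object linked to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def leagues_of(player):
    """Follow two edges of the graph: player -> team -> league."""
    leagues = []
    for team in objects_of(player, "player_of"):
        # The team was an object in the first triple; here it acts as a subject.
        leagues.extend(objects_of(team, "participant_in"))
    return leagues


print(leagues_of("LeBron James"))
```

Chaining lookups like this is what a SPARQL query with two triple patterns sharing a variable does declaratively.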

By capturing these relations in SPARQL, an SQL-like query language, we can query a semantic database for information.

Querying with SPARQL

SPARQL has four types of queries:

ASK

Checks whether there is at least one match of the pattern.

SELECT

Returns the values of all or some of the variables bound by the matches of the pattern, as a table.

CONSTRUCT

Returns an RDF graph built by substituting each match of the pattern into a template of triplets.

DESCRIBE

Returns an RDF graph that describes the resources matched by the pattern; which triplets are included is decided by the query service.
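To make the difference between the query types more concrete, here is a toy Python sketch that imitates ASK, SELECT, and CONSTRUCT over an in-memory list of triplets. It is not a SPARQL engine: it handles a single triple pattern only, the data and the functions `match`, `ask`, `select`, and `construct` are made up for this illustration, and DESCRIBE is left out because its result depends on the query service.

```python
# Toy data; the names are illustrative, not Wikidata identifiers.
triples = [
    ("Tom", "instance_of", "house_cat"),
    ("Felix", "instance_of", "house_cat"),
    ("Rex", "instance_of", "dog"),
]


def match(pattern):
    """Yield one dict of variable bindings per triple that matches `pattern`.

    Terms starting with "?" are variables; all other terms must match exactly.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


def ask(pattern):
    """ASK: is there at least one match?"""
    return any(True for _ in match(pattern))


def select(variables, pattern):
    """SELECT: one row per match, keeping only the requested variables."""
    return [tuple(b[v] for v in variables) for b in match(pattern)]


def construct(template, pattern):
    """CONSTRUCT: fill the template triple once per match."""
    return [tuple(b.get(t, t) for t in template) for b in match(pattern)]


cats = ("?item", "instance_of", "house_cat")
print(ask(cats))
print(select(["?item"], cats))
print(construct(("?item", "is_a", "pet"), cats))
```

ASK yields a single boolean, SELECT yields rows of variable values, and CONSTRUCT yields new triplets, which mirrors the result shapes described above.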

Linked Data

In addition to the terms above, the following simple definitions may help in understanding the context.

What is HTTP?

HTTP, or Hypertext Transfer Protocol, is like a language computers use to talk to each other on the internet. It's what makes web pages load when you type in a website address. When you click a link or type a URL into your browser, it sends an HTTP request to a server asking for the webpage, and the server responds by sending back the webpage you asked for. So, HTTP is basically the way web browsers and servers communicate to show you web pages.

What is a URI?

A URI, or Uniform Resource Identifier, is a unique string of characters that identifies a resource, such as a web page, file, or image. A web address (URL) is the most familiar kind of URI: besides identifying the resource, it also tells your browser where to find it when you click a link or type the address.
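As a small illustration of what a URI is made of, the sketch below splits one with Python's standard `urllib.parse` module. The URI used is the full form of the `wd:Q146` identifier from the query earlier on this page; the helper `local_name` is a made-up name for this example.

```python
from urllib.parse import urlparse

# Full form of the wd:Q146 identifier used in the Wikidata query above.
uri = "http://www.wikidata.org/entity/Q146"
parts = urlparse(uri)

print(parts.scheme)  # the protocol part, here "http"
print(parts.netloc)  # the host that names the authority
print(parts.path)    # the path that distinguishes this resource on the host


def local_name(uri):
    """Return the last path segment of a URI (hypothetical helper)."""
    return urlparse(uri).path.rsplit("/", 1)[-1]


print(local_name(uri))
```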

What is Linked Data?

Linked Data is a method of structuring and interconnecting data on the web in a way that facilitates seamless integration and understanding. Unlike traditional databases where data is often siloed and disconnected, Linked Data enables us to create a web of interconnected information, where each piece of data is linked to related data points through standardized relationships.

At the heart of Linked Data are four fundamental principles, as formulated by Tim Berners-Lee:

1. Use URIs to Identify Things

Every entity or resource in Linked Data is assigned a unique Uniform Resource Identifier (URI). URIs serve as globally unique identifiers, allowing us to unambiguously refer to specific resources on the web.

2. Use HTTP URIs

HTTP URIs are particularly powerful because they enable direct access to resources via the web. By using HTTP URIs, we can leverage the existing infrastructure of the World Wide Web to retrieve data and navigate between interconnected resources.
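Because the identifier is an HTTP URI, a program can look it up with an ordinary HTTP request. The sketch below only prepares such a request with Python's standard `urllib.request` module and does not send it. The function name `build_lookup` is invented for this example, and whether a given server actually answers with Turtle (an RDF text format) when asked via the `Accept` header is an assumption that has to be checked per server.

```python
from urllib.request import Request


def build_lookup(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET that asks for an RDF representation."""
    # The Accept header tells the server which format the client prefers.
    return Request(uri, headers={"Accept": media_type})


request = build_lookup("http://www.wikidata.org/entity/Q146")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup over the network.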

3. Provide Useful Information Using RDF

Resource Description Framework (RDF) is the standard data model used in Linked Data. RDF enables us to express relationships between resources using simple subject-predicate-object statements, known as triples. These triples form the building blocks of Linked Data graphs.
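One simple way to write triples down is the line-based N-Triples format, where URIs go in angle brackets, literal values go in double quotes, and each statement ends with a dot. The sketch below is a simplified writer under stated assumptions: the function `to_ntriples` is made up for this example, the `example.org` URIs are placeholders, and only backslashes and double quotes are escaped in literals (a full writer would handle more cases, such as newlines).

```python
def to_ntriples(subject, predicate, obj, obj_is_uri=True):
    """Format one triple as a single N-Triples line (simplified sketch)."""
    if obj_is_uri:
        obj_text = f"<{obj}>"
    else:
        # Minimal escaping for plain string literals: backslash and quote only.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_text = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj_text} ."


# example.org URIs are placeholders, not real published data.
print(to_ntriples("http://example.org/Tom",
                  "http://example.org/instanceOf",
                  "http://example.org/HouseCat"))
print(to_ntriples("http://example.org/Tom",
                  "http://example.org/name",
                  "Tom",
                  obj_is_uri=False))
```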

4. Include Links to Other URIs

The true power of Linked Data lies in its ability to create connections between different datasets. By including links to other URIs within RDF triples, we can establish meaningful relationships between resources, enriching the overall web of data.
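The effect of linking can be sketched with two tiny, hypothetical datasets that happen to use the same URI for the Lakers. All `example.org` URIs and the function `reachable` are assumptions made for this illustration. Because the shared URI appears in both datasets, merging them is a plain set union, and facts from the second dataset become reachable from the first.

```python
LEBRON = "http://example.org/sports/LeBron_James"
LAKERS = "http://example.org/sports/Lakers"
LOS_ANGELES = "http://example.org/geo/Los_Angeles"

# Two hypothetical datasets published by different sources.
sports_data = {(LEBRON, "http://example.org/sports/playerOf", LAKERS)}
city_data = {(LAKERS, "http://example.org/geo/locatedIn", LOS_ANGELES)}

# Both datasets refer to the Lakers by the same URI, so merging is a set union.
merged = sports_data | city_data


def reachable(start, graph):
    """Return every URI reachable from `start` by following triples forward."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for s, _p, o in graph:
            if s == node and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


print(sorted(reachable(LEBRON, sports_data)))
print(sorted(reachable(LEBRON, merged)))
```

In the sports dataset alone only the team is reachable from the player; after the merge the city is reachable as well, without any extra mapping step.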

Benefits and Applications of Linked Data

Linked Data brings many benefits and uses in the digital world. It helps different systems work together smoothly by providing a common way to represent and share data, making it easier to combine large and varied datasets from different places. This makes it simpler to find and reuse information across various projects, saving time and effort. Linked Data also powers knowledge graphs, which organize information in a way that computers can understand, helping with smarter searches and predictions. Additionally, it makes it easier to merge different sets of data, even if they come from different places or use different languages, allowing for better collaboration and communication between systems and organizations.