Wikidata Research - bounswe/bounswe2024group12 GitHub Wiki

Wikidata Research

To build a better understanding of the concepts, we start with linked data.

Linked Data

Linked data is a set of design principles for sharing machine-readable, interlinked data on the Web.

Semantic Web

The Semantic Web is a machine-processable, navigable space of interconnected objects, with mappings from URIs to resources. Further data descriptors are added to existing content and data on the Web so that computers can make meaningful interpretations, similar to the way humans process information to achieve their goals.

If the data is both open and linked, it is linked open data (LOD). Linked open data includes:

  • Factual data about specific entities and concepts
  • Ontologies - semantic schemata defining
    • Classes of objects (e.g. Person)
    • Relationship types (e.g. parent)
    • Attributes (e.g. date of birth of a person)

Because they are linked to one another, LOD datasets form a giant web of data, also called a knowledge graph.

Four design principles proposed by Tim Berners-Lee:

  1. Use URIs as names for things
    A Uniform Resource Identifier (URI) is a single global identification system that gives a unique name to anything.
  2. Use HTTP URIs so that people can look up those names
    HTTP provides a simple mechanism for retrieving resources. When things are identified by URIs used together with this protocol, they become easier to find.
  3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL)
    Use RDF and SPARQL so that URIs can be queried efficiently.
  4. Include links to other URIs so that they can discover more things
    Links to other URIs make data interconnected and let us discover related things. Maximize reuse and interlinking to create a richly interconnected network of machine-processable meaning.
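
The first two principles can be illustrated with Wikidata's own identifiers. The following is a minimal sketch, assuming the Wikidata concept URI form `http://www.wikidata.org/entity/<id>` and the `Special:EntityData` lookup URL; the helper names (`concept_uri`, `data_url`, `is_http_uri`) are hypothetical and only serve the illustration.

```python
from urllib.parse import urlparse

# Assumed base URLs for Wikidata; the helper names below are illustrative only.
CONCEPT_BASE = "http://www.wikidata.org/entity/"
DATA_BASE = "https://www.wikidata.org/wiki/Special:EntityData/"


def concept_uri(entity_id: str) -> str:
    """Principle 1: a URI that names the thing itself."""
    return CONCEPT_BASE + entity_id


def data_url(entity_id: str, fmt: str = "json") -> str:
    """Principles 2 and 3: an HTTP URL where data about the thing can be fetched."""
    return f"{DATA_BASE}{entity_id}.{fmt}"


def is_http_uri(uri: str) -> bool:
    """Check that a name uses HTTP(S), so it can be looked up on the Web."""
    parts = urlparse(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    print(concept_uri("Q42"))
    print(data_url("Q42", "ttl"))
    print(is_http_uri("urn:isbn:0451450523"))
```

A name such as `urn:isbn:0451450523` is a valid URI but cannot be looked up over HTTP, which is why the second principle asks for HTTP URIs specifically.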

Wikidata is a linked open data repository.

Wikidata

  • Wikidata is a free, collaborative, multilingual, secondary knowledge base that collects structured data.
  • It can be read by both humans and machines.
  • It is a central storage repository.
  • The Wikidata repository consists mainly of items. Items represent all the things in human knowledge, including topics, concepts and objects: love, Elvis Presley, gorilla... Each item has a label, a description and any number of aliases. Items are uniquely identified by the letter Q followed by a number.
  • Statements describe the characteristics of items. Each consists of a property-value pair: the property names the kind of data (it can be thought of as a category of data), and the value holds the actual data that describes the item. Properties that link to other databases are also called identifiers.
  • Wikidata is not a database of primary facts but a secondary knowledge base that collects statements and links to the references that support them.
  • References point to specific sources that back up the data provided in a statement.
  • Reasonator is a tool for viewing Wikidata entries; it also shows significant data derived through simple reasoning.
  • Data on Wikidata is released under CC0 ("No rights reserved"), which dedicates it to the public domain.
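
The item, statement and reference concepts above can be sketched as a small data model. This is not Wikidata's actual storage format; the class names, field names and the reference URL are assumptions made for illustration, and the description text is typed by hand rather than copied from Wikidata.

```python
import re
from dataclasses import dataclass, field

# Item IDs are the letter Q followed by a number.
ITEM_ID = re.compile(r"^Q[1-9]\d*$")


@dataclass
class Statement:
    property_id: str                     # the property, e.g. "P31"
    value: str                           # the value: an item ID, string, date, ...
    references: list[str] = field(default_factory=list)  # sources backing the claim


@dataclass
class Item:
    item_id: str
    label: str
    description: str
    aliases: list[str] = field(default_factory=list)
    statements: list[Statement] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject identifiers that do not follow the Q<number> convention.
        if not ITEM_ID.match(self.item_id):
            raise ValueError(f"not a valid item ID: {self.item_id!r}")

    def values_of(self, property_id: str) -> list[str]:
        """Return all values stated for one property."""
        return [s.value for s in self.statements if s.property_id == property_id]


if __name__ == "__main__":
    item = Item(
        item_id="Q42",
        label="Douglas Adams",
        description="hand-written example description",
        statements=[
            # P31 is "instance of", Q5 is "human"; the reference is a placeholder.
            Statement("P31", "Q5", references=["https://example.org/source"]),
        ],
    )
    print(item.values_of("P31"))
```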

SPARQL

SPARQL, which stands for SPARQL Protocol and RDF Query Language, is a query language and protocol standardized by the W3C. It is used to retrieve and manipulate data stored in the Resource Description Framework (RDF) format, a standard model for data interchange on the Web. The W3C specification defines the syntax and semantics of the SPARQL query language for RDF.

RDF

RDF, the Resource Description Framework, is a directed, labeled graph data format and a widely used standard for representing information on the Web in a machine-readable form. RDF extends the linking structure of the Web by using URIs to name the relationship between things as well as the two ends of the link; this structure is usually referred to as a “triple”. RDF data is stored as subject-predicate-object triples, typically in a triple store.

  • Semantic queries search for information based on its meaning, or semantics, rather than relying only on exact matching.
  • RDF is a fundamental technology of the Semantic Web, providing a standardized way to represent and link data.
  • The results of SPARQL queries can be result sets or RDF graphs.
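
To make the triple model concrete, here is a minimal sketch of an in-memory set of subject-predicate-object triples with a pattern-matching helper. It only imitates the idea behind a single SPARQL triple pattern and is not a SPARQL engine; the `ex:` identifiers and the `match` function are made up for this example.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hand-written triples using a hypothetical "ex:" namespace.
TRIPLES: list[Triple] = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:parent", "ex:bob"),           # relationship between two things
    ("ex:alice", "ex:dateOfBirth", "1980-01-01"),  # attribute with a literal value
]


def match(
    triples: list[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return the triples that fit a pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Roughly: SELECT ?s WHERE { ?s rdf:type ex:Person }
    for subject, _, _ in match(TRIPLES, p="rdf:type", o="ex:Person"):
        print(subject)
```

Leaving a position as `None` corresponds to putting a variable such as `?s` in that position of a SPARQL triple pattern.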

Wikidata API

Because Wikidata is large and structured, developers need an interface that lets them interact with it and retrieve and edit items and statements on Wikibase instances. Several APIs are available, including the REST API and the SPARQL API.
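
As an illustration of retrieving items through an API, the sketch below builds a request URL for the `wbgetentities` module of the MediaWiki Action API on wikidata.org. The text above does not name this module, so treating it as the retrieval interface is an assumption, and the function name `wbgetentities_url` is hypothetical. The code only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed endpoint of the MediaWiki Action API on Wikidata.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def wbgetentities_url(
    ids: list[str],
    languages: tuple[str, ...] = ("en",),
    props: tuple[str, ...] = ("labels", "descriptions", "aliases", "claims"),
) -> str:
    """Build a URL asking for the given entities as JSON."""
    params = {
        "action": "wbgetentities",
        "ids": "|".join(ids),            # several IDs are separated by "|"
        "languages": "|".join(languages),
        "props": "|".join(props),
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(wbgetentities_url(["Q42", "Q5"]))
```

The resulting URL can be opened in a browser or passed to any HTTP client; the `props` parameter limits the response to the parts of the item that are needed.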

Wikidata SPARQL API

The Wikidata SPARQL API, exposed through the Wikidata Query Service, is a tool for accessing and using structured data from Wikidata with SPARQL queries. It enables developers to build applications and services on top of the knowledge stored in the Wikidata knowledge base.
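
A query against the SPARQL API could look like the following sketch. It assumes the Wikidata Query Service endpoint `https://query.wikidata.org/sparql` with the `query` and `format` parameters, and it parses the standard SPARQL JSON results layout. The helper names are hypothetical, and `SAMPLE_RESPONSE` is a hand-written placeholder in that layout, not real query output.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Wikidata Query Service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Items that are an instance of (P31) house cat (Q146), with English labels.
EXAMPLE_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_query_url(query: str) -> str:
    """Build a GET URL that asks the endpoint for JSON results."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


def extract_rows(response: dict) -> list[dict]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    variables = response["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in response["results"]["bindings"]
    ]


# Placeholder response in the SPARQL JSON results layout (not real data).
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["item", "itemLabel"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q_PLACEHOLDER"},
     "itemLabel": {"type": "literal", "xml:lang": "en", "value": "placeholder label"}}
  ]}
}
"""

if __name__ == "__main__":
    print(build_query_url(EXAMPLE_QUERY))
    print(extract_rows(json.loads(SAMPLE_RESPONSE)))
```

Sending the request needs network access and is left out here; when doing so, it is advisable to set a descriptive User-Agent header on the HTTP client.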


Links used to write this page, which may also be helpful for further reading:
https://www.ontotext.com/knowledgehub/fundamentals/linked-data-linked-open-data/
https://www.ontotext.com/knowledgehub/fundamentals/what-is-the-semantic-web/
https://www.wikidata.org/wiki/Wikidata:Main_Page
https://www.wikidata.org/wiki/Wikidata:Introduction
https://www.wikidata.org/wiki/Help:Items
https://www.wikidata.org/wiki/Help:Statements
https://www.wikidata.org/wiki/Wikidata:Data_access
https://reasonator.toolforge.org/
https://www.w3.org/2001/sw/wiki/SPARQL
https://www.w3.org/2001/sw/wiki/RDF
Querying semantic web data with SPARQL by Marcelo Arenas and Jorge Pérez
https://agg-shashank.medium.com/an-introduction-to-using-wikidata-apis-a678ee6d2968
https://www.wikidata.org/wiki/Wikidata:Data_access#Wikidata_Query_Service
https://www.wikidata.org/wiki/Wikidata:REST_API
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service
