Wikidata Research - bounswe/bounswe2024group12 GitHub Wiki
To better understand the concepts, we start with linked data.
Linked data is a set of design principles for sharing machine-readable, interlinked data on the Web.
It forms a machine-processable, navigable space of interconnected objects, with mappings from URIs to resources. We add further data descriptors to existing content and data on the Web so that computers can make meaningful interpretations, similar to the way humans process information to achieve their goals.
If the data is both open and linked, it is linked open data (LOD). Linked open data includes:
- Factual data about specific entities and concepts
- Ontologies: semantic schemata defining
  - Classes of objects (e.g. Person)
  - Relationship types (e.g. parent)
  - Attributes (e.g. DoB of a person)
Because they are linked to each other, linked open data sets form a giant web of data, or a knowledge graph.
Linked data follows four principles:
- Use URIs as names for things.
  A Uniform Resource Identifier (URI) is a single global identification system that gives a unique name to anything.
- Use HTTP URIs so that people can look up those names.
  The HTTP protocol provides a simple mechanism for retrieving resources; when things are identified by URIs in conjunction with this protocol, they become easier to find.
- When someone looks up a URI, provide useful information using the standards (RDF, SPARQL).
  Using RDF and SPARQL makes it possible to query URIs efficiently.
- Include links to other URIs so that they can discover more things.
  Links to other URIs make data interconnected and enable us to find different things. Maximize reuse and interlinking to create a richly interconnected network of machine-processable meaning.
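As a rough illustration of the first three principles, the sketch below names a thing with an HTTP URI and prepares a lookup that asks for a machine-readable description. It assumes a Wikidata item (Q42, Douglas Adams) as the example thing; the function names are hypothetical, and the request is only built, not sent.

```python
from urllib.request import Request

# Assumed example identifier: Q42 is the Wikidata item for Douglas Adams.
ENTITY_ID = "Q42"


def concept_uri(entity_id: str) -> str:
    """Principles 1 and 2: an HTTP URI that names the thing itself."""
    return f"http://www.wikidata.org/entity/{entity_id}"


def lookup_request(entity_id: str) -> Request:
    """Principle 3: prepare a lookup that asks for structured data.

    The Special:EntityData page returns a description of the entity;
    the .json suffix and Accept header request a machine-readable form.
    """
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json"
    return Request(url, headers={"Accept": "application/json"})


if __name__ == "__main__":
    print(concept_uri(ENTITY_ID))
    print(lookup_request(ENTITY_ID).full_url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup; the returned description contains further URIs to follow, which is what the fourth principle asks for.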
Wikidata is a linked open data repository.
- Wikidata is a free, collaborative, multilingual, secondary knowledge base, collecting structured data.
- It can be read by humans and machines.
- It is a central storage repository.
- The Wikidata repository mainly consists of items. Items represent all the things in human knowledge, including topics, concepts and objects: love, Elvis Presley, gorilla... Each item has a label, a description and any number of aliases. Items are uniquely identified by the letter Q followed by a number.
- Statements describe characteristics of items. They consist of property-value pairs. A property names the kind of data being recorded, which can be thought of as a category of data. The value holds the actual data that describes the item. Properties that link to other databases are also called identifiers.
- Wikidata is not a primary database of facts; it is a secondary knowledge base that collects statements and links them to the references that back them up.
- References point to specific sources that back up the data provided in a statement.
- Reasonator is a tool for viewing Wikidata entries; it also surfaces significant data through simple reasoning.
- Data on Wikidata is licensed under CC0 ("No rights reserved"), which places it in the public domain.
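The item, statement and reference concepts above can be pictured with a small data model. This is only a sketch, not Wikidata's actual storage format: the class and field names are hypothetical, the description, alias and reference URL are illustrative placeholders, and values are simplified to plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    source_url: str  # points to a source backing up a statement


@dataclass
class Statement:
    property_id: str  # e.g. "P31" (instance of)
    value: str        # an item ID such as "Q5", or a literal
    references: list[Reference] = field(default_factory=list)


@dataclass
class Item:
    item_id: str      # the letter Q followed by a number
    label: str
    description: str
    aliases: list[str] = field(default_factory=list)
    statements: list[Statement] = field(default_factory=list)

    def values_of(self, property_id: str) -> list[str]:
        """Return every value recorded for the given property."""
        return [s.value for s in self.statements if s.property_id == property_id]


douglas_adams = Item(
    item_id="Q42",
    label="Douglas Adams",
    description="English writer and humorist",  # illustrative wording
    aliases=["Douglas Noel Adams"],             # illustrative alias
    statements=[
        Statement("P31", "Q5"),  # instance of: human
        Statement(
            "P569",              # date of birth
            "1952-03-11",
            [Reference("https://example.org/source")],  # placeholder source
        ),
    ],
)

if __name__ == "__main__":
    print(douglas_adams.values_of("P31"))
```

Because an item can hold several statements for the same property, `values_of` returns a list rather than a single value.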
SPARQL, which stands for SPARQL Protocol and RDF Query Language, is a query language and protocol standardized by the W3C. It is used to retrieve and manipulate data stored in the Resource Description Framework (RDF) format, a standard model for data interchange on the Web. The W3C specification defines the syntax and semantics of the SPARQL query language for RDF.
RDF, the Resource Description Framework, is a directed, labeled graph data format for representing information on the Web in machine-readable form. RDF extends the linking structure of the Web by using URIs to name the relationship between things as well as the two ends of the link; this unit is usually referred to as a “triple”. RDF data is stored as subject-predicate-object triples, typically in a triple store.
- Semantic queries involve searching for information based on its meaning or semantics, rather than relying only on exact matching.
- RDF is a fundamental technology of the Semantic Web, providing a standardized way to represent and link data.
- The results of SPARQL queries can be result sets or RDF graphs.
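To make the subject-predicate-object model concrete, here is a minimal in-memory sketch of triples and the kind of pattern matching a SPARQL engine performs. It is not a real triple store: the `match` helper is hypothetical, and the three triples are a hand-picked sample that uses Wikidata's entity and direct-property URI prefixes.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)

WD = "http://www.wikidata.org/entity/"        # items, e.g. Q42
WDT = "http://www.wikidata.org/prop/direct/"  # direct properties, e.g. P31

TRIPLES: list[Triple] = [
    (WD + "Q42", WDT + "P31", WD + "Q5"),      # Douglas Adams, instance of, human
    (WD + "Q303", WDT + "P31", WD + "Q5"),     # Elvis Presley, instance of, human
    (WD + "Q42", WDT + "P569", "1952-03-11"),  # Douglas Adams, date of birth
]


def match(
    triples: list[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> list[Triple]:
    """Return triples matching the pattern; None acts like a SPARQL variable."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Roughly: SELECT ?s WHERE { ?s wdt:P31 wd:Q5 }
    for subject, _, _ in match(TRIPLES, p=WDT + "P31", o=WD + "Q5"):
        print(subject)
```

Leaving a position as `None` plays the role of a variable, so the same helper can answer "who is a human?" and "what is known about Q42?".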
Because Wikidata is large and structured, developers need an interface for interacting with, retrieving and editing items and statements on Wikibase instances. Among the available APIs, such as the REST API, one is the SPARQL API.
- This API lets developers access and query data stored in Wikidata.
- Wikidata provides a SPARQL endpoint with a powerful web GUI.
- Developers can use the Wikidata Query Service to execute SPARQL queries against the Wikidata knowledge base, enabling complex data retrieval and analysis.
- The SPARQL query service GUI is here: https://query.wikidata.org
- Some SPARQL query examples: https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples
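Outside the GUI, the query service can also be called over HTTP. The sketch below shows one possible way to do that with Python's standard library, assuming the endpoint `https://query.wikidata.org/sparql` with `query` and `format=json` parameters. The helper names and the User-Agent string are hypothetical, and the query (items that are an instance of house cat, Q146) is just a sample.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://query.wikidata.org/sparql"

# Sample query: five items that are an instance of (P31) house cat (Q146).
CAT_QUERY = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""


def build_request(query: str) -> Request:
    """Encode the query into a GET request that asks for JSON results."""
    url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    # Hypothetical User-Agent; a descriptive one identifies the client.
    return Request(url, headers={"User-Agent": "wikidata-research-example/0.1"})


def extract_rows(result: dict) -> list[dict]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


def run_query(query: str) -> list[dict]:
    """Send the query over the network and return the flattened rows."""
    with urlopen(build_request(query), timeout=30) as response:
        return extract_rows(json.load(response))


if __name__ == "__main__":
    for row in run_query(CAT_QUERY):
        print(row["item"], row.get("itemLabel"))
```

Building the request separately from sending it keeps the URL encoding and the result parsing testable without network access.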
The Wikidata API serves as a tool for accessing and utilizing structured data from Wikidata, enabling developers to build innovative applications and services that leverage the wealth of knowledge stored in the Wikidata knowledge base.
Links that I used while researching and writing this page, which may be helpful for further reading:
https://www.ontotext.com/knowledgehub/fundamentals/linked-data-linked-open-data/
https://www.ontotext.com/knowledgehub/fundamentals/what-is-the-semantic-web/
https://www.wikidata.org/wiki/Wikidata:Main_Page
https://www.wikidata.org/wiki/Wikidata:Introduction
https://www.wikidata.org/wiki/Help:Items
https://www.wikidata.org/wiki/Help:Statements
https://www.wikidata.org/wiki/Wikidata:Data_access
https://reasonator.toolforge.org/
https://www.w3.org/2001/sw/wiki/SPARQL
https://www.w3.org/2001/sw/wiki/RDF
Querying semantic web data with SPARQL by Marcelo Arenas and Jorge Pérez
https://agg-shashank.medium.com/an-introduction-to-using-wikidata-apis-a678ee6d2968
https://www.wikidata.org/wiki/Wikidata:Data_access#Wikidata_Query_Service
https://www.wikidata.org/wiki/Wikidata:REST_API
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service