Brief Research on Linked Data, Lexvo and WordNet - bounswe/bounswe2024group6 GitHub Wiki

Introduction

This document gathers information on Linked Data, the Semantic Web and related terms into one compact resource, focusing on the features that are relevant to our project. It also contains sections with basic information on two data sources that may be used in our project: Lexvo and WordNet.

Brief Explanation of Some Terms

W3C

W3C stands for the World Wide Web Consortium, an international organization that develops guidelines, specifications and recommendations for web languages, protocols and technologies, with a focus on accessibility, internationalization, privacy and security.

Semantic Web

The Semantic Web is an extension of the World Wide Web that aims to connect "data" rather than documents. It is designed to provide a common framework that allows data to be shared and reused. One of its key features is that data on the Semantic Web is meant to be machine-readable; human readability is a secondary concern.

RDF

RDF stands for Resource Description Framework, and it is one of the key technologies of the Semantic Web. Its purpose is to represent information about resources (metadata) in a common, formal format. Each RDF statement is a "triple" consisting of a subject, a predicate and an object. A triple can be read as a directed edge from the subject node to the object node, labeled with the predicate, so a set of statements forms a directed graph. Subjects and predicates are normally identified by URIs (Uniform Resource Identifiers); objects can be URIs or literal values such as strings and numbers.
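To make the graph reading concrete, here is a minimal Python sketch (standard library only) that stores triples as plain tuples and treats each one as a labeled edge. The example resources under http://example.org/ and the helper name `outgoing_edges` are illustrative assumptions; real RDF libraries store and index triples very differently.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement, read as a directed edge subject --predicate--> obj."""
    subject: str
    predicate: str
    obj: str  # a URI, or a literal value such as a name


FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/#"  # illustrative namespace for the example resources

# A tiny illustrative graph: three statements about two resources.
graph = [
    Triple(EX + "alice", FOAF + "knows", EX + "bob"),
    Triple(EX + "alice", FOAF + "name", "Alice"),
    Triple(EX + "bob", FOAF + "name", "Bob"),
]


def outgoing_edges(graph, node):
    """Return the (predicate, object) pairs of all edges leaving `node`."""
    return [(t.predicate, t.obj) for t in graph if t.subject == node]


for predicate, obj in outgoing_edges(graph, EX + "alice"):
    print(predicate, "->", obj)
```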

Turtle

Turtle, short for Terse RDF Triple Language, is a textual syntax for RDF that allows an RDF graph to be written completely in a compact and natural text form. It lets you define prefixes that act as abbreviations for long URIs (somewhat like variables in programming languages), and it has shorthand for repeating parts of a triple: a semicolon reuses the same subject with a new predicate, a comma reuses the same subject and predicate with a new object, and the keyword "a" stands for rdf:type. An example is given below for reference.

@base <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://www.perceive.net/schemas/relationship/> .

<#green-goblin>
    rel:enemyOf <#spiderman> ;
    a foaf:Person ;    # in the context of the Marvel universe
    foaf:name "Green Goblin" .

<#spiderman>
    rel:enemyOf <#green-goblin> ;
    a foaf:Person ;
    foaf:name "Spiderman", "Человек-паук"@ru .
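The following Python sketch shows, by hand, how the Turtle example expands into full triples: prefixed names such as `foaf:name` are joined onto their namespace URI, relative URIs such as `<#spiderman>` are resolved against `@base`, `a` becomes `rdf:type`, and the `;` and `,` shorthand is unfolded into one row per triple. It only illustrates the expansion rules on this one example; it is not a Turtle parser, and the helper name `expand` is hypothetical.

```python
from urllib.parse import urljoin

BASE = "http://example.org/"
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rel": "http://www.perceive.net/schemas/relationship/",
}


def expand(term):
    """Expand one Turtle term to a full URI; literals are returned unchanged."""
    if term == "a":  # 'a' is shorthand for rdf:type
        return PREFIXES["rdf"] + "type"
    if term.startswith("<") and term.endswith(">"):
        return urljoin(BASE, term[1:-1])  # resolve relative URI against @base
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local  # prefixed name -> namespace + local part
    return term  # literal (this simple check assumes literals contain no ':')


# The two Turtle blocks with ';' and ',' unfolded into one row per triple.
turtle_rows = [
    ("<#green-goblin>", "rel:enemyOf", "<#spiderman>"),
    ("<#green-goblin>", "a", "foaf:Person"),
    ("<#green-goblin>", "foaf:name", '"Green Goblin"'),
    ("<#spiderman>", "rel:enemyOf", "<#green-goblin>"),
    ("<#spiderman>", "a", "foaf:Person"),
    ("<#spiderman>", "foaf:name", '"Spiderman"'),
    ("<#spiderman>", "foaf:name", '"Человек-паук"@ru'),
]

triples = [tuple(expand(t) for t in row) for row in turtle_rows]
for s, p, o in triples:
    print(s, p, o)
```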

SPARQL

SPARQL is a set of specifications that provide languages and protocols to query and manipulate RDF graph content on the Web or in an RDF store. It can express queries ranging from simple graph pattern matching to complex queries that include unions, filters, aggregation, nested expressions and more. The example below retrieves the names of persons and the number of friends each one has.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (COUNT(?friend) AS ?count)
WHERE { 
    ?person foaf:name ?name . 
    ?person foaf:knows ?friend . 
} GROUP BY ?person ?name
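To show what the query computes, the following Python sketch evaluates the same pattern by hand over a small in-memory list of triples: it joins the two triple patterns on `?person`, then groups by `?person` and `?name` and counts the `?friend` bindings. The data and the function name `friend_counts` are made up for illustration; in practice the query would be sent to a SPARQL endpoint or run by an RDF library. As in the SPARQL query, a person with no foaf:knows triple does not appear in the result, because both patterns must match.

```python
from collections import Counter

FOAF = "http://xmlns.com/foaf/0.1/"

# Illustrative data: (subject, predicate, object) triples with abbreviated subjects.
triples = [
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:alice", FOAF + "knows", "ex:bob"),
    ("ex:alice", FOAF + "knows", "ex:carol"),
    ("ex:bob", FOAF + "name", "Bob"),
    ("ex:bob", FOAF + "knows", "ex:alice"),
    ("ex:carol", FOAF + "name", "Carol"),  # knows nobody, so no result row
]


def friend_counts(triples):
    """Emulate SELECT ?name (COUNT(?friend) AS ?count) ... GROUP BY ?person ?name."""
    # Pattern 1: ?person foaf:name ?name
    names = [(s, o) for s, p, o in triples if p == FOAF + "name"]
    # Pattern 2: ?person foaf:knows ?friend
    knows = [(s, o) for s, p, o in triples if p == FOAF + "knows"]
    # Join on ?person: each matching (person, name, friend) is one solution.
    solutions = [(person, name, friend)
                 for person, name in names
                 for knower, friend in knows if knower == person]
    # GROUP BY ?person ?name and COUNT(?friend), then project ?name and ?count.
    groups = Counter((person, name) for person, name, _ in solutions)
    return [(name, count) for (person, name), count in groups.items()]


print(friend_counts(triples))  # [('Alice', 2), ('Bob', 1)]
```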

Linked Data

Linked Data is the idea of publishing data on the web and creating links between different data sources, so that machines can explore the resulting web of data. Its main proponent, Tim Berners-Lee, proposes four principles for publishing Linked Data, built on the same standards as the World Wide Web:

  1. Use URIs (Uniform Resource Identifiers) as names for things.
  2. Use HTTP URIs so that people can look up those names.
  3. When someone looks up a URI, provide useful information using the standards (RDF, SPARQL).
  4. Include links to other URIs so that they can discover more things.
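Principles 2 and 3 mean that a client can take an HTTP URI and request it over the web. A common practice for this (an assumption here, not something this page specifies) is HTTP content negotiation, where the client sends an Accept header asking for an RDF serialization such as Turtle (text/turtle). The Python sketch below only builds such a request for the Lexvo URI mentioned on this page; it does not send it, and whether a given server honors the header is not verified. The helper name `build_lookup_request` is hypothetical.

```python
from urllib.parse import urlparse
from urllib.request import Request


def build_lookup_request(uri, media_type="text/turtle"):
    """Prepare (but do not send) an HTTP GET asking for an RDF description of `uri`."""
    # Principle 2: only HTTP(S) URIs can be looked up this way.
    if urlparse(uri).scheme not in ("http", "https"):
        raise ValueError(f"not an HTTP URI: {uri}")
    # Principle 3: ask the server for useful information in a standard format.
    return Request(uri, headers={"Accept": media_type}, method="GET")


req = build_lookup_request("http://lexvo.org/id/term/eng/computer")
print(req.full_url, req.get_method(), req.get_header("Accept"))
# Actually sending it requires network access: urllib.request.urlopen(req)
```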

He also proposes a five-star scheme for Linked Open Data, in which each star builds on the previous ones:

  1. It should be available on the web under an open license.
  2. It should be available as machine-readable structured data (e.g. an Excel spreadsheet instead of a scanned image of a table).
  3. The structured data should use a non-proprietary format (e.g. CSV instead of Excel).
  4. It should use open W3C standards (RDF and SPARQL), with URIs to identify things.
  5. It should be linked to other people's data to provide context.
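These criteria are cumulative: each star presupposes the ones before it, so a dataset's rating is the number of consecutive criteria it meets, counting from the first. The Python sketch below encodes that rule. The `Dataset` fields are hypothetical yes/no flags standing in for the five criteria in the list above; they are not part of any official tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags, one per criterion, in the order of the list above.
    open_on_web: bool
    machine_readable: bool
    non_proprietary_format: bool
    uses_w3c_standards: bool  # URIs to identify things, RDF/SPARQL
    linked_to_others: bool


def star_rating(ds: Dataset) -> int:
    """Count consecutive criteria met, starting from the first (0 to 5 stars)."""
    criteria = [
        ds.open_on_web,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_w3c_standards,
        ds.linked_to_others,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missing star blocks all higher ones
        stars += 1
    return stars


csv_dump = Dataset(True, True, True, False, False)
print(star_rating(csv_dump))  # 3
```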

Lexvo

Lexvo.org is part of the worldwide Linked Data initiative and shows how things in our world are connected through language. It defines URIs (Uniform Resource Identifiers) for language-related objects and ensures that these objects are densely interconnected as well as linked to external resources on the web. Some of its main features are:

  1. Identifiers for terms in specific languages, linked to many external resources (such as WordNet). For example, http://lexvo.org/id/term/eng/computer is the URI for the English term "computer".
  2. Information about more than 7,000 languages, including translations, geographic regions, etc.
  3. Identifiers for scripts and for characters, with characters linked to the scripts they belong to.
  4. The Lexvo.org Ontology, which provides general RDF properties for describing classes and relationships, including identity links.
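Term URIs follow the pattern of the example in the first item: http://lexvo.org/id/term/ followed by a three-letter language code and the term itself. The Python sketch below builds and splits such URIs. That the language part is an ISO 639-3 code and that non-ASCII terms are percent-encoded as UTF-8 are assumptions here and should be checked against the Lexvo.org documentation; the function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote, unquote

TERM_BASE = "http://lexvo.org/id/term/"


def term_uri(lang_code: str, term: str) -> str:
    """Build a Lexvo-style term URI (assumes percent-encoded UTF-8 terms)."""
    return TERM_BASE + lang_code + "/" + quote(term, safe="")


def parse_term_uri(uri: str) -> Tuple[str, str]:
    """Split a term URI into (language code, term)."""
    if not uri.startswith(TERM_BASE):
        raise ValueError(f"not a Lexvo term URI: {uri}")
    lang_code, _, encoded_term = uri[len(TERM_BASE):].partition("/")
    return lang_code, unquote(encoded_term)


print(term_uri("eng", "computer"))  # http://lexvo.org/id/term/eng/computer
print(parse_term_uri("http://lexvo.org/id/term/eng/computer"))  # ('eng', 'computer')
```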

WordNet

WordNet® is a large lexical database of English covering nouns, verbs, adjectives and adverbs, and the relations between them. The most common relation is synonymy: words with similar meanings are grouped into unordered sets called synsets.

Most other relations link synsets to each other and usually connect synsets of the same part of speech (nouns, verbs, adjectives or adverbs). Common examples are ISA relations ({bed} ISA {furniture}) and part-whole relations ({seat} is a part of {chair}). Verbs are linked into chains that describe an event in increasingly specific ways (e.g. {communicate}-{talk}-{whisper}). Adjectives are organized by antonymy ({wet}-{dry}) and by similarity in meaning.

A few relations also connect words of different parts of speech, such as nouns and verbs. Examples include relations between words sharing the same stem, and relations that specify the semantic role of a noun with respect to a verb: {sleeper, sleeping_car} is the LOCATION for {sleep}, {painter} is the AGENT of {paint}, and {painting, picture} is its RESULT.
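To illustrate how synsets and relations fit together, the Python sketch below models a few of the WordNet examples above as plain dictionaries: synonymy is a list of synsets, and the ISA, part-whole and verb-specificity links each map a word to the more general (or containing) word. The data layout and names are made up for illustration and do not reflect WordNet's file format or any WordNet library API.

```python
from typing import Dict, FrozenSet, List, Set

# Synsets: unordered sets of synonyms (examples from the text above).
SYNSETS: List[FrozenSet[str]] = [
    frozenset({"sleeper", "sleeping_car"}),
    frozenset({"painting", "picture"}),
]

# Relations as links from a word to a more general or containing word.
IS_A: Dict[str, str] = {"bed": "furniture"}                            # nouns
PART_OF: Dict[str, str] = {"seat": "chair"}                            # part-whole
MORE_GENERAL_VERB: Dict[str, str] = {"whisper": "talk", "talk": "communicate"}


def synonyms(word: str) -> Set[str]:
    """All words sharing a synset with `word`, excluding the word itself."""
    return {w for s in SYNSETS if word in s for w in s} - {word}


def chain_up(word: str, links: Dict[str, str]) -> List[str]:
    """Follow one relation upward until no further link exists (assumes no cycles)."""
    chain = []
    while word in links:
        word = links[word]
        chain.append(word)
    return chain


print(synonyms("picture"))                       # {'painting'}
print(chain_up("whisper", MORE_GENERAL_VERB))    # ['talk', 'communicate']
print(chain_up("seat", PART_OF))                 # ['chair']
```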
