LAB ASSIGNMENT 7 - Sreelakshmi-N/CS5560SreelakshmiLabAssignment GitHub Wiki

CS5560 Knowledge Discovery Management

Name: Sreelakshmi Nandanamudi

Classid: 17

Protege:

Protege is a free, open-source platform to construct domain models and knowledge-based applications with ontologies.

Protege-OWL editor: enables users to build ontologies for the Semantic Web, in particular in OWL, covering:

  • Classes

  • Properties

  • Instances

  • Reasoning

OWL:

The W3C Web Ontology Language (OWL) is a Semantic Web language designed to represent rich and complex knowledge about things, groups of things, and relations between things.

OWL Properties represent relationships between two entities.

There are three main kinds of properties:

  • Object properties: link an entity to another entity

  • Datatype properties: link an entity to an XML Schema datatype value or an RDF literal

  • Annotation properties: add annotation information to an entity (for example label, version information, comments, etc.)
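The three kinds of properties listed above can be sketched as a small data structure that renders Turtle declarations. This is only an illustration: the `ex` prefix, the property names (`playsFor`, `hasScore`, `versionNote`) and the class names are hypothetical and do not come from the assignment's ontology, and the output assumes the `owl`, `rdfs` and `xsd` prefixes are declared elsewhere.

```python
from dataclasses import dataclass

# Hypothetical namespace prefix, used only for illustration.
PREFIX = "ex"

# Maps each kind of property to its OWL type.
OWL_TYPE = {
    "object": "owl:ObjectProperty",
    "datatype": "owl:DatatypeProperty",
    "annotation": "owl:AnnotationProperty",
}


@dataclass(frozen=True)
class Property:
    name: str
    kind: str          # "object", "datatype", or "annotation"
    domain: str = ""   # class of the subject (optional)
    range_: str = ""   # class or XSD datatype of the object (optional)


def to_turtle(prop: Property) -> str:
    """Render one property declaration as a Turtle statement."""
    parts = [f"{PREFIX}:{prop.name} a {OWL_TYPE[prop.kind]}"]
    if prop.domain:
        parts.append(f"rdfs:domain {PREFIX}:{prop.domain}")
    if prop.range_:
        # XSD datatypes such as xsd:integer already carry a prefix.
        rng = prop.range_ if ":" in prop.range_ else f"{PREFIX}:{prop.range_}"
        parts.append(f"rdfs:range {rng}")
    return " ;\n    ".join(parts) + " ."


# Hypothetical example properties, one of each kind.
properties = [
    Property("playsFor", "object", domain="Player", range_="Team"),
    Property("hasScore", "datatype", domain="Team", range_="xsd:integer"),
    Property("versionNote", "annotation"),
]

for p in properties:
    print(to_turtle(p))
```

The object property points at another class, the datatype property points at an XSD datatype, and the annotation property carries no domain or range here.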

SPARQL:

SPARQL is an RDF query language, that is, a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.
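The core idea behind a SPARQL `SELECT` query is matching triple patterns that contain variables against stored triples. The sketch below imitates that idea in plain Python for a single pattern. It is not a SPARQL engine, and the predicate names (`defeated`, `playsFor`) are hypothetical labels for facts about Super Bowl 50.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# A tiny in-memory triple store; predicate names are hypothetical.
TRIPLES: List[Triple] = [
    ("DenverBroncos", "defeated", "CarolinaPanthers"),
    ("DenverBroncos", "defeated", "NewEnglandPatriots"),
    ("CarolinaPanthers", "defeated", "ArizonaCardinals"),
    ("CamNewton", "playsFor", "CarolinaPanthers"),
    ("VonMiller", "playsFor", "DenverBroncos"),
]


def match(pattern: Triple, triples: List[Triple]) -> Iterator[Dict[str, str]]:
    """Yield variable bindings for one triple pattern.

    Terms that start with '?' are variables; all other terms must match exactly.
    """
    for triple in triples:
        bindings: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must bind to the same value both times.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


# Roughly: SELECT ?loser WHERE { :DenverBroncos :defeated ?loser }
for row in match(("DenverBroncos", "defeated", "?loser"), TRIPLES):
    print(row["?loser"])
```

A real query would be written in SPARQL syntax and run by an RDF store, which also joins several patterns; this sketch handles one pattern only.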

DL Query: The DL Query tab allows users to quickly test definitions of classes to see that they subsume the appropriate subclasses, or to check for class membership of arbitrary descriptions without having to create named class placeholders.
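The two checks that the DL Query tab supports, subsumption and class membership, can be approximated for named classes with a simple walk over the class hierarchy. The sketch below assumes a hypothetical hierarchy (`Quarterback`, `Linebacker`, `Player`, `Person`, `Team`). In Protege these answers come from a reasoner, which also handles arbitrary class expressions; this code does not.

```python
from typing import Dict, Set

# Hypothetical class hierarchy: class -> its direct superclasses.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "Quarterback": {"Player"},
    "Linebacker": {"Player"},
    "Player": {"Person"},
    "Person": set(),
    "Team": set(),
}

# Hypothetical asserted types: individual -> most specific class.
INSTANCE_OF: Dict[str, str] = {
    "CamNewton": "Quarterback",
    "VonMiller": "Linebacker",
    "DenverBroncos": "Team",
}


def ancestors(cls: str) -> Set[str]:
    """Return cls together with all of its direct and indirect superclasses."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def subclasses_of(cls: str) -> Set[str]:
    """Named classes subsumed by cls (excluding cls itself)."""
    return {c for c in SUBCLASS_OF if c != cls and cls in ancestors(c)}


def instances_of(cls: str) -> Set[str]:
    """Individuals whose asserted class is cls or one of its subclasses."""
    return {ind for ind, typ in INSTANCE_OF.items() if cls in ancestors(typ)}


print(sorted(subclasses_of("Player")))
print(sorted(instances_of("Person")))
```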

1. In Class Questions

Generate a knowledge graph (ontology learning) as explained in tutorial in the following manner.

i. Create a class hierarchy (classes) for a given domain (e.g., Food, Family) containing at least 5 classes

I have taken a dataset about the Super Bowl football game and passed it to OpenIE and WordNet to generate the triplets.

ii. Create properties by specifying domains (subjects) and ranges (objects) (at least 5 properties)

I have created the data properties and object properties by specifying domains and ranges.

Data Properties:

Object Properties:

Individuals:

iii. Prepare a paragraph (in the same domain with your ontology) containing at least 5 sentences (containing at least 10 instances)

Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50. The Panthers finished the regular season with a 15–1 record, and quarterback Cam Newton was named the NFL Most Valuable Player (MVP). They defeated the Arizona Cardinals 49–15 in the NFC Championship Game and advanced to their second Super Bowl appearance since the franchise was founded in 1995. The Broncos finished the regular season with a 12–4 record, and denied the New England Patriots a chance to defend their title from Super Bowl XLIX by defeating them 20–18 in the AFC Championship Game. They joined the Patriots, Dallas Cowboys, and Pittsburgh Steelers as one of four teams that have made eight appearances in the Super Bowl. The Broncos took an early lead in Super Bowl 50 and never trailed. Newton was limited by Denver's defense, which sacked him seven times and forced him into three turnovers, including a fumble which they recovered for a touchdown. Denver linebacker Von Miller was named Super Bowl MVP, recording five solo tackles, 2½ sacks, and two forced fumbles. CBS broadcast Super Bowl 50 in the U.S., and charged an average of $5 million for a 30-second commercial during the game.
The Super Bowl 50 halftime show was headlined by the British rock group Coldplay with special guest performers Beyoncé and Bruno Mars, who headlined the Super Bowl XLVII and Super Bowl XLVIII halftime shows, respectively. It was the third-most watched U.S. broadcast ever.

iv. Generate a knowledge graph including new Individuals and new Triplets by the execution of the knowledge graph generation program.

I have visualized my ontology using the VOWL plugin, and the triplets generated for the given dataset can be seen in the screenshot below.

Triplets Generated:

Knowledge Graph Generation:
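One way the generation step could work is to keep only extracted triplets whose predicate is a known property, and to type each new individual from that property's domain and range. The sketch below illustrates that idea. It is not the program used in this lab, and the property names and schema are hypothetical. The candidate triplets are facts from the Super Bowl 50 paragraph.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical object properties: name -> (domain class, range class).
SCHEMA: Dict[str, Tuple[str, str]] = {
    "defeated": ("Team", "Team"),
    "playsFor": ("Player", "Team"),
    "playedAt": ("Game", "Stadium"),
}

# Candidate triplets, as an extractor might emit them for the paragraph.
EXTRACTED: List[Triple] = [
    ("DenverBroncos", "defeated", "CarolinaPanthers"),
    ("VonMiller", "playsFor", "DenverBroncos"),
    ("SuperBowl50", "playedAt", "LevisStadium"),
    ("CBS", "broadcast", "SuperBowl50"),  # predicate not in SCHEMA, so it is skipped
]


def build_graph(
    extracted: List[Triple], schema: Dict[str, Tuple[str, str]]
) -> Tuple[List[Triple], Dict[str, Set[str]]]:
    """Keep triplets with known predicates and infer individual types.

    The subject is typed with the property's domain and the object with its
    range, mirroring how rdfs:domain and rdfs:range license type inference.
    """
    triples: List[Triple] = []
    individuals: Dict[str, Set[str]] = {}
    for subj, pred, obj in extracted:
        if pred not in schema:
            continue
        domain, range_ = schema[pred]
        triples.append((subj, pred, obj))
        individuals.setdefault(subj, set()).add(domain)
        individuals.setdefault(obj, set()).add(range_)
    return triples, individuals


triples, individuals = build_graph(EXTRACTED, SCHEMA)
for name in sorted(individuals):
    print(name, sorted(individuals[name]))
```

Triplets with unknown predicates are dropped here for simplicity. A fuller pipeline might instead map them to the closest property, for example through WordNet synonyms.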

v. Design at least 5 questions and execute the DL queries with your new knowledge graph

The screenshots below depict the question answering that was performed on the generated ontology.

vi. Visualize ontologies and report about your results/observations from the steps 1-5

2. Take Home Question

i. Design your own ontology (classes, hierarchy, object and data properties) based on two datasets from the following datasets

a. Your own data

I have taken the dataset from the Education domain.

Visualization:

b. Stanford data https://rajpurkar.github.io/SQuAD-explorer/

I have taken the Stanford data from the University domain.

c. Yahoo! Answer data (the training data contains 2,698 questions, already labeled with one of 7 categories; the test data contains 1,874 questions that are unlabeled).

I have taken University-domain data from both the Yahoo! Answer data and the Stanford data, combined them into a single dataset, and created the ontology from it.

ii. Automatically create a set of individuals and triplets from your dataset based on classes and properties in your ontology

The screenshots below depict the creation of the object properties, data properties, individuals, and triplets for this dataset.

Data Properties:

Object Properties:

Individuals:

Classes:

Triplets Generated:

iii. Formulate at least 10 valid and informative queries, implement them using DL queries, and execute them by mapping questions to DL queries. Report the results

The following screenshots depict the 10 DL queries that were applied to my dataset.

iv. Conduct Machine Learning Tasks with the Yahoo Questions/Answers datasets at https://umkc.box.com/s/4cvpv05cxets8jb2t5qf5vqvwoyrgxe3 (at least 3 categories)

The screenshots below depict the output of the machine learning tasks.

Decision tree:

Naive Bayes:

Random Forest:
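As a rough illustration of the Naive Bayes task listed above, the sketch below implements a multinomial Naive Bayes question classifier with add-one smoothing in plain Python. It is not the toolkit or code behind the screenshots. The training questions and the three category names are invented for illustration and are not drawn from the Yahoo dataset.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Invented toy data: (question, category). Not taken from the Yahoo dataset.
TRAIN: List[Tuple[str, str]] = [
    ("who won the football game", "sports"),
    ("which team scored the touchdown", "sports"),
    ("how do i apply to a university", "education"),
    ("which university course should i take", "education"),
    ("what is a healthy diet", "health"),
    ("how much sleep is healthy", "health"),
]

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train(examples: List[Tuple[str, str]]) -> Model:
    """Count questions per class and word occurrences per class."""
    class_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text: str, model: Model) -> str:
    """Return the class with the highest log posterior for the question."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = "", -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict("which team won the game", model))
print(predict("what university should i apply to", model))
```

The same labeled questions could be fed to decision tree or random forest learners after turning each question into a bag-of-words feature vector.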

Visualize ontologies and report your results and observations on (i), (ii), (iii), (iv) questions

The screenshot below depicts the ontology that was visualized using the VOWL plugin.