Lab Assignment 4 - SaratM34/KDM-Lab-Assignments GitHub Wiki

Name: Mudunuri Sri Sai Sarat Chandra Varma
ClassId: 14

Objective: The objective of this lab assignment is to create a simple question answering system as an extension of the dataset and tasks done in (1), continuing from Tutorial 3. a. Use OpenIE for triplet extraction. b. Use ConceptNet to enhance the semantic meaning of entities. c. Use WordNet and LDA to enhance the answers and reformulate or group questions. The question answering system should be able to enrich the questions and answers based on the OpenIE, WordNet, and LDA approaches. Use the diagram to guide you.

1. In class Question:

Write a simple Spark program to read a dataset and do the following tasks:

  • a. Extract Triplets using OpenIE
  • b. Extract Semantic Meaning using ConceptNet
  • c. Extract Synonyms using WordNet
  • d. Group the Data into LDA in the pipelines given below and compare results
    • i. Data => LDA
    • ii. Data => NLP => LDA
    • iii. Data => NLP => StopWord => LDA
    • iv. Data => NLP => StopWord => TFIDF => LDA
  • Report your insights on each of the tasks.

a. Extract Triplets using OpenIE

The following is the output of the triplets extracted using OpenIE.

b. Extract Semantic Meaning using ConceptNet

The following is the Spark program and the related output after running ConceptNet.

c. Extract Synonyms using WordNet

The following is the Spark program and the corresponding synonyms output using WordNet.

d. Group the Data into LDA in the pipelines given below and compare results

  • i. Data => LDA
  • ii. Data => NLP => LDA
  • iii. Data => NLP => StopWord => LDA
  • iv. Data => NLP => StopWord => TFIDF => LDA

i. Data => LDA

Output after applying LDA directly to the data.

ii. Data => NLP => LDA

Output after processing the data with NLP and passing it to LDA.

iii. Data => NLP => StopWord => LDA

Output after processing the data with NLP and stop-word removal and passing it to LDA.

iv. Data => NLP => StopWord => TFIDF => LDA

Output after processing the data with NLP, stop-word removal, and TF-IDF and passing it to LDA.
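The four pipeline variants differ only in how the text is prepared before it reaches LDA. The following is a minimal plain-Python sketch of those preparation stages, not the Spark program used in the lab. The corpus `DOCS`, the `STOP_WORDS` set, and all function names are hypothetical; the NLP step is approximated by lowercasing and regex tokenization rather than real lemmatization, and the LDA step itself is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus and stop-word list; the lab's dataset is not shown here.
DOCS = [
    "The cats are chasing the mice.",
    "A dog chased the cat in the garden.",
    "Gardens need water and the sun.",
]
STOP_WORDS = {"a", "an", "and", "are", "in", "the"}


def raw_tokens(text):
    """Variant i: whitespace split only, no cleanup."""
    return text.split()


def nlp_tokens(text):
    """Variant ii: lowercase and keep alphabetic tokens.

    This is a stand-in for a real NLP step such as lemmatization.
    """
    return re.findall(r"[a-z]+", text.lower())


def without_stop_words(tokens):
    """Variant iii: drop stop words from already-normalized tokens."""
    return [t for t in tokens if t not in STOP_WORDS]


def tf_idf(token_docs):
    """Variant iv: weight each term by tf * log(N / df) per document."""
    n_docs = len(token_docs)
    doc_freq = Counter(term for doc in token_docs for term in set(doc))
    weighted = []
    for doc in token_docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


variant_1 = [raw_tokens(d) for d in DOCS]
variant_2 = [nlp_tokens(d) for d in DOCS]
variant_3 = [without_stop_words(doc) for doc in variant_2]
variant_4 = tf_idf(variant_3)

for name, docs in [("i", variant_1), ("ii", variant_2), ("iii", variant_3)]:
    vocab = {t for doc in docs for t in doc}
    print(f"variant {name}: vocabulary size = {len(vocab)}")
print("variant iv weights for doc 0:", variant_4[0])
```

Each stage shrinks or reweights the vocabulary that a topic model would see, which is one plausible reason the LDA topics differ across the four pipelines.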

Report your insights on each of the tasks.

  • In the first task we used open information extraction (OpenIE), which refers to the extraction of relation tuples, typically binary relations, from plain text. The central difference from other information extraction approaches is that the schema for these relations does not need to be specified in advance; typically the relation name is just the text linking two arguments. For the given dataset, triplets are generated from the various relations between text elements.

  • In the second task we used ConceptNet, which is used to extract the semantic meaning of words in a given text. We provided a dataset, ran ConceptNet for particular words, and retrieved their semantic meaning.

  • In the third task we used WordNet to extract synonyms for given words. The dataset is provided, and for each word the corresponding synonyms are extracted. Extracting synonyms is important because if a question is asked using a synonym of the targeted word, the system must still be able to process and answer the question.

  • In the fourth task we used LDA to group the data in four ways: first by applying LDA directly to the data, then after NLP processing, then after NLP processing and stop-word removal, and finally after NLP processing, stop-word removal, and TF-IDF weighting. The output is filtered further at each stage, and the final output is fully processed.

2. Take home Question

Create a simple question answering system as an extension of the dataset and tasks done in (1). Continuation from Tutorial 3.

  • a. Use OpenIE for triplet extraction
  • b. Use ConceptNet to enhance the semantic meaning of entities
  • c. Use WordNet and LDA to enhance the answers and reformulate or group questions
  • The question answering system should be able to enrich the questions and answers based on the OpenIE, WordNet, and LDA approaches. Use the diagram to guide you.

Question Answering with OpenIE, ConceptNet, WordNet, LDA

The question answering system has been extended using OpenIE, ConceptNet, WordNet, and LDA. With these features the system is much more accurate than before. OpenIE exposes the relations among the tuples in the document, which makes answers easier to retrieve. ConceptNet retrieves the semantic meaning of entities, which helps when processing "What" type questions. WordNet finds synonyms, which helps when processing answers. LDA brings up related terms from different documents. Together these enrich the questions and answers, so the system gives much more accurate answers.
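To illustrate how triplets and synonyms can work together, here is a minimal plain-Python sketch, not the system built for this lab. The `TRIPLES` list stands in for OpenIE output and the `SYNONYMS` table stands in for WordNet lookups; both are hand-written, hypothetical data, as are the `Triple`, `canonical`, and `answer` names. Matching is by crude substring and keyword checks only.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["subject", "relation", "object"])

# Hypothetical hand-written data standing in for OpenIE output.
TRIPLES = [
    Triple("marie curie", "discovered", "radium"),
    Triple("radium", "is", "a chemical element"),
]

# Hypothetical synonym table standing in for WordNet lookups.
SYNONYMS = {
    "discovered": {"found", "identified"},
}


def canonical(word):
    """Map a synonym back to the relation word used in the triples."""
    for base, alternatives in SYNONYMS.items():
        if word == base or word in alternatives:
            return base
    return word


def answer(question):
    """Answer simple 'who' and 'what' questions from the triples."""
    words = [canonical(w) for w in question.lower().rstrip("?").split()]
    text = " ".join(words)
    for t in TRIPLES:
        if t.relation not in words:
            continue
        # "Who <relation> <object>?" is answered with the subject.
        if words[0] == "who" and t.object in text:
            return t.subject
        # "What <relation> <subject>?" is answered with the object.
        if words[0] == "what" and t.subject in text:
            return t.object
    return None


for q in ["Who found radium?", "What is radium?", "Who painted radium?"]:
    print(q, "->", answer(q))
```

The synonym step is what lets a question phrased with "found" reach a triple stored with "discovered"; without it, only questions that reuse the exact relation text would match.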
