LAB ASSIGNMENT 1B - Sreelakshmi-N/CS5560SreelakshmiLabAssignment GitHub Wiki

CS5560 Knowledge Discovery Management

Name: Sreelakshmi Nandanamudi

Classid: 17

1. Manually produce the output of the following NLP tasks for each input sentence. Input:

a. The dog saw John in the park

NLP Tasks:

a. Part-of-speech (POS) tagger

Result:

[(The,DT),(dog,NN),(saw,VBD),(John,NNP),(in,IN),(the,DT),(park,NN)]

b. Named entity recognizer (NER)

Result:

park - Location

John - Person

dog - Animal

c. Co-reference resolution system

Result:

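No coreferential mentions. "The dog", "John", and "the park" each refer to a different entity, and the sentence has no pronoun or repeated mention that refers back to any of them ("saw" is a verb, not an entity mention).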

b. The little bear saw the fine fat trout in the rocky brook.

NLP Tasks:

a. Part-of-speech (POS) tagger

Result:

[(The,DT),(little,JJ),(bear,NN),(saw,VBD),(the,DT),(fine,JJ),(fat,JJ),(trout,NN),(in,IN),(the,DT),(rocky,JJ),(brook,NN)]

b. Named entity recognizer (NER)

Result:

bear - Animal

brook - Location

c. Co-reference resolution system

Result:

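No coreferential mentions. "The little bear", "the fine fat trout", and "the rocky brook" each refer to a different entity, and the sentence has no pronoun or repeated mention that refers back to any of them ("saw" is a verb, not an entity mention).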

The screenshots below show the outputs for the first input.

The screenshots below show the outputs for the second input.

2. Create an NLP project for the following tasks using CoreNLP. Input:

Choose Dataset from the sheets

NLP Tasks:

a. Part-of-speech (POS) tagger

The part-of-speech tagger labels each token with its word class, based on the token itself and on its context. A token may take different POS tags depending on that context. The OpenNLP POS Tagger uses a probability model to predict the correct POS tag from the tag set. A tag dictionary can be used to limit the possible tags for a token, which improves both the tagging and the runtime performance of the tagger.

Example:

Original text: The roses are very beautiful

Analysis Result: The|DT roses|NNS are|VBP very|RB beautiful|JJ
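The role of the tag dictionary can be illustrated with a minimal sketch. This is not the OpenNLP or CoreNLP implementation: the dictionary entries, the transition scores, and the function names `tag` and `render` are all made up for illustration, and the greedy left-to-right choice stands in for a real probability model.

```python
# Toy tag dictionary: each known token maps to the tags it is allowed to take.
# All entries and scores below are made-up illustration values.
TAG_DICT = {
    "the": ["DT"],
    "roses": ["NNS"],
    "are": ["VBP"],
    "very": ["RB"],
    "beautiful": ["JJ"],
    "saw": ["VBD", "NN"],  # ambiguous: past tense of "see" or the tool
}

# Hypothetical context scores for (previous tag, candidate tag) pairs.
TRANSITION = {
    ("NN", "VBD"): 0.6,
    ("NN", "NN"): 0.1,
    ("DT", "NN"): 0.7,
    ("DT", "VBD"): 0.01,
}
DEFAULT_SCORE = 0.05  # used for any pair not listed above


def tag(tokens):
    """Greedy left-to-right tagging restricted by the tag dictionary."""
    tagged = []
    prev = "<START>"
    for tok in tokens:
        # Unknown words fall back to a single candidate, NN.
        candidates = TAG_DICT.get(tok.lower(), ["NN"])
        best = max(candidates, key=lambda t: TRANSITION.get((prev, t), DEFAULT_SCORE))
        tagged.append((tok, best))
        prev = best
    return tagged


def render(tagged):
    """Format tagged tokens as token|TAG pairs separated by spaces."""
    return " ".join(f"{tok}|{t}" for tok, t in tagged)


print(render(tag("The roses are very beautiful".split())))
```

Because most tokens have a single dictionary entry, only ambiguous tokens such as "saw" need a scoring step, which is where the context of the previous tag decides between VBD and NN.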

b. Named entity recognizer (NER)

Stanford NER is a Java implementation of a named entity recognizer. Named entity recognition (NER) labels sequences of words in a text that are the names of things, such as person and company names, or gene and protein names. Stanford NER comes with well-engineered feature extractors for NER and many options for defining feature extractors. The download includes good named entity recognizers for English, particularly for the three classes PERSON, ORGANIZATION, and LOCATION. The Stanford NLP Group also makes available various other models for different languages and circumstances, including models trained on just the CoNLL 2003 English training data.

Example:

Original text: Obama became president of USA in the year 2009.

Analysis Result: [Obama]Person became president of [USA]Location in the year [2009]Date.
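The bracketed labeling in this example can be reproduced with a small lookup-based sketch. This is a deliberate simplification and not how Stanford NER works internally: the gazetteer contents, the year pattern, and the function names `label` and `annotate` are assumptions made for illustration, and the year is tagged `Date` in this sketch.

```python
import re

# Hypothetical gazetteer: surface form -> entity class.
GAZETTEER = {
    "Obama": "Person",
    "USA": "Location",
}
# Four-digit years from 1000 to 2099 are treated as dates in this sketch.
YEAR = re.compile(r"^(1[0-9]{3}|20[0-9]{2})$")
PUNCT = ".,;:!?"


def label(token):
    """Return an entity class for one token, or None if it is not an entity."""
    word = token.strip(PUNCT)
    if word in GAZETTEER:
        return GAZETTEER[word]
    if YEAR.match(word):
        return "Date"
    return None


def annotate(text):
    """Wrap every recognized token as [token]Class and leave the rest unchanged."""
    out = []
    for token in text.split():
        cls = label(token)
        if cls is None:
            out.append(token)
        else:
            word = token.strip(PUNCT)
            # Keep surrounding punctuation outside the brackets.
            out.append(token.replace(word, f"[{word}]{cls}", 1))
    return " ".join(out)


print(annotate("Obama became president of USA in the year 2009."))
```

A lookup like this only finds names it already knows, which is why trained sequence models are used in practice.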

c. Co-reference resolution system

Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is an important step for a lot of higher level NLP tasks that involve natural language understanding such as document summarization, question answering, and information extraction.

Example:

Original text: Obama became president of USA because he was too talented.

Analysis Result: Obama ... he ("he" refers back to "Obama")
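As a rough illustration of the idea, the sketch below links each pronoun to the nearest preceding mention. It is a toy heuristic, not the CoreNLP coreference algorithm: the pronoun list, the `resolve` function, and the convention of passing candidate mention positions in by hand are all assumptions of this example.

```python
PRONOUNS = {"he", "she", "him", "her", "his", "they", "them", "it"}


def resolve(tokens, mentions):
    """Link each pronoun to the nearest preceding candidate mention.

    tokens   -- list of words
    mentions -- set of token indices that are candidate antecedents
                (hypothetical input; a real system detects these itself)
    Returns a list of (antecedent, pronoun) pairs.
    """
    links = []
    last_mention = None
    for i, tok in enumerate(tokens):
        if i in mentions:
            last_mention = tok
        elif tok.lower() in PRONOUNS and last_mention is not None:
            links.append((last_mention, tok))
    return links


tokens = "Obama became president of USA because he was too talented".split()
# Only "Obama" (index 0) is supplied as a person mention.
print(resolve(tokens, mentions={0}))

# A sentence without pronouns yields no coreference links.
print(resolve("The dog saw John in the park".split(), mentions={1, 3, 6}))
```

The second call shows that a sentence with several distinct entities but no pronoun or repeated mention produces an empty result.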

d. Sentiment Analysis

Sentiment analysis is by definition a form of NLP, because it processes natural language text. The only way to know how well an approach works is to try it on the data. That also shows whether it works well enough for the purpose at hand, which is the part that matters.

Example: "just read the book". This sentence contains no explicit sentiment word, so its sentiment depends heavily on the context.
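A minimal lexicon-based sketch shows why a sentence with no explicit sentiment word is hard to classify. This is a swapped-in technique for illustration only, not the model CoreNLP uses: the lexicon entries and the `sentiment` function are hypothetical.

```python
# Tiny hypothetical sentiment lexicon; real lexicons hold thousands of entries.
LEXICON = {
    "beautiful": 1,
    "good": 1,
    "talented": 1,
    "bad": -1,
    "boring": -1,
    "terrible": -1,
}


def sentiment(sentence):
    """Sum the lexicon scores of all words and map the total to a label."""
    words = [w.strip(".,;:!?\"'").lower() for w in sentence.split()]
    score = sum(LEXICON.get(w, 0) for w in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


for s in ["The roses are very beautiful", "The book was boring.", "just read the book"]:
    print(s, "->", sentiment(s))
```

The last sentence scores zero because none of its words appear in the lexicon, so a word-counting approach cannot tell whether the writer liked the book; only the surrounding context can.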

Natural Language Processing

"Natural Language Processing" is a field that covers computer understanding and manipulation of human language, and it's ripe with possibilities for newsgathering. NLP is a way for computers to analyze, understand, and derive meaning from human language in a smart and useful way.

The screenshots below depict the NLP project for the following tasks.

I have taken the datasets from the News domain and applied the basic NLP operations on that data.

The screenshots below depict the output for sentiment analysis.

Every sentence is analysed individually, and the computed sentiment for each sentence is shown in the screenshots below.

3. Create a simple question answering system as an extension of the dataset and tasks done in (2).

I have created a simple question answering system that retrieves answers based on the questions asked. First, the NLP operations are performed and their results are stored. Then, when a question beginning with a word such as when, who, or what is asked, the corresponding answer is retrieved from the stored results.
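The retrieval step described above can be sketched as a lookup from the question word to an entity class. This is a simplified illustration and not the project's actual code: the stored annotations, the `QUESTION_TYPE` mapping, and the `answer` function are hypothetical, and "what" questions are left unsupported here because they need more than an entity class to answer.

```python
# Hypothetical store of NER output: (text, entity class) pairs saved from
# the earlier NLP processing step.
ANNOTATED = [
    ("Obama", "PERSON"),
    ("USA", "LOCATION"),
    ("2009", "DATE"),
]

# Interrogative word -> entity class expected in the answer.
QUESTION_TYPE = {
    "who": "PERSON",
    "where": "LOCATION",
    "when": "DATE",
}


def answer(question, annotated=ANNOTATED):
    """Return all stored entities whose class matches the question word."""
    words = question.strip().split()
    first_word = words[0].lower() if words else ""
    wanted = QUESTION_TYPE.get(first_word)
    if wanted is None:
        return []  # unsupported question type in this sketch
    return [text for text, cls in annotated if cls == wanted]


print(answer("Who became president?"))
print(answer("When did Obama become president?"))
```

Keeping the annotations in a simple list makes the matching rule easy to follow; a fuller system would also compare the remaining question words against the sentence each entity came from.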

The below screenshot depicts the simple question answering system:

References:

https://en.wikipedia.org/wiki/Interrogative_word

https://answers.yahoo.com/