Tutorial 2 - Nagumkc/CS5560_KDM_Lab-assignments GitHub Wiki

                                         Tutorial 2
                                 Knowledge Discovery Management

Name : Nageswara Rao Nandigam

Class ID : 18


Lab assignment #2


1. In Class Question

Manually generate the output (the changes or transformations in the data) when the following Spark tasks are applied to the input text. Show your output in detail.

Input: The dog saw John in the park The little bear saw the fine fat trout in the rocky brook.


a. Map vs FlatMap
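The difference can be sketched in plain Java. This is a minimal sketch, not the original Spark code: it uses `java.util.stream` as a stand-in for Spark's `JavaRDD.map` and `JavaRDD.flatMap`, and it assumes the input is read as two records, one per sentence, with the trailing period dropped. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapVsFlatMap {
    // Assumption: the input text is treated as two records (one per sentence).
    static final List<String> LINES = List.of(
        "The dog saw John in the park",
        "The little bear saw the fine fat trout in the rocky brook");

    // map: exactly one output element per input element,
    // so each line becomes one list of words (a list of lists overall).
    static List<List<String>> mapLines(List<String> lines) {
        return lines.stream()
            .map(line -> Arrays.asList(line.split("\\s+")))
            .collect(Collectors.toList());
    }

    // flatMap: each line produces many elements,
    // which are flattened into a single list of words.
    static List<String> flatMapLines(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println("map:     " + mapLines(LINES));
        System.out.println("flatMap: " + flatMapLines(LINES));
    }
}
```

With these two records, `mapLines` returns 2 elements (one word list per line), while `flatMapLines` returns a single flat list of 19 words.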


b. Map Reduce
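A word count is the usual map-reduce illustration, so the sketch below assumes that is the intended task. It is written with `java.util.stream` rather than Spark: the `toMap` collector with `Integer::sum` plays the role that `mapToPair` followed by `reduceByKey` would play on an RDD. The two-record input split and the class name are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordCountSketch {
    // Assumption: the input text is treated as two records (one per sentence).
    static final List<String> LINES = List.of(
        "The dog saw John in the park",
        "The little bear saw the fine fat trout in the rocky brook");

    // Map step: every word is paired with the count 1.
    // Reduce step: counts that share the same word are summed.
    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            .collect(Collectors.toMap(
                word -> word,      // key: the word itself (case-sensitive)
                word -> 1,         // value: initial count
                Integer::sum,      // merge: add counts for equal keys
                TreeMap::new));    // sorted map for stable printing
    }

    public static void main(String[] args) {
        wordCount(LINES).forEach((word, count) ->
            System.out.println(word + " -> " + count));
    }
}
```

Because the keys are case-sensitive here, "The" and "the" are counted separately (2 and 3 occurrences respectively).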


c. Group by Starting Letter (draw a diagram of how the Spark methods used change the data, similar to the word count diagram shown below)
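The grouping step can be sketched as follows. This is a plain Java approximation, with `Collectors.groupingBy` standing in for Spark's `groupBy`. It assumes the letter comparison ignores case and that the input is split into two records; both are assumptions, as is the class name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupByFirstLetter {
    // Assumption: the input text is treated as two records (one per sentence).
    static final List<String> LINES = List.of(
        "The dog saw John in the park",
        "The little bear saw the fine fat trout in the rocky brook");

    // Split lines into words, then group the words by their lowercased first letter.
    static Map<Character, List<String>> groupByLetter(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.groupingBy(
                word -> Character.toLowerCase(word.charAt(0)),
                TreeMap::new,
                Collectors.toList()));
    }

    public static void main(String[] args) {
        groupByLetter(LINES).forEach((letter, words) ->
            System.out.println(letter + " -> " + words));
    }
}
```

Each key is a starting letter and each value is the list of words that begin with it, in the order they appear in the input.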


2. In Class Question

Write a simple Spark program that reads a dataset and groups each word by the starting letter of its lemmatized form (in this exercise, assume the comparison is case-insensitive).


**a. Write a function F in Java using CoreNLP to extract lemmatized words**


b. Call the function F from a Spark transformation function

The diagram highlights where I called the NLP function and applied the group-by transformation.
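The shape of this step can be sketched without Spark or CoreNLP on the classpath. In the sketch below, `toyLemma` is a hypothetical stand-in for F: it only lowercases the word and looks up one irregular form in a hand-written table, whereas the real F would return the lemma produced by CoreNLP. The stream `map` call marks the point where a Spark transformation would invoke F, and `groupingBy` stands in for the group-by transformation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

class GroupByLemmaLetter {
    // Hypothetical lookup table, used only so the sketch runs without CoreNLP.
    static final Map<String, String> TOY_LEMMAS = Map.of("saw", "see");

    // Stand-in for F: lowercase the word, then apply the toy lookup.
    static String toyLemma(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        return TOY_LEMMAS.getOrDefault(lower, lower);
    }

    // Applies the lemmatizer f inside the transformation chain,
    // then groups the resulting lemmas by their first letter.
    static Map<Character, List<String>> groupByLemmaLetter(
            List<String> lines, UnaryOperator<String> f) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\s+")))
            .filter(word -> !word.isEmpty())
            .map(f) // this is where F is called for every word
            .collect(Collectors.groupingBy(
                lemma -> lemma.charAt(0),
                TreeMap::new,
                Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "The dog saw John in the park",
            "The little bear saw the fine fat trout in the rocky brook");
        groupByLemmaLetter(lines, GroupByLemmaLetter::toyLemma)
            .forEach((letter, lemmas) -> System.out.println(letter + " -> " + lemmas));
    }
}
```

Passing the lemmatizer in as a parameter keeps the grouping logic unchanged when the toy function is swapped for a CoreNLP-backed one.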

Output:


3. Take home Question

Create a simple question answering system as an extension of the dataset and tasks done in (2). Continuation from Tutorial 1B. Make sure to use at least two Spark Transformations and two Spark Actions.


  1. NLP function and storing results in a HashMap


  2. Detecting the answer type and calling the NLP function in a Spark transformation function


  3. Applying Spark transformations and actions to get the final answer for each question
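The answer-type detection step might look roughly like the sketch below. It is an assumption about the approach, not the original code: it maps the leading question word to the kind of named entity an answer would need, using the labels `LOCATION`, `DATE`, and `PERSON`. The class name, method name, and example questions are hypothetical.

```java
import java.util.Locale;

class AnswerTypeDetector {
    // Maps the leading question word to the entity type expected in the answer.
    // Assumption: answers are later filtered by a matching named-entity label.
    static String answerType(String question) {
        String q = question.trim().toLowerCase(Locale.ROOT);
        if (q.startsWith("where")) return "LOCATION";
        if (q.startsWith("when")) return "DATE";
        if (q.startsWith("who")) return "PERSON";
        return "UNKNOWN";
    }

    public static void main(String[] args) {
        // Hypothetical example questions, one per supported type.
        String[] questions = {
            "Where did the dog see John?",
            "When did the bear see the trout?",
            "Who saw John in the park?"
        };
        for (String question : questions) {
            System.out.println(question + " -> " + answerType(question));
        }
    }
}
```

The detected type can then be used inside a transformation to keep only the sentences or entities whose label matches the question.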


Output:

Question 1: Where type

Answer:


Question 2: When type

Answer:


Question 3: Who type

Answer: