Tutorial 2 - Nagumkc/CS5560_KDM_Lab-assignments GitHub Wiki
Tutorial 2
Knowledge Discovery Management
Name: Nageswara Rao Nandigam
Class ID: 18
Lab assignment #2
1. In-Class Question
Manually generate the output (the changes or transformations in the data) when the following Spark tasks are applied to the input text. Show the output in detail.
Input:
The dog saw John in the park
The little bear saw the fine fat trout in the rocky brook.
a. Map vs. FlatMap
b. MapReduce
c. Group by Starting Letter (draw a diagram showing how the Spark methods used change the data, similar to the word-count diagram shown below)
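To check the hand-worked outputs for (a) to (c), the plain-Java sketch below reproduces each step with `java.util.stream` instead of Spark, so it runs without a cluster. It assumes the input file holds one sentence per line and that words are split on single spaces with no lowercasing or punctuation stripping, so "The" and "the" are counted separately and the last token is "brook.". The class and method names are hypothetical; in Spark's Java API the corresponding calls would be `map`, `flatMap`, `mapToPair` followed by `reduceByKey`, and `groupBy` on a `JavaRDD`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class SparkTasksByHand {
    // Assumed input: one sentence per line, as in the task description.
    static final List<String> LINES = List.of(
        "The dog saw John in the park",
        "The little bear saw the fine fat trout in the rocky brook.");

    // Like rdd.map(line -> split): exactly one output element (a word list) per input line.
    static List<List<String>> map(List<String> lines) {
        return lines.stream()
            .map(line -> Arrays.asList(line.split(" ")))
            .collect(Collectors.toList());
    }

    // Like rdd.flatMap(line -> split): the per-line word lists are flattened into one list.
    static List<String> flatMap(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .collect(Collectors.toList());
    }

    // Like mapToPair(w -> (w, 1)) then reduceByKey(+): case-sensitive word counts.
    static Map<String, Integer> wordCount(List<String> words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    // Like groupBy(w -> first letter): the key is the lowercased first character.
    static Map<Character, List<String>> groupByFirstLetter(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(
            w -> Character.toLowerCase(w.charAt(0)), TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println("map:     " + map(LINES));
        List<String> words = flatMap(LINES);
        System.out.println("flatMap: " + words);
        System.out.println("counts:  " + wordCount(words));
        System.out.println("groups:  " + groupByFirstLetter(words));
    }
}
```

Printing `map(LINES)` shows two nested lists (one per line), while `flatMap(LINES)` shows a single flat list of 19 words; that difference is the Map vs. FlatMap distinction in (a).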
2. In-Class Question
Write a simple Spark program that reads a dataset and groups each word by the starting letter of its lemmatized form (in this exercise, we assume matching is case-insensitive).
**a. Write a function F in Java using CoreNLP to extract lemmatized words**
b. Call the function F from a Spark transformation function
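Steps (a) and (b) can be sketched as below, with one loud assumption: so that the example runs without the CoreNLP jar, the lemmatizer is a tiny hypothetical lookup table (`TOY_LEMMAS`) rather than CoreNLP. A real F would build a `StanfordCoreNLP` pipeline with the annotators `tokenize,ssplit,pos,lemma` and read `CoreLabel.lemma()` for each token. The grouping step mirrors calling F inside a Spark `flatMap` and then applying `groupBy` on the first letter; the names `f` and `groupByStartingLetter` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class LemmaGrouping {
    // HYPOTHETICAL stand-in for CoreNLP: a few irregular forms, ignoring part of speech.
    private static final Map<String, String> TOY_LEMMAS = Map.of(
        "saw", "see",
        "was", "be");

    // Function F: lowercases the text (case-insensitive exercise), splits on
    // non-word characters, and returns the lemma of each token.
    static List<String> f(String text) {
        List<String> lemmas = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            lemmas.add(TOY_LEMMAS.getOrDefault(token, token));
        }
        return lemmas;
    }

    // Mirrors lines.flatMap(F) followed by groupBy(first letter of the lemma).
    static Map<Character, List<String>> groupByStartingLetter(List<String> lines) {
        return lines.stream()
            .flatMap(line -> f(line).stream())
            .collect(Collectors.groupingBy(lemma -> lemma.charAt(0), TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "The dog saw John in the park",
            "The little bear saw the fine fat trout in the rocky brook.");
        groupByStartingLetter(lines).forEach((letter, lemmas) ->
            System.out.println(letter + " -> " + lemmas));
    }
}
```

Because F returns a whole list per line, calling it from `flatMap` (not `map`) is what yields one flat stream of lemmas ready for grouping.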
The diagram highlights where I called the NLP function and applied the group-by transformation.
Output:
3. Take-Home Question
Create a simple question-answering system that extends the dataset and tasks from (2), continuing from Tutorial 1B. Use at least two Spark transformations and two Spark actions.
- Applying the NLP function and storing the results in a HashMap
- Detecting the answer type and calling the NLP function inside a Spark transformation function
- Applying Spark transformations and actions to get the final answer for each question
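The three steps listed above can be sketched in plain Java as follows. This is a toy sketch, not the author's system: the entity tags are hand-assigned, hypothetical stand-ins for the HashMap of NLP results (they are not real CoreNLP output), the mapping of Where to LOCATION, When to DATE and Who to PERSON is an assumed convention, and `java.util.stream` operations stand in for Spark's `filter` and `distinct` transformations and its `collect` and `count` actions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

public class ToyQuestionAnswering {
    static final List<String> LINES = List.of(
        "The dog saw John in the park",
        "The little bear saw the fine fat trout in the rocky brook.");

    // HYPOTHETICAL hand-assigned tags standing in for the HashMap of NLP results.
    static final Map<String, String> TAGS = new HashMap<>(Map.of(
        "John", "PERSON",
        "park", "LOCATION",
        "brook", "LOCATION"));

    // Splits every line on non-word characters into one flat token list.
    static List<String> tokens(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\W+")))
            .filter(t -> !t.isEmpty())
            .collect(Collectors.toList());
    }

    // Detects the expected answer type from the question word (assumed convention).
    static Optional<String> answerType(String question) {
        String q = question.trim().toLowerCase();
        if (q.startsWith("where")) return Optional.of("LOCATION");
        if (q.startsWith("when")) return Optional.of("DATE");
        if (q.startsWith("who")) return Optional.of("PERSON");
        return Optional.empty();
    }

    // Keeps tokens whose tag matches the expected type.
    static List<String> answer(String question, List<String> tokens) {
        Optional<String> type = answerType(question);
        if (!type.isPresent()) {
            return List.of();
        }
        String expected = type.get();
        return tokens.stream()
            .filter(t -> expected.equals(TAGS.getOrDefault(t, "O"))) // like filter (transformation)
            .distinct()                                              // like distinct (transformation)
            .collect(Collectors.toList());                           // like collect (action)
    }

    public static void main(String[] args) {
        List<String> tokens = tokens(LINES);
        for (String q : List.of("Where did the dog see John?",
                                "When did the bear see the trout?",
                                "Who did the dog see?")) {
            List<String> answers = answer(q, tokens);
            // answers.size() plays the role of a count() action here.
            System.out.println(q + " -> " + answers + " (" + answers.size() + " found)");
        }
    }
}
```

With these toy tags, the Where question returns `[park, brook]`, the Who question returns `[John]`, and the When question returns an empty list because no token is tagged DATE.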
Output:
Question 1: Where type
Answer:
Question 2: When type
Answer:
Question 3: Who type
Answer: