Lab Assignment 2 - SaratM34/KDM-Lab-Assignments GitHub Wiki
Name: Mudunuri Sri Sai Sarat Chandra Varma
ClassID: 14
Mail: [email protected]
Objective: Create a simple question answering system as an extension of the dataset and tasks done in (2), continuing from Tutorial 1B. Use at least two Spark transformations and two Spark actions.
1) Generate the output (changes or transformations in the data) manually when the following Spark tasks are applied to the input text. Show your output in detail.
Input:
- The dog saw John in the park
- The little bear saw the fine fat trout in the rocky brook.
Task 1: Map vs FlatMap
- Map
- FlatMap
- Map
- FlatMap
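The difference between the two operations can be checked on the lab's two input sentences. The sketch below is only a local stand-in: it uses plain Java streams rather than Spark RDDs, and the class and method names (`MapVsFlatMap`, `mapToWords`, `flatMapToWords`) are made up for illustration. It assumes words are separated by single spaces.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapVsFlatMap {
    // The two input sentences from this lab.
    static final List<String> INPUT = Arrays.asList(
        "The dog saw John in the park",
        "The little bear saw the fine fat trout in the rocky brook.");

    // map: exactly one output element per input element,
    // so two sentences produce two word lists.
    static List<List<String>> mapToWords(List<String> lines) {
        return lines.stream()
            .map(line -> Arrays.asList(line.split(" ")))
            .collect(Collectors.toList());
    }

    // flatMap: each sentence produces many words, and the results
    // are flattened into one list of words.
    static List<String> flatMapToWords(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(mapToWords(INPUT));
        System.out.println(flatMapToWords(INPUT));
    }
}
```

With map the result keeps the sentence boundaries (a list of lists), while flatMap loses them and yields a single list of words, which is the shape needed for word counting and grouping.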
Task 2: Map Reduce
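For Task 2, the map and reduce steps of a word count can be sketched locally as follows. This is an illustration with Java streams, not the Spark program itself: `Collectors.toMap` with a merge function plays the role that a map to (word, 1) followed by a reduce by key would play in Spark. The class name `WordCountSketch` is hypothetical, and the sketch assumes tokens are split on single spaces with no case folding or punctuation removal, so "The" and "the" are counted separately and "brook." keeps its period.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordCountSketch {
    // The two input sentences from this lab.
    static final List<String> INPUT = Arrays.asList(
        "The dog saw John in the park",
        "The little bear saw the fine fat trout in the rocky brook.");

    static Map<String, Integer> wordCount(List<String> lines) {
        return lines.stream()
            // Split every line into words (flatMap step).
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .collect(Collectors.toMap(
                word -> word,   // key: the word itself
                word -> 1,      // map step: each occurrence emits a count of 1
                Integer::sum,   // reduce step: add the counts that share a key
                TreeMap::new)); // sorted map, so the printed order is stable
    }

    public static void main(String[] args) {
        wordCount(INPUT).forEach((word, count) ->
            System.out.println(word + " -> " + count));
    }
}
```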
Task 3: Group by Starting Letter (Draw a diagram showing how the Spark methods used change the data, similar to the word-count diagram shown below)
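The data flow for Task 3 can be traced with a small local sketch. It uses Java streams as a stand-in for the Spark methods, and the class name `GroupByLetterSketch` is hypothetical. Two assumptions are made here that the task statement does not spell out: words are lowercased so that grouping is case-insensitive, and non-letter characters (such as the trailing period) are dropped. No lemmatization is performed in this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupByLetterSketch {
    // The two input sentences from this lab.
    static final List<String> INPUT = Arrays.asList(
        "The dog saw John in the park",
        "The little bear saw the fine fat trout in the rocky brook.");

    static Map<Character, List<String>> groupByFirstLetter(List<String> lines) {
        return lines.stream()
            // Lines -> words.
            .flatMap(line -> Arrays.stream(line.split(" ")))
            // Normalize: lowercase and keep letters only (assumption).
            .map(word -> word.toLowerCase().replaceAll("[^a-z]", ""))
            // Drop tokens that became empty after normalization.
            .filter(word -> !word.isEmpty())
            // Group the words under their first letter, keys sorted.
            .collect(Collectors.groupingBy(
                word -> word.charAt(0),
                TreeMap::new,
                Collectors.toList()));
    }

    public static void main(String[] args) {
        groupByFirstLetter(INPUT).forEach((letter, words) ->
            System.out.println(letter + " -> " + words));
    }
}
```

Duplicates are kept inside each group, so the group for `t` contains every occurrence of "the" plus "trout".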
2) Write a simple Spark program to read a dataset and group each word by the starting letter of its lemmatized form (in this exercise, grouping is case-insensitive).
- a) Write a function F in Java using CoreNLP to extract lemmatized words
- b) Call the function F from a Spark transformation function
- Output after running the lemmatize function and applying Spark transformations: the following screenshot shows each word grouped by the starting letter of its lemmatized form.
3) Create a simple question answering system as an extension of the dataset and tasks done in (2), continuing from Tutorial 1B. Use at least two Spark transformations and two Spark actions.
- Reading the dataset using Spark methods:
- Processing the dataset using Spark’s transformations and actions to call the CoreNLP function
- CoreNLP output using Spark transformations and actions
- Used the Spark transformations map, filter, and flatMap: map to map the words to their letters, filter to filter out the entities, and flatMap to flatten the items into a single collection. Used the Spark actions collect, count, and take to count the number of entities and to take particular elements from the processed dataset.
- Question 1
- Question 2
- Question 3
- Question 4
- Question 5
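The combination of transformations (flatMap, filter, map) and actions (collect, count, take) described above can be illustrated with a small local sketch. It is not the lab's Spark program: Java streams stand in for Spark RDDs, the class and method names are hypothetical, and the tagged tokens are hand-written placeholders in a made-up `token/TAG` format rather than real CoreNLP output.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class EntityActionsSketch {
    // Hand-written "token/TAG" pairs standing in for entity-tagged text.
    // This is illustrative data only, not output produced by CoreNLP.
    static final List<String> TAGGED_LINES = Arrays.asList(
        "The/O dog/O saw/O John/PERSON in/O the/O park/O",
        "Mary/PERSON met/O John/PERSON in/O Paris/LOCATION");

    // Transformations: flatMap (lines -> tagged tokens), filter (keep only
    // tokens tagged as entities), map (strip the tag). The final collect
    // plays the role of the collect action.
    static List<String> entities(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .filter(token -> !token.endsWith("/O"))
            .map(token -> token.substring(0, token.indexOf('/')))
            .collect(Collectors.toList());
    }

    // Analogue of the count action: number of entity mentions.
    static long countEntities(List<String> lines) {
        return entities(lines).size();
    }

    // Analogue of the take action: the first n entity mentions.
    static List<String> take(List<String> lines, int n) {
        return entities(lines).stream().limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(entities(TAGGED_LINES));
        System.out.println(countEntities(TAGGED_LINES));
        System.out.println(take(TAGGED_LINES, 2));
    }
}
```

A question such as "how many entities are mentioned?" then maps to the count analogue, and "list the first few entities" maps to the take analogue.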