Lab Assignment 3 - sirisha1206/Spark GitHub Wiki
LAB ASSIGNMENT 3
Name : Naga Sirisha Sunkara
Class ID : 21
Team ID : 5
Technical partners details :
Name : Vinay Santhosham
Class ID : 17
Objective:
Task 1 : Hadoop MapReduce Algorithm
Implement a MapReduce algorithm for the Facebook common friends problem and run the MapReduce job on Apache Spark.
Task 2 : Spark Data Frames
a. Create a Spark DataFrame from one of the datasets, using as many different StructType field types as possible.
b. Run 10 intuitive queries on the dataset (e.g. pattern recognition, topic discussion, most important terms). Be creative and think outside the box.
c. Run any 5 queries with both Spark RDDs and Spark DataFrames, and compare the results.
Task 1:
Algorithm:
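The common friends problem follows a well-known MapReduce pattern: the mapper emits each (person, friend) pair as a sorted key together with that person's full friend list, and the reducer intersects the friend lists that arrive for the same pair. The block below is a minimal sketch of that pattern in plain Python, not the Spark code used in this lab; the function names (`mapper`, `reducer`, `common_friends`) and the sample graph are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def mapper(person, friends):
    """Emit (sorted pair, friend set) once for every friend of `person`."""
    for friend in friends:
        pair = tuple(sorted((person, friend)))
        yield pair, set(friends)


def reducer(pair, friend_sets):
    """Intersect all friend sets received for one pair of people."""
    common = set.intersection(*friend_sets)
    return pair, sorted(common)


def common_friends(graph):
    """Simulate the shuffle step by grouping mapper output by key."""
    grouped = defaultdict(list)
    for person, friends in graph.items():
        for key, value in mapper(person, friends):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


# Hypothetical sample input: a symmetric friendship graph, so every
# pair key receives exactly two friend sets (one from each person).
graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D", "E"],
    "C": ["A", "B", "D", "E"],
    "D": ["A", "B", "C", "E"],
    "E": ["B", "C", "D"],
}

result = common_friends(graph)
for pair, friends in result.items():
    print(pair, friends)
```

Sorting the pair before emitting it is what makes (A, B) and (B, A) land on the same reducer. On Spark, the same idea would typically map onto a `flatMap` for the mapper and a `reduceByKey` for the intersection, though the exact calls used in this lab are not shown here.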
CodeSnippet:
Mapper:
Reducer:
Mapper Output:
Reducer Output:
Task 2:
a) Creation of Spark Data Frames (Code Snippet)
b) Queries
1) NOT Operator (Code)
NOT Operator (Output)
2) Groupby And Orderby (Code)
Groupby And Orderby (Output)
![](https://github.com/sirisha1206/Spark/raw/master/Lab/Lab3/Documentation/2b_2o.PNG)
3) Sub Query Implementation (Code)
Sub Query Implementation (Output)
4) Aggregate Function (Code)
Aggregate Function (Output)
5) Join query (Code)
Join query (Output)
6) Pattern recognition (Code)
Pattern recognition (Output)
7) Range query (Code)
Range query (Output)
8) IN operator (Code)
IN operator (Output)
9) Union (Code)
Union (Output)
10) Right Join (Code)
Right Join (Output)
c) 5 Queries and Comparison of the Results
1) Not Operator (Code)
Not Operator (Output)
2) Groupby and OrderBy (Code)
Groupby and OrderBy (Output)
3) SubQuery Implementation (Code)
SubQuery Implementation (Output)
4) Pattern recognition (Code)
Pattern recognition (Output)
5) Range Query (Code)
Range Query (Output)