CASE1 - RoshiniVarada/BDP_Project2 GitHub Wiki

BIG DATA PROGRAMMING

PROJECT-2

Case1-FacebookMutualFriendsUsingSpark

Team Members and Collaboration:

Roshini Varada - Facebook Mutual Friends Using Spark

Sarika Reddy Kota - Spark Data Frames

Pallavi Arikatla - Spark Streaming

Zakari, Abdulmuhaymin - Spark Graph Frames

Idea:

The goal is to identify the common friends of any two people in a social network using MapReduce on Spark. For in-memory workloads, Spark can run MapReduce-style jobs up to a hundred times faster than Hadoop MapReduce, largely because it keeps intermediate data in memory instead of writing it to disk between stages. The operation is expressed with Spark transformations such as groupByKey, map, flatMap and reduceByKey.

Usage (real-time scenario):

Social networking sites identify the common friends between people. When a person visits another person's profile, they can see the contacts the two of them have in common.

Example:

Suppose P1 and P2 have P3 and P4 as common friends. When P1 visits P2's profile, P1 sees P3 and P4 listed as mutual friends, and the same holds when P2 visits P1's profile.
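At its core, "mutual friends" is a set intersection of two friend lists. The friend lists below are hypothetical; only P3 and P4 come from the example above.

```python
# Hypothetical friend lists for the example users P1 and P2.
friends = {
    "P1": {"P2", "P3", "P4", "P5"},
    "P2": {"P1", "P3", "P4", "P6"},
}

# Mutual friends are the users that appear in both friend sets.
mutual = friends["P1"] & friends["P2"]
print(sorted(mutual))  # ['P3', 'P4']
```

The MapReduce job computes this same intersection, but it does so for every pair at once and across a cluster.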

Approach and solution:

Flow Chart or Pictorial Representation:

1. Mutual friends are identified in six phases:

a. Input

b. GroupBy

c. Split

d. Map

e. Shuffle

f. Reduce

The diagram below shows how the input is transformed in each phase.

Algorithm For the approach
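The text does not reproduce the algorithm itself. The usual mutual-friends MapReduce formulation fits the steps described under Implementation, and it can be stated compactly as below. Two assumptions apply: F(u) denotes the set of friends of user u, and friendship is symmetric.

```latex
\begin{aligned}
\text{map:}\quad & (u,\ F(u)) \mapsto \{\, (\{u, f\},\ F(u)) \mid f \in F(u) \,\} \\
\text{reduce:}\quad & (\{u, v\},\ [F(u),\ F(v)]) \mapsto (\{u, v\},\ F(u) \cap F(v))
\end{aligned}
```

Since u is not in F(u) and v is not in F(v), the intersection contains only third parties. Keys are produced only for pairs that are already friends, so this formulation lists mutual friends for directly connected pairs.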

Implementation

a) Implementing MapReduce in Spark on the given example

1. In the input, the first attribute is the user and the second attribute is one of that user's friends.

Main method

Input:

2. Each user is then grouped with all of their friends using groupByKey.

3. After grouping, the data is split and mapped per user. Each user is paired with each of their friends, and that pair is associated with the user's list of friends.

4. After the shuffle-and-sort phase, the data is reduced with the reduceByKey operation.
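The four steps above can be sketched end to end in plain Python. The sketch runs in a single process, so no Spark cluster is needed. All function names are hypothetical, and the sample pairs are made up. In PySpark the corresponding steps would roughly be groupByKey, a flatMap that emits the pair keys, and reduceByKey, which performs the shuffle internally.

```python
from collections import defaultdict
from functools import reduce


def group_by_user(pairs):
    """Step 2: collect each user's friends (plays the role of groupByKey)."""
    grouped = defaultdict(set)
    for user, friend in pairs:
        grouped[user].add(friend)
    return grouped


def map_to_pairs(grouped):
    """Step 3: for every friend f of user u, emit (sorted (u, f), friends of u)."""
    for user, friends in grouped.items():
        for friend in friends:
            # (A, B) and (B, A) share one key, so both friend lists meet in the reducer.
            yield tuple(sorted((user, friend))), frozenset(friends)


def shuffle(records):
    """Shuffle: bring together all values that share the same pair key."""
    buckets = defaultdict(list)
    for key, value in records:
        buckets[key].append(value)
    return buckets


def reduce_mutual(buckets):
    """Step 4: intersect the friend lists under each key (like reduceByKey with &)."""
    return {
        key: reduce(lambda a, b: a & b, values)
        for key, values in buckets.items()
        if len(values) == 2  # drop keys seen from only one side (asymmetric input)
    }


# Hypothetical symmetric input: every friendship appears in both directions.
pairs = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "A"), ("B", "C"), ("B", "D"),
         ("C", "A"), ("C", "B"),
         ("D", "A"), ("D", "B")]

mutual = reduce_mutual(shuffle(map_to_pairs(group_by_user(pairs))))
for key in sorted(mutual):
    print(key, sorted(mutual[key]))
# ('A', 'B') ['C', 'D']
# ('A', 'C') ['B']
# ('A', 'D') ['B']
# ('B', 'C') ['A']
# ('B', 'D') ['A']
```

The `len(values) == 2` filter is a choice made for this sketch. If a friendship is listed in only one direction, its key receives a single list. A plain reduceByKey would pass that list through unchanged, which would wrongly report the whole list as mutual friends.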

b) Implementing MapReduce in Spark on the given dataset

1. The facebook_combined dataset is taken as the input.

2. Output after grouping.

3. Output after mapping.

4. Final output after reduction.
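The file format is not described here. The sketch below assumes that facebook_combined holds one whitespace-separated edge "u v" per line, and that each undirected friendship is listed only once. Both assumptions should be checked against the actual file. Under them, every edge must be emitted in both directions before grouping, so that each user sees all of their friends. The function name `load_edges` is hypothetical.

```python
def load_edges(lines):
    """Parse 'u v' edge lines and emit both directions of each friendship."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        u, v = fields
        pairs.append((u, v))  # u lists v as a friend
        pairs.append((v, u))  # and v lists u
    return pairs


# Hypothetical lines in the assumed edge-list format.
sample = ["0 1", "0 2", "", "1 2"]
print(load_edges(sample))
# [('0', '1'), ('1', '0'), ('0', '2'), ('2', '0'), ('1', '2'), ('2', '1')]
```

To read a real file, pass an open file object, for example `load_edges(open(path))`. In Spark the same expansion would typically be a flatMap that returns both tuples for each line. If the file already lists both directions, this produces duplicate pairs, which do no harm as long as friends are collected into sets.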

Challenges faced

1. The coding went smoothly, without any difficulties.

Integration and Milestones

Since all the cases are independent of each other, no difficulties were faced during integration.

Link for YouTube Video

https://youtu.be/FPTmD63Nh9A

Team Members Links

Use case 1 - FacebookMutualFriendsUsingSpark

Wiki link: https://github.com/RoshiniVarada/BDP_Project2/wiki/CASE1

Use case 2 - Spark Data Frames

Wiki link: https://github.com/RoshiniVarada/BDP_Project2/wiki/CASE2

Use case 3 - Spark Streaming

Wiki link: https://github.com/RoshiniVarada/BDP_Project2/wiki/CASE3

Use case 4 - Spark GraphX

Wiki link: https://github.com/RoshiniVarada/BDP_Project2/wiki/CASE4