exam1 - manaswinivedula/Big-Data-Programming GitHub Wiki

Project-based Exam-1

Task1:

Introduction: Facebook is one of the most popular social media applications in today's world and generates terabytes of data every day. The task is to find the mutual friends for all the input pairs, the same way Facebook does, using the big data technology Hadoop MapReduce.

Objective: To create a Hadoop MapReduce algorithm to find the mutual friends for the given input pairs.

Approach:

The flow diagram below shows the flow of data between the mapper and reducer phases.

1. Mapper class:

I. Initially, each line of the input file is taken and split into two columns, user_accounts and mutuals, on the symbol ‘->’.

II. Then, for each mutual friend in the mutuals column, account_user is compared with the mutual friend. If the comparison result is < 0, the pair (account_user, mutual) is returned; otherwise the pair (mutual, account_user) is returned. This way both users of a pair produce the same key.

III. The output of the mapper phase consists of these pairs, each along with one of their mutual friends. The output is written into conf.
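The pair-ordering rule in step II can be sketched in plain Java, without the Hadoop classes. This is only an illustration, not the project's actual mapper: the class name `MapperSketch`, the whitespace-separated friend list, the comma-joined key, and the choice to emit the user's whole friend list as the value are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class MapperSketch {

    // Orders the two names so that (A, B) and (B, A) produce the same key.
    static String pairKey(String accountUser, String friend) {
        return accountUser.compareTo(friend) < 0
                ? accountUser + "," + friend
                : friend + "," + accountUser;
    }

    // Simulates one map() call and returns "key<TAB>value" records.
    // Assumed line format: "A -> B C D" (friends separated by whitespace).
    static List<String> map(String line) {
        List<String> records = new ArrayList<>();
        String[] columns = line.split("->");
        if (columns.length != 2) {
            return records; // skip lines that do not split into two columns
        }
        String accountUser = columns[0].trim();
        String friendList = columns[1].trim();
        if (friendList.isEmpty()) {
            return records; // user with no friends listed
        }
        for (String friend : friendList.split("\\s+")) {
            // Assumption: the value is the user's full friend list.
            records.add(pairKey(accountUser, friend) + "\t" + friendList);
        }
        return records;
    }

    public static void main(String[] args) {
        for (String record : map("B -> A C D")) {
            System.out.println(record);
        }
    }
}
```

For the line `B -> A C D`, the sketch produces the keys `A,B`, `B,C` and `B,D`, so the record for the pair of A and B gets the same key regardless of which user's line it came from.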

2. Reducer class:

I. In the reducer phase, each output from the conf is taken.

II. Hash functions are used to eliminate the duplicates, whereas string builder functions are used to store the output result.

III. Each pair is compared, and the non-duplicate mutual friends are added to the string builder.

IV. Finally, the output is stored in the result and written to conf.
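The reducer steps above can be sketched in plain Java without the Hadoop classes. This is a hedged illustration, and the project's actual reducer may differ. It assumes that each pair key arrives with the whitespace-separated friend lists of both users as its values, that a hash set records the names already seen, and that a name found in a second list is a mutual friend. The class name `ReducerSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ReducerSketch {

    // Simulates one reduce() call for a single pair key.
    // Assumption: each value is one user's whitespace-separated friend list.
    static String reduce(List<String> friendLists) {
        Set<String> seen = new HashSet<>();          // names from earlier lists
        Set<String> mutual = new LinkedHashSet<>();  // de-duplicated mutual friends
        for (String friendList : friendLists) {
            // De-duplicate within one list so a repeated name is not
            // mistaken for a mutual friend.
            Set<String> current =
                    new LinkedHashSet<>(Arrays.asList(friendList.trim().split("\\s+")));
            current.remove("");
            for (String friend : current) {
                if (seen.contains(friend)) {
                    mutual.add(friend);
                }
            }
            seen.addAll(current);
        }
        // Store the result in a StringBuilder, as the steps above describe.
        StringBuilder result = new StringBuilder();
        for (String friend : mutual) {
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(friend);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Pair (A,B): A's friends are "B C D", B's friends are "A C D E".
        System.out.println(reduce(Arrays.asList("B C D", "A C D E"))); // prints "C D"
    }
}
```

Using sets here means each mutual friend is written once per pair, even if the same name appears more than once in the input.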

3. Main class:

I. In the main class, all the setup is done, such as the configuration and job setup, along with checking whether the columns are split into 2 or not.

II. All the class names are set, including the mapper and reducer classes.

III. The input and output file formats are specified.

Workflow:

i. Initially, a Java project named FbMutualFriends is created in Eclipse, and a class named FbMutualFriends is created in it.

ii. All the external Hadoop libraries are added.

iii. A jar file named FbMutualFriends is created from the FbMutualFriends class.

This is the input file, which is saved as input.txt on the local system with the following values.

The input file viewed in the command prompt.

A directory named exam question1 is created in HDFS.

i. The input.txt file is imported into HDFS.

ii. The Fb mutual friends Hadoop MapReduce algorithm is then run on the input file using the FbMutualFriends jar, and the result is stored in the question1output directory.

Successful execution of the map and reduce phases.

5. The output of the Fb mutual friends job in the command prompt.

6. Output visualization in Hue.

Task4

Facebook is one of the most popular social media applications in today’s world and generates terabytes of data every day. The task is to find the mutual friends for all the input pairs, the same way Facebook does, using the big data technology Hadoop MapReduce.
As people use Facebook extensively, new accounts keep being created, and people nowadays are more willing to socialize. With the help of Hadoop MapReduce, we can find the mutual friends between two individuals, as it is highly scalable and its processing is fast and efficient even for high volumes of data. People can then tell whether they know a person or not with the help of mutual contacts.
I did Task1, the Hadoop MapReduce algorithm for finding Facebook mutual friends.
The challenge I faced was eliminating the duplicate key-value pairs in the reducer class, but I later came up with the idea of using a hash function.
Everyone in the team divided the tasks equally and worked on them individually, but whenever someone faced a challenge, we figured it out as a team.

GitHub links:

Task 2.1

Task 2.2

Task 3