LAB 01 - acikgozmehmet/BigDataProgramming GitHub Wiki

LAB 01:

Contributors

  • Name: Mehmet Acikgoz
  • Class Id: 1

Table of Contents:

Objectives

a. Question-1: Use Case: Hadoop Map-Reduce Algorithm


Hadoop Map-Reduce Algorithm

Finding Facebook common friends:

Facebook has a list of friends (note that friendship is bi-directional on Facebook: if I'm your friend, you're mine). They also have lots of disk space and serve hundreds of millions of requests every day. They've decided to pre-compute calculations when they can to reduce the processing time of requests. One common processing request is the "You and Joe have 230 friends in common" feature. When you visit someone's profile, you see a list of friends that you have in common. We're going to use MapReduce to calculate everyone's common friends.

Assume the friends are stored as Person->[List of Friends]. Our friends list is then (depicted as "input" in the following figure):

During the Split-phase: Each line of the input is distributed across the cluster, and each line becomes the argument to one mapper call.

During the Map-phase: For every friend in the list of friends, the mapper outputs a key-value pair. The key is the pair formed by the person and that friend. The value is the person's whole list of friends. The two names in the key are sorted, so (A, B) and (B, A) produce the same key and both records go to the same reducer.
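The map step described above can be sketched in plain Java, without any Hadoop dependency. This is a minimal illustration and not the MutualFriends source itself: the class name `MapPhaseSketch`, the `pairKey` helper, the comma-separated key format, and the sample line `B -> A C D E` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch of the map step (hypothetical names, no Hadoop classes).
class MapPhaseSketch {

    // Builds the pair key with the two names in sorted order,
    // so ("B", "A") and ("A", "B") both give "A,B".
    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "," + b : b + "," + a;
    }

    // Emits one "key<TAB>value" record per friend of the given person.
    // The value is always the person's complete friend list.
    static List<String> map(String person, List<String> friends) {
        String value = String.join(" ", friends);
        List<String> records = new ArrayList<>();
        for (String friend : friends) {
            records.add(pairKey(person, friend) + "\t" + value);
        }
        return records;
    }

    public static void main(String[] args) {
        // Assumed sample input line: B -> A C D E
        for (String record : map("B", Arrays.asList("A", "C", "D", "E"))) {
            System.out.println(record);
        }
    }
}
```

Sorting the two names inside the key is what makes the record emitted for A's line and the record emitted for B's line meet under the single key `A,B`.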

Mapper:

During the Shuffle-phase: Before we send these key-value pairs to the reducers, we group them by their keys and get:
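In Hadoop the framework performs this grouping itself. The following plain-Java sketch only imitates it to show what the reducers receive. The class name `ShuffleSketch`, the tab-separated record format, and the sample records are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java imitation of the shuffle step (hypothetical names).
class ShuffleSketch {

    // Groups "key<TAB>value" records by key. A TreeMap keeps the keys sorted,
    // similar to the sorted key order a reducer sees.
    static Map<String, List<String>> shuffle(List<String> records) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String record : records) {
            String[] keyValue = record.split("\t", 2);
            grouped.computeIfAbsent(keyValue[0], k -> new ArrayList<>()).add(keyValue[1]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Assumed mapper output for the lines "A -> B C D" and "B -> A C D E".
        List<String> records = Arrays.asList(
                "A,B\tB C D", "A,C\tB C D", "A,D\tB C D",
                "A,B\tA C D E", "B,C\tA C D E", "B,D\tA C D E", "B,E\tA C D E");
        for (Map.Entry<String, List<String>> entry : shuffle(records).entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Once every person's line has been mapped, each pair of friends ends up with exactly two friend lists under its key, one contributed by each member of the pair.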

During the Reduce-phase: Each grouped line is passed as an argument to a reducer. The reduce function simply intersects the lists of values and outputs the same key with the result of the intersection.
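The intersection performed by the reduce function can be sketched as follows. This is a plain-Java illustration under assumptions, not the MutualFriends reducer: the class name `ReducePhaseSketch`, the space-separated list format, and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch of the reduce step (hypothetical names).
class ReducePhaseSketch {

    // Intersects all friend lists received for one key and
    // returns the common friends as a space-separated string.
    static String reduce(List<String> friendLists) {
        Set<String> common = null;
        for (String list : friendLists) {
            Set<String> friends = new LinkedHashSet<>(Arrays.asList(list.trim().split("\\s+")));
            if (common == null) {
                common = friends;          // first list seeds the result
            } else {
                common.retainAll(friends); // keep only names present in both
            }
        }
        return common == null ? "" : String.join(" ", common);
    }

    public static void main(String[] args) {
        // Assumed values grouped under the key "A,B":
        // A's friend list and B's friend list.
        String result = reduce(Arrays.asList("B C D", "A C D E"));
        System.out.println("A,B\t" + result); // common friends of A and B: C D
    }
}
```

`Set.retainAll` does the intersection in place, and the `LinkedHashSet` preserves the order of the first list in the output.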

Please click on the link to reach the full source for MutualFriends.

Please click on the link to reach the source for generating your own test data.
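For readers who want a feel for what such test data needs to look like, here is a minimal sketch of a generator. It is not the linked generator source. The class name `FriendDataSketch`, the user names `U0`, `U1`, ..., the `Person -> friends` line format, and the probability and seed parameters are all assumptions. The one property it takes from the problem statement is that friendship is bi-directional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical generator for symmetric friend lists.
class FriendDataSketch {

    // Builds a random friendship graph over users U0..U(users-1).
    // Every friendship is added in both directions.
    static Map<String, TreeSet<String>> generate(int users, double probability, long seed) {
        Random random = new Random(seed);
        Map<String, TreeSet<String>> graph = new TreeMap<>();
        for (int i = 0; i < users; i++) {
            graph.put("U" + i, new TreeSet<>());
        }
        for (int i = 0; i < users; i++) {
            for (int j = i + 1; j < users; j++) {
                if (random.nextDouble() < probability) {
                    graph.get("U" + i).add("U" + j);
                    graph.get("U" + j).add("U" + i);
                }
            }
        }
        return graph;
    }

    // Formats the graph as one "Person -> friend friend ..." line per user.
    static List<String> toLines(Map<String, TreeSet<String>> graph) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, TreeSet<String>> entry : graph.entrySet()) {
            lines.add(entry.getKey() + " -> " + String.join(" ", entry.getValue()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Fixed seed so repeated runs produce the same data.
        for (String line : toLines(generate(5, 0.6, 42L))) {
            System.out.println(line);
        }
    }
}
```

The line format would have to be adapted to whatever the actual mapper expects before the output is copied to HDFS.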

Implementation:

Test-1:

(For the given example above.)

hadoop jar MutualFriends-1.0.jar MutualFriends /user/cloudera/lab1/test5 /user/cloudera/lab1/test5out

Test-2:

(New test data was created with [UserFriendsTestData.java](https://github.com/acikgozmehmet/BigDataProgramming/blob/master/Lab1/question1/SourceCode/UserFriendsTestData.java).)

hadoop jar MutualFriends-1.0.jar MutualFriends /user/cloudera/lab1/test10 /user/cloudera/lab1/test10out