Exam 2 Part 1: Facebook Friends - gabriellawillis/BigData GitHub Wiki

Part 1 Source Code:

SourceCode

Technology Used:

  • GitHub
  • IntelliJ
  • Apache Spark

a) Implement a MapReduce algorithm for the Facebook common-friends problem and run the MapReduce job on Apache Spark.

Create Mapper Function

def FacebookMapper(line: String) = {
  // Each input line lists a user followed by that user's friends.
  val words = line.split(" ")
  val key = words(0)
  val friends = words.drop(1)
  // Order each (user, friend) pair so both endpoints emit the same key,
  // then tag every pair with this user's full friend set.
  friends.map { friend =>
    val pair = if (key < friend) (key, friend) else (friend, key)
    (pair, friends.toSet)
  }
}
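As a quick sanity check, the same mapper logic can be exercised locally without Spark, using plain Scala collections. The sample line `"0 1 2 3"` is made up for illustration:

```scala
object MapperDemo extends App {
  // Same logic as FacebookMapper above, on plain Scala collections.
  def mapLine(line: String): Seq[((String, String), Set[String])] = {
    val words = line.split(" ")
    val key = words(0)          // the user this line describes
    val friends = words.drop(1) // that user's friend list
    friends.toSeq.map { friend =>
      // Order the pair so both endpoints emit the same key.
      val pair = if (key < friend) (key, friend) else (friend, key)
      (pair, friends.toSet)
    }
  }

  val out = mapLine("0 1 2 3")
  // Each of user 0's three friends yields one ordered pair key,
  // tagged with 0's full friend set.
  assert(out.size == 3)
  assert(out.contains((("0", "1"), Set("1", "2", "3"))))
}
```

Note that both users in a pair emit the same key because the pair is always sorted, which is what lets `reduceByKey` bring their two friend sets together.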

Create Reducer Function

def FacebookReducer(accumulator: Set[String], set: Set[String]) = {
  // Intersect the friend sets of the two users in the pair;
  // whoever survives is friends with both, i.e. a mutual friend.
  accumulator intersect set
}

// Given input file = facebook_combined.txt
val file = sc.textFile("/Users/gabriellawillis/Desktop/facebook_combined.txt")

val results = file.flatMap(FacebookMapper)
  .reduceByKey(FacebookReducer)
  .filter(_._2.nonEmpty) // drop pairs with no mutual friends
  .sortByKey()

results.collect.foreach { case (pair, mutual) =>
  println(s"$pair , (${mutual.mkString(" ")})")
}
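The whole map/reduce pipeline above can also be simulated locally with Scala collections, with `groupBy` plus a `reduce` standing in for Spark's `reduceByKey`. The four-line social graph here is a made-up example in the same "user friend friend ..." format:

```scala
object PipelineDemo extends App {
  // Mapper: same pair-key logic as above, on plain collections.
  def mapLine(line: String): Seq[((String, String), Set[String])] = {
    val words = line.split(" ")
    val (key, friends) = (words(0), words.drop(1))
    friends.toSeq.map { f =>
      (if (key < f) (key, f) else (f, key), friends.toSet)
    }
  }

  // Tiny made-up social graph: A is friends with B, C, D, and so on.
  val lines = Seq("A B C D", "B A C", "C A B", "D A")

  val results = lines
    .flatMap(mapLine)
    .groupBy(_._1) // stands in for reduceByKey
    .map { case (pair, vs) => (pair, vs.map(_._2).reduce(_ intersect _)) }
    .filter(_._2.nonEmpty)

  // A and B are both friends with C, so C is their only mutual friend.
  assert(results(("A", "B")) == Set("C"))
  // A and D share no other friends, so the (A, D) pair is filtered out.
  assert(!results.contains(("A", "D")))
}
```

Because each pair key carries both users' friend sets into the reduce step, the intersection leaves exactly the users who appear in both lists.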

Output Facebook Friends Data

Explain the following questions in the context of your final class project. (Mandatory for all students individually)

  • 1. Explain the idea of your work done for this exam briefly.

    • I implemented a MapReduce job that produces key-value pairs for users and their Facebook friends. The input data is split into lines and distributed among the nodes; the mapper function emits each pair of users as a key together with one user's friend set, and the reducer function intersects those sets to find the mutual friends, which are then written to the output file.
  • 2. Explain the usage of all of the above in today's world.

    • My code could be used to find mutual friends on Facebook or, with other data, find mutual connections between nodes.
  • 3. Mention clearly the portion of the project on which you have worked.

    • I worked on Part 1 - Hadoop MapReduce on Spark - Finding Facebook Common Friends.
  • 4. What challenges did you face during the development process?

    • Dividing up the data correctly was the most challenging part of this project.
  • 5. Explain the milestones of your project and briefly discuss how you integrated your part (e.g. based on queries) with the other team members' work, and what issues (e.g. compatibility) you faced.

    • Gabriella Willis | Hadoop MapReduce Algorithm

    • Elizabeth Nastoff | Spark Data Frames (Parts A and B)

    • Jun | Spark Data Frames (Parts C and D)

    • Uyen Dang | Spark Streaming Task

    • Brett Recker | GraphX