ICP 9 - a190884810/Big-Data-Programming GitHub Wiki

SPARK Algorithm implementation

PROBLEM STATEMENT

  1. Create a Map-Reduce Program to perform Merge-Sort Algorithm in Spark.
  2. Implement Depth First Search in Graph in Apache Spark.

Feature

  • Intellij IDE is installed with Scala plugin. Therein, MergeSorting is performed using sortByKey() function. Alongside, BFS is also performed.

Approach

Question 1

  • Merge sort is a divide and conquer approach. There are two functions used to perform this sorting. They are merge() and mergesort(). mergesort() is a function used to sort the array by splitting arrays based on their mid-value. merge() function on the other hand, is used to merge multiple number of splitted arrays. Initially, An array is provided as an input. This array is later parallelised. The following is the screenshot depicting input.

INPUT

  • Now, the array is passed to mergesort() function. Initially, a mid-point is found out by dividing the first and last value numbers by 2. This division of arrays is done consecutively. After division, merge() function is called. In merge() function, the sizes of each sub-array is found. Temporary arrays, Left and Right are created. After loading the data into these temporary arrays, Comparison of terms in these two temporary arrays is done. The element which is less in one array than the other one will be the element that will be put in the merged array. The following screenshots depicts the code.

OUTPUT

QUESTION 2

  • The depth-first search algorithm allows us to determine whether two nodes, node x and node y, have a path between them. The DFS algorithm does this by looking at all of the children of the starting node, node x, until it reaches node y. It does this by recursively taking the same steps, again and again, in order to determine if such a path between two nodes even exists.
def DFS(start: Vertex, g: Graph): List[Vertex] = {

  def DFS0(v: Vertex, visited: List[Vertex]): List[Vertex] = {
    if (visited.contains(v))
      visited
    else {
      val neighbours:List[Vertex] = g(v) filterNot visited.contains 
      neighbours.foldLeft(v :: visited)((b,a) => DFS0(a,b))
    } 
  }
  DFS0(start,List()).reverse
} 

INPUT

OUTPUT

REFERENCE

  1. https://gist.github.com/fancellu/5e00336840098c194551
  2. https://dzone.com/articles/how-could-scala-do-merge-sort
  3. https://medium.com/basecs/deep-dive-through-a-graph-dfs-traversal-8177df5d0f13