ICP 9 - a190884810/Big-Data-Programming GitHub Wiki
SPARK Algorithm implementation
PROBLEM STATEMENT
- Create a Map-Reduce Program to perform Merge-Sort Algorithm in Spark.
- Implement Depth First Search in Graph in Apache Spark.
Feature
- Intellij IDE is installed with Scala plugin. Therein, MergeSorting is performed using sortByKey() function. Alongside, BFS is also performed.
Approach
Question 1
- Merge sort is a divide and conquer approach. There are two functions used to perform this sorting. They are merge() and mergesort(). mergesort() is a function used to sort the array by splitting arrays based on their mid-value. merge() function on the other hand, is used to merge multiple number of splitted arrays. Initially, An array is provided as an input. This array is later parallelised. The following is the screenshot depicting input.
INPUT
- Now, the array is passed to mergesort() function. Initially, a mid-point is found out by dividing the first and last value numbers by 2. This division of arrays is done consecutively. After division, merge() function is called. In merge() function, the sizes of each sub-array is found. Temporary arrays, Left and Right are created. After loading the data into these temporary arrays, Comparison of terms in these two temporary arrays is done. The element which is less in one array than the other one will be the element that will be put in the merged array. The following screenshots depicts the code.
OUTPUT
QUESTION 2
- The depth-first search algorithm allows us to determine whether two nodes, node x and node y, have a path between them. The DFS algorithm does this by looking at all of the children of the starting node, node x, until it reaches node y. It does this by recursively taking the same steps, again and again, in order to determine if such a path between two nodes even exists.
def DFS(start: Vertex, g: Graph): List[Vertex] = {
def DFS0(v: Vertex, visited: List[Vertex]): List[Vertex] = {
if (visited.contains(v))
visited
else {
val neighbours:List[Vertex] = g(v) filterNot visited.contains
neighbours.foldLeft(v :: visited)((b,a) => DFS0(a,b))
}
}
DFS0(start,List()).reverse
}
INPUT
OUTPUT
REFERENCE
- https://gist.github.com/fancellu/5e00336840098c194551
- https://dzone.com/articles/how-could-scala-do-merge-sort
- https://medium.com/basecs/deep-dive-through-a-graph-dfs-traversal-8177df5d0f13