MODULE 2 ICP 2 - navyagonug/CS5590-BIG-DATA-PROGRAMMING-USING-HADOOP-AND-SPARK GitHub Wiki

PROBLEM STATEMENT

Create a Map-Reduce Program to perform Merge-Sort Algorithm in Spark.
Implement Breadth First Search in Graph in Apache Spark.

FEATURES

Intellij IDE is installed with Scala plugin. Therein, MergeSorting is performed using sortByKey() function. Alongside, BFS is also performed.

APPROACH

QUESTION 1

Merge sort is a divide and conquer approach. There are two functions used to perform this sorting. They are merge() and mergesort(). mergesort() is a function used to sort the array by splitting arrays based on their mid-value. merge() function on the other hand, is used to merge multiple number of splitted arrays. Initially, An array is provided as an input. This array is later parallelised. The following is the screenshot depicting input.

INPUT

Now, the array is passed to mergesort() function. Initially, a mid-point is found out by dividing the first and last value numbers by 2. This division of arrays is done consecutively. After division, merge() function is called. In merge() function, the sizes of each sub-array is found. Temporary arrays, Left and Right are created. After loading the data into these temporary arrays, Comparison of terms in these two temporary arrays is done. The element which is less in one array than the other one will be the element that will be put in the merged array. The following screenshots depicts the code.

OUTPUT

QUESTION 2

Breadth First Search is an approach where in the visited node is marked so that coming back to the already visited node is avoided. In the code, The initial configurations is done and spark context is set. A graph is given in the form of lists. List is an immutable sequence of elements implemented in the form of linked list. A function BFS is defined with several parameters. These include, starting vertex as well as the entire graph. Inside, One more function BFS0 is defined with elements of each list as well a visited list containing the vertices that are already visited. Graph is passed to flatmap. Alongside, FilterNot function is used that unfilters all the visited nodes. If the neighbors are empty then it is assumes that the neighbors are already visited. Else, they are traversed and are set to visited. The following screenshots depicts the code,input and output.

MODULE 2 ICP 2 - navyagonug/CS5590-BIG-DATA-PROGRAMMING-USING-HADOOP-AND-SPARK GitHub Wiki

PROBLEM STATEMENT

FEATURES

APPROACH

QUESTION 1

INPUT

OUTPUT

QUESTION 2

INPUT

OUTPUT

REFERENCES