Big_Data_Programming_ICP_2_Module 2 - kusamdinesh/Big-Data-and-Hadoop GitHub Wiki

First, we need to set the Spark and Scala versions in build.sbt.
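A minimal build.sbt along these lines would do it; the project name and the exact Spark/Scala versions below are assumptions, not the ones used in the original ICP:

```scala
// build.sbt -- example only; name and version numbers are assumptions
name := "BigDataICP2"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
```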

Task 1 :

Write a MapReduce program to perform Merge Sort Algorithm.

Procedure :

The algorithm is implemented in the MapReduce style. The given input array is divided into two sub-arrays, each sub-array is sorted, and the results are merged. Two functions carry this out: mergesort() and merge(). mergesort() splits the array at its middle index and recursively sorts each half; merge() then combines the sorted halves back into a single array. A single array is taken as input, parallelised, and the two functions are applied to it.

Initially, the midpoint is computed as the average of the first and last index numbers. The mergesort() function is called on each of the two sub-arrays obtained from the split, sorting them recursively. The merge() function then builds the merged array by repeatedly taking the smaller of the two elements at the heads of the sub-arrays and inserting it next.
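The mergesort()/merge() pair described above can be sketched in plain Scala as follows; this is a local sketch only, and the Spark step of parallelising the input array (e.g. via sc.parallelize) is omitted here:

```scala
object MergeSortSketch {
  // mergesort(): split at the middle index, sort each half recursively, then merge.
  def mergesort(arr: Array[Int]): Array[Int] =
    if (arr.length <= 1) arr
    else {
      val mid = arr.length / 2              // midpoint of the index range
      val (left, right) = arr.splitAt(mid)
      merge(mergesort(left), mergesort(right))
    }

  // merge(): combine two sorted arrays by repeatedly taking the smaller head element.
  def merge(left: Array[Int], right: Array[Int]): Array[Int] = {
    val out = Array.newBuilder[Int]
    var i = 0
    var j = 0
    while (i < left.length && j < right.length) {
      if (left(i) <= right(j)) { out += left(i); i += 1 }
      else                     { out += right(j); j += 1 }
    }
    // One side is exhausted; append whatever remains of the other.
    out ++= left.drop(i)
    out ++= right.drop(j)
    out.result()
  }
}
```

For example, MergeSortSketch.mergesort(Array(5, 3, 8, 1)) yields Array(1, 3, 5, 8).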

Output:

Task 2 :

Write a program to perform Depth First Search Algorithm.

Procedure :

We define a function DFS that takes several parameters: the starting vertex and the entire graph. A helper DFS0 is defined that works through the elements of each adjacency list while carrying a visited list of the vertices already traversed. The result varies depending upon the starting vertex; for this program I used vertex '3' as the starting vertex.
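A minimal sketch of this structure is shown below. The adjacency-map graph representation, the integer vertex labels, and the exact helper signature are assumptions for illustration, not the original program:

```scala
object DFSSketch {
  // Graph as an adjacency map: each vertex maps to its list of neighbours (assumed representation).
  type Graph = Map[Int, List[Int]]

  // dfs: entry point, starts from `start` with an empty visited list.
  def dfs(start: Int, graph: Graph): List[Int] = dfs0(List(start), graph, Nil)

  // dfs0: helper carrying the frontier of vertices to explore and the visited list.
  @annotation.tailrec
  def dfs0(frontier: List[Int], graph: Graph, visited: List[Int]): List[Int] =
    frontier match {
      case Nil => visited.reverse                       // done: return visit order
      case v :: rest if visited.contains(v) =>
        dfs0(rest, graph, visited)                      // skip already-visited vertices
      case v :: rest =>
        // Push unvisited neighbours onto the front of the frontier (depth-first).
        val next = graph.getOrElse(v, Nil).filterNot(visited.contains)
        dfs0(next ++ rest, graph, v :: visited)
    }
}
```

With a sample graph such as Map(3 -> List(1, 2), 1 -> List(3, 4), 2 -> List(3), 4 -> List(1)), calling DFSSketch.dfs(3, graph) visits the vertices depth-first starting from vertex 3.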

Output:

References :

- https://codereview.stackexchange.com/questions/29699/bfs-and-dfs-in-scala
- https://stackoverflow.com/questions/5471234/how-to-implement-a-dfs-with-immutable-data-types