ICP 9 - PallaviArikatla/Big-Data-Programming GitHub Wiki

INTRODUCTION: Implement a clustering algorithm (K-Means), a sorting algorithm (merge sort), and a graph traversal (depth-first search) using Spark and Scala.

Software Required:

Apache Spark.
IntelliJ IDEA with the Scala plugin installed.

IMPLEMENTATION:

Question 1: K-Means Clustering Algorithm.

K-Means clustering partitions a set of observations into K groups of similar observations. Each observation is assigned to the cluster with the nearest mean (centroid).

First choose the number of clusters; this is the K value (here it is picked arbitrarily).
Load a dataset as input and select the range of input values to cluster on.
Remove the header row, then use KMeans to cluster the data into K classes.
Compute the mean squared error and the centroid of each cluster.
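
The steps above can be sketched in plain Scala. This is only an illustration of the assign-and-update loop, not the Spark KMeans code used for this question: the object name `KMeansSketch`, the helper names, the sample rows, and the choice to seed the centroids with the first K points (so the result is repeatable) are all assumptions.

```scala
object KMeansSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the centroid nearest to p (the first one wins on a tie).
  def nearest(p: Point, centroids: Vector[Point]): Int =
    centroids.indices.minBy(i => sqDist(p, centroids(i)))

  // Component-wise mean of a non-empty group of points.
  def mean(ps: Seq[Point]): Point =
    ps.transpose.map(col => col.sum / ps.size).toVector

  // Drop the header line and parse the remaining comma-separated rows.
  def parse(lines: Seq[String]): Vector[Point] =
    lines.drop(1).map(_.split(",").map(_.trim.toDouble).toVector).toVector

  // Repeat the assign/update steps a fixed number of times.
  def kMeans(points: Vector[Point], k: Int, iterations: Int): Vector[Point] = {
    // Assumption: the first k points seed the centroids (not a random pick).
    var centroids = points.take(k)
    for (_ <- 1 to iterations) {
      val groups = points.groupBy(p => nearest(p, centroids))
      // A centroid that attracted no points keeps its previous position.
      centroids = centroids.indices
        .map(i => groups.get(i).map(mean).getOrElse(centroids(i)))
        .toVector
    }
    centroids
  }

  // Mean of the squared distances from each point to its nearest centroid.
  def meanSquaredError(points: Vector[Point], centroids: Vector[Point]): Double =
    points.map(p => sqDist(p, centroids(nearest(p, centroids)))).sum / points.size

  // Hypothetical sample input: one header row followed by four 2-D points.
  val sampleCsv: Seq[String] = Seq("x,y", "1.0,1.0", "9.0,9.0", "1.0,2.0", "9.0,8.0")
  val samplePoints: Vector[Point] = parse(sampleCsv)
  val sampleCentroids: Vector[Point] = kMeans(samplePoints, k = 2, iterations = 5)
}
```

For the sample rows, `KMeansSketch.sampleCentroids` settles on the centroids (1.0, 1.5) and (9.0, 8.5), and `KMeansSketch.meanSquaredError(KMeansSketch.samplePoints, KMeansSketch.sampleCentroids)` is 0.25. With real data the number of iterations and the seeding strategy both affect the result.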

OUTPUT:

Question 2: Merge Sort.

Write a merge sort method that takes an input list and works from the middle index of that list.
Divide the length of the input list by 2 to get the middle index.
If the middle index is zero (the list has at most one element), the method returns the input list unchanged.
Otherwise, split the list at the middle index, call the merge sort method on each half, and merge the two sorted halves into a single list.
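
A minimal sketch of these steps, assuming a list of integers; the names `MergeSortSketch`, `merge`, and `mergeSort` are chosen here for illustration and may differ from the code written for this question.

```scala
object MergeSortSketch {
  // Merge two already-sorted lists into one sorted list.
  def merge(left: List[Int], right: List[Int]): List[Int] = (left, right) match {
    case (Nil, _) => right
    case (_, Nil) => left
    case (l :: ls, r :: rs) =>
      if (l <= r) l :: merge(ls, right) // take the smaller head first
      else r :: merge(left, rs)
  }

  def mergeSort(xs: List[Int]): List[Int] = {
    val middle = xs.length / 2
    if (middle == 0) xs // zero or one element: already sorted
    else {
      val (left, right) = xs.splitAt(middle)
      merge(mergeSort(left), mergeSort(right))
    }
  }
}
```

For example, `MergeSortSketch.mergeSort(List(5, 2, 9, 1, 5, 6))` returns `List(1, 2, 5, 5, 6, 9)`. The `merge` helper is not tail-recursive, so this version is meant for short lists only.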

OUTPUT:

Question 3: Depth First Search.

Depth-first search lets us determine whether a path exists between one node and another.
The input graph is given as an adjacency map: 1 -> List(7,9), 7 -> List(1,8), 8 -> List(7,9), 9 -> List(1,8)
The traversal starts at node 1, which is passed to the DFS method.
The method keeps moving on to a neighboring node as long as that node has not been visited yet; nodes that were already visited are skipped.
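
The traversal can be sketched as follows, using the adjacency map given above. The object name `DfsSketch`, the helper `visit`, and the choice to return the nodes in first-visit order are assumptions made for this illustration.

```scala
object DfsSketch {
  // Adjacency map from the input description.
  val graph: Map[Int, List[Int]] = Map(
    1 -> List(7, 9),
    7 -> List(1, 8),
    8 -> List(7, 9),
    9 -> List(1, 8)
  )

  // Returns the nodes reachable from `start`, in the order they are first visited.
  def dfs(start: Int, graph: Map[Int, List[Int]]): List[Int] = {
    // `visited` holds the nodes seen so far, most recent first.
    def visit(node: Int, visited: List[Int]): List[Int] =
      if (visited.contains(node)) visited // already seen: skip it
      else
        graph.getOrElse(node, Nil)
          .foldLeft(node :: visited)((acc, next) => visit(next, acc))

    visit(start, Nil).reverse
  }
}
```

In this sketch, `DfsSketch.dfs(1, DfsSketch.graph)` returns `List(1, 7, 8, 9)`. A path from node 1 to some other node exists exactly when that node appears in the returned list, for example `DfsSketch.dfs(1, DfsSketch.graph).contains(8)`.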
The output is as follows.

OUTPUT: