ICP 9 - PallaviArikatla/Big-Data-Programming GitHub Wiki
INTRODUCTION: To implement a clustering algorithm, a sorting algorithm, and a graph search using Spark and Scala.
Software Required:
- Spark.
- IntelliJ with the Scala plugin installed.
IMPLEMENTATION:
Question 1: K-Means Clustering Algorithm.
K-Means clustering partitions a set of observations into K groups. Each observation is assigned to the cluster whose mean (centroid) is nearest.
- The number of clusters, K, is chosen first; the initial cluster centers are typically selected at random.
- Take a dataset as input and select the input ranges to cluster on.
- Remove the header rows and use K-Means to cluster the data into classes.
- Calculate the mean squared error and the centroid of each cluster.
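The steps above can be sketched in plain Scala without Spark, to show what the clustering computes. This is a minimal sketch under stated assumptions, not the code used for this ICP: the object name `KMeansSketch`, its helper names, the comma-separated input with a single header line, and the choice of the first K points as starting centroids (instead of a random pick, so the result is reproducible) are all illustrative. In a Spark job, MLlib's `KMeans` would take the place of the hand-written loop.

```scala
object KMeansSketch {
  // Hypothetical alias: one observation is a vector of numeric features.
  type Point = Vector[Double]

  // Assumption: the first line is a header and the rest are comma-separated numbers.
  def parse(lines: Seq[String]): Vector[Point] =
    lines.drop(1).map(_.split(",").map(_.trim.toDouble).toVector).toVector

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the centroid nearest to p (the first one wins on ties).
  def nearest(p: Point, centroids: Vector[Point]): Int =
    centroids.indices.minBy(i => sqDist(p, centroids(i)))

  // Component-wise mean of a non-empty group of points.
  def mean(ps: Seq[Point]): Point =
    ps.transpose.map(col => col.sum / col.size).toVector

  // Repeats "assign each point to its nearest centroid, then recompute the means".
  def kMeans(points: Vector[Point], k: Int, iterations: Int): Vector[Point] = {
    // Assumption: start from the first k points rather than a random selection.
    var centroids = points.take(k)
    for (_ <- 1 to iterations) {
      val groups = points.groupBy(p => nearest(p, centroids))
      // A centroid that attracted no points keeps its previous position.
      centroids = centroids.indices
        .map(i => groups.get(i).map(mean).getOrElse(centroids(i)))
        .toVector
    }
    centroids
  }

  // Mean of the squared distances from each point to its nearest centroid.
  def meanSquaredError(points: Vector[Point], centroids: Vector[Point]): Double =
    points.map(p => sqDist(p, centroids(nearest(p, centroids)))).sum / points.size
}
```

For the four rows (1, 1), (1, 2), (9, 9), (9, 10) with K = 2, this sketch converges to the centroids (1.0, 1.5) and (9.0, 9.5) with a mean squared error of 0.25.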
OUTPUT:
Question 2: Merge Sort.
- Write a merge sort method that takes an input list and returns its elements in sorted order.
- Compute the middle index by dividing the length of the input list by 2.
- If the middle index is zero, the list has at most one element and is already sorted, so the method returns the input list unchanged.
- Otherwise, split the list at the middle index, call the merge sort method on each half, and merge the two sorted halves into a single list.
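The steps above can be sketched as follows. This is a minimal sketch assuming a list of integers; the names `MergeSortSketch`, `mergeSort`, and `merge` are illustrative and not taken from the original code.

```scala
import scala.annotation.tailrec

object MergeSortSketch {
  // Merge two already-sorted lists into one sorted list.
  // acc collects the result in reverse order so the recursion stays tail-recursive.
  @tailrec
  def merge(left: List[Int], right: List[Int], acc: List[Int] = Nil): List[Int] =
    (left, right) match {
      case (Nil, _) => acc.reverse ::: right
      case (_, Nil) => acc.reverse ::: left
      case (l :: ls, r :: rs) =>
        if (l <= r) merge(ls, right, l :: acc)
        else merge(left, rs, r :: acc)
    }

  def mergeSort(xs: List[Int]): List[Int] = {
    val mid = xs.length / 2
    if (mid == 0) xs // zero or one element: already sorted
    else {
      val (left, right) = xs.splitAt(mid)
      merge(mergeSort(left), mergeSort(right))
    }
  }
}
```

For example, `MergeSortSketch.mergeSort(List(5, 2, 9, 1, 5, 6))` returns `List(1, 2, 5, 5, 6, 9)`.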
OUTPUT:
Question 3: Depth First Search.
- The Depth First search allows us to identify whether there is any path between one node and another.
- The input is given as an adjacency list: 1 -> List(7, 9), 7 -> List(1, 8), 8 -> List(7, 9), 9 -> List(1, 8)
- The search starts at node 1, which is passed to the DFS method.
- From each node, the function keeps moving on to a neighbouring node as long as that node has not been visited yet.
- And the output will be as follows.
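The traversal described in the steps above can be sketched as follows. This is a minimal sketch assuming the graph is held as a `Map` from each node to its list of neighbours; the names `DfsSketch`, `dfs`, and `hasPath` are illustrative and not taken from the original code.

```scala
object DfsSketch {
  // Hypothetical alias: adjacency list keyed by node.
  type Graph = Map[Int, List[Int]]

  // Returns the nodes reachable from start, in the order DFS first visits them.
  def dfs(graph: Graph, start: Int): List[Int] = {
    // visited holds the nodes seen so far, most recent first.
    def visit(node: Int, visited: List[Int]): List[Int] =
      if (visited.contains(node)) visited // already seen: do not descend again
      else
        graph.getOrElse(node, Nil)
          .foldLeft(node :: visited)((seen, next) => visit(next, seen))

    visit(start, Nil).reverse
  }

  // A path exists from one node to another if DFS from the first reaches the second.
  def hasPath(graph: Graph, from: Int, to: Int): Boolean =
    dfs(graph, from).contains(to)
}
```

With the input above, `DfsSketch.dfs(Map(1 -> List(7, 9), 7 -> List(1, 8), 8 -> List(7, 9), 9 -> List(1, 8)), 1)` returns `List(1, 7, 8, 9)`.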
OUTPUT: