ICP 9 - PallaviArikatla/Big-Data-Programming GitHub Wiki

INTRODUCTION: To implement a clustering algorithm, a sorting technique, and a graph search using Spark and Scala.

Software Required:

  • Spark.

  • IntelliJ IDEA with the Scala plugin installed.

IMPLEMENTATION:

Question 1: K-Means Clustering Algorithm.

K-Means clustering partitions a set of observations into K groups of similar items. Each observation is assigned to the cluster whose mean (centroid) is nearest.

  • Initially, the number of clusters is chosen arbitrarily; this is our K value.
  • Take a dataset as input and select the input ranges to cluster on.
  • Remove the header row, then use K-Means to cluster the data into classes.
  • Calculate the mean squared error and the centroid of each cluster.
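The steps above can be sketched with Spark's RDD-based MLlib API. This is a minimal sketch under stated assumptions, not the code used for this ICP: the file path `data/input.csv`, the selected column indices, the value of K, the iteration count, and the helper `parseRow` are all hypothetical. The mean squared error is taken here to be the within-set sum of squared errors divided by the number of points.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansSketch {
  // Hypothetical helper: keep only the chosen columns of a CSV row, as doubles.
  def parseRow(line: String, columns: Seq[Int]): Array[Double] = {
    val fields = line.split(",").map(_.trim)
    columns.map(i => fields(i).toDouble).toArray
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KMeansSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Assumed input: a comma-separated file whose first line is a header.
    val lines = sc.textFile("data/input.csv")
    val header = lines.first()
    val columns = Seq(0, 1) // assumed numeric columns to cluster on

    // Drop the header and turn every remaining row into a dense vector.
    val points = lines
      .filter(_ != header)
      .map(line => Vectors.dense(parseRow(line, columns)))
      .cache()

    val k = 3              // assumed number of clusters
    val maxIterations = 20 // assumed iteration limit
    val model = KMeans.train(points, k, maxIterations)

    // computeCost returns the sum of squared distances to the nearest centroid.
    val wssse = model.computeCost(points)
    val mse = wssse / points.count()
    println(s"Mean squared error = $mse")

    model.clusterCenters.zipWithIndex.foreach { case (center, i) =>
      println(s"Centroid $i: $center")
    }

    sc.stop()
  }
}
```

Running in `local[*]` mode keeps the sketch runnable from IntelliJ without a cluster; the Spark MLlib dependency still has to be on the classpath.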

OUTPUT:

Question 2: Merge Sort.

  • Write a method called merge sort that takes an input list and sorts it, treating a list whose middle index is zero as already sorted.
  • Divide the length of the input list by 2 to get the middle index.
  • If the middle index is zero, the method returns the input list unchanged.
  • Otherwise the list is split at the middle index, the merge sort method is called on each half, and the two sorted halves are merged into a single list.
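One way to write this in plain Scala is sketched below. The names `MergeSortSketch`, `mergeSort`, and `merge` are illustrative choices, and the element type is assumed to be `Int`.

```scala
object MergeSortSketch {
  // Merge two already-sorted lists into one sorted list.
  // Not tail-recursive, so this sketch is meant for small inputs.
  def merge(left: List[Int], right: List[Int]): List[Int] = (left, right) match {
    case (Nil, _) => right
    case (_, Nil) => left
    case (l :: ls, r :: rs) =>
      if (l <= r) l :: merge(ls, right) else r :: merge(left, rs)
  }

  def mergeSort(xs: List[Int]): List[Int] = {
    val mid = xs.length / 2
    if (mid == 0) xs // zero or one element: already sorted
    else {
      val (left, right) = xs.splitAt(mid)
      merge(mergeSort(left), mergeSort(right))
    }
  }

  def main(args: Array[String]): Unit =
    println(mergeSort(List(38, 27, 43, 3, 9, 82, 10)))
}
```

Using `l <= r` in the merge step keeps equal elements in their original order, so the sort is stable.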

OUTPUT:

Question 3: Depth-First Search.

  • Depth-first search lets us determine whether there is a path between one node and another.
  • The input is given as an adjacency list: 1 -> List(7,9), 7 -> List(1,8), 8 -> List(7,9), 9 -> List(1,8)
  • The traversal starts at node 1, which is passed to the DFS method.
  • The function keeps moving on to a neighbouring node as long as that node has not been visited yet.
  • The result is printed as the output.
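A minimal recursive sketch of this traversal, using the adjacency list given above, might look as follows. The names `DfsSketch`, `dfs`, and `hasPath` are hypothetical, and the visited nodes are kept in a plain list for readability rather than speed.

```scala
object DfsSketch {
  // Return the nodes reachable from `start`, in the order they are first visited.
  def dfs(graph: Map[Int, List[Int]], start: Int): List[Int] = {
    def visit(node: Int, visited: List[Int]): List[Int] =
      if (visited.contains(node)) visited // already seen: do not descend again
      else graph.getOrElse(node, Nil)
        .foldLeft(node :: visited)((acc, next) => visit(next, acc))
    visit(start, Nil).reverse
  }

  // A path exists if `to` is reached while searching from `from`.
  def hasPath(graph: Map[Int, List[Int]], from: Int, to: Int): Boolean =
    dfs(graph, from).contains(to)

  def main(args: Array[String]): Unit = {
    val graph = Map(
      1 -> List(7, 9),
      7 -> List(1, 8),
      8 -> List(7, 9),
      9 -> List(1, 8)
    )
    println(dfs(graph, 1)) // prints List(1, 7, 8, 9)
  }
}
```

Starting from node 1, the search follows 7, then 8, then 9 before backtracking, which is why 9 appears last even though it is a direct neighbour of 1.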

OUTPUT: