ICP 9 - Gnkhakimova/CS5590-BigData GitHub Wiki

ICP 9

Source Code

Tasks

  1. Perform Merge sort using Spark
  2. Perform DFS

Configuration

  • Linux Mint
  • IntelliJ
  • Apache Spark

Features

In this ICP 9 we used the IntelliJ IDE to complete the tasks. We performed merge sort by defining our own functions, and we also implemented DFS for a graph.

Merge Sort

For merge sort we created a list of unsorted integers and parallelized it into an RDD. We then wrote two functions, one that sorts a list and one that merges sorted lists, and applied them through the RDD.
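The actual source is not reproduced on this page, so the following is only a minimal sketch of the two helper functions described above, written in plain Python without a Spark dependency. The names `merge` and `merge_sort`, the sample input, and the way the list is split into two "partitions" are all assumptions made for illustration.

```python
import functools


def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these slices is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(values):
    """Sort a list by recursively splitting it and merging the halves."""
    values = list(values)
    if len(values) <= 1:
        return values
    mid = len(values) // 2
    return merge(merge_sort(values[:mid]), merge_sort(values[mid:]))


# Hypothetical input data (not taken from the original page).
unsorted = [38, 27, 43, 3, 9, 82, 10]

# Assumption: two hand-made chunks stand in for RDD partitions.
partitions = [unsorted[:4], unsorted[4:]]

# Sort each chunk independently, then merge the sorted chunks together.
sorted_parts = [merge_sort(part) for part in partitions]
result = functools.reduce(merge, sorted_parts, [])
print(result)
```

With PySpark available, one possible way to apply the same helpers would be `sc.parallelize(unsorted).glom().map(merge_sort).reduce(merge)`, where `glom()` turns each partition into a list. This is a suggestion, not necessarily how the original code was structured.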

Output - Sorted List:

Depth First Search

For DFS we were given a graph and had to traverse it by visiting each node. DFS starts at the root node (in a graph, some arbitrary node is selected as the root) and explores as far as possible along each branch before backtracking.
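The traversal described above can be sketched as follows. This is an illustrative, iterative version in plain Python. The adjacency-list graph, the node names, and the function name `dfs` are hypothetical and not taken from the original source.

```python
def dfs(graph, start):
    """Return the nodes reachable from `start` in depth-first visiting order."""
    visited = []      # visiting order
    seen = set()      # fast membership check
    stack = [start]   # nodes waiting to be explored
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        visited.append(node)
        # Push neighbors in reverse so the first listed neighbor is explored first.
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in seen:
                stack.append(neighbor)
    return visited


# Hypothetical graph as an adjacency list (assumption for illustration).
graph = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [],
    "E": ["F"],
    "F": [],
}

order = dfs(graph, "A")
print(order)  # ['A', 'B', 'D', 'E', 'F', 'C']
```

The explicit stack avoids Python's recursion limit on deep graphs. A recursive version would visit the nodes in the same order for this input.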

Output - Visited Nodes and their order:

Limitations

  • We had to do more research on RDDs and on how to pass a function to them.

References

  1. https://spark.apache.org/docs/latest/rdd-programming-guide.html
  2. https://medium.com/@KerrySheldon/breadth-first-search-in-apache-spark-d274403494ca