Big_Data_Programming_ICP_6_Module2 - kusamdinesh/Big-Data-and-Hadoop GitHub Wiki

Procedure :

  • Import the dataset

Import the dataset as a CSV file, creating the data frames directly on import, then build a graph from those data frames.
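The import step can be sketched without Spark as follows. This is a minimal illustration only: the CSV content and the `relationship` column are hypothetical, since the actual dataset is not shown on this page. The `id`, `src` and `dst` column names follow the convention GraphFrames expects for its vertex and edge data frames.

```python
import csv
import io

# Hypothetical CSV content; the real dataset and its columns are not shown here.
CSV_TEXT = """src,dst,relationship
a,b,friend
b,c,follow
c,a,follow
"""

def load_graph(csv_file):
    """Return (vertices, edges) as lists of dicts, one list per 'data frame'."""
    # Each CSV row becomes one edge record with src and dst columns.
    edges = [dict(row) for row in csv.DictReader(csv_file)]
    # Every distinct endpoint becomes a vertex record with an id column.
    ids = sorted({e["src"] for e in edges} | {e["dst"] for e in edges})
    vertices = [{"id": v} for v in ids]
    return vertices, edges

vertices, edges = load_graph(io.StringIO(CSV_TEXT))
print(vertices)
print(edges)
```

In a Spark job the same two tables would be data frames, and the graph would be built from them with `GraphFrame(vertices, edges)`.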

Input :

Edges output is as follows

Vertices output is as follows

  • Triangle Count

The triangleCount function computes the number of triangles passing through each vertex.
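To make the definition concrete, here is a small pure-Python sketch of what a per-vertex triangle count means. It is not the GraphFrames implementation; the helper name `triangle_count` and the sample edges are assumptions made for illustration, and edges are treated as undirected.

```python
from itertools import combinations

def triangle_count(edges):
    """Count, for each vertex, the triangles that pass through it."""
    # Build an undirected adjacency map, ignoring self-loops.
    neighbors = {}
    for src, dst in edges:
        if src == dst:
            continue
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    counts = {v: 0 for v in neighbors}
    for v, nbrs in neighbors.items():
        # A triangle through v exists for every pair of v's neighbors
        # that are themselves connected.
        for a, b in combinations(sorted(nbrs), 2):
            if b in neighbors[a]:
                counts[v] += 1
    return counts

# Hypothetical graph: one triangle a-b-c plus a pendant vertex d.
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
triangle_counts = triangle_count(sample_edges)
print(triangle_counts)
```

With GraphFrames the equivalent call is `g.triangleCount()`, which returns the vertices with an added `count` column.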

Input :

Output :

  • Find Shortest Paths w.r.t. Landmarks
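As a rough sketch of this step, the code below computes, for every vertex, the number of hops to each landmark vertex, following edge direction. It is a plain breadth-first search over reversed edges, not the library's implementation; the function name, sample edges and landmark choice are hypothetical.

```python
from collections import deque

def shortest_paths(edges, landmarks):
    """Map each vertex to {landmark: hop count} for every reachable landmark."""
    vertices = set()
    reverse = {}  # dst -> list of vertices that have an edge into dst
    for src, dst in edges:
        vertices.update((src, dst))
        reverse.setdefault(dst, []).append(src)

    distances = {v: {} for v in vertices}
    for landmark in landmarks:
        if landmark not in vertices:
            continue
        # Walk edges backwards from the landmark, one hop at a time.
        distances[landmark][landmark] = 0
        queue = deque([landmark])
        while queue:
            current = queue.popleft()
            for previous in reverse.get(current, []):
                if landmark not in distances[previous]:
                    distances[previous][landmark] = distances[current][landmark] + 1
                    queue.append(previous)
    return distances

# Hypothetical directed graph: e -> a -> b -> c -> d, with c as the landmark.
sample_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "a")]
landmark_distances = shortest_paths(sample_edges, ["c"])
print(landmark_distances)
```

Vertex `d` gets an empty map because no directed path leads from it to the landmark. In GraphFrames the corresponding call is `g.shortestPaths(landmarks=[...])`, which adds a `distances` map column to the vertices.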

Input :

Output :

  • Apply Page Rank algorithm on the dataset

PageRank counts the number and quality of links pointing to a page to produce a rough estimate of how important that page is.
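The idea can be illustrated with a short power-iteration sketch. This is a textbook variant under stated assumptions, not the GraphFrames code: ranks are normalized to sum to 1 (library output may be scaled differently), and the function name, parameter defaults and sample edges are chosen here for illustration.

```python
def pagerank(edges, reset_probability=0.15, max_iter=50):
    """Iteratively estimate a rank for each vertex of a directed graph."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Rank held by vertices with no outgoing links is spread evenly.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        base = reset_probability / n + (1 - reset_probability) * dangling / n
        new_rank = {v: base for v in vertices}
        # Each vertex passes its rank in equal shares along its out-links.
        for v in vertices:
            for dst in out_links[v]:
                new_rank[dst] += (1 - reset_probability) * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

# Hypothetical graph: a and b both link to c, and c links back to a.
ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
print(ranks)
```

Here `c` ends up with the highest rank because it receives two links, and `a` outranks `b` because its single incoming link comes from the highly ranked `c`. The GraphFrames call is `g.pageRank(resetProbability=0.15, maxIter=10)`, with parameter values chosen by the caller.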

Input :

Output :

  • Save graphs generated to a file

The vertices and the edges of the generated graph are stored in separate vertices and edges folders.
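A minimal sketch of that layout, using only the standard library and a temporary directory: the folder names match the text, while the file name `part-00000.csv`, the helper `save_table` and the sample rows are assumptions. The helper expects a non-empty list of rows.

```python
import csv
import os
import tempfile

def save_table(rows, folder, filename="part-00000.csv"):
    """Write a non-empty list of dicts as a CSV file inside the given folder."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path

# Hypothetical graph tables.
vertices = [{"id": "a"}, {"id": "b"}]
edges = [{"src": "a", "dst": "b"}]

out_dir = tempfile.mkdtemp()
vertices_path = save_table(vertices, os.path.join(out_dir, "vertices"))
edges_path = save_table(edges, os.path.join(out_dir, "edges"))
print(vertices_path)
print(edges_path)
```

With Spark data frames the same split would typically be written through the data frame writer, one output path for `g.vertices` and one for `g.edges`.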

Input :

Output :

Bonus

Procedure :

  1. Apply Label Propagation Algorithm
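Label propagation detects communities by letting every vertex repeatedly adopt the label that is most common among its neighbors. The sketch below is one simple synchronous variant written for illustration, not the library's implementation: breaking ties by the smallest label and stopping early when nothing changes are assumptions made here, as are the function name and the sample edges.

```python
from collections import Counter

def label_propagation(edges, max_iter=5):
    """Assign a community label to each vertex of an undirected graph."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    # Every vertex starts in its own community.
    labels = {v: v for v in neighbors}
    for _ in range(max_iter):
        new_labels = {}
        for v, nbrs in neighbors.items():
            counts = Counter(labels[n] for n in nbrs)
            best = max(counts.values())
            # Assumed tie-break: pick the smallest of the most frequent labels.
            new_labels[v] = min(label for label, c in counts.items() if c == best)
        if new_labels == labels:
            break
        labels = new_labels
    return labels

# Hypothetical graph: two disconnected triangles.
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"),
                ("d", "e"), ("e", "f"), ("f", "d")]
communities = label_propagation(sample_edges)
print(communities)
```

The GraphFrames call is `g.labelPropagation(maxIter=5)`, which returns the vertices with an added `label` column; the iteration count is up to the caller.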

Input :

Output :

  2. Apply BFS algorithm

Breadth-first search (BFS) finds the shortest path, measured in number of edges, from one vertex to another.
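A compact sketch of the search on a directed graph is shown here. The function name `bfs_path` and the sample edges are hypothetical, and the code returns a single shortest path rather than every one.

```python
from collections import deque

def bfs_path(edges, start, goal):
    """Return one shortest path from start to goal as a list, or None."""
    out_links = {}
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)

    # parent records how each vertex was first reached.
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for nxt in out_links.get(current, []):
            if nxt not in parent:
                parent[nxt] = current
                queue.append(nxt)
    return None

# Hypothetical graph with a short route a -> b -> c and a longer one via d and e.
sample_edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e"), ("e", "c")]
forward = bfs_path(sample_edges, "a", "c")
backward = bfs_path(sample_edges, "c", "a")
print(forward, backward)
```

In GraphFrames the search is expressed with filter expressions instead of single vertices, as in `g.bfs(fromExpr, toExpr)`, where both arguments are SQL expression strings over the vertex columns.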

Input :

Output :

References :

https://spark.apache.org/docs/latest/graphx-programming-guide.html

https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-scala.html