Big_Data_Programming_ICP_6_Module2 - kusamdinesh/Big-Data-and-Hadoop GitHub Wiki
Procedure :
- Import the dataset
Import the dataset as a CSV file, create DataFrames directly on import, and then build a graph from those DataFrames.
Input :
Edges output is as follows
Vertices output is as follows
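As a rough illustration of the import step, the sketch below parses a CSV into separate vertex and edge tables using only the Python standard library. It is not the Spark code used for this ICP. The sample rows, the `relationship` column, and the `load_graph` helper are hypothetical; only the `id` / `src` / `dst` column naming follows the convention GraphFrames expects.

```python
import csv
import io

# Hypothetical sample data: the real ICP dataset and its columns are not shown here.
SAMPLE_CSV = """src,dst,relationship
a,b,friend
b,c,follow
c,a,follow
c,d,friend
"""


def load_graph(csv_text):
    """Parse CSV text into (vertices, edges) lists of dicts.

    Vertices carry an "id" key and edges carry "src" and "dst" keys,
    mirroring the column names a GraphFrame is built from.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    edges = [
        {"src": r["src"], "dst": r["dst"], "relationship": r["relationship"]}
        for r in rows
    ]
    # Every endpoint that appears in an edge becomes a vertex.
    ids = sorted({r["src"] for r in rows} | {r["dst"] for r in rows})
    vertices = [{"id": v} for v in ids]
    return vertices, edges


vertices, edges = load_graph(SAMPLE_CSV)
```

In Spark the same two tables would be DataFrames, with the graph built from the vertex table and the edge table together.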
- Triangle Count
The triangleCount function computes the number of triangles passing through each vertex.
Input :
Output :
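To make the definition concrete, here is a minimal plain-Python sketch of what a per-vertex triangle count computes. It is not the Spark implementation. The `triangle_count` helper and the sample edges are hypothetical, and the sketch assumes edge direction is ignored.

```python
from itertools import combinations


def triangle_count(edges):
    """Return {vertex: number of triangles that pass through it}.

    Edges are treated as undirected (an assumption of this sketch).
    """
    neighbors = {}
    for src, dst in edges:
        if src == dst:
            continue  # a self-loop cannot be part of a triangle
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    counts = {}
    for vertex, nbrs in neighbors.items():
        # A triangle through `vertex` is a pair of its neighbours
        # that are also adjacent to each other.
        counts[vertex] = sum(
            1 for a, b in combinations(sorted(nbrs), 2) if b in neighbors[a]
        )
    return counts


# Hypothetical sample graph: one triangle a-b-c plus a pendant vertex d.
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
counts = triangle_count(sample_edges)
```

For this sample, `a`, `b` and `c` each lie on one triangle and `d` lies on none.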
- Find Shortest Paths w.r.t. Landmarks
Input :
Output :
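The idea behind shortest paths to landmarks can be sketched in plain Python as follows. This is an illustration only, not the Spark code from this step. It assumes distances are hop counts measured along edge direction from each vertex toward the landmark; the `shortest_paths` helper and the sample graph are hypothetical.

```python
from collections import deque


def shortest_paths(edges, landmarks):
    """Return {vertex: {landmark: hop count}} for every reachable landmark.

    Runs one breadth-first search per landmark over the reversed edges,
    so each distance is measured from the vertex toward the landmark.
    """
    reverse = {}
    vertices = set()
    for src, dst in edges:
        vertices.update((src, dst))
        reverse.setdefault(dst, []).append(src)

    distances = {v: {} for v in vertices}
    for landmark in landmarks:
        if landmark not in vertices:
            continue  # unknown landmark: nothing can reach it
        distances[landmark][landmark] = 0
        queue = deque([landmark])
        while queue:
            current = queue.popleft()
            for prev in reverse.get(current, []):
                if landmark not in distances[prev]:
                    distances[prev][landmark] = distances[current][landmark] + 1
                    queue.append(prev)
    return distances


# Hypothetical directed sample graph.
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
paths = shortest_paths(sample_edges, landmarks=["a", "d"])
```

Landmarks that a vertex cannot reach are simply absent from its map; here `d` has no outgoing edges, so its map contains only itself.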
- Apply Page Rank algorithm on the dataset
PageRank works by counting the number and quality of links to a page to produce a rough estimate of how important that page is.
Input :
Output :
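The "number and quality of links" idea can be sketched as a short power iteration in plain Python. This is the textbook normalized formulation, so its scaling may differ from Spark's output, and it is not the code used in this ICP. The `pagerank` helper, the damping value of 0.85, the iteration count, and the sample graph are all assumptions of this sketch.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return {vertex: rank}, with ranks summing to 1.

    Each vertex splits its rank evenly across its outgoing links.
    Vertices without outgoing links spread their rank over all vertices.
    """
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by dangling vertices is redistributed uniformly.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in vertices
        }
        for src in vertices:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical directed sample graph.
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
ranks = pagerank(sample_edges)
```

In this sample `c` ends up with the highest rank: it receives all of `b`'s rank, while `a` and `d` each receive only half of `c`'s.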
- Save graphs generated to a file
The vertices and the edges of the generated graph are saved to separate vertices and edges folders.
Input :
Output :
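The folder layout described above can be sketched with the Python standard library. This is an illustration, not the Spark write call used in this step. The `save_graph` helper, the `data.csv` file name, and the sample rows are hypothetical, and the sketch assumes both tables are non-empty.

```python
import csv
import tempfile
from pathlib import Path


def save_graph(vertices, edges, out_dir):
    """Write vertices and edges as CSV into separate sub-folders of out_dir.

    Returns {"vertices": path, "edges": path} for the files written.
    Assumes both lists contain at least one row (used for the header).
    """
    out_dir = Path(out_dir)
    written = {}
    for name, rows in {"vertices": vertices, "edges": edges}.items():
        folder = out_dir / name
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / "data.csv"  # hypothetical file name
        with path.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        written[name] = path
    return written


# Hypothetical sample tables.
sample_vertices = [{"id": "a"}, {"id": "b"}, {"id": "c"}]
sample_edges = [{"src": "a", "dst": "b"}, {"src": "b", "dst": "c"}]
paths = save_graph(sample_vertices, sample_edges, tempfile.mkdtemp())
```

Keeping the two tables in separate folders means the graph can be rebuilt later by reading each folder back and pairing the results.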
Bonus
Procedure :
- Apply Label Propagation Algorithm
Input :
Output :
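For readers unfamiliar with label propagation, the plain-Python sketch below shows the basic idea: every vertex starts with its own label and repeatedly adopts the label most common among its neighbours. It is not the Spark implementation. The `label_propagation` helper, the synchronous update, the smallest-label tie-break, and the sample graph are assumptions made so the result is deterministic.

```python
from collections import Counter


def label_propagation(edges, max_iter=5):
    """Return {vertex: community label} after at most max_iter rounds.

    Edges are treated as undirected. All vertices update at the same time
    from the previous round's labels; ties go to the smallest label.
    """
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    labels = {v: v for v in neighbors}  # each vertex starts in its own community
    for _ in range(max_iter):
        new_labels = {}
        for vertex, nbrs in neighbors.items():
            tally = Counter(labels[n] for n in nbrs)
            best = max(tally.values())
            new_labels[vertex] = min(l for l, c in tally.items() if c == best)
        if new_labels == labels:
            break  # converged: no vertex changed its label
        labels = new_labels
    return labels


# Hypothetical sample graph: two triangles joined by the bridge c-d.
sample_edges = [
    ("a", "b"), ("b", "c"), ("c", "a"),
    ("c", "d"),
    ("d", "e"), ("e", "f"), ("f", "d"),
]
communities = label_propagation(sample_edges)
```

On this sample the two triangles settle into two communities. The label values themselves are only identifiers, so a community may carry the name of a vertex that is not in it.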
- Apply BFS algorithm
Breadth-first search (BFS) finds the shortest path, measured in number of edges, from one vertex to another vertex.
Input :
Output :
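A minimal plain-Python sketch of breadth-first search between two vertices is given below. It is not the Spark code from this step. The `bfs_path` helper and the sample graph are hypothetical, and the sketch assumes edges are followed in their stated direction.

```python
from collections import deque


def bfs_path(edges, start, goal):
    """Return the shortest path from start to goal as a list of vertices.

    Returns None when goal cannot be reached from start.
    """
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nxt in adjacency.get(current, []):
            if nxt not in parents:
                parents[nxt] = current
                queue.append(nxt)
    return None


# Hypothetical directed sample graph.
sample_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
path = bfs_path(sample_edges, "a", "d")
```

Because vertices are visited in order of increasing distance from the start, the first time the goal is dequeued its path is a shortest one.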
References :
https://spark.apache.org/docs/latest/graphx-programming-guide.html
https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-scala.html