Big_Data_Programming_ICP_6_Module2 - kusamdinesh/Big-Data-and-Hadoop GitHub Wiki
Procedure :
- Import the dataset
Import the dataset as a CSV file, creating the DataFrames directly on import, then build the graph from those DataFrames.
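As a rough illustration of this step: a GraphFrame is built from a vertex table with an `id` column and an edge table with `src` and `dst` columns. The sketch below mimics that shape in plain Python with the standard `csv` module. It is not Spark code, and the CSV contents, the `name` and `relationship` columns, and the helper names `read_table` and `build_graph` are all hypothetical.

```python
import csv
import io

# Hypothetical CSV contents standing in for the real dataset files.
VERTICES_CSV = """id,name
a,Alpha
b,Beta
c,Gamma
"""
EDGES_CSV = """src,dst,relationship
a,b,follows
b,c,follows
c,a,follows
"""

def read_table(text):
    """Parse CSV text with a header row into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def build_graph(vertices, edges):
    """Keep only edges whose endpoints both appear in the vertex table."""
    ids = {row["id"] for row in vertices}
    kept = [e for e in edges if e["src"] in ids and e["dst"] in ids]
    return {"vertices": vertices, "edges": kept}

graph = build_graph(read_table(VERTICES_CSV), read_table(EDGES_CSV))
print(len(graph["vertices"]), "vertices,", len(graph["edges"]), "edges")
```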
Input :


Edges output is as follows

Vertices output is as follows

- Triangle Count
The triangleCount function computes the number of triangles passing through each vertex.
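To make the definition concrete, here is a minimal plain-Python sketch of what a per-vertex triangle count computes. It treats edges as undirected and is only an illustration, not the library's implementation. The function name `triangle_count` and the sample edges are hypothetical.

```python
from itertools import combinations

def triangle_count(edges):
    """Return {vertex: number of triangles through it}, ignoring edge direction."""
    neighbors = {}
    for src, dst in edges:
        if src == dst:
            continue  # self-loops cannot form triangles
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    counts = {v: 0 for v in neighbors}
    for v, nbrs in neighbors.items():
        # Every pair of neighbours that is itself connected closes a triangle at v.
        for a, b in combinations(sorted(nbrs), 2):
            if b in neighbors[a]:
                counts[v] += 1
    return counts

# Hypothetical graph: one triangle a-b-c plus a pendant vertex d.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(triangle_count(edges))
```

Vertices `a`, `b`, and `c` each sit on the single triangle, while `d` sits on none.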
Input :

Output :

- Find Shortest Paths w.r.t. Landmarks
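A minimal sketch of the idea, assuming "shortest path" means the number of hops along directed edges from each vertex to each landmark vertex. It runs a breadth-first search backwards from every landmark. This is plain Python for illustration only. The function name `shortest_paths` and the sample edges are hypothetical.

```python
from collections import deque

def shortest_paths(edges, landmarks):
    """Return {vertex: {landmark: hops}}, following edge direction."""
    reverse = {}
    vertices = set()
    for src, dst in edges:
        vertices.update((src, dst))
        reverse.setdefault(dst, []).append(src)
    distances = {v: {} for v in vertices}
    for landmark in landmarks:
        if landmark not in vertices:
            continue  # unknown landmarks are skipped
        # Searching backwards from the landmark: a vertex reached in k steps
        # can reach the landmark in k hops along the original edges.
        distances[landmark][landmark] = 0
        queue = deque([landmark])
        while queue:
            current = queue.popleft()
            for prev in reverse.get(current, []):
                if landmark not in distances[prev]:
                    distances[prev][landmark] = distances[current][landmark] + 1
                    queue.append(prev)
    return distances

# Hypothetical graph: a cycle a -> b -> c -> d -> a with an extra vertex e -> a.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("e", "a")]
result = shortest_paths(edges, ["a", "d"])
print(result["b"], result["e"])
```

A vertex that cannot reach a landmark simply has no entry for it in its inner dictionary.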
Input :

Output :

- Apply Page Rank algorithm on the dataset
PageRank counts the number and quality of links to a page to produce a rough estimate of how important that page is.
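The idea can be sketched with a textbook power iteration. This is a normalized variant in which the ranks sum to 1, so its values may differ in scale from what a graph library reports. The damping factor of 0.85, the iteration count, the function name `pagerank`, and the sample edges are all assumptions for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Return {vertex: rank} after a fixed number of power-iteration steps."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no out-links is shared among all vertices.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for v in vertices:
            for target in out_links[v]:
                # Each vertex splits its rank evenly across its out-links.
                new_rank[target] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

# Hypothetical graph: a, b and d all link to c, and c links back to a.
edges = [("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))
```

Here `c` receives the most links and ends up with the highest rank, while `a` benefits from being linked by the highly ranked `c`.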
Input :

Output :


- Save graphs generated to a file
The vertices and edges of the generated graph are saved to separate vertices and edges folders.
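The folder layout can be sketched without Spark as follows. This plain-Python example writes each table as a CSV file into its own folder under a temporary directory. The helper `save_table`, the file name `part-0.csv`, and the sample rows are hypothetical, and a real Spark job would write through its own DataFrame writer instead.

```python
import csv
import tempfile
from pathlib import Path

def save_table(rows, folder, filename="part-0.csv"):
    """Write a non-empty list of dict rows as one CSV file inside `folder`."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / filename
    with target.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return target

# Hypothetical vertex and edge tables.
vertices = [{"id": "a"}, {"id": "b"}]
edges = [{"src": "a", "dst": "b"}]

# Vertices and edges go into separate sibling folders.
output_root = Path(tempfile.mkdtemp())
vertices_file = save_table(vertices, output_root / "vertices")
edges_file = save_table(edges, output_root / "edges")
print(vertices_file.parent.name, edges_file.parent.name)
```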
Input :

Output :

Bonus
Procedure :
- Apply Label Propagation Algorithm
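For orientation, label propagation detects communities by starting every vertex with its own label and repeatedly letting each vertex adopt the most common label among its neighbours. Below is a minimal synchronous sketch in plain Python. Breaking ties by the smallest label and stopping early once nothing changes are assumptions of this sketch, and the function name `label_propagation` and the sample edges are hypothetical.

```python
from collections import Counter

def label_propagation(edges, max_iter=10):
    """Return {vertex: community label} after at most max_iter rounds."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    labels = {v: v for v in neighbors}  # every vertex starts in its own community
    for _ in range(max_iter):
        new_labels = {}
        for v, nbrs in neighbors.items():
            counts = Counter(labels[n] for n in nbrs)
            best = max(counts.values())
            # Most frequent neighbour label wins; ties go to the smallest label.
            new_labels[v] = min(label for label, c in counts.items() if c == best)
        if new_labels == labels:
            break  # no vertex changed its label
        labels = new_labels
    return labels

# Hypothetical graph: two triangles (1, 2, 3) and (4, 5, 6) joined by the edge 3-4.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
labels = label_propagation(edges)
print(labels)
```

On this graph the two triangles end up with two different labels. Synchronous updates are not guaranteed to converge on every graph, which is why the number of rounds is capped.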
Input :

Output :


- Apply BFS algorithm
Breadth-first search (BFS) finds the shortest path, measured in number of edges, from one vertex to another.
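A minimal plain-Python sketch of the search follows. It works on plain vertex IDs and directed edges rather than on DataFrame filter expressions. The function name `bfs_path` and the sample edges are hypothetical.

```python
from collections import deque

def bfs_path(edges, start, goal):
    """Return the shortest directed path from start to goal as a list, or None."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    parents = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the parent links back to the start to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nxt in adjacency.get(current, []):
            if nxt not in parents:
                parents[nxt] = current
                queue.append(nxt)
    return None

# Hypothetical graph with two routes from a to c: a -> b -> c and a -> d -> e -> c.
edges = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e"), ("e", "c")]
print(bfs_path(edges, "a", "c"))
```

Because vertices are visited in order of distance from the start, the first time the goal is reached is along a path with the fewest edges.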
Input :

Output :

References :
https://spark.apache.org/docs/latest/graphx-programming-guide.html
https://docs.databricks.com/spark/latest/graph-analysis/graphframes/user-guide-scala.html