Spark ICP6 - neerajpadarthi/Big-Data-Programming GitHub Wiki
Name: Neeraj Padarthi
Class ID: 19
Spark ICP: 6
- Importing the dataset as a CSV file and creating DataFrames directly on import
- Creating a graph from the resulting DataFrames
- Performing Triangle Count
- Finding Shortest Paths w.r.t. Landmarks
- Applying the PageRank algorithm to the dataset
- Saving the generated graphs to a file
For this ICP, I completed the assignment using GraphFrames and GraphX.
GraphFrames represents a graph as two DataFrames: one of vertices and one of edges.
GraphFrames is built on Spark DataFrames.
GraphX is built on RDDs.
Importing the dataset as a CSV file and creating DataFrames directly
- Removing duplicate name columns and forming the vertices and edges
- Creating a graph from the resulting DataFrames
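The original code for this step is not shown on this page, so the following is only a minimal sketch of the idea using the Python standard library rather than Spark. The inline CSV text and the column names `src_name` and `dst_name` are hypothetical. In GraphFrames, the vertex DataFrame needs an `id` column and the edge DataFrame needs `src` and `dst` columns before the two are passed to `GraphFrame(vertices, edges)`.

```python
import csv
import io

# Hypothetical inline CSV standing in for the real dataset file.
RAW = """src_name,dst_name
A,B
B,C
A,B
C,A
"""

def load_graph(text):
    """Return (vertices, edges) with duplicates removed, first-seen order kept."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # dict.fromkeys drops repeated entries while preserving order.
    edges = list(dict.fromkeys((r["src_name"], r["dst_name"]) for r in rows))
    # Every name that appears at either end of an edge becomes a vertex.
    vertices = list(dict.fromkeys(name for edge in edges for name in edge))
    return vertices, edges

vertices, edges = load_graph(RAW)
print(vertices)
print(edges)
```

The repeated `A,B` row collapses into a single edge, which mirrors what dropping duplicates on a DataFrame would do.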
Performing Triangle Count
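As a rough illustration of what a triangle count computes (in the GraphFrames Python API this is, to my knowledge, `g.triangleCount()`), here is a standard-library sketch. The function name `triangle_count` and the sample edges are hypothetical and not taken from the ICP dataset.

```python
from itertools import combinations

def triangle_count(edges):
    """Count, for each vertex, the triangles it belongs to (direction ignored)."""
    neighbors = {}
    for a, b in edges:
        if a != b:  # self-loops cannot form triangles
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    counts = {}
    for v, nbrs in neighbors.items():
        # A triangle through v exists for every pair of its neighbors
        # that are also connected to each other.
        counts[v] = sum(
            1 for x, y in combinations(sorted(nbrs), 2) if y in neighbors[x]
        )
    return counts

# Hypothetical sample graph: one triangle A-B-C plus a dangling edge C-D.
sample_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
counts = triangle_count(sample_edges)
print(counts)
```

Vertices A, B and C each take part in one triangle, while D takes part in none.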
Finding Shortest Paths w.r.t. Landmarks
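Shortest paths with respect to landmarks gives, for every vertex, the number of hops needed to reach each landmark vertex (in the GraphFrames Python API, `g.shortestPaths(landmarks=[...])`). The sketch below is a plain-Python stand-in under the assumption that edges are followed in their stored direction; the function name and sample graph are hypothetical.

```python
from collections import deque

def shortest_paths(edges, landmarks):
    """Map each vertex to {landmark: hop count} for every landmark it can reach."""
    reverse = {}  # dst -> list of src, so we can search backwards from a landmark
    vertices = set()
    for src, dst in edges:
        vertices.update((src, dst))
        reverse.setdefault(dst, []).append(src)
    distances = {v: {} for v in vertices}
    for landmark in landmarks:
        if landmark not in vertices:
            continue
        distances[landmark][landmark] = 0
        queue = deque([landmark])
        while queue:
            node = queue.popleft()
            for prev in reverse.get(node, []):
                if landmark not in distances[prev]:
                    distances[prev][landmark] = distances[node][landmark] + 1
                    queue.append(prev)
    return distances

# Hypothetical sample graph.
sample_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C")]
distances = shortest_paths(sample_edges, ["A", "D"])
print(distances)
```

A landmark that a vertex cannot reach is simply absent from that vertex's map, which is why only A itself has an entry for landmark A here.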
Applying the PageRank algorithm to the dataset
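PageRank scores a vertex by the rank flowing into it from the vertices that link to it (in the GraphFrames Python API, `g.pageRank(resetProbability=0.15, maxIter=10)`). The following is a textbook power-iteration sketch in plain Python, not the GraphX implementation: it normalizes ranks so they sum to 1, so absolute values may be scaled differently from what Spark reports. The function name and sample edges are hypothetical.

```python
def pagerank(edges, reset_probability=0.15, max_iter=50):
    """Normalized PageRank by power iteration; returns {vertex: rank}."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Rank held by vertices without out-links is spread over all vertices.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        base = (reset_probability + (1 - reset_probability) * dangling) / n
        new_rank = {v: base for v in vertices}
        for src in vertices:
            for dst in out_links[src]:
                share = rank[src] / len(out_links[src])
                new_rank[dst] += (1 - reset_probability) * share
        rank = new_rank
    return rank

# Hypothetical sample graph: a cycle A -> B -> C -> A, plus D -> A.
ranks = pagerank([("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")])
for vertex, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(vertex, round(score, 4))
```

In this sample, A receives links from both C and D and ends up with the highest rank, while D has no incoming links and keeps only the reset share.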
Saving the generated graphs to a file
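A graph is usually saved as its two tables, vertices and edges, so it can be rebuilt later. With Spark this would go through the DataFrame writers on `g.vertices` and `g.edges`; the sketch below only mimics that idea with the standard library. The file names `vertices.csv` and `edges.csv`, the temporary output directory and the sample data are all hypothetical.

```python
import csv
import os
import tempfile

def save_graph(vertices, edges, directory):
    """Write vertices.csv and edges.csv into directory and return both paths."""
    vertex_path = os.path.join(directory, "vertices.csv")
    edge_path = os.path.join(directory, "edges.csv")
    with open(vertex_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id"])  # GraphFrames expects an "id" column
        writer.writerows([v] for v in vertices)
    with open(edge_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["src", "dst"])  # GraphFrames expects "src" and "dst"
        writer.writerows(edges)
    return vertex_path, edge_path

# Hypothetical sample data written to a throwaway directory.
out_dir = tempfile.mkdtemp()
vertex_path, edge_path = save_graph(
    ["A", "B", "C"], [("A", "B"), ("B", "C")], out_dir
)

# Read the edges back to confirm the round trip.
with open(edge_path, newline="") as f:
    saved_edges = list(csv.reader(f))
print(saved_edges)
```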
Bonus 1: LPA (Label Propagation Algorithm)
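Label propagation detects communities by letting every vertex repeatedly adopt the label most common among its neighbors (in the GraphFrames Python API, `g.labelPropagation(maxIter=5)`). This is a simplified synchronous sketch in plain Python. Breaking ties by picking the smallest label is my own assumption to keep the result deterministic, and the sample graph is hypothetical.

```python
from collections import Counter

def label_propagation(edges, max_iter=5):
    """Return {vertex: community label} after at most max_iter rounds."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    labels = {v: v for v in neighbors}  # every vertex starts in its own community
    for _ in range(max_iter):
        new_labels = {}
        for v, nbrs in neighbors.items():
            votes = Counter(labels[n] for n in nbrs)
            best = max(votes.values())
            # Assumed tie-break: the smallest label among the most frequent ones.
            new_labels[v] = min(l for l, count in votes.items() if count == best)
        if new_labels == labels:  # nothing changed, so stop early
            break
        labels = new_labels
    return labels

# Hypothetical sample graph: two triangles joined by the single edge C-X.
sample_edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("X", "Y"), ("Y", "Z"), ("X", "Z"),
    ("C", "X"),
]
labels = label_propagation(sample_edges)
print(labels)
```

The two triangles end up as two separate communities. Synchronous label propagation is not guaranteed to converge on every graph, which is one reason the iteration count is capped.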
Bonus 2: BFS (Breadth-First Search)
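Breadth-first search finds a shortest path, in number of edges, between a start vertex and a target vertex (GraphFrames offers `g.bfs(fromExpr, toExpr, maxPathLength=...)` for this). The plain-Python sketch below takes vertex ids instead of filter expressions. The function name `bfs_path` and the sample graph are hypothetical.

```python
from collections import deque

def bfs_path(edges, start, goal, max_path_length=10):
    """Return one shortest directed path from start to goal, or None."""
    out_links = {}
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 >= max_path_length:
            continue  # do not extend paths beyond the allowed number of edges
        for nxt in out_links.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

# Hypothetical sample graph with two routes from A to C.
sample_edges = [
    ("A", "B"), ("B", "C"), ("A", "D"),
    ("D", "E"), ("E", "C"), ("C", "F"),
]
path = bfs_path(sample_edges, "A", "F")
print(path)
```

Because vertices are explored level by level, the first path that reaches the goal uses the fewest edges. Lowering `max_path_length` below that number makes the search return `None`.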