Spark ICP6 - neerajpadarthi/Big-Data-Programming GitHub Wiki
Name: Neeraj Padarthi
Class ID: 19
Spark ICP: 6
Objective
- Importing the dataset from a CSV file and creating DataFrames directly on import
- Creating a graph from the resulting DataFrames
- Performing Triangle Count
- Finding Shortest Paths w.r.t. landmarks
- Applying the PageRank algorithm to the dataset
- Saving the generated graphs to a file
Introduction
- In this ICP, I complete the assignment using GraphFrames and GraphX
- GraphFrames represent graphs as vertices and edges
- GraphFrames are built on Spark DataFrames
- GraphX is built on RDDs
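To make the vertex and edge representation concrete, here is a minimal plain-Python sketch (not Spark code) of the two tables a graph is built from. GraphFrames expects a vertices DataFrame with an `id` column and an edges DataFrame with `src` and `dst` columns, which are then passed to `GraphFrame(vertices, edges)`. The CSV content, the column names `start_station` and `end_station`, and the station names are all hypothetical and are not taken from the assignment's dataset.

```python
import csv
import io

# Hypothetical trip data; station names and column names are illustrative only.
RAW_CSV = """start_station,end_station
A,B
B,C
C,A
A,B
C,D
"""

rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Vertices: one record per distinct station name, keyed by "id".
names = {row[col] for row in rows for col in ("start_station", "end_station")}
vertices = [{"id": name} for name in sorted(names)]

# Edges: one record per distinct (src, dst) pair; dict.fromkeys keeps the
# first occurrence of each pair and drops later duplicates.
pairs = dict.fromkeys((row["start_station"], row["end_station"]) for row in rows)
edges = [{"src": src, "dst": dst} for src, dst in pairs]

print(vertices)
print(edges)
```

In Spark, the same shape would presumably come from selecting and renaming columns on the imported DataFrame and dropping duplicates before constructing the graph.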
- Importing the dataset from a CSV file and creating DataFrames directly
  - Removing duplicates from the name columns and forming vertices and edges
  - Creating a graph from the resulting DataFrames
- Performing Triangle Count
- Finding Shortest Paths w.r.t. landmarks
- Applying the PageRank algorithm to the dataset
- Saving the generated graphs to a file
- Bonus 1: LPA (Label Propagation Algorithm)
- Bonus 2: BFS (Breadth-First Search)
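For the Triangle Count step, GraphFrames provides `triangleCount()`, which returns the vertices with a `count` column. The sketch below is a plain-Python illustration of what that count means, not the GraphFrames implementation. The edge list is hypothetical.

```python
from itertools import combinations

# Hypothetical edge list; direction is ignored when counting triangles.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]


def triangle_count(edges):
    """Return {vertex: number of triangles passing through that vertex}."""
    neighbors = {}
    for src, dst in edges:
        if src == dst:
            continue  # self-loops cannot be part of a triangle
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    counts = {}
    for vertex, nbrs in neighbors.items():
        # A triangle through `vertex` is a pair of its neighbours
        # that are also adjacent to each other.
        counts[vertex] = sum(
            1 for a, b in combinations(sorted(nbrs), 2) if b in neighbors[a]
        )
    return counts


counts = triangle_count(edges)
print(counts)
```

Here A, B and C form one triangle, while D only hangs off C and so belongs to none.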
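For the Shortest Paths and BFS steps, GraphFrames offers `shortestPaths(landmarks=[...])`, which reports hop counts from every vertex to each landmark, and `bfs(fromExpr, toExpr)`, which searches between vertices matching two expressions. The plain-Python sketch below shows the idea using a breadth-first search run backwards from each landmark. It is an illustration under my own assumptions (hypothetical edges and landmarks), not how GraphFrames computes the result internally.

```python
from collections import deque

# Hypothetical directed edge list.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]


def hops_to_landmarks(edges, landmarks):
    """Return {vertex: {landmark: hop count}}, following edge direction."""
    vertices = {v for edge in edges for v in edge}
    # BFS runs backwards from each landmark, so index edges by destination.
    incoming = {v: [] for v in vertices}
    for src, dst in edges:
        incoming[dst].append(src)

    distances = {v: {} for v in vertices}
    for landmark in landmarks:
        if landmark not in vertices:
            continue  # unknown landmarks are simply skipped
        distances[landmark][landmark] = 0
        queue = deque([landmark])
        while queue:
            current = queue.popleft()
            for prev in incoming[current]:
                if landmark not in distances[prev]:
                    distances[prev][landmark] = distances[current][landmark] + 1
                    queue.append(prev)
    return distances


distances = hops_to_landmarks(edges, ["A", "D"])
for vertex in sorted(distances):
    print(vertex, distances[vertex])
```

A vertex that cannot reach a landmark has no entry for it. In this example D has no outgoing edges, so it only reaches itself.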
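For the PageRank step, GraphFrames exposes `pageRank(resetProbability=..., maxIter=...)`. The sketch below is the textbook power-iteration version in plain Python, with ranks normalized to sum to 1 and the rank of dead-end vertices spread evenly. Those two choices are my assumptions, so the scores GraphFrames reports may be scaled differently. The edge list is hypothetical.

```python
# Hypothetical directed edge list; D has no outgoing edges.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]


def pagerank(edges, reset_probability=0.15, max_iter=20):
    """Return {vertex: rank} after a fixed number of power iterations."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(vertices)
    damping = 1.0 - reset_probability
    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Rank held by vertices without outgoing edges is shared by everyone.
        dangling = sum(ranks[v] for v in vertices if not out_links[v])
        new_ranks = {
            v: reset_probability / n + damping * dangling / n for v in vertices
        }
        # Every other vertex splits its rank evenly across its out-links.
        for src, targets in out_links.items():
            for dst in targets:
                new_ranks[dst] += damping * ranks[src] / len(targets)
        ranks = new_ranks
    return ranks


ranks = pagerank(edges)
for vertex, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(vertex, round(rank, 4))
```

In this small graph C ends up with the highest rank because it receives all of B's rank, while A and D each receive half of C's.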
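For the LPA bonus, GraphFrames provides `labelPropagation(maxIter=...)`, which assigns a community label to each vertex. The plain-Python sketch below shows the basic idea: every vertex starts with its own label and repeatedly adopts the most common label among its neighbours. Breaking ties by the smallest label is my own assumption, made to keep the example deterministic, and label propagation is not guaranteed to converge on every graph. The edge list is hypothetical.

```python
from collections import Counter

# Hypothetical graph made of two separate triangles.
edges = [
    ("A", "B"), ("B", "C"), ("C", "A"),
    ("X", "Y"), ("Y", "Z"), ("Z", "X"),
]


def label_propagation(edges, max_iter=5):
    """Return {vertex: community label} after max_iter synchronous rounds."""
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    labels = {v: v for v in neighbors}  # each vertex starts in its own community
    for _ in range(max_iter):
        new_labels = {}
        for vertex, nbrs in neighbors.items():
            votes = Counter(labels[n] for n in nbrs)
            best = max(votes.values())
            # Assumed tie-break: the smallest label among the most frequent ones.
            new_labels[vertex] = min(
                label for label, count in votes.items() if count == best
            )
        labels = new_labels
    return labels


labels = label_propagation(edges)
print(labels)
```

Each triangle settles on a single label, so the two triangles come out as two communities.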