Spark ICP5 - neerajpadarthi/Big-Data-Programming GitHub Wiki
Name: Neeraj Padarthi
Class ID: 19
Spark ICP: 5
Objective
- Importing the dataset from a CSV file and creating DataFrames directly
- Creating a graph from the DataFrames
- Concatenating chunks into a list and converting it to a DataFrame
- Removing duplicates and naming columns
- Creating the output DataFrame
- Creating vertices
- Showing some vertices
- Showing some edges
- Showing vertex in-degree and out-degree
- Applying motif finding
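The loading steps listed above (read the CSV, collect the chunks, drop duplicates, name the columns) can be sketched in plain Python. This is a minimal stand-in that uses only the standard library, not the code used in this ICP. The sample data, the column names in `COLUMN_NAMES`, and the helpers `read_in_chunks` and `drop_duplicates` are all hypothetical.

```python
import csv
import io
from itertools import islice

# Hypothetical header-less sample data; the real ICP dataset is not shown here.
SAMPLE_CSV = """A,B
B,C
A,B
C,A
B,C
"""

# Hypothetical column names assigned after loading.
COLUMN_NAMES = ["src", "dst"]

def read_in_chunks(handle, chunk_size):
    """Yield lists of rows, with at most chunk_size rows per list."""
    reader = csv.reader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk

def drop_duplicates(rows):
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Collect the chunks into a list, then concatenate them into one table.
chunks = list(read_in_chunks(io.StringIO(SAMPLE_CSV), chunk_size=2))
all_rows = [row for chunk in chunks for row in chunk]

# Remove duplicate rows, then attach the column names.
unique_rows = drop_duplicates(all_rows)
records = [dict(zip(COLUMN_NAMES, row)) for row in unique_rows]
```

Each entry in `records` is one named row, which is the shape a DataFrame-based workflow would hand to the vertex and edge steps.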
Introduction
- In this ICP, I do the assignment using GraphFrames and GraphX
- GraphFrames represents a graph as vertices and edges
- GraphFrames is built on Spark DataFrames
- GraphX is built on RDDs
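GraphFrames expects a vertex DataFrame with an `id` column and an edge DataFrame with `src` and `dst` columns. The sketch below mirrors that two-table layout with plain Python lists of dicts so that it runs without Spark. The sample records and the `make_graph` helper are hypothetical, and the endpoint check is an illustration rather than something GraphFrames is claimed to do.

```python
def make_graph(vertices, edges):
    """Check the two-table layout and return both tables in one dict."""
    ids = {v["id"] for v in vertices}
    for e in edges:
        # Every edge must start and end at a known vertex id.
        if e["src"] not in ids or e["dst"] not in ids:
            raise ValueError(f"edge {e} references an unknown vertex")
    return {"vertices": vertices, "edges": edges}

# Hypothetical vertex table: one record per vertex, keyed by "id".
vertices = [{"id": "A"}, {"id": "B"}, {"id": "C"}]

# Hypothetical edge table: one record per directed edge.
edges = [
    {"src": "A", "dst": "B"},
    {"src": "B", "dst": "A"},
    {"src": "B", "dst": "C"},
]

graph = make_graph(vertices, edges)
```

With Spark and the `graphframes` package available, the corresponding call is `GraphFrame(vertices_df, edges_df)`, where the two arguments are DataFrames with the columns described above.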
Approach
- Importing the dataset from a CSV file and creating DataFrames directly
- Concatenating chunks into a list and converting it to a DataFrame
- Removing duplicates, naming columns, and forming vertices and edges
- Creating a graph from the DataFrames
- Showing some vertices
- Showing some edges
- Showing vertex in-degree and out-degree
- Applying the motif
- Saving vertices and edges (Bonus)
- Bonus 1
- Bonus 2
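The degree and motif steps can be illustrated without Spark. In GraphFrames they correspond to `g.inDegrees`, `g.outDegrees`, and `g.find(...)` with a pattern such as `(a)-[e]->(b); (b)-[e2]->(a)`. The plain-Python version below is only a sketch of what those operations compute, on hypothetical edges; `find_two_way_pairs` is an illustrative helper, not a GraphFrames function.

```python
from collections import Counter

# Hypothetical directed edges as (src, dst) pairs.
edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A")]

# In-degree counts incoming edges per vertex; out-degree counts outgoing ones.
in_degree = Counter(dst for _, dst in edges)
out_degree = Counter(src for src, _ in edges)

def find_two_way_pairs(edge_list):
    """Return sorted (a, b) pairs for which both a->b and b->a exist."""
    edge_set = set(edge_list)
    return sorted((a, b) for a, b in edge_set if (b, a) in edge_set)

pairs = find_two_way_pairs(edges)

for vertex in sorted(set(in_degree) | set(out_degree)):
    print(vertex, "in:", in_degree[vertex], "out:", out_degree[vertex])
print("two-way pairs:", pairs)
```

A two-way connection shows up once for each ordering of `a` and `b`, because the pattern matches from either endpoint.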