Spark ICP5 - neerajpadarthi/Big-Data-Programming GitHub Wiki
Name: Neeraj Padarthi
Class ID: 19
Spark ICP: 5
Objective
- Importing the dataset as a CSV file and creating DataFrames directly
- Creating a graph from the resulting DataFrames
- Concatenating chunks into a list and converting it to a DataFrame
- Removing duplicates and naming columns
- Creating the output DataFrame
- Creating vertices
- Showing some vertices
- Showing some edges
- Showing vertex in-degree and out-degree
- Applying motif finding
Introduction
- In this ICP, I complete the assignment using GraphFrames and GraphX
- GraphFrames represents a graph as a set of vertices and a set of edges
- GraphFrames is built on Spark DataFrames
- GraphX is built on RDDs
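
To make the vertex/edge representation concrete, here is a minimal plain-Python sketch (not Spark code). GraphFrames expects a vertex table with an `id` column and an edge table with `src` and `dst` columns. The toy values and the helper `dangling_endpoints` are hypothetical and are not taken from the ICP dataset.

```python
# Hypothetical toy graph; ids and names are illustrative only.
# GraphFrames expects an "id" column for vertices and "src"/"dst" for edges.
vertices = [
    {"id": "a", "name": "Node A"},
    {"id": "b", "name": "Node B"},
    {"id": "c", "name": "Node C"},
]
edges = [
    {"src": "a", "dst": "b"},
    {"src": "b", "dst": "c"},
    {"src": "c", "dst": "a"},
]


def dangling_endpoints(vertices, edges):
    """Return edge endpoints that do not appear as a vertex id."""
    ids = {v["id"] for v in vertices}
    endpoints = {e[key] for e in edges for key in ("src", "dst")}
    return sorted(endpoints - ids)


# An empty list means every edge connects two known vertices.
print(dangling_endpoints(vertices, edges))
```

Checking for dangling endpoints before building the graph is a cheap way to catch edges that point at vertices dropped during cleaning.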
Approach
- Importing the dataset as a CSV file and creating DataFrames directly

- Concatenating chunks into a list and converting it to a DataFrame

- Removing duplicates, naming columns, and forming vertices and edges
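
The deduplication and vertex/edge step can be sketched in plain Python as follows. This is only an illustration of the logic, under the assumption that each input row describes one directed connection. The rows and the helper `dedupe` are hypothetical. On a Spark DataFrame the equivalent deduplication would typically be done with `dropDuplicates()` or `distinct()`.

```python
# Hypothetical input rows of the form (source, destination).
rows = [
    ("a", "b"),
    ("a", "b"),  # exact duplicate, should be dropped
    ("b", "c"),
    ("c", "a"),
]


def dedupe(rows):
    """Drop exact duplicate rows while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


unique_rows = dedupe(rows)

# Vertices: every distinct name that appears as a source or destination.
vertices = [{"id": name} for name in sorted({n for row in unique_rows for n in row})]

# Edges: one record per remaining row, using the src/dst naming GraphFrames expects.
edges = [{"src": src, "dst": dst} for src, dst in unique_rows]

print(vertices)
print(edges)
```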


- Creating a graph from the resulting DataFrames

- Showing some vertices

- Showing some edges

- Showing vertex in-degree and out-degree
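
As a rough illustration of what in-degree and out-degree mean, the plain-Python sketch below counts them from a hypothetical edge list. In GraphFrames the corresponding results come from the graph's `inDegrees` and `outDegrees` properties. The edge data here is made up for the example.

```python
from collections import Counter

# Hypothetical directed edges as (src, dst) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]

# In-degree: how many edges end at each vertex.
in_degree = Counter(dst for _, dst in edges)

# Out-degree: how many edges start at each vertex.
out_degree = Counter(src for src, _ in edges)

for vertex in sorted(set(in_degree) | set(out_degree)):
    print(vertex, "in:", in_degree[vertex], "out:", out_degree[vertex])
```

A vertex with no incoming edges simply has no entry in `in_degree` (a `Counter` lookup returns 0 for it), and likewise for `out_degree`.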

- Applying motif finding
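
Motif finding searches the graph for a structural pattern. The page does not say which pattern was used, so the sketch below assumes a common one: pairs of vertices connected in both directions, which GraphFrames expresses as `g.find("(a)-[e]->(b); (b)-[e2]->(a)")`. The plain-Python version only illustrates the matching logic. The edge list and the helper `find_mutual` are hypothetical.

```python
# Hypothetical directed edges as (src, dst) pairs.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("a", "c")]


def find_mutual(edges):
    """Return every (a, b) such that both a->b and b->a exist."""
    edge_set = set(edges)
    return sorted((a, b) for a, b in edge_set if (b, a) in edge_set)


# Each bidirectional pair appears twice, once per direction,
# because either endpoint can play the role of "a" in the pattern.
print(find_mutual(edges))
```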

- Saving vertices and edges (Bonus)

- Bonus 1

- Bonus 2
