Spark ICP5 - neerajpadarthi/Big-Data-Programming GitHub Wiki

Name: Neeraj Padarthi

Class ID: 19

Spark ICP: 5

Objective

  • Importing the dataset as a CSV file and creating DataFrames directly
  • Creating a graph from the resulting DataFrames
  • Concatenating chunks into a list and converting it to a DataFrame
  • Removing duplicates from the name columns
  • Creating the output DataFrame
  • Creating vertices
  • Showing some vertices
  • Showing some edges
  • Showing vertex in-degree and out-degree
  • Applying motif finding
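The chunk-concatenation objective can be illustrated with a minimal plain-Python sketch. It assumes a small hypothetical CSV (the id and name columns and the sample rows are made up, not the ICP dataset) and uses only the standard library, so it runs without Spark. With pandas the same idea is usually written as pd.read_csv(path, chunksize=n) followed by pd.concat, and the combined result can be passed to spark.createDataFrame; whether the original ICP code followed exactly this route is an assumption.

```python
import csv
import io
from itertools import islice

# Hypothetical sample data standing in for the ICP's CSV file (not the real dataset).
SAMPLE_CSV = """id,name
1,Station A
2,Station B
3,Station C
4,Station D
5,Station E
"""

def read_in_chunks(text, chunk_size):
    """Yield lists of row dicts, each holding at most chunk_size rows."""
    reader = csv.DictReader(io.StringIO(text))
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk

# Collect the chunks into a list, then concatenate them into one table of rows.
chunks = list(read_in_chunks(SAMPLE_CSV, chunk_size=2))
rows = [row for chunk in chunks for row in chunk]

print(len(chunks), len(rows))  # 3 5
```

Reading in chunks keeps memory bounded for large files; Spark's own spark.read.csv does not need this step because it already splits the file across partitions.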

Introduction

  • In this ICP, I complete the assignment using GraphFrames and GraphX
  • GraphFrames represents a graph as two DataFrames: one of vertices and one of edges
  • GraphFrames is built on Spark DataFrames
  • GraphX is built on RDDs
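GraphFrames expects the vertices DataFrame to have an id column and the edges DataFrame to have src and dst columns. The sketch below mimics that layout with plain Python lists of dicts so it runs without Spark; the trip-like records, the start/end column names, and the helper build_graph are hypothetical. In PySpark the same steps would roughly be: select the two name columns, combine them with union, call distinct(), rename the column to id, and pass both DataFrames to GraphFrame(vertices, edges).

```python
# Hypothetical trip-like records: each row links a start name to an end name.
trips = [
    {"start": "Station A", "end": "Station B"},
    {"start": "Station B", "end": "Station A"},
    {"start": "Station A", "end": "Station C"},
    {"start": "Station C", "end": "Station A"},
    {"start": "Station A", "end": "Station B"},
]

def build_graph(records, src_col, dst_col):
    """Return (vertices, edges) in the GraphFrames column layout.

    vertices: one {"id": name} per distinct name (duplicates removed)
    edges:    one {"src": ..., "dst": ...} per record (duplicates kept)
    """
    names = set()
    for r in records:
        names.add(r[src_col])
        names.add(r[dst_col])
    vertices = [{"id": n} for n in sorted(names)]
    edges = [{"src": r[src_col], "dst": r[dst_col]} for r in records]
    return vertices, edges

vertices, edges = build_graph(trips, "start", "end")
print(vertices)    # three distinct vertices
print(len(edges))  # 5
```

Duplicates are removed only on the vertex side: each name must appear once as a vertex, while repeated trips between the same pair remain separate edges.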

Approach

  • Importing the dataset as a CSV file and creating DataFrames directly

  • Concatenating chunks into a list and converting it to a DataFrame

  • Removing duplicates from the name columns, then forming vertices and edges

  • Creating a graph from the resulting DataFrames

  • Showing some vertices

  • Showing some edges

  • Showing vertex in-degree and out-degree

  • Applying motif finding

  • Saving vertices and edges (Bonus)

  • Bonus 1

  • Bonus 2
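The degree and motif steps can be sketched in plain Python to show what they compute. The edge list and the helper find_reciprocal are hypothetical. The motif used here, (a)-[e]->(b); (b)-[e2]->(a), is the "edges in both directions" pattern from the GraphFrames documentation; the ICP may have used a different pattern.

```python
from collections import Counter

# Hypothetical edge list in GraphFrames layout (src -> dst); not the ICP's real data.
edges = [
    ("Station A", "Station B"),
    ("Station B", "Station A"),
    ("Station A", "Station C"),
    ("Station C", "Station D"),
    ("Station A", "Station B"),
]

# In-degree: number of edges arriving at each vertex (compare g.inDegrees).
in_degree = Counter(dst for _, dst in edges)
# Out-degree: number of edges leaving each vertex (compare g.outDegrees).
out_degree = Counter(src for src, _ in edges)

def find_reciprocal(edge_list):
    """Match the motif (a)-[e]->(b); (b)-[e2]->(a).

    Returns one (a, b) tuple per matching pair of edges, so parallel
    edges produce repeated matches, as a join over the edge list would.
    """
    matches = []
    for a, b in edge_list:          # e:  a -> b
        for c, d in edge_list:      # e2: b -> a
            if c == b and d == a:
                matches.append((a, b))
    return matches

motifs = find_reciprocal(edges)
print(dict(in_degree))
print(dict(out_degree))
print(motifs)
```

In GraphFrames the corresponding calls are g.inDegrees, g.outDegrees and g.find("(a)-[e]->(b); (b)-[e2]->(a)"); each returns a DataFrame that .show() can display. Like this sketch, inDegrees and outDegrees list only vertices with at least one incoming or outgoing edge.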