Spark ICP6 - neerajpadarthi/Big-Data-Programming GitHub Wiki

Name: Neeraj Padarthi

Class ID: 19

Spark ICP: 6

Objective

  • Import the dataset from a CSV file, creating DataFrames directly on import
  • Create a graph from the resulting DataFrames
  • Perform Triangle Count
  • Find shortest paths with respect to landmarks
  • Apply the PageRank algorithm to the dataset
  • Save the generated graphs to a file
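
To make the first objectives concrete, here is a minimal plain-Python sketch of loading an edge list from CSV, removing duplicates, forming vertices and edges, and counting triangles. It is not the GraphFrames or GraphX code used in the assignment. The sample data, the column names `src` and `dst`, and the helper names `load_graph` and `triangle_count` are assumptions made only for illustration.

```python
import csv
import io
from itertools import combinations

# Hypothetical sample data; the column names "src" and "dst" are assumptions.
CSV_TEXT = """src,dst
A,B
B,C
C,A
A,B
C,D
"""


def load_graph(text):
    """Return (vertices, edges) with duplicate rows and self-loops dropped."""
    rows = csv.DictReader(io.StringIO(text))
    # A set removes the duplicate A,B row; sorting keeps the output stable.
    edges = sorted({(r["src"], r["dst"]) for r in rows if r["src"] != r["dst"]})
    vertices = sorted({v for edge in edges for v in edge})
    return vertices, edges


def triangle_count(vertices, edges):
    """Count, per vertex, the triangles it belongs to (edge direction ignored)."""
    neighbours = {v: set() for v in vertices}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    counts = {}
    for v in vertices:
        # A triangle through v is a pair of v's neighbours that are also adjacent.
        counts[v] = sum(
            1
            for a, b in combinations(sorted(neighbours[v]), 2)
            if b in neighbours[a]
        )
    return counts


vertices, edges = load_graph(CSV_TEXT)
counts = triangle_count(vertices, edges)
print(vertices)
print(edges)
print(counts)
```

In this sample, A, B and C form one triangle, so each of them gets a count of 1, while D gets 0.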

Introduction

  • In this ICP, I complete the assignment using GraphFrames and GraphX

  • GraphFrames represent graphs as vertices and edges

  • GraphFrames are built on Spark DataFrames

  • GraphX is built on RDDs

  • Importing the dataset from a CSV file and creating DataFrames directly

  • Removing duplicate name columns and forming the vertices and edges

  • Creating a graph from the resulting DataFrames

  • Performing Triangle Count

  • Finding shortest paths with respect to landmarks

  • Applying the PageRank algorithm to the dataset

  • Saving the generated graphs to a file

  • Bonus 1: Label Propagation Algorithm (LPA)

  • Bonus 2: Breadth-First Search (BFS)
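
The shortest-path, BFS, and PageRank steps listed above can be sketched in plain Python to show what each one computes. This is an illustration under stated assumptions, not the GraphFrames or GraphX implementation: the edge list is hypothetical, distances are hop counts along directed edges, and PageRank is the textbook normalized variant with a damping factor of 0.85 (the library's scaling may differ). The function names `hops_to_landmark` and `page_rank` are made up for this example.

```python
from collections import deque

# Hypothetical directed edge list, used only for illustration.
EDGES = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
VERTICES = sorted({v for edge in EDGES for v in edge})


def hops_to_landmark(edges, landmark):
    """BFS over reversed edges: hop count from each vertex to `landmark`."""
    incoming = {}
    for src, dst in edges:
        incoming.setdefault(dst, []).append(src)
    dist = {landmark: 0}
    queue = deque([landmark])
    while queue:
        node = queue.popleft()
        for prev in incoming.get(node, []):
            if prev not in dist:  # first visit is the shortest in an unweighted graph
                dist[prev] = dist[node] + 1
                queue.append(prev)
    return dist  # vertices that cannot reach the landmark are absent


def page_rank(vertices, edges, damping=0.85, iterations=50):
    """Power iteration; rank held by dangling vertices is spread evenly."""
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in vertices}
        for v in vertices:
            for dst in out_links[v]:
                new_rank[dst] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


distances = hops_to_landmark(EDGES, "D")
ranks = page_rank(VERTICES, EDGES)
print(distances)
print(max(ranks, key=ranks.get))
```

With D as the landmark, the sketch reports C at 1 hop, B at 2 hops, and A at 3 hops. The ranks sum to 1, and C ends up with the highest rank in this small graph.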