ICP2_5 - Hiresh12/Big-Data-Programming GitHub Wiki

GraphFrames and GraphX

Task:

Write a Spark program that imports datasets and builds a graph from them using GraphX.

Features:

  • Spark
  • Python
  • Jupyter Notebook
  • GraphX and GraphFrames

Tasks:

Import the dataset from a CSV file, create DataFrames directly on import, and then build a graph from those DataFrames.

Concatenate chunks into list & convert to DataFrame

Remove duplicates; Name Columns and Output DataFrame
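The import steps above can be checked on a small sample before running them on Spark. The sketch below is plain Python using only the standard library, not the Spark or pandas API. The column names `trip_id`, `start_station`, and `end_station`, the helper names, and the inline sample rows are all assumptions made for illustration; the real dataset may look different. It reads rows in chunks, concatenates the chunks into one list, drops exact duplicates, and attaches column names.

```python
import csv
import io
from itertools import islice

# Hypothetical sample standing in for the CSV file (note the duplicated row).
SAMPLE_CSV = "\n".join(["1,A,B", "2,B,A", "2,B,A", "3,A,C", "4,C,A"])

# Assumed column names; the real dataset's columns may differ.
COLUMNS = ["trip_id", "start_station", "end_station"]


def read_in_chunks(handle, chunk_size):
    """Yield lists of at most chunk_size raw rows from a CSV file handle."""
    reader = csv.reader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def load_rows(handle, chunk_size=2):
    """Concatenate chunks, drop exact duplicate rows, and name the columns."""
    rows = [row for chunk in read_in_chunks(handle, chunk_size) for row in chunk]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique_rows = list(dict.fromkeys(tuple(row) for row in rows))
    return [dict(zip(COLUMNS, row)) for row in unique_rows]


trips = load_rows(io.StringIO(SAMPLE_CSV))
for trip in trips:
    print(trip)
```

In PySpark, the corresponding calls would likely be `spark.read.csv(path, header=True, inferSchema=True)`, `DataFrame.dropDuplicates()`, and `DataFrame.toDF(*names)`.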

Create vertices and edges
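GraphFrames expects a vertex DataFrame with an `id` column and an edge DataFrame with `src` and `dst` columns. As a minimal sketch of that shape, the plain-Python example below derives both from a list of trips. The trip pairs are hypothetical sample data, and the lists of dictionaries only stand in for DataFrames.

```python
# Hypothetical trips; each pair is (start_station, end_station).
trips = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "B")]

# One vertex per distinct station, using the "id" column name GraphFrames expects.
stations = sorted({station for trip in trips for station in trip})
vertices = [{"id": station} for station in stations]

# One edge per trip, using the "src" and "dst" column names GraphFrames expects.
edges = [{"src": src, "dst": dst} for src, dst in trips]

print(vertices[:3])  # show some vertices
print(edges[:3])     # show some edges
```

With real DataFrames in the same shape, the graph would be built with `GraphFrame(vertices, edges)`, and `g.vertices.show()` and `g.edges.show()` would display a sample of each.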

Show some vertices

Show some edges

Vertex in-Degree

Vertex out-Degree
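In GraphFrames, in-degree and out-degree are available as `g.inDegrees` and `g.outDegrees`. To make clear what those return, the sketch below computes the same counts in plain Python over a hypothetical edge list. It illustrates the definition and is not the GraphFrames implementation.

```python
from collections import Counter

# Hypothetical edges as (src, dst) pairs; each pair is one trip.
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "B")]

# In-degree: number of edges arriving at each vertex.
in_degree = Counter(dst for _, dst in edges)
# Out-degree: number of edges leaving each vertex.
out_degree = Counter(src for src, _ in edges)

for station in sorted(set(in_degree) | set(out_degree)):
    print(station, "in:", in_degree[station], "out:", out_degree[station])
```

A `Counter` reports 0 for a station with no outgoing edges, whereas `g.outDegrees` simply omits such vertices from its result.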

Apply motif finding.
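Motif finding searches the graph for a structural pattern. In GraphFrames, a pattern such as `g.find("(a)-[e]->(b); (b)-[e2]->(a)")` matches pairs of vertices connected in both directions. The plain-Python sketch below looks for that same pattern in a hypothetical edge list. It is only an illustration of the idea: it collapses repeated edges and skips self-loops by choice, which the GraphFrames query does not necessarily do.

```python
# Hypothetical edges as (src, dst) pairs.
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D"), ("A", "B")]

# A set gives fast membership tests and collapses repeated edges.
edge_set = set(edges)

# Keep every edge a -> b whose reverse edge b -> a also exists.
round_trips = sorted(
    (a, b) for (a, b) in edge_set if a != b and (b, a) in edge_set
)

print(round_trips)
```

Each connected pair appears twice, once in each direction, which is also how the two-edge motif reports it.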

Bonus

Vertex degree

What are the most common destinations in the dataset from location to location?
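One way to approach this question is to count how often each (source, destination) pair occurs and sort by that count. With GraphFrames, that would probably be `g.edges.groupBy("src", "dst").count()` followed by a descending sort on `count`. The sketch below shows the same counting logic in plain Python over hypothetical trips.

```python
from collections import Counter

# Hypothetical trips as (src, dst) pairs.
trips = [
    ("A", "B"), ("B", "A"), ("A", "B"),
    ("A", "C"), ("A", "B"), ("B", "A"),
]

# Count how many trips follow each route.
route_counts = Counter(trips)

# The two most frequent routes, highest count first.
top_routes = route_counts.most_common(2)

for (src, dst), count in top_routes:
    print(f"{src} -> {dst}: {count} trips")
```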

Which station has the highest ratio of in-degree to out-degree? In other words, which station acts as an almost pure trip sink: one where trips often end but rarely start?
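A common way to answer this is to divide each station's in-degree by its out-degree and take the largest ratio. The plain-Python sketch below shows that calculation on hypothetical trips. It assumes that stations with no outgoing trips are left out, which mirrors what an inner join of `g.inDegrees` and `g.outDegrees` on `id` would do, since `g.outDegrees` omits vertices with zero out-degree.

```python
from collections import Counter

# Hypothetical trips as (src, dst) pairs.
trips = [
    ("A", "C"), ("B", "C"), ("A", "C"),
    ("C", "A"), ("A", "B"), ("B", "A"),
]

in_degree = Counter(dst for _, dst in trips)
out_degree = Counter(src for src, _ in trips)

# Ratio of arrivals to departures; stations with no departures are skipped
# to avoid dividing by zero (an assumption, see the note above).
ratios = {
    station: in_degree[station] / out_degree[station]
    for station in in_degree
    if out_degree[station] > 0
}

# The station with the largest ratio behaves most like a trip sink.
sink = max(ratios, key=ratios.get)
print(sink, ratios[sink])
```

Skipping zero-departure stations means a true pure sink would be excluded, so it may be worth listing those stations separately.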

Save graphs generated to a file.
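A graph is fully described by its vertices and edges, so saving it comes down to writing both collections out and reading them back. With GraphFrames this is typically done with `g.vertices.write.parquet(...)` and `g.edges.write.parquet(...)`. The sketch below shows the same round trip in plain Python, using JSON files in a temporary directory. The file names, helper functions, and sample data are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical graph in the vertex/edge shape GraphFrames uses.
vertices = [{"id": "A"}, {"id": "B"}]
edges = [{"src": "A", "dst": "B"}]


def save_graph(directory, vertices, edges):
    """Write vertices and edges to two JSON files inside directory."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    (directory / "vertices.json").write_text(json.dumps(vertices))
    (directory / "edges.json").write_text(json.dumps(edges))


def load_graph(directory):
    """Read the vertices and edges back from directory."""
    directory = Path(directory)
    loaded_v = json.loads((directory / "vertices.json").read_text())
    loaded_e = json.loads((directory / "edges.json").read_text())
    return loaded_v, loaded_e


with tempfile.TemporaryDirectory() as tmp:
    save_graph(tmp, vertices, edges)
    loaded_vertices, loaded_edges = load_graph(tmp)

print(loaded_vertices, loaded_edges)
```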

References

https://spark.apache.org/docs/latest/graphx-programming-guide.html

https://mapr.com/blog/how-get-started-using-apache-spark-graphx-scala/

https://databricks.com/blog/2016/03/03/introducing-graphframes.html