ICP2_5 - Hiresh12/Big-Data-Programming GitHub Wiki
GraphFrames and GraphX
Task:
Write a Spark program that imports a dataset and builds a graph from it using GraphX.
Features:
- Spark
- Python
- Jupyter Notebook
- GraphX and GraphFrames
Tasks:
Import the dataset from a CSV file, creating DataFrames directly on import, then build a graph from those DataFrames.
Concatenate the chunks into a list and convert it to a DataFrame
Remove duplicates, name the columns, and output the DataFrame
Create vertices and edges
Show some vertices
Show some edges
Vertex in-degree
Vertex out-degree
Apply motif finding
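The tasks above can be illustrated in plain Python, without a Spark session. This is a minimal sketch of the ideas, not the Spark program itself: the trip rows and station names are made up, and the helper names (`build_graph`, `degrees`, `round_trips`) are hypothetical. In GraphFrames the corresponding pieces would be `GraphFrame(vertices, edges)` (vertices with an `id` column, edges with `src` and `dst` columns), the `inDegrees` and `outDegrees` properties, and `find("(a)-[e]->(b); (b)-[e2]->(a)")` for the motif.

```python
from collections import Counter

# Hypothetical trip records as (start_station, end_station) pairs.
# The repeated row stands in for duplicates in the raw CSV.
TRIPS = [
    ("A", "B"),
    ("A", "B"),
    ("B", "A"),
    ("A", "C"),
    ("B", "C"),
    ("D", "C"),
]


def build_graph(rows):
    """Return (vertices, edges): edges are the de-duplicated rows,
    vertices are every station that appears in any edge."""
    edges = sorted(set(rows))
    vertices = sorted({station for edge in edges for station in edge})
    return vertices, edges


def degrees(vertices, edges):
    """Return (in_degree, out_degree) dicts keyed by vertex id.
    Vertices with no edges in a direction are reported with 0 here."""
    in_count = Counter(dst for _, dst in edges)
    out_count = Counter(src for src, _ in edges)
    in_degree = {v: in_count[v] for v in vertices}
    out_degree = {v: out_count[v] for v in vertices}
    return in_degree, out_degree


def round_trips(edges):
    """Motif (a)-[e]->(b); (b)-[e2]->(a): ordered pairs of stations
    that are connected in both directions."""
    edge_set = set(edges)
    return sorted((a, b) for a, b in edge_set if (b, a) in edge_set)


if __name__ == "__main__":
    vertices, edges = build_graph(TRIPS)
    in_degree, out_degree = degrees(vertices, edges)
    print("vertices:", vertices)
    print("edges:", edges)
    print("in-degree:", in_degree)
    print("out-degree:", out_degree)
    print("round trips:", round_trips(edges))
```

Each reciprocal pair appears twice in the motif result, once per direction, because `a` and `b` are matched as ordered roles.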
Bonus
Vertex degree
What are the most common destinations in the dataset from location to location?
Which station has the highest ratio of in-degree to out-degree? In other words, which station acts as an almost pure trip sink: a station where trips end but rarely start?
Save the generated graphs to a file.
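The bonus questions can be sketched the same way in plain Python. Again this is only an illustration under stated assumptions: the trips are invented, the function names are hypothetical, and JSON is an assumed stand-in for whatever output format the Spark job would use. Duplicates are kept here on purpose, since counting the most common routes depends on repeated trips.

```python
import json
from collections import Counter

# Hypothetical (start_station, end_station) trips; repeats are real repeat trips.
TRIPS = [
    ("A", "B"), ("A", "B"), ("B", "A"),
    ("A", "C"), ("A", "C"), ("A", "C"),
    ("B", "C"), ("D", "C"), ("C", "A"),
]


def total_degrees(trips):
    """Vertex degree: number of trips that start or end at each station."""
    degree = Counter()
    for src, dst in trips:
        degree[src] += 1
        degree[dst] += 1
    return dict(degree)


def most_common_routes(trips, n=3):
    """The n most frequent (start, end) pairs with their trip counts."""
    return Counter(trips).most_common(n)


def degree_ratios(trips):
    """in-degree / out-degree per station. Stations with no outgoing
    trips are skipped to avoid dividing by zero."""
    in_count = Counter(dst for _, dst in trips)
    out_count = Counter(src for src, _ in trips)
    return {s: in_count[s] / out_count[s] for s in out_count}


def save_graph(trips, path):
    """Write vertices and edges to a JSON file (assumed format)."""
    vertices = sorted({station for trip in trips for station in trip})
    edges = [{"src": src, "dst": dst} for src, dst in trips]
    with open(path, "w", encoding="utf-8") as fh:
        json.dump({"vertices": vertices, "edges": edges}, fh, indent=2)


if __name__ == "__main__":
    ratios = degree_ratios(TRIPS)
    print("degrees:", total_degrees(TRIPS))
    print("top routes:", most_common_routes(TRIPS, n=2))
    print("closest to a pure sink:", max(ratios, key=ratios.get))
    save_graph(TRIPS, "graph.json")
```

A station with a ratio well above 1 receives far more trips than it originates, which is what the "trip sink" question is looking for.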
References
https://spark.apache.org/docs/latest/graphx-programming-guide.html
https://mapr.com/blog/how-get-started-using-apache-spark-graphx-scala/
https://databricks.com/blog/2016/03/03/introducing-graphframes.html