Big_Data_Programming_ICP_5_Module2 - kusamdinesh/Big-Data-and-Hadoop GitHub Wiki
Procedure :
- Import the dataset
Import the dataset from CSV, creating DataFrames directly on import, then build a graph from the resulting DataFrames.
Input :
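The original input is not reproduced here, so the following is only a library-free sketch of the import step: it parses CSV text into rows and derives vertex and edge records. The sample data and the column names `name`, `start` and `end` are made up for illustration. The `id`, `src` and `dst` names follow the convention GraphFrames expects for vertex and edge DataFrames.

```python
import csv
import io

# Hypothetical sample data; the real files and their column names are not shown in this write-up.
STATIONS_CSV = """name,lat,lan
A,37.33,-121.89
B,37.34,-121.90
"""
TRIPS_CSV = """start,end
A,B
B,A
A,B
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one dict per data row."""
    return list(csv.DictReader(io.StringIO(text)))


stations = read_rows(STATIONS_CSV)
trips = read_rows(TRIPS_CSV)

# Vertices need an 'id' column; edges need 'src' and 'dst' columns.
vertices = [{"id": s["name"], "lat": s["lat"], "lan": s["lan"]} for s in stations]
edges = [{"src": t["start"], "dst": t["end"]} for t in trips]
```

With Spark, the same shape would be obtained by reading each file into a DataFrame and renaming columns before constructing the graph.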
- Concatenate chunks into list & convert to DataFrame
I concatenated two columns, 'lat' and 'lan', from the Stations DataFrame.
Input :
Output :
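As a rough illustration of the column concatenation, here is a plain-Python sketch that joins 'lat' and 'lan' into one new field per row. The output column name `lat_lan`, the separator and the sample rows are assumptions. In PySpark the equivalent is usually written with `pyspark.sql.functions.concat`, but whether that exact call was used here is not shown.

```python
# Hypothetical station rows; values are made up for illustration.
stations = [
    {"name": "A", "lat": "37.33", "lan": "-121.89"},
    {"name": "B", "lat": "37.34", "lan": "-121.90"},
]


def with_location(row, sep=","):
    """Return a copy of the row with 'lat' and 'lan' joined into 'lat_lan'."""
    out = dict(row)
    out["lat_lan"] = f"{row['lat']}{sep}{row['lan']}"
    return out


stations_with_location = [with_location(r) for r in stations]
```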
- Remove duplicates
The 'distinct' command is used to remove duplicate rows from the DataFrames.
Input :
Output :
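To make the effect of 'distinct' concrete, the sketch below removes exact duplicate rows from a list of dicts while keeping the first occurrence. It is a plain-Python stand-in, not the Spark implementation, and the sample rows are hypothetical.

```python
def distinct(rows):
    """Drop rows that are exact duplicates of an earlier row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical rows with one duplicate.
rows = [{"id": "A"}, {"id": "B"}, {"id": "A"}]
unique_rows = distinct(rows)
```

Note that Spark's `distinct` does not guarantee row order; the order-preserving behaviour here is only a convenience of this sketch.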
- Name Columns, Output DataFrame, Create vertices
- Show some vertices
Input :
Output :
- Show some edges
Input :
Output :
- Vertex in-Degree
Input :
Output :
- Vertex out-Degree
Input :
Output :
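For reference, in-degree counts the edges arriving at a vertex and out-degree counts the edges leaving it. GraphFrames exposes these as the `inDegrees` and `outDegrees` DataFrames. The snippet below is only a plain-Python sketch of the same counts over a made-up edge list.

```python
from collections import Counter

# Hypothetical (src, dst) trips.
edges = [("A", "B"), ("B", "A"), ("A", "B"), ("C", "B")]

# In-degree: how many edges end at each vertex.
in_degree = Counter(dst for _, dst in edges)

# Out-degree: how many edges start at each vertex.
out_degree = Counter(src for src, _ in edges)
```

A `Counter` returns 0 for a vertex it has never seen, whereas GraphFrames leaves vertices with no incoming (or outgoing) edges out of the corresponding result.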
- Apply motif finding
Motif finding searches the graph for sub-graphs that match a structural pattern. Here, the pattern is a pair of vertices connected in both directions: an edge from 'a' to 'b' and an edge from 'b' back to 'a'.
Input :
Output :
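In GraphFrames this kind of pattern is typically expressed as `g.find("(a)-[e]->(b); (b)-[e2]->(a)")`. The exact query used here is not shown, so the following is a plain-Python sketch of the same idea on a made-up edge list: keep every directed pair whose reverse edge also exists.

```python
# Hypothetical (src, dst) trips.
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D"), ("D", "C")]

edge_set = set(edges)  # de-duplicate and allow O(1) reverse lookups

# Keep (a, b) only if the reverse edge (b, a) is present.
# Self-loops are skipped, which is a choice of this sketch.
round_trips = sorted(
    (a, b) for (a, b) in edge_set if a != b and (b, a) in edge_set
)
```

Each bidirectional pair appears once per orientation, so A and B show up as both `("A", "B")` and `("B", "A")`.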
Bonus:
- Vertex degree
Input :
Output :
- What are the most common destinations in the dataset, from location to location?
This is done using the groupby function, with the result limited to 10 rows in order to display the top 10 most common destinations.
Input :
Output :
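The group-and-limit step can be sketched without Spark as follows: count each (source, destination) pair, order by count, and keep the first 10. The trip data is hypothetical, and `Counter.most_common` stands in for the group, count, sort and limit chain.

```python
from collections import Counter

# Hypothetical (src, dst) trips.
trips = [
    ("A", "B"), ("A", "B"), ("A", "B"),
    ("B", "C"), ("B", "C"),
    ("C", "A"),
]

# Group identical routes and count them.
route_counts = Counter(trips)

# Highest counts first, at most 10 rows.
top_routes = route_counts.most_common(10)
```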
- What is the station with the highest ratio of in-degrees to out-degrees?
In other words, which station acts as an almost pure trip sink: a station where trips end but rarely start. This is implemented using the join operation.
Input :
Output :
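One way to read this step: join the in-degree and out-degree tables on the station id, divide one by the other, and take the largest ratio. The sketch below does this in plain Python on made-up edges, using a set intersection of keys as a stand-in for an inner join.

```python
from collections import Counter

# Hypothetical (src, dst) trips.
edges = [
    ("A", "B"), ("C", "B"), ("D", "B"),
    ("B", "A"), ("A", "C"), ("A", "D"),
]

in_deg = Counter(dst for _, dst in edges)
out_deg = Counter(src for src, _ in edges)

# Inner join on station id: only stations present in both tables survive.
ratios = {s: in_deg[s] / out_deg[s] for s in in_deg.keys() & out_deg.keys()}

# The station with the highest in/out ratio behaves most like a trip sink.
sink = max(ratios, key=ratios.get)
```

One caveat of an inner join is that a station with no outgoing trips at all is dropped from the result, even though it would be the purest sink.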
- Save graphs generated to a file
The vertices and the edges of the generated graph are to be stored in separate vertices and edges folders.
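With Spark DataFrames, saving usually goes through `g.vertices.write` and `g.edges.write`. The output format and paths used here are not shown. As a library-free sketch of the folder layout, the following writes hypothetical vertex and edge rows as CSV into separate `vertices` and `edges` folders under a temporary directory. The helper `save_rows` and the file name `part-0.csv` are illustrative only.

```python
import csv
import tempfile
from pathlib import Path


def save_rows(rows, folder, filename="part-0.csv"):
    """Write a list of dicts as a CSV file inside the given folder."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / filename
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path


# Hypothetical graph contents.
vertices = [{"id": "A"}, {"id": "B"}]
edges = [{"src": "A", "dst": "B"}, {"src": "B", "dst": "A"}]

out_root = Path(tempfile.mkdtemp())  # stand-in for the real output location
vertices_path = save_rows(vertices, out_root / "vertices")
edges_path = save_rows(edges, out_root / "edges")
```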