M2 ICP 5 - PavankumarManchala/BigDataProgrammingICPs GitHub Wiki
Submitted By:
Pavankumar Manchala
Class Id: 16
Part-1:
Q1. Import the dataset as a CSV file, create DataFrames directly on import, and then create a graph from those DataFrames.
The dataset was downloaded and imported into DataFrames, and a graph was then created from them.
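If the ICP uses Spark with GraphFrames (as the use of distinct, withColumnRenamed and motif finding suggests), this step is typically spark.read.csv(path, header=True, inferSchema=True) followed by GraphFrame(vertices, edges). The sketch below is only a plain-Python illustration of the same idea. It assumes hypothetical columns start_station and end_station and made-up sample rows: each CSV row becomes a record, each trip becomes a directed edge, and each station seen becomes a vertex.

```python
import csv
import io

# Hypothetical trip records; the real dataset's file and column names may differ.
TRIPS_CSV = """start_station,end_station
A,B
B,A
A,C
"""

def load_rows(text):
    """Parse CSV text with a header row into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(TRIPS_CSV)
# Each trip becomes a directed edge (start station -> end station).
edges = [(r["start_station"], r["end_station"]) for r in rows]
# Every station that appears at either end of a trip becomes a vertex.
vertices = sorted({station for edge in edges for station in edge})
print(vertices)  # ['A', 'B', 'C']
print(edges)     # [('A', 'B'), ('B', 'A'), ('A', 'C')]
```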
Q2. Concatenate the chunks into a list and convert it to a DataFrame.
The chunks were created, concatenated into a list, and then converted to a DataFrame.
Output:
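With pandas, a common pattern is pd.read_csv(path, chunksize=n), which yields DataFrame chunks, followed by pd.concat(list_of_chunks). The sketch below shows the same chunk-then-concatenate idea in plain Python; the helper read_in_chunks and the sample rows are hypothetical and not taken from the ICP.

```python
from itertools import islice

def read_in_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows (hypothetical helper)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

# Hypothetical rows standing in for a large CSV file read piece by piece.
rows = [{"id": i} for i in range(7)]
chunks = list(read_in_chunks(rows, chunk_size=3))  # chunk sizes 3, 3, 1
# Concatenate the chunks back into one flat table of rows.
table = [row for chunk in chunks for row in chunk]
```

Reading in chunks keeps memory bounded while parsing; concatenation only pays off when the combined result fits in memory.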
Q3. Remove duplicates.
We removed the duplicate rows using distinct().
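Spark's DataFrame.distinct() drops rows that are identical in every column. A minimal plain-Python sketch of the same operation, assuming rows are dicts with hashable values (the sample rows are hypothetical):

```python
def distinct(rows):
    """Return rows with exact duplicates removed, keeping first-seen order.

    Spark's distinct() does not guarantee any output order; order is kept
    here only to make the result easy to read.
    """
    seen = set()
    result = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable form of a dict row
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

rows = [
    {"src": "A", "dst": "B"},
    {"src": "A", "dst": "B"},  # exact duplicate, will be dropped
    {"src": "B", "dst": "A"},
]
unique_rows = distinct(rows)
```

To deduplicate on a subset of columns only, Spark offers dropDuplicates(["col", ...]) instead.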
Q4. Name Columns.
We used .withColumnRenamed to rename the specified columns.
Output:
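GraphFrames expects an id column on vertices and src/dst columns on edges, which is a common reason for renaming. The plain-Python sketch below mimics withColumnRenamed on rows stored as dicts; the original column names start_station and end_station are hypothetical.

```python
def with_column_renamed(rows, existing, new):
    """Return copies of rows with column `existing` renamed to `new`.

    Like Spark's DataFrame.withColumnRenamed, renaming a column that does
    not exist leaves the rows unchanged instead of raising an error.
    """
    return [
        {(new if key == existing else key): value for key, value in row.items()}
        for row in rows
    ]

# Hypothetical trip columns renamed to the src/dst names GraphFrames expects.
trips = [{"start_station": "A", "end_station": "B"}]
edges = with_column_renamed(trips, "start_station", "src")
edges = with_column_renamed(edges, "end_station", "dst")
```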
Q6, Q7, Q8. Create vertices and show some vertices and edges.
Vertices and edges were created during import and then displayed.
Output:
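In Spark, g.vertices.show(n) and g.edges.show(n) print the first n rows (20 by default). The sketch below builds small hypothetical vertex and edge tables in the shape GraphFrames expects (id on vertices, src and dst on edges), prints a few rows with a hypothetical show helper, and checks that every edge endpoint is a known vertex.

```python
# Hypothetical station rows; "id" is the column GraphFrames expects on vertices.
vertices = [
    {"id": "A", "name": "Station A"},
    {"id": "B", "name": "Station B"},
    {"id": "C", "name": "Station C"},
]
# Hypothetical trips; "src" and "dst" are the columns GraphFrames expects on edges.
edges = [
    {"src": "A", "dst": "B", "trip_id": 1},
    {"src": "B", "dst": "A", "trip_id": 2},
    {"src": "A", "dst": "C", "trip_id": 3},
]

def show(rows, n=20):
    """Print and return the first n rows, loosely mimicking DataFrame.show(n)."""
    for row in rows[:n]:
        print(row)
    return rows[:n]

shown_vertices = show(vertices, n=2)
shown_edges = show(edges, n=2)

# Sanity check: edges pointing at unknown vertex ids.
vertex_ids = {v["id"] for v in vertices}
dangling = [e for e in edges if e["src"] not in vertex_ids or e["dst"] not in vertex_ids]
```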
Q9. Vertex in-degree: the in-degree is the number of incoming edges at a particular vertex.
Q10. Vertex out-degree: the out-degree is the number of outgoing edges at a particular vertex.
Output:
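GraphFrames exposes these as g.inDegrees and g.outDegrees (columns id/inDegree and id/outDegree); vertices with no incoming or outgoing edges are simply absent from those results. The plain-Python sketch below counts the same quantities from a hypothetical edge list.

```python
from collections import Counter

# Hypothetical directed edges (src, dst), e.g. trips between stations.
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "B")]

# In-degree: number of edges whose destination is the vertex.
in_degree = Counter(dst for _, dst in edges)
# Out-degree: number of edges whose source is the vertex.
out_degree = Counter(src for src, _ in edges)
```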
Q11. Apply motif finding.
The motif looks for an edge from a source to a destination together with a second, different edge from that destination back to the source.
Output:
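In GraphFrames this motif is written g.find("(a)-[e]->(b); (b)-[e2]->(a)"). The plain-Python sketch below finds the same reciprocal pairs, assuming a hypothetical edge list without repeated edges (with repeated trips, GraphFrames would return one match per combination of edges).

```python
# Hypothetical directed edges (src, dst) with no repeats.
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D"), ("D", "C")]

def reciprocal_pairs(edges):
    """Return (a, b) for every match of (a)-[e]->(b); (b)-[e2]->(a).

    Both orientations of a reciprocal pair are returned, mirroring how a
    motif query binds a and b in both directions.
    """
    edge_set = set(edges)
    return [(a, b) for a, b in edges if (b, a) in edge_set]

matches = reciprocal_pairs(edges)
```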
Bonus:
Q1. Vertex degree
The degree is the total number of incoming and outgoing edges at each vertex.
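GraphFrames provides this as g.degrees (columns id and degree). A plain-Python sketch on a hypothetical edge list: each edge adds one to its source and one to its destination, so the degrees always sum to twice the number of edges.

```python
from collections import Counter

# Hypothetical directed edges (src, dst).
edges = [("A", "B"), ("B", "A"), ("A", "C")]

degree = Counter()
for src, dst in edges:
    degree[src] += 1  # counts as an outgoing edge of src
    degree[dst] += 1  # counts as an incoming edge of dst
```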
Q2. What are the most common destinations in the dataset, from location to location?
The most common destinations were found.
Output:
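One way to express this in Spark is g.edges.groupBy("src", "dst").count().orderBy("count", ascending=False). The plain-Python sketch below counts hypothetical (start, end) trips per route and takes the most frequent ones.

```python
from collections import Counter

# Hypothetical trips as (start station, end station) pairs.
trips = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "B"), ("B", "C"), ("C", "A")]

route_counts = Counter(trips)
# The two most frequent routes with their trip counts.
top_routes = route_counts.most_common(2)
```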
Q3. Which station has the highest ratio of in-degree to out-degree? In other words, which station acts almost as a pure trip sink, where trips end but rarely start?
Output:
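In GraphFrames this could be done by joining g.inDegrees and g.outDegrees on id and dividing inDegree by outDegree. The plain-Python sketch below computes the ratio on a hypothetical edge list. Dividing by max(out-degree, 1) so that stations with no outgoing trips do not cause a division by zero is an assumption of this sketch, not part of the original ICP.

```python
from collections import Counter

# Hypothetical directed edges (start station, end station).
edges = [("A", "B"), ("C", "B"), ("D", "B"), ("B", "A"), ("A", "C"), ("C", "D")]

in_deg = Counter(dst for _, dst in edges)
out_deg = Counter(src for src, _ in edges)

def sink_ratio(station):
    """In-degree divided by out-degree (out-degree floored at 1, an assumption)."""
    return in_deg[station] / max(out_deg[station], 1)

stations = set(in_deg) | set(out_deg)
ratios = {s: sink_ratio(s) for s in stations}
# The station where trips most often end relative to how often they start.
top_sink = max(stations, key=sink_ratio)
```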
Q4. Save the generated graphs to a file.
Output:
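A GraphFrame is usually saved by writing its two DataFrames separately, for example g.vertices.write.mode("overwrite").parquet(path) and the same for g.edges. The plain-Python sketch below writes hypothetical vertex and edge rows to two CSV files; the helper save_graph and the output layout are hypothetical, and it assumes both lists are non-empty with the same keys in every row.

```python
import csv
import os
import tempfile

def save_graph(vertices, edges, out_dir):
    """Write vertices and edges to vertices.csv and edges.csv under out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for name, rows in (("vertices", vertices), ("edges", edges)):
        path = os.path.join(out_dir, f"{name}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
        paths[name] = path
    return paths

# Hypothetical graph data written to a temporary directory.
vertices = [{"id": "A"}, {"id": "B"}]
edges = [{"src": "A", "dst": "B"}]
paths = save_graph(vertices, edges, os.path.join(tempfile.mkdtemp(), "graph"))
```

Keeping vertices and edges in separate files means the graph can be rebuilt later by loading both tables and passing them to GraphFrame again.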