ICP_12 - PallaviArikatla/Big-Data-Programming GitHub Wiki
INTRODUCTION: This exercise analyzes a distributed collection of data by loading it into dataframes and querying it as a graph.
IMPLEMENTATION:
First, declare the project dependencies in build.sbt.
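A build.sbt along the following lines is one way to declare the dependencies. It is only a sketch: it assumes the analysis uses Spark SQL together with the GraphFrames package, and the project name, every version number, and the resolver are assumptions rather than values taken from the original project.

```scala
// build.sbt (sketch; the name and all versions below are assumptions)
name := "icp12-graph-analysis" // hypothetical project name
scalaVersion := "2.12.15"

// GraphFrames is published through the Spark Packages repository.
resolvers += "Spark Packages" at "https://repos.spark-packages.org/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"    % "3.2.1",
  "org.apache.spark" %% "spark-graphx" % "3.2.1",
  "org.apache.spark" %% "spark-mllib"  % "3.2.1",
  "graphframes"      %  "graphframes"  % "0.8.2-spark3.2-s_2.12"
)
```

The GraphFrames artifact has to match both the Spark version and the Scala version, so all of these lines should be changed together.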
QUESTION 1
1. Import the datasets as CSV files, create dataframes directly on import, then create a graph from the dataframes.
Here we import the two datasets, stations.csv and trips.csv, and create a dataframe from each one directly on import.
The schema of each dataframe is then printed.
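The import step could look roughly like the sketch below. It assumes Spark SQL is the dataframe library, that both files have a header row, and that they sit in the working directory. The object name `LoadCsvSketch`, the helper `readCsv`, and the local master setting are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LoadCsvSketch {
  // Read a CSV file that has a header row and let Spark infer the column types.
  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LoadCsvSketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()

    // File names come from the text; their location is an assumption.
    val stations = readCsv(spark, "stations.csv")
    val trips = readCsv(spark, "trips.csv")

    stations.printSchema()
    trips.printSchema()

    spark.stop()
  }
}
```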
2. Concatenate chunks into list & convert to DataFrame.
In this code we concatenate two columns of the stations dataset, longitude and latitude.
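A minimal sketch of the concatenation, assuming Spark SQL and assuming the columns are literally named `latitude` and `longitude` (the real column names are not given in the text). The output column name `location`, the comma separator, and the sample row are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, lit}

object ConcatSketch {
  // Add a "location" column of the form "<latitude>,<longitude>".
  // The numeric columns are cast to strings before concatenating.
  def withLocation(stations: DataFrame): DataFrame =
    stations.withColumn(
      "location",
      concat(col("latitude").cast("string"), lit(","), col("longitude").cast("string"))
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ConcatSketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row standing in for stations.csv.
    val stations = Seq(("S1", 37.5, -122.25)).toDF("station_id", "latitude", "longitude")
    withLocation(stations).show(false)

    spark.stop()
  }
}
```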
3. Eliminate duplicate columns.
4. Rename the columns.
We rename the columns in both datasets.
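The renaming can be sketched as follows. The text does not list the old or new column names, so `name`, `start_station`, and `end_station` are hypothetical. The targets `id`, `src`, and `dst` are chosen on the assumption that the graph is built with GraphFrames, which expects those names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RenameSketch {
  // Vertices need an "id" column (hypothetical source column: "name").
  def renameStations(stations: DataFrame): DataFrame =
    stations.withColumnRenamed("name", "id")

  // Edges need "src" and "dst" columns (hypothetical source columns).
  def renameTrips(trips: DataFrame): DataFrame =
    trips
      .withColumnRenamed("start_station", "src")
      .withColumnRenamed("end_station", "dst")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RenameSketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the two CSV files.
    val stations = Seq(("Station A", 10)).toDF("name", "dockcount")
    val trips = Seq(("Station A", "Station B", 700)).toDF("start_station", "end_station", "duration")

    renameStations(stations).printSchema()
    renameTrips(trips).printSchema()

    spark.stop()
  }
}
```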
5. Output DataFrame.
The resulting stations and trips dataframes are then displayed.
6. Create vertices.
7. Show some vertices.
8. Show some edges.
We create the vertices from the stations dataset and the edges from the trips dataset, then display a sample of each.
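Assuming the graph library is GraphFrames, steps 6 to 8 might look like the sketch below. The in-memory rows are made-up stand-ins for the stations and trips dataframes, and the `name` and `duration` columns are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

object GraphSketch {
  // Build a tiny graph: stations are vertices, trips are edges.
  def build(spark: SparkSession): GraphFrame = {
    import spark.implicits._
    // GraphFrames expects "id" on vertices and "src"/"dst" on edges.
    val vertices = Seq(("A", "Station A"), ("B", "Station B"), ("C", "Station C"))
      .toDF("id", "name")
    val edges = Seq(("A", "B", 700), ("B", "A", 300), ("C", "B", 900))
      .toDF("src", "dst", "duration")
    GraphFrame(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("GraphSketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()

    val g = build(spark)
    g.vertices.show(5) // show some vertices
    g.edges.show(5)    // show some edges

    spark.stop()
  }
}
```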
9. Vertex in-Degree.
10. Vertex out-Degree.
We display the vertices with the highest in-degree and the highest out-degree, sorted in descending order and limited to the top 5.
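With GraphFrames (an assumption about the library), the two rankings could be written as in this sketch. The sample edges are made up, and the helper names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.desc
import org.graphframes.GraphFrame

object DegreeSketch {
  // Vertices with the most incoming edges (trips that end there).
  def topInDegrees(g: GraphFrame, n: Int): DataFrame =
    g.inDegrees.orderBy(desc("inDegree")).limit(n)

  // Vertices with the most outgoing edges (trips that start there).
  def topOutDegrees(g: GraphFrame, n: Int): DataFrame =
    g.outDegrees.orderBy(desc("outDegree")).limit(n)

  // Made-up sample graph standing in for the stations/trips data.
  def sampleGraph(spark: SparkSession): GraphFrame = {
    import spark.implicits._
    val vertices = Seq("A", "B", "C").toDF("id")
    val edges = Seq(("A", "B"), ("A", "B"), ("C", "B"), ("B", "A"), ("C", "A"))
      .toDF("src", "dst")
    GraphFrame(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DegreeSketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()

    val g = sampleGraph(spark)
    topInDegrees(g, 5).show()
    topOutDegrees(g, 5).show()

    spark.stop()
  }
}
```

Vertices with no incoming edges do not appear in `inDegrees` at all, and likewise for `outDegrees`.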
11. Apply motif finding.
In this code we find motifs by writing a pattern for the subgraph to match. The pattern we use has a going to another vertex and that vertex going back to a. We then display the matches.
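A sketch of such a round-trip motif, assuming GraphFrames and its `find` pattern syntax. The vertex and edge names in the pattern (`a`, `b`, `ab`, `ba`) and the sample graph are illustrative, since the original pattern is not reproduced in the text.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.graphframes.GraphFrame

object MotifSketch {
  // Pairs of vertices connected in both directions: a -> b and b -> a.
  def roundTrips(g: GraphFrame): DataFrame =
    g.find("(a)-[ab]->(b); (b)-[ba]->(a)")

  // Made-up sample graph: only A and B are linked both ways.
  def sampleGraph(spark: SparkSession): GraphFrame = {
    import spark.implicits._
    val vertices = Seq("A", "B", "C").toDF("id")
    val edges = Seq(("A", "B"), ("B", "A"), ("B", "C")).toDF("src", "dst")
    GraphFrame(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MotifSketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()

    // Each result row has one struct column per named vertex and edge.
    roundTrips(sampleGraph(spark)).show(false)

    spark.stop()
  }
}
```

Each mutual pair appears twice in the result, once with each station in the role of `a`.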
12. Apply Stateful Queries.
This code finds a motif while carrying state along the path.
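One way to carry state along a path, assuming GraphFrames, is to fold over the edge columns of a motif result and update a counter at each step. The sketch below counts how many trips on a two-hop path last longer than 600. The pattern, the `duration` column, the threshold used as state, and the sample data are all illustrative rather than the original query.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}
import org.graphframes.GraphFrame

object StatefulSketch {
  // Two-hop paths on which at least `minLong` edges have duration > 600.
  def pathsWithLongTrips(g: GraphFrame, minLong: Int): DataFrame = {
    val paths = g.find("(a)-[ab]->(b); (b)-[bc]->(c)")
    // State: running count of long trips, updated edge by edge along the path.
    val longCount: Column = Seq("ab", "bc").foldLeft(lit(0)) { (cnt, e) =>
      when(col(e)("duration") > 600, cnt + 1).otherwise(cnt)
    }
    paths.where(longCount >= minLong)
  }

  // Made-up sample graph: a simple chain A -> B -> C -> D.
  def sampleGraph(spark: SparkSession): GraphFrame = {
    import spark.implicits._
    val vertices = Seq("A", "B", "C", "D").toDF("id")
    val edges = Seq(("A", "B", 700), ("B", "C", 900), ("C", "D", 300))
      .toDF("src", "dst", "duration")
    GraphFrame(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StatefulSketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()

    pathsWithLongTrips(sampleGraph(spark), 2).show(false)

    spark.stop()
  }
}
```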
13. Subgraphs with a condition.
This code builds a subgraph that keeps only the trips whose duration is greater than 600.
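A subgraph of this kind can be sketched by filtering the edge dataframe and building a new graph from the result, assuming GraphFrames. The column name `duration` and the sample rows are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.graphframes.GraphFrame

object SubgraphSketch {
  // Keep every vertex but only the edges whose duration exceeds the threshold.
  def longTrips(g: GraphFrame, minDuration: Int): GraphFrame =
    GraphFrame(g.vertices, g.edges.filter(col("duration") > minDuration))

  // Made-up sample graph standing in for the stations/trips data.
  def sampleGraph(spark: SparkSession): GraphFrame = {
    import spark.implicits._
    val vertices = Seq("A", "B", "C").toDF("id")
    val edges = Seq(("A", "B", 700), ("B", "C", 300), ("C", "A", 900))
      .toDF("src", "dst", "duration")
    GraphFrame(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SubgraphSketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()

    longTrips(sampleGraph(spark), 600).edges.show()

    spark.stop()
  }
}
```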
BONUS QUESTION:
1. Vertex degree.
2. What are the most common destinations in the dataset, from location to location?
3. Which station has the highest ratio of in-degree to out-degree? In other words, which station acts as an almost pure trip sink, a station where trips end but rarely start?
In this code we create views of the in-degrees and out-degrees, select the id, in-degree, and out-degree, and join the two views on id.
We then create a view of the joined result and select the ids from it, ordering by in-degree ascending and out-degree descending, and display them.
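The three bonus questions can also be sketched with the DataFrame API, assuming GraphFrames. This version ranks stations by the in-degree to out-degree ratio named in question 3 instead of ordering on the two columns separately. The helper names and the sample graph are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, desc}
import org.graphframes.GraphFrame

object BonusSketch {
  // Question 2: most common (src, dst) pairs, most frequent first.
  def topRoutes(g: GraphFrame): DataFrame =
    g.edges.groupBy("src", "dst").count().orderBy(desc("count"))

  // Question 3: ratio of in-degree to out-degree, highest first.
  // The inner join drops stations that have no incoming or no outgoing trips.
  def sinkRatio(g: GraphFrame): DataFrame =
    g.inDegrees
      .join(g.outDegrees, "id")
      .withColumn("degreeRatio", col("inDegree") / col("outDegree"))
      .orderBy(desc("degreeRatio"))

  // Made-up sample graph standing in for the stations/trips data.
  def sampleGraph(spark: SparkSession): GraphFrame = {
    import spark.implicits._
    val vertices = Seq("A", "B", "C").toDF("id")
    val edges = Seq(("A", "B"), ("A", "B"), ("C", "B"), ("B", "A")).toDF("src", "dst")
    GraphFrame(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("BonusSketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()

    val g = sampleGraph(spark)
    g.degrees.show()          // question 1: total degree per vertex
    topRoutes(g).show(5)      // question 2
    sinkRatio(g).show(5)      // question 3

    spark.stop()
  }
}
```

A station with no outgoing trips at all is a perfect sink but is missing from `outDegrees`, so it would need a left join and explicit handling of the missing value to show up in this ranking.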