Public transit graph generation - smart-fm/simmobility-prod GitHub Wiki
In the real world, a traveler who wants to commute by public transit to a destination does not plan the route on the road network. They plan in terms of which bus line or train line to take, at which stop or station to board and alight, and where they may need to transfer. In other words, a public transit path is not a sequence of roads that starts at the origin and leads to the destination. It is a sequence made up of an access trip, public transit trips, transfers and an egress trip. Consequently, [pathsets](https://github.com/smart-fm/simmobility-prod/wiki/Route-choice:-public-transit-and-private#pathsets) for public transit route choice cannot be generated from the road network graph. Finding public transit paths requires a different kind of graph that captures this concept, and SimMobility models such a public transit (PT) graph.
A [graph](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) (in mathematical terms) consists of a set of vertices and a set of edges connecting pairs of those vertices.
In the SimMobility PT graph, the vertex set consists of:
- nodes from the [road network](https://github.com/smart-fm/simmobility-prod/wiki/SimMobility-road-network)
- bus stops in the network
- train stations in the network.
The edges connecting pairs of these vertices fall into four types:
- bus edge: if a bus line serves bus stop a and later bus stop b, there is an edge (a,b) representing this bus line service.
- rail edge: if a train line serves station c and later station d, there is an edge (c,d) representing this train line.
- walk edge: a walk edge connects each node vertex to each bus stop or train station vertex (and vice versa) if the distance between the node and the PT stop is "walkable". These walk edges represent the access and egress trips of PT paths. Similarly, a walk edge connects every pair of PT (bus or train) stops that are within "walkable" distance of one another. These edges represent walk transfers between those stops.
- SMS edge: an SMS edge connects each node vertex to every train station within 5 km. The default SMS speed is 30 km/h.
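The edge rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the R script's implementation. The 400 m walk threshold is a hypothetical value, since the page only says "walkable". Distances are great-circle (haversine) distances. SMS edges are added only in the node-to-station direction. The `Edge` record and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import NamedTuple, Optional

WALK_RADIUS_M = 400.0   # hypothetical "walkable" threshold, not taken from the R script
SMS_RADIUS_M = 5000.0   # SMS edges reach train stations within 5 km
SMS_SPEED_KMH = 30.0    # default SMS speed


class Edge(NamedTuple):
    src: str
    dst: str
    kind: str                         # "bus", "rail", "walk" or "sms"
    line: Optional[str] = None        # bus/train line id for bus and rail edges
    minutes: Optional[float] = None   # only filled in for SMS edges in this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def line_edges(line_id, stop_sequence, kind):
    """One edge (a, b) for every stop a that the line serves before stop b."""
    return [Edge(a, b, kind, line_id) for a, b in combinations(stop_sequence, 2)]


def walk_and_sms_edges(nodes, stops, stations):
    """nodes, stops, stations: dicts mapping vertex id -> (lat, lon)."""
    edges = []
    pt_stops = {**stops, **stations}
    for n, n_pos in nodes.items():
        for s, s_pos in pt_stops.items():
            if haversine_m(*n_pos, *s_pos) <= WALK_RADIUS_M:
                edges += [Edge(n, s, "walk"), Edge(s, n, "walk")]  # access and egress
        for s, s_pos in stations.items():
            d = haversine_m(*n_pos, *s_pos)
            if d <= SMS_RADIUS_M:
                # travel time at the default SMS speed, in minutes
                edges.append(Edge(n, s, "sms", minutes=round(d / 1000 / SMS_SPEED_KMH * 60, 1)))
    for a, b in combinations(pt_stops, 2):  # walk transfers between PT stops
        if haversine_m(*pt_stops[a], *pt_stops[b]) <= WALK_RADIUS_M:
            edges += [Edge(a, b, "walk"), Edge(b, a, "walk")]
    return edges
```

Consider a line that serves S1, S2 and S3 in that order. For it, `line_edges` returns the three edges (S1,S2), (S1,S3) and (S2,S3), which matches the "later" rule. A station about 3.3 km from a node (0.03° of latitude) gets an SMS time of about 6.7 minutes at 30 km/h.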
Full details of how the graph is constructed are explained [here](https://github.com/smart-fm/simmobility-prod/wiki/Public-Route-Choice-modeling). The PT graph is built by a script written in [R](https://www.r-project.org/). This page explains the inputs and outputs of the program and the steps to run it.
The R program takes 5 CSV files as inputs.
File name: P_nodes_.csv
This file gives the node ids and GPS coordinates of all nodes in the road network. Node ids can coincide with PT stop codes. To avoid confusion, node ids are prefixed with the string "N_". A vertex type (stopType) of 0 is assigned to all nodes in this file.
SQL Query:
select concat('N_', id) as stop_id, concat('N_', id) as stop_code, concat('N_', id) as stop_name, x as stop_lat, y as stop_lon, concat('N_', id) as "EZLink_Name", 0 as "stopType" from supply.node_wgs84;
File name: SimM_bus_stops_.csv
This file lists the stop codes, stop names and GPS coordinates of all bus stops in the simulated network. A vertex type of 1 is assigned to all stops in this file.
SQL Query:
select code as stop_id, code as stop_code, name as stop_name, x as stop_lat, y as stop_lon, code as "EZLink_Name", 1 as "Type", 1 as "stopType" from supply.bus_stop_wgs84 where status = 'OP';
File name: bus_journeytime_.csv
For every simulated bus line, this file specifies the bus line id and the trip id(s) of that line. At least one trip must be specified for each bus line. For each stop on each trip, the file gives the stop sequence number, the stop code and the arrival time. The arrival times at the stops on a given trip are used to estimate the initial default travel time between stops.
File name: SimM_RTS_stops_.csv
This file lists the station codes, station names and GPS coordinates of all train stations in the simulated rail network. For an interchange, the platform code of each train line that meets there is specified, separated by "/". For example, the stop code for Dhoby Ghaut station in Singapore, where the CC, NE and NS lines meet, is "CC1/NE6/NS24". A vertex type of 2 is assigned to all stations in this file.
SQL Query:
select platform_name as stop_id, platform_name as stop_code, station_name as stop_name, x as stop_lat, y as stop_lon, concat('STN ', station_name) as "EZLink_Name", 2 as "Type", 2 as "stopType" from supply.mrt_stop_wgs84;
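An interchange carries several platform codes in one field, so a consumer of the stations file may need to split them. The minimal sketch below assumes that the alphabetic prefix of each platform code (for example CC in CC1) identifies the train line. This page does not state that mapping; it is an illustrative assumption.

```python
import re


def platform_codes(stop_code):
    """Split an interchange code such as "CC1/NE6/NS24" into individual platform codes."""
    return [code.strip() for code in stop_code.split("/") if code.strip()]


def line_prefix(platform_code):
    """Assumed convention: the leading letters of a platform code name its train line."""
    match = re.match(r"[A-Za-z]+", platform_code)
    return match.group(0) if match else None


# Dhoby Ghaut, where the CC, NE and NS lines meet
codes = platform_codes("CC1/NE6/NS24")
print(codes, [line_prefix(c) for c in codes])  # ['CC1', 'NE6', 'NS24'] ['CC', 'NE', 'NS']
```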
File name: weekday_train_seq_.csv
Like the stop sequence file for bus lines, this file lists the train line id and trip id. For each station served by a trip, it gives the station sequence number, the station code and the arrival time.
Among several other output files that are not needed for the PT graph, the R script produces two files: the vertices and the edges of the public transit graph.
File name: All_PT_stops_.csv
This file is essentially a union of the input files for nodes, bus stops and train stations.
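The union of the three vertex files can be sketched as below. The node query does not produce the "Type" column that the bus stop and station queries do. This sketch therefore takes the union of all column names and leaves missing values empty. It also rejects duplicate stop_id values, which the "N_" prefix on node ids is meant to prevent. The function name and the empty-string fill are assumptions, not the R script's behaviour.

```python
import csv
import io


def union_vertex_files(*csv_texts):
    """Stack vertex CSVs with differing columns into one table; missing cells become ''."""
    columns, rows = [], []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)  # keep first-seen column order
        rows.extend(reader)
    seen = set()
    for row in rows:
        if row["stop_id"] in seen:
            raise ValueError(f"duplicate vertex id {row['stop_id']!r}")
        seen.add(row["stop_id"])
    return columns, [{c: row.get(c, "") for c in columns} for row in rows]
```

Suppose node 1012 and bus stop 1012 both exist. Without the prefix they would collide here. With it, they appear as the separate vertices "N_1012" and "1012".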
File name: All_PT_links_.csv
This file lists every edge of the PT graph. Each record of this output contains the stop pair for the edge, a unique edge id, edge type (walk, bus or rail) and the public transit (bus or train) line which this edge represents. Each record also contains default travel time information for this edge extracted from the stop arrival times in the input files. These times are used as initial default values for route choice when no historical information is available.
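The sketch below illustrates how the edge list can be traversed by running a plain Dijkstra search over default travel times. It is only an example of using the graph. SimMobility's route choice works with pathsets, as described on the route-choice page. The record layout (from, to, edge type, line, minutes) is a hypothetical simplification of All_PT_links_.csv.

```python
import heapq
from collections import defaultdict
from itertools import count


def shortest_pt_path(links, origin, destination):
    """Dijkstra over (from, to, edge_type, line, minutes) records.

    Returns (total_minutes, [links on the path]) or None if unreachable.
    """
    adjacency = defaultdict(list)
    for link in links:
        adjacency[link[0]].append(link)
    best = {origin: 0.0}
    tie = count()  # tie-breaker so the heap never compares paths
    heap = [(0.0, next(tie), origin, [])]
    while heap:
        cost, _, vertex, path = heapq.heappop(heap)
        if vertex == destination:
            return cost, path
        if cost > best.get(vertex, float("inf")):
            continue  # stale heap entry
        for link in adjacency[vertex]:
            new_cost = cost + link[4]
            if new_cost < best.get(link[1], float("inf")):
                best[link[1]] = new_cost
                heapq.heappush(heap, (new_cost, next(tie), link[1], path + [link]))
    return None
```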
This section gives the steps to execute the R program.
- Install R (if it is not already installed).
- Install the following library dependencies for the program:
  - stringr: `sudo su - -c "R -e \"install.packages('stringr', repos='http://cran.rstudio.com/')\""`
  - oce: `sudo su - -c "R -e \"install.packages('oce', repos='http://cran.rstudio.com/')\""`
  - data.table: `sudo su - -c "R -e \"install.packages('data.table', repos='http://cran.rstudio.com/')\""`
- Download the R program and the sample inputs from the link below. A few sample input directories are provided for reference.
R script - generate Public Transit Graph -
In the downloaded folder, the `R files` directory contains the source code for the program. To generate a particular variant of the graph, copy all the files from `R with RailSMS`, `R with RTS graph` or `R without RTS graph` into the base `R files` folder, replacing the existing files. These three folders produce, respectively, the rail SMS PT graph, the PT graph with bus and rail, and the PT graph with bus only (no rail).
- Create an R project and a new input directory named with today's date in ddmmmyy format (for example, 04May17). The input directory must be at the same level as the `R files` directory. The directory name can actually be anything, but we recommend the date format to keep track of when the graph was generated.
- Prepare the input files in the correct format and put them in this input directory. All input files must be suffixed with the same date in ddmmmyy format, for example "P_nodes_04May17.csv", "SimM_bus_stops_04May17.csv", "weekday_train_seq_04May17.csv" and so on.
- Open `R files/MainFile_SimM_Network.R` and, in the first few lines of code, set `vs` to the input directory name (which should also be the suffix of the input files):
#==================INPUT FILES===========================
# 1) #----------update version number here for different back of data
#create a folder named after version
vs<-"ddmmmyy"- After the program finishes execution, you can find the vertices and edges of the public transit graph produced in the input directory among other outputs.