3_input_data - TUM-VT/FleetPy GitHub Wiki
Data files provide the structured inputs needed to describe, e.g., networks, zones, customer demand, vehicles, and infrastructure. Data files are loaded along with the modules in the initialization of the simulation (e.g., FleetSimulationBase, the parent class for the core simulation).
The files to be loaded for a simulation can be selected in the scenario config files (see here).
Input data files are stored in the data folder. Example files can be found there. Generally, data files are covered by git-ignore, i.e. they are not tracked in the repository.
In the following, the path specifications for the input files are given, while the data format of the files is described under 'Data Specification'.
- {network_name} denotes the title of a network
- there are various routing-modules which are based on different preprocessing scripts; the preprocessed data are also saved in the respective network directory
- the specification of all network csv and geojson files is given in this wiki under
3_input_data/Data Specification/
- each network has to have the following mandatory directory and file structure:
data/networks/
data/networks/{network_name}/
data/networks/{network_name}/base/
data/networks/{network_name}/base/nodes.csv
data/networks/{network_name}/base/edges.csv
data/networks/{network_name}/base/nodes_all_infos.geojson
data/networks/{network_name}/base/edges_all_infos.geojson
- if the coordinate frame is not WGS84, an additional file states the used reference system
data/networks/{network_name}/base/crs.info
- in case network travel times are deterministic but vary over time, the edge travel times are saved in the following structure:
data/networks/{network_name}/{scenario_time}/
data/networks/{network_name}/{scenario_time}/edges_td_att.csv
- additionally, the NetworkTable routing module requires fastest node-to-node travel time and distance tables for each travel time directory
data/networks/{network_name}/ff/
data/networks/{network_name}/ff/tables/
data/networks/{network_name}/ff/tables/nn_fastest_distance.npy
data/networks/{network_name}/ff/tables/nn_fastest_travel_time.npy
data/networks/{network_name}/{scenario_time}/tables/
data/networks/{network_name}/{scenario_time}/tables/nn_fastest_distance.npy
data/networks/{network_name}/{scenario_time}/tables/nn_fastest_travel_time.npy
- network dynamics: this file defines loading of time dependent travel time files or(!) travel time factors (optional)
[optional input data]
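The mandatory base structure listed above can be checked before starting a simulation. The following is a minimal sketch, not part of FleetPy; the helper name `missing_network_files` and the constant `MANDATORY_NETWORK_FILES` are hypothetical, and only the four mandatory files in `base/` are checked.

```python
from pathlib import Path

# Mandatory files relative to data/networks/{network_name}/, as listed above.
MANDATORY_NETWORK_FILES = (
    "base/nodes.csv",
    "base/edges.csv",
    "base/nodes_all_infos.geojson",
    "base/edges_all_infos.geojson",
)


def missing_network_files(data_dir, network_name):
    """Return the mandatory network files that do not exist (hypothetical helper)."""
    network_dir = Path(data_dir) / "networks" / network_name
    return [rel for rel in MANDATORY_NETWORK_FILES
            if not (network_dir / rel).is_file()]
```

An empty return value means that the mandatory files are present; optional files such as crs.info or the travel time tables are not covered by this check.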
- spatial aggregation into zones is necessary for several use cases, e.g. vehicle repositioning, pricing, tolling, NFD clustering
- {zone_system_name} denotes the name of a GIS zone division
- in the {network_name} subdirectory, the GIS data are matched to an existing network
- definition of respective file formats in this wiki under
3_input_data/Data Specification/
- data structure:
data/zones/
data/zones/{zone_system_name}/
data/zones/{zone_system_name}/general_information.csv
data/zones/{zone_system_name}/polygon_definition.geojson
data/zones/{zone_system_name}/crs.info
data/zones/{zone_system_name}/{network_name}/
data/zones/{zone_system_name}/{network_name}/node_zone_info.csv
data/zones/{zone_system_name}/{network_name}/edge_zone_info.csv
- {data_title} should be a name reflecting the data source
- raw data and scripts that reduce them to an unmatched trip format (see specification for unmatched trip data) should also remain on the server for clarity
data/
data/demand/
data/demand/{data_title}/
data/demand/{data_title}/raw
- the script matching trip data to a given network {network_name} can be found in src/demand/pp/
- see in this wiki under
3_input_data/Data Specification/trips
for a format specification of trips_X.csv (where "X" can be replaced with any title given to the trip file)
- data structure:
data/demand/{data_title}/matched/
data/demand/{data_title}/matched/{network_name}/
data/demand/{data_title}/matched/{network_name}/trips_X.csv
- {zone_system_name} refers to the name of a zone-system definition
- {temporal_resolution} refers to the time aggregation given in "hh_mm"
- different forecast methods are saved as different columns; "trips" refers to a perfect forecast for the given spatio-temporal resolution
- see in this wiki under
3_input_data/Data Specification/agg_X.csv
for a format specification of agg_X.csv (where "X" can be replaced with any title given to the forecast file)
- data structure:
data/demand/{data_title}/aggregated/
data/demand/{data_title}/aggregated/{zone_system_name}/
data/demand/{data_title}/aggregated/{zone_system_name}/{temporal_resolution}
data/demand/{data_title}/aggregated/{zone_system_name}/{temporal_resolution}/agg_{X}.csv
data/demand/{data_title}/aggregated/{zone_system_name}/{temporal_resolution}/agg_od_{X}.csv
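The path pattern for aggregated forecast files can be assembled from its placeholders. This is only an illustrative sketch; the function name `aggregated_forecast_path` and its parameters are hypothetical and not taken from FleetPy.

```python
from pathlib import Path


def aggregated_forecast_path(data_dir, data_title, zone_system_name,
                             temporal_resolution, title, od=False):
    """Build the path of an agg_{X}.csv or agg_od_{X}.csv file (hypothetical helper)."""
    prefix = "agg_od" if od else "agg"
    return (Path(data_dir) / "demand" / data_title / "aggregated"
            / zone_system_name / temporal_resolution / f"{prefix}_{title}.csv")
```

Here `temporal_resolution` is expected as a string in the "hh_mm" format described above.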
- saving vehicle data on the server reduces the research time needed for new studies
- specification in this wiki under
3_input_data/Data Specification/vehicle_type.csv
data/vehicles/
data/vehicles/EV_type1_20200411.csv
data/vehicles/EV_type2_20200411.csv
- infrastructure data can be used to add additional information to certain network nodes, e.g. access points for customers, boarding points, charging stations, depots, parking spaces
- the format of these data files is specified in the next section, Data Specification
- {gis_name} reflects the spatial area of the data and in this directory, all data are referenced by coordinates
- if the data are stored in a reference system other than WGS84, a crs.info is saved as well
data/infra/
data/infra/{gis_name}
data/infra/{gis_name}/crs.info
data/infra/{gis_name}/access_points.geojson
data/infra/{gis_name}/boarding_points.geojson
data/infra/{gis_name}/public_charging_stations.geojson
data/infra/{gis_name}/depots.geojson
- after matching to a network, the respective files are saved as csv files in the {network_name} subdirectory
data/infra/{gis_name}/{network_name}
data/infra/{gis_name}/{network_name}/access_points.csv
data/infra/{gis_name}/{network_name}/boarding_points.csv
data/infra/{gis_name}/{network_name}/public_charging_stations.csv
data/infra/{gis_name}/{network_name}/depots.csv
tbc
- can be used for simulations with flexible fleet size, where fleet size is time controlled
- specification in this wiki under
3_input_data/Data Specification/active_vehicles.csv
data/fleetctrl/elastic_fleet_size/
data/fleetctrl/elastic_fleet_size/active_vehicle_sample.csv
- can be used to specify the vehicle distribution of the initially created vehicles
- specification in this wiki under
3_input_data/Data Specification/init_veh_dist.csv
- {network_name} corresponds to the network the nodes of the init distribution are matched onto
data/fleetctrl/initial_vehicle_distribution/
data/fleetctrl/initial_vehicle_distribution/{network_name}/init_veh_dist.csv
- can be used to define time dependent elastic pricing or utilization dependent pricing
- {pricing_file} corresponds to the name of the applied pricing_file [possible scenario input]
data/fleetctrl/elastic_pricing/{pricing_file}.csv
- TO-DO
tbc
Mandatory attributes with explanation of data types.
Column Name | Data Type | Description |
---|---|---|
node_index | int | index of node in network which is an access point (BEWARE: should be unique!) |
Definition: access points are all network nodes where customers are allowed to enter/leave the simulation. In most cases, these locations are not necessary for the simulation itself, but might be helpful to create demand files.
Mandatory attributes with explanation of data types.
Column Name | Data Type | Description |
---|---|---|
time | int | simulation time in seconds |
share_active_fleet_size | float | share of fleet that should be active at this time. |
Mandatory attributes with the explanation of data types. {fc method} has to be specified in the scenario inputs with the variable G_FC_TYPE
Column Name | Data Type | Description |
---|---|---|
time | int | simulation time in seconds |
zone_id | int | index of zone in zone system (BEWARE: should be unique!) |
out {fc method} | float | number of outgoing trips for forecast method {fc method} |
in {fc method} | float | number of incoming trips for forecast method {fc method} |
The following forecast methods are defined so far:
fc method | Description |
---|---|
perfect_trips | these "forecasts" are generated from the actual aggregation of trip files, thereby making them a perfect forecast for the number of trips with respect to the chosen resolution |
perfect_pax | these "forecasts" are also generated from the actual aggregation of trip files, but the number of passengers is aggregated instead of the number of trips |
Column Name | Data Type | Description |
---|---|---|
time | int | simulation time in seconds |
out_zone_id | int | index of zone in zone system (BEWARE: should be unique!) |
in_zone_id | int | index of zone in zone system (BEWARE: should be unique!) |
{fc_method} | float | number of trips from out_zone_id to in_zone_id for forecast method {fc method} |
The following forecast methods are defined so far:
fc method | Description |
---|---|
perfect_trips | these "forecasts" are generated from the actual aggregation of trip files, thereby making them a perfect forecast for the number of trips with respect to the chosen resolution |
perfect_pax | these "forecasts" are also generated from the actual aggregation of trip files, but the number of passengers is aggregated instead of the number of trips |
Mandatory attributes with the explanation of data types.
Column Name | Data Type | Description |
---|---|---|
node_index | int | index of node in network which is a boarding point (BEWARE: should be unique!) |
Definition: boarding points are all network nodes where operators can perform boarding processes. If no boarding points are given/treated directly, mostly all network nodes are considered boarding points.
tbc
Mandatory attributes with explanation of data types.
Column Name | Data Type | Description |
---|---|---|
charging_station_id | int | unique identifier for each depot |
node_index | int | index of node in network (BEWARE: should be unique!) |
charging_units | dict_str | power1:number1;power2:number2 |
max_nr_parking | int | maximum number of vehicles that can park |
- vehicle locations and utilization at the end of a simulation period (and possibly the start of the next) are recorded
- the vehicles are already positioned at the final vehicle location; if the vehicle remains in the middle of a link, it is positioned at the start node of this link
- the vehicles are simply counted as blocked
- attribute fields:
column_name | data_type | comment |
---|---|---|
operator_id | int | |
vehicle_id | int | |
final_node_index | int | |
final_time | int | in seconds; remember to calculate modulo 24*3600 to not block the vehicle for the full next day |
final_soc | float | |
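The modulo hint for final_time can be illustrated with a short sketch. The function name `wrap_final_time` is hypothetical; it only shows the arithmetic mentioned in the comment column and is not FleetPy code.

```python
SECONDS_PER_DAY = 24 * 3600


def wrap_final_time(final_time):
    """Map a final_time in seconds to the time of day within the next simulated day."""
    return final_time % SECONDS_PER_DAY
```

For example, a vehicle whose previous simulation ended at 90000 s (one hour past midnight of the following day) would be treated as blocked until 3600 s instead of for the whole next day.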
- initial random distribution of vehicle locations after simulation init
- specifies node indices and their corresponding random probability when initializing mod fleet locations
- attribute fields:
column_name | data_type | comment |
---|---|---|
node_index | int | |
probability | float | probability of choosing this node for a vehicle's initial location |
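Drawing initial vehicle locations from such a node/probability table can be sketched as follows. This is an assumption about how the file might be used, not the FleetPy implementation; the function name `sample_initial_nodes` and the `seed` parameter are hypothetical.

```python
import random


def sample_initial_nodes(node_probabilities, n_vehicles, seed=None):
    """Draw one start node per vehicle.

    node_probabilities maps node_index -> probability (as in init_veh_dist.csv).
    random.choices normalizes the weights, so they need not sum to exactly 1.
    """
    rng = random.Random(seed)
    nodes = list(node_probabilities)
    weights = [node_probabilities[node] for node in nodes]
    return rng.choices(nodes, weights=weights, k=n_vehicles)
```

Passing a fixed `seed` makes the drawn distribution reproducible between runs.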
A routable network consists of nodes and edges. Vehicles travel along edges, which contain the travel information. Nodes are the connections between these edges and represent the positions in the network where different edges can be chosen as the next part of the route. Hence, they usually represent junctions of a street network.
- IMPORTANT: the network definition assumes that node indices are numbered from 0..|N|-1!
In the following, the columns of the network data files are described. Please refer to documentation/Data_Directory_Structure.md for the correct placement of the respective files.
Column Name | Data Type | Description |
---|---|---|
node_index | int | ID of node |
is_stop_only | bool | False: normal node; True: node can only be used as first or last part of a route leg |
pos_x | float | x-position in projected coordinate system > unit: meters |
pos_y | float | y-position in projected coordinate system > unit: meters |
Column Name | Data Type | Description |
---|---|---|
node_order | int | only required for contraction hierarchy |
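The numbering requirement for node indices (consecutive integers starting at 0) can be verified on the node_index column of nodes.csv. The sketch below uses only the standard library; the function name `node_indices_are_contiguous` is hypothetical.

```python
import csv
import io


def node_indices_are_contiguous(nodes_csv_text):
    """Check that the node_index column contains exactly 0, 1, ..., N-1."""
    reader = csv.DictReader(io.StringIO(nodes_csv_text))
    indices = sorted(int(row["node_index"]) for row in reader)
    return indices == list(range(len(indices)))
```

The check also fails on duplicate indices, since a duplicate leaves a gap elsewhere in the sorted sequence.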
Column Name | Data Type | Description |
---|---|---|
from_node | int | ID of origin node of a street edge |
to_node | int | ID of destination node of a street edge |
distance | float | length of street edge in meters |
travel_time | float | travel duration on street edge in seconds |
Column Name | Data Type | Description |
---|---|---|
shortcut_def | str | only for contraction hierarchy; use “;” as separator between IDs |
source_edge_id | str | OSM-Edge ID, Aimsun Section ID; can be "-" separated elements as well |
epsg:code
- This file contains a single line with the epsg-code 'code', which is valid for pos_x and pos_y in nodes.csv.
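Reading the single `epsg:code` line can be sketched as below. The function name `parse_crs_info` is hypothetical, and the sketch assumes that the code is a plain integer.

```python
def parse_crs_info(text):
    """Return the EPSG code from the content of a crs.info file as an int."""
    lines = text.strip().splitlines()
    if not lines:
        raise ValueError("crs.info is empty")
    prefix, _, code = lines[0].strip().partition(":")
    if prefix.lower() != "epsg" or not code.isdigit():
        raise ValueError(f"unexpected crs.info content: {lines[0]!r}")
    return int(code)
```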
This file specifies edge travel times at specific simulation times.
Column Name | Data Type | Description |
---|---|---|
from_node | int | ID of origin node of a street edge |
to_node | int | ID of destination node of a street edge |
edge_tt | float | travel duration on street edge in seconds |
Fully preprocessed (according to the fastest route) node-to-node travel time or distance tables are saved as 2D-Numpy arrays. The first index (row index) represents the origin node, the second index represents the destination node. The data entries (travel time/distance) are of type float. These files are saved under scenario_dir/tables/x.npy, where scenario_dir=ff for free-flow conditions.
These files can be used
- to define loading of corresponding travel time files at given simulation time (column "travel_time_folder")
or(!)
- to scale all network travel times with certain factors according to the simulation time. This input is used by the 'NetworkTTMatrix' module. (column "travel_time_factor")
Column Name | Data Type | Description |
---|---|---|
simulation_time | int | simulation time in seconds |
travel_time_folder | str | corresponding folder name of travel time directory to be used from this simulation time on |
travel_time_factor | float | general travel time factor that is used for complete network |
- IMPORTANT: only one of the columns travel_time_folder/travel_time_factor is allowed to be given!
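The either/or rule for the two columns can be enforced when reading the file. The following is a sketch under assumptions: the function name `load_network_dynamics` is hypothetical, and treating a file with neither column as an error is an interpretation, not something stated above.

```python
import csv
import io


def load_network_dynamics(csv_text):
    """Return (column_name, [(simulation_time, value), ...]) sorted by time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = set(reader.fieldnames or [])
    has_folder = "travel_time_folder" in columns
    has_factor = "travel_time_factor" in columns
    if has_folder == has_factor:
        # both columns given, or (by assumption) neither of them
        raise ValueError("exactly one of travel_time_folder / "
                         "travel_time_factor has to be given")
    key = "travel_time_folder" if has_folder else "travel_time_factor"
    convert = str if has_folder else float
    entries = sorted((int(row["simulation_time"]), convert(row[key]))
                     for row in reader)
    return key, entries
```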
Partially preprocessed (according to the fastest route) node-to-node travel time or distance tables are saved as 2D-Numpy arrays.
The first index (row index) represents the origin node, the second index represents the destination node. The data entries (travel time/distance) are of type float.
The travel time matrix is called tt_matrix.npy, the distance matrix is called dis_matrix.npy
These files are saved in the corresponding travel time folders; free-flow condition is stored in the base-folder. Note that only the travel times/distances between the first x nodes are stored. x is defined by the shape of the matrix.
These matrices are used by NetworkPartiallyPreprocessed.py and NetworkPartiallyPreprocessedCpp.py.
- this file specifies the functionality used for dynamic pricing
- the name of the file can be adapted
- depending on the application there can be two versions of the pricing file
- either a time dependent pricing_file (given with the global "op_elastic_price_file")
- or a utilization dependent pricing_file (given with the global "op_util_surge_price_file")
- attribute fields of time dependent pricing file:
column_name | data_type | comment |
---|---|---|
time | int | start time of this pricing regime |
base_fare_factor | float | factor of the base_fare in this pricing regime |
distance_fare_factor | float | factor of the distance_fare in this pricing regime |
general_factor | float | global price factor in this pricing regime |
- attribute fields of utilization dependent pricing file:
column_name | data_type | comment |
---|---|---|
utilization | float | start utilization [0, 1] for this pricing regime |
base_fare_factor | float | factor of the base_fare in this pricing regime |
distance_fare_factor | float | factor of the distance_fare in this pricing regime |
general_factor | float | global price factor in this pricing regime |
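Both pricing files describe regimes by their start value (time or utilization). Selecting the regime that applies to a given value can be sketched as below; this assumes that a regime stays valid until the next start value is reached, which is an interpretation of "start time" and "start utilization". The function name `active_regime` is hypothetical.

```python
import bisect


def active_regime(regimes, key, value):
    """Return the last regime whose start (column `key`) is <= value.

    regimes: list of dict rows, e.g. parsed from the pricing csv.
    key: "time" or "utilization".
    Returns None if value lies before the first regime start.
    """
    ordered = sorted(regimes, key=lambda row: row[key])
    starts = [row[key] for row in ordered]
    idx = bisect.bisect_right(starts, value) - 1
    return ordered[idx] if idx >= 0 else None
```

How the three factors of the selected regime are combined into a fare is not specified here and is therefore left out of the sketch.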
Mandatory attributes with explanation of data types. Additional attributes can be added as long as the column names of the mandatory attributes are maintained.
Column Name | Data Type | Description |
---|---|---|
charging_station_id | int | unique identifier for each depot |
node_index | int | index of node in network (BEWARE: should be unique!) |
charging_units | dict_str | power1:number1;power2:number2 |
public_util | dict_str | hour1:util1;hour2:util2 |
unit of power : kW
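The dict_str format (`key1:value1;key2:value2`) used for charging_units and public_util can be parsed with a few lines. The function name `parse_dict_str` is hypothetical, and converting both keys and values to float is an assumption made for illustration.

```python
def parse_dict_str(value):
    """Parse 'k1:v1;k2:v2' into {k1: v1, k2: v2} with float keys and values."""
    result = {}
    for item in value.split(";"):
        if not item.strip():
            continue  # tolerate empty segments, e.g. a trailing ';'
        key, _, val = item.partition(":")
        result[float(key)] = float(val)
    return result
```

For charging_units, the keys would then be charging powers in kW and the values the number of units with that power.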
Mandatory attributes with explanation of data types. Additional attributes can be added as long as the column names of the mandatory attributes are maintained.
Column Name | Data Type | Description |
---|---|---|
request_id | int | unique identifier for each request |
rq_time | int | time (s) at which a request becomes active/visible for the system |
start | int | index of origin node in network |
end | int | index of destination node in network |
Additional optional attributes, which can be used to model heterogeneous demand.
Column Name | Data Type | Description |
---|---|---|
latest_arrival_time | int | time a rq must/wants to reach destination; not yet implemented! |
rq_type | str/int (TODO) | specifies e.g. the type of service a request wants to use |
earliest_pickup_time | int | for reservation |
latest_pickup_time | int | latest time for pick-up |
latest_decision_time | int | latest time for request to decide for an operator before leaving system |
max_rel_detour | int | maximum relative detour in percent |
max_fare | int | maximum fare a request is willing to pay (in cent) |
global trip destination | int (TODO) | for intermodal trips (mod-destination vs trip-destination) |
rq_preferences | (TODO) | other parameters for mode choice |
number_passenger | int | passengers within one request |
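A trip file can be checked against the mandatory columns before it is used. This is a minimal sketch using the standard library; the function name `read_trips` and the constant `MANDATORY_TRIP_COLUMNS` are hypothetical, and optional columns are simply ignored here.

```python
import csv
import io

MANDATORY_TRIP_COLUMNS = ("request_id", "rq_time", "start", "end")


def read_trips(csv_text):
    """Parse the mandatory int columns and reject duplicate request ids."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in MANDATORY_TRIP_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing mandatory columns: {missing}")
    trips, seen_ids = [], set()
    for row in reader:
        trip = {col: int(row[col]) for col in MANDATORY_TRIP_COLUMNS}
        if trip["request_id"] in seen_ids:
            raise ValueError(f"duplicate request_id: {trip['request_id']}")
        seen_ids.add(trip["request_id"])
        trips.append(trip)
    return trips
```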
- fleet costs depend on daily and per km costs of vehicles
- range of vehicles depends on battery size and consumption
parameter_name | data_type | comment |
---|---|---|
vtype_name_full | str | |
daily_fix_cost [cent] | int | Value in Cent to work with integers |
per_km_cost [cent] | int | Value in Cent to work with integers |
battery_size [kWh] | float | |
range [km] | float | |
source | str | Url and Url-Date for documentation in paper |
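How the cost and range parameters relate can be sketched as follows. Both functions are hypothetical illustrations: the linear combination of daily and per-km costs and the derivation of consumption from battery size and range are assumptions, not formulas taken from FleetPy.

```python
def vehicle_day_cost_cent(daily_fix_cost_cent, per_km_cost_cent, driven_km):
    """Assumed cost of one vehicle for one day, in cent (rounded to an integer)."""
    return daily_fix_cost_cent + round(per_km_cost_cent * driven_km)


def consumption_kwh_per_km(battery_size_kwh, range_km):
    """Assumed average consumption implied by battery size and range."""
    return battery_size_kwh / range_km
```

For example, with a daily fix cost of 2500 cent, 25 cent per km and 120 km driven, the sketch yields 5500 cent.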