3_input_data - TUM-VT/FleetPy GitHub Wiki

Input Data

Data files provide the structured inputs needed to describe, e.g., networks, zones, customer demand, vehicles and infrastructure. They are loaded together with the modules during initialization of the simulation (e.g., in FleetSimulationBase, the parent class of the core simulation).

The files to be loaded for a simulation can be selected in the scenario config files (see here).

Input data files are stored in the data folder, where example files can also be found. Generally, data files are excluded from git via .gitignore. The following sections give the path specifications for the input files, while the section 'Data Specification' describes the data format of each file.

Network Data

  • {network_name} denotes the title of a network
  • there are various routing-modules which are based on different preprocessing scripts; the preprocessed data are also saved in the respective network directory
  • the specification of all network csv and geojson files is given in this wiki under 3_input_data/Data Specification/
  • each network must have the following mandatory directory and file structure:
    data/networks/
    data/networks/{network_name}/
    data/networks/{network_name}/base/
    data/networks/{network_name}/base/nodes.csv
    data/networks/{network_name}/base/edges.csv
    data/networks/{network_name}/base/nodes_all_infos.geojson
    data/networks/{network_name}/base/edges_all_infos.geojson
  • if the coordinate frame is not WGS84, an additional file states the reference system used
    data/networks/{network_name}/base/crs.info
  • if network travel times are deterministic but vary over time, the edge travel times are saved in the following structure:
    data/networks/{network_name}/{scenario_time}/
    data/networks/{network_name}/{scenario_time}/edges_td_att.csv
  • additionally, the NetworkTable routing module requires fastest node-to-node travel time and distance tables for each travel time directory
    data/networks/{network_name}/ff/
    data/networks/{network_name}/ff/tables/
    data/networks/{network_name}/ff/tables/nn_fastest_distance.npy
    data/networks/{network_name}/ff/tables/nn_fastest_travel_time.npy
    data/networks/{network_name}/{scenario_time}/tables/
    data/networks/{network_name}/{scenario_time}/tables/nn_fastest_distance.npy
    data/networks/{network_name}/{scenario_time}/tables/nn_fastest_travel_time.npy
  • network dynamics (optional): this file defines the loading of either time-dependent travel time files or(!) travel time factors
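The mandatory layout above can be checked before a simulation starts. The following is a minimal sketch, not part of FleetPy; the helper names `missing_network_files` and `table_paths` are hypothetical, and the data directory is passed in explicitly.

```python
from pathlib import Path

# Mandatory files of every network, taken from the directory structure listed above.
MANDATORY_BASE_FILES = (
    "nodes.csv",
    "edges.csv",
    "nodes_all_infos.geojson",
    "edges_all_infos.geojson",
)


def missing_network_files(data_dir, network_name):
    """Return the mandatory base files that are missing for a network (hypothetical helper)."""
    base_dir = Path(data_dir) / "networks" / network_name / "base"
    return [name for name in MANDATORY_BASE_FILES if not (base_dir / name).is_file()]


def table_paths(data_dir, network_name, scenario_time="ff"):
    """Return the NetworkTable distance and travel time table paths for one travel time directory."""
    tables_dir = Path(data_dir) / "networks" / network_name / scenario_time / "tables"
    return (
        tables_dir / "nn_fastest_distance.npy",
        tables_dir / "nn_fastest_travel_time.npy",
    )
```

An empty list from `missing_network_files` means all four base files are present; `scenario_time="ff"` selects the free-flow tables.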

Zone Systems

[optional input data]

  • spatial aggregation into zones is necessary for several use cases, e.g. vehicle repositioning, pricing, tolling, NFD clustering
  • {zone_system_name} denotes the name of a GIS zone division
  • in the {network_name} subdirectory, the GIS data are matched to an existing network
  • definition of respective file formats in this wiki under 3_input_data/Data Specification/
  • data structure:
    data/zones/
    data/zones/{zone_system_name}/
    data/zones/{zone_system_name}/general_information.csv
    data/zones/{zone_system_name}/polygon_definition.geojson
    data/zones/{zone_system_name}/crs.info
    data/zones/{zone_system_name}/{network_name}/
    data/zones/{zone_system_name}/{network_name}/node_zone_info.csv
    data/zones/{zone_system_name}/{network_name}/edge_zone_info.csv

Demand Data

  • {data_title} should be a name reflecting the data source
  • raw data and scripts that reduce them to an unmatched trip format (see specification for unmatched trip data) should also remain on the server for clarity
    data/
    data/demand/
    data/demand/{data_title}/
    data/demand/{data_title}/raw

Trip/Request/User Data

  • the script matching trip data to a given network {network_name} can be found in src/demand/pp/
  • see in this wiki under 3_input_data/Data Specification/trips for a format specification of trips_X.csv (where "X" can be replaced with any title given to the trip file)
  • data structure:
    data/demand/{data_title}/matched/
    data/demand/{data_title}/matched/{network_name}/
    data/demand/{data_title}/matched/{network_name}/trips_X.csv

Demand Forecast Data

  • {zone_system_name} refers to the name of a zone-system definition
  • {temporal_resolution} refers to the time aggregation given in "hh_mm"
  • different forecast methods are saved as different columns; "trips" refers to a perfect forecast for the given spatio-temporal resolution
  • see in this wiki under 3_input_data/Data Specification/agg_X.csv for a format specification of agg_X.csv (where "X" can be replaced with any title given to the forecast file)
  • data structure:
    data/demand/{data_title}/aggregated/
    data/demand/{data_title}/aggregated/{zone_system_name}/
    data/demand/{data_title}/aggregated/{zone_system_name}/{temporal_resolution}
    data/demand/{data_title}/aggregated/{zone_system_name}/{temporal_resolution}/agg_{X}.csv
    data/demand/{data_title}/aggregated/{zone_system_name}/{temporal_resolution}/agg_od_{X}.csv
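The forecast directory name encodes the aggregation interval as "hh_mm". A minimal sketch of building the file path, assuming the interval is given in seconds and is a whole number of minutes; `temporal_resolution_str` and `forecast_path` are hypothetical helpers, not FleetPy functions.

```python
from pathlib import Path


def temporal_resolution_str(seconds):
    """Format an aggregation interval in seconds as the "hh_mm" directory name."""
    if seconds <= 0 or seconds % 60 != 0:
        raise ValueError("the temporal resolution must be a positive multiple of 60 s")
    hours, minutes = divmod(seconds // 60, 60)
    return f"{hours:02d}_{minutes:02d}"


def forecast_path(data_dir, data_title, zone_system_name, resolution_s, x, od=False):
    """Build the path of an agg_{X}.csv (or agg_od_{X}.csv) forecast file."""
    prefix = "agg_od_" if od else "agg_"
    return (
        Path(data_dir) / "demand" / data_title / "aggregated" / zone_system_name
        / temporal_resolution_str(resolution_s) / f"{prefix}{x}.csv"
    )
```

For example, a 15-minute aggregation (900 s) maps to the directory "00_15".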

Vehicle Data

  • saving vehicle data on the server reduces the research effort for new studies
  • specification in this wiki under 3_input_data/Data Specification/vehicle_type.csv
    data/vehicles/
    data/vehicles/EV_type1_20200411.csv
    data/vehicles/EV_type2_20200411.csv

Infrastructure Data

  • infrastructure data can be used to add additional information to certain network nodes, e.g. access points for customers, boarding points, charging stations, depots, parking spaces
  • the format of these data files is specified in the section Data Specification below
  • {gis_name} reflects the spatial area of the data and in this directory, all data are referenced by coordinates
  • if the data are stored in a reference system other than WGS84, a crs.info file is saved as well
    data/infra/
    data/infra/{gis_name}
    data/infra/{gis_name}/crs.info
    data/infra/{gis_name}/access_points.geojson
    data/infra/{gis_name}/boarding_points.geojson
    data/infra/{gis_name}/public_charging_stations.geojson
    data/infra/{gis_name}/depots.geojson
  • after matching to a network, the respective files are saved as csv files in the {network_name} subdirectory
    data/infra/{gis_name}/{network_name}
    data/infra/{gis_name}/{network_name}/access_points.csv
    data/infra/{gis_name}/{network_name}/boarding_points.csv
    data/infra/{gis_name}/{network_name}/public_charging_stations.csv
    data/infra/{gis_name}/{network_name}/depots.csv

Public Transportation Data

tbc

Fleet-Control Data

Active Fleet Size Data

  • can be used for simulations with flexible fleet size, where fleet size is time controlled
  • specification in this wiki under 3_input_data/Data Specification/active_vehicles.csv
    data/fleetctrl/elastic_fleet_size/
    data/fleetctrl/elastic_fleet_size/active_vehicle_sample.csv

Initial Vehicle Distribution

  • can be used to specify the vehicle distribution of the initially created vehicles
  • specification in this wiki under 3_input_data/Data Specification/init_veh_dist.csv
  • {network_name} corresponds to the network the nodes of the init distribution are matched onto
    data/fleetctrl/initial_vehicle_distribution/
    data/fleetctrl/initial_vehicle_distribution/{network_name}/init_veh_dist.csv

Pricing Data

  • can be used to define time dependent elastic pricing or utilization dependent pricing
  • {pricing_file} corresponds to the name of the applied pricing_file [possible scenario input]
    data/fleetctrl/elastic_pricing/{pricing_file}.csv
  • TO-DO

Data Collection

tbc

Data specification

access points

access_points.csv

Mandatory attributes with explanation of data types.

Column Name Data Type Description
node_index int index of a network node that is an access point (BEWARE: should be unique!)

Definition: access points are all network nodes where customers are allowed to enter/leave the simulation. In most cases, these locations are not necessary for the simulation itself, but they might be helpful for creating demand files.

active vehicles

active_vehicles.csv

Mandatory attributes with explanation of data types.

Column Name Data Type Description
time int simulation time in seconds
share_active_fleet_size float share of fleet that should be active at this time.
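A sketch of how active_vehicles.csv could be evaluated, assuming each share holds from its time entry until the next one (step function) and that the first share also applies before the first entry. Both functions are hypothetical illustrations, not FleetPy's implementation.

```python
import bisect
import csv
import io


def load_active_shares(csv_text):
    """Parse active_vehicles.csv content into sorted lists of times and shares."""
    rows = sorted(
        (int(row["time"]), float(row["share_active_fleet_size"]))
        for row in csv.DictReader(io.StringIO(csv_text))
    )
    return [t for t, _ in rows], [s for _, s in rows]


def active_fleet_size(times, shares, sim_time, fleet_size):
    """Number of active vehicles at sim_time; each share holds until the next entry."""
    # assumption: the first share is also used before the first time entry
    idx = max(bisect.bisect_right(times, sim_time) - 1, 0)
    return round(shares[idx] * fleet_size)
```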

agg forecast

agg_X.csv

Mandatory attributes with the explanation of data types. {fc method} has to be specified in the scenario inputs with the variable G_FC_TYPE

Column Name Data Type Description
time int simulation time in seconds
zone_id int index of zone in zone system (BEWARE: should be unique!)
out {fc method} float number of outgoing trips for forecast method {fc method}
in {fc method} float number of incoming trips for forecast method {fc method}

The following forecast methods are currently defined:

fc method Description
perfect_trips these "forecasts" are generated by aggregating the actual trip files, which makes them perfect forecasts of the number of trips at the chosen resolution
perfect_pax these "forecasts" are also generated by aggregating the actual trip files; however, the number of passengers is aggregated instead of the number of trips
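The perfect forecasts described above amount to counting trips per time bin and zone. A minimal sketch, assuming trips are binned by rq_time for both outgoing and incoming counts and that a node-to-zone mapping (e.g., read from node_zone_info.csv) is available as a dict; `perfect_forecast` is a hypothetical helper, and FleetPy's own preprocessing may bin incoming trips differently.

```python
from collections import defaultdict


def perfect_forecast(trips, node_to_zone, resolution_s, count_pax=False):
    """Aggregate trips into {(time_bin_start, zone_id): [out, in]} counts.

    trips: iterable of dicts with keys rq_time, start, end (optionally number_passenger).
    node_to_zone: hypothetical mapping from node index to zone id.
    count_pax=False gives perfect_trips, count_pax=True gives perfect_pax.
    """
    counts = defaultdict(lambda: [0.0, 0.0])
    for trip in trips:
        weight = trip.get("number_passenger", 1) if count_pax else 1
        # assumption: both origin and destination counts use the request time bin
        time_bin = (trip["rq_time"] // resolution_s) * resolution_s
        counts[(time_bin, node_to_zone[trip["start"]])][0] += weight  # outgoing
        counts[(time_bin, node_to_zone[trip["end"]])][1] += weight  # incoming
    return dict(counts)
```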

agg_od_X.csv

Column Name Data Type Description
time int simulation time in seconds
out_zone_id int index of zone in zone system (BEWARE: should be unique!)
in_zone_id int index of zone in zone system (BEWARE: should be unique!)
{fc method} float number of trips from out_zone_id to in_zone_id for forecast method {fc method}

The following forecast methods are currently defined:

fc method Description
perfect_trips these "forecasts" are generated by aggregating the actual trip files, which makes them perfect forecasts of the number of trips at the chosen resolution
perfect_pax these "forecasts" are also generated by aggregating the actual trip files; however, the number of passengers is aggregated instead of the number of trips

boarding points

boarding_points.csv

Mandatory attributes with the explanation of data types.

Column Name Data Type Description
node_index int index of node in network which is a boarding point (BEWARE: should be unique!)

Definition: boarding points are all network nodes where operators can perform boarding processes. If no boarding points are given or treated explicitly, usually all network nodes are considered boarding points.

charging events

tbc

depots

depots.csv

Mandatory attributes with explanation of data types.

Column Name Data Type Description
charging_station_id int unique identifier for each depot
node_index int index of node in network (BEWARE: should be unique!)
charging_units dict_str power1:number1;power2:number2
max_nr_parking int maximum number of vehicles that can park
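The dict_str fields (charging_units here, and public_util in public_charging_stations.csv) pack key:value pairs separated by ";". A minimal parser sketch; `parse_dict_str` is a hypothetical helper, not a FleetPy function.

```python
def parse_dict_str(value, key_type=float, value_type=int):
    """Parse a dict_str field such as "50:2;11:4" into {50.0: 2, 11.0: 4}."""
    result = {}
    for item in value.split(";"):
        if not item.strip():
            continue  # tolerate a trailing separator
        key, val = item.split(":")
        result[key_type(key)] = value_type(val)
    return result
```

With the defaults, charging_units maps power (kW) to the number of charging units; for public_util, `parse_dict_str(value, key_type=int, value_type=float)` maps hours to utilization.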

initial state

init_state.csv

  • vehicle locations and utilization at the end of a simulation period (and possibly the start of the next one) are recorded
  • the vehicles are already positioned at their final location; if a vehicle is in the middle of a link, it is positioned at the start node of this link
  • the vehicles are simply treated as blocked (see final_time)
  • attribute fields:
column_name data_type comment
operator_id int
vehicle_id int
final_node_index int
final_time int in seconds; remember to take the value modulo 24*3600 so that the vehicle is not blocked for the entire next day
final_soc float
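The modulo hint for final_time can be illustrated with a one-line helper (hypothetical name), assuming final_time counts seconds from the start of the previous simulation day.

```python
SECONDS_PER_DAY = 24 * 3600


def carry_over_block_time(final_time):
    """Map a final_time from init_state.csv onto the next simulation day.

    Taking the value modulo one day keeps a vehicle whose job ends at 25:00 h
    blocked only until 01:00 h of the following day.
    """
    return final_time % SECONDS_PER_DAY
```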

initial vehicle distribution

init_veh_dist.csv

  • initial random distribution of vehicle locations after simulation init
  • specifies node indices and the probability of choosing each of them when initializing MoD fleet locations
  • attribute fields:
column_name data_type comment
node_index int
probability float probability of choosing this node for a vehicle's initial location
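Drawing initial vehicle locations from init_veh_dist.csv corresponds to weighted sampling with replacement. A sketch using Python's `random.choices`, assuming the probabilities are treated as relative weights; `sample_initial_nodes` is hypothetical and FleetPy may draw locations differently.

```python
import random


def sample_initial_nodes(node_probabilities, n_vehicles, seed=None):
    """Draw one start node per vehicle from init_veh_dist.csv rows.

    node_probabilities: mapping node_index -> probability (used as relative weights).
    """
    rng = random.Random(seed)  # seeded generator for reproducible initial states
    nodes = list(node_probabilities)
    weights = [node_probabilities[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=n_vehicles)
```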

network data

A routable network consists of nodes and edges. Vehicles travel along edges, which carry the travel information; nodes connect these edges and represent the positions in the network where the next edge of a route can be chosen. Hence, they usually represent junctions of a street network.

- IMPORTANT: the network definition assumes that node indices are numbered consecutively from 0 to |N|-1!

In the following, the columns of the network data files are described. Please refer to documentation/Data_Directory_Structure.md for the correct placement of the respective files.

nodes.csv

Necessary Attributes

Column Name Data Type Description
node_index int ID of node
is_stop_only bool False: normal node; True: node can only be used as first or last part of a route leg
pos_x float x-position in projected coordinate system > unit: meters
pos_y float y-position in projected coordinate system > unit: meters

Optional Attributes

Column Name Data Type Description
node_order int only required for contraction hierarchy

edges.csv

Necessary Attributes

Column Name Data Type Description
from_node int ID of origin node of a street edge
to_node int ID of destination node of a street edge
distance float length of street edge in meters
travel_time float travel duration on street edge in seconds

Optional Attributes

Column Name Data Type Description
shortcut_def str only for contraction hierarchy; use “;” as separator between IDs
source_edge_id str OSM-Edge ID, Aimsun Section ID; can be "-" separated elements as well
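The indexing requirement and the edge columns above can be checked with a small validator. This sketch is not part of FleetPy; `validate_network` is a hypothetical helper that receives the CSV contents as strings.

```python
import csv
import io


def validate_network(nodes_csv, edges_csv):
    """Return a list of problems found in nodes.csv / edges.csv contents."""
    problems = []
    node_ids = sorted(int(row["node_index"]) for row in csv.DictReader(io.StringIO(nodes_csv)))
    # node indices must be exactly 0, 1, ..., |N|-1
    if node_ids != list(range(len(node_ids))):
        problems.append("node indices are not numbered consecutively from 0 to |N|-1")
    known = set(node_ids)
    for row in csv.DictReader(io.StringIO(edges_csv)):
        u, v = int(row["from_node"]), int(row["to_node"])
        if u not in known or v not in known:
            problems.append(f"edge {u}->{v} references an unknown node")
        if float(row["distance"]) < 0 or float(row["travel_time"]) < 0:
            problems.append(f"edge {u}->{v} has a negative distance or travel time")
    return problems
```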

crs.info

epsg:code

  • This file contains a single line with the EPSG code 'code', which applies to pos_x and pos_y in nodes.csv.

edges_td_att.csv

This file specifies edge travel times at specific simulation times.

Column Name Data Type Description
from_node int ID of origin node of a street edge
to_node int ID of destination node of a street edge
edge_tt float travel duration on street edge in seconds

NN_FASTEST_TT.NPY / NN_FASTEST_DISTANCE.NPY

Fully preprocessed (according to the fastest route) node-to-node travel time or distance tables are saved as 2D-Numpy arrays. The first index (row index) represents the origin node, the second index represents the destination node. The data entries (travel time/distance) are of type float. These files are saved under scenario_dir/tables/x.npy, where scenario_dir=ff for free-flow conditions.
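Reading a value from such a table is a plain 2D index. The sketch below uses a made-up 3-node matrix instead of a real network; the path in the comment follows the directory structure described under Network Data, and `fastest_value` is a hypothetical helper.

```python
import numpy as np

# Toy 3-node travel time table in seconds: row = origin node, column = destination node.
tt = np.array([
    [0.0, 60.0, 150.0],
    [70.0, 0.0, 90.0],
    [160.0, 95.0, 0.0],
])
# A real table would be loaded with, e.g.:
# tt = np.load("data/networks/{network_name}/ff/tables/nn_fastest_travel_time.npy")


def fastest_value(table, origin_node, destination_node):
    """Return the preprocessed fastest-route travel time/distance from origin to destination."""
    return float(table[origin_node, destination_node])
```

The tables are not necessarily symmetric, since the fastest route from A to B can differ from the one from B to A.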

Network Dynamics Files

These files can be used

  • to define loading of corresponding travel time files at given simulation time (column "travel_time_folder")

or(!)

  • to scale all network travel times with certain factors according to the simulation time. This input is used by the 'NetworkTTMatrix' module. (column "travel_time_factor")
Column Name Data Type Description
simulation_time int simulation time in seconds
travel_time_folder str corresponding folder name of travel time directory to be used from this simulation time on
travel_time_factor float general travel time factor that is used for complete network
- IMPORTANT: only one of the columns travel_time_folder/travel_time_factor is allowed to be given!
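A sketch of reading a network dynamics file, enforcing the one-column rule above and returning the entry valid at a given simulation time. It assumes each entry applies until the next simulation_time; the function names are hypothetical.

```python
import bisect
import csv
import io


def load_network_dynamics(csv_text):
    """Parse a network dynamics file into (column_name, sorted (simulation_time, value) entries)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = set(reader.fieldnames or [])
    has_folder = "travel_time_folder" in columns
    has_factor = "travel_time_factor" in columns
    # exactly one of the two columns is allowed
    if has_folder == has_factor:
        raise ValueError("give exactly one of travel_time_folder / travel_time_factor")
    key = "travel_time_folder" if has_folder else "travel_time_factor"
    convert = str if has_folder else float
    entries = sorted((int(row["simulation_time"]), convert(row[key])) for row in reader)
    return key, entries


def active_entry(entries, sim_time):
    """Return the folder/factor valid at sim_time (last entry not after sim_time), else None."""
    times = [t for t, _ in entries]
    idx = bisect.bisect_right(times, sim_time) - 1
    return entries[idx][1] if idx >= 0 else None
```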

Partially preprocessed data

Partially preprocessed (according to the fastest route) node-to-node travel time or distance tables are saved as 2D-Numpy arrays.

The first index (row index) represents the origin node, the second index represents the destination node. The data entries (travel time/distance) are of type float.

The travel time matrix is called tt_matrix.npy, the distance matrix is called dis_matrix.npy

These files are saved in the corresponding travel time folders; free-flow condition is stored in the base-folder. Note that only the travel times/distances between the first x nodes are stored. x is defined by the shape of the matrix.

These matrices are used by NetworkPartiallyPreprocessed.py and NetworkPartiallyPreprocessedCpp.py.
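Because only the first x nodes are covered, a lookup has to check both indices against the matrix shape first. A minimal sketch with a hypothetical helper; what a caller does for uncovered nodes (e.g., computing the route on the fly) is an assumption here.

```python
import numpy as np


def partial_lookup(tt_matrix, origin, destination):
    """Return the stored value, or None if a node lies outside the preprocessed block.

    Only nodes 0..x-1 are covered, where x = tt_matrix.shape[0].
    """
    x = tt_matrix.shape[0]
    if origin < x and destination < x:
        return float(tt_matrix[origin, destination])
    return None  # assumption: the caller falls back to an on-the-fly route computation
```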

pricing

pricing_file.csv

  • this file specifies the functionality used for dynamic pricing
  • the name of the file can be chosen freely
  • depending on the application there can be two versions of the pricing file
  • either a time dependent pricing_file (given with the global "op_elastic_price_file")
  • or a utilization dependent pricing_file (given with the global "op_util_surge_price_file")
  • attribute fields of time dependent pricing file:
column_name data_type comment
time int start time of this pricing regime
base_fare_factor float factor of the base_fare in this pricing regime
distance_fare_factor float factor of the distance_fare in this pricing regime
general_factor float global price factor in this pricing regime
  • attribute fields of utilization dependent pricing file:
column_name data_type comment
utilization float start utilization [0, 1] for this pricing regime
base_fare_factor float factor of the base_fare in this pricing regime
distance_fare_factor float factor of the distance_fare in this pricing regime
general_factor float global price factor in this pricing regime
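A sketch of selecting the active pricing regime and applying its factors. The regime lookup assumes rows are sorted by their start column (time or utilization) and that a regime applies from its start value up to the next one; the fare formula and the parameters base_fare and distance_fare_per_km are assumptions for illustration, not FleetPy's pricing implementation.

```python
import bisect


def active_regime(rows, start_col, value):
    """Return the regime row whose start value is the largest one <= value.

    rows: pricing file rows as dicts, sorted by start_col ("time" or "utilization").
    """
    starts = [row[start_col] for row in rows]
    # assumption: the first regime also applies below the first start value
    idx = max(bisect.bisect_right(starts, value) - 1, 0)
    return rows[idx]


def fare(regime, base_fare, distance_fare_per_km, distance_km):
    """Hypothetical fare formula using the three factors of one regime (in cents)."""
    return regime["general_factor"] * (
        regime["base_fare_factor"] * base_fare
        + regime["distance_fare_factor"] * distance_fare_per_km * distance_km
    )
```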

public charging stations

public_charging_stations.csv

Mandatory attributes with explanation of data types. Additional attributes can be added as long as the column names of the mandatory attributes are kept.

Column Name Data Type Description
charging_station_id int unique identifier for each charging station
node_index int index of node in network (BEWARE: should be unique!)
charging_units dict_str power1:number1;power2:number2
public_util dict_str hour1:util1;hour2:util2

unit of power : kW

trips / requests

trips_X.csv

Mandatory attributes with explanation of data types. Additional attributes can be added as long as the column names of the mandatory attributes are kept.

Column Name Data Type Description
request_id int unique identifier for each request
rq_time int time (s) at which a request becomes active/visible to the system
start int index of origin node in network
end int index of destination node in network

Additional optional attributes, which can be used to model heterogeneous demand.

Column Name Data Type Description
latest_arrival_time int time by which a request must/wants to reach its destination; not yet implemented!
rq_type str/int (TODO) specifies e.g. the type of service a request wants to use
earliest_pickup_time int for reservation
latest_pickup_time int latest time for pick-up
latest_decision_time int latest time for request to decide for an operator before leaving system
max_rel_detour int maximum relative detour in percent
max_fare int maximum fare a request is willing to pay (in cent)
global trip destination int (TODO) for intermodal trips (mod-destination vs trip-destination)
rq_preferences (TODO) other parameters for mode choice
number_passenger int passengers within one request
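A sketch of loading a trips_X.csv file, checking the mandatory columns and sorting requests by rq_time. Defaulting number_passenger to 1 when it is missing or empty is an assumption; `load_trips` is a hypothetical helper, not FleetPy's demand loader.

```python
import csv
import io

MANDATORY_TRIP_COLUMNS = ("request_id", "rq_time", "start", "end")


def load_trips(csv_text):
    """Read trips_X.csv content, check the mandatory columns and sort by rq_time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in MANDATORY_TRIP_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing mandatory columns: {missing}")
    trips = []
    for row in reader:
        for col in MANDATORY_TRIP_COLUMNS:
            row[col] = int(row[col])
        # optional attribute; assumption: one passenger if the column is absent or empty
        row["number_passenger"] = int(row.get("number_passenger") or 1)
        trips.append(row)
    return sorted(trips, key=lambda r: r["rq_time"])
```

Additional columns (e.g., max_fare) are kept as strings in each row dict.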

vehicles

vehicle_type.csv

  • fleet costs depend on daily and per km costs of vehicles
  • vehicle range depends on battery size and consumption
parameter_name data_type comment
vtype_name_full str
daily_fix_cost [cent] int value in cents, so that integers can be used
per_km_cost [cent] int value in cents, so that integers can be used
battery_size [kWh] float
range [km] float
source str Url and Url-Date for documentation in paper
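The two bullet points above translate into simple arithmetic. A sketch with hypothetical helper names, assuming the per-km cost is charged on the total distance driven by the fleet.

```python
def daily_fleet_cost_cent(n_vehicles, daily_fix_cost, per_km_cost, total_km):
    """Fleet cost of one day in cents: fixed cost per vehicle plus distance-based cost."""
    return n_vehicles * daily_fix_cost + round(per_km_cost * total_km)


def consumption_kwh_per_km(battery_size_kwh, range_km):
    """Average energy consumption implied by battery size and range."""
    return battery_size_kwh / range_km
```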