Input Data

Data files provide the structured inputs needed to describe, for example, networks, zones, customer demand, vehicles, and infrastructure. They are loaded together with the modules during the initialization of the simulation (e.g., in FleetSimulationBase, the parent class for the core simulation).

The files to be loaded for a simulation can be selected in the scenario config files (see here).

Input data files are stored in the data-folder. Example files can be found there. Generally, data-files are included in git-ignore. In the following, the path specifications for the input files are given, while the data format of the files is described under 'Data Specification'.

Network Data

{network_name} denotes the title of a network
there are various routing-modules which are based on different preprocessing scripts; the preprocessed data are also saved in the respective network directory
the specification of all network csv and geojson files is given in this wiki under 3_input_data/Data Specification/
each network has to have the following mandatory directory and file structure:
data/networks/
data/networks/{network_name}/
data/networks/{network_name}/base/
data/networks/{network_name}/base/nodes.csv
data/networks/{network_name}/base/edges.csv
data/networks/{network_name}/base/nodes_all_infos.geojson
data/networks/{network_name}/base/edges_all_infos.geojson
if the coordinate frame is not WGS84, an additional file states the used reference system
data/networks/{network_name}/base/crs.info
in case network travel times are deterministic but vary over time, the edge travel times are saved in the following structure:
data/networks/{network_name}/{scenario_time}/
data/networks/{network_name}/{scenario_time}/edges_td_att.csv
additionally, the NetworkTable routing module requires fastest node-to-node travel time and distance tables for each travel time directory
data/networks/{network_name}/ff/
data/networks/{network_name}/ff/tables/
data/networks/{network_name}/ff/tables/nn_fastest_distance.npy
data/networks/{network_name}/ff/tables/nn_fastest_travel_time.npy
data/networks/{network_name}/{scenario_time}/tables/
data/networks/{network_name}/{scenario_time}/tables/nn_fastest_distance.npy
data/networks/{network_name}/{scenario_time}/tables/nn_fastest_travel_time.npy
network dynamics: this file defines loading of time dependent travel time files or(!) travel time factors (optional)
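The mandatory part of the layout above can be checked before a simulation is started. The following is a minimal sketch, not part of FleetPy: the helper name `missing_network_files` is hypothetical, and it only covers the four mandatory files in the base directory (the optional crs.info, travel time folders, and tables are not checked).

```python
from pathlib import Path

# Mandatory files relative to data/networks/{network_name}/, as listed above.
MANDATORY_BASE_FILES = (
    "base/nodes.csv",
    "base/edges.csv",
    "base/nodes_all_infos.geojson",
    "base/edges_all_infos.geojson",
)


def missing_network_files(data_dir, network_name):
    """Return the mandatory network files that do not exist (hypothetical helper)."""
    network_dir = Path(data_dir) / "networks" / network_name
    return [rel for rel in MANDATORY_BASE_FILES if not (network_dir / rel).is_file()]
```

An empty list means that all mandatory files were found; otherwise the returned relative paths name what still has to be created.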

Zone Systems

[optional input data]

spatial aggregation into zones is necessary for several use cases, e.g. vehicle repositioning, pricing, tolling, NFD clustering
{zone_system_name} denotes the name of a GIS zone division
in the {network_name} subdirectory, the GIS data are matched to an existing network
definition of respective file formats in this wiki under 3_input_data/Data Specification/
data structure:
data/zones/
data/zones/{zone_system_name}/
data/zones/{zone_system_name}/general_information.csv
data/zones/{zone_system_name}/polygon_definition.geojson
data/zones/{zone_system_name}/crs.info
data/zones/{zone_system_name}/{network_name}/
data/zones/{zone_system_name}/{network_name}/node_zone_info.csv
data/zones/{zone_system_name}/{network_name}/edge_zone_info.csv

Demand Data

{data_title} should be a name reflecting the data source
raw data and scripts that reduce them to an unmatched trip format (see specification for unmatched trip data) should also remain on the server for clarity
data/
data/demand/
data/demand/{data_title}/
data/demand/{data_title}/raw

Trip/Request/User Data

the script matching trip data to a given network {network_name} can be found in src/demand/pp/
see in this wiki under 3_input_data/Data Specification/trips for a format specification of trips_X.csv (where "X" can be replaced with any title given to the trip file)
data structure:
data/demand/{data_title}/matched/
data/demand/{data_title}/matched/{network_name}/
data/demand/{data_title}/matched/{network_name}/trips_X.csv

Demand Forecast Data

{zone_system_name} refers to the name of a zone-system definition
{temporal_resolution} refers to the time aggregation given in "hh_mm"
different forecast methods are saved as different columns; "trips" refers to a perfect forecast for the given spatio-temporal resolution
see in this wiki under 3_input_data/Data Specification/agg_X.csv for a format specification of agg_X.csv (where "X" can be replaced with any title given to the forecast file)
data structure:
data/demand/{data_title}/aggregated/
data/demand/{data_title}/aggregated/{zone_system_name}/
data/demand/{data_title}/aggregated/{zone_system_name}/{temporal_resolution}
data/demand/{data_title}/aggregated/{zone_system_name}/{temporal_resolution}/agg_{X}.csv
data/demand/{data_title}/aggregated/{zone_system_name}/{temporal_resolution}/agg_od_{X}.csv

Vehicle Data

saving vehicle data on the server reduces the research time needed for new studies
specification in this wiki under 3_input_data/Data Specification/vehicle_type.csv
data/vehicles/
data/vehicles/EV_type1_20200411.csv
data/vehicles/EV_type2_20200411.csv

Infrastructure Data

infrastructure data can be used to add additional information to certain network nodes, e.g. access points for customers, boarding points, charging stations, depots, parking spaces
the format of these data files is specified in the next section, Data Specification
{gis_name} reflects the spatial area of the data; in this directory, all data are referenced by coordinates
if the data are stored in a reference system other than WGS84, a crs.info is saved as well
data/infra/
data/infra/{gis_name}
data/infra/{gis_name}/crs.info
data/infra/{gis_name}/access_points.geojson
data/infra/{gis_name}/boarding_points.geojson
data/infra/{gis_name}/public_charging_stations.geojson
data/infra/{gis_name}/depots.geojson
after matching to a network, the respective files are saved as csv files in the {network_name} subdirectory
data/infra/{gis_name}/{network_name}
data/infra/{gis_name}/{network_name}/access_points.csv
data/infra/{gis_name}/{network_name}/boarding_points.csv
data/infra/{gis_name}/{network_name}/public_charging_stations.csv
data/infra/{gis_name}/{network_name}/depots.csv

Public Transportation Data

tbc

Fleet-Control Data

Active Fleet Size Data

can be used for simulations with flexible fleet size, where fleet size is time controlled
specification in this wiki under 3_input_data/Data Specification/active_vehicles.csv
data/fleetctrl/elastic_fleet_size/
data/fleetctrl/elastic_fleet_size/active_vehicle_sample.csv

Initial Vehicle Distribution

can be used to specify the vehicle distribution of the initially created vehicles
specification in this wiki under 3_input_data/Data Specification/init_veh_dist.csv
{network_name} corresponds to the network the nodes of the init distribution are matched onto
data/fleetctrl/initial_vehicle_distribution/
data/fleetctrl/initial_vehicle_distribution/{network_name}/init_veh_dist.csv

Pricing Data

can be used to define time dependent elastic pricing or utilization dependent pricing
{pricing_file} corresponds to the name of the applied pricing_file [possible scenario input]
data/fleetctrl/elastic_pricing/{pricing_file}.csv
TO-DO

Data Collection

tbc

Data specification

access points

access_points.csv

Mandatory attributes with explanation of data types.

Column Name	Data Type	Description
node_index	int	index of node in network which is an access point (BEWARE: should be unique!)

Definition: access points are all network nodes where customers are allowed to enter/leave the simulation. In most cases, these locations are not necessary for the simulation itself, but might be helpful for creating demand files.

active vehicles

active_vehicles.csv

Mandatory attributes with explanation of data types.

Column Name	Data Type	Description
time	int	simulation time in seconds
share_active_fleet_size	float	share of fleet that should be active at this time.
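A lookup of the active fleet share at a given simulation time could look like the sketch below. It assumes that each row stays valid until the next listed time (a step function) and that the whole fleet is active before the first entry; neither assumption is stated in this specification, and the function name `active_share` is hypothetical.

```python
def active_share(schedule, sim_time, default=1.0):
    """Return the share of the most recent (time, share) row with time <= sim_time.

    schedule: iterable of (time, share_active_fleet_size) tuples.
    default:  value used before the first row (assumption: full fleet active).
    """
    share = default
    for time, value in sorted(schedule):
        if time > sim_time:
            break
        share = value
    return share
```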

agg forecast

agg_X.csv

Mandatory attributes with the explanation of data types. {fc method} has to be specified in the scenario inputs with the variable G_FC_TYPE

Column Name	Data Type	Description
time	int	simulation time in seconds
zone_id	int	index of zone in zone system (BEWARE: should be unique!)
out {fc method}	float	number of outgoing trips for forecast method {fc method}
in {fc method}	float	number of incoming trips for forecast method {fc method}

The following forecast methods have been defined so far:

fc method	Description
perfect_trips	these "forecasts" are generated from actual aggregation of trip files, thereby making them perfect forecast for the number of trips with respect to the chosen resolution
perfect_pax	these "forecasts" are generated from actual aggregation of trip files; however, the number of passengers is aggregated instead of the number of trips

agg_od_X.csv

Column Name	Data Type	Description
time	int	simulation time in seconds
out_zone_id	int	index of zone in zone system (BEWARE: should be unique!)
in_zone_id	int	index of zone in zone system (BEWARE: should be unique!)
{fc_method}	float	number of trips from out_zone_id to in_zone_id for forecast method {fc method}

The following forecast methods have been defined so far:

fc method	Description
perfect_trips	these "forecasts" are generated from actual aggregation of trip files, thereby making them perfect forecast for the number of trips with respect to the chosen resolution
perfect_pax	these "forecasts" are generated from actual aggregation of trip files; however, the number of passengers is aggregated instead of the number of trips

boarding points

boarding_points.csv

Mandatory attributes with the explanation of data types.

Column Name	Data Type	Description
node_index	int	index of node in network which is a boarding point (BEWARE: should be unique!)

Definition: boarding points are all network nodes where operators can perform boarding processes. If no boarding points are given or treated explicitly, all network nodes are usually considered boarding points.

charging events

tbc

depots

depots.csv

Mandatory attributes with explanation of data types.

Column Name	Data Type	Description
charging_station_id	int	unique identifier for each depot
node_index	int	index of node in network (BEWARE: should be unique!)
charging_units	dict_str	power1:number1;power2:number2
max_nr_parking	int	maximum number of vehicles that can park

initial state

init_state.csv

vehicle locations and utilization at the end of a simulation period (and possibly the start of the next) are recorded
the vehicles are already positioned at the final vehicle location; if a vehicle is in the middle of a link, it is positioned at the start node of this link
the vehicles are simply counted as blocked
attribute fields:

column_name	data_type	comment
operator_id	int
vehicle_id	int
final_node_index	int
final_time	int	in seconds; remember to calculate modulo 24*3600 to not block the vehicle for the full next day
final_soc	float
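The modulo remark for final_time can be illustrated with a short worked example. The function name `wrap_final_time` is hypothetical; the sketch only shows the arithmetic described in the comment column.

```python
SECONDS_PER_DAY = 24 * 3600  # 86400


def wrap_final_time(final_time):
    """Map a final_time in seconds onto the time of day of the next simulation day."""
    return final_time % SECONDS_PER_DAY


# A vehicle that finishes at 25:00:00 (90000 s) is blocked until 01:00:00 (3600 s)
# of the next day, not for the full day.
blocked_until = wrap_final_time(90000)
```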

initial vehicle distribution

init_veh_dist.csv

initial random distribution of vehicle locations after simulation init
specifies node indices and their corresponding random probability when initializing mod fleet locations
attribute fields:

column_name	data_type	comment
node_index	int
probability	float	probability of choosing this node for a vehicle's initial location
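How such a file might be used to draw initial vehicle positions is sketched below. This is an illustration only and not FleetPy's implementation: the function name `sample_initial_nodes` is hypothetical, and the probabilities are simply used as relative weights for `random.choices`.

```python
import csv
import io
import random


def sample_initial_nodes(csv_text, n_vehicles, seed=None):
    """Draw one start node per vehicle, weighted by the 'probability' column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    nodes = [int(row["node_index"]) for row in rows]
    weights = [float(row["probability"]) for row in rows]
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.choices(nodes, weights=weights, k=n_vehicles)


example = "node_index,probability\n0,0.5\n7,0.5\n12,0.0\n"
start_nodes = sample_initial_nodes(example, n_vehicles=10, seed=42)
```

Nodes with probability 0.0 are never drawn; the same node can be drawn for several vehicles.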

network data

A routable network consists of nodes and edges. Vehicles travel along edges, which contain the travel information. Nodes are the connections between these edges and represent the positions in the network where different edges can be chosen as the next part of the route. Hence, they usually represent junctions of a street network.

- IMPORTANT: the network definition assumes that node indices are numbered from 0 to |N|-1, where |N| is the number of nodes!
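This numbering requirement can be verified with a few lines of Python. The helper name `has_contiguous_node_indices` is hypothetical; the check itself follows directly from the rule that every index from 0 up to the number of nodes minus one must occur exactly once.

```python
def has_contiguous_node_indices(node_indices):
    """Return True if the indices are exactly 0, 1, ..., n-1 (in any order)."""
    indices = sorted(node_indices)
    return indices == list(range(len(indices)))
```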

In the following, the columns of the network data files are described. Please refer to documentation/Data_Directory_Structure.md for the correct placement of the respective files.

nodes.csv

Necessary Attributes

Column Name	Data Type	Description
node_index	int	ID of node
is_stop_only	bool	False: normal node; True: node can only be used as first or last part of a route leg
pos_x	float	x-position in projected coordinate system > unit: meters
pos_y	float	y-position in projected coordinate system > unit: meters

Optional Attributes

Column Name	Data Type	Description
node_order	int	only required for contraction hierarchy

edges.csv

Necessary Attributes

Column Name	Data Type	Description
from_node	int	ID of origin node of a street edge
to_node	int	ID of destination node of a street edge
distance	float	length of street edge in meters
travel_time	float	travel duration on street edge in seconds

Optional Attributes

Column Name	Data Type	Description
shortcut_def	str	only for contraction hierarchy; use “;” as separator between IDs
source_edge_id	str	OSM-Edge ID, Aimsun Section ID; can be "-" separated elements as well

crs.info

epsg:code

This file contains a single line with the epsg-code 'code', which is valid for pos_x and pos_y in nodes.csv.
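Reading the single line of crs.info could be done as in the following sketch. The function name `parse_crs_info` is hypothetical, and the sketch assumes that 'code' is a purely numeric EPSG code.

```python
def parse_crs_info(text):
    """Extract the numeric EPSG code from the content of a crs.info file."""
    lines = text.strip().splitlines()
    if len(lines) != 1:
        raise ValueError("crs.info is expected to contain exactly one line")
    prefix, _, code = lines[0].strip().partition(":")
    if prefix.lower() != "epsg" or not code.isdigit():
        raise ValueError(f"unexpected crs.info content: {lines[0]!r}")
    return int(code)
```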

edges_td_att.csv

This file specifies edge travel times at specific simulation times.

Column Name	Data Type	Description
from_node	int	ID of origin node of a street edge
to_node	int	ID of destination node of a street edge
edge_tt	float	travel duration on street edge in seconds

NN_FASTEST_TT.NPY / NN_FASTEST_DISTANCE.NPY

Fully preprocessed (according to the fastest route) node-to-node travel time or distance tables are saved as 2D-Numpy arrays. The first index (row index) represents the origin node, the second index represents the destination node. The data entries (travel time/distance) are of type float. These files are saved under scenario_dir/tables/x.npy, where scenario_dir=ff for free-flow conditions.

Network Dynamics Files

These files can be used either to define the loading of the corresponding travel time files at a given simulation time (column "travel_time_folder") or(!) to scale all network travel times with certain factors according to the simulation time (column "travel_time_factor"). The latter input is used by the 'NetworkTTMatrix' module.

Column Name	Data Type	Description
simulation_time	int	simulation time in seconds
travel_time_folder	str	corresponding folder name of travel time directory to be used from this simulation time on
travel_time_factor	float	general travel time factor that is used for complete network

- IMPORTANT: only one of the columns travel_time_folder/travel_time_factor is allowed to be given!
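The exclusivity rule can be enforced when reading the file. The sketch below is not the FleetPy loader: the function name `read_network_dynamics` is hypothetical, and treating a file with neither column as an error is an assumption.

```python
import csv
import io


def read_network_dynamics(csv_text):
    """Return (column_name, [(simulation_time, value), ...]) for a dynamics file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = set(reader.fieldnames or [])
    has_folder = "travel_time_folder" in columns
    has_factor = "travel_time_factor" in columns
    if has_folder == has_factor:
        # Both columns given (forbidden) or none given (assumed to be an error).
        raise ValueError("exactly one of travel_time_folder/travel_time_factor is required")
    column, cast = ("travel_time_folder", str) if has_folder else ("travel_time_factor", float)
    return column, [(int(row["simulation_time"]), cast(row[column])) for row in reader]
```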

Partially preprocessed data

Partially preprocessed (according to the fastest route) node-to-node travel time or distance tables are saved as 2D-Numpy arrays.

The first index (row index) represents the origin node, the second index represents the destination node. The data entries (travel time/distance) are of type float.

The travel time matrix is called tt_matrix.npy, the distance matrix is called dis_matrix.npy

These files are saved in the corresponding travel time folders; free-flow condition is stored in the base-folder. Note that only the travel times/distances between the first x nodes are stored. x is defined by the shape of the matrix.

These matrices are used by NetworkPartiallyPreprocessed.py and NetworkPartiallyPreprocessedCpp.py.

pricing

pricing_file.csv

this file specifies the functionality used for dynamic pricing
the name of the file can be adapted
depending on the application there can be two versions of the pricing file
either a time dependent pricing_file (given with the global "op_elastic_price_file")
or a utilization dependent pricing_file (given with the global "op_util_surge_price_file")
attribute fields of time dependent pricing file:

column_name	data_type	comment
time	int	start time of this pricing regime
base_fare_factor	float	factor of the base_fare in this pricing regime
distance_fare_factor	float	factor of the distance_fare in this pricing regime
general_factor	float	global price factor in this pricing regime

attribute fields of utilization dependent pricing file:

column_name	data_type	comment
utilization	float	start utilization [0, 1] for this pricing regime
base_fare_factor	float	factor of the base_fare in this pricing regime
distance_fare_factor	float	factor of the distance_fare in this pricing regime
general_factor	float	global price factor in this pricing regime
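Because each row gives the start value of a pricing regime, the applicable row is the last one whose start value does not exceed the current time or utilization. A minimal sketch of this lookup follows; the function name `select_pricing_regime` is hypothetical, and returning None before the first start value is an assumption. How the three factors are combined into a fare is not covered here.

```python
import bisect


def select_pricing_regime(regimes, value, key="utilization"):
    """Return the regime row (dict) with the largest start value <= value."""
    ordered = sorted(regimes, key=lambda row: row[key])
    starts = [row[key] for row in ordered]
    idx = bisect.bisect_right(starts, value) - 1
    if idx < 0:
        return None  # assumption: no regime applies before the first start value
    return ordered[idx]
```

For the time dependent file, the same function can be called with `key="time"`.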

public charging stations

public_charging_stations.csv

Mandatory attributes with explanation of data types. Additional attributes can be added as long as the column names of the mandatory attributes are maintained.

Column Name	Data Type	Description
charging_station_id	int	unique identifier for each charging station
node_index	int	index of node in network (BEWARE: should be unique!)
charging_units	dict_str	power1:number1;power2:number2
public_util	dict_str	hour1:util1;hour2:util2

unit of power : kW
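The dict_str columns (charging_units, public_util) encode key-value pairs in a single string, with ";" between pairs and ":" between key and value. A possible parser is sketched below; the function name `parse_dict_str` and the example values are illustrative only.

```python
def parse_dict_str(value, key_type=float, value_type=float):
    """Parse 'key1:value1;key2:value2' into a dict with converted keys and values."""
    result = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        key, val = item.split(":", 1)
        result[key_type(key)] = value_type(val)
    return result


# charging_units: power in kW -> number of charging units with this power
charging_units = parse_dict_str("11:4;50:2", value_type=int)
# public_util: hour -> utilization
public_util = parse_dict_str("0:0.1;8:0.6", key_type=int)
```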

trips / requests

trips_X.csv

Mandatory attributes with explanation of data types. Additional attributes can be added as long as the column names of the mandatory attributes are maintained.

Column Name	Data Type	Description
request_id	int	unique identifier for each request
rq_time	int	time (s) a request becomes active/visible for the system
start	int	index of origin node in network
end	int	index of destination node in network

Additional optional attributes, which can be used to model heterogeneous demand.

Column Name	Data Type	Description
latest_arrival_time	int	time a rq must/wants to reach destination; not yet implemented!
rq_type	str/int (TODO)	specifies e.g. the type of service a request wants to use
earliest_pickup_time	int	for reservation
latest_pickup_time	int	latest time for pick-up
latest_decision_time	int	latest time for request to decide for an operator before leaving system
max_rel_detour	int	maximum relative detour in percent
max_fare	int	maximum fare a request is willing to pay (in cent)
global trip destination	int (TODO)	for intermodal trips (mod-destination vs trip-destination)
rq_preferences	(TODO)	other parameters for mode choice
number_passenger	int	passengers within one request
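A trip file can be checked against the mandatory columns before it is used in a scenario. The sketch below is not the FleetPy loader: the function name `read_trips` is hypothetical, and optional columns are ignored rather than validated.

```python
import csv
import io

MANDATORY_TRIP_COLUMNS = ("request_id", "rq_time", "start", "end")


def read_trips(csv_text):
    """Parse the mandatory trip columns and reject duplicate request ids."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in MANDATORY_TRIP_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing mandatory columns: {missing}")
    trips, seen_ids = [], set()
    for row in reader:
        trip = {column: int(row[column]) for column in MANDATORY_TRIP_COLUMNS}
        if trip["request_id"] in seen_ids:
            raise ValueError(f"duplicate request_id: {trip['request_id']}")
        seen_ids.add(trip["request_id"])
        trips.append(trip)
    return trips
```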

vehicles

vehicle_type.md

fleet costs depend on the daily and per-km costs of vehicles
the range of vehicles depends on battery size and consumption

parameter_name	data_type	comment
vtype_name_full	str
daily_fix_cost [cent]	int	Value in Cent to work with integers
per_km_cost [cent]	int	Value in Cent to work with integers
battery_size [kWh]	float
range [km]	float
source	str	Url and Url-Date for documentation in paper

3_input_data - TUM-VT/FleetPy GitHub Wiki
