Data Foundation
1. Summary
For our dashboard, we decided to focus on generating business value for the potential customer. We therefore wanted to enable the fleet manager to quickly get an overview of what is going on within their fleet. That is why we decided against the additional difficulty of implementing real-time data and instead use the NREL Fleet DNA dataset. The main reasons for our decision were the detailed information on acceleration and deceleration for each vehicle and each drive. We hoped to gain insight into driving behaviour and therefore the wear on the truck. For our plans with machine learning algorithms, the size of the dataset was also a significant factor in the decision.
2. Batch data
Batch processing means processing and analysing blocks of data that have already been stored over a period of time, for example all the transactions performed by a major financial firm in a week. This data contains millions of records per day and can be stored as files. Tools for processing data in batches are, for example, Hadoop or MapReduce.
Reference: Batch Processing vs Stream Processing
2.1. NREL Fleet DNA project Truck Platooning Data
This dataset contains Fleet DNA data collected by NREL between 07/2008 and 07/2014. Commercial fleet vehicle operating data, including vehicle speed distribution, starts, stops, braking, idling time, kinetic intensity, etc., were collected from 486 vehicles over a span of 4,705 days.
The raw data can be found here: NREL Fleet DNA project Truck Platooning Data
Truck Platooning Explanation
Truck platooning is the linking of two or more trucks in convoy, using connectivity technology and automated driving support systems. These vehicles automatically maintain a set, close distance between each other when they are connected for certain parts of a journey, for instance on motorways. The truck at the head of the platoon acts as the leader, with the vehicles behind reacting and adapting to changes in its movement – requiring little to no action from drivers.
Resource: Truck platooning explanation
Data Set Explanation
Information regarding the generation of the Dataset can be found here: Fleet DNA Project – Data Dictionary for Public Download Files
Overview of all attributes and rough summary can be viewed in this excel file: NREL data overview
Findings
- Every column represents one day for one vehicle and driver
- Everything is in American units
- No explanation for the calculation of some values
- Speed analysis (summarised in the sketch after this list)
  - Time and distance spent in different speed ranges
  - Analysis for the fastest and most efficient route
  - Reduce downtimes
  - Speed in miles per hour
- Acceleration and deceleration events
  - Feet per second squared
  - Acceleration/deceleration events per mile
  - Reduce for less fuel consumption
- Number of stops
  - Filtered by duration of stops
  - Stops per mile
- Vehicle elevation
  - Elevation gained/lost
  - Height change on the track
  - Could be used to optimise the route and fuel consumption
- Power density
  - Kinetic power density
  - Aerodynamic power density
  - Rolling power density
  - Potential energy
- Comparison with road data
  - Different categories of roads
    - C1 (80+ mph speed limit)
    - C2 (70 - 80 mph speed limit)
    - C3 (60 - 70 mph speed limit)
    - C4 (50 - 60 mph speed limit)
    - C5 (40 - 50 mph speed limit)
  - Acceleration, speed and distance covered on the different categories
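To get a first feeling for the batch data, a vehicle-level summary can be computed directly from the downloaded composite file. This is a minimal pandas sketch; the file name and the column names ("vid", "distance_total", "max_speed", "total_stops") are placeholders for illustration and do not necessarily match the exact names in the NREL download.

```python
import pandas as pd

# Hypothetical column names - check them against the Fleet DNA data dictionary.
df = pd.read_csv("fleet_dna_composite.csv")

# One row per vehicle-day: aggregate per vehicle for a fleet-wide overview.
per_vehicle = df.groupby("vid").agg(
    days=("vid", "size"),
    total_distance_mi=("distance_total", "sum"),
    avg_max_speed_mph=("max_speed", "mean"),
    total_stops=("total_stops", "sum"),
)
per_vehicle["stops_per_mile"] = per_vehicle["total_stops"] / per_vehicle["total_distance_mi"]
print(per_vehicle.sort_values("stops_per_mile", ascending=False).head())
```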
2.2. Batch Telematic Truck Data from Tening Njie's Master thesis
- The orange-marked columns in the truckdata data frame are only used to calculate the respective conditions; they are removed afterwards
- The description of the table and its values is under 1.2.2.
3. Real-time data
Stream processing is the approach of choice if you want analytics results in real time. It allows you to feed data into analytics tools as soon as the data are generated and to get instant analytics results. There are multiple open-source stream processing platforms such as Apache Kafka, Apache Flink, Apache Storm and WSO2 Stream Processor. The subsections below describe possible options for obtaining real-time data for our dashboard (a minimal consumer sketch follows the reference):
Reference: https://medium.com/@gowthamy/big-data-battle-batch-processing-vs-stream-processing-5d94600d8103
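To illustrate the stream-processing idea of consuming events as they are produced, here is a minimal sketch using the kafka-python client. The topic name, broker address and message fields are assumptions for illustration only, not part of an existing setup.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Subscribe to a (hypothetical) telemetry topic and process each event on arrival.
consumer = KafkaConsumer(
    "truck-telemetry",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:  # blocks and yields events as soon as they are produced
    event = message.value
    print(event.get("boxId"), event.get("speed_kmH"))
```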
3.1. SUMO (Simulation of Urban MObility)
SUMO website
SUMO is a traffic simulation package. It is meant to be used to simulate networks of a city's size, but it can of course also be used for smaller and larger networks if enough computing power is available.
Usage Examples
Since 2001, the SUMO package has been used in the context of several national and international research projects. The applications included:
- traffic lights evaluation
- route choice and re-routing
- evaluation of traffic surveillance methods
- simulation of vehicular communications
- traffic forecast
3.1.1. "TAPAS Cologne" Scenario
SourceForge repository or TAPAS Cologne Information in SUMO documentation
The TAPAS Cologne scenario is assumed to be one of the largest - if not THE largest - freely available traffic simulation data sets. Regarding the scenario size, both the road network and the traffic demand are given in good quality. Nonetheless, much important information is missing or wrong, and much further effort is needed to make the scenario realistic and complete.
3.1.2. Simulation/Output
Reference
All output files written by SUMO are in XML format by default. However, with the Python tool xml2csv.py you can convert any of them to a flat-file (CSV) format which can be opened with most spreadsheet software. If you need a more compressed but still "standardized" binary version, you can use xml2protobuf.py. Furthermore, all files can use a custom binary format which is triggered by the file extension .sbx. A small conversion sketch follows the list of outputs below.
Available outputs:
- raw vehicle positions dump: all vehicle positions over time contains: positions and speeds for all vehicles for all simulated time steps used for: obtaining movements of nodes (V2V, for ns-2)
- fcd output: Floating Car Data includes name, position, angle and type for every vehicle
- trajectories output: Trajectory Data following includes name, position, speed and acceleration for every vehicle following the Amitran standard
- surrogate safety measures (SSM): Output of safety related measures, headway, brake rates, etc
- vehicle-based information:
- trip information: aggregated information about each vehicle's journey (optionally with emission data)
- vehicle routes information: information about each vehicle's routes over simulation run
- stop output: information about vehicle stops and loading/unloading of persons and containers
- battery usage: information about battery state for electric vehicles
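The following is a minimal sketch of how the XML output could be converted and loaded for analysis. It assumes SUMO is installed with SUMO_HOME set; the location of xml2csv.py, its --output flag and the ';' separator reflect the tool's usual layout and defaults but may differ between SUMO versions.

```python
import os
import subprocess
import pandas as pd

# xml2csv.py is usually shipped under $SUMO_HOME/tools/xml/.
SUMO_HOME = os.environ.get("SUMO_HOME", "/usr/share/sumo")
XML2CSV = os.path.join(SUMO_HOME, "tools", "xml", "xml2csv.py")

def fcd_to_dataframe(xml_path: str, csv_path: str) -> pd.DataFrame:
    """Convert a SUMO fcd-output XML file to CSV and load it with pandas."""
    subprocess.run(["python", XML2CSV, xml_path, "--output", csv_path], check=True)
    # Columns typically follow an element_attribute pattern,
    # e.g. timestep_time, vehicle_id, vehicle_speed, vehicle_x, vehicle_y.
    return pd.read_csv(csv_path, sep=";")

if __name__ == "__main__":
    fcd = fcd_to_dataframe("fcd.xml", "fcd.csv")
    print(fcd.head())
```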
Summary
SUMO might be a bit of an overkill for our project since it can do way more than we need. We would need some time just for the setup of the simulation, even if we decide to use the Cologne scenario. As described above, the data output of the simulation is quite informative and could be very useful for our project. Setting up the simulation with the Cologne scenario could produce realistic and rich data outputs, if we can master the setup in time.
3.2. CARLA Simulator
CARLA has been developed from the ground up to support development, training, and validation of autonomous driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. The simulation platform supports flexible specification of sensor suites, environmental conditions, full control of all static and dynamic actors, maps generation and much more.
CARLA is designed as a client-server system. The server runs and renders the CARLA world. The client provides an interface for users to interact with the simulator by controlling the agent vehicle and certain properties of the simulation.
Measurements and sensor readings
The client receives from the server the following information about the world and the player's state (see the client sketch after this list):
- Player Position: The 3D position of the player with respect to the world coordinate system.
- Player Speed: The player’s linear speed in kilometres per hour.
- Collision: Cumulative impact from collisions with three different types of objects: cars, pedestrians, or static objects.
- Opposite Lane Intersection: The current fraction of the player car footprint that overlaps the opposite lane.
- Sidewalk Intersection: The current fraction of the player car footprint that overlaps the sidewalk.
- Time: The current in-game time.
- Player Acceleration: A 3D vector with the agent’s acceleration with respect to the world coordinate system.
- Player Orientation: A unit-length vector corresponding to the agent car orientation.
- Sensor readings: The current readings from the set of camera sensors.
- Non-Client-Controlled agents information: The positions, orientations and bounding boxes for all pedestrians and cars present in the environment.
- Traffic Lights information: The position and state of all traffic lights.
- Speed Limit Signs information: Position and readings from all speed limit signs.
Reference: Page 11
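As a rough idea of how these measurements are accessed, here is a minimal sketch against the CARLA Python API (0.9.x style). The host, port and the actor filter are assumptions for illustration; the snippet only reads position, orientation and speed of one vehicle from a running server.

```python
import math
import carla  # CARLA Python client library

client = carla.Client("localhost", 2000)   # server host and port (assumed defaults)
client.set_timeout(10.0)
world = client.get_world()

vehicles = world.get_actors().filter("vehicle.*")
if vehicles:
    ego = vehicles[0]
    transform = ego.get_transform()   # 3D position and orientation in world coordinates
    velocity = ego.get_velocity()     # 3D vector in m/s
    speed_kmh = 3.6 * math.sqrt(velocity.x ** 2 + velocity.y ** 2 + velocity.z ** 2)
    print(transform.location, transform.rotation, round(speed_kmh, 1), "km/h")
```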
3.3. Low-key solution
Import data via a cron job every x seconds (a minimal polling sketch is shown below).
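A cron entry would call a one-shot import script on a fixed schedule; the sketch below shows the same idea as a simple polling loop. The source URL and output file are placeholders, not an existing API.

```python
import time
import requests

POLL_INTERVAL_SECONDS = 30                       # the "x amount of seconds"
SOURCE_URL = "https://example.org/fleet/latest"  # placeholder endpoint

while True:
    response = requests.get(SOURCE_URL, timeout=10)
    # Append each fetched snapshot to a local file for later processing.
    with open("imported_data.jsonl", "a", encoding="utf-8") as sink:
        sink.write(response.text.strip() + "\n")
    time.sleep(POLL_INTERVAL_SECONDS)
```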
3.4. Live Telematic Truck Data from Tening Njie's Master thesis
- The orange-marked columns in the truckdata data frame are only used to calculate the respective conditions; they are removed afterwards
- The description of the table and its values is under 1.2.2.
3.4.1. Data frame "livedata"
- contains all the time slices of the individual trucks in adjusted and prepared form (here we are talking about the datasets "livedata_cleaned" and "livedata_categorized")
- represents the basis for the data frame "truckdata", which aggregates various information at the level of the individual trucks (a small cleaning sketch follows the attribute table below)
Attributes
Attributes | Description |
---|---|
_id | |
boxId | identifies the truck (and in our case also the driver, 1:1 relationship) |
Position | divided into three attributes: longitude, latitude, type (always with the value "point", not very relevant) |
altitude | always with the value 0, not relevant |
acceleration | |
brakePedal | boolean value (1 for using the brake pedal and 0 for not using it) |
consumption | km driven/amount of fuel used |
currentSpeed and lastSpeed | not converted |
timeStamp, currentTimeStamp and lastTimeStamp | |
horizontalAccuracy | values from 1 to 7 |
verticalAccuracy | single value of 20 |
humidity | values from 31 to 93 (%) |
visibility | single value of 0, not useful |
id | second id, still no idea why |
pressure | values from 969 to 1033 |
temperature | values from 5.9 to 11.9 (degrees Celsius) |
speed | speed in two forms, converted to km/h; we will use speed_kmH |
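This is a minimal sketch of how the raw "livedata" could be prepared, assuming it has been loaded into a pandas DataFrame with the columns from the table above. It is not the exact cleaning performed in the thesis, and the file name is a placeholder.

```python
import pandas as pd

livedata = pd.read_json("livedata.json")  # placeholder export of the raw collection

livedata_cleaned = (
    livedata
    .drop(columns=["altitude", "visibility"])                    # constant 0, carry no information
    .assign(brakePedal=lambda d: d["brakePedal"].astype(bool))   # enforce boolean encoding
)
print(livedata_cleaned.dtypes)
```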
3.4.2. Data frame "truckdata"
- derived from the data frame "livedata"
- the aggregation of the data set creates, based on the boxId, a data view that looks at the individual trucks
- the following data are decisive for the grouping of the control strategies: speed_km, consumption, acceleration, brakePedal
Attributes
Attributes | Description |
---|---|
boxId | identifies the truck (and in our case also the driver, 1:1 relationship) |
brakePedal | no average values can be determined here; also no min or max values, since these are boolean values. This column is used for the extraction of further information: for each boxId it is determined how many times the brakePedal was used, and the results are saved in the columns brakePedal_0 and brakePedal_1. The brake_ratio is the frequency of how often the brake pedal was used. The result is saved in the truckdata data frame. The two columns fast_count_ratio and slow_count_ratio are formed by setting the two columns fast_count and slow_count in relation to the column id_count (see image1). |
acceleration | min, max values and arithmetic mean |
consumption | min, max values and arithmetic mean |
speed_km | min, max values and arithmetic mean. Using the speed column we can determine how many times the upper or the lower speed limit was exceeded; increased fuel consumption values are recorded when certain speed values are exceeded or not reached. In the master thesis, 20 km/h and 80 km/h were set as exemplary threshold values; the exceeding of one of the limits is checked per boxId. |
Id | the number of lines written per truck is determined for the column Id; this can later be used to set values in relation to the number of time slices written |
To sum up, the "truckdata" dataset is the most relevant for controlling decisions. It contains cleaned and converted values that could be helpful in the decision-making process.
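A minimal pandas sketch of the aggregation described above, assuming "livedata_cleaned" has the columns from section 3.4.1. The 20 km/h and 80 km/h thresholds follow the master thesis; the exact result column names are chosen for illustration.

```python
import pandas as pd

SLOW_LIMIT_KMH, FAST_LIMIT_KMH = 20, 80  # exemplary thresholds from the thesis

def build_truckdata(livedata_cleaned: pd.DataFrame) -> pd.DataFrame:
    df = livedata_cleaned.assign(
        slow=lambda d: d["speed_kmH"] < SLOW_LIMIT_KMH,
        fast=lambda d: d["speed_kmH"] > FAST_LIMIT_KMH,
    )
    truckdata = df.groupby("boxId").agg(
        id_count=("id", "count"),            # number of time slices written per truck
        brakePedal_1=("brakePedal", "sum"),  # how often the brake pedal was used
        acceleration_min=("acceleration", "min"),
        acceleration_max=("acceleration", "max"),
        acceleration_mean=("acceleration", "mean"),
        consumption_mean=("consumption", "mean"),
        speed_km_mean=("speed_kmH", "mean"),
        slow_count=("slow", "sum"),
        fast_count=("fast", "sum"),
    )
    truckdata["brakePedal_0"] = truckdata["id_count"] - truckdata["brakePedal_1"]
    truckdata["brake_ratio"] = truckdata["brakePedal_1"] / truckdata["id_count"]
    truckdata["slow_count_ratio"] = truckdata["slow_count"] / truckdata["id_count"]
    truckdata["fast_count_ratio"] = truckdata["fast_count"] / truckdata["id_count"]
    return truckdata
```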