Data Foundation
1. Summary
For our dashboard, we decided to focus on generating business value for the potential customer. We therefore wanted to enable the fleet manager to quickly get an overview of what is going on within their fleet. That is why we decided against the additional difficulty of implementing real-time data and instead use the NREL Fleet DNA dataset. The main reasons for our decision were the detailed information on acceleration and deceleration for each vehicle and each drive. We hoped to gain insight into driving behaviour and therefore the wear on the truck. For our plans with machine learning algorithms, the size of the dataset was also a significant factor in the decision.
2. Batch data
Batch processing means processing and analysing blocks of data that have already been stored over a period of time, for example all the transactions performed by a major financial firm in a week. This data contains millions of records per day and can be stored as files. Tools for processing data in batches are, for example, Hadoop or MapReduce.
Reference: Batch Processing vs Stream Processing
2.1. NREL Fleet DNA project Truck Platooning Data
This dataset contains Fleet DNA data collected by NREL between 07/2008 and 07/2014. Commercial fleet vehicle operating data, including vehicle speed distribution, starts, stops, braking, idling time, kinetic intensity, etc., were collected from 486 vehicles over a span of 4,705 days.
The raw data can be found here: NREL Fleet DNA project Truck Platooning Data
Truck Platooning Explanation
Truck platooning is the linking of two or more trucks in convoy, using connectivity technology and automated driving support systems. These vehicles automatically maintain a set, close distance between each other when they are connected for certain parts of a journey, for instance on motorways. The truck at the head of the platoon acts as the leader, with the vehicles behind reacting and adapting to changes in its movement – requiring little to no action from drivers.
Resource: Truck platooning explanation
Data Set Explanation
Information regarding the generation of the Dataset can be found here: Fleet DNA Project – Data Dictionary for Public Download Files
Overview of all attributes and rough summary can be viewed in this excel file: NREL data overview
Findings
- Every column represents one day for one vehicle and driver
- Everything is in American units
- No explanation for the calculation of some values
- Speed analysis (summarised in the sketch after this list)
  - Time and distance spent in different speed ranges
  - Analysis for the fastest and most efficient route
  - Reduce downtimes
  - Speed in miles per hour
- Acceleration and deceleration events
  - Feet per second squared
  - Acceleration/deceleration events per mile
  - Reduce for less fuel consumption
- Number of stops
  - Filtered by duration of stops
  - Stops per mile
- Vehicle elevation
  - Elevation gained/lost
  - Height change on the track
  - Could be used to optimise the route and fuel consumption
- Power density
  - Kinetic power density
  - Aerodynamic power density
  - Rolling power density
  - Potential energy
- Comparison with road data
  - Different categories of roads
    - C1 (80+ mph speed limit)
    - C2 (70 - 80 mph speed limit)
    - C3 (60 - 70 mph speed limit)
    - C4 (50 - 60 mph speed limit)
    - C5 (40 - 50 mph speed limit)
  - Acceleration, speed and distance covered on the different categories
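To get a first feeling for the batch data, a vehicle-level summary can be computed directly from the downloaded composite file. This is a minimal pandas sketch; the file name and the column names ("vid", "distance_total", "max_speed", "total_stops") are placeholders for illustration and do not necessarily match the exact names in the NREL download.

```python
import pandas as pd

# Hypothetical column names - check them against the Fleet DNA data dictionary.
df = pd.read_csv("fleet_dna_composite.csv")

# One row per vehicle-day: aggregate per vehicle for a fleet-wide overview.
per_vehicle = df.groupby("vid").agg(
    days=("vid", "size"),
    total_distance_mi=("distance_total", "sum"),
    avg_max_speed_mph=("max_speed", "mean"),
    total_stops=("total_stops", "sum"),
)
per_vehicle["stops_per_mile"] = per_vehicle["total_stops"] / per_vehicle["total_distance_mi"]
print(per_vehicle.sort_values("stops_per_mile", ascending=False).head())
```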
2.2. Batch Telematic Truck Data from Tening Njie's Master thesis
- The orange-marked columns in the truckdata data frame are only used to calculate the respective conditions; they are removed afterwards
- The description of the table and its values is under 1.2.2.
3. Real-time data
Stream processing is the approach of choice if you want analytics results in real time. It allows you to feed data into analytics tools as soon as the data are generated and to get instant analytics results. There are multiple open-source stream processing platforms such as Apache Kafka, Apache Flink, Apache Storm and WSO2 Stream Processor. The subsections below describe possible options for obtaining real-time data for our dashboard (a minimal consumer sketch follows the reference):
Reference: https://medium.com/@gowthamy/big-data-battle-batch-processing-vs-stream-processing-5d94600d8103
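To illustrate the stream-processing idea of consuming events as they are produced, here is a minimal sketch using the kafka-python client. The topic name, broker address and message fields are assumptions for illustration only, not part of an existing setup.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Subscribe to a (hypothetical) telemetry topic and process each event on arrival.
consumer = KafkaConsumer(
    "truck-telemetry",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:  # blocks and yields events as soon as they are produced
    event = message.value
    print(event.get("boxId"), event.get("speed_kmH"))
```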
3.1. SUMO (Simulation of Urban MObility)
SUMO website
SUMO is a traffic simulation package. It is meant to be used to simulate networks of a city's size, but it can of course also be used for smaller and larger networks if enough computing power is available.
Usage Examples
Since 2001, the SUMO package has been used in the context of several national and international research projects. The applications included:
- traffic lights evaluation
- route choice and re-routing
- evaluation of traffic surveillance methods
- simulation of vehicular communications
- traffic forecast
3.1.1. "TAPAS Cologne" Scenario
SourceForge repository or TAPAS Cologne Information in SUMO documentation
The TAPAS Cologne scenario is assumed to be one of the largest - if not THE largest - freely available traffic simulation data sets. Regarding the scenario size, both the road network and the traffic demand are given in good quality. Nonetheless, much important information is missing or wrong, and much further effort is needed to make the scenario realistic and complete.
3.1.2. Simulation/Output
Reference
All output files written by SUMO are in XML format by default. However, with the Python tool xml2csv.py you can convert any of them to a flat-file (CSV) format which can be opened with most spreadsheet software. If you need a more compressed but still "standardized" binary version, you can use xml2protobuf.py. Furthermore, all files can use a custom binary format which is triggered by the file extension .sbx. A small conversion sketch follows the list of outputs below.
Available outputs:
- raw vehicle positions dump: all vehicle positions over time contains: positions and speeds for all vehicles for all simulated time steps used for: obtaining movements of nodes (V2V, for ns-2)
- fcd output: Floating Car Data includes name, position, angle and type for every vehicle
- trajectories output: Trajectory Data following includes name, position, speed and acceleration for every vehicle following the Amitran standard
- surrogate safety measures (SSM): Output of safety related measures, headway, brake rates, etc
- vehicle-based information:
- trip information: aggregated information about each vehicle's journey (optionally with emission data)
- vehicle routes information: information about each vehicle's routes over simulation run
- stop output: information about vehicle stops and loading/unloading of persons and containers
- battery usage: information about battery state for electric vehicles
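The following is a minimal sketch of how the XML output could be converted and loaded for analysis. It assumes SUMO is installed with SUMO_HOME set; the location of xml2csv.py, its --output flag and the ';' separator reflect the tool's usual layout and defaults but may differ between SUMO versions.

```python
import os
import subprocess
import pandas as pd

# xml2csv.py is usually shipped under $SUMO_HOME/tools/xml/.
SUMO_HOME = os.environ.get("SUMO_HOME", "/usr/share/sumo")
XML2CSV = os.path.join(SUMO_HOME, "tools", "xml", "xml2csv.py")

def fcd_to_dataframe(xml_path: str, csv_path: str) -> pd.DataFrame:
    """Convert a SUMO fcd-output XML file to CSV and load it with pandas."""
    subprocess.run(["python", XML2CSV, xml_path, "--output", csv_path], check=True)
    # Columns typically follow an element_attribute pattern,
    # e.g. timestep_time, vehicle_id, vehicle_speed, vehicle_x, vehicle_y.
    return pd.read_csv(csv_path, sep=";")

if __name__ == "__main__":
    fcd = fcd_to_dataframe("fcd.xml", "fcd.csv")
    print(fcd.head())
```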
Summary
SUMO might be a bit of an overkill for our project since it can do way more than we need. We would need some time just for the setup of the simulation, even if we decide to use the Cologne scenario. As described above, the data output of the simulation is quite informative and could be very useful for our project. Setting up the simulation with the Cologne scenario could produce realistic and rich data outputs, if we can master the setup in time.
3.2. CARLA Simulator
CARLA has been developed from the ground up to support development, training, and validation of autonomous driving systems. In addition to open-source code and protocols, CARLA provides open digital assets (urban layouts, buildings, vehicles) that were created for this purpose and can be used freely. The simulation platform supports flexible specification of sensor suites, environmental conditions, full control of all static and dynamic actors, maps generation and much more.
CARLA is designed as a client-server system. The server runs and renders the CARLA world. The client provides an interface for users to interact with the simulator by controlling the agent vehicle and certain properties of the simulation.
Measurements and sensor readings
The client receives from the server the following information about the world and the player's state (see the client sketch after this list):
- Player Position: The 3D position of the player with respect to the world coordinate system.
- Player Speed: The player’s linear speed in kilometres per hour.
- Collision: Cumulative impact from collisions with three different types of objects: cars, pedestrians, or static objects.
- Opposite Lane Intersection: The current fraction of the player car footprint that overlaps the opposite lane.
- Sidewalk Intersection: The current fraction of the player car footprint that overlaps the sidewalk.
- Time: The current in-game time.
- Player Acceleration: A 3D vector with the agent’s acceleration with respect to the world coordinate system.
- Player Orientation: A unit-length vector corresponding to the agent car orientation.
- Sensor readings: The current readings from the set of camera sensors.
- Non-Client-Controlled agents information: The positions, orientations and bounding boxes for all pedestrians and cars present in the environment.
- Traffic Lights information: The position and state of all traffic lights.
- Speed Limit Signs information: Position and readings from all speed limit signs.
Reference: Page 11
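As a rough idea of how these measurements are accessed, here is a minimal sketch against the CARLA Python API (0.9.x style). The host, port and the actor filter are assumptions for illustration; the snippet only reads position, orientation and speed of one vehicle from a running server.

```python
import math
import carla  # CARLA Python client library

client = carla.Client("localhost", 2000)   # server host and port (assumed defaults)
client.set_timeout(10.0)
world = client.get_world()

vehicles = world.get_actors().filter("vehicle.*")
if vehicles:
    ego = vehicles[0]
    transform = ego.get_transform()   # 3D position and orientation in world coordinates
    velocity = ego.get_velocity()     # 3D vector in m/s
    speed_kmh = 3.6 * math.sqrt(velocity.x ** 2 + velocity.y ** 2 + velocity.z ** 2)
    print(transform.location, transform.rotation, round(speed_kmh, 1), "km/h")
```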
3.3. Low-key solution
Import data via a cron job every x seconds (a minimal polling sketch is shown below).
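A cron entry would call a one-shot import script on a fixed schedule; the sketch below shows the same idea as a simple polling loop. The source URL and output file are placeholders, not an existing API.

```python
import time
import requests

POLL_INTERVAL_SECONDS = 30                       # the "x amount of seconds"
SOURCE_URL = "https://example.org/fleet/latest"  # placeholder endpoint

while True:
    response = requests.get(SOURCE_URL, timeout=10)
    # Append each fetched snapshot to a local file for later processing.
    with open("imported_data.jsonl", "a", encoding="utf-8") as sink:
        sink.write(response.text.strip() + "\n")
    time.sleep(POLL_INTERVAL_SECONDS)
```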
3.4. Live Telematic Truck Data from Tening Njie's Master thesis
- The orange-marked columns in the truckdata data frame are only used to calculate the respective conditions; they are removed afterwards
- The description of the table and its values is under 1.2.2.
3.4.1. Data frame "livedata"
- contains all the time slices of the individual trucks in adjusted and prepared form (here we are talking about the datasets "livedata_cleaned" and "livedata_categorized")
- represents the basis for the data frame "truckdata", which aggregates various information at the level of the individual trucks (a small cleaning sketch follows the attribute table below)
Attributes
Attributes | Description |
---|---|
_id | |
boxId | identifies the truck (and in our case also the driver, 1:1 relationship) |
Position | divided into three attributes: longitude, latitude, type (always with the value "point", not very relevant) |
altitude | always with the value 0, not relevant |
acceleration | |
brakePedal | boolean value (1 for using the brake pedal and 0 for not using it) |
consumption | km driven/amount of fuel used |
currentSpeed and lastSpeed | not converted |
timeStamp, currentTimeStamp and lastTimeStamp | |
horizontalAccuracy | values from 1 to 7 |
verticalAccuracy | single value of 20 |
humidity | values from 31 to 93 (%) |
visibility | single value of 0, not useful |
id | second id, still no idea why |
pressure | values from 969 to 1033 |
temperature | values from 5.9 to 11.9 (degrees Celsius) |
speed | speed in two forms, converted to km/h; we will use speed_kmH |
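This is a minimal sketch of how the raw "livedata" could be prepared, assuming it has been loaded into a pandas DataFrame with the columns from the table above. It is not the exact cleaning performed in the thesis, and the file name is a placeholder.

```python
import pandas as pd

livedata = pd.read_json("livedata.json")  # placeholder export of the raw collection

livedata_cleaned = (
    livedata
    .drop(columns=["altitude", "visibility"])                    # constant 0, carry no information
    .assign(brakePedal=lambda d: d["brakePedal"].astype(bool))   # enforce boolean encoding
)
print(livedata_cleaned.dtypes)
```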
3.4.2. Data frame "truckdata"
- derived from the data frame "livedata"
- the aggregation of the data set creates, based on the boxId, a data view that looks at the individual trucks
- the following data are decisive for the grouping of the control strategies: speed_km, consumption, acceleration, brakePedal
Attributes
Attributes | Description |
---|---|
boxId | identifies the truck (and in our case also the driver, 1:1 relationship) |
brakePedal | no average values can be determined here; also no min or max values, since these are boolean values. This column is used for the extraction of further information: for each boxId it is determined how many times the brakePedal was used, and the results are saved in the columns brakePedal_0 and brakePedal_1. The brake_ratio is the frequency of how often the brake pedal was used. The result is saved in the truckdata data frame. The two columns fast_count_ratio and slow_count_ratio are formed by setting the two columns fast_count and slow_count in relation to the column id_count (see image1). |
acceleration | min, max values and arithmetic mean |
consumption | min, max values and arithmetic mean |
speed_km | min, max values and arithmetic mean. Using the speed column we can determine how many times the upper or the lower speed limit was exceeded; increased fuel consumption values are recorded when certain speed values are exceeded or not reached. In the master thesis, 20 km/h and 80 km/h were set as exemplary threshold values; the exceeding of one of the limits is checked per boxId. |
Id | the number of lines written per truck is determined for the column Id; this can later be used to set values in relation to the number of time slices written |
To sum up, the "truckdata" dataset is the most relevant for controlling decisions. It contains cleaned and converted values that could be helpful in the decision-making process.
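A minimal pandas sketch of the aggregation described above, assuming "livedata_cleaned" has the columns from section 3.4.1. The 20 km/h and 80 km/h thresholds follow the master thesis; the exact result column names are chosen for illustration.

```python
import pandas as pd

SLOW_LIMIT_KMH, FAST_LIMIT_KMH = 20, 80  # exemplary thresholds from the thesis

def build_truckdata(livedata_cleaned: pd.DataFrame) -> pd.DataFrame:
    df = livedata_cleaned.assign(
        slow=lambda d: d["speed_kmH"] < SLOW_LIMIT_KMH,
        fast=lambda d: d["speed_kmH"] > FAST_LIMIT_KMH,
    )
    truckdata = df.groupby("boxId").agg(
        id_count=("id", "count"),            # number of time slices written per truck
        brakePedal_1=("brakePedal", "sum"),  # how often the brake pedal was used
        acceleration_min=("acceleration", "min"),
        acceleration_max=("acceleration", "max"),
        acceleration_mean=("acceleration", "mean"),
        consumption_mean=("consumption", "mean"),
        speed_km_mean=("speed_kmH", "mean"),
        slow_count=("slow", "sum"),
        fast_count=("fast", "sum"),
    )
    truckdata["brakePedal_0"] = truckdata["id_count"] - truckdata["brakePedal_1"]
    truckdata["brake_ratio"] = truckdata["brakePedal_1"] / truckdata["id_count"]
    truckdata["slow_count_ratio"] = truckdata["slow_count"] / truckdata["id_count"]
    truckdata["fast_count_ratio"] = truckdata["fast_count"] / truckdata["id_count"]
    return truckdata
```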