Data understanding - Kennu76/DataMiningProject GitHub Wiki
Gathering data
Data requirements
For this project and its goals we require the following data:
- GPS data (localization data) of robots
- Accelerometer data of robots
- Gyroscope data of robots
- Magnetometer data of robots
Data availability
The data was provided by Starship Technologies via a protected SSH connection.
Selection criteria
For this project we only need data from robots that travelled in the Tallinn area. We use all of the localization data, since it already covers only robots that travelled in Tallinn. From the sensor data we keep only the rows of robots that appear in the localization data. The sensor data is in JSON, and from it we need the fields botid, time, orientation_delta, accel_vec and the gyroscope data.
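Because the sensor data is around 100 GB, this selection is easiest to apply while streaming the file rather than loading it all at once. The following is a minimal sketch, assuming the file stores one JSON object per line (as in the example row under Describing data) and that the set of Tallinn bot IDs has already been collected from the localization data. The function name `select_sensor_fields` is hypothetical, the timestamp is taken from `data.stamp` (using `meta.secs`/`meta.nsecs` would be an equally plausible choice), and which gyroscope fields to keep is left open.

```python
import json
from typing import Iterable, Iterator

# Fields kept from the "data" object of each record. This is an assumed
# selection; gyroscope-related fields can be appended once they are chosen.
DATA_FIELDS = ("orientation_delta", "accel_vec")


def select_sensor_fields(lines: Iterable[str], tallinn_bots: set) -> Iterator[dict]:
    """Stream JSON lines and yield reduced records for Tallinn bots only.

    Assumes one JSON object per line with "meta" and "data" keys.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        botid = record["meta"]["botid"]
        if botid not in tallinn_bots:
            continue  # bot does not appear in the localization data
        data = record["data"]
        stamp = data["stamp"]
        reduced = {
            "botid": botid,
            # Measurement time as one integer in nanoseconds (assumed choice:
            # data.stamp rather than meta.secs/meta.nsecs).
            "time_ns": stamp["secs"] * 1_000_000_000 + stamp["nsecs"],
        }
        for field in DATA_FIELDS:
            reduced[field] = data.get(field)  # None if the field is absent
        yield reduced
```

Because this is a generator, passing an open file handle as `lines` reads the file one record at a time instead of holding all of it in memory.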
Describing data
The data provided by Starship comes in two parts: the localization dataset and the sensor data. The localization dataset covers bots in Tallinn and has 5 million rows and 6 columns; each row gives the GPS coordinates of a bot at a point in time. The dataset is about 278 MB in size.
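One pitfall when loading the localization CSV: a bot ID such as `6E5` (see the example row) looks like scientific notation, so a CSV reader that infers column types may turn it into the number 600000.0. A minimal pandas sketch follows that forces `botid` to stay a string and adds an integer nanosecond column for later matching with the sensor data. It assumes `timestamp` is Unix time in seconds with millisecond precision, as the example row suggests; the function name `load_localization` and the `time_ns` column are hypothetical.

```python
import pandas as pd

EXPECTED_COLUMNS = {"botid", "timestamp", "coordinates_long",
                    "coordinates_lat", "heading", "stdev"}


def load_localization(source) -> pd.DataFrame:
    """Read the localization CSV (path or file-like), keeping bot IDs as text."""
    df = pd.read_csv(source, dtype={"botid": str})
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Unix seconds with millisecond precision -> integer nanoseconds.
    # Rounding to whole milliseconds first removes floating-point noise.
    ms = (df["timestamp"] * 1000).round().astype("int64")
    df["time_ns"] = ms * 1_000_000
    return df
```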
Example row with header:
botid,timestamp,coordinates_long,coordinates_lat,heading,stdev
6E5,1499097601.454,24.663108,59.397973,-0.716641,0.027637
The sensor data is bot telemetry in JSON format, recorded roughly every 0.04 seconds (about 25 times per second). The dataset is about 100 GB in size. It contains accelerometer readings along the x, y and z axes (accel_vec), orientation changes (orientation_delta, with x, y, z and w components) and magnetometer data (magnetometer_azimuth). The telemetry also contains raw byte dumps (the *_iio_bytes fields), which are probably not relevant for our project.
Example row: {"meta": {"botid":"6D80","secs":1498913579,"nsecs":917155995}, "data":{"stamp":{"secs":1498913579,"nsecs":901259179},"orientation_delta":{"x":4.231929779052734e-06,"y":6.126239895820618e-06,"z":-2.557411789894104e-06,"w":1},"accel_vec":{"x":-0.004735172260552645,"y":0.007237337529659271,"z":-1.00299870967865},"magnetometer_azimuth":2.708369493484497,"magnetometer_azimuth_updated":true,"standstill_detected":true,"gyro_model_name":"mpu6050","nr_of_active_gyros":2,"estimated_gyro_stdev_per_sec":6.170670530991629e-05,"estimated_gyro_systematic_error_stdev_per_sec":2.801343180180993e-05,"estimated_gyro_sensitivity_error_stdev":0.005236493423581123,"gyro_iio_bytes":{"layout":{"dim":[],"data_offset":0},"data":[]},"accel_iio_bytes":{"layout":{"dim":[],"data_offset":0},"data":[]},"magnetometer_iio_bytes":{"layout":{"dim":[],"data_offset":0},"data":[235,255,63,0,239,255]},"invensense_gyro_iio_bytes":{"layout":{"dim":[],"data_offset":0},"data":[150,15,229,111,2,0,128,0,64,178,68,158,253,118,244,33,30,60,149,245,53,205,20,0,32,217,255,13,0,219,255,33,30,60,149,245,53,205,20,0,64,102,68,132,253,100,244,72,195,212,149,245,53,205,20,0,32,218,255,13,0,220,255,72,195,212,149,245,53,205,20,0,64,150,68,194,253,114,244,124,104,109,150,245,53,205,20,0,32,218,255,14,0,219,255,124,104,109,150,245,53,205,20,0,64,102,68,150,253,122,244,171,13,6,151,245,53,205,20,0,32,219,255,13,0,220,255,171,13,6,151,245,53,205,20,3,0,128,0,64,100,197,166,253,20,240,255,230,15,149,245,53,205,20,0,32,213,255,255,255,201,255,255,230,15,149,245,53,205,20,0,64,136,197,214,253,82,240,57,82,168,149,245,53,205,20,0,32,213,255,255,255,200,255,57,82,168,149,245,53,205,20,0,64,104,197,146,253,2,240,104,189,64,150,245,53,205,20,0,32,213,255,255,255,200,255,104,189,64,150,245,53,205,20,0,64,136,197,188,253,254,239,134,40,217,150,245,53,205,20,0,32,213,255,254,255,200,255,134,40,217,150,245,53,205,20]},"reference_gyro_iio_bytes":{"layout":{"dim":[],"data_offset":0},"data":[]}}}
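Each sensor record nests several objects (`meta`, `data`, `stamp`, `accel_vec` and so on), so turning a record into a flat table row is a natural first step. A minimal sketch, assuming one JSON object per line: it flattens nested objects into dot-separated column names such as `data.accel_vec.x` and drops the `*_iio_bytes` raw dumps, on the assumption that they are not relevant. The function names are hypothetical.

```python
import json


def flatten_record(record: dict, prefix: str = "",
                   skip_suffix: str = "_iio_bytes") -> dict:
    """Flatten nested JSON objects into dot-separated keys.

    Keys ending in `skip_suffix` (the raw byte dumps) are dropped entirely.
    """
    flat = {}
    for key, value in record.items():
        if key.endswith(skip_suffix):
            continue  # raw byte dump, assumed irrelevant here
        name = prefix + key
        if isinstance(value, dict):
            # Recurse into nested objects such as "data" or "accel_vec".
            flat.update(flatten_record(value, name + ".", skip_suffix))
        else:
            flat[name] = value
    return flat


def flatten_line(line: str) -> dict:
    """Parse one JSON line of sensor telemetry and flatten it."""
    return flatten_record(json.loads(line))
```

A list of such flat dicts can be passed straight to `pandas.DataFrame`. `pandas.json_normalize` does similar flattening with the same dot separator, but it would keep the byte dumps unless they are removed first.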
Exploring data
The data may not cover every region of Tallinn.
Data preparation
- Filter out bots that are not in the localization data.
- Convert the JSON sensor records into a regular (flat) dataframe.
- Merge the sensor data with the localization data by matching each sensor record to the localization record closest in time, at nanosecond resolution.
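The filtering and nearest-time merge can be sketched with pandas' `merge_asof`, which, with `by` and `direction="nearest"`, matches each sensor row to the localization row of the same bot with the closest timestamp. This assumes both tables are already dataframes with a string `botid` column and an integer nanosecond `time_ns` column; the function name `merge_nearest` and the one-second `tolerance_ns` default are hypothetical choices, not values from the project.

```python
import pandas as pd


def merge_nearest(sensor: pd.DataFrame, localization: pd.DataFrame,
                  tolerance_ns: int = 1_000_000_000) -> pd.DataFrame:
    """Attach to each sensor row the closest-in-time localization row of the same bot.

    Both frames need a "botid" column and an integer "time_ns" column.
    Sensor rows with no localization row within `tolerance_ns` get NaN.
    """
    # Keep only bots that are present in the localization data.
    sensor = sensor[sensor["botid"].isin(localization["botid"])]
    # merge_asof requires both frames to be sorted by the "on" key.
    sensor = sensor.sort_values("time_ns")
    localization = localization.sort_values("time_ns")
    return pd.merge_asof(
        sensor,
        localization,
        on="time_ns",
        by="botid",
        direction="nearest",
        tolerance=tolerance_ns,
    )
```

The tolerance matters because without it a sensor row would still be paired with a far-away localization point wherever GPS data is missing, which is relevant given that some regions of Tallinn may not be covered.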
Verifying data quality
We consider the data to be of good quality, since it is raw data obtained directly from the source.
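This can also be spot-checked. One simple check is whether sensor records really arrive about every 0.04 seconds, since longer gaps would point to missing telemetry. A minimal sketch, assuming the sensor data has been loaded into a dataframe with `botid` and integer nanosecond `time_ns` columns; the function name `sampling_gaps` and the threshold of five times the expected interval are hypothetical.

```python
import pandas as pd


def sampling_gaps(sensor: pd.DataFrame, expected_s: float = 0.04,
                  factor: float = 5.0) -> pd.DataFrame:
    """Return sensor rows that follow a gap much longer than the expected interval.

    `sensor` needs "botid" and integer nanosecond "time_ns" columns.
    """
    ordered = sensor.sort_values(["botid", "time_ns"])
    # Time since the previous record of the same bot, in seconds
    # (NaN for each bot's first record, which is never flagged).
    gap_s = ordered.groupby("botid")["time_ns"].diff() / 1e9
    return ordered.assign(gap_s=gap_s)[gap_s > expected_s * factor]
```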