Explore Analytics Service data in a notebook - sedgewickmm18/mmfunctions GitHub Wiki
While Analytics Service offers an Explore tab to analyse data, it can be worthwhile to use notebooks to get a deeper understanding of the data, since raw event data accumulates in DB2 (or an alternative SQL database).
The usual Python analytics libraries such as pandas and sklearn should already be installed. We also need the Analytics Service functions package from its GitHub repository.
For the full notebook see here. For a simple way to run Jupyter locally, you could use the Jupyter container found here.
Then start by importing the libraries.
Specify credentials as follows to connect to the Analytics Service API (for metadata) and to DB2:
credentials = {
    "tenantId": "AnalyticsServiceDev",
    "as_api_host": "https://api-dev.connectedproducts.internetofthings.ibmcloud.com",
    "as_api_key": "*******************",
    "as_api_token": "******************",
    "db2": {
        "username": "bluadmin",
        "password": "****************",
        "databaseName": "BLUDB",
        "port": 50000,
        "httpsUrl": "https://dashdb-enterprise-yp-dal13-74.services.dal.bluemix.net:50000",
        "host": "dashdb-enterprise-yp-dal13-74.services.dal.bluemix.net"
    }
}
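A missing or empty entry in this dictionary tends to surface only later as a confusing connection error, so a quick sanity check before connecting can help. The following is a minimal sketch: the helper `missing_keys` and the two lists of required keys are hypothetical, derived only from the example above, and are not part of the Analytics Service package.

```python
# Keys taken from the example credentials above; this list is an assumption,
# not an official schema.
REQUIRED_TOP_LEVEL = ("tenantId", "as_api_host", "as_api_key", "as_api_token", "db2")
REQUIRED_DB2 = ("username", "password", "databaseName", "port", "host")


def missing_keys(credentials):
    """Return the dotted names of required entries that are absent or empty."""
    missing = [key for key in REQUIRED_TOP_LEVEL if not credentials.get(key)]
    db2 = credentials.get("db2") or {}
    missing += [f"db2.{key}" for key in REQUIRED_DB2 if not db2.get(key)]
    return missing


# Deliberately incomplete example to show what the check reports
example = {
    "tenantId": "AnalyticsServiceDev",
    "db2": {"username": "bluadmin", "port": 50000},
}
print(missing_keys(example))
```

An empty list means every expected entry is present; it says nothing about whether the values are actually valid.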
Now open the database
and load the last 30 days of data from it
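The 30-day window boils down to two timestamps. A minimal sketch of computing them with the standard library follows; the helper `last_n_days` is hypothetical, and it returns naive UTC datetimes on the assumption that the database stores timestamps in UTC.

```python
import datetime as dt


def last_n_days(days, now=None):
    """Return (start_ts, end_ts) as naive UTC datetimes spanning the last `days` days."""
    if now is None:
        # Take "now" once so both bounds refer to the same instant
        now = dt.datetime.now(dt.timezone.utc).replace(tzinfo=None)
    return now - dt.timedelta(days=days), now


start_ts, end_ts = last_n_days(30)
print(start_ts, end_ts)
```

The two values would then be passed as the start and end timestamps of `db.read_table`, in the same way as in the raw IoT data example on this page.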
Now start looking at what's in it
We have, for example, speed, torque and travel_time as metrics, and describe() shows count, mean, standard deviation, min, max and quartiles for each of them.
I'm ignoring dimensions here, which would allow investigating the data by country, for example.
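To illustrate what `describe()` returns, here is a minimal sketch on synthetic data. The column names match the metrics mentioned above, but the values, means and spreads are made up for illustration and do not come from the Analytics Service database.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the loaded data; all parameters are assumptions
rng = np.random.default_rng(seed=0)
demo = pd.DataFrame({
    "speed": rng.normal(loc=50.0, scale=5.0, size=1000),
    "torque": rng.normal(loc=12.0, scale=2.0, size=1000),
    "travel_time": rng.normal(loc=1.0, scale=0.1, size=1000),
})

# One row per statistic, one column per metric
summary = demo[["speed", "torque", "travel_time"]].describe()
print(summary)
```

Note that pandas labels the standard deviation row `std`.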
Maybe there is a correlation between torque and speed
Hmm, maybe there is not ...
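One way to put a number on that visual impression is the Pearson correlation coefficient. The sketch below uses two independently generated synthetic columns as stand-ins for the real metrics (an assumption for illustration only), so the coefficient should come out close to zero.

```python
import numpy as np
import pandas as pd

# Independent synthetic columns; the real values would come from the database
rng = np.random.default_rng(seed=1)
demo = pd.DataFrame({
    "speed": rng.normal(50.0, 5.0, 2000),
    "torque": rng.normal(12.0, 2.0, 2000),
})

# Series.corr computes the Pearson coefficient by default
r = demo["torque"].corr(demo["speed"])
print(f"Pearson r between torque and speed: {r:.3f}")
```

A value near 0 indicates no linear relationship; values near +1 or -1 would indicate a strong one.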
Same story for travel time and speed
Travel time looks remarkably close to a normal distribution, which is to be expected since we generated the data as normally distributed.
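Besides looking at a histogram, a few summary numbers give a rough check for normality: skewness and excess kurtosis should both be near zero, and about 68% of the values should lie within one standard deviation of the mean. The sketch below applies this to synthetic travel times; the mean and spread are assumed values, not those of the original data set.

```python
import numpy as np
import pandas as pd

# Synthetic, normally distributed travel times (parameters are assumptions)
rng = np.random.default_rng(seed=2)
travel_time = pd.Series(rng.normal(1.0, 0.1, 5000), name="travel_time")

skewness = travel_time.skew()          # about 0 for a normal distribution
excess_kurtosis = travel_time.kurt()   # about 0 for a normal distribution
deviation = (travel_time - travel_time.mean()).abs()
within_one_std = (deviation <= travel_time.std()).mean()  # about 0.68

print(f"skewness={skewness:.3f}, excess kurtosis={excess_kurtosis:.3f}, "
      f"share within one std={within_one_std:.3f}")
```

This is only a heuristic; a formal test would be needed to make a stronger statement.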
Reading raw IoT data
This example reads an entire day of data from DB2 into the notebook. For the full notebook, please see here.
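import datetime as dt  # `db` is the database object opened earlier
table = db.get_table("IOT_SIVASENSORTYPE1")
# One-day window ending now (UTC); take "now" once so both bounds agree
end_ts = dt.datetime.utcnow()
start_ts = end_ts - dt.timedelta(days=1)
# "publishedtime" is the timestamp column used to select the window
df = db.read_table(table, None, None, None, "publishedtime", start_ts, end_ts)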
The sensors generated almost 7000 events
with temperature, humidity and pressure as main metrics
Maybe temperature and humidity are related; let's have a closer look at their bivariate distribution
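A bivariate distribution can be summarised numerically as a two-dimensional histogram. The following sketch uses synthetic temperature and humidity values; the negative relationship between them is built in purely as an assumption for illustration and says nothing about the real sensor data.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
temperature = rng.normal(22.0, 3.0, 7000)
# Assumed relationship for illustration: humidity falls as temperature rises
humidity = 60.0 - 1.5 * (temperature - 22.0) + rng.normal(0.0, 4.0, 7000)

# counts[i, j] is the number of events in temperature bin i and humidity bin j
counts, temp_edges, hum_edges = np.histogram2d(temperature, humidity, bins=20)
r = np.corrcoef(temperature, humidity)[0, 1]

print(counts.shape, int(counts.sum()))
print(f"Pearson r between temperature and humidity: {r:.3f}")
```

In a notebook, the same pair of columns could be plotted with, for example, seaborn's `jointplot`, which draws the joint distribution together with both marginal distributions.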