Explore Analytics Service data in a notebook - sedgewickmm18/mmfunctions GitHub Wiki

While the Analytics Service offers an Explore tab to analyse data, it can be worthwhile to use notebooks to gain a deeper understanding of the data, since the raw event data is accumulated in DB2 (or an alternative SQL database).

The typical analytic Python libraries such as pandas and sklearn should already be installed. We also need the Analytics Service functions package from its GitHub repository.

For the full notebook see here. For a simple way to run Jupyter locally you could use the Jupyter container found here.

Then start by importing the libraries - pandas, numpy, matplotlib and the datetime module are the ones assumed in the snippets below:

import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Specify credentials as follows to connect to the Analytics Service API for metadata and to DB2:

credentials = {
  "tenantId": "AnalyticsServiceDev",
  "as_api_host": "https://api-dev.connectedproducts.internetofthings.ibmcloud.com",
  "as_api_key": "*******************",
  "as_api_token": "******************",
  "db2": {
    "username": "bluadmin",
    "password": "****************",
    "databaseName": "BLUDB",
    "port": 50000,
    "httpsUrl": "https://dashdb-enterprise-yp-dal13-74.services.dal.bluemix.net:50000",
    "host": "dashdb-enterprise-yp-dal13-74.services.dal.bluemix.net"
  }
}
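Rather than hardcoding secrets in the notebook, the credentials dictionary can be kept in a separate JSON file and loaded at runtime - a minimal sketch (the filename `credentials_as.json` is an assumption; keep such a file out of version control):

```python
import json

# Load the credentials dictionary from a JSON file kept next to the notebook
# (the filename is an example, not part of the Analytics Service API).
def load_credentials(path="credentials_as.json"):
    with open(path) as f:
        return json.load(f)
```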

Now open the database:

from iotfunctions.db import Database

db = Database(credentials=credentials)

and load the last 30 days of data from it - read_table is the same call used in the raw-data example further down:

table = db.get_table("IOT_MY_ENTITY_TYPE")  # placeholder; use your entity type's table
end_ts = dt.datetime.utcnow()
start_ts = end_ts - dt.timedelta(days=30)
df = db.read_table(table, None, None, None, "publishedtime", start_ts, end_ts)

Now start looking at what's in it:

df.head()
df.describe()

We have, for example, speed, torque and travel_time as metrics, and describe() shows count, max, min, mean, standard deviation etc.

I'm ignoring dimensions here - these would, for example, allow investigating the data by country.
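If dimension columns were merged into the event DataFrame, per-dimension statistics would be one groupby away. A sketch on synthetic data (the "country" dimension and its values are hypothetical):

```python
import pandas as pd

# Per-dimension aggregation: mean speed per country
# ("country" and the values below are made up for illustration).
demo = pd.DataFrame({
    "country": ["DE", "DE", "US", "US"],
    "speed": [100.0, 110.0, 90.0, 95.0],
})
by_country = demo.groupby("country")["speed"].mean()
```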

Maybe there is a correlation between torque and speed:

df.plot.scatter(x="torque", y="speed")

Hmm, maybe there is not ...
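Eyeballing a scatter plot can deceive, so a quick numeric check with pandas' corr() makes the weak relationship explicit. A sketch on synthetic, independent data (column names taken from the metrics above; the real DataFrame is not reproduced here):

```python
import numpy as np
import pandas as pd

# Pearson correlation between two metric columns;
# values near 0 indicate no linear relationship.
def linear_correlation(df, col_a, col_b):
    return df[col_a].corr(df[col_b])

# Independent synthetic samples stand in for the real metrics,
# so the correlation should come out near 0.
rng = np.random.default_rng(42)
demo = pd.DataFrame({"torque": rng.normal(10, 2, 5000),
                     "speed": rng.normal(100, 15, 5000)})
r = linear_correlation(demo, "torque", "speed")
```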

The same story for travel time and speed:

df.plot.scatter(x="travel_time", y="speed")

Travel time is surprisingly well distributed - which it actually should be, since we generated the data as normally distributed.

df["travel_time"].hist(bins=50)
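Beyond the histogram, normality can be sanity-checked numerically with the 68-95-99.7 rule. A sketch on synthetic data, since the real travel_time column is not reproduced here (the mean and spread below are arbitrary):

```python
import numpy as np

# For a normal distribution, ~68% of samples fall within 1 standard
# deviation of the mean and ~95% within 2 standard deviations.
rng = np.random.default_rng(0)
sample = rng.normal(loc=30.0, scale=5.0, size=5000)  # synthetic travel_time
mu, sigma = sample.mean(), sample.std()
within_1 = np.mean(np.abs(sample - mu) < sigma)
within_2 = np.mean(np.abs(sample - mu) < 2 * sigma)
```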

Reading raw IoT data

This example reads an entire day of data from DB2 into the notebook - for the full notebook please see here.

# fetch all events of the last 24 hours, filtering on the publishedtime column
table = db.get_table("IOT_SIVASENSORTYPE1")
start_ts = dt.datetime.utcnow() - dt.timedelta(days=1)
end_ts = dt.datetime.utcnow()
df = db.read_table(table, None, None, None, "publishedtime", start_ts, end_ts)
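Raw event tables often come back with the timestamp as a plain column; a common next step is to index by it and resample. A sketch on a synthetic frame (the column names publishedtime and temperature mirror the example above):

```python
import numpy as np
import pandas as pd

# Index the events by their timestamp and downsample to hourly means -
# convenient for plotting and for joining with other time series.
def hourly_means(df, ts_col="publishedtime"):
    out = df.copy()
    out[ts_col] = pd.to_datetime(out[ts_col])
    return out.set_index(ts_col).sort_index().resample("1h").mean()

# synthetic stand-in for the DB2 result: two hours of minute-level readings
demo = pd.DataFrame({
    "publishedtime": pd.date_range("2020-01-01", periods=120, freq="1min"),
    "temperature": np.linspace(20.0, 22.0, 120),
})
hourly = hourly_means(demo)
```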

The sensors generated almost 7000 events

df.shape

with temperature, humidity and pressure as the main metrics

df.describe()

Maybe temperature and humidity are related; let's have a closer look at their bivariate distribution:

import seaborn as sns

sns.jointplot(x="temperature", y="humidity", data=df, kind="kde")