Home - JohnTigue/idots GitHub Wiki

This is the start page of the wiki for The IDOTS Project, the artifacts of which are archived in this git repository. IDOTS is an acronym for Infectious Disease Outbreak Time Series.

The goal of this project is to enable the expression of infectious disease outbreak time series data in a Web-native, machine readable fashion. The intent is to make outbreak time series data become first class objects of the Web, via encodings based on the Web's native data standards such as JSON and CSVW. In other words, the goal is to define a Web-friendly data interchange mechanism for IDOTS based on CSV files. For further elaboration of the project goals, see Goals.

Introduction

The core objective of this project is to make infectious disease outbreak data a first-class object type in the World Wide Web (the Web). The "first-class" distinction implies data in the Web, rather than simply through the Web: data expressed via Web-native standards that work well both "on line" and "off line."

This project has two main deliverables:

A web-friendly, machine-readable data standard for infectious disease outbreak time series, known as Outbreak Time Series Specification
Open source software which visualizes such time series data

To further the goal of expressing infectious disease outbreak data via Web-native standards, this project is producing Web software which visualizes outbreak data found on the web. The best example of that is the Omolumeter, which is deployed live at http://omolumeter.com. The Outbreak Time Series Specification (the Spec) defines a data interchange format and the Omolumeter covers the case where data is exchanged from a static Web server to a separate web-app which visualizes the data.

Additionally, a RESTful Web data service is maintained containing a library of outbreak time series. The Outbreak Time Series Service API (the Outbreak API) is freely usable by web-apps across the web. The library's data is also maintained in the outbreak_time_series_data git repository. The library contains curated/compiled data on various outbreaks. (Note: git repository changes can be digitally signed by, say, governmental organizations, NGOs, etc.)

The curate data is drawn from existing published data found on the web. That data is transformed (if necessary) to comply with the Outbreak Time Series Specification. Transformation way well not be needed as the Spec is designed to be able to map and link to existing CSV data on the Web, data which may well not have been generated with knowledge of the Spec.

Existing data from around the web is cached and cleaned up, with sources credited. The aggregating cached copies of the CSVs are hosted on a server with HTTP CORS configures, so that the CSVs can be loaded as a data source by web-apps across the Web.

Eventually it is expected that more publishers will choose to host their own CSVW csv-metadata.json files along with their CSV files. Simply by adding a csv-metadata.json file to a Web server (with CORS enabled), a publisher can become a "live, direct" outbreak time series data source to web-apps.

Scope

Together the visualization tools and the data spec can be deployed over non-exotic Web and Internet infrastructure technologies in order to create what could be labeled "a global infectious disease outbreak monitoring network," if one were feeling hyperbolic.

The scope of this project is purely digital: there is nothing physical or biological within scope. This is only about data collection, collation, and distribution as enabled by data standards for outbreak data, and software which works with that standard.

The architecture of the monitoring network referred to is Internet-style, without the architectural weakness of a centralized authority. Such a network can continue to function in times of crisis. A crisis might be solely one of trust. This project addresses that use case: anyone could publish to the network, while optionally signing the data.

In other words, realizing "a global infectious disease outbreak monitoring network" could be boiled down to surprising little novel technology. The novel parts are just a Web data standard and some software to work with data compliant with the standard. The rest of the components of the solution are simply reused existing off-the-shelf free and open technologies. Those components include:

Git (read: distributed version control technology)
digital signatures
Existing disease ontologies
Various Web standards: CSVW, packaging and naming standards, etc.

This project aims to create as little new technology as required (ala Clay Christensen):

Define a simple outbreak time series data model atop the CSV format, with the help of CSVW JSON.
Define a way to take a collection of those CSV and JSON files and package them up in, say, a ZIP file or an HTTP cache.
Produce some Web software (read: HTML and JavaScript) that can read the above well defined, simple data structure.

The above file packaging structure (item 2) amounts to simply having naming patterns for files and directory names. A HTTP cache could be in a Web server or it could be in a client web-app's Service Worker cache for when mobile devices are not connected to a wireless data network.

Scope: any infectious disease

This project initially started during the 2014 Ebola Outbreak in West Africa. As such the data standard and software are specifically designed to work in situations where connectivity to the Internet is limited, intermittent, or nonexistent. This is actually one of the features that differentiates this project from other excellent visualizations tools, which usually require a connection to dynamic content Web servers (although that should improve over time). Additionally, this data and software can be transferred and run via, say, USB flash drives.

The project has been mindfully designed with an eye for multiple uses. Although the Ebola 2014 crisis is what engendered this project, there is nothing Ebola 2014 specific baked into this effort. The next outbreak will happen within three or four years, seemingly. The pace of outbreaks is increasing over the years. Let's get the Web data standards and free software stuff out of the way.

Scope: users

Potential users of Outbreak Time Series Spec include:

Researchers exchanging and/or distributing outbreak data
News organizations large and small: provide free quality web software for interactive journalism on infectious disease outbreaks
Governments large and small
NGOs who need highly engaging tools with which to promote their causes
Local individuals: those special individuals in the heat of things who get things started all by their lonesome

Scope: out of scope

Note that the scope of the project is intentionally limited to population level information only; out of scope are issues like contact tracing, the use of private phone data, or anything that involves "personally identifiable information" (PII), even if such information is anonymized.

Nor is this project about fulfilling roles such as what Oxford's SEEG is working on with the Malaria Atlas Project. SEEG closely collaborates with the WHO and has private agreements with multiple governments to centralize potentially sensitive information. That is more of a backend services going through official channels. The Outbreak Time Series Spec is very much an edge-of-network front end, Web focused effort designed to work even when the is no connection to the global network.

Status

Omolumeter, the primary demo app, is at version 0.4.5 (published on 2016-05-05). It is live, using Ebola data, at http://omolumeter.com. It is mobile web-friendly. It is not yet packaged as a native app (via Cordova), nor does it yet qualify as a PWA. The next enhancement is to add geographical maps.

Currently (as of 2016-06-11), the Outbreak Time Series Specification is going through a major rewrite from v0.0.1 to v0.1.0. Essentially, it was realized that the core serialization format should be based on CSVW. As this is happening the spec is being tested to see if it can handle data on multiple outbreaks from multiple CSV data sources.

Most specifically, an effort is underway to define an Indicator Ontology so the machine readable encodings can be generated for terms such as the following: "all deaths in the last 7 days," or "cumulative suspected cases to date, in children." The Spec needs a vocabulary of domain objects, the domain being infectious disease monitoring.

The outbreak_time_series_data git repository is only trivially populated. Once v0.1.0 of the Spec is further along, that would be the appropriate time to populate that repository.

Further information

A good non-technical summary of how this project started and its status in December 2014 is the AODG Report 1 from John Tigue.

For illustrations of what motivated this project, check out Motivation.

A highly visual Gallery of Ebola Visualizations Found Across the Web is maintained in this wiki.

To learn more about the Outbreak Time Series Specification, start with the Outbreak Time Series Specification Overview.

If you want to get right to coding against some outbreak data, there is JavaScript code to help in the outbreak_time_series_reader module.

Contributors: coders and epidemiological domain experts

Does any of this sound interesting? Well, get some caffeine and check out the To Do List for an overview where things are, and aren't. Of course, the issue tracker is where the nitty gritty details get hashed out.