# Data Ingestion

Unbelievably (or, more accurately, disappointingly), the world seems to be in a state where Ebola data acquisition revolves around screen scraping of web pages from the CDC, WHO, etc.

To address this insanity, Luis Capelo banged out a tool:

> Simple, ScraperWiki-based email-alert script that sends a warning every time WHO updates their Ebola cases figure.
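The idea is simple enough to sketch: hash the page, compare against the last run, and send mail on a change. Below is a minimal Python sketch of that pattern, not Capelo's actual script; the WHO URL, the email addresses, and the SMTP host are all placeholder assumptions.

```python
# Minimal sketch of a page-change email alert (hypothetical, not Capelo's script).
# The URL, addresses, and SMTP host below are placeholders.
import hashlib
import smtplib
from email.message import EmailMessage

import requests

WHO_URL = "http://www.who.int/csr/don/en/"  # placeholder page to watch
STATE_FILE = "last_hash.txt"                # remembers the last-seen page hash


def page_fingerprint(url: str) -> str:
    """Fetch the page and return a hash of its body."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return hashlib.sha256(resp.content).hexdigest()


def send_alert(url: str) -> None:
    """Email a warning that the page changed (SMTP details are placeholders)."""
    msg = EmailMessage()
    msg["Subject"] = "WHO Ebola page updated"
    msg["From"] = "alerts@example.org"
    msg["To"] = "you@example.org"
    msg.set_content(f"Change detected at {url}")
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


def check() -> None:
    current = page_fingerprint(WHO_URL)
    try:
        with open(STATE_FILE) as f:
            previous = f.read().strip()
    except FileNotFoundError:
        previous = ""  # first run: no prior state
    if current != previous:
        send_alert(WHO_URL)
        with open(STATE_FILE, "w") as f:
            f.write(current)


if __name__ == "__main__":
    check()
```

Run it from cron (or a ScraperWiki-style scheduler) and it stays silent until the page content actually changes.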

Then, I guess, screen scrapers go off and collect the info, e.g. https://github.com/luiscape/hdxscraper-cdc-ebola-historic

## HDX

For example, the Ebola dataset on HDX seems to be built via multiple scrapers. Here's one, written in R: https://github.com/luiscape/hdxscraper-cdc-ebola-historic/blob/master/code/scraper.R
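For flavor, here is a hypothetical Python analogue of that kind of scraper: pull an HTML table of case counts off a page and dump it to CSV. The CDC URL and the assumption that the first table on the page holds the counts are illustrative; neither is taken from the actual scraper.R.

```python
# Hypothetical Python analogue of the R scraper: grab an HTML table of case
# counts and save it as CSV. The URL and table layout are assumptions.
import pandas as pd

# Placeholder URL; the real scraper targets CDC's historic case-count pages.
CDC_URL = "https://www.cdc.gov/vhf/ebola/outbreaks/2014-west-africa/case-counts.html"


def scrape_case_counts(url: str) -> pd.DataFrame:
    # read_html() parses every <table> on the page into a DataFrame;
    # here we assume the first table is the one with the case counts.
    tables = pd.read_html(url)
    return tables[0]


if __name__ == "__main__":
    df = scrape_case_counts(CDC_URL)
    df.to_csv("cdc-ebola-case-counts.csv", index=False)
```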

There are other scrapers among luiscape's repositories.

See Other Coders for more on him.

## Future

So, hopefully we can just use their work for a while. But in the end there should be some core, strongly typed data repositories (HDX and OHDR will probably be first) where data has passed through acquisition and cleaning and been certified by some data lint checker (which devs could also run standalone to test data they generate).
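As a rough sketch of what such a data lint checker might look like, here is a Python example that validates a scraped CSV against a tiny schema before it would be admitted into a curated repository. The column names and types are illustrative assumptions, not any HDX or OHDR standard.

```python
# Sketch of a "data lint checker": validate a CSV against a small schema.
# The schema below (date, country, cases, deaths) is an illustrative assumption.
import csv
import sys
from datetime import datetime


def non_empty(v: str) -> str:
    """Reject blank values."""
    if not v.strip():
        raise ValueError("empty")
    return v


SCHEMA = {
    "date": lambda v: datetime.strptime(v, "%Y-%m-%d"),
    "country": non_empty,
    "cases": int,
    "deaths": int,
}


def lint(path: str) -> list[str]:
    """Return a list of human-readable problems found in the file."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = set(SCHEMA) - set(reader.fieldnames or [])
        if missing:
            problems.append(f"missing columns: {sorted(missing)}")
            return problems
        for i, row in enumerate(reader, start=2):  # header is line 1
            for col, parse in SCHEMA.items():
                try:
                    parse(row[col])
                except (ValueError, TypeError):
                    problems.append(f"line {i}: bad {col!r}: {row[col]!r}")
    return problems


if __name__ == "__main__":
    issues = lint(sys.argv[1])
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```

A non-zero exit code makes it easy to wire into a CI job or a scraper's post-processing step.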

This will need to be addressed... Issue #11
