Truth Data - reichlab/covid19-forecast-hub GitHub Wiki

Truth Data Developer information

This page contains information for developers interested in how the Forecast Hub truth data are updated and validated.

As of February 20, 2023, we are no longer collecting or analyzing data on COVID-19 cases, and as of March 6, 2023, we are no longer collecting or analyzing data on COVID-19 deaths. As of March 10, 2023, the Johns Hopkins University (JHU) Center for Systems Science and Engineering (CSSE) will no longer report COVID-19 cases or deaths.

Truth generation schedule and scripts

The automated GitHub Action updates the truth data weekly at 12pm on Sundays. The configuration for the workflow can be found here. This workflow calls functions from multiple packages, as well as standalone scripts, to generate the truth data files consumed by different endpoints:

  • Deaths and Cases truths: We use the covidData package to get the most recent time series data for COVID-19 from the JHU data repository, then use the preprocess_jhu() function in the covidHubUtils package to transform these data into the CSV files truth-Cumulative Deaths.csv, truth-Incident Deaths.csv, truth-Cumulative Cases.csv, and truth-Incident Cases.csv.
  • Hospitalization truths: We use the covidData package to get the most recent time series data for hospitalizations, then use the preprocess_hospitalization() function in the covidHubUtils package to transform these data into the CSV files truth-Cumulative Hospitalizations.csv and truth-Incident Hospitalizations.csv.
  • Visualization truth: get_visualization_truth_json_from_csv.py is the script that generates the JSON truth file from the CSVs so that it can be consumed by the visualization. Here, the incidence forecasts are lower-bounded at 0.
  • Zoltar truth: save_truth_for_zoltar is the function in covidHubUtils that generates the truth data for Zoltar. Here, the incidence forecasts are not lower-bounded at 0.
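The difference between the visualization and Zoltar outputs comes down to whether negative incident values are replaced by 0. The following is a minimal Python sketch of that lower bounding only, not the code of get_visualization_truth_json_from_csv.py or save_truth_for_zoltar. The sample data, the column names, the JSON layout, and the helper names load_truth_rows and lower_bound_at_zero are all assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical excerpt of an incident truth CSV; the column names are an assumption.
SAMPLE_CSV = """date,location,value
2021-01-02,US,120
2021-01-09,US,-15
2021-01-16,US,98
"""


def load_truth_rows(csv_text):
    """Parse truth CSV text into a list of dicts with integer values."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append(
            {"date": row["date"], "location": row["location"], "value": int(row["value"])}
        )
    return rows


def lower_bound_at_zero(rows):
    """Return copies of the rows with negative values replaced by 0."""
    return [{**row, "value": max(0, row["value"])} for row in rows]


rows = load_truth_rows(SAMPLE_CSV)

# Visualization-style output: negative values are lower-bounded at 0.
visualization_json = json.dumps(lower_bound_at_zero(rows))

# Zoltar-style output: values are passed through without lower bounding.
zoltar_rows = rows
```

Because lower_bound_at_zero returns new dicts, the unclipped rows stay available for the output that must keep negative values.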

GitHub Actions

This active workflow runs the truth update weekly and can also be triggered manually. The configuration for this workflow is defined here.

This deprecated workflow is the previous version, which did not unit-test the truth data before aggregation. It can still be triggered manually if needed. The configuration for this workflow is defined here.

Truth Unit-Test

The JHU truth data are unit-tested in two phases:

  • Tests in covidData to ensure that the package is faithful to the raw source data at the county level, and that aggregation is done correctly.
  • Tests in covidHubUtils to make sure the functions are faithful to the covidData outputs, and that we have all the correct locations, timezeros, etc., as required for each specific file.
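The two kinds of checks described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual tests in covidData or covidHubUtils: the sample values and the helper names aggregate_to_state and missing_locations are hypothetical. The sketch relies only on the convention that the first two digits of a 5-digit county FIPS code identify the state.

```python
from collections import defaultdict

# Hypothetical county-level rows: (5-digit county FIPS code, value).
county_rows = [("25001", 10), ("25003", 5), ("36001", 7)]

# Hypothetical state-level output to validate: 2-digit state FIPS code -> value.
state_totals = {"25": 15, "36": 7}


def aggregate_to_state(rows):
    """Sum county values by state, using the first two FIPS digits as the state code."""
    totals = defaultdict(int)
    for fips, value in rows:
        totals[fips[:2]] += value
    return dict(totals)


def missing_locations(present, required):
    """Return the required location codes that are absent, sorted for stable output."""
    return sorted(set(required) - set(present))


# Phase 1 style check: aggregated county values match the state-level output.
assert aggregate_to_state(county_rows) == state_totals

# Phase 2 style check: every required location appears in the output.
assert missing_locations(state_totals, ["25", "36"]) == []
```

Returning the list of missing codes, rather than a boolean, makes a failing check report exactly which locations are absent.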