Getting Started Non US Location - HopkinsIDD/COVID19_Minimal GitHub Wiki

Setting up a Non-US Location

This is a tutorial for setting up the Hopkins IDD COVID Scenario Pipeline (CSP) for a country other than the United States. Because the main pipeline is set up for direct application to the U.S. and U.S. states, several parts currently need to be done manually for other countries.

To run the COVID Scenario Pipeline for a non-US setting, users need to provide input data that are not integrated into the CSP (for the U.S., these data are already integrated). These include:

  • Shapefile of country and subunit boundaries
  • Mobility data for movement between subunits
  • Case data, spatially resolved to the geographical subunit level (required for seeding and inference applications only)

Check out projects from GitHub

The COVIDScenarioPipeline repo is placed inside the spatial COVID19_ repo (e.g., COVID19_Westeros). Git treats them as independent repositories, so push and pull code to each one separately, and make sure both are checked out at the specific commits you need.

  1. Create a spatial repo from the COVID19_Minimal template by navigating to COVID19_Minimal and clicking "Use this template". In this example we set up Westeros, so name the repository COVID19_Westeros and create it under your own account ([yourgithubuser]).

  2. Check out the spatial repo you just created.

    git clone https://github.com/yourgithubuser/COVID19_Westeros.git
    
  3. Check out the COVIDScenarioPipeline repo within the spatial repo.

    cd COVID19_Westeros
    git clone https://github.com/HopkinsIDD/COVIDScenarioPipeline.git
    

Run Docker Image

While working with this pipeline, we recommend that you edit files on your local machine and run scripts from the provided Docker container. The advantage of the container is that all required packages are already installed, which saves the considerable time a manual installation takes.

If you prefer to run the model without the use of Docker, you can see all of the requirements in the COVIDScenarioPipeline repo. See the R requirements in packages.R and local_install.R, OS requirements in Dockerfile, and Python requirements in requirements.txt.

For most users, we strongly recommend using the Docker container; installation instructions follow.

  1. Go to the Docker Hub website (https://hub.docker.com) and create an account.

  2. Start the Docker service on your computer. One way is to download, install, and run Docker Desktop. Web searches can often resolve installation problems faster than our support team can, but feel free to ask questions.

  3. Open a terminal. For the docker commands in this section, if you run into permission problems, prefix them with sudo.

  4. Pull the docker image from hub.docker.com. You'll only have to do this the first time.

     `docker pull hopkinsidd/covidscenariopipeline:latest`
    
  5. Run the Docker container with your spatial repo directory mounted as /home/app/covidsp.

    On Linux or Mac:

     `docker run -it --rm -v ~/mysrcdir:/home/app/covidsp hopkinsidd/covidscenariopipeline:latest`
    

    Replace ~/mysrcdir with the path to the spatial repo on your machine (e.g., ~/myuser/COVID19_Westeros).

    On Windows:

     `docker run -it --rm -v %CD%:/home/app/covidsp hopkinsidd/covidscenariopipeline:latest`
    

    %CD% expands to the current directory in the Windows Command Prompt, so run this from the spatial repo root or replace %CD% with the explicit path.

  6. You are now in the Docker container in /home/app. The directory you mounted in step 5 is mapped to /home/app/covidsp in the container.

     `cd covidsp`
    
  7. For now, the Docker container needs some local R packages installed. Run this:

     Rscript COVIDScenarioPipeline/local_install.R    
    

    *If you see the prompt "Enter one or more numbers, or an empty line to skip updates:", just press <Enter>.


Edit the config

The config file config.yml controls all of the options currently available. (See this page for more details.)

An example config file has been included in the COVID19_Minimal repository. Start by replacing the US version with the non-US version using the following commands:

rm config.yml # remove US-specific config
mv config_nonUS.yml config.yml # rename the non-US default config file to the generic file name

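This file has a tabbed outline structure. We refer to keys by their full position in the outline, joining nested keys with ::. For example, in the snippet below, the base_path key nested under spatial_setup is denoted spatial_setup::base_path.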

spatial_setup:
  ...
  census_year: 2020       # use latest WorldPop Estimates
  base_path: data         # where all the country-specific data are saved 
  setup_name: westeros    # name of the country/region
  us_model: FALSE         # Whether running model for United States or elsewhere
  modeled_states:         # ISO3 codes of all countries to be included
  - WES
  geoid_len: 6            # User creates `geoid` variable, so length must be specified
  ...
  geoid_params_file: data/geoid_params.csv  # name of file of geounit-specific outcome parameters


| Config item | Explanation of value | Example |
|---|---|---|
| name | A name for the setup | westeros |
| spatial_setup::us_model | Whether you are running a model for the U.S. | FALSE |
| spatial_setup::modeled_states | List of the countries (ISO3 codes) to simulate, each on its own line preceded by `- ` | `modeled_states:` followed by `- WES` |
| spatial_setup::popnodes | Name of the column in the spatial_setup::geodata file that specifies population | pop |
| spatial_setup::shapefile | Path to a shapefile, relative to spatial_setup::base_path, with a GEOID column | geodata/Westeros/Westeros_Districts.shp |
| spatial_setup::shapefile_name | Same as spatial_setup::shapefile | geodata/Westeros/Westeros_Districts.shp |
| spatial_setup::geoid_params_file | Path to the outcome parameter file (generated during the setup described below) | data/geoid_params.csv |
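Because spatial_setup::geoid_len must match the GEOIDs you create, a quick check can catch mismatches early. The shell function below is a minimal sketch, not part of the CSP: check_geoid_len is a hypothetical helper, and it assumes a comma-separated population_data.csv (described later on this page) with a GEOID column and no quoted fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CSP): check that every GEOID in a
# population CSV has the length declared in spatial_setup::geoid_len.
# Assumes a header row containing a GEOID column, comma separation, and no
# quoted fields. Exit status: 0 = all OK, 1 = bad lengths, 2 = no GEOID column.
check_geoid_len() {
  local csv="$1" expected_len="$2"
  awk -F',' -v want="$expected_len" '
    { sub(/\r$/, "") }                  # tolerate Windows (CRLF) line endings
    NF == 0 { next }                    # skip blank lines
    !header_seen {
      header_seen = 1
      for (i = 1; i <= NF; i++) if ($i == "GEOID") col = i
      if (!col) { print "no GEOID column in header" > "/dev/stderr"; bad = 2; exit }
      next
    }
    length($col) != want {
      printf "line %d: GEOID %s has length %d, expected %d\n", NR, $col, length($col), want > "/dev/stderr"
      bad = 1
    }
    END { exit bad }
  ' "$csv"
}

# Example (geoid_len: 6 in the config above):
#   check_geoid_len data/geodata/population_data.csv 6 && echo "GEOID lengths OK"
```

GEOIDs are compared as strings, so leading zeros (e.g., 000001) count toward the length.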

Delete the line this_file_is_unedited, or set its value to FALSE. This flag only exists to make sure users edit config.yml before running the model.
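To remove the line from the command line, here is a portable sketch (remove_unedited_flag is a hypothetical helper, not part of the CSP). It avoids sed -i, whose syntax differs between GNU sed on Linux and BSD sed on macOS, and assumes the key appears as this_file_is_unedited: at the start of a line, optionally indented.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CSP): delete the this_file_is_unedited
# guard from a config file without `sed -i`, whose flags differ between GNU sed
# (Linux) and BSD sed (macOS).
remove_unedited_flag() {
  local cfg="${1:-config.yml}"
  local tmp rc=0
  tmp="$(mktemp)" || return 1
  # Drop any line whose key is this_file_is_unedited, indented or not.
  sed '/^[[:space:]]*this_file_is_unedited[[:space:]]*:/d' "$cfg" > "$tmp" \
    && cat "$tmp" > "$cfg" || rc=$?   # cat keeps the original file's permissions
  rm -f "$tmp"
  return "$rc"
}

# Example, run from the spatial repo root:
#   remove_unedited_flag config.yml
```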



Initial non-US Data Setup

In non-US CSP applications, setup data are not incorporated into the repositories, so they must be incorporated manually. For the user, this has been consolidated into a single script that formats the input data correctly for CSP use. As mentioned previously, these include:

  • Shapefile of country and subunit boundaries
  • Mobility data for movement between subunits
  • Case data, spatially resolved to the geographical subunit level (required for seeding and inference applications only)

Only case data will need to be updated after the first setup.

This full setup is done for the user by running the script below. The user only needs to specify the variables at the beginning of the script and provide the necessary data.

1. Install additional R packages

Open R and run these:

devtools::install_github("HopkinsIDD/globaltoolboxlite")
devtools::install_github("HopkinsIDD/covidSeverity")

*If you see the prompt "Enter one or more numbers, or an empty line to skip updates:", just press <Enter>.


2. Setup preliminary data

  • Shapefile

    • Put shapefile data in the data/geodata directory of the spatial repo.
    • Check the shapefile to determine the variable that identifies the names of the spatial units (e.g., a "NAME" column). This will be specified as shp_loc_var in the script below.
  • Mobility OD matrix

    • Matrix of origin-destination (OD) counts of daily movement between nodes (e.g., districts) is required
    • This should be saved as data/geodata/mobility_data_counts.csv so it works with the script below.
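If you need the long-form OGEOID,DGEOID,FLOW table described under "Generate geodata.csv and mobility.csv" yourself (for example, to inspect flows or to build mobility_data.csv directly), a square OD matrix can be reshaped with awk. This is a minimal sketch: od_wide_to_long is a hypothetical helper, and both the input layout (a header row of destination GEOIDs, a first column of origin GEOIDs) and the choice to drop self-flows and zero flows are assumptions to adapt to your data.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CSP): reshape a wide OD count matrix
# into the long-form OGEOID,DGEOID,FLOW table.
# Assumed input layout (adapt to your data):
#   first row:  a corner label, then one destination GEOID per column
#   other rows: the origin GEOID, then daily counts to each destination
# Self-flows (origin == destination) and zero flows are dropped; this is a
# choice made for the sketch, not a documented CSP requirement.
od_wide_to_long() {
  awk -F',' -v OFS=',' '
    { sub(/\r$/, "") }                  # tolerate Windows (CRLF) line endings
    NF == 0 { next }                    # skip blank lines
    !header_seen {
      header_seen = 1
      for (i = 2; i <= NF; i++) dest[i] = $i
      print "OGEOID", "DGEOID", "FLOW"
      next
    }
    {
      for (i = 2; i <= NF; i++) {
        # Compare as strings so GEOIDs such as "000001" and "1" stay distinct.
        if (($1 "") == (dest[i] "") || $i + 0 == 0) continue
        print $1, dest[i], $i
      }
    }
  ' "$1"
}

# Example (file names from this page):
#   od_wide_to_long data/geodata/mobility_data_counts.csv > data/geodata/mobility_data.csv
```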

3. Initial data setup - two options:


OPTION A: Run setup script with arguments from command line

Rscript COVIDScenarioPipeline/R/scripts/setup_initial_nonUS_data.R -c config.yml -w TRUE -v ADMIN2 -j 4
  • -w TRUE (downloads the WorldPop GeoTIFFs; set this to TRUE only on the first run)
  • -v ADMIN2 (the variable name for the district or other geounits in the shapefile which will serve as the nodes)
  • -j 4 (number of cores to use in parallel)

OPTION B: Copy initial data setup script to spatial repo and modify

  • Copy setup file from COVIDScenarioPipeline

    mkdir R
    cp ./COVIDScenarioPipeline/R/scripts/setup_initial_nonUS_data.R ./R/
    
  • Modify R/setup_initial_nonUS_data.R (e.g., set the variables at the beginning of the script, such as shp_loc_var).

  • Run R/setup_initial_nonUS_data.R to generate standardized data needed by the model.

    Rscript R/setup_initial_nonUS_data.R
    

This setup script does the following:

Generate district-specific 10-year age distributions. This script downloads the necessary GeoTIFF files from WorldPop.org and uses the shapefile to aggregate the data to districts. These data are combined to generate 10-year, district-specific age distributions for any country of interest. This is all done automatically in setup_initial_nonUS_data.R; users only need to provide their own shapefiles.

Calculate age-specific outcome parameters for each district. Age-adjusted outcomes by node can be generated using the covidSeverity R package; see age-adjustment-example.Rmd in the package vignettes for a detailed tutorial. This is all done automatically in the setup_initial_nonUS_data.R script: the age distributions are applied to the severity probabilities to adjust the severity and outcome parameters for each node.

Modify shapefile for use with pipeline reports. This script also modifies the shapefile to include the automatically generated geoid for each node/district.

Produce the population data (population_data.csv) needed for the model. The WorldPop data are used to generate total population estimates by district.
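Since the setup script depends on the shapefile you supply, it can help to confirm that all of its component files were copied into data/geodata. A shapefile is a set of files sharing one base name; .shp, .shx and .dbf are required by the format. This sketch also requires .prj, which is an assumption (the coordinate reference system is needed to overlay the WorldPop rasters), and check_shapefile_parts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CSP): check that a shapefile was copied
# together with its sidecar files. .shp, .shx and .dbf are required by the
# shapefile format; .prj is also required here as an assumption, because the
# coordinate reference system is needed to overlay the WorldPop rasters.
check_shapefile_parts() {
  local base="${1%.shp}" ext missing=0
  for ext in shp shx dbf prj; do
    if [ ! -f "$base.$ext" ]; then
      echo "missing: $base.$ext" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (shapefile path from the config table, relative to data/):
#   check_shapefile_parts data/geodata/Westeros/Westeros_Districts.shp
```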



Generate geodata.csv and mobility.csv

For non-US locations:

  1. Place files with mobility data and population data for the nodes (admin level 1 or 2 spatial areas) in the directory data/geodata. These should both be .csv files, and GEOIDs must be consistent between them. These files are used to generate the geodata.csv and mobility.csv files required by the CSP transmission model.

    a. Mobility data:

     - Origin-destination table in long-form
    
     - Column names:  
         -- `OGEOID`: GEOID of origin node  
         -- `DGEOID`: GEOID of destination node   
         -- `FLOW`: Number of individuals moving from origin to destination geounit on a daily basis
     
     - If this is not already produced, the `[ADD REPO/PACKAGE]` R package has been designed to generate these data. See the included R script `[ADD R SCRIPT]` to run this. 
    
     - Default file name: `mobility_data.csv`    
    

    b. Population data:
    These data can be generated automatically from WorldPop using the included R script setup_coutry_data.R.

     - Population of each node/geounit.
     - Column names:  
         -- `ADMIN0`: Name or abbreviation for the spatial subunit 1-2 levels up from the nodes (e.g., Province, State, or Country)  
         -- `GEOID`: Unique identifier for all admin level 2 areas that will be treated as nodes. Ideally this will be a 5-6 digit numeric code.  
         -- `POP`: Population of each node
     
     - Default file name: `population_data.csv`
    

  2. Be aware that the model build code runs the script R/scripts/build_nonUS_setup.R, which creates the geodata and mobility files according to the specifications in your config file. For this script to run correctly, you need to specify spatial_setup::nonUS_mobility_setup and spatial_setup::nonUS_pop_setup in your config file. You do not need to run the script separately.
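Because GEOIDs must be consistent between the mobility and population files, a check before building can save a failed run. The sketch below lists GEOIDs that appear in the mobility table but not in the population table; geoids_from and check_geoid_consistency are hypothetical helpers, and they assume comma-separated files with the column names given above (GEOID, OGEOID, DGEOID) and no quoted fields.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the CSP): list GEOIDs that occur in the
# mobility table but not in the population table. Assumes comma-separated
# files with header rows naming the columns (GEOID in the population file,
# OGEOID and DGEOID in the mobility file) and no quoted fields.

# Print the unique values of the named columns of a CSV, one per line, sorted.
geoids_from() {
  local csv="$1"
  shift
  awk -F',' -v cols="$*" '
    { sub(/\r$/, "") }                  # tolerate Windows (CRLF) line endings
    NF == 0 { next }                    # skip blank lines
    !header_seen {
      header_seen = 1
      n = split(cols, want, " ")
      for (i = 1; i <= NF; i++)
        for (j = 1; j <= n; j++)
          if ($i == want[j]) keep[i] = 1
      next
    }
    { for (i in keep) print $i }
  ' "$csv" | sort -u
}

# Return 1 (and list the offenders on stderr) if any mobility GEOID is unknown.
check_geoid_consistency() {
  local pop_csv="$1" mob_csv="$2" pop_ids mob_ids missing
  pop_ids="$(mktemp)" || return 1
  mob_ids="$(mktemp)" || { rm -f "$pop_ids"; return 1; }
  geoids_from "$pop_csv" GEOID > "$pop_ids"
  geoids_from "$mob_csv" OGEOID DGEOID > "$mob_ids"
  missing="$(comm -13 "$pop_ids" "$mob_ids")"   # lines only in the mobility list
  rm -f "$pop_ids" "$mob_ids"
  if [ -n "$missing" ]; then
    printf 'GEOIDs in mobility data but not in population data:\n%s\n' "$missing" >&2
    return 1
  fi
  return 0
}

# Example (default file names from this page):
#   check_geoid_consistency data/geodata/population_data.csv data/geodata/mobility_data.csv
```

Only the mobility-to-population direction is checked, since a node with no recorded movement is plausible.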


Seeding

Users have three options for seeding:

  • Importation [not currently available for non-US locations]
  • Case Seeding Manual ("FolderDraw")
  • Case Seeding Stochastic ("PoissonDistributed")

Currently, a country setup will typically use the PoissonDistributed option with user-provided case data. Importation is not yet available for non-US locations, though it is planned. Both case seeding methods can use either user-defined cases or seeds, or detected cases. The FolderDraw method requires user-generated seeding files (one for each simulation iteration).

  1. Remove or comment out the following sections:

    importation:
      ...
      ...    
    seeding: 
      method: FolderDraw
      folder_path: importation/ 
    
  2. Uncomment:

    seeding:  
       method: PoissonDistributed
       lambda_file: data/seeding.csv
    

lambda_file

-- User provided case data are required for this setup --

The COVID19_Minimal repository has an example seeding::lambda_file, data/minimal/seeding.csv. This file should have three columns, which are passed to a Poisson distribution to determine the actual number of imported/seeded cases:
- place: geoid of the seeded cases
- date: date the cases occur
- amount: percentage of the population infected
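A quick format check of the lambda_file before building can catch typos in dates or headers. The sketch below is not part of the CSP; check_seeding_file is a hypothetical helper, and it assumes the header is exactly place,date,amount, dates are written as YYYY-MM-DD, and amount is a non-negative decimal number.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CSP): sanity-check a seeding lambda_file.
# Assumptions: the header is exactly place,date,amount; dates are YYYY-MM-DD;
# amount is a non-negative decimal number; no quoted fields.
# Only the date format is checked, not whether the date exists.
check_seeding_file() {
  awk -F',' '
    { sub(/\r$/, "") }                  # tolerate Windows (CRLF) line endings
    NF == 0 { next }                    # skip blank lines
    !header_seen {
      header_seen = 1
      if ($0 != "place,date,amount") { print "unexpected header: " $0 > "/dev/stderr"; bad = 1 }
      next
    }
    $1 == "" { print "line " NR ": empty place" > "/dev/stderr"; bad = 1 }
    $2 !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
      print "line " NR ": bad date " $2 > "/dev/stderr"; bad = 1
    }
    $3 !~ /^[0-9]+([.][0-9]+)?$/ {
      print "line " NR ": bad amount " $3 > "/dev/stderr"; bad = 1
    }
    END { exit bad }
  ' "$1"
}

# Example (path from the seeding config above):
#   check_seeding_file data/seeding.csv
```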

Be aware that the model build code will run a script R/scripts/create_seeding.R and create an epidemic seeding file according to the specifications in your config file. You will not need to run the script separately.



Build and run

  1. From the spatial scenario directory, create the Makefile using the R script.

    Rscript COVIDScenarioPipeline/R/scripts/make_makefile.R -c config.yml

  2. Make a report directory and sub-directory, because the .Rmd currently expects to sit two directories below the spatial repo root. Run this from the spatial repo root so that the next steps still find notebooks/WES_today.

    mkdir -p notebooks/WES_today
    
  3. Create the initial R Markdown (.Rmd) report by running the following command.

    Rscript -e 'rmarkdown::draft("notebooks/WES_today/WES_report.Rmd",template="country_report",package="report.generation",edit=FALSE)'

  4. Write the render line into compile_Rmd.R. (compile_Rmd.R is also used later by automation scripts.)

    echo 'rmarkdown::render("notebooks/WES_today/WES_report.Rmd", params=list(country_iso3="WES"))' >compile_Rmd.R

  5. Build and run.

    make

If you see an error like the following during the make command (the target names will match your own setup), it is a known bug in the report generation step, which running Rscript compile_Rmd.R bypasses.

Execution halted
Makefile:11: recipe for target 'notebooks/blue_hawaii_20200520/blue_hawaii_20200520_report.html' failed
make: *** [notebooks/blue_hawaii_20200520/blue_hawaii_20200520_report.html] Error 1

Commands:

make clean # Removes all the generated files so they can be re-generated
make
Rscript compile_Rmd.R # Works around the report generation bug

Tada! The output data are saved as CSV files in the model_output and hospitalization folders.

The report you are looking for is in notebooks/WES_today/WES_report.html. View with a web browser.


Exit the Docker container

To exit without stopping the container so that you can attach to it later, press Ctrl-p followed by Ctrl-q.

Otherwise, type exit (or press Ctrl-d) to leave the container's shell; because the container was started with --rm, it is removed when it stops.
