Getting Started Non US Location - HopkinsIDD/COVID19_Minimal GitHub Wiki
This is a tutorial for setting up the Hopkins IDD COVID Scenario Pipeline (CSP) for a country other than the United States. Because the main pipeline is set up for direct application to the U.S. or U.S. states, several parts currently need to be done manually for another country.
To run the COVID Scenario Pipeline for a non-US setting, users need to provide input data that are not integrated into the CSP (for the U.S., these are already integrated). These include:
- Shapefile of country and subunit boundaries
- Mobility data for movement between subunits
- Case data, spatially resolved to the geographical subunit level (required for seeding and inference applications only)
The COVIDScenarioPipeline repo will be put inside the COVID19_ repo. They are treated as independent, so push and pull code to them independently, and make sure they are both at the specific commit that you need.
- Create a spatial repo from the COVID19_Minimal template by navigating to COVID19_Minimal and clicking "Use this template". For this example, we'll do Westeros. We will name it COVID19_Westeros and create the repository under the [yourgithubuser] account.
- Clone the spatial repo you just created.
`git clone https://github.com/yourgithubuser/COVID19_Westeros.git`
- Clone the COVIDScenarioPipeline repo inside the spatial repo.
`cd COVID19_Westeros`
`git clone https://github.com/HopkinsIDD/COVIDScenarioPipeline.git`
While working with this pipeline, we recommend that you edit files from your local machine and run scripts from the provided Docker container. The advantage of this container is that it already has all the packages installed, which otherwise takes some time.
If you prefer to run the model without the use of Docker, you can see all of the requirements in the COVIDScenarioPipeline repo. See the R requirements in `packages.R` and `local_install.R`, OS requirements in `Dockerfile`, and Python requirements in `requirements.txt`.
For most users, we strongly recommend the use of the Docker container and the instructions for its installation are described below.
- Go to the Docker Hub website (https://hub.docker.com) and create an account.
- Start the Docker service on your computer. One way is to download, install, and run Docker Desktop. Google searches can help you with this, potentially more than our support team can, but feel free to ask questions.
- Open a terminal. For the docker commands in this section, if you run into permissions problems, you will need to put `sudo` in front.
- Pull the Docker image from hub.docker.com. You'll only have to do this the first time.
`docker pull hopkinsidd/covidscenariopipeline:latest`
- Run the Docker container with your code directory mounted as `/home/app/covidsp`.
On Linux or Mac:
`docker run -it --rm -v ~/mysrcdir:/home/app/covidsp hopkinsidd/covidscenariopipeline:latest`
Replace `~/mysrcdir` with the path to the code on your machine (e.g., `~/myuser/COVID19_Westeros`).
On Windows:
`docker run -it --rm -v %CD%:/home/app/covidsp hopkinsidd/covidscenariopipeline:latest`
You may need to change `%CD%` to your explicit directory path.
- You are now in the Docker container in `/home/app`. The directory you mounted with the `docker run` command is mapped to `/home/app/covidsp` in the container.
`cd covidsp`
- For now, the Docker container needs some local R packages installed. Run this:
`Rscript COVIDScenarioPipeline/local_install.R`
If there's a prompt `Enter one or more numbers, or an empty line to skip updates:`, just hit <Enter>.
The config file config.yml controls all of the options currently available. (See this page for more details.)
An example config file has been included in the COVID19_Minimal repository. Start by replacing the US version with the non-US version using the following commands:
rm config.yml # remove US-specific config
mv config_nonUS.yml config.yml # rename the non-US default config file to the generic file name
This file has a tabbed outline structure. We will refer to keys using their full position in the outline, with levels separated by `::`. For example, in the excerpt below, the `base_path` key nested under `spatial_setup` is denoted `spatial_setup::base_path`.

    spatial_setup:
      ...
      census_year: 2020 # use latest WorldPop estimates
      base_path: data # where all the country-specific data are saved
      setup_name: westeros # name of the country/region
      us_model: FALSE # whether running the model for the United States or elsewhere
      modeled_states: # ISO3 codes of all countries to be included
        - WES
      geoid_len: 6 # user creates the `geoid` variable, so its length must be specified
      ...
      geoid_params_file: data/geoid_params.csv # name of file of geounit-specific outcome parameters

Config Item | Explanation of Value | Example |
---|---|---|
name | Give it a name | westeros |
spatial_setup::us_model | Whether running a model for U.S. | FALSE |
spatial_setup::modeled_states | A list of the countries you want to simulate, with each country on its own line preceded by `-` | `modeled_states: - WES` |
spatial_setup::popnodes | The name of the column in spatial_setup::geodata file that specifies population | pop |
spatial_setup::shapefile | A path to a shapefile relative to spatial_setup::base_path with a GEOID column. | geodata/Westeros/Westeros_Districts.shp |
spatial_setup::shapefile_name | same as spatial_setup::shapefile | geodata/Westeros/Westeros_Districts.shp |
spatial_setup::geoid_params_file | Path to outcome parameter file; file generated below | data/geoid_params.csv |
Delete the line `this_file_is_unedited`, or set its value to FALSE. This is just to make sure people edit the config.yml.
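To make the `::` key notation concrete, here is a minimal Python sketch that resolves such a key path against a nested dictionary mirroring part of the example config. The `resolve_key` helper and the hard-coded dictionary are hypothetical illustrations and not part of the pipeline, which reads `config.yml` with its own tooling.

```python
from functools import reduce

# Hypothetical in-memory mirror of part of the example config.yml.
config = {
    "name": "westeros",
    "spatial_setup": {
        "base_path": "data",
        "setup_name": "westeros",
        "us_model": False,
        "modeled_states": ["WES"],
        "geoid_len": 6,
    },
}


def resolve_key(cfg, key_path):
    """Look up a key written in the wiki's `outer::inner` notation."""
    return reduce(lambda node, part: node[part], key_path.split("::"), cfg)


print(resolve_key(config, "spatial_setup::base_path"))       # data
print(resolve_key(config, "spatial_setup::modeled_states"))  # ['WES']
```

A missing key raises `KeyError`, which is a convenient way to notice a misspelled key path early.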
In non-US CSP applications, setup data are not included in the repositories, so they must be incorporated manually. For the user, this has been consolidated into a single script that formats the input data correctly for CSP use. As mentioned previously, these include:
- Shapefile of country and subunit boundaries
- Mobility data for movement between subunits
- Case data, spatially resolved to the geographical subunit level (required for seeding and inference applications only)
Only case data will need to be updated after the first setup.
This full setup is done for the user by running the script below.
The user only needs to specify the variables in the beginning and
provide the necessary data.
Open R and run these:
devtools::install_github("HopkinsIDD/globaltoolboxlite")
devtools::install_github("HopkinsIDD/covidSeverity")
If there's a prompt `Enter one or more numbers, or an empty line to skip updates:`, just hit <Enter>.
- Shapefile
  - Put shapefile data in the `data/geodata` directory of the spatial repo.
  - Check the shapefile to determine the variable that identifies the name of the spatial units (e.g., the "NAME" column). This will be specified as `shp_loc_var` in the script below.
- Mobility OD matrix
  - A matrix of origin-destination (OD) counts of daily movement between nodes (e.g., districts) is required.
  - This should be saved as `data/geodata/mobility_data_counts.csv` so it works with the script below.
OPTION A: Run setup script with arguments from command line
Rscript COVIDScenarioPipeline/R/scripts/setup_initial_nonUS_data.R -c config.yml -w TRUE -v ADMIN2 -j 4
- -w TRUE (tells the script to download WorldPop geotiffs; should only be set to TRUE on the first run)
- -v ADMIN2 (the variable name for the district or other geounits in the shapefile which will serve as the nodes)
- -j 4 (number of cores to use in parallel)
OPTION B: Copy initial data setup script to spatial repo and modify
- Copy the setup file from COVIDScenarioPipeline.
`mkdir R`
`cp ./COVIDScenarioPipeline/R/scripts/setup_initial_nonUS_data.R ./R/`
- Modify `R/setup_initial_nonUS_data.R`.
- Run `R/setup_initial_nonUS_data.R` to generate the standardized data needed by the model.
`Rscript R/setup_initial_nonUS_data.R`
This setup script does the following:
Generate district-specific 10-year age distributions
This script downloads the necessary geotiff files from WorldPop.org and uses the shapefile to aggregate the data to districts. These data are combined to generate 10-year, district-specific age distributions for any country of interest. This is all done automatically in the `setup_initial_nonUS_data.R` script. Users only need to provide their own shapefiles for this.
Calculate age-specific outcomes parameters for each district
Age-adjusted outcomes by node can be generated using the covidSeverity R package. See `age-adjustment-example.Rmd` in the vignettes for a detailed tutorial. This is all done automatically in the `setup_initial_nonUS_data.R` script. The age distributions are applied to the severity probabilities to adjust severity and outcome parameters for each node.
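As a rough illustration of what applying an age distribution to severity probabilities can look like, the sketch below computes a population-weighted average of age-specific probabilities for one node. The age bands, the probabilities, and the `age_adjusted_probability` helper are made-up placeholders for illustration only; the real calculation is done by the covidSeverity package and may differ in detail.

```python
def age_adjusted_probability(age_distribution, age_specific_prob):
    """Weight each age band's probability by that band's share of the population."""
    total = sum(age_distribution.values())
    if total <= 0:
        raise ValueError("age distribution must have a positive total population")
    return sum(
        (count / total) * age_specific_prob[band]
        for band, count in age_distribution.items()
    )


# Placeholder values for a single district, not real estimates.
district_age_counts = {"0-9": 200, "10-19": 300, "20-29": 500}
hospitalization_prob_by_age = {"0-9": 0.01, "10-19": 0.02, "20-29": 0.04}

p_hosp = age_adjusted_probability(district_age_counts, hospitalization_prob_by_age)
print(round(p_hosp, 4))  # 0.028
```

A district with an older population puts more weight on the higher-risk bands, so its adjusted probability rises accordingly.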
Modify shapefile for use with pipeline reports
This script also modifies the shapefile to include the automatically generated `geoid` for each node/district.
Produce population data `population_data.csv` needed for the model
The WorldPop data is used to generate total population estimates by district.
For non-US locations:
- Provide files for mobility data and population data for the nodes (the admin level 1 or 2 spatial areas) in the directory `data/geodata`. These should both be .csv files, and GEOIDs must be consistent between them. These files will be used to generate the `geodata.csv` and `mobility.csv` files required for the CSP transmission model.

  a. Mobility data:
  - Origin-destination table in long form
  - Column names:
    - `OGEOID`: GEOID of origin node
    - `DGEOID`: GEOID of destination node
    - `FLOW`: Number of individuals moving from origin to destination geounit on a daily basis
  - If this is not already produced, the `[ADD REPO/PACKAGE]` R package has been designed to generate these data. See the included R script `[ADD R SCRIPT]` to run this.
  - Default file name: `mobility_data.csv`

  b. Population data:
  - These data can be generated automatically from WorldPop using the included R script `setup_coutry_data.R`.
  - Population of each node/geounit
  - Column names:
    - `ADMIN0`: Name or abbreviation for the spatial subunit 1-2 levels up from the nodes (e.g., Province, State, or Country)
    - `GEOID`: Unique identifier for all admin level 2 areas that will be treated as nodes. Ideally this will be a 5-6 digit numeric code.
    - `POP`: Population of each node
  - Default file name: `population_data.csv`
- Be aware that the model build code will run a script `R/scripts/build_nonUS_setup.R` and create a geodata and mobility file according to the specifications in your config file. For this setup script to run correctly, you will need to specify `spatial_setup::nonUS_mobility_setup` and `spatial_setup::nonUS_pop_setup` in your config file. You will not need to run the script separately.
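Because GEOIDs must be consistent between the mobility and population files, it can help to check them before building the model. The following is a minimal Python sketch of such a check, using only the column names listed above; the `check_geoid_consistency` helper and the sample rows are hypothetical and not part of the pipeline.

```python
import csv
import io


def check_geoid_consistency(mobility_rows, population_rows):
    """Return GEOIDs used in the mobility table but missing from the population table."""
    known = {row["GEOID"] for row in population_rows}
    used = set()
    for row in mobility_rows:
        if float(row["FLOW"]) < 0:
            raise ValueError(f"negative FLOW for {row['OGEOID']} -> {row['DGEOID']}")
        used.update((row["OGEOID"], row["DGEOID"]))
    return sorted(used - known)


# Small in-memory stand-ins for mobility_data.csv and population_data.csv.
mobility_csv = io.StringIO(
    "OGEOID,DGEOID,FLOW\n"
    "100001,100002,120\n"
    "100002,100001,95\n"
    "100002,100003,10\n"
)
population_csv = io.StringIO(
    "ADMIN0,GEOID,POP\n"
    "WES,100001,50000\n"
    "WES,100002,82000\n"
)

mobility_rows = list(csv.DictReader(mobility_csv))
population_rows = list(csv.DictReader(population_csv))
missing = check_geoid_consistency(mobility_rows, population_rows)
print(missing)  # ['100003']
```

To run the same check on real files, replace the `io.StringIO` objects with `open("data/geodata/mobility_data.csv", newline="")` and the corresponding population file.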
Users have three options for seeding:
- Importation [not currently available for non-US locations]
- Case Seeding Manual ("FolderDraw")
- Case Seeding Stochastic ("PoissonDistributed")
Typically, a country setup will currently use the `PoissonDistributed` option, using inputted case data. Importation is currently not available for non-US locations, though this is planned for the future. Both case seeding methods can be done using user-defined cases or seeds, or using detected cases. The FolderDraw method requires user-generated seeding files (one for each simulation iteration).
- Remove or comment out the following sections:

      importation:
        ...
      ...
      seeding:
        method: FolderDraw
        folder_path: importation/

- Uncomment:

      seeding:
        method: PoissonDistributed
        lambda_file: data/seeding.csv

-- User provided case data are required for this setup --
The COVID19_Minimal repository has an example `seeding::lambda_file`, `data/minimal/seeding.csv`. This file should have three columns to provide to a Poisson distribution to determine the actual number of imported/seeded cases:
- `place`: geoid of seeded cases
- `date`: date cases occur
- `amount`: percentage of population infected
Be aware that the model build code will run a script `R/scripts/create_seeding.R` and create an epidemic seeding file according to the specifications in your config file. You will not need to run the script separately.
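To give a feel for how a three-column seeding file can feed a Poisson draw, here is a minimal Python sketch. It assumes, purely for illustration, that `amount` is used directly as the Poisson mean for each row; the pipeline's own seeding code may interpret or scale the value differently. The `poisson_draw` helper and the sample rows are hypothetical.

```python
import csv
import io
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method (suited to small lam)."""
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


# In-memory stand-in for a seeding file with the three documented columns.
seeding_csv = io.StringIO(
    "place,date,amount\n"
    "100001,2020-03-01,5\n"
    "100002,2020-03-03,0\n"
)

rng = random.Random(42)  # fixed seed so repeated runs give the same draws
draws = [
    (row["place"], row["date"], poisson_draw(float(row["amount"]), rng))
    for row in csv.DictReader(seeding_csv)
]
for place, date, seeded in draws:
    print(place, date, seeded)
```

Each simulation would use a fresh draw, which is what makes this seeding method stochastic; a row with `amount` of 0 always seeds zero cases.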
- From the spatial scenario directory, create the Makefile using the R script.
`Rscript COVIDScenarioPipeline/R/scripts/make_makefile.R -c config.yml`
- Make a report directory and sub-directory, because the .Rmd currently expects to be two directories below the top of the spatial repo.
`mkdir -p notebooks/WES_today`
- Make the initial R markdown (.Rmd) by running the following command.
`Rscript -e 'rmarkdown::draft("notebooks/WES_today/WES_report.Rmd",template="country_report",package="report.generation",edit=FALSE)'`
- Write the render line into compile_Rmd.R. (compile_Rmd.R is later used by fancy automation.)
`echo 'rmarkdown::render("notebooks/WES_today/WES_report.Rmd", params=list(country_iso3="WES"))' >compile_Rmd.R`
- Build and run.
`make`
If you see the following error during the `make` command, this is a known bug in the report generation step, which `Rscript compile_Rmd.R` bypasses.
Execution halted
Makefile:11: recipe for target 'notebooks/blue_hawaii_20200520/blue_hawaii_20200520_report.html' failed
make: *** [notebooks/blue_hawaii_20200520/blue_hawaii_20200520_report.html] Error 1
Commands:
make clean # Removes all the generated files so they can be re-generated
make
Rscript compile_Rmd.R # Getting around report generation bug
Tada! The data that you are looking for are in CSV files in the folders `model_output` and `hospitalization`.
The report you are looking for is in notebooks/WES_today/WES_report.html. View with a web browser.
To exit without stopping the container so you can attach later, type `Ctrl-p; Ctrl-q`.
Otherwise, exit and remove the container with `Ctrl-c`.