Config File Options - HopkinsIDD/COVID19_Minimal GitHub Wiki
We recommend reviewing the Getting Started pages before reading this page on the detailed configuration file options. For an example of a full configuration file, see config.yml in this repository or the Supplementary Material of our preprint.
Overview
The model configuration file config.yml controls all of the options currently available. This YAML file has an indented outline structure. We will refer to keys using their full position in the outline. For example, we denote
spatial_setup:
  ...
  geodata: minimal
as spatial_setup::geodata having a value of minimal.
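The `::` notation above maps directly onto nested lookups. The following is a minimal sketch, assuming the config has already been loaded into nested Python dictionaries; the helper name `get_key` and the in-memory `config` are hypothetical and not part of the pipeline.

```python
from functools import reduce


def get_key(config: dict, path: str):
    """Look up a value using the `section::key` notation used on this page."""
    # Walk one level deeper into the nested dict for each `::`-separated part.
    return reduce(lambda node, part: node[part], path.split("::"), config)


# Hypothetical in-memory config mirroring the example above.
config = {"spatial_setup": {"base_path": "data/HI", "geodata": "minimal"}}

value = get_key(config, "spatial_setup::geodata")
print(value)
```

A missing key raises `KeyError`, which is usually the desired behavior for required options.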
Description of Types/Formats
- date is [year]-[month]-[day]. (e.g., 2020-01-31)
- boolean is either "TRUE" or "FALSE"
- probability is a float between 0 and 1
- distribution is the following config structure:
Item | Required? | Type/Format |
---|---|---|
distribution | required | "fixed" or "uniform" |
value | required for "fixed" | fraction or probability |
low | required for "uniform" | fraction or probability |
high | required for "uniform" | fraction or probability |
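As an illustration of the distribution structure in the table above, here is a minimal sketch of drawing one value from such a block. The function name `draw` is hypothetical, and the pipeline's own sampling code may differ; only the "fixed" and "uniform" cases listed in the table are handled.

```python
import random


def draw(dist: dict, rng: random.Random) -> float:
    """Draw one value from a `distribution` config block (illustrative only)."""
    kind = dist["distribution"]
    if kind == "fixed":
        # "fixed" requires `value` and always returns it.
        return float(dist["value"])
    if kind == "uniform":
        # "uniform" requires `low` and `high`.
        low, high = float(dist["low"]), float(dist["high"])
        if low > high:
            raise ValueError("low must not exceed high")
        return rng.uniform(low, high)
    raise ValueError(f"unsupported distribution: {kind!r}")


rng = random.Random(0)  # seeded only to make the example repeatable
fixed = draw({"distribution": "fixed", "value": 0.5}, rng)
sample = draw({"distribution": "uniform", "low": 0.6, "high": 0.7}, rng)
```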
Sections
Global Header
These global configuration options typically sit at the top of the configuration file.
Item | Required? | Type/Format | Description |
---|---|---|---|
file_is_unedited | must be absent | | remove this key from the file before running |
name | required | string | typically named after the region/location you are modeling |
start_date | required | date | model simulation start date |
end_date | required | date | model simulation end date |
nsimulations | required | int | number of simulations to run |
dt | required | float | simulation time step in days |
dynfilter_path | optional | path to file | path to filtering text file |
report_location_name | optional | string | |
name: Hawaii
start_date: 2020-01-31
end_date: 2020-12-31
nsimulations: 1000
dt: 0.25
report_location_name: Hawaii
spatial_setup section
Config Item | Required? | Type/Format | Description |
---|---|---|---|
base_path | required | path to folder | base path for spatial files |
setup_name | required | string | spatial folder name |
geodata | required | path to file relative to base_path | |
mobility | required | path to file relative to base_path | |
popnodes | required | string | name of population column in geodata |
nodenames | required | string | name of location nodes column in geodata |
census_year | optional | integer (year) | |
modeled_states | optional | list of location codes | vector of locations that will be modeled |
include_in_report | optional | string | name of boolean column in geodata |
shapefile_name | optional | path to file relative to base_path | |
shapefile | optional | path to file relative to base_path; identical to shapefile_name | |
nonUS_mobility_setup | required for non-US locations | path to file relative to base_path | |
nonUS_pop_setup | required for non-US locations | path to file relative to base_path | |
geoid_params_file | required for non-US locations if running age-specific hospitalization adjustment | path to file | file with geoid-specific relative risks of health outcomes |
spatial_setup:
  base_path: data/HI
  setup_name: HI
  geodata: geodata.csv
  mobility: mobility.txt
  popnodes: population
  nodenames: geoid
  include_in_report: include_in_report
  modeled_states:
    - HI
  census_year: 2010
  shapefile: shp/counties_2010_HI.shp
  shapefile_name: shp/counties_2010_HI.shp
geodata file
geodata is a .csv with column headers, with at least two columns: nodenames and popnodes.
- nodenames is the name of a column in geodata that specifies the geo IDs of an area. The values in this column must be unique.
- popnodes is the name of a column in geodata that specifies the population of each area listed in the nodenames column.
- include_in_report is the name of an optional column in geodata that specifies which nodenames are included in the report. Models may include more locations than simply the location of interest.
Example geodata file format
geoid,population,include_in_report
10001,1000,TRUE
20002,2000,FALSE
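The geodata requirements above (both columns present, unique geo IDs) can be checked before a run. This is a minimal sketch using only the standard library; `check_geodata` is a hypothetical helper, not a pipeline function, and it reads from a string so the example is self-contained.

```python
import csv
import io


def check_geodata(text: str, nodenames: str, popnodes: str) -> list[dict]:
    """Parse geodata CSV text and verify the columns described above."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    for column in (nodenames, popnodes):
        if column not in fieldnames:
            raise ValueError(f"geodata is missing column {column!r}")
    rows = list(reader)
    ids = [row[nodenames] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError(f"values in column {nodenames!r} must be unique")
    return rows


# Same content as the example geodata file above.
GEODATA = "geoid,population,include_in_report\n10001,1000,TRUE\n20002,2000,FALSE\n"

rows = check_geodata(GEODATA, nodenames="geoid", popnodes="population")
# Locations flagged for the report via the optional include_in_report column.
reported = [row["geoid"] for row in rows if row.get("include_in_report") == "TRUE"]
```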
mobility file
The mobility file is a .csv file (the filename must have the .csv extension) with long-form comma-separated values. Columns must be named ori, dest, amount, with amount being the number of individuals going from place ori to place dest. Unassigned relations are assumed to be zero. ori and dest should exactly match the values in the nodenames column of geodata.csv.
Example mobility file format
ori, dest, amount
10001, 20002, 3
20002, 10001, 3
It is also possible, but NOT RECOMMENDED, to specify the mobility file as a .txt with space-separated values in the shape of a matrix. This matrix is symmetric and of size K x K, with K being the number of rows in geodata:
0 3
3 0
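To show how the two mobility formats relate, here is a minimal sketch that expands the long-form file into a K x K matrix, filling unassigned pairs with zero. The function `mobility_matrix` is hypothetical, and the row/column order is assumed to follow the order of nodenames in geodata.

```python
import csv
import io


def mobility_matrix(text: str, nodes: list[str]) -> list[list[float]]:
    """Expand long-form ori/dest/amount rows into a K x K matrix."""
    index = {name: i for i, name in enumerate(nodes)}
    # Unassigned relations are assumed to be zero.
    matrix = [[0.0] * len(nodes) for _ in nodes]
    # skipinitialspace tolerates the "ori, dest, amount" spacing shown above.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        ori, dest = row["ori"].strip(), row["dest"].strip()
        if ori not in index or dest not in index:
            raise ValueError(f"unknown node in mobility row: {ori} -> {dest}")
        matrix[index[ori]][index[dest]] = float(row["amount"])
    return matrix


# Same content as the example mobility file above.
MOBILITY = "ori, dest, amount\n10001, 20002, 3\n20002, 10001, 3\n"

matrix = mobility_matrix(MOBILITY, ["10001", "20002"])
```

For the example files, this yields the same 2 x 2 matrix as the space-separated form shown above.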
importation section (optional)
This section is optional. It is used by the covidImportation package to import global air importation data for seeding infections into the United States.
If you wish to include it, here are the options.
Config Item | Required? | Type/Format | Description |
---|---|---|---|
census_api_key | required | string | get an API key |
travel_dispersion | required | number | how dispersed daily travel data is; default = 3. |
maximum_destinations | required | integer | Number of airports to limit importation to |
dest_type | required | categorical | location type |
dest_country | required | string (Country) | ISO3 code for country of importation. Currently only USA is supported |
aggregate_to | required | categorical | location type to aggregate to |
cache_work | required | boolean | whether to save case data |
update_case_data | required | boolean | deprecated; whether to update the case data or use saved data |
draw_travel_from_distribution | required | boolean | whether to add additional stochasticity to travel data; default is FALSE |
print_progress | required | boolean | whether to print progress of importation model simulations |
travelers_threshold | required | integer | include airports with at least the travelers_threshold mean daily number of travelers |
airport_cluster_distance | required | numeric | cluster airports within airport_cluster_distance km |
param_list | required | See section below | see below |
importation::param_list
Config Item | Required? | Type/Format | Description |
---|---|---|---|
incub_mean_log | required | numeric | incubation period, log mean |
incub_sd_log | required | numeric | incubation period, log standard deviation |
inf_period_nohosp_mean | required | numeric | infectious period, non-hospitalized, mean |
inf_period_nohosp_sd | required | numeric | infectious period, non-hospitalized, sd |
inf_period_hosp_mean_log | required | numeric | infectious period, hospitalized, log-normal mean |
inf_period_hosp_sd_log | required | numeric | infectious period, hospitalized, log-normal sd |
p_report_source | required | numeric | reporting probability, Hubei and elsewhere |
shift_incid_days | required | numeric | mean delay from infection to reporting of cases; default = -10 |
delta | required | numeric | days per estimation period |
importation:
  census_api_key: "fakeapikey00000"
  travel_dispersion: 3
  maximum_destinations: Inf
  dest_type: state
  dest_country: USA
  aggregate_to: airport
  cache_work: TRUE
  update_case_data: TRUE
  draw_travel_from_distribution: FALSE
  print_progress: FALSE
  travelers_threshold: 10000
  airport_cluster_distance: 80
  param_list:
    incub_mean_log: log(5.89)
    incub_sd_log: log(1.74)
    inf_period_nohosp_mean: 15
    inf_period_nohosp_sd: 5
    inf_period_hosp_mean_log: 1.23
    inf_period_hosp_sd_log: 0.79
    p_report_source: [0.05, 0.25]
    shift_incid_days: -10
    delta: 1
seeding section
There are two different seeding methods: 1) based on air importation (FolderDraw) and 2) based on earliest identified cases (PoissonDistributed).
FolderDraw is required if the importation section is present and requires folder_path. Otherwise, use PoissonDistributed, which requires lambda_file.
Config Item | Required? | Type/Format | Description |
---|---|---|---|
method | required | "FolderDraw" or "PoissonDistributed" | |
folder_path | required for FolderDraw | path to folder | |
lambda_file | required for PoissonDistributed | path to file | |
delay_incidC | optional for PoissonDistributed | numeric | Assumption for number of days delay between infection and case confirmation for seeding with the PoissonDistributed method. Default is 5 days. |
ratio_incidC | optional for PoissonDistributed | numeric | Assumption for ratio of infections to confirmed cases for seeding with the PoissonDistributed method. Default is 10 infections per confirmed case. |
casedata_file | required for non-US locations | path to file | data file from which the seeding setup file will be created |
If using the importation section of the config and the air importation model:
seeding:
  method: FolderDraw
  folder_path: importation/HI/
or if seeding according to the earliest identified cases:
seeding:
  method: PoissonDistributed
  lambda_file: data/HI/seeding.csv
  delay_incidC: 5
  ratio_incidC: 10
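One plausible reading of delay_incidC and ratio_incidC is sketched below: each confirmed case count is shifted back by the assumed delay and scaled by the assumed ratio of infections to confirmed cases. This is an assumption for illustration only; how the pipeline actually builds the lambda_file is not described on this page, and `seeding_from_cases` and the case record are hypothetical.

```python
from datetime import date, timedelta


def seeding_from_cases(cases, delay_incidC=5, ratio_incidC=10):
    """Turn (confirmation date, geoid, confirmed cases) into seeding records.

    Assumed interpretation: infections occurred delay_incidC days before
    confirmation, with ratio_incidC infections per confirmed case.
    """
    return [
        (day - timedelta(days=delay_incidC), geoid, count * ratio_incidC)
        for day, geoid, count in cases
    ]


# Hypothetical record: 2 confirmed cases in geoid 10001 on 2020-03-06.
cases = [(date(2020, 3, 6), "10001", 2)]
seeds = seeding_from_cases(cases)
```

With the defaults, the two confirmed cases become 20 seeded infections dated five days earlier.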
seir section
Config Item | Required? | Type/Format | Description |
---|---|---|---|
parameters::alpha | optional | fraction | Transmission dampening parameter; Default is 1.0 and reasonable values for respiratory viruses range from 0.88-0.99 |
parameters::sigma | required | fraction or probability | Inverse of the incubation period in days |
parameters::gamma | required | distribution | Inverse of the infectious period in days |
parameters::R0s | required | distribution | Basic reproduction number |
seir:
  parameters:
    alpha: 0.5
    sigma: 1 / 5.2
    gamma:
      distribution: uniform
      low: 1 / 6
      high: 1 / 2.6
    R0s:
      distribution: uniform
      low: 3.5
      high: 4
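Several example values on this page are written as arithmetic expressions (for instance `1 / 5.2`, `4*7`, and `log(5.89)`). The sketch below shows one way such strings could be evaluated safely with the standard library. It is an illustration under assumptions: the `evaluate` helper is hypothetical, it supports only the operators listed in the code, and the pipeline's own parser may behave differently.

```python
import ast
import math
import operator

# Arithmetic operators this sketch accepts.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node) -> float:
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return float(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id == "log" and len(node.args) == 1
            and not node.keywords):
        return math.log(_eval(node.args[0]))  # natural logarithm
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def evaluate(value) -> float:
    """Evaluate a numeric config value given as a number or expression string."""
    if isinstance(value, (int, float)):
        return float(value)
    return _eval(ast.parse(str(value), mode="eval").body)


sigma = evaluate("1 / 5.2")   # inverse of the incubation period
incubation_days = 1 / sigma   # recovers the 5.2 days in the expression
four_weeks = evaluate("4*7")
```

Anything other than numbers, `+ - * /`, unary minus, and `log(...)` is rejected with `ValueError` rather than executed.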
interventions section
This section lets you specify custom intervention scenarios.
scenarios specifies which settings to run. This does not need to include all items defined in settings.
Config Item | Required? | Type/Format | Description |
---|---|---|---|
scripts_path | required | path name | |
scenarios | required | list of strings for scenario names | |
settings | required | See section below | |
interventions::settings::[setting_name]
Each string in scenarios should have a corresponding named setting in settings.
Right now, there are three types of templates: Reduce, ReduceR0 and Stacked. The ReduceR0 template is a special and redundant case of the Reduce template for the r0 parameter. The Stacked template allows you to combine multiple interventions (Reduce or Stacked) together into a single intervention scenario.
Item | Required? | Type/Format | Description |
---|---|---|---|
template | required | "Reduce", "ReduceR0" or "Stacked" | |
parameter | required for Reduce | "alpha", "r0", "gamma", "sigma" | Specify the parameter associated with the intervention reduction (alpha = mixing coefficient, r0 = basic reproductive number, gamma = inverse of the infectious period, sigma = inverse of the incubation period) |
period_start_date | optional for Reduce, ReduceR0 | date between global start_date and end_date; default is global start_date | |
period_end_date | optional for Reduce, ReduceR0 | date between global start_date and end_date; default is global end_date | |
value | required for Reduce, ReduceR0 | distribution | |
affected_geoids | optional for Reduce, ReduceR0 | list of geoids, which must be in geodata | |
fatigue_rate::distribution | optional for Reduce, ReduceR0 | distribution | Indicates the rate of intervention fatigue |
fatigue_frequency_days | optional for Reduce, ReduceR0 | numeric | Number of days it takes for the NPI to reach the new value |
fatigue_min | optional for Reduce, ReduceR0 | numeric | Minimum intervention effectiveness for fatiguing interventions; values below this minimum are clipped |
fatigue_type | optional for Reduce, ReduceR0 | "geometric" | If specified, produce a geometric fatiguing rate instead of a linear fatiguing rate |
interventions:
  scenarios:
    - None
    - Scenario1
    - Scenario2
  settings:
    None:
      template: ReduceR0
      period_start_date: 2020-04-01
      period_end_date: 2020-05-15
      value:
        distribution: fixed
        value: 0
    Wuhan:
      template: Reduce
      parameter: r0
      period_start_date: 2020-04-01
      period_end_date: 2020-05-15
      value:
        distribution: uniform
        low: .81
        high: .89
    UK:
      template: ReduceR0
      period_start_date: 2020-05-16
      period_end_date: 2020-05-31
      value:
        distribution: uniform
        low: .71
        high: .83
    Scenario2:
      template: Reduce
      parameter: r0 # Parameter to reduce
      period_start_date: 2020-02-01
      period_end_date: 2020-05-15
      value:
        distribution: uniform # Value of reduction as it was specified before.
        low: .6
        high: .7
      fatigue_rate:
        distribution: uniform # Value of fatigue
        low: .1
        high: .2
      fatigue_frequency_days: 4*7 # Number of days for the NPI to reach a new value
      fatigue_min: .2 # 0 if unspecified; clip when the NPI reaches this value.
      #fatigue_type: geometric # if present, produce geometric fatigue (so reduction of reduction)
    Scenario1:
      template: Stacked
      scenarios:
        - Wuhan
        - UK
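To illustrate how scenarios, settings, and the Stacked template relate, here is a minimal sketch that expands each scenario into the Reduce/ReduceR0 settings it ultimately applies. The function `expand` is hypothetical and the dictionaries are a trimmed-down copy of the example above; the pipeline's own resolution logic may differ.

```python
def expand(name: str, settings: dict, _seen: tuple = ()) -> list[str]:
    """Return the non-Stacked settings that a scenario name resolves to."""
    if name not in settings:
        raise KeyError(f"scenario {name!r} has no entry in settings")
    if name in _seen:
        raise ValueError(f"circular Stacked reference at {name!r}")
    setting = settings[name]
    if setting["template"] != "Stacked":
        return [name]
    leaves: list[str] = []
    # A Stacked setting lists other settings, which may themselves be Stacked.
    for child in setting["scenarios"]:
        leaves.extend(expand(child, settings, _seen + (name,)))
    return leaves


# Trimmed-down version of the example settings (dates and values omitted).
settings = {
    "None": {"template": "ReduceR0"},
    "Wuhan": {"template": "Reduce", "parameter": "r0"},
    "UK": {"template": "ReduceR0"},
    "Scenario2": {"template": "Reduce", "parameter": "r0"},
    "Scenario1": {"template": "Stacked", "scenarios": ["Wuhan", "UK"]},
}
scenarios = ["None", "Scenario1", "Scenario2"]

plan = {name: expand(name, settings) for name in scenarios}
```

Here Scenario1 resolves to Wuhan followed by UK, while Wuhan and UK are never run on their own because they are not listed in scenarios.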
hospitalization section
There are two modules for the calculation of health outcomes. One module enables location-specific health outcome risks (e.g., accounting for differences in age distribution between locations) by using the risk of various health outcomes relative to a national average. A second module specifies un-adjusted, population-wide health outcome risks and requires slightly different parameters than the location-specific calculation. The location-specific calculation is preferred.
Location-specific hospitalization calculations
A location-specific age-adjustment requires the existence of a "geoid params" file.
- For county-level models in the US, this file is already provided in COVIDScenarioPipeline/sample_data/geoid-params.csv and you need only to set hospitalization::paths::run_age_adjust to TRUE.
- For models outside of the US, you will need to create this geoid params file (see the Getting Started page for Non US locations under "Calculate age-specific outcomes parameters for each district"). You will also need to specify the path to this file under spatial_setup::geoid_params_file (see above) and set hospitalization::paths::run_age_adjust to TRUE.
Config Item | Required? | Type/Format | Description |
---|---|---|---|
paths::output_path | required | path to folder | |
paths::run_age_adjust | required | boolean | |
parameters::time_hosp | required | Two numbers (log median, log sd) | time from symptom onset to hospitalization admission (in days) |
parameters::time_disch | required | Two numbers (log median, log sd) | time from hospitalization to hospital discharge |
parameters::time_ICU | required | Two numbers (log median, log sd) | time from hospital admission to ICU admission |
parameters::time_ICUdur | required | Two numbers (log median, log sd) | time spent in ICU |
parameters::time_vent | required | Two numbers (log median, log sd) | time from ICU admission to ventilator use |
parameters::time_ventdur | required | Two numbers (log median, log sd) | time spent on ventilator |
parameters::p_death | required | probability | probability of death given infection |
parameters::p_death_names | required | list of strings | names for the p_death values, in the same order |
parameters::p_hosp_inf | required | probability | probability of hospitalization given infection |
parameters::time_onset_death | required | Two numbers (log median, log sd) | time from symptom onset to death |
hospitalization:
  paths:
    output_path: hospitalization
    run_age_adjust: TRUE
  parameters:
    time_hosp: [log(7), 0.3]
    time_disch: [log(11.5), log(1.22)]
    time_ICU: [log(3), 0.3]
    time_ICUdur: [log(8), 0.2]
    time_ventdur: [log(7), 0.2]
    time_vent: [log(1), 0.4]
    p_death: [.0025, .005, .01]
    p_death_names: ["low","med","high"]
    p_hosp_inf: [0.025, 0.05, 0.1]
    time_onset_death: [2.84, 0.52]
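The "Two numbers (log median, log sd)" format can be read as the parameters of a log-normal delay distribution. The sketch below draws delays under that assumption using the standard library; `draw_delay` is a hypothetical helper, and the pipeline's own sampling code is not shown on this page.

```python
import math
import random
import statistics


def draw_delay(params, rng: random.Random) -> float:
    """Draw one delay (in days) from a (log median, log sd) pair."""
    log_median, log_sd = params
    # lognormvariate(mu, sigma): the median of the draws is exp(mu).
    return rng.lognormvariate(log_median, log_sd)


rng = random.Random(42)          # seeded only to make the example repeatable
time_hosp = (math.log(7), 0.3)   # same values as time_hosp above
draws = [draw_delay(time_hosp, rng) for _ in range(20000)]
median = statistics.median(draws)
```

With `time_hosp: [log(7), 0.3]`, the sample median comes out close to 7 days, which is why the first number is conveniently written as `log(7)`.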
Un-adjusted population-wide config options for hospitalization calculations
Config Item | Required? | Type/Format | Description |
---|---|---|---|
paths::output_path | required | path to folder | |
parameters::time_hosp | required | Two numbers (log mean, log sd) | time from symptom onset to hospitalization admission (in days) |
parameters::time_disch | required | Two numbers (log mean, log sd) | time from hospitalization to hospital discharge |
parameters::time_death | required | Two numbers (log mean, log sd) | time from hospitalization to death |
parameters::time_ICU | required | Two numbers (log mean, log sd) | time from hospital admission to ICU admission |
parameters::time_ICUdur | required | Two numbers (log mean, log sd) | time spent in ICU |
parameters::time_vent | required | Two numbers (log mean, log sd) | time from ICU admission to ventilator use |
parameters::p_death | required | probability | probability of death given infection |
parameters::p_death_names | required | list of strings | names for the p_death values, in the same order |
parameters::p_death_rate | required | probability | probability of death given hospitalization (single value only) |
parameters::p_ICU | required | probability | probability of ICU admission given hospitalization |
parameters::p_vent | required | probability | probability of ventilation given ICU admission |
hospitalization:
  paths:
    output_path: hospitalization
  parameters:
    time_hosp: [1.23, 0.79]
    time_disch: [log(11.5), log(1.22)]
    time_death: [log(11.25), log(1.15)]
    time_ICU: [log(8.25), log(2.2)]
    time_ICUdur: [log(16), log(2.96)]
    time_vent: [log(10.5), log((10.5-8)/1.35)]
    p_death: [.0025, .005, .01]
    p_death_names: ["low","med","high"]
    p_death_rate: 0.1
    p_ICU: 0.32
    p_vent: 0.15
report section
The report section is completely optional and provides settings for making an R Markdown report. For an example of a report, see the Supplementary Material of our preprint.
If you wish to include it, here are the options.
Config Item | Required? | Type/Format | Description |
---|---|---|---|
data_settings::pop_year | | integer | |
plot_settings::plot_intervention | | boolean | |
formatting::scenario_labels_short | | list of strings; one for each scenario in interventions::scenarios | |
formatting::scenario_labels | | list of strings; one for each scenario in interventions::scenarios | |
formatting::scenario_colors | | list of strings; one for each scenario in interventions::scenarios | |
formatting::pdeath_labels | | list of strings | |
formatting::display_dates | | list of dates | |
formatting::display_dates2 | optional | list of dates | a 2nd string of display dates that can optionally be supplied to specific report functions |
report:
  data_settings:
    pop_year: 2018
  plot_settings:
    plot_intervention: TRUE
  formatting:
    scenario_labels_short: ["UC", "S1"]
    scenario_labels:
      - Uncontrolled
      - Scenario 1
    scenario_colors: ["#D95F02", "#1B9E77"]
    pdeath_labels: ["0.25% IFR", "0.5% IFR", "1% IFR"]
    display_dates: ["2020-04-15", "2020-05-01", "2020-05-15", "2020-06-01", "2020-06-15"]
    display_dates2: ["2020-04-15", "2020-05-15", "2020-06-15"]
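Because the scenario label and color lists need one entry per scenario in interventions::scenarios, a quick consistency check can catch mismatches before the report is built. This is a minimal sketch; `check_report_labels` is a hypothetical helper and not part of the report package.

```python
def check_report_labels(scenarios: list[str], formatting: dict) -> list[str]:
    """Return a message for each formatting list whose length is wrong."""
    problems = []
    for key in ("scenario_labels_short", "scenario_labels", "scenario_colors"):
        found = len(formatting.get(key, []))
        if found != len(scenarios):
            problems.append(
                f"{key}: expected {len(scenarios)} entries, found {found}"
            )
    return problems


# Formatting values copied from the report example above.
formatting = {
    "scenario_labels_short": ["UC", "S1"],
    "scenario_labels": ["Uncontrolled", "Scenario 1"],
    "scenario_colors": ["#D95F02", "#1B9E77"],
}

ok = check_report_labels(["None", "Scenario1"], formatting)
mismatch = check_report_labels(["None", "Scenario1", "Scenario2"], formatting)
```

The report example above supplies two labels, so it pairs with a two-scenario run; combined with the three scenarios from the interventions example, the check reports a mismatch for each list.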