Config File Options - HopkinsIDD/COVID19_Minimal GitHub Wiki

We recommend reviewing the Getting Started pages before reading this page on the detailed configuration file options. For an example of a full configuration file, see config.yml in this repository or the Supplementary Material of our preprint.

Overview

The model configuration file, config.yml, controls all currently available options. It is a YAML file with an indented outline structure (indent with spaces, because YAML does not allow tab characters for indentation). We refer to keys by their full position in the outline. For example, we denote

```yaml
spatial_setup:
  ...
  geodata: minimal
```

as spatial_setup::geodata having the value minimal.
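To make the `::` notation concrete, here is a minimal Python sketch that looks up a key path such as spatial_setup::geodata in a configuration that has already been loaded into nested dictionaries (for example with PyYAML's yaml.safe_load). The helper get_option is hypothetical and is not part of the pipeline.

```python
def get_option(config, key, default=None):
    """Return the value at a `section::subsection::key` path, or `default`.

    Hypothetical helper for illustration; `config` is a nested dict,
    e.g. the result of parsing config.yml with a YAML library.
    """
    node = config
    for part in key.split("::"):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


config = {"spatial_setup": {"base_path": "data/HI", "geodata": "minimal"}}
print(get_option(config, "spatial_setup::geodata"))   # minimal
print(get_option(config, "spatial_setup::mobility"))  # None
```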

Description of Types/Formats

  • date is [year]-[month]-[day] (e.g., 2020-01-31)
  • boolean is either "TRUE" or "FALSE"
  • probability is a float between 0 and 1
  • distribution is the following config structure:

| Item | Required? | Type/Format |
|---|---|---|
| distribution | required | "fixed" or "uniform" |
| value | required for "fixed" | fraction or probability |
| low | required for "uniform" | fraction or probability |
| high | required for "uniform" | fraction or probability |
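The sketch below shows one way to read a distribution block: a "fixed" distribution always yields value, while a "uniform" distribution yields a draw between low and high. The function draw_value is hypothetical, and it assumes the entries are already numbers; expressions such as 1 / 6, which appear in later examples, would need to be evaluated first.

```python
import random


def draw_value(spec, rng=random):
    """Draw one value from a config `distribution` block (hypothetical helper).

    `spec` mirrors the YAML structure, e.g.
    {"distribution": "fixed", "value": 0.5} or
    {"distribution": "uniform", "low": 0.1, "high": 0.2}.
    """
    kind = spec["distribution"]
    if kind == "fixed":
        return float(spec["value"])
    if kind == "uniform":
        low, high = float(spec["low"]), float(spec["high"])
        if low > high:
            raise ValueError(f"low ({low}) must not exceed high ({high})")
        return rng.uniform(low, high)
    raise ValueError(f"unsupported distribution: {kind!r}")


# Usage: one draw per simulation, with a seeded generator for reproducibility
rng = random.Random(42)
r0 = draw_value({"distribution": "uniform", "low": 3.5, "high": 4}, rng)
print(3.5 <= r0 <= 4)  # True
```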

Sections

Global Header

These global configuration options typically sit at the top of the configuration file.

| Item | Required? | Type/Format | Description |
|---|---|---|---|
| file_is_unedited | must not be present | | remove this key before running |
| name | required | string | typically named after the region/location you are modeling |
| start_date | required | date | model simulation start date |
| end_date | required | date | model simulation end date |
| nsimulations | required | int | number of simulations to run |
| dt | required | float | simulation time step in days |
| dynfilter_path | optional | path to file | path to filtering text file |
| report_location_name | optional | string | |

```yaml
name: Hawaii
start_date: 2020-01-31
end_date: 2020-12-31
nsimulations: 1000
dt: 0.25
report_location_name: Hawaii
```
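A minimal Python sketch of the checks this table implies: file_is_unedited must be gone and the required keys must be present. The ordering check on the dates and the positivity checks on dt and nsimulations are sanity checks we assume, not documented rules, and check_global_header is a hypothetical name.

```python
from datetime import date

REQUIRED_KEYS = ("name", "start_date", "end_date", "nsimulations", "dt")


def check_global_header(cfg):
    """Return a list of problems in the global header (empty list = looks OK).

    Hypothetical checker; `cfg` is the parsed top level of config.yml.
    """
    problems = []
    if "file_is_unedited" in cfg:
        problems.append("remove the file_is_unedited key")
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    problems.extend(f"missing required key: {key}" for key in missing)
    if missing:
        return problems
    # YAML parsers may return datetime.date objects; str() handles both cases.
    start = date.fromisoformat(str(cfg["start_date"]))
    end = date.fromisoformat(str(cfg["end_date"]))
    if end <= start:
        problems.append("end_date must be after start_date")
    if float(cfg["dt"]) <= 0:
        problems.append("dt (time step in days) must be positive")
    if int(cfg["nsimulations"]) < 1:
        problems.append("nsimulations must be at least 1")
    return problems


hawaii = {"name": "Hawaii", "start_date": "2020-01-31", "end_date": "2020-12-31",
          "nsimulations": 1000, "dt": 0.25, "report_location_name": "Hawaii"}
print(check_global_header(hawaii))  # []
```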

spatial_setup section

| Config Item | Required? | Type/Format | Description |
|---|---|---|---|
| base_path | required | path to folder | base path for spatial files |
| setup_name | required | string | spatial folder name |
| geodata | required | path to file | relative to base_path |
| mobility | required | path to file | relative to base_path |
| popnodes | required | string | name of the population column in geodata |
| nodenames | required | string | name of the location nodes column in geodata |
| census_year | optional | integer (year) | |
| modeled_states | optional | list of location codes | vector of locations that will be modeled |
| include_in_report | optional | string | name of a boolean column in geodata |
| shapefile_name | optional | path to file | relative to base_path |
| shapefile | optional | path to file | relative to base_path; identical to shapefile_name |
| nonUS_mobility_setup | required for non-US locations | path to file | relative to base_path |
| nonUS_pop_setup | required for non-US locations | path to file | relative to base_path |
| geoid_params_file | required for non-US locations if running the age-specific hospitalization adjustment | path to file | file with geoid-specific relative risks of health outcomes |

```yaml
spatial_setup:
  base_path: data/HI
  setup_name: HI
  geodata: geodata.csv
  mobility: mobility.txt
  popnodes: population
  nodenames: geoid
  include_in_report: include_in_report
  modeled_states:
    - HI
  census_year: 2010
  shapefile: shp/counties_2010_HI.shp
  shapefile_name: shp/counties_2010_HI.shp
```

geodata file

  • geodata is a .csv with column headers, with at least two columns: nodenames and popnodes.
  • nodenames is the name of a column in geodata that specifies the geo IDs of an area. Values in this column must be unique.
  • popnodes is the name of a column in geodata that specifies the population of the nodenames column.
  • include_in_report is the name of an optional column in geodata that specifies which nodenames are included in the report. Models may include more locations than simply the location of interest.

Example geodata file format

```csv
geoid,population,include_in_report
10001,1000,TRUE
20002,2000,FALSE
```
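The rules above (both named columns present, unique node IDs) can be checked with Python's csv module. This is a minimal sketch; read_geodata is a hypothetical helper, its default column names are taken from the example, and it ignores include_in_report.

```python
import csv
import io
from collections import Counter


def read_geodata(text, nodenames="geoid", popnodes="population"):
    """Parse geodata CSV text into {node id: population}.

    Hypothetical helper; `nodenames` and `popnodes` are the column names
    given by spatial_setup::nodenames and spatial_setup::popnodes.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    for column in (nodenames, popnodes):
        if column not in columns:
            raise ValueError(f"geodata is missing column {column!r}")
    rows = list(reader)
    counts = Counter(row[nodenames] for row in rows)
    duplicates = sorted(node for node, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"{nodenames} values must be unique; repeated: {duplicates}")
    # Geo IDs stay strings so that leading zeros are not lost.
    return {row[nodenames]: int(row[popnodes]) for row in rows}


example = "geoid,population,include_in_report\n10001,1000,TRUE\n20002,2000,FALSE\n"
print(read_geodata(example))  # {'10001': 1000, '20002': 2000}
```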

mobility file

The mobility file is a .csv file (its file name must end in .csv) containing comma-separated values in long format. The columns must be named ori, dest, and amount, where amount is the number of individuals traveling from location ori to location dest. Origin-destination pairs that are not listed are assumed to be zero. The values in ori and dest must exactly match the values in the nodenames column of geodata.csv.

Example mobility file format

```csv
ori,dest,amount
10001,20002,3
20002,10001,3
```
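A minimal Python sketch of reading the long format: it checks the three column names, checks every ori and dest against the geodata node IDs, and treats unlisted pairs as zero via dict.get. The helper read_mobility is hypothetical. Because its header check is strict, a header written as "ori, dest, amount" (with spaces) would be rejected by this sketch, since Python's csv module keeps the leading spaces in the column names.

```python
import csv
import io


def read_mobility(text, nodenames):
    """Parse long-format mobility CSV text into {(ori, dest): amount}.

    Hypothetical helper; `nodenames` lists the IDs from the geodata
    nodenames column. Pairs absent from the file are treated as zero.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = {"ori", "dest", "amount"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"mobility file is missing columns: {sorted(missing)}")
    known = set(nodenames)
    flows = {}
    for row in reader:
        for node in (row["ori"], row["dest"]):
            if node not in known:
                raise ValueError(f"{node!r} does not match any geodata node")
        flows[(row["ori"], row["dest"])] = float(row["amount"])
    return flows


mobility = read_mobility("ori,dest,amount\n10001,20002,3\n20002,10001,3\n",
                         ["10001", "20002"])
print(mobility.get(("10001", "20002"), 0.0))  # 3.0
print(mobility.get(("10001", "10001"), 0.0))  # 0.0 (unlisted pair)
```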

It is also possible, but NOT RECOMMENDED, to specify the mobility file as a .txt file of space-separated values arranged as a matrix. This matrix is symmetric and of size K x K, where K is the number of rows in geodata:

```
0 3
3 0
```
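Since the long format is the recommended one, an existing matrix file can be converted. This Python sketch is hypothetical (matrix_to_long is not a pipeline function) and assumes that row i and column j of the matrix correspond to the i-th and j-th rows of geodata, with rows as origins and columns as destinations. Zero entries are dropped because unlisted pairs are treated as zero anyway.

```python
def matrix_to_long(matrix_text, nodenames):
    """Convert a K x K space-separated mobility matrix to long-format CSV text.

    Assumes row i / column j match the i-th / j-th rows of geodata,
    rows = origins, columns = destinations. Zero entries are omitted.
    """
    rows = [line.split() for line in matrix_text.strip().splitlines()]
    k = len(nodenames)
    if len(rows) != k or any(len(row) != k for row in rows):
        raise ValueError(f"expected a {k} x {k} matrix to match geodata")
    lines = ["ori,dest,amount"]
    for i, ori in enumerate(nodenames):
        for j, dest in enumerate(nodenames):
            if float(rows[i][j]) != 0:
                lines.append(f"{ori},{dest},{rows[i][j]}")
    return "\n".join(lines) + "\n"


print(matrix_to_long("0 3\n3 0\n", ["10001", "20002"]), end="")
# ori,dest,amount
# 10001,20002,3
# 20002,10001,3
```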

importation section (optional)

This section is optional. It is used by the covidImportation package to import global air importation data for seeding infections into the United States.

If you wish to include it, here are the options.

| Config Item | Required? | Type/Format | Description |
|---|---|---|---|
| census_api_key | required | string | census API key (get an API key if you do not have one) |
| travel_dispersion | required | number | how dispersed the daily travel data is; default = 3 |
| maximum_destinations | required | integer | number of airports to limit importation to |
| dest_type | required | categorical | location type |
| dest_country | required | string (country) | ISO3 code for the country of importation; currently only USA is supported |
| aggregate_to | required | categorical | location type to aggregate to |
| cache_work | required | boolean | whether to save case data |
| update_case_data | required | boolean | deprecated; whether to update the case data or use saved data |
| draw_travel_from_distribution | required | boolean | whether to add additional stochasticity to travel data; default is FALSE |
| print_progress | required | boolean | whether to print the progress of importation model simulations |
| travelers_threshold | required | integer | include only airports whose mean daily number of travelers is at least travelers_threshold |
| airport_cluster_distance | required | numeric | cluster airports within airport_cluster_distance km |
| param_list | required | see below | see importation::param_list below |

importation::param_list

| Config Item | Required? | Type/Format | Description |
|---|---|---|---|
| incub_mean_log | required | numeric | incubation period, log mean |
| incub_sd_log | required | numeric | incubation period, log standard deviation |
| inf_period_nohosp_mean | required | numeric | infectious period, non-hospitalized, mean |
| inf_period_nohosp_sd | required | numeric | infectious period, non-hospitalized, sd |
| inf_period_hosp_mean_log | required | numeric | infectious period, hospitalized, log-normal mean |
| inf_period_hosp_sd_log | required | numeric | infectious period, hospitalized, log-normal sd |
| p_report_source | required | numeric | reporting probabilities, for Hubei and for elsewhere |
| shift_incid_days | required | numeric | mean delay from infection to reporting of cases; default = -10 |
| delta | required | numeric | days per estimation period |

```yaml
importation:
  census_api_key: "fakeapikey00000"
  travel_dispersion: 3
  maximum_destinations: Inf
  dest_type: state
  dest_country: USA
  aggregate_to: airport
  cache_work: TRUE
  update_case_data: TRUE
  draw_travel_from_distribution: FALSE
  print_progress: FALSE
  travelers_threshold: 10000
  airport_cluster_distance: 80
  param_list:
    incub_mean_log: log(5.89)
    incub_sd_log: log(1.74)
    inf_period_nohosp_mean: 15
    inf_period_nohosp_sd: 5
    inf_period_hosp_mean_log: 1.23
    inf_period_hosp_sd_log: 0.79
    p_report_source: [0.05, 0.25]
    shift_incid_days: -10
    delta: 1
```
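To see what the log-scale incubation parameters imply in days, here is a small Python sketch. It assumes incub_mean_log and incub_sd_log are the meanlog and sdlog of a log-normal distribution (as the log(...) values in the example suggest); the example writes them as R-style expressions, so the sketch takes already-evaluated numbers. For a log-normal, the median is exp(meanlog) and the mean is exp(meanlog + sdlog^2 / 2). The helper lognormal_summary is hypothetical.

```python
import math


def lognormal_summary(meanlog, sdlog):
    """Median and mean of a log-normal distribution with log-scale parameters."""
    median = math.exp(meanlog)
    mean = math.exp(meanlog + sdlog ** 2 / 2)
    return median, mean


# Values from the example, already evaluated: log(5.89) and log(1.74)
median, mean = lognormal_summary(math.log(5.89), math.log(1.74))
print(f"median incubation: {median:.2f} days")  # median incubation: 5.89 days
print(f"mean incubation:   {mean:.2f} days")
```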

seeding section

There are two seeding methods: 1) based on air importation (FolderDraw) and 2) based on the earliest identified cases (PoissonDistributed).

Use FolderDraw if the importation section is present; it requires folder_path. Otherwise, use PoissonDistributed, which requires lambda_file.

| Config Item | Required? | Type/Format | Description |
|---|---|---|---|
| method | required | "FolderDraw" or "PoissonDistributed" | |
| folder_path | required for FolderDraw | path to folder | |
| lambda_file | required for PoissonDistributed | path to file | |
| delay_incidC | optional for PoissonDistributed | numeric | assumed number of days between infection and case confirmation when seeding with the PoissonDistributed method; default is 5 days |
| ratio_incidC | optional for PoissonDistributed | numeric | assumed ratio of infections to confirmed cases when seeding with the PoissonDistributed method; default is 10 infections per confirmed case |
| casedata_file | required for non-US locations | path to file | data file from which the seeding setup file will be created |

If using the importation section of the config and the air importation model:

```yaml
seeding:
  method: FolderDraw
  folder_path: importation/HI/
```

or if seeding according to the earliest identified cases:

```yaml
seeding:
  method: PoissonDistributed
  lambda_file: data/HI/seeding.csv
  delay_incidC: 5
  ratio_incidC: 10
```
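One plausible reading of delay_incidC and ratio_incidC, sketched in Python: each early confirmed case count is moved back by delay_incidC days to approximate the infection date, multiplied by ratio_incidC to get an expected number of infections, and a Poisson draw with that mean gives the seeded infections. This is a hypothetical illustration, not the pipeline's code; it does not read lambda_file (whose layout is not documented here), and seed_infections, poisson, and the geoid "15003" are made up for the example.

```python
import math
import random
from datetime import date, timedelta


def poisson(lam, rng):
    """Draw from Poisson(lam) with Knuth's method (suitable for modest lam only)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def seed_infections(confirmed, delay_incidC=5, ratio_incidC=10, rng=None):
    """Turn early confirmed cases into random seeded infections (hypothetical).

    confirmed: {(geoid, confirmation_date): number of confirmed cases}
    Returns {(geoid, infection_date): number of seeded infections}.
    """
    rng = rng or random.Random()
    seeds = {}
    for (geoid, confirmed_on), cases in confirmed.items():
        infected_on = confirmed_on - timedelta(days=delay_incidC)
        seeds[(geoid, infected_on)] = poisson(cases * ratio_incidC, rng)
    return seeds


rng = random.Random(2020)
print(seed_infections({("15003", date(2020, 3, 10)): 2}, rng=rng))
```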

seir section

| Config Item | Required? | Type/Format | Description |
|---|---|---|---|
| parameters::alpha | optional | fraction | transmission dampening parameter; default is 1.0, and reasonable values for respiratory viruses range from 0.88 to 0.99 |
| parameters::sigma | required | fraction or probability | inverse of the incubation period (in days) |
| parameters::gamma | required | distribution | inverse of the infectious period (in days) |
| parameters::R0s | required | distribution | basic reproduction number |

```yaml
seir:
  parameters:
    alpha: 0.5
    sigma: 1 / 5.2
    gamma:
      distribution: uniform
      low: 1 / 6
      high: 1 / 2.6
    R0s:
      distribution: uniform
      low: 3.5
      high: 4
```
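Because sigma and gamma are rates, their reciprocals are periods in days: sigma = 1 / 5.2 means a 5.2-day incubation period, and the gamma range corresponds to infectious periods between 2.6 and 6 days. The Python sketch below draws one parameter set from the example and, assuming the textbook SEIR relation R0 = beta / gamma (with alpha = 1 and no mobility), derives a transmission rate beta. The helper draw_seir_rates is hypothetical, and whether the pipeline derives beta this way is an assumption.

```python
import random


def draw_seir_rates(rng):
    """One draw of the SEIR parameters from the example above (hypothetical helper)."""
    sigma = 1 / 5.2                      # fixed rate: 5.2-day incubation period
    gamma = rng.uniform(1 / 6, 1 / 2.6)  # infectious period between 2.6 and 6 days
    r0 = rng.uniform(3.5, 4)
    beta = r0 * gamma                    # textbook SEIR: R0 = beta / gamma (assumed)
    return {"sigma": sigma, "gamma": gamma, "R0": r0, "beta": beta}


rates = draw_seir_rates(random.Random(0))
print(f"incubation period: {1 / rates['sigma']:.1f} days")  # incubation period: 5.2 days
print(f"infectious period: {1 / rates['gamma']:.1f} days")
```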

interventions section

This section lets you specify custom intervention scenarios.

scenarios specifies which settings to run. This does not need to include all items defined in settings.

| Config Item | Required? | Type/Format | Description |
|---|---|---|---|
| scripts_path | required | path name | |
| scenarios | required | list of strings | scenario names |
| settings | required | see below | |

interventions::settings::[setting_name]

Each string in scenarios should have a corresponding named setting in settings.

Currently, there are three templates: Reduce, ReduceR0, and Stacked. The ReduceR0 template is a special, redundant case of the Reduce template for the r0 parameter. The Stacked template combines multiple interventions (Reduce, ReduceR0, or Stacked) into a single intervention scenario.
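A minimal Python sketch of how reductions might combine, under two assumptions we have not confirmed in the pipeline code: a Reduce or ReduceR0 setting scales its parameter by (1 - value) between period_start_date and period_end_date and leaves it unchanged outside that window, and Stacked combines reductions multiplicatively as 1 - prod(1 - r_i). This reading is consistent with the None scenario in the example below, whose fixed value of 0 leaves R0 unchanged. The function names and the single drawn values (0.85, 0.77) are hypothetical.

```python
from datetime import date


def reduction_on(day, setting):
    """Reduction (0 to 1) that a single Reduce/ReduceR0 setting applies on `day`."""
    if setting["period_start_date"] <= day <= setting["period_end_date"]:
        return setting["value"]
    return 0.0


def stacked_reduction(day, settings):
    """Combine reductions multiplicatively: 1 - prod(1 - r_i)."""
    remaining = 1.0
    for setting in settings:
        remaining *= 1.0 - reduction_on(day, setting)
    return 1.0 - remaining


# Single draws standing in for the Wuhan and UK `value` distributions
wuhan = {"period_start_date": date(2020, 4, 1), "period_end_date": date(2020, 5, 15), "value": 0.85}
uk = {"period_start_date": date(2020, 5, 16), "period_end_date": date(2020, 5, 31), "value": 0.77}

r0 = 3.8
for day in (date(2020, 3, 15), date(2020, 4, 15), date(2020, 5, 20)):
    print(day, round(r0 * (1 - stacked_reduction(day, [wuhan, uk])), 3))
# 2020-03-15 3.8
# 2020-04-15 0.57
# 2020-05-20 0.874
```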

| Item | Required? | Type/Format | Description |
|---|---|---|---|
| template | required | "Reduce", "ReduceR0" or "Stacked" | |
| parameter | required for Reduce | "alpha", "r0", "gamma" or "sigma" | the parameter the intervention reduces (alpha = mixing coefficient, r0 = basic reproductive number, gamma = inverse of the infectious period, sigma = inverse of the incubation period) |
| period_start_date | optional for Reduce, ReduceR0 | date | between global start_date and end_date; default is global start_date |
| period_end_date | optional for Reduce, ReduceR0 | date | between global start_date and end_date; default is global end_date |
| value | required for Reduce, ReduceR0 | distribution | |
| affected_geoids | optional for Reduce, ReduceR0 | list of geoids | geoids must be in geodata |
| fatigue_rate::distribution | optional for Reduce, ReduceR0 | distribution | rate of intervention fatigue |
| fatigue_frequency_days | optional for Reduce, ReduceR0 | numeric | number of days it takes for the NPI to reach the new value |
| fatigue_min | optional for Reduce, ReduceR0 | numeric | minimum intervention effectiveness for fatiguing interventions; values below this minimum are clipped |
| fatigue_type | optional for Reduce, ReduceR0 | "geometric" | if specified, produce a geometric fatiguing rate instead of a linear one |
| scenarios | required for Stacked | list of setting names | settings to combine into one scenario |

```yaml
interventions:
  scenarios:
    - None
    - Scenario1
    - Scenario2
  settings:
    None:
      template: ReduceR0
      period_start_date: 2020-04-01
      period_end_date: 2020-05-15
      value:
        distribution: fixed
        value: 0
    Wuhan:
      template: Reduce
      parameter: r0
      period_start_date: 2020-04-01
      period_end_date: 2020-05-15
      value:
        distribution: uniform
        low: .81
        high: .89
    UK:
      template: ReduceR0
      period_start_date: 2020-05-16
      period_end_date: 2020-05-31
      value:
        distribution: uniform
        low: .71
        high: .83
    Scenario2:
      template: Reduce
      parameter: r0                      # Parameter to reduce
      period_start_date: 2020-02-01
      period_end_date: 2020-05-15
      value:
        distribution: uniform            # Value of the reduction, specified as in the settings above
        low: .6
        high: .7
      fatigue_rate:
        distribution: uniform            # Value of fatigue
        low: .1
        high: .2
      fatigue_frequency_days: 4*7        # Number of days for the NPI to reach a new value
      fatigue_min: .2                    # 0 if unspecified; clip when the NPI reaches this value
      #fatigue_type: geometric           # if present, produce geometric fatigue (a reduction of the reduction)
    Scenario1:
      template: Stacked
      scenarios:
        - Wuhan
        - UK
```
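The consistency rules stated above (every name in scenarios needs a matching entry in settings, Stacked settings must refer to defined settings, Reduce needs a parameter, Reduce and ReduceR0 need a value) can be checked before a run. This Python sketch is a hypothetical checker, not part of the pipeline; the example dict mirrors the YAML above with the dates and fatigue options omitted.

```python
PARAMETERS = {"alpha", "r0", "gamma", "sigma"}


def check_interventions(interventions):
    """Return problems found in an interventions section (hypothetical checker)."""
    settings = interventions["settings"]
    problems = [f"scenario {name!r} has no entry in settings"
                for name in interventions["scenarios"] if name not in settings]
    for name, setting in settings.items():
        template = setting.get("template")
        if template == "Stacked":
            children = setting.get("scenarios") or []
            if not children:
                problems.append(f"{name!r} needs a scenarios list")
            for child in children:
                if child not in settings:
                    problems.append(f"{name!r} stacks unknown setting {child!r}")
        elif template in ("Reduce", "ReduceR0"):
            if "value" not in setting:
                problems.append(f"{name!r} needs a value distribution")
            if template == "Reduce" and setting.get("parameter") not in PARAMETERS:
                problems.append(f"{name!r} needs parameter: one of {sorted(PARAMETERS)}")
        else:
            problems.append(f"{name!r} has unknown template {template!r}")
    return problems


uniform = lambda low, high: {"distribution": "uniform", "low": low, "high": high}
example = {
    "scenarios": ["None", "Scenario1", "Scenario2"],
    "settings": {
        "None": {"template": "ReduceR0", "value": {"distribution": "fixed", "value": 0}},
        "Wuhan": {"template": "Reduce", "parameter": "r0", "value": uniform(0.81, 0.89)},
        "UK": {"template": "ReduceR0", "value": uniform(0.71, 0.83)},
        "Scenario2": {"template": "Reduce", "parameter": "r0", "value": uniform(0.6, 0.7)},
        "Scenario1": {"template": "Stacked", "scenarios": ["Wuhan", "UK"]},
    },
}
print(check_interventions(example))  # []
```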

hospitalization section

There are two modules for calculating health outcomes. The first, which is preferred, computes location-specific health outcome risks (e.g., accounting for differences in age distribution between locations) by using the risk of each health outcome relative to a national average. The second uses un-adjusted, population-wide health outcome risks and requires slightly different parameters than the location-specific calculation.

Location-specific hospitalization calculations

A location-specific age-adjustment requires the existence of a "geoid params" file.

  • For county-level models in the US, this file is already provided in COVIDScenarioPipeline/sample_data/geoid-params.csv, and you only need to set hospitalization::paths::run_age_adjust to TRUE.
  • For models outside the US, you will need to create this geoid params file (see the Getting Started page for Non US locations under "Calculate age-specific outcomes parameters for each district"). You will also need to specify the path to this file under spatial_setup::geoid_params_file (see above) and set hospitalization::paths::run_age_adjust to TRUE.

| Config Item | Required? | Type/Format | Description |
|---|---|---|---|
| paths::output_path | required | path to folder | |
| paths::run_age_adjust | required | boolean | |
| parameters::time_hosp | required | two numbers (log median, log sd) | time from symptom onset to hospital admission (in days) |
| parameters::time_disch | required | two numbers (log median, log sd) | time from hospitalization to hospital discharge |
| parameters::time_ICU | required | two numbers (log median, log sd) | time from hospital admission to ICU admission |
| parameters::time_ICUdur | required | two numbers (log median, log sd) | time spent in ICU |
| parameters::time_vent | required | two numbers (log median, log sd) | time from ICU admission to ventilator use |
| parameters::time_ventdur | required | two numbers (log median, log sd) | time spent on ventilator |
| parameters::p_death | required | probability | probability of death given infection |
| parameters::p_death_names | required | list of strings | one label per value in p_death |
| parameters::p_hosp_inf | required | probability | probability of hospitalization given infection |
| parameters::time_onset_death | required | two numbers (log median, log sd) | time from symptom onset to death |

```yaml
hospitalization:
  paths:
    output_path: hospitalization
    run_age_adjust: TRUE
  parameters:
    time_hosp: [log(7), 0.3]
    time_disch: [log(11.5), log(1.22)]
    time_ICU: [log(3), 0.3]
    time_ICUdur: [log(8), 0.2]
    time_ventdur: [log(7), 0.2]
    time_vent: [log(1), 0.4]
    p_death: [.0025, .005, .01]
    p_death_names: ["low","med","high"]
    p_hosp_inf: [0.025, 0.05, 0.1]
    time_onset_death: [2.84, 0.52]
```
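Because each time_* entry is given as (log median, log sd), it can be read as the meanlog and sdlog of a log-normal distribution whose median is exp(log median); time_hosp = [log(7), 0.3] therefore has a median of 7 days. The Python sketch below draws delays with the standard library's random.lognormvariate, assuming (our assumption) independent draws per case, and shows how p_death_names labels the p_death values by position. The helper sample_delays is hypothetical.

```python
import math
import random


def sample_delays(log_median, log_sd, n, rng):
    """Draw n delays (in days) from a log-normal with log-scale parameters."""
    return [rng.lognormvariate(log_median, log_sd) for _ in range(n)]


rng = random.Random(7)
time_hosp = [math.log(7), 0.3]  # symptom onset -> hospital admission
delays = sorted(sample_delays(*time_hosp, n=10_001, rng=rng))
print(f"sample median: {delays[len(delays) // 2]:.1f} days")  # about 7 days

# p_death_names label the p_death values position by position.
p_death = dict(zip(["low", "med", "high"], [0.0025, 0.005, 0.01]))
print(p_death["med"])  # 0.005
```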

Un-adjusted population-wide config options for hospitalization calculations

| Config Item | Required? | Type/Format | Description |
|---|---|---|---|
| paths::output_path | required | path to folder | |
| parameters::time_hosp | required | two numbers (log mean, log sd) | time from symptom onset to hospital admission (in days) |
| parameters::time_disch | required | two numbers (log mean, log sd) | time from hospitalization to hospital discharge |
| parameters::time_death | required | two numbers (log mean, log sd) | time from hospitalization to death |
| parameters::time_ICU | required | two numbers (log mean, log sd) | time from hospital admission to ICU admission |
| parameters::time_ICUdur | required | two numbers (log mean, log sd) | time spent in ICU |
| parameters::time_vent | required | two numbers (log mean, log sd) | time from ICU admission to ventilator use |
| parameters::p_death | required | probability | probability of death given infection |
| parameters::p_death_names | required | list of strings | one label per value in p_death |
| parameters::p_death_rate | required | probability | probability of death given hospitalization (single value only) |
| parameters::p_ICU | required | probability | probability of ICU admission given hospitalization |
| parameters::p_vent | required | probability | probability of ventilation given ICU admission |

```yaml
hospitalization:
  paths:
    output_path: hospitalization
  parameters:
    time_hosp: [1.23, 0.79]
    time_disch: [log(11.5), log(1.22)]
    time_death: [log(11.25), log(1.15)]
    time_ICU: [log(8.25), log(2.2)]
    time_ICUdur: [log(16), log(2.96)]
    time_vent: [log(10.5), log((10.5-8)/1.35)]
    p_death: [.0025, .005, .01]
    p_death_names: ["low","med","high"]
    p_death_rate: 0.1
    p_ICU: 0.32
    p_vent: 0.15
```
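The conditional probabilities in this table chain together: p_ICU applies to hospitalized cases and p_vent to ICU cases. The Python sketch below computes expected counts per 100,000 infections using the "med" p_death value from the example. Deriving the number of hospitalizations as deaths / p_death_rate assumes that every death occurs among hospitalized cases; that step is our assumption, not a documented formula, and expected_outcomes is a hypothetical name.

```python
def expected_outcomes(n_infections, p_death, p_death_rate, p_ICU, p_vent):
    """Expected counts along the infection -> hospital -> ICU -> ventilator chain.

    Assumes every death occurs among hospitalized cases, so that
    P(hosp | infection) = p_death / p_death_rate (illustrative assumption).
    """
    deaths = n_infections * p_death
    hosp = deaths / p_death_rate   # p_death_rate = P(death | hospitalization)
    icu = hosp * p_ICU             # p_ICU = P(ICU | hospitalization)
    vent = icu * p_vent            # p_vent = P(ventilation | ICU)
    return {"deaths": deaths, "hosp": hosp, "icu": icu, "vent": vent}


out = expected_outcomes(100_000, p_death=0.005, p_death_rate=0.1, p_ICU=0.32, p_vent=0.15)
print({k: round(v) for k, v in out.items()})
# {'deaths': 500, 'hosp': 5000, 'icu': 1600, 'vent': 240}
```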

report section

The report section is completely optional and provides settings for making an R Markdown report. For an example of a report, see the Supplementary Material of our preprint.

If you wish to include it, here are the options.

| Config Item | Required? | Type/Format | Description |
|---|---|---|---|
| data_settings::pop_year | | integer | |
| plot_settings::plot_intervention | | boolean | |
| formatting::scenario_labels_short | | list of strings | one for each scenario in interventions::scenarios |
| formatting::scenario_labels | | list of strings | one for each scenario in interventions::scenarios |
| formatting::scenario_colors | | list of strings | one for each scenario in interventions::scenarios |
| formatting::pdeath_labels | | list of strings | |
| formatting::display_dates | | list of dates | |
| formatting::display_dates2 | optional | list of dates | a second set of display dates that can optionally be supplied to specific report functions |

```yaml
report:
  data_settings:
    pop_year: 2018
  plot_settings:
    plot_intervention: TRUE
  formatting:
    scenario_labels_short: ["UC", "S1"]
    scenario_labels:
      - Uncontrolled
      - Scenario 1
    scenario_colors: ["#D95F02", "#1B9E77"]
    pdeath_labels: ["0.25% IFR", "0.5% IFR", "1% IFR"]
    display_dates: ["2020-04-15", "2020-05-01", "2020-05-15", "2020-06-01", "2020-06-15"]
    display_dates2: ["2020-04-15", "2020-05-15", "2020-06-15"]
```
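Since scenario_labels_short, scenario_labels, and scenario_colors each need one entry per scenario in interventions::scenarios, a quick length check catches mismatches before rendering the report. This is a hypothetical Python sketch operating on already-parsed config dicts.

```python
def check_report_formatting(report, scenarios):
    """Check that per-scenario formatting lists have one entry per scenario."""
    formatting = report["formatting"]
    problems = []
    for key in ("scenario_labels_short", "scenario_labels", "scenario_colors"):
        n = len(formatting.get(key, []))
        if n != len(scenarios):
            problems.append(f"formatting::{key} has {n} entries, expected {len(scenarios)}")
    return problems


report = {"formatting": {
    "scenario_labels_short": ["UC", "S1"],
    "scenario_labels": ["Uncontrolled", "Scenario 1"],
    "scenario_colors": ["#D95F02", "#1B9E77"],
}}
print(check_report_formatting(report, ["None", "Scenario1"]))  # []
```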