Guide to Dummy Data - dime-worldbank/Disease-Modelling-SSA GitHub Wiki

Below is a guide to each of the files in the folder Protecs/src/main/resources.

Overall input

  1. params.txt This file lists the names of all the input files alongside the variable names the model uses to refer to them. Input file names can therefore be changed here, in one place, rather than within the model code, and the references inside the model stay the same.
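
As an illustration of this indirection, the sketch below reads name-to-file pairs into a lookup table. The layout of params.txt is not described in this guide, so the one-pair-per-line, colon-separated format and the variable names (population_file, line_list_file) are assumptions made purely for the example.

```python
# Hypothetical layout: one "variable_name: file_name" pair per line.
# The real params.txt may be structured differently.
SAMPLE_PARAMS = """\
# hypothetical variable names
population_file: census_dummy.csv
line_list_file: line_list_dummy.txt
"""


def parse_params(text):
    """Map each variable name to the input file it points to."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, value = line.partition(":")
        params[name.strip()] = value.strip()
    return params


params = parse_params(SAMPLE_PARAMS)
print(params["population_file"])
```

Code that asks for params["population_file"] keeps working when the file on disk is renamed, as long as the entry in the parameter file is updated.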

Population data

  1. census_dummy.csv This file is a synthetic dataset that replicates the variables the model uses from the 2012 Zimbabwe National Census. It contains individual-level synthetic data: for each person (agent) it gives their age, sex, a household identifier (from which household sizes in the population can be derived), their district of residence, and their economic status or occupation. It also has a dummy indicator for whether the individual attends school. (NB. The fields economic_activity_location_id and manufacturing_workers are not used and can be removed.)
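
Because each row is one person, household size is not stored directly but can be recovered by counting rows per household identifier. The sketch below shows this on a few made-up rows; the column names and values are assumptions, so check them against the header of census_dummy.csv.

```python
import csv
import io
from collections import Counter

# Made-up rows with hypothetical column names, for illustration only.
SAMPLE_CENSUS = """\
person_id,age,sex,household_id,district_id,economic_status
1,34,F,h1,d_1,office worker
2,36,M,h1,d_1,unemployed
3,8,F,h1,d_1,student
4,61,M,h2,d_2,retired
"""


def household_sizes(csv_text, household_column="household_id"):
    """Count how many person rows share each household identifier."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[household_column] for row in reader)


sizes = household_sizes(SAMPLE_CENSUS)
print(dict(sizes))
```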

Infection parameters/Case data

  1. covasim_infect_transitions.txt This text file gives the relative infection probability of different age groups. For each age group, the variables are: r_sus = rate of suspected cases; p_symp = probability of symptomatic cases; p_sev = probability of severe cases; p_cri = probability of critical cases; p_dea = probability of death.

The data was taken from the model built by Kerr et al. and described in the paper "Covasim: an agent-based model of COVID-19 dynamics and interventions" http://medrxiv.org/lookup/doi/10.1101/2020.05.10.20097469

  2. line_list_dummy.txt The original line list was obtained from the Ministry of Health and records cases as they were detected at district level at the start of the pandemic. The dummy data contains a single case in district 2 (Harare), where the epidemic started.
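
To show how the age-specific probabilities in covasim_infect_transitions.txt might be applied, the sketch below walks one infected agent through the stages in order. It assumes each probability is conditional on having reached the previous stage; this guide does not state that, and the model's own logic may differ. The numbers are made up and r_sus is not used here.

```python
import random

# Made-up values for a single age group; the real values live in
# covasim_infect_transitions.txt.
EXAMPLE_GROUP = {"p_symp": 0.5, "p_sev": 0.1, "p_cri": 0.05, "p_dea": 0.3}

# Stages in the order an infection is assumed to progress.
STAGES = [
    ("p_symp", "symptomatic"),
    ("p_sev", "severe"),
    ("p_cri", "critical"),
    ("p_dea", "dead"),
]


def worst_outcome(probs, rng):
    """Return the most advanced stage reached by one infected agent.

    Assumes each probability is conditional on the previous stage,
    so the walk stops at the first failed draw.
    """
    outcome = "asymptomatic"
    for key, label in STAGES:
        if rng.random() >= probs[key]:
            break
        outcome = label
    return outcome


rng = random.Random(0)
print(worst_outcome(EXAMPLE_GROUP, rng))
```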

Mobility data

Regular movement (pre-lockdown)

  1. dailyDistrictTransitionProb_preLockdown.csv The probability of movement from each district in the model (districts numbered d_1 to d_60) to each other district before lockdowns were applied (pre-Feb 2020). The dummy data includes only 2 districts. In the original data, the movement probabilities are calculated from an Origin Destination Matrix received from the World Bank, itself derived from the Call Record Data of a mobile phone provider in Zimbabwe.

  2. econStatusMovementProb_otherday.txt The probability that a member of each economic status/occupational category leaves their home on a weekend day. At the moment these parameters are assumption based.

  3. econStatusMovementProb_weekday.txt The probability that a member of each economic status/occupational category leaves their home on a weekday (Monday to Friday). At the moment these parameters are assumption based.
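
One plausible way the three regular-movement inputs fit together is sketched below: an agent first draws whether to leave home, using the probability for their economic status and the type of day, and then draws a destination from their home district's row of the transition matrix. All values and status names are made up, and the model's actual movement logic may differ.

```python
import random

# Made-up values for two districts, mirroring the size of the dummy data.
TRANSITION_PROB = {
    "d_1": {"d_1": 0.9, "d_2": 0.1},
    "d_2": {"d_1": 0.2, "d_2": 0.8},
}

# Hypothetical status names and probabilities of leaving home.
LEAVE_HOME_PROB = {
    "weekday": {"office worker": 0.9, "retired": 0.3},
    "otherday": {"office worker": 0.4, "retired": 0.3},
}


def pick_destination(home_district, status, day_type, rng):
    """Return the district visited today, or None if the agent stays home."""
    if rng.random() >= LEAVE_HOME_PROB[day_type][status]:
        return None
    row = TRANSITION_PROB[home_district]
    districts = list(row)
    weights = [row[d] for d in districts]
    # Weighted draw over the destination districts in this row.
    return rng.choices(districts, weights=weights, k=1)[0]


rng = random.Random(42)
trips = [
    pick_destination("d_1", "office worker", "weekday", rng)
    for _ in range(1000)
]
print(trips.count(None), trips.count("d_1"), trips.count("d_2"))
```

Each row of the transition matrix is treated as a probability distribution over destinations, so it should sum to 1.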

Restricted movement (lockdown)

  1. intra_district_decreased_mobility_rates.csv The difference in the probability of movement from each district in the model (districts numbered d_1 to d_60) to each other district between the period before lockdown restrictions were applied (Feb 2020) and the period after (March to June 2020). The dummy data includes only 2 districts. In the original data, the movement probabilities are calculated from an Origin Destination Matrix received from the World Bank, itself derived from the Call Record Data of a mobile phone provider in Zimbabwe.

  2. lockdownChangelist.txt This file gives the date on which lockdown restrictions are applied. From that date the model stops using dailyDistrictTransitionProb_preLockdown.csv and instead uses the movement probabilities calculated for the lockdown period, as indicated by intra_district_decreased_mobility_rates.csv.

  3. dailyDistrictTransitionProb_Lockdown.csv (this file still needs to be added to the folder) The probability of movement from each district in the model (districts numbered d_1 to d_60) to each other district during the lockdowns applied in March, April and May 2020. The dummy data includes only 2 districts. In the original data, the movement probabilities are calculated from an Origin Destination Matrix received from the World Bank, itself derived from the Call Record Data of a mobile phone provider in Zimbabwe.
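
The switch between the two movement regimes can be pictured as a lookup keyed on the simulation date. The sketch below is a minimal illustration: the start date is a placeholder rather than the value in lockdownChangelist.txt, and both matrices hold made-up numbers.

```python
from datetime import date

# Placeholder only; the real date comes from lockdownChangelist.txt.
LOCKDOWN_START = date(2020, 4, 1)

# Made-up two-district matrices standing in for the two CSV files.
PRE_LOCKDOWN = {
    "d_1": {"d_1": 0.9, "d_2": 0.1},
    "d_2": {"d_1": 0.2, "d_2": 0.8},
}
LOCKDOWN = {
    "d_1": {"d_1": 0.97, "d_2": 0.03},
    "d_2": {"d_1": 0.05, "d_2": 0.95},
}


def matrix_for(day, lockdown_start=LOCKDOWN_START):
    """Return the transition matrix in force on the given day."""
    return LOCKDOWN if day >= lockdown_start else PRE_LOCKDOWN


print(matrix_for(date(2020, 2, 1))["d_1"]["d_2"])
print(matrix_for(date(2020, 5, 1))["d_1"]["d_2"])
```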

Social Contact data

  1. interactionMatrixByStatus.txt This matrix gives the assumed probability of interaction between each pair of economic statuses/occupations in the model. (NB. It is not yet confirmed whether the model currently uses this file.)

  2. numWeeklyInteractionsByStatus.txt This file indicates the number of interactions that each economic status has on average on a weekly basis. This data is based on information from Round 9 of the Manicaland Social Mixing Study, collected in 2023.

The bubble size was also calculated from the same survey, using the answers to the following questions. For work: "At the busiest point during your last working day, how many people did you share an indoor space with?" For school: "How big is your biggest class at your educational facility when in session?" For community spaces: "When you were in the community yesterday e.g. shop, church, bar, restaurant, or other community venue, what was the maximum number of people that were in that venue when you were there?" For transport: "If you took any transportation service yesterday, what was the maximum number of people on the busiest transport service you took?"

Testing specific model

  1. most_travelled_from_testing.txt This file indicates the districts with the highest mobility in the model. (NB. This file does not appear to be used in the model; this still needs to be confirmed.)

  2. num_tests_by_date.txt This file gives the number of tests conducted in Zimbabwe each day from May 13th 2020. Source: https://ourworldindata.org/coronavirus