raw_input - jzanetti/GradABM_ESR GitHub Wiki

1. Raw input for GradABM_ESR:

There are four types of data that we need for running GradABM_ESR:

  • Synthetic population
  • Diary for the synthetic population
  • Training target (optional)
  • Geography ancillary data (optional)

1.1 Synthetic population

The synthetic population is a dataset that contains the basic information for each agent in an area. It can be created with Syspop, and the data looks like:

id area age gender ... secondary_hospital supermarket restauraunt pharmacy
0 110400 0 male ... 111300_hospital_13_0 supermarket_268,supermarket_267 restaurant_3058,restaurant_1098,restaurant_109... pharmacy_205,pharmacy_44
1 110400 0 female ... 111300_hospital_13_0 supermarket_268,supermarket_267 restaurant_3058,restaurant_1098,restaurant_109... pharmacy_205,pharmacy_44
3 110400 0 male ... 111300_hospital_13_0 supermarket_268,supermarket_267 restaurant_3058,restaurant_1098,restaurant_109... pharmacy_205,pharmacy_44
4 110400 0 female ... 111300_hospital_13_0 supermarket_268,supermarket_267 restaurant_3058,restaurant_1098,restaurant_109... pharmacy_205,pharmacy_44
1603016 166400 96 male ... 166100_hospital_32_0 supermarket_197,supermarket_183 restaurant_215,restaurant_2899,restaurant_2056... pharmacy_552,pharmacy_534

The base synthetic population usually includes the following attributes: id, area, age, gender, ethnicity, household, social_economics, area_work, travel_mode_work, company, public_transport_trip, school, primary_hospital, secondary_hospital, supermarket, restauraunt, pharmacy.
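As a rough illustration of working with this table, the sketch below checks that a few expected columns are present and splits the comma-separated venue fields into lists. It assumes the data has been loaded into a pandas DataFrame; the tiny frame, the `required` list and the choice of venue columns are made up for the example and are not part of GradABM_ESR or Syspop.

```python
import pandas as pd

# Hand-made stand-in for a Syspop output (assumption: real files have
# many more rows and columns than shown here).
pop = pd.DataFrame(
    {
        "id": [0, 1],
        "area": [110400, 110400],
        "age": [0, 0],
        "gender": ["male", "female"],
        "supermarket": ["supermarket_268,supermarket_267"] * 2,
        "pharmacy": ["pharmacy_205,pharmacy_44"] * 2,
    }
)

# Hypothetical minimal set of columns to check for; adjust to your own data.
required = ["id", "area", "age", "gender"]
missing = [col for col in required if col not in pop.columns]
if missing:
    raise ValueError(f"synthetic population is missing columns: {missing}")

# Venue columns hold comma-separated candidate venues; split them into lists.
for col in ["supermarket", "pharmacy"]:
    pop[col] = pop[col].str.split(",")

print(pop.loc[0, "supermarket"])
```

Splitting the venue fields up front makes it easier to count or sample the candidate venues for each agent later on.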

1.2 Diary for the synthetic population

The diary data is also created by Syspop, and the data looks like:

0 1 2 3 4 5 6 ... 18 19 20 21 22 23 id
household household household household household restaurant household ... household supermarket household supermarket household household 0
household household household household household household household ... restaurant supermarket household household restaurant household 1
household household household household household household restaurant ... household household household household household restaurant 2
household household household household household household household ... household household household supermarket household household 3
household household household household household household household ... household household supermarket household household household 4
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
household household household household household household household ... supermarket household household household household household 1603016

Each row represents one agent, and each column (except the id column) gives the agent's location at a particular hour of the day (0 to 23).
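To make the layout concrete, here is a minimal sketch that reshapes a diary of this shape from wide (one column per hour) to long (one row per agent and hour) and looks up where an agent is at a given hour. It assumes the diary is a pandas DataFrame whose hour columns are named "0" to "23" as strings; the helper `location_at` is hypothetical, and the two example rows are abbreviated stand-ins for the data above.

```python
import pandas as pd

# Build a two-agent diary; hour columns are assumed to be the strings "0".."23".
hours = [str(h) for h in range(24)]
diary = pd.DataFrame([["household"] * 24, ["household"] * 24], columns=hours)
diary["id"] = [0, 1]
diary.loc[0, "5"] = "restaurant"    # agent 0 is at a restaurant at hour 5
diary.loc[1, "19"] = "supermarket"  # agent 1 is at a supermarket at hour 19

# Wide -> long: one row per (agent, hour) pair.
diary_long = diary.melt(id_vars="id", var_name="hour", value_name="location")
diary_long["hour"] = diary_long["hour"].astype(int)


def location_at(agent_id: int, hour: int) -> str:
    """Hypothetical helper: return the location type of one agent at one hour."""
    match = diary_long[
        (diary_long["id"] == agent_id) & (diary_long["hour"] == hour)
    ]
    return match["location"].iloc[0]


print(location_at(0, 5))

# Number of agents at each location type for every hour.
occupancy = diary_long.groupby(["hour", "location"]).size()
```

The long format is convenient for grouping by hour, for example to see how many agents are at each location type at a given time.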

1.3 Training target (optional)

The target data is only required if we want to train the model. It is usually the number of infected cases in each area over a period. Currently only weekly data is supported, but other frequencies can be added on request. The data looks like:

Region Week_11 Week_13 Week_15 Week_16 Week_17 Week_18 Week_19 Week_20 Week_21 Week_22 ... Week_40 Week_41 Week_42 Week_43 Week_44 Week_45 Week_46 Week_47 Week_48 Week_49 Week_50
Waitemata 1.0 2 3.0 0.0 4.0 6.0 6 6 5 4 ... 18 6 17 7 4 8 5 3 5 4 1
Counties Manukau 0.0 0 0.0 0.0 0.0 1.0 2 2 1 3 ... 38 53 36 25 14 28 13 12 9 4 5
Waikato 0.0 0 0.0 0.0 0.0 0.0 1 0 0 0 ... 3 1 8 0 0 0 1 0 0 0 0
Bay of Plenty 0.0 0 0.0 2.0 0.0 3.0 2 1 1 0 ... 1 1 0 1 1 0 0 1 0 1 0
Canterbury 11.0 4 0.0 0.0 0.0 0.0 0 0 0 0 ... 1 0 0 0 0 0 0 0 0 0 0
Hutt Valley NaN None NaN NaN NaN NaN None None 2 1 ... 0 0 0 0 0 0 0 0 0 0 NaN

As shown above, weeks with no available data can be filled with NaN or None.
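Because a target file may mix NaN and None placeholders, it can help to normalise them before use. The sketch below is one possible way to do that with pandas, assuming the week columns all start with `Week_`; the small frame is made up for illustration, and how GradABM_ESR itself treats missing weeks is not described here.

```python
import numpy as np
import pandas as pd

# Made-up target table mixing NaN and a literal "None" string placeholder.
target = pd.DataFrame(
    {
        "Region": ["Waitemata", "Hutt Valley"],
        "Week_11": [1.0, np.nan],
        "Week_13": ["2", "None"],
        "Week_21": [5, 2],
    }
)

# Convert every week column to numbers; anything unparsable becomes NaN.
week_cols = [c for c in target.columns if c.startswith("Week_")]
target[week_cols] = target[week_cols].apply(pd.to_numeric, errors="coerce")

# Boolean mask of observed values, e.g. to skip missing weeks downstream.
observed = target[week_cols].notna()
print(observed.sum(axis=1).tolist())
```

After this step every missing entry is a NaN, so a single mask such as `observed` covers both kinds of placeholder.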

1.4 Geography ancillary data (optional)

Geography ancillary data is needed when we extract specific regions from the base synthetic population or the target dataset. It is particularly useful when the area types in the target data do not match those in the synthetic population, in which case the ancillary data provides the link between the two.

Using New Zealand as an example, suppose the target dataset reports areas by DHB (District Health Board) region, while the synthetic population uses SA2 areas. The geography ancillary data then links each SA2 area to its DHB, so that only the agents living in the specified areas are selected for modelling.

The data looks like:

SA2 DHB_code DHB_name
100100 1 Northland
100200 1 Northland
100500 1 Northland
... ... ...
363500 99 Area outside District Health Board
363600 99 Area outside District Health Board
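The bridging step described in this section can be sketched as follows. This is a minimal example rather than the GradABM_ESR implementation: it assumes the `area` column of the synthetic population holds SA2 codes, and the function `select_agents` and the three-agent population are hypothetical.

```python
import pandas as pd

# Geography ancillary data: SA2 -> DHB lookup (a few rows from the table above).
geo = pd.DataFrame(
    {
        "SA2": [100100, 100200, 363500],
        "DHB_code": [1, 1, 99],
        "DHB_name": [
            "Northland",
            "Northland",
            "Area outside District Health Board",
        ],
    }
)

# Made-up synthetic population; "area" is assumed to be an SA2 code.
pop = pd.DataFrame({"id": [0, 1, 2], "area": [100100, 363500, 100200]})


def select_agents(pop: pd.DataFrame, geo: pd.DataFrame, dhb_names: list) -> pd.DataFrame:
    """Hypothetical helper: keep only agents living in the given DHB regions."""
    sa2_codes = geo.loc[geo["DHB_name"].isin(dhb_names), "SA2"]
    return pop[pop["area"].isin(sa2_codes)]


northland = select_agents(pop, geo, ["Northland"])
print(northland["id"].tolist())
```

The same lookup can be applied in the other direction, for example to label each agent with a DHB name via `pop.merge(geo, left_on="area", right_on="SA2")`.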