raw_input - jzanetti/GradABM_ESR GitHub Wiki
1. Raw input for GradABM_ESR:
There are four types of data that we need for running GradABM_ESR:
- Synthetic population
- Diary for the synthetic population
- Training target (optional)
- Geography ancillary data (optional)
1.1 Synthetic population
The synthetic population is a dataset that contains the basic information of each agent in an area. It can be created with Syspop, and the data looks like:
id | area | age | gender | ... | secondary_hospital | supermarket | restauraunt | pharmacy |
---|---|---|---|---|---|---|---|---|
0 | 110400 | 0 | male | ... | 111300_hospital_13_0 | supermarket_268,supermarket_267 | restaurant_3058,restaurant_1098,restaurant_109... | pharmacy_205,pharmacy_44 |
1 | 110400 | 0 | female | ... | 111300_hospital_13_0 | supermarket_268,supermarket_267 | restaurant_3058,restaurant_1098,restaurant_109... | pharmacy_205,pharmacy_44 |
3 | 110400 | 0 | male | ... | 111300_hospital_13_0 | supermarket_268,supermarket_267 | restaurant_3058,restaurant_1098,restaurant_109... | pharmacy_205,pharmacy_44 |
4 | 110400 | 0 | female | ... | 111300_hospital_13_0 | supermarket_268,supermarket_267 | restaurant_3058,restaurant_1098,restaurant_109... | pharmacy_205,pharmacy_44 |
1603016 | 166400 | 96 | male | ... | 166100_hospital_32_0 | supermarket_197,supermarket_183 | restaurant_215,restaurant_2899,restaurant_2056... | pharmacy_552,pharmacy_534 |
The base synthetic population usually contains the following attributes: `id`, `area`, `age`, `gender`, `ethnicity`, `household`, `social_economics`, `area_work`, `travel_mode_work`, `company`, `public_transport_trip`, `school`, `primary_hospital`, `secondary_hospital`, `supermarket`, `restauraunt`, `pharmacy`.
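As a rough illustration of working with this table, the sketch below loads a small excerpt with pandas and splits the venue columns into Python lists. It assumes the population is stored as CSV and that values such as `supermarket_268,supermarket_267` are comma-separated venue identifiers; neither is stated above, and the names `csv_text`, `pop` and `venue_cols` are purely illustrative.

```python
import io

import pandas as pd

# Small stand-in for a Syspop output. The rows are copied from the sample
# table above, keeping only a few columns. CSV storage is an assumption.
csv_text = """id,area,age,gender,supermarket,pharmacy
0,110400,0,male,"supermarket_268,supermarket_267","pharmacy_205,pharmacy_44"
1,110400,0,female,"supermarket_268,supermarket_267","pharmacy_205,pharmacy_44"
1603016,166400,96,male,"supermarket_197,supermarket_183","pharmacy_552,pharmacy_534"
"""

pop = pd.read_csv(io.StringIO(csv_text))

# Assumption: each venue cell is a comma-separated list of venue identifiers.
venue_cols = ["supermarket", "pharmacy"]
for col in venue_cols:
    pop[col] = pop[col].str.split(",")

print(pop.loc[0, "supermarket"])
```

Splitting the cells up front makes it straightforward to count or sample the venues attached to each agent later on.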
1.2 Diary for the synthetic population
The diary data is also created by Syspop, and the data looks like:
0 | 1 | 2 | 3 | 4 | 5 | 6 | ... | 18 | 19 | 20 | 21 | 22 | 23 | id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
household | household | household | household | household | restaurant | household | ... | household | supermarket | household | supermarket | household | household | 0 |
household | household | household | household | household | household | household | ... | restaurant | supermarket | household | household | restaurant | household | 1 |
household | household | household | household | household | household | restaurant | ... | household | household | household | household | household | restaurant | 2 |
household | household | household | household | household | household | household | ... | household | household | household | supermarket | household | household | 3 |
household | household | household | household | household | household | household | ... | household | household | supermarket | household | household | household | 4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
household | household | household | household | household | household | household | ... | supermarket | household | household | household | household | household | 1603016 |
Each row represents one agent, and each column (except `id`) gives the location of the agent at a particular hour.
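To make the layout concrete, the sketch below reshapes a diary from this wide format (one column per hour) into a long format with one row per agent and hour, which is often easier to query. The three-hour excerpt and the names `diary`, `diary_long` and `location_at` are hypothetical; a real diary has hour columns 0 to 23.

```python
import pandas as pd

# Hypothetical three-hour excerpt of a diary (real data has hours 0..23).
diary = pd.DataFrame(
    {
        "0": ["household", "household"],
        "1": ["restaurant", "household"],
        "2": ["household", "supermarket"],
        "id": [0, 1],
    }
)

# Wide -> long: one row per (agent, hour) pair.
diary_long = diary.melt(id_vars="id", var_name="hour", value_name="location")
diary_long["hour"] = diary_long["hour"].astype(int)
diary_long = diary_long.sort_values(["id", "hour"]).reset_index(drop=True)


def location_at(agent_id: int, hour: int) -> str:
    """Return the location of one agent at one hour."""
    mask = (diary_long["id"] == agent_id) & (diary_long["hour"] == hour)
    return diary_long.loc[mask, "location"].iloc[0]


print(location_at(0, 1))
```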
1.3 Training target (optional)
The target data is required only if we want to train the model. The target dataset is usually the number of infected cases in each area over a period. Currently only weekly data is supported, but other intervals can be added in the future on request. The data looks like:
Region | Week_11 | Week_13 | Week_15 | Week_16 | Week_17 | Week_18 | Week_19 | Week_20 | Week_21 | Week_22 | ... | Week_40 | Week_41 | Week_42 | Week_43 | Week_44 | Week_45 | Week_46 | Week_47 | Week_48 | Week_49 | Week_50 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Waitemata | 1.0 | 2 | 3.0 | 0.0 | 4.0 | 6.0 | 6 | 6 | 5 | 4 | ... | 18 | 6 | 17 | 7 | 4 | 8 | 5 | 3 | 5 | 4 | 1 |
Counties Manukau | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 1.0 | 2 | 2 | 1 | 3 | ... | 38 | 53 | 36 | 25 | 14 | 28 | 13 | 12 | 9 | 4 | 5 |
Waikato | 0.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 | 0 | 0 | 0 | ... | 3 | 1 | 8 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
Bay of Plenty | 0.0 | 0 | 0.0 | 2.0 | 0.0 | 3.0 | 2 | 1 | 1 | 0 | ... | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
Canterbury | 11.0 | 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Hutt Valley | NaN | None | NaN | NaN | NaN | NaN | None | None | 2 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN |
As above, if there is no data available, we can just fill the data with `NaN` or `None`.
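One way to handle the missing entries, sketched below under assumptions, is to convert the table to numeric values so that both `NaN` and `None` become `NaN`, and then build a boolean mask of the weeks that actually have observations. The excerpt reuses three weeks for two regions from the table above; the names `target` and `observed` are illustrative, and how GradABM_ESR itself treats missing weeks during training is not described here.

```python
import pandas as pd

# Excerpt of the target table above: three weeks for two regions.
target = pd.DataFrame(
    {
        "Region": ["Waitemata", "Hutt Valley"],
        "Week_20": [6, None],
        "Week_21": [5, 2],
        "Week_22": [4, 1],
    }
).set_index("Region")

# Coerce every column to numeric so that None and NaN are both stored as NaN.
target = target.apply(pd.to_numeric, errors="coerce")

# True where a weekly count is available, False where it is missing.
observed = target.notna()

# Number of weeks with data per region.
print(observed.sum(axis=1))
```

A mask like `observed` can then be used to skip missing weeks when comparing model output with the target.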
1.4 Geography ancillary data (optional)
Ancillary geographical data becomes essential when extracting specific regions from the base synthetic population or the target dataset. This additional information is particularly valuable when the area types in the target data do not align with those in the synthetic population, necessitating a bridging link provided by geography ancillary data.
Using New Zealand as an illustrative example, consider a scenario where the target dataset categorizes areas by `DHB` region (District Health Board), while the synthetic population uses the `sa2` classification. In such cases, the geography ancillary data provides the link between `sa2` and `DHB`. This link makes it possible to select only the agents residing in the specified areas, allowing focused modelling efforts.
The data looks like:
SA2 | DHB_code | DHB_name |
---|---|---|
100100 | 1 | Northland |
100200 | 1 | Northland |
100500 | 1 | Northland |
... | ... | ... |
363500 | 99 | Area outside District Health Board |
363600 | 99 | Area outside District Health Board |
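
The bridging step can be sketched as a table join. The example below assumes that the `area` column of the synthetic population holds SA2 codes, and it uses a few hypothetical agents; the geography rows are taken from the table above. The names `pop`, `geo`, `pop_geo` and `northland` are illustrative only.

```python
import pandas as pd

# Hypothetical agents; `area` is assumed to hold SA2 codes.
pop = pd.DataFrame({"id": [0, 1, 2], "area": [100100, 100200, 363500]})

# Excerpt of the geography ancillary data shown above.
geo = pd.DataFrame(
    {
        "SA2": [100100, 100200, 100500, 363500],
        "DHB_code": [1, 1, 1, 99],
        "DHB_name": [
            "Northland",
            "Northland",
            "Northland",
            "Area outside District Health Board",
        ],
    }
)

# Attach the DHB of each agent by joining on the SA2 code.
pop_geo = pop.merge(geo, left_on="area", right_on="SA2", how="left")

# Keep only the agents living in the region named in the target data.
northland = pop_geo[pop_geo["DHB_name"] == "Northland"]
print(northland["id"].tolist())
```

A left join keeps every agent, so any agent whose SA2 code is missing from the ancillary table shows up with an empty `DHB_name` rather than being dropped silently.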