raw_input - jzanetti/GradABM_ESR GitHub Wiki

1. Raw input for GradABM_ESR:

There are four types of data that we need for running GradABM_ESR:

Synthetic population
Diary for the synthetic population
Training target (optional)
Geography ancillary data (optional)

1.1 Synthetic population

The synthetic population is the dataset that contains the basic information of each agent in an area. It can be created with Syspop, and the data looks like:

id	area	age	gender	...	secondary_hospital	supermarket	restauraunt	pharmacy
0	110400	0	male	...	111300_hospital_13_0	supermarket_268,supermarket_267	restaurant_3058,restaurant_1098,restaurant_109...	pharmacy_205,pharmacy_44
1	110400	0	female	...	111300_hospital_13_0	supermarket_268,supermarket_267	restaurant_3058,restaurant_1098,restaurant_109...	pharmacy_205,pharmacy_44
3	110400	0	male	...	111300_hospital_13_0	supermarket_268,supermarket_267	restaurant_3058,restaurant_1098,restaurant_109...	pharmacy_205,pharmacy_44
4	110400	0	female	...	111300_hospital_13_0	supermarket_268,supermarket_267	restaurant_3058,restaurant_1098,restaurant_109...	pharmacy_205,pharmacy_44
1603016	166400	96	male	...	166100_hospital_32_0	supermarket_197,supermarket_183	restaurant_215,restaurant_2899,restaurant_2056...	pharmacy_552,pharmacy_534

The base synthetic population usually contains the following attributes: id, area, age, gender, ethnicity, household, social_economics, area_work, travel_mode_work, company, public_transport_trip, school, primary_hospital, secondary_hospital, supermarket, restauraunt, pharmacy.
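As a minimal sketch of how such a table could be read, the snippet below parses a tab-separated excerpt and splits the comma-separated venue columns (such as supermarket and pharmacy) into lists. The on-disk format of the Syspop output is not stated here, so the tab-separated text, the reduced column set, and the helper name `parse_agents` are assumptions for illustration only.

```python
import csv
import io

# Assumed tab-separated excerpt with only a few of the columns listed above;
# the real file format and column set may differ.
SAMPLE = (
    "id\tarea\tage\tgender\tsupermarket\tpharmacy\n"
    "0\t110400\t0\tmale\tsupermarket_268,supermarket_267\tpharmacy_205,pharmacy_44\n"
    "1\t110400\t0\tfemale\tsupermarket_268,supermarket_267\tpharmacy_205,pharmacy_44\n"
)

# Columns that hold several venues per agent, separated by commas.
MULTI_VALUE_COLUMNS = ("supermarket", "pharmacy")


def parse_agents(text):
    """Return one dict per agent, with venue columns split into lists."""
    agents = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        agent = dict(row)
        agent["id"] = int(row["id"])
        agent["age"] = int(row["age"])
        for col in MULTI_VALUE_COLUMNS:
            # An empty cell becomes an empty list rather than [""].
            agent[col] = row[col].split(",") if row[col] else []
        agents.append(agent)
    return agents


agents = parse_agents(SAMPLE)
print(agents[0]["supermarket"])
```

Splitting the venue columns up front makes it straightforward to pick one venue per visit later on, if that is how the lists are meant to be used.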

1.2 Diary for the synthetic population

The diary data is also created by Syspop, and the data looks like:

0	1	2	3	4	5	6	...	18	19	20	21	22	23	id
household	household	household	household	household	restaurant	household	...	household	supermarket	household	supermarket	household	household	0
household	household	household	household	household	household	household	...	restaurant	supermarket	household	household	restaurant	household	1
household	household	household	household	household	household	restaurant	...	household	household	household	household	household	restaurant	2
household	household	household	household	household	household	household	...	household	household	household	supermarket	household	household	3
household	household	household	household	household	household	household	...	household	household	supermarket	household	household	household	4
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
household	household	household	household	household	household	household	...	supermarket	household	household	household	household	household	1603016

Each row represents one agent, and each column (except the id column) gives the type of location the agent is at during a particular hour of the day.
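The row-per-agent, column-per-hour layout can be sketched as a simple lookup. The example below uses a made-up excerpt with only four hourly columns (the real diary has columns 0 to 23), and the helpers `load_diary` and `occupancy` are hypothetical names, not part of GradABM_ESR or Syspop.

```python
import csv
import io
from collections import Counter

# Made-up excerpt: four hourly columns plus the id column (assumed tab-separated).
SAMPLE = (
    "0\t1\t2\t3\tid\n"
    "household\thousehold\trestaurant\thousehold\t0\n"
    "household\tsupermarket\trestaurant\thousehold\t1\n"
    "household\thousehold\thousehold\tsupermarket\t2\n"
)


def load_diary(text):
    """Map each agent id to a {hour: location type} dictionary."""
    diary = {}
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        agent_id = int(row.pop("id"))
        diary[agent_id] = {int(hour): place for hour, place in row.items()}
    return diary


def occupancy(diary, hour):
    """Count how many agents are at each location type at the given hour."""
    return Counter(schedule[hour] for schedule in diary.values())


diary = load_diary(SAMPLE)
print(diary[1][1])          # location type of agent 1 at hour 1
print(occupancy(diary, 2))  # agents per location type at hour 2
```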

1.3 Training target (optional)

The target data is only required if we want to train the model. It is usually the number of infected cases in a particular area over a period of time. Currently only weekly data is supported; other frequencies can be added on request. The data looks like:

Region	Week_11	Week_13	Week_15	Week_16	Week_17	Week_18	Week_19	Week_20	Week_21	Week_22	...	Week_40	Week_41	Week_42	Week_43	Week_44	Week_45	Week_46	Week_47	Week_48	Week_49	Week_50
Waitemata	1.0	2	3.0	0.0	4.0	6.0	6	6	5	4	...	18	6	17	7	4	8	5	3	5	4	1
Counties Manukau	0.0	0	0.0	0.0	0.0	1.0	2	2	1	3	...	38	53	36	25	14	28	13	12	9	4	5
Waikato	0.0	0	0.0	0.0	0.0	0.0	1	0	0	0	...	3	1	8	0	0	0	1	0	0	0	0
Bay of Plenty	0.0	0	0.0	2.0	0.0	3.0	2	1	1	0	...	1	1	0	1	1	0	0	1	0	1	0
Canterbury	11.0	4	0.0	0.0	0.0	0.0	0	0	0	0	...	1	0	0	0	0	0	0	0	0	0	0
Hutt Valley	NaN	None	NaN	NaN	NaN	NaN	None	None	2	1	...	0	0	0	0	0	0	0	0	0	0	NaN

As shown above, where no data is available for a region and week, the value can simply be filled with NaN or None.
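Because both NaN and None can appear as missing markers, a reader of this table needs to treat them the same way. The sketch below shows one way to do that on the first three weekly columns of the sample above. How GradABM_ESR itself handles missing values is not described here; the tab-separated format and the helper names (`parse_count`, `load_targets`, `observed_total`) are assumptions.

```python
import csv
import io
import math

# First three weekly columns from the sample above (assumed tab-separated).
SAMPLE = (
    "Region\tWeek_11\tWeek_13\tWeek_15\n"
    "Waitemata\t1.0\t2\t3.0\n"
    "Canterbury\t11.0\t4\t0.0\n"
    "Hutt Valley\tNaN\tNone\tNaN\n"
)

# Cell contents treated as "no data", compared case-insensitively.
MISSING_TOKENS = {"", "nan", "none"}


def parse_count(cell):
    """Convert a cell to float; missing markers become float NaN."""
    if cell.strip().lower() in MISSING_TOKENS:
        return math.nan
    return float(cell)


def load_targets(text):
    """Return the week labels and a {region: [weekly counts]} mapping."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    weeks = [name for name in reader.fieldnames if name != "Region"]
    targets = {
        row["Region"]: [parse_count(row[week]) for week in weeks]
        for row in reader
    }
    return weeks, targets


def observed_total(values):
    """Sum the weekly counts, skipping weeks without data."""
    return sum(v for v in values if not math.isnan(v))


weeks, targets = load_targets(SAMPLE)
print(observed_total(targets["Canterbury"]))
```

Mapping both markers to a single NaN value keeps missing weeks distinguishable from weeks with zero reported cases.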

1.4 Geography ancillary data (optional)

Geography ancillary data is needed when extracting specific regions from the base synthetic population or the target dataset. It is particularly useful when the area types in the target data do not match those in the synthetic population, in which case it provides the link between the two.

Using New Zealand as an example, suppose the target dataset reports areas by DHB (District Health Board) region, while the synthetic population uses SA2 areas. The geography ancillary data then provides the connection between SA2 and DHB, so that only the agents living in the specified areas are selected for modelling.

The data looks like:

SA2	DHB_code	DHB_name
100100	1	Northland
100200	1	Northland
100500	1	Northland
...	...	...
363500	99	Area outside District Health Board
363600	99	Area outside District Health Board
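The SA2-to-DHB link described above can be sketched as a two-step selection: first collect the SA2 codes that belong to a DHB, then keep only the agents whose area is one of those codes. The geography rows below come from the sample table, while the agent records and the helper names (`sa2_codes_for_dhb`, `select_agents`) are hypothetical and only serve the illustration.

```python
import csv
import io

# Rows taken from the sample table above (assumed tab-separated).
GEOGRAPHY = (
    "SA2\tDHB_code\tDHB_name\n"
    "100100\t1\tNorthland\n"
    "100200\t1\tNorthland\n"
    "363500\t99\tArea outside District Health Board\n"
)

# Hypothetical agents: only the id and area (SA2) attributes are used here.
AGENTS = [
    {"id": 10, "area": "100100"},
    {"id": 11, "area": "100200"},
    {"id": 12, "area": "363500"},
    {"id": 13, "area": "999999"},  # SA2 code missing from the lookup
]


def sa2_codes_for_dhb(text, dhb_name):
    """Collect the SA2 codes that the lookup assigns to the given DHB."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["SA2"] for row in reader if row["DHB_name"] == dhb_name}


def select_agents(agents, sa2_codes):
    """Keep only agents whose area is one of the given SA2 codes."""
    return [agent for agent in agents if agent["area"] in sa2_codes]


northland_sa2 = sa2_codes_for_dhb(GEOGRAPHY, "Northland")
northland_agents = select_agents(AGENTS, northland_sa2)
print([agent["id"] for agent in northland_agents])
```

Agents whose SA2 code does not appear in the lookup are dropped by this selection, so it may be worth checking for unmatched codes before training.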