Galapagos v1.0 - midas-isg/data-format-repository GitHub Wiki

Galapagos reads scenarios and realizations that are written in the Galapagos output format. The details of this specific output format are defined below.

Understanding the file format

The output file is a CSV file that has the following format:

simulator_time variable1 variable2 count
1 variable1value1 variable2value1 #
1 variable1value1 variable2value2 #
1 variable1value2 variable2value1 #
1 variable1value2 variable2value2 #

The only required columns in Galapagos output format are “simulator_time” and “count”. The format is otherwise flexible: a file can contain almost any other output variables, as long as the two required columns are present.
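The required-column rule can be checked mechanically before a file is handed to Galapagos. The following is a minimal sketch, not part of Galapagos itself: the function name `read_galapagos_rows` is hypothetical, and it assumes a comma-delimited file with a header row and integer counts.

```python
import csv
import io

# The two columns the format requires; everything else is optional.
REQUIRED_COLUMNS = ("simulator_time", "count")


def read_galapagos_rows(handle, delimiter=","):
    """Read rows from an open file and check the required columns.

    Hypothetical helper. The delimiter is an assumption; pass a
    different one if your files are not comma-separated.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError("missing required column(s): " + ", ".join(missing))
    rows = []
    for row in reader:
        # Assumption: counts are whole numbers of agents.
        row["count"] = int(row["count"])
        rows.append(row)
    return rows


sample = io.StringIO(
    "simulator_time,sex,count\n"
    "1,M,80\n"
    "1,F,82\n"
)
rows = read_galapagos_rows(sample)
```

A file without a “simulator_time” or “count” column raises a `ValueError` that names the missing column.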

Reporting counts

Here is an example using some fictional data:

simulator_time sex age_range_category_label infection_state count
1 M 0-50 susceptible 50
1 M 0-50 infectious 10
1 M 0-50 recovered 0
1 M 50-100 susceptible 40
1 M 50-100 infectious 20
1 M 50-100 recovered 0
1 F 0-50 susceptible 59
1 F 0-50 infectious 13
1 F 0-50 recovered 0
1 F 50-100 susceptible 70
1 F 50-100 infectious 12
1 F 50-100 recovered 0

Looking at the example above, we can see that the simulator output tracks three characteristics of each agent in the model population: sex (male or female), age range (less than 50, or greater than or equal to 50), and infection state (susceptible, infectious, or recovered). For each time step, the output reports the number of agents with each combination of those characteristics. Reporting counts this way gives a complete accounting of every agent at each time point in the model.
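One way a simulator might produce rows like these is to tally its agents at each time step and emit one row per combination of values, including combinations with a count of zero. The sketch below is only an illustration under assumptions: the function `count_rows` and the tuple representation of an agent are hypothetical and are not taken from Galapagos.

```python
from collections import Counter
from itertools import product


def count_rows(simulator_time, agents, categories):
    """Return one (time, value..., count) row per combination of values.

    agents: one tuple of variable values per agent (hypothetical layout).
    categories: the allowed values of each variable, in column order.
    """
    tally = Counter(agents)
    # A Counter returns 0 for combinations that no agent has, so
    # empty groups still get a row.
    return [
        (simulator_time, *combo, tally[combo])
        for combo in product(*categories)
    ]


agents = [
    ("M", "0-50", "susceptible"),
    ("M", "0-50", "susceptible"),
    ("F", "50-100", "infectious"),
]
categories = [
    ["M", "F"],
    ["0-50", "50-100"],
    ["susceptible", "infectious", "recovered"],
]
rows = count_rows(1, agents, categories)
```

Because every agent falls into exactly one combination, the counts for a time step sum to the population size, which is a useful sanity check on an output file.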

On variable names and values

While the output format is very flexible and can contain almost any output variables and values you like, we recommend that you use the standard variable and value names defined in the following specification: https://docs.google.com/spreadsheets/d/1ulyjF_pKVlsqAn97t3LDunEJ0_UEFhoj_7zMDZ2pIAs/edit?usp=sharing

We recommend using the standards proposed in that file because doing so enables better comparisons across simulation models. That is, if everyone uses the same output variables and values, comparing output becomes much easier.

The standards proposed in the above file are derived from the Apollo XSD, which can be found here: https://raw.githubusercontent.com/ApolloDev/apollo-xsd-and-types/master/src/main/resources/apollo_types_3.1.0.xsd

Output granularity

If your simulator output file tracks every variable in a complicated model, the output files will become very large. Galapagos can handle large files, but it is probably a good idea to include only the information that you want to track. It is up to the simulator developer to decide what level of output detail to provide.
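Because each row is a count, output can be made coarser by dropping variable columns and summing the counts of the rows that become identical. The following is a rough sketch of that idea, assuming rows are held as dictionaries keyed by column name; the function `collapse` is hypothetical and is not a Galapagos feature.

```python
from collections import defaultdict


def collapse(rows, keep):
    """Sum counts over every variable column not listed in `keep`.

    rows: dictionaries keyed by column name (hypothetical layout).
    keep: names of the variable columns to retain, in output order.
    """
    totals = defaultdict(int)
    for row in rows:
        # simulator_time is always part of the grouping key.
        key = (row["simulator_time"],) + tuple(row[name] for name in keep)
        totals[key] += int(row["count"])
    collapsed = []
    for key, total in totals.items():
        merged = {"simulator_time": key[0]}
        merged.update(zip(keep, key[1:]))
        merged["count"] = total
        collapsed.append(merged)
    return collapsed


detailed = [
    {"simulator_time": 1, "sex": "M", "infection_state": "susceptible", "count": 50},
    {"simulator_time": 1, "sex": "M", "infection_state": "infectious", "count": 10},
    {"simulator_time": 1, "sex": "F", "infection_state": "susceptible", "count": 59},
    {"simulator_time": 1, "sex": "F", "infection_state": "infectious", "count": 13},
]
by_state = collapse(detailed, keep=["infection_state"])
```

Here the “sex” column is dropped, so four rows become two, and the total number of agents per time step is unchanged.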