Galapagos v1.0 - midas-isg/data-format-repository GitHub Wiki
Galapagos reads scenarios and realizations that are written in the Galapagos output format, which is defined below.
Understanding the file format
The output file is a CSV file that has the following format:
simulator_time | variable1 | variable2 | count |
---|---|---|---|
1 | variable1value1 | variable2value1 | # |
1 | variable1value1 | variable2value2 | # |
1 | variable1value2 | variable2value1 | # |
1 | variable1value2 | variable2value2 | # |
The only required columns in Galapagos output format are “simulator_time” and “count”. Every other column is an output variable of your choosing, and each row reports the count for one combination of variable values at one simulator time. This makes the format flexible enough to hold almost any output variables, as long as the two required columns are present.
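The required-column rule can be checked mechanically before a file is handed to Galapagos. The following is a minimal sketch, not part of Galapagos itself: the function name `check_galapagos_header` is hypothetical, and it assumes the file is comma-delimited with the column names on the first line.

```python
import csv
import io

# The two columns the format requires; every other column is free-form.
REQUIRED_COLUMNS = {"simulator_time", "count"}


def check_galapagos_header(csv_text):
    """Return the free-form column names, or raise ValueError if a
    required column is missing. (Hypothetical helper, for illustration.)"""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        raise ValueError("file is empty")
    header = [name.strip() for name in header]
    missing = REQUIRED_COLUMNS - set(header)
    if missing:
        raise ValueError(
            "missing required column(s): " + ", ".join(sorted(missing))
        )
    return [name for name in header if name not in REQUIRED_COLUMNS]


# Assumed comma-delimited sample with one free-form variable, "sex".
sample = "simulator_time,sex,count\n1,M,60\n1,F,72\n"
extra_columns = check_galapagos_header(sample)
print(extra_columns)
```

A file that lacks either “simulator_time” or “count” raises a `ValueError` that names the missing column.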
Reporting counts
Here is an example using some fictional data:
simulator_time | sex | age_range_category_label | Infection_state | count |
---|---|---|---|---|
1 | M | 0-50 | susceptible | 50 |
1 | M | 0-50 | infectious | 10 |
1 | M | 0-50 | recovered | 0 |
1 | M | 50-100 | susceptible | 40 |
1 | M | 50-100 | infectious | 20 |
1 | M | 50-100 | recovered | 0 |
1 | F | 0-50 | susceptible | 59 |
1 | F | 0-50 | infectious | 13 |
1 | F | 0-50 | recovered | 0 |
1 | F | 50-100 | susceptible | 70 |
1 | F | 50-100 | infectious | 12 |
1 | F | 50-100 | recovered | 0 |
Looking at the example above, we can see that the simulator output tracks three attributes of each agent in the model's population: sex (male or female), age (less than 50, or greater than or equal to 50), and infection state (susceptible, infectious, or recovered). For each timestep, the output reports the number of agents with each combination of those attributes, including combinations whose count is zero. Using this method, we have a complete accounting of every person at each time point in the model.
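The counting scheme described above can be sketched as follows. This is an illustrative sketch only: the function `count_rows`, the `domains` mapping, and the small agent list are all hypothetical and are not taken from any simulator. It emits one row per combination of variable values, including zero counts, and then checks that the counts for the timestep sum to the population size.

```python
from collections import Counter
from itertools import product


def count_rows(simulator_time, agents, domains):
    """Return one row per combination of variable values, zero counts
    included. `domains` maps each variable name to its allowed values."""
    variables = list(domains)
    # Tally how many agents share each combination of values.
    tally = Counter(tuple(agent[v] for v in variables) for agent in agents)
    rows = []
    for combo in product(*(domains[v] for v in variables)):
        row = {"simulator_time": simulator_time}
        row.update(zip(variables, combo))
        row["count"] = tally[combo]  # Counter yields 0 for unseen combos
        rows.append(row)
    return rows


# Hypothetical population of five agents at simulator time 1.
domains = {
    "sex": ["M", "F"],
    "Infection_state": ["susceptible", "infectious", "recovered"],
}
agents = (
    [{"sex": "M", "Infection_state": "susceptible"}] * 3
    + [{"sex": "F", "Infection_state": "infectious"}] * 2
)

rows = count_rows(1, agents, domains)

# Complete accounting: the counts at a timestep sum to the population.
total = sum(row["count"] for row in rows)
print(len(rows), total)
```

Because every combination is emitted, the number of rows per timestep is the product of the number of values of each variable (2 × 3 = 6 here).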
On variable names and values
While the output format is very flexible and can contain almost any output variables and values that you’d like, we recommend that you use the standard variable and value names defined in the following specification: https://docs.google.com/spreadsheets/d/1ulyjF_pKVlsqAn97t3LDunEJ0_UEFhoj_7zMDZ2pIAs/edit?usp=sharing
We recommend using the standards proposed in that file because doing so enables better comparisons across simulation models. That is, if everyone uses the same output variables and values, comparing output will be a breeze.
The standards proposed in the above file are derived from the Apollo XSD, which can be found here: https://raw.githubusercontent.com/ApolloDev/apollo-xsd-and-types/master/src/main/resources/apollo_types_3.1.0.xsd
Output granularity
If your simulator output file tracks every variable in a complicated model, the file will end up being very large, because each timestep needs one row for every combination of variable values. Galapagos can handle large files, but it is probably a good idea to include only the information that you want to track. It is up to the simulator developer to decide the level of output detail to provide.
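One way to reduce granularity is to drop a variable and sum the counts over it. The sketch below is an assumption about how a developer might do this before writing the file; the function `coarsen` is hypothetical and is not a Galapagos feature. It uses the male rows from the fictional example above and drops the age variable.

```python
from collections import defaultdict


def coarsen(rows, keep):
    """Sum counts over every variable not listed in `keep`.
    "simulator_time" is always retained. (Hypothetical helper.)"""
    key_columns = ["simulator_time"] + list(keep)
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        totals[key] += int(row["count"])
    return [
        dict(zip(key_columns, key), count=total)
        for key, total in totals.items()
    ]


# Four of the male rows from the fictional example.
detailed = [
    {"simulator_time": 1, "sex": "M", "age_range_category_label": "0-50",
     "Infection_state": "susceptible", "count": 50},
    {"simulator_time": 1, "sex": "M", "age_range_category_label": "0-50",
     "Infection_state": "infectious", "count": 10},
    {"simulator_time": 1, "sex": "M", "age_range_category_label": "50-100",
     "Infection_state": "susceptible", "count": 40},
    {"simulator_time": 1, "sex": "M", "age_range_category_label": "50-100",
     "Infection_state": "infectious", "count": 20},
]

# Keep sex and infection state; the age variable is summed away.
summary = coarsen(detailed, keep=["sex", "Infection_state"])
for row in summary:
    print(row)
```

The four detailed rows collapse to two, and the total count (120) is unchanged, so the coarser file still accounts for every agent.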