Fundamental entities of a LASER model, their properties, dynamics, and interactions

@KevinMcCarthyAtIDM

Spatial Nodes

Why needed:

Space is special. It’s right in the name of the model! But why not wrap it up into some generic “mixing group properties” as listed below? For most of the diseases we’ll want to model, space is more fundamental to the transmission process than those groups. To wit, I can breathe on people of different ages, SES, political leanings, language groups, … but I can’t breathe on somebody who’s in Seattle while I’m in Boston. Spatial boundaries also define political and administrative authority. Interventions might be targeted to people based on age, SES, or other properties, but are always targeted by spatial extent.

Number:

$N_{nodes} \sim 10^0 \ldots 10^6$

Properties:

In the most minimal sense, possibly none other than number. Most use cases will require some of location, population, birth rate, mortality rate, and routine immunization coverage. Membership in the target group of an intervention is another candidate (though I think it is more sensible to treat this as a property of the intervention). A degree of user flexibility in defining new node properties and associated dynamics is important.
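One way to picture user-defined node properties is a column-oriented table with one array per property. The sketch below is only an illustration under that assumption: `NodeTable` and `add_property` are hypothetical names, not part of any existing LASER API, and the property values are made up.

```python
from array import array


class NodeTable:
    """Hypothetical column-oriented store: one typed array per node property."""

    def __init__(self, n_nodes):
        self.n_nodes = n_nodes
        self.columns = {}  # property name -> array of length n_nodes

    def add_property(self, name, typecode="d", default=0.0):
        """Register a new per-node property, filled with a default value."""
        if name in self.columns:
            raise ValueError(f"property {name!r} already defined")
        self.columns[name] = array(typecode, [default]) * self.n_nodes
        return self.columns[name]


# Illustrative usage: three nodes with two user-defined properties.
nodes = NodeTable(n_nodes=3)
population = nodes.add_property("population", typecode="q", default=0)
birth_rate = nodes.add_property("birth_rate", typecode="d", default=0.03)
population[0] = 750_000  # made-up value for node 0
```

Because every property is a flat array indexed by node, adding a property never changes the layout of the others, which keeps slowly changing properties cheap to update.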

Dynamics:

Dynamics of the properties of spatial nodes are either static (location) or generally slowly changing (birth/mortality rate, RI coverage). The number of nodes will not need to change over the course of simulation in most use cases.

Performance concerns:

Low. The properties of nodes are generally either static (location) or slowly changing. There are almost always fewer nodes than agents or edges. I’ve never in practice run more than about 3k, though I could imagine running into the millions in “node-as-agent” type configurations.

Spatial Connections

Why needed:

The emergent dynamics of interest are those that arise from the coupling of the internal dynamics of the set of spatial nodes. The set of spatial nodes (vertices) equipped with a set of connections (edges) is, of course, a graph. However, it is useful to consider the edges separately from the spatial nodes because they have different internal properties, scaling properties, and performance concerns. We may also want to have more than one graph for the same set of nodes (e.g., one network for round-trips and another for long-term relocation).

Number:

Generally, $N_{edges} \approx N_{nodes}^2$

Properties:

Vertices
Weight

Can likely restrict ourselves to the types of graphs where edges may only connect 2 vertices, but maybe worth a look if there are reasons to be more general.

Dynamics:

In most use cases, none. There are cases where we would want weights to change over time, but slowly, and cases where we may want to add/delete edges, but infrequently.

Performance concerns:

High. The size of the edge set can get very large for moderately-sized node sets. E.g., there are roughly 20k incorporated towns in the US, not a lot to represent, but there are 400M possible edges between them. Efficient algorithms for computations on graphs exist but are generally optimized for a subset of graph type/properties: efficient representation of and computation on a sparse, directed graph may (probably will) scale horribly to a dense multigraph, and vice versa.
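The arithmetic behind the 20k-town example can be checked with a short sketch. The function names, the 4-byte (float32) weight, and the 50-neighbors-per-node sparse case are assumptions chosen for illustration, not design decisions.

```python
def dense_edge_count(n_nodes, directed=True, self_loops=True):
    """Number of possible edges between n_nodes vertices."""
    if directed:
        return n_nodes * n_nodes if self_loops else n_nodes * (n_nodes - 1)
    pairs = n_nodes * (n_nodes - 1) // 2
    return pairs + n_nodes if self_loops else pairs


def dense_weight_bytes(n_nodes, bytes_per_weight=4):
    """Memory for a dense N x N weight matrix (assumes float32 weights)."""
    return n_nodes * n_nodes * bytes_per_weight


def sparse_edge_count(n_nodes, neighbors_per_node):
    """Edges if each node keeps only a fixed number of outgoing connections."""
    return n_nodes * neighbors_per_node


n_towns = 20_000
all_edges = dense_edge_count(n_towns)            # 400,000,000 directed edges
dense_bytes = dense_weight_bytes(n_towns)        # 1,600,000,000 bytes
sparse_edges = sparse_edge_count(n_towns, 50)    # 1,000,000 edges (assumed k = 50)
```

The gap between the dense and sparse counts is why the choice of graph representation has to follow the connectivity model rather than be fixed in advance.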

Agents

Why needed:

Somebody has to get the disease, at least until we eradicate them all and we’re out of work. More generally, in a choice between modeling based on agents vs. based on compartments/cohorts, we are guided by scaling: ABMs will scale with the number of agents; CMs will scale with the number of available states. The multiplicative scaling of independent dimensions of heterogeneity, and unequal distribution of agents into states, can quite easily lead us to models in which the number of possible states exceeds the number of agents, such that there will be either a large fraction of totally unoccupied cohorts, or a low average occupancy number in the compartments. We opt here to take the more “fixed” scaling of ABMs to allow user flexibility in designing transmission models with multiple dimensions of heterogeneity.
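A rough count shows how quickly the state space can overtake the population. The dimension sizes below are illustrative assumptions taken from ranges mentioned elsewhere on this page (five core infection states, up to 20 groups per group type, 20k nodes); they are not a proposed configuration.

```python
from math import prod

# Assumed sizes for each independent dimension of heterogeneity.
dimensions = {
    "infection_state": 5,    # maternally protected, S, E, I, R
    "age_group": 20,
    "generic_group_a": 10,
    "generic_group_b": 10,
    "node": 20_000,
}

n_states = prod(dimensions.values())  # compartments a CM would need to track
n_agents = 10**8                      # agents an ABM would need to track

# Average occupancy if agents were spread evenly, which they are not.
mean_occupancy = n_agents / n_states
```

With these numbers there are 200 million compartments for 100 million agents, so at least half of the compartments are empty before any unevenness in the distribution is considered.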

Number:

$N_{agents} \sim 10^8$ ($10^9$?)

Properties:

User flexibility in defining properties and their dynamics is critical. Core properties: infection and immune status, age, home location. Possibly current location, depending on the spatial connectivity model. States, definitely: maternally protected, susceptible, exposed, infectious, recovered. States, possibly: carrier, partially immune, partially infectious, severe vs. mild outcome. Possible non-infection/immune states: fetal (potentially useful to model maternal interventions), dead.

Dynamics:

Aging, transmission, infection, development of immunity, waning of immunity, migration, birth, death, mixing group membership.

Performance concerns:

High! At scales of 10⁸ agents or more, it’s quite easy to overflow memory or tank performance, and operations over arrays of this size are expensive. We will have more agents than any other entity, except possibly edges, and the dynamics of the agents are faster than those of any other entity.
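A back-of-the-envelope memory estimate makes the concern concrete. The property list and byte widths below are assumptions for illustration (for example, age stored as days in 2 bytes); the comparison is against storing every property in a default 8-byte numeric type.

```python
# Assumed per-agent properties and compact storage widths, in bytes.
compact_widths = {
    "state": 1,            # small integer code for infection/immune state
    "age_days": 2,         # up to 65,535 days
    "home_node": 4,        # index into the node set
    "infection_timer": 1,  # days remaining in the current state
}


def population_bytes(n_agents, widths):
    """Total bytes for one flat array per property."""
    return n_agents * sum(widths.values())


# Same properties, each stored as an 8-byte value.
default_widths = {name: 8 for name in compact_widths}

compact_1e8 = population_bytes(10**8, compact_widths)   # 800,000,000 bytes
default_1e8 = population_bytes(10**8, default_widths)   # 3,200,000,000 bytes
compact_1e9 = population_bytes(10**9, compact_widths)   # 8,000,000,000 bytes
```

Under these assumptions, compact types cut the footprint by a factor of four, and every extra property costs roughly 0.1 to 0.8 GB at 10⁸ agents.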

Individual properties/Mixing groups

Why needed:

We want our agents to be light, but at a certain level of “lightness” there are few advantages that agent-based models offer over discrete stochastic compartmental models. In a sense, agents have to be a little heavy to justify being agents at all. Further, there are important research questions that require some sub-spatial structure: campaigns that reach a random 80% of people each time are more effective than those that reach the same 80% over and over; in order to model the latter, we need some means of labeling the “reached” and “unreached” agents. Case-based surveillance data is quite generally under-reported, and often our most powerful tool for dealing with this in model calibration is finding non-proportional allocation of cases to sub-groupings of people – e.g., age at infection distributions; cases by Vx status; probability of vaccination in SIA given previous vaccination in RI.
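The campaign example can be made quantitative with a small sketch. It assumes that "reached" is the only thing being tracked (vaccine efficacy, births, and migration are ignored), and the function names are hypothetical.

```python
def ever_reached_random(coverage, n_campaigns):
    """Fraction reached at least once when each campaign draws independently."""
    return 1.0 - (1.0 - coverage) ** n_campaigns


def ever_reached_fixed(coverage, n_campaigns):
    """Fraction reached at least once when every campaign hits the same people."""
    return coverage if n_campaigns > 0 else 0.0


coverage = 0.8
random_after_3 = ever_reached_random(coverage, 3)  # about 0.992
fixed_after_3 = ever_reached_fixed(coverage, 3)    # stays at 0.8
```

The fixed case is only representable if agents carry a persistent "reachable" label, which is the kind of sub-spatial structure this entity is meant to provide.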

Number:

Generally, maybe 1-3 “group types”, with 2-20 groups within each.

Properties:

“group type”
number of groups
transitions between groups
mixing matrix

We should discuss further but I would propose two simplifying decisions on the structure of properties/groups below.

Dynamics:

Few. Most of the dynamics are on the “agent” side, in terms of being assigned to a group and transitioning from one group to another. But the number of group types, number of groups of each type, the rules for transitioning between groups, and mixing matrices will generally be static over the course of a simulation.

Performance concerns:

Potentially? I know the development of HINT ran into a lot of questions. Labels also have multiple uses – one could just use them to target interventions, vs. implementing heterogeneous transmission. I’d propose a couple of simplifying conditions on the groups that I think should make it easier to guarantee decent performance.

Within each “group type”, an agent can only belong to one group. This is obvious for something like age, but it’s easy to make a group type where individuals can belong to more than one group (e.g., I can be a student, worker, both, or neither). The latter case can always be mapped onto 1-hot encodings over multiple groups (student Y/N, worker Y/N) instead. Forcing exclusive membership is conceptually easier and probably enables more “obvious” use of efficient vectorized operations.
The different group types (and space) are separable. While the distribution of group membership may vary from place to place, I don’t think we need the extra headache of allowing different dynamics or mixing matrices, for example, different age-mixing matrices in different nodes or for different SE groups. Succinctly, the high-dimensional transmission tensor $M_{sag}^{s'a'g'}$, defining transmission from agents in spatial/age/generic group “s, a, g” to those in “s', a', g'”, is just the product $M_s^{s'} M_a^{a'} M_g^{g'}$. This is not generically true! But the full tensor has $N_{nodes}^2 \times N_{agegroups}^2 \times N_{genericgroups}^2$ elements, while the factorized form has only $N_{nodes}^2 + N_{agegroups}^2 + N_{genericgroups}^2$ elements. The full tensor is much harder to constrain with data, represent in memory, and feed into the eventual sum-products that have to be done for transmission.
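The separability assumption can be sketched in a few lines: any entry of the full tensor is recovered on demand from three small matrices. The matrices and group sizes here are made-up illustrations, and the nested-list indexing `M[src][dst]` is an assumed convention.

```python
def full_element_count(n_nodes, n_age, n_generic):
    """Entries in the unfactorized transmission tensor."""
    return n_nodes**2 * n_age**2 * n_generic**2


def factorized_element_count(n_nodes, n_age, n_generic):
    """Entries in the three separate mixing matrices."""
    return n_nodes**2 + n_age**2 + n_generic**2


def transmission_weight(M_s, M_a, M_g, src, dst):
    """Weight from group (s, a, g) to group (s', a', g') under separability."""
    s, a, g = src
    s2, a2, g2 = dst
    return M_s[s][s2] * M_a[a][a2] * M_g[g][g2]


# Tiny illustrative matrices: 2 nodes, 2 age groups, 1 generic group.
M_s = [[0.9, 0.1], [0.2, 0.8]]
M_a = [[1.0, 0.5], [0.5, 2.0]]
M_g = [[1.0]]
w = transmission_weight(M_s, M_a, M_g, src=(0, 1, 0), dst=(1, 1, 0))  # 0.1 * 2.0 * 1.0

# Storage for 3,000 nodes, 20 age groups, 10 generic groups.
full = full_element_count(3_000, 20, 10)              # 360,000,000,000
factorized = factorized_element_count(3_000, 20, 10)  # 9,000,500
```

At that size the factorized form is dominated entirely by the spatial matrix, so age and generic-group mixing are effectively free in memory terms.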

Cohorts

Why needed:

Not strictly needed, but for performance reasons it is worth considering mixed agent/cohort-based modeling. As stated above, we generally opt for agent-based modeling when the dimensionality of heterogeneity we want to represent would lead to a predominance of empty compartments or compartments with low average occupancy. But that doesn’t mean we can’t be clever about taking advantage of structure: some compartments will always have very high occupancy, some have limited participation in dynamics, some are absorbing states that agents will never leave. For example, if we model RI not as random, but with a group of “RI accessible” and “RI inaccessible” agents, then we know that the cohort of “recovered, RI accessible, older than the age at vaccination” agents should be highly enriched.

Number:

Unclear! I’m not sure this space of mixed agent-cohort modeling is particularly well explored. We wouldn’t want it to explode, rather I expect we would have a limited number of “special” cohorts that offer huge payoffs in performance.

Properties:

Some group IDs/indices to identify what agent properties map to this cohort
Occupancy number

Dynamics:

This is a fun space to explore. Agent-cohort interactions; cohort-cohort interactions; cohort-agent interactions. How do agents get removed from agent pop and assigned to cohorts, how do cohorts spawn agents back into the agent pool when appropriate?
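As one possible starting point for those questions, the sketch below keeps explicit agents in a list and cohorts as occupancy counts keyed by group indices. Everything in it is hypothetical: the `MixedPopulation` class, the absorb/spawn rules, the age threshold, and the template used to recreate an agent.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MixedPopulation:
    """Hypothetical container: explicit agents plus cohort occupancy counts."""
    agents: list = field(default_factory=list)
    cohorts: Counter = field(default_factory=Counter)

    def absorb(self, is_absorbable, key_of):
        """Move qualifying agents out of the agent pool and into cohort counts."""
        kept, moved = [], 0
        for agent in self.agents:
            if is_absorbable(agent):
                self.cohorts[key_of(agent)] += 1
                moved += 1
            else:
                kept.append(agent)
        self.agents = kept
        return moved

    def spawn(self, key, template, count=1):
        """Recreate agents from a cohort; per-agent detail comes from a template."""
        if self.cohorts[key] < count:
            raise ValueError("cohort occupancy too low")
        self.cohorts[key] -= count
        self.agents.extend(dict(template) for _ in range(count))


AGE_AT_VACCINATION = 270  # days; illustrative value only

pop = MixedPopulation(agents=[
    {"state": "recovered", "ri_accessible": True, "age_days": 900, "node": 7},
    {"state": "recovered", "ri_accessible": True, "age_days": 4000, "node": 7},
    {"state": "susceptible", "ri_accessible": False, "age_days": 200, "node": 7},
])
key = ("recovered", "ri_accessible", 7)
moved = pop.absorb(
    lambda a: a["state"] == "recovered" and a["ri_accessible"]
    and a["age_days"] > AGE_AT_VACCINATION,
    lambda a: ("recovered", "ri_accessible", a["node"]),
)
# Bring one individual back, e.g. because it migrates; its exact age is lost.
pop.spawn(key, {"state": "recovered", "ri_accessible": True,
                "age_days": 3650, "node": 7})
```

The sketch exposes the main trade-off: absorbing an agent discards any property that is not part of the cohort key, so spawning has to resample or approximate that detail.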

Performance concerns:

Interventions

Why needed:

As interesting as open-loop dynamics of pathogens and hosts is, we’re in the business of intervening in the system to control and eliminate the pathogen, so we need some interventions.

Number:

I guess it depends on how we choose to represent them.

Properties:

intervention type
targeting
coverage
timing (start/end/duration?)

Dynamics:

Some interventions (PIRIs, SIAs) are “acute” – they go, they instantaneously or over some short duration change the state of other entities in the simulation, and they go away. These should have no internal dynamics. Others (RI) are persistent and potentially change over time.
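One way to cover both acute and persistent interventions with the properties listed above is a single record whose end time is optional and whose coverage is a function of time. This is a sketch under those assumptions; the `Intervention` class, its field names, and the example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Intervention:
    """Hypothetical record: type, spatial targeting, coverage, and timing."""
    kind: str                         # e.g. "SIA" or "RI"
    target_nodes: frozenset           # spatial extent (always required)
    coverage: Callable[[int], float]  # coverage as a function of tick
    start: int
    end: Optional[int] = None         # None means persistent

    def is_active(self, tick):
        return tick >= self.start and (self.end is None or tick < self.end)

    def coverage_at(self, tick):
        return self.coverage(tick) if self.is_active(tick) else 0.0


# Acute: a one-week campaign at constant coverage in two nodes.
sia = Intervention("SIA", frozenset({1, 2}), lambda t: 0.8, start=100, end=107)

# Persistent: routine immunization whose coverage drifts up slowly, capped at 0.9.
ri = Intervention("RI", frozenset({0, 1, 2}),
                  lambda t: min(0.9, 0.6 + 0.0001 * t), start=0)
```

In this form an acute intervention has no internal state, and a persistent one carries its slow dynamics entirely in the coverage function.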

Performance concerns:

Real question – should RI be an intervention or a node property?

Pathogens

Why needed:

Potentially not? If there’s only one circulating pathogen, and it doesn’t have properties that change over time, its existence can really be implicit to the model. Genotypic evolution can probably be modeled ex post if we can output a transmission tree. So, decision point: will we need to model co-transmission of multiple pathogens, or will we want to model pathogens that feature phenotypic evolution (e.g., changing R₀, vaccine evasion, etc.)?

Number:

1 usually. 2-3 in obvious use cases (measles/rubella co-transmission, WPV1/2/3).

Properties:

Infectivity
Immunogenicity
Mortality

Fundamental LASER Model Entities - laser-base/laser-core GitHub Wiki
