Weighted Tabulation

For case-based models, the weighted_tabulation option creates, for each entity, the built-in attribute entity_weight which scales the entity's contribution to tables.

Introduction and Background

Some case-based microsimulation models use micro-data directly from a survey, census, or administrative source. Micro-data from such sources often has a weight associated with each observation which reflects the sampling design and possibly post-stratification or under-count adjustment. Case weights can also be useful in microsimulation models which are not based on micro-data. Such models instead generate cases synthetically from multivariate distributions. They may deliberately over-sample portions of the synthetic population of particular interest, and then adjust for that oversampling by assigning a case weight equal to the reciprocal of the oversampling factor.
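
The reciprocal rule for oversampled synthetic cases can be illustrated with a minimal standalone C++ sketch. The helper names case_weight_for and weighted_share are hypothetical and are not part of OpenM++; the sketch assumes ordinary cases carry a weight of 1.0.

```cpp
#include <cassert>

// Case weight that undoes deliberate oversampling: the reciprocal of the
// oversampling factor. Hypothetical helper, not an OpenM++ function.
double case_weight_for(double oversampling_factor)
{
    assert(oversampling_factor > 0.0);
    return 1.0 / oversampling_factor;
}

// Weighted share of a subgroup when n_sub oversampled cases, each with
// weight w_sub, are combined with n_rest ordinary cases of weight 1.0.
double weighted_share(int n_sub, double w_sub, int n_rest)
{
    assert(n_sub >= 0 && n_rest >= 0 && n_sub + n_rest > 0);
    double sub = n_sub * w_sub;
    return sub / (sub + n_rest);
}
```

For example, if a subgroup makes up 10% of the true population and is oversampled by a factor of 5, a run with 500 subgroup cases and 900 other cases has an unweighted subgroup share of about 36%. With case_weight_for(5.0), which is 0.2, weighted_share(500, 0.2, 900) recovers the intended 10%.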

OpenM++ contains optional functionality to associate a weight with each entity. That weight scales the contribution of the entity to table counts and sums. The functionality facilitates assigning the same case weight to all of the entities in a case for table coherence. This is important for models which have multiple entities in each case, e.g. ancillary family members of a core entity which may be created later in the simulation of the case. The design integrates with population scaling by computing and using the sum of case weights.
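
As a rough illustration of how weights enter counts and sums, the standalone C++ sketch below accumulates a weighted count and a weighted sum and then rescales both. The record layout, the function name weighted_totals, and the rule that the scaling factor is the target population divided by the sum of case weights are assumptions made for this example; they are not taken from the OpenM++ source.

```cpp
#include <cassert>
#include <vector>

// One simulated case: its weight and a value contributed to a table sum.
// Hypothetical record layout, for illustration only.
struct CaseResult {
    double weight;   // case weight shared by all entities in the case
    double value;    // e.g. the duration of life of the core entity
};

struct Totals {
    double count;    // weighted, scaled count
    double sum;      // weighted, scaled sum
};

// Accumulate the weighted count and sum, then scale both so that the
// weighted count equals target_population (assumed scaling rule).
Totals weighted_totals(const std::vector<CaseResult>& cases, double target_population)
{
    double sum_weights = 0.0;
    double weighted_sum = 0.0;
    for (const CaseResult& c : cases) {
        assert(c.weight >= 0.0);
        sum_weights += c.weight;
        weighted_sum += c.weight * c.value;
    }
    assert(sum_weights > 0.0);
    double scale = target_population / sum_weights;
    return Totals{ sum_weights * scale, weighted_sum * scale };
}
```

With three cases of weights 2, 1, 1 and values 80, 70, 60, and a target population of 1000, the sum of weights is 4, the scaling factor is 250, the scaled count is 1000 and the scaled sum is 72500, giving a weighted mean of 72.5 rather than the unweighted 70.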

A time-based microsimulation model simulates interacting entities. It is unclear how one might validly represent an interaction between entities which have unequal weights. Instead, for time-based models based on weighted micro-data, each micro-data record is typically cloned or sampled according to its weight to produce a starting population of entities whose weights are all equal. Such an equal-weighted population can represent a real-world population of a different size by using population scaling, rather than by assigning the same weight to every entity. The end result is the same, but population scaling uses less memory and computation than identical entity weights. Also, it is not clear how to implement population scaling in a time-based model with entity weights if the model contains entities of different types, e.g. a single special Ticker entity, or multiple Dwelling, Person, and Family entities, or a fixed number of Region entities. For these reasons, entity weights are forbidden in time-based models in OpenM++. Use population scaling to make a time-based model represent a real population of a different size. See Population Size and Scaling for more information.
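
Cloning records according to their weights might be sketched as follows in standalone C++. The function clone_by_weight and its unit argument (the number of real-world units each clone represents) are hypothetical, and rounding is only one possible choice; sampling records in proportion to weight is an alternative that avoids rounding error.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Expand weighted micro-data records into an equal-weighted starting
// population by cloning record i round(weights[i] / unit) times.
// Returns, for each clone, the index of its source record.
// Hypothetical helper, not an OpenM++ function.
std::vector<int> clone_by_weight(const std::vector<double>& weights, double unit)
{
    assert(unit > 0.0);
    std::vector<int> population;
    for (int i = 0; i < static_cast<int>(weights.size()); ++i) {
        assert(weights[i] >= 0.0);
        long copies = std::lround(weights[i] / unit);
        for (long k = 0; k < copies; ++k) {
            population.push_back(i);
        }
    }
    return population;
}
```

With record weights 300, 100, and 200 and a unit of 100, the starting population has 6 entities (3, 1, and 2 clones respectively). Every entity then stands for 100 real-world units, which is the factor population scaling would apply.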

Syntax and Use

By default, entities are unweighted. To activate entity weights, include the statement

options weighted_tabulation = on;

in the source code of a case-based model. A natural place to insert this statement is the module ompp_framework.ompp. If weighting is turned on in a time-based model, an error message like the following is emitted:

error : weighted tabulation is not allowed with a time-based model, use population scaling instead.

When weighting is turned on, each entity has a new built-in attribute named entity_weight, of type double. Usually model code does not assign a value directly to entity_weight. Instead, before entities are created for a case, model code sets the initial value of entity_weight for all entities in the case by calling the function set_initial_weight, as in the following contrived example:

void CaseSimulation(case_info &ci)
{
    extern void SimulateEvents(); // defined in a simulation framework module

    // Provide the weight used to initialize the entity_weight attribute for new entities
    set_initial_weight(2.0);

    // For Modgen-compatible models, use the following instead
    //SetCaseWeight(2.0);

    // Initialize the person entity
    auto prPerson = new Person();
    prPerson->Start();

    // Simulate events until there are no more.
    SimulateEvents();
}

Calling set_initial_weight before creating any entities in the case ensures that the built-in attribute entity_weight will have that same value for all entities in the case. The call to set_initial_weight also enables the calculation of the sum of case weights. That sum of weights is used to correctly scale the population to a specified size if the model uses both weights and population scaling. For that to work correctly, set_initial_weight must be called once and only once in the logic of the case, before any entities in the case are created.

If weighted tabulation is not enabled, entities have no attribute named entity_weight, and calls to set_initial_weight are harmless but have no effect.

If weighted tabulation is enabled, but set_initial_weight is not called before creating entities in the case, the entity_weight attribute will be 1.0. However, the total sum of weights used for population scaling will be incorrect because the calculation depends internally on the call to set_initial_weight. Ensure that model code calls set_initial_weight once and only once before creating entities in the case.
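
The toy C++ class below is one way to picture why the call must happen exactly once per case. It is a hypothetical stand-in written for this page, not the OpenM++ implementation: the class ToyWeightTracker, its members, and the assumption that the sum of weights is accumulated inside set_initial_weight are all illustrative.

```cpp
#include <cassert>

// Toy stand-in for per-case weight bookkeeping (hypothetical, not OpenM++ source).
struct ToyWeightTracker {
    double initial_weight = 1.0;   // value given to entity_weight of new entities
    double sum_of_weights = 0.0;   // accumulated across cases, used for scaling
    int calls_in_case = 0;         // number of calls made in the current case

    // Reset per-case state at the start of each case.
    void begin_case()
    {
        initial_weight = 1.0;
        calls_in_case = 0;
    }

    // Mirrors the documented contract: call once per case, before creating entities.
    void set_initial_weight(double w)
    {
        assert(w >= 0.0);
        initial_weight = w;
        sum_of_weights += w;       // assumed: the sum is updated only here
        ++calls_in_case;
    }

    // Weight a newly created entity would receive in the current case.
    double new_entity_weight() const { return initial_weight; }
};
```

In this toy, a case which skips the call still creates entities with weight 1.0 but adds nothing to sum_of_weights, and a case which calls it twice is counted twice. Either mistake would distort a scaling factor derived from the sum.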

Limitations

Weighted tabulation works for table statistics based on counts and sums. It does not yet work for ordinal statistics such as the median or the Gini coefficient. Such statistics are computed ignoring weights, i.e. as though all weights were 1.0. If a table uses an ordinal statistic and weighted_tabulation is on, the OpenM++ compiler will issue a warning. For example, the table

table Person DurationOfLife //EN Duration of Life
{
    {
        value_in(alive),                //EN Population size
        min_value_out(duration()),      //EN Minimum duration of life decimals=4
        max_value_out(duration()),      //EN Maximum duration of life decimals=4
        duration() / value_in(alive),   //EN Life expectancy decimals=4
        P50(value_out(duration()))      //EN Median duration of life decimals=4
    }    //EN Demographic characteristics
};

would emit a warning like

warning : weighting is not supported for ordinal statistic 'P50' in table 'DurationOfLife' ...
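
To see how much an ordinal statistic can change when weights are ignored, the standalone C++ sketch below computes a weighted median. It uses the lower weighted median convention (the smallest value whose cumulative weight reaches half of the total), which is an assumption of this example and not necessarily the convention used by P50 in OpenM++ tables. The function name weighted_median is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Lower weighted median of (value, weight) pairs: the smallest value whose
// cumulative weight reaches half of the total weight.
// Hypothetical helper, not an OpenM++ function.
double weighted_median(std::vector<std::pair<double, double>> obs)
{
    assert(!obs.empty());
    std::sort(obs.begin(), obs.end());   // sorts by value, then by weight
    double total = 0.0;
    for (const auto& o : obs) {
        assert(o.second >= 0.0);
        total += o.second;
    }
    assert(total > 0.0);
    double cumulative = 0.0;
    for (const auto& o : obs) {
        cumulative += o.second;
        if (cumulative >= 0.5 * total) {
            return o.first;
        }
    }
    return obs.back().first;
}
```

For durations 60, 70, and 80 with weights 1, 1, and 4, this function returns 80. With all weights set to 1.0, which corresponds to how ordinal statistics are currently computed, it returns 70.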

Modgen issues

case-based models (Modgen)

Modgen implements case weighting and weight-based population scaling similar to those of OpenM++, using a function named SetCaseWeight. Modgen-compatible models can call SetCaseWeight instead of set_initial_weight, as in the commented statement in the previous example. The OpenM++ framework supplies versions of SetCaseWeight which call set_initial_weight internally.

OpenM++ functions intrinsically at the sub-sample/replicate/member level, so the notion of a distinct total weight and sub-sample weight does not apply in OpenM++.

time-based models (Modgen)

Modgen does not implement population scaling for time-based models. To work around this limitation, model developers have called the Modgen function Set_actor_weight in actor Start functions to scale results to represent a larger population. Consider a time-based model which includes two exogenous parameters, StartingPopulationRealSize for the size of the true real-world population which is represented by the model, and StartingPopulationSize for the size (number of entities) of the synthetic starting population in the model. The Modgen approach might look like this:

void Person::Start()
{
    // Initialize all attributes (OpenM++).
    initialize_attributes();

    // The following function calls implement population scaling for Modgen,
    // using identical weights for each Person entity in the simulation.
    // These calls do nothing in OpenM++.
    // OpenM++ can implement population scaling directly for time-based models.
    
    double dWeight = (double) StartingPopulationRealSize / (double) StartingPopulationSize;
    Set_actor_weight( dWeight );
    Set_actor_subsample_weight( dWeight );
...

The OpenM++ framework includes do-nothing versions of the Modgen functions Set_actor_weight and Set_actor_subsample_weight so this same code will build without error in OpenM++.

To perform the identical population scaling directly in the OpenM++ version of the model (without weights), include the following statement in ompp_framework.ompp:

use "time_based/time_based_scaling_exogenous.ompp";

That use module integrates with the OpenM++ framework to scale table counts and sums by the factor

(double) StartingPopulationRealSize / (double) StartingPopulationSize

using the exogenous parameters StartingPopulationRealSize and StartingPopulationSize.

These two parameters are already declared in the use module time_based_scaling_exogenous.ompp in OpenM++. Declare them in the Modgen version using a Modgen-only source code file name, for example modgen_PopulationSize.mpp, with content

parameters
{
    //EN Simulation population size
    int StartingPopulationSize;

    //EN True population size
    double StartingPopulationRealSize;
};

and then make the values of these two parameters available to both Modgen and OpenM++ by placing them in a file processed by both, for example PopulationSize.dat with contents like

parameters
{
    //EN Simulation population size
    int StartingPopulationSize = 25000;

    //EN True population size
    double StartingPopulationRealSize = 10000000;
};
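
As a worked check of the scaling factor with these example values, the standalone C++ sketch below evaluates the same expression. The function name population_scaling_factor is hypothetical; in a real model the framework computes the factor from the two parameters.

```cpp
#include <cassert>

// Scaling factor applied to table counts and sums:
// (double) StartingPopulationRealSize / (double) StartingPopulationSize.
// Hypothetical helper, for illustration only.
double population_scaling_factor(double real_size, int simulated_size)
{
    assert(simulated_size > 0);
    return real_size / static_cast<double>(simulated_size);
}
```

With StartingPopulationRealSize = 10000000 and StartingPopulationSize = 25000 the factor is 400, so each simulated Person represents 400 people and an unscaled table count of 1250 would be reported as 500000.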

For more about the visibility of model source code and parameter value files in OpenM++ and Modgen, see Model Code. For more about population scaling in OpenM++, see Population Size and Scaling.
