Screened Tables - openmpp/openmpp.github.io GitHub Wiki


Screened tables let model developers implement Statistical Disclosure Control (SDC) policies at the cell level of entity tables. Screened tables can also be used for output quality management, for example to suppress statistically unreliable cells or to round values to a given number of digits of precision to avoid the impression of spurious accuracy in model outputs.

Related topics

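Topic contents

  • Introduction
  • Overview
  • Syntax and simple example
  • Remarks and limitations
  • Extrema collections
  • Screening function arguments
  • Examples
  • Annex 1: SM1 Code
  • Annex 2: SM1 Parameters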

Introduction

Please see the following references for an overview of Statistical Disclosure Control:

[back to topic contents]

Overview

Screened Tables is an optional capability that allows a model developer to examine and modify table cell values, based on the values themselves and their provenance. It is activated by using a keyword in the entity table declaration, and by providing a C++ function which implements screening at the cell level. That C++ screening function is not necessarily complex, and multiple function arguments are supplied to ease the task. This wiki topic also includes a suite of working examples.

The examples are based on the model SM1 in the OpenM++ distribution at OM_ROOT/models/SM1. SM1 is a simple model which adds several attributes to the NewCaseBased model to provide raw material for the examples in this topic.

The example table declarations in this topic can be copy/pasted directly into the module SM1/code/ScreenedTables.ompp, and the corresponding example screening code can be copy/pasted directly into the body of the function TransformScreened1 in SM1/code/ScreeningCode.ompp. The upper right corner of each code block in this wiki topic should display a browser pop-up which can be clicked to copy the entire block to the system clipboard for subsequent pasting into SM1, for in-depth exploration.

To provide flexibility and facilitate comparison of different methods, a model can use up to four different screening functions. The examples in this topic use only a single method, method #1.

[back to topic contents]

Syntax and simple example

To activate one of the four screening methods for a given entity table, include exactly one of the four keywords screened1, screened2, screened3, or screened4 in the table's properties, and supply a definition for the corresponding C++ transformation function. For example, the following table is declared in SM1/code/ScreenedTables.ompp:

table snapshot screened1 Person ExampleTable
[trigger_entrances(integer_age, 50)]
{
    {
        unit,           //EN Persons
        mean(earnings), //EN Average earnings
        P50(earnings)   //EN Median earnings
    }
};

The table property screened1 specifies that it will be screened using screening method #1, which uses the C++ transformation function TransformScreened1.

That means that each of the three accumulators (statistics) in the single cell of ExampleTable will be subject to modification by the developer-supplied C++ function TransformScreened1. This function is called automatically for each value in this table when the simulation of each sub (also known as a member or replicate) completes.

The developer-supplied function TransformScreened1 takes 10 arguments (described below), whose values are supplied by the framework. The first argument in_value is a value in the table before modification. The function returns the possibly modified value. Here's an example function body of TransformScreened1 defined in the model code module SM1/code/ScreeningCode.ompp:

double TransformScreened1(
    const double in_value,
...
)
{
    /// transformed value, initialized to quiet NaN (shows as empty)
    double out_value = UNDEF_VALUE;

    // notional example of transformation (round to 100's)
    out_value = 100.0 * std::round(in_value / 100.0);

    return out_value;
}

The C++ code in this example rounds all table values to 100's.

Here's a comparison of the table values for both unscreened and screened versions of the table:

| Quantity | Label | unscreened | screened |
| --- | --- | --- | --- |
| unit | Persons | 2538 | 2500 |
| mean(earnings) | Average earnings | 99203.9 | 99200 |
| P50(earnings) | Median earnings | 101191 | 101200 |
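
The rounding step in this example can also be exercised outside the model. The following standalone sketch wraps the same arithmetic in a small helper. The name round_to_multiple is hypothetical and is not part of OpenM++.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of OpenM++): round value to the nearest
// multiple of base, using the same arithmetic as the example function body.
double round_to_multiple(double value, double base)
{
    assert(base > 0.0); // assumption: only positive rounding bases are meaningful here
    return base * std::round(value / base);
}
```

For instance, round_to_multiple(2538.0, 100.0) gives 2500, which matches the first row of the comparison above.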

To examine the transformation process in action, build the Debug version of SM1 and set breakpoints in the function TransformScreened1 in the module ScreeningCode.ompp.

[back to topic contents]

Remarks and limitations

  • Screening is applied only at the sub level, and therefore only indirectly at the run level.
  • Screening is applied only at the cell level (including marginal cells if present). There are no checks for residual disclosure.
  • Screening is applied only at the accumulator/statistic level, and therefore only indirectly at the expression/measure level.
  • Screening is applied only to entity tables, and therefore only indirectly to derived tables.
  • Use the statistic mean to screen averages, i.e. mean(x) instead of sum(x)/unit as in older model code.
  • To easily deactivate screening without editing any table declarations, insert the single line return in_value; as the first line in the screening function.
  • In use cases where model code is released but the screening algorithm is confidential, the screening function can be built separately and supplied as an object file or library to build the model. Alternatively, a substitute module containing a 'do nothing' screening function can be distributed, if the model code is distributed with a non-confidential synthetic version of the microdata file(s).
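
As a minimal sketch of the pass-through approach mentioned in the last two remarks, the following function returns every value unchanged. The enumerations shown are abbreviated stand-ins so that the sketch compiles on its own. In a model, the definitions generated by the OpenM++ compiler are used instead.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Abbreviated stand-ins for the enumerations generated by the OpenM++ compiler,
// present only so that this sketch compiles outside a model.
namespace omr {
    enum class stat { unit, sum, mean };
    enum class incr { unused, value_out };
    enum class etbl { ExampleTable, om_none };
    enum class attr { earnings, om_none };
}

double TransformScreened1(
    const double in_value,
    const char* description,
    const omr::stat statistic,
    const omr::incr increment,
    const omr::etbl table,
    const omr::attr attribute,
    const double observations,
    const std::size_t extrema_size,
    const std::multiset<double>& smallest,
    const std::multiset<double>& largest
)
{
    assert(description != nullptr); // the framework is expected to supply a description

    return in_value; // screening deactivated: every value passes through unchanged

    // ... any existing screening code would remain below, unreached ...
}
```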

[back to topic contents]

Extrema collections

The screening function arguments smallest and largest are collections containing, respectively, the lowest M and highest M observations in the table cell containing the value being screened, where M is a configurable constant for each of the four screening methods. These collections allow implementing 'dominance' rules for cell suppression, e.g. suppress a cell total if the top 3 observations in the cell account for more than 70% of the total. To set an appropriate value for M, use the corresponding option screened[1-4]_extremas_size in model code. For example, the following statement retains the lowest 3 and highest 3 observations in the smallest and largest extrema collections for method #1:

options screened1_extremas_size = 3;

Extrema collections might be smaller than the specified size if there are fewer than that many observations in the cell.

Extrema collections can contain the special floating point values +inf and -inf. They never contain the special floating point value NaN, because a NaN increment is treated as a run-time model error by OpenM++.

Only certain statistics are considered eligible for extrema collections. They are

sum
minimum
maximum
mean

If a quantity is ineligible the associated extrema collections will be empty.

Reducing the size of extrema collections reduces memory and processing requirements.

M is set to 0 by default for all four screening methods, to avoid the computational and memory overhead of maintaining these collections for each cell of each screened table unless needed by the screening method.
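
The dominance rule mentioned at the start of this section could be sketched as follows. The helper names top_share and is_dominated are hypothetical and are not part of OpenM++. The sketch assumes a positive cell total, and it tolerates an extrema collection holding fewer than k observations.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical helper (not part of OpenM++): share of `total` accounted for
// by the k largest observations held in the `largest` extrema collection.
// std::multiset iterates in increasing order, so reverse iterators visit
// the largest values first.
double top_share(const std::multiset<double>& largest, std::size_t k, double total)
{
    assert(total > 0.0); // assumption: the rule is applied to a positive cell total
    double sum_top = 0.0;
    std::size_t taken = 0;
    for (auto it = largest.rbegin(); it != largest.rend() && taken < k; ++it, ++taken) {
        sum_top += *it;
    }
    return sum_top / total;
}

// Dominance rule: true if the top k observations account for more than
// `threshold` (a proportion, e.g. 0.70) of the total.
bool is_dominated(const std::multiset<double>& largest, std::size_t k, double total, double threshold)
{
    return top_share(largest, k, total) > threshold;
}
```

In a screening function for a sum statistic, in_value would play the role of total, and a true result from is_dominated would lead to returning UNDEF_VALUE.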

[back to topic contents]

Screening function arguments

This subtopic describes the arguments to the developer-supplied screening function(s). It contains several reference sections, listed here for convenience:

  • Screening function signature
  • statistic enumeration
  • increment enumeration
  • table enumeration
  • attribute enumeration

It is essential that the definition of the screening function(s) in model code have the correct argument types. Otherwise the model build will fail at the C++ link stage, with an error message like

error LNK2019: unresolved external symbol "double __cdecl TransformScreened1(...

The correct function signature can be copied from the section below, or from the file OM_ROOT/include/omc/globals1.h in the OpenM++ distribution.

The rows of the following table describe the 10 arguments of a screening function. The example column contains values pasted from a debugger session on a breakpoint in the function TransformScreened1 in SM1, on the second invocation of the function, using the table given earlier.

| Name | Example | Notes |
| --- | --- | --- |
| in_value | 99203.855397951411 | The original value in the table, which can be transformed or suppressed by code in the transformation function. |
| description | ExampleTable: accumulator 1: mean(value_out(interval(earnings))) | A descriptive string which can be useful in debugging sessions. Don't attempt to parse it for content. Instead, use the enumerator arguments described in rows below. |
| statistic | mean (4) | The enumerator for the statistic for use in function code, e.g. omr::stat::mean |
| increment | value_out (7) | The enumerator for the increment for use in function code, e.g. omr::incr::value_out. For the unit keyword (count of increments), the value is omr::incr::unused. |
| table | ExampleTable (0) | The entity table name as an enumerator, e.g. omr::etbl::ExampleTable |
| attribute | earnings (4) | The attribute name as an enumerator, e.g. omr::attr::earnings. If the attribute name is not visible, e.g. duration(), the enumerator is omr::attr::om_none. |
| observations | 2538.0000000000000 | The number of observations (increments) in the cell. It is always unweighted, even if the table is weighted. The value is less than the 5,000 cases in the Default SM1 run due to mortality before age 50. |
| extrema_size | 3 | Identical to the value supplied in the screened[1-4]_extremas_size option, for use in function code. This is the maximum possible size of the extrema collections. The actual size may be less if there are fewer observations in the cell. |
| smallest | {0.0000000000000000, 0.0000000000000000, 0.0000000000000000} | The three smallest observed earnings, in increasing order. The extrema collection is one of the standard C++ container types, specifically std::multiset<double>. See code examples elsewhere in this topic for use. The values are all zero in this example because the distribution of earnings in SM1 is mixed discrete-continuous, with a large subpopulation having zero earnings. |
| largest | {370810.00000000000, 398272.00000000000, 484007.00000000000} | The three largest observed earnings, in increasing order. |

[back to topic contents]

Screening function signature

/**
 * Table screening transformation function #1
 *
 * @param   in_value     The table value subject to transformation.
 * @param   description  A formatted string describing the table and statistic.
 * @param   statistic    The statistic of the accumulator, e.g. sum, mean.
 * @param   increment    The increment of the accumulator, e.g. delta, value_out.
 * @param   table        The table of the accumulator (model-specific).
 * @param   attribute    The attribute of the accumulator (model-specific).
 * @param   observations The count of observations in the cell (# of increments).
 * @param   extrema_size The maximum size M of the two extrema collections (configurable)
 * @param   smallest     The extrema collection containing the smallest M observations.
 * @param   largest      The extrema collection containing the largest M observations.
 *
 * @returns The transformed version of in_value.
 */
double TransformScreened1(
    const double in_value,
    const char* description,
    const omr::stat statistic,
    const omr::incr increment,
    const omr::etbl table,
    const omr::attr attribute,
    const double observations,
    const size_t extrema_size,
    const std::multiset<double>& smallest,
    const std::multiset<double>& largest
)

[back to screening function arguments]
[back to topic contents]

statistic enumeration

This enumeration is generated by the OpenM++ compiler in the file src/om_types0.h.

namespace omr {
    /// statistic in an entity table
    enum class stat {
        unit,
        sum,
        minimum,
        maximum,
        mean,
        variance,
        stdev,
        P1,
        P2,
        P5,
        P10,
        P20,
        P25,
        P30,
        P40,
        P50,
        P60,
        P70,
        P75,
        P80,
        P90,
        P95,
        P98,
        P99,
        gini,
    };
} // namespace omr
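
Screening code often needs to treat a family of statistics alike, for example all percentiles. The following sketch shows one way to test for that family. The helper is_percentile is hypothetical and is not part of OpenM++. It assumes that P1 through P99 are declared contiguously, as in the listing above. The enumeration is repeated in compact form only so that the sketch compiles on its own.

```cpp
// Copy of the enumeration listed above, so that this sketch compiles on its own.
// In a model, the generated definition in src/om_types0.h is used instead.
namespace omr {
    enum class stat {
        unit, sum, minimum, maximum, mean, variance, stdev,
        P1, P2, P5, P10, P20, P25, P30, P40, P50,
        P60, P70, P75, P80, P90, P95, P98, P99,
        gini,
    };
} // namespace omr

// Guard for the assumption below: exactly 17 contiguous percentile enumerators.
static_assert(
    static_cast<int>(omr::stat::P99) - static_cast<int>(omr::stat::P1) == 16,
    "percentile enumerators are expected to be contiguous");

// Hypothetical helper (not part of OpenM++): true if the statistic is a percentile.
// Relies on P1..P99 being declared contiguously in the enumeration.
constexpr bool is_percentile(omr::stat s)
{
    return s >= omr::stat::P1 && s <= omr::stat::P99;
}
```

With this helper, is_percentile(omr::stat::P50) is true, while is_percentile(omr::stat::mean) and is_percentile(omr::stat::gini) are false.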

[back to screening function arguments]
[back to topic contents]

increment enumeration

This enumeration is generated by the OpenM++ compiler in the file src/om_types0.h.

namespace omr {
    /// increment in an entity table
    enum class incr {
        unused,
        delta,
        delta2,
        nz_delta,
        value_in,
        value_in2,
        nz_value_in,
        value_out,
        value_out2,
        nz_value_out,
    };
} // namespace omr

[back to screening function arguments]
[back to topic contents]

table enumeration

This model-specific enumeration is generated by the OpenM++ compiler in the file src/om_types0.h. Here it is for the SM1 model:

namespace omr {
    /// entity table in model
    enum class etbl {
        ExampleTable,
        om_none
    };
} // namespace omr

[back to screening function arguments]
[back to topic contents]

attribute enumeration

This model-specific enumeration is generated by the OpenM++ compiler in the file src/om_types0.h. Here it is for the SM1 model:

namespace omr {
    /// visible entity attribute in model
    enum class attr {
        age,
        alive,
        all_earnings,
        benefit,
        case_id,
        case_seed,
        earnings,
        entity_id,
        integer_age,
        lifecycle_counter,
        lifecycle_event,
        positive_earnings,
        region,
        se_earnings,
        time,
        under_audit,
        om_none
    };
} // namespace omr

[back to screening function arguments]
[back to topic contents]

Examples

This subtopic contains some worked examples of screened tables and screening functions.

Each example shows the table declaration, the body of the screening function, and a cell-by-cell comparison of the effects of screening on the table.

The examples are meant to illustrate coding approaches to different kinds of screening requirements. They have not been tested carefully for validity.

Some of these examples use standard math functions declared in the header file <cmath>, which is made available to model code through a #include directive in the file SM1/code/custom_early.h.

  • Example 1 Rounding based on the kind of statistic
  • Example 2 Rounding to 3 digits of precision
  • Example 3 Suppressing cells with few observations
  • Example 4 Suppressing cells dominated by a few large observations

[back to topic contents]

Example 1

This example rounds values to a fixed number of decimal digits, but treats different statistics differently. Specifically, the gini coefficient is not modified, counts are rounded to 100's, means are rounded to 1000's, sums are rounded to 1000000's, and other statistics are suppressed.

Table declaration:

table snapshot screened1 Person ExampleTable
[trigger_entrances(integer_age, 50)]
{
    {
        unit,                   //EN Persons
        nz_value_out(earnings), //EN Persons with earnings
        mean(earnings),         //EN Average earnings
        sum(earnings),          //EN Total earnings
        P50(earnings),          //EN Median earnings
        maximum(earnings),      //EN Maximum earnings
        gini(earnings)          //EN gini of earnings
    }
};

Screening function body:

{
    /// transformed value, initialized to quiet NaN (shows as empty)
    double out_value = UNDEF_VALUE;

    /// the increment is from the count-like nz family
    bool is_nz = 
           (increment == omr::incr::nz_delta)
        || (increment == omr::incr::nz_value_in)
        || (increment == omr::incr::nz_value_out);

    if (statistic == omr::stat::gini) {
        // gini coefficient
        // do not modify
        out_value = in_value;
    }
    else if ((statistic == omr::stat::unit) || is_nz) {
        // count-like value
        // round to 100's
        out_value = 100.0 * std::round(in_value / 100.0);
    }
    else if (statistic == omr::stat::mean) {
        // average-like value
        // round to 1000's
        out_value = 1000.0 * std::round(in_value / 1000.0);
    }
    else if (statistic == omr::stat::sum) {
        // total-like value
        // round to 1000000's
        out_value = 1000000.0 * std::round(in_value / 1000000.0);
    }
    else {
        // suppress other things
        out_value = UNDEF_VALUE;
    }

    return out_value;
}

Effects:

| Quantity | Label | unscreened | screened |
| --- | --- | --- | --- |
| unit | Persons | 2538 | 2500 |
| nz_value_out(earnings) | Persons with earnings | 2030 | 2000 |
| mean(earnings) | Average earnings | 99203.9 | 99000 |
| sum(earnings) | Total earnings | 251779000 | 252000000 |
| P50(earnings) | Median earnings | 101191 | |
| maximum(earnings) | Maximum earnings | 484007 | |
| gini(earnings) | gini of earnings | 0.375879 | 0.375879 |

[back to Examples]
[back to topic contents]

Example 2

This example rounds values to 3 significant digits. The table declaration is identical to that of Example 1 immediately above.

Table declaration:

table snapshot screened1 Person ExampleTable
[trigger_entrances(integer_age, 50)]
{
    {
        unit,                   //EN Persons
        nz_value_out(earnings), //EN Persons with earnings
        mean(earnings),         //EN Average earnings
        sum(earnings),          //EN Total earnings
        P50(earnings),          //EN Median earnings
        maximum(earnings),      //EN Maximum earnings
        gini(earnings)          //EN gini of earnings
    }
};

Screening function body:

{
    // pass through non-finite values and 0.0
    if (!std::isfinite(in_value) || in_value == 0.0) {
        return in_value;
    }

    /// transformed value, initialized to quiet NaN (shows as empty)
    double out_value = UNDEF_VALUE;

    // number of significant digits to retain
    static const double precision = 3.0;

    double d = std::ceil(std::log10(std::abs(in_value)));
    /// power of 10 for scaling
    double power = precision - std::trunc(d);
    /// scaling needed before rounding
    double magnitude = std::pow(10.0, power);

    out_value = std::round(in_value * magnitude) / magnitude;

    return out_value;
}

Effects:

| Quantity | Label | unscreened | screened |
| --- | --- | --- | --- |
| unit | Persons | 2538 | 2540 |
| nz_value_out(earnings) | Persons with earnings | 2030 | 2030 |
| mean(earnings) | Average earnings | 99203.9 | 99200 |
| sum(earnings) | Total earnings | 251779000 | 252000000 |
| P50(earnings) | Median earnings | 101191 | 101000 |
| maximum(earnings) | Maximum earnings | 484007 | 484000 |
| gini(earnings) | gini of earnings | 0.375879 | 0.376 |
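
To check this arithmetic outside the model, the same steps can be placed in a standalone function. The following is a minimal sketch. The name round_significant and its digits argument are hypothetical and are not part of OpenM++.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical standalone helper (not part of OpenM++) using the same
// arithmetic as the function body above, with the digit count as an argument.
double round_significant(double value, int digits)
{
    assert(digits > 0); // assumption: at least one significant digit is retained

    // pass through non-finite values and 0.0, for which log10 is not usable
    if (!std::isfinite(value) || value == 0.0) {
        return value;
    }

    // decimal order of magnitude of the value (zero or negative if |value| < 1)
    double d = std::ceil(std::log10(std::abs(value)));

    // scale so the retained digits lie left of the decimal point, round, then unscale
    double magnitude = std::pow(10.0, digits - d);
    return std::round(value * magnitude) / magnitude;
}
```

Because the result is produced by floating point scaling, comparisons against expected values such as 2540 or 0.376 are best made with a small tolerance rather than exact equality.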

[back to Examples]
[back to topic contents]

Example 3

This example rounds counts to 5's and suppresses cells with fewer than 100 observations. The table has a classification dimension and a margin.

Table declaration:

table snapshot screened1 Person ExampleTable //EN High earners by region
[trigger_entrances(integer_age, 50)]
{
    region+
    * {
        high_earner //EN High earners
    }
};

Screening function body:

{
    /// transformed value, initialized to quiet NaN (shows as empty)
    double out_value = UNDEF_VALUE;

    if (observations < 100) {
        // suppress if fewer than 100 observations
        out_value = UNDEF_VALUE;
    }
    else {
        // round to 5's
        out_value = 5.0 * std::round(in_value / 5.0);
    }

    return out_value;
}

Effects:

| Region | observations | unscreened | screened |
| --- | --- | --- | --- |
| 0 | 1374 | 269 | 270 |
| 1 | 675 | 141 | 140 |
| 2 | 373 | 83 | 85 |
| 3 | 59 | 5 | |
| 4 | 57 | 9 | |
| All | 2538 | 507 | 505 |
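
As noted under Remarks and limitations, cell-level screening does not check for residual disclosure. The following sketch is an illustration only and is not part of OpenM++. It shows how a reader of the screened table could combine the published margin with the published cells to approximate the suppressed cells. The helper name residual_from_margin is hypothetical.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of OpenM++): the amount left over for the
// suppressed cells once the published cells are subtracted from the published margin.
double residual_from_margin(double published_margin, const std::vector<double>& published_cells)
{
    assert(!published_cells.empty()); // assumption: at least one cell was published
    double published_sum = std::accumulate(published_cells.begin(), published_cells.end(), 0.0);
    return published_margin - published_sum;
}
```

With the screened values above, residual_from_margin(505.0, {270.0, 140.0, 85.0}) gives 10. That is an approximation, subject to the rounding applied to each published value, of the combined count in the two suppressed regions, whose unscreened values sum to 14.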

[back to Examples]
[back to topic contents]

Example 4

This example implements a dominance rule which suppresses cells dominated by a few observations. Specifically, average earnings are suppressed if the top 3 earners account for 60% or more of the earnings in the cell. The screening code requires that the option screened1_extremas_size be set to 3. The code also illustrates how to specialize screening for a specific table.

Table declaration:

table snapshot screened1 Person ExampleTable //EN Average earnings of high earners
[trigger_entrances(integer_age, 50) && high_earner]
{
    region+
    * {
        mean(earnings) //EN Earnings
    }
};

Screening function body:

{
    /// transformed value, initialized to quiet NaN (shows as empty)
    double out_value = UNDEF_VALUE;

    switch (table) {
    case omr::etbl::ExampleTable:
    {
        assert(extrema_size == 3); // code below requires that extrema size is 3

        assert(statistic == omr::stat::mean);     // sanity check
        assert(attribute == omr::attr::earnings); // sanity check

        double sum_top3 = 0.0;
        for (auto& val : largest) {
            sum_top3 += val;
        }
        double sum_all = in_value * observations;

        if ((sum_top3 / sum_all) >= 0.60) {
            // suppress if top 3 earners account for 60% or more of earnings in cell
            out_value = UNDEF_VALUE;
        }
        else {
            // round value to 1000's
            out_value = 1000.0 * std::round(in_value / 1000.0);
        }
        break;
    }
    default:
    {
        // code to handle other screened tables would go here
        break;
    }
    } // switch
    return out_value;
}

Effects:

| Region | observations | unscreened | screened |
| --- | --- | --- | --- |
| 0 | 236 | 172678 | 173000 |
| 1 | 127 | 168087 | 168000 |
| 2 | 74 | 168747 | 169000 |
| 3 | 5 | 173615 | |
| 4 | 9 | 210466 | 210000 |
| All | 451 | 171504 | 172000 |

Note that the number of observations is small because the table is filtered on high earners.

[back to Examples]
[back to topic contents]

Annex 1: SM1 Code

Contents of the module SM1/code/Income.ompp:

/* NOTE(Income.mpp, EN)
    This module contains hard-coded notional income dynamics for testing screened tables.
*/

#include "omc/optional_IDE_helper.h" // help an IDE editor recognize model symbols

#if 0 // Hide non-C++ syntactic island from IDE

range REGION //EN Region
{
    0,
    4
};

parameters
{
    double EarningsNonZeroProportion;
    double EarningsScaleFactor;
    double EarningsSigma;
    double SE_EarningsNonZeroProportion;
    double SE_EarningsScaleFactor;
    double SE_EarningsSigma;
    double HighIncomeThreshold;
    double GuaranteedAnnualIncome;
    double AuditThreshold;
    cumrate RegionDistribution[REGION];
};

entity Person
{
    //EN Earnings
    double earnings = { 0.0 };

    //EN Self-employed earnings
    double se_earnings = { 0.0 };

    //EN All earnings
    double all_earnings = earnings + se_earnings;

    //EN Positive earnings
    double positive_earnings = max(0.0, all_earnings);

    //EN Benefit
    double benefit = max(GuaranteedAnnualIncome, GuaranteedAnnualIncome - positive_earnings);

    //EN High earner
    bool high_earner = (all_earnings >= HighIncomeThreshold);

    //EN Under audit
    bool under_audit = (all_earnings >= AuditThreshold) || (positive_earnings != all_earnings);

    //EN Region
    REGION region;

    //EN Notional model of earnings
    void AssignEarnings(void);

    //EN Assign region
    void AssignRegion(void);

    // call AssignEarnings at each change in integer_age
    hook AssignEarnings, trigger_changes(integer_age);

    // call AssignRegion at Start
    hook AssignRegion, Start, 1;
};

#endif // Hide non-C++ syntactic island from IDE

void Person::AssignEarnings(void)
{
    if (integer_age == 20) {
        // Assign starting earnings at age 20
        if (RandUniform(10) < EarningsNonZeroProportion) {
            double z = RandNormal(11);
            double x = EarningsScaleFactor * std::exp(EarningsSigma * z);
            earnings = std::round(x);
        }
        // else earnings have default value of 0

        // Assign starting se_earnings at age 20
        if (RandUniform(12) < SE_EarningsNonZeroProportion) {
            // 80% have self-employed earnings
            double z = RandNormal(13);
            double x = SE_EarningsScaleFactor * std::exp(SE_EarningsSigma * z);
            se_earnings = std::round(x);
        }
    }
    else if (integer_age > 20) {
        // Annual change to earnings
        {
            double u = RandUniform(14);
            // rescale uniform to [0.9, 1.1]
            u = 0.9 + 0.2 * u;
            double x = earnings * u;
            x *= 1.03; // career growth with increasing age
            earnings = std::round(x);
        }
        // Annual change to se_earnings
        {
            double u = RandUniform(15);
            // rescale uniform to [0.9, 1.1]
            u = 0.9 + 0.2 * u;
            double x = se_earnings * u;
            x *= 1.03; // career growth with increasing age
            se_earnings = std::round(x);
        }
    }
    else {
        // No earnings before age 20
    }
}

void Person::AssignRegion(void)
{
    double draw = RandUniform(3);
    int nRegion = 0;
    Lookup_RegionDistribution(draw, &nRegion);
    region = (REGION)nRegion;
}

[back to topic contents]

Annex 2: SM1 Parameters

Contents of SM1/parameters/Default/Income.ompp:

parameters
{
    double EarningsNonZeroProportion     = 0.80;
    double EarningsScaleFactor           = 50000.0;
    double EarningsSigma                 = 0.25;
    double SE_EarningsNonZeroProportion  = 0.80;
    double SE_EarningsScaleFactor        = 40000.0;
    double SE_EarningsSigma              = 0.25;
    double HighIncomeThreshold           = 250000.00;
    double GuaranteedAnnualIncome        = 20000.00;
    double AuditThreshold                = 250000.00;
    cumrate RegionDistribution[REGION] =
    {
        200, 100, 50, 10, 10
    };
};

[back to topic contents]
