Model Output Expressions - openmpp/openmpp.github.io GitHub Wiki

Sub-values: sub-samples, members, replicas

Following terms: "simulation member", "replica", "sub-sample" are often used in micro-simulation conversations interchangeably, depending on context. To avoid terminology discussion openM++ uses "sub-value" as equivalent of all above and some older pages of that wiki may contain "sub-sample" in that case.

Model output tables: sub-values, accumulators and expressions

There are two kind of model output tables:

  • accumulators table: output sub-values (similar to Modgen sub-samples)
  • expressions table: model output value calculated as accumulators aggregated across sub-values (e.g. mean or CV or SE)

All output accumulator tables always contain same number of sub-values, for example model run:

model.exe -OpenM.Subvalues 16

will create 16 sub-values for each accumulator in each output accumulator table.

It is also possible to use parameter(s) in expressions, parameter must be a scalar of float or integer type (see example of OM_COUNT_IF below).

Sub-values (accumulators) output tables

During the simulation OpenM++ model collect the results in "accumulators" and, at the end, write it into output accumulators table(s). Each output accumulator table contains results of model executions for all sub-values.

For example:

Model output table "Salary by Sex" has two accumulators and two dimensions:

  • salary: 0 = "Low", 1 = "Medium", 2 = "High"
  • sex: 0 = "Female", 1 = "Male"

If we run that model twice, first time with one sub-value and second with eight sub-values then output results may look like:

SELECT
  run_id, dim0, dim1, acc_id, sub_id, acc_value
FROM modelone_201208171604590148_a0_salarySex
ORDER BY 1, 2, 3, 4, 5;

run_id  dim0    dim1    acc_id  sub_id  acc_value
------  ------  ------  ------  ------  ----------
11      0       0       0       0       50.0
11      0       0       1       0       1.0
11      0       1       0       0       60.0
11      0       1       1       0       2.0
11      1       0       0       0       51.6
11      1       0       1       0       2.0
11      1       1       0       0       62.0
11      1       1       1       0       3.0
11      2       0       0       0       53.2
11      2       0       1       0       3.0
11      2       1       0       0       64.0
11      2       1       1       0       4.0
12      0       0       0       0       50.0
12      0       0       0       1       100.0
12      0       0       0       2       150.0
12      0       0       0       3       200.0
12      0       0       0       4       250.0
12      0       0       0       5       300.0
12      0       0       0       6       350.0
12      0       0       0       7       400.0
12      0       0       1       0       1.0
....more results....
12      2       1       1       7       11.0

Columns are:

  • run_id: is unique run id for that model execution; all model input parameters and output results can be found by run_id;
  • dim0: salary dimension items;
  • dim1: sex dimension items;
  • acc_id: zero-based accumulator number;
  • sub_id: zero-based sub-value number;
  • acc_value: accumulator value;

Accumulators are low level simulation results and useful mostly to analyze simulation model itself.

Aggregated output values

On top of accumulator values for each sub-value model can produce more meaningful output results by using OpenM++ output expressions, i.e.: average value across all sub-values. To do that model developer (or model user) can specify output aggregation expression, for example, average value is: OM_AVG(acc0).

Each "value" output table can contain unlimited (reasonably unlimited) amount of aggregation expressions. Each expression must include aggregation function(s) with accumulators as argument(s) and, optionally, other arithmetic operators and basic SQL functions, such as ABS or SQRT.

Following OpenM++ sub-values aggregation functions are supported:

  • OM_COUNT(...expr...) - count of values across all sub-values, OM_COUNT(acc0) result in SQL:
    COUNT(acc0)
  • OM_COUNT_IF(...condition...) - count of values matching condition, OM_COUNT_IF(acc0 > param.High) result in SQL:
    COUNT(CASE WHEN acc0 > (....sql to select value of High parameter...) THEN 1 ELSE NULL END)
  • OM_SUM(...expr...) - sum of values across all sub-values, OM_SUM(acc0) result in SQL:
    SUM(acc0)
  • OM_AVG(...expr...) - average value over sub-values, OM_AVG(acc0) result in SQL:
    AVG(acc0)
  • OM_MAX(...expr...) - maximum value over all sub-values, OM_MAX(acc0) result in SQL:
    MAX(acc0)
  • OM_MIN(...expr...) - minimal value over all sub-values, OM_MIN(acc0) result in SQL:
    MIN(acc0)
  • OM_VAR(...expr...) - variance over sub-values, OM_VAR(acc0) result in SQL:
    SUM( (acc0 - AVG(acc0) * (acc0 - AVG(acc0) ) / ( COUNT(acc0) - 1 )
  • OM_SD(...) - standard deviation:
    SQRT(OM_VAR(...expr...))
  • OM_SE(...expr...) - standard error:
    SQRT(OM_VAR(...expr...) / COUNT(...expr...))
  • OM_CV(...expr...) - coefficient of variation:
    100 * ( OM_SD(...expr...) / AVG(...expr...) )

There are also non-aggregation functions available:

  • OM_IF(...condition... THEN ...expr... ELSE ....other...) - if condition is true then return expr else return other (else part is optional). OM_IF(acc0 > 1.5 THEN acc0 ELSE 1.5) result in SQL:
    CASE WHEN acc0 > 1.5 THEN acc0 ELSE 1.5 END
  • OM_DIV_BY(...expr...) - wrap expression to make it suitable for denominator:
    CASE WHEN ABS(acc0) > 1.0e-37 THEN acc0 ELSE NULL END

If your expression include divide by operator then it is recommended to wrap a denominator into OM_DIV_BY() function to prevent an error when divisor is zero or very small value. For example, if your expression is acc1 / acc0 then use do acc1 / OM_DIV_BY(acc0).

Aggregation expression can be more complex than a single function, for example: OM_SUM(acc0) / OM_COUNT(acc0) is equivalent of OM_AVG(acc0). And OM_SD(acc1) can be written as:

SQRT(OM_SUM( (acc1 - OM_AVG(acc1) * (acc1 - OM_AVG(acc1) ) / ( OM_COUNT(acc1) - 1 ))

It is possible, as you can see, combine and nest aggregation functions in the expression.

Important:

  • openM++ does aggregation across the sub-values, or other word, COUNT() is (almost) always number of sub-values.
  • aggregation done by underlying SQL database, so, only non-NULL accumulator values are aggregated, so, COUNT() is number of non-NULL accumultor values across sub-values.

Important: Accumulators always must be inside some aggregation function, i.e. this is an error: acc0 + OM_SUM(acc1) because acc0 is not aggregated. This is also an error: OM_SUM(acc1) / OM_DIV_BY(acc0) because OM_DIV_BY and OM_IF are non-aggregation functions. Output table expression is a single value produced as aggregation over sub-values and therefore accumulators must be wrapped into aggregation functions. For example:

  • OM_SUM( acc1 + acc0 ) is correct
  • OM_SUM( acc1 ) + acc0 incorrect
  • OM_SUM( acc2 / OM_DIV_BY(acc0) ) is correct
  • OM_SUM(acc2) / OM_DIV_BY(acc0) incorrect
  • OM_AVG(acc2) + OM_SUM( OM_AVG(acc1) + acc0 ) is correct
  • OM_AVG(acc2) + OM_SUM( OM_AVG(acc1) ) + acc0 incorrect

If you want to aggregate simulation results in your own way then it is always possible to combine openM++ and standard SQL functions in some custom expression. For example, if sub-values of your model is parts of large population then your may want to collect count and sum in separate accumulators and instead of OM_AVG(...) use custom average expression, like:

OM_SUM(acc0) / OM_SUM(acc1)

Also it is recommended to warp denominator part into OM_DIV_BY() function and result is:

OM_SUM(acc0) / OM_DIV_BY( OM_SUM(acc1) )

Examples of aggregation expressions

OpenM++ output table expressions translated into SQL aggregation queries. For example, if we have accumulator table:

CREATE TABLE out4_sub
(
  run_id    INT        NOT NULL,
  dim0      INT        NOT NULL,
  dim1      VARCHAR(8) NOT NULL,
  sub_id    INT        NOT NULL,
  acc0      FLOAT NULL,
  PRIMARY KEY (run_id, dim0, dim1, sub_id)
);

SELECT run_id, dim0, dim1, sub_id, acc0 FROM out4_sub ORDER BY run_id, dim0, dim1 DESC;

run_id  dim0      dim1     sub_id   acc0
------  ----      ----     ------   ----
2        10        M        0        1
2        10        M        1        2
2        10        M        2        3
2        10        M        3        4
2        10        F        0        1.5
2        10        F        1        2.5
2        10        F        2        3.5
2        10        F        3        4.5
2        20        M        0        10
2        20        M        1        20
2        20        M        2        30
2        20        M        3        40
2        20        F        0        10.5
2        20        F        1        20.5
2        20        F        2        30.5
2        20        F        3        40.5
3        10        M        0        5
3        10        M        1        6
3        10        F        0        7
3        10        F        1        8
3        20        M        0        50
3        20        M        1        60
3        20        F        0        70
3        20        F        1        80

Please, keep in mind: this is simplified example and in real openM++ database sub-value tables look like as described at the top of the article.

Then following results would be produced by openM++ aggregation functions:

Count, Average, Sum, Min and Max:

SELECT
  S.run_id, S.dim0, S.dim1,
  COUNT(S.acc0) AS "cnt",
  AVG(S.acc0)   AS "avg",
  SUM(S.acc0)   AS "sum",
  MIN(S.acc0)   AS "min",
  MAX(S.acc0)   AS "max"
FROM out4_sub S
GROUP BY S.run_id, S.dim0, S.dim1
ORDER BY S.run_id, S.dim0, S.dim1 DESC;

run_id dim0   dim1   cnt     avg      sum      min    max
------ ----   ----   ---     ---      ---      ---    ---
2       10     M      4       2.5      10       1       4
2       10     F      4       3        12       1.5     4.5
2       20     M      4      25       100      10      40
2       20     F      4      25.5     102      10.5    40.5
3       10     M      2       5.5      11       5       6
3       10     F      2       7.5      15       7       8
3       20     M      2      55       110      50      60
3       20     F      2      75       150      70      80

Count, Average and Variance:

SELECT
  S.run_id, S.dim0, S.dim1,
  COUNT(S.acc0) AS "cnt",
  AVG(S.acc0)   AS "avg",
  SUM(
    (S.acc0 - (SELECT AVG(VM1.acc0) FROM out4_sub VM1 WHERE VM1.run_id = S.run_id AND VM1.dim0 = S.dim0 AND VM1.dim1 = S.dim1) ) *
    (S.acc0 - (SELECT AVG(VM2.acc0) FROM out4_sub VM2 WHERE VM2.run_id = S.run_id AND VM2.dim0 = S.dim0 AND VM2.dim1 = S.dim1) )
  ) /
  ( (SELECT COUNT(VC1.acc0) FROM out4_sub VC1 WHERE VC1.run_id = S.run_id AND VC1.dim0 = S.dim0 AND VC1.dim1 = S.dim1) - 1) AS "var"
FROM out4_sub S
GROUP BY S.run_id, S.dim0, S.dim1
ORDER BY S.run_id, S.dim0, S.dim1 DESC;

run_id dim0     dim1    cnt     avg      var
------ ---      ----    ---     ---      ---
2       10       M       4       2.5      1.66666666666667
2       10       F       4       3        1.66666666666667
2       20       M       4      25      166.666666666667
2       20       F       4      25.5    166.666666666667
3       10       M       2       5.5      0.5
3       10       F       2       7.5      0.5
3       20       M       2      55       50
3       20       F       2      75       50

Count, Average and Standard Deviation:

SELECT
  S.run_id, S.dim0, S.dim1,
  COUNT(S.acc0) AS "cnt",
  AVG(S.acc0)   AS "avg",
  SQRT(
    SUM(
      (S.acc0 - (SELECT AVG(SDM1.acc0) FROM out4_sub SDM1 WHERE SDM1.run_id = S.run_id AND SDM1.dim0 = S.dim0 AND SDM1.dim1 = S.dim1) ) *
      (S.acc0 - (SELECT AVG(SDM2.acc0) FROM out4_sub SDM2 WHERE SDM2.run_id = S.run_id AND SDM2.dim0 = S.dim0 AND SDM2.dim1 = S.dim1) )
    ) /
    ( (SELECT COUNT(SDC1.acc0) FROM out4_sub SDC1 WHERE SDC1.run_id = S.run_id AND SDC1.dim0 = S.dim0 AND SDC1.dim1 = S.dim1) - 1)
  ) AS "sd"
FROM out4_sub S
GROUP BY S.run_id, S.dim0, S.dim1
ORDER BY S.run_id, S.dim0, S.dim1 DESC;

run_id dim0     dim1    cnt     avg     sd
------ ----     ---     ---     ---     --
2       10       M       4       2.5     1.29099444873581
2       10       F       4       3       1.29099444873581
2       20       M       4      25      12.9099444873581
2       20       F       4      25.5    12.9099444873581
3       10       M       2       5.5     0.707106781186548
3       10       F       2       7.5     0.707106781186548
3       20       M       2      55       7.07106781186548
3       20       F       2      75       7.07106781186548

Count, Average, and Standard Error:

SELECT
  S.run_id, S.dim0, S.dim1,
  COUNT(S.acc0) AS "cnt",
  AVG(S.acc0)   AS "avg",
  SQRT(
    SUM(
      (S.acc0 - (SELECT AVG(SEM1.acc0) FROM out4_sub SEM1 WHERE SEM1.run_id = S.run_id AND SEM1.dim0 = S.dim0 AND SEM1.dim1 = S.dim1) ) *
      (S.acc0 - (SELECT AVG(SEM2.acc0) FROM out4_sub SEM2 WHERE SEM2.run_id = S.run_id AND SEM2.dim0 = S.dim0 AND SEM2.dim1 = S.dim1) )
    ) /
    ( (SELECT COUNT(SEC1.acc0) FROM out4_sub SEC1 WHERE SEC1.run_id = S.run_id AND SEC1.dim0 = S.dim0 AND SEC1.dim1 = S.dim1) - 1) /
    (SELECT COUNT(SEC2.acc0) FROM out4_sub SEC2 WHERE SEC2.run_id = S.run_id AND SEC2.dim0 = S.dim0 AND SEC2.dim1 = S.dim1)
  ) AS "se"
FROM out4_sub S
GROUP BY S.run_id, S.dim0, S.dim1
ORDER BY S.run_id, S.dim0, S.dim1 DESC;

run_id dim0     dim1    cnt     avg     se
------ ----     ----    ---     ---     --
2       10       M       4       2.5     0.645497224367903
2       10       F       4       3       0.645497224367903
2       20       M       4      25       6.45497224367903
2       20       F       4      25.5     6.45497224367903
3       10       M       2       5.5     0.5
3       10       F       2       7.5     0.5
3       20       M       2      55       5
3       20       F       2      75       5

Count, Average, an Coefficient of Variation:

SELECT
  S.run_id, S.dim0, S.dim1,
  COUNT(S.acc0) AS "cnt",
  AVG(S.acc0)   AS "avg",
  100.0 * (
    SQRT(
      SUM(
        (S.acc0 - (SELECT AVG(CVM1.acc0) FROM out4_sub CVM1 WHERE CVM1.run_id = S.run_id AND CVM1.dim0 = S.dim0 AND CVM1.dim1 = S.dim1) ) *
        (S.acc0 - (SELECT AVG(CVM2.acc0) FROM out4_sub CVM2 WHERE CVM2.run_id = S.run_id AND CVM2.dim0 = S.dim0 AND CVM2.dim1 = S.dim1) )
      ) /
      ( (SELECT COUNT(CVC1.acc0) FROM out4_sub CVC1 WHERE CVC1.run_id = S.run_id AND CVC1.dim0 = S.dim0 AND CVC1.dim1 = S.dim1) - 1)
    ) /
    (SELECT AVG(CVM3.acc0) FROM out4_sub CVM3 WHERE CVM3.run_id = S.run_id AND CVM3.dim0 = S.dim0 AND CVM3.dim1 = S.dim1)
  ) AS "cv"
FROM out4_sub S
GROUP BY S.run_id, S.dim0, S.dim1
ORDER BY S.run_id, S.dim0, S.dim1 DESC;

run_id dim0     dim1    cnt     avg     cv
------ ----     ----    ---     ---     --
2       10       M       4       2.5    51.6397779494322
2       10       F       4       3      43.0331482911935
2       20       M       4      25      51.6397779494322
2       20       F       4      25.5    50.6272332837571
3       10       M       2       5.5    12.8564869306645
3       10       F       2       7.5     9.42809041582064
3       20       M       2      55      12.8564869306645
3       20       F       2      75       9.42809041582063

SQL implementation details

In the previous section we are using simplified representation of accumulator table and SQL dialect, which is not compatible across all vendors. Real SQL aggregation queries can be found in expr_sql column of table_expr metadata table. For example if source model expression is:

(OM_SUM(acc0) / OM_SUM(acc2))

then result look like:

SELECT
  M1.run_id, M1.dim0, (SUM(M1.acc_value) / SUM(L1A2.acc2)) AS expr1
FROM RiskPaths_201410071856440009_a2_T03_FertilityByAge M1
INNER JOIN
(
  SELECT run_id, dim0, sub_id, acc_value AS acc2
  FROM RiskPaths_201410071856440009_a2_T03_FertilityByAge
  WHERE acc_id = 2
) L1A2
ON (L1A2.run_id = M1.run_id AND L1A2.dim0 = M1.dim0 AND L1A2.sub_id = M1.sub_id)
WHERE M1.acc_id = 0
GROUP BY M1.run_id, M1.dim0
⚠️ **GitHub.com Fallback** ⚠️