Local Random Streams - openmpp/openmpp.github.io GitHub Wiki
Home > Model Development Topics > Local Random Streams
Model code can optionally specify that the state of random number streams be maintained locally for each entity rather than shared among entities. This can significantly reduce variance in run comparisons of models which simulate multiple instances of entities together, such as time-based models or case-based models with multiple entities in a case. It can also make run comparisons at the microdata level feasible for such models. Local random streams are not relevant for case-based models with a single entity per case. To jump to charts comparing the effect with and without local random streams in a highly interacting time-based model, click here.
- Background and overview
- Syntax and use
- Illustrative example Shows decoherence reduction graphically in a time-based model
Model code can draw random values from selected statistical distributions using built-in random number generator (RNG) functions, for example:
double x = RandUniform(1);
double y = RandNormal(2);
double z = RandPoisson(3);
These functions return pseudo-random streams of numbers. The streams appear random but are actually produced by a deterministic algorithm which generates a fixed sequence of values. That algorithm knows which value to return next by maintaining an internal state which changes from one function call to the next.
The sequence of numbers returned depends on the SimulationSeed
for the run, on the run member (aka sub, replicate), on the case for case-based models, and on the random stream number (the small integer argument in RNG function calls).
The random stream number in an RNG function call specifies a distinct underlying random stream which produces values independent of those produced by other random streams.
This avoids spurious interactions among unrelated random processes in the model.
For example, values returned by calling RandUniform(4)
in a Fertility
module will not affect values returned by calling RandUniform(6)
in a Migration
module.
Independent random streams can reduce statistical noise in the difference of two model runs, reducing the run size needed to obtain reliable results for run differences.
They also make microdata comparisons of two runs correspond better with model logic.
For example, if there is no logical dependence between Fertility
and Migration
in the model,
changing a Fertility
parameter should not, logically, affect Migration
.
Had the same random stream, e.g. stream 4 in RandUniform(4)
, been used in both Fertility
and Migration
, a call to RandUniform(4)
in Fertility
would affect the value returned in a subsequent call to RandUniform(4)
in Migration
.
That would produce a spurious (but statistically neutral) interaction between Fertility
and Migration
.
That can be avoided by using a different random stream in Migration
, e.g. by calling RandUniform(6)
to specify stream 6 instead of stream 4.
Spurious correlation of random streams can be avoided by specifying a distinct random stream in each call to an RNG function throughout model code.
However, a model which simulates multiple instances of an entity together, e.g. multiple Person
entities, could have spurious interactions of random streams among those entities.
For example, a call to RandUniform(4)
in Fertility
in Person
A will affect the result from a subsequent call in Fertility
to RandUniform(4)
in Person
B, because the same random stream 4 is used in both.
In a time-based model with many entities, a spurious interaction could extend from one entity to the entire population.
Such spurious interactions do not affect the statistical validity of aggregate model results, but they can create additional statistical noise in run comparisons, and produce differences at the microdata level which are not explained by model logic.
This issue can be resolved by maintaining independent local random streams in each entity, rather than global random streams shared among entities.
For example, using local random streams, a call to RandUniform(4)
in Person
A uses a different random stream from a call to RandUniform(4)
in Person
B.
Local random streams require additional memory in each entity to maintain the state of the pseudo-random number generator for each stream. This additional memory can be significant for time-based models with many entities and many random streams.
Local random streams must also be initialized distinctly in each entity so that different entities produce different random streams. That requirement is met by providing a unique key for each entity. The entity key is used to initialize local random streams independently in each each entity before it enters the simulation. The entity key needs to be stable from one run to another so that local random streams are the same for the same entity in two different runs. The code to specify the unique entity key is, in general, model dependent.
Given these trades, local random streams are not implemented by default in OpenM++.
However, a model coded to use local random streams can turn them off by changing a single line of model code and rebuilding, and reduce memory requirements when local random streams are not required for analysis.
Sections:
- Activation
- Entity key
- RNG use in entity initialization
- RNG use before simulation
- Memory use
- Internals
Use the option local_random_streams
to implement local random streams for all instances of a given entity, e.g.
options local_random_streams = Host;
to implement local random streams for all instances of the Host
entity.
Use multiple options
statements to implement local random streams for more than one kind of entity.
During model build, a log message like
Entity 'Host' has 11 local random streams, of which 1 are Normal
is issued for each entity with local random streams. The message indicates how many of the local streams use a Normal distribution, because those use additional memory in the entity to maintain state.
[back to syntax and use]
[back to topic contents]
As mentioned in Background and overview, an entity with local random streams needs a unique identifier to produce random streams which differ among entities.
The entity key needs to be not only unique in a run, but must also uniquely identify the same entity in the model runs being compared.
The entity key is provided by the entity member function get_entity_key()
.
get_entity_key
returns a 64-bit value which is used internally by the built-in function initialize_local_random_streams()
to initialize the local random streams of the entity before it enters the simulation.
If model code does not supply a definition for get_entity_key()
a definition like the following is generated by the OpenM++ compiler:
uint64_t Host::get_entity_key()
{
// This function definition was generated by omc because none was supplied in model code.
// It returns the value of the built-in attribute entity_id.
uint64_t entity_key = entity_id;
return entity_key;
}
This default definition uses the built-in attribute entity_id
as the entity key.
entity_id
is guaranteed to be unique for all entities in a run,
making it a natural candidate for the entity key.
However, entity_id
might not identify the same entity in two different runs.
That can happen if model code generates entities dynamically, or if the runs being compared have different numbers of entities or members (aka subs, replicates).
A model might already have an attribute which uniquely identifies the same entity in two different runs. For example, the model might be based on microdata which contains a unique personal identifier for each record/entity.
If the definition of get_entity_key()
is supplied in model code, it would typically create the entity key using the values of one or more entity attributes before the entity enters the simulation.
The following hypothetical definition of get_entity_key()
uses the helper function xz_crc64
to combine the starting value of age
and time
to create the entity key.
xz_crc64
creates a 64-bit key using the crc-64
open source checksum (hash) algorithm, and can use one value or combine multiple values together using successive calls.
uint64_t Host::get_entity_key()
{
uint64_t key64 = 0;
key64 = xz_crc64((uint8_t*)&age, sizeof(age), key64);
key64 = xz_crc64((uint8_t*)&time, sizeof(time), key64);
return key64;
}
[back to syntax and use]
[back to topic contents]
This section applies only to calls to RNG functions in entity context. See section RNG use before simulation for information about using RNG functions outside of entity context before the simulation starts.
If an entity uses local random streams they must be initialized before use. If they are not initialized a fatal error like the following will be raised when the model is run:
Simulation error: RandUniform called with uninitialized local random streams.
By default, initialization is performed automatically when an entity enters the simulation, after starting values of all attributes have been assigned.
However, if model code in entity scope calls an RNG function before the entity enters the simulation, the local random streams of the entity will not have been initialized, causing the run-time error.
For example, the following code
void Host::Start()
{
// Initialize all attributes (OpenM++).
initialize_attributes();
z_score = RandNormal(11);
// Have the entity enter the simulation (OpenM++).
enter_simulation();
}
assigns a random value to the Host
attribute z_score
before the Host
enters the simulation.
If Host
uses local random streams, the run-time error would occur.
However, model code can explicitly initialize local random streams before the entity enters the simulation by calling the provided function initialize_local_random_streams()
.
The following modified code
void Host::Start()
{
// Initialize all attributes (OpenM++).
initialize_attributes();
// Need to explicitly initialize local random streams here
// because they are used before the entity enters the simulation.
initialize_local_random_streams();
z_score = RandNormal(11);
// Have the entity enter the simulation (OpenM++).
enter_simulation();
}
initializes local random streams for the Host
before the call to RandNormal
, avoiding the error.
initialize_local_random_streams()
uses get_entity_key()
internally.
If the definition of get_entity_key()
is supplied in model code,
assign all attributes used in get_entity_key()
before the call to initialize_local_random_streams()
.
initialize_local_random_streams()
can be called (with no effect) even if the entity does not use local random streams,
so there is no need to remove the call from model code if local random streams are not used in the entity.
Modgen specific: Modgen does not recognize the function initialize_local_random_streams
.
That causes an error when building the Modgen version of a x-compatible model.
For x-compatible models, use code like:
#if !defined(MODGEN)
initialize_local_random_streams();
#endif
[back to syntax and use]
[back to topic contents]
Use of local random streams in one or more entities does not preclude use of global (shared) random streams.
Global random streams work normally in an entity not specified in a local_random_streams
options statement.
Global streams also work normally in PreSimulation
and in Simulation
, before the simulation starts.
For example, the IDMM
model used in the
Illustrative example
uses two shared RNG streams in Simulation
,
one (random stream 5) to randomly infect a portion of the population at the beginning of the simulation:
// Create the initial population of Hosts, and infect some probabilistically.
for ( int nJ = 0; nJ < NumberOfHosts; nJ++ ) {
auto prHost = new Host();
prHost->Start();
if (InitialDiseasePrevalence > RandUniform(5)) {
prHost->Infect();
}
}
and another (random stream 3) to construct the starting social network:
// Construct the initial social network.
int nHosts = asAllHosts->Count();
for ( int nJ = 0; nJ < nHosts; nJ++ ) {
auto prHost = asAllHosts->Item( nJ );
for (int nK = 0; nK < ContactsOutPerHost; nK++ ) {
// Choose a host randomly from all hosts
auto prContact = asAllHosts->GetRandom( RandUniform(3) );
// Add it to the outgoing contacts link.
// Note that if the contact happens to already be a contact, it will not be added.
// because links contain no duplicates.
if (prContact != prHost) {
// do not link to self
prHost->mlContactsOut->Add( prContact );
}
}
}
A global random stream does not affect a local random stream with the same number. However, best practice uses a unique random stream for each call to an RNG function in model code.
[back to syntax and use]
[back to topic contents]
Memory impacts of local random streams are incorporated into the Resource Use Report in the run log, if resource use monitoring is on. To turn resource use monitoring on, add the following statement to model code:
options resource_use = on;
For example, here's the Resource Use Summary table from a 1,000,000 Hosts
run of IDMM
with local random streams:
+---------------------------+
| Resource Use Summary |
+-------------------+-------+
| Category | MB |
+-------------------+-------+
| Entities | 448 |
| Host | 448 |
| Ticker | 0 |
| Multilinks | 63 |
| Events | 95 |
| Sets | 48 |
| Tables | 0 |
+-------------------+-------+
| All | 655 |
+-------------------+-------+
The same table without local random streams looks like this:
+---------------------------+
| Resource Use Summary |
+-------------------+-------+
| Category | MB |
+-------------------+-------+
| Entities | 384 |
| Host | 384 |
| Ticker | 0 |
| Multilinks | 63 |
| Events | 95 |
| Sets | 48 |
| Tables | 0 |
+-------------------+-------+
| All | 591 |
+-------------------+-------+
In this example, memory requirements for the run increased from 591 MB to 655 MB in the version built with local random streams.
Additional lines appear in the entity member detail section of the report in a model with local random streams.
An additional row for the random state arrays is present in the member count table.
The count is 1 if the entity does not use RandNormal
or 3 if it does.
This line counts the number of arrays used to maintain the state of local random streams, not the size of those arrays.
+-----------------------------+
| Host Members |
+---------------------+-------+
| Member | Count |
+---------------------+-------+
| Attributes | 19 |
| Built-in | 6 |
| Simple | 12 |
| Maintained | 1 |
| Link | 0 |
| Events | 3 |
| Increments | 2 |
| Multilink | 2 |
| Internal | 6 |
| Array | 0 |
| Foreign | 0 |
| Random state arrays | 3 |
+---------------------+-------+
| All | 35 |
+---------------------+-------+
Additional lines appear in the entity member detail table.
Here's an extract from IDMM
with local random streams:
+---------------------------------------------------+
| Host Members (detail) |
+-------------------------------------------+-------+
| member | bytes |
+-------------------------------------------+-------+
| Attributes: | |
| Built-in: | |
| age | 8 |
| case_seed | 8 |
...
| Internal: | |
| X01_History (in) om_duration | 8 |
| X02_Host_Events (inevent) event_count | 4 |
| asAllHosts (current cell) | 4 |
| event_count (lagged) | 4 |
| event_count (counter at lagged) | 8 |
| om_local_rng_streams_initialized | 1 |
| Array: | |
| Foreign: | |
| Random state arrays: | |
| om_stream_X | 40 |
| om_other_normal | 8 |
| om_other_normal_valid | 1 |
+-------------------------------------------+-------+
| Sum of member bytes | 372 |
| Bytes per entity | 448 |
| Storage efficiency (%) | 83.0 |
+-------------------------------------------+-------+
An additional Internal
member om_local_rng_streams_initialized
is used to manage the entity life cycle of local random streams.
Additional lines towards the end of the table show the memory used by the arrays which maintain random state.
[back to syntax and use]
[back to topic contents]
Local random streams are implemented using a multiplicative congruential generator (MCG), with an unused multiplier mentioned in the module OM_ROOT/use/random/random_lcg200.ompp
.
The multiplier of the MCG is 1187848453 and the modulus is the Mersenne prime 2^31-1.
For memory efficiency, the state of local random streams is maintained only for streams used in the entity.
For example, IDMM
model code in the
Illustrative example
uses 12 random streams,
but only 10 are used in the Host
entity.
Local random state is maintained in each Host
for just the 10 streams used in Host
.
The other 2 random streams are global and are used in Simulation
to create the starting population before the simulation starts, as described in
RNG use before simulation.
A random stream associated with the RandNormal
RNG function requires additional memory to maintain state,
because the underlying algorithm produces a pair of draws from N(0,1),
and memory is required to save the second value of the pair which is returned by the next call to RandNormal
for the stream.
Local random streams are seeded before the entity enters the simulation using 4 integer values:
- the value returned by
get_entity_key()
, - the run seed for a time-based model or the case seed for a case-based model,
- the run member (aka sub, replicate), and
- the random stream number.
A local random stream thus depends on all 4 of these values, as required by the role of each.
These 4 values are combined using the xz_crc64
hash function and the 64-bit result reduced to the allowed range of the MCG using the remainder of an appropriate integer division.
For more details, see the commented function definition of initialize_local_random_streams
in the C++ file src/om_declarations.h
which is generated by the OpenM++ compiler.
[back to syntax and use]
[back to topic contents]
This example illustrates how local random streams can improve run coherence in a time-based model. Click here to jump directly to the graphical comparison.
Sections:
This example illustrates the effect of local random streams vs. global random streams on run coherence.
It uses the time-based IDMM
model with minor modifications.
Microdata is output at 100 time points during the simulation, and later merged and compared between Base
and Variant
runs to measure how run coherence evolves during the simulation.
Four runs are used in this example
, a Base
and a Variant
with two versions of the model:
-
Base
run with shared random streams -
Variant
run with shared random streams -
Base
run with local random streams -
Variant
run with local random streams
The same version of IDMM
was used for runs 1 and 2.
A different version of IDMM
was used for runs 3 and 4,
the only difference being activation of local random streams.
All 4 runs have the same number of Hosts
and an identical contact network created before the simulation starts using global random streams.
That means that entity_id
refers to the same Host
in all 4 runs, so the default version of get_entity_key()
can be used.
The Base
runs have identical inputs in runs 1 and 3.
The Variant
runs have identical inputs in runs 2 and 4.
A single parameter differs slightly between the Variant
runs and the Base
runs.
That change causes only 2 entities to differ at the start of the Variant
runs compared to the Base
runs.
Aggregate outputs over time for the 4 runs are so similar that they are visually indistinguishable (see below). However, the degree of coherence at the microdata level between runs 3 and 4 (with local random streams) is noticeably different than that between runs 1 and 2 (with shared random streams).
Run-time resource reports were used to compare memory use for the two versions of IDMM
.
Without local random streams, each Host
entity was 384 bytes in size.
With local random streams, each Host
entity was 448 bytes in size, an increase of 64 bytes or 17%.
[back to illustrative example]
[back to topic contents]
IDMM
simulates an interacting dynamic contact network of Host
entities, together with a disease which can be transmitted over that contact network.
The contact network is initialized randomly at the start of the simulation.
During the simulation, each Host
interacts with other Hosts
in contact events.
Each Host
can change its connected Hosts
in contact change events.
Optionally, a Host
can change a connected Host
in a contact event, if that host is infectious.
During a contact event, the disease can propagate between the two Hosts
, depending on the disease status of each Host
.
An infected Host
progresses through 4 fixed length disease phases: susceptible, latent, infectious, and immune.
On infection, a Host
enters the latent phase, during which it is both asymptomatic and non-infectious.
After the latent phase, the Host
enters an infectious phase during which it can infect another Host
during a contact event.
After the infectious phase, the Host
enters an immune phase.
After the immune phase, the Host
returns to the susceptible state.
Before the simulation starts, all Host
entities are in the susceptible state.
After a Host
enters the simulation but before the first simulated event, it is randomly infected at a given probability.
For this example, some mechanical changes were made to the version of IDMM
in the OpenM++ distribution.
[back to illustrative example]
[back to topic contents]
The Base
run consists of 5,000 Hosts
simulated for 100 time units, with the initial probability of infection set to 0.1000.
All other parameters are left at Default
values, which are as follows:
parameters {
int NumberOfHosts = 5000; //EN Number of hosts
int ContactsOutPerHost = 4; //EN Number of outgoing contacts per host
double MeanContactInterval = 1.00; //EN Mean time between social interactions
double MeanChangeContactsInterval = 5.00; //EN Mean time between changing social contact network
double DumpSickContactProbability = 0.0; //EN Probability of dumping a sick contact
double InitialDiseasePrevalence = 0.1000; //EN Disease prevalence at start of simulation
double TransmissionEfficiency = 0.10; //EN Probability of transmitting disease during a social contact
double LatentPhaseDuration = 5.0; //EN Duration of latent phase
double ContagiousPhaseDuration = 10.0; //EN Duration of contagious phase
double ImmunePhaseDuration = 20.0; //EN Duration of immune phase
logical EnableChangeContactsEvent = TRUE; //EN Enable the change contacts event
};
The ini
file for the Base
run looks like this:
[OpenM]
RunName = Base
[Parameter]
InitialDiseasePrevalence = 0.1000
[Microdata]
ToDb = yes
Host = report_time, disease_phase, age_infected
In the Base
run, 501 Hosts
are infected at the beginning of the simulation.
The time evolution of the susceptible population in Run 1 (Base
with shared random streams) looks like this:
Base susceptibles, with shared random streams
Before the simulation starts, all 5,000 Hosts
are in the susceptible state.
Next, 501 Hosts
are randomly infected and change disease status from susceptible to latent.
This is the cause of the immediate drop of the count of susceptible Hosts
from 5,000 to 4,499 in the chart.
The following plateau is caused by the length 5.0 latent period of the 501 infected hosts during which no forward infections occur.
At the end of the plateau, the 501 Hosts
leave the latent phase and enter the infectious phase.
Then the infection spreads through the social network, progressively reducing the susceptible population.
At around time 30.0, almost all Hosts
have been infected and less than 25 susceptible Hosts
remain.
At time 35.0 the 501 initially infected Hosts
lose protective immunity and become susceptible once again, causing a new plateau in the count of susceptible Hosts
.
Starting at time 41.0 progressively more Hosts
lose protective immunity.
The disease then propagates among the new and growing population of susceptible Hosts
.
The number of susceptible Hosts
peaks at around time 57, and decreases subsequently as the disease infects more and more susceptibles.
At time 82 less than 100 susceptible hosts remain.
A new epidemic wave, with initial conditions smoothed out, commences towards the end of the simulation.
The same chart for Run 3 (Base
with local random streams) looks like this:
Base susceptibles, with local random streams
This Base
run with local random streams looks very similar to the Base
run with shared random streams immediately above,
despite all RNG draws differing during the simulation phase of the two runs.
This is because the size of the population and the number of RNG draws dominates the effects of individual idiosyncratic RNG draws in the two runs.
Put another way, the two versions of IDMM
are statistically equivalent.
Exactly the same 501 Hosts
are infected in Run 1 and Run 3, despite Run 3 using the version of IDMM
with local random streams.
That's because the initial infections are implemented during the creation of the starting population, before the simulation begins. Calls to RNG functions before the simulation starts use the usual shared random streams, not local random streams, as explained in RNG use before simulation.
[back to illustrative example]
[back to topic contents]
The Variant
runs are the same as the Base
runs, except for a very slightly higher probability of initial infection of 0.1001 versus 0.1000 in the Base
runs.
The ini
file for the Variant
runs looks like this:
[OpenM]
RunName = Variant
[Parameter]
InitialDiseasePrevalence = 0.1001
[Microdata]
ToDb = yes
Host = report_time, disease_phase, age_infected
503 Hosts
are infected at the beginning of the simulation in the Variant
run.
That's 2 more than in the Base
run.
In the Variant
runs, 501 of the 503 infected Hosts
are identical to those infected at the beginning of the Base
runs.
The same 2 additional Hosts
are infected in Variant
runs 2 and 4.
The time evolution of the susceptible population in Run 2 (Variant
with shared random streams) looks like this:
Variant susceptibles, with shared random streams
The aggregate effect of 503 initially infected Hosts
vs. 501 initially infected Hosts
is not visually perceptible, as might be expected.
The time evolution of the susceptible population in Run 4 (Variant
with local random streams) looks like this:
Variant susceptibles, with local random streams
Run 2 and Run 4 are indistinguishable visually, despite having entirely different RNG draws in the simulation phase. They are by construction statistically equivalent.
[back to illustrative example]
[back to topic contents]
Base-Variant coherence is measured by counting the number of Hosts
which have an identical infection history in the Base
and Variant
runs.
The infection history of each Host
is summarized in a 64-bit hash which combines the exact (fractional) ages of all previous infections experienced by the Host
.
Base-Variant coherence is evaluated at each of 101 time points in the runs by merging Base
and Variant
microdata population snapshots which are output at each time point.
The time evolution of Base-Variant coherence with shared random streams (runs 1 and 2) looks like this:
Base-Variant population coherence with shared random streams
Base-Variant coherence from time 0 to time 5 is constant at 4,998.
As explained previously, exactly 2 Hosts
differ between Base
and Variant
at the beginning of the runs,
and no forward infections occur during the latent phase.
Base-Variant coherence is lost precipitously when the infectious phase begins,
because of shared random streams among all Hosts
in the population.
Virtually no coherence remains by time 72.
The time evolution of Base-Variant coherence with local random streams (runs 3 and 4) looks like this:
Base-Variant population coherence with local random streams
As in the previous chart, Base-Variant coherence starts at 4,998, with exactly 2 Hosts
differing at the beginning of the simulation.
The same initial plateau is present, but is difficult to distinguish visually.
Coherence remains very high but is gradually lost as the effects of the tiny difference in initial conditions propagate in a causative way through the contact network and affect progressively more of the population.
Coherence remains above 99% until time 56. Coherence then decreases more rapidly until time 75, perhaps because of surging infections during that interval. Coherence drops to 90% by the end of the simulation.
The two versions of IDMM
in this example, the one with and the one without local random streams, are statistically equivalent at the aggregate level.
However, the difference in coherence at the micro level is striking.
The population of Hosts
in IDMM
is highly connected and interactive.
Models with less connected and less interactive populations may well have very high coherence if they use local random streams.
[back to illustrative example]
[back to topic contents]
The version of IDMM
used in this example differs slightly from the version in the OpenM++ distribution in OM_ROOT/models/IDMM
.
The differences are as follows:
- The
Host
attributesage_infected
andhistory_hash
were added and implemented to measure coherence between runs. - Code to infect a portion of the population at the beginning of the simulation was moved from
Host::Start
to theSimulation
function, so that the same global random streams are used in all 4 runs in the example, ensuring that the starting populations are the same. - A custom version of
Host::get_microdata_key()
was added to provide a unique microdata key for eachHost
at each value ofreport_time
. -
initialize_local_random_streams
was called explicitly inHost::Start
to permit use ofRandUniform
later inHost::Start
to schedule the firstContactEvent
and the firstChangeContactsEvent
. -
Ticker::TickEvent
was modified to write microdata for eachHost
at each time tick. - The
REPORT_TIME
range was enlarged from 20 to 100. - The options
local_random_streams
,microdata_output
, andresource_use
were enabled. - Two unused
Host
attributesz_score
andl_score
were added and implemented to test the local random streams implementation ofRandNormal
andRandLogistic
.