Run Memory Prediction - openmpp/openmpp.github.io GitHub Wiki
Home > Model Development Topics > Run memory prediction
This topic is under construction and/or revision.
This topic describes how to predict the memory required by a model run before the run is launched. It explains the memory prediction options, how to estimate their values using probing runs, and walks through a worked example.
This subtopic describes a series of experiments probing memory stress in a complex time-based model (c. 2023). The model stores and calculates a significant amount of data for each member of the population. Data requirements increase as time progresses in a run. As a time-based model, data requirements vary with the size of the starting population.
The model runs fine with a small starting population size. However, anomalies occur when the size of the starting population is large.
Both platforms were virtual machines (VM).
The memory in the Linux experiments was varied by configuring the VM before boot.
All runs in the following table were identical, with a starting population of ~900k.
Test # | Platform | Memory | Run status | Elapsed time | Notes |
---|---|---|---|---|---|
1 | Windows | 8 GB | Success | 1d 3h 38m | Some anomalies at end of run, probably from exceeding OS-imposed maximum time for an open idle file. |
2 | Linux | 4 GB | Crash | - | Model crash during creation of the starting population. |
3 | Linux | 8 GB | Crash | 50m | Model crash after progressive slowdown (~50m at 51% complete). |
4 | Linux | 12 GB | Crash | 55m | Model crash after progressive slowdown (~55m at 97% complete). |
5 | Linux | 16 GB | Success | 29m | |
6 | Linux | 20 GB | Success | 30m | |
Model crash is the ultimate in memory stress, but progressive slowdown is also a sign. As the amount of free memory decreases, it becomes more difficult for memory allocation algorithms to find a free chunk of memory of a given size. The population is continually increasing during the run, so increasingly critical memory stress can occur progressively as simulation time advances and physical memory limits are approached.
In test #3 (8 GB), a point of no return was reached at 51%, after a progressive then a pronounced slowdown.
In test #4 (12 GB), the point of no return was reached at 97%, after a progressive then a pronounced slowdown.
Test #5 (16 GB) and test #6 (20 GB) have nearly identical elapsed times, showing that 16 GB is sufficient to avoid memory stress (on Linux, and probably on Windows, too).
The total amount of memory used by a model is similar in the Windows and Linux versions. However, the symptoms of memory stress clearly differ depending on the OS and its configuration. The very different behaviour between test #1 and test #3 (both 8 GB, but #1 Windows and #3 Linux) probably reflects the different ways the two OSes handle application memory requests which exceed physical memory. The Windows platform swaps physical memory to disk, with no apparent limit, and eventually succeeds, albeit with very long run times (disk is hugely slower than physical memory). The Linux platform tries to meet the progressively increasing memory demand, slows down, then eventually kills the task which is "demanding too much".
The direct conclusion from these tests is that a run of this model with 900k starting population needs to run on a machine with 16 GB or more of available physical memory, whether on Windows or on Linux.
A more general conclusion is that one needs to predict, before launching a model run on a given platform/server/grid, how much memory it will need. For this model, that requires predicting memory required as a function of starting population size, which can vary from one run to another.
To predict the memory use of a run, the calculation needs to be performed outside of the model, before the run launches. This is done by providing, through a set of options in model code, qualitative and quantitative information to perform the estimation. When the model is built, this information is published as model metadata. That makes the information available outside of a model run, to estimate run memory requirements before a run is launched.
Here's an example:
options memory_popsize_parameter = MicroDataSize;
options memory_MB_constant_per_instance = 3;
options memory_MB_constant_per_sub = 0;
options memory_MB_popsize_coefficient = 0.023445;
options memory_adjustment_factor = 1.34;
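The prediction behind these options is a simple linear model of population size. Here is a minimal sketch in Python (the function name is illustrative, not part of openM++; the formula, adjustment factor applied to the sum of the constant and population-dependent components, is an assumption consistent with the figures shown later in this topic):

```python
def predicted_memory_mb(popsize: int,
                        mb_constant_per_instance: float,
                        mb_constant_per_sub: float,
                        mb_popsize_coefficient: float,
                        adjustment_factor: float) -> float:
    """Estimate run memory (MB) as a linear function of population size."""
    constant_mb = mb_constant_per_instance + mb_constant_per_sub
    variable_mb = mb_popsize_coefficient * popsize
    return adjustment_factor * (constant_mb + variable_mb)

# Using the option values above with popsize MicroDataSize = 101824:
print(round(predicted_memory_mb(101_824, 3, 0, 0.023445, 1.34)))  # 3203
```

The unadjusted sum (3 + 0.023445 * 101824 ≈ 2390 MB) matches the total in the summary table shown below; the adjustment factor then adds a 34% safety margin.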
Values for these options can be produced with a probing run, possibly with just a small population, by turning the option resource_use on and specifying in the option memory_popsize_parameter the name of a model parameter whose value scales with memory use:
options resource_use = on; // collect and report resource use information at run-time.
options memory_popsize_parameter = MicroDataSize;
With these options, the model gathers memory information during the probing run and performs some secondary calculations, in addition to producing the normal tables described in Model Resource Use.
Towards the beginning of the resource report in the model log, you'll find a section Resource Use Prediction containing a summary table and generated option statements to use for run memory prediction in subsequent runs.
Here's an example extract from a run log:
2025-03-31 01:10:49.954 *****************************
2025-03-31 01:10:49.955 * Resource Use Prediction *
2025-03-31 01:10:49.957 *****************************
2025-03-31 01:10:49.958
2025-03-31 01:10:49.959 +--------------------------------+
2025-03-31 01:10:49.960 | Resource Use by Persistence |
2025-03-31 01:10:49.962 +------------------------+-------+
2025-03-31 01:10:49.964 | Persistence | MB |
2025-03-31 01:10:49.966 +------------------------+-------+
2025-03-31 01:10:49.967 | Constant per instance | 3 |
2025-03-31 01:10:49.969 | Constant per sub | 0 |
2025-03-31 01:10:49.970 | Variable by popsize | 2387 |
2025-03-31 01:10:49.972 +------------------------+-------+
2025-03-31 01:10:49.973 | Total | 2390 |
2025-03-31 01:10:49.973 +------------------------+-------+
2025-03-31 01:10:49.975
2025-03-31 01:10:49.976 // The following memory prediction options statements
2025-03-31 01:10:49.977 // for PASSAGES were generated on 2025-03-31 01:09:49.852
2025-03-31 01:10:49.977 // using as popsize the parameter MicroDataSize = 101824
2025-03-31 01:10:49.978 options memory_popsize_parameter = MicroDataSize;
2025-03-31 01:10:49.979 options memory_MB_constant_per_instance = 3; // was 3
2025-03-31 01:10:49.981 options memory_MB_constant_per_sub = 0; // was 0
2025-03-31 01:10:49.983 options memory_MB_popsize_coefficient = 0.023445; // was 0.023445
2025-03-31 01:10:49.985 options memory_adjustment_factor = 1.34;
Trim the date-time stamp off the beginning of the generated model code lines, then paste them into model code, e.g. in ompp_options.ompp.
After the probing run, be sure to comment out the line which activated resource use.
After the probing run is done and the new code statements have been pasted, an extract from ompp_options.ompp might look like this:
//
// model resource options
//
//options resource_use = on; // collect and report resource use information at run-time.
// The following memory prediction options statements
// for PASSAGES were generated on 2025-03-28 06:43:02.778
// using as popsize the parameter MicroDataSize = 101824
options memory_popsize_parameter = MicroDataSize;
options memory_MB_constant_per_instance = 3; // was 3
options memory_MB_constant_per_sub = 0; // was 0
options memory_MB_popsize_coefficient = 0.023445; // was 0.023872
options memory_adjustment_factor = 1.34;
Worked example (under construction)
This subtopic contains the following sections.
- Manual calculation - Run 1
- Manual calculation - Run 2
- Manual calculation - Run 3
- Manual calculation - Run 4
GMM Run #1 10k ompp_options.ompp:
//
// model resource options
//
options resource_use = on; // collect and report resource use information at run-time.
// The following memory prediction options statements
// for GMM were generated on 2023-04-01 18:03:36.340
// using as popsize the parameter StartingPopulationSize = 10000
options memory_popsize_parameter = StartingPopulationSize;
options memory_MB_constant_per_instance = 0; // was 1
options memory_MB_constant_per_sub = 4; // was 5
options memory_MB_popsize_coefficient = 0.002142; // was 0.002000
options memory_adjustment_factor = 1.10;
GMM Run #1 10k log:
2025-03-24 00:41:28.423 *****************************
2025-03-24 00:41:28.424 * Resource Use Prediction *
2025-03-24 00:41:28.425 *****************************
2025-03-24 00:41:28.426
2025-03-24 00:41:28.427 +--------------------------------+
2025-03-24 00:41:28.428 | Resource Use by Persistence |
2025-03-24 00:41:28.429 +------------------------+-------+
2025-03-24 00:41:28.431 | Persistence | MB |
2025-03-24 00:41:28.433 +------------------------+-------+
2025-03-24 00:41:28.434 | Constant per instance | 0 |
2025-03-24 00:41:28.435 | Constant per sub | 6 |
2025-03-24 00:41:28.436 | Variable by popsize | 21 |
2025-03-24 00:41:28.437 +------------------------+-------+
2025-03-24 00:41:28.438 | Total | 28 |
2025-03-24 00:41:28.440 +------------------------+-------+
2025-03-24 00:41:28.441
2025-03-24 00:41:28.442 // The following memory prediction options statements
2025-03-24 00:41:28.443 // for GMM were generated on 2025-03-24 00:33:34.852
2025-03-24 00:41:28.444 // using as popsize the parameter StartingPopulationSize = 10000
2025-03-24 00:41:28.445 options memory_popsize_parameter = StartingPopulationSize;
2025-03-24 00:41:28.446 options memory_MB_constant_per_instance = 0; // was 0
2025-03-24 00:41:28.448 options memory_MB_constant_per_sub = 6; // was 4
2025-03-24 00:41:28.449 options memory_MB_popsize_coefficient = 0.002169; // was 0.002142
2025-03-24 00:41:28.451 options memory_adjustment_factor = 1.10;
[back to manual calculation]
[back to topic contents]
GMM Run #2 100k ompp_options.ompp:
//
// model resource options
//
//options resource_use = on; // collect and report resource use information at run-time.
// The following memory prediction options statements
// for GMM were generated on 2025-03-24 00:33:34.852
// using as popsize the parameter StartingPopulationSize = 10000
options memory_popsize_parameter = StartingPopulationSize;
options memory_MB_constant_per_instance = 0; // was 0
options memory_MB_constant_per_sub = 6; // was 4
options memory_MB_popsize_coefficient = 0.002169; // was 0.002142
options memory_adjustment_factor = 1.10;
GMM Run #2 100k Log:
2025-03-24 00:55:43.392 member=0 Predicted memory required = 245 MB per parallel sub and 0 MB per instance
...
2025-03-24 02:20:05.173 Process peak memory usage: 637.01 MB
...
Clearly, for this run of GMM, the predicted memory use of 245 MB was very different from the actual peak memory use of 637 MB. A manual calculation is called for.
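The 245 MB figure itself is consistent with the option values carried over from Run #1; the problem is that those values underestimate this model's memory use. A quick check (a sketch, assuming the prediction is the adjustment factor applied to the per-sub constant plus the coefficient times popsize):

```python
# Option values from Run #1's generated statements, applied to
# Run #2's popsize of StartingPopulationSize = 100,000.
predicted = 1.10 * (6 + 0.002169 * 100_000)
print(round(predicted))  # 245, as reported in the Run #2 log
```

The linear model is fine; the coefficient and constant, estimated from a 10k probing run, simply do not extrapolate well to 100k for this model.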
[back to manual calculation]
[back to topic contents]
GMM Run #3 110k Log:
2025-03-24 15:27:01.431 Process peak memory usage: 662.32 MB
Calculations:
Run #3 population size was 110,000, compared to 100,000 for Run #2. The peak memory use reported in the line Process peak memory usage: in the run logs was 637.01 MB for Run #2 and 662.32 MB for Run #3. So, an additional 10,000 in population size required an additional 25.31 MB of memory. Combining these numbers gives a marginal requirement of 0.002531 MB of memory per unit of population size.
That produces the manual calculation of the variable component:
options memory_MB_popsize_coefficient = 0.002531;
The constant portion of memory use can be computed using that coefficient, assuming linearity. Using the marginal coefficient calculated above, the variable component of memory use for Run #2 is 0.002531 * 100000 = 253.10 MB. The constant portion of memory use is the difference between peak memory use and variable memory use: 637.01 - 253.10 = 383.91 MB.
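The two-run arithmetic above can be written out as a short calculation (a Python sketch, not model code; the values are taken from the Run #2 and Run #3 logs):

```python
# Peak memory (MB) and population size from the two probing runs.
peak_run2, pop_run2 = 637.01, 100_000
peak_run3, pop_run3 = 662.32, 110_000

# Marginal MB of memory per unit of population size.
coefficient = (peak_run3 - peak_run2) / (pop_run3 - pop_run2)

# Constant portion: peak minus the variable component, assuming linearity.
constant_mb = peak_run2 - coefficient * pop_run2

print(round(coefficient, 6), round(constant_mb, 2))  # 0.002531 383.91
```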
This is a mixture of per instance and per sub memory use.
The per instance and per sub portions of constant memory use could be distinguished using additional probing runs with and without multiple subs in an instance.
In this example, that might be a Run #2b with two subs in a single instance, each of size 100,000.
In practice for most model designs, the per instance portion can be rolled into the per sub portion with only minor effects. Moreover, that is a conservative assumption for predicting maximum memory requirements.
As it turns out, each sub in a GMM multi-sub run needs to run in its own distinct process anyway, for technical reasons, so there is no practical distinction between per instance and per sub memory use for GMM.
So, the option settings for GMM constant memory are:
options memory_MB_constant_per_instance = 0;
options memory_MB_constant_per_sub = 384;
Putting it all together, here is the model code fragment containing all the option settings for predicting memory requirements for GMM runs like Run #2:
// The following memory prediction options statements
// for GMM were estimated manually on 2025-03-24
// using as popsize the parameter StartingPopulationSize.
// A pair of runs were used in the estimation,
// the first with a population size of 100000 and the second with 110000.
options memory_popsize_parameter = StartingPopulationSize;
options memory_MB_constant_per_instance = 0;
options memory_MB_constant_per_sub = 384;
options memory_MB_popsize_coefficient = 0.002531;
options memory_adjustment_factor = 1.10;
These option settings were tested with Run #4, which used a starting population size of 250,000, a typical starting population size for a run of GMM.
Here's an extract of the run log:
[back to manual calculation]
[back to topic contents]
GMM Run #4 250k Log:
...
2025-03-27 10:31:13.071 member=0 Predicted memory required = 1118 MB per parallel sub and 0 MB per instance
...
2025-03-27 14:17:02.819 Process peak memory usage: 1005.64 MB
...
The predicted memory requirements worked well, for a run with over 2x the population of those used to manually compute the option values for memory prediction. The predicted memory requirement is deliberately higher than the actual peak memory use because the option memory_adjustment_factor increased the estimated memory use by 10%.
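As a final check, the 1118 MB prediction in the Run #4 log follows from the manually estimated option values (a sketch, assuming as before that the predicted figure is the adjustment factor applied to the constant plus variable components):

```python
# Manually estimated options applied to Run #4's popsize of 250,000.
predicted = 1.10 * (384 + 0.002531 * 250_000)
print(round(predicted))  # 1118, matching the Run #4 log

# Headroom over the actual peak of 1005.64 MB reported in the log.
print(round(predicted - 1005.64), "MB")
```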