Caching Input Population Dataframes (Measles) - laser-base/laser-core GitHub Wiki
In the current LASER-NNMM model, we have added the ability to cache the input population "dataframe" -- really a set of input vectors.
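The idea can be sketched as follows: write the generated vectors to a cache file together with the input parameters that produced them, so a later run with the same inputs can skip population generation. This is a minimal sketch, not the model's actual implementation -- the real cache uses HDF5 files with the parameters stored in the attributes section, while this dependency-free example uses pickle, and the function and parameter names (`cache_population`, `eula_age`, `nticks`) are hypothetical.

```python
import os
import pickle
import tempfile
import time

cache_dir = tempfile.mkdtemp()  # stand-in for the real on-disk cache directory

def cache_population(params, vectors):
    """Write the generated input vectors alongside the inputs that produced them.

    Sketch only: the real cache stores HDF5 files with the params in the
    attributes section; pickle keeps this example dependency-free.
    """
    # Timestamp-based filename: the lookup (see below) matches on the stored
    # params, not on the filename, so the name itself can be arbitrary.
    path = os.path.join(cache_dir, f"pop_{time.time_ns()}.pkl")
    with open(path, "wb") as f:
        pickle.dump({"params": params, "vectors": vectors}, f)
    return path

path = cache_population({"eula_age": 5, "nticks": 7300},
                        {"age": [0.5, 3.2, 41.0], "node_id": [0, 0, 1]})
```

A subsequent run with identical inputs would deserialize the vectors from this file instead of regenerating them.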
The following is a plot of total run times for EULA ages from 3 to ~35. (More data coming soon.) We ran the model on a VM with 12 cores; most step functions were still implemented in Numba.
We see that the wall-clock simulation time can get down to ~75 seconds on this particular platform. That compares to a full-simulation (no EULA) wall-clock time of ~6 minutes 30 seconds (~400s). There is a broadly linear relationship between the EULA-gizing age and wall-clock time, as one might expect, though that linear relationship only really begins at ~5 years old, not below.
Caching the input population provides an additional benefit: it takes a 75s sim down to ~35s. And the absolute benefit grows with sim time/EULA age.
It's worth mentioning one limitation of this experiment: we were adding cached files as we went, and discovering whether the current set of inputs already has a cached file involves searching through all the HDF5 files in the cache directory until a match is found. We only have to inspect the attributes section of each file, but the expected cost with 100 cached files is a scan of ~50 files per successful lookup (and all 100 on a miss). It would be worth re-timing all EULA ages with the full set of cached files present. Note that these files are rather large, and some platforms have run out of disk space during the test.
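The lookup described above amounts to a linear scan over the cache directory. Here is a hedged sketch of that scan under the same assumptions as before (pickle files with a `params` dict standing in for HDF5 attributes; `find_cached_population` is a hypothetical name): with N cached files, a hit costs ~N/2 file inspections on average and a miss costs N, which is the cost the text describes. One difference from the real HDF5 case: there only the small attributes block needs reading, whereas this sketch loads the whole pickle.

```python
import os
import pickle
import tempfile

def find_cached_population(params, cache_dir):
    """Scan every cache file until one whose stored params match is found.

    Average hit cost is ~N/2 inspections for N cached files; a miss
    inspects all N.
    """
    for name in sorted(os.listdir(cache_dir)):
        if not name.endswith(".pkl"):
            continue
        with open(os.path.join(cache_dir, name), "rb") as f:
            blob = pickle.load(f)
        if blob["params"] == params:  # compare against the stored "attributes"
            return blob["vectors"]
    return None  # cache miss: the population must be rebuilt from scratch

# Populate a throwaway cache dir with two entries, then look one up.
cache_dir = tempfile.mkdtemp()
for i, p in enumerate([{"eula_age": 3}, {"eula_age": 5}]):
    with open(os.path.join(cache_dir, f"pop_{i}.pkl"), "wb") as f:
        pickle.dump({"params": p, "vectors": {"eula_age": [p["eula_age"]]}}, f)

hit = find_cached_population({"eula_age": 5}, cache_dir)    # vectors returned
miss = find_cached_population({"eula_age": 35}, cache_dir)  # None
```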
The "net net" conclusion, at this point at least, is that with EULA (age=5) and caching, a ~400s simulation can be brought down to ~45s (35-50s observed).
CAVEAT: I'm honestly still not 100% confident that my test environment is completely valid.
Time to see what compiled-C acceleration (with OpenMP) can do from here.