Started working on data pipelining revisions, mainly to support restartable model runs
For example, we have a model with three sub-models A, B, and C. Yesterday we ran sub-models A, B, and C and the data store kept track of the fact that sub-models A, B, and C were run. Today when we run just sub-model C, ActivitySim first checks to make sure sub-models A and B were run and then it runs sub-model C.
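A minimal sketch of the restart check in the example above, assuming the data store is reduced to a plain list of completed sub-model names. The function `run_from` and its arguments are hypothetical and are not the pipeline's actual API:

```python
from typing import Callable, Dict, List


def run_from(
    start: str,
    model_steps: List[str],
    completed: List[str],
    sub_models: Dict[str, Callable[[], None]],
) -> List[str]:
    """Run `start` and every later step after checking that all earlier
    steps are recorded as completed; return the new completed list."""
    idx = model_steps.index(start)
    missing = [s for s in model_steps[:idx] if s not in completed]
    if missing:
        raise RuntimeError(f"cannot start at {start!r}; not yet run: {missing}")
    done = list(model_steps[:idx])  # prerequisites carried over from the store
    for step in model_steps[idx:]:
        sub_models[step]()
        done.append(step)
    return done


# Yesterday's run recorded A, B, and C; today we rerun only C.
ran = []
steps = {name: (lambda name=name: ran.append(name)) for name in ["A", "B", "C"]}
completed = run_from("C", ["A", "B", "C"], ["A", "B", "C"], steps)
print(ran)        # ['C']
print(completed)  # ['A', 'B', 'C']
```

If A or B were missing from the stored list, the sketch refuses to start at C instead of running it on incomplete inputs.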
Rather than adopting an existing pipeline framework, we are building our own so that it works with orca
We are reducing our dependency on orca, but not abandoning it, since that would be too expensive
Orca tables are saved to the data store as pandas DataFrames and re-wrapped as orca tables when they are read back in
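A rough sketch of the save-as-DataFrame, re-wrap-on-load round trip. The storage format (one pickle file per table) and the helper names `save_tables` and `load_tables` are assumptions for illustration. The `register` callback marks where a frame would be wrapped as an orca table again (presumably via something like `orca.add_table(name, df)`); here a plain dict stands in so the example runs without orca:

```python
import os
import tempfile
from typing import Callable, Dict

import pandas as pd


def save_tables(tables: Dict[str, pd.DataFrame], store_dir: str) -> None:
    """Persist each table as a plain pandas DataFrame, one pickle file per table."""
    os.makedirs(store_dir, exist_ok=True)
    for name, df in tables.items():
        df.to_pickle(os.path.join(store_dir, f"{name}.pkl"))


def load_tables(store_dir: str, register: Callable[[str, pd.DataFrame], None]) -> None:
    """Read each stored DataFrame back and pass it to `register`, the point at
    which the frame would be wrapped as an orca table again."""
    for fname in sorted(os.listdir(store_dir)):
        if fname.endswith(".pkl"):
            name = fname[: -len(".pkl")]
            register(name, pd.read_pickle(os.path.join(store_dir, fname)))


# Stand-in for orca registration: collect the reloaded frames in a dict.
registry: Dict[str, pd.DataFrame] = {}
with tempfile.TemporaryDirectory() as store:
    save_tables({"households": pd.DataFrame({"household_id": [1, 2, 3]})}, store)
    load_tables(store, registry.__setitem__)
print(registry["households"]["household_id"].tolist())  # [1, 2, 3]
```

Keeping the stored form as ordinary DataFrames is what limits the dependency on orca: only the `register` step needs to know about it.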
@RSG will have more to share on our next call and draft a [Design | Data-Pipelining-and-Random-Number-Sequencing-Design] page
Started working on random number sequencing that is as stable as possible across scenarios, sample rates, etc.
This feature works in conjunction with the restartable data pipeline
A random number stream is attached to each household, person, and trip, and each sub-model that is run uses its own offset into that stream
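A minimal sketch of per-entity streams with a per-sub-model offset, assuming each entity's stream is a NumPy generator seeded only by a base seed and the entity's id, and that each sub-model takes a single draw at its (1-based) offset. The function `draw` and the seeding scheme are hypothetical, not the design RSG is drafting:

```python
import numpy as np


def draw(entity_ids, offset: int, base_seed: int = 0) -> np.ndarray:
    """Return the `offset`-th uniform draw (1-based) from each entity's stream.

    Each entity has its own stream seeded only by (base_seed, entity id), so its
    draws do not change when other entities enter or leave the sample."""
    if offset < 1:
        raise ValueError("offsets are 1-based")
    out = np.empty(len(entity_ids))
    for i, entity_id in enumerate(entity_ids):
        stream = np.random.default_rng([base_seed, int(entity_id)])
        # Skip the draws reserved for the sub-models that ran earlier.
        out[i] = stream.random(offset)[offset - 1]
    return out


# Household 4 gets the same draw for sub-model B (offset 2) under a
# full sample and under a 40% sample that still contains it.
full = draw([1, 2, 3, 4, 5], offset=2)
sample = draw([2, 4], offset=2)
print(full[3] == sample[1])  # True
```

Because a household's draws never depend on which other households are present, changing the sample rate does not perturb the draws of households that remain in the sample.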
For example, when the model runs sub-model A it uses the first offset, sub-model B uses the second offset, and sub-model C uses the third offset. If we restart the model run at sub-model C, it sees in the data store that sub-models A and B were run and that offsets 1 and 2 have already been used, so sub-model C again uses offset 3.
The offsets are assigned by sub-model run order, not by sub-model name; this is more flexible and avoids requiring a predefined dictionary mapping sub-model names to offsets
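A small sketch of run-order offsets and the restart case, assuming the data store exposes the ordered list of sub-models already run. The class `OffsetTracker` is hypothetical:

```python
from typing import List, Optional


class OffsetTracker:
    """Hand out 1-based random-number offsets in sub-model run order."""

    def __init__(self, completed: Optional[List[str]] = None) -> None:
        # Sub-models already recorded in the data store, in the order they ran.
        self.completed = list(completed or [])

    def next_offset(self, step: str) -> int:
        # No name-to-offset table: the offset is just the step's run position.
        self.completed.append(step)
        return len(self.completed)


fresh = OffsetTracker()
print([fresh.next_offset(s) for s in ["A", "B", "C"]])  # [1, 2, 3]

# Restart at C: A and B (offsets 1 and 2) are already in the data store.
restarted = OffsetTracker(completed=["A", "B"])
print(restarted.next_offset("C"))  # 3
```

Since offsets follow run order, this scheme assumes the sub-models run in the same order on the original run and the restart; inserting or reordering a step would shift the offsets of every step after it.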
@RSG will have more to share on our next call and draft a [Design | Data-Pipelining-and-Random-Number-Sequencing-Design] page