Project Meeting 2021.01.12 - ActivitySim/activitysim GitHub Wiki
Technical Call
Discuss Doyle completing estimation mode for trip mode choice
Trip mode choice is now done; it was quite complex and required updating the 2 and 3 zone examples as well
Results are not exactly the same since we merged in some other changes that affected them
I can now review the estimation branch
Doyle to now work on performance tuning (see below)
Discuss Jeff Newman's nmtf estimation notebook
Will rebase his cdap and nmtf notebooks off of estimation branch
And send me a note to review when ready
Jeff will also test the other estimation notebooks and update them to be more automatable/testable, since right now they're part software and part training materials
Then we'll pause the task and discuss how to make fewer, smarter, more generic notebooks rather than a separate notebook for each submodel
Add a Contributor License Agreement like this one and have GitHub automatically manage it with Pull Requests
Refactor out the orca code under a bench contract work order
The license will stay BSD-3
Let's discuss the partner MOU/payment agreement on Thursday
Discuss PSRC progress and need for location sampling improvements
40k MAZ model with 100k HHs is up and running from start to finish
Ran into issues with location sampling since it considers all 40k alternatives for every chooser every time
Would be good to do something like DaySim's two-stage sampling approach: sample at the TAZ level, then pick a MAZ within each sampled TAZ based on the MAZ's share of the TAZ's size term. DaySim also pre-calculates the TAZ-to-TAZ probability matrix and then just draws random numbers and picks a zone.
We could also filter on size term > 0 before solving expressions, which would help with sparse alternative sets like school/university
These issues are similar to the performance improvements tasks below so let's add this to that discussion for consideration
Using importance sampling like DaySim is a good approach, and we'll need to implement something like it for the SANDAG cross-border model
Could do the pre-calculation at the start of each process since it's fast; then we don't have to persist it or share it across processes, as we would if it were done before multiprocessing
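The two-stage draw described above could be sketched roughly like this (a hedged sketch, not DaySim's or ActivitySim's actual code; the function and argument names are hypothetical; it assumes a pre-computed TAZ probability row and per-MAZ size terms, and zero-size MAZs are never drawn, in the spirit of the size term > 0 filter idea):

```python
import numpy as np

def two_stage_sample(taz_probs, maz_taz, maz_size, n_draws, rng=None):
    """Draw destination MAZs in two stages, DaySim-style.

    taz_probs : 1-D array of pre-computed TAZ choice probabilities
                (one row of the origin-to-destination TAZ probability matrix).
    maz_taz   : 1-D array mapping each MAZ index to its parent TAZ index.
    maz_size  : 1-D array of MAZ size terms (zero-size MAZs are never drawn).
    """
    if rng is None:
        rng = np.random.default_rng()
    # stage 1: draw TAZs from the pre-computed probability matrix
    tazs = rng.choice(len(taz_probs), size=n_draws, p=taz_probs)
    # stage 2: within each drawn TAZ, pick a MAZ by its share of the size term
    draws = np.empty(n_draws, dtype=int)
    for i, taz in enumerate(tazs):
        mazs = np.flatnonzero(maz_taz == taz)
        shares = maz_size[mazs] / maz_size[mazs].sum()
        draws[i] = rng.choice(mazs, p=shares)
    return draws
```

Stage 1 is just a lookup plus a random draw, which is why the per-process pre-calculation above is cheap.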
Discuss Doyle's list of potential performance improvements
expression file optimization
good to make these templates as smart as possible since new users rely on them
could speed up the model maybe 50%?
finish adaptive chunking
automatically determine chunksize based on available memory
explore adaptive chunking based on actual memory usage rather than ‘registered’ data objects
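One rough sketch of deriving a chunk size from actual table memory rather than 'registered' object sizes (the function, `headroom`, and `overhead` factor are all hypothetical, not ActivitySim's chunking API; `available_bytes` would come from the OS, e.g. `psutil.virtual_memory().available`):

```python
import pandas as pd

def adaptive_chunk_size(df, available_bytes, headroom=0.75, overhead=8):
    """Pick a row-chunk size from the chooser table's measured footprint.

    headroom : fraction of available memory we allow ourselves to use.
    overhead : assumed multiplier for temporaries created while solving
               expressions (a guess; would be tuned from observed peak usage).
    """
    # measure what the table really uses, including object columns
    bytes_per_row = df.memory_usage(deep=True).sum() / max(len(df), 1)
    budget = available_bytes * headroom
    return max(1, int(budget / (bytes_per_row * overhead)))
```

The key point is that `bytes_per_row` is measured, not declared, so the chunk size adapts to the actual data.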
deduplicate alternatives as discussed for ARC
cache logsums, which needs categories/market segments defined
would help a lot; would require a lot of plumbing updates
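A minimal sketch of what caching logsums keyed on market segment and zone pair could look like (`compute_logsum` and the key structure are hypothetical stand-ins, not ActivitySim's API; the cache only pays off once choosers are grouped into a manageable number of segments):

```python
def make_logsum_cache(compute_logsum):
    """Wrap an expensive logsum calculation with a memoizing cache.

    compute_logsum(segment, orig, dest) stands in for the mode-choice
    logsum calculation; results are reused for every chooser that shares
    the same (segment, orig, dest) key.
    """
    cache = {}

    def cached(segment, orig, dest):
        key = (segment, orig, dest)
        if key not in cache:
            cache[key] = compute_logsum(segment, orig, dest)
        return cache[key]

    return cached
```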
data format and size optimization
e.g. rightsize numbers and convert strings to factors
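With pandas, that kind of rightsizing might look like the following (a sketch under assumed column types, not ActivitySim code):

```python
import pandas as pd

def rightsize(df):
    """Return a shrunk copy: downcast numerics, make string columns categorical."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            # e.g. int64 -> int8 when values fit
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_float_dtype(s):
            # e.g. float64 -> float32
            out[col] = pd.to_numeric(s, downcast="float")
        elif pd.api.types.is_object_dtype(s):
            # repeated strings become small integer codes
            out[col] = s.astype("category")
    return out
```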
pipeline optimization
alternative pipeline file format (e.g. feather)
improve control over checkpointing (pipeline footprint and read/write time)
Alex likes this one; see his email about keeping one version of the pipeline tables with fields tied to submodels, rather than multiple growing copies of the tables as the submodels run
two stage location sampling for PSRC like DaySim
run ARC and optimize where appropriate
run PSRC and optimize where appropriate
We will run the ARC and PSRC versions of the model and review submodel and component runtimes and memory usage to help inform our discussion of what to work on