Project Meeting 2021.01.12 - ActivitySim/activitysim GitHub Wiki
Technical Call
Discuss Doyle completing estimation mode for trip mode choice
Trip mode choice is now done; it was quite complex and required updating the 2 and 3 zone examples as well
Results are not exactly the same since we merged in some other changes that affected them
I can now review the estimation branch
Doyle to now work on performance tuning (see below)
Discuss Jeff Newman's nmtf estimation notebook
Will rebase his cdap and nmtf notebooks off of estimation branch
And send me a note to review when ready
Jeff will also test the other estimation notebooks and update them to be more automatable/testable, since right now they're part software and part training materials
Then we'll pause the task and discuss how to make fewer, smarter, more generic notebooks rather than a separate notebook for each submodel
Add a Contributor License Agreement like this one and have GitHub automatically manage it with Pull Requests
Refactor out the orca code under a bench contract work order
The license will stay BSD-3
Let's discuss the partner MOU/payment agreement on Thursday
Discuss PSRC progress and need for location sampling improvements
40k MAZ model with 100k HHs is up and running from start to finish
Ran into issues with location sampling since it considers all 40k alternatives for every chooser every time
Would be good to do something like DaySim's two-stage sampling approach: sample at the TAZ level, then pick a MAZ within each sampled TAZ based on the MAZ's share of the TAZ's size term. DaySim also pre-calculates the TAZ-to-TAZ probability matrix and then just draws random numbers and picks a zone.
We could also filter on size term > 0 before solving expressions, which would help with sparse alternative sets like school/university
These issues are similar to the performance improvements tasks below so let's add this to that discussion for consideration
Using importance sampling like DaySim is a good approach, and we'll need to implement something like it for the SANDAG cross-border model
Could do the pre-calculation at the start of each process since it's fast; then we don't have to persist it or share it across processes, as we would if it were done before multiprocessing
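The two-stage draw described above could be sketched roughly like this (a hedged sketch, not DaySim's or ActivitySim's actual code; the function and argument names are hypothetical; it assumes a pre-computed TAZ probability row and per-MAZ size terms, and zero-size MAZs are never drawn, in the spirit of the size term > 0 filter idea):

```python
import numpy as np

def two_stage_sample(taz_probs, maz_taz, maz_size, n_draws, rng=None):
    """Draw destination MAZs in two stages, DaySim-style.

    taz_probs : 1-D array of pre-computed TAZ choice probabilities
                (one row of the origin-to-destination TAZ probability matrix).
    maz_taz   : 1-D array mapping each MAZ index to its parent TAZ index.
    maz_size  : 1-D array of MAZ size terms (zero-size MAZs are never drawn).
    """
    if rng is None:
        rng = np.random.default_rng()
    # stage 1: draw TAZs from the pre-computed probability matrix
    tazs = rng.choice(len(taz_probs), size=n_draws, p=taz_probs)
    # stage 2: within each drawn TAZ, pick a MAZ by its share of the size term
    draws = np.empty(n_draws, dtype=int)
    for i, taz in enumerate(tazs):
        mazs = np.flatnonzero(maz_taz == taz)
        shares = maz_size[mazs] / maz_size[mazs].sum()
        draws[i] = rng.choice(mazs, p=shares)
    return draws
```

Stage 1 is just a lookup plus a random draw, which is why the per-process pre-calculation above is cheap.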
Discuss Doyle's list of potential performance improvements
expression file optimization
good to make these templates as smart as possible since new users rely on them
could speed up the model maybe 50%?
finish adaptive chunking
automatically determine chunksize based on available memory
explore adaptive chunking based on actual memory usage rather than ‘registered’ data objects
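One rough sketch of deriving a chunk size from actual table memory rather than 'registered' object sizes (the function, `headroom`, and `overhead` factor are all hypothetical, not ActivitySim's chunking API; `available_bytes` would come from the OS, e.g. `psutil.virtual_memory().available`):

```python
import pandas as pd

def adaptive_chunk_size(df, available_bytes, headroom=0.75, overhead=8):
    """Pick a row-chunk size from the chooser table's measured footprint.

    headroom : fraction of available memory we allow ourselves to use.
    overhead : assumed multiplier for temporaries created while solving
               expressions (a guess; would be tuned from observed peak usage).
    """
    # measure what the table really uses, including object columns
    bytes_per_row = df.memory_usage(deep=True).sum() / max(len(df), 1)
    budget = available_bytes * headroom
    return max(1, int(budget / (bytes_per_row * overhead)))
```

The key point is that `bytes_per_row` is measured, not declared, so the chunk size adapts to the actual data.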
deduplicate alternatives as discussed for ARC
cache logsums, which needs categories/market segments defined
would help a lot; would require a lot of plumbing updates
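A minimal sketch of what caching logsums keyed on market segment and zone pair could look like (`compute_logsum` and the key structure are hypothetical stand-ins, not ActivitySim's API; the cache only pays off once choosers are grouped into a manageable number of segments):

```python
def make_logsum_cache(compute_logsum):
    """Wrap an expensive logsum calculation with a memoizing cache.

    compute_logsum(segment, orig, dest) stands in for the mode-choice
    logsum calculation; results are reused for every chooser that shares
    the same (segment, orig, dest) key.
    """
    cache = {}

    def cached(segment, orig, dest):
        key = (segment, orig, dest)
        if key not in cache:
            cache[key] = compute_logsum(segment, orig, dest)
        return cache[key]

    return cached
```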
data format and size optimization
e.g. rightsize numbers and convert strings to factors
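With pandas, that kind of rightsizing might look like the following (a sketch under assumed column types, not ActivitySim code):

```python
import pandas as pd

def rightsize(df):
    """Return a shrunk copy: downcast numerics, make string columns categorical."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            # e.g. int64 -> int8 when values fit
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_float_dtype(s):
            # e.g. float64 -> float32
            out[col] = pd.to_numeric(s, downcast="float")
        elif pd.api.types.is_object_dtype(s):
            # repeated strings become small integer codes
            out[col] = s.astype("category")
    return out
```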
pipeline optimization
alternative pipeline file format (e.g. feather)
improve control over checkpointing (pipeline footprint and read/write time)
Alex likes this one; see his email about keeping one version of the pipeline tables with fields tied to submodels, rather than multiple growing copies of the tables as the submodels run
two stage location sampling for PSRC like DaySim
run ARC and optimize where appropriate
run PSRC and optimize where appropriate
We will run the ARC and PSRC versions of the model and review submodel and component runtimes and memory usage to help inform our discussion of what to work on