Project Meeting 2020.10.27

Technical Call

  • Continue discussion on TVPB implementation
    • Bill and Joel talked; the memory needs for TVPB are vectors for maz-tap and matrices for tap-tap, so the memory implications are less than discussed last week
    • A primary motivation for TVPB is to use the more refined spatial system for better modeling of non-motorized travel
    • MTC has a liberal maz-tap ratio that has not really been optimized on the network side
    • So the maximum walk of 3 miles is generous, but it is trimmed to 1.2 miles in CT-RAMP
    • MTC has the same TVPB code as SANDAG
    • Important to include the tapLines file, which lists the transit lines served by each TAP, in order to prune the maz-tap possibilities list; TAPs farther away that add no new service are dropped from consideration
    • The TM2 transit network is quite disaggregate since it was built from GTFS; it therefore has route variations (skipped stops), so many routes are retained
    • The transit routes are currently being rebuilt to make them more planning-like (i.e., abstract)
    • For efficiency, maz-tap and tap-tap utilities are calculated just once, on demand (and could be pre-calculated)
    • The code then loops across possible access and egress tap pairs, adds the already-calculated path utility components, ranks the tap pairs, and selects the best N
    • If transit is selected, a choice is then made from among the best N
    • It uses generic utilities for ranking and then re-calculates person-specific utilities in mode choice for just the N best (see the sketch below)
    • We might have a formal write-up on the design from when we spec'd this out with Dave
    • Follow-up: here are the TM2 design papers; I don't see a really useful doc
    • In terms of pruning the possible paths, the tapLines idea and also skipping tap-tap pairs with IVT==0 can be done
    • The pre-exponentiation of utilities was done in the old SANDAG version but not in the current versions for TM2, SANDAG, and CMAP, since we introduced the person-specific calculations
    • If we pre-define market segments (which, by the way, we have done for asim) then we could pre-exponentiate
    • What we need is a speedy ranking procedure, good logic to skip non-relevant tap pairs, and pre-calculating utility components
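    • A minimal sketch of those pieces (skipping non-relevant tap pairs, summing pre-calculated utility components, and ranking to keep the best N), using hypothetical names and plain dict inputs rather than the actual ActivitySim data structures:
```python
def best_n_tap_pairs(access_utils, egress_utils, tap_tap_utils, tap_tap_ivt, n=3):
    """Rank candidate (access TAP, egress TAP) pairs and keep the best n.

    access_utils:  {tap: utility} for getting from the origin MAZ to each nearby TAP
    egress_utils:  {tap: utility} for getting from each nearby TAP to the destination MAZ
    tap_tap_utils: {(atap, btap): utility} generic tap-to-tap utilities, calculated once
    tap_tap_ivt:   {(atap, btap): in-vehicle time} used to skip pairs with no service
    """
    candidates = []
    for atap, access_util in access_utils.items():
        for btap, egress_util in egress_utils.items():
            # skip non-relevant tap pairs, e.g. IVT == 0 means no transit service
            if tap_tap_ivt.get((atap, btap), 0) <= 0:
                continue
            # total path utility is the sum of the already-calculated components
            total = access_util + tap_tap_utils[(atap, btap)] + egress_util
            candidates.append((atap, btap, total))

    # rank the tap pairs by generic utility and keep only the best n;
    # person-specific utilities are then recalculated in mode choice for just these n
    candidates.sort(key=lambda c: c[2], reverse=True)
    return candidates[:n]
```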
    • Doyle's understanding of the problem is in line with Joel's
    • We have implemented the tapLines functionality
    • There is no current max walk distance, but we could add this as a setting (say 1.2 miles, like Marin)
    • We're not currently caching maz-tap calculations since they are very small and already super fast
    • But if the maz-tap calculations get more complex, then we may want to cache
    • tap-tap utility is being computed on demand and saved to a cache for future use
    • The asim code works in chunks, so we need to de-duplicate calculations within chunks; this is now working
    • You can also retain your cache for a subsequent run if desired
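    • A minimal sketch of that compute-on-demand / cache pattern with feather, using hypothetical column names (atap, btap, util) and a stand-in compute function rather than the actual implementation:
```python
import os
import pandas as pd

def get_tap_tap_utils(tap_pairs, compute_tap_tap_utils, cache_path="tap_tap_cache.feather"):
    """Return tap-tap utilities, computing only the pairs not already cached."""
    if os.path.exists(cache_path):
        cache = pd.read_feather(cache_path)
    else:
        cache = pd.DataFrame(columns=["atap", "btap", "util"])

    cached_pairs = set(zip(cache["atap"], cache["btap"]))
    missing = [pair for pair in tap_pairs if pair not in cached_pairs]

    if missing:
        # compute_tap_tap_utils stands in for evaluating the tap-tap utility expressions
        new_rows = compute_tap_tap_utils(missing)  # DataFrame with atap, btap, util columns
        cache = pd.concat([cache, new_rows], ignore_index=True)
        # persist so a subsequent run (or another process) can reuse the results
        cache.to_feather(cache_path)

    return cache
```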
    • We're not doing person specific utilities, just using pre-defined market (demographic) segments, as spec'd
    • We've not implemented the optimization of skipping tap pairs with no IVT
    • CT-RAMP had a UEC feature where it skipped alternatives 2+ if an expression returned NA for alternative 1 and the expression applied to all alternatives
    • Maybe we add a tap-tap utility filter expression file; like a constraint matrix in EMME
    • This would be a good generic improvement that applies to all activitysim expression solving
    • TM2 has a pre-processor to turn off duplicate tap pairs across skim sets as well
    • Testing on both Marin and SF county since they have different maz-tap ratios
    • Currently implemented optimizations with runtimes
      • Test example - 18 minutes
      • plus Remove redundant calcs within chunks - 7 minutes
      • plus tap-tap caching - 4 minutes
      • plus tap line pruning - 45 seconds
      • All together - 23 seconds
    • Arrow/feather being used for the caching
    • Saves a lot of memory and runs super fast
    • The question now is how to implement dynamic and growing cache across multiprocesses
    • Key concerns are memory requirements, performance, and synchronization across processes, plus the need to avoid blocking
    • Tables can be stored memory-mapped on disk to free up RAM
    • May work for other shared data - skims and shadow prices
    • Basically use arrow for shared memory objects
    • Arrow will likely be the replacement for the pandas backend
    • pandas uses numpy as a backend today
    • Here's the key article from the pandas creator that spawned arrow; thanks Stefan
    • arrow includes things like native support for null values, better support for columns of different types, etc.; it's basically a better pandas backend
    • arrow is in-memory like pandas, for tables, but without the helper functions
    • feather is the file format; it is super smart and uses memory mapping
    • arrow stores 1D arrays, so to wrap them with numpy you reshape on demand
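    • A minimal sketch of the memory-mapping and reshape-on-demand idea with pyarrow, using a made-up one-column skim layout rather than Jeff's prototype:
```python
import numpy as np
import pyarrow as pa
import pyarrow.feather as feather

zones = 100
skim = np.random.rand(zones * zones)  # a zone-to-zone skim flattened to a 1D column

# feather must be uncompressed to be memory-mapped, so the file on disk
# is the same size as the data in memory
feather.write_feather(pa.table({"DIST": skim}), "skims.feather", compression="uncompressed")

# re-open memory-mapped: the OS pages data in on demand instead of holding
# a private copy in RAM, and multiple processes can share the same pages
table = feather.read_table("skims.feather", memory_map=True)

# arrow columns are 1D, so wrap with numpy and reshape on demand
dist = table.column("DIST").to_numpy().reshape(zones, zones)
```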
    • This would add new dependencies to activitysim, which we need to be mindful of
    • Next steps
      • ct-ramp behavior has been replicated
      • figure out how to multiprocess and share / update data
      • maybe arrow/feather
      • maybe replace string operations with factors for more efficient data storage in numpy
        • factors are not supported in pandas HDF5 storage, so we would currently need to wrap I/O
        • need to know the universe of factors when creating them
        • factors have better support in arrow
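        • A minimal sketch of the factor idea with pandas categoricals and arrow dictionary encoding, using made-up mode labels:
```python
import pandas as pd
import pyarrow as pa

# the full universe of factor levels has to be known when the categorical is created
modes = ["WALK", "DRIVE", "WALK_TRANSIT", "DRIVE_TRANSIT"]
trip_mode = pd.Categorical(["WALK", "DRIVE", "WALK", "WALK_TRANSIT"], categories=modes)

# stored as small integer codes plus one lookup table instead of repeated strings
print(trip_mode.codes)        # [0 1 0 2]

# arrow has first-class dictionary (factor) support
arr = pa.array(trip_mode)     # pyarrow DictionaryArray
```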
    • We're at the point where we're trying the few possible good ideas based on the abstract architecture design
    • Jeff Newman says using arrow for skims really works
    • He treated them as a column and reshaped them on I/O
    • Can't compress on disk, so files are the same size as in memory
    • Here's Newman's prototype; thanks Jeff
    • Going with an on-demand approach since geographic organization doesn't really work
    • TM2 disaggregate accessibilities will eventually use this code as well
    • Basic idea - create a small carefully controlled synpop that covers the markets, run the models to get the destination choice logsums, and then use these instead of the aggregate accessibilities
    • This is beyond this exercise, but it's a good idea since it means consistent mode choice models for accessibility and actual mode choice; it is planned for SEMCOG
    • Jeff will soon share an example with me so I can start comparing results to Marin TM2 while Jeff continues with performance tuning
  • Discuss CDAP larch integration progress
    • EDB larch reader working and notebook drafted
    • Doyle to update the cdap coefficient files and code so we have named coefficients as opposed to just values
    • Do this after TVPB is in a good place
    • Will do our best to transform duplicate values into one coefficient so estimation is more stable
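    • A minimal sketch of that de-duplication, assuming a simple coefficient_name/value layout (the actual file format is up to Doyle's update):
```python
import pandas as pd

# made-up coefficient file contents; two coefficients happen to share the same value
coefs = pd.DataFrame({
    "coefficient_name": ["coef_escort_asc", "coef_shop_asc", "coef_maint_asc"],
    "value": [-1.25, -0.75, -1.25],
})

# point every coefficient that shares a value at one canonical name, so estimation
# fits a single parameter instead of several identical ones
coefs["coefficient_name"] = coefs.groupby("value")["coefficient_name"].transform("first")
print(coefs)
```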
    • Joel shared an example of the CDAP model since it's complicated; thanks Joel for sharing the slides via email
    • Still need to write out the updated coefficient file; waiting on Doyle to update the format and then will implement
    • Now turn to the non-mandatory tour frequency model, which is the only model that implements the interaction_simulate EDB
  • Discuss ARC progress, questions, etc.
    • Everything is stood up, including the new trip scheduling choice submodel, trip departure choice submodel, and CBD parking location submodel
    • ARC model is running from start to finish!
    • The trip departure choice model is very slow at this point since it builds many alternatives; still working on performance
    • Need to create test cases for all three models for contribution
    • Have code and docs done
    • PSRC's RAM issues were actually chunk size related
    • ARC is running slower than expected; maybe due to chunk size?
    • The adaptive chunker should help here; it's in the multi-zone branch
  • Joel to join next week to discuss telecommuting