Project Meeting 2020.10.27 - ActivitySim/activitysim GitHub Wiki
Technical Call
Continue discussion on TVPB implementation
Bill and Joel talked; the memory needs for TVPB are vectors for maz-tap and matrices for tap-tap, so the memory implications are less than discussed last week
A primary motivation for TVPB is to use the more refined spatial system for better modeling of non-motorized travel
MTC has a liberal maz-tap ratio that has not really been optimized on the network side
So the max walk of 3 miles is generous, but it is trimmed to 1.2 miles in CT-RAMP
MTC has the same TVPB code as SANDAG
Important to include the tapLines file, which lists the transit lines served by each TAP, in order to prune the maz-tap possibilities list. TAPs further away that add no new service are dropped from consideration
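The tapLines pruning idea can be sketched as follows; this is an illustrative reconstruction, not the actual code, and the names (prune_taps, tap_lines) are hypothetical:

```python
def prune_taps(candidate_taps, tap_lines):
    """candidate_taps: TAP ids sorted nearest-first from the MAZ.
    tap_lines: dict of TAP id -> set of transit lines served."""
    kept, seen = [], set()
    for tap in candidate_taps:
        new_lines = tap_lines.get(tap, set()) - seen
        if new_lines:  # a farther TAP with no new service is dropped
            kept.append(tap)
            seen |= new_lines
    return kept

# toy example: TAP 3 only repeats lines already reachable at TAPs 1 and 2
taps_by_distance = [1, 2, 3]
lines_by_tap = {1: {"A", "B"}, 2: {"B", "C"}, 3: {"A", "C"}}
print(prune_taps(taps_by_distance, lines_by_tap))  # [1, 2]
```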
TM2 transit network is pretty disaggregate since it was based on GTFS and therefore has route variations (skipped stops) and so lots of routes are retained
The transit routes are currently being rebuilt to make the routes more planning like (i.e. abstract)
For efficiency, maz-tap and tap-tap utilities are calculated just once, on demand (and could be pre-calculated)
Code then loops across possible access and egress tap pairs, adds the already calculated path utility components, ranks the tap-pairs, and selects the best N
If transit selected, then makes a choice from the best N
It uses generic utilities for ranking and then re-calculates person-specific utilities in mode choice for just the N best
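The ranking step described above can be sketched like this; all names and utility values are illustrative, and the real code operates on vectorized tables rather than dicts:

```python
import itertools

def best_n_tap_pairs(access_utils, egress_utils, tap_tap_utils, n=2):
    """access_utils / egress_utils: dict of tap -> precomputed utility.
    tap_tap_utils: dict of (atap, btap) -> precomputed transit path utility."""
    scored = []
    for atap, btap in itertools.product(access_utils, egress_utils):
        u = tap_tap_utils.get((atap, btap))
        if u is None:
            continue  # tap pair not connected by transit
        scored.append(((atap, btap), access_utils[atap] + u + egress_utils[btap]))
    scored.sort(key=lambda kv: kv[1], reverse=True)  # higher utility is better
    return scored[:n]

access = {1: -0.5, 2: -1.0}
egress = {7: -0.3, 8: -0.9}
tap_tap = {(1, 7): -2.0, (1, 8): -1.0, (2, 7): -1.5}
print(best_n_tap_pairs(access, egress, tap_tap))
```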
We might have a formal write-up on the design from when we spec'd this out with Dave
In terms of pruning the possible paths, the tapLines idea can be used, and tap-tap pairs with IVT == 0 can be skipped
Pre-exponentiation of utilities was done in the old SANDAG version, but not in the current versions for TM2, SANDAG, and CMAP, since we introduced the person-specific calculations
If we pre-define market segments (which, by the way, we have done for asim), then we could pre-exponentiate
What we need is a speedy ranking procedure, good logic to skip non-relevant tap pairs, and pre-calculating utility components
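Why pre-exponentiation pays off with fixed market segments: since the total utility is a sum of components, exp() can be applied once per component up front, and exp(U) for a path becomes a product. A minimal worked example (utility values are made up):

```python
import math

# component utilities for one access / tap-tap / egress path (illustrative values)
u_access, u_path, u_egress = -0.4, -1.1, -0.2

# exponentiate each component once, up front
e_access, e_path, e_egress = (math.exp(u) for u in (u_access, u_path, u_egress))

# exp of the summed utility equals the product of the pre-exponentiated parts
assert abs(e_access * e_path * e_egress
           - math.exp(u_access + u_path + u_egress)) < 1e-12
```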
Doyle's understanding of the problem is in line with Joel's
We have implemented the tapLines functionality
There is no current max walk distance, but we could add this as a setting (say 1.2 miles, like Marin)
We're not currently caching maz-tap calculations since they are very small and already super fast
But if the maz-tap calculations get more complex, then we may want to cache
tap-tap utility is being computed on demand and saved to a cache for future use
Asim code works in chunks, so we need to de-duplicate calculations within chunks; this is now working
You can also retain your cache for a subsequent run if desired
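The caching and within-chunk de-duplication described above amounts to a memoization pattern; a hypothetical sketch (compute_utility stands in for the real, expensive utility calculation):

```python
def get_utilities(pairs, cache, compute_utility):
    for pair in set(pairs):      # de-duplicate within the chunk
        if pair not in cache:    # only compute cache misses
            cache[pair] = compute_utility(pair)
    return [cache[pair] for pair in pairs]

calls = []
def compute_utility(pair):
    calls.append(pair)           # track how often the expensive step runs
    return -0.1 * (pair[0] + pair[1])

cache = {}
get_utilities([(1, 2), (1, 2), (3, 4)], cache, compute_utility)
assert len(calls) == 2           # each unique pair computed only once
get_utilities([(1, 2)], cache, compute_utility)  # a later chunk hits the cache
assert len(calls) == 2
```

Retaining the cache dict (e.g. persisted to disk) gives the "keep the cache for a subsequent run" behavior mentioned above.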
We're not doing person specific utilities, just using pre-defined market (demographic) segments, as spec'd
We've not implemented the optimization of skipping tap pairs with no IVT
CT-RAMP had a UEC feature that skipped alternatives 2+ if an expression returned NA for alternative 1 and the expression applied to all alternatives
Maybe we add a tap-tap utility filter expression file; like a constraint matrix in EMME
This would be a good generic improvement that applies to all activitysim expression solving
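One way to picture the proposed tap-tap filter expression file: evaluate a boolean expression over the pair table and drop rows before solving any utility expressions. This is a sketch of the idea only; the expression syntax and column names are assumptions:

```python
import pandas as pd

# tap-tap pair table with a couple of illustrative level-of-service columns
pairs = pd.DataFrame({"ivt": [0.0, 7.5, 3.2], "xfers": [0, 1, 3]})

# a filter expression prunes pairs up front, e.g. pairs with no in-vehicle
# time (IVT == 0) or too many transfers, like a constraint matrix in EMME
filter_expr = "ivt > 0 and xfers <= 2"
filtered = pairs[pairs.eval(filter_expr)]
print(len(filtered))  # 1
```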
TM2 has a pre-processor to turn off duplicate tap pairs across skim sets as well
Testing on both Marin and SF county since they have different maz-tap ratios
Currently implemented optimizations, with runtimes:
Test example - 18 minutes
plus removing redundant calculations within chunks - 7 minutes
plus tap-tap caching - 4 minutes
plus tapLines pruning - 45 seconds
All together - 23 seconds
Arrow/feather being used for the caching
Saves a lot of memory and runs super fast
The question now is how to implement a dynamic, growing cache across multiple processes
Memory requirements, performance, and synchronization across processes, plus the need to avoid blocking
Can store tables in a memory-mapped way on disk to free up RAM
May work for other shared data - skims and shadow prices
Basically use arrow for shared memory objects
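The memory-mapping idea can be shown with numpy.memmap as a stand-in (the approach discussed in the notes uses arrow/feather for this, but the mechanics are the same: pages are mapped from disk rather than held in RAM). File name and shapes are illustrative:

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "tap_tap_cache.dat")

# write a toy tap-tap table to a file-backed array
table = np.arange(12, dtype=np.float64).reshape(3, 4)
writer = np.memmap(path, dtype=np.float64, mode="w+", shape=(3, 4))
writer[:] = table
writer.flush()
del writer  # release the write mapping

# re-open read-only: data stays on disk and is paged in on access,
# so multiple processes could map the same file without copying it
view = np.memmap(path, dtype=np.float64, mode="r", shape=(3, 4))
assert view[2, 3] == 11.0
```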
Arrow will likely become the replacement backend for pandas
pandas uses numpy as its backend today
Here's the key article from the pandas creator that spawned arrow; thanks Stefan
arrow includes things like native support for null values, better support for columns of different types, etc.; it's basically a better pandas backend
arrow is in-memory like pandas, for tables, but has no helper functions
feather is the file format; it is very efficient and supports memory mapping
arrow stores 1D arrays, so to use them with numpy you reshape on demand
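The on-demand reshape mentioned above is cheap because it is a zero-copy view; a minimal numpy illustration (the flat array stands in for an arrow column):

```python
import numpy as np

n_taps = 4

# a flat 1D column, as arrow would store it (numpy stand-in here)
flat = np.arange(n_taps * n_taps, dtype=np.float32)

# reshape on demand into a tap-tap matrix; no data is copied
skim = flat.reshape(n_taps, n_taps)
assert skim[1, 2] == flat[1 * n_taps + 2]
assert skim.base is flat  # zero-copy view over the same buffer
```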
This would add new dependencies to activitysim, which we need to be mindful of
Next steps
CT-RAMP behavior has been replicated
figure out how to multiprocess and share / update data
maybe arrow/feather
maybe replace string operations with factors for more efficient data storage in numpy
factors are not supported in pandas HDF5 storage, so we would currently need to wrap I/O
need to know the universe of factors when creating them
factors have better support in arrow
We're at the point where we're trying the few possible good ideas based on the abstract architecture design
Jeff Newman says using arrow for skims really works
He treated each skim as a column and reshaped it at I/O time
The files can't be compressed on disk, so they are the same size as in memory
Going with an on-demand approach since geographic organization doesn't really work
TM2 disaggregate accessibilities will eventually use this code as well
Basic idea - create a small carefully controlled synpop that covers the markets, run the models to get the destination choice logsums, and then use these instead of the aggregate accessibilities
This is beyond this exercise, but it's a good idea since it means consistent mode choice models for accessibility and actual mode choice; it is planned for SEMCOG
Jeff to soon share example with me so I can start comparing results to Marin TM2 and Jeff can continue with performance tuning