Project Meeting 2020.12.01

Technical Call

  • TVPB update
    • Update from Jeff Doyle on transit virtual path building performance improvements
    • TVPB pre-computer and mode choice are now running multiprocessed on Windows
    • The existing TM1 example and the toy 2-zone and toy 3-zone complete model runs are also running on Windows
    • Below are some settings and results for discussion
    • Note the previous single-threaded runtime, with on-demand / redo calculations, was 3 hours
    • We don't have a comparable runtime for the 775k work tours mode choice from Marin TM2
    • Result confirmed to be the same as before the multiprocessing work
    • Now working on verification summaries, cleaning up the examples for distribution, and user documentation
set MKL_NUM_THREADS=1

python simulation.py -c configs_3_zone_marin_full -c configs_3_zone_marin -c configs -d data_3_marin_full -o output_3_marin_full -s settings_mp.yaml

begin: initialize_tvpb
num_processes: 20
chunk_size: 2376277344 # num_taps * num_taps * rowsize / desired_num_chunks = 2376277344

begin: tour_mode_choice_simulate
num_processes: 32
chunk_size: 0

Time to execute run_sub_simulations step mp_tvpb : 670.767 seconds (11.2 minutes)
Time to execute run_sub_simulations step mp_mode_choice : 327.657 seconds (5.5 minutes)
Time to execute all models : 1333.193 seconds (22.2 minutes)
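The chunk_size comment in the initialize_tvpb settings above states the sizing rule as num_taps * num_taps * rowsize / desired_num_chunks. A minimal sketch of that arithmetic follows. The function name, the rounding up to a whole number, and the input values in the usage line are illustrative assumptions and are not taken from ActivitySim or from the Marin run.

```python
def tvpb_chunk_size(num_taps, rowsize, desired_num_chunks):
    """Return num_taps * num_taps * rowsize / desired_num_chunks, rounded up.

    Hypothetical helper: rounding up to an integer is an assumption made here
    so the result can be pasted into a settings file as a whole number.
    """
    if desired_num_chunks < 1:
        raise ValueError("desired_num_chunks must be at least 1")
    total = num_taps * num_taps * rowsize  # size of the full tap-to-tap table
    return -(-total // desired_num_chunks)  # ceiling division


# Illustrative values only (not the Marin inputs)
print(tvpb_chunk_size(num_taps=100, rowsize=50, desired_num_chunks=4))
```

Because the tap-to-tap table grows with the square of the number of taps, doubling the taps quadruples the chunk_size needed to keep the same number of chunks.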
  • TVPB discussion
    • Runtime is 22 minutes for the full example, of which 11 minutes is the TVPB pre-computer and 5 minutes is the 775k work tours mode choice from Marin TM2
    • Note mode choice calculations are used many times in the form of logsums so this is a key part to speed up
    • Could probably be sped up even more but there's not much left in this example to bite onto
    • Separate from this work, RSG is now starting on the SANDAG cross border ActivitySim model which will use the TVPB so I expect we'll make some improvements on that project as well. It will be good to have a full scale example to continue to develop with.
    • Need to be careful programming shared memory applications with numpy; otherwise Python just replicates the memory in each process, which means we run out of RAM
    • Now switching from Jeff developing to me testing, documenting, verifying
    • Runtimes depend a lot on the maz-tap density/ratio
    • This in turn depends on the maz-tap input files and the max_distance cutoffs
    • maz-tap pair availability can also be modified in the expressions
    • All this will be important to document in the user guide
    • Will try a couple different max distances and see how the runtimes compare
    • Marin example has 650,000 maz-tap pairs for walk and 6200 taps
    • One reason we still have dynamic / on-demand calculations is for tracing - if the HH ID is traced, then it re-runs the pre-computed TVPB calculations from within mode choice
    • We could do OD tracing in addition to HH ID tracing
    • Access mode is exposed to the tap-tap expressions; could easily add egress mode too if needed
    • We expect there to be future optimizations as we roll this out in a few places
    • There are maybe 15 times more tours in the full TM2 model, so mode choice at 15 x 5 min (about 75 minutes) starts to get a little long for runtimes
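The shared memory point in the discussion above can be illustrated with the standard library's multiprocessing.shared_memory module. This is a minimal sketch of the general technique, not the approach used in the ActivitySim multiprocessing code. The function names are hypothetical, and the worker is called in-process here to keep the example short.

```python
import numpy as np
from multiprocessing import shared_memory


def create_shared_array(data):
    """Copy `data` once into a named shared-memory block; return (shm, array view)."""
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    shared = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    shared[:] = data  # the only copy that is made
    return shm, shared


def worker_sum(shm_name, shape, dtype):
    """What a worker would do: attach to the block by name and read it in place."""
    shm = shared_memory.SharedMemory(name=shm_name)
    view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    total = float(view.sum())
    del view  # drop the array before closing, or close() raises BufferError
    shm.close()
    return total


if __name__ == "__main__":
    data = np.arange(12, dtype=np.float64).reshape(3, 4)
    shm, shared = create_shared_array(data)
    print(worker_sum(shm.name, data.shape, data.dtype))
    del shared
    shm.close()
    shm.unlink()  # the creating process frees the block when done
```

In a real run, the block name, shape, and dtype would be passed to each multiprocessing.Process, so every worker reads the same buffer instead of receiving its own pickled copy of the array.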
  • Update from Jeff Newman on estimation integration improvements
    • Nothing to report
    • Jeff Doyle now turning attention to estimation mode enhancements
  • Update from Clint on ARC related improvements
    • Added a scheduler pre-processor, but it didn't help much
    • This pre-processor is on the choices rather than the choosers since there are a lot of duplicate calcs in the choices
    • Lots of duplicate calcs in the logsums by time-of-day
    • Instead of doing logsums for each time period, could do for a representative time period within each skim to save runtime (this is done in some CT-RAMP models)
    • Parking duration is by time period so using representative time periods is a bit of an abstraction. I'll create an issue for this.
    • Clint may consolidate the group-by to O, D as opposed to O, D, duration
    • Pull request for trip time-of-day choice and CBD parking location models coming soon
    • Will create example_mtc_arc_extensions example to exercise new features
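The idea of grouping by O, D rather than O, D, duration to avoid duplicate calculations can be sketched with pandas: evaluate the costly term once per unique origin-destination pair, then merge it back onto every alternative. The column names and the expensive_logsum stand-in are hypothetical and are not taken from the ARC code.

```python
import pandas as pd


def expensive_logsum(origin, destination):
    """Stand-in for a costly per-OD calculation (hypothetical placeholder)."""
    return -float(abs(origin - destination))


def logsums_by_unique_od(alternatives):
    """Compute the logsum once per unique (origin, destination), then broadcast."""
    unique_od = alternatives[["origin", "destination"]].drop_duplicates()
    unique_od = unique_od.assign(
        logsum=[
            expensive_logsum(o, d)
            for o, d in zip(unique_od["origin"], unique_od["destination"])
        ]
    )
    # A left merge keeps every alternative row, in its original order
    return alternatives.merge(unique_od, on=["origin", "destination"], how="left")


if __name__ == "__main__":
    alts = pd.DataFrame(
        {
            "origin": [1, 1, 2, 1],
            "destination": [3, 3, 5, 3],
            "duration": [1, 2, 1, 3],
        }
    )
    print(logsums_by_unique_od(alts))
```

In this toy table, four alternatives collapse to two unique OD pairs, so the stand-in calculation runs twice instead of four times. The saving grows with the number of durations per OD pair.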
  • Plan to wrap up TVPB pull request and then merge/reconcile all the PRs for a release later this month
    • Better for us to pull Clint's updates without tracing than to leave them orphaned
    • For release planning, should do periodic (every couple of weeks) review of outstanding code and pull if easy
    • Will release TVPB by end of year, along with ARC's improvements
    • May include some of the estimation improvements as well
    • We will first pull the multizone branch to develop and Clint will rebase his code off of multizone
    • It's easier for the author to deal with merging
    • I'll deal with the other smaller PRs
    • Multizone code works for both spines
  • Oregon coordinated move to ActivitySim
    • Thinking about a coordinated move to Asim by all the Oregon agencies so each doesn't have to do it on their own
    • Alex shared guidance memo drafted by Joel
    • Some good thoughts in the memo on the multizone system features for the user guide
    • For multizone approach, make sure to include network coding implications
  • Chat with LBL/Berkeley folks
    • Building BEAM/MATSim/Asim/UrbanSim models for DOE
    • Currently have models in Detroit, Austin, and SF with 6 more regions planned
    • They would like a more generic/simpler model to be included with Asim so it's easier to set up in new regions
    • Also too many detailed skims in the current example - something simpler to start would be good
    • This approach is better than taking away from / hacking up the TM1 example
    • Also like the idea of model design templates so new users can simply select a template and configure it
    • Reducing barriers to entry is key
    • Let's discuss these ideas in more detail at the next call