Project Meeting 2020.10.20

Technical call

  • Need to fix https://github.com/ActivitySim/activitysim/pull/349 asap
    • SEMCOG got stuck running the example according to the website instructions
    • We'll pull and release a new version today
    • Probably better not to pin dependency versions; let the package take advantage of dependency updates and fix issues as they arise
  • Pre-computing / caching to support the TVPB (transit virtual path builder) - Jeff is making progress
    • He's working on understanding the tradeoffs of pre-computing versus on-demand
    • He implemented skipping re-calculation of duplicate tap-to-tap utilities, which sped up the Marin example by 4x
    • He also implemented caching of tap-to-tap utilities, but this was less advantageous
    • If feasible, we may want to cache the N-best-paths list keyed on (omaz, dmaz, tod, demographic_segment)
    • We could cache it using a fast, multiprocess-friendly format such as Arrow/Feather (see the sketch after this list)
    • We would then either pre-compute the cache or possibly update it on demand
    • For a full sample, pre-computing might be better, but for a 100-household sample, on-demand might be better
    • It depends on how sparse the data is
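
A minimal sketch of what a Feather/Arrow cache of the N-best tap-pair results might look like, assuming a pandas DataFrame keyed on (omaz, dmaz, tod, demographic_segment). The file name, column names, and helper functions below are illustrative assumptions, not ActivitySim's actual implementation:

```python
# Illustrative sketch only -- not ActivitySim's implementation.
# Caches the N-best tap-pair results in Feather (Arrow) format, which is
# fast to read/write and can be memory-mapped by multiple processes.
import pandas as pd
import pyarrow.feather as feather

CACHE_FILE = "tvpb_best_paths_cache.feather"  # hypothetical file name


def write_best_paths_cache(best_paths: pd.DataFrame) -> None:
    # best_paths is assumed to have columns:
    # omaz, dmaz, tod, demographic_segment, btap, atap, utility
    feather.write_feather(best_paths, CACHE_FILE)


def read_best_paths_cache() -> pd.DataFrame:
    # memory_map=True lets worker processes share the on-disk buffer
    # instead of each holding a private copy in RAM
    return feather.read_feather(CACHE_FILE, memory_map=True)


def lookup_best_paths(cache: pd.DataFrame, omaz, dmaz, tod, segment) -> pd.DataFrame:
    # return the (up to) N best tap pairs for one origin MAZ /
    # destination MAZ / time-of-day / demographic segment
    mask = (
        (cache.omaz == omaz)
        & (cache.dmaz == dmaz)
        & (cache.tod == tod)
        & (cache.demographic_segment == segment)
    )
    return cache[mask]
```

Whether such a cache would be pre-computed in full or filled on demand is exactly the tradeoff discussed above.
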
  • Discussion
    • I tried my best to explain things, but I think Doyle needs to explain next time
    • What's the dimensionality of the problem?
      • Marin TM2: 6,000 MAZs and 6,200 TAPs; on average there are 114 TAPs per MAZ for walk access and 7 for drive access. Note that this does not reflect the tap_serves_new_lines function (aka tapLines), which trims MAZ-to-TAP pairs when TAPs farther away do not serve any new lines. If we crop to the 1.2 miles used by tap_serves_new_lines, we get 63 TAPs per MAZ.
    • Marin uses a collapsed set of the MTC TM2 MAZs, which total about 30k MAZs
    • We think it makes sense to pre-compute the path components, but we're not sure about the N-best tap pairs since it's very big and sparse
    • Pre-computing seems like a reasonable/understandable/simple solution - just compute the components (in parallel by omaz), save them, and look them up later. It may not be completely optimal, but it also might be easier for code maintenance and developer use than something a bit better but more complex
    • Does pre-computing create too big a file? How sparse is the data set, and is the tradeoff therefore not worth it?
    • This depends a bit on the settings we specified, which are consistent with TM2:
      • max_paths_across_tap_sets: 3, the total number of N-best tap pairs to keep for each (omaz, dmaz, tod, demographic_segment)
      • max_paths_per_tap_set: 1, the number of N-best tap pairs to keep within each skim set (premium, local, etc.)
    • Marin TM2 has 6,000 MAZs * 63 TAPs * 63 TAPs * 6,000 MAZs * 5 time periods * 3 demographic segments = 2,143,260,000,000 (about 2 trillion) potential paths (the arithmetic is worked below this list), though not all are evaluated since many are not needed and some TAP-TAP pairs are unavailable (in some or all time periods, etc.)
    • This is a big number, so we want a solution that weighs the tradeoffs of runtime, RAM, disk space, behavioral design, code maintenance, developer burden, etc.
    • Jeff to give us an update next week
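
For reference, the upper-bound path count quoted above can be reproduced from the figures given on the call (a quick arithmetic check, not new data):

```python
# Upper bound on potential Marin TM2 paths, using the figures quoted above
# (6,000 MAZs and ~63 TAPs per MAZ after the 1.2-mile tapLines trim).
mazs = 6_000
taps_per_maz = 63
time_periods = 5
demographic_segments = 3

potential_paths = (
    mazs * taps_per_maz        # origin MAZ -> boarding TAP
    * taps_per_maz * mazs      # alighting TAP -> destination MAZ
    * time_periods
    * demographic_segments
)
print(f"{potential_paths:,}")  # 2,143,260,000,000 (~2 trillion)
```
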
  • Profiling of memory usage:
    • MTC skims: 6.7 GB in memory; 826 skims over 5 time periods, 1,475 zones (a rough size estimate is sketched below this list)
    • SEMCOG skims: 47 GB in memory; 1,480 skims over 5 time periods, 2,900 zones
    • @Stefan to add PSRC numbers - roughly 60 GB for 870 skims over 12 time periods, 3,900 zones
    • Multiprocessing creates lots of tables depending on the chunk size, so this uses a lot of memory as well
    • Next time discuss stats on pipeline table sizes
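
A rough back-of-the-envelope for the skim memory figures above; the 4-byte (float32) cell size is my assumption, not something stated on the call:

```python
# Skims are zones x zones matrices, one per skim; estimate total memory
# assuming 4-byte float32 cells.
def skim_gb(num_skims: int, zones: int, bytes_per_cell: int = 4) -> float:
    return num_skims * zones * zones * bytes_per_cell / 1024**3


print(f"MTC:    {skim_gb(826, 1475):.1f} GB")   # ~6.7 GB, matching the figure above
print(f"SEMCOG: {skim_gb(1480, 2900):.1f} GB")  # ~46 GB, close to the 47 GB above
```
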
  • Discuss estimation feature completion progress and #354 next time