Project Meeting 2020.11.03 - ActivitySim/activitysim GitHub Wiki

Technical call

  • Discuss telecommute model design with Joel
    • This is the initial design, all components are up for discussion
    • What exactly is telecommuting? It is the replacement of commute travel with working at home
    • Day of the week matters a lot in telecommute modeling
    • The model is based on the sandag model spec
    • What about worker occupation? It is not in many synpops, but it is important for telecommute prediction
    • What are the policy knobs for what-if analysis? Both pre and post COVID
    • Income also an important variable
    • Telecommute frequency model affects CDAP, INMTF, and NMTSF submodels
    • How is work at home usually modeled?
    • Add a work from home or work out of the home model as well
    • If work from home, then there's no commute to replace with telecommuting
    • ActivitySim doesn't have a work from home model but it needs it
    • DaySim has it
    • MAG telecommute model is like SANDAG's model
    • MAG did COVID scenario analysis with it; Joel has a paper they wrote
    • They varied work from home rates by worker occupation
    • The SEMCOG data for estimation should work fine
    • The MWCOG model, where RSG is also building an ActivitySim model, has few employment types
    • Recommend:
      • Worker telecommute frequency model
      • Work from home extension to work location choice
      • Use SEMCOG data to estimate
    • Work from home model includes an employment accessibility term
    • SANDAG's work from home model doesn't have occupation/industry since it was estimated from an older survey
    • Person occupation/industry can be supported by ActivitySim since it is just additional data on the person table
    • The location choice size term selector is user-definable but can be just one person variable; it is currently income
    • Person occupation is important for COVID analysis
    • We won't implement downstream effects for all models, just some to illustrate that it works; we don't want to bite off too much at once
    • SEMCOG wants to get going so comments due next week
    • Can we include transit service in the model estimation? The Bay Area data shows this matters for telecommuting
    • Maybe we could pool the Bay Area data and the SEMCOG data, but that is probably too big for the scope
    • SEMCOG will contribute the example, which could later be updated into a Bay Area example
    • We'll create a wiki page with Joel's initial design presentation and Wu's background info
    • The presentation is sufficient for the first deliverable
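The telecommute frequency model discussed above is a discrete choice over how often a worker telecommutes, with income and occupation as candidate explanatory variables. A minimal multinomial logit sketch of that idea, with made-up alternatives and coefficient values (purely illustrative; not the SANDAG/MAG spec or any estimated model):

```python
import numpy as np

# Hypothetical alternatives and coefficients; values are illustrative only.
ALTS = ["0_days", "1_3_days", "4_plus_days"]
COEFS = {
    "asc":         np.array([0.0, -1.5, -2.5]),  # alternative-specific constants
    "high_income": np.array([0.0,  0.6,  0.8]),  # e.g. income above some threshold
    "office_occ":  np.array([0.0,  1.0,  1.2]),  # office-type occupation indicator
}

def telecommute_probabilities(high_income: bool, office_occ: bool) -> dict:
    """Multinomial logit choice probabilities for telecommute frequency."""
    utility = COEFS["asc"].copy()
    if high_income:
        utility += COEFS["high_income"]
    if office_occ:
        utility += COEFS["office_occ"]
    expu = np.exp(utility - utility.max())  # numerically stable softmax
    probs = expu / expu.sum()
    return dict(zip(ALTS, probs))

print(telecommute_probabilities(high_income=True, office_occ=True))
```

In an actual ActivitySim implementation the utilities would live in a CSV expression spec rather than code, and a work-from-home model would run first so that full-time home workers are not also given a telecommute frequency.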
  • Discuss progress on TVPB / skim performance and caching strategies with Doyle
    • Jeff Newman's feather backend for skims testing is very promising
    • So Doyle is testing a feather memmap caching backend in ActivitySim to substitute for (or supplement) the in-memory skims
    • This would free up a lot of RAM for other purposes and hopefully yield faster overall runtimes
    • In addition, the skim architecture needed some re-writing / updating based on all the updates over the years so that's getting cleaned up too
    • There are performance differences between numpy memmap and feather memmap
    • Instead of 6 GB of RAM for the TM1 skims, there's a memmap file on disk and 200 MB RAM usage
    • Performance is similar so far when single threaded
    • This could replace the need for scaled int skims
    • Could still store scaled int skims in the cache too in order to reduce its size
    • With memmapped designs, you need to organize the data in contiguous blocks matching how it will be accessed in order to get good performance
    • So making the TimeOfDay dimension the first index for the existing ODT (Origin, Destination, TimeOfDay) queries makes the queries much faster
    • This means rearranging the skims after reading them from the OMX files but before putting them in the cache
    • So may be able to avoid the existing multiprocessing shared memory setup that currently handles skims and needs to be extended for multiple zones and TVPB data
    • SANDAG has 4D skims, with a VOT bin as well; is that a problem? It can either be collapsed into the third dimension or support can be extended with a little work
    • How much disk space is the cache? About the same as RAM would be
    • Want to use a fast SSD drive
    • Disk/RAM usage in modern OSs is changing because of how paging works so the traditional disk/RAM distinctions are blurring
    • Have you tested threading yet? Not yet, but it should work because the OS is doing the paging
    • All this work is in support of improved skim/TVPB data management for eventual TVPB performance tuning
    • If we can make the cache really fast then let's do that, since it makes the whole system easier to maintain; if not, then back to multiprocessing shared memory objects
    • Much RAM usage is for the household processing chunks/threads so maybe we should focus more on that?
    • We could work on slimming them down in terms of data types
    • TOD is stored as 8 bytes, for example
    • CT-RAMP has this issue as well
    • Freeing up more RAM to give to the chunks/threads would provide more throughput as well
    • Expect to focus on TVPB data in addition to just skims later this week
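The caching and layout points above (memmap the skims to disk, put the TimeOfDay dimension first so each time period's full O-D table is one contiguous block, and optionally store scaled ints to shrink the cache) can be sketched with a plain numpy memmap. The file name, array sizes, and int16 scaling here are illustrative assumptions, not ActivitySim's actual cache format:

```python
import numpy as np
import os, tempfile

ZONES, TODS = 1000, 5
path = os.path.join(tempfile.mkdtemp(), "skim_cache.dat")

# Build the cache with a TOD-first (T, O, D) layout, so reading one time
# period's full O-D matrix touches one contiguous region of the file.
# Scaled int16 (e.g. minutes * 100) halves the footprint versus float32.
cache = np.memmap(path, dtype=np.int16, mode="w+", shape=(TODS, ZONES, ZONES))
cache[:] = np.random.default_rng(0).integers(0, 12000, size=cache.shape, dtype=np.int16)
cache.flush()
del cache  # close the writer

# Reader: the OS pages in only the blocks actually touched, so resident RAM
# stays small even though the file is the full skim set.
skims = np.memmap(path, dtype=np.int16, mode="r", shape=(TODS, ZONES, ZONES))
am_peak = skims[1]                  # one TOD's O-D table: a contiguous slice
time_min = am_peak[10, 20] / 100.0  # undo the int16 scaling
print(time_min)
```

With the original OMX-style (O, D, T) ordering, pulling one time period would instead stride through the whole file, which is why the skims are rearranged after reading from OMX and before writing the cache.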
  • Discuss progress on running the full scale TM2 Marin work tour mode choice example with me
    • I've got the full scale Marin TM2 work tour mode choice example running on my development machine
    • It's just running single threaded for now since Jeff is working on the multiprocessing/performance stuff
    • It includes the tap lines trimming functionality
    • I've summarized mode shares and boarding tap counts
    • Work tour transit mode share is 19% in Marin TM2 and 18% in asim right now
    • But no drive transit, which is 6% in Marin TM2
    • Also the tap count distribution looks pretty good, but there are some outliers that need review, such as tap 5117
    • There's lots of good logging/tracing to review for debugging
    • Plan to trace a couple of HHs to find the issues - no drive transit, and tap 5117 being popular in TM2 but not in asim
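Summaries like the mode shares and boarding tap counts above are straightforward to pull from the model's output tour table. A small pandas sketch, using hypothetical column names (`tour_mode`, `boarding_tap`) and toy data that may not match the actual Marin example outputs:

```python
import pandas as pd

# Toy work-tour results table; column names and values are illustrative.
tours = pd.DataFrame({
    "tour_mode":    ["WALK_TRANSIT", "DRIVE", "WALK_TRANSIT", "DRIVE", "WALK"],
    "boarding_tap": [5117, None, 231, None, None],
})

# Transit mode share across work tours
is_transit = tours["tour_mode"].str.contains("TRANSIT")
share = is_transit.mean()
print(f"transit share: {share:.0%}")

# Boarding counts by TAP, to spot outliers like tap 5117
tap_counts = (tours.loc[is_transit, "boarding_tap"]
                   .value_counts()
                   .rename_axis("tap")
                   .rename("boardings"))
print(tap_counts)
```

Comparing a table like `tap_counts` between the TM2 reference run and the ActivitySim run is one way to flag taps whose boardings diverge and then pick households to trace.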
  • Clint and Jeff to discuss ARC runtimes and improvements