Project Meeting 2020.11.03 - ActivitySim/activitysim GitHub Wiki

Technical call

Discuss telecommute model design with Joel
- This is the initial design, all components are up for discussion
- What exactly is telecommuting? It is the replacement of travel
- Day of the week matters a lot in telecommute modeling
- The model is based on the SANDAG model spec
- What about worker occupation? It is not in many synpops, but it is important for telecommute prediction
- What are the policy knobs for what-if analysis? Both pre and post COVID
- Income is also an important variable
- Telecommute frequency model affects CDAP, INMTF, and NMTSF submodels
- How is work at home usually modeled?
- Add a work from home or work out of the home model as well
- If work from home, then there's no commute to replace with telecommuting
- ActivitySim doesn't have a work from home model but it needs it
- DaySim has it
- MAG telecommute model is like SANDAG's model
- MAG did COVID scenario analysis with it; Joel has a paper they wrote
- They varied work from home rates by worker occupation
- The SEMCOG data for estimation should work fine
- The MWCOG model, where RSG is also building an ActivitySim model, has few employment types
- Recommend:
  - Worker telecommute frequency model
  - Work from home extension to work location choice
  - Use SEMCOG data to estimate
- Work from home model includes an employment accessibility term
- SANDAG's work from home model doesn't have occupation/industry since it was estimated from an old survey
- Person occupation/industry can be supported by activitysim since it is just additional data on the person table
- The location choice size term selector is user definable, but it can be only one person variable, and it is currently income
- Person occupation is important for COVID analysis
- We won't implement downstream effects for all models, just some to illustrate that it works; we don't want to bite off too much at once
- SEMCOG wants to get going so comments due next week
- Can we include transit service in the model estimation? The Bay Area data shows this matters for telecommuting
- Maybe we could pool the Bay Area data and the SEMCOG data, but that is probably too big for the scope
- SEMCOG will contribute the example, so it could later be updated for a Bay Area example
- We'll create a wiki page with Joel's initial design presentation and Wu's background info
- The presentation is sufficient for the first deliverable
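
The points above about occupation being just another person-table column, and about varying work from home rates by occupation for scenario analysis, can be sketched as follows. This is a minimal illustration only, not the proposed model design: the column names (`is_worker`, `occupation`), the occupation categories, the rates, and the function `simulate_work_from_home` are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical persons table; columns and values are made up for illustration.
persons = pd.DataFrame(
    {
        "person_id": [1, 2, 3, 4, 5, 6],
        "is_worker": [True, True, True, True, False, True],
        "occupation": ["office", "office", "retail", "manufacturing", None, "retail"],
    }
).set_index("person_id")

# Hypothetical policy knob: work from home probability by occupation.
WFH_RATE_BY_OCCUPATION = {"office": 0.40, "retail": 0.05, "manufacturing": 0.02}


def simulate_work_from_home(persons, rates, seed=0):
    """Draw a work from home flag per person from occupation-specific rates."""
    rng = np.random.default_rng(seed)
    # Occupations without a rate (and non-workers) get probability zero.
    prob = persons["occupation"].map(rates).astype(float).fillna(0.0)
    prob = prob.where(persons["is_worker"], 0.0)
    draws = rng.random(len(persons))
    return pd.Series(draws < prob.to_numpy(), index=persons.index, name="work_from_home")


persons["work_from_home"] = simulate_work_from_home(persons, WFH_RATE_BY_OCCUPATION)

# If a worker works from home, there is no commute to replace, so only the
# remaining workers would be passed to a telecommute frequency model.
persons["telecommute_eligible"] = persons["is_worker"] & ~persons["work_from_home"]
```

Changing the rates dictionary is the what-if lever here; a real implementation would presumably express it through model coefficients or settings rather than a hard-coded lookup.
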
Discuss progress on TVPB / skim performance and caching strategies with Doyle
- Jeff Newman's feather backend for skims testing is very promising
- So Doyle is testing a feather memmap caching backend for activitysim, either replacing or supplementing the in-memory skims
- This would free up a lot of RAM to use for other purposes and hopefully get overall faster runtimes
- In addition, the skim architecture needed some re-writing / updating based on all the updates over the years so that's getting cleaned up too
- There are performance differences between numpy memmap and feather memmap
- Instead of 6 GB of RAM for the TM1 skims, there's a memmap file on disk and 200 MB RAM usage
- Performance is similar so far single threaded
- This could replace the need for scaled int skims
- Could still store scaled int skims in the cache too in order to reduce its size
- With memmapped designs, you need to organize the data in contiguous blocks that match how the data will be accessed in order to get good performance
- So making the TimeOfDay dimension the first index in the existing ODT (Origin, Destination, TimeOfDay) queries makes the queries much faster
- This means rearranging the skims after reading them from the OMX files but before putting them in the cache
- So may be able to avoid the existing multiprocessing shared memory setup that currently handles skims and needs to be extended for multiple zones and TVPB data
- SANDAG has 4D skims, with VOT bin as well; is that a problem? Either the VOT bin can be collapsed into the 3rd dimension, or support can be extended with a little work
- How much disk space is the cache? About the same as RAM would be
- Want to use a fast SSD drive
- Disk/RAM usage in modern OSs is changing because of how paging works so the traditional disk/RAM distinctions are blurring
- Have you tested threading yet? Not yet but believes it should work because the OS is doing the paging
- All this work is in support of improved skim/tvpb data management for eventual tvpb performance tuning
- If the cache turns out to be fast enough, then let's use it, since it makes the whole system easier to maintain; if not, then it's back to multiprocessing shared memory objects
- Much RAM usage is for the household processing chunks/threads so maybe we should focus more on that?
- We could work on slimming them down in terms of data types
- TOD is stored as 8 bytes for example
- CT-RAMP has this issue as well
- Freeing up more RAM to give to the chunks/threads would provide more throughput as well
- Expect to focus on tvpb data in addition to just skims later this week
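
Two of the memory ideas above can be sketched in a few lines: keeping skims in a memory-mapped file with the time-of-day axis first, and storing time-of-day values in a narrower integer type. This is a rough sketch under stated assumptions, not the backend Doyle is testing: it uses a plain NumPy `.npy` memmap rather than feather, random numbers rather than skims read from OMX, and the names `build_skim_cache`, `lookup`, and `skims_tod.npy` are hypothetical.

```python
import os
import tempfile

import numpy as np


def build_skim_cache(skims_odt, path):
    """Write an (origin, destination, time-of-day) array to a .npy memmap,
    reordered so time-of-day is the leading axis and each period's
    origin-destination matrix is one contiguous block on disk."""
    skims_tod = np.moveaxis(skims_odt, -1, 0)  # (O, D, T) -> (T, O, D)
    cache = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float32, shape=skims_tod.shape
    )
    cache[:] = skims_tod  # copied into C-ordered (T, O, D) storage
    cache.flush()
    del cache  # close the writable mapping


def lookup(cache, orig, dest, tod):
    """Vectorized ODT lookup; the OS pages in only the parts of the file touched."""
    return np.asarray(cache[tod, orig, dest])


# Stand-in for skims read from OMX files (sizes are arbitrary).
n_zones, n_periods = 50, 5
rng = np.random.default_rng(0)
skims_odt = rng.random((n_zones, n_zones, n_periods)).astype(np.float32)

path = os.path.join(tempfile.mkdtemp(), "skims_tod.npy")
build_skim_cache(skims_odt, path)

# Read-only mapping: the array lives on disk, not in process memory.
cache = np.load(path, mmap_mode="r")

orig = np.array([0, 3, 7])
dest = np.array([1, 4, 9])
tod = np.array([0, 2, 4])
values = lookup(cache, orig, dest, tod)

# Slimming data types: a handful of time periods fits in 1 byte, not 8.
tod_wide = np.array([1, 2, 3, 4, 5] * 1000, dtype=np.int64)
tod_small = tod_wide.astype(np.int8)
```

Because several processes can map the same read-only file, a layout like this is one way the cache might stand in for the shared memory setup, though whether it is fast enough is exactly what is still being tested.
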
Discuss progress on running the full scale TM2 Marin work tour mode choice example with me
- I've got the full scale Marin TM2 work tour mode choice example running on my development machine
- It's just running single threaded for now since Jeff is working on the multiprocessing/performance stuff
- It includes the tap lines trimming functionality
- I've summarized mode shares and boarding tap counts
- Work tour transit mode share is 19% in Marin TM2 and 18% in asim right now
- But no drive transit, which is 6% in Marin TM2
- Also the tap count distribution looks pretty good, but there are some outliers that need review, such as tap 5117
- There's lots of good logging/tracing to review for debugging
- Plan to trace a couple HHs to find the issues - no drive transit and tap 5117 being popular in TM2 but not in asim
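
The mode share comparison described above can be sketched with pandas. The tour records, mode names, and the helper names `mode_shares` and `compare_shares` are all hypothetical and are not taken from either model's outputs; the point is only that a mode missing from one run shows up as a zero share rather than dropping out of the table.

```python
import pandas as pd


def mode_shares(tours):
    """Share of tours by mode, as fractions summing to 1."""
    return tours["tour_mode"].value_counts(normalize=True)


def compare_shares(reference, candidate):
    """Side-by-side mode shares; modes absent from one run get a share of 0."""
    table = pd.concat(
        {"reference": mode_shares(reference), "candidate": mode_shares(candidate)},
        axis=1,
    ).fillna(0.0)
    table["difference"] = table["candidate"] - table["reference"]
    return table.sort_values("difference")


# Made-up work tours for two runs; the candidate run has no drive transit.
reference = pd.DataFrame(
    {"tour_mode": ["DRIVE"] * 7 + ["WALK_TRANSIT"] * 2 + ["DRIVE_TRANSIT"]}
)
candidate = pd.DataFrame({"tour_mode": ["DRIVE"] * 8 + ["WALK_TRANSIT"] * 2})

table = compare_shares(reference, candidate)
print(table)
```

The same pattern, grouping on a boarding tap column instead of the mode column, would flag taps that are popular in one run but not the other.
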
Clint and Jeff to discuss ARC runtimes and improvements