Project Meeting 2020.11.03 - ActivitySim/activitysim GitHub Wiki
Technical call
Discuss telecommute model design with Joel
This is the initial design, all components are up for discussion
What exactly is telecommuting? It is the replacement of travel
Day of the week matters a lot in telecommute modeling
The model is based on the SANDAG model spec
What about worker occupation? It is not in many synpops, but it is important for telecommute prediction
What are the policy knobs for what-if analysis? Both pre and post COVID
Income is also an important variable
Telecommute frequency model affects CDAP, INMTF, and NMTSF submodels
How is work at home usually modeled?
Add a work from home or work out of the home model as well
If work from home, then there's no commute to replace with telecommuting
ActivitySim doesn't have a work from home model but it needs it
DaySim has it
MAG telecommute model is like SANDAG's model
MAG did COVID scenario analysis with it; Joel has a paper they wrote
They varied work from home rates by worker occupation
The SEMCOG data for estimation should work fine
The MWCOG model, where RSG is also building an ActivitySim model, has few employment types
Recommend:
Worker telecommute frequency model
Work from home extension to work location choice
Use SEMCOG data to estimate
Work from home model includes an employment accessibility term
SANDAG work from home model doesn't have occupation/industry since estimated from old survey
Person occupation/industry can be supported by activitysim since it is just additional data on the person table
The location choice size term selector is user-definable, but it can be only one person variable, and it is currently income
Person occupation is important for COVID analysis
We won't implement downstream effects for all models, just some to illustrate that it works; we don't want to bite off too much at once
SEMCOG wants to get going so comments due next week
Can we include transit service in the model estimation? The Bay Area data shows this matters for telecommuting
Maybe we could pool the Bay Area data and the SEMCOG data, but that is probably too big for the scope
SEMCOG will contribute the example, so it could later be updated for a Bay Area example
We'll create a wiki page with Joel's initial design presentation and Wu's background info
The presentation is sufficient for the first deliverable
Discuss progress on TVPB / skim performance and caching strategies with Doyle
Jeff Newman's feather backend for skims testing is very promising
So Doyle is testing a feather memmap caching backend for activitysim that would replace (or be offered alongside) in-memory skims
This would free up a lot of ram to use for other purposes and hopefully get overall faster runtimes
In addition, the skim architecture needed some re-writing / updating based on all the updates over the years so that's getting cleaned up too
There are performance differences between numpy memmap and feather memmap
Instead of 6 GB of RAM for the TM1 skims, there's a memmap file on disk and 200 MB RAM usage
Performance is similar so far single threaded
This could replace the need for scaled int skims
Could still store scaled int skims in the cache too in order to reduce its size
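A minimal sketch of what a scaled int skim could look like, assuming a hypothetical scale factor of 100 and int16 storage (neither is taken from the activitysim code); the helper names are illustrative only:

```python
import numpy as np

SCALE = 100  # hypothetical scale factor: keeps two decimal places


def to_scaled_int(skim, scale=SCALE):
    """Pack a float skim into int16 by multiplying by a fixed scale factor."""
    scaled = np.round(skim * scale)
    info = np.iinfo(np.int16)
    if scaled.min() < info.min or scaled.max() > info.max:
        raise ValueError("skim values overflow int16 at this scale")
    return scaled.astype(np.int16)


def from_scaled_int(scaled, scale=SCALE):
    """Recover approximate float values from the packed skim."""
    return scaled.astype(np.float32) / scale


skim = np.array([[0.0, 12.34], [7.5, 3.21]], dtype=np.float32)
packed = to_scaled_int(skim)
restored = from_scaled_int(packed)
```

The packed array takes half the bytes of the float32 original, at the cost of rounding to the chosen precision and a limited value range.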
With memmapped designs, you need to organize the data in contiguous blocks that match how the data will be accessed in order to get good performance
So making the TimeOfDay dimension the first index in the existing ODT (Origin, Destination, TimeOfDay) queries makes the queries much faster
This means rearranging the skims after reading them from the OMX files but before putting them in the cache
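The rearrangement described above could be sketched as follows. This uses a plain numpy memmap rather than the feather backend under test, and the dimensions, file name, and `lookup` helper are all hypothetical:

```python
import os
import tempfile

import numpy as np

n_zones, n_tod = 4, 3  # tiny hypothetical dimensions

# Stand-in for skims read from OMX, indexed (origin, destination, tod)
odt = np.arange(n_zones * n_zones * n_tod, dtype=np.float32)
odt = odt.reshape(n_zones, n_zones, n_tod)

# Move TimeOfDay to the first axis and make the result contiguous, so each
# time period is one unbroken block on disk
tod_first = np.ascontiguousarray(np.moveaxis(odt, -1, 0))

# Write the rearranged array to a memmap file (the on-disk cache)
cache_path = os.path.join(tempfile.mkdtemp(), "skim_cache.mmap")
cache = np.memmap(cache_path, dtype=np.float32, mode="w+", shape=tod_first.shape)
cache[:] = tod_first
cache.flush()
del cache

# Reopen read-only; the OS pages data in as it is touched
skims = np.memmap(cache_path, dtype=np.float32, mode="r",
                  shape=(n_tod, n_zones, n_zones))


def lookup(orig, dest, tod):
    """Vectorized ODT query against the TOD-first cache."""
    return skims[tod, orig, dest]


orig = np.array([0, 1, 3])
dest = np.array([2, 2, 0])
tod = np.array([1, 1, 2])
values = lookup(orig, dest, tod)
```

Callers still pass origin, destination, and time of day; only the storage order changes, so queries for one time period touch neighbouring pages of the file.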
So we may be able to avoid the existing multiprocessing shared memory setup that currently handles skims and needs to be extended for multiple zones and TVPB data
SANDAG has 4D skims, with VOT bin as well, is that a problem? It can either be collapsed to a 3rd dimension or support can be extended with a little work
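One way the collapse option might look, assuming a hypothetical (tod, vot, origin, destination) layout and a made-up `combined_index` helper; this is a sketch of the idea, not SANDAG's or activitysim's actual layout:

```python
import numpy as np

n_zones, n_tod, n_vot = 3, 2, 2  # tiny hypothetical dimensions

# 4D skims indexed (tod, vot, origin, destination)
skims_4d = np.arange(n_tod * n_vot * n_zones * n_zones, dtype=np.float32)
skims_4d = skims_4d.reshape(n_tod, n_vot, n_zones, n_zones)

# Collapse (tod, vot) into one combined leading index; for a contiguous
# array this is a view, so no data is copied
skims_3d = skims_4d.reshape(n_tod * n_vot, n_zones, n_zones)


def combined_index(tod, vot):
    """Map a (tod, vot) pair to its position on the collapsed axis."""
    return tod * n_vot + vot


value = skims_3d[combined_index(1, 0), 2, 1]
```

Existing 3D lookup code can then be reused unchanged, with the VOT bin folded into the leading index.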
How much disk space is the cache? About the same as RAM would be
Want to use a fast SSD drive
Disk/RAM usage in modern OSs is changing because of how paging works so the traditional disk/RAM distinctions are blurring
Have you tested threading yet? Not yet but believes it should work because the OS is doing the paging
All this work is in support of improved skim/tvpb data management for eventual tvpb performance tuning
If the cache is fast enough then let's use it, since it makes the whole system easier to maintain; if not, then back to multiprocessing shared memory objects
Much RAM usage is for the household processing chunks/threads so maybe we should focus more on that?
We could work on slimming them down in terms of data types
TOD is stored as 8 bytes for example
CT-RAMP has this issue as well
Freeing up more RAM to give to the chunks/threads would provide more throughput as well
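A rough illustration of the data type slimming idea, assuming a hypothetical column of one million time-of-day values in 48 periods; the `downcast` helper is illustrative and not part of activitysim:

```python
import numpy as np

n_tours = 1_000_000  # hypothetical number of rows in a chunk
rng = np.random.default_rng(0)

# Time-of-day periods 1..48 stored as 8-byte integers
tod_int64 = rng.integers(1, 49, size=n_tours, dtype=np.int64)


def downcast(values):
    """Return values in the smallest signed integer dtype that holds them."""
    for dtype in (np.int8, np.int16, np.int32):
        info = np.iinfo(dtype)
        if values.min() >= info.min and values.max() <= info.max:
            return values.astype(dtype)
    return values


tod_small = downcast(tod_int64)
bytes_saved = tod_int64.nbytes - tod_small.nbytes
```

For this one column the footprint drops from 8 bytes to 1 byte per row. The same check could be applied to other small-range integer columns in a chunk.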
Expect to focus on tvpb data in addition to just skims later this week
Discuss progress on running the full scale TM2 Marin work tour mode choice example with me
I've got the full scale Marin TM2 work tour mode choice example running on my development machine
It's just running single threaded for now since Jeff is working on the multiprocessing/performance stuff
It includes the tap lines trimming functionality
I've summarized mode shares and boarding tap counts
Work tour transit mode share is 19% in Marin TM2 and 18% in asim right now
But no drive transit, which is 6% in Marin TM2
Also the tap count distribution looks pretty good, but there are some outliers that need review, such as tap 5117
There's lots of good logging/tracing to review for debugging
Plan to trace a couple HHs to find the issues - no drive transit and tap 5117 being popular in TM2 but not in asim
Clint and Jeff to discuss ARC runtimes and improvements