Project Meeting 2020.10.13 - ActivitySim/activitysim GitHub Wiki

Technical Call

Update on TVPB
- Now have the Marin County example work tour mode choice model up and running
  - Have run 170k work tours successfully
  - Needed to clean up the draft expression files I created, add more tracing to the TVPB, and make some chunking improvements
  - Adaptive chunking appears to be working well
- Next steps
  - Pre-computing/caching: there are many redundant calculations because the utilities are not specific to individual people
  - There's also a lot of redundancy in the new TVPB expression files, so we may be able to gain some efficiency there as well
  - There's an interesting problem with how Boolean expressions are evaluated in numexpr versus regular Python when the Boolean term is on the left side of the expression
  - Python throws a warning in this case; we should catch it and notify the user, since this increases runtime
  - Plan to discuss pre-computing/caching more next week
- After Jeff gets something working, I'll run the full example to compare against the original Marin TM2 example
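For the "catch the warning and notify the user" item, here is a minimal sketch using only the standard-library `warnings` and `logging` modules. The notes don't say which warning class or message is raised, so this records every warning; `evaluate_and_report` and `fake_fallback` are hypothetical names, not ActivitySim functions.

```python
import logging
import warnings

logger = logging.getLogger(__name__)

def evaluate_and_report(expr_label, evaluate):
    """Call evaluate() and log any warnings it emits, tagged with expr_label.

    evaluate is a hypothetical zero-argument callable standing in for an
    expression evaluator. Returns (result, list of warning messages).
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record repeats instead of suppressing them
        result = evaluate()
    messages = [str(w.message) for w in caught]
    for msg in messages:
        logger.warning("expression %r raised a warning (may increase runtime): %s",
                       expr_label, msg)
    return result, messages

# Usage with a stand-in evaluator that warns, as a slower fallback might.
def fake_fallback():
    warnings.warn("evaluating in Python space", UserWarning)
    return 42

result, msgs = evaluate_and_report("(a > b) * c", fake_fallback)
```

Tagging each message with the expression label is what would let a user find the offending line in an expression file.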
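The pre-computing/caching idea rests on the utilities depending only on things like origin, destination, and time period rather than on the person. A minimal sketch, assuming a hypothetical `path_utility` keyed by (origin zone, destination zone, period) with placeholder arithmetic, shows repeated keys being computed once:

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts real evaluations, to show the cache working

@lru_cache(maxsize=None)
def path_utility(orig_zone: int, dest_zone: int, period: str) -> float:
    """Hypothetical TVPB path utility that depends only on (orig, dest, period)."""
    CALLS["count"] += 1
    # Placeholder arithmetic standing in for the real expression-file evaluation.
    return -0.1 * abs(dest_zone - orig_zone) - (0.5 if period == "AM" else 0.3)

# Four tours but only two distinct (orig, dest, period) keys.
tours = [(1, 5, "AM"), (1, 5, "AM"), (2, 7, "PM"), (1, 5, "AM")]
utilities = [path_utility(o, d, p) for o, d, p in tours]
```

A per-call cache only illustrates the idea; a vectorized implementation would more likely de-duplicate the keys up front (e.g. with pandas) and evaluate each unique key once.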
Scaled integer skims
- We should think about what ActivitySim publishes as its expected input skim formats/assumptions
- For example, if we store skims as 16-bit unsigned ints, the range is 0 to 65,535
- We'd also need to scale float values, e.g. multiply them by 100 to keep two decimal places
- Can we store time in seconds with this data type? 60 * 60 * 18 = 64,800, so we can only handle up to about 18 hours of seconds
- DaySim does this (see scaling here), which suggests it should be acceptable for travel models in general
- Reducing storage from float32 to 16-bit integers would halve the memory used by skims
- The OMX project is discussing using Apache Arrow for faster disk-based I/O
- Maybe we could do something similar for skims in ActivitySim? Maybe we don't need to load them into RAM.
- ActivitySim's existing feature for reloading skims from disk using memmap is similar
- Maybe ActivitySim could add a skim setting to choose between float_32 and scaled_unsigned_int_16?
- We want to remain unit agnostic though
- And maybe we could specify different internal data types by skim
- Freeing up more RAM by using leaner data types means more RAM for chunking and multiprocessing
- It's a requirement that ActivitySim can run the PSRC model: 12 time periods * 60 skims * 4,000 zones
- This is a good chance to better define and publish the data model
- How much RAM does the existing example use for skims? We'll check
- We may want to use more efficient data types in expressions as well, say float32 instead of float64
- We'll review this as part of the performance task as well
- Jeff to work on this after he gets TVPB to a good place
- We want to do all our arithmetic in full precision, though, since some models have many choices with small utilities and probabilities
- The scaling/unscaling of skims will happen only in the skims API/class and not in the rest of the system so it is only a storage/RAM issue
- Here are some relevant links on memory and performance from Stefan:
  - Info on why Python uses more memory than some other languages
  - multiprocessing.shared_memory, a new feature in Python 3.8
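The scaled-integer idea can be sketched with NumPy, assuming a fixed scale factor of 100 and skims held as arrays. `encode_skim` and `decode_skim` are hypothetical names, not ActivitySim API. Consistent with the note that scaling happens only inside the skims class, decoding returns float64 so downstream arithmetic stays at full precision.

```python
import numpy as np

SCALE = 100  # assumed: keep two decimal places
UINT16_MAX = int(np.iinfo(np.uint16).max)  # 65,535

def encode_skim(values: np.ndarray, scale: int = SCALE) -> np.ndarray:
    """Store float skim values as scaled uint16 (a storage/RAM concern only)."""
    scaled = np.round(values * scale)
    # Unsigned 16-bit storage cannot hold negatives or anything above 65,535.
    if scaled.min() < 0 or scaled.max() > UINT16_MAX:
        raise ValueError("skim value out of range for scaled uint16 storage")
    return scaled.astype(np.uint16)

def decode_skim(stored: np.ndarray, scale: int = SCALE) -> np.ndarray:
    """Unscale back to float64 so all arithmetic happens at full precision."""
    return stored.astype(np.float64) / scale

times = np.array([[0.0, 12.34], [7.5, 655.35]])  # 655.35 is the largest value at scale 100
stored = encode_skim(times)
restored = decode_skim(stored)

# Times in seconds with scale 1: 18 hours = 64,800 s still fits.
seconds = encode_skim(np.array([60 * 60 * 18.0]), scale=1)
```

Missing-value handling (e.g. NaN skims) is not covered here and would need its own convention, such as a reserved sentinel value.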
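A quick estimate of what the PSRC requirement means for skim memory, reading "12 time periods * 60 skims * 4000 zones" as 60 zone-to-zone matrices per time period (an assumption) and counting only the raw array data:

```python
def skim_memory_gb(periods: int, skims_per_period: int, zones: int,
                   bytes_per_cell: int) -> float:
    """Raw skim storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    cells = periods * skims_per_period * zones * zones  # each skim is zones x zones
    return cells * bytes_per_cell / 1e9

float32_gb = skim_memory_gb(12, 60, 4000, 4)  # float32: 4 bytes per cell -> ~46.08 GB
uint16_gb = skim_memory_gb(12, 60, 4000, 2)   # uint16: 2 bytes per cell -> ~23.04 GB
```

That is 11.52 billion cells in total, so halving the bytes per cell frees roughly 23 GB for chunking and multiprocessing.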
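Why arithmetic should stay in full precision even if storage gets leaner: a generic illustration (not a measurement from any ActivitySim model) of a probability-sized increment disappearing in float32 but surviving in float64.

```python
import numpy as np

tiny = 1e-8  # a small probability-sized increment

# float32 keeps roughly 7 significant digits, float64 roughly 16.
eps32 = float(np.finfo(np.float32).eps)  # about 1.19e-07
eps64 = float(np.finfo(np.float64).eps)  # about 2.22e-16

# Adding the increment to 1.0: float32 rounds it away, float64 keeps it.
lost_in_float32 = bool(np.float32(1.0) + np.float32(tiny) == np.float32(1.0))
lost_in_float64 = bool(np.float64(1.0) + np.float64(tiny) == np.float64(1.0))
```

With many alternatives, such rounding accumulates in cumulative probability sums, which is the concern behind keeping arithmetic in float64.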
Welcome Jeff Newman and CDAP larch integration
- Jeff to start on the CDAP larch integration
- Person types are encoded 1 to 8, one digit each, so at most 10 person types are possible (0-9)
- They could be stored as strings (A, B, C) to expand the set of possible values
- This is fine for this version of ActivitySim; later versions may support more person types, but let's not solve a problem we don't have
- SEMCOG household IDs were very long (14 digits), and ActivitySim couldn't read them. ActivitySim needs to do a better job of publishing expected data types/ranges
- Jeff to work off asim/develop in a new branch within the repo so everyone can more easily participate
- Clint can do the same for ARC if he wants
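The notes don't say exactly where the single-digit limit comes from. One plausible reason, assumed here, is that person-type codes are concatenated into a per-household key string, so each person must occupy exactly one character. `interaction_key` and `letter_key` are hypothetical helpers that show why two-digit codes break such a scheme and how letter codes avoid it:

```python
def interaction_key(person_types):
    """Concatenate numeric person-type codes into one key (hypothetical scheme)."""
    return "".join(str(p) for p in person_types)

# Two-digit codes make keys ambiguous: (1, 12) and (11, 2) both give "112".
collision = interaction_key([1, 12]) == interaction_key([11, 2])

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def letter_key(person_types):
    """One letter per person (1 -> A, 2 -> B, ...), allowing up to 26 types."""
    return "".join(LETTERS[p - 1] for p in person_types)

unambiguous = letter_key([1, 12]) != letter_key([11, 2])  # "AL" vs "KB"
```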
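Publishing expected ID ranges could go with a simple input check. The notes don't record which data type failed on the SEMCOG IDs; a 32-bit integer is used below only as an example of a type too narrow for 14 digits, and the IDs are made up. `check_ids_fit` is a hypothetical helper.

```python
import numpy as np

def check_ids_fit(ids, dtype) -> bool:
    """Return True if every ID fits within the range of the given integer dtype."""
    info = np.iinfo(dtype)
    ids = np.asarray(ids, dtype=np.int64)
    return bool(ids.min() >= info.min and ids.max() <= info.max)

hh_ids = [12_345_678_901_234, 12_345_678_901_235]  # made-up 14-digit IDs
fits_int32 = check_ids_fit(hh_ids, np.int32)  # int32 max is 2,147,483,647 (10 digits)
fits_int64 = check_ids_fit(hh_ids, np.int64)
```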