Project Meeting 2020.10.13

Technical Call

  • Update on TVPB
    • Now have the Marin County example work tour mode choice model up and running
      • Have run 170k work tours successfully
      • Needed to clean up the draft expression files I created, add more tracing to the TVPB, and make some chunking improvements
      • Adaptive chunking appears to be working well
    • Next steps
      • Pre-computing/caching: the TVPB utilities are not specific to individual people, so there are lots of redundant calculations
      • There's also a lot of redundancy in the new TVPB expression files, so we may be able to find some efficiencies there as well
      • There's an interesting issue with how Python evaluates Boolean expressions via numexpr versus regular Python when the Boolean is on the left side of the equation
      • Python raises a warning in this case; we should catch it and notify the user, since this increases runtime
      • Plan to discuss pre-computing/caching more next week
    • After Jeff gets something working, I'll run the full example to compare against the original Marin TM2 example
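The caching idea under "Next steps" can be illustrated with a minimal NumPy sketch, assuming the expensive utility depends only on the origin/destination zone pair. `path_utility` and `cached_path_utilities` are hypothetical stand-ins, not TVPB code: each unique pair is evaluated once and the results are mapped back to every tour.

```python
import numpy as np

n_evaluated = [0]  # counts how many pairs the "expensive" function actually sees

def path_utility(orig, dest):
    # Hypothetical stand-in for an expensive, person-independent utility.
    n_evaluated[0] += len(orig)
    return -0.05 * np.abs(orig - dest) - 0.5

def cached_path_utilities(orig, dest, n_zones):
    # Encode each (orig, dest) pair as one integer key so np.unique works on a 1-D array.
    key = orig.astype(np.int64) * n_zones + dest
    unique_keys, inverse = np.unique(key, return_inverse=True)
    # Evaluate once per unique pair, then map the results back to every tour.
    unique_utils = path_utility(unique_keys // n_zones, unique_keys % n_zones)
    return unique_utils[inverse]

orig = np.array([1, 1, 2, 1, 2])
dest = np.array([3, 3, 3, 3, 3])
utils = cached_path_utilities(orig, dest, n_zones=10)
print(n_evaluated[0])  # 2 unique pairs evaluated instead of 5 tours
```

The savings grow with the ratio of tours to distinct zone pairs, which is why person-independent utilities are good caching candidates.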
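For the numexpr warning, "catch and notify" could look like the following minimal sketch using only the standard `warnings` and `logging` modules. `eval_with_warning_report`, the logger name and the expression name are hypothetical; the stand-in function simply issues a `UserWarning` rather than reproducing the actual numexpr/Python fallback.

```python
import logging
import warnings

logger = logging.getLogger("expressions")  # hypothetical logger name

def eval_with_warning_report(expr_name, func, *args):
    """Call func(*args); log any warnings it raises, tagged with the expression name."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record repeats instead of showing each only once
        result = func(*args)
    for w in caught:
        logger.warning("expression %r raised %s: %s (may slow down evaluation)",
                       expr_name, w.category.__name__, w.message)
    return result, caught

def fake_eval(x):
    # Stand-in for an evaluation that falls back to a slower path and warns.
    warnings.warn("evaluating in Python space", UserWarning)
    return x * 2

result, caught = eval_with_warning_report("home_is_urban", fake_eval, 21)
```

Tagging the warning with the expression name is what lets a user find the offending line in the expression file.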
  • Scaled integer skims
    • We should think about what ActivitySim publishes as its expected input skim formats/assumptions
    • For example, if we store skims as 16-bit unsigned ints, the range is 0 to 65,535
    • We'd also need to scale float values, e.g. multiply by 100 to keep two decimal places
    • Can we store time in seconds with this data type? 60 * 60 * 18 = 64,800, so we can only represent up to about 18 hours in seconds
    • DaySim does this (see scaling here), which suggests it should be acceptable for travel models in general
    • Reducing from float32 to int16 storage would mean half the memory usage for skims
    • The OMX project is discussing using Apache Arrow for faster disk-based I/O
    • Maybe we could do something similar for skims in ActivitySim? Maybe we don't need to load them into RAM.
    • ActivitySim's existing feature that reloads skims from disk via memmap is similar
    • Maybe ActivitySim adds a float_32 versus scaled_unsigned_int_16 skim setting?
    • We want to remain unit agnostic though
    • And maybe we could specify different internal data types by skim
    • Freeing up more RAM by using leaner data types means more RAM for chunking and multiprocessing
    • ActivitySim must be able to run the PSRC model: 12 time periods * 60 skims * 4000 zones
    • This is a good chance to better define and publish the data model
    • How much RAM do skims use in the existing example? We'll check
    • We may want to use more efficient data types in expressions as well - say float32 instead of float64
    • We'll review this as part of the performance task as well
    • Jeff to work on this after he gets TVPB to a good place
    • We want to do all arithmetic in full precision though, since some models have lots of choices with small utilities and probabilities
    • The scaling/unscaling of skims will happen only in the skims API/class and not in the rest of the system, so it is only a storage/RAM issue
    • Stefan shared some relevant links on memory and performance
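The 16-bit range arithmetic in the scaled integer skims discussion can be made concrete with a small NumPy sketch, assuming a per-skim `scale` factor (e.g. 100 for values with two decimals, 1 for times in seconds). `ScaledSkim` is a hypothetical class, not the ActivitySim skims API; it keeps the scaling inside the class and hands back float64 so downstream arithmetic stays in full precision.

```python
import numpy as np

UINT16_MAX = np.iinfo(np.uint16).max  # 65,535

class ScaledSkim:
    """Hypothetical skim wrapper: stores scaled uint16 values, returns float64."""

    def __init__(self, values, scale):
        scaled = np.rint(np.asarray(values, dtype=np.float64) * scale)
        if scaled.min() < 0 or scaled.max() > UINT16_MAX:
            raise ValueError(f"values out of range for uint16 at scale {scale}")
        self.scale = scale
        self.data = scaled.astype(np.uint16)  # 2 bytes per cell instead of 4 for float32

    def get(self, orig, dest):
        # Unscale on the way out so all downstream arithmetic is float64.
        return self.data[orig, dest].astype(np.float64) / self.scale

dist = ScaledSkim([[0.0, 12.34], [12.34, 0.0]], scale=100)  # max value 655.35 at scale 100
time_s = ScaledSkim([[0, 64800]], scale=1)                   # 18 hours in seconds still fits
```

Passing 68,400 seconds (19 hours) at scale 1 raises `ValueError`, which is the kind of published range check the notes call for.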
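As a rough check on the PSRC requirement, a back-of-the-envelope estimate, assuming each of the 12 × 60 skims is a full 4,000 × 4,000 zone matrix held in memory at once (the notes don't say how skims are actually laid out). `skim_bytes` is a hypothetical helper.

```python
import numpy as np

def skim_bytes(n_periods, n_skims, n_zones, dtype):
    # One n_zones x n_zones matrix per skim per time period.
    return n_periods * n_skims * n_zones * n_zones * np.dtype(dtype).itemsize

for dtype in (np.float64, np.float32, np.uint16):
    print(f"{np.dtype(dtype).name}: {skim_bytes(12, 60, 4000, dtype) / 1e9:.2f} GB")
```

Under these assumptions that is 92.16 GB for float64, 46.08 GB for float32 and 23.04 GB for uint16, so 16-bit storage halves the float32 footprint.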
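The "maybe we don't need to load them into RAM" idea can be sketched with NumPy's `np.memmap`, assuming a raw, header-less uint16 file in row-major order. The file name and layout are hypothetical and unrelated to OMX or ActivitySim's actual skim cache format.

```python
import os
import tempfile
import numpy as np

n_zones = 4
skim = (np.arange(n_zones * n_zones, dtype=np.uint16) * 10).reshape(n_zones, n_zones)

path = os.path.join(tempfile.mkdtemp(), "time_skim.u16")  # hypothetical file name
skim.tofile(path)  # raw binary, row-major, no header

# Map the file read-only; the OS pages data in only when cells are accessed.
on_disk = np.memmap(path, dtype=np.uint16, mode="r", shape=(n_zones, n_zones))
values = on_disk[[0, 2], [3, 1]]  # fancy indexing copies just these cells into RAM
```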
  • Welcome Jeff Newman and CDAP larch integration
    • Jeff to start on the CDAP larch integration
    • Person types are encoded 1 to 8 as single digits, so at most 10 person types are possible (0-9)
    • Could be stored as strings (A, B, C) to expand the set of possible values
    • This is fine for this version of activitysim; later versions may support more person types but let's not solve a problem we don't have
    • SEMCOG household IDs were very long (14 digits) and ActivitySim couldn't read them; ActivitySim needs to do a better job of publishing expected data types/ranges
    • Jeff to work off asim/develop in a new branch within the repo so everyone can more easily participate
    • Clint can do the same for ARC if he wants
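The notes don't record exactly why the 14-digit SEMCOG IDs failed, but integer limits show why published ranges matter: int32 tops out at 2,147,483,647 (10 digits), while int64 holds up to about 9.2 × 10^18 (19 digits). A minimal validation sketch with a hypothetical `smallest_int_dtype` helper and illustrative IDs:

```python
import numpy as np

def smallest_int_dtype(ids):
    """Return the narrowest signed integer dtype (int32 or int64) that holds every ID."""
    for dtype in (np.int32, np.int64):
        info = np.iinfo(dtype)
        if ids.min() >= info.min and ids.max() <= info.max:
            return np.dtype(dtype)
    raise ValueError("IDs do not fit in a 64-bit integer")

hh_ids = np.array([12345678901234, 98765432109876], dtype=np.int64)  # illustrative 14-digit IDs
print(smallest_int_dtype(hh_ids))  # int64
```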
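To illustrate the person-type limit, a hypothetical sketch assuming person types are written as one character each (the notes don't spell out the actual CDAP encoding). Digits alone give 10 symbols; adding letters, as suggested above, raises that to 36 while keeping the existing codes 1 to 8 unchanged.

```python
import string

DIGITS = string.digits                                       # "0".."9": 10 symbols
DIGITS_AND_LETTERS = string.digits + string.ascii_uppercase  # 36 symbols

def encode_person_types(ptypes, alphabet):
    """Hypothetical encoding: one character per person, e.g. [1, 2, 3] -> "123"."""
    if max(ptypes) >= len(alphabet):
        raise ValueError(f"person type {max(ptypes)} needs more than {len(alphabet)} symbols")
    return "".join(alphabet[p] for p in ptypes)

print(encode_person_types([1, 2, 3], DIGITS))            # 123
print(encode_person_types([1, 12], DIGITS_AND_LETTERS))  # 1C
```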