Project Meeting 2020.10.13 - ActivitySim/activitysim GitHub Wiki
Technical Call
Update on TVPB
Now have a Marin County example work tour mode choice model up and running
Have run 170k work tours successfully
Needed to clean up the draft expression files I created, add some more tracing to the TVPB, and make some chunking improvements
Adaptive chunking appears to be working well
Next steps
Pre-computing/caching: there are lots of redundant calculations because the utilities are not specific to individual people
There's a lot of redundancy in the new TVPB expression files too, so we may be able to gain some efficiencies there as well
There's an interesting problem with how Boolean expressions are evaluated in numexpr versus regular Python when they are on the left side of the equation
Python throws a warning, and we should catch it and notify the user since this increases runtime
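A minimal sketch of the "catch and notify" idea, using only the standard library. The evaluator `slow_fallback_eval`, the example expression string, and the logger name are hypothetical stand-ins, not ActivitySim or numexpr code; the point is only the pattern of recording warnings during evaluation and reporting them to the user.

```python
import logging
import warnings

logger = logging.getLogger("expression_check")  # hypothetical logger name


def slow_fallback_eval(expression):
    """Hypothetical evaluator that warns when it takes a slower code path."""
    warnings.warn(
        f"falling back to slower evaluation for: {expression}", RuntimeWarning
    )
    return True


def eval_and_report(expression):
    """Evaluate an expression and collect any warnings raised along the way."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones Python would normally show only once.
        warnings.simplefilter("always")
        result = slow_fallback_eval(expression)
    messages = [str(w.message) for w in caught]
    for message in messages:
        # Tell the user which expression triggered the slower path.
        logger.warning("expression %r raised a warning: %s", expression, message)
    return result, messages


result, messages = eval_and_report("flag == (a > b)")
```

Reporting the offending expression by name should make it easier to find and rewrite the slow lines in an expression file.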
Plan to discuss pre-computing/caching more next week
After Jeff gets something working, I'll run the full example to compare against the original Marin TM2 example
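The pre-computing/caching next step could look roughly like the sketch below. It assumes the utility depends only on an (origin, destination, time period) key; that key, the function `path_utility`, and its placeholder arithmetic are all hypothetical and are not the TVPB implementation. Because the result does not depend on the person, tours that share a key need only one evaluation.

```python
from functools import lru_cache

# Counts real evaluations so the saving from caching is visible.
evaluations = 0


@lru_cache(maxsize=None)
def path_utility(origin, destination, period):
    """Hypothetical stand-in for an expensive, person-independent utility."""
    global evaluations
    evaluations += 1
    # Placeholder arithmetic only; a real model would evaluate expression files.
    return -0.05 * abs(origin - destination) - (0.5 if period == "AM" else 0.0)


# Five tours, but only two distinct (origin, destination, period) keys.
tours = [(1, 7, "AM"), (1, 7, "AM"), (3, 9, "MD"), (1, 7, "AM"), (3, 9, "MD")]
utilities = [path_utility(*tour) for tour in tours]

print(f"{len(tours)} tours, {evaluations} utility evaluations")
```

With real data the saving depends on how many tours share a key, which is the kind of thing the planned discussion would need to measure.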
Scaled integer skims
We should think about what ActivitySim publishes as its expected input skim formats/assumptions
For example, if we go to storing skims as 16-bit unsigned ints, then the range is 0-65,535
We'd need to scale float values by 100 too
Can we do time in seconds with this data type? 60 * 60 * 18 = 64,800, so we can only handle up to about 18 hours when storing seconds
DaySim does this (see scaling here) and so that means it should be acceptable for travel models in general
Reducing from float32 to int16 storage would mean half the memory usage for skims
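The range and memory arithmetic above can be checked with a short sketch. The scale factor of 100 comes from the notes; the helper name `max_representable` is made up for illustration.

```python
import struct

UINT16_MAX = 2**16 - 1  # 65,535, the largest unsigned 16-bit value


def max_representable(scale):
    """Largest unscaled value that still fits in uint16 after scaling."""
    return UINT16_MAX / scale


# With a scale factor of 100 (two decimal places) the ceiling is 655.35.
max_at_scale_100 = max_representable(100)

# Storing whole seconds (scale 1), expressed in hours: a little over 18.
max_hours_as_seconds = max_representable(1) / 3600

# Bytes per cell for each storage type, using standard sizes.
bytes_float32 = struct.calcsize("<f")  # 4
bytes_uint16 = struct.calcsize("<H")   # 2

print(max_at_scale_100, round(max_hours_as_seconds, 2), bytes_float32, bytes_uint16)
```

Any skim value above the ceiling for its scale factor (for example a distance over 655.35 at scale 100) would need a smaller scale or a wider data type.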
The OMX project is discussing using Apache Arrow for faster disk-based I/O
Maybe we could do something similar for skims in ActivitySim? Maybe we don't need to load them into RAM.
The existing ActivitySim feature that reloads skims from disk using memmap is similar
Maybe ActivitySim adds a float_32 versus scaled_unsigned_int_16 skim setting?
We want to remain unit agnostic though
And maybe we could specify different internal data types by skim
Freeing up more RAM by using leaner data types means more RAM for chunking and multiprocessing
It's a requirement that ActivitySim can run the PSRC model: 12 time periods * 60 skims * 4000 zones
This is a good chance to better define and publish the data model
How much RAM for skims does the existing example use? We'll check
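For a sense of scale, here is a back-of-the-envelope estimate for the PSRC dimensions quoted above. It assumes every skim is a dense zone-by-zone matrix and that all 60 skims exist for each of the 12 time periods; both are assumptions, not figures confirmed in these notes.

```python
# Dimensions quoted in the notes for the PSRC model.
TIME_PERIODS = 12
SKIMS = 60
ZONES = 4000

# Assumption: each skim is a dense ZONES x ZONES matrix in every time period.
cells = TIME_PERIODS * SKIMS * ZONES * ZONES


def skim_gigabytes(bytes_per_cell):
    """Total skim storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return cells * bytes_per_cell / 1e9


float32_gb = skim_gigabytes(4)  # 4 bytes per float32 cell
uint16_gb = skim_gigabytes(2)   # 2 bytes per uint16 cell

print(f"float32: {float32_gb:.2f} GB, uint16: {uint16_gb:.2f} GB")
```

Under these assumptions the skims take about 46 GB as float32 and about 23 GB as 16-bit integers.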
We may want to use more efficient data types in expressions as well - say float32 instead of float64
We'll review this as part of the performance task as well
Jeff to work on this after he gets TVPB to a good place
We want to do all our arithmetic in full precision though since some models have lots of choices with small utilities and probabilities
The scaling/unscaling of skims will happen only in the skims API/class and not in the rest of the system so it is only a storage/RAM issue
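One way to keep scaling confined to the skims class is sketched below. `ScaledSkim` and its methods are hypothetical, not the ActivitySim skim API, and the standard-library `array` module stands in for whatever array type is really used. Callers only ever see unscaled floats, so arithmetic elsewhere stays in full precision.

```python
from array import array


class ScaledSkim:
    """Hypothetical skim wrapper: stores scaled uint16 values, returns floats."""

    UINT16_MAX = 2**16 - 1

    def __init__(self, values, n_zones, scale=100):
        if len(values) != n_zones * n_zones:
            raise ValueError("expected a flattened n_zones x n_zones matrix")
        scaled = [round(v * scale) for v in values]
        if any(s < 0 or s > self.UINT16_MAX for s in scaled):
            raise OverflowError("value does not fit in uint16 at this scale")
        self._data = array("H", scaled)  # unsigned 16-bit storage
        self._n_zones = n_zones
        self._scale = scale

    def get(self, origin, destination):
        """Return the unscaled float for a zero-based origin/destination pair."""
        return self._data[origin * self._n_zones + destination] / self._scale


# A 2 x 2 skim, flattened row by row.
skim = ScaledSkim([0.0, 12.34, 12.34, 0.0], n_zones=2)
travel_time = skim.get(0, 1)  # a Python float (double precision)
```

Rejecting out-of-range values at load time is a design choice in this sketch; silently clipping them would hide input data problems.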
Here are some good relevant links on memory and performance from Stefan:
Welcome Jeff Newman and CDAP larch integration
Jeff to start on the CDAP larch integration
Person types are encoded 1 to 8, so only up to 10 person types (0-9) are possible
Could be stored as strings (A, B, C) to expand the set of possible values
This is fine for this version of ActivitySim; later versions may support more person types, but let's not solve a problem we don't have
SEMCOG HH IDs were very long, 14 digits, and ActivitySim couldn't read them. ActivitySim needs to do a better job of publishing expected data types/ranges
Jeff to work off asim/develop in a new branch within the repo so everyone can more easily participate
Clint can too for ARC if he wants
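On the household ID issue noted above: the notes do not say why the 14-digit IDs could not be read, so the sketch below is only an illustration of how published data type ranges could be checked against input values. The ID value is made up, and `fits` is a hypothetical helper.

```python
# Value ranges for two common signed integer types.
INT_RANGES = {
    "int32": (-(2**31), 2**31 - 1),
    "int64": (-(2**63), 2**63 - 1),
}


def fits(value, dtype):
    """Return True if value lies within the published range for dtype."""
    low, high = INT_RANGES[dtype]
    return low <= value <= high


household_id = 12345678901234  # made-up 14-digit ID, for illustration only

fits_int32 = fits(household_id, "int32")
fits_int64 = fits(household_id, "int64")

print(f"int32: {fits_int32}, int64: {fits_int64}")
```

A 14-digit value is too large for a 32-bit integer but fits easily in a 64-bit one, which is the kind of limit a published data model could state up front.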