Clint will try the same setup for ARC to get started and report back
Yes, you can add pre-processors for all the new scheduling submodels to avoid making many duplicate calculations
In general, we want to add pre-processors to every model over time
What's the right pattern for adding pre-processors, estimators, etc.? Maybe a decorator pattern would make it easier on the developer to add them
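A minimal sketch of the decorator idea, not the actual ActivitySim API; the registry, hook names, and the parking-cost column are made up for illustration:

    PREPROCESSORS = {}

    def preprocessor(model_name):
        """Register a function that annotates a table before model_name runs."""
        def register(func):
            PREPROCESSORS.setdefault(model_name, []).append(func)
            return func
        return register

    @preprocessor("parking_location_choice")
    def annotate_land_use(land_use_df):
        # compute a derived column once instead of repeating it in every expression
        land_use_df["parking_cost_per_hour"] = land_use_df["parking_cost_daily"] / 8.0
        return land_use_df

    def run_preprocessors(model_name, table_df):
        for func in PREPROCESSORS.get(model_name, []):
            table_df = func(table_df)
        return table_df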
In general, you don't want too many annotations on the tables, or else the tables get too big for memory. We need to balance what gets saved to the tables and what is calculated on demand and then dropped
This is especially true for time-of-day models because there are many alts
Also, we typically want to write expressions with numpy rather than pandas since it's faster
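For example (illustrative columns and coefficients), the same utility expression written against the underlying numpy arrays avoids the pandas overhead:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"dist": np.random.rand(1_000_000) * 20,
                       "income_bin": np.random.randint(1, 5, 1_000_000)})

    # pandas-style expression: goes through the pandas layer
    util_pd = df["dist"].clip(upper=5) * 0.5 + (df["income_bin"] == 4) * 1.2

    # numpy-style expression: operate directly on the underlying ndarrays
    dist = df["dist"].to_numpy()
    income_bin = df["income_bin"].to_numpy()
    util_np = np.minimum(dist, 5) * 0.5 + (income_bin == 4) * 1.2

    assert np.allclose(util_pd.to_numpy(), util_np)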
New submodel testing/examples
Clint added fake/dummy data into the py.test classes to exercise new submodels
This puts a level of input specificity into the test classes that's not typical to date
In general we're managing all input data in input files; the input files are the data contract
Once these new submodels are in the repo, how do we illustrate them, provide examples, update the documentation for them?
Basically what's our expectation for what we provide new users for new features?
And we prefer to include examples that have good / reasonable / understandable results as opposed to dummy data so they're intuitive
This speaks again to the need for a better defined data contract
Clint will move his test data out of the Python classes and into input CSV files so it's more obvious that this is input data (sketch below)
This is tricky because of how injectables work
Folks don't like the injectables
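A possible shape for the CSV-based test inputs (file name and columns are hypothetical), sidestepping the injectable wiring:

    from pathlib import Path
    import pandas as pd
    import pytest

    TEST_DATA = Path(__file__).parent / "test_data"

    @pytest.fixture
    def parking_zones():
        # the CSV sits next to the test, so it is clearly input data rather than test logic
        return pd.read_csv(TEST_DATA / "parking_zones.csv")

    def test_parking_zone_inputs_are_valid(parking_zones):
        assert (parking_zones["parking_spaces"] >= 0).all()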
We want runnable examples in the repo for each submodel
Can the new ARC park location choice submodel example be completely stand alone or do we need the whole ARC model in the test system?
Stand alone is ok, but we still need ARC data in the repo as opposed to MTC data
Do we want to maintain different region data sets?
Maybe we need a new example - example_mtc_arc_extensions - that inherits from the example_mtc example, adds Clint's new stuff, and is runnable
This means Clint creates dummy data for the MTC region to exercise the new submodels
The deployer (the CLI) is somewhat rigid and could be improved to make adding examples easier
What we maintain to support models in multiple regions/instances is a key strategic plan question
Supporting TM2 transit capacity, crowding, and reliability will raise these questions as well
It's not entirely evident what the best way forward is on this topic
Do we want to encourage or discourage multiple different submodels? This is a strategic/policy question in addition to a technical one
ARC is 90% the same as MTC
We have this idea of "spines" and so let's keep ARC on the MTC spine
TM2 is a different spine due to TVPB
We decided Clint will create a new example - example_mtc_arc_extensions - that will inherit from the example_mtc spine and will run the new ARC models
It will use data for the MTC region for the examples
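One way the inheritance could work - the extension example only carries configs that add or override settings, falling back to the example_mtc spine for everything else (paths and file names are illustrative):

    from pathlib import Path

    CONFIG_DIRS = [
        Path("example_mtc_arc_extensions/configs"),  # ARC additions and overrides
        Path("example_mtc/configs"),                 # inherited MTC spine
    ]

    def resolve_config(file_name):
        # first match wins, so the extension can shadow any inherited file
        for config_dir in CONFIG_DIRS:
            candidate = config_dir / file_name
            if candidate.exists():
                return candidate
        raise FileNotFoundError(file_name)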
ARC wants to wrap up soon so the MTC data development exercise can't be too big
SFCTA will help Clint create some reasonable test data for the contribution
RSG will grant Clint access to the activitysim data repo - done
Let's keep talking about this issue, it's important
Let's also better understand the injectables issue
In general, injectables came from UrbanSim; they make abstraction difficult and no one really likes them
You can't tell when or where a given value was set, and that's the problem with it
Jeff is reducing its importance over time
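Roughly what the pattern looks like (a simplified illustration, not the actual activitysim.core.inject code): values are registered into a global registry by name and pulled out by name elsewhere, so provenance is hard to trace:

    _REGISTRY = {}

    def injectable(name):
        def register(func):
            _REGISTRY[name] = func
            return func
        return register

    def get_injectable(name):
        return _REGISTRY[name]()

    # one module registers a value...
    @injectable("trace_hh_id")
    def trace_hh_id():
        return 982875

    # ...and a distant consumer only sees the name, not where it came from
    hh_id = get_injectable("trace_hh_id")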
Update from Newman on non-mandatory tour frequency by person type estimation integration
Non-mandatory tour frequency larch notebook created
But there are 8 EDBs for this model - one for each person type
This makes it more difficult to step through the example in the notebook
But we probably also want to be able to run all the steps automatically in addition to stepping through them manually
So we'll add a run_all function that calls the manual steps
And to make this submodel EDB like others, Newman will restructure it and send back to Doyle to revise the EDB writer to write one EDB
And then Newman will update the notebook so it illustrates both cases - manual step through and run_all
Users will need both setups - one for debugging/developing and one for production mode
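A minimal sketch of the run_all idea from above - each person-type EDB keeps its own step function, and run_all just loops over them (person-type names and the estimation call are placeholders):

    PERSON_TYPES = ["full_time_worker", "part_time_worker", "university_student",
                    "non_worker", "retired", "driving_age_student",
                    "school_age_child", "preschool_child"]

    def estimate_person_type(person_type):
        # load the EDB for this person type and run the larch estimation here
        print(f"estimating non-mandatory tour frequency for {person_type}")

    def run_all():
        for person_type in PERSON_TYPES:
            estimate_person_type(person_type)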
Update on multiprocessing/caching for TVPB with Doyle and verification of results with me
Now precomputing TAP-to-TAP utilities
Created a single key across time-of-day, market segment, path type, etc for faster lookups and smarter data storage
These are all simple scalar types, so they can be stored in big numpy arrays for easier shared memory storage
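Roughly the packing idea (dimension names and sizes are made up): fold the categorical dimensions into one integer so each lookup is a single array index:

    import numpy as np

    N_TOD, N_SEGMENT, N_PATH_TYPE = 5, 3, 4   # illustrative dimension sizes

    def composite_key(tod, segment, path_type):
        # unique integer per (time-of-day, market segment, path type) combination
        return (tod * N_SEGMENT + segment) * N_PATH_TYPE + path_type

    n_tap_pairs = 1000                        # illustrative
    utilities = np.zeros((N_TOD * N_SEGMENT * N_PATH_TYPE, n_tap_pairs),
                         dtype=np.float32)

    # O(1) lookup: one row index instead of a multi-column merge
    row = utilities[composite_key(tod=2, segment=1, path_type=3)]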
Still all single threaded so no timing statements yet
Left the calculate-on-demand code in there for now since it is good for small sample size runs and machines with less memory
Turning to multiprocessing now, including the best way to expose the new shared data across processes for efficient access
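One candidate mechanism (a generic sketch, not the TVPB code) is multiprocessing.shared_memory, so workers map the same numpy buffer instead of copying it:

    import numpy as np
    from multiprocessing import shared_memory

    utilities = np.random.rand(60, 1000).astype(np.float32)

    # parent process: allocate the shared block and populate it once
    shm = shared_memory.SharedMemory(create=True, size=utilities.nbytes)
    shared = np.ndarray(utilities.shape, dtype=utilities.dtype, buffer=shm.buf)
    shared[:] = utilities

    # worker process: attach by name and wrap the same buffer, no copy
    attached = shared_memory.SharedMemory(name=shm.name)
    worker_view = np.ndarray(utilities.shape, dtype=utilities.dtype, buffer=attached.buf)

    # when done: attached.close(); shm.close(); shm.unlink()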
TAP-to-TAP utilities know the access mode too; in the Marin example there is a different transfer penalty for drive transit
I fixed the drive transit expression issue so now we have many drive transit trips
I'm still working through the verification / QA/QC stuff