Project Meeting 2023.02.21

Agenda

  • ORCA removal status update

Meeting Notes

  • ORCA Background
    • ORCA is a way to manage data and orchestrate a pipeline of model steps
    • It has managed the data tables used in ActivitySim processing through a set of references stored in ORCA's own global state
  • Status:
    • Most of ORCA has been replaced by an encapsulated state object that is passed as an argument to whatever needs the information
    • Things that were hard with ORCA are now easier; for example, you can have more than one pipeline at a time (see the sketch below)
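
To make the encapsulated-state idea concrete, here is a minimal sketch of running two independent pipelines side by side. The workflow.State class, make_default constructor, and settings attribute follow the discussion above but should be treated as assumptions, not a confirmed API.

```python
# A minimal sketch, assuming a `workflow.State` class with a
# `make_default` constructor; names are illustrative, not confirmed API.
from activitysim.core import workflow

# Each state encapsulates its own settings, data tables, and checkpoints,
# so two pipelines can coexist in one Python process, which was not
# possible when everything lived in ORCA's global state.
state_a = workflow.State.make_default(working_dir="example_a")
state_b = workflow.State.make_default(working_dir="example_b")

# The two states are fully independent: changing a setting in one
# has no effect on the other.
state_a.settings.households_sample_size = 1_000
state_b.settings.households_sample_size = 50_000
```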
  • How it works
    • New package called workflow
    • Access to settings from settings.yaml (note: a new Pydantic-based configuration system is under development), and to network LOS settings from network_settings
      • At some point in the roadmap task, we may want to discuss how the different pieces fit together and why the settings are split across multiple files
      • As part of a forthcoming task, all model components will be migrated to Pydantic-based settings files that include explicit default values. Previously, a default value could be set anywhere in the code that accessed the setting, so defaults were scattered across the codebase and different defaults could be applied at different points in the model. Now (or soon), data types and default values will be defined with Pydantic in one consolidated spot (see the sketch after this list)
    • Settings get validated
      • Anything that is not an expected setting gets collected into "other settings" and can still be used in expressions later
      • settings.models lists all of the model components from the settings file
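
As a hedged illustration of the Pydantic approach, the sketch below shows a hypothetical component settings model: data types and defaults live in one consolidated class, validation happens at load time, and unexpected keys are retained rather than rejected. The class and field names are illustrative, not ActivitySim's actual settings classes.

```python
# A minimal sketch using Pydantic v2; not ActivitySim's actual classes.
from pydantic import BaseModel, ConfigDict


class ExampleComponentSettings(BaseModel):
    """Hypothetical settings for one model component."""

    # Keep unexpected keys instead of raising a validation error, so
    # they remain available to expressions later.
    model_config = ConfigDict(extra="allow")

    # Data types and default values are defined exactly once, here,
    # rather than at every call site that reads the setting.
    SPEC: str = "example_component.csv"
    CHUNK_SIZE: int = 0
    CONSTANTS: dict[str, float] = {}


# Validation happens when the YAML content is loaded into the model.
settings = ExampleComponentSettings.model_validate(
    {"SPEC": "my_spec.csv", "my_custom_key": 42}
)
print(settings.SPEC)         # my_spec.csv
print(settings.model_extra)  # {'my_custom_key': 42}
```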
    • Running the model
      • run.all lets you run everything
      • You don't have to run everything; you can run individual model components instead, for example run.initialize_landuse()
      • You have access to the data frames in a Jupyter notebook
      • You can also pause at any point and compare results
        • Reference files for comparison can be HDF5 files (which support pseudo-directories and nodes containing tables), if that's what you have, or parquet-format files with checkpoints
        • Parquet has fast reading/writing with compression algorithms, and files can be opened read-only. Individual checkpoints can be indexed, similar to the HDF5 format
        • You can inspect any checkpointed table, and you can set up a test that identifies where the model diverges from the reference checkpoints (see the sketch below)
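
A hedged sketch of this run-and-compare workflow follows; the method names (run.all, run.initialize_landuse, get_dataframe) track the discussion above, and the reference file path is hypothetical.

```python
# A minimal sketch; method names follow the meeting discussion and
# should be read as assumptions, not a final API.
import pandas as pd

from activitysim.core import workflow

state = workflow.State.make_default(working_dir="example")

# Run everything at once ...
# state.run.all()

# ... or run individual components and pause in between, e.g. in a
# Jupyter notebook, to inspect intermediate data frames.
state.run.initialize_landuse()
land_use = state.get_dataframe("land_use")
print(land_use.head())

# Compare a checkpointed table against a reference pipeline (HDF5 or
# parquet) to find where a run first diverges; the path is hypothetical.
reference = pd.read_parquet("reference-pipeline/land_use.parquet")
pd.testing.assert_frame_equal(land_use, reference)
```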
  • How will these changes impact estimation mode?
    • Runs about the same as before
    • Jeff can envision a future where individual model components have a couple of different input channels (the data input channel has already been reconfigured with this system), and an array of parameters could be added as another channel. The same model component could then be run in learning mode instead of simulation mode: provide the observed outputs as input (instead of providing the parameters) and get the estimated parameters as output, all within this environment
  • Instead of the inject decorator used before, we can now call a function directly and pass its inputs as arguments; the function runs as expected because all the settings are defined in the encapsulated state. You can import the function and supply its parameters manually. The workflow.step decorator could evolve in the future (see the sketch below)
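
To illustrate the shift away from injection, here is a minimal sketch of a toy component written against the new-style decorator. The workflow.step decorator and state.add_table call follow the discussion above, but the component, its arguments, and its body are purely illustrative.

```python
# A minimal sketch; the component and its body are illustrative only.
import pandas as pd

from activitysim.core import workflow


@workflow.step
def summarize_households(state: workflow.State, households: pd.DataFrame) -> None:
    """A toy component: everything it needs arrives as explicit arguments."""
    summary = households.groupby("home_zone_id").size().to_frame("n_households")
    state.add_table("household_summary", summary)


# Under ORCA, inputs were injected from hidden global state. Now the
# same function can be imported and called directly with its inputs,
# because the settings live in the encapsulated state it receives:
#
#   summarize_households(state, households=my_households_df)
```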