Project Meeting 2023.04.13 - ActivitySim/activitysim GitHub Wiki

Agenda

  • Discuss short-term solutions for ending the activitysim_resources git monorepo.
  • Discussion Questions:
    • Do we want to continue to develop/maintain 3-zone model code?
    • Is validation testing against TM2 results at scale something we want to do?

Key Decisions

  • Code for the 3-zone system is functioning, but its results have not been verified as correct. Because there are no current or planned implementations of a 3-zone system, no further resources will be spent verifying this system or fixing its bugs at this point. Users should be warned that the 3-zone system code is not currently maintained.

Meetings

Admin Items

  • Still looking for volunteers to help build ActivitySim website. Alex to coordinate.

Discussion questions

  • Do we want to continue to develop/maintain 3-zone model code?
    • If no participating agency is planning on using this formulation, we should consider ending support for it ASAP to allow that time to be spent supporting other more useful things.
    • What Jeff built now produces consistent and stable results. However, are the results accurate? Should Jeff spend time and resources confirming the results by hand?
    • Consortium agreement: the 3-zone model tests are complete, but work on confirming whether the results are correct is on hold.
    • Going forward, document the configuration lines more clearly. Information about the bug must be released to ActivitySim users outside the consortium (e.g., graduate students at UC Irvine use the 3-zone model system).
  • Is validation testing against TM2 results at scale something we want to do?
    • The BayDAG PR included a number of component-level tests that are backed by comparisons against TM2 results at scale. The datasets for these tests are large, and it is unclear whether the Continuous Integration (CI) package can support them.
    • Running these tests currently requires getting the specialized data files (TM2 outputs) from ActivitySim resources and/or SharePoint.
    • Decision: none reached. For now, there will be no further attempt to address the BayDAG PR component-level tests; this may be better addressed in the RoadMap.

Short-term solutions for ending the activitysim_resources git monorepo

  • Create, or leverage, an external example that pulls in data for only the estimation example, which is the only test that runs inside GitHub. Jeff will replace just this one piece.
  • This lets us pull that data file without drawing on our bandwidth quota.

Data Format for Checkpointing Structures

  • Developers Checkpointing Guide: https://camsys.github.io/activitysim/generic-whale/dev-guide/checkpointing.html
  • In the new implementation, there are currently two data file formats available for checkpointing:
    • HDF5
    • Parquet (Default)
  • Checkpointing is handled by a pluggable class rather than by line-by-line changes.
  • User-facing change with Parquet: in the output directory, a pipeline.parquetpipeline directory contains individual folders, each holding a set of Parquet files named for the component that each checkpoint was written for.
  • Writing to the Parquet format can fail because Parquet storage does not allow arbitrary objects in the data file (e.g., a column containing a mix of data types).
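The last three points (a pluggable checkpoint class, the Parquet directory layout, and the mixed-type restriction) can be illustrated with a minimal Python sketch. This is not ActivitySim's actual checkpointing API: `CheckpointStore`, `InMemoryParquetLikeStore`, `mixed_type_columns`, and the dict-of-lists table representation are all hypothetical, and nothing is written to disk. The layout assumes one folder per table inside pipeline.parquetpipeline and one file per checkpoint, named for the component, which is one reading of the directory description. The type check only approximates the Parquet restriction by refusing columns whose non-null values mix kinds (ints and floats count as one numeric kind).

```python
from abc import ABC, abstractmethod
from pathlib import PurePosixPath


def _kind(value):
    """Coarse type label; ints and floats count as one numeric kind."""
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    return type(value).__name__


def mixed_type_columns(table):
    """Return names of columns whose non-null values have more than one kind.

    `table` is a hypothetical stand-in for a DataFrame: column name -> list of values.
    """
    return [
        name
        for name, values in table.items()
        if len({_kind(v) for v in values if v is not None}) > 1
    ]


class CheckpointStore(ABC):
    """Hypothetical pluggable interface; pipeline code talks only to this."""

    @abstractmethod
    def put(self, table_name, checkpoint, table): ...

    @abstractmethod
    def get(self, table_name, checkpoint): ...


class InMemoryParquetLikeStore(CheckpointStore):
    """Mimics a Parquet-style layout in memory; no files are written."""

    def __init__(self, root="output/pipeline.parquetpipeline"):
        self.root = PurePosixPath(root)
        self._files = {}

    def path_for(self, table_name, checkpoint):
        # Assumed layout: one folder per table, one file per checkpoint (component).
        return self.root / table_name / f"{checkpoint}.parquet"

    def put(self, table_name, checkpoint, table):
        bad = mixed_type_columns(table)
        if bad:
            # Parquet needs a single type per column, so refuse mixed columns up front.
            raise TypeError(f"mixed-type columns cannot be stored: {bad}")
        self._files[self.path_for(table_name, checkpoint)] = {
            name: list(values) for name, values in table.items()
        }

    def get(self, table_name, checkpoint):
        return self._files[self.path_for(table_name, checkpoint)]
```

Pipeline code that calls only `put` and `get` would not need to change if another backend (for example, an HDF5 store implementing the same two methods) were swapped in, which is the advantage of a pluggable class over line-by-line changes.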