Project Meeting 2023.04.06 - ActivitySim/activitysim GitHub Wiki

Agenda

Action Items

  • Jeff to look into Pandas 2.0 issues and report back.
  • Jeff to propose a short-term fix for ending the activitysim_resources git monorepo.

Meeting Notes

Pandas 2.0

  • Do not upgrade to Pandas 2.0 yet. #663
  • Jeff is looking into the issues and will report back. The corrections may amount to updating calls to new function names, which is not a large effort.
  • We want to pin each ActivitySim release to specific versions of its required software. We should also try to keep up with new software releases, as far as is reasonable.
  • This is something that can/should be addressed in the roadmap: how much dependency pinning do we want to do?
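As a rough illustration of the kind of pin discussed above, a version guard might look like the sketch below. The helper names `parse_major_minor` and `is_supported_pandas` are hypothetical and not part of ActivitySim; only the 2.0 ceiling comes from these notes.

```python
from __future__ import annotations

# Hypothetical helpers for illustration only; not ActivitySim code.


def parse_major_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a plain version string such as '1.5.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_supported_pandas(version: str, ceiling: tuple[int, int] = (2, 0)) -> bool:
    """True if the version is strictly below the ceiling (Pandas 2.0, per the notes)."""
    return parse_major_minor(version) < ceiling


if __name__ == "__main__":
    for candidate in ("1.5.3", "2.0.0"):
        print(candidate, is_supported_pandas(candidate))
```

In packaging metadata the same constraint is usually written as a version specifier such as `pandas<2`. How tightly to pin beyond that is the open roadmap question.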

ORCA Removal PR #654

  • The core functionality of this work is "done" (ish), but a lot of documentation still needs to be written to support the changes. All CI tests now pass except estimation, which is currently broken for non-code reasons. An error was found in the placeholder_sandag 3-zone model non-sharrow regression tests, which required regenerating them.
  • Because the documentation still needs a lot of work, the PR remains a draft. However, people can start code review.
    • Focus code review on the core workflow directory, which includes the bulk of the changes. State.py is a good place to start.
    • This is a big enough change that RSG and WSP should both review.
  • New code includes separate Python files to organize the functions.
  • The old ORCA could merge tables automatically, and there was a lot of infrastructure to support that. Jeff has replaced it with a more generic mechanism: a temporary workflow table. It is similar to a step, but the table it creates exists only for the duration of the workflow step being called. Tables are now joined by Python code that calls a function explicitly, which is a simpler and more accessible process.
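One plausible reading of the temporary-table idea described above is sketched below. This is not the code in the PR: `ToyState`, `temporary_table`, `join_tables`, and `run_step` are hypothetical names, and tables are plain lists of dicts so the example has no dependencies.

```python
from contextlib import contextmanager


# Hypothetical stand-in for a workflow state object; not the ActivitySim API.
class ToyState:
    def __init__(self):
        self.tables = {}

    @contextmanager
    def temporary_table(self, name, rows):
        """Register `rows` under `name` for the duration of one step only."""
        self.tables[name] = rows
        try:
            yield rows
        finally:
            del self.tables[name]  # removed as soon as the step finishes


def join_tables(left, right, key):
    """Explicit left join of two lists of dict rows on `key`."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


def run_step(state):
    """A step that builds, uses, and then discards a merged table."""
    merged = join_tables(
        state.tables["persons"], state.tables["households"], "household_id"
    )
    with state.temporary_table("persons_merged", merged) as table:
        return [row["income"] for row in table]


state = ToyState()
state.tables["households"] = [{"household_id": 1, "income": 50000}]
state.tables["persons"] = [{"person_id": 10, "household_id": 1}]
incomes = run_step(state)
```

The join is an ordinary function call inside the step rather than something the framework performs implicitly, and the merged table is no longer registered once the step returns.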

Ending the activitysim_resources git monorepo

  • One giant repo with tens of GB of files from multiple agencies is inconvenient to manage.
  • Tracking data changes in Git (even with LFS) is mostly unnecessary.
  • When our monthly bandwidth is exhausted, users (including CI testing) are locked out of the data until the next month.
    • If someone were to access the whole repo, that would use about half the available monthly bandwidth.
  • Maybe replace with per-agency repos and/or S3 buckets or similar, possibly independently managed? e.g. activitysim-prototype-mtc
  • Agree that we don’t want to maintain all of the examples included in the ActivitySim repo, but we don’t want to get ahead of the roadmap and make decisions now.
    • The short-term solution is to break these out into separate agency-specific repos. Keep a few here: the MTC example for one-zone and another example for two-zone. Then let the roadmap process determine a longer-term solution.
      • Note that this may be a slow process because of bandwidth limitations, unless we want to put money into it. If the cost is a few hundred dollars up to $1k, that would be acceptable to spend to clear this hurdle.
  • Are there permissions or controls that would limit access to downloading? Jeff could change this repo to be private and then no one outside the consortium would have access.
  • You can still conduct tests on repos owned and controlled by someone else.
    • Joe asks Jeff to propose a short-term fix on Tuesday and assess additional tasks.
  • Another roadmap question is whether we should be testing individual implementations.