Project Meeting 2023.04.20 - ActivitySim/activitysim GitHub Wiki

Agenda

  • Input checker (RSG)
  • Update on two-zone memory profiling (WSP)

Meeting Notes

Input Checker

Presentation: input_checker_rsg.pptx

  • Configs for the checker include defining:
    • What the check will be
    • Input table: the table that will be checked
    • Core ID/table that you want to run the test on
    • Type of severity warning (for example, a logical test could be a check that totals don’t match)
    • Test type (set to calculation to define a variable for use in a later test, or to test for a check that will actually be run)
    • Expression defines the test
    • Report defines the report statistics, such as getting min/max values
  • Output is a log/text file
  • Current status
    • It is running, and the basic functionality is in place and working
    • Currently working to improve the output file (possibly HTML)
  • Discussion
    • Is there any way to automate this process, for example by scanning the files and automatically populating the configs for the input checker?
      • RSG is exploring a more generalized approach. Joe was thinking it would go into other config yamls to pull info instead of the users having to define everything themselves. RSG to consider.
      • For example, process would look at annotated household and person files, check to see if something is in the file, etc. Checker would only apply checks to those being used, by looking through UECs and annotated files. It is possible that there are many things specified but not used (such as some building types not being used).
    • The discussion points to the issue that there is no data model (documentation) that could be queried.
      • See issue on data model: https://github.com/ActivitySim/activitysim/issues/617
      • Data model here would define the types of files, a yaml file that specifies the field/variable, with max/min values, etc., that can be used for checking.
      • Jeff will be employing Pydantic. Jeff to present on what he plans to do for the Config documentation and checking to make sure these two efforts are complementary.
      • Can we even define a data model when there’s so much variation of implementations? What’s the vision for something like this? Can we define what that exactly might be?
        • Conceptually – there’s a household file, there’s a person file, skims, etc.
        • You could say there’s a household file. You could say there also needs to be an ID file. It must have an EMP_ prefix for any employment categories. You could use it in data summary files, documentation, elsewhere in the model.
        • WSP could present an example data model.
        • Michelle to add Data Model as a parking lot item.
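
To make the checker configuration described above more concrete, here is a minimal sketch of how a list of checks with a table, severity, test type, and expression might be evaluated. It is not RSG's implementation: the `Check` class, the `run_checks` function, the severity labels, and the sample tables are all hypothetical, and plain Python lists stand in for the real input tables.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One entry in a hypothetical checker config."""
    name: str        # what the check is
    table: str       # input table the expression is run on
    severity: str    # assumed labels: "warning" or "fatal"
    type: str        # "calculation" defines a variable, "test" runs a check
    expression: str  # Python expression; `rows` is the input table


def run_checks(checks, tables):
    """Evaluate checks in order; calculations feed variables to later tests."""
    variables = {}
    results = {}
    for check in checks:
        # eval() is used only to keep the sketch short; it is unsafe for
        # untrusted configs.
        namespace = {"rows": tables[check.table], **variables}
        value = eval(check.expression, namespace)
        if check.type == "calculation":
            variables[check.name] = value
        else:
            results[check.name] = bool(value)
            if not value:
                print(f"{check.severity.upper()}: {check.name} failed")
    return variables, results


# Hypothetical sample inputs (not real model data).
tables = {
    "households": [
        {"household_id": 1, "hhsize": 2},
        {"household_id": 2, "hhsize": 1},
        {"household_id": 3, "hhsize": 2},
    ],
    "persons": [
        {"person_id": 1, "household_id": 1},
        {"person_id": 2, "household_id": 1},
        {"person_id": 3, "household_id": 2},
        {"person_id": 4, "household_id": 3},
    ],
}

checks = [
    Check("total_hhsize", "households", "warning", "calculation",
          'sum(r["hhsize"] for r in rows)'),
    Check("n_persons", "persons", "warning", "calculation", "len(rows)"),
    Check("hhsize_positive", "households", "fatal", "test",
          'all(r["hhsize"] > 0 for r in rows)'),
    # Logical test in the spirit of "totals don't match".
    Check("totals_match", "persons", "warning", "test",
          "total_hhsize == n_persons"),
]

variables, results = run_checks(checks, tables)
```

In this sketch the two calculation entries define variables that the later `totals_match` test reuses, which mirrors the calculation-versus-test distinction in the notes.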

Next Steps for RSG on the Input Checker

  • Need to define checks that are more complicated than things like checking max/min values.
  • Discuss opportunities to make it more generic.
  • Evaluate possibilities for automating data processes.
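
As one illustration of the more generic approach discussed above, min/max checks could be generated from a data model and applied only to fields that the model expressions actually reference. This is a hedged sketch under stated assumptions: `FieldSpec`, `fields_in_use`, `range_violations`, the field names, and the expression strings are hypothetical. A plain dataclass stands in for whatever data model format is eventually chosen.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldSpec:
    """Hypothetical data-model entry: a field and its allowed range."""
    name: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None


def fields_in_use(specs, expressions):
    """Keep only the fields whose names appear in the given expressions."""
    text = " ".join(expressions)
    return [s for s in specs
            if re.search(rf"\b{re.escape(s.name)}\b", text)]


def range_violations(rows, specs, id_column):
    """Return {field name: [IDs of rows with missing or out-of-range values]}."""
    violations = {}
    for spec in specs:
        bad = []
        for row in rows:
            value = row.get(spec.name)
            too_low = (value is not None and spec.min_value is not None
                       and value < spec.min_value)
            too_high = (value is not None and spec.max_value is not None
                        and value > spec.max_value)
            if value is None or too_low or too_high:
                bad.append(row[id_column])
        if bad:
            violations[spec.name] = bad
    return violations


# Hypothetical data model for a household table.
household_model = [
    FieldSpec("hhsize", min_value=1),
    FieldSpec("income", min_value=0),
    FieldSpec("bldgsz", min_value=1, max_value=10),
]

# Stand-ins for expressions pulled from UECs or annotation files.
expressions = ["hhsize > 2", "income / 1000"]

households = [
    {"household_id": 1, "hhsize": 2, "income": 50000, "bldgsz": 99},
    {"household_id": 2, "hhsize": 0, "income": 72000, "bldgsz": 2},
    {"household_id": 3, "hhsize": 4, "income": -100, "bldgsz": 3},
]

active = fields_in_use(household_model, expressions)
violations = range_violations(households, active, "household_id")
```

Here `bldgsz` is specified in the data model but not referenced by any expression, so its out-of-range value is not flagged. That matches the idea of only checking inputs that are actually used.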

Two-zone memory profiling update

  • Issue 662: Sharrow mode was not accessing the xarray skim object as expected. Jeff fixed this issue. Sijia retested with a third run, and the new sharrow code now uses about the same memory as running without sharrow (and runs faster, as would be expected).

  • With all issues resolved, Sijia will draft a report for the 2-zone memory profiling, similar to the one-zone document, and will distribute it in about 2 weeks, by May 5.