Project Meeting 2022.11.10 - ActivitySim/activitysim GitHub Wiki

Agenda

  • Phase 8 scoping update
  • Scheduling upcoming meetings
  • Task and code review updates
  • Memory profiling update

Action Items

  • Partners to review the IRFQ distributed by Alex and provide written comments/suggested revisions before next Thursday's partner meeting (11/17).
  • WSP to distribute Tableau workbook with memory profiling information to the group.

Notes

Phase 8 scoping update

  • Alex drafted the IRFQ and sent it to the partners.
  • ACTION ITEM: Joe and Alex ask the partners to review the document and send them any recommended revisions before next Thursday's partner meeting (11/17) so that the changes can be incorporated before the meeting.
  • Ideally, by the end of that meeting on the 17th, the text is finalized and can be sent to AMPO on Friday the 18th.

Scheduling upcoming meetings

  • Tuesday the 15th - CANCELED
  • Thursday the 17th - Update on tasks/code review, memory profiling check-in
  • Tuesday the 22nd - Code acceptance and management
  • Thursday the 24th - CANCELED

Task and Code Review Updates

Memory profiling update

  • Presentation: ActivitySim Memory Profiling Task – Process Update 11-10-2022.pptx

  • Test runs conducted without chunking or multiprocessing

  • Prototype_arc runs

    • Measured peak memory at 5%, 12.5%, and 25% samples
    • Ran a regression of peak memory against sample size, which showed a linear relationship
    • Peak memory was measured with a tool Jeff created as part of sharrow; it exports a CSV that records memory usage every half second while the model is running
  • Prototype_mtc_extended runs

    • Also demonstrated linear relationships
  • Hypothesis Testing

    • Relationship between sample size and memory usage
      • Produced time series of memory usage for different sample size runs.
      • ARC runs at different sample sizes show the same profile for memory usage.
      • For MTC extended (at 100% sample), memory peaks at different points. Sijia has a theory about why this is (memory may not be released fast enough when moving to next step) but needs to run some tests to confirm.
      • MWCOG has not been run yet. It may not be possible to run a 100% sample, so results are expected to be similar to the ARC runs.
    • The issue is not the skims but the pipeline information
      • Looked into what the pipeline looks like at different memory levels. The chooser table is large when memory usage is highest. (There are some outliers, but these may be due to memory not being released fast enough.)
    • Data type issues
      • A table lists the variable names and their data types at each checkpoint, and looks at the number of tables by data type
      • For the ARC 25% run, many variables are int64
      • In the settings for the ARC model, you can set datatypes when files are read in. Sijia will test specifying lower data types to see if that helps the memory issue.
      • A request was made to include the number of rows in each table in the data type table.
  • Other things

    • Run time for vehicle type is high; this is because of string checks in the model that aren’t optimized well with numba.
    • Question about memory usage with multiprocessing – can this be a test as well?
    • ACTION ITEM: Sijia to provide the Tableau workbook to the group.
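The notes describe peak memory being captured by a sharrow tool that writes memory usage to a CSV every half second. That tool's interface is not described here, so the following is only a minimal sketch of the same sampling idea using the Python standard library. `MemorySampler` is a hypothetical name, and `tracemalloc` counts Python-level allocations rather than total process memory (RSS), so its numbers will not match the sharrow output.

```python
import csv
import threading
import time
import tracemalloc


class MemorySampler:
    """Hypothetical sampler: records traced memory at a fixed interval.

    Uses tracemalloc (Python-level allocations), not process RSS, so it is
    only a stand-in for the sharrow memory tool mentioned in the notes.
    """

    def __init__(self, csv_path, interval=0.5):
        self.csv_path = csv_path
        self.interval = interval  # seconds between samples (half second by default)
        self.samples = []  # list of (elapsed_seconds, current_bytes, peak_bytes)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        start = time.monotonic()
        while True:
            current, peak = tracemalloc.get_traced_memory()
            self.samples.append((time.monotonic() - start, current, peak))
            # wait() returns True as soon as the stop event is set
            if self._stop.wait(self.interval):
                break

    def __enter__(self):
        tracemalloc.start()
        self._thread.start()
        return self

    def __exit__(self, *exc):
        # Stop sampling before tracing is turned off, then write the CSV.
        self._stop.set()
        self._thread.join()
        tracemalloc.stop()
        with open(self.csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["elapsed_s", "current_bytes", "peak_bytes"])
            writer.writerows(self.samples)
        return False  # do not suppress exceptions from the wrapped block
```

Wrapping a model step in `with MemorySampler("memory_profile.csv"):` would produce a timestamped series that could be plotted for each sample size. A background thread is used so the sampling interval is independent of what the model step is doing.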
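Sijia plans to test whether specifying smaller data types when files are read in reduces memory. The ActivitySim settings used for that are not shown here; instead, this is a pandas-level sketch under that assumption. `downcast_integers` (hypothetical) narrows int64 columns with `pd.to_numeric(..., downcast="integer")`, and `dtype_summary` (hypothetical) builds a per-table view of column counts by data type alongside the row count requested in the meeting.

```python
import pandas as pd


def dtype_summary(tables):
    """Hypothetical summary: one row per (table, dtype) with the table's row
    count, number of columns of that dtype, and total table memory in MB.

    tables: dict mapping table name -> DataFrame (e.g. tables at one checkpoint).
    """
    records = []
    for name, df in tables.items():
        counts = df.dtypes.astype(str).value_counts()
        table_mb = df.memory_usage(deep=True).sum() / 1e6
        for dtype, n_cols in counts.items():
            records.append({
                "table": name,
                "rows": len(df),
                "dtype": dtype,
                "n_columns": int(n_cols),
                "table_mb": table_mb,
            })
    return pd.DataFrame(records)


def downcast_integers(df):
    """Return a copy with int64 columns downcast to the smallest signed
    integer type (int8/int16/int32) that holds their values."""
    out = df.copy()
    for col in out.select_dtypes(include="int64").columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    return out
```

Downcasting after load still briefly holds the int64 copy in memory, which is one reason to set types at read time instead.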