Project Meeting 2023.09.26 - ActivitySim/activitysim GitHub Wiki

Agenda

  • Admin
  • Data Type Update (WSP)

Action Items

  • Jeff to create an issue for sharrow tracing optimization, a longer-term improvement.
  • Sijia to complete a run without tracing in sharrow.
  • Sijia to look into what is held in memory between segments with sharrow.
  • Sijia to clean up the data type code changes
  • TO DO: Sijia to think about the appropriate way to document these investigations.

Meeting Notes

Admin

  • Another partners-only meeting this Thursday.
  • Joe was able to set up ActivitySim org in Google Drive.
  • Regarding the status of the response to Eric Miller's editorial, Joe sent a message to the Transport Reviews editors, expressing the desire to publish and attaching the letter. He is still waiting to hear back.

Memory Use / Data Type Update

  • Presentation: ActivitySim Data Types Task - Progress Update 09-26-2023.pptx
  • Identified where in mandatory tour scheduling the memory peak is happening: eval_interaction_utilities
  • Between mandatory tour scheduling segments, memory accumulates when running with sharrow, whereas it is released when running without sharrow.
  • Three additional test runs
    1. Reverse the order of mandatory tour scheduling segments
      • Results: this did not help; it actually made the peak even higher.
    2. Add extra logging at the memory peak in eval_interaction_utilities
      • Revealed that the peak occurs during sharrow tracing, when calling load_dataarray. The suspected cause is the code extracting data from the sharrow xarray structures and putting it into pandas.
      • This appears to happen only with sharrow tracing turned on, which is currently the default.
      • Turning tracing off reduced the memory peak.
      • From Jeff: turning on tracing in sharrow breaks all the optimizations in order to provide the trace data, which requires massive amounts of memory. It collects all the information for all simulated households and then picks out the specified households. Sharrow tracing has not been optimized.
      • If we were to write this from scratch in ActivitySim, tracing would be on the side, not within the production model. There is no need to do tracing simultaneously when running with thousands of households.
      • TO DO: Jeff to create an issue for sharrow tracing optimization, a longer-term improvement.
      • TO DO: Sijia to complete a run without tracing in sharrow.
    3. Add pauses and call garbage collection at the end of each segment
      • Results: the memory held between segments is attributed to cached trees in sharrow.
  • Revisited a previously reported observation that memory use for school showed a pattern similar to work, like a mini-work profile. When sharrow runs, it caches the file flow functions, and the cache was holding onto a reference to temporary data. Jeff just pushed a PR; it is still failing a test but will be ready soon.
  • Takeaways
    • Tracing takes a lot of memory. By necessity, tracing will always require more resources.
    • Right now, we’re looking at going from a 500 GB memory peak down to 260 GB, with the data type updates and tracing turned off in sharrow.
  • Next steps
    • Sijia to look into what is held in memory between segments with sharrow.
    • Sijia to clean up the data type code changes
    • TO DO: Sijia to think about the appropriate way to document these investigations.