Project Meeting 2023.09.26 - ActivitySim/activitysim GitHub Wiki
Agenda
Admin
Data Type Update (WSP)
Action Items
Jeff to create an issue for sharrow tracing optimization, a longer-term improvement.
Sijia to complete a run without tracing in sharrow.
Sijia to look into what is held in memory between segments with sharrow.
Sijia to clean up the data type code changes
TO DO: Sijia to think about the appropriate way to document these investigations.
Meeting Notes
Admin
Another partners-only meeting is scheduled for this Thursday.
Joe was able to set up an ActivitySim org in Google Drive.
Regarding the status of the response to Eric Miller's editorial, Joe wrote to the editors of Transport Reviews, expressing the desire to publish and attaching the letter. He is still waiting to hear back.
Data Type Update (WSP)
Identified where the memory peak occurs in mandatory tour scheduling: eval_interaction_utilities.
Between mandatory tour scheduling segments, memory accumulates with sharrow, whereas it is released without sharrow.
Three additional test runs
Reverse order of mandatory tour scheduling segments
Result: this did not help; it actually made the peak even higher.
Add extra logging at the memory peak in eval_interaction_utilities
This revealed that the peak occurs during sharrow tracing, when load_dataarray is called. The suspected cause is the code extracting data from the sharrow xarray structure and putting it into pandas.
This appears to happen only with sharrow tracing turned on, which is currently the default.
Turning tracing off reduced the memory peak.
From Jeff: when tracing is turned on in sharrow, it breaks all the optimizations in order to provide the trace data, which requires massive amounts of memory. It collects all the information for all simulated households and then picks out the specified households. Sharrow tracing has not been optimized.
If we were to write this thing from scratch in ActivitySim, tracing would be on the side, not within the production model. There is no need to do tracing simultaneously when running with thousands of households.
TO DO: Jeff to create an issue for sharrow tracing optimization, a longer-term improvement.
TO DO: Sijia to complete a run without tracing in sharrow.
Add pauses and call garbage collection at the end of each segment
Due to cached trees in sharrow.
This revisits a previously reported observation that memory use for school showed a similar pattern to work, like a mini-work profile. When sharrow runs, it caches the file flow functions. In doing so, it was holding onto a reference to temporary data. Jeff just pushed a PR; it is still failing a test but will be ready soon.
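The third test run above (pausing and collecting garbage at the end of each segment) can be sketched with the Python standard library alone. This is a minimal illustration, not ActivitySim or sharrow code: run_segment, run_segments, the segment names, and the row counts are hypothetical stand-ins. tracemalloc only reports memory allocated through Python's allocators, so its numbers will not match the process-level peaks discussed in these notes, and tracemalloc.reset_peak requires Python 3.9 or later.

```python
import gc
import time
import tracemalloc


def run_segment(name, n_rows):
    """Hypothetical stand-in for one scheduling segment; allocates temporary data."""
    temp = [float(i) for i in range(n_rows)]
    return sum(temp)


def run_segments(segments, pause_seconds=0.0):
    """Run each segment, record its peak traced memory, then collect garbage.

    Returns a list of (name, peak_bytes, bytes_held_after_gc) tuples.
    """
    report = []
    tracemalloc.start()
    try:
        for name, n_rows in segments:
            tracemalloc.reset_peak()  # measure the peak for this segment only
            run_segment(name, n_rows)
            _, peak = tracemalloc.get_traced_memory()

            gc.collect()              # explicit collection at the end of the segment
            time.sleep(pause_seconds) # optional pause before the next segment

            current, _ = tracemalloc.get_traced_memory()
            report.append((name, peak, current))
            print(f"{name}: peak={peak / 1e6:.2f} MB, "
                  f"held after gc={current / 1e6:.2f} MB")
    finally:
        tracemalloc.stop()
    return report


if __name__ == "__main__":
    # Segment names and sizes are illustrative assumptions.
    run_segments([("work", 200_000), ("school", 50_000)])
```

If the amount held after gc.collect() keeps growing from one segment to the next, something is still referencing the temporary data, which is the kind of pattern described above for the sharrow cache.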
Takeaways
Tracing takes a lot of memory. By necessity, tracing will always require more resources.
Right now, we are looking at going from a 500 GB memory peak down to 260 GB with the data type updates and tracing turned off in sharrow.
Next steps
Sijia to look into what is held in memory between segments with sharrow.
Sijia to clean up the data type code changes
TO DO: Sijia to think about the appropriate way to document these investigations.