Project Meeting 2023.06.27 - ActivitySim/activitysim GitHub Wiki

Agenda

  • Admin: Need written approval of contract extension this week
  • Phase 8 Technical Task Update: Data Type (WSP)

Meeting Notes

Admin

  • Reminder that written approval of the contract extension is needed this week.

Phase 8 Technical Task Update: Data Type (WSP)

Presentation: ActivitySim Data Types Task - Progress Update 06-27-2023.pptx

  • Strings in ActivitySim
    • Examples include time periods and tour purposes
    • Five string columns take over 150 GB of memory (out of 254 GB used in total). This could be reduced significantly by storing the values as integers.
    • Strings are in a lot of places: UECs, settings, etc.
  • How to update the data types
    • Replacing strings with Enums
    • Need to go through all relevant files
    • Impacts downstream submodels
  • Approach
    • Defining all the strings as enums.
      • An enum is a class that maps nominal string values to integer values. Each member has a name that can be referenced, and all members of an enum can be listed or looked up, much like entries in a dictionary.
    • Import asim_enum
    • Use enum mapping instead of strings and store the integer values.
    • When the strings are replaced with integers, you need to update the downstream UECs.
    • Pass the enums through to the UECs so that expressions can reference them directly.
    • All the implementations will need to change their UECs.
  • Other considerations
    • Categorical data types in pandas
      • Sijia will look into pandas categorical data types. She started with enums because they would be compatible with the input checker and other technical updates. It does not have to be either enums or pandas categoricals; it can be both. Most enums have no meaningful ordering. The exception is time period, which matches a dimension of the skims array: if the integer codes line up with the array index positions, there would be an enormous performance benefit.
    • Backwards compatibility
      • Outside of sharrow, backwards compatibility may be possible with this update. Jeff has looked into it but has not fully thought it through.
      • We need to change a lot of specs if we switch to the categoricals.
  • Tracing
    • In tracing, values will appear as the enum object rather than the integer value.
  • Next step is to revisit the topic in 2 weeks.
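
The enum approach and the pandas categorical alternative discussed above can be sketched as follows. This is a minimal illustration, not the actual implementation: the `TimePeriod` enum, its member names, the sample data, and the `skim_by_period` array are all hypothetical, and the contents of `asim_enum` are not reproduced here. It assumes only the standard library `enum` module, NumPy, and pandas.

```python
from enum import IntEnum

import numpy as np
import pandas as pd


class TimePeriod(IntEnum):
    """Hypothetical enum; values are chosen to double as array positions."""
    EA = 0
    AM = 1
    MD = 2
    PM = 3
    EV = 4


names = [p.name for p in TimePeriod]

# Made-up string column, stored as Python string objects.
rng = np.random.default_rng(0)
periods = pd.Series(rng.choice(names, size=100_000), dtype=object)

# Option 1: store the enum's integer value instead of the string.
name_to_value = {p.name: p.value for p in TimePeriod}
as_int = periods.map(name_to_value).astype("int8")

# Option 2: pandas categorical, with categories listed in enum order
# so that the categorical codes equal the enum values.
cat_dtype = pd.CategoricalDtype(names, ordered=True)
as_cat = periods.astype(cat_dtype)

# Compare memory footprints of the three representations.
string_bytes = periods.memory_usage(deep=True, index=False)
int_bytes = as_int.memory_usage(deep=True, index=False)
cat_bytes = as_cat.memory_usage(deep=True, index=False)
print(f"strings: {string_bytes:,} B, int8: {int_bytes:,} B, categorical: {cat_bytes:,} B")

# Because the integer codes line up with array positions, they can index
# a (hypothetical) per-period array directly, with no string lookup.
skim_by_period = np.array([10.0, 25.0, 15.0, 30.0, 12.0])
travel_time = skim_by_period[as_int.to_numpy()]

# Integer values can be turned back into readable names when needed.
readable = as_int.map(lambda v: TimePeriod(v).name)
```

Listing the categories in the same order as the enum members keeps the two representations interchangeable, which is one way the "it can be both" option from the discussion might work in practice.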