2023.08.28 - ovis-hpc/ovis-wiki GitHub Wiki

  • Best Practices
    • Discuss a path forward for collaboratively developing a common configuration approach for resilient large-scale deployment in HPC environments. Example
    • Put a sub-working group together to:
      • focus on the identification, development, and testing of promising solutions
      • report back to the larger group as progress warrants
  • Multi-Tenancy Discussion
    • Define realistic use-case scenarios to drive data collection/attribution at sub-node and possibly subsystem levels (e.g., CPU, GPU, and memory resources)
    • Identify gaps in how the current LDMS ecosystem accommodates multi-tenant use-case scenarios
    • Put a sub-working group together to:
      • focus on the identification of resources to engineer and develop solutions to fill gaps
      • provide timeline estimates
      • report back to the larger group as progress warrants
  • Please be on the lookout for a Doodle poll link for each working group from Jim.
  • More details on the discussion can be found in the slides here.