Discuss a path forward for collaboratively developing a common configuration approach for resilient large-scale deployment in HPC environments. Example:
Put a sub-working group together to:
focus on the identification, development, and testing of promising solutions
report back to the larger group as progress warrants
Multi-Tenancy Discussion
Define realistic use-case scenarios to drive data collection/attribution at sub-node and possibly subsystem levels (e.g., CPU, GPU, and memory resources)
Identify gaps in the current LDMS ecosystem in accommodating multi-tenant use-case scenarios
Put a sub-working group together to:
focus on the identification of resources to engineer and develop solutions to fill gaps
provide timeline estimates
report back to the larger group as progress warrants
Please be on the lookout for a Doodle poll link from Jim for each working group.
More details on the discussion can be found in the slides here.