Meeting 2024 01 04 - openpmix/openpmix GitHub Wiki

January 4, 2024 OpenPMIx-devel call notes

Attendees

  • Samuel Gutierrez (LANL)
  • Ralph Castain (Nanook)
  • Tim Wickberg (SchedMD)
  • Michael Karo (Altair)
  • Howard Pritchard (LANL)
  • Rajat Bhattarai (TNTech)
  • Thomas Naughton (ORNL)

Notes

  • Released new PMIx 4.2 and PRRTE 3.1 to support the new Open MPI 5.0.1 release

  • Some items emerged from this, mainly command-line related

  • A few other items came up over the holidays

    • MPI4PY issue
      • Sam looked into it w/ Howard and is now able to reproduce the issue
      • Ralph: The issue is related to a realloc / double-insert
      • Sam: Will look into some of these things and ask questions as needed
      • See also: https://github.com/open-mpi/ompi/issues/12195
      • Would be good to have a PMIx unit test that checks for this, to avoid hitting the problem again in the future
      • Q: Is there a related MPI test that is similar?
      • Check the sessions test under the ibm suite in the ompi-private tests; it will need to be modified further.
      • The test is not part of the default set of OMPI CI checks.
      • On the PMIx/PRTE side, we do have CI checks, so we could test it there if we had a PMIx-level set of tests (instead of testing at the higher MPI4PY level)
    • Scheduler Integration
      • Ralph is trying to add some of the allocation support into PRRTE
      • Found some old ORTE code that cleaned up and wiped out the session directory, including anything still in use by others such as the scheduler. Working to revamp this now.
      • Working on this cleanup first, then will return to the other integration work.
      • Previously had a single top-level session dir, but that does not have enough specificity. So each prte tool (scheduler, tool, etc.) will now have its own top-level session dir. The job family dir under that will go away. This simplifies all the cleanup b/c there is no sharing; each tool just cleans up its own dir.
      • All of this is specific to the session directory definitions, but it revamps how they are set up.
      • Plan for this to go into prte 4.0 branch
      • Plan for pmix-6.0 and prte-4.0 in the coming months. That will be it; only minor, slow changes after that.
      • Looking for volunteers for active development! Otherwise looking like things will be rather dormant.
  • Ralph: Remarks at the quarterly PMIx meeting... to paraphrase: why is there a PMIx Standard vs. just the Library?

    • Tim: Is OpenPMIx meant to lag or lead the Standard? What is the benefit/goal of having the two separate?
    • Questions from others over the recent period raised the issue of how PMIx Standard participants actually consume the Standard.
    • Raises the question of where things should be going based on which projects are using the Standard... leading to questions of how much effort to spend keeping the two in sync. If the Library has it all, then why worry about the Standard? Again, trying to succinctly capture the notion/idea in the notes here.
    • Tim: Previously tried to highlight the interaction between Standard and Library... the Standard had a better notion of what is "wanted", and that could inform the roadmap for bringing Library and Standard in sync. Gist: maybe have something like the Standards body drive the Library devel, to help with release management and release engineering.
    • Possibly some resources will come on board and help keep the development alive.
    • General discussion hovered around sustainability and time allocation for Standard and Library (devel): where the best value lies, but also where there appears to be motivation.
    • Good to have design doc elements to help clarify things; incorporating that info in the Standard could be useful and avoids some of the (sometimes) more onerous standard language.
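
As a rough illustration of the session directory revamp noted under Scheduler Integration, the sketch below shows why giving each tool its own top-level directory makes cleanup simple: a tool removes only its own tree and cannot wipe out anything another tool is still using. This is not PRRTE code. The function names, the `<tool>.<pid>` naming scheme, and the temporary base path are all assumptions made up for the example.

```python
import os
import shutil
import tempfile


def make_session_dir(base, tool_name, pid):
    """Create a private top-level session directory for one tool.

    The "<tool>.<pid>" naming is hypothetical, chosen only for this sketch.
    """
    path = os.path.join(base, f"{tool_name}.{pid}")
    os.makedirs(path, exist_ok=False)  # fail loudly if the name is already taken
    return path


def cleanup_session_dir(path):
    """Remove only this tool's own tree; nothing is shared with other tools."""
    shutil.rmtree(path, ignore_errors=True)


def demo():
    """Show that one tool's cleanup leaves another tool's directory intact."""
    base = tempfile.mkdtemp(prefix="session-sketch-")
    try:
        scheduler_dir = make_session_dir(base, "scheduler", 1001)
        launcher_dir = make_session_dir(base, "launcher", 1002)
        cleanup_session_dir(launcher_dir)  # the launcher exits and cleans up itself
        return os.path.isdir(scheduler_dir), os.path.isdir(launcher_dir)
    finally:
        shutil.rmtree(base, ignore_errors=True)  # remove the scratch area
```

Calling `demo()` creates two sibling directories, removes one, and reports whether each still exists. With a single shared top-level directory, the same cleanup would have needed to know who else was still using it.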