Meeting 2023 08 03 - openpmix/openpmix GitHub Wiki

August 3, 2023 OpenPMIx-devel call notes

Attendees

  • Ralph Castain (Nanook)
  • Howard Pritchard (LANL)
  • Samuel Gutierrez (LANL)
  • Rajat Bhattarai
  • Thomas Naughton (ORNL)

Notes

  • pmix-4.4 release

  • pmix-4.5 likely to be released to include a few bugfixes that arose

  • then plan to move on to pmix-5.0

  • prrte is more or less in a holding pattern

    • some work on the ompi side to adjust the docs
    • trying to reorganize the docs to make better use of man pages while keeping the context-sensitive help
    • may release prrte-3.1 soon and delay the doc sorting
    • a few items have come up that will be fixed in prrte-3.1
  • a dynamic resource management issue arose, so that work is likely to land in prrte-4.x

    • two apps trying to run
    • appA adds nodes and then appB tries to launch with 3 ppn; there is a race on the ordering. If appA goes first, appB might get confusing results because appA influences appB's info
    • would like to be deterministic, but not easy to resolve
    • discussed possible ideas; some options could resolve it but would need a lot of bookkeeping
    • should appB see the extended allocation, or should appB only deal with the allocation that it started with... or is another flag needed to notify about growth
    • in the current context, the spawn error is really a matter of resources being added in the midst of an asynchronous resource change: the daemons change and the spawn fails because it is caught in the middle of this asynchronous update.
    • The PMIx standard is silent on this.
    • Actually, this is not part of the standard; it is an implementation matter.
    • Current thinking is that you might serialize things when a DVM modification occurs. But what happens if a change occurs after a job has already been mapped?
    • Several corner cases that need to be worked out.
    • Some thinking/discussion about the concept of growth with SLURM: in that context, new nodes would be a new session, and you can link the old and new sessions so spawns run across the combined sessions.
  • Discussion about different scenarios/options for the new scheduler capability

  • Possibly useful to have a follow-up meeting to discuss some of the elastic resource mgmt items
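
Illustration of the ordering race discussed above. This is a toy sketch, not PRRTE or PMIx code: `ToyDVM`, `map_ppn`, `run`, and the node names are all hypothetical, and the lock-plus-snapshot approach is only one assumed reading of the "serialize things when a DVM modification occurs" idea. It shows that appB's 3 ppn launch gets a different result depending on whether appA's growth lands first, and that a snapshot pins appB to one consistent view of the node pool.

```python
from dataclasses import dataclass, field
from threading import Lock


@dataclass
class ToyDVM:
    """Hypothetical stand-in for a DVM's node pool; not PRRTE code."""
    nodes: list = field(default_factory=lambda: ["n0", "n1"])
    _lock: Lock = field(default_factory=Lock, repr=False)

    def add_nodes(self, new_nodes):
        # Serialize modifications so a mapper never sees a half-updated pool.
        with self._lock:
            self.nodes = self.nodes + list(new_nodes)

    def snapshot(self):
        # A job maps against an immutable copy taken under the same lock.
        with self._lock:
            return tuple(self.nodes)


def map_ppn(nodes, ppn):
    """Place `ppn` processes on every node in the given view."""
    return {node: ppn for node in nodes}


def run(order):
    """Replay the two operations in a fixed order; return appB's process count."""
    dvm = ToyDVM()
    result = None
    for step in order:
        if step == "appA_grow":
            dvm.add_nodes(["n2", "n3"])
        elif step == "appB_spawn":
            result = sum(map_ppn(dvm.snapshot(), 3).values())
    return result


grow_first = run(["appA_grow", "appB_spawn"])   # appB maps onto 4 nodes
spawn_first = run(["appB_spawn", "appA_grow"])  # appB maps onto 2 nodes
print(grow_first, spawn_first)
```

Serializing the modification makes each individual outcome consistent, but the two orderings still disagree, which is the open question in the notes: whether appB should see the extended allocation or only the one it started with.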