Meeting 2023 06 01 - openpmix/openpmix GitHub Wiki

June 1, 2023 OpenPMIx-devel call notes

Attendees

  • Aurelien Bouteiller (UTK)
  • Howard Pritchard (LANL)
  • Tim Wickberg (SchedMD)
  • Samuel Gutierrez (LANL)
  • Ralph Castain (Nanook)
  • Michael Karo (Altair)
  • Thomas Naughton (ORNL)

Notes

  • Admin: Thomas to look into removing the "host admit" setting for the Teams meeting

  • openpmix v4.2 release

    • No known issues/objections
    • DStore problem with multiple session init/fini, but having trouble reproducing the issue
    • Cannot find the location where the unlink is occurring, so the suggestion is simply to wait for an update on whether gds=hash still does not work
  • prrte v3.0

    • A spawn issue w.r.t. using the HAN collective component; not sure exactly what is going on
      • https://github.com/open-mpi/ompi/issues/11724
      • George @ UTK is hitting the issue
      • Aurelien suggests we can probably not hold up the release and fix this in a subsequent release
      • Aurelien mentioned the issue comes from locality info being lost/missing, which might happen in failure paths
      • Ralph: something related to the interleaving of the spawn/split etc.; the locality of the underlying process gets confused. Not sure why/where. If the splits/spawn are removed, it works fine.
      • Aurelien: there is some packing of the info during the split/spawn and it is a subset of the COMM_WORLD during the split
      • The basic passing of data is working. But maybe something environment specific?
      • George is working with what is in OMPI main; Ralph's tests were with the latest PRRTE/PMIx head
    • Maybe defer other items to v3.0.1 or v3.1.0
  • Question: Can we update the submodule pointers in OMPI?

    • TODO: Update prrte/pmix submodule pointers on Open MPI

    • TODO: TJN to create an OMPI ticket to update submodule pointers

    • A few OMPI tickets with failures that are addressed/fixed by updating the submodule pointers

  • Recap of discussion related to NIC/GPU distance selection

    • GDR needs to be on the same root complex
    • But still need to pick the NIC/GPU affinity
    • Need to ensure that locality questions are considered w.r.t. GPU/NIC process placement
  • Aurelien: Lots of PMIx standard folks on this call