Meeting 2022 04 28 - openpmix/openpmix GitHub Wiki

04/28/2022 OpenPMIx-devel call notes

Attendees

  • Samuel Gutierrez (LANL)
  • Austen Lauria (IBM)
  • Michael Karo (Altair)
  • Ralph Castain (Nanook)
  • Aurelien Bouteiller (UTK)
  • Tommy Janjusic (NVIDIA)
  • Thomas Naughton (ORNL)

Notes

  • Ralph added the first commit of the memory footprint refactor

    • e.g., converting keys to integers. This is done at the lowest level instead of at a higher level, which is less efficient but works and reduces the footprint.
    • Sam's work is not encumbered by these changes. At the level where he is working, keys still come in as strings, so he is not impacted by the change.
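
The key-to-integer conversion mentioned above can be pictured with a small sketch. This is not the code from Ralph's commit: `key_table_t`, `key_to_index`, `index_to_key`, and the fixed capacities are hypothetical names and limits chosen for illustration, and the key strings in the usage note are only example inputs. The idea shown is that each distinct key string is stored once, and everything else refers to it by a small integer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_KEYS    64   /* hypothetical capacity for this sketch */
#define MAX_KEY_LEN 64   /* hypothetical maximum key length, including NUL */

/* Hypothetical registry: one stored copy per distinct key string. */
typedef struct {
    char     names[MAX_KEYS][MAX_KEY_LEN];
    uint32_t count;
} key_table_t;

/* Return the integer index for `key`, registering it on first use.
 * Returns UINT32_MAX if the table is full or the key is too long.
 * The lookup is a plain linear scan to keep the sketch short. */
uint32_t key_to_index(key_table_t *t, const char *key)
{
    assert(NULL != t && NULL != key);
    for (uint32_t i = 0; i < t->count; i++) {
        if (0 == strcmp(t->names[i], key)) {
            return i;
        }
    }
    if (t->count >= MAX_KEYS || strlen(key) >= MAX_KEY_LEN) {
        return UINT32_MAX;
    }
    strcpy(t->names[t->count], key);
    return t->count++;
}

/* Reverse lookup for callers that still work with string keys.
 * Returns NULL for an index that was never registered. */
const char *index_to_key(const key_table_t *t, uint32_t idx)
{
    assert(NULL != t);
    return (idx < t->count) ? t->names[idx] : NULL;
}
```

With a zero-initialized table, calling `key_to_index(&t, "pmix.rank")` twice returns the same index both times, so stored entries can carry a 4-byte integer instead of a separate copy of the string.
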
  • Ralph - working on changes for Howard related to MPI Sessions needs, including the need for a different value for different CIDs.

    • had to rework some things because the prior pass had gone a little stale
    • requires updates in the GDShash code; hopefully this will not impact Sam or others
    • Planning to have this ready in a few weeks or so
  • Ralph - next resource tracking pieces

    • issue: with the new needs from dynamic programming/use cases, the tracking is not accurate enough, e.g., for tracking which cores are in use or free
    • having multiple jobs in the system causes problems for the way tracking was done. Need a new approach that does not chew up lots of memory and that tracks independently of the hw topologies.
    • current mapping changes are unlikely to be committed because they do not solve the root problem, they just patch symptoms
    • will do this work after Howard/Session req
    • Sam - could maybe leverage work from another project here. They are using resource bitmasks, which may be useful.
    • Ralph - similar to what we are doing, but trying to reuse some of hwloc's capabilities for this mapping and, when mapping, use these bitmasks to mark utilization. Need an efficient method to track at different levels. Add a node-level bitmask to the Node object in prte and track there without hitting the caching problem we face now.
    • These changes will touch every mapper, so will take a while.
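
A minimal sketch of the node-level bitmask idea discussed above, under stated assumptions. The notes mention reusing hwloc's capabilities; to stay self-contained, this sketch swaps in a plain `uint64_t` (so at most 64 cores per node) rather than an hwloc bitmap. `node_usage_t` and the `node_*` functions are hypothetical and are not PRRTE's Node object or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-node usage record: bit i set means core i is claimed. */
typedef struct {
    uint64_t cores_in_use;
    unsigned ncores;        /* must be <= 64 in this sketch */
} node_usage_t;

/* Count the cores on the node that no job has claimed. */
unsigned node_free_cores(const node_usage_t *n)
{
    assert(n->ncores <= 64);
    unsigned nfree = 0;
    for (unsigned i = 0; i < n->ncores; i++) {
        if (0 == (n->cores_in_use & (UINT64_C(1) << i))) {
            nfree++;
        }
    }
    return nfree;
}

/* Claim the `want` lowest-numbered free cores for one job.
 * On success, mark them in the node mask and return the job's own mask
 * in *claimed. If too few cores are free, leave the node untouched. */
bool node_claim_cores(node_usage_t *n, unsigned want, uint64_t *claimed)
{
    assert(n->ncores <= 64);
    uint64_t mask = 0;
    unsigned got = 0;
    for (unsigned i = 0; i < n->ncores && got < want; i++) {
        uint64_t bit = UINT64_C(1) << i;
        if (0 == (n->cores_in_use & bit)) {
            mask |= bit;
            got++;
        }
    }
    if (got < want) {
        return false;
    }
    n->cores_in_use |= mask;
    *claimed = mask;
    return true;
}

/* Return a job's cores to the node when the job completes. */
void node_release_cores(node_usage_t *n, uint64_t mask)
{
    n->cores_in_use &= ~mask;
}
```

Each job keeps only the mask it was handed, so several jobs can share a node and release their cores independently, at a cost of one word per node.
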
  • Ralph - dynamic resource folks

    • Launch applications where a set of "drones" resides
    • Launch a set of processes where each of these "drones" resides.
    • There is a co-location capability, but it is currently tied to the debugger scenario and lacks some bookkeeping for use with app processes. Will require generalizing the debugger scenario.
    • Plan to work on this after Howard/Session item, before Mapper/ResourceTracking item
    • Slightly more complicated with OMPI + PMIx, but a little easier with MPICH + PMIx for the mpicc wrapper
    • If using internal (embedded) pmix, then still need to add -lpmix
    • This is a bit of a barrier for OMPI + PMIx users. It needs to be solved so that users can easily use this item.
    • There are some workarounds for now, but looking to OMPI members to work this item
  • Ralph - finding some inefficiencies in pmix lib

    • e.g., extraneous free/malloc
    • removed some of those and added a comparison function to help avoid dups in the hash system, eliminating these inefficiencies
    • branches have drifted, so it may be more difficult to pull those over
    • some are genuine bug fixes (fixed in master) and will need to be backported to v4.2.x (likely also v4.0.x and v4.1.x, because those are actual bugs)
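
One way to read the comparison-function item above: compare an incoming value with what is already stored, and skip the free/malloc cycle when they match. The sketch below illustrates that reading only. `entry_t`, `value_equal`, and `entry_store` are hypothetical names, values are simplified to C strings, and none of this is taken from the pmix hash code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hash-table entry holding a heap-allocated string value. */
typedef struct {
    char *key;
    char *value;
} entry_t;

/* Comparison function: true when the two values are identical. */
bool value_equal(const char *a, const char *b)
{
    return 0 == strcmp(a, b);
}

/* Store `value` in the entry.
 * Returns 0 if an identical value was already present (nothing is freed
 * or allocated), 1 if a new copy was stored, -1 on allocation failure. */
int entry_store(entry_t *e, const char *value)
{
    assert(NULL != e && NULL != value);
    if (NULL != e->value && value_equal(e->value, value)) {
        return 0;                 /* duplicate: skip the free/malloc pair */
    }
    size_t len = strlen(value) + 1;
    char *copy = malloc(len);
    if (NULL == copy) {
        return -1;                /* existing value is left in place */
    }
    memcpy(copy, value, len);
    free(e->value);               /* free(NULL) is a no-op on first store */
    e->value = copy;
    return 1;
}
```

Storing the same value twice leaves the original allocation in place, so repeated puts of unchanged data cost one comparison instead of a free and a malloc.
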
  • Ralph - slurm issues resolved

    • Slurm is now compiling up to pmix-master
    • Plan to eliminate the slurm-pmix branch, as the changes will be upstreamed to Slurm and available in their next release (May 2022)
  • Ralph - prrte blocker flags

  • Ralph - other items

    • European project w/ dynamic PMIx work is continuing
  • Ralph - ompi notifications

    • Due to a snafu in the notification settings (changed now, and thought to be fixed), Ralph did not get some notifications from the ompi GitHub even when explicitly mentioned (@rhc54)
    • Should be fixed now, but if you are not getting a response on something, please ping directly (some pre-existing mentions did not generate a notification)
  • Ralph/Thomas - pmix standard clarification

    • Will add clarifying text on job/app location of attributes for spawn
    • Can query for supported attributes, but the result does not indicate whether an attribute is recognized via the job or the apps set of info keys.
    • Clarification on PMIx_Spawn info attributes #400
    • Ralph - note that pmix_server_dyn.c will have the set of attributes recognized by the apps area
  • Aurelien - working on variable process sets (psets)

    • Aurelien/Howard are working on variable psets to support the MPI Sessions work
    • MPI-4 sessions only require static psets, but they are working on future functionality that could employ variable psets, so expect that work to come back to prte/pmix as needed