Meeting 2025 02 06
February 6, 2025 OpenPMIx-devel call notes
Attendees
- Aurelien Bouteiller (UTK)
- Sonja Happ (ParTec)
- Simon Pickartz (ParTec)
- Norbert Eicker (JSC)
- Stephan Krempel (ParTec)
- Yannik Muller (ParTec)
- Ralph Castain (self)
Agenda
- significant implementation changes to PMIx_Resolve_peers[node] and PMIx_Query_info
- dictionary confusion when dealing with clients using significantly different versions and/or implementations
- documentation policy changes (focusing on adding explanations to the “docs”)
- release plans for PMIx v5/6 and PRRTE v3/4
Notes
PMIx_Resolve_peers, PMIx_Query_info
- https://github.com/openpmix/openpmix/issues/3359
- can this be resolved locally? It can't: the request goes to the server; if the host supports Query, the server passes it to the host, which passes the info about the peers back down; if the host does not support Query, the server answers from its own local knowledge and passes that down (a minimal client-side sketch follows this list)
- this is useful when the query must be answered even though the server did not receive the info at job startup (or when additional peers have been added since then)
- some hosts would give only node-local peers to the server on startup
- this could save having to connect the namespaces just to obtain the peer map, while still avoiding distributing info that is generally never used but may be needed in some cases (hence the explicit query)
- OMPI has a use case with intercomm merges where it does need to collect the peers in order to create the network endpoints; nodes that were part of different original intercomms would otherwise use different node maps, and that would deadlock (creating shared-memory segments would notably be problematic here)
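For context, a minimal client-side sketch of the call under discussion (not from the meeting; using gethostname() for the node name is an assumption, since the name must match what the server/RM knows the node as):

```c
/* Sketch: resolve the peers of our own namespace on a given node. As
 * discussed above, this is not answered purely from the client's local
 * cache: the request goes to the server, which either asks the host RM
 * (if it supports Query) or answers from its own knowledge. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pmix.h>

int main(void)
{
    pmix_proc_t myproc;
    pmix_proc_t *peers = NULL;
    size_t npeers = 0;
    char hostname[256];
    pmix_status_t rc;

    rc = PMIx_Init(&myproc, NULL, 0);
    if (PMIX_SUCCESS != rc) {
        fprintf(stderr, "PMIx_Init failed: %s\n", PMIx_Error_string(rc));
        return EXIT_FAILURE;
    }

    /* assumption: gethostname() matches the node name known to the RM */
    gethostname(hostname, sizeof(hostname));

    rc = PMIx_Resolve_peers(hostname, myproc.nspace, &peers, &npeers);
    if (PMIX_SUCCESS == rc) {
        printf("rank %u: %zu peer(s) of %s on node %s\n",
               myproc.rank, npeers, myproc.nspace, hostname);
        PMIX_PROC_FREE(peers, npeers);
    } else {
        fprintf(stderr, "PMIx_Resolve_peers failed: %s\n", PMIx_Error_string(rc));
    }

    PMIx_Finalize(NULL, 0);
    return 0;
}
```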
shared-memory dictionary
- using integer keys to access the dictionary
- however the dictionary may be different if the client and server are running different versions
- multiple clients may be using different versions under the same server
- not using the shared-memory dictionary when cross-version support is required would resolve that problem
- but at the cost of duplicating the dictionary multiple times
- Ralph wants to try to fix cross-version support before falling back to 'duplicate the dictionary in memory', because cross-version deployments are more common than expected (ParTec confirms; Open MPI ships its own PRRTE)
- The job-level info is already fixed: the dictionary is provided by the server, so clients access it using the server's indexing; client-specific keys (when the client is newer than the server) are re-indexed at the end of the dictionary.
- When data is PUT, it is stored in the local view before being committed to the server, and that assigns an index that depends on the PUT call order (differing orders are not common, but could happen, notably with MPMD); the indices then differ between clients, so the server cannot use client-based indexing. To solve this, the modex info goes into the hash system as key-value pairs rather than index-value pairs (see the sketch after this list).
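A minimal sketch of the PUT/commit/fence/get flow described above, illustrating why retrieval is addressed by key string rather than by any per-client index; the key name "sketch.portno" is made up for illustration:

```c
/* Sketch: each client PUTs a value under a string key (the PUT order may
 * differ per client, e.g. under MPMD), commits, fences with data collection,
 * then GETs a peer's value by key. Retrieval is by key string, so the order
 * in which each client assigned local indices does not matter. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <pmix.h>

int main(void)
{
    pmix_proc_t myproc, peer;
    pmix_value_t value, *retrieved = NULL;
    pmix_info_t info;
    bool collect = true;
    uint32_t portno;
    pmix_status_t rc;

    if (PMIX_SUCCESS != (rc = PMIx_Init(&myproc, NULL, 0))) {
        fprintf(stderr, "PMIx_Init failed: %s\n", PMIx_Error_string(rc));
        return 1;
    }

    /* publish a value under a string key - held in the local view until committed */
    portno = 5000 + myproc.rank;
    PMIX_VALUE_LOAD(&value, &portno, PMIX_UINT32);
    PMIx_Put(PMIX_GLOBAL, "sketch.portno", &value);
    PMIx_Commit();

    /* exchange the committed data across the namespace */
    PMIX_INFO_LOAD(&info, PMIX_COLLECT_DATA, &collect, PMIX_BOOL);
    PMIx_Fence(NULL, 0, &info, 1);
    PMIX_INFO_DESTRUCT(&info);

    /* fetch the value rank 0 published, addressed purely by key */
    PMIX_PROC_LOAD(&peer, myproc.nspace, 0);
    if (PMIX_SUCCESS == PMIx_Get(&peer, "sketch.portno", NULL, 0, &retrieved)) {
        printf("rank %u sees rank 0 port = %u\n",
               myproc.rank, retrieved->data.uint32);
        PMIX_VALUE_RELEASE(retrieved);
    }

    PMIx_Finalize(NULL, 0);
    return 0;
}
```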
documentation policy changes
- The ASC25Q1 meeting was informed: Ralph is writing documentation in OpenPMIx rather than as 'standardese' wording
- The standard text can be written later based on that documentation; Ralph, being semi-retired, will not be the one doing it
- The Standard committee should review the docs to make sure they look legitimate and raise the alarm if not
Timeouts
- Many APIs (notably collectives) support a timeout attribute.
- Timeouts introduce race conditions and can complicate the implementation
- This attribute has been mostly ignored so far
- a fence can be completed locally; in that case it is the server's responsibility to deal with timeouts
- trying to get uniform, across-the-board behavior among APIs when timeouts occur; still WIP
- driving use case: OMPI wants group_construct to have a timeout, trying to generalize from there (a sketch of passing the attribute follows this list)
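A sketch of how a client would request the timeout attribute on a fence and on a group construct; how (or whether) a given server/host honors PMIX_TIMEOUT is exactly the behavior still being worked out, and the group name and timeout value here are arbitrary:

```c
/* Sketch: pass PMIX_TIMEOUT (in seconds) as a directive to a fence and to a
 * group construct. Implementations that ignore the attribute simply block
 * as before; those that honor it return PMIX_ERR_TIMEOUT. */
#include <stdio.h>
#include <pmix.h>

int main(void)
{
    pmix_proc_t myproc, wildcard;
    pmix_info_t info;
    pmix_info_t *results = NULL;
    size_t nresults = 0;
    int timeout = 30;   /* seconds - arbitrary value for the sketch */
    pmix_status_t rc;

    if (PMIX_SUCCESS != PMIx_Init(&myproc, NULL, 0)) {
        return 1;
    }
    PMIX_INFO_LOAD(&info, PMIX_TIMEOUT, &timeout, PMIX_INT);

    /* bounded fence across the namespace */
    rc = PMIx_Fence(NULL, 0, &info, 1);
    if (PMIX_ERR_TIMEOUT == rc) {
        fprintf(stderr, "rank %u: fence timed out\n", myproc.rank);
    }

    /* same directive on a group construct (the OMPI driving use case);
     * the wildcard rank asks for all procs of our namespace to participate */
    PMIX_PROC_LOAD(&wildcard, myproc.nspace, PMIX_RANK_WILDCARD);
    rc = PMIx_Group_construct("sketch-grp", &wildcard, 1, &info, 1,
                              &results, &nresults);
    if (PMIX_SUCCESS == rc) {
        PMIx_Group_destruct("sketch-grp", NULL, 0);
    } else if (PMIX_ERR_TIMEOUT == rc) {
        fprintf(stderr, "rank %u: group construct timed out\n", myproc.rank);
    }
    if (NULL != results) {
        PMIX_INFO_FREE(results, nresults);
    }
    PMIX_INFO_DESTRUCT(&info);

    PMIx_Finalize(NULL, 0);
    return 0;
}
```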
Release plans for PMIx v5/6 and PRRTE v3/4
- PMIx v6 is a fork of master and has new things that are not in v5 (the group ops in v5 have problems; group-construct is new); a back-port would be expensive, so it is not planned
- PMIx v5 is still receiving bug fixes but no new features; there are 6-7 bug fixes in the branch that have not been released
- distro packagers don't like releases that come too often; they prefer a cadence of roughly 6 months
- When do we stop supporting v5?
- ParTec asks: What is the cost of upgrading from v5 to v6?
- it should be straightforward: there are new features, but the old APIs have not changed; the group ops have been modified quite a bit, so that would be the area of pain, if any
- rework of resolve-peers: does this go in v6 or v5? It is not a bug but a limitation, so the inclination is to limit it to v6
Issue reported, possibly related to the shmem module
- https://github.com/openpmix/openpmix/issues/3433
- https://github.com/open-mpi/ompi/issues/12993 %TODO: verify this is the correct link for the same issue in ompi
- ParTec can reproduce, looking into it
- Could be an uninitialized variable; the length of the allocation could be wrong: it is checked for 0 but not for negative values, and should probably be a size_t (a purely illustrative sketch follows)
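Purely illustrative sketch, not the actual openpmix code: the hypothesized failure mode of a signed length that is checked against zero but not against negative values, contrasted with a size_t length:

```c
/* Illustration only - not openpmix source. A negative signed length that
 * slips past a "== 0" check becomes a huge size_t when handed to malloc();
 * carrying the length as size_t (and validating it) avoids the sign issue. */
#include <stdlib.h>

static void *alloc_buggy(int len)
{
    if (0 == len) {                 /* only the zero case is rejected... */
        return NULL;
    }
    return malloc((size_t)len);     /* ...so len = -1 requests SIZE_MAX bytes */
}

static void *alloc_fixed(size_t len)
{
    if (0 == len) {                 /* an unsigned length cannot go negative */
        return NULL;
    }
    return malloc(len);
}

int main(void)
{
    void *p = alloc_buggy(-1);      /* almost certainly fails (returns NULL) */
    free(p);
    p = alloc_fixed(16);
    free(p);
    return 0;
}
```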
Future activities planning (from prior notes)
Planning: Dealing with daemon failures
- Previously had support in ORTE for daemon failures
- Interest in having these pieces restored
Planning: Examples need to be revisited
- Some examples are not working and need to be reviewed to make sure they are all in running condition; they should be documented along with how to run them.
Prev. TODO (from past meeting notes)
- TODO: agenda PMIx Forum Q125 "resolve peer" discussion, more fine-grained error reporting (success on operation but results are empty, etc.). See also: openpmix:#3359
- TODO: Update GoogleGroup link on OpenPMIx www, https://groups.google.com/g/pmix?pli=1
- TODO: migrate to self-hosted captcha page for meeting info: https://openpmix.org/captcha (Note: also migrate PMIx Standard to self-hosted captcha: https://pmix.org/captcha)
TODO
- tell Thomas to change the calendar invite to the 13th of March
- Document difference between v5 and v6
Next meeting
- Move the next meeting to the 13th of March, as the originally planned date conflicts with the MPI Forum