2022 04 11 webex joint ftwg - mpiwg-sessions/sessions-issues GitHub Wiki
# 04/11/22 meeting notes for joint FT/Sessions WGs meeting
Attending: Howard Pritchard, Brian Smith, Trupeshkumar Patel, Aurelien Bouteiller, Dan Holmes, Isaias Urena, Thomas Hines, Grace Nansamba
Agenda items
- Continue discussion concerning agreement (see the notes at the bottom of the Miro document’s note section: https://miro.com/app/board/o9J_l_Rxe9Q=/). In particular, we wanted to hear from Martin Schreiber about their asymmetric use of process set names.
- Topics from the FT WG (maybe reinit + sessions)?
Notes
Isaias described two ideas he is aware of for versioning process set names. The first approach is explicit versions for sessions. The second approach is to close the session and open a new one, in which case we could hide the version name from the application (mpi_world0, mpi_world1, ...). Will MPI internally know about these versions of world? If A thinks B is in the process set, then B must think that A is in the process set; this is a question of scope. One option is a background consistency model such that when a process set is available, all processes in the set will "see" it.
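The symmetry requirement above (if A sees B in the process set, B must see A) can be stated as a small check. This is only a sketch for discussion: the `view` matrix and the function name `pset_views_symmetric` are hypothetical and not part of MPI, and a real implementation would not have a global view of every process's membership beliefs.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model: view[a * n + b] is true when process a believes
 * that process b is a member of the process set under discussion.
 * Returns true only if every such belief is mutual.
 */
bool pset_views_symmetric(const bool *view, size_t n)
{
    for (size_t a = 0; a < n; a++) {
        for (size_t b = a + 1; b < n; b++) {
            /* A asymmetric pair means a and b disagree on membership. */
            if (view[a * n + b] != view[b * n + a]) {
                return false;
            }
        }
    }
    return true;
}
```

A consistency model of the kind discussed would aim to guarantee that this check holds by the time the process set becomes visible to any of its members.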
Example: ocean and atmosphere components that are going to use MPI_Intercomm_create_from_groups. All processes would need to be able to "see" both the ocean and the atmosphere process sets. Should a process only be able to see the process sets in its "job"? Or maybe also system-level ones, like rack1, rack2, etc.?
Isaias asks whether we have multiple proposals or just one. Dan is of the opinion that we essentially have only one at this point. We still think that a new session handle is needed to pick up changes in process sets. We still need to define a comparator between process sets from old and new sessions. Notification of changes is a separate concern. Should MPI provide the new session handle, or should the app make a call to MPI_SESSION_INIT? This still needs to be decided. Psets within a session have to be immutable. Martin Schreiber doesn't like this idea of pset immutability. Dan clarifies that what he means is that once a pset is visible, it can't be changed; new psets can, however, appear within a given session. Discussion of world1 and world2 process set names.
Aurelien has a markup example. This gets back to the version mismatch problem. Idea: when invoking MPI_COMM_CREATE_FROM_GROUP, all groups derived from older psets will be revoked. Discussed incorporating an attempt number into the stringtag argument of MPI_COMM_CREATE_FROM_GROUP. Discussed the ordering of versions: what if we don't use a numeric version, but something else? We need a way to compare process sets with the same base name and decide which is newer. Sticking with numbers would be simpler, but we would need to state that X is newer than Y if the numeric value of X is greater than that of Y.
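To make the numeric ordering and the attempt-number idea concrete, here is a minimal sketch in plain C. Everything in it is an assumption made for illustration: the naming scheme (a base name followed by a decimal version, as in mpi_world0 and mpi_world1), the tag format `base#attempt=N`, and the helper names `pset_split_version`, `pset_version_compare`, and `make_attempt_stringtag`. None of these are MPI functions or agreed conventions.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split "mpi_world12" into a base of length 9
 * ("mpi_world") and version 12. Returns 0 on success, -1 if the name
 * has no trailing decimal digits or consists only of digits.
 * Overflow of very long digit strings is not handled in this sketch.
 */
int pset_split_version(const char *name, size_t *base_len,
                       unsigned long *version)
{
    size_t len = strlen(name);
    size_t i = len;

    while (i > 0 && isdigit((unsigned char)name[i - 1])) {
        i--;
    }
    if (i == len || i == 0) {
        return -1;
    }
    *base_len = i;
    *version = strtoul(name + i, NULL, 10);
    return 0;
}

/*
 * Compare two versioned pset names that share a base name.
 * On success returns 0 and sets *result to a negative value, zero, or
 * a positive value when x is older than, equal to, or newer than y.
 * Returns -1 when the names are not comparable (different base names
 * or a missing version).
 */
int pset_version_compare(const char *x, const char *y, int *result)
{
    size_t bx, by;
    unsigned long vx, vy;

    if (pset_split_version(x, &bx, &vx) != 0 ||
        pset_split_version(y, &by, &vy) != 0) {
        return -1;
    }
    if (bx != by || strncmp(x, y, bx) != 0) {
        return -1;
    }
    *result = (vx > vy) - (vx < vy);
    return 0;
}

/*
 * Build a stringtag that carries an attempt number, in the assumed
 * format "base#attempt=N". Returns 0 on success, -1 if buf is too small.
 */
int make_attempt_stringtag(char *buf, size_t buflen,
                           const char *base_tag, unsigned attempt)
{
    int n = snprintf(buf, buflen, "%s#attempt=%u", base_tag, attempt);

    return (n >= 0 && (size_t)n < buflen) ? 0 : -1;
}
```

With this scheme, mpi_world10 compares as newer than mpi_world2 because the suffix is compared as a number rather than as text. One weakness is that names such as rack1 and rack2 would be misread as two versions of a single "rack" set, which is one argument for an explicit separator or a non-numeric version if the group goes that way.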
Martin Schreiber proposed an "MPI pset sync" operation, which would also be allowed to remove psets for an MPI process.
Snapshot of chat:
- 11:29:07 From Martin Schreiber to Everyone: My connection is very bad, so I'm dropping a chat message. I don't think that the multiple sessions workaround (as I see it) is the way how we should advance here because of the overheads. I think that we (e.g., Dominik) showed how to do it without this.
- 11:31:15 From Martin Schreiber to Everyone: I also think that we should weaken this statement that the set of psets within a session should be immutable (yet another sentence I understood ...).
- 11:34:41 From Martin Schreiber to Everyone: How about including an "MPI PSET sync" which is allowed to also remove psets for an MPI process?
- 11:39:21 From Martin Schreiber to Everyone: I just want to avoid creating every time a new MPI Session
- 11:40:54 From Martin Schreiber to Everyone: Wouldn't also creating a new MPI Session include this psets sync?
- 11:41:08 From Dan (Intel) to Everyone: Probably
- 11:41:36 From Dan (Intel) to Everyone: otherwise most of the pset names don't mean anything reliable
- 11:45:08 From Martin Schreiber to Everyone: Do we need a full sync of all psets? I think that only getting the information about one particular pset is sufficient. This should be sufficient for the master/root approach.
- 11:45:44 From Martin Schreiber to Everyone: If the pset is then not yet available locally, it needs to be looked up in the "global dictionary"