22Feb2017 - openpmix/openpmix GitHub Wiki

PMIx OpenMP/MPI/RM Working Group

Date

Feb 22, 2017

Attendees:

Ralph Castain, Intel
Aurelien Bouteiller, UTK
Geoff Paulsen, IBM
Josh Hursey, IBM
Thomas Naughton, ORNL
Geoffroy Vallee, ORNL

Minutes:

We had logistical problems with Intel's Skype teleconferencing system, and several people who had indicated a desire to attend probably missed the meeting because they lacked a calendar invite. Ralph is prohibited from sending invites to the mailing list (hackers are lurking), so he will generate invites for upcoming meetings and send them directly to the individual attendees. He will also arrange for a Webex meeting to replace Skype.

Given the light attendance, we didn't try to be too ambitious. The main thrust of the meeting was to begin setting objectives and scope for the discussion:

Geoffroy provided a brief overview of two related ECP projects, one focused on OpenMPI and the other on OpenMP. Three use-cases were identified:
- hybrid applications (MPI + OpenMP)
- avoiding conflicts when multiple jobs are on the same node, each using threading
- threaded applications with MPI underneath - e.g., PGAS using MPI
Aurelien noted that the coordination problem goes beyond just OpenMP and applies to any threaded application, even those that directly use threading primitives.
There was a general sense that the community doesn't really know how threads and MPI might best work together. Ralph suggested that we therefore focus on enabling experimentation, providing a set of APIs by which people can try a variety of things before settling on what works and what doesn't. We can then throw out things that prove "non-useful" without precluding something that later proves to be a "winner".
We suggested a few things that are likely of near-term interest:
- "announce" that a given model is active (e.g., OpenMP and MPI) so that they are at least aware of each other. Ralph noted that some PMIx folks were looking at exchanging that info with the RM at PMIx_Init, looking towards the day when the RM might provide some optimized support based on programming model.
- what processor is each thread using?
- allow users to experiment with placement of ranks and threads relative to GPUs and NICs
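
As a rough illustration of the "announce" idea in the list above, the following Python sketch shows one way two programming models in the same process could become aware of each other. Everything here is an assumption made for illustration: `ModelRegistry`, `ModelInfo`, and `announce` are hypothetical names, not part of any PMIx API, and the meeting did not settle on a design.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ModelInfo:
    """Hypothetical description of an active programming model."""
    name: str          # e.g. "MPI" or "OpenMP"
    num_threads: int   # threads the model intends to use in this process


class ModelRegistry:
    """Hypothetical in-process registry; not part of the PMIx API."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._models: Dict[str, ModelInfo] = {}
        self._listeners: List[Callable[[ModelInfo], None]] = []

    def announce(
        self,
        info: ModelInfo,
        on_peer: Optional[Callable[[ModelInfo], None]] = None,
    ) -> List[ModelInfo]:
        """Register a model and return the models that were already active.

        If on_peer is given, it is called for every model announced later.
        """
        with self._lock:
            already_active = list(self._models.values())
            listeners = list(self._listeners)
            self._models[info.name] = info
            if on_peer is not None:
                self._listeners.append(on_peer)
        # Notify outside the lock so callbacks may call back into the registry.
        for callback in listeners:
            callback(info)
        return already_active


# Usage: MPI announces first, then OpenMP; each learns about the other.
registry = ModelRegistry()
seen_by_mpi: List[ModelInfo] = []
registry.announce(ModelInfo("MPI", 1), on_peer=seen_by_mpi.append)
peers_of_openmp = registry.announce(ModelInfo("OpenMP", 8))
```

In this sketch the later arrival learns about earlier models from the return value, while earlier models learn about later ones through their callback. A real implementation would presumably also pass the same information to the RM, as Ralph described.
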
Geoffroy noted that there is an ECP project based at SNL looking at a new "resource manager" model where a "worker pool" of threads is created on each node, and applications request threads from within that pool. This allows for flexible resource utilization while managing conflicts. They are only just getting started, and obviously will need an app-to-RM communication capability. He suggested that there may be some overlap worth exploring.
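
To make the "worker pool" idea concrete, here is a minimal Python sketch of a per-node pool from which applications request workers. The minutes give no details of the SNL design, so the class name `WorkerPool`, its methods, and the policy of granting fewer workers than requested when the pool runs low are all assumptions chosen for illustration.

```python
import threading
from typing import Dict, List


class WorkerPool:
    """Hypothetical node-level pool of worker slots shared by applications."""

    def __init__(self, num_workers: int) -> None:
        self._lock = threading.Lock()
        self._free: List[int] = list(range(num_workers))
        self._owned: Dict[str, List[int]] = {}

    def request(self, app: str, count: int) -> List[int]:
        """Grant up to `count` free workers to `app`; may grant fewer or none."""
        count = max(count, 0)
        with self._lock:
            granted = self._free[:count]
            self._free = self._free[count:]
            self._owned.setdefault(app, []).extend(granted)
            return granted

    def release(self, app: str) -> None:
        """Return every worker held by `app` to the pool."""
        with self._lock:
            self._free.extend(self._owned.pop(app, []))
            self._free.sort()

    def available(self) -> int:
        """Number of workers currently unassigned."""
        with self._lock:
            return len(self._free)


# Usage: two jobs share a four-worker node without oversubscribing it.
pool = WorkerPool(4)
job_a_workers = pool.request("job_a", 3)
job_b_workers = pool.request("job_b", 3)  # only one worker is left
pool.release("job_a")
free_after_release = pool.available()
```

Because a worker is only ever owned by one application at a time, conflicts between jobs on the same node are avoided by construction; the app-to-RM communication mentioned above would be what carries the `request` and `release` calls in a real system.
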

Actions

Ralph to arrange for Webex meeting to replace Skype [DONE]
Ralph to send out recurring calendar event to people expressing interest in attending
Geoffroy to arrange meeting with SNL "RM" project to discuss possible collaboration on APIs for coordinating their "worker pool" of threads