8Mar2017 - openpmix/openpmix GitHub Wiki

PMIx OpenMP/MPI/RM Working Group

Date

Mar 08, 2017

Attendees:

Ralph Castain, Intel
Aurelien Bouteiller, UTK
Geoff Paulsen, IBM
Josh Hursey, IBM
Alexandre Eichenberger, IBM
Mark Allen, IBM
Howard Pritchard, LANL

Minutes:

We started with a brief review of the last meeting's minutes, and then initiated a discussion to identify the problems we want to address in this working group. The general objectives identified by the last meeting seemed to meet agreement here as well:

provide a mechanism by which each library can determine that the other library is in operation. It was noted that OpenMP currently looks for environment variables (envars) to indicate that MPI is active, but this isn't very reliable, as MPI envars change over time and across releases. Also, the presence of an envar doesn't mean that MPI_Init was called, since RMs routinely set envars "just in case" the application needs them.
allow libraries to share knowledge of each other's resources and intended resource utilization. For example, OMP would benefit from knowing whether the MPI layer is using pthreads. Likewise, the MPI layer might be able to take advantage of the OMP worker thread pool if it knew the pool existed.
give users the ability to experiment, as the community really doesn't yet know the "best practices" for hybrid applications. We shouldn't design interfaces that are rigidly tied to current practices, as these may not prove to be the best in the long term.

Determining that OMP is "active" can be somewhat tricky. Initialization happens on first use, which can occur before main. Unlike MPI, OMP has no explicit init call. How much is set up before main executes depends on the implementation.

So we need some kind of callback mechanism by which a library can be "notified" that the other model initialized after it had already initialized itself. It was noted that we need more than just "the other guy initialized": we need to know what is happening, e.g., how many cores is the other guy using? We can clearly only get a snapshot in time, which is not guaranteed to remain accurate at any later moment. It was noted that accuracy might be guaranteed if MPI used the OMP thread pool, as there would then be a single "figure of authority".
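As a minimal sketch of the callback idea, the C fragment below lets one library register interest and lets the other deliver a resource snapshot once it has initialized. Every name here (`hybrid_register_init_cb`, `hybrid_notify_init`, `hybrid_model_info_t` and its fields, `record_peer_threads`) is hypothetical and is not part of PMIx, MPI, or OpenMP. The snapshot is copied at notification time, reflecting the point that it may already be stale when a listener reads it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical point-in-time snapshot of another runtime's resource use. */
typedef struct {
    char model[16];   /* e.g. "MPI" or "OpenMP" */
    int  num_threads; /* threads the model reported at notification time */
    int  num_cores;   /* cores the model reported at notification time */
} hybrid_model_info_t;

typedef void (*hybrid_init_cb_t)(const hybrid_model_info_t *info, void *cbdata);

#define HYBRID_MAX_CBS 8

/* Hypothetical fixed-size registry of listeners; not thread-safe (sketch only). */
static struct {
    hybrid_init_cb_t fn;
    void *cbdata;
} hybrid_cbs[HYBRID_MAX_CBS];
static int hybrid_ncbs = 0;

/* Ask to be told when another model initializes.
 * Returns 0 on success, -1 on a NULL callback or a full registry. */
int hybrid_register_init_cb(hybrid_init_cb_t fn, void *cbdata)
{
    if (fn == NULL || hybrid_ncbs >= HYBRID_MAX_CBS)
        return -1;
    hybrid_cbs[hybrid_ncbs].fn = fn;
    hybrid_cbs[hybrid_ncbs].cbdata = cbdata;
    hybrid_ncbs++;
    return 0;
}

/* Called by a model after it initializes: copies its current usage into a
 * snapshot and hands it to every listener. Returns the number notified. */
int hybrid_notify_init(const char *model, int num_threads, int num_cores)
{
    hybrid_model_info_t snap;
    memset(&snap, 0, sizeof snap);
    strncpy(snap.model, model, sizeof snap.model - 1);
    snap.num_threads = num_threads;
    snap.num_cores = num_cores;
    for (int i = 0; i < hybrid_ncbs; i++)
        hybrid_cbs[i].fn(&snap, hybrid_cbs[i].cbdata);
    return hybrid_ncbs;
}

/* Example listener: remember how many threads the other model reported. */
void record_peer_threads(const hybrid_model_info_t *info, void *cbdata)
{
    *(int *)cbdata = info->num_threads;
}
```

The registry is deliberately not thread-safe to keep the sketch short; a real interface would need locking, and would also have to handle a listener that registers after the other model has already initialized.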

MPI uses threads that sleep, wake up to do something, and go back to sleep. OMP uses a tasking model with a pool of worker threads. Perhaps knowing that OMP is around could allow MPI to use the pool instead? We would need a way to tell the other model "I need so many resources at this priority" to help determine who gets what, and when. This may be a second-order optimization; we may get more bang by simply ensuring that threads from the two models don't both try to use the same resource at the same time.

Perhaps we could have a common pool that one of the libraries owns and allocates from, possibly extending the idea with a pre-emption mechanism so that allocations could be readjusted on the fly. We could provide an API that requests threads and gives a priority hint. We probably just need a blocking call, plus a way to release the threads back to the pool.
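The blocking request/release pair could look roughly like the following sketch, assuming POSIX threads and a shared pool owned by one library. The names (`thread_pool_t`, `pool_init`, `pool_acquire`, `pool_release`, `pool_free`, `POOL_PRIO_HIGH`) are hypothetical, and the priority hint is modeled in about the simplest way possible: a low-priority request waits while any high-priority request is pending. Pre-emption is not modeled.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical priority hints for thread requests. */
enum { POOL_PRIO_LOW = 0, POOL_PRIO_HIGH = 1, POOL_NPRIO = 2 };

/* Hypothetical shared pool owned by one of the two libraries. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;   /* broadcast whenever the pool state changes */
    int capacity;              /* total worker threads owned by the pool */
    int free_threads;          /* threads not currently lent out */
    int waiting[POOL_NPRIO];   /* blocked requests per priority level */
} thread_pool_t;

void pool_init(thread_pool_t *p, int nthreads)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->changed, NULL);
    p->capacity = nthreads;
    p->free_threads = nthreads;
    p->waiting[POOL_PRIO_LOW] = 0;
    p->waiting[POOL_PRIO_HIGH] = 0;
}

/* Blocking request for n threads. A low-priority request also waits while
 * any high-priority request is pending. Returns 0 once granted, or -1 if the
 * request can never be satisfied or the arguments are invalid. */
int pool_acquire(thread_pool_t *p, int n, int prio)
{
    if (n <= 0 || prio < 0 || prio >= POOL_NPRIO)
        return -1;
    pthread_mutex_lock(&p->lock);
    if (n > p->capacity) {
        pthread_mutex_unlock(&p->lock);
        return -1;
    }
    p->waiting[prio]++;
    while (p->free_threads < n ||
           (prio == POOL_PRIO_LOW && p->waiting[POOL_PRIO_HIGH] > 0))
        pthread_cond_wait(&p->changed, &p->lock);
    p->waiting[prio]--;
    p->free_threads -= n;
    /* Pending-request counts changed, so blocked low-priority requests may proceed. */
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->lock);
    return 0;
}

/* Return n previously acquired threads to the pool and wake all waiters. */
void pool_release(thread_pool_t *p, int n)
{
    pthread_mutex_lock(&p->lock);
    p->free_threads += n;
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->lock);
}

/* Snapshot of how many threads are currently free. */
int pool_free(thread_pool_t *p)
{
    pthread_mutex_lock(&p->lock);
    int n = p->free_threads;
    pthread_mutex_unlock(&p->lock);
    return n;
}
```

A single condition variable with broadcast keeps the sketch simple at the cost of extra wakeups. The pre-emption idea would additionally need a way to ask a current holder to hand threads back, which this sketch does not attempt.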

OMP allows the user or RM to set a limit on the number of threads used on the host; otherwise it takes the limit to be the number of CPUs. Should it use one thread per HT? This can be defined by policy, and threads can be spread across cores by policy as well. OMP looks at the CPU mask to determine which CPUs are available to this process, but the mask doesn't cover GPUs and NICs.
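The default thread-limit policy described above can be sketched in C on Linux, where `sched_getaffinity()` and the glibc `CPU_COUNT()` macro (with `_GNU_SOURCE`) report the CPUs in the process's affinity mask. The function names `available_cpus` and `choose_thread_limit`, and the choice to pass the user/RM limit in as a string (for example the value of `OMP_THREAD_LIMIT`), are assumptions for illustration; this is not how any particular OpenMP runtime implements it. As noted, the mask says nothing about GPUs or NICs.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical helper: count the CPUs this process may run on, according to
 * its affinity mask. Linux/glibc-specific; returns -1 if the mask cannot be read. */
int available_cpus(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof mask, &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}

/* Hypothetical policy: a valid positive limit supplied by the user or RM
 * (passed as a string, e.g. an environment variable's value) takes
 * precedence; otherwise fall back to the number of CPUs in the mask. */
int choose_thread_limit(const char *user_limit)
{
    if (user_limit != NULL) {
        char *end;
        long v = strtol(user_limit, &end, 10);
        if (end != user_limit && *end == '\0' && v > 0 && v <= INT_MAX)
            return (int)v;
    }
    int ncpu = available_cpus();
    return ncpu > 0 ? ncpu : 1;  /* never report fewer than one thread */
}
```

A caller might use `choose_thread_limit(getenv("OMP_THREAD_LIMIT"))`; whether to count one thread per HT or per core would be a separate policy decision layered on top.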

Geoff pointed out that the mix of MPI and OMP threads on cores could be influenced by each library knowing of the other's existence. We didn't have time to explore that point, so we will defer it to another meeting.

How should thread migration be managed? Generally it isn't allowed: each thread is locked to an HT/core.
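Locking a thread to a hardware thread or core can be done on Linux with the glibc extensions `pthread_setaffinity_np()` and `pthread_getaffinity_np()`. The helper names `pin_self_to_cpu` and `pinned_cpu` below are hypothetical, and the sketch assumes the caller has already decided which CPU each thread should use.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: pin the calling thread to one CPU (hardware thread)
 * so the scheduler cannot migrate it. Returns 0 on success or an errno value. */
int pin_self_to_cpu(int cpu)
{
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return EINVAL;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
}

/* Hypothetical helper: return the CPU the calling thread is pinned to, or -1
 * if its affinity mask cannot be read or does not hold exactly one CPU. */
int pinned_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof set, &set) != 0)
        return -1;
    if (CPU_COUNT(&set) != 1)
        return -1;
    for (int i = 0; i < CPU_SETSIZE; i++)
        if (CPU_ISSET(i, &set))
            return i;
    return -1;
}
```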

Can we get agreement on who specifies/controls the resources? Alexandre thinks it's MPI, as it has a more global view: it would set some default partition of resources, and OMP would operate within that envelope.
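To make the "envelope" idea concrete, here is one hypothetical way MPI could compute a default partition: split a node's cores into contiguous blocks, one per local rank, and let each rank's OMP runtime size its team to its own block. The names `core_envelope_t` and `default_envelope` are invented for illustration, and even block splitting is only one possible default policy.

```c
/* Hypothetical per-rank resource envelope set by the MPI layer. */
typedef struct {
    int first_core;  /* index of the first core in this rank's envelope */
    int ncores;      /* number of contiguous cores in the envelope */
} core_envelope_t;

/* Hypothetical default policy: split ncores cores on a node into contiguous
 * blocks, one per local rank; the first (ncores % nranks) ranks get one extra
 * core. Returns an empty envelope for invalid arguments. */
core_envelope_t default_envelope(int ncores, int nranks, int local_rank)
{
    core_envelope_t env = {0, 0};
    if (ncores <= 0 || nranks <= 0 || local_rank < 0 || local_rank >= nranks)
        return env;
    int base = ncores / nranks;
    int extra = ncores % nranks;
    env.ncores = base + (local_rank < extra ? 1 : 0);
    env.first_core = local_rank * base + (local_rank < extra ? local_rank : extra);
    return env;
}
```

For example, with 10 cores and 4 local ranks this yields blocks of 3, 3, 2 and 2 cores starting at cores 0, 3, 6 and 8; each rank's OMP runtime would then use at most `env.ncores` threads.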

Completed actions

Ralph to arrange for Webex meeting to replace Skype [DONE]
Ralph to send out recurring calendar event to people expressing interest in attending [DONE]

Pending actions

Geoffroy to arrange a meeting with the SNL "RM" project to discuss possible collaboration on APIs for coordinating their "worker pool" of threads

New actions

Ralph offered to create a strawman set of API definitions based on the conversations so far and circulate it for comment, purely as a "stake in the ground" to make the discussion a little more concrete
Ralph agreed to send to the mailing list some links to background on PMIx
Alexandre agreed to do the same for OpenMP, and to provide an overview presentation at a future meeting