3.3 PMIx Event Notification - openpmix/openpmix GitHub Wiki

### 3.3 PMIx Event Notification

The resource manager (RM) will be aware of a wide range of events that occur across the system. For the purposes of this discussion, only events that impact the allocated session being served by the PMIx server are considered. These events can be divided into two distinct classes:

  • Job-specific events that directly relate to a job executing within the session. This might include events such as debugger attachment or process failure within a related job. These events are characterized by directly targeting processes within session jobs - i.e., the "procs" parameter of the notification contains members of a job executing within the session. Events in this category are to be immediately delivered to the PMIx server library for delivery to the specified processes. Clients can indicate a desire to register solely for job-specific events by including the PMIX_EVENT_JOB_LEVEL key in their registration call.

  • Environment events that impact the session, but are not directly sent to executing jobs. This is a much broader category of events that includes ECC errors, temperature excursions, and other environmental events directly affecting the session's resources. Note that although these do impact the session's jobs, they are not directly referencing those jobs - i.e., the event is generated without specifying a particular target. Thus, events in this category are to be delivered to the PMIx server library only upon request - i.e., when the PMIx server has registered for those events.

    Note that race conditions can cause the registration to come after events of possible interest (e.g., a memory ECC event that occurs after start of execution but prior to registration). RMs are free to cache events in this category for some time to mitigate this situation, but are not required to do so. Thus, applications must be aware that environment events prior to registration may not be included in notifications.

    As above, clients can indicate a desire to register solely for environment events of a given type by including the PMIX_EVENT_ENVIRO_LEVEL key in their registration call.

The PMIx server will cache any environment events passed to it for a period of time so that it can notify clients that have not yet registered for them. Currently, the PMIx server uses a ring buffer to cache events. The size of the ring buffer defaults to 512 events (as of PMIx 2.0), but can be configured using the PMIx_server_cache_size info key during the call to the PMIx_server_init API.
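The eviction behavior of such a cache can be illustrated with a minimal, self-contained sketch. This is not the PMIx implementation: the type and function names (`event_cache_t`, `cache_push`, `cache_at`) are hypothetical, the event is reduced to a single integer code, and the capacity is set to 4 rather than 512 so the wrap-around is easy to follow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capacity for this sketch; the text cites a default of 512. */
#define CACHE_CAPACITY 4

typedef struct {
    int status;              /* stand-in for a pmix_status_t event code */
} cached_event_t;

typedef struct {
    cached_event_t slots[CACHE_CAPACITY];
    size_t head;             /* index of the oldest cached event */
    size_t count;            /* number of events currently cached */
} event_cache_t;

/* Store an event; when the buffer is full the oldest event is overwritten. */
static void cache_push(event_cache_t *c, int status)
{
    assert(c != NULL);
    size_t tail = (c->head + c->count) % CACHE_CAPACITY;
    c->slots[tail].status = status;
    if (c->count == CACHE_CAPACITY) {
        c->head = (c->head + 1) % CACHE_CAPACITY;  /* eject the oldest */
    } else {
        c->count++;
    }
}

/* Return the code of the i-th oldest cached event (0 = oldest). */
static int cache_at(const event_cache_t *c, size_t i)
{
    assert(c != NULL && i < c->count);
    return c->slots[(c->head + i) % CACHE_CAPACITY].status;
}
```

A late-registering client would be served by walking the cache from index 0 to `count - 1` and delivering each event that matches its registration; anything already ejected is simply not seen.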

Event Registration

Registration of event callbacks is accomplished via the PMIx_Register_event_hdlr API. An array of info keys to describe the events upon which the provided callback is to be executed can include:

  • PMIX_EVENT_HDLR_NAME: a string name identifying the registered handler. This value can be used for debugging purposes to record the status returned by the handler when called, and so that subsequent handlers (called during the precedence chain, as described below) can use the name and value when determining their response.
  • PMIX_EVENT_NAME: requests notification of events that report the specific pmix_status_t value given in this pmix_info_t object. See the enumerated list of status constants in the pmix_common.h file for valid values.
  • PMIX_EVENT_GROUP_NAME: a boolean flag indicating that events contained within the specified group NAME are to be reported. Examples include PMIX_EVENT_GROUP_COMM for events involving communication, and PMIX_EVENT_GROUP_MIGRATE to be notified of process migrations. See the documentation for a particular PMIx release for descriptions of supported groups as these may expand over time.
  • PMIX_EVENT_ORDER_PREPEND: a boolean flag directing that this callback be positioned first in precedence when considering events of the same type (named or group). The default is to append the callback to the end of the current precedence list.

Registrations of event callbacks that do not provide an array of info keys (beyond the optional PMIX_EVENT_HDLR_NAME) are considered default registrations for purposes of servicing order.

RM-Host Registrations

The RM host daemon is not required to register for any PMIx notifications. The daemon will automatically be notified (without registration) of client connection and finalize, plus any client service requests, via the appropriate server callback functions, if provided. However, internal PMIx server errors (e.g., message protocol violations) will only be reported to the host RM if the RM daemon has registered for event notification, and will specify a NULL value for the target recipients.

Note that PMIx does request that the host daemon register for PMIx notifications so that any client-generated notifications (as described below) can be circulated to their destination processes. The target recipients for these notifications will be specified in the callback function. Note that a notification targeting every process in a given namespace can use the PMIX_RANK_WILDCARD value in place of listing every process by individual rank. PMIx will return PMIX_ERR_NOT_SUPPORTED for notification requests when the RM does not register for notifications.
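To make the wildcard targeting concrete, the following sketch shows how a host daemon might decide whether a local process is among the recipients of a notification. It is an illustration only: `proc_t`, `target_matches`, and `SKETCH_RANK_WILDCARD` are hypothetical names, and the wildcard value used here is an arbitrary stand-in for PMIX_RANK_WILDCARD, whose actual value is defined by the PMIx headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PMIX_RANK_WILDCARD; the real value comes from PMIx. */
#define SKETCH_RANK_WILDCARD UINT32_MAX

typedef struct {
    char nspace[32];     /* namespace (job) identifier */
    uint32_t rank;       /* rank within the namespace, or the wildcard */
} proc_t;

/* True if `proc` is addressed by `target`: the namespaces must be equal,
 * and the target rank must either equal the process rank or be the wildcard. */
static bool target_matches(const proc_t *target, const proc_t *proc)
{
    assert(target != NULL && proc != NULL);
    if (strcmp(target->nspace, proc->nspace) != 0) {
        return false;
    }
    return target->rank == SKETCH_RANK_WILDCARD || target->rank == proc->rank;
}
```

With this convention a single target entry of the form `{"job1", SKETCH_RANK_WILDCARD}` addresses every process in namespace `job1`, while never matching processes from another namespace.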

Client Registrations

Application processes may request event notification via the PMIx_Register_event_hdlr API. Registrations are first recorded in the client's notification callback stack using a first-in-first-out (FIFO) policy based on the order in which calls to PMIx_Register_event_hdlr were issued, subject to adjustments per the provided info keys. This order will dictate the precedence given to event processing. Once locally recorded, a registration request is reviewed to see if it pertains to job-level events - if so, the request is held at the client level as the host PMIx server will automatically transfer any job-level event notification. Environmental event registrations are sent to the local PMIx server for handling.
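The client-side bookkeeping described above can be sketched as follows. This is a simplified model rather than the PMIx client code: all names (`client_stack_t`, `client_register`, `reg_action_t`, and so on) are hypothetical, a single ordered list is used for all registrations instead of separate lists per event type, and "sending to the server" is reduced to a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLERS 16   /* arbitrary limit for this sketch */

typedef enum { LEVEL_JOB, LEVEL_ENVIRO } event_level_t;
typedef enum { REG_HELD_LOCALLY, REG_FORWARD_TO_SERVER } reg_action_t;

typedef struct {
    const char *name;        /* handler name given at registration */
    event_level_t level;     /* job-level or environment-level events */
} registration_t;

typedef struct {
    registration_t entries[MAX_HANDLERS];   /* index 0 = highest precedence */
    size_t count;
} client_stack_t;

/* Record a registration in FIFO order, or at the front when `prepend` is set
 * (the effect of PMIX_EVENT_ORDER_PREPEND in this simplified model).
 * The return value says whether the request stays at the client or must
 * also be sent to the local server. */
static reg_action_t client_register(client_stack_t *s, registration_t reg, bool prepend)
{
    assert(s != NULL && s->count < MAX_HANDLERS);
    if (prepend) {
        /* shift existing entries down by one to free slot 0 */
        for (size_t i = s->count; i > 0; i--) {
            s->entries[i] = s->entries[i - 1];
        }
        s->entries[0] = reg;
    } else {
        s->entries[s->count] = reg;
    }
    s->count++;
    return reg.level == LEVEL_ENVIRO ? REG_FORWARD_TO_SERVER : REG_HELD_LOCALLY;
}
```

Job-level registrations never leave the client because the server forwards job-level events unconditionally; only environment registrations generate traffic to the server.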

PMIx Server Registrations

The PMIx server itself does not register for event notifications. Instead, it serves as a proxy for client registrations. Internal events (e.g., unexpected client disconnect or message protocol failures) are resolved in code paths outside of the event notification system.

Once a registration request is received from a local client, the PMIx server checks to see if it is already registered with the host RM for matching events (either the specific event that was requested, or an event group that contains the specified event). If already registered, then no further action is required - otherwise, the PMIx server will register with the host RM for the specified events (or event groups).
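The "already registered" check can be modeled with a short sketch. The names here (`server_t`, `request_t`, `server_handle_request`, and the group identifiers) are hypothetical, and the sketch assumes that each specific event code carries the identifier of the group it belongs to; how PMIx actually maps codes to groups is not specified in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGS 32   /* arbitrary limit for this sketch */

typedef enum { GROUP_NONE, GROUP_COMM, GROUP_MIGRATE } group_t;  /* hypothetical ids */

typedef struct {
    int code;          /* specific event code; ignored when is_group is true */
    group_t group;     /* group containing `code`, or the group being requested */
    bool is_group;     /* true: the request covers the whole group */
} request_t;

typedef struct {
    request_t regs[MAX_REGS];   /* registrations already made with the host RM */
    size_t count;
    size_t rm_calls;            /* number of registrations sent to the host RM */
} server_t;

/* True if an existing registration already covers `req`: either a group
 * registration for the same group, or the same specific event code. */
static bool covered(const server_t *s, request_t req)
{
    for (size_t i = 0; i < s->count; i++) {
        const request_t *r = &s->regs[i];
        if (r->is_group) {
            if (r->group != GROUP_NONE && r->group == req.group) {
                return true;
            }
        } else if (!req.is_group && r->code == req.code) {
            return true;
        }
    }
    return false;
}

/* Handle a client request; returns true if the host RM had to be contacted. */
static bool server_handle_request(server_t *s, request_t req)
{
    assert(s != NULL && s->count < MAX_REGS);
    if (covered(s, req)) {
        return false;            /* nothing further to do */
    }
    s->regs[s->count++] = req;
    s->rm_calls++;
    return true;
}
```

Note that a prior registration for a single event code does not cover a later request for the whole group, so the group request still goes to the host RM in this model.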

Once registration is complete, the server acknowledges the request to the client, and then transmits any matching cached events to the client for local notification. Cached events are always retained until the ring buffer is full, at which point the oldest event is ejected to make room for a new one.

RM Notifications

PMIx expects that all RM notifications pertaining to an allocated session will be distributed to the RM daemons within that allocation. Job-specific events, and events for which the PMIx server has registered, are to be delivered upon receipt to the local PMIx server via the PMIx_Notify_event function. Environmental events are to be delivered to the PMIx server only if that server has previously registered for matching events.

Once the PMIx server has been given an event, it performs the following operations:

  • for a job-specific event, the server immediately sends the event to its local clients from that job. If all local clients have been started, then the server can delete the event upon completion of notification. If any local clients for the job have not yet connected, then the server will cache the event for delivery upon connection, and delete it from the cache once all relevant processes have been notified.
  • environment events are immediately sent to all clients who have registered for events that match the incoming event, and cached in the server's ring buffer. If the ring buffer is full, then the oldest event will be ejected and released to make room.
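The lifetime of a job-specific event on the server can be sketched with a simple counter. This is an assumption-laden model, not PMIx code: `job_event_t` and `job_event_notify` are hypothetical names, and the sketch assumes the server knows in advance how many local clients belong to the target job.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int status;         /* stand-in for the event code */
    size_t expected;    /* local clients belonging to the target job */
    size_t notified;    /* clients that have received the event so far */
} job_event_t;

/* Deliver the event to `newly_connected` clients. Returns true once every
 * local client of the job has been notified, meaning the event can be
 * deleted; false means it must stay cached for clients yet to connect. */
static bool job_event_notify(job_event_t *ev, size_t newly_connected)
{
    assert(ev != NULL && ev->notified + newly_connected <= ev->expected);
    ev->notified += newly_connected;
    return ev->notified == ev->expected;
}
```

When all local clients are already connected at the time the event arrives, the first call returns true and the event never needs to be cached.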

Upon receipt of a notification message, the PMIx client will scan its list of registered callback functions to identify appropriate recipients according to the following precedence rules:

  • registered callbacks for specific event codes that match the incoming one shall be serviced first. If multiple callbacks meet these criteria, then they will be processed according to their FIFO-based precedence when registered.
  • registered callbacks for event groups that contain the incoming event are serviced next. Again, if multiple callbacks meet these criteria, then the same FIFO-based precedence rules are applied to them.
  • default registered callbacks are serviced last.

The scan is continued until a callback returns PMIX_SUCCESS, thereby indicating that the event has been handled and no further action is required. Return of any other status indicates that the procedure is to continue, with the returned status added to the info keys passed along with the event. These updates are presented in a form where the key is the "name" given to the event callback (provided during registration), and the value is the returned status. Thus, subsequent event handlers can scan the incoming info keys to see what prior event handlers reported.
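The precedence scan can be summarized in a short sketch. Everything here is illustrative: the types (`handler_t`, `result_t`), the `dispatch` function, the `group_of` mapping (event code divided by ten), and the three example handlers are hypothetical, and `SKETCH_SUCCESS` stands in for PMIX_SUCCESS. The list of prior results plays the role of the info keys that carry each earlier handler's name and returned status.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SUCCESS 0   /* stand-in for PMIX_SUCCESS */
#define MAX_RESULTS 8      /* arbitrary limit for this sketch */

typedef enum { KIND_CODE = 0, KIND_GROUP = 1, KIND_DEFAULT = 2 } kind_t;

typedef struct {
    const char *handler;   /* name given to the handler at registration */
    int status;            /* status that handler returned */
} result_t;

typedef int (*handler_fn)(int event, const result_t *prior, size_t nprior);

typedef struct {
    const char *name;
    kind_t kind;
    int code;              /* used when kind == KIND_CODE */
    int group;             /* used when kind == KIND_GROUP */
    handler_fn fn;
} handler_t;

/* Hypothetical mapping from an event code to its group. */
static int group_of(int event) { return event / 10; }

static int matches(const handler_t *h, int event)
{
    switch (h->kind) {
    case KIND_CODE:  return h->code == event;
    case KIND_GROUP: return h->group == group_of(event);
    default:         return 1;   /* default handlers match every event */
    }
}

/* Scan handlers (stored in registration order): specific codes first, then
 * groups, then defaults. Stop at the first handler returning SKETCH_SUCCESS;
 * otherwise record its name and status for later handlers to inspect.
 * Returns the number of handlers invoked. */
static size_t dispatch(const handler_t *hs, size_t n, int event, result_t *results)
{
    size_t invoked = 0, nres = 0;
    for (int kind = KIND_CODE; kind <= KIND_DEFAULT; kind++) {
        for (size_t i = 0; i < n; i++) {
            if ((int)hs[i].kind != kind || !matches(&hs[i], event)) continue;
            invoked++;
            int rc = hs[i].fn(event, results, nres);
            if (rc == SKETCH_SUCCESS) return invoked;
            if (nres < MAX_RESULTS) {
                results[nres].handler = hs[i].name;
                results[nres].status = rc;
                nres++;
            }
        }
    }
    return invoked;
}

/* Example handlers that log the order in which they run. */
static char call_log[8];
static size_t call_len = 0;
static size_t seen_prior = 0;

static int h_code(int e, const result_t *p, size_t n)
{ (void)e; (void)p; (void)n; call_log[call_len++] = 'c'; return -1; }

static int h_group(int e, const result_t *p, size_t n)
{ (void)e; (void)p; (void)n; call_log[call_len++] = 'g'; return -2; }

static int h_default(int e, const result_t *p, size_t n)
{ (void)e; (void)p; seen_prior = n; call_log[call_len++] = 'd'; return SKETCH_SUCCESS; }
```

If the three example handlers are registered in the order default, group, code and event 12 is dispatched, they run in the order code, group, default regardless of registration order, and the default handler can see the two earlier non-success results before it ends the chain.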

Once the client has completed handling of an event, the received notification message is released. No return message is sent to the notifying server - it is assumed that any such action will be taken directly by an event handler if required.

Client-Based Notifications

The client may also choose to generate notifications, either by the application itself (e.g., informing its peers of some internal event) or by the PMIx client library for use by its host application. Examples of the latter include notification of loss of contact with the local PMIx server, which indicates that the process has become isolated and may be used to trigger a "suicide".

Internal PMIx client library notifications are never transmitted to the local PMIx server. These notifications are only for use by the host application, and are provided based on registration by the application for events. Event registration by the client application does not differentiate between locally internal and external events. Thus, the user must differentiate by registering for specific internal error constants to separately respond to internal events. Currently supported internal events include:

  • PMIX_ERR_LOST_SERVER_CONNECTION: Connection to the local PMIx server has been lost, usually indicating that the local server has failed
  • PMIX_ERR_FAILED_COMM: Indicates either that a message from the local server could not be correctly unpacked, or that an outbound message could not be packed. In either case, the communication was unsuccessful.

Users are advised to check the release notes for their version for updates to this list.

Notifications generated by the application itself (via calls to PMIx_Notify_event) are transmitted to the local PMIx server for distribution according to the specified target processes. Since the PMIx server does not itself have the ability to communicate across nodes, it will rely on the host RM daemon to circulate the notification to the desired destinations. As stated earlier, this in turn relies on the host RM daemon having registered for notifications coming from the PMIx library - if this has not been done, then the PMIx server will return a status of PMIX_ERR_NOT_SUPPORTED to the client.
