Stage 1: Setting up the application - openpmix/openpmix GitHub Wiki

Once the scheduler has allocated resources to the session and the launcher has mapped the application processes in the job to their respective node/cpuset locations, the launcher is ready to send its launch message to the backend daemons. This message typically includes information about the job itself, along with the map of which ranks each daemon is to launch. Before sending that message, however, the launcher should pass the job map to the PMIx_server_setup_application API. This allows vendor plugins to use their knowledge of the system inventory and topology to provide their own job- and rank-specific data. Examples of the data that can be returned (subject to vendor plugin support) include:

  • relative device distances for each process to each local resource (e.g., NICs and GPUs) to assist in device selection, optimized communication path selection, and workload balancing across the job. Depending upon the vendor plugin, this usually requires that the job map include cpuset assignments for each rank.

  • endpoint assignments for each process and each NIC available to that process

  • network security keys for communication between the application processes in the job

  • network topology information, including the mapping of NICs to switches and switch-to-switch interconnects

  • GPU bindings for each process, indicating which GPUs each process was allocated

  • forwarded environment variables, optionally set to values that differ from those in the launcher's local environment

The launcher can control the information to be returned via attributes passed to the PMIx_server_setup_application function:

  • PMIX_SETUP_APP_ENVARS: request that envars be included in the returned data
  • PMIX_SETUP_APP_NONENVARS: request that non-envar data be included
  • PMIX_SETUP_APP_ALL: request that everything available be provided (default)
  • PMIX_ALLOC_FABRIC_ENDPTS: request that process endpoint assignments be included
  • PMIX_ALLOC_FABRIC_SEC_KEY: request that a network security key be allocated to this job
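
A launcher may find it convenient to keep its choice of attributes in one place. The following is a minimal sketch of that idea, not PMIx code: the DEMO_ flags and the demo_select_attrs function are hypothetical, and the strings are the attribute names from the list above, standing in for the corresponding PMIx attribute macros. In a real launcher, each selected attribute would become the key of a pmix_info_t entry passed to PMIx_server_setup_application.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request flags for this sketch; they are not part of PMIx. */
enum {
    DEMO_REQ_ENVARS    = 0x1,  /* forwarded environment variables */
    DEMO_REQ_NONENVARS = 0x2,  /* data that is not an envar */
    DEMO_REQ_ENDPTS    = 0x4,  /* per-process fabric endpoints */
    DEMO_REQ_SEC_KEY   = 0x8   /* network security key for the job */
};

#define DEMO_MAX_ATTRS 4

/* Fill `out` with the names of the attributes to request and return how
 * many were written. An empty mask requests PMIX_SETUP_APP_ALL explicitly,
 * mirroring the documented default. */
size_t demo_select_attrs(unsigned mask, const char *out[DEMO_MAX_ATTRS])
{
    size_t n = 0;

    assert(out != NULL);
    if (mask == 0) {
        out[n++] = "PMIX_SETUP_APP_ALL";
        return n;
    }
    if (mask & DEMO_REQ_ENVARS)    out[n++] = "PMIX_SETUP_APP_ENVARS";
    if (mask & DEMO_REQ_NONENVARS) out[n++] = "PMIX_SETUP_APP_NONENVARS";
    if (mask & DEMO_REQ_ENDPTS)    out[n++] = "PMIX_ALLOC_FABRIC_ENDPTS";
    if (mask & DEMO_REQ_SEC_KEY)   out[n++] = "PMIX_ALLOC_FABRIC_SEC_KEY";
    return n;
}
```

For example, calling demo_select_attrs(DEMO_REQ_ENVARS | DEMO_REQ_SEC_KEY, names) writes two names into the array, whereas a mask of 0 writes only PMIX_SETUP_APP_ALL.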

Data is returned to the host launcher via the specified pmix_setup_application_cbfunc_t as an array of pmix_info_t structures that the launcher can include in its launch message. Upon receipt of the launch message, backend daemons should separate out the application setup data and deliver it to their local PMIx library via the PMIx_server_setup_local_support API. This gives the vendor plugins an opportunity to identify any data included by the launcher's corresponding plugins and take appropriate action. Such actions can include caching envars for later delivery to application processes, or setting up local network drivers (e.g., by installing a security key).
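
How a daemon separates the setup data from the rest of the launch message depends entirely on the launcher's own message format. As a rough sketch, assume a hypothetical message made of tagged entries; the demo_ types and function below are invented for illustration and are not part of PMIx. The entries collected this way are what the daemon would convert back into a pmix_info_t array and hand to PMIx_server_setup_local_support.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the parts of a launcher-defined launch message. */
typedef enum {
    DEMO_KIND_JOB_INFO,   /* information about the job itself */
    DEMO_KIND_RANK_MAP,   /* which ranks this daemon is to launch */
    DEMO_KIND_APP_SETUP   /* data returned by the setup callback */
} demo_kind_t;

typedef struct {
    demo_kind_t kind;
    const char *key;
    const char *value;
} demo_entry_t;

/* Copy up to `cap` application-setup entries from `msg` into `setup`.
 * Returns the total number of setup entries found, so a result larger
 * than `cap` tells the caller that the output was truncated. */
size_t demo_extract_app_setup(const demo_entry_t *msg, size_t nmsg,
                              demo_entry_t *setup, size_t cap)
{
    size_t found = 0;

    assert(msg != NULL || nmsg == 0);
    assert(setup != NULL || cap == 0);
    for (size_t i = 0; i < nmsg; i++) {
        if (msg[i].kind != DEMO_KIND_APP_SETUP) {
            continue;
        }
        if (found < cap) {
            setup[found] = msg[i];
        }
        found++;
    }
    return found;
}
```

Returning the total count rather than the number copied lets the caller size a second buffer and retry, in the same spirit as snprintf.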

NOTE: the PMIx_server_setup_local_support function, and the vendor plugins underlying it, will not take any action (e.g., setting up local network drivers) until the host registers the job with its PMIx server (via the PMIx_server_register_nspace API) and indicates (in the nlocalprocs parameter) that there will be application processes from this job executing on the node. This is done to avoid unnecessarily consuming local resources (e.g., in the network driver) when no processes will be using them.
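
The rule in this note can be restated as a small state check. The sketch below is a toy model of the condition only; it does not show how the PMIx library implements it, and the demo_ names are hypothetical. Local setup is considered active only once the setup data has been delivered, the namespace has been registered, and the registration reported at least one local process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-job bookkeeping; not a PMIx structure. */
typedef struct {
    bool registered;       /* PMIx_server_register_nspace was called */
    int  nlocalprocs;      /* value given at registration */
    bool setup_delivered;  /* PMIx_server_setup_local_support was called */
} demo_job_state_t;

/* Record that the host registered the job with `nlocalprocs` local procs. */
void demo_on_register(demo_job_state_t *js, int nlocalprocs)
{
    assert(js != NULL);
    js->registered = true;
    js->nlocalprocs = nlocalprocs;
}

/* Record that the host delivered the application setup data. */
void demo_on_setup_delivered(demo_job_state_t *js)
{
    assert(js != NULL);
    js->setup_delivered = true;
}

/* True only when all conditions from the note hold, i.e. when local
 * resources (such as network driver state) would actually be used. */
bool demo_local_setup_active(const demo_job_state_t *js)
{
    assert(js != NULL);
    return js->registered && js->nlocalprocs > 0 && js->setup_delivered;
}
```

In this model, a daemon that registers a job with nlocalprocs equal to 0 never activates local setup, regardless of what setup data it receives.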


Prior stage: Collecting inventory
Next stage: Spawning the local processes