Stage 0: Collecting Inventory - openpmix/openpmix GitHub Wiki

While not technically part of the launch procedure, the inventory of resources (including NICs, switches, network and node topology, memory, CPUs, etc.) is critical both for scheduling applications and for completing the launch process efficiently. Many of the vendor-specific support plugins in PMIx depend upon this information when assigning endpoints, computing device distances, assigning security keys, and performing other operations. The plugins can obtain it in several ways, depending upon the vendor and the environment:

  • inventory rollup. In the case of something like mpirun, the launcher begins its launch procedure by spawning a "proxy daemon" on each node where application processes will execute. These daemons usually "phone home" to indicate that they are alive, provide their connection information, etc. For this scenario, PMIx provides the PMIx_server_collect_inventory API, which allows the vendor plugins in the daemon to examine the local inventory, identify whatever information they need for supporting applications, and extract it for relay back to the launcher. The PMIx library aggregates the contributions of all vendor plugins and returns them to the caller as a single data "blob". Typically, this is a fairly small additional payload that the daemon incorporates into its "phone home" message. Upon receipt, the launcher separates the inventory "blob" from the rest of the message and delivers it to the PMIx library via the PMIx_server_deliver_inventory API, thus allowing the vendor plugins supporting the launcher to extract their information about the backend nodes.

  • local topology. In cases where the system is homogeneous, the PMIx library supporting the launcher can simply infer the inventory on the backend nodes. This is the default when the launcher provides no detailed inventory information, as it covers the majority of installations. The PMIx library uses the topology local to the launcher as its template, with each plugin then operating as if every node had an identical inventory. However, in this mode the plugins may not be able to assign endpoints (e.g., if they lack the NIC-specific identifiers for devices on the backend nodes). Thus, support when operating in this mode may be limited.

  • external inventory provided. In some cases, the launcher has access to the local topology for each node in the system. This can be loaded into the PMIx server library supporting the launcher via the PMIx_Register_resources API, passing in each unique topology in combination with the node where it is found. If the launcher provides the actual topology for each node, then the vendor plugins in the PMIx library will be able to provide full support. If the launcher only knows the generic topology for each node (i.e., it knows which topology corresponds to each node, but does not have the exact topology of that node), then it can provide each unique topology along with an array of PMIX_NODEID or PMIX_HOSTNAME values identifying the nodes that have it. Again, in this mode, the plugins may not be able to assign endpoints.

  • vendor-specific methods. Some vendors utilize centralized services to manage inventory information. In these situations, the vendor may choose to have their plugin obtain any required inventory information directly from those services. Such plugins return nothing during the inventory rollup procedure because they know they will not need it, so executing rollup in this scenario does no harm.

  • cached information. Some environments (e.g., a system employing a resource manager) continuously run daemons on each node of the cluster, bringing them up at system boot. These daemons are part of the cluster monitoring/control system. If the control system daemons include a PMIx server, then they can use the inventory rollup method to create a cache of the overall system's topology information by including the PMIX_TOPOLOGY_CACHE attribute (specifying the filename to use) when calling the PMIx_server_deliver_inventory function. During startup, the launcher can pass that same attribute to the PMIx_tool_init API. This directs the PMIx server plugins to read the file and obtain their topology information from it.
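
The inventory rollup flow in the first bullet can be pictured with a small toy model. The sketch below is plain Python and does not call PMIx: the plugin functions, `collect_inventory`, `deliver_inventory`, the JSON blob encoding, and the node and NIC names are all hypothetical stand-ins chosen for illustration. It only shows the shape of the exchange: each plugin contributes what it needs, the contributions travel as one opaque blob in the "phone home" message, and the launcher unpacks them per plugin.

```python
import json

# Hypothetical stand-ins for vendor plugins; these are not PMIx names.
def nic_plugin_collect(local):
    # Keep only what this plugin needs later (here, the NIC identifiers).
    return {"nics": local["nics"]}

def central_service_plugin_collect(local):
    # This vendor reads inventory from its own service, so it adds nothing.
    return None

PLUGINS = {"nic": nic_plugin_collect, "central": central_service_plugin_collect}

def collect_inventory(local):
    """Daemon side: aggregate every plugin's contribution into one blob."""
    sections = {}
    for name, collect in PLUGINS.items():
        data = collect(local)
        if data is not None:
            sections[name] = data
    return json.dumps(sections, sort_keys=True).encode()

def deliver_inventory(store, node, blob):
    """Launcher side: unpack a blob and file each section under its plugin."""
    for name, data in json.loads(blob.decode()).items():
        store.setdefault(name, {})[node] = data

# Each daemon attaches its blob to the "phone home" message.
phone_home = {
    "node0": collect_inventory({"nics": ["hsn0", "hsn1"], "cpus": 64}),
    "node1": collect_inventory({"nics": ["hsn0"], "cpus": 64}),
}

# The launcher delivers every blob it receives.
launcher_store = {}
for node, blob in phone_home.items():
    deliver_inventory(launcher_store, node, blob)
```

In this model the plugin that relies on a centralized service simply leaves no section in the blob, which mirrors why running rollup is harmless in the vendor-specific case.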

Regardless of the method employed, the objective is to "seed" the launcher's PMIx server library with the information it requires to provide each process with a complete picture of the resources available to every process in the job, thus allowing applications to optimize their work distribution, the selection of communication channels, etc.
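
As a rough illustration of what "seeding" means on the launcher side, the following toy model (plain Python, not the PMIx API) combines two of the methods above: topologies registered per node, and the fallback to the launcher's own topology as a template. The names `register_topology` and `inventory_for`, the dictionary layout, and the `exact` flag are assumptions made for this sketch. It also simplifies "may not be able to assign endpoints" to a plain boolean.

```python
def register_topology(registry, topology, nodes, exact):
    """Record one unique topology together with the nodes that have it."""
    for node in nodes:
        registry[node] = {"topology": topology, "exact": exact}

def inventory_for(node, registry, local_template):
    """Return (topology, can_assign_endpoints) for a node."""
    entry = registry.get(node)
    if entry is None:
        # Nothing was provided: assume the node looks like the launcher's node.
        return local_template, False
    # Simplification: only an exact topology carries device identifiers.
    return entry["topology"], entry["exact"]

# Topology of the node where the launcher itself runs (hypothetical values).
local_template = {"cpus": 64, "nics": 1}

registry = {}
# Generic topology shared by two nodes: no device-specific identifiers.
register_topology(registry, {"cpus": 128, "nics": 2}, ["node0", "node1"], exact=False)
# Exact topology for one node, including its NIC identifiers.
register_topology(registry, {"cpus": 64, "nics": 1, "nic_ids": ["hsn0"]}, ["node2"], exact=True)

generic = inventory_for("node0", registry, local_template)
exact = inventory_for("node2", registry, local_template)
unknown = inventory_for("node9", registry, local_template)
```

Here `node9` was never registered, so it inherits the launcher's template, while only `node2` has enough detail for endpoint assignment in this simplified model.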


Next stage: Setting up the application