2.8 PMIx Server Data Requirements - openpmix/openpmix GitHub Wiki

2.8 PMIx Server Data Requirements

Launching a new job can be accomplished much more scalably if the host resource manager provides each application process with a set of information required for initial wireup support. This includes:

Application-level Information

JobID: unique namespace assigned to identify processes belonging to this job
Offset: starting global rank of this job. Rarely used, but may be of greater interest in the future.
Universe size: number of processes in this namespace
Job size: number of processes in this specific application
Local size: number of processes in this application on this node
Node size: Total number of application processes (spanning all namespaces) on this node
Max procs: largest number of allowed processes for this application
ClusterID: string identifier/name of the cluster the job is running on - used when connecting jobs on different clusters

Mapping Information

List of nodes hosting processes in this job. Typically expressed as a regular expression. For convenience, PMIx provides the PMIx_generate_regex function that will generate a regular expression for this purpose when given an array of node names. Clients can subsequently obtain information from the client library (which will parse the regular expression) using the PMIx_Resolve_nodes function.
Map of process ranks to nodes. Typically expressed as a regular expression. For convenience, PMIx provides the PMIx_generate_ppn function that will generate the regular expression when given an array of process ranks. Clients can subsequently obtain information from the client library (which will parse the regular expression) using the PMIx_Resolve_peers function

Node-level Information

Node ID: integer identifier of this node
Hostname: the name the resource manager is using for this host. Usually is just the output of the hostname command.
Local peers: comma-delimited list of ranks from this application that share the local node
Local cpusets: comma-delimited list of cpuset bindings for the local peers
Local leader: rank of the lowest-ranked peer on this node
Architecture - integer representation of the datatype architecture
Node topology - the HWLOC topology of the local node
Top-level temporary directory assigned to this allocated session on this node
Temporary directory assigned to this namespace under the top-level session temporary directory

Peer-level Information

The following list of information should be provided for each peer in the application:

Rank: an integer rank of the process within the application
Appnum: the number of the application to which this process belongs, starting with zero. Specifically addresses multi-application jobs.
Application leader: the lowest global rank of a peer in this specific application. Will always be zero except for multi-application jobs
Global rank: integer rank of the process within the overall job. Will always equal the process' rank except in multi-application jobs
Application rank: integer rank of the process within its own application
Local rank: integer rank of the process amongst its peers on the node where it is executing
Node rank: integer rank of the process across all processes on the node where it is executing
Node ID: the integer identifier of the node where this process is executing
URI: contact information for the process
Cpuset: the cpuset this process to which this process is bound
Spawned - a boolean flag indicating if this process was launched via a dynamic spawn request
Temporary directory assigned to this process under the namespace temporary directory