3May2017 - openpmix/openpmix GitHub Wiki

PMIx OpenMP/MPI/RM Working Group

Date

May 3, 2017

Attendees:

Ralph Castain, Intel
Alexandre Eichenberger, IBM
George Bosilca, UTK
Aurelien Bouteiller, UTK
David Bernholdt, ORNL
Geoffroy Vallee, ORNL
Swen Boehm, ORNL
Terry Wilmarth, Intel
(my apologies for missing the names of any others)

Minutes:

Welcomed new participants Terry Wilmarth and Michael Klemm, both of Intel.

Status of work:
  • Modifications have been made to PMIx per the RFC. They are currently in the OMPI prototype branch but will be brought over to the master repo shortly.

  • An OMPI branch has been created with the inter-library support. Ralph is in final debugging and will post the URL when it is ready.

  • We won’t modify the OpenMP runtime yet; for now, the application itself can declare the library through PMIx. Ralph will provide more information, hopefully motivating OpenMP to adopt PMIx regardless of inter-library support (see below, in response to specific questions raised during the meeting).

What info is available via PMIx at time of process start:

Copied here from wiki:

Launching a new job can be accomplished much more scalably if the host resource manager provides each application process with a set of information required for initial wireup support. This information is communicated to the process at first call to PMIx_Init, and can be retrieved using the PMIx_Get function. This includes:

Application-level Information
  • JobID: unique namespace assigned to identify processes belonging to this job
  • Offset: starting global rank of this job. Rarely used, but may be of greater interest in the future.
  • Universe size: number of processes in this namespace
  • Job size: number of processes in this specific application context, and in the overall job
  • Number of application contexts in this job
  • Local size: number of processes in this application on this node
  • Node size: Total number of application processes (spanning all namespaces) on this node
  • Max procs: largest number of allowed processes for this application
  • Scratch directories assigned to the process, the namespace, and the allocation

Mapping Information
  • List of nodes hosting processes in this job. Typically expressed as a regular expression. For convenience, PMIx provides the PMIx_generate_regex function that will generate a regular expression for this purpose when given an array of node names
  • Map of process ranks to nodes. Typically expressed as a regular expression. For convenience, PMIx provides the PMIx_generate_ppn function that will generate the regular expression when given an array of process ranks

Node-level Information
  • Node ID: integer identifier of this node
  • Hostname: the name the resource manager is using for this host. Usually is just the output of the hostname command.
  • Local peers: comma-delimited list of ranks from this application that share the local node
  • Local cpusets: comma-delimited list of cpuset bindings for the local peers
  • Local leader: rank of the lowest-ranked peer on this node
  • Architecture: integer representation of the datatype architecture
  • Node topology: the HWLOC topology of the local node
  • Top-level temporary directory assigned to this allocated session on this node
  • Temporary directory assigned to this namespace under the top-level session temporary directory
  • Number of jobs using this node: updated via event notification. Differs from 1 only in shared environments

Peer-level Information

The following list of information should be provided for each peer in the application:

  • Rank: an integer rank of the process within the application
  • Appnum: the number of the application to which this process belongs, starting with zero. Specifically addresses multi-application jobs.
  • Application leader: the lowest global rank of a peer in this specific application. Will always be zero except for multi-application jobs
  • Global rank: integer rank of the process within the overall job. Will always equal the process' rank except in multi-application jobs
  • Application rank: integer rank of the process within its own application
  • Local rank: integer rank of the process amongst its peers on the node where it is executing
  • Node rank: integer rank of the process across all processes on the node where it is executing
  • Node ID: the integer identifier of the node where this process is executing
  • URI: contact information for the process
  • Cpuset: the cpuset to which this process is bound
  • Spawned: a boolean flag indicating whether this process was launched via a dynamic spawn request
  • Temporary directory assigned to this process under the namespace temporary directory
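
The relationship between the local peers list, the local leader, and a local rank can be sketched in a few lines of C. This is only an illustration, not PMIx code: the function names parse_peers, local_leader, and local_rank are hypothetical, and the sketch assumes local ranks follow ascending rank order on the node, which the lists above do not state.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Parse a comma-delimited rank list such as "4,5,6,7" into out[].
 * Returns the number of ranks parsed, or -1 on malformed input or overflow.
 * (Hypothetical helper; not part of the PMIx API.) */
int parse_peers(const char *list, unsigned *out, int max)
{
    assert(list != NULL && out != NULL);
    int n = 0;
    const char *p = list;
    while (*p != '\0') {
        char *end;
        if (!isdigit((unsigned char)*p) || n >= max) return -1;
        out[n++] = (unsigned)strtoul(p, &end, 10);
        if (*end == ',') end++;            /* skip the delimiter */
        else if (*end != '\0') return -1;  /* unexpected character */
        p = end;
    }
    return n;
}

/* Local leader: the lowest rank among the peers on this node.
 * Returns 0 on success, -1 if the list is empty. */
int local_leader(const unsigned *peers, int n, unsigned *leader)
{
    if (n <= 0) return -1;
    *leader = peers[0];
    for (int i = 1; i < n; i++)
        if (peers[i] < *leader) *leader = peers[i];
    return 0;
}

/* Local rank under the ASSUMPTION that local ranks are assigned in
 * ascending rank order: the count of peers with a lower rank.
 * Returns -1 if my_rank is not in the list. */
int local_rank(const unsigned *peers, int n, unsigned my_rank)
{
    int below = 0, found = 0;
    for (int i = 0; i < n; i++) {
        if (peers[i] == my_rank) found = 1;
        else if (peers[i] < my_rank) below++;
    }
    return found ? below : -1;
}
```

For a peers string of "6,4,7,5", the leader is rank 4, and rank 6 gets local rank 2 under the ordering assumption. In practice the resource manager supplies the local rank directly, so a process would not normally need to derive it.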

Can envars be added to a job?

Yes, through a couple of mechanisms. The PMIx community is working on a set of basic tools for interacting with a PMIx-enabled resource manager for operations such as querying queue status, submitting a job, and enabling interactions between a batch job script and the running application. The tool for submitting a job would add any given envars to the application context before calling PMIx_Spawn.

PMIx also has the PMIX_SET_ENVAR and PMIX_UNSET_ENVAR attributes that are used in PMIx plugins for modifying the environment of an application process. These attributes allow the plugin to instruct the resource manager to apply the changes to the constructed process environment just prior to spawning it.
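
The effect of such set/unset requests can be modeled with plain POSIX calls. The sketch below is an assumption-laden illustration, not the PMIx or resource manager implementation: the env_directive_t type and apply_env_directives function are hypothetical, and the changes are applied to the calling process's own environment, where a real launcher would apply them to the child's environment just before starting it.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv/unsetenv */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a set/unset request; not a PMIx type. */
typedef enum { ENV_SET, ENV_UNSET } env_op_t;

typedef struct {
    env_op_t    op;
    const char *name;
    const char *value;  /* ignored for ENV_UNSET */
} env_directive_t;

/* Apply the directives in order, so later entries override earlier ones.
 * Returns 0 on success, -1 on the first failure. */
int apply_env_directives(const env_directive_t *d, int n)
{
    assert(d != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        int rc = (d[i].op == ENV_SET)
                     ? setenv(d[i].name, d[i].value, 1)  /* 1 = overwrite */
                     : unsetenv(d[i].name);
        if (rc != 0) return -1;
    }
    return 0;
}
```

A launcher modeled this way would call apply_env_directives in the child between fork and exec, so the parent's own environment is left untouched.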

Debuggers

The PMIx community has been working closely with several debugger vendors to integrate PMIx support into their products. This would help standardize the debugger-to-application interface and would also open the door to a range of new features, including:

  • the PMIx_Query API to obtain information on system configuration, network bandwidth utilization, etc. so the debugger can display the processes in relation to their network location, identify congestion points, and other advanced features.

  • the PMIx_Job_control API that instructs the RM regarding control over job execution (e.g., pause, restart, etc.). This includes the ability to terminate and restart specific processes, and allows an application to request that a debugger attach to it once a problem has been encountered. Also includes the ability to request checkpoints.

  • the PMIx_Log API for scalably storing “trace” information as a record in the RM’s job history

  • the PMIx_Monitor API that allows the app to request heartbeat monitoring for detecting “stalled” applications.

  • network interface APIs that will allow the tool to communicate with the fabric manager and the network library on individual nodes - this will support information requests as well as configuration commands

  • file system APIs to query GPFS and other parallel file systems for status info and estimated retrieval times, plus issuing of commands for data movement and caching strategies.

If these features are of interest, please let your debugger vendor know! Full implementation is currently being delayed by the need for customer requests and expressions of interest.

PMIx licensing

Copied from LICENSE file:

The following LICENSE pertains to both PMIx and any code ported from Open MPI.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer listed in this license in the documentation and/or other materials provided with the distribution.

  • Neither the name of the copyright holders nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

The copyright holders provide no reassurances that the source code provided does not infringe any patent, copyright, or any other intellectual property rights of third parties. The copyright holders disclaim any liability to any recipient for claims brought against recipient by any third party for infringement of that parties intellectual property rights.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Completed actions

  1. Ralph will create a PMIx branch and begin updating it per the RFC [DONE]

Pending actions

  1. Geoffroy to arrange meeting with SNL "RM" project to discuss possible collaboration on APIs for coordinating their "worker pool" of threads
  2. Alexandre offered to create a branch and update the OpenMP runtime to support the RFC
  3. Aurelien volunteered to send a link to a suggested prototype application
  4. Everyone will review the RFC and provide comments!
  5. Ralph will continue working on the document to reflect the discussion and the comments provided

New actions

  1. All - identify what info needs to be exchanged