Running HPX on Rostam - STEllAR-GROUP/hpx GitHub Wiki

Introduction

Rostam is a heterogeneous cluster with 26 compute nodes covering a wide variety of machine architectures. Rostam uses SLURM for job scheduling, and its compute nodes are grouped into the following partitions:

  • sc13: Two nodes, Carson and Reno, each with two 10-core Intel Xeon E5-2670 Ivy Bridge CPUs and 128 GB of RAM. Reno has an NVIDIA K40m and an AMD Radeon R9 Nano Fury GPU; Carson has two AMD Radeon R9 Nano Fury GPUs. Hyper-threading is on for Reno and off for Carson.
  • k40: Contains Reno.
  • cuda: Nodes with NVIDIA GPUs (Diablo, Bahram, Geev, Reno, Tycho)
  • QxV100: Diablo, a node with two 20-core Intel Xeon Gold 6148 Skylake CPUs, 384 GB of memory, and four NVIDIA Tesla V100 SXM2 GPUs. Hyper-threading is off on this node.
  • k80: Bahram, a node with two Intel Xeon E5-2660 Haswell CPUs (20 cores in total), 128 GB of memory, and four NVIDIA K80s. Hyper-threading is off on this node.
  • v100: Geev, a node with two 10-core Intel Xeon E5-2660 Haswell CPUs, 128 GB of memory, and two NVIDIA V100s. Apart from the GPUs, this node is identical to Bahram.
  • sc15: Two nodes, Bahram and Geev.
  • marvin: It has 15 nodes, each with two 8-core Intel Xeon E5-2450 Sandy Bridge CPUs and 48 GB of memory.
  • medusa: It has 16 nodes, each with two 20-core Intel Xeon Gold 6148 Skylake CPUs and 96 GB of memory.

Note

The list of available partitions can be queried on Rostam using sinfo.
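
For example, a quick way to inspect the partitions and their nodes (both flags are standard SLURM options):

sinfo -s          # one summary line per partition
sinfo -p marvin   # state of the nodes in the marvin partition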


Software

Rostam comes with a set of pre-installed software that can be used with HPX. Different compilers and libraries are available on Rostam by using the modules system to configure the environment.

Modules System

Usage

  • module avail - List all available modules
  • module load gcc/8.3.0 - Loads GCC 8.3.0 compiler in this terminal
  • module unload gcc/8.3.0 - Unload GCC 8.3.0 compiler in this terminal
  • module list - Lists all loaded modules in this session
  • module purge - Unload all loaded modules in this session
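
A typical session might look like the following; the module versions are just examples taken from the lists below:

module purge
module load cmake/3.14.2 gcc/8.3.0 boost/1.70.0-gcc8.3.0-release
module list   # verify that the expected modules are loaded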

Software Versions

Rostam has multiple versions of some software installed (e.g., GCC, Boost). A -debug extension at the end of a module name indicates a debug build of the package (compiled with debug symbols). Most software has been compiled with GCC 4.8.5, the default compiler of Red Hat Enterprise Linux 7. When a different compiler was used, the compiler name and version are included in the package name. If the software is provided by Red Hat but is not the default version, the letter s appears at the end of the module name.

Default Modules

Red Hat Enterprise Linux comes with many pre-compiled libraries installed in /usr. These can be used directly, without loading modules or manually altering environment variables. The default packages are compiled with GCC 4.8 and only support the old C++ ABI.
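
For instance, without loading any modules you can check which toolchain /usr provides; on Red Hat Enterprise Linux 7 this is expected to report GCC 4.8.x:

which gcc       # /usr/bin/gcc
gcc --version   # the system compiler, GCC 4.8.x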

Compiler Modules

  • GCC
    • gcc/4.8.5s - The default compiler, comes with Red Hat Enterprise Linux 7 itself
    • gcc/4.9.2s - Provided by Red Hat, uses GCC 4 compatible ABI
    • gcc/4.9.3
    • gcc/4.9.4
    • gcc/5.3.1s - Provided by Red Hat, uses GCC 4 compatible ABI
    • gcc/5.4.0
    • gcc/5.5.0
    • gcc/6.1.0
    • gcc/6.2.0
    • gcc/6.3.0
    • gcc/6.3.1s - Provided by Red Hat, uses GCC 4 compatible ABI
    • gcc/6.4.0
    • gcc/6.5.0
    • gcc/7.1.0
    • gcc/7.2.0
    • gcc/7.3.0
    • gcc/7.3.1s - Provided by Red Hat, uses GCC 4 compatible ABI
    • gcc/7.4.0
    • gcc/8.1.0
    • gcc/8.2.0
    • gcc/8.3.0
    • gcc/8.3.1s - Provided by Red Hat, uses GCC 4 compatible ABI
    • gcc/9.1.0
    • gcc/10.0.1
  • Clang
    • clang/3.7.1
    • clang/3.8.1
    • clang/3.9.0
    • clang/3.9.1
    • clang/4.0.0
    • clang/4.0.1
    • clang/5.0.0
    • clang/5.0.1
    • clang/6.0.0
    • clang/6.0.1
    • clang/7.0.0
    • clang/7.0.1
    • clang/8.0.0

Boost

Boost versions 1.58.0 and newer are installed under /opt/boost; the more recent versions are built with several different compilers. Boost 1.53 is the pre-compiled default package and is what you get when no Boost module is loaded.

Example: HPX with GCC 8.3.0 and Boost 1.70.0

Load CMake 3.14.2, GCC 8.3.0, and Boost 1.70.0 compiled in Release mode:

module load cmake/3.14.2 gcc/8.3.0 boost/1.70.0-gcc8.3.0-release

Run CMake:

cmake /path/to/source/tree
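
In practice you will usually want to pass a build type and an install prefix as well. The following is only a sketch with placeholder paths; the BOOST_ROOT value is an assumption following the /opt/boost layout shown below, and may be unnecessary if the Boost module already makes Boost discoverable:

cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_INSTALL_PREFIX=$HOME/demo_hpx \
      -DBOOST_ROOT=/opt/boost/1.70.0-gcc8.3.0/release \
      /path/to/source/tree
make -j 16
make install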

Installed Boost versions:

Version  Compiler     Modules (release, debug)                                        Paths (release, debug)
1.58.0   GCC 4.8.5    boost/1.58.0, boost/1.58.0-debug                                /opt/boost/1.58.0/{release,debug}
1.59.0   GCC 4.8.5    boost/1.59.0, boost/1.59.0-debug                                /opt/boost/1.59.0/{release,debug}
1.60.0   GCC 4.8.5    boost/1.60.0, boost/1.60.0-debug                                /opt/boost/1.60.0/{release,debug}
1.60.0   Clang 3.8.1  boost/1.60.0-clang3.8.1, boost/1.60.0-clang3.8.1-debug          /opt/boost/1.60.0-clang3.8.1/{release,debug}
1.61.0   GCC 5.4.0    boost/1.61.0-gcc5.4.0, boost/1.61.0-gcc5.4.0-debug              /opt/boost/1.61.0-gcc5.4.0/{release,debug}
1.61.0   Clang 3.9.1  boost/1.61.0-clang3.9.1, boost/1.61.0-clang3.9.1-debug          /opt/boost/1.61.0-clang3.9.1/{release,debug}
1.62.0   GCC 5.4.0    boost/1.62.0-gcc5.4.0, boost/1.62.0-gcc5.4.0-debug              /opt/boost/1.62.0-gcc5.4.0/{release,debug}
1.62.0   Clang 3.9.0  boost/1.62.0-clang3.9.0, boost/1.62.0-clang3.9.0-debug          /opt/boost/1.62.0-clang3.9.0/{release,debug}
1.63.0   GCC 6.4.0    boost/1.63.0-gcc6.4.0, boost/1.63.0-gcc6.4.0-debug              /opt/boost/1.63.0-gcc6.4.0/{release,debug}
1.63.0   Clang 4.0.1  boost/1.63.0-clang4.0.1, boost/1.63.0-clang4.0.1-debug          /opt/boost/1.63.0-clang4.0.1/{release,debug}
1.64.0   GCC 6.4.0    boost/1.64.0-gcc6.4.0, boost/1.64.0-gcc6.4.0-debug              /opt/boost/1.64.0-gcc6.4.0/{release,debug}
1.64.0   Clang 4.0.1  boost/1.64.0-clang4.0.1, boost/1.64.0-clang4.0.1-debug          /opt/boost/1.64.0-clang4.0.1/{release,debug}
1.65.0   GCC 7.2.0    boost/1.65.0-gcc7.2.0, boost/1.65.0-gcc7.2.0-debug              /opt/boost/1.65.0-gcc7.2.0/{release,debug}
1.65.0   Clang 5.0.0  boost/1.65.0-clang5.0.0, boost/1.65.0-clang5.0.0-debug          /opt/boost/1.65.0-clang5.0.0/{release,debug}
1.65.1   GCC 7.2.0    boost/1.65.1-gcc7.2.0, boost/1.65.1-gcc7.2.0-debug              /opt/boost/1.65.1-gcc7.2.0/{release,debug}
1.65.1   Clang 5.0.0  boost/1.65.1-clang5.0.0, boost/1.65.1-clang5.0.0-debug          /opt/boost/1.65.1-clang5.0.0/{release,debug}
1.66.0   GCC 7.3.0    boost/1.66.0-gcc7.3.0-release, boost/1.66.0-gcc7.3.0-debug      /opt/boost/1.66.0-gcc7.3.0/{release,debug}
1.66.0   Clang 5.0.1  boost/1.66.0-clang5.0.1-release, boost/1.66.0-clang5.0.1-debug  /opt/boost/1.66.0-clang5.0.1/{release,debug}
1.67.0   GCC 8.1.0    boost/1.67.0-gcc8.1.0-release, boost/1.67.0-gcc8.1.0-debug      /opt/boost/1.67.0-gcc8.1.0/{release,debug}
1.67.0   Clang 6.0.0  boost/1.67.0-clang6.0.0-release, boost/1.67.0-clang6.0.0-debug  /opt/boost/1.67.0-clang6.0.0/{release,debug}
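
For example, to build against Boost 1.66.0 compiled with GCC 7.3.0, load the matching compiler and Boost modules together. The explicit BOOST_ROOT hint is only an assumption; the module may already export it:

module load gcc/7.3.0 boost/1.66.0-gcc7.3.0-release
cmake -DBOOST_ROOT=/opt/boost/1.66.0-gcc7.3.0/release /path/to/source/tree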

MPI

  • No MPI module is loaded by default; the MPI library module of your choice needs to be loaded for MPI applications (see the sketch after this list).
  • The following modules are available:
    • mpi/compat-openmpi16-x86_64
    • mpi/mpich-x86_64 - loads MPICH 3.0
    • mpi/mpich-3.0-x86_64
    • mpi/mpich-3.2-x86_64
    • mpi/openmpi-x86_64 - version 1.10.7
    • mpi/mvapich2-2.2-x86_64
    • impi/2017.2.174
    • impi/2017.3.196
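
If you want to build HPX with its MPI parcelport, a minimal sketch is the following (the source path is a placeholder; HPX_WITH_PARCELPORT_MPI is part of HPX's CMake configuration):

module load mpi/openmpi-x86_64
cmake -DHPX_WITH_PARCELPORT_MPI=ON /path/to/source/tree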

CMake Modules

  • 2.8.12 - default: too old for HPX
  • cmake/3.9.0
  • cmake/3.10.2

hwloc

  • 1.11.2 - default
  • hwloc/1.11.3
  • hwloc/2.0.0

PAPI

  • 5.2.0 - default; has known problems after the Meltdown and Spectre patches.
  • papi/5.6.0

binutils

  • 2.27 - default
  • binutils/2.28
  • binutils/2.30

Note: The newer compilers require a newer version of binutils; loading those compiler modules will load a suitable binutils automatically.

tcmalloc

  • 2.6.1 - default
  • gperftools/2.7

Note: There is a problem with the system-provided tcmalloc, which causes an "Attempt to free invalid pointer" error with GCC 7 and 8. To be safe, load the newer version.
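
A sketch of how to pick up the newer tcmalloc when configuring HPX (HPX_WITH_MALLOC selects the allocator; how the library is located depends on the environment set up by the module):

module load gperftools/2.7
cmake -DHPX_WITH_MALLOC=tcmalloc /path/to/source/tree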

debugger

  • gdb:
    • 7.6.1 - default
    • gdb/8.1
  • lldb: since version 6.0.0, LLVM/Clang is built together with LLDB, so you can use lldb by loading the corresponding clang module
    • 3.4.2 - default
    • clang/6.0.0
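
For example, to debug an HPX binary on a compute node you can combine a debugger module with an interactive srun (the application path and thread count are placeholders):

module load gdb/8.1
srun -p marvin -N 1 --pty gdb --args ./bin/hello_world --hpx:threads=4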

Running HPX applications on compute nodes

HPX applications can be run on Rostam's compute nodes using the srun command. To run an HPX application, use the following command:

srun -p <partition> -N <number-of-nodes> [-n <number-of-processes>] [-c <number-of-cores-per-process>] <path-hpx-application>

  • <partition>: One of the partitions (see Introduction).
  • <number-of-nodes>: Number of compute nodes. Default is 1.
  • <number-of-processes>: Overall number of processes. The number of processes per node will be <number-of-processes>/<number-of-nodes>. Optional.
  • <number-of-cores-per-process>: Number of cores SLURM will tell each process to use. Optional.

Note:

You can change the number of localities per node by specifying the -n option. We recommend that you always supply -n <number-of-instances>, even when the number of instances is 1. The number of cores per locality can be set with -c.

Example:

The following examples assume that you have successfully built HPX on Rostam and that the current working directory is the HPX install directory.

Running hello_world on one of the Medusa nodes on all cores:

srun -p medusa -N 1 ./bin/hello_world

Running hello_world on two Marvin nodes on all cores:

srun -p marvin -N 2 ./bin/hello_world

Running hello_world on two Marvin nodes, using one locality per NUMA domain:

srun -p marvin -N 2 -n 4 -c 8 ./bin/hello_world

Running the MPI version of hello_world on four Marvin nodes, using one locality per NUMA domain:

salloc -p marvin -N 4 -n 8 -c 8 mpirun ./bin/hello_world

Interactive Sessions

To acquire an interactive session on one of the compute nodes you can issue the following command:

srun -p <partition> -N <number-of-nodes> --pty /bin/bash -l

After a job is acquired, you can run your HPX application. By default, it will use all available cores. Note that if you requested one node, you do not need to do srun again. However, if you requested more than one node and want to run your distributed application, use srun to start the HPX application, which will use the resources that have been requested for the interactive shell.
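
A sketch of such a session on two Marvin nodes (paths are placeholders):

srun -p marvin -N 2 --pty /bin/bash -l
# now on the first allocated node; a single-node run works directly:
./bin/hello_world
# to use both allocated nodes, start the application through srun:
srun ./bin/hello_world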

Note

Do not use the command above for MPI applications. If you need to test something use:

salloc -p <partition> -N <number-of-nodes> bash -l

This command does not open a shell on a compute node; commands continue to run on the login node until srun or mpirun is issued. Therefore, consider using the following to run applications:

salloc <slurm-args> mpirun <application-path> # for MPI applications
salloc <slurm-args> srun <application-path>   # for non-MPI applications

Scheduling Batch Jobs

The method mentioned above of running HPX applications is fine for development purposes. However, srun's disadvantage is that its lifetime depends on the user's session. This setup may not be appropriate for long-running applications (for example, benchmarks or larger scale simulations). To cope with that limitation, you can use the sbatch command.

sbatch expects a script that it can run once the requested resources are available. To request resources, add #SBATCH comments to your script or pass the corresponding parameters to sbatch directly. The parameters are the same as for srun, and the commands you can execute are the same as in an interactive shell.

Example batch job:

We have a script named example.sbatch which runs the hello_world HPX application. It contains the following:

#!/usr/bin/env bash

#SBATCH -o hostname_%j.out
#SBATCH -t 0-00:02
#SBATCH -p marvin
#SBATCH -N 2

srun ~/demo_hpx/bin/hello_world

Submit the job:

sbatch example.sbatch
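
The job can then be monitored with the usual SLURM tools; the output file name follows the #SBATCH -o pattern above, with %j replaced by the job ID:

squeue -u $USER              # list your queued and running jobs
cat hostname_<jobid>.out     # inspect the output after the job has finished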