Running HPX on Rostam
Rostam is a heterogeneous cluster with a wide variety of machine architectures and 26 compute nodes. Rostam uses SLURM for job scheduling. Its compute nodes are grouped into the following partitions:
- sc13: It consists of two nodes, Carson and Reno. Each has two 10-core Intel Xeon E5-2670 Ivy Bridge CPUs and 128 GB of RAM. Reno has an NVIDIA K40m and an AMD Radeon R9 Nano Fury GPU. Carson has two AMD Radeon R9 Nano Fury GPUs. Hyper-threading is on for Reno and off for Carson.
- k40: Contains Reno.
- cuda: Nodes with Nvidia GPUs (Diablo, Bahram, Geev, Reno, Tycho)
- QxV100: Diablo, a node with two 20-core Intel Xeon Gold 6148 Skylake CPUs, 384 GB of memory, and four NVIDIA Tesla V100 SXM2 GPUs. Hyper-threading is off on this node.
- k80: Bahram, a node with two Intel Xeon E5-2660 Haswell CPUs (20 cores in total) with 128 GB of memory, and four NVIDIA K80s. Hyper-threading is off on this node.
- v100: Geev, a node with two 10-core Intel Xeon E5-2660 Haswell CPUs with 128 GB of memory and two NVIDIA V100s. Apart from its GPUs, this node is identical to Bahram.
- sc15: It consists of two nodes: Bahram and Geev.
- marvin: It has 15 nodes, each with two 8-core Intel Xeon E5-2450 Sandy Bridge CPUs and 48 GB of memory.
- medusa: It has 16 nodes, each with two 20-core Intel Xeon Gold 6148 Skylake CPUs and 96 GB of memory.
Note: The list of available partitions can be queried on Rostam using `sinfo`.
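For example, one way to get a compact overview of the partitions (the format string is just one reasonable choice of standard `sinfo` fields):

```
# Partition, availability, time limit, node count, and node list
sinfo -o "%P %a %l %D %N"
```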
Rostam comes with a set of pre-installed software that can be used with HPX. Different compilers and libraries are available on Rostam by using the modules system to configure the environment.
Usage

- `module avail` - List all available modules
- `module load gcc/8.3.0` - Loads GCC 8.3.0 compiler in this terminal
- `module unload gcc/8.3.0` - Unloads GCC 8.3.0 compiler in this terminal
- `module list` - Lists all loaded modules in this session
- `module purge` - Unloads all loaded modules in this session
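A typical session might look like the following (the versions are simply the ones used elsewhere on this page):

```
module purge                                # start from a clean environment
module avail                                # see what is installed
module load gcc/8.3.0 cmake/3.14.2          # pick a compiler and CMake
module load boost/1.70.0-gcc8.3.0-release   # pick a matching Boost build
module list                                 # verify what is loaded
```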
Rostam has multiple versions of some software installed (e.g., GCC, Boost). A `-debug` suffix at the end of a module name indicates a debug build of the package (compiled with debug symbols). Most software has been compiled with GCC 4.8.5, Red Hat Enterprise Linux 7's default compiler. When a different compiler was used, the compiler name and version are included in the package name. If the software is provided by Red Hat but is not the default version, the letter `s` appears at the end of the module name.
Red Hat Enterprise Linux comes with many pre-compiled libraries that are installed in `/usr`. This software can be used directly, without the user needing to load modules or manually alter environment variables. These default packages are compiled with GCC 4.8, and they only support the old C++ ABI.
- GCC
  - `gcc/4.8.5s` - The default compiler, comes with Red Hat Enterprise Linux 7 itself
  - `gcc/4.9.2s` - Provided by Red Hat, uses GCC 4 compatible ABI
  - `gcc/4.9.3`
  - `gcc/4.9.4`
  - `gcc/5.3.1s` - Provided by Red Hat, uses GCC 4 compatible ABI
  - `gcc/5.4.0`
  - `gcc/5.5.0`
  - `gcc/6.1.0`
  - `gcc/6.2.0`
  - `gcc/6.3.0`
  - `gcc/6.3.1s` - Provided by Red Hat, uses GCC 4 compatible ABI
  - `gcc/6.4.0`
  - `gcc/6.5.0`
  - `gcc/7.1.0`
  - `gcc/7.2.0`
  - `gcc/7.3.0`
  - `gcc/7.3.1s` - Provided by Red Hat, uses GCC 4 compatible ABI
  - `gcc/7.4.0`
  - `gcc/8.1.0`
  - `gcc/8.2.0`
  - `gcc/8.3.0`
  - `gcc/8.3.1s` - Provided by Red Hat, uses GCC 4 compatible ABI
  - `gcc/9.1.0`
  - `gcc/10.0.1`
- Clang
  - `clang/3.7.1`
  - `clang/3.8.1`
  - `clang/3.9.0`
  - `clang/3.9.1`
  - `clang/4.0.0`
  - `clang/4.0.1`
  - `clang/5.0.0`
  - `clang/5.0.1`
  - `clang/6.0.0`
  - `clang/6.0.1`
  - `clang/7.0.0`
  - `clang/7.0.1`
  - `clang/8.0.0`
- Boost: Versions 1.58.0 and upward are installed in `/opt/boost`. The newer versions are compiled with several compilers. Boost 1.53 is the pre-compiled default package; this is what you get when no modules are loaded.
Load CMake 3.14.2, GCC 8.3.0, and Boost 1.70.0 compiled in Release mode:
module load cmake/3.14.2 gcc/8.3.0 boost/1.70.0-gcc8.3.0-release
Run CMake:
cmake /path/to/source/tree
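If CMake does not pick up the Boost installation from the loaded module automatically, it can be pointed at the install prefix explicitly. This is only a sketch; the path is assumed to follow the same layout as the entries in the table below, and the build type is up to you:

```
cmake -DCMAKE_BUILD_TYPE=Release \
      -DBOOST_ROOT=/opt/boost/1.70.0-gcc8.3.0/release \
      /path/to/source/tree
```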
Installed Boost versions:

| Version | Compiler | Module | Path |
|---|---|---|---|
| 1.58.0 | GCC 4.8.5 | `boost/1.58.0`, `boost/1.58.0-debug` | `/opt/boost/1.58.0/release`, `/opt/boost/1.58.0/debug` |
| 1.59.0 | GCC 4.8.5 | `boost/1.59.0`, `boost/1.59.0-debug` | `/opt/boost/1.59.0/release`, `/opt/boost/1.59.0/debug` |
| 1.60.0 | GCC 4.8.5 | `boost/1.60.0`, `boost/1.60.0-debug` | `/opt/boost/1.60.0/release`, `/opt/boost/1.60.0/debug` |
| 1.60.0 | Clang 3.8.1 | `boost/1.60.0-clang3.8.1`, `boost/1.60.0-clang3.8.1-debug` | `/opt/boost/1.60.0-clang3.8.1/release`, `/opt/boost/1.60.0-clang3.8.1/debug` |
| 1.61.0 | GCC 5.4.0 | `boost/1.61.0-gcc5.4.0`, `boost/1.61.0-gcc5.4.0-debug` | `/opt/boost/1.61.0-gcc5.4.0/release`, `/opt/boost/1.61.0-gcc5.4.0/debug` |
| 1.61.0 | Clang 3.9.1 | `boost/1.61.0-clang3.9.1`, `boost/1.61.0-clang3.9.1-debug` | `/opt/boost/1.61.0-clang3.9.1/release`, `/opt/boost/1.61.0-clang3.9.1/debug` |
| 1.62.0 | GCC 5.4.0 | `boost/1.62.0-gcc5.4.0`, `boost/1.62.0-gcc5.4.0-debug` | `/opt/boost/1.62.0-gcc5.4.0/release`, `/opt/boost/1.62.0-gcc5.4.0/debug` |
| 1.62.0 | Clang 3.9.0 | `boost/1.62.0-clang3.9.0`, `boost/1.62.0-clang3.9.0-debug` | `/opt/boost/1.62.0-clang3.9.0/release`, `/opt/boost/1.62.0-clang3.9.0/debug` |
| 1.63.0 | GCC 6.4.0 | `boost/1.63.0-gcc6.4.0`, `boost/1.63.0-gcc6.4.0-debug` | `/opt/boost/1.63.0-gcc6.4.0/release`, `/opt/boost/1.63.0-gcc6.4.0/debug` |
| 1.63.0 | Clang 4.0.1 | `boost/1.63.0-clang4.0.1`, `boost/1.63.0-clang4.0.1-debug` | `/opt/boost/1.63.0-clang4.0.1/release`, `/opt/boost/1.63.0-clang4.0.1/debug` |
| 1.64.0 | GCC 6.4.0 | `boost/1.64.0-gcc6.4.0`, `boost/1.64.0-gcc6.4.0-debug` | `/opt/boost/1.64.0-gcc6.4.0/release`, `/opt/boost/1.64.0-gcc6.4.0/debug` |
| 1.64.0 | Clang 4.0.1 | `boost/1.64.0-clang4.0.1`, `boost/1.64.0-clang4.0.1-debug` | `/opt/boost/1.64.0-clang4.0.1/release`, `/opt/boost/1.64.0-clang4.0.1/debug` |
| 1.65.0 | GCC 7.2.0 | `boost/1.65.0-gcc7.2.0`, `boost/1.65.0-gcc7.2.0-debug` | `/opt/boost/1.65.0-gcc7.2.0/release`, `/opt/boost/1.65.0-gcc7.2.0/debug` |
| 1.65.0 | Clang 5.0.0 | `boost/1.65.0-clang5.0.0`, `boost/1.65.0-clang5.0.0-debug` | `/opt/boost/1.65.0-clang5.0.0/release`, `/opt/boost/1.65.0-clang5.0.0/debug` |
| 1.65.1 | GCC 7.2.0 | `boost/1.65.1-gcc7.2.0`, `boost/1.65.1-gcc7.2.0-debug` | `/opt/boost/1.65.1-gcc7.2.0/release`, `/opt/boost/1.65.1-gcc7.2.0/debug` |
| 1.65.1 | Clang 5.0.0 | `boost/1.65.1-clang5.0.0`, `boost/1.65.1-clang5.0.0-debug` | `/opt/boost/1.65.1-clang5.0.0/release`, `/opt/boost/1.65.1-clang5.0.0/debug` |
| 1.66.0 | GCC 7.3.0 | `boost/1.66.0-gcc7.3.0-release`, `boost/1.66.0-gcc7.3.0-debug` | `/opt/boost/1.66.0-gcc7.3.0/release`, `/opt/boost/1.66.0-gcc7.3.0/debug` |
| 1.66.0 | Clang 5.0.1 | `boost/1.66.0-clang5.0.1-release`, `boost/1.66.0-clang5.0.1-debug` | `/opt/boost/1.66.0-clang5.0.1/release`, `/opt/boost/1.66.0-clang5.0.1/debug` |
| 1.67.0 | GCC 8.1.0 | `boost/1.67.0-gcc8.1.0-release`, `boost/1.67.0-gcc8.1.0-debug` | `/opt/boost/1.67.0-gcc8.1.0/release`, `/opt/boost/1.67.0-gcc8.1.0/debug` |
| 1.67.0 | Clang 6.0.0 | `boost/1.67.0-clang6.0.0-release`, `boost/1.67.0-clang6.0.0-debug` | `/opt/boost/1.67.0-clang6.0.0/release`, `/opt/boost/1.67.0-clang6.0.0/debug` |
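To check exactly what environment a given Boost module sets up (including which install path it refers to), you can inspect it with the standard `module show` command, e.g.:

```
module show boost/1.66.0-gcc7.3.0-release
```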
- MPI: No MPI module is loaded by default; the MPI library module of your choice needs to be loaded for MPI applications (a typical build setup using one of them is sketched after this list). The following modules are available:
  - `mpi/compat-openmpi16-x86_64`
  - `mpi/mpich-x86_64` - Loads mpich-3.0
  - `mpi/mpich-3.0-x86_64`
  - `mpi/mpich-3.2-x86_64`
  - `mpi/openmpi-x86_64` - version 1.10.7
  - `mpi/mvapich2-2.2-x86_64`
  - `impi/2017.2.174`
  - `impi/2017.3.196`
- CMake
  - `2.8.12` - default; too old for HPX
  - `cmake/3.9.0`
  - `cmake/3.10.2`
- hwloc
  - `1.11.2` - default
  - `hwloc/1.11.3`
  - `hwloc/2.0.0`
- PAPI
  - `5.2.0` - default; has some problems after the patches for Meltdown and Spectre
  - `papi/5.6.0`
- binutils
  - `2.27` - default
  - `binutils/2.28`
  - `binutils/2.30`
  - Note: The newer compilers require a newer version of binutils; loading those compiler modules will load binutils automatically.
- gperftools
  - `2.6.1` - default
  - `gperftools/2.7`
  - Note: There is a problem with the system-provided tcmalloc, which causes an "Attempt to free invalid pointer" error with GCC 7 and 8. To be safe, load the newer version (see the build sketch after this list).
- gdb
  - `7.6.1` - default
  - `gdb/8.1`
- lldb: Since version 6.0.0, llvm/clang is compiled with lldb, so you can use lldb by loading a clang module.
  - `3.4.2` - default
  - `clang/6.0.0`
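Putting the pieces together, a build environment for an MPI-enabled HPX that uses tcmalloc might be set up roughly as follows. This is only a sketch: the module names come from the lists above, and `HPX_WITH_PARCELPORT_MPI` and `HPX_WITH_MALLOC` are generic HPX CMake options rather than anything Rostam-specific.

```
module purge
module load cmake/3.14.2 gcc/8.3.0 boost/1.70.0-gcc8.3.0-release
module load mpi/openmpi-x86_64 gperftools/2.7

# Configure HPX with the MPI parcelport and tcmalloc from gperftools
cmake -DCMAKE_BUILD_TYPE=Release \
      -DHPX_WITH_PARCELPORT_MPI=ON \
      -DHPX_WITH_MALLOC=tcmalloc \
      /path/to/hpx/source
make -j
```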
Running HPX applications on Rostam is done with the srun command. To run an HPX application, use the following command:
srun -p <partition> -N <number-of-nodes> <path-hpx-application> [-n <number-of-processes>] [-c <number-of-cores-per-process>]
- `<partition>`: One of the partitions (see the list of partitions above).
- `<number-of-nodes>`: Number of compute nodes. The default is 1.
- `<number-of-processes>`: Overall number of processes. The number of processes per node will be `<number-of-processes>/<number-of-nodes>`. Optional.
- `<number-of-cores-per-process>`: Number of cores SLURM will tell each process to use. Optional.
Note: You can change the number of localities per node by specifying the `-n` option. We recommend always supplying `-n <number-of-instances>`, even when the number of instances is `1`. The number of cores per locality can be set with `-c`.
The following examples assume that you successfully built HPX on Rostam and the current working directory is the HPX install directory.
Running `hello_world` on one of the Medusa nodes on all cores:
srun -p medusa -N 1 ./bin/hello_world
Running `hello_world` on two Marvin nodes on all cores:
srun -p marvin -N 2 ./bin/hello_world
Running `hello_world` on two Marvin nodes, using one locality per NUMA domain (each Marvin node has two 8-core CPUs, i.e., two NUMA domains, hence `-n 4 -c 8` across two nodes):
srun -p marvin -N 2 -n 4 -c 8 ./bin/hello_world
Running the MPI version of `hello_world` on four Marvin nodes, using two localities per node (one per NUMA domain):
salloc -p marvin -N 4 -n 8 -c 8 mpirun ./bin/hello_world
To acquire an interactive session on one of the compute nodes you can issue the following command:
srun -p <partition> -N <number-of-nodes> --pty /bin/bash -l
After a job is acquired, you can run your HPX application. By default, it will use all available cores. Note that if you requested one node, you do not need to run `srun` again. However, if you requested more than one node and want to run your distributed application, use `srun` to start the HPX application; it will use the resources that have been requested for the interactive shell.
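For example, to work interactively on two Marvin nodes and then launch the application across both of them:

```
# Request an interactive shell on two Marvin nodes
srun -p marvin -N 2 --pty /bin/bash -l

# Inside that shell, start the distributed HPX application
srun ./bin/hello_world
```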
Note: Do not use the command above for MPI applications. If you need to test something, use:
salloc -p <partition> -N <number-of-nodes> bash -l
This command does not create a session, and commands will continue to run on the login node until `srun` or `mpirun` is issued. Therefore, consider using the following to run applications:
salloc <slurm-args> mpirun <application-path>   # for MPI applications
salloc <slurm-args> srun <application-path>     # for non-MPI applications
The method mentioned above of running HPX applications is fine for development purposes. However, `srun`'s disadvantage is that its lifetime depends on the user's session. This setup may not be appropriate for long-running applications (for example, benchmarks or larger-scale simulations). To cope with that limitation, you can use the sbatch command.
`sbatch` expects a script that it can run once the requested resources are available. To request resources, you need to add `#SBATCH` comments in your script or provide the necessary parameters to `sbatch` directly. The parameters are the same as with `srun`. The commands you can execute are the same commands you can execute in interactive shells.
We have a script named `example.sbatch` which runs the `hello_world` HPX application. It contains the following:
#!/usr/bin/env bash
#SBATCH -o hostname_%j.out
#SBATCH -t 0-00:02
#SBATCH -p marvin
#SBATCH -N 2
srun ~/demo_hpx/bin/hello_world
Submit the job:
sbatch example.sbatch
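After submitting, the job can be monitored with the standard SLURM tools, and the output ends up in the file named by the `#SBATCH -o` line (`%j` expands to the job ID):

```
squeue -u $USER            # show your pending and running jobs
cat hostname_<jobid>.out   # inspect the output once the job has finished
```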