Building NEXUS on Theta@alcf - next-exp/nexus GitHub Wiki

This page documents the build process for nexus on Theta @ ALCF, Argonne's supercomputer. Nexus was built and tested on the Theta login nodes (Intel Haswell CPUs), but the compute nodes use Intel KNL chips. Software built for the login nodes should run on the compute nodes, but it will be missing the AVX-512 vectorization instructions, which leaves a lot of performance on the table. More documentation on Theta can be found in the ALCF documentation pages.

Modules

System-wide software on Theta is made available via modules. This includes compilers and linkers, as well as many other packages that aren't relevant to nexus.

The default modules on Theta aren't well suited for compiling nexus. The following changes make compilation much easier:

# Unload these to make compilation work a little easier:
module unload trackdeps
module unload darshan/3.1.5
module unload xalt

# This uses the gnu compilers instead of intel:
module swap PrgEnv-intel/6.0.5 PrgEnv-gnu
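
To confirm the changes took effect, listing the loaded modules is a quick sanity check:

# PrgEnv-gnu should now appear, and the unloaded modules should be gone:
module list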

Additionally, ROOT is installed system-wide as a module and works for nexus:

# This uses the prebuilt cernroot package.
module load cernroot/6.14.04-cray-py36

All other modules may be left as their default.
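
A quick way to confirm the module's ROOT is usable is root-config, which ships with any ROOT installation:

# Should print the version provided by the module (6.14.04):
root-config --version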

Build strategy

Building for the login nodes is not particularly different from an ordinary Linux build; building for the compute nodes is the more challenging part. Here we document the build steps on Theta for the login nodes as well as the compute nodes. These software stacks are stored in different folders because they need to be kept separate. In particular, the compute-node build cannot run on the login node, which makes debugging very difficult.

The standard build is located in /projects/Next/software/ and the compute-node build in /projects/Next/software_knl/, each with a corresponding setup script.
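
The setup script names below are hypothetical (check the folders for the actual file names); the point is that you source one stack or the other, never both:

# Hypothetical script names -- pick the stack that matches where you run:
source /projects/Next/software/setup.sh        # login-node (Haswell) stack
source /projects/Next/software_knl/setup.sh    # compute-node (KNL) stack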

Built software

The following software is built:

  • GATE @ v2_00_00
  • Geant4 @ 4.10.05.p01
  • GSL @ 2.6
  • hdf5 @ 1.10.6
  • scons @ 3.1.2
  • nexus

HDF5

HDF5 is built on the login node with the following commands:

CC=$(which gcc) CXX=$(which g++) ./configure
make
make install
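
If installing into the default /usr/local is not desirable, configure accepts the same --prefix flag used below for GSL; a sketch with a hypothetical install path:

# hdf5-install/ is a hypothetical location -- adjust as needed:
CC=$(which gcc) CXX=$(which g++) ./configure --prefix=/projects/Next/software/hdf5-install/
make
make install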

On the compute nodes (executed from the MOM node; this could be adapted to run directly over ssh where enabled), the commands are:

CC=$(which cc) CXX=$(which CC) ./configure
aprun make -j 64
aprun make install -j 64

Note that the Cray compiler wrapper injects the KNL architecture target and therefore ought to emit AVX-512 instructions. I have not verified this, however, since hdf5 is not a bottleneck.
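
One way to check whether AVX-512 code was actually emitted is to disassemble the built library and count instructions that touch the 512-bit zmm registers; a sketch, assuming the libtool build placed the shared library in src/.libs/:

# A count of zero suggests no AVX-512 code was generated:
objdump -d src/.libs/libhdf5.so | grep -c zmm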

GATE

Check out version 2_00_00 of GATE and run make; it compiles on the login node without further changes.
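
For concreteness, the steps look like the following (the clone URL is omitted; v2_00_00 is assumed to be the tag name):

# From a clone of the GATE repository:
git checkout v2_00_00
make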

On the compute node, modify the CXXFLAGS in the Makefile to include -O3 -march=knl and it will optimize and use AVX-512 instructions. I have not built with AVX-512, however, as profiling indicates GATE is not a bottleneck.
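
A sketch of that Makefile edit as a one-liner, assuming GATE's Makefile assigns CXXFLAGS with a plain = at the start of a line:

# Splice -O3 -march=knl into the CXXFLAGS assignment (inspect the result!):
sed -i 's/^CXXFLAGS *=/& -O3 -march=knl/' Makefile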

GEANT

Geant4 builds via cmake. On the login nodes, override the Cray compiler wrappers and install via commands like these:

export PREFIX=/lus/theta-fs0/projects/Next/software/geant4/
cd $PREFIX
mkdir geant-build/
mkdir geant4-install/
cd geant-build/
CC=$(which gcc) CXX=$(which g++) cmake -DCMAKE_INSTALL_PREFIX=${PREFIX}/geant4-install/ ../../geant4.10.05.p01/ 
make -j 24
make install 
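
After make install, sourcing the geant4.sh script that every Geant4 install ships in its bin/ directory sets up the data and library paths for builds against it:

# Standard Geant4 environment script:
source ${PREFIX}/geant4-install/bin/geant4.sh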

On the compute nodes, we don't want to override the wrappers, so instead use commands like these:

Note: this is untested!

export PREFIX=/lus/theta-fs0/projects/Next/software_knl/geant4/
export CRAYPE_LINK_TYPE=dynamic
cd $PREFIX
mkdir geant-build/
mkdir geant4-install/
cd geant-build/
CC=$(which cc) CXX=$(which CC) cmake -DCMAKE_INSTALL_PREFIX=${PREFIX}/geant4-install/ ../../geant4.10.05.p01/ 
aprun make -j 64
aprun make install 

Note that the dynamic link type had to be set explicitly; the Cray compiler wrappers default to static linking, which doesn't suit Geant4's shared libraries.
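
To confirm the libraries really came out dynamically linked, running ldd on one of the core Geant4 libraries is a quick check (the library directory may be lib/ or lib64/ depending on the platform):

# A dynamically linked library lists its shared-object dependencies:
ldd ${PREFIX}/geant4-install/lib*/libG4global.so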

GSL

GSL uses a configure script similar to hdf5's. By default it tries to install into system-level directories, so we point it at a different prefix instead:

./configure --prefix=/projects/Next/software/gsl-install/
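
The configure step is followed by the usual make and make install. On the compute nodes the same pattern as hdf5 above should apply; a sketch (untested, and the software_knl prefix is an assumption):

make
make install

# Compute-node variant, untested, mirroring the hdf5 recipe:
CC=$(which cc) CXX=$(which CC) ./configure --prefix=/projects/Next/software_knl/gsl-install/
aprun make -j 64
aprun make install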

Dependencies with Spack

Spack is a package manager developed for HPC. It claims good support for compiling for architectures like KNL, so I built the nexus software dependencies (particularly ROOT and Geant4, but also scons, hdf5, and gsl) with it to see whether this would improve performance.

I installed spack at /projects/Next/spack_build/spack. I set some packages to use system defaults rather than building from source; things like perl fail at cross-compilation with spack and are only needed as a dependency of python. In fact, I use Intel Python, which has optimizations for KNL. I also set the default compiler to gcc 8.3.0. This can be viewed in spack/etc/spack/packages.yaml.

I still unloaded the modules above and then set up spack with:

source /projects/Next/spack_build/spack/share/spack/setup-env.sh
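
With the environment sourced, spack's own commands can confirm the configuration described above:

# List the compilers spack knows about (should include gcc 8.3.0):
spack compilers
# Dump the packages configuration, including the system-default packages:
spack config get packages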

To handle the dependencies, I used a spack env:

spack env create nexus
spack env activate nexus
spack add root~math~opengl~roofit~tbb~tmva~x hdf5~mpi geant4 ^xerces-c cxxstd=11
spack concretize

Next, I ran spack install, which proceeded for quite a while.
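
For reference, the install and a status check afterwards:

# Build everything in the active environment:
spack install
# List what actually got installed:
spack find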

In the end, root failed to install, and I will have to open a ticket. It got quite close, though.