Chroma with QUDA - lattice/quda GitHub Wiki

Chroma with QUDA support

Chroma can use GPUs with QUDA in two different flavors.

  1. Chroma with QDP++ and QUDA offload for inversions
  2. Chroma with QDPJIT for full GPU offload and QUDA for inversions

Which version is optimal depends on the workload. For HMC workloads the QDPJIT version is usually preferable, since otherwise the parts remaining on the CPU cause a significant slowdown.

Both versions have a few build dependencies in common and differ in others. In either case we assume that you have

  • MPI
  • CMake (you can get a recent version from https://cmake.org/download/; on Linux the .tar.gz binary distribution simply unpacks into your home directory and does not require you to build anything)
  • CUDA
  • (optional) the Ninja build tool, available from https://ninja-build.org

available on your system.
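As a quick sanity check before starting, a small helper can report whether the required tools are on your PATH. This is only a sketch: the helper name `check_tool` is made up for this guide, and the tool names below (mpicc, nvcc, ninja) are the usual defaults and may differ on your system.

```shell
# Report, for each required tool, whether it is found on the PATH.
check_tool() {
  command -v "$1" >/dev/null 2>&1 && echo "found: $1" || echo "MISSING: $1"
}

for tool in mpicc cmake nvcc ninja; do
  check_tool "$tool"
done
```

A "MISSING" line for ninja is harmless (it is optional); a missing compiler or CMake must be fixed before continuing.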

NOTE: if you build your own QMP, rather than have QUDA build it,

  1. You must use CMake to build QMP
  2. In your QUDA build script, you must specify this path for QMP:
cmake -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_HOME} \
    < lots of QUDA options> \
    -DQUDA_QMP=ON \
    -DQMP_DIR=${INSTALLDIR}/qmp/lib/cmake/QMP \
    -DCMAKE_INSTALL_PREFIX=${INSTALLDIR}/$install_name \
    ${SRCDIR}/quda

Building Chroma with QUDA and QDP++

(WIP)

Building Chroma with QUDA and QDPJIT

As all dependencies can by now be built using CMake, the process has become a lot simpler. We assume that you use a clean source directory and refer to it as ${SRCDIR}.

#####
# SET UP ENVIRONMENT

# Use Ninja to build. If Ninja is not available, comment out the next line
export CMAKE_GENERATOR=Ninja
# and uncomment the following line to get a parallel build with make:
#export CMAKE_MAKE_OPTS="-- -j$(nproc)"

export SM=sm_70 # Volta, use sm_80 for A100
# set to ON if you want to use NVSHMEM
export QUDA_NVSHMEM=OFF # or ON
export QDPJIT_HOST_ARCH="X86;NVPTX"

### COMPILER FLAGS: modify to your needs; don't use -march=native if the build machine has a different CPU than the compute nodes
export ARCHFLAGS="-march=native"
export DEBUGFLAGS=" "

# define and create some directories, adapt as needed
export BASEDIR=$(pwd)
export SRCDIR=${BASEDIR}/src
export BUILDDIR=${BASEDIR}/build
export INSTALLDIR=${BASEDIR}/install

mkdir -p ${SRCDIR}
mkdir -p ${BUILDDIR}

To get the required sources, run

cd ${SRCDIR}
git clone --depth=1 --branch  llvmorg-14.0.6 https://github.com/llvm/llvm-project.git
git clone --branch v2.9.14 https://github.com/GNOME/libxml2.git
git clone --branch qmp2-5-4 https://github.com/usqcd-software/qmp.git
git clone --recursive --branch devel https://github.com/JeffersonLab/qdp-jit.git # 88d2777
git clone --branch develop https://github.com/lattice/quda.git # c04150e
git clone --branch devel --recursive https://github.com/JeffersonLab/chroma.git # 52ee19f
cd ${BASEDIR}

The git tags (where applicable) were tested in July 2022. For Chroma and QDP-JIT we used the git SHAs specified in the comments at the end of the respective lines, since we used the current devel branches of these packages and those are not always stable.
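To reproduce the exact revisions tested here, the SHAs from the clone comments above can be checked out after cloning. A minimal sketch follows; it only prints the checkout commands (drop the leading `echo` to actually run them):

```shell
# Fall back to a default if the environment block above was not sourced.
: "${SRCDIR:=$(pwd)/src}"

# repo:sha pairs taken from the comments on the clone commands above
PINS="qdp-jit:88d2777 quda:c04150e chroma:52ee19f"

for pin in $PINS; do
  repo=${pin%%:*}   # part before the colon
  sha=${pin##*:}    # part after the colon
  echo git -C "${SRCDIR}/${repo}" checkout "$sha"
done
```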

LLVM

cmake -S ${SRCDIR}/llvm-project/llvm -B ${BUILDDIR}/build_llvm \
  -DLLVM_ENABLE_TERMINFO="OFF" \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX=${INSTALLDIR} \
  -DLLVM_TARGETS_TO_BUILD="${QDPJIT_HOST_ARCH}" \
  -DLLVM_ENABLE_ZLIB="OFF" \
  -DBUILD_SHARED_LIBS="OFF" \
  -DLLVM_ENABLE_RTTI="ON"

cmake --build ${BUILDDIR}/build_llvm ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_llvm  

QMP

cmake -S ${SRCDIR}/qmp -B ${BUILDDIR}/build_qmp \
  -DCMAKE_INSTALL_PREFIX=${INSTALLDIR} \
  -DQMP_MPI=ON \
  -DBUILD_SHARED_LIBS=ON \
  -DQMP_TESTING=OFF

cmake --build ${BUILDDIR}/build_qmp ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_qmp

LIBXML2

cmake -S ${SRCDIR}/libxml2 -B ${BUILDDIR}/build_libxml2 \
  -DCMAKE_BUILD_TYPE=RELEASE \
  -DLIBXML2_WITH_PYTHON=OFF \
  -DLIBXML2_WITH_LZMA=OFF \
  -DCMAKE_INSTALL_PREFIX=${INSTALLDIR}

cmake --build ${BUILDDIR}/build_libxml2 ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_libxml2

QDP-JIT

cmake -S ${SRCDIR}/qdp-jit -B ${BUILDDIR}/build_qdp-jit \
  -DCMAKE_INSTALL_PREFIX=${INSTALLDIR} \
  -DCMAKE_PREFIX_PATH=${INSTALLDIR} \
  -DBUILD_SHARED_LIBS=ON \
  -DQDP_ENABLE_BACKEND=CUDA \
  -DQDP_ENABLE_COMM_SPLIT_DEVICEINIT=ON \
  -DQDP_ENABLE_LLVM14=ON \
  -DQDP_PROP_OPT=OFF \
  -DCMAKE_CXX_FLAGS=${ARCHFLAGS}

cmake --build ${BUILDDIR}/build_qdp-jit ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_qdp-jit

QUDA

cmake -S ${SRCDIR}/quda -B ${BUILDDIR}/build_quda \
  -DCMAKE_BUILD_TYPE=RELEASE \
  -DCMAKE_INSTALL_PREFIX=${INSTALLDIR} \
  -DCMAKE_PREFIX_PATH=${INSTALLDIR} \
  -DQUDA_GPU_ARCH=${SM} \
  -DQUDA_NVSHMEM=${QUDA_NVSHMEM} \
  -DQUDA_DIRAC_DEFAULT_OFF=ON \
  -DQUDA_DIRAC_CLOVER=ON \
  -DQUDA_DIRAC_WILSON=ON \
  -DQUDA_INTERFACE_QDPJIT=ON \
  -DQUDA_QDPJIT=ON \
  -DQUDA_INTERFACE_MILC=OFF \
  -DQUDA_INTERFACE_CPS=OFF \
  -DQUDA_INTERFACE_QDP=ON \
  -DQUDA_INTERFACE_TIFR=OFF \
  -DQUDA_QMP=ON \
  -DQUDA_QIO=OFF \
  -DQUDA_MULTIGRID=ON \
  -DQUDA_MAX_MULTI_BLAS_N=9 \
  -DQUDA_BUILD_SHAREDLIB=ON \
  -DQUDA_BUILD_ALL_TESTS=OFF \
  -DCMAKE_CXX_FLAGS=${ARCHFLAGS}

cmake --build ${BUILDDIR}/build_quda ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_quda

CHROMA

cmake -S ${SRCDIR}/chroma -B ${BUILDDIR}/build_chroma \
  -DCMAKE_BUILD_TYPE=RELEASE \
  -DCMAKE_INSTALL_PREFIX=${INSTALLDIR}/ \
  -DCMAKE_PREFIX_PATH=${INSTALLDIR}/ \
  -DBUILD_SHARED_LIBS=ON \
  -DChroma_ENABLE_JIT_CLOVER=ON \
  -DChroma_ENABLE_QUDA=ON \
  -DChroma_ENABLE_OPENMP=ON \
  -DCMAKE_CXX_FLAGS=${ARCHFLAGS}

cmake --build ${BUILDDIR}/build_chroma ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_chroma

As this builds all libraries as shared libraries, be sure to add ${INSTALLDIR}/lib and ${INSTALLDIR}/lib64 to your LD_LIBRARY_PATH. Unfortunately, as of this writing, Chroma does not use CMake's rpath functionality.
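Concretely, with the directory layout from the environment setup above, the following prepends both library directories (adapt BASEDIR to wherever you ran the build):

```shell
# BASEDIR/INSTALLDIR as defined in the environment setup above; fall back to
# the current directory if that block was not sourced.
: "${BASEDIR:=$(pwd)}"
export INSTALLDIR=${BASEDIR}/install
# Prepend both library directories, preserving any existing LD_LIBRARY_PATH.
export LD_LIBRARY_PATH=${INSTALLDIR}/lib:${INSTALLDIR}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```

Put these lines in your job script or shell profile so every Chroma invocation finds the shared libraries.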

Running Chroma with QUDA / QDPJIT

WIP

Using QMP

With QMP-2.5.1 and above, users can control the logical topology, which helps improve the inter-/intra-node layout. In addition to the regular QMP geometry argument (-geom x y z t, or equivalently -qmp-geom x y z t), one can now also pass two new arguments, for example:

chroma -geom x y z t -qmp-logic-map 3 2 1 0 -qmp-alloc-map 3 2 1 0

The above invocation makes the time dimension run fastest and the x dimension run slowest.
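The number of MPI ranks the job is launched with must match the product of the -geom dimensions. A small sketch of the corresponding launch follows; the mpirun invocation and the input-file flag are illustrative assumptions and are left commented out:

```shell
# Example: a 1x1x2x4 processor grid needs 1*1*2*4 = 8 MPI ranks.
GEOM="1 1 2 4"
NP=1
for d in $GEOM; do NP=$((NP * d)); done
echo "need $NP MPI ranks"
# mpirun -np $NP chroma -geom $GEOM -qmp-logic-map 3 2 1 0 -qmp-alloc-map 3 2 1 0
```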


OLD stuff


No longer maintained or needed but left here for reference

Experimental automated build with cmake

To simplify the build process there is an experimental CMake script that automates getting the source and building Chroma for the QDPJIT as well as the QDPXX version.

NOTE that CMake 3.19+ is required. Get it from https://cmake.org/download/. There is no need to build it yourself; just unpack the .tar.gz binary package in your home directory and make sure it is included in your PATH.

To use it download the CMakeLists.txt and place it in a clean directory.

Then simply call

cmake -DCHROMABUILD_QDPJIT=ON -DCHROMABUILD_QUDA_GPU_ARCH=sm_70 .

in that directory.

The two most important options as included above are:

  • CHROMABUILD_QDPJIT whether to build the QDPJIT (ON) or QDPXX (OFF) version of Chroma
  • CHROMABUILD_QUDA_GPU_ARCH the GPU architecture you are building for (sm_60, sm_70, or sm_80 for Pascal, Volta, or Ampere, respectively)

Note that you can also change these options using ccmake later.

The CXX, CUDACXX, CC and MPI compilers and directories are selected by standard CMake logic. By default CMake will pick whatever is in your PATH first. To specify non-default versions and flags, set the following environment variables before the initial cmake run:

  • CC / CFLAGS
  • CXX / CXXFLAGS
  • CUDACXX

Note that you can also change flags later using ccmake and modifying the corresponding cmake variables.
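For example (the compiler names and flags below are illustrative; substitute whatever your system provides), run these exports in the same shell before the initial cmake invocation:

```shell
# Select non-default compilers and flags before the first cmake run.
export CC=gcc
export CXX=g++
export CUDACXX=nvcc
export CXXFLAGS="-O3"
```

These only take effect on the initial configure; afterwards, change the corresponding CMake variables through ccmake instead.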

For MPI, please refer to the documentation of CMake's FindMPI module.

Once cmake has finished, you can build the selected Chroma version by running

make -j <N>

where N should be roughly 1-1.5x the number of cores in your system. Note that building Chroma and all dependencies will take a significant amount of time, in particular for the QDPJIT version.
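A reasonable default for N, assuming GNU nproc is available, can be computed as follows (the make invocation itself is left commented out):

```shell
# Use roughly 1.5x the core count as the parallel job count (minimum 1).
N=$(( $(nproc) * 3 / 2 ))
if [ "$N" -lt 1 ]; then N=1; fi
echo "building with make -j $N"
# make -j "$N"
```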

The build versions of Chroma can then be found in the directories

  • QDPJIT_sm_<xy>/bin for the QDPJIT version and
  • QDP_sm_<xy>/bin for the QDPXX version.

In both cases the sm_<xy> corresponds to the GPU architecture selected using CHROMABUILD_QUDA_GPU_ARCH. Installing the binaries in a different location is currently neither supported nor recommended.

NOTE You can build both the QDPJIT and the QDPXX version in the same directory. After completing the first build, just run cmake again selecting the other option, or toggle CHROMABUILD_QDPJIT using ccmake, and build again.

TODO Include details of setting

  • <FermAct> from CLOVER/SEOPREC_CLOVER
  • <AsymmetricLinop> true/false

as needed for whether we are doing asymmetric or symmetric preconditioning.
