Chroma with QUDA - lattice/quda GitHub Wiki
Chroma can use GPUs with QUDA in two different flavors.
- Chroma with QDP++ and QUDA offload for inversions
- Chroma with QDPJIT for full GPU offload and QUDA for inversions
Which version is optimal depends on the workload. For HMC workloads the QDPJIT version is usually preferable as otherwise the parts remaining on the CPU cause a significant slowdown.
Both versions share a few build dependencies and differ in others. In either case we assume that you have
- MPI
- CMake (you can get a recent version from https://cmake.org/download/; the .tar.gz binary distribution can be unpacked in your home directory on Linux and does not require you to build anything)
- CUDA
- (optional) Ninja build tool, you can get it from https://ninja-build.org
available on your system.
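Before starting, it can help to confirm that these prerequisites are actually visible in your shell. The following is a minimal sketch, not part of the official instructions. The tool names checked (`mpicxx`, `cmake`, `nvcc`, `ninja`) are assumptions about a typical setup; MPI compiler wrappers in particular are named differently on some systems, and `check_tool` is a hypothetical helper.

```shell
# Report which of the assumed prerequisite tools are on PATH.
# Tool names are assumptions; adjust them for your MPI and CUDA installation.
check_tool() {
    # $1: command name, $2: "required" or "optional"
    if command -v "$1" >/dev/null 2>&1; then
        echo "found    $1 ($(command -v "$1"))"
    else
        echo "missing  $1 ($2)"
        [ "$2" = "optional" ]   # non-zero status only for required tools
    fi
}

missing=0
for tool in mpicxx cmake nvcc; do
    check_tool "$tool" required || missing=1
done
check_tool ninja optional || true
echo "missing required tools: ${missing}"
```

If `missing required tools: 1` is printed, load the corresponding modules or extend your PATH before continuing.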
NOTE: if you build your own QMP rather than having QUDA build it,
- You must use CMake to build QMP
- In your QUDA build script, you must point QMP_DIR at the installed QMP CMake files:
cmake -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_HOME} \
< lots of QUDA options> \
-DQUDA_QMP=ON \
-DQMP_DIR=${INSTALLDIR}/qmp/lib/cmake/QMP \
-DCMAKE_INSTALL_PREFIX=${INSTALLDIR}/$install_name \
${SRCDIR}/quda
(WIP)
Since all dependencies can now be built with CMake, the process has become a lot simpler.
We assume that you use a clean source directory and refer to it as ${SRCDIR}
#####
# SET UP ENVIRONMENT
# use Ninja to build; if Ninja is not available, comment out the line below
export CMAKE_GENERATOR=Ninja
# if not using Ninja please uncomment below line for a parallel build
#export CMAKE_MAKE_OPTS="-- -j$(nproc)"
export SM=sm_70 # Volta, use sm_80 for A100
# set to ON if you want to use NVSHMEM
export QUDA_NVSHMEM=OFF # or ON
export QDPJIT_HOST_ARCH="X86;NVPTX"
### COMPILER FLAGS, modify to your need and don't use native if the build machine has a different CPU than the compute nodes
export ARCHFLAGS="-march=native"
export DEBUGFLAGS=" "
# define and create some directories, adapt as needed
export BASEDIR=$(pwd)
export SRCDIR=${BASEDIR}/src
export BUILDDIR=${BASEDIR}/build
export INSTALLDIR=${BASEDIR}/install
mkdir -p ${SRCDIR}
mkdir -p ${BUILDDIR}
To get the required sources, run
cd ${SRCDIR}
git clone --depth=1 --branch llvmorg-14.0.6 https://github.com/llvm/llvm-project.git
git clone --branch v2.9.14 https://github.com/GNOME/libxml2.git
git clone --branch qmp2-5-4 https://github.com/usqcd-software/qmp.git
git clone --recursive --branch devel https://github.com/JeffersonLab/qdp-jit.git # 88d2777
git clone --branch develop https://github.com/lattice/quda.git # c04150e
git clone --branch devel --recursive https://github.com/JeffersonLab/chroma.git # 52ee19f
cd ${BASEDIR}
The git tags (where applicable) were tested in July 2022. For Chroma and QDP-JIT we used the git SHAs given in the comment at the end of the line (a SHA is also listed for QUDA), because we used the then-current development branches of these packages and those are not always stable.
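To reproduce the tested state rather than whatever the branch heads currently point to, the clones can be pinned to the SHAs from the clone comments above. This is a sketch under the assumption that the clones live in ${SRCDIR} as set up earlier; `pin_repo` is a hypothetical helper, not something shipped with any of the packages.

```shell
# Pin a clone to a given commit and bring its submodules in sync.
# pin_repo is a hypothetical helper; the SHAs are those noted in the clone comments.
pin_repo() {
    # $1: repository directory, $2: commit SHA
    if [ ! -e "$1/.git" ]; then
        echo "skip: $1 is not a git checkout" >&2
        return 1
    fi
    git -C "$1" checkout --quiet "$2" &&
    git -C "$1" submodule --quiet update --init --recursive
}

# Default only so the snippet runs standalone; normally SRCDIR is already exported.
SRCDIR=${SRCDIR:-$(pwd)/src}
pin_repo "${SRCDIR}/qdp-jit" 88d2777 || true
pin_repo "${SRCDIR}/quda"    c04150e || true
pin_repo "${SRCDIR}/chroma"  52ee19f || true
```

Checking out a SHA leaves each repository in a detached-HEAD state, which is fine for building.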
cmake -S ${SRCDIR}/llvm-project/llvm -B ${BUILDDIR}/build_llvm \
-DLLVM_ENABLE_TERMINFO="OFF" \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=${INSTALLDIR} \
-DLLVM_TARGETS_TO_BUILD="${QDPJIT_HOST_ARCH}" \
-DLLVM_ENABLE_ZLIB="OFF" \
-DBUILD_SHARED_LIBS="OFF" \
-DLLVM_ENABLE_RTTI="ON"
cmake --build ${BUILDDIR}/build_llvm ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_llvm
cmake -S ${SRCDIR}/qmp -B ${BUILDDIR}/build_qmp \
-DCMAKE_INSTALL_PREFIX=${INSTALLDIR} \
-DQMP_MPI=ON \
-DBUILD_SHARED_LIBS=ON \
-DQMP_TESTING=OFF
cmake --build ${BUILDDIR}/build_qmp ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_qmp
cmake -S ${SRCDIR}/libxml2 -B ${BUILDDIR}/build_libxml2 \
-DCMAKE_BUILD_TYPE=RELEASE \
-DLIBXML2_WITH_PYTHON=OFF \
-DLIBXML2_WITH_LZMA=OFF \
-DCMAKE_INSTALL_PREFIX=${INSTALLDIR}
cmake --build ${BUILDDIR}/build_libxml2 ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_libxml2
cmake -S ${SRCDIR}/qdp-jit -B ${BUILDDIR}/build_qdp-jit \
-DCMAKE_INSTALL_PREFIX=${INSTALLDIR} \
-DCMAKE_PREFIX_PATH=${INSTALLDIR} \
-DBUILD_SHARED_LIBS=ON \
-DQDP_ENABLE_BACKEND=CUDA \
-DQDP_ENABLE_COMM_SPLIT_DEVICEINIT=ON \
-DQDP_ENABLE_LLVM14=ON \
-DQDP_PROP_OPT=OFF \
-DCMAKE_CXX_FLAGS="${ARCHFLAGS}"
cmake --build ${BUILDDIR}/build_qdp-jit ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_qdp-jit
cmake -S ${SRCDIR}/quda -B ${BUILDDIR}/build_quda \
-DCMAKE_BUILD_TYPE=RELEASE \
-DCMAKE_INSTALL_PREFIX=${INSTALLDIR} \
-DCMAKE_PREFIX_PATH=${INSTALLDIR} \
-DQUDA_GPU_ARCH=${SM} \
-DQUDA_NVSHMEM=${QUDA_NVSHMEM} \
-DQUDA_DIRAC_DEFAULT_OFF=ON \
-DQUDA_DIRAC_CLOVER=ON \
-DQUDA_DIRAC_WILSON=ON \
-DQUDA_INTERFACE_QDPJIT=ON \
-DQUDA_QDPJIT=ON \
-DQUDA_INTERFACE_MILC=OFF \
-DQUDA_INTERFACE_CPS=OFF \
-DQUDA_INTERFACE_QDP=ON \
-DQUDA_INTERFACE_TIFR=OFF \
-DQUDA_QMP=ON \
-DQUDA_QIO=OFF \
-DQUDA_MULTIGRID=ON \
-DQUDA_MAX_MULTI_BLAS_N=9 \
-DQUDA_BUILD_SHAREDLIB=ON \
-DQUDA_BUILD_ALL_TESTS=OFF \
-DCMAKE_CXX_FLAGS="${ARCHFLAGS}"
cmake --build ${BUILDDIR}/build_quda ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_quda
cmake -S ${SRCDIR}/chroma -B ${BUILDDIR}/build_chroma \
-DCMAKE_BUILD_TYPE=RELEASE \
-DCMAKE_INSTALL_PREFIX=${INSTALLDIR}/ \
-DCMAKE_PREFIX_PATH=${INSTALLDIR}/ \
-DBUILD_SHARED_LIBS=ON \
-DChroma_ENABLE_JIT_CLOVER=ON \
-DChroma_ENABLE_QUDA=ON \
-DChroma_ENABLE_OPENMP=ON \
-DCMAKE_CXX_FLAGS="${ARCHFLAGS}"
cmake --build ${BUILDDIR}/build_chroma ${CMAKE_MAKE_OPTS}
cmake --install ${BUILDDIR}/build_chroma
As this builds all libraries as shared libraries, be sure to add ${INSTALLDIR}/lib and ${INSTALLDIR}/lib64 to your LD_LIBRARY_PATH. As of this writing, Chroma unfortunately does not use CMake's rpath functionality.
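One way to set this up in the shell (or job script) that launches Chroma is sketched below. It assumes INSTALLDIR is the install prefix used in the build steps; the fallback value is there only so the snippet runs on its own.

```shell
# Prepend the install's library directories to LD_LIBRARY_PATH.
# The INSTALLDIR default is an assumption so the snippet runs standalone.
INSTALLDIR=${INSTALLDIR:-$(pwd)/install}

# ${VAR:+:...} appends the previous value only if it was non-empty,
# which avoids a trailing ':' (an empty entry means "current directory").
export LD_LIBRARY_PATH="${INSTALLDIR}/lib:${INSTALLDIR}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
echo "${LD_LIBRARY_PATH}"
```

Running `ldd ${INSTALLDIR}/bin/chroma` afterwards is a quick way to spot any library that still cannot be resolved.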
WIP
With QMP-2.5.1 and above, users can control the logical topology, helping improve inter/intra node layout. In addition to the regular QMP args (-geom x y z t), one can now also pass two new args, for example:
chroma -geom/qmp-geom x y z t -qmp-logic-map 3 2 1 0 -qmp-alloc-map 3 2 1 0
With the above invocation, the time dimension runs fastest and the x dimension runs slowest.
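Since the geometry arguments define the process grid, their product has to match the number of MPI ranks launched. The sketch below derives the rank count from a geometry and prints a launch line without running it. The launcher (`mpirun`), the example geometry `1 1 2 4`, and the file names `in.xml` and `out.xml` are assumptions for illustration only.

```shell
# Derive the MPI rank count from the process grid and print a launch line.
# Geometry, launcher (mpirun) and file names are illustrative assumptions.
geom="1 1 2 4"          # x y z t process grid

ranks=1
for n in ${geom}; do
    ranks=$(( ranks * n ))
done

# Print the command instead of running it, so it can be inspected first.
echo "mpirun -np ${ranks} chroma -geom ${geom}" \
     "-qmp-logic-map 3 2 1 0 -qmp-alloc-map 3 2 1 0" \
     "-i in.xml -o out.xml"
```

On systems that use a different launcher (for example `srun`), only the first part of the printed line changes; the Chroma arguments stay the same.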
OLD stuff
No longer maintained or needed but left here for reference
To simplify the build process there is an experimental CMake script that automates getting the source and building Chroma for the QDPJIT as well as the QDPXX version.
NOTE that CMake 3.19+ is required. Get it from https://cmake.org/download/. There is no need to build it yourself; you can just unpack the tar.gz binary package in your home directory and make sure it is included in your PATH.
To use it download the CMakeLists.txt and place it in a clean directory.
Then simply call
cmake -DCHROMABUILD_QDPJIT=ON -DCHROMABUILD_QUDA_GPU_ARCH=sm_70 .
in that directory.
The two most important options, as included above, are:
- CHROMABUILD_QDPJIT: whether to build the QDPJIT (ON) or QDPXX (OFF) version of Chroma
- CHROMABUILD_QUDA_GPU_ARCH: the GPU architecture you are building for (sm_60, sm_70 or sm_80 for Pascal, Volta or Ampere, respectively)
Note that you can also change these options later using ccmake.
The CXX, CUDACXX, CC and MPI compilers and directories are selected by CMake's standard logic. By default CMake selects whatever comes first in your PATH. To specify non-default versions and flags, set the environment variables
- CC / CFLAGS
- CXX / CXXFLAGS
- CUDACXX
before the initial cmake run.
Note that you can also change flags later using ccmake and modifying the corresponding CMake variables.
For MPI, please refer to the documentation of FindMPI.
Once cmake has finished, you can build the selected Chroma version by running
make -j <N>
where N should be roughly 1-1.5x the number of cores in your system. Note that building Chroma and all dependencies will take a significant amount of time, in particular for the QDPJIT version.
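As a small sketch of how N might be chosen, the following picks the upper end of that range from the core count reported by `nproc` (already used in the environment setup above). The variable names are arbitrary, and the command is printed rather than executed.

```shell
# Pick a job count of about 1.5x the core count (upper end of the suggested range).
cores=$(nproc)
njobs=$(( cores * 3 / 2 ))

# Print the resulting build command; run it from the build directory.
echo "make -j ${njobs}"
```

On machines with little memory per core, a value closer to 1x the core count may be the safer choice.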
The built versions of Chroma can then be found in the directories
- QDPJIT_sm_<xy>/bin for the QDPJIT version and
- QDP_sm_<xy>/bin for the QDPXX version.
In both cases the sm_<xy> corresponds to the GPU architecture selected using CHROMABUILD_QUDA_GPU_ARCH.
Installing the binaries in a different location is currently neither supported nor recommended.
NOTE: You can build both the QDPJIT and QDPXX versions in the same directory. After completing the first build, just run cmake again selecting the other option, or toggle CHROMABUILD_QDPJIT using ccmake, and build again.
TODO: Include details of setting
- <FermAct> from CLOVER/SEOPREC_CLOVER
- <AsymmetricLinop> true/false
as needed for whether we are doing asymmetric or symmetric preconditioning.