OpenQxD with QUDA - lattice/quda GitHub Wiki

These instructions are intended to be a quick start guide to getting openQxD running with GPUs using the QUDA library.

These instructions assume you are using the recommended branches of QUDA and openQxD:

feature/openqxd in the case of QUDA, see https://github.com/lattice/quda (TODO: merge into develop, see pull request)
feature/quda/main-thesis-release in the case of openQxD, see https://gitlab.com/rcstar/openQxD-devel (TODO: merge into master, see pull request)

Obtaining and compiling QUDA

First, clone QUDA into a subdirectory src/quda:

git clone -b feature/openqxd https://github.com/chaoos/quda.git src/quda

Several compile-time flags must be set to enable the openQxD interface:

QUDA_INTERFACE_OPENQCD=ON   # enables openQxD interface
QUDA_INTERFACE_MILC=OFF
QUDA_INTERFACE_QDP=OFF
QUDA_INTERFACE_BQCD=OFF
QUDA_INTERFACE_CPS=OFF
QUDA_INTERFACE_QDPJIT=OFF
QUDA_INTERFACE_TIFR=OFF
QUDA_DOWNLOAD_USQCD=OFF
QUDA_QIO=OFF
QUDA_QMP=OFF
QUDA_MPI=ON                 # enable MPI

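We enable double, single, and half precision (bitmask 14; quarter precision would additionally require the lowest bit, i.e. 15) and all reconstruction types (bitmask 7):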

QUDA_PRECISION=14
QUDA_RECONSTRUCT=7
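
Both values are bitmasks. As a minimal sketch, assuming the bit layout given in QUDA's CMake option descriptions (precision: 8 = double, 4 = single, 2 = half, 1 = quarter; reconstruct: 4 = 18, 2 = 13/12, 1 = 9/8), the helper below decodes a value into the options it enables. The function name decode_mask is made up for this illustration and is not part of QUDA.

```shell
# Decode a bitmask into the names of the enabled options.
# Usage: decode_mask <value> <name of highest bit> ... <name of lowest bit>
# (hypothetical helper, not part of QUDA)
decode_mask() {
  value=$1; shift
  bit=$(( 1 << ($# - 1) ))   # weight of the first (highest) name
  out=""
  for name in "$@"; do
    if [ $(( value & bit )) -ne 0 ]; then
      out="${out:+$out }$name"
    fi
    bit=$(( bit >> 1 ))
  done
  echo "$out"
}

decode_mask 14 double single half quarter   # prints: double single half
decode_mask 7 18 13/12 9/8                  # prints: 18 13/12 9/8
```

Under this layout, QUDA_PRECISION=15 would also instantiate quarter precision, at the cost of a longer build.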

Of the Dirac operators, we enable only the Wilson and Wilson-clover operators:

QUDA_DIRAC_DEFAULT_OFF=ON   # disables ALL Dirac operators
QUDA_DIRAC_WILSON=ON        # enables Wilson-Dirac operators
QUDA_DIRAC_CLOVER=ON        # enables Wilson-clover operators

The choice of compilers depends on the target machine:

CMAKE_CXX_COMPILER: either g++ version 11 or 12, or Clang version 14
CMAKE_C_COMPILER: usually gcc version 11 or higher
MPI_CXX_SKIP_MPICXX=ON
CMAKE_CUDA_COMPILER: nvcc version 11 or higher
CUDAToolkit_BIN_DIR: set to the CUDA binary directory (for example /usr/local/cuda/bin)
CUDAToolkit_INCLUDE_DIR: set to the CUDA include directory (for example /usr/local/cuda/include)
CMAKE_CUDA_COMPILER_LAUNCHER=ccache if ccache is available
CMAKE_CXX_COMPILER_LAUNCHER=ccache if ccache is available

Finally, the architecture and the build type:

QUDA_GPU_ARCH: target architecture, see https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/, https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#gpu-feature-list, https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#virtual-architecture-feature-list for more information
QUDA_GPU_ARCH_SUFFIX: real or virtual, see links above
CMAKE_BUILD_TYPE: STRICT, or one of DEVEL, RELEASE, DEBUG, HOSTDEBUG, SANITIZE, see https://github.com/lattice/quda/wiki/QUDA-Build-With-CMake#reducing-qudas-build-time
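
The flags listed in this section can be collected into a single configure call. The following is a sketch under stated assumptions, not the project's official build script: the function name quda_configure, the build directory, and the example architecture are illustrative choices, and the compilers are left to CMake's defaults unless passed as extra arguments.

```shell
# Configure QUDA for openQxD (hypothetical helper).
#   $1 source directory, $2 build directory, $3 GPU architecture,
#   any further arguments are passed to cmake unchanged.
# Set CMAKE=echo to print the command line instead of running cmake.
quda_configure() {
  src=$1; build=$2; arch=$3; shift 3
  "${CMAKE:-cmake}" -S "$src" -B "$build" \
    -DQUDA_INTERFACE_OPENQCD=ON \
    -DQUDA_INTERFACE_MILC=OFF \
    -DQUDA_INTERFACE_QDP=OFF \
    -DQUDA_INTERFACE_BQCD=OFF \
    -DQUDA_INTERFACE_CPS=OFF \
    -DQUDA_INTERFACE_QDPJIT=OFF \
    -DQUDA_INTERFACE_TIFR=OFF \
    -DQUDA_DOWNLOAD_USQCD=OFF \
    -DQUDA_QIO=OFF \
    -DQUDA_QMP=OFF \
    -DQUDA_MPI=ON \
    -DQUDA_PRECISION=14 \
    -DQUDA_RECONSTRUCT=7 \
    -DQUDA_DIRAC_DEFAULT_OFF=ON \
    -DQUDA_DIRAC_WILSON=ON \
    -DQUDA_DIRAC_CLOVER=ON \
    -DMPI_CXX_SKIP_MPICXX=ON \
    -DQUDA_GPU_ARCH="$arch" \
    -DCMAKE_BUILD_TYPE=STRICT \
    "$@"
}
```

A possible invocation, assuming an sm_80 device and a build directory build/quda, is quda_configure src/quda build/quda sm_80 -DCMAKE_CXX_COMPILER_LAUNCHER=ccache, followed by cmake --build build/quda.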

For the remaining compile options, we refer to Building QUDA using CMake.

Obtaining and compiling openQxD

Clone openQxD into a subdirectory src/openqxd:

git clone -b feature/quda/main-thesis-release https://gitlab.com/rcstar/openQxD-devel.git src/openqxd

Set the required environment variables before compiling (see openQ*D code: a versatile tool for QCD+QED simulations):

export GCC=gcc
export CC=mpicc
export CXX=mpicxx
export MPI_HOME="/usr/lib/x86_64-linux-gnu/openmpi/" # for example
export MPI_INCLUDE="${MPI_HOME}/include"
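
Since the MPI path above is only an example, a quick check that the header directory really contains mpi.h can save a failed build. This is a minimal sketch; the function name check_mpi_env is made up for this illustration.

```shell
# Return 0 if MPI_INCLUDE points at a directory containing mpi.h,
# 1 otherwise (hypothetical helper, not part of openQxD).
check_mpi_env() {
  if [ -z "${MPI_INCLUDE:-}" ]; then
    echo "MPI_INCLUDE is not set" >&2
    return 1
  fi
  if [ ! -f "${MPI_INCLUDE}/mpi.h" ]; then
    echo "mpi.h not found in ${MPI_INCLUDE}" >&2
    return 1
  fi
  echo "MPI headers found in ${MPI_INCLUDE}"
}
```

Calling check_mpi_env after the exports and before make reports a wrong MPI_HOME early.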

In the Makefile of the utility you plan to build, enable QUDA offloading with the following setting (see openqxd:extras/main/lowrnk/Makefile for an example):

USE_QUDA ?= yes

or pass it on the command line when compiling:

make USE_QUDA=yes

This builds the required modules in openQxD and links them against QUDA. Check that linking was done correctly with:

$ env -i ldd <binary>
[...]
libquda.so => /path/to/libquda.so (0x00007f73af092000)
[...]
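
The same check can be scripted. The sketch below reads ldd output on standard input and classifies the libquda entry; the function name quda_link_status is made up for this illustration. Note that env -i runs ldd with an empty environment, so the library has to be found without relying on variables such as LD_LIBRARY_PATH.

```shell
# Classify the libquda entry in ldd output read from standard input.
# Prints "ok", "unresolved" (listed but not found) or "missing" (not listed).
# (hypothetical helper, not part of openQxD or QUDA)
quda_link_status() {
  awk '
    /libquda\.so/ { found = 1; if ($0 ~ /not found/) unresolved = 1 }
    END {
      if (!found)          print "missing"
      else if (unresolved) print "unresolved"
      else                 print "ok"
    }'
}
```

It would be used as env -i ldd <binary> | quda_link_status.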

In openqxd:include/global.h, set the number of ranks equal to the number of GPUs.
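
To double-check the rank count, the value can be read back from the header. This sketch assumes that the process grid is defined through the macros NPROC0 to NPROC3, as in openQCD's global.h, and that the number of ranks is their product; the function name nranks_from_global_h is made up for this illustration.

```shell
# Print the number of MPI ranks implied by a global.h file ($1),
# assuming "#define NPROC0 <n>" ... "#define NPROC3 <n>" lines.
# Fails if not all four macros are found.
nranks_from_global_h() {
  awk '
    BEGIN { n = 1 }
    $1 == "#define" && $2 ~ /^NPROC[0-3]$/ { n *= $3; seen++ }
    END { if (seen == 4) print n; else exit 1 }' "$1"
}
```

With the clone location used above, nranks_from_global_h src/openqxd/include/global.h prints the value to pass to mpirun -np.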

Running openQxD with QUDA

A binary compiled with QUDA support is launched in the same way as one without it.

On a local machine:

mpirun -np <N> <binary> ... # on regular linux

Or on a cluster like CSCS:

srun ...
sbatch ...

Misc

Profiler

Make sure nsys is installed (for example, yoshi.ethz.ch has it). Then run, for example:

mpirun -np 2 nsys profile -o profiler%q{OMPI_COMM_WORLD_RANK} ./check3 -i check.in

This creates two files, profiler0.nsys-rep and profiler1.nsys-rep. Download them to your local machine and install Nsight Systems 2023 to view them. Note that you need to register with NVIDIA to download the program. To obtain named regions, run:

# obtain named regions
export NSYS_NVTX_PROFILER_REGISTER_ONLY=0
mpirun -np 2 nsys profile --sample=none --trace=cuda,nvtx,mpi -o profiler_nvtx%q{OMPI_COMM_WORLD_RANK} ./check3 -i check.in
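
For a quick look without downloading the reports, nsys can also print text summaries on the machine where the profile was taken. A minimal sketch, assuming the report files sit in the current directory; the function name summarize_reports is made up for this illustration.

```shell
# Run "nsys stats" on every report whose name starts with the prefix $1.
# Set NSYS=echo for a dry run that only prints the commands.
# (hypothetical helper, not part of Nsight Systems)
summarize_reports() {
  for report in "$1"*.nsys-rep; do
    if [ ! -e "$report" ]; then
      echo "no reports match $1*.nsys-rep" >&2
      return 1
    fi
    "${NSYS:-nsys}" stats "$report"
  done
}
```

For the second run above, summarize_reports profiler_nvtx would process one report per rank.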