CPS with QUDA - lattice/quda GitHub Wiki
These instructions are a quick-start guide to getting the Columbia Physics System (CPS) running on GPUs using the QUDA library.
They are based on the master branch of CPS, which is accessible from here.
Obtaining and compiling QUDA
You can obtain QUDA using
git clone --branch develop https://github.com/lattice/quda.git
QUDA uses cmake to set compilation options. To build QUDA with only domain wall fermions (i.e. plain domain wall and Möbius), run
mkdir build
cd build
cmake ../quda \
-D CMAKE_BUILD_TYPE=RELEASE \
-D QUDA_GPU_ARCH=sm_70 \
-D QUDA_DIRAC_DEFAULT_OFF=ON \
-D QUDA_DIRAC_DOMAIN_WALL=ON \
-D QUDA_INTERFACE_CPS=ON \
-D QUDA_QIO=ON \
-D QUDA_QMP=ON \
-D QUDA_DOWNLOAD_USQCD=ON
Above, we implicitly assume that the CUDA and MPI compilers are present in $PATH. Here we set the GPU architecture to sm_70, which corresponds to Volta. Choices include:
sm_35 for Kepler (Tesla K20 / K40 / K80)
sm_60 for Pascal (Tesla P100, Quadro GP100)
sm_70 for Volta (Tesla V100, Quadro GV100)
sm_80 for Ampere (NVIDIA A100)
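If the compute capability of the target GPU is known, the matching QUDA_GPU_ARCH value follows mechanically from it. The following is a minimal sketch of that mapping; the helper name cc_to_sm is hypothetical and not part of QUDA.

```shell
#!/bin/sh
# Hypothetical helper (not part of QUDA): turn a CUDA compute capability
# such as "7.0" into the matching QUDA_GPU_ARCH value "sm_70".
cc_to_sm() {
    # Drop the dot from the capability string and prefix the result with "sm_".
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

cc_to_sm 7.0   # Volta
cc_to_sm 8.0   # Ampere
```

On recent NVIDIA drivers the capability can usually be queried with nvidia-smi --query-gpu=compute_cap --format=csv,noheader; the availability of that query field on your driver version is an assumption here, so check it before scripting around it.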
Here we disable the parts of QUDA that are unnecessary when it is used with CPS (QUDA_DIRAC_DEFAULT_OFF=ON, with only QUDA_DIRAC_DOMAIN_WALL=ON re-enabled), assuming one wants to run CPS with only domain wall/Möbius fermions, in order to reduce compilation time. The final three arguments concern the installation of the USQCD companion libraries QMP and QIO. QUDA can automate their download and installation, and that is what we have enabled here. You can optionally specify an install directory with -D CMAKE_INSTALL_PREFIX=[path], though for CPS bindings it is sufficient to work from the build directory.
To build QUDA, use a parallel build, as QUDA can take a long time to compile:
make -j N
where N is the number of cores / threads that the compilation node has. We typically recommend setting this to the number of hardware threads (e.g., hyperthreads) in the system. If you have set an install path when running cmake (-D CMAKE_INSTALL_PREFIX=[path]), then to complete the installation run
make install
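As a small illustration of choosing N, the sketch below derives the job count from the machine instead of hard-coding it. The variable name njobs is an arbitrary choice, and the make invocation is left commented out so the snippet can be tried anywhere.

```shell
#!/bin/sh
# Pick the parallel job count N for "make -j N".
# nproc (GNU coreutils) reports the available logical processors;
# getconf is a fallback, and 1 is the last resort.
njobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
echo "building with ${njobs} parallel jobs"

# Uncomment inside the QUDA build directory:
# make -j "${njobs}"
```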
Finally, note that when building with OpenMPI 4.x and above, QMP will fail to build because it uses the deprecated MPI_Type_struct, unless OpenMPI has been configured with the MPI-1 compatibility option --enable-mpi1-compatibility. The solution is either to enable this option in the OpenMPI build or to edit the QMP source code, changing the single occurrence of MPI_Type_struct to MPI_Type_create_struct in usqcd/src/QMP/lib/mpi/QMP_mem_mpi.c. Fixing this issue in QMP is tracked here.
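The source edit described above can be scripted. This is a minimal sketch, assuming the file sits at the relative path quoted in the text under your QUDA build directory; the helper name patch_qmp_type_struct is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace the deprecated MPI_Type_struct with
# MPI_Type_create_struct in the file given as the first argument.
patch_qmp_type_struct() {
    src=$1
    tmp="${src}.tmp.$$"
    # The new name does not contain the old one as a substring,
    # so running the helper a second time changes nothing.
    sed 's/MPI_Type_struct/MPI_Type_create_struct/g' "$src" > "$tmp" && mv "$tmp" "$src"
}

# Example, run from the QUDA build directory (base directory is an assumption):
# patch_qmp_type_struct usqcd/src/QMP/lib/mpi/QMP_mem_mpi.c
```

The file only exists after QUDA has downloaded QMP, so the patch has to be applied after the first (failing) build attempt and before re-running make.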
Getting CPS dependencies ready
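CPS requires GMP, GSL and FFTW, as well as QIO and QMP. These are all commonly used libraries; here we provide an example of how to obtain and compile GMP, GSL and FFTW. The commands are written as Dockerfile instructions (RUN, ENV); outside a container build, drop those keywords and export LD_LIBRARY_PATH instead. Note that the directories in the following have to be adjusted as appropriate.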
# https://gmplib.org/download/gmp/gmp-6.2.0.tar.xz
RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp https://gmplib.org/download/gmp/gmp-6.2.0.tar.xz && \
mkdir -p /var/tmp && tar -x -f /var/tmp/gmp-6.2.0.tar.xz -C /var/tmp && \
cd /var/tmp/gmp-6.2.0 && ./configure --prefix=/usr/local/gmp && \
make -j$(nproc) && \
make -j$(nproc) install && \
rm -rf /var/tmp/gmp-6.2.0 /var/tmp/gmp-6.2.0.tar.xz
# ftp://ftp.gnu.org/gnu/gsl/gsl-2.6.tar.gz
RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp ftp://ftp.gnu.org/gnu/gsl/gsl-2.6.tar.gz && \
mkdir -p /var/tmp && tar -x -f /var/tmp/gsl-2.6.tar.gz -C /var/tmp -z && \
cd /var/tmp/gsl-2.6 && ./configure --prefix=/usr/local/gsl && \
make -j$(nproc) && \
make -j$(nproc) install && \
rm -rf /var/tmp/gsl-2.6 /var/tmp/gsl-2.6.tar.gz
# FFTW version 3.3.8 (double precision, provides libfftw3)
RUN apt-get update -y && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
file \
make \
wget && \
rm -rf /var/lib/apt/lists/*
RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp ftp://ftp.fftw.org/pub/fftw/fftw-3.3.8.tar.gz && \
mkdir -p /var/tmp && tar -x -f /var/tmp/fftw-3.3.8.tar.gz -C /var/tmp -z && \
cd /var/tmp/fftw-3.3.8 && ./configure --prefix=/usr/local/fftw --enable-openmp --enable-shared --enable-sse2 --enable-threads && \
make -j$(nproc) && \
make -j$(nproc) install && \
rm -rf /var/tmp/fftw-3.3.8 /var/tmp/fftw-3.3.8.tar.gz
ENV LD_LIBRARY_PATH=/usr/local/fftw/lib:$LD_LIBRARY_PATH
# FFTW version 3.3.8 (single precision via --enable-float, provides libfftw3f)
RUN apt-get update -y && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
file \
make \
wget && \
rm -rf /var/lib/apt/lists/*
RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp ftp://ftp.fftw.org/pub/fftw/fftw-3.3.8.tar.gz && \
mkdir -p /var/tmp && tar -x -f /var/tmp/fftw-3.3.8.tar.gz -C /var/tmp -z && \
cd /var/tmp/fftw-3.3.8 && ./configure --prefix=/usr/local/fftw --enable-float && \
make -j$(nproc) && \
make -j$(nproc) install && \
rm -rf /var/tmp/fftw-3.3.8 /var/tmp/fftw-3.3.8.tar.gz
ENV LD_LIBRARY_PATH=/usr/local/fftw/lib:$LD_LIBRARY_PATH
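Before moving on to CPS, it can help to confirm that each library actually landed in the prefix that will later be passed to the CPS configure script. The following is a minimal sketch of such a check; the helper name check_header is hypothetical, and the prefixes are the --prefix values used above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a header exists under <prefix>/include.
check_header() {
    prefix=$1
    header=$2
    if [ -f "${prefix}/include/${header}" ]; then
        echo "ok: ${prefix}/include/${header}"
    else
        echo "missing: ${prefix}/include/${header}" >&2
        return 1
    fi
}

# Prefixes match the --prefix values used above; adjust to your system.
# "|| true" keeps the script going so every missing header is reported.
check_header /usr/local/gmp  gmp.h          || true
check_header /usr/local/gsl  gsl/gsl_math.h || true
check_header /usr/local/fftw fftw3.h        || true
```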
Obtaining and compiling CPS
For use with QUDA we recommend the current master branch of CPS, which gives the maximum benefit from QUDA acceleration.
git clone --branch master https://github.com/RBC-UKQCD/CPS.git
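Then configure. Note that the directories listed here should be adjusted as appropriate, and that we are using the QMP and QIO builds provided by QUDA. The --build, --host and --target triples in this example are for a powerpc64le system and should likewise be adjusted for other architectures.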
mkdir build
cd build
CC=mpicc CXX=mpicxx CXXFLAG=-qoffload \
CXXFLAGS="-fopenmp -I/usr/local/fftw/include -I/usr/local/gsl/include" \
DFLAGS="-DQUDA_NEW_INTERFACE -DUSE_QUDA_SPLIT_GRID" \
FC=mpif90 \
LDFLAGS="-fopenmp -lz -L/usr/local/gsl/lib -L/usr/local/fftw/lib -lfftw3f -lfftw3" \
../cps/cps_pp/configure \
--prefix=/usr/local/cps \
--build=powerpc64le-none-linux-gnu \
--enable-c11 \
--enable-c11-rng \
--enable-cuda=/usr/local/cuda \
--enable-gmp=/usr/local/gmp \
--enable-openmp \
--enable-qio=/usr/local/quda/usqcd \
--enable-qmp=/usr/local/quda/usqcd \
--enable-quda=/usr/local/quda \
--host=powerpc64le-none-linux-gnu \
--target=powerpc64le-none-linux-gnu
and then run make.
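Since several configure options in the command above depend on where CUDA, GMP and QUDA live, one option is to collect those locations in variables so they are changed in one place. The following is a minimal sketch; the variable names and the helper cps_dir_flags are hypothetical, and only options that already appear in the configure command above are used.

```shell
#!/bin/sh
# Installation locations; the defaults repeat the paths used in the command above.
: "${CUDA_DIR:=/usr/local/cuda}"
: "${GMP_DIR:=/usr/local/gmp}"
: "${QUDA_DIR:=/usr/local/quda}"

# Hypothetical helper: print the directory-dependent configure options, one per line.
cps_dir_flags() {
    printf '%s\n' \
        "--enable-cuda=${CUDA_DIR}" \
        "--enable-gmp=${GMP_DIR}" \
        "--enable-qio=${QUDA_DIR}/usqcd" \
        "--enable-qmp=${QUDA_DIR}/usqcd" \
        "--enable-quda=${QUDA_DIR}"
}

cps_dir_flags
```

The output can then be spliced into the configure call as $(cps_dir_flags), with the remaining options unchanged. This relies on the paths containing no spaces.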
Running CPS with QUDA
TODO