CPS with QUDA - lattice/quda GitHub Wiki

These instructions are intended to be a quick start guide to getting Columbia Physics System (CPS) running with GPUs using the QUDA library.

These instructions are based on the master branch of the CPS that is accessible from here.

Obtaining and compiling QUDA

You can obtain QUDA using

git clone --branch develop https://github.com/lattice/quda.git

QUDA uses CMake to set compilation options. To build QUDA with only domain wall fermions enabled (i.e. plain domain wall and Möbius), run:

mkdir build
cd build
cmake ../quda \
 -D CMAKE_BUILD_TYPE=RELEASE \
 -D QUDA_GPU_ARCH=sm_70 \
 -D QUDA_DIRAC_DEFAULT_OFF=ON \
 -D QUDA_DIRAC_DOMAIN_WALL=ON \
 -D QUDA_INTERFACE_CPS=ON \
 -D QUDA_QIO=ON \
 -D QUDA_QMP=ON \
 -D QUDA_DOWNLOAD_USQCD=ON

Above, we implicitly assume that the CUDA and MPI compilers are present in $PATH. The GPU architecture is set to sm_70, which corresponds to Volta. Choices include:

sm_35 for Kepler (Tesla K20 / K40 / K80)
sm_60 for Pascal (Tesla P100, Quadro GP100)
sm_70 for Volta (Tesla V100, Quadro GV100)
sm_80 for Ampere (NVIDIA A100)
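If you are unsure which value applies to your GPU, the architecture string can be derived from the CUDA compute capability. The following is a minimal sketch, not part of QUDA: the helper name cc_to_arch is hypothetical, and the nvidia-smi compute_cap query is assumed to be available, which is only the case with reasonably recent drivers.

```shell
#!/bin/sh
# Convert a CUDA compute capability such as "7.0" into a QUDA_GPU_ARCH value.
# cc_to_arch is a hypothetical helper name used only for illustration.
cc_to_arch() {
    # Strip the dot: 7.0 -> 70, 8.0 -> 80
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Assumption: nvidia-smi is installed and supports the compute_cap query.
# If it is absent or the query fails, nothing is printed.
if command -v nvidia-smi >/dev/null 2>&1; then
    cc=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null | head -n 1)
    if [ -n "$cc" ]; then
        echo "Suggested: -D QUDA_GPU_ARCH=$(cc_to_arch "$cc")"
    fi
fi
```

On a node without a GPU (for example a login node used only for compilation), pick the value from the list above that matches the compute nodes instead.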

The QUDA_DIRAC_DEFAULT_OFF and QUDA_DIRAC_DOMAIN_WALL options disable the parts of QUDA that are unnecessary when used with CPS, assuming one wants to run CPS with only domain wall/Möbius fermions; this reduces compilation time. The final three arguments concern the installation of the USQCD companion libraries QMP and QIO. QUDA can automate their download and installation, and that is what we have enabled here. You can optionally specify an install directory with -D CMAKE_INSTALL_PREFIX=[path], though for CPS bindings it is sufficient to work from the build directory.
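Because the cmake invocation assumes that the CUDA and MPI compilers are on $PATH, it can save time to check this before configuring. A minimal sketch follows; the function name check_tools is hypothetical, and the tool names in the usage comment are typical choices rather than names mandated by QUDA.

```shell
#!/bin/sh
# Report which of the named tools are missing from PATH.
# Returns 0 if all are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (tool names are assumptions; adjust to your MPI wrappers):
#   check_tools cmake nvcc mpicc mpicxx || echo "fix PATH before running cmake"
```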

QUDA can take a long time to build, so use a parallel build:

make -j N

where N is the number of cores / threads that the compilation node has. We typically recommend setting this to the number of hardware threads (e.g., hyperthreads) in the system. If you have set an install path when running cmake (-DCMAKE_INSTALL_PREFIX=[path]), then to complete the installation run

make install
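One way to choose N automatically is to query the number of online hardware threads. This is a sketch only: build_jobs is a hypothetical helper, and it assumes either nproc (GNU coreutils) or getconf is available, falling back to a serial build otherwise.

```shell
#!/bin/sh
# Print the number of online hardware threads, falling back to 1.
build_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}

# Usage, from the QUDA build directory:
#   make -j "$(build_jobs)"
```

On shared login nodes you may prefer a smaller value than the one reported.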

Finally, note that when building with OpenMPI 4.x and above, QMP will fail to build because it uses the deprecated MPI_Type_struct, unless OpenMPI has been configured with the MPI-1 compatibility option --enable-mpi1-compatibility. The solution is either to enable this option in the OpenMPI build or to edit the QMP source code, changing the single occurrence of MPI_Type_struct to MPI_Type_create_struct in usqcd/src/QMP/lib/mpi/QMP_mem_mpi.c. Fixing this issue in QMP is tracked here.
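The source edit described above can be scripted. The sketch below is one possible way to do it with sed; the function name patch_qmp_type_struct is hypothetical, and it assumes the file has already been downloaded into the build tree by the QUDA build.

```shell
#!/bin/sh
# Replace the deprecated MPI_Type_struct with MPI_Type_create_struct in one file.
# The new name does not contain the old one as a substring, so running the
# function a second time leaves the file unchanged.
patch_qmp_type_struct() {
    src=$1
    tmp="$src.tmp.$$"
    sed 's/MPI_Type_struct/MPI_Type_create_struct/g' "$src" > "$tmp" && mv "$tmp" "$src"
}

# Usage, from the QUDA build directory (path as given in the text above):
#   patch_qmp_type_struct usqcd/src/QMP/lib/mpi/QMP_mem_mpi.c
```

The two MPI functions take the same argument list, so no further changes to the call are needed.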

Getting CPS dependencies ready

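CPS requires GMP, GSL and FFTW, as well as QIO and QMP. These are all commonly used libraries, and here we provide an example of how to obtain and compile GMP, GSL and FFTW. The example is written as Dockerfile instructions (RUN, ENV); outside a container, run the same commands in a shell without the RUN prefix and export the environment variable instead of using ENV. Note that the directories have to be adjusted as appropriate.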

# https://gmplib.org/download/gmp/gmp-6.2.0.tar.xz
RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp https://gmplib.org/download/gmp/gmp-6.2.0.tar.xz && \
    mkdir -p /var/tmp && tar -x -f /var/tmp/gmp-6.2.0.tar.xz -C /var/tmp && \
    cd /var/tmp/gmp-6.2.0 &&   ./configure --prefix=/usr/local/gmp && \
    make -j$(nproc) && \
    make -j$(nproc) install && \
    rm -rf /var/tmp/gmp-6.2.0 /var/tmp/gmp-6.2.0.tar.xz

# ftp://ftp.gnu.org/gnu/gsl/gsl-2.6.tar.gz
RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp ftp://ftp.gnu.org/gnu/gsl/gsl-2.6.tar.gz && \
    mkdir -p /var/tmp && tar -x -f /var/tmp/gsl-2.6.tar.gz -C /var/tmp -z && \
    cd /var/tmp/gsl-2.6 &&   ./configure --prefix=/usr/local/gsl && \
    make -j$(nproc) && \
    make -j$(nproc) install && \
    rm -rf /var/tmp/gsl-2.6 /var/tmp/gsl-2.6.tar.gz

# FFTW version 3.3.8, double precision (provides libfftw3)
RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        file \
        make \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp ftp://ftp.fftw.org/pub/fftw/fftw-3.3.8.tar.gz && \
    mkdir -p /var/tmp && tar -x -f /var/tmp/fftw-3.3.8.tar.gz -C /var/tmp -z && \
    cd /var/tmp/fftw-3.3.8 &&   ./configure --prefix=/usr/local/fftw --enable-openmp --enable-shared --enable-sse2 --enable-threads && \
    make -j$(nproc) && \
    make -j$(nproc) install && \
    rm -rf /var/tmp/fftw-3.3.8 /var/tmp/fftw-3.3.8.tar.gz
ENV LD_LIBRARY_PATH=/usr/local/fftw/lib:$LD_LIBRARY_PATH

# FFTW version 3.3.8, single precision via --enable-float (provides libfftw3f)
RUN apt-get update -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        file \
        make \
        wget && \
    rm -rf /var/lib/apt/lists/*
RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp ftp://ftp.fftw.org/pub/fftw/fftw-3.3.8.tar.gz && \
    mkdir -p /var/tmp && tar -x -f /var/tmp/fftw-3.3.8.tar.gz -C /var/tmp -z && \
    cd /var/tmp/fftw-3.3.8 &&   ./configure --prefix=/usr/local/fftw --enable-float && \
    make -j$(nproc) && \
    make -j$(nproc) install && \
    rm -rf /var/tmp/fftw-3.3.8 /var/tmp/fftw-3.3.8.tar.gz
ENV LD_LIBRARY_PATH=/usr/local/fftw/lib:$LD_LIBRARY_PATH
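Before moving on to CPS, it may be worth confirming that each library ended up under its install prefix. The following sketch uses a hypothetical helper, have_lib, and assumes the libraries are installed into a lib subdirectory of the prefix (some systems use lib64 instead).

```shell
#!/bin/sh
# Return 0 if <prefix>/lib contains a file named lib<name>.<anything>.
# The dot after the name keeps "fftw3" from matching "libfftw3f".
have_lib() {
    prefix=$1
    name=$2
    for f in "$prefix"/lib/lib"$name".*; do
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Example check for the prefixes used above:
#   for pair in gmp:gmp gsl:gsl fftw:fftw3 fftw:fftw3f; do
#       have_lib "/usr/local/${pair%%:*}" "${pair##*:}" || echo "not found: ${pair##*:}"
#   done
```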

Obtaining and compiling CPS

For use with QUDA we recommend the present master branch of CPS. This enables the maximum benefit of QUDA acceleration.

git clone --branch master https://github.com/RBC-UKQCD/CPS.git cps

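Then configure CPS. The directories listed here should be adjusted as appropriate, and the --build, --host and --target triplets shown are for a POWER (powerpc64le) system, so change or omit them on other architectures. Also note that we are using the QMP and QIO from QUDA.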

mkdir build
cd build
CC=mpicc CXX=mpicxx CXXFLAG=-qoffload \
CXXFLAGS="-fopenmp -I/usr/local/fftw/include -I/usr/local/gsl/include" \
DFLAGS="-DQUDA_NEW_INTERFACE -DUSE_QUDA_SPLIT_GRID" \
FC=mpif90 \
LDFLAGS="-fopenmp -lz -L/usr/local/gsl/lib -L/usr/local/fftw/lib -lfftw3f -lfftw3" \
../cps/cps_pp/configure \
 --prefix=/usr/local/cps \
 --build=powerpc64le-none-linux-gnu \
 --enable-c11 \
 --enable-c11-rng \
 --enable-cuda=/usr/local/cuda \
 --enable-gmp=/usr/local/gmp \
 --enable-openmp \
 --enable-qio=/usr/local/quda/usqcd \
 --enable-qmp=/usr/local/quda/usqcd \
 --enable-quda=/usr/local/quda \
 --host=powerpc64le-none-linux-gnu \
 --target=powerpc64le-none-linux-gnu

and then make.
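Since several configure options depend on where QUDA and GMP live, it can help to generate them from a single place. The sketch below is illustrative only: cps_path_flags is a hypothetical helper, and it assumes, as in the command above, that QMP and QIO sit in the usqcd subdirectory of the QUDA location.

```shell
#!/bin/sh
# Print the path-dependent configure options, one per line, for the given
# QUDA and GMP locations. QMP and QIO are taken from QUDA's usqcd directory.
cps_path_flags() {
    quda_dir=$1
    gmp_dir=$2
    printf '%s\n' \
        "--enable-quda=$quda_dir" \
        "--enable-qmp=$quda_dir/usqcd" \
        "--enable-qio=$quda_dir/usqcd" \
        "--enable-gmp=$gmp_dir"
}

# Usage: the unquoted substitution relies on word splitting, so the paths
# must not contain spaces.
#   ../cps/cps_pp/configure $(cps_path_flags /usr/local/quda /usr/local/gmp) ...
```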

Running CPS with QUDA

TODO