The Multi Splitting Preconditioned Conjugate Gradient (MSPCG), an application of the additive Schwarz Method - lattice/quda GitHub Wiki

The additive Schwarz method is a method where the full problem domain is splitted into several smaller ones and boundary conditions are applied at these sub-domains. It is found [arXiv:1811.08488] that this method, when used as a preconditioner and with the domains being the processors/GPUs and boundary condition being Dirichlet, speeds up the conjugate gradient (CG) algorithm when the inter-processor/inter-GPU communication bandwidth lags behind the local processor/GPU throughput.

The core of MSPCG is the implementation of The the local even-odd preconditioned MdagM operator for (contant, real) MDWF, with zero Dirichlet boundary condition applied at the boundary of each rank. There are 4 hopping terms in the operator so a faithful implementation needs to capture the snake terms that hop out of the boundary and hop back within the allowed 4 steps.

Compiling with MSPCG support

A minimal cmake command for compiling with MSPCG support is given by:

cmake ../quda \
  -DQUDA_GPU_ARCH=sm_xx \
  -DQUDA_DIRAC_DEFAULT_OFF=ON \
  -DQUDA_DIRAC_DOMAIN_WALL=ON \
  -DQUDA_PRECISION=6 \
  -DQUDA_RECONSTRUCT=4 \
  -DQUDA_ENABLE_MMA=ON \
  -DQUDA_MDW_FUSED_LS_LIST=8

This performs a single GPU compilation of only DWF fermions, instantiating only single and half precision (as is necessary to test the preconditioner), no reconstructs, and only Ls = 8. These routines use tensor cores and as such are only supported on NVIDIA GPUs.

Tests

Note that these tests have been written to work with the cmake command given above. In practice we recommend running with recon-12 for the outer reconstruct and recon-8 for the sloppy/preconditioned bits.

Standalone operator test

The operator can be tested with

tests/dslash_test \
  --matpc even-even \
  --dim 12 12 12 16 \
  --Lsdim 8 \
  --gridsize 1 1 1 1 \
  --dslash-type mobius \
  --b5 1.5 --c5 0.5 \
  --mass 0.01 \
  --prec half \
  --recon 18 \
  --test MatPCDagMatPCLocal

Standalone inverter test

The MSPCG inverter can be tested with

tests/invert_test \
  --matpc even-even \
  --dim 12 12 12 16 \
  --Lsdim 8 \
  --gridsize 1 1 1 1 \
  --dslash-type mobius \
  --b5 2.5 --c5 1.5 \
  --inv-type pcg \
  --precon-type cg \
  --precon-schwarz-type additive \
  --tol 1e-6 \
  --tol-precondition 1e-6 \
  --niter 860 \
  --maxiter-precondition 6 \
  --mass 0.01 \
  --prec single \
  --prec-sloppy half \
  --prec-precondition half 

Notes on the interface

As a reference for the interface, the solve is invoked when we solve using either PCG or GCR as the outer solver (with PCG being superior), using CG or MR as the inner solver (with CG being superior), as well as explicitly requesting additive schwarz:

inv_param.inv_type = QUDA_PCG_INVERTER;
inv_param.precon_type = QUDA_CG_INVERTER;
inv_param.schwarz_type = QUDA_ADDITIVE_SCHWARZ;

Currently only half or quarter precisions and the symmetric even-odd preconditioning is supported

gauge_param.cuda_prec_precondition = QUDA_HALF_PRECISION;
inv_param.cuda_prec_precondition = QUDA_HALF_PRECISION;
inv_param.matpc_type = QUDA_MATPC_EVEN_EVEN or QUDA_MATPC_ODD_ODD;

One needs to use a normal operator solve in order to use preconditioned CG:

inv_param.solution_type = QUDA_MATPCDAG_MATPC_SOLUTION;
inv_param.solve_type = QUDA_NORMOP_PC_SOLVE;