HISQ MG for Measurements - lattice/quda GitHub Wiki

This page is a WIP.

At the time of writing, HISQ MG can be used most effectively through MILC in ks_spectrum_hisq by using the following branches of MILC and QUDA:

QUDA: develop
MILC: develop

As a top-level remark, HISQ MG tends to break down (i.e., fail to converge) for beta < 6.5; finer lattices work better. Depending on tuning, HISQ MG tends to break even with double-half CG around the strange quark mass, with emphasis on "tends to."

Build process

Building QUDA with HISQ MG

Building QUDA with HISQ MG support requires only a small modification from the standard build process. The current process is as follows:

QUDA_SRC=$DESIRED_QUDA_SOURCE_LOCATION
QUDA_BUILD=$DESIRED_QUDA_BUILD_LOCATION
USQCD_FLAGS="-DQUDA_DOWNLOAD_USQCD=ON"

pushd .
if [ ! -d "$QUDA_SRC" ]
then
  git clone --branch develop https://github.com/lattice/quda "$QUDA_SRC" # clone QUDA to the desired directory
else
  git -C "$QUDA_SRC" pull # update the existing clone without changing directory
fi
mkdir -p "$QUDA_BUILD" # no-op if the directory already exists
cd "$QUDA_BUILD"
cmake -DCMAKE_BUILD_TYPE=RELEASE -DQUDA_DIRAC_DEFAULT_OFF=ON -DQUDA_DIRAC_STAGGERED=ON -DQUDA_GPU_ARCH=sm_80 \
  $USQCD_FLAGS -DQUDA_QIO=ON -DQUDA_QMP=ON \
  -DQUDA_MULTIGRID=ON -DQUDA_MULTIGRID_NVEC_LIST="24,64,96" -DQUDA_MULTIGRID_MRHS_LIST="8,16,32" $QUDA_SRC
nice make -j
popd

Modify $QUDA_SRC and $QUDA_BUILD as appropriate, as well as sm_80 depending on your target architecture. A few extra flags of note:

-DQUDA_MULTIGRID=ON is what triggers building MG support. This increases the compile time.
This build assumes you do not have a local build of QMP and QIO. If you do have an existing compile, modify $USQCD_FLAGS to -DQUDA_QIOHOME=$PATH_TO_QIO_INSTALL -DQUDA_LIMEHOME=$PATH_TO_QIO_INSTALL -DQUDA_QMPHOME=$PATH_TO_QMP_INSTALL, where $PATH_TO_QIO_INSTALL and $PATH_TO_QMP_INSTALL are set appropriately.
-DQUDA_MULTIGRID_NVEC_LIST="24,64,96" instantiates support for Nc = 24, 64, and 96. These values have typically been found to be ideal (if not required to properly cover the near-null space) for HISQ MG.
-DQUDA_MULTIGRID_MRHS_LIST="8,16,32" instantiates optimized support for batched multigrid solves. At this time, this is only relevant for multigrid setup, but it will also be relevant for the multi-RHS MG support coming in https://github.com/lattice/quda/pull/1565 .
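
The choice between letting QUDA download QMP and QIO and reusing existing installs can be sketched as a small shell helper. This is a minimal sketch, not part of QUDA: the function name usqcd_flags and the example paths are hypothetical, and only the CMake flags themselves come from the description above.

```shell
# Hypothetical helper: print the USQCD-related CMake flags.
# With no arguments, QUDA downloads and builds QMP and QIO itself.
# With two arguments (QIO install path, QMP install path), existing
# installs are used instead.
usqcd_flags() {
  if [ "$#" -eq 2 ]; then
    echo "-DQUDA_QIOHOME=$1 -DQUDA_LIMEHOME=$1 -DQUDA_QMPHOME=$2"
  else
    echo "-DQUDA_DOWNLOAD_USQCD=ON"
  fi
}

# Example paths are placeholders; substitute your own install locations.
USQCD_FLAGS=$(usqcd_flags /opt/usqcd/qio /opt/usqcd/qmp)
echo "$USQCD_FLAGS"
```

The resulting USQCD_FLAGS is meant to be expanded unquoted in the cmake command, as in the build script above, so that it splits into separate arguments.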

Note: before June 2, 2022, the flags -DQUDA_GAUGE_ALG=ON -DQUDA_GAUGE_TOOLS=ON were recommended for building QUDA for MILC HISQ MG support. The flags -DQUDA_FORCE_HISQ=ON -DQUDA_FORCE_GAUGE=ON were additionally necessary for building QUDA with RHMC support. This changed as of commit 1285.

Building MILC with HISQ MG

In the ks_spectrum subdirectory of the develop branch of MILC, there is a script compile_ks_spectrum_hisq_quda.sh that can be used to simplify building MILC with MG support. The script takes in the following command line arguments:

Required: PATH_TO_CUDA --- location of CUDA, most commonly /usr/local/cuda, but this may vary on some machines (for example, Summit); use which nvcc to help find the CUDA directory
Required: PATH_TO_QUDA --- location of the QUDA build
Required: PATH_TO_QMP --- location of a QMP install. When using QUDA to build QMP, this is ${PATH_TO_QUDA}/usqcd
Required: PATH_TO_QIO --- location of a QIO install. When using QUDA to build QIO, this is ${PATH_TO_QUDA}/usqcd
Optional: MULTIGRID --- this can be set to 1 to add -DMULTIGRID to the compile flags, enabling the MG interface in MILC

Note: At this time, you also need to add the flag PATH_TO_NVHPCSDK="" to the make command at the bottom of the compile_* script. This will be addressed in the near future.

A representative workflow that downloads and builds MILC is given below. Note that each [...] needs to be filled in as appropriate for your machine. Instead of setting the variables inline before the call to the compile script, PATH_TO_CUDA and the others can be exported in the shell beforehand.

pushd .
git clone --branch develop https://github.com/milc-qcd/milc_qcd.git
cd milc_qcd/ks_spectrum
# cp ../Makefile . # unnecessary, the compile script handles this for you.

PATH_TO_CUDA=[...] PATH_TO_QUDA=[...] PATH_TO_QMP=[...] PATH_TO_QIO=[...] ./compile_ks_spectrum_hisq_quda.sh

popd
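
To enable the MG interface, the optional MULTIGRID variable can be set in the same way as the required paths. The following is a minimal sketch under that assumption: check_required is a hypothetical helper, not part of MILC, which only verifies that the four required variables are non-empty before the compile script is invoked.

```shell
# Hypothetical helper (not part of MILC): succeed only if every variable
# the compile script requires is set to a non-empty value.
check_required() {
  missing=0
  for name in PATH_TO_CUDA PATH_TO_QUDA PATH_TO_QMP PATH_TO_QIO; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Run from milc_qcd/ks_spectrum after exporting the four paths.
# MULTIGRID=1 adds -DMULTIGRID to the compile flags.
if check_required && [ -x ./compile_ks_spectrum_hisq_quda.sh ]; then
  MULTIGRID=1 ./compile_ks_spectrum_hisq_quda.sh
fi
```

If any path is unset, the helper reports it on standard error and the build is skipped rather than started with an incomplete configuration.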

As a note, no changes are needed for a build of MILC RHMC because it does not support MG acceleration (yet).

Running MILC with MG: modifying existing measurement files

As a baseline, we'll consider a compile without -DMULTIGRID defined during the build.

We'll start from the following subset of an input script which computes four propagators: one set of three, and another set with one heavy quark inversion. As written, this script will perform a multimass (aka multishift CG) solve for the set of three, and a single CG solve for the fourth propagator.

number_of_sets 2

# Parameters for set 0

set_type multimass
max_cg_iterations 3000
max_cg_restarts 5
check yes
momentum_twist 0 0 0
precision 2

source 0

number_of_propagators 3

# propagator 0

mass 0.0012
naik_term_epsilon 0
error_for_propagator 1e-6
rel_error_for_propagator 0

fresh_ksprop
forget_ksprop

# propagator 1

mass 0.01
naik_term_epsilon 0
error_for_propagator 1e-6
rel_error_for_propagator 0

fresh_ksprop
forget_ksprop


# propagator 2

mass 0.0363
naik_term_epsilon 0
error_for_propagator 1e-6
rel_error_for_propagator 0

fresh_ksprop
forget_ksprop

# Parameters for set 1

set_type multimass
max_cg_iterations 3000
max_cg_restarts 5
check yes
momentum_twist 0 0 0
precision 2

source 0

number_of_propagators 1

# propagator 3

mass 0.432
naik_term_epsilon -0.11620
error_for_propagator 5e-17
rel_error_for_propagator 1e-7
# mixed_rsq 1e-5

fresh_ksprop
forget_ksprop

When we change to compiling with -DMULTIGRID, a new parameter needs to be added after set_type multimass in the definition of the set: the inversion type. This can take one of two values, CG or MG. We will consider the two cases separately below:

`inv_type CG`

In the case where you still want to use CG for a given set of propagators, all that you need to do is add inv_type CG and make no other changes. As a representative example, the parameters for set 0 are changed as:

[...]
set_type multimass
inv_type CG
max_cg_iterations 3000
max_cg_restarts 5
check yes
momentum_twist 0 0 0
precision 2
[...]

That's all.
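
For input files with many sets, the edit can be scripted. The following is a minimal sketch, assuming every set in the file should keep using CG; add_inv_type_cg is a hypothetical helper name, and the two-line input is just a fragment of the example above.

```shell
# Hypothetical helper: read a MILC input file on stdin and write it to
# stdout with "inv_type CG" added after every "set_type" line.
add_inv_type_cg() {
  awk '{ print } $1 == "set_type" { print "inv_type CG" }'
}

# Small self-contained demonstration on a fragment of the input above.
printf '%s\n' 'set_type multimass' 'max_cg_iterations 3000' | add_inv_type_cg
```

On a real file this would be used as add_inv_type_cg < old_input > new_input, with file names of your choosing. The helper is not idempotent: running it on a file that already contains inv_type lines adds duplicates.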

`inv_type MG`

In the case where you want to use MG for a set of propagators, you need to make two changes: first, adding inv_type MG, and second, specifying an MG parameter input file. As a representative example, the parameters for set 0 are changed as:

[...]
set_type multimass
inv_type MG
MGparams /absolute/path/to/mgparams.txt
max_cg_iterations 3000
max_cg_restarts 5
check yes
momentum_twist 0 0 0
precision 2
[...]

Multigrid parameters are specified in a separate file because the MG implementation is still evolving: new parameters worth tuning may be added over time, and it is easier to abstract them into a file that QUDA parses than to update the parsing within MILC with each new feature. This also helps with forwards/backwards portability.

Note: For the time being MGparams needs to be the same for every propagator set where MG is used. This will be addressed in the future.

In addition to modifying the definition of the set, an extra parameter, rebuild_type, needs to be added to the description of each propagator. It can take one of three values: FULL, THIN, or CG.

An example of the modification for propagator 1 (mass 0.01) is given below:

mass 0.01
naik_term_epsilon 0
error_for_propagator 1e-6
rel_error_for_propagator 0
rebuild_type FULL

fresh_ksprop
forget_ksprop

A description of the three options:

FULL: perform a full rebuild of the multigrid coarse operators before the propagator is computed. This may be necessary if the gauge links change or, more likely, if the mass changes, and it incurs a potentially expensive overhead. A FULL rebuild is preferred over a THIN rebuild (described below) when the mass changes substantially, because the mass is encoded non-linearly in the coarse operators: the coarse operator built at a lighter mass may be a poor preconditioner for an intermediate mass, so rebuilding at the intermediate mass may be a net win even with the overhead of the rebuild.
THIN: perform a "thin" rebuild of the MG setup, which only updates the fine-level operator with potentially new gauge links and the updated mass. This should be used when the MG coarse operator built at a lighter mass is still a sufficiently good preconditioner at a marginally heavier mass.
CG: use CG as the solver instead of MG. This is generally meant to be used for rapid experimentation. In practice it is better to move any solves better suited for CG to an inv_type CG set of propagators.
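
The two MG-related changes (the set header and the per-propagator rebuild_type) can be applied together with a short script. This is a minimal sketch under stated assumptions: convert_sets_to_mg is a hypothetical helper, it converts every set in the file, it assigns the same rebuild_type to every propagator, and it places rebuild_type directly after rel_error_for_propagator to mirror the example above. Choosing FULL or THIN per mass still has to be done by hand.

```shell
# Hypothetical helper: convert every set in a MILC input file (stdin) to MG.
#   $1: absolute path to the MG parameter file
#   $2: rebuild_type to assign to each propagator (FULL, THIN, or CG)
convert_sets_to_mg() {
  awk -v params="$1" -v rebuild="$2" '
    { print }
    $1 == "set_type" { print "inv_type MG"; print "MGparams " params }
    $1 == "rel_error_for_propagator" { print "rebuild_type " rebuild }
  '
}

# Small self-contained demonstration on a fragment of the input above.
printf '%s\n' 'set_type multimass' 'mass 0.01' 'rel_error_for_propagator 0' |
  convert_sets_to_mg /absolute/path/to/mgparams.txt FULL
```

Because the same MGparams path is written into every set, the output respects the current restriction that MGparams be identical across all MG sets.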

Format of Multigrid Parameters File

At the time of writing, the MG parameters file exposes a minimal set of tunable parameters for a given MG solve. An example file that includes every possible parameter is shown below. The parameters can be specified in any order. Blank lines and lines starting with # are ignored. If a parameter is unset, a reasonable default is assigned internally.

The following mgparams.txt file is tuned for a 16^4, beta = 6.5 quenched configuration, but is easily customizable. For more information on tuning HISQ MG, refer to the page on the Staggered Multigrid Solver.

mg_levels 4 # number of levels, not including deflation level.
verify_results false # true; for debugging purposes only
preconditioner_precision half # single; for sanity-check purposes only
use_mma true # false; use tensor cores for the coarse operator construction (false is there for debugging only)
optimized_kd true # false or drop
                  # "true" creates an optimized Kahler-Dirac operator which reduces memory overheads
                  # "drop" creates an optimized Kahler-Dirac operator with the long links dropped
                  # "false" uses the legacy explicit coarse Kahler-Dirac operator
allow_drop_long false # true; when the aggregation size in a direction is less than 3, skip the long links instead of erroring out
dagger_approximation false # true; whether or not to use the dagger approximation to the KD inverse

# misc
mg_verbosity 0 true # false
mg_verbosity 1 true # false
mg_verbosity 2 true # false 
mg_verbosity 3 true # false

# setup

# build level 2 from level 1
nvec 1 64 # or 96, which may be necessary for some configurations
geo_block_size 1 4 4 4 4 # coarsening of pseudo-fine Kahler-Dirac level
# geo_block_size 1 2 2 2 2 # necessary for "optimized_kd false"
setup_inv 1 cgnr # empirical; bicgstab, cgne, bicgstab-l, ca-cgnr, ca-cgne also supported
# setup_ca_basis_size 1 4 # CA basis size for CA-CGN(R/E), or L parameter for bicgstab-l, ignored otherwise
setup_tol 1 1e-5 # empirical
setup_maxiter 1 500 # empirical
mg_vec_infile 1 l16_vectors # change as appropriate
# mg_vec_outfile 1 l16_vectors # only specify one of the two
# mg_vec_partfile 1 false # specify whether or not to do partfile saving
# ^Specify neither mg_vec_infile nor mg_vec_outfile to avoid saving/loading files

# build level 3 from level 2
nvec 2 96 # empirical
geo_block_size 2 2 2 2 2
setup_inv 2 cgnr # again, empirical
# setup_ca_basis_size 2 4 # see above
setup_tol 2 1e-5
setup_maxiter 2 500
mg_vec_infile 2 l16_vectors
# mg_vec_outfile 2 l16_vectors
# mg_vec_partfile 2 false

# solvers

# level 0 only needs smoother info
smoother_type 0 ca-gcr
nu_pre 0 0 # empirical, not worth changing
nu_post 0 4 # empirical, may be worth testing 6 or 8

# level 1, pseudo-fine 
coarse_solver 1 gcr
coarse_solver_tol 1 0.25 # empirical, may be worth testing 5e-2
coarse_solver_maxiter 1 4 # empirical, may be worth testing 8
smoother_type 1 ca-gcr
nu_pre 1 0 # empirical, not worth changing
nu_post 1 2 # empirical, generally not worth changing

# level 2, intermediate
coarse_solve_type 2 direct-pc # direct, *-pc corresponds to the preconditioned op
coarse_solver 2 gcr
coarse_solver_tol 2 0.25 # empirical, generally not worth changing
coarse_solver_maxiter 2 4 # empirical, may be worth testing 8
smoother_type 2 ca-gcr
nu_pre 2 0 # empirical, not worth changing
nu_post 2 2 # empirical, generally not worth changing

# level 3, coarsest, doesn't need smoother info
coarse_solve_type 3 direct-pc # direct, *-pc corresponds to the preconditioned op
coarse_solver 3 ca-gcr # or bicgstab-l, ca-cgnr, ca-cgne
# coarse_solve_ca_basis_size 3 16 # basis size of ca-gcr, bicgstab-l, ca-cgnr, ca-cgne
coarse_solver_tol 3 0.25 # empirical, not worth changing
coarse_solver_maxiter 3 16 # empirical, not worth changing; uses an optimal codepath when it matches the ca_basis_size

# deflation
nvec 3 0 # 32 # if set to zero, do not deflate
mg_vec_infile 3 l16_vectors
# mg_vec_outfile 3 l16_vectors
# deflate_vec_partfile false # whether or not to save partfile
deflate_n_ev 34 # generally nvec+2
deflate_n_kr 64 # 2*nvec for smaller nvec, 1.5*nvec otherwise
deflate_max_restarts 50 # generally overkill
deflate_tol 1e-3 # should suffice, coarsest operator is in half precision
deflate_use_poly_acc true # polynomial acceleration is good
deflate_a_min 1e-2 # empirical but generally robust
deflate_poly_deg 20 # vary depending on matrix conditioning.
# ^200 good for 96^3, 500 good for 144^3
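
Because most lines in a file like this are comments or alternatives, it can help to list only the settings it actually specifies before a run. The following is a minimal sketch of such a local sanity check. It is not QUDA's parser: both function names are hypothetical, and the second check only encodes the remark above that one of mg_vec_infile and mg_vec_outfile should be given per level.

```shell
# Hypothetical helper, not QUDA's parser: print the settings a parameter
# file specifies, dropping blank lines, comment lines, and trailing comments.
effective_mgparams() {
  sed -e 's/#.*$//' -e 's/[[:space:]]*$//' "$1" | awk 'NF > 0'
}

# Hypothetical check: report levels that set both mg_vec_infile and
# mg_vec_outfile. Exits non-zero if any such level is found.
check_vec_files() {
  effective_mgparams "$1" | awk '
    $1 == "mg_vec_infile"  { in_set[$2] = 1 }
    $1 == "mg_vec_outfile" { out_set[$2] = 1 }
    END {
      bad = 0
      for (lvl in in_set) if (lvl in out_set) {
        print "level " lvl ": both mg_vec_infile and mg_vec_outfile set"
        bad = 1
      }
      exit bad
    }
  '
}
```

Typical usage would be effective_mgparams /absolute/path/to/mgparams.txt followed by check_vec_files on the same path.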

Notes on Running

With the above changes, ks_spectrum_hisq can be run as normal.