Validating Large Multigrid Changes - lattice/quda GitHub Wiki
This page is a WIP
This page is meant to document sequences of tests that should be used to validate large changes to multigrid routines. These are arguably overkill for smaller changes.
Generating a 16^4, beta = 7.0 configuration
We recommend using a relatively physical lattice to test multigrid. These can be generated using the built-in heatbath_test
executable. A representative command to generate the above-described configuration is:
mpirun -np 1 ./heatbath_test --dim 16 16 16 16 --save-gauge l16t16b7p0 \
--heatbath-beta 7.0 --heatbath-coldstart true --heatbath-num-steps 10 --heatbath-warmup-steps 1000
Testing Wilson-type modifications
- Build with
-DQUDA_PRECISION=14
or15
(quarter isn't fully functional for now) - This requires two separate builds:
-DQUDA_FLOAT8=ON
andOFF
. ForON
you only need to testhalf
for the MG build/solve. - This test requires compiling with Wilson, clover, twisted mass, and twisted clover.
- This test requires building with MPI or QMP to test
--partition 15
. - This test uses a reference 16^4, beta = 7.0, quenched configuration, generated as described above
- This test only greps the iteration counts on a solve as a proxy for stability; if a line doesn't give any output that means there was an error.
The biggest bash for loop ever:
FLOAT=FLOAT4 # just cosmetic
FLOAT=FLOAT4 # just cosmetic
for r in 1 2 # one tuning run, one post-tuning run
do
for PARTITION in " 0" 15
do
for MG_MMA in " true" false
do
for PREC in " half" single # remove single for FLOAT8=ON
do
for DSLASH in " wilson" " clover" " twisted-mass" twisted-clover
do
for OP_0 in " direct" direct-pc
do
for OP_1 in " direct" direct-pc
do
for COARSE_NC_1 in 24 32
do
for COARSE_NC_2 in 24 32
do
if [ "$COARSE_NC_2" -ge "$COARSE_NC_1" ]
then
echo -n "$FLOAT $PARTITION $MG_MMA $PREC $DSLASH $OP_0 $OP_1 $COARSE_NC_1 $COARSE_NC_2 "
mpirun -np 1 ./invert_test \
--prec double --prec-sloppy single --prec-null $PREC --prec-precondition $PREC \
--recon 12 --recon-sloppy 8 --recon-precondition 8 \
--dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge l16t16b7p0 --partition $PARTITION \
--kappa 0.1394265 --mu 0.00072 --clover-coeff 0.001 \
--dslash-type $DSLASH --compute-clover true --tol 1e-10 \
--verbosity verbose --solve-type $OP_0 --solution-type mat --inv-type gcr \
--inv-multigrid true --mg-levels 3 \
--mg-block-size 0 4 4 4 4 --mg-nvec 0 $COARSE_NC_1 \
--mg-block-size 1 2 2 2 2 --mg-nvec 1 $COARSE_NC_2 \
--mg-setup-tol 0 5e-7 --mg-setup-tol 1 5e-7 --mg-setup-inv 0 cgnr --mg-setup-inv 1 cgnr \
--mg-mu-factor 2 70.0 \
--nsrc 1 --niter 250 \
--mg-setup-use-mma 0 $MG_MMA --mg-setup-use-mma 1 $MG_MMA --mg-setup-use-mma 2 $MG_MMA \
--mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 $OP_0 --mg-nu-pre 0 0 --mg-nu-post 0 4 \
--mg-smoother 1 ca-gcr --mg-smoother-solve-type 1 $OP_1 --mg-nu-pre 1 0 --mg-nu-post 1 4 \
--mg-coarse-solver 1 gcr --mg-coarse-solve-type 1 $OP_1 --mg-coarse-solver-tol 1 0.35 --mg-coarse-solver-maxiter 1 10 \
--mg-coarse-solver 2 gcr --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.01 --mg-coarse-solver-maxiter 2 20 \
--mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose | grep "Done:"
fi # nc2 >= nc1
done # COARSE_NC_2
done # COARSE_NC_1
done # direct, direct-pc
done # direct, direct-pc
done # dslash type
done # precision
done # mma on/off
done # partition 0, 15
done # tuning and post-tune run
Testing Staggered-type modifications
There are currently three approaches to multigrid with staggered-type operators:
- MG with the optimized Kahler-Dirac operator (preferred) --- depends on the branch
feature/optimized-kd-gk
- MG with the coarse pseudo-fine operator (less preferred but functionally correct)
- Direct aggregation MG (functionally correct, non-convergent)
The first test, the optimized Kahler-Dirac operator, is the most robust approach to staggered multigrid and thus is tested more rigorously. The other two approaches are smaller tests that only target the key components that aren't covered by the optimized Kahler-Dirac tests.
These tests require the following build options:
- Build with
-DQUDA_PRECISION=14
(quarter isn't templated over yet) - This test requires compiling with staggered
-DQUDA_DIRAC_STAGGERED=ON
. - This test requires building with MPI or QMP to test
--partition 15
. - This test uses a reference 16^4, beta = 7.0, quenched configuration. Reach out to Evan (@weinbe2) for a copy.
- This test only greps the iteration counts on a solve as a proxy for stability; if a line doesn't give any output that means there was an error.
Testing MG with the optimized KD operator
for r in 1 2 # one tuning run, one post-tune run
do
for PARTITION in " 0" 15
do
for MG_MMA in " true" false
do
for PREC in " half" single
do
for DSLASH in staggered " asqtad"
do
for OP_0 in " direct" direct-pc # test using the full vs Schur outer operator
do
for OP_2 in " direct" direct-pc # test the level 2 coarse op
do
for OP_3 in " direct" direct-pc # test the level 3 coarse op
do
for COARSE_NC_2 in 64 96
do
for COARSE_NC_3 in 96 # empirically 64 -> 64 isn't helpful, so we don't compile it
do
echo -n "$PARTITION $MG_MMA $PREC $DSLASH $OP_0 $OP_2 $OP_3 $COARSE_NC_2 $COARSE_NC_3 "
mpirun -np 1 ./staggered_invert_test \
--prec double --prec-sloppy single --prec-null $PREC --prec-precondition $PREC \
--mass 0.01 --recon 13 --recon-sloppy 9 --recon-precondition 9 \
--dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge l16t16b7p0 --partition $PARTITION \
--dslash-type $DSLASH --compute-fat-long true --tadpole-coeff 0.905160183 --tol 1e-10 \
--verbosity verbose --solve-type $OP_0 --solution-type mat --inv-type gcr \
--inv-multigrid true --mg-levels 4 --mg-coarse-solve-type 0 $OP_0 --mg-staggered-coarsen-type kd-optimized \
--mg-block-size 0 1 1 1 1 --mg-nvec 0 3 \
--mg-block-size 1 4 4 4 4 --mg-nvec 1 $COARSE_NC_2 \
--mg-block-size 2 2 2 2 2 --mg-nvec 2 $COARSE_NC_3 \
--mg-setup-tol 1 1e-5 --mg-setup-tol 2 1e-5 --mg-setup-inv 1 cgnr --mg-setup-inv 2 cgnr \
--nsrc 1 --niter 25 \
--mg-setup-use-mma 0 $MG_MMA --mg-setup-use-mma 1 $MG_MMA --mg-setup-use-mma 2 $MG_MMA --mg-setup-use-mma 3 $MG_MMA \
--mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 $OP_0 --mg-nu-pre 0 0 --mg-nu-post 0 4 \
--mg-smoother 1 ca-gcr --mg-smoother-solve-type 1 direct --mg-nu-pre 1 0 --mg-nu-post 1 4 \
--mg-smoother 2 ca-gcr --mg-smoother-solve-type 2 $OP_2 --mg-nu-pre 2 0 --mg-nu-post 2 4 \
--mg-coarse-solver 1 gcr --mg-coarse-solve-type 1 direct --mg-coarse-solver-tol 1 0.25 --mg-coarse-solver-maxiter 1 16 \
--mg-coarse-solver 2 gcr --mg-coarse-solve-type 2 $OP_2 --mg-coarse-solver-tol 2 0.25 --mg-coarse-solver-maxiter 2 16 \
--mg-coarse-solver 3 ca-gcr --mg-coarse-solve-type 3 $OP_3 --mg-coarse-solver-tol 3 0.25 --mg-coarse-solver-maxiter 3 16 \
--mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose --mg-verbosity 3 verbose 2>&1 | grep "Done\|ERROR"
done
done
done
done
done
done
done
done
done
done
Note that multigrid with the asqtad operator supports dropping the long links by replacing --mg-staggered-coarsen-type kd-optimized
with --mg-staggered-coarsen-type kd-optimized-drop-long
. This doesn't need to be tested separately because the "dropped" long link operator is just the naive staggered operator.
Testing staggered MG with the coarse pseudo-fine operator
This is maintained for legacy reasons because it's sub-optimal. It may be removed in future versions.
for r in 1 2 # one tuning run, one post-tuning run
do
for PARTITION in " 0" 15
do
for PREC in " half" single
do
for DSLASH in staggered " asqtad"
do
for COARSE_NC_2 in 64 96
do
echo -n "$PARTITION $PREC $DSLASH $COARSE_NC_2 "
mpirun -np 1 ./staggered_invert_test \
--prec double --prec-sloppy single --prec-null $PREC --prec-precondition $PREC \
--mass 0.01 --recon 13 --recon-sloppy 9 --recon-precondition 9 \
--dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge l16t16b7p0 --partition $PARTITION \
--dslash-type $DSLASH --compute-fat-long true --tadpole-coeff 0.905160183 --tol 1e-10 \
--verbosity verbose --solve-type direct --solution-type mat --inv-type gcr \
--inv-multigrid true --mg-levels 3 --mg-staggered-coarsen-type kd-coarse \
--mg-block-size 0 2 2 2 2 --mg-nvec 0 24 \
--mg-block-size 1 2 2 2 2 --mg-nvec 1 $COARSE_NC_2 \
--mg-setup-tol 1 1e-5 --mg-setup-inv 1 cgnr \
--nsrc 1 --niter 25 \
--mg-use-mma true \
--mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 direct --mg-nu-pre 0 0 --mg-nu-post 0 4 \
--mg-smoother 1 ca-gcr --mg-smoother-solve-type 1 direct-pc --mg-nu-pre 1 0 --mg-nu-post 1 4 \
--mg-coarse-solver 1 gcr --mg-coarse-solve-type 1 direct-pc --mg-coarse-solver-tol 1 0.35 --mg-coarse-solver-maxiter 1 10 \
--mg-coarse-solver 2 gcr --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.01 --mg-coarse-solver-maxiter 2 20 \
--mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose 2>&1 | grep "Done\|ERROR"
done # COARSE_NC_2
done # dslash type
done # precision
done # partition 0, 15
done # tuning and post-tune run
Testing aggregated staggered MG
Note: this is only included as a formality for now. Directly aggregating the staggered operator (as opposed to KD preconditioning first) leads to an unstable operator. These solves will hit the maximum iteration count: that's to be expected. The primary purpose of this test is to verify that prolongation/restriction/block-orthogonalize/coarsening between the fine Nc = 3, Nspin = 1 and coarse Nc = 64, Nspin = 2 lattice works correctly. Other tests, such as coarsening Nc = 64 -> Nc = 96, are implicitly covered by the test above.
for r in 1 2 # one tuning run, one post-tuning run
do
for PARTITION in " 0" 15
do
for PREC in " half" single
do
for DSLASH in staggered " asqtad"
do
for OP_1 in " direct" # direct-pc is implicitly tested in the staggered KD test
do
for COARSE_NC_1 in 64 96
do
echo -n "$PARTITION $PREC $DSLASH $OP_1 $COARSE_NC_1 "
mpirun -np 1 ./staggered_invert_test \
--prec double --prec-sloppy single --prec-null $PREC --prec-precondition $PREC \
--mass 0.01 --recon 13 --recon-sloppy 9 --recon-precondition 9 \
--dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge l16t16b7p0 --partition $PARTITION \
--dslash-type $DSLASH --compute-fat-long true --tadpole-coeff 0.905160183 --tol 1e-10 \
--verbosity verbose --solve-type direct --solution-type mat --inv-type gcr \
--inv-multigrid true --mg-levels 2 --mg-staggered-coarsen-type aggregate \
--mg-block-size 0 4 4 4 4 --mg-nvec 0 $COARSE_NC_1 \
--mg-setup-tol 0 1e-5 --mg-setup-inv 0 cgnr \
--nsrc 1 --niter 20 \
--mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 direct --mg-nu-pre 0 0 --mg-nu-post 0 4 \
--mg-coarse-solver 1 cgnr --mg-coarse-solve-type 1 $OP_1 --mg-coarse-solver-tol 1 0.01 --mg-coarse-solver-maxiter 1 50 \
--mg-verbosity 0 verbose --mg-verbosity 1 verbose 2>&1 | grep "Done\|ERROR"
done # COARSE_NC_1
done # direct
done # dslash type
done # precision
done # partition 0, 15
done # tuning and post-tune run