Validating Large Multigrid Changes - lattice/quda GitHub Wiki

This page is a WIP

This page is meant to document sequences of tests that should be used to validate large changes to multigrid routines. These are arguably overkill for smaller changes.

Generating a 16^4, beta = 7.0 configuration

We recommend using a relatively physical lattice to test multigrid. These can be generated using the built-in heatbath_test executable. A representative command to generate the above-described configuration is:

mpirun -np 1 ./heatbath_test --dim 16 16 16 16 --save-gauge l16t16b7p0 \
  --heatbath-beta 7.0 --heatbath-coldstart true --heatbath-num-steps 10 --heatbath-warmup-steps 1000

Testing Wilson-type modifications

  • Build with -DQUDA_PRECISION=14 or 15 (quarter isn't fully functional for now)
  • This requires two separate builds: -DQUDA_FLOAT8=ON and OFF. For ON you only need to test half for the MG build/solve.
  • This test requires compiling with Wilson, clover, twisted mass, and twisted clover.
  • This test requires building with MPI or QMP to test --partition 15.
  • This test uses a reference 16^4, beta = 7.0, quenched configuration, generated as described above
  • This test only greps the iteration counts on a solve as a proxy for stability; if a line doesn't give any output that means there was an error.

The biggest bash for loop ever:

FLOAT=FLOAT4 # just cosmetic
FLOAT=FLOAT4 # just cosmetic
for r in 1 2 # one tuning run, one post-tuning run
do
  for PARTITION in " 0" 15
  do
    for MG_MMA in " true" false
    do
      for PREC in "  half" single # remove single for FLOAT8=ON
      do
        for DSLASH in "        wilson" "        clover" "  twisted-mass" twisted-clover
        do
          for OP_0 in "   direct" direct-pc
          do
            for OP_1 in "   direct" direct-pc
            do
              for COARSE_NC_1 in 24 32
              do
                for COARSE_NC_2 in 24 32
                do
                  if [ "$COARSE_NC_2" -ge "$COARSE_NC_1" ]
                  then
                    echo -n "$FLOAT $PARTITION $MG_MMA $PREC $DSLASH $OP_0 $OP_1 $COARSE_NC_1 $COARSE_NC_2 "
                    mpirun -np 1 ./invert_test \
                    --prec double --prec-sloppy single --prec-null $PREC --prec-precondition $PREC \
                    --recon 12 --recon-sloppy 8 --recon-precondition 8 \
                    --dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge l16t16b7p0 --partition $PARTITION \
                    --kappa 0.1394265 --mu 0.00072 --clover-coeff 0.001 \
                    --dslash-type $DSLASH --compute-clover true --tol 1e-10 \
                    --verbosity verbose --solve-type $OP_0 --solution-type mat --inv-type gcr \
                    --inv-multigrid true --mg-levels 3 \
                    --mg-block-size 0 4 4 4 4 --mg-nvec 0 $COARSE_NC_1 \
                    --mg-block-size 1 2 2 2 2 --mg-nvec 1 $COARSE_NC_2 \
                    --mg-setup-tol 0 5e-7 --mg-setup-tol 1 5e-7 --mg-setup-inv 0 cgnr --mg-setup-inv 1 cgnr \
                    --mg-mu-factor 2 70.0 \
                    --nsrc 1 --niter 250 \
                    --mg-setup-use-mma 0 $MG_MMA --mg-setup-use-mma 1 $MG_MMA --mg-setup-use-mma 2 $MG_MMA \
                    --mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 $OP_0 --mg-nu-pre 0 0 --mg-nu-post 0 4 \
                    --mg-smoother 1 ca-gcr --mg-smoother-solve-type 1 $OP_1 --mg-nu-pre 1 0 --mg-nu-post 1 4 \
                    --mg-coarse-solver 1 gcr --mg-coarse-solve-type 1 $OP_1 --mg-coarse-solver-tol 1 0.35 --mg-coarse-solver-maxiter 1 10 \
                    --mg-coarse-solver 2 gcr --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.01 --mg-coarse-solver-maxiter 2 20 \
                    --mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose | grep "Done:"
                  fi # nc2 >= nc1
                done # COARSE_NC_2
              done # COARSE_NC_1
            done # direct, direct-pc
          done # direct, direct-pc
        done # dslash type
      done # precision
    done # mma on/off
  done # partition 0, 15
done # tuning and post-tune run

Testing Staggered-type modifications

There are currently three approaches to multigrid with staggered-type operators:

  • MG with the optimized Kahler-Dirac operator (preferred) --- depends on the branch feature/optimized-kd-gk
  • MG with the coarse pseudo-fine operator (less preferred but functionally correct)
  • Direct aggregation MG (functionally correct, non-convergent)

The first test, the optimized Kahler-Dirac operator, is the most robust approach to staggered multigrid and thus is tested more rigorously. The other two approaches are smaller tests that only target the key components that aren't covered by the optimized Kahler-Dirac tests.

These tests require the following build options:

  • Build with -DQUDA_PRECISION=14 (quarter isn't templated over yet)
  • This test requires compiling with staggered -DQUDA_DIRAC_STAGGERED=ON.
  • This test requires building with MPI or QMP to test --partition 15.
  • This test uses a reference 16^4, beta = 7.0, quenched configuration. Reach out to Evan (@weinbe2) for a copy.
  • This test only greps the iteration counts on a solve as a proxy for stability; if a line doesn't give any output that means there was an error.

Testing MG with the optimized KD operator

for r in 1 2 # one tuning run, one post-tune run
do
  for PARTITION in " 0" 15
  do
    for MG_MMA in " true" false
    do
      for PREC in "  half" single
      do
        for DSLASH in staggered "   asqtad"
        do
          for OP_0 in "   direct" direct-pc # test using the full vs Schur outer operator
          do
            for OP_2 in "   direct" direct-pc # test the level 2 coarse op
            do
              for OP_3 in "   direct" direct-pc # test the level 3 coarse op
              do
                for COARSE_NC_2 in 64 96
                do
                  for COARSE_NC_3 in 96 # empirically 64 -> 64 isn't helpful, so we don't compile it
                  do
                    echo -n "$PARTITION $MG_MMA $PREC $DSLASH $OP_0 $OP_2 $OP_3 $COARSE_NC_2 $COARSE_NC_3 "
                    mpirun -np 1 ./staggered_invert_test \
                      --prec double --prec-sloppy single --prec-null $PREC --prec-precondition $PREC \
                      --mass 0.01 --recon 13 --recon-sloppy 9 --recon-precondition 9 \
                      --dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge l16t16b7p0 --partition $PARTITION \
                      --dslash-type $DSLASH --compute-fat-long true --tadpole-coeff 0.905160183 --tol 1e-10 \
                      --verbosity verbose --solve-type $OP_0 --solution-type mat --inv-type gcr \
                      --inv-multigrid true --mg-levels 4 --mg-coarse-solve-type 0 $OP_0 --mg-staggered-coarsen-type kd-optimized \
                      --mg-block-size 0 1 1 1 1 --mg-nvec 0 3 \
                      --mg-block-size 1 4 4 4 4 --mg-nvec 1 $COARSE_NC_2 \
                      --mg-block-size 2 2 2 2 2 --mg-nvec 2 $COARSE_NC_3 \
                      --mg-setup-tol 1 1e-5 --mg-setup-tol 2 1e-5 --mg-setup-inv 1 cgnr --mg-setup-inv 2 cgnr \
                      --nsrc 1 --niter 25 \
                      --mg-setup-use-mma 0 $MG_MMA --mg-setup-use-mma 1 $MG_MMA --mg-setup-use-mma 2 $MG_MMA --mg-setup-use-mma 3 $MG_MMA \
                      --mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 $OP_0  --mg-nu-pre 0 0 --mg-nu-post 0 4 \
                      --mg-smoother 1 ca-gcr --mg-smoother-solve-type 1 direct --mg-nu-pre 1 0 --mg-nu-post 1 4 \
                      --mg-smoother 2 ca-gcr --mg-smoother-solve-type 2 $OP_2  --mg-nu-pre 2 0 --mg-nu-post 2 4 \
                      --mg-coarse-solver 1 gcr --mg-coarse-solve-type 1 direct --mg-coarse-solver-tol 1 0.25 --mg-coarse-solver-maxiter 1 16 \
                      --mg-coarse-solver 2 gcr --mg-coarse-solve-type 2 $OP_2 --mg-coarse-solver-tol 2 0.25 --mg-coarse-solver-maxiter 2 16 \
                      --mg-coarse-solver 3 ca-gcr --mg-coarse-solve-type 3 $OP_3 --mg-coarse-solver-tol 3 0.25 --mg-coarse-solver-maxiter 3 16 \
                      --mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose --mg-verbosity 3 verbose 2>&1 | grep "Done\|ERROR"
                  done
                done
              done
            done
          done
        done
      done
    done
  done
done

Note that multigrid with the asqtad operator supports dropping the long links by replacing --mg-staggered-coarsen-type kd-optimized with --mg-staggered-coarsen-type kd-optimized-drop-long. This doesn't need to be tested separately because the "dropped" long link operator is just the naive staggered operator.

Testing staggered MG with the coarse pseudo-fine operator

This is maintained for legacy reasons because it's sub-optimal. It may be removed in future versions.

for r in 1 2 # one tuning run, one post-tuning run
do
  for PARTITION in " 0" 15
  do
    for PREC in "  half" single
    do
      for DSLASH in staggered "   asqtad"
      do
        for COARSE_NC_2 in 64 96
        do
          echo -n "$PARTITION $PREC $DSLASH $COARSE_NC_2 "
          mpirun -np 1 ./staggered_invert_test \
            --prec double --prec-sloppy single --prec-null $PREC --prec-precondition $PREC \
            --mass 0.01 --recon 13 --recon-sloppy 9 --recon-precondition 9 \
            --dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge l16t16b7p0 --partition $PARTITION \
            --dslash-type $DSLASH --compute-fat-long true --tadpole-coeff 0.905160183 --tol 1e-10 \
            --verbosity verbose --solve-type direct --solution-type mat --inv-type gcr \
            --inv-multigrid true --mg-levels 3 --mg-staggered-coarsen-type kd-coarse \
            --mg-block-size 0 2 2 2 2 --mg-nvec 0 24 \
            --mg-block-size 1 2 2 2 2 --mg-nvec 1 $COARSE_NC_2 \
            --mg-setup-tol 1 1e-5 --mg-setup-inv 1 cgnr \
            --nsrc 1 --niter 25 \
            --mg-use-mma true \
            --mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 direct    --mg-nu-pre 0 0 --mg-nu-post 0 4 \
            --mg-smoother 1 ca-gcr --mg-smoother-solve-type 1 direct-pc --mg-nu-pre 1 0 --mg-nu-post 1 4 \
            --mg-coarse-solver 1 gcr --mg-coarse-solve-type 1 direct-pc --mg-coarse-solver-tol 1 0.35 --mg-coarse-solver-maxiter 1 10 \
            --mg-coarse-solver 2 gcr --mg-coarse-solve-type 2 direct-pc --mg-coarse-solver-tol 2 0.01 --mg-coarse-solver-maxiter 2 20 \
            --mg-verbosity 0 verbose --mg-verbosity 1 verbose --mg-verbosity 2 verbose 2>&1 | grep "Done\|ERROR"
        done # COARSE_NC_2
      done # dslash type
    done # precision
  done # partition 0, 15
done # tuning and post-tune run

Testing aggregated staggered MG

Note: this is only included as a formality for now. Directly aggregating the staggered operator (as opposed to KD preconditioning first) leads to an unstable operator. These solves will hit the maximum iteration count: that's to be expected. The primary purpose of this test is to verify that prolongation/restriction/block-orthogonalize/coarsening between the fine Nc = 3, Nspin = 1 and coarse Nc = 64, Nspin = 2 lattice works correctly. Other tests, such as coarsening Nc = 64 -> Nc = 96, are implicitly covered by the test above.

for r in 1 2 # one tuning run, one post-tuning run
do
  for PARTITION in " 0" 15
  do
    for PREC in "  half" single
    do
      for DSLASH in staggered "   asqtad"
      do
        for OP_1 in "   direct" # direct-pc is implicitly tested in the staggered KD test
        do
          for COARSE_NC_1 in 64 96
          do
            echo -n "$PARTITION $PREC $DSLASH $OP_1 $COARSE_NC_1 "
            mpirun -np 1 ./staggered_invert_test \
            --prec double --prec-sloppy single --prec-null $PREC --prec-precondition $PREC \
            --mass 0.01 --recon 13 --recon-sloppy 9 --recon-precondition 9 \
            --dim 16 16 16 16 --gridsize 1 1 1 1 --load-gauge l16t16b7p0 --partition $PARTITION \
            --dslash-type $DSLASH --compute-fat-long true --tadpole-coeff 0.905160183 --tol 1e-10 \
            --verbosity verbose --solve-type direct --solution-type mat --inv-type gcr \
            --inv-multigrid true --mg-levels 2  --mg-staggered-coarsen-type aggregate \
            --mg-block-size 0 4 4 4 4 --mg-nvec 0 $COARSE_NC_1 \
            --mg-setup-tol 0 1e-5 --mg-setup-inv 0 cgnr \
            --nsrc 1 --niter 20 \
            --mg-smoother 0 ca-gcr --mg-smoother-solve-type 0 direct --mg-nu-pre 0 0 --mg-nu-post 0 4 \
            --mg-coarse-solver 1 cgnr --mg-coarse-solve-type 1 $OP_1 --mg-coarse-solver-tol 1 0.01 --mg-coarse-solver-maxiter 1 50 \
            --mg-verbosity 0 verbose --mg-verbosity 1 verbose 2>&1 | grep "Done\|ERROR"
          done # COARSE_NC_1
        done # direct
      done # dslash type
    done # precision
  done # partition 0, 15
done # tuning and post-tune run