Running some internal tests (blas, dslash, invert) - lattice/quda GitHub Wiki

Running Google Tests in General

As mentioned elsewhere, QUDA comes with a large number of internal tests. These are often run as part of CI, and the entire test suite can be launched with

make test

once the build is complete. When tracking down bugs or portability failures, one can also run the tests directly with ctest, e.g.:

ctest --output-on-failure

which prints the full output of any test that fails.

CTest can also be used to zoom in on failing tests. For example, if tests 5 and 6 fail, one can have CTest run just those (the -I argument takes a Start,End range of test numbers) with verbose output enabled:

ctest -V -I 5,6
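When several tests fail, it can be convenient to pull their names out of the summary CTest prints at the end of a run and re-run exactly those by name with ctest's -R (regular expression) option. The Python sketch below assumes the summary has CTest's usual form: a "The following tests FAILED:" line followed by lines like "5 - blas_test (Failed)". The helper names and the sample test names are hypothetical.

```python
import re
import shlex
from typing import List, Tuple

# Assumed CTest summary line format, e.g. "\t  5 - blas_test (Failed)"
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(.*\)\s*$")


def failed_tests(ctest_output: str) -> List[Tuple[int, str]]:
    """Return (test number, test name) pairs from CTest's FAILED summary block."""
    failures = []
    in_summary = False
    for line in ctest_output.splitlines():
        if line.startswith("The following tests FAILED:"):
            in_summary = True
        elif in_summary and line.strip():
            match = FAILED_LINE.match(line)
            if match is None:
                break  # the first non-matching line ends the summary block
            failures.append((int(match.group(1)), match.group(2)))
    return failures


def rerun_command(failures: List[Tuple[int, str]]) -> str:
    """Build a verbose ctest command that re-runs only the given tests by name."""
    pattern = "^(" + "|".join(re.escape(name) for _, name in failures) + ")$"
    return shlex.join(["ctest", "-V", "-R", pattern])


# Hypothetical excerpt of a ctest run
sample = """67% tests passed, 2 tests failed out of 6
The following tests FAILED:
\t  5 - blas_test (Failed)
\t  6 - dslash_test (SEGFAULT)
Errors while running CTest
"""
print(rerun_command(failed_tests(sample)))
```

The printed command is ctest -V -R '^(blas_test|dslash_test)$'. Selecting by name rather than by number keeps working if test numbering changes between builds; if nothing failed, failed_tests returns an empty list and there is nothing to re-run.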

BLAS Tests

The BLAS tests (blas_test) exercise all the BLAS routines, including the multi-BLAS ones. Useful arguments are:

  • --prec selects the precision (e.g. quarter, half, single or double), depending on which precisions were enabled in the build
  • --verbosity (e.g. summarize, verbose, debug) controls the test verbosity
  • --gtest_list_tests -- lists all available tests; since blas_test contains many tests, this is useful for picking out the ones to focus on
  • --gtest_filter=<Testname> -- runs only the tests whose names match <Testname> (Google Test wildcards such as * are allowed)
  • --sdim <SPATIAL> --tdim <TEMPORAL> --Lsdim <FIFTH> set the lattice extent in the spatial (X, Y, Z), temporal (T), and fifth (for 5D chiral fermions) dimensions respectively

If no arguments are given, blas_test runs all of its tests for every compiled precision. Tests for a precision that was not compiled (e.g. quarter) are skipped, as indicated by lines such as:

[ RUN      ] QUDA/BlasTest.verify/caxpbypzYmbwcDotProductUYNormY_quarter_double
[  SKIPPED ] QUDA/BlasTest.verify/caxpbypzYmbwcDotProductUYNormY_quarter_double (0 ms)

in the test output.

Samples:

  • To run all the correctness and performance tests for all enabled precisions:
./blas_test
  • To run all the correctness and performance tests for single precision:
./blas_test --prec single
  • To run all the performance tests for half precision:
./blas_test --prec half --gtest_filter='QUDA/BlasTest.benchmark/*'
  • To run a specific test with full debug output (e.g. of all the tuning and kernel launch parameters) on a 32x32x32x64 lattice:
./blas_test --gtest_filter=QUDA/BlasTest.verify/cDotProductNormA_single_single --verbosity debug --sdim 32 --tdim 64 --Lsdim 1
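Google Test's --gtest_filter accepts several ':'-separated patterns, optionally followed by '-' and patterns to exclude. The Python sketch below composes such filters and a quoted blas_test command line. The helper names are hypothetical, and the "<kernel>_<precision>_<precision>" naming used in the patterns is inferred from the test names shown on this page.

```python
import shlex
from typing import Iterable, List


def gtest_filter(include: Iterable[str], exclude: Iterable[str] = ()) -> str:
    """Combine patterns into a --gtest_filter value: 'inc1:inc2-exc1:exc2'."""
    value = ":".join(include) or "*"  # no positive pattern means "all tests"
    excluded = ":".join(exclude)
    if excluded:
        value += "-" + excluded
    return "--gtest_filter=" + value


def blas_command(include: Iterable[str], exclude: Iterable[str] = (),
                 extra: Iterable[str] = ()) -> str:
    """Quote the full command so the shell does not glob the '*' wildcards."""
    args: List[str] = ["./blas_test", gtest_filter(include, exclude), *extra]
    return shlex.join(args)


# All correctness tests with single-single precision suffixes (assumed naming)
print(blas_command(["QUDA/BlasTest.verify/*_single_single"]))
# All correctness tests except those involving quarter precision
print(blas_command(["QUDA/BlasTest.verify/*"], exclude=["*quarter*"]))
```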

Dslash Tests

After blas_test, the next nontrivial test is the Dslash test, which exercises the various fermion matrix operators. There are two Dslash test programs: dslash_test for Wilson-like fermions (Wilson, clover, twisted mass, etc.) and staggered_dslash_test for staggered fermions. To build the staggered tests, the MILC interface must be enabled when configuring the build (-DQUDA_MILC_INTERFACE=ON).

Dslash operators in this sense are full fermion operators (rather than just the derivative piece) and can have mass parameters, preconditioning styles, etc. They can also use gauge compression (e.g. storing each gauge link with 18, 12, or 8 real numbers for Wilson-like operators).
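To get a feel for what gauge compression saves, one can estimate the size of a 4D gauge field from the lattice extents, the number of reals per link, and the precision. A rough Python sketch, assuming 4 links per site and 8/4/2/1 bytes per real for double/single/half/quarter; it ignores padding, ghost zones, and any extra scaling data, so actual QUDA allocations will differ. The function name is hypothetical.

```python
# Assumed storage size of one real number for each --prec value
BYTES_PER_REAL = {"double": 8, "single": 4, "half": 2, "quarter": 1}
LINKS_PER_SITE = 4  # one link per direction (X, Y, Z, T)


def gauge_field_bytes(sdim: int, tdim: int, recon: int, prec: str) -> int:
    """Estimated bytes for one 4D gauge field with `recon` reals stored per link."""
    sites = sdim ** 3 * tdim
    return sites * LINKS_PER_SITE * recon * BYTES_PER_REAL[prec]


# Compare a few (recon, precision) choices on a 32x32x32x64 lattice
for recon, prec in [(18, "double"), (12, "double"), (8, "half")]:
    mib = gauge_field_bytes(32, 64, recon, prec) / 2**20
    print(f"recon {recon:2d}, {prec:6s}: {mib:7.1f} MiB")
```

On this lattice the estimate drops from 1152 MiB (18 reals, double) to 768 MiB (12 reals, double) and 128 MiB (8 reals, half).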

Some example invocations:

  • Single-checkerboard Wilson dslash operator in single precision, with the default 4D lattice sizes:
./dslash_test --prec single --Lsdim 1
  • Single-checkerboard clover dslash operator (AD) in single precision, with the default 4D lattice sizes, computing the clover term on the GPU:
./dslash_test --prec single --Lsdim 1 --dslash-type clover --compute-clover 1
  • Single-checkerboard Wilson dslash in half precision with 8-real gauge compression, on a 32x32x32x32 lattice:
./dslash_test --prec half --dslash-type wilson --sdim 32 --tdim 32 --Lsdim 1 --recon 8
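When checking a port, it can be useful to sweep dslash_test over several precision and compression settings. A minimal Python sketch that builds one command per (precision, recon) pair, using only the flags shown in the examples above; it prints the commands by default and, with dry_run=False, runs them and stops at the first failure. The helper names are hypothetical, and the sketch assumes dslash_test is in the current directory.

```python
import itertools
import shlex
import subprocess
from typing import Iterable, List


def dslash_sweep(precisions: Iterable[str], recons: Iterable[int], sdim: int = 32,
                 tdim: int = 32, dslash_type: str = "wilson") -> List[List[str]]:
    """Build one dslash_test argv per (precision, recon) combination."""
    return [
        ["./dslash_test", "--prec", prec, "--dslash-type", dslash_type,
         "--sdim", str(sdim), "--tdim", str(tdim), "--Lsdim", "1",
         "--recon", str(recon)]
        for prec, recon in itertools.product(precisions, recons)
    ]


def run_sweep(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; unless dry_run, also run it (raises on a non-zero exit)."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


run_sweep(dslash_sweep(["single", "half"], [12, 8]))
```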

Invert Tests

The next level of complexity is to run a full solver. QUDA provides invert_test and staggered_invert_test, again for Wilson-like and staggered-like operators respectively. Several additional parameters now come into play:

  • --solve-type defines which linear operator the solver works with and whether it is even-odd (checkerboard) preconditioned. For example, a solver for general non-Hermitian systems such as BiCGStab would use either direct or direct-pc, indicating that it solves with the regular linear operator, either on the full lattice (direct) or on a single checkerboard (direct-pc). A solver such as CG, which requires a Hermitian positive-definite operator, would instead use the normal operator via normop or normop-pc.
  • --solution-type determines which system is solved, e.g.:
    1. mat -- solve the system A x = b, where A is the unpreconditioned matrix. If the solve type is direct-pc or normop-pc, the solver works on the Schur-preconditioned system under the hood and then reconstructs the full solution.
    2. mat-pc -- solve the system A_p x = b, where A_p is the Schur-preconditioned matrix. This presumably requires --solve-type to be either normop-pc or direct-pc.
    3. mat-dag-mat -- solve the system A^\dagger A x = b, i.e. the normal equations, as one would do in a fermion matrix calculation.
    4. There are many more combinations.
  • --matpc -- selects the checkerboard on which the solve is done (e.g. odd-odd)
  • --prec-sloppy -- for mixed-precision solves, selects the lower ('sloppy') precision (e.g. used in inner solves)
  • --recon-sloppy -- for mixed-precision solves, selects the gauge-field compression used with the sloppy precision (e.g. in inner solves)
  • --reliable-delta -- for mixed-precision solves with reliable updates, sets the delta parameter that controls when a reliable update is triggered (e.g. 0.1 in half precision, or 0.001 in single)
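As background for the -pc solve types and mat-pc, the textbook even-odd (Schur complement) reduction onto the odd checkerboard is written out below. This is the generic asymmetric form; QUDA's --matpc options include other variants (e.g. symmetric ones), so the exact A_p used internally may be normalized differently.

```latex
\begin{align*}
% even-odd block form of A x = b
\begin{pmatrix} A_{ee} & A_{eo} \\ A_{oe} & A_{oo} \end{pmatrix}
\begin{pmatrix} x_e \\ x_o \end{pmatrix}
&= \begin{pmatrix} b_e \\ b_o \end{pmatrix} \\
% first block row, solved for the even sites (used to reconstruct x_e)
x_e &= A_{ee}^{-1}\left(b_e - A_{eo}\, x_o\right) \\
% substituting into the second block row gives the odd-checkerboard system
\underbrace{\left(A_{oo} - A_{oe} A_{ee}^{-1} A_{eo}\right)}_{A_p} x_o
&= \underbrace{b_o - A_{oe} A_{ee}^{-1} b_e}_{\hat b_o} \\
% normal-equation form, as used with normop-pc
A_p^\dagger A_p\, x_o &= A_p^\dagger\, \hat b_o
\end{align*}
```

A direct-pc solve works on the third line, a normop-pc solve on the fourth; with --solution-type mat the even-site solution is then recovered from the second line.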

There are of course many more parameters one can tune; these are just the tip of the iceberg for simple manual testing.
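The role of --reliable-delta can be illustrated with a simplified version of the usual reliable-update trigger: while iterating in the sloppy precision, recompute the true residual in high precision whenever the iterated residual norm drops below delta times the largest residual norm seen since the last update. The Python sketch below applies only that rule to a given sequence of residual norms. It is a simplification under stated assumptions (QUDA's actual solver logic has more conditions), and the function name and input values are hypothetical.

```python
from typing import List, Sequence


def reliable_update_steps(residual_norms: Sequence[float], delta: float) -> List[int]:
    """Return the iterations at which a reliable update would be triggered.

    Simplified rule: trigger when the iterated (sloppy) residual norm falls below
    delta times the largest norm seen since the last update. After an update the
    recomputed residual is assumed equal to the iterated one and becomes the new
    reference value.
    """
    steps = []
    r_max = residual_norms[0]
    for i, r in enumerate(residual_norms[1:], start=1):
        if r < delta * r_max:
            steps.append(i)
            r_max = r  # reference restarts after recomputing the true residual
        else:
            r_max = max(r_max, r)
    return steps


norms = [1.0, 0.5, 0.2, 0.09, 0.05, 0.02, 0.008]  # hypothetical residual history
print(reliable_update_steps(norms, 0.1))
print(reliable_update_steps(norms, 0.001))
```

With delta = 0.1 this history triggers updates at iterations 3 and 6; with delta = 0.001 it triggers none. A larger delta therefore means more frequent corrections, which fits the suggestion of 0.1 for the less accurate half-precision sloppy solves and 0.001 for single.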

Some example invocations:

  • Uniform-precision (double) conjugate gradient solve of the normal equations:
./invert_test --sdim 32 --tdim 64 --Lsdim 1 \
  --dslash-type clover  --mass 0 --mass-normalization mass --clover-coeff 1.0  \
  --prec double --prec-sloppy double \
  --inv-type cgne --solution-type mat --solve-type normop-pc \
  --tol 1.0e-14 --verbosity verbose
  • Mixed half-double precision BiCGStab, using a reliable-update delta of 0.1. The sloppy gauge field is stored with 8 real numbers per link and the precise one with 12:
./invert_test --sdim 32 --tdim 64 --Lsdim 1 \
  --dslash-type clover  --mass 0 --mass-normalization mass --clover-coeff 1.0 \
  --prec double --prec-sloppy half --recon 12 --recon-sloppy 8 \
  --inv-type bicgstab --solution-type mat --solve-type direct-pc \
  --tol 1.0e-14 --reliable-delta 0.1 --verbosity verbose
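Both invocations above share most of their arguments, so a small builder can generate variants while applying the --reliable-delta values suggested earlier on this page when the sloppy precision differs from the outer one. A minimal Python sketch; the helper name, the precision-ordering check, and the SUGGESTED_DELTA table are assumptions for illustration, and the fixed clover and lattice arguments are copied from the examples.

```python
import shlex
from typing import List, Optional

# Example --reliable-delta values quoted on this page, keyed by sloppy precision
SUGGESTED_DELTA = {"half": 0.1, "single": 0.001}
PREC_ORDER = ["quarter", "half", "single", "double"]  # lowest to highest


def invert_argv(prec: str, prec_sloppy: str, recon: int, recon_sloppy: int,
                inv_type: str, solve_type: str,
                reliable_delta: Optional[float] = None, tol: float = 1.0e-14) -> List[str]:
    """Build an invert_test argv for the clover setup used in the examples."""
    if PREC_ORDER.index(prec_sloppy) > PREC_ORDER.index(prec):
        raise ValueError("sloppy precision should not exceed the outer precision")
    argv = ["./invert_test", "--sdim", "32", "--tdim", "64", "--Lsdim", "1",
            "--dslash-type", "clover", "--mass", "0", "--mass-normalization", "mass",
            "--clover-coeff", "1.0",
            "--prec", prec, "--prec-sloppy", prec_sloppy,
            "--recon", str(recon), "--recon-sloppy", str(recon_sloppy),
            "--inv-type", inv_type, "--solution-type", "mat", "--solve-type", solve_type,
            "--tol", f"{tol:.1e}", "--verbosity", "verbose"]
    if prec_sloppy != prec:  # reliable updates only matter for mixed precision
        delta = reliable_delta if reliable_delta is not None else SUGGESTED_DELTA.get(prec_sloppy)
        if delta is not None:
            argv += ["--reliable-delta", str(delta)]
    return argv


print(shlex.join(invert_argv("double", "half", 12, 8, "bicgstab", "direct-pc")))
```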