EigCG, GMRES DR, and FGMRES DR - lattice/quda GitHub Wiki

First of all, all deflation solvers have to be compiled with eigen library. Please note that MAGMA interface was removed.

EigCG

Let's start with eigCG/incremental eigCG. For details on the algorithm see the paper by Stathopoulos and Orginos [1]. There are 13 parameters in these solvers. (Don't be scared, most of them can be used with default values).

  • --eig-n-conv specifies how many eigenvectors you want to compute during one eigCG call. This parameter corresponds to parameter nev in [1]. The eigCG algorithm can compute only a small number of low eigenvectors, typically not more than 10-16. This means that there is little freedom here to play with. Typically we use nev = 8 always.

  • --eig-n-kr specifies the eigenvector search space, this corresponds to parameter m in [1]. In theory, the bigger the better, e.g., if m = "number of iterations of the CG" we get full Lancosz basis (unrestarted). In practice, we are limited in memory, typical values include 64 or 128. Although it is advisable for computational reasons to keep m large, nev should not be too large because it increases the eigCG cost without providing more than a few good eigenpair approximations from one linear system.

  • --eig-n-ev specify how many eigenvectors one wants to compute using incremental eigCG. The solver will automatically decide how many physical RHS are needed to fill up the deflation grid.

    • In the case of solo-precision incremental eigCG, --eig-n-ev coincides with the number of physical RHS.
    • In the case of mixed-precision incremental eigCG, the deflation grid gets also contribution from "logical" RHS, which in fact corresponds to iterative refinement cycles within a given linear system. This can reduce the number of physical RHS to fill the eigenvector space significantly, e.g., a factor of two or more.

    Note here that an end-user may request an arbitrary big number of RHS in the incremental stage, the solver will detect when the deflation grid will be complete and switch to the deflation (initCG) stage.

  • --eig-max-restarts is used in the incremental stage by the mixed precision eigCG. It specifies the maximum number of refinement cycles that must be done with low-precision eigCG calls. This is also the maximum number of logical RHS within a single linear system. If the solution requires more refinement cycles the remaining will be using with (projected) low-precision CG calls.

  • --eig-tol is used to set the selection criterion for the low eigenmodes computed in the incremental stage. All eigenvectors with eigenvalue residual norm smaller than the parameter will be selected.

  • --df-tol-restart is used in the deflation stage by the initCG solver. It specifies at what tolerance to perform the first restart of the deflated CG solver.

  • --df-tol-inc specifies how much relative residual vector should decrease for the subsequent restarts of the initCG solver. E.g.: if inv_param.tol = 1e-10 and inv_param.tol_restart = 1e-5, and inc_tol = 1e-2, then the first restart will be when the relative iterative residual drops to 1e-5 during the initial iterations, the second will happen when the relative residual drops to 1e-2 after the first restart and so on.

  • --df-max-restart-num is used to set number of restarts needed for the initCG solver. Typically more than three restarts are not advised (mostly due to overhead of the restarts).

  • --prec-ritz specifies the precision of eigenvectors, it defaults to QudaInvertParam::cuda_prec_sloppy, but you may try full precision Ritz vectors as well (even if you choose mixed-precision solver).

  • --pipeline specifies the pipeline length for the eigCG solver. This parameter must not exceed 2*nev.

  • --reliable-delta specifies a desired frequency of the reliable updates.

For running the invert_test application with the incremetal eigcg, you should use the following example (assuming QUDA was configured with -DQUDA_MAGMA=ON):

./invert_test --inv-deflate true --prec double --prec-sloppy single --prec-precondition half --recon 12 --recon-sloppy 12 --recon-precondition 12  --dim 24 24 24 64 --anisotropy 2.38 --mass -0.4125  --inv-type inc-eigcg --niter 1000 --tol 1e-8 --prec-ritz single --eig-n-ev 8 --eig-n-Kr 64 --eig-n-conv 128 --eig-max-restarts 4 --df-tol-restart 5e-5 --eig-tol 1e-1 --df-tol-inc 1e-2 --df-max-restart-num 3 --nsrc 32 --pipeline 8

GMRES-DR

The important advantage of the GMRES-DR is that it can be used for non-symmetric systems, and only for one system. In practice, GMRES-DR performs (almost) the same as a full GMRES but with much less memory restrictions (for the cost of extra projection operations between restarts). What is really interesting is that information extracted from one GMRES-DR call can be used further for other RHS. This version is called GMRES-Proj, indicating that one uses computed (harmonic) Ritz vectors to perform projections between restarts in the regular GMRES solver. Morgan and Wilcox observed that this significantly improves performance of GMRES in terms of iterations, but this has not yet been observed here and so it requires additional investigation.

In order to run it, you need to adjust the following parameters, noting that we are overloading the same parameters as are used for eigCG, therefore it might be not so obvious for some parameters what they are responsible for. This solver requires 4 parameters.

  • --df-max-search-dim specifies Arnoldi basis dimension. It corresponds to parameter m in the paper by Morgan [2].

  • --df-nev specifies how many harmonic Ritz vectors one wants to compute. This corresponds to the parameter k in [2].

  • --df-deflation-grid is obscure: it is the maximum number of restarts performed by GMRES-DR solver. This functionality may be repurposed elsewhere in the future, e.g., to re-use the max_iter parameter.

  • --solver-ext-lib-type is used to specify an external library for the GMRES-DR internal routines. This is needed, e.g., for dense eigensolvers in the harmonic Ritz vector computation. By default, QUDA employs the Eigen library. However, one may choose any other supported external library, like MAGMA library. It is the user responsibility that the application is properly linked against the selected library. Currently QUDA allows eigen and magma options. The latter might be preferable when --df-max-search-dim is of order O(100), so the GPU-accelerated library may give better performance.

There are two implementations here, namely solo and mixed precision. Our mixed precision implementation follows Frommer et al [3].

FGMRES-DR

The GMRES-DR solver can be used with a variable preconditioner; the corresponding flexible version, FGMRES-DR inherits the parameter set from GMRES-DR, see Giraud et al [4] for further details about the solver. The flexible version is activated if one specifies the preconditioner via --precon-type option.

For running the deflated_invert_test application with the FGMRES-DR, you should use the following example:

./deflated_invert_test --prec double --prec-sloppy single --prec-precondition half --recon 12 --recon-sloppy 12 --recon-precondition 12  --dim 24 24 24 64 --anisotropy 2.38 --mass -0.4125  --inv-type gmresdr --tol 1e-8 --prec-ritz single --df-nev 7 --df-max-search-dim 15 --df-deflation-grid 16 --nsrc 1 --precon-type mr