QUDA Accessor Framework In Dslash Kernels. [WIKI PAGE WORK IN PROGRESS] - lattice/quda GitHub Wiki

QUDA's Dslash implementation and structure.

All of QUDA's Dslash kernels are built on a highly abstracted base class defined in include/dslash.h. This base class handles how each of the interior and exterior kernels is constructed at compile time, and it also handles launching those kernels at run time: there is a single launch function in the base class that every Dslash type calls. The specifics of each Dslash operator are defined in the files lib/dslash4_DSLASH_TYPE.cu (* see the note at the end of this section) and include/kernels/dslash_DSLASH_TYPE.cuh, where DSLASH_TYPE can be any of the types currently offered in QUDA. For clarity, we list them here as they appear in the library:

wilson
wilson_clover
wilson_clover_preconditioned
twisted_mass
twisted_mass_preconditioned
twisted_clover
twisted_clover_preconditioned
ndeg_twisted_mass
ndeg_twisted_mass_preconditioned
staggered
improved_staggered
domain_wall_4d
domain_wall_5d
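To make the division of labour described above concrete, here is a minimal CPU-only sketch. It assumes a CRTP-style design, which may not match QUDA's actual base class in include/dslash.h. The base class owns the single launch function and selects the interior or exterior kernel through a compile-time template argument. The derived operator supplies only its own kernel body. DslashBase, KernelType, WilsonSketch and the trace strings are hypothetical names, and running one exterior kernel per partitioned dimension is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical kernel categories. The category is a compile-time template
// argument, so each variant is compiled as a separate instantiation.
enum class KernelType { Interior, Exterior };

// Hypothetical CRTP base class: holds the boilerplate shared by every
// operator and exposes a single launch() entry point.
template <typename Derived> class DslashBase {
public:
  // Runs the interior kernel, then one exterior kernel per partitioned
  // dimension (assumed 0..4). Returns a trace of the kernels applied.
  std::string launch(int partitioned_dims) {
    assert(partitioned_dims >= 0 && partitioned_dims <= 4);
    std::string trace = run<KernelType::Interior>(-1);
    for (int dim = 0; dim < partitioned_dims; dim++) trace += run<KernelType::Exterior>(dim);
    return trace;
  }

private:
  template <KernelType type> std::string run(int dim) {
    // Hand off to the operator-specific code supplied by the derived class.
    return static_cast<Derived *>(this)->template apply<type>(dim);
  }
};

// Operator-specific part: only the "mathematics" of the operator lives here.
struct WilsonSketch : DslashBase<WilsonSketch> {
  template <KernelType type> std::string apply(int dim) {
    if constexpr (type == KernelType::Interior) return "wilson:interior;";
    else return "wilson:exterior" + std::to_string(dim) + ";";
  }
};
```

Every operator built this way calls the same launch(). A second operator would differ only in its apply() body.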

A minor exception to this naming convention is the pair of first- and second-order gauge-covariant derivative operators, whose names are not prefixed with dslash. They appear simply as:

covDev
laplace

Separating the abstracted, GPU-oriented boilerplate in the Dslash base class from the code that represents the mathematics of the discrete operator makes it easier for a QUDA developer to augment an existing Dslash operator or to create a brand new one. We shall use the well-known Wilson operator, defined in lib/dslash4_wilson.cu and include/kernels/dslash_wilson.cuh, as a pedagogical example of how to use this new framework.
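The benefit of this separation can be seen in a toy setting. The following CPU-only sketch is not QUDA code. A generic driver (apply_stencil) owns the site loop and the periodic boundary handling, while each operator supplies only a per-site rule. ToyWilson is a scalar, one-dimensional stand-in for a Wilson-like hopping term, with unit gauge links and no spin structure. ToyLaplace shows that a new operator needs only a new rule. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Generic "boilerplate": loops over sites with periodic neighbours and
// delegates the per-site mathematics to the operator functor.
template <typename Op>
std::vector<double> apply_stencil(const Op &op, const std::vector<double> &in) {
  const std::size_t n = in.size();
  assert(n > 0);
  std::vector<double> out(n);
  for (std::size_t x = 0; x < n; x++) {
    const double fwd = in[(x + 1) % n];     // neighbour at x+1 (periodic)
    const double bwd = in[(x + n - 1) % n]; // neighbour at x-1 (periodic)
    out[x] = op(in[x], fwd, bwd);
  }
  return out;
}

// Toy operator mathematics: out = in - kappa * (fwd + bwd).
// A scalar 1D stand-in, not the real Wilson operator.
struct ToyWilson {
  double kappa;
  double operator()(double site, double fwd, double bwd) const { return site - kappa * (fwd + bwd); }
};

// A "new" operator reusing the same driver: the discrete 1D Laplacian.
struct ToyLaplace {
  double operator()(double site, double fwd, double bwd) const { return fwd + bwd - 2.0 * site; }
};
```

For example, apply_stencil(ToyWilson{0.5}, {1, 2, 3, 4}) uses the same loop as apply_stencil(ToyLaplace{}, {1, 2, 3, 4}). Only the functor changes.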

(* The number 4 in these filenames is a placeholder that separates the new code from the legacy files. Once the legacy code is removed from the library, the 4s will be dropped.)

Wilson Dslash: A worked example.

To get a feel for how the new framework applies the fermion operator, we shall follow the calling sequence step by step, noting important features along the way and then returning to explain them in more detail. To keep the structure clear, we distinguish between features common to all the Dslash operators and those specific to Wilson.

We start in the file dirac_wilson.cpp, with the function

void DiracWilson::Dslash(...)

from which all calls to the Wilson Dslash are initiated. This function calls ApplyWilson, defined in lib/dslash4_wilson.cu, which initiates the GPU kernel. Aside from some perfunctory internal checks, ApplyWilson calls the instantiate function defined in include/dslash.h alongside the Dslash base class:

instantiate<WilsonApply, WilsonReconstruct>(out, in, U, a, x, parity, dagger, comm_override, profile);

Notice that instantiate is defined only in dslash.h and takes WilsonApply and WilsonReconstruct as template parameters. Because WilsonApply is passed as a template argument, it is a class template rather than an ordinary function, and "calling" it means constructing it. This call is the first in a cascade of instantiate templates defined in dslash.h, which are essentially hidden from both the user and a developer creating a new fermion operator. We shall return to the details of this template later. For now, it is sufficient to note that after this call to instantiate, scope passes to WilsonApply.
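The general technique behind such a cascade can be sketched as follows. Runtime parameters are converted into compile-time template arguments one at a time, and the operator's Apply class is constructed once every parameter is a constant. Here those parameters are the precision and the gauge-field reconstruct type. This is a simplified illustration under stated assumptions, not the code in include/dslash.h. Precision, Recon, instantiate_recon, instantiate_apply, ToyWilsonApply, ToyWilsonReconstruct and apply_toy_wilson are all hypothetical names. Fixing nColor = 3 and listing exactly three reconstruct types are assumptions of this sketch.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical runtime descriptors; QUDA uses its own enums for these.
enum class Precision { Single, Double };
enum class Recon : int { R18 = 18, R12 = 12, R8 = 8 };

// Operator-specific list of supported reconstruct types (hypothetical).
struct ToyWilsonReconstruct {
  static constexpr std::array<Recon, 3> recon = {Recon::R18, Recon::R12, Recon::R8};
};

// Operator-specific "apply" class template. Its constructor does the work;
// here it only records which instantiation was selected.
template <typename Float, int nColor, Recon recon> struct ToyWilsonApply {
  ToyWilsonApply(std::string &trace, int parity) {
    // A real implementation would set up kernel arguments and launch here.
    trace = std::string(std::is_same_v<Float, double> ? "double" : "float") +
            ",nColor=" + std::to_string(nColor) +
            ",recon=" + std::to_string(static_cast<int>(recon)) +
            ",parity=" + std::to_string(parity);
  }
};

// Step 3: every parameter is now a compile-time constant; construct Apply.
template <template <typename, int, Recon> class Apply, typename Float, Recon recon, typename... Args>
void instantiate_apply(Args &&...args) {
  constexpr int nColor = 3; // assumption: only three colors in this sketch
  [[maybe_unused]] Apply<Float, nColor, recon> apply(std::forward<Args>(args)...);
}

// Step 2: turn the runtime reconstruct type into a template argument,
// trying only the types the operator lists as supported.
template <template <typename, int, Recon> class Apply, typename ReconList, typename Float, typename... Args>
void instantiate_recon(Recon recon, Args &&...args) {
  if (recon == ReconList::recon[0])
    instantiate_apply<Apply, Float, ReconList::recon[0]>(std::forward<Args>(args)...);
  else if (recon == ReconList::recon[1])
    instantiate_apply<Apply, Float, ReconList::recon[1]>(std::forward<Args>(args)...);
  else if (recon == ReconList::recon[2])
    instantiate_apply<Apply, Float, ReconList::recon[2]>(std::forward<Args>(args)...);
  else
    throw std::invalid_argument("unsupported reconstruct type");
}

// Step 1: turn the runtime precision into a floating-point type.
template <template <typename, int, Recon> class Apply, typename ReconList, typename... Args>
void instantiate(Precision prec, Recon recon, Args &&...args) {
  switch (prec) {
  case Precision::Double: instantiate_recon<Apply, ReconList, double>(recon, std::forward<Args>(args)...); break;
  case Precision::Single: instantiate_recon<Apply, ReconList, float>(recon, std::forward<Args>(args)...); break;
  }
}

// Hypothetical entry point playing the role of ApplyWilson.
void apply_toy_wilson(std::string &trace, Precision prec, Recon recon, int parity) {
  assert(parity == 0 || parity == 1); // stand-in for the internal checks
  instantiate<ToyWilsonApply, ToyWilsonReconstruct>(prec, recon, trace, parity);
}
```

In this sketch, each runtime branch produces its own compile-time instantiation of the Apply class. That is why a single call to instantiate can fan out into many compiled variants. The author of a new operator writes only the Apply class template and its list of supported reconstruct types.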