Design & Implementation Guidelines - HenrikBengtsson/matrixStats GitHub Wiki
Requirement
A requirement for the native implementation is to:
- avoid calling ISNAN(x) as far as possible whenever x is a double.
This is because testing a double for missing values is fairly expensive.  It is often better to let the floating-point arithmetic of the CPU take care of this; it will correctly propagate NAs through additions, subtractions, multiplications, divisions, etc.  This also means that we cannot do early stopping for na.rm=FALSE cases.  See the Strategy section below for an example.
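As a minimal, standalone sketch of this propagation (plain C, not matrixStats code; sum_unchecked is a hypothetical name, and C's NAN is assumed here as a stand-in for R's missing double), a sum with no per-element test still comes out missing when any input is missing:

```c
#include <math.h>

/* Hypothetical helper: sums x[0..n-1] without testing any element
   for missing values. A NaN in the input propagates through the
   additions, so the result is NaN whenever any element is NaN. */
double sum_unchecked(const double *x, int n) {
  double sum = 0.0;
  for (int i = 0; i < n; i++) {
    sum += x[i];  /* no ISNAN()-style test inside the loop */
  }
  return sum;
}
```

Calling sum_unchecked() on {1.0, NAN, 3.0} gives a NaN, while {1.0, 2.0, 3.0} gives 6. The loop cannot stop early at the NaN, because nothing inside it ever looks for one.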
Note that testing for missing values when x is an integer is not expensive.  This is because the test is a simple equality comparison, which is very cheap.  More precisely, it is done as x == NA_INTEGER.  Since testing for missing values is cheap for integers, we can also use it for early stopping whenever na.rm=FALSE.  In fact, we have to test for NA_INTEGER, because the CPU does not handle it for us.  This is because missing values for integers are not part of the IEEE standard(s).  Instead, in R they are defined as the smallest possible 4-byte signed integer, i.e.
LibExtern int	 R_NaInt;	/* NA_INTEGER:= INT_MIN currently */
#define NA_INTEGER	R_NaInt
This value is -2^31 = -2147483648. Note that the smallest possible non-missing integer in R is this value plus one, i.e. -.Machine$integer.max = -2147483647.  Also, if native code did not account for this special value, it would be treated as just another value, e.g. NA_INTEGER + 1 == -2147483647.
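The following standalone sketch illustrates the point in plain C. It assumes a 32-bit int; NA_INT_STANDIN, is_na_int, add_one_unchecked and add_one_checked are hypothetical names used only for illustration, with INT_MIN standing in for R's NA_INTEGER:

```c
#include <limits.h>

/* Hypothetical stand-in for R's NA_INTEGER (INT_MIN, assuming 32-bit int). */
#define NA_INT_STANDIN INT_MIN

/* The missing-value test is a single equality comparison. */
int is_na_int(int x) {
  return x == NA_INT_STANDIN;
}

/* Ignores the sentinel: the CPU treats it as an ordinary number.
   (Overflow handling for INT_MAX is omitted in this sketch.) */
int add_one_unchecked(int x) {
  return x + 1;
}

/* Respects the sentinel: the test has to be written explicitly. */
int add_one_checked(int x) {
  return is_na_int(x) ? NA_INT_STANDIN : x + 1;
}
```

Here add_one_unchecked(NA_INT_STANDIN) returns -2147483647, an ordinary integer, whereas add_one_checked(NA_INT_STANDIN) keeps the result missing.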
Strategy
A typical approach to avoid the overhead of ISNAN(x), which only exists for doubles, is to implement the algorithms slightly differently for integers and doubles.  We use preprocessor macros for this, e.g.
#if X_TYPE == 'i'
      if (!X_ISNAN(value)) {
        sum += (LDOUBLE)value;
      } else if (!narm) {
        sum = R_NaReal;
        break;
      }
#elif X_TYPE == 'r'
      if (!narm || !X_ISNAN(value)) {
        sum += (LDOUBLE)value;
      }
#endif
Here we see that for integers (X_TYPE == 'i') we always test for missing values in order to handle them properly. At the same time, we can typically do "early stopping" when there is a missing value and na.rm=FALSE.
For doubles (X_TYPE == 'r') we minimize the testing for missing values by only testing for them when na.rm=TRUE; hence the (!narm || !X_ISNAN(value)) construct.
BTW, in order to have a somewhat more consistent notation, we use X_ISNAN(x) for both integers and doubles by defining:
  #define X_ISNAN(x) (x == NA_INTEGER)
for integers (X_TYPE == 'i') and
  #define X_ISNAN(x) ISNAN(x) /* NA or NaN */
for doubles (X_TYPE == 'r').
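To see the two branches side by side outside of R, here is a rough standalone sketch with each branch written out as its own function instead of being selected by X_TYPE. It is not the matrixStats implementation: sum_int_sketch, sum_dbl_sketch and NA_INT_SKETCH are hypothetical names, INT_MIN stands in for NA_INTEGER (assuming a 32-bit int), and C's NAN and isnan() stand in for R_NaReal and ISNAN():

```c
#include <limits.h>
#include <math.h>

/* Hypothetical stand-in for NA_INTEGER (assumes 32-bit int). */
#define NA_INT_SKETCH INT_MIN

/* Integer branch: every element is tested, which is cheap, and the
   loop stops early on the first missing value when narm is false. */
double sum_int_sketch(const int *x, int n, int narm) {
  long double sum = 0.0L;
  for (int i = 0; i < n; i++) {
    int value = x[i];
    if (value != NA_INT_SKETCH) {
      sum += (long double)value;
    } else if (!narm) {
      return NAN;  /* stand-in for R_NaReal; early stopping */
    }
  }
  return (double)sum;
}

/* Double branch: elements are tested only when narm is true.
   Otherwise missing values propagate through the addition, so
   there is no early stopping. */
double sum_dbl_sketch(const double *x, int n, int narm) {
  long double sum = 0.0L;
  for (int i = 0; i < n; i++) {
    double value = x[i];
    if (!narm || !isnan(value)) {
      sum += (long double)value;
    }
  }
  return (double)sum;
}
```

With narm set to 0, both functions return a NaN for input containing a missing value; with narm set to 1, both skip it. The difference is only in how much testing happens per element and whether the loop can exit early.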