Computation on empty vectors - lmmx/devnotes GitHub Wiki
Backup of wiki.r-project.org/rwiki/doku.php?id=tips:surprises:emptysetfuncs, which various pages online (such as this archived mailing-list conversation) refer to but which is no longer available. A single Wayback Machine capture has saved it for posterity, but the site is not indexed by Google. Below is the page as recorded in August 2007.
— Tony Plate 2006/01/09 [last tested: 2006/02/01 on R 2.2.1]
It surprises some people that the sum of the empty set is zero and the product of the empty set is one, i.e.:

    > sum(numeric(0))
    [1] 0
    > prod(numeric(0))
    [1] 1

(`numeric(0)` creates a numeric vector of zero length.)
There are good reasons for this, and it always will be like this. Here are some of the reasons:
- Or more generally, as Thomas Lumley points out: The output of `sum` and `prod` is always of length 1, so `sum(numeric(0))` and `prod(numeric(0))` should be of length 1 (or give an error). It is convenient that `sum(c(x, y))` is equal to `sum(x) + sum(y)` and that `prod(c(x, y))` is equal to `prod(x) * prod(y)`, which motivates making `sum(numeric(0))` give 0 and `prod(numeric(0))` give 1.
- Code that inadvertently (or perhaps intentionally) sums empty sets gives sensible results, e.g., (from Duncan Murdoch):

      > x <- 1:10
      > sum(x)
      [1] 55
      > sum(x[x>5])
      [1] 40
      > sum(x[x>10])
      [1] 0

- The general principle is that a function made by ‘reducing’ a vector with an associative binary operator, when applied to an empty vector, gives the identity element for the operator. E.g., the identity element for `+` is zero, for `*` is one, for AND is TRUE. (from Thomas Lumley on R-help).
- Here’s an excerpt from what Wikipedia says about Operations on the empty set:
Operations performed on the empty set (as a set of things to be operated upon) can also be confusing. (Such operations are nullary operations.) For example, the sum of the elements of the empty set is zero, but the product of the elements of the empty set is one (see empty product). This may seem odd, since there are no elements of the empty set, so how could it matter whether they are added or multiplied (since “they” do not exist)? Ultimately, the results of these operations say more about the operation in question than about the empty set. For instance, notice that zero is the identity element for addition, and one is the identity element for multiplication. (http://en.wikipedia.org/wiki/Empty_set)
- An additional example of surprising behavior when dealing with logical comparisons of empty sets (from Marc Schwartz):

      > all(c(NA, NA, NA) > NA, na.rm = TRUE)
      [1] TRUE

  By evaluating the logical comparison within the parentheses and then applying `all()` to the result, we get:

      > x <- c(NA, NA, NA) > NA
      > x
      [1] NA NA NA
      > x <- x[!is.na(x)]  # remove NA's from the result
      > x
      logical(0)
      > all(logical(0))
      [1] TRUE

  `logical(0)` is an empty set, so `all()` applied to it is TRUE.
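
The concatenation property and the identity-element principle from the list above can be checked directly. The following is a minimal sketch, not part of the original page; the helper `splits_consistently` is a hypothetical name introduced only for illustration.

```r
# Hypothetical helper (not from the original page): checks that a reduction f
# agrees with combining the reductions of two pieces, f(c(x, y)) == op(f(x), f(y)).
splits_consistently <- function(f, op, x, y) {
  isTRUE(all.equal(f(c(x, y)), op(f(x), f(y))))
}

x <- c(2, 3, 5)
empty <- numeric(0)

# With y empty, the property can only hold if f(empty) is the identity of op.
splits_consistently(sum,  `+`, x, empty)   # TRUE, because sum(empty) is 0
splits_consistently(prod, `*`, x, empty)   # TRUE, because prod(empty) is 1

# The same principle for logical reductions: TRUE is the identity for AND,
# FALSE is the identity for OR.
all(logical(0))   # TRUE
any(logical(0))   # FALSE

# Reduce() does not know the identity of an arbitrary operator, so it has to
# be supplied through the init argument to get the same behaviour.
Reduce(`+`, empty, 0)   # 0
```

If `sum(numeric(0))` returned anything other than 0, the first check would fail for every `x`, which is the practical content of the argument attributed to Thomas Lumley.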
To expand on the comments from the Wikipedia page on Empty Sets referenced above:
The empty set is not the same thing as nothing; it is a set with nothing inside it, and a set is something. This often causes difficulty among those who first encounter it. It may be helpful to think of a set as a bag containing its elements; an empty bag may be empty, but the bag itself certainly exists.
By the definition of subset, the empty set is a subset of any set A, as every element x of {} belongs to A. If it is not true that every element of {} is in A, there must be at least one element of {} that is not present in A. Since there are no elements of {} at all, there is no element of {} that is not in A, leading us to conclude that every element of {} is in A and that {} is a subset of A. Any statement that begins “for every element of {}” is not making any substantive claim; it is a vacuous truth. This is often paraphrased as “everything is true of the elements of the empty set.”
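
The vacuous-truth argument quoted above carries over to R vectors. A small sketch, assuming a hypothetical helper `is_subset` (not from the original page) that tests whether every element of one vector occurs in another:

```r
# Hypothetical helper: TRUE when every element of a is also an element of b.
is_subset <- function(a, b) all(a %in% b)

A <- c(1, 2, 3)

is_subset(c(1, 3), A)      # TRUE
is_subset(c(1, 4), A)      # FALSE: 4 is not in A
is_subset(numeric(0), A)   # TRUE: no element of the empty vector is missing from A

# "Everything is true of the elements of the empty set": both of these
# contradictory-looking statements hold, because there is nothing to check.
all(numeric(0) > 5)        # TRUE
all(numeric(0) < 5)        # TRUE
```

Here `numeric(0) %in% A` evaluates to `logical(0)`, so the result follows from `all(logical(0))` being TRUE.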
Another example of interest:

    R> sum(NA)
    [1] NA
    R> sum(NA, na.rm=TRUE)
    [1] 0

The latter summation is the sum of an empty set and is (by the above rules) 0.
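
The same reasoning applies to the other reductions once `na.rm = TRUE` has removed every element. The sketch below also shows one possible way to tell "everything was missing" apart from a genuine total of zero; `sum_or_na` is a hypothetical helper, not something the original page or base R provides.

```r
x <- c(NA, NA)

# After NA removal nothing is left, so each reduction returns its identity.
sum(x, na.rm = TRUE)    # 0
prod(x, na.rm = TRUE)   # 1
all(x, na.rm = TRUE)    # TRUE
any(x, na.rm = TRUE)    # FALSE

# Hypothetical helper: return NA instead of 0 when no non-missing values remain.
sum_or_na <- function(x) {
  kept <- x[!is.na(x)]
  if (length(kept) == 0L) NA_real_ else sum(kept)
}

sum_or_na(c(NA, NA))     # NA
sum_or_na(c(1, NA, 2))   # 3
```

Whether 0 or NA is the more useful answer for an all-missing input depends on the analysis; the default behaviour is the one that follows from the rules above.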