R Programming Efficiently - gizotso/R GitHub Wiki

R works best with vectorized processing, not loops!

R supports iteration over vectors, but not directly over non-vector sets (for example, a collection of separately named objects). We have to use indirect (but still convenient) means of accomplishing such looping. The lapply() and get() functions, for example, make this possible.
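As a minimal sketch of that idea, the snippet below loops over the names of several objects and uses get() to fetch each one. The object names (df_a, df_b, df_c) and their contents are hypothetical, chosen only for illustration.

```r
# Three hypothetical data frames that are not stored together in a vector or list
df_a <- data.frame(v = 1:3)
df_b <- data.frame(v = 4:6)
df_c <- data.frame(v = 7:9)

# Iterate over the object *names*; get() looks each object up by name
obj_names <- c("df_a", "df_b", "df_c")

# lapply() returns a list with one element per name
row_counts <- lapply(obj_names, function(nm) nrow(get(nm)))

# sapply() simplifies the result to a vector named after obj_names
totals <- sapply(obj_names, function(nm) sum(get(nm)$v))
```

In practice it is often simpler to keep such objects in a list from the start, so that lapply() can be used on them directly.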

apply, lapply

Explicit loops are generally slow, so it is better to avoid them when possible.
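To make the contrast concrete, here is a small sketch that squares every element of a vector twice: once with an explicit loop and once with a vectorized expression. The seed and vector size are arbitrary assumptions.

```r
set.seed(1)        # arbitrary seed, only to make the sketch reproducible
v <- rnorm(1000)

# Explicit loop: handle one element at a time
sq_loop <- numeric(length(v))
for (i in seq_along(v)) {
  sq_loop[i] <- v[i]^2
}

# Vectorized: the operator is applied to the whole vector at once
sq_vec <- v^2

# Both approaches produce the same values
same_result <- isTRUE(all.equal(sq_loop, sq_vec))
```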

Higher-order function: applies a given function to each element of a list (a "map" function).

apply() applies a function over the margins of a matrix or an array: the rows of a matrix (1) or the columns (2).
lapply() applies a function to each element of a list (for a data frame, each column) and returns a list.
sapply() is similar but the output is simplified. It may be a vector or a matrix depending on the function.
tapply() applies the function for each level of a factor, i.e. applies a function BY group.
vapply() is like sapply(), but takes a FUN.VALUE template that fixes the type and length of each result.
See also aggregate().
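The following sketch illustrates vapply(), tapply() and aggregate() side by side on a tiny, made-up data frame (the column names g and v are assumptions for the example).

```r
# Small made-up data frame: a grouping factor and a numeric column
d <- data.frame(
  g = factor(c("a", "a", "b", "b")),
  v = c(1, 2, 3, 5)
)

# vapply(): like sapply(), but FUN.VALUE declares that each result
# must be a single integer, otherwise an error is raised
n_unique <- vapply(d, function(col) length(unique(col)), FUN.VALUE = integer(1))

# tapply(): mean of v for each level of g, returned as a named array
means_tapply <- tapply(d$v, d$g, mean)

# aggregate(): the same grouping, but the result is a data frame
means_agg <- aggregate(v ~ g, data = d, FUN = mean)
```

vapply() is usually the safer choice inside functions, since the shape of its result does not depend on the input the way sapply()'s simplification does.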

apply

apply(): operates on the rows and/or the columns of a matrix

# apply(X, MARGIN, FUN, ...), where X is the matrix, MARGIN indicates whether the function is applied over the rows (1), the columns (2) or both (c(1, 2)), and FUN is the function (or an operator, in which case it must be given in double quotes)
x <- rnorm(10, -5, 0.1)
y <- rnorm(10, 5, 2)
X <- cbind(x, y) # the columns of the matrix X keep the names "x" and "y"

apply(X, 2, mean) # apply over the columns
	# x         y
	# -4.995027  5.455004
mean(x) ##[1] -4.995027
mean(y) ##[1] 5.455004
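lapply(X, mean) # will not work properly: the matrix is treated as a plain vector, so mean() is applied to each single cell
lapply(data.frame(X), mean) # works: each column is one list element (the output below comes from a different random draw)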
	# $x
	# [1] -5.001045

	# $y
	# [1] 4.74219

lapply() operates on a list

# lapply() applies a function to each element of a list
# sapply() is a variant that simplifies the result
forms <- list(y ~ x, y ~ poly(x, 2))
lapply(forms, lm)
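
Since lapply() returns a list of fitted models, a second pass with sapply() can pull one summary value out of each fit. The sketch below assumes the same x and y as above (regenerated here with an arbitrary seed so that it runs on its own); the names fits and n_coefs are introduced only for illustration.

```r
set.seed(42)                  # arbitrary seed, for reproducibility only
x <- rnorm(10, -5, 0.1)
y <- rnorm(10, 5, 2)

# One formula per model; lapply() fits each of them with lm()
forms <- list(y ~ x, y ~ poly(x, 2))
fits <- lapply(forms, lm)

# sapply() extracts a single number per fit: the count of coefficients
n_coefs <- sapply(fits, function(fit) length(coef(fit)))
```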

sapply()

# sapply : getting the number of unique values per data frame column
d1 <- data.frame(v1 = c(1,2,3,2), v2 = c("a","a","b","b"))
d1_nb_uniques <- sapply(d1, function(x) length(unique(x))) # vector
d1_nb_uniques <- rapply(d1, function(x) length(unique(x))) # vector
d1_nb_uniques <- lapply(d1, function(x) length(unique(x))) # list

N <- 10
x1 <- rnorm(N)
x2 <- rnorm(N) + x1 + 1
male <- rbinom(N, 1, .48)
y <- 1 + x1 + x2 + male + rnorm(N)
mydat <- data.frame(y, x1, x2, male)
> mydat
            y         x1          x2 male
1   1.6120728  0.4086131  0.04585462    0
2   5.6836625  0.8569270  3.06609627    1
3   6.0125131  1.6433535  2.00352930    1
4   1.1058325 -0.4295005 -0.68948938    1
5   4.6543794  2.1458691  2.31930332    0
6   2.7312134  0.1874176  0.39057549    1
7   5.7486838  1.1639011  3.85327129    0
8   3.5423505  0.4021064  2.23848358    0
9   0.4873023 -0.7594786  0.52266250    0
10 -1.5319902 -1.0027291 -0.33610895    0

lapply(mydat,mean) # returns a list
	# $y
	# [1] 3.004602

	# $x1
	# [1] 0.461648

	# $x2
	# [1] 1.341418

	# $male
	# [1] 0.4

apply(mydat,2,mean) # applies the function to each column, returns a vector
       y       x1       x2     male
3.004602 0.461648 1.341418 0.400000

sapply(mydat,mean)
       y       x1       x2     male
3.004602 0.461648 1.341418 0.400000
apply(mydat, 1, mean) # applies the function to each row (this output comes from a different random draw than the table above)
	# [1]  1.1654  2.8347 -0.9728  0.6512 -0.0696  3.9206 -0.2492  3.1060  2.0478  0.5116
tapply(mydat$y, mydat$male, mean) # applies the function to each level of the factor, i.e. mean(y) BY male (output also from a different draw)
    0     1
1.040 5.454

x = matrix(round(rnorm(100)),10,10)
col.sums = apply(x, 2, sum)
row.sums = apply(x, 1, sum)
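
For sums and means over rows or columns, base R also ships dedicated vectorized functions (colSums(), rowSums(), colMeans(), rowMeans()) that give the same result as the apply() calls above. A small sketch, using a hypothetical matrix m and an arbitrary seed:

```r
set.seed(1)                                   # arbitrary seed
m <- matrix(round(rnorm(100)), 10, 10)

# apply()-based versions, as in the example above
col_sums_apply <- apply(m, 2, sum)
row_sums_apply <- apply(m, 1, sum)

# Dedicated vectorized equivalents from base R
col_sums_fast <- colSums(m)
row_sums_fast <- rowSums(m)

# Both routes agree
cols_agree <- isTRUE(all.equal(col_sums_apply, col_sums_fast))
rows_agree <- isTRUE(all.equal(row_sums_apply, row_sums_fast))
```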

Performance, vectorized loops

Using plyr (its successor is dplyr):

http://blog.revolutionanalytics.com/2009/12/why-use-plyr.html

library(microbenchmark)
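
A possible way to use it is sketched below: timing apply() against colSums() on a hypothetical 100 x 100 matrix. The matrix size and the number of repetitions are arbitrary assumptions, and the code falls back to base R's system.time() when the microbenchmark package is not installed. No timings are shown because they depend on the machine.

```r
m <- matrix(rnorm(1e4), 100, 100)   # hypothetical test matrix

if (requireNamespace("microbenchmark", quietly = TRUE)) {
  # Run each expression 100 times and summarize the timings
  timing <- microbenchmark::microbenchmark(
    apply   = apply(m, 2, sum),
    colSums = colSums(m),
    times   = 100
  )
  print(timing)
} else {
  # Base-R fallback: time 100 repetitions of each expression
  print(system.time(for (k in 1:100) apply(m, 2, sum)))
  print(system.time(for (k in 1:100) colSums(m)))
}
```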

Iterators

Loops in R are generally slow. Iterators may be more efficient than loops. See this entry on the Revolution Computing blog: http://blog.revolution-computing.com/2009/07/counting-with-iterators.html

It has since moved to http://blog.revolutionanalytics.com/2009/07/counting-with-iterators.html
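
To give an idea of what an iterator is, here is a minimal base-R sketch built with a closure: values are produced on demand instead of materializing the whole sequence up front. make_counter() is a hypothetical helper written for this page, not the code from the linked post; the iterators package on CRAN offers ready-made equivalents such as icount() together with nextElem().

```r
# make_counter() is a hypothetical helper, not part of any package.
# It returns two closures that share the private counter i.
make_counter <- function(n) {
  i <- 0
  list(
    has_next = function() i < n,
    next_val = function() {
      i <<- i + 1   # update the counter in the enclosing environment
      i
    }
  )
}

it <- make_counter(3)
seen <- c()
while (it$has_next()) {
  seen <- c(seen, it$next_val())   # values arrive one at a time
}
```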