Hadley Wickham Advanced R - statnet/computing GitHub Wiki
Notes on Hadley Wickham - Advanced R
[Hadley Wickham's Advanced R Book](http://adv-r.had.co.nz/)
Data structures
Vectors
- Atomic vectors
  - All elements of an atomic vector have the same type; a list can contain elements of different types
  - Three properties: type (`typeof()`), length (`length()`), and attributes (`attributes()`)
  - `is.atomic(x) || is.list(x)` tests whether an object is a vector
  - Types: logical, integer, double (numeric), and character, as well as complex and raw
  - Force coercion of a vector into another type with `as.XX()` (for example `as.character()`)
- Lists are recursive vectors: `list(1, 3)`
  - Include data frames and linear model objects
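As a minimal sketch of the points above (the variable names `dbl`, `mixed`, and `lst` are made up for illustration), the three properties and the coercion rules can be checked interactively:

```r
# Atomic vector: every element has the same type
dbl <- c(1, 2.5, 3)
typeof(dbl)       # "double"
length(dbl)       # 3
attributes(dbl)   # NULL, no attributes have been set

# Combining different types coerces everything to the most flexible type
mixed <- c(1L, TRUE, "a")
typeof(mixed)     # "character"

# Explicit coercion with an as.*() function
ints <- as.integer(c("1", "2"))

# A list can hold elements of different types
lst <- list(1, "a", TRUE)
typeof(lst)                      # "list"
is.atomic(lst) || is.list(lst)   # TRUE: a list is still a vector
```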
Attributes
- Metadata about objects
- Behave like a named list: get all of them with `attributes()` or a single one with `attr(y, "my_attribute")`
- Names, dimensions, and class are the most important
- Names: `x <- ...` followed by `names(x) <- c("a", "b", "c")`
  - Great for subsetting
- Factors
  - Built on top of integer vectors using two attributes, `class()` and `levels()`
  - `sex_factor <- factor(sex_char, levels = c("m", "f"))`
  - Convert factors to character vectors if you need string-like behavior
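The attribute and factor notes above can be sketched as follows. The contents of `sex_char` are an assumption chosen for illustration; the notes do not say what it holds.

```r
# Attach and read back an arbitrary attribute
y <- 1:3
attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")

# Names are an attribute too, and make subsetting by name possible
names(y) <- c("a", "b", "c")
y[["b"]]              # 2

# A factor is an integer vector with class and levels attributes
sex_char <- c("m", "m", "m")   # hypothetical data
sex_factor <- factor(sex_char, levels = c("m", "f"))
typeof(sex_factor)    # "integer"
class(sex_factor)     # "factor"
levels(sex_factor)    # "m" "f"
table(sex_factor)     # level "f" is counted even though it never occurs

# Convert back to character for string-like behavior
as.character(sex_factor)
```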
Matrices and Arrays
- A matrix is an array with two dimensions; both are atomic vectors with a `dim` attribute
a <- matrix(1:6, ncol = 3, nrow = 2)
b <- array(1:12, c(2, 3, 2))
c <- 1:6
dim(c) <- c(3, 2)
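A short sketch, using the made-up names `m1`, `m2`, and `arr`, of why setting `dim()` on a vector is equivalent to calling `matrix()`:

```r
# Build the same 3 x 2 matrix in two ways
m1 <- matrix(1:6, nrow = 3, ncol = 2)
m2 <- 1:6
dim(m2) <- c(3, 2)        # adding a dim attribute turns the vector into a matrix
identical(m1, m2)         # TRUE

# A three-dimensional array is an array but not a matrix
arr <- array(1:12, c(2, 3, 2))
dim(arr)                  # 2 3 2
is.matrix(arr)            # FALSE
is.array(m1)              # TRUE: every matrix is an array
```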
Data frames
- Two-dimensional
- Can be combined using `cbind()` and `rbind()`, but don't `cbind()` vectors together (creates a matrix)
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
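To illustrate the `cbind()` pitfall, here is a small sketch; `m`, `df2`, and `df3` are names introduced only for this example:

```r
# cbind() on plain vectors builds a matrix, so every value is coerced to one type
m <- cbind(x = 1:3, y = c("a", "b", "c"))
is.matrix(m)          # TRUE
typeof(m)             # "character": the integers were coerced

# data.frame() keeps each column's own type
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
is.data.frame(df)     # TRUE
typeof(df$x)          # "integer"

# cbind() and rbind() are fine once one argument is already a data frame
df2 <- cbind(df, z = 3:1)
df3 <- rbind(df, data.frame(x = 4L, y = "d"))
```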
Subsetting
Data types
- Atomic vectors
  - Positive integers return elements at the specified positions: `x[c(3, 1)]`
  - Negative integers omit elements at the specified positions: `x[-c(3, 1)]`
  - Logical vectors select elements where the corresponding value is `TRUE`: `x[x > 3]`
  - Nothing returns the original vector: `x[]`
  - Zero returns a zero-length vector: `x[0]`
  - Character vectors return elements with matching names; matching with `[` is exact: `z[c("a", "d")]`
- Lists
  - Work the same way as atomic vectors: `[` always returns a list, while `[[` and `$` let you pull components out of the list
- Matrices and Arrays
  - Can be subset with multiple vectors, with a single vector, or with a matrix
a <- matrix(1:9, nrow = 3)
colnames(a) <- c("A", "B", "C")
a[c(T, F, T), c("B", "A")]
vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")
vals
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] "1,1" "1,2" "1,3" "1,4" "1,5"
#> [2,] "2,1" "2,2" "2,3" "2,4" "2,5"
#> [3,] "3,1" "3,2" "3,3" "3,4" "3,5"
#> [4,] "4,1" "4,2" "4,3" "4,4" "4,5"
#> [5,] "5,1" "5,2" "5,3" "5,4" "5,5"
vals[c(4, 15)]
vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")
select <- matrix(ncol = 2, byrow = TRUE, c(
1, 1,
3, 1,
2, 4
))
vals[select]
#> [1] "1,1" "3,1" "2,4"
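Subsetting a matrix with a single vector treats it as one long vector stored column by column. The sketch below checks this against the `vals` example and uses base R's `arrayInd()` to translate the linear positions into row and column pairs; `idx` is a name introduced here.

```r
vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")

# Positions 4 and 15 count down the columns: row 4 of column 1, row 5 of column 3
vals[c(4, 15)]
#> [1] "4,1" "5,3"

# Translate the linear positions into (row, column) pairs
idx <- arrayInd(c(4, 15), dim(vals))
idx

# Subsetting with that two-column matrix picks the same elements
vals[idx]
```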
* Data frames
  * Subsetting with a single vector makes them behave like lists; subsetting with two vectors makes them behave like matrices

        df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])
        df[c(1, 3), ]

    Like a list:

        df[c("x", "z")]

    Like a matrix:

        df[, c("x", "z")]

* S3 objects use the same techniques as above
* S4 objects use `@` (equivalent to `$`) and `slot()` (equivalent to `[[`)
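The subsetting rules for atomic vectors, lists, and S4 objects listed in this section can be sketched together. The example values and the `Person` class are assumptions made up for illustration.

```r
x <- c(2.1, 4.2, 3.3, 5.4)
x[c(3, 1)]     # positions 3 and 1, in that order
x[-c(3, 1)]    # everything except positions 3 and 1
x[x > 3]       # elements where the condition is TRUE
x[0]           # zero-length vector

z <- c(a = 1, b = 2, d = 4)
z[c("a", "d")] # exact name matching

# Lists: `[` keeps the list wrapper, `[[` and `$` pull the component out
lst <- list(a = 1:3, b = "text")
str(lst["a"])
lst[["a"]]
lst$b

# S4: `@` and slot() play the roles of `$` and `[[` (hypothetical class)
setClass("Person", representation(name = "character"))
p <- new("Person", name = "Ada")
p@name
slot(p, "name")
```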
#### Subsetting operators
* For a list, `[[` returns a single component, while `[` always returns a list
* If list `x` is a train carrying objects, `x[4:6]` is a train of cars 4-6, while `x[[5]]` is the object in car 5
* `[[` can also extract columns from data frames: `mtcars[[1]]`
* *Simplifying* returns the simplest possible data structure to represent the output, while *preserving* keeps the structure of the output the same as the input
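
| | Simplifying | Preserving |
|---|---|---|
| Vector | `x[[1]]` | `x[1]` |
| List | `x[[1]]` | `x[1]` |
| Factor | `x[1:4, drop = T]` | `x[1:4]` |
| Array | `x[1, ]` or `x[, 1]` | `x[1, , drop = F]` or `x[, 1, drop = F]` |
| Data frame | `x[, 1]` or `x[[1]]` | `x[, 1, drop = F]` or `x[1]` |
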
* Simplifying
  * Vector: removes names

        x <- c(a = 1, b = 2)
        x[1]
        #> a
        #> 1
        x[[1]]
        #> [1] 1

  * List: returns the object inside the list, not a single-element list

        y <- list(a = 1, b = 2)
        str(y[1])
        #> List of 1
        #>  $ a: num 1
        str(y[[1]])
        #>  num 1

  * Factor: drops unused levels

        z <- factor(c("a", "b"))
        z[1]
        #> [1] a
        #> Levels: a b
        z[1, drop = TRUE]
        #> [1] a
        #> Levels: a

  * Matrix or array: if any dimension has length 1, it is dropped

        a <- matrix(1:4, nrow = 2)
        a[1, , drop = FALSE]
        #>      [,1] [,2]
        #> [1,]    1    3
        a[1, ]
        #> [1] 1 3

  * Data frame: if the output is a single column, returns a vector instead of a data frame

        df <- data.frame(a = 1:2, b = 1:2)
        str(df[1])
        #> 'data.frame':    2 obs. of  1 variable:
        #>  $ a: int  1 2
        str(df[[1]])
        #>  int [1:2] 1 2
        str(df[, "a", drop = FALSE])
        #> 'data.frame':    2 obs. of  1 variable:
        #>  $ a: int  1 2
        str(df[, "a"])
        #>  int [1:2] 1 2

* `$`: shorthand operator for `[[` that uses partial matching
* Used to access variables in a data frame
* *Not to be used* when name of a column is stored in a variable, rather use `[[`
  * With `var <- "cyl"`, `mtcars$var` yields `NULL`; use `mtcars[[var]]` instead
* Missing and out-of-bounds indices (e.g. the 5th element of a length-four vector) behave differently: on an atomic vector, `[` returns `NA` while `[[` throws an error
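A minimal sketch of the `$` and out-of-bounds behavior described above, using the built-in `mtcars` data; `lst` and `v` are hypothetical objects introduced for the example:

```r
# `$` looks for a column literally called "var", so it finds nothing
var <- "cyl"
is.null(mtcars$var)      # TRUE
head(mtcars[[var]])      # `[[` evaluates var and returns the cyl column

# `$` does partial matching on lists, `[[` matches exactly by default
lst <- list(abc = 1)
lst$a                    # 1, partial match on "abc"
lst[["a"]]               # NULL, no component is named exactly "a"

# Out-of-bounds index on an atomic vector
v <- 1:4
v[5]                     # NA
oob <- tryCatch(v[[5]], error = function(e) "error")
oob                      # "error": `[[` refuses the out-of-bounds index
```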
#### Subsetting and assignment

    x <- 1:5
    x[c(1, 2)] <- 2:3

    df <- data.frame(a = c(1, 10, NA))
    df$a[df$a < 5] <- 0

    mtcars[] <- lapply(mtcars, as.integer) ## Stays a data frame
    mtcars <- lapply(mtcars, as.integer) ## Becomes a list

    x <- list(a = 1, b = 2)
    x[["b"]] <- NULL ## Removes component b from the list

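The assignment forms above can be tried without overwriting the built-in `mtcars` by working on a copy. This is a sketch under that assumption; `cars`, `cars_list`, and `y` are names made up here.

```r
# Empty subsetting on the left keeps the original structure
cars <- mtcars                       # local copy, mtcars itself stays untouched
cars[] <- lapply(cars, as.integer)
is.data.frame(cars)                  # TRUE

# Plain assignment replaces the object with whatever lapply() returns
cars_list <- lapply(mtcars, as.integer)
is.data.frame(cars_list)             # FALSE, it is a plain list

# Conditional assignment: rows where the test is NA are left alone
df <- data.frame(a = c(1, 10, NA))
df$a[df$a < 5] <- 0
df$a                                 # 0 10 NA

# Assigning NULL with `[[` removes a list component
x <- list(a = 1, b = 2)
x[["b"]] <- NULL
names(x)                             # "a"

# To store a literal NULL instead, use `[` with list(NULL)
y <- list(a = 1)
y["b"] <- list(NULL)
length(y)                            # 2
```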
#### Applications
* Duplicate info table to have a row for each value in grades

      grades <- c(1, 2, 2, 3, 1)
      info <- data.frame(
        grade = 3:1,
        desc = c("Excellent", "Good", "Poor"),
        fail = c(F, F, T)
      )

      # Using match()
      id <- match(grades, info$grade)
      info[id, ]

      # Using row names
      rownames(info) <- info$grade
      info[as.character(grades), ]

* Random samples/bootstrap
  * Use integer indices generated by `sample()` to perform random sampling or bootstrapping of a vector or data frame

        df <- data.frame(x = rep(1:3, each = 2), y = 6:1, z = letters[1:6])
        set.seed(10)
        df[sample(nrow(df)), ]                     # Randomly reorder
        df[sample(nrow(df), 3), ]                  # Select 3 random rows
        df[sample(nrow(df), 6, replace = TRUE), ]  # Select 6 bootstrap replicates

* Ordering

      x <- c("b", "c", "a")
      order(x)
      x[order(x)]

      df2 <- df[sample(nrow(df)), 3:1]  # Shuffle rows and reverse the columns
      df2[order(df2$x), ]               # Order rows by x
      df2[, order(names(df2))]          # Order columns by name

* Expanding aggregated counts

      df <- data.frame(x = c(2, 4, 1), y = c(9, 11, 6), n = c(3, 5, 1))
      rep(1:nrow(df), df$n)
      df[rep(1:nrow(df), df$n), ]
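
As a rough sketch of how the sampling and expansion idioms above might be used in practice, the block below bootstraps the mean of a column and checks the size of an expanded table. The number of replicates `n_boot` and the names `boot_means`, `agg`, and `expanded` are assumptions for illustration only.

```r
set.seed(10)
df <- data.frame(x = rep(1:3, each = 2), y = 6:1, z = letters[1:6])

# Bootstrap the mean of y: resample row indices with replacement each time
n_boot <- 200   # hypothetical number of replicates
boot_means <- vapply(seq_len(n_boot), function(i) {
  rows <- sample(nrow(df), replace = TRUE)
  mean(df$y[rows])
}, numeric(1))
sd(boot_means)  # spread of the bootstrap means, a rough standard error

# Expanding aggregated counts: the result has one row per counted observation
agg <- data.frame(x = c(2, 4, 1), y = c(9, 11, 6), n = c(3, 5, 1))
expanded <- agg[rep(seq_len(nrow(agg)), agg$n), ]
nrow(expanded) == sum(agg$n)   # TRUE
```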