Hadley Wickham Advanced R - statnet/computing GitHub Wiki

Notes on Hadley Wickham - Advanced R

[Hadley Wickham's Advanced R Book](http://adv-r.had.co.nz/)

### Data structures

#### Vectors

Atomic vectors
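- All elements of an atomic vector have the same type; a list can contain elements of different types
- Every vector has three properties: type (`typeof()`), length (`length()`), and attributes (`attributes()`)
- `is.vector()` returns TRUE only if the object has no attributes other than names; use `is.atomic(x) || is.list(x)` to test whether an object is actually a vector
- The four common types are logical, integer, double (often called numeric), and character; complex and raw are rare
- Coerce a vector to another type explicitly with the `as.*()` functions, e.g. `as.character()` or `as.integer()`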
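Lists are recursive vectors: a list can contain other lists
- Constructed with `list()` rather than `c()`, e.g. `list(1, 3)`
- Data frames and linear model objects (as produced by `lm()`) are lists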
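
The points above can be checked interactively. The following is a minimal sketch, not taken from the book; the variable names (`dbl`, `int`, `mixed`, `lst`, `mod`) are made up for illustration, and `mtcars` is the data set that ships with R.

```r
# Atomic vectors: every element shares one type
dbl <- c(1, 2.5, 3)
int <- c(1L, 6L, 10L)
chr <- c("a", "b")

typeof(dbl)      # "double"
typeof(int)      # "integer"
length(chr)      # 2
attributes(dbl)  # NULL: no attributes have been set

# Combining different types coerces to the most flexible one
mixed <- c(1L, "a", TRUE)
typeof(mixed)    # "character"

# Explicit coercion with as.*()
as.integer(c("1", "2"))   # 1 2
as.logical(c(0, 1, 2))    # FALSE TRUE TRUE

# Lists may hold different types, including other lists
lst <- list(1:3, "a", list(TRUE, 2.5))
typeof(lst)               # "list"

# Testing whether something is a vector (atomic or list)
is.atomic(dbl) || is.list(dbl)   # TRUE
is.atomic(lst) || is.list(lst)   # TRUE

# Data frames and linear model objects are lists underneath
mod <- lm(mpg ~ wt, data = mtcars)
is.list(mod)              # TRUE
typeof(mtcars)            # "list"
```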

#### Attributes

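Attributes store metadata about an object
- They can be thought of as a named list (with unique names)
- Access them all at once with `attributes()`, or individually with `attr(y, "my_attribute")`
Names, dimensions, and class are the most important attributes
- Names can be set after creation: x <- ...; names(x) <- c("a", "b", "c")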
  - Great for subsetting
Factors
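- Built on top of integer vectors using two attributes: the class "factor" (see `class()`) and the levels, which define the set of allowed values (see `levels()`)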
- sex_factor <- factor(sex_char, levels = c("m", "f"))
- Convert factors to character vectors if you need string-like behavior
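
A short sketch of how attributes and factors behave, assuming made-up example data: the contents of `sex_char` and the attribute value are hypothetical and only serve to make the snippet runnable.

```r
# Attributes behave like a named list attached to an object
y <- 1:10
attr(y, "my_attribute") <- "This is a vector"
attr(y, "my_attribute")        # "This is a vector"
names(attributes(y))           # "my_attribute"

# Names make subsetting by label possible
x <- c(1, 2, 3)
names(x) <- c("a", "b", "c")
x[["b"]]                       # 2

# A factor is an integer vector plus class and levels attributes
sex_char <- c("m", "m", "m")   # hypothetical data: "f" never occurs
sex_factor <- factor(sex_char, levels = c("m", "f"))
typeof(sex_factor)             # "integer"
class(sex_factor)              # "factor"
levels(sex_factor)             # "m" "f"
table(sex_factor)              # counts "f" as 0 rather than omitting it

# Convert to character when string-like behaviour is needed
nchar(as.character(sex_factor))  # 1 1 1
```

Declaring the levels up front is what lets `table()` report the level that has no observations.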

#### Matrices and Arrays

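A matrix is an array with two dimensions; adding a `dim` attribute to an atomic vector makes it behave like a multi-dimensional array

    a <- matrix(1:6, ncol = 3, nrow = 2)  # two scalar arguments: columns and rows
    b <- array(1:12, c(2, 3, 2))          # one vector argument: all dimensions
    c <- 1:6                              # or modify an existing vector in place
    dim(c) <- c(3, 2)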

#### Data frames

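A data frame is a two-dimensional structure: a list of equal-length vectors

Combine data frames with `cbind()` (columns) and `rbind()` (rows), but don't `cbind()` plain vectors together to build one, since that creates a matrix

    df <- data.frame(x = 1:3, y = c("a", "b", "c"))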
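
The `cbind()` pitfall is easy to demonstrate. This is a sketch with hypothetical object names (`m`, `good`, `wider`, `longer`); `stringsAsFactors = FALSE` is passed explicitly so the result does not depend on the R version.

```r
# cbind() on plain vectors builds a matrix and coerces to one common type
m <- cbind(x = 1:2, y = c("a", "b"))
is.matrix(m)      # TRUE
typeof(m)         # "character": the integers were coerced

# data.frame() keeps each column's own type
good <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)
sapply(good, typeof)   # x: "integer", y: "character"

# cbind() and rbind() work as expected once a data frame is involved
wider  <- cbind(good, z = c(TRUE, FALSE))
longer <- rbind(good, data.frame(x = 3L, y = "c", stringsAsFactors = FALSE))
dim(wider)        # 2 3
dim(longer)       # 3 2
```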

### Subsetting

#### Data types

Atomic vectors
- Positive integers return elements at the specified positions
  - x[c(3,1)]
- Negative integers omit elements at the specified positions
  - x[-c(3,1)]
- Logical vectors select elements where the corresponding logical value is TRUE
  - x[x > 3]
- Nothing returns the original vector
  - x[]
- Zero returns a zero-length vector
  - x[0]
- Character vectors return elements with matching names (if the vector is named); `[` matches names exactly
  - z[c("a", "d")]
Lists
- Subsetting works the same way as for atomic vectors
  - `[` always returns a list
  - `[[` and `$` pull components out of the list
Matrices and Arrays
- Can be subset with multiple vectors, with a single vector, or with a matrix

      a <- matrix(1:9, nrow = 3)
      colnames(a) <- c("A", "B", "C")
      a[c(TRUE, FALSE, TRUE), c("B", "A")]  # multiple vectors: rows 1 and 3, columns B then A

      vals <- outer(1:5, 1:5, FUN = "paste", sep = ",")
      vals
      #>      [,1]  [,2]  [,3]  [,4]  [,5]
      #> [1,] "1,1" "1,2" "1,3" "1,4" "1,5"
      #> [2,] "2,1" "2,2" "2,3" "2,4" "2,5"
      #> [3,] "3,1" "3,2" "3,3" "3,4" "3,5"
      #> [4,] "4,1" "4,2" "4,3" "4,4" "4,5"
      #> [5,] "5,1" "5,2" "5,3" "5,4" "5,5"
      vals[c(4, 15)]  # single vector: the matrix is treated as a column-major vector

      select <- matrix(ncol = 2, byrow = TRUE, c(
        1, 1,
        3, 1,
        2, 4
      ))
      vals[select]  # matrix: each row gives the (row, column) position of one element
      #> [1] "1,1" "3,1" "2,4"


* Data frames
  * Subsetting with a single vector makes them behave like lists; subsetting with two vectors makes them behave like matrices

        df <- data.frame(x = 1:3, y = 3:1, z = letters[1:3])
        df[c(1, 3), ]  # rows 1 and 3, all columns

Like a list:

    df[c("x", "z")]

Like a matrix:

    df[, c("x", "z")]

* S3 objects use the same techniques as above
* S4 objects use `@` (equivalent to `$`) and `slot()` (equivalent to `[[`)
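
As a rough illustration of the S4 operators, the sketch below defines a hypothetical `Person` class; the class, its slots, and the values are invented for this example and do not come from the book.

```r
library(methods)  # attached by default in most sessions; loaded explicitly to be safe

# Hypothetical S4 class used only for illustration
setClass("Person", representation(name = "character", age = "numeric"))
p <- new("Person", name = "Ada", age = 36)

p@name            # "Ada": `@` plays the role of `$`
slot(p, "age")    # 36: slot() plays the role of `[[`

# slot() accepts a slot name stored in a variable
field <- "name"
slot(p, field)    # "Ada"

# Slots can be assigned through either form
p@age <- 37
slot(p, "age") <- 38
p@age             # 38
```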

#### Subsetting operators
* `[[` extracts a single element, while `[` applied to a list always returns a list
* If list `x` is a train carrying objects, `x[4:6]` is a train of cars 4-6, while `x[[5]]` is the object in car 5
* Because data frames are lists of columns, `[[` can also extract a column: `mtcars[[1]]`
* *Simplifying* returns the simplest possible data structure to represent the output, while *preserving* keeps the structure of the output the same as the input

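|            | Simplifying          | Preserving                               |
|------------|----------------------|------------------------------------------|
| Vector     | `x[[1]]`             | `x[1]`                                   |
| List       | `x[[1]]`             | `x[1]`                                   |
| Factor     | `x[1:4, drop = T]`   | `x[1:4]`                                 |
| Array      | `x[1, ]` or `x[, 1]` | `x[1, , drop = F]` or `x[, 1, drop = F]` |
| Data frame | `x[, 1]` or `x[[1]]` | `x[, 1, drop = F]` or `x[1]`             |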

* Simplifying
  * Vector: removes names

        x <- c(a = 1, b = 2)
        x[1]
        #> a
        #> 1
        x[[1]]
        #> [1] 1

  * List: returns the object inside the list, not a single-element list

        y <- list(a = 1, b = 2)
        str(y[1])
        #> List of 1
        #>  $ a: num 1
        str(y[[1]])
        #>  num 1

  * Factor: drops unused levels

        z <- factor(c("a", "b"))
        z[1]
        #> [1] a
        #> Levels: a b
        z[1, drop = TRUE]
        #> [1] a
        #> Levels: a

  * Matrix or array: if any dimension has length 1, it is dropped

        a <- matrix(1:4, nrow = 2)
        a[1, , drop = FALSE]
        #>      [,1] [,2]
        #> [1,]    1    3
        a[1, ]
        #> [1] 1 3

  * Data frame: if the output is a single column, returns a vector instead of a data frame

        df <- data.frame(a = 1:2, b = 1:2)
        str(df[1])
        #> 'data.frame':    2 obs. of  1 variable:
        #>  $ a: int  1 2
        str(df[[1]])
        #>  int [1:2] 1 2
        str(df[, "a", drop = FALSE])
        #> 'data.frame':    2 obs. of  1 variable:
        #>  $ a: int  1 2
        str(df[, "a"])
        #>  int [1:2] 1 2

* `$` is a shorthand operator: `x$y` is equivalent to `x[["y", exact = FALSE]]`, so it uses partial matching
* Commonly used to access variables in a data frame
* *Not to be used* when the name of a column is stored in a variable; use `[[` instead
  * With `var <- "cyl"`, `mtcars$var` yields `NULL`; use `mtcars[[var]]`
* Missing and out-of-bounds indices (e.g. the 5th element of a length-four vector) are handled differently by `[` and `[[`: for an atomic vector, `x[5]` returns `NA`, while `x[[5]]` throws an error
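
The behaviour of `$`, `[[`, and out-of-bounds indices can be explored with a small sketch. The objects `x`, `info`, and `oob` are hypothetical; `mtcars` is the built-in data set.

```r
x <- c(a = 1, b = 2, c = 3, d = 4)   # a length-four vector

# `[` tolerates an out-of-bounds index and returns NA
x[5]

# `[[` raises an error instead; capture its message for inspection
oob <- tryCatch(x[[5]], error = function(e) conditionMessage(e))
oob

# `$` matches names partially; `[[` matches exactly unless told otherwise
info <- list(abc = 1)
info$a                        # 1: "a" partially matches "abc"
info[["a"]]                   # NULL: no element is named exactly "a"
info[["a", exact = FALSE]]    # 1

# A column name stored in a variable needs `[[`, not `$`
var <- "cyl"
mtcars$var                    # NULL: looks for a column literally named "var"
head(mtcars[[var]])           # first values of the cyl column
```

Wrapping the failing call in `tryCatch()` is just a convenience here so the script keeps running after the error.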

#### Subsetting and assignment

    x <- 1:5
    x[c(1, 2)] <- 2:3

    df <- data.frame(a = c(1, 10, NA))
    df$a[df$a < 5] <- 0

    mtcars[] <- lapply(mtcars, as.integer)  ## Stays a data frame
    mtcars <- lapply(mtcars, as.integer)    ## Becomes a list

    x <- list(a = 1, b = 2)
    x[["b"]] <- NULL  # removes component b from the list


#### Applications
* Duplicate rows of the `info` table so that there is a row for each value in `grades`, using either `match()` or row names

      grades <- c(1, 2, 2, 3, 1)
      info <- data.frame(
        grade = 3:1,
        desc = c("Excellent", "Good", "Poor"),
        fail = c(FALSE, FALSE, TRUE)
      )

      # Using match()
      id <- match(grades, info$grade)
      info[id, ]

      # Using row names
      rownames(info) <- info$grade
      info[as.character(grades), ]

* Random samples/bootstrap
  * Use integer indices generated by `sample()` to randomly sample or bootstrap a vector or data frame

        df <- data.frame(x = rep(1:3, each = 2), y = 6:1, z = letters[1:6])
        set.seed(10)
        df[sample(nrow(df)), ]                      # Randomly reorder
        df[sample(nrow(df), 3), ]                   # Select 3 random rows
        df[sample(nrow(df), 6, replace = TRUE), ]   # Select 6 bootstrap replicates

* Ordering

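      x <- c("b", "c", "a")
      order(x)
      x[order(x)]

      # df is the data frame from the random sampling example
      df2 <- df[sample(nrow(df)), 3:1]  # shuffle rows, reverse column order
      df2[order(df2$x), ]               # order rows by x
      df2[, order(names(df2))]          # order columns by name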

* Expanding aggregated counts

      df <- data.frame(x = c(2, 4, 1), y = c(9, 11, 6), n = c(3, 5, 1))
      rep(1:nrow(df), df$n)          # row index repeated n times for each row
      df[rep(1:nrow(df), df$n), ]    # one row per counted observation