R Objects DataTypes - gizotso/R GitHub Wiki


R Objects (data structures)

R Language Definition

  • str(object): object structure summary
  • dput(object): Writes an ASCII text representation of an R object
  • typeof(object): R's internal type of the object (see also storage.mode())
    • mode(object): object "type" in the sense of Becker, Chambers & Wilks (1988)
    • storage.mode(object): storage mode of an object in the sense of Becker et al. (1988); usually the same as typeof() (e.g. "double" where mode() gives "numeric")
  • class(object): R is object oriented and everything is an object; class() returns the names of the classes the object inherits from.
  • View(object): opens the object viewer
  • edit(object): opens the object editor (similar to data.entry(obj))
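To see how these inspection functions differ, here is a minimal sketch applying them to a double vector, an integer vector and a list. The object names are arbitrary, and the values in the comments are what base R returns for these inputs.

```r
x <- c(1.5, 2, 3)          # double vector
i <- 1:3                   # integer vector
L <- list(a = 1, b = "z")  # list

typeof(x)        # "double"
mode(x)          # "numeric"
storage.mode(x)  # "double"
class(x)         # "numeric"

typeof(i)        # "integer"
mode(i)          # "numeric"
class(i)         # "integer"

typeof(L)        # "list"
class(L)         # "list"
str(L)           # compact summary, one line per element
dput(L)          # list(a = 1, b = "z")
```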

Vectors

Vector: 1-D array storing data elements of the same mode (numeric, complex, logical, character or raw). A single number such as 33 or a string such as "toto" is still a vector, of length 1: R has no more basic (scalar) type. These vectors are called atomic because all their elements share one basic type, unlike lists (recursive vectors).

R has 6 basic ('atomic') vector types:

| typeof    | mode      | storage.mode |
|-----------|-----------|--------------|
| logical   | logical   | logical      |
| integer   | numeric   | integer      |
| double    | numeric   | double       |
| complex   | complex   | complex      |
| character | character | character    |
| raw       | raw       | raw          |

v = 28.4   # mode(v) = [1] "numeric", typeof(v) = [1] "double"
i = 4L     # typeof(i) = [1] "integer"
s = "abc"  # mode(s) = [1] "character"
b = TRUE   # mode(b) = [1] "logical"
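The lines above cover four of the six atomic types. A short sketch that builds one length-1 value of each type (including complex and raw) and collects typeof() and mode() side by side, assuming nothing beyond base R:

```r
# One value of each of the six atomic types
vals  <- list(TRUE, 4L, 28.4, 3 + 2i, "abc", as.raw(255))
types <- vapply(vals, typeof, character(1))
modes <- vapply(vals, mode, character(1))

types  # "logical" "integer" "double" "complex" "character" "raw"
modes  # "logical" "numeric" "numeric" "complex" "character" "raw"
```

Only integer and double differ between the two: both report mode "numeric".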

v = c(1, 2, 3)
##  [1] 1 2 3
length(v)
## [1] 3

names(v) = c('v1','v2','v3')
> v
v1 v2 v3
 1  2  3

Constructors

v = character(4) ##[1] "" "" "" ""
v = logical(2)    ##[1] FALSE FALSE

v = numeric(3)
v = vector(mode="numeric", 3)
v = vector("numeric", 3)
#  [1] 0 0 0

i = 1:3 #typeof(i): integer
##  [1] 1 2 3

rep(1, 5)
## [1] 1 1 1 1 1

seq(1, 5, .5)
##  [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
ip = installed.packages()[,c(1,3:4)]
> colnames(ip)
[1] "Package"  "Version"  "Priority"

> attributes(ip)$dim
[1] 150   3

> attributes(ip)$dimnames
[[1]]
  [1] "acepack"       "assertthat"    "backports"     "base64enc"     "BH"            "bindr"         "bindrcpp"     
  ...
  [148] "tools"         "translations"  "utils"        

[[2]]
[1] "Package"  "Version"  "Priority"

# removing row names
rownames(ip) = NULL
|                           | numeric      | integer      | character      | logical      | complex                          |
|---------------------------|--------------|--------------|----------------|--------------|----------------------------------|
| Example                   | x = 12       | i = 4L       | s = 'dude'     | b = FALSE    | z = 3 + 2i                       |
| Values                    | 2.1e23, 3.14 |              |                |              | Re(z), Im(z) (real / imaginary part) |
| Constructor               | numeric(1)   | integer(1)   | character(1)   | logical(1)   | complex(1)                       |
| Constructor output        | [1] 0        | [1] 0        | [1] ""         | [1] FALSE    | [1] 0+0i                         |
| Type checking             | is.numeric() | is.integer() | is.character() | is.logical() | is.complex()                     |
| mode()                    | "numeric"    | "numeric"    | "character"    | "logical"    | "complex"                        |
| typeof() / storage.mode() | "double"     | "integer"    | "character"    | "logical"    | "complex"                        |
| class()                   | "numeric"    | "integer"    | "character"    | "logical"    | "complex"                        |
| length()                  | 1            | 1            | 1              | 1            | 1                                |
| nchar()                   | 2            | 1            | 4              | 5            | 4                                |

Special Values

  • Constants: pi, letters, LETTERS, month.abb, month.name
  • NA (Not Available: missing value / empty)
    • x = NA
    • is.na(x)
  • Inf (infinite). ex: -5/0 ## [1] -Inf
    • is.infinite(x), is.finite(x)
  • NaN: Not a Number. ex: 0/0 or 1/0 - 1/0 ## [1] NaN
    • is.nan(x)
  • NULL: the empty object (length 0), used for an absent value or attribute
    • is.null(x)
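The four special values are easy to confuse, in particular NA and NaN. A minimal sketch, using only base R, that runs each test function over the same vector; the comments show the results:

```r
x <- c(1, NA, NaN, Inf, -Inf)

is.na(x)        # FALSE  TRUE  TRUE FALSE FALSE  (NaN also counts as NA)
is.nan(x)       # FALSE FALSE  TRUE FALSE FALSE  (NA is not NaN)
is.finite(x)    #  TRUE FALSE FALSE FALSE FALSE
is.infinite(x)  # FALSE FALSE FALSE  TRUE  TRUE

0 / 0           # NaN
-5 / 0          # -Inf

n <- NULL
length(n)       # 0: NULL is the empty object, not a missing value
is.null(n)      # TRUE
c(1, NULL, 2)   # 1 2: NULL disappears when combined
```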
getOption("digits") ## 7 is default
options(digits = 4)
pi
is.na(NA) ## [1] TRUE
NA == 1   ## NA
NA == NA  ## NA

anyNA(x) ## TRUE if x contains at least one NA, FALSE otherwise
na.omit(df) ## removes rows with a missing value in any column

df <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))
##   x  y
## 1 1  0
## 2 2 10
## 3 3 NA
na.omit(df)
##   x  y
## 1 1  0
## 2 2 10

Remove NA from a Vector

x = c(1,2,NA,4)
x[!is.na(x)]
which(is.na(x)) # [1] 3

Caution: NA propagates through computations

x <- c(1,2,3,4,5,6,NA,8,9,10)
mean(x) ##[1] NA
mean(x, na.rm = TRUE) ##[1] 5.33

Vector Constructors

x<-1:10 ## x [1] 1 2 3 4 5 6 7 8 9 10

x <- -3:4    ## [1] -3 -2 -1  0  1  2  3  4
x <- 5:3     ## [1] 5 4 3
x <- 1:10-1  ## <=> (1:10)-1 = [1] 0 1 2 3 4 5 6 7 8 9

seq() and sequence()

seq(1, 2, 0.25)             ##[1] 1.00 1.25 1.50 1.75 2.00
seq(length=9, from=1, to=5) ##[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
sequence(2:4)    ##[1] 1 2 1 2 3 1 2 3 4
sequence(c(2,4)) ##[1] 1 2 1 2 3 4

Repetitions: rep(x, n): x is repeated n times

rep(1,   10)     ##[1] 1 1 1 1 1 1 1 1 1 1
rep(1:3, 2)      ##[1] 1 2 3 1 2 3
rep(1:3, 1:3)    ##[1] 1 2 2 3 3 3
x<-1:5; rep(x,3) ## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
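rep() also accepts the named arguments times, each and length.out. A short sketch of how they combine; the comments show the base R results:

```r
rep(1:3, times = 2)                     # 1 2 3 1 2 3 (same as rep(1:3, 2))
rep(1:3, each = 2)                      # 1 1 2 2 3 3
rep(1:3, times = 1:3)                   # 1 2 2 3 3 3
rep(1:3, length.out = 7)                # 1 2 3 1 2 3 1
rep(c("a", "b"), each = 2, times = 2)   # "a" "a" "b" "b" "a" "a" "b" "b"
```

When both are given, each is applied first and the result is then repeated times times.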

Vectors: Accessing elements

  • x[i]
  • x[c(2,4,10)]
  • x[1:10]
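Besides positions, a vector can be indexed with negative positions, logical conditions and names. A minimal sketch on a small named vector (values chosen only for illustration):

```r
x <- c(10, 20, 30, 40, 50)
names(x) <- c("a", "b", "c", "d", "e")

x[2]          # by position: 20 (the name "b" is kept)
x[c(1, 3)]    # several positions: 10 30
x[-1]         # negative index: everything except the first element
x[x > 25]     # logical index: 30 40 50
x["d"]        # by name: 40
x[10]         # out of range: NA
```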

Vectors: Modify elements

  • x[1]<-0: replaces the value of the first element by 0
  • x[-1:-5]: returns x without elements 1 to 5 (assign it back, x <- x[-(1:5)], to actually remove them)
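The sketch below traces a few modifications on an integer vector, step by step; the variable y is only there to show that negative indexing returns a copy and leaves x unchanged.

```r
x <- 1:10
x[1] <- 0L                 # replace the first element
x[x > 8] <- 99L            # replace every element matching a condition
y <- x[-(1:5)]             # drop elements 1 to 5 in a copy; x is unchanged
x <- x[-length(x)]         # to really remove an element, assign the result back
x[length(x) + 1] <- 100L   # assigning past the end extends the vector

y  # 6 7 8 99 99
x  # 0 2 3 4 5 6 7 8 99 100
```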

Vector Conversion

to numeric

as.numeric(expr)

  • FALSE-> 0
  • TRUE-> 1
  • "3"-> 3
  • "foo"-> NA (with a warning: NAs introduced by coercion)

Implicit conversion

  • 5 + TRUE -> 6

to integer: as.integer(expr)

  • 3.14-> 3

to character: as.character(expr)

  • 12 -> "12"
  • FALSE -> "FALSE"
  • NA -> NA

Coercion: mixing types in c() converts every element to the most flexible type (here character)

v <- c(1, 2, 3, 4, "Pi")
## [1] "1"  "2"  "3"  "4"  "Pi"

to logical: as.logical(expr)

  • 0 -> FALSE
  • Numeric ≠ 0 -> TRUE
  • NA -> NA
  • "FALSE", "false", "False", "F" -> FALSE
  • "TRUE", "true", "True", "T" -> TRUE
  • Others strings -> NA
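The conversion rules above can be checked directly. A minimal sketch using only base R; suppressWarnings() is added here only to silence the coercion warning for "foo":

```r
as.numeric(c(FALSE, TRUE))                  # 0 1
as.numeric("3")                             # 3
suppressWarnings(as.numeric("foo"))         # NA
5 + TRUE                                    # 6: TRUE is coerced to 1
as.integer(3.14)                            # 3
as.integer(-3.9)                            # -3: truncation goes toward zero
as.character(12)                            # "12"
as.logical(c(0, 2, NA))                     # FALSE TRUE NA
as.logical(c("TRUE", "T", "false", "F", "yes"))  # TRUE TRUE FALSE FALSE NA
typeof(c(1, 2, "Pi"))                       # "character"
```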

Vector to Array:

Adding a dimension attribute of length 2 to a vector turns it into a matrix.

v = c(1,2,3,4)
dim(v)
## NULL

# set the dimension attribute
dim(v) <- c(2,2)
v
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

attributes(v)
## $dim
## [1] 2 2

class(v)
## [1] "matrix" "array"   # R >= 4.0.0; "matrix" in older versions

More simply, this is equivalent to v = matrix(c(1,2,3,4), 2, 2)
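A minimal sketch of the round trip described above: setting dim turns a vector into a matrix filled column by column, and removing it gives the plain vector back. The names m and back are only illustrative.

```r
v <- 1:6
m <- v
dim(m) <- c(2, 3)       # m is now a 2 x 3 matrix, filled column by column
is.matrix(m)            # TRUE
m[2, 3]                 # 6
identical(m, matrix(v, nrow = 2, ncol = 3))  # TRUE

back <- m
dim(back) <- NULL       # removing the dim attribute gives a plain vector again
identical(back, v)      # TRUE
```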

Factor/Ordered (Vector)

Factors are special vectors for categorical data. A factor stores the values of a categorical variable together with the set of allowed values, its levels. Ordered factors (class "ordered") inherit from factor and add an order on the levels.

factor(x, levels = sort(unique(x), na.last = TRUE)
        , labels = levels, exclude = NA, ordered = is.ordered(x))
  • class: factor
  • typeof: integer
  • mode: numeric
  • levels(): get levels attribute. <=> attributes(f)$levels
gender <- factor(c("male", "female", "female", "male"))

## [1] male   female female male  
## Levels: female male
of = ordered(4:1)
## [1] 4 3 2 1
## Levels: 1 < 2 < 3 < 4

class(of)
## [1] "ordered" "factor"
attributes(gender)
## $levels
## [1] "female" "male"  
##
## $class
## [1] "factor"

levels(gender)
## [1] "female" "male"
x = c(0, 1, 0, 1, 0)
# 0 -> Non, 1-> Oui

factor(x, levels=c(0,1), labels=c('Non','Oui') )
## [1] Non Oui Non Oui Non
## Levels: Non Oui
factor(1:3)
factor(1:3, labels=c("A", "B", "C"))
## [1] A B C  Levels: A B C

factor(1:3, levels=1:5)
## [1] 1 2 3  Levels: 1 2 3 4 5

f = factor(c(2, 4), levels=2:5)
levels(f) ## [1] "2" "3" "4" "5"
as.numeric(f) #[1] 1 3  (internal codes: 2 is level 1, 4 is level 3)
f = factor(substring("hello", 1:5, 1:5), levels = letters)
## [1] h e l l o
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
as.integer(f)  
## [1]  8  5 12 12 15

factor(f) #<=> f[, drop = TRUE]
## [1] h e l l o
## Levels: e h l o
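Levels that never occur still count in summaries. A short sketch with a hypothetical clothing-size factor shows that table() reports unused levels with a count of 0, that the internal codes follow the order given in levels, and that droplevels() (base R) removes the unused ones:

```r
sizes <- factor(c("S", "L", "M", "S"), levels = c("S", "M", "L", "XL"))

table(sizes)         # S 2, M 1, L 1, XL 0
nlevels(sizes)       # 4
as.integer(sizes)    # 1 3 2 1: codes follow the order given in levels
droplevels(sizes)    # same values, levels S M L ("XL" dropped)
```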

Factor Constructors

  • gl(n, k, length = n*k, labels = seq_len(n)): Generate Levels, a factor with n levels, each repeated k times
f = gl(3, 5)
## [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
## Levels: 1 2 3
s = gl(2, 4, labels=c("M", "F"))
## [1] M M M M F F F F
## Levels: M F

10 alternating 1s and 2s

f = gl(2, 1, 10)
## [1] 1 2 1 2 1 2 1 2 1 2
## Levels: 1 2

alternating pairs of 1s and 2s

f = gl(2, 2, 10)
## [1] 1 1 2 2 1 1 2 2 1 1

Conversion

g = unclass(gender)
## [1] 2 1 1 2
## attr(,"levels")
## [1] "female" "male"

attributes(g) <- NULL
g
## [1] 2 1 1 2
  • as.character(gender): [1] "male" "female" "female" "male"
  • as.numeric(gender): [1] 2 1 1 2

Tip: converting a factor with numeric labels back to its numeric values

as.numeric(factor(c(1,3)))
## [1] 1 2
as.numeric(as.character(factor(c(1,3))))
## [1] 1 3
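Another common idiom (recommended in the R FAQ) converts only the levels and then indexes them by the factor, since indexing with a factor uses its integer codes. A minimal sketch comparing both approaches:

```r
f <- factor(c(10, 5, 10, 20))

levels(f)                      # "5" "10" "20"
as.numeric(f)                  # 2 1 2 3: the internal codes, not the values
as.numeric(levels(f))[f]       # 10 5 10 20
as.numeric(as.character(f))    # 10 5 10 20: same result
```

The levels-based form converts each distinct level once, which matters only for long factors with few levels.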

Arrays

  • Array: n-D array storing data of the same mode (typically n > 2). Class: array
  • Matrix: 2-D array storing data of the same mode. Class: matrix (c("matrix", "array") since R 4.0.0)

A matrix is simply a vector with a dimension attribute added to it (but a plain vector is not a one-column or one-row matrix).

Array

A = array(1:8, dim = c(2, 2, 2)): 3D array

> A
, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8
dimnames(A) <- list(c("a", "b"), c("c", "d"), c("e", "f"))
A[,,"e"]
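Each element of a 3-D array is addressed by three indices (row, column, layer), either by position or by dimname. A minimal sketch reusing the array above:

```r
A <- array(1:8, dim = c(2, 2, 2))
dimnames(A) <- list(c("a", "b"), c("c", "d"), c("e", "f"))

dim(A)                     # 2 2 2
A[2, 1, 2]                 # 6: row 2, column 1, layer 2
A["b", "c", "f"]           # 6: the same element, addressed by names
A[, , "e"]                 # the first 2 x 2 layer, as a matrix
A[, , "e", drop = FALSE]   # keeps all three dimensions: 2 x 2 x 1
```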

Matrix (2-D Array)

matrix(data = 1, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL) <=> matrix(1)

      [,1]
[1,]     1

byrow=FALSE indicates that the matrix should be filled by columns (the default)

  • dim(M) = nrow(M) x ncol(M)
  • length(M): nb elements
  • dimnames(M): list(rownames, colnames)
M = array(1:6, c(2,3)) # by default fill by column
M = matrix(1:6, ncol=3)
M = matrix(data=1:6, nr=2, nc=3)
M = matrix(1:6, 2,3)
> M
        [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

colnames(M) = paste("y", 1:3, sep="") # <=> c("y1", "y2", "y3")
rownames(M) = c("x1", "x2")
> M
    y1 y2 y3
x1  1  3  5
x2  2  4  6

> dimnames(M)
## [[1]]
## [1] "x1" "x2"
##
## [[2]]
## [1] "y1" "y2" "y3"

> M[, 2]
> M[, "y2"]
## x1 x2
##  3  4

Sub-Matrix of 1 column: drop=FALSE prevents the result from being simplified to a vector

 M[, 2, drop=FALSE]
##    y2
## x1  3
## x2  4
m1 = matrix(2, nr = 1, nc = 2)
m2 = matrix(2, nr = 2, nc = 2)
##      [,1] [,2]
## [1,]    2    2
## [2,]    2    2

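cbind(m2, m2) # concatenate columns (same number of rows required)
rbind(m1, m2) # concatenate rows (same number of columns required)
# cbind(m1, m2) fails: m1 has 1 row, m2 has 2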
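A quick way to check what cbind() and rbind() produce is to look at dim() of the result. The sketch below also catches the error raised when row counts do not match; the "error" string is just a placeholder returned by the handler.

```r
m1 <- matrix(2, nrow = 1, ncol = 2)   # 1 x 2
m2 <- matrix(2, nrow = 2, ncol = 2)   # 2 x 2

dim(rbind(m1, m2))   # 3 2: rows are stacked, column counts must match
dim(cbind(m2, m2))   # 2 4: columns side by side, row counts must match

res <- tryCatch(cbind(m1, m2), error = function(e) "error")
res                  # "error": m1 has 1 row and m2 has 2
```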

Matrix Constructors

M = cbind(1:5, rnorm(5))
colnames(M) = c("i", "norm")
> M
     i        norm
[1,] 1 -2.10724485
[2,] 2  0.41537749
[3,] 3 -0.50847283
[4,] 4 -0.08884656
[5,] 5  0.29271269
# rbind(x, y): matrix combining the elements of x and y row by row
M <- rbind(1:5, seq(2, 10, by=2))
rownames(M) = c("x1","x2")
> M
##    [,1] [,2] [,3] [,4] [,5]
## x1    1    2    3    4    5
## x2    2    4    6    8   10
M = diag(3)

M = diag(1:3)
> M
        [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3

diag(M) = 2 # replace the values on the diagonal
> diag(3, nr = 2, nc = 3)
##      [,1] [,2] [,3]
## [1,]    3    0    0
## [2,]    0    3    0

Array: Accessing elements

M = matrix(1:6, ncol=3)

  • M[i, j]: element at row i, col j
  • M[, j]: col j
  • M[i, ]: row i
  • M[5]: 5th element
  • M[,c(1,3)] columns 1 and 3
  • M["rname", ] row named "rname"
  • M[, "name"] column named "name"
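Applied to M = matrix(1:6, ncol=3), the access patterns above give the following; note that a single index counts down the columns, because matrices are stored column by column.

```r
M <- matrix(1:6, ncol = 3)   # 2 x 3: first row 1 3 5, second row 2 4 6

M[2, 3]        # 6
M[, 2]         # 3 4 (a vector)
M[1, ]         # 1 3 5
M[5]           # 5
M[, c(1, 3)]   # 2 x 2 matrix with columns 1 and 3
M[-1, ]        # every row except the first: 2 4 6
```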

Array: Modify elements

  • M[-1, ]: M without its first row
  • M[, -2]: M without its 2nd column
  • assign the result back (e.g. M = M[-1, ]) to actually remove them

Conversion: Matrix/Array to Vector

M = diag(3)
> as.vector(M)
## [1] 1 0 0 0 1 0 0 0 1
c(M)
## [1] 1 0 0 0 1 0 0 0 1
# Removing dimension attribute casts matrix to vector
> dim(M)
[1] 3 3
dim(M) = NULL
> M
[1] 1 0 0 0 1 0 0 0 1

Conversion: Matrix/Array and DataFrame

as.data.frame(M)                        # matrix to data frame
as.matrix( data.frame(x=(1:4), n=10) )  # data frame to matrix
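Two details of these conversions are worth knowing: a matrix without column names gets the default names V1, V2, ... and, since a matrix holds a single type, mixing numbers and strings in a data frame yields a character matrix. A minimal sketch:

```r
M <- matrix(1:4, nrow = 2)
df <- as.data.frame(M)
names(df)       # "V1" "V2": default column names

df2 <- data.frame(n = 1:2, s = c("a", "b"))
m2 <- as.matrix(df2)
typeof(m2)      # "character": the numbers are coerced to strings
```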

Lists

List: ordered collection of objects of possibly different types (a heterogeneous data structure). Lists are also called recursive vectors because a list can contain other lists. Indexing a list differs from indexing vectors and matrices: L[i] returns a sub-list, while L[[i]] or L$name returns the element itself.

L = list('foo', c(1, 2))
> L
[[1]]
[1] "foo"

[[2]]
[1] 1 2

names(L) <- c('txt', 'val')

Constructor

L = list(txt = 'foo', val = c(1,2))
> L
$txt
[1] "foo"

$val
[1] 1 2
x = 1:4; y = 7:9
L = list(x, y);

address <- list("Larry Pace", "102 San Mateo Dr.", "Anderson", "SC", 29625)
lst <- list(numbers = c(1, 2), logical = TRUE, strings = c("a", "b", "c"))

Accessing List elements

L[1]: sub-list (still a list)

$txt
[1] "foo"

  • L$val or L[["val"]] or L[[2]] : 2nd element of the list >[1] 1 2 (L["val"] would return a sub-list)
  • L$val[2] or L[[2]][2] : 2nd element of the 2nd list element >[1] 2
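The difference between [ ] and [[ ]] is easiest to see by checking what each returns. A minimal sketch with the same list; the element name new is only illustrative:

```r
L <- list(txt = "foo", val = c(1, 2))

L[2]              # a list of length 1 containing val
L[[2]]            # the element itself: 1 2
L[["val"]]        # same, by name
L$val[2]          # 2
class(L["val"])   # "list"
L$new <- TRUE     # add an element by name
length(L)         # 3
```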

Conversion: List to Vector

unlist(): returns a vector whose elements are coerced to a common mode (here character) and named after the list elements. Names can be removed using names(v)=NULL or unname().

L = list(txt = 'foo', val = c(1,2))
> unlist(L)
## txt  val1  val2
## "foo"   "1"   "2"

>unlist(unname(L))
## [1] "foo" "1"   "2"

Conversion: List to Matrix

L = list(id=123, color='red')
> matrix(unlist(L), ncol=length(unlist(L)), nrow=1)
     [,1]  [,2]
[1,] "123" "red"

Data Frames

Data Frame: 2-D table whose columns can store data of different modes. A data frame is a special list whose elements (the columns) are vectors of equal length. df = data.frame() creates a data frame with 0 columns and 0 rows.

  • dim(df) = nrow() x ncol()
  • ncol(df) = length(df)
  • nrow(df)
  • colnames(df) = names(df): column names are created by default
  • rownames(df)
  • str(df)
df = data.frame(1,2,3)
> df
  X1 X2 X3
1  1  2  3

names(df) = c("x","y","z")
<=> data.frame(x=1, y=2, z=3)
# data frame from vectors
df = data.frame( x = c(1, 2, 3)
                ,y = c(0, 10, NA)
               )
> df
  x  y
1 1  0
2 2 10
3 3 NA
# nb obs for col x
> length(df$x)
[1] 3

#  nb NA for col y
> sum(is.na(df$y))
[1] 1

#nb not NA for col y
> sum(!is.na(df$y))
[1] 2

> table(df$y, useNA="always")
0   10   <NA>
1    1    1
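To count missing values in every column at once, or to find the rows that contain none, base R offers colSums(is.na(df)) and complete.cases(). A minimal sketch on the same data frame:

```r
df <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))

colSums(is.na(df))                # x 0, y 1: number of NA per column
complete.cases(df)                # TRUE TRUE FALSE: rows without any NA
nrow(df[complete.cases(df), ])    # 2: the same rows na.omit(df) keeps
```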
# data frame from vectors of different lengths (the shorter vector is recycled)
x <- 1:4; n <- 10;
df = data.frame(x, n)
> df
##   x  n
## 1 1 10
## 2 2 10
## 3 3 10
## 4 4 10

# rename 2nd col into y
colnames(df)[2] = "y"  
L3 <- LETTERS[1:3]  ##[1] "A" "B" "C"
df <- data.frame(x = 1, y = 1:5, fac = sample(L3, 5, replace = TRUE))
> df
##   x y fac
## 1 1 1   A
## 2 1 2   A
## 3 1 3   C
## 4 1 4   C
## 5 1 5   B

Data Frame Constructors

data frame from the editor

Min_Wage <- data.frame(Year = numeric(), Value = numeric())
Min_Wage <- edit(Min_Wage)

expand.grid(): creates a data.frame with all combinations of the vectors or factors given as arguments

expand.grid(h=c(60,80), w=c(100, 300), sex=c("M", "F"))
##    h   w sex
## 1 60 100   M
## 2 80 100   M
## 3 60 300   M
## 4 80 300   M
## 5 60 100   F
## 6 80 100   F
## 7 60 300   F
## 8 80 300   F
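The number of rows is the product of the argument lengths, and the first variable varies fastest. A short check on the grid above:

```r
g <- expand.grid(h = c(60, 80), w = c(100, 300), sex = c("M", "F"))

nrow(g)   # 8 = 2 * 2 * 2 combinations
g$h       # 60 80 60 80 60 80 60 80
```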

Accessing data

  • df$y : column named y (<=> df[, "y"] or df[, 2]), returns a vector
    • ##[1] 10 10 10 10
  • df["y"]: returns a data.frame.

Access several columns:

  • by name: df[c("x1","x2")]
  • by position: df[,c(1,2)]
  • with subset(): subset(df, select=c(1,2))
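A minimal sketch of these access patterns on a small data frame with hypothetical columns x1, x2 and x3; the last line also shows row selection with a logical condition:

```r
df <- data.frame(x1 = 1:3, x2 = c(0, 10, 20), x3 = c("a", "b", "c"))

df$x2                            # 0 10 20: a vector
df[, "x2"]                       # the same vector
df["x2"]                         # a one-column data frame
df[c("x1", "x2")]                # two columns, by name
df[, c(1, 2)]                    # the same two columns, by position
subset(df, select = c(x1, x2))   # the same two columns, with subset()
df[df$x2 > 5, ]                  # rows where x2 > 5
```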

Conversion DataFrame to Matrix

> as.matrix(df)
     x  y
[1,] 1  0
[2,] 2 10
[3,] 3 NA

R objects and Classes


Possible internal types (as reported by typeof()):

  • "NULL": NULL
  • "symbol": a variable name (mode: "name")
  • "pairlist": a pairlist object (similar to a list, mainly internal)
  • "closure": a function
  • "environment": an environment
  • "promise": an object used to implement lazy evaluation
  • "language": an R language construct (mode: "(" or "call")
  • "special": an internal function that does not evaluate its arguments (mode: "function")
  • "builtin": an internal function that evaluates its arguments (mode: "function")
  • "char": a 'scalar' string object (internal only)
  • "logical": a vector containing logical values
  • "integer": a vector containing integer values (mode: "numeric")
  • "double": a vector containing real values (mode: "numeric")
  • "complex": a vector containing complex values
  • "character": a vector containing character values
  • "...": the special variable length argument
  • "any": a special type that matches all types; there are no objects of this type
  • "expression": an expression object
  • "list": a list
  • "bytecode": byte code (internal only)
  • "externalptr": an external pointer object
  • "weakref": a weak reference object
  • "raw": a vector containing bytes
  • "S4": an S4 object which is not a simple object

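Modes have the same names as types (see typeof()), except that types "integer" and "double" have mode "numeric", types "closure", "special" and "builtin" have mode "function", type "symbol" has mode "name", and type "language" has mode "(" or "call".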
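The exceptions listed above can be checked on functions and unevaluated expressions. A minimal sketch; quote() builds a call or a symbol without evaluating it, and f and x need not exist:

```r
typeof(mean)     # "closure": a function written in R
typeof(sum)      # "builtin"
typeof(`if`)     # "special"
mode(sum)        # "function" (also for mean and `if`)

e <- quote(f(x)) # an unevaluated call
typeof(e)        # "language"
mode(e)          # "call"

s <- quote(x)    # a symbol
typeof(s)        # "symbol"
mode(s)          # "name"
```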

|                                         | Vector                       | Array / Matrix (2D)                         | List                       | Data Frame                                          |
|-----------------------------------------|------------------------------|---------------------------------------------|----------------------------|-----------------------------------------------------|
| Example                                 | v = c(1, 2, 3)               | M = array(1:6, c(2,3))                      | L = list('foo', c(1, 2))   | df = data.frame(x = c(1, 2, 3), y = c(0, 10, NA))   |
| Type checking                           | is.vector()                  | is.matrix(M), is.array(A)                   | is.list()                  | is.data.frame()                                     |
| class()                                 | "numeric"                    | "matrix" "array" ("matrix" before R 4.0.0)  | "list"                     | "data.frame"                                        |
| mode()                                  | "numeric"                    | "numeric"                                   | "list"                     | "list"                                              |
| typeof()                                | "double"                     | "integer"                                   | "list"                     | "list"                                              |
| length()                                | 3                            | 6                                           | 2                          | 2 (number of columns)                               |
| nrow()                                  | NULL                         | 2                                           | NULL                       | 3 (number of rows)                                  |
| dim()                                   | NULL                         | 2 3                                         | NULL                       | 3 2 (rows, columns)                                 |
| dimnames()                              | NULL                         | dimnames(M)                                 | NULL                       | list(row names, column names)                       |
| colnames(A) <=> dimnames(A)[[2]]        | NULL                         | colnames(M) = paste("y", 1:3, sep="")       | NULL                       | "x" "y"                                             |
| rownames(A) <=> dimnames(A)[[1]]        | NULL                         | rownames(M) = c("x1","x2")                  | NULL                       | "1" "2" "3"                                         |
| names() (unname() to remove names)      | names(v) = c('v1','v2','v3') |                                             | names(L) = c('txt', 'val') | names(df) = c('x','y')                              |
| attributes(): list of object attributes | $names                       | $dim, $dimnames                             | $names                     | $names, $row.names, $class                          |
| Conversion                              | as.vector(M)                 | as.matrix(df)                               | as.list()                  | as.data.frame(M)                                    |
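Several rows of this summary can be reproduced in a few lines. A minimal sketch that applies typeof() and length() to the four example objects; class(M) is tested with is.matrix() because its value changed in R 4.0.0:

```r
v  <- c(1, 2, 3)
M  <- array(1:6, c(2, 3))
L  <- list("foo", c(1, 2))
df <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))
objs <- list(v = v, M = M, L = L, df = df)

sapply(objs, typeof)   # v "double", M "integer", L "list", df "list"
sapply(objs, length)   # 3 6 2 2 (for a data frame, the number of columns)
dim(M)                 # 2 3
dim(df)                # 3 2
is.matrix(M)           # TRUE
is.list(df)            # TRUE: a data frame is a list of columns
```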