R Objects DataTypes - gizotso/R GitHub Wiki
http://dbarneche.github.io/2014-12-11-ufsc/lessons/01-intro_r/data-structures.html https://www.programiz.com/r-programming/data-frame https://www.slideshare.net/TomsJAragn/understanding-r-for-epidemiologists https://www.slideshare.net/sohomg/r-programming-basic-advanced
https://www.slideshare.net/RsquaredIn/r-programming-variables-data-types
http://rpy.sourceforge.net/rpy2/doc-2.1/html/robjects.html
R Objects (data structures)
str(object)
: object structure summary
dput(object)
: writes an ASCII text representation of an R object
typeof(object)
: R internal object "type" (close to storage.mode())
mode(object)
: object "type" in the sense of Becker, Chambers & Wilks (1988)
storage.mode(object)
: storage mode of an object in the sense of Becker et al. (1988) (often the same as mode)
class(object)
: R is object oriented, everything is an object. class() returns the names of the classes from which the object inherits.
View(object)
: opens the object viewer
edit(object)
: opens the object editor (<=> data.entry(obj))
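As a quick illustration of how these inspection functions differ, here is a minimal sketch on a small integer vector. The name obj is only an example and does not come from the notes above.

```r
# Hypothetical object used only for illustration
obj <- 1:3

typeof(obj)        # "integer": internal type
mode(obj)          # "numeric": integer and double share this mode
storage.mode(obj)  # "integer"
class(obj)         # "integer"
dput(obj)          # prints 1:3, a text form that can be pasted back into R
```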
Data Structures in R
- https://www.pinterest.fr/pin/540713498989659750/
- https://ramnathv.github.io/pycon2014-r/learn/structures.html
- http://fr.slideshare.net/TomsJAragn/understanding-r-for-epidemiologists
- http://stats.stackexchange.com/questions/3212/mode-class-and-type-of-r-objects
Vectors
Vector: 1-D array storing data elements of the same mode (numeric, complex, logical, character or raw). A single number such as 33 or a single string such as "toto" is still a vector of length 1. R has no more basic type than these, which is why they are called atomic vectors.
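The claim that single values are vectors of length 1 can be checked directly. A small sketch, with x and s as illustrative names:

```r
# Illustrative values only
x <- 33
s <- "toto"

is.vector(x)   # TRUE: a single number is already a vector
length(x)      # 1
length(s)      # 1: one string, whatever its number of characters
is.atomic(s)   # TRUE
x[1]           # 33: indexing works as on any other vector
```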
R has 6 basic ('atomic') vector types :
typeof | mode | storage.mode |
---|---|---|
logical | logical | logical |
integer | numeric | integer |
double | numeric | double |
complex | complex | complex |
character | character | character |
raw | raw | raw |
v = 28.4 # mode(v) = [1] "numeric", typeof(v) = [1] "double"
i = 4L # typeof(i) = [1] "integer"
s = "abc" # mode(s) = [1] "character"
b = TRUE # mode(b) = [1] "logical"
v = c(1, 2, 3)
## [1] 1 2 3
length(v)
## [1] 3
names(v) = c('v1','v2','v3')
> v
v1 v2 v3
1 2 3
Constructors
v = character(4) ##[1] "" "" "" ""
v = logical(2) ##[1] FALSE FALSE
v = numeric(3)
v = vector(mode="numeric", 3)
v = vector("numeric", 3)
# [1] 0 0 0
i = 1:3 #typeof(i): integer
## [1] 1 2 3
rep(1, 5)
## [1] 1 1 1 1 1
seq(1, 5, .5)
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
ip = installed.packages()[,c(1,3:4)]
> colnames(ip)
[1] "Package" "Version" "Priority"
> attributes(ip)$dim
[1] 150 3
> attributes(ip)$dimnames
[[1]]
[1] "acepack" "assertthat" "backports" "base64enc" "BH" "bindr" "bindrcpp"
...
[148] "tools" "translations" "utils"
[[2]]
[1] "Package" "Version" "Priority"
# removing row names
rownames(ip) = NULL
| numeric | integer | character | logical | complex |
---|---|---|---|---|---|
Example | x = 12 | i = 4L | s = 'dude' | b = FALSE | z = 3 + 2i |
Values | 2.1e23, 3.14 | | | | Re(z), Im(z) |
Constructor | numeric(1) | integer(1) | character(1) | logical(1) | complex(1) |
Constructor result | [1] 0 | [1] 0 | [1] "" | [1] FALSE | [1] 0+0i |
Type checking | is.numeric() | is.integer() | is.character() | is.logical() | is.complex() |
mode() | [1] "numeric" | [1] "numeric" | [1] "character" | [1] "logical" | [1] "complex" |
Object type: typeof() or storage.mode() | [1] "double" | [1] "integer" | [1] "character" | [1] "logical" | [1] "complex" |
class() | [1] "numeric" | [1] "integer" | [1] "character" | [1] "logical" | [1] "complex" |
length() | [1] 1 | [1] 1 | [1] 1 | [1] 1 | [1] 1 |
nchar() | [1] 2 | [1] 1 | [1] 4 | [1] 5 | [1] 4 |
Special Values
- Constants: pi, letters, LETTERS, month.abb, month.name
NA
: Not Available (missing value / empty). x = NA; test with is.na(x)
Inf
: Infinite. ex: -5/0 ## [1] -Inf. Test with is.infinite(x), is.finite(x)
NaN
: Not a Number. ex: 1/0 - 1/0 ## [1] NaN. Test with is.nan(x)
NULL
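The special values are easy to confuse, so the following sketch contrasts the test functions on each of them. It uses base R only; vals is an illustrative name.

```r
# How the test functions treat each special value
is.na(NA)       # TRUE
is.na(NaN)      # TRUE: NaN also counts as missing
is.nan(NA)      # FALSE: NA is not NaN

vals <- c(1, Inf, NA, NaN)   # illustrative vector
is.finite(vals)              # TRUE FALSE FALSE FALSE

is.null(NULL)   # TRUE
length(NULL)    # 0: NULL holds nothing
length(NA)      # 1: NA is a value, so it still occupies one slot
```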
getOption("digits") ## 7 is default
options(digits = 4)
pi
is.na(NA) ## [1] TRUE
NA == 1 ## NA
NA == NA ## NA
anyNA(x) ## any NA value found TRUE-FALSE
na.omit(df) ## remove rows with missing values (for any var)
df <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA))
## x y
## 1 1 0
## 2 2 10
## 3 3 NA
na.omit(df)
## x y
## 1 1 0
## 2 2 10
Remove NA from a Vector
x = c(1,2,NA,4)
x[!is.na(x)]
which(is.na(x)) # [1] 3
Warning with NA
x <- c(1,2,3,4,5,6,NA,8,9,10)
mean(x) ##[1] NA
mean(x, na.rm = TRUE) ##[1] 5.33
Vector Constructors
x<-1:10 ## x [1] 1 2 3 4 5 6 7 8 9 10
x <- -3:4 ## [1] -3 -2 -1 0 1 2 3 4
x <- 5:3 ## [1] 5 4 3
x <- 1:10-1 ## <=> (1:10)-1 = [1] 0 1 2 3 4 5 6 7 8 9
seq and sequence
seq(1, 2, 0.25) ##[1] 1.00 1.25 1.50 1.75 2.00
seq(length=9, from=1, to=5) ##[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
sequence(2:4) ##[1] 1 2 1 2 3 1 2 3 4
sequence(c(2,4)) ##[1] 1 2 1 2 3 4
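Two related helpers, seq_len() and seq_along(), are often safer than 1:n when n can be zero. The sketch below uses illustrative names n and v.

```r
n <- 5
seq_len(n)                  # 1 2 3 4 5
seq_len(0)                  # integer(0), whereas 1:0 gives 1 0

v <- c("a", "b", "c")       # illustrative vector
seq_along(v)                # 1 2 3

seq(1, 5, length.out = 9)   # same values as seq(1, 5, by = 0.5)
sequence(c(2, 4))           # 1 2 1 2 3 4: seq_len(2) followed by seq_len(4)
```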
Repetitions: rep(x, n)
: x is repeated n times
rep(1, 10) ##[1] 1 1 1 1 1 1 1 1 1 1
rep(1:3, 2) ##[1] 1 2 3 1 2 3
rep(1:3, 1:3) ##[1] 1 2 2 3 3 3
x<-1:5; rep(x,3) ## [1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5
Vectors: Accessing elements
x[i]
x[c(2,4,10)]
x[1:10]
Vectors: Modify elements
x[1] <- 0
: replaces the value of the first element with 0
x[-1:-5]
: returns x without elements 1 to 5 (x itself is unchanged unless the result is assigned back)
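Besides positions, a vector can be indexed by a logical condition or by name. A minimal sketch on a hypothetical named vector:

```r
# Hypothetical named vector, not taken from the notes
x <- c(a = 10, b = 20, c = 30, d = 40)

x[x > 15]        # logical indexing: elements b, c, d
x[c("a", "d")]   # indexing by name
x[-1]            # copy without the first element; x is unchanged
length(x)        # still 4

x[x > 25] <- 0   # assigning through a logical index does modify x
x                # a = 10, b = 20, c = 0, d = 0
```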
Vector Conversion
as.numeric(expr)
: to numeric
- FALSE -> 0
- TRUE -> 1
- "3" -> 3
- "foo" -> NA (with a warning)
- implicit conversion: 5 + TRUE -> 6
as.integer(expr)
: to integer
- 3.14 -> 3
as.character(expr)
: to character
- 12 -> "12"
- FALSE -> "FALSE"
- NA -> NA
- coercion: v <- c(1, 2, 3, 4, "Pi") gives [1] "1" "2" "3" "4" "Pi"
as.logical(expr)
: to logical
- 0 -> FALSE
- Numeric ≠ 0 -> TRUE
- NA -> NA
- "FALSE", "false", "False", "F" -> FALSE
- "TRUE", "true", "True", "T" -> TRUE
- Other strings -> NA
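The rules above follow a coercion order (logical, then integer, then double, then character) that c() applies automatically. A short sketch, with v, w and n as illustrative names:

```r
v <- c(TRUE, 2L, 3.5)   # logical and integer are promoted to double
typeof(v)               # "double"

w <- c(v, "Pi")         # one string turns everything into character
typeof(w)               # "character"

# "foo" cannot be parsed as a number: the result is NA (warning silenced here)
n <- suppressWarnings(as.numeric(c("3", "foo")))
n                       # 3 NA

as.logical(c("TRUE", "T", "yes", "0"))   # TRUE TRUE NA NA
as.logical(0:2)                          # FALSE TRUE TRUE
```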
Vector to Array:
Adding a dimension attribute to a vector turns it into a matrix (2 dimensions) or an array.
v = c(1,2,3,4)
dim(v)
## NULL
# set the dim attribute
dim(v) <- c(2,2)
v
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
attributes(v)
## $dim
## [1] 2 2
class(v)
## [1] "matrix" "array" (R >= 4.0.0; earlier versions return only "matrix")
This is equivalent to the simpler v = matrix(v, 2, 2)
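Since the value of class() on a matrix depends on the R version, is.matrix() or inherits() is a more portable check. A minimal sketch, where m is an illustrative copy of v:

```r
v <- 1:6
m <- v
dim(m) <- c(2, 3)       # filled column by column

is.matrix(v)            # FALSE
is.matrix(m)            # TRUE
inherits(m, "matrix")   # TRUE, whatever else class(m) contains

identical(m, matrix(1:6, 2, 3))   # TRUE: same object as the matrix() call
identical(as.vector(m), v)        # TRUE: dropping dim gives the vector back
```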
Factor/Ordered (Vector)
Factors are special vectors for handling categorical data. A factor stores the values of the categorical variable together with the set of allowed levels. Ordered inherits from factor.
factor(x, levels = sort(unique(x), na.last = TRUE)
, labels = levels, exclude = NA, ordered = is.ordered(x))
class
: factor
typeof
: integer
mode
: numeric
levels()
: get levels attribute. <=> attributes(f)$levels
gender <- factor(c("male", "female", "female", "male"))
## [1] male female female male
## Levels: female male
of = ordered(4:1)
## [1] 4 3 2 1
## Levels: 1 < 2 < 3 < 4
class(of)
## [1] "ordered" "factor"
attributes(gender)
## $levels
## [1] "female" "male"
##
## $class
## [1] "factor"
levels(gender)
## [1] "female" "male"
x = c(0, 1, 0, 1, 0)
# 0 -> Non, 1-> Oui
factor(x, levels=c(0,1), labels=c('Non','Oui') )
## [1] Non Oui Non Oui Non
## Levels: Non Oui
factor(1:3)
## [1] 1 2 3
## Levels: 1 2 3
factor(1:3, labels=c("A", "B", "C"))
## [1] A B C
## Levels: A B C
factor(1:3, levels=1:5)
## [1] 1 2 3
## Levels: 1 2 3 4 5
f = factor(c(2, 4), levels=2:5)
levels(f) ## [1] "2" "3" "4" "5"
as.numeric(f) #[1] 1 3 (positions of "2" and "4" among the levels)
f = factor(substring("hello", 1:5, 1:5), levels = letters)
## [1] h e l l o
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
as.integer(f)
## [1] 8 5 12 12 15
factor(f) #<=> f[, drop = TRUE]
## [1] h e l l o
## Levels: e h l o
Factor Constructors
gl(n, k)
: Generate Levels (n: number of levels, k: number of replications of each level)
f = gl(3, 5)
## [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
## Levels: 1 2 3
s = gl(2, 4, label=c("M", "F"))
## [1] M M M M F F F F
## Levels: M F
10 alternating 1s and 2s
f = gl(2, 1, 10)
## [1] 1 2 1 2 1 2 1 2 1 2
## Levels: 1 2
alternating pairs of 1s and 2s
f = gl(2, 2, 10)
## [1] 1 1 2 2 1 1 2 2 1 1
Conversion
g = unclass(gender)
## [1] 2 1 1 2
## attr(,"levels")
## [1] "female" "male"
attributes(g) <- NULL
g
## [1] 2 1 1 2
as.character(gender)
: [1] "male" "female" "female" "male"
as.numeric(gender)
: [1] 2 1 1 2
Tip: converting a numeric factor while preserving its numeric values:
as.numeric(factor(c(1,3)))
## [1] 1 2
as.numeric(as.character(factor(c(1,3))))
## [1] 1 3
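An alternative idiom for the same conversion indexes the converted levels by the factor, which converts each level only once. A sketch on a hypothetical factor f:

```r
# Hypothetical numeric-looking factor
f <- factor(c(10, 30, 10, 20))

codes    <- as.numeric(f)                 # 1 3 1 2: internal integer codes
via_char <- as.numeric(as.character(f))   # 10 30 10 20
via_lev  <- as.numeric(levels(f))[f]      # 10 30 10 20: levels converted once
```

Both conversions return the original values; the second can be faster on long factors with few levels.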
Arrays
Array: n-D array storing data of the same mode (n > 2). Class: array
Matrix: 2-D array storing data of the same mode. Class: matrix
A matrix is simply a vector with a dimension attribute added to it. (But a plain vector is not a one-column or one-row matrix.)
Array
A = array(1:8, dim = c(2, 2, 2))
: 3D array
> A
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
, , 2
[,1] [,2]
[1,] 5 7
[2,] 6 8
dimnames(A) <- list(c("a", "b"), c("c", "d"), c("e", "f"))
A[,,"e"]
Matrix (2-D Array)
matrix(data = 1, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
<=> matrix(1)
[,1]
[1,] 1
byrow=FALSE indicates that the matrix should be filled by columns (the default)
- dim(M) = c(nrow(M), ncol(M))
- length(M): nb elements
- dimnames(M): list(rownames, colnames)
M = array(1:6, c(2,3)) # by default fill by column
M = matrix(1:6, ncol=3)
M = matrix(data=1:6, nr=2, nc=3)
M = matrix(1:6, 2,3)
> M
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
colnames(M) = paste("y", 1:3, sep="") # <=> c("y1", "y2", "y3")
rownames(M) = c("x1", "x2")
> M
y1 y2 y3
x1 1 3 5
x2 2 4 6
> dimnames(M)
## [[1]]
## [1] "x1" "x2"
##
## [[2]]
## [1] "y1" "y2" "y3"
> M[, 2]
> M[, "y2"]
## x1 x2
## 3 4
Sub-Matrix of 1 column: drop=FALSE
prevents casting to a vector (dimensions and dimnames are kept)
M[, 2, drop=FALSE]
## y2
## x1 3
## x2 4
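The effect of drop can be seen by comparing the dimensions of both results. A minimal sketch on an unnamed matrix; col_vec and col_mat are illustrative names:

```r
M <- matrix(1:6, ncol = 3)

col_vec <- M[, 2]                 # dimensions dropped: plain vector 3 4
col_mat <- M[, 2, drop = FALSE]   # still a 2 x 1 matrix

dim(col_vec)   # NULL
dim(col_mat)   # 2 1
```

Keeping drop = FALSE matters in code that later calls nrow(), ncol() or matrix products on the result.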
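m1 = matrix(2, nr = 1, nc = 2)
m2 = matrix(2, nr = 2, nc = 2)
## [,1] [,2]
## [1,] 2 2
## [2,] 2 2
rbind(m1, m2) # concat rows (same number of columns required): 3 x 2 matrix
cbind(m2, m2) # concat cols (same number of rows required): 2 x 4 matrix
# cbind(m1, m2) fails because m1 has 1 row and m2 has 2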
Matrix Constructors
M = cbind(1:5, rnorm(5))
colnames(M) = c("i", "norm")
> M
i norm
[1,] 1 -2.10724485
[2,] 2 0.41537749
[3,] 3 -0.50847283
[4,] 4 -0.08884656
[5,] 5 0.29271269
# rbind(x, y): matrix combining the elements of x and y by rows
M <- rbind(1:5, seq(2, 10, by=2))
rownames(M) = c("x1","x2")
> M
## [,1] [,2] [,3] [,4] [,5]
## x1 1 2 3 4 5
## x2 2 4 6 8 10
M = diag(3)
M = diag(1:3)
> M
[,1] [,2] [,3]
[1,] 1 0 0
[2,] 0 2 0
[3,] 0 0 3
diag(M) = 2 #replace values on the diag
> diag(3, nr = 2, nc = 3)
## [,1] [,2] [,3]
## [1,] 3 0 0
## [2,] 0 3 0
Array: Accessing elements
M = matrix(1:6, ncol=3)
M[i, j]
: element at row i, col j
M[, j]
: col j
M[i, ]
: row i
M[5]
: 5th element (column-major order)
M[, c(1,3)]
: columns 1 and 3
M["rname", ]
: row named "rname"
M[, "name"]
: column named "name"
Array: Modify elements
M[-1, ]
: returns M without the first row
M[, -2]
: returns M without the 2nd column
Conversion: Matrix/Array to Vector
M = diag(3)
> as.vector(M)
## [1] 1 0 0 0 1 0 0 0 1
c(M)
## [1] 1 0 0 0 1 0 0 0 1
# Removing dimension attribute casts matrix to vector
> dim(M)
[1] 3 3
dim(M) = NULL
> M
[1] 1 0 0 0 1 0 0 0 1
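Conversion: Matrix/Array to DataFrame
as.data.frame( matrix(1:6, ncol=3) ) # matrix -> data frame with columns V1, V2, V3
as.matrix( data.frame(x=(1:4), n=10) ) # reverse direction: data frame -> matrix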
Lists
List: ordered collection of objects of possibly different types (a heterogeneous data structure). Lists are also referred to as recursive vectors, since a list can contain other lists. List indexing differs from vector and matrix indexing: [ ] returns a sub-list, while [[ ]] or $ returns the element itself.
L = list('foo', c(1, 2))
> L
[[1]]
[1] "foo"
[[2]]
[1] 1 2
names(L) <- c('txt', 'val')
Constructor
L = list(txt = 'foo', val = c(1,2))
> L
$txt
[1] "foo"
$val
[1] 1 2
x = 1:4; y = 7:9
L = list(x, y);
address <- list("Larry Pace", "102 San Mateo Dr.", "Anderson", "SC", 29625)
lst <- list(numbers = c(1, 2), logical = TRUE, strings = c("a", "b", "c"))
Accessing List elements
L[1]
: Sub-List (still a list)
$txt
[1] "foo"
L$val or L[["val"]] or L[[2]]
: 2nd element of the list > [1] 1 2
L$val[2] or L[[2]][2]
: 2nd element of the 2nd list element > [1] 2
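The difference between single and double brackets shows up in the class of the result. A short sketch; the element named new is hypothetical and added only to show assignment and removal:

```r
L <- list(txt = "foo", val = c(1, 2))

class(L["val"])     # "list": single brackets keep the list wrapper
class(L[["val"]])   # "numeric": double brackets extract the element
L[["val"]][2]       # 2

L$new <- TRUE       # adds an element (hypothetical name)
names(L)            # "txt" "val" "new"
L$val <- NULL       # assigning NULL removes an element
length(L)           # 2
```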
Conversion: List to Vector
unlist()
: Returns a vector (all elements coerced to the same mode) with named elements. Names can be removed using names(v) = NULL or unname().
L = list(txt = 'foo', val = c(1,2))
> unlist(L)
## txt val1 val2
## "foo" "1" "2"
>unlist(unname(L))
## [1] "foo" "1" "2"
Conversion: List to Matrix
L = list(id=123, color='red')
> matrix(unlist(L), ncol=length(unlist(L)), nrow=1)
[,1] [,2]
[1,] "123" "red"
Data Frames
Data Frame: 2-D structure whose columns can each store data of a different mode.
A data frame is a special list whose elements are vectors of equal length.
df = data.frame()
: Data Frame with 0 columns and 0 rows.
- dim(df) = nrow() x ncol()
- ncol(df) = length(df)
- nrow(df)
- colnames(df) = names(df): colname attribute is created by default
- rownames(df)
- str(df)
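That a data frame is a list of equal-length columns can be verified with the list functions themselves. A minimal sketch, with stringsAsFactors set explicitly so the result does not depend on the R version:

```r
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

is.list(df)         # TRUE
typeof(df)          # "list"
length(df)          # 2: one list element per column
sapply(df, class)   # x: "numeric", y: "character"
dim(df)             # 3 2
```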
df = data.frame(1,2,3)
> df
X1 X2 X3
1 1 2 3
names(df) = c("x","y","z")
<=> data.frame(x=1, y=2, z=3)
# data frame from vectors
df = data.frame( x = c(1, 2, 3)
,y = c(0, 10, NA)
)
> df
x y
1 1 0
2 2 10
3 3 NA
# nb obs for col x
> length(df$x)
[1] 3
# nb NA for col y
> sum(is.na(df$y))
[1] 1
# nb not NA for col y
> sum(!is.na(df$y))
[1] 2
> table(df$y, useNA="always")
0 10 <NA>
1 1 1
# data frame from vectors of different lengths (the shorter one is recycled)
x <- 1:4; n <- 10;
df = data.frame(x, n)
> df
## x n
## 1 1 10
## 2 2 10
## 3 3 10
## 4 4 10
# rename 2nd col into y
colnames(df)[2] = "y"
L3 <- LETTERS[1:3] ##[1] "A" "B" "C"
df <- data.frame(x = 1, y = 1:5, fac = sample(L3, 5, replace = TRUE))
> df
## x y fac
## 1 1 1 A
## 2 1 2 A
## 3 1 3 C
## 4 1 4 C
## 5 1 5 B
Data Frame Constructors
data frame from the editor
Min_Wage <- data.frame(Year = numeric(), Value = numeric())
Min_Wage <- edit(Min_Wage)
expand.grid()
: creates a data.frame with all combinations of the vectors or factors given as arguments
expand.grid(h=c(60,80), w=c(100, 300), sex=c("M", "F"))
## h w sex
## 1 60 100 M
## 2 80 100 M
## 3 60 300 M
## 4 80 300 M
## 5 60 100 F
## 6 80 100 F
## 7 60 300 F
## 8 80 300 F
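Two properties of the output above are worth checking: the number of rows is the product of the argument lengths, and the first argument varies fastest. A sketch reusing the same call, with g as an illustrative name:

```r
g <- expand.grid(h = c(60, 80), w = c(100, 300), sex = c("M", "F"))

nrow(g)            # 8, that is 2 * 2 * 2
g$h[1:4]           # 60 80 60 80: the first argument varies fastest
is.factor(g$sex)   # TRUE: character arguments become factors by default
```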
Accessing data
df$y
: column named y (<=> df[, "y"] or df[, 2]), returns a vector
##[1] 10 10 10 10
df["y"]
: returns a data.frame.
Access columns by name or position:
df[c("x1","x2")]
df[,c(1,2)]
subset(df, select=c(1,2))
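Which forms return a vector and which return a data frame can be checked side by side. A minimal sketch on a small example data frame:

```r
df <- data.frame(x = 1:4, y = 10)

# These three return the column as a plain vector
identical(df$y, df[["y"]])     # TRUE
identical(df$y, df[, "y"])     # TRUE

# These two keep the data frame structure
is.data.frame(df["y"])                   # TRUE
is.data.frame(df[, "y", drop = FALSE])   # TRUE
```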
Conversion DataFrame to Matrix
> as.matrix(df)
x y
[1,] 1 0
[2,] 2 10
[3,] 3 NA
R objects and Classes
https://www.programiz.com/r-programming/object-class-introduction http://docs.renjin.org/en/latest/library/moving-data-between-java-and-r-code.html https://stackoverflow.com/questions/24052158/in-r-why-is-matrix-a-class-but-a-vector-is-not
NULL
: NULL
symbol
: a variable name (mode: "name")
pairlist
: a pairlist object (similar to list, mainly internal)
closure
: a function
environment
: an environment
promise
: an object used to implement lazy evaluation
language
: an R language construct (mode: "(" or "call")
special
: an internal function that does not evaluate its arguments (mode: "function")
builtin
: an internal function that evaluates its arguments (mode: "function")
char
: a ‘scalar’ string object (internal only) ***
logical
: a vector containing logical values
integer
: a vector containing integer values (mode: "numeric")
double
: a vector containing real values (mode: "numeric")
complex
: a vector containing complex values
character
: a vector containing character values
...
: the special variable length argument ***
any
: a special type that matches all types: there are no objects of this type
expression
: an expression object
list
: a list
bytecode
: byte code (internal only) ***
externalptr
: an external pointer object
weakref
: a weak reference object
raw
: a vector containing bytes
S4
: an S4 object which is not a simple object
Modes have the same set of names as types (see typeof), except for the types whose mode is given in parentheses above.
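The cases where mode and type differ can be checked on a few base R objects. A small sketch; the quoted expressions are arbitrary examples:

```r
typeof(sum);          mode(sum)            # "builtin"  / "function"
typeof(mean);         mode(mean)           # "closure"  / "function"
typeof(quote(x));     mode(quote(x))       # "symbol"   / "name"
typeof(quote(x + 1)); mode(quote(x + 1))   # "language" / "call"
typeof(quote((x)));   mode(quote((x)))     # "language" / "("
typeof(1L);           mode(1L)             # "integer"  / "numeric"
```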
| Vector | Array / Matrix (2D) | List | Data Frame |
---|---|---|---|---|
Example | v = c(1, 2, 3) | M = array(1:6, c(2,3)) | L = list('foo', c(1, 2)) | df = data.frame(x = c(1, 2, 3), y = c(0, 10, NA)) |
Type checking | is.vector() | is.matrix(M), is.array(A) | is.list() | is.data.frame() |
class() | [1] "numeric" | [1] "matrix" "array" (only "matrix" before R 4.0.0) | [1] "list" | [1] "data.frame" |
mode() | [1] "numeric" | [1] "numeric" | [1] "list" | [1] "list" |
typeof() | [1] "double" | [1] "integer" | [1] "list" | [1] "list" |
length() | [1] 3 | [1] 6 | [1] 2 | [1] 2 #nb cols |
nrow() | NULL | [1] 2 | NULL | [1] 3 #nb rows |
dim() | NULL | [1] 2 3 | NULL | [1] 3 2 # nrows, ncols |
dimnames() | NULL | dimnames(M) | NULL | |
colnames(A) ⇔ dimnames(A)[[2]] | NULL | colnames(M) = paste("y", 1:3, sep="") | NULL | [1] "x" "y" |
rownames(A) ⇔ dimnames(A)[[1]] | NULL | rownames(M) = c("x1","x2") | NULL | [1] "1" "2" "3" |
names() (unname() to remove names) | names(v) = c('v1','v2','v3') | | names(L) = c('txt', 'val') | names(df) = c('x','y') |
attributes() : list object attributes | $names | $dim, $dimnames | $names | $names, $row.names, $class |
Conversion | as.vector(M) | as.matrix(df) | as.list() | as.data.frame(M) |