2.3 Factors - JulTob/R GitHub Wiki
The term factor refers to a statistical data type used to store categorical variables. The difference between a categorical variable and a continuous variable is that a categorical variable can take only a limited number of values (its categories), whereas a continuous variable can take any of an infinite number of values.
It is important that R knows whether it is dealing with a continuous or a categorical variable, as the statistical models you will develop in the future treat both types differently.
To create factors in R, use the function factor(). The first thing you have to do is create a vector that contains all the observations that belong to a limited number of categories. For example, sex_vector contains the sex of 5 different individuals:
# Sex vector
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
# Convert sex_vector to a factor
factor_sex_vector <- factor(sex_vector)
# Print out factor_sex_vector
factor_sex_vector
[1] Male Female Female Male Male
Levels: Female Male
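As a rough sketch of what factor() produces: a factor stores each observation as an integer code that points into a set of levels, which are sorted alphabetically by default. The snippet below rebuilds sex_vector from above and inspects it with the base R functions levels(), nlevels() and as.integer(); the result variable names are only illustrative.

```r
sex_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_sex_vector <- factor(sex_vector)

# The distinct categories, sorted alphabetically by default
sex_levels <- levels(factor_sex_vector)     # "Female" "Male"

# Number of categories
n_levels <- nlevels(factor_sex_vector)      # 2

# Each observation is stored as an integer index into the levels
sex_codes <- as.integer(factor_sex_vector)  # 2 1 1 2 2

print(sex_levels)
print(n_levels)
print(sex_codes)
```

This is why "Female" is listed before "Male" in the Levels line even though "Male" appears first in the data.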
There are two types of categorical variables: a nominal categorical variable and an ordinal categorical variable.
- A nominal variable is a categorical variable without an implied order. This means that it is impossible to say that 'one is worth more than the other'. For example, the categories of animals_vector ("Elephant", "Giraffe", "Donkey" and "Horse") cannot be ranked.
- In contrast, ordinal variables do have a natural ordering. Consider for example the categorical variable temperature_vector with the categories "Low", "Medium" and "High". Here it is obvious that "Medium" stands above "Low", and "High" stands above "Medium".
# Animals
animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
factor_animals_vector <- factor(animals_vector)
factor_animals_vector
# Temperature
temperature_vector <- c("High", "Low", "High", "Low", "Medium")
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
factor_temperature_vector
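To make the difference between the two factors concrete, the following sketch checks each one with is.ordered() and compares their levels. It rebuilds both vectors so that it runs on its own; the result variable names are illustrative only.

```r
animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
factor_animals_vector <- factor(animals_vector)

temperature_vector <- c("High", "Low", "High", "Low", "Medium")
factor_temperature_vector <- factor(temperature_vector,
                                    ordered = TRUE,
                                    levels = c("Low", "Medium", "High"))

# is.ordered() reports whether a factor carries an ordering
animals_ordered <- is.ordered(factor_animals_vector)          # FALSE
temperature_ordered <- is.ordered(factor_temperature_vector)  # TRUE

# Levels of the nominal factor are simply sorted alphabetically
animal_levels <- levels(factor_animals_vector)

# Levels of the ordinal factor keep the order given in `levels`
temperature_levels <- levels(factor_temperature_vector)

print(animal_levels)
print(temperature_levels)
```

When printed, the ordered factor shows its levels joined by `<` (Low < Medium < High), while the nominal factor lists them without any relation.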
Alias
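You can rename the levels of a factor by assigning new names with levels(). The new names must follow the current order of the levels, which is alphabetical by default: "F" comes before "M", so the names are given as c("Female", "Male"):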
# Code to build factor_survey_vector
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
factor_survey_vector
# Specify the levels of factor_survey_vector
levels(factor_survey_vector) <- c("Female", "Male")
factor_survey_vector
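Assigning to levels() matches names by position, so a mistake in the order silently swaps the categories. One possible way to make the mapping explicit, sketched here, is to pass both `levels` and `labels` to factor() when the factor is created. This is an alternative to the approach above, not a replacement for it.

```r
survey_vector <- c("M", "F", "F", "M", "M")

# Pair each raw value with its display name explicitly:
# "F" becomes "Female" and "M" becomes "Male"
factor_survey_vector <- factor(survey_vector,
                               levels = c("F", "M"),
                               labels = c("Female", "Male"))

# Inspect the relabelled observations as plain strings
survey_labels <- as.character(factor_survey_vector)
print(survey_labels)
```

The first observation "M" comes out as "Male", regardless of how the raw codes would sort.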
For ordered factors, set ordered = TRUE and list the levels from lowest to highest:
# Create speed_vector
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
# Convert speed_vector to ordered factor vector
factor_speed_vector <- factor(speed_vector,
                              ordered = TRUE,
                              levels = c("slow", "medium", "fast"))
# Print factor_speed_vector
factor_speed_vector
summary(factor_speed_vector)
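For a factor, summary() returns the number of observations per level, reported in level order rather than in order of appearance. A small self-contained sketch, with illustrative variable names:

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector,
                              ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

# Named counts per level: slow = 2, medium = 2, fast = 1
speed_counts <- summary(factor_speed_vector)
print(speed_counts)

# table() gives the same counts as a contingency table
speed_table <- table(factor_speed_vector)
print(speed_table)
```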
Because the levels of an ordered factor have a defined order, its elements can be compared:
# Create factor_speed_vector
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("slow", "medium", "fast"))
# Factor value for second data analyst
da2 <- factor_speed_vector[2]
# Factor value for fifth data analyst
da5 <- factor_speed_vector[5]
# Is data analyst 2 faster than data analyst 5?
da2 > da5
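To spell out what the comparison returns, the sketch below repeats it and adds two related uses: a vectorised comparison against a level name, and max() on the ordered factor. It also shows, as a contrast, that the same comparison on an unordered factor yields NA with a warning. Variable names other than those used above are illustrative.

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector,
                              ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

da2 <- factor_speed_vector[2]  # "slow"
da5 <- factor_speed_vector[5]  # "fast"

# "slow" is below "fast" in the level order, so this is FALSE
da2_faster <- da2 > da5
print(da2_faster)

# Comparisons are vectorised and also accept a level name as a string
at_least_medium <- factor_speed_vector >= "medium"
print(at_least_medium)

# The highest level that occurs in the data
fastest <- max(factor_speed_vector)
print(fastest)

# Without ordered = TRUE the comparison is not meaningful:
# R returns NA and raises a warning (suppressed here)
unordered <- factor(speed_vector)
unordered_result <- suppressWarnings(unordered[2] > unordered[5])
print(unordered_result)
```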