Lab 3 - mayumispice/R GitHub Wiki
#3 - Central Tendency
examAnxiety <- read.csv("examAnxiety.csv", stringsAsFactors = FALSE)
The mean
mean(examAnxiety$Exam.Score)  # select the column by name
mean(examAnxiety[,3])         # same column selected by position (Exam.Score is column 3)
The median
median (examAnxiety$Exam.Score)
median (examAnxiety[,3])
The mode
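# base R has no function for the statistical mode (mode() returns the storage type),
# so tabulate the values and take the most frequent one (on a tie, the first is returned)
as.numeric(names(which.max(table(examAnxiety$Exam.Score))))
as.numeric(names(which.max(table(examAnxiety[,3]))))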
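R's built-in mode() reports how an object is stored rather than its most frequent value, so the statistical mode has to be computed by hand. The sketch below is one possible helper that returns every tied value. The name stat_mode and the scores vector are made up for illustration and are not part of the lab data.

```r
# Hypothetical scores, for illustration only (not from examAnxiety.csv)
scores <- c(40, 60, 60, 80, 80, 100)

# Return every value that shares the highest frequency
stat_mode <- function(x) {
  x <- x[!is.na(x)]                              # drop missing values first
  counts <- table(x)                             # frequency of each distinct value
  modes <- names(counts)[counts == max(counts)]  # keep all values tied for the top count
  as.numeric(modes)                              # table() names are character strings
}

stat_mode(scores)   # 60 and 80 both occur twice, so both are returned
```

Returning all ties is a design choice; which.max() on the table would instead report only the first of the tied values.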
The standard deviation (SD)
sd (examAnxiety$Exam.Score)
sd (examAnxiety[,3])
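To see what sd() computes, the sample standard deviation can be rebuilt step by step: take each score's deviation from the mean, square and sum the deviations, divide by n - 1, and take the square root. The scores vector below is hypothetical and only serves the illustration.

```r
# Hypothetical scores, for illustration only
scores <- c(2, 40, 60, 80, 100)

n <- length(scores)
deviations <- scores - mean(scores)             # distance of each score from the mean
manual_sd <- sqrt(sum(deviations^2) / (n - 1))  # sample SD divides by n - 1, not n

manual_sd
sd(scores)                                      # built-in result for comparison
all.equal(manual_sd, sd(scores))                # TRUE when the two agree
```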
Range and IQR - the range two ways: range(), or min() and max() separately
range(examAnxiety$Exam.Score)
min(examAnxiety$Exam.Score)
max(examAnxiety$Exam.Score)
The IQR (interquartile range), and the quartiles it is computed from
> IQR(examAnxiety$Exam.Score)
[1] 40
> quantile(examAnxiety$Exam.Score)
0% 25% 50% 75% 100%
2 40 60 80 100
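The IQR is the distance between the 75% and 25% quantiles, and the width of the range is the maximum minus the minimum. A minimal sketch, using five made-up values chosen to mirror the quartiles printed above (they are not the actual exam scores):

```r
# Hypothetical values, chosen so the quartiles match the printed output above
scores <- c(2, 40, 60, 80, 100)

q <- quantile(scores)                        # 0%, 25%, 50%, 75%, 100%
iqr_manual <- unname(q["75%"] - q["25%"])    # third quartile minus first quartile

iqr_manual
IQR(scores)                                  # same value from the built-in function

range_width <- diff(range(scores))           # max minus min
range_width
```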
Summary
> summary(examAnxiety$Exam.Score)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.00 40.00 60.00 56.57 80.00 100.00
Running commands by groups - introduction to the "by" function
by(variable of interest, grouping variable to split by, function to apply)
> by(examAnxiety$Exam.Score, examAnxiety$Sex, mean)
examAnxiety$Sex: Female
[1] 56.45098
-----------------------------------------------------------------
examAnxiety$Sex: Male
[1] 56.69231
the same group means, computed by subsetting with a logical condition
> mean(examAnxiety$Exam.Score[examAnxiety$Sex=="Male"])
[1] 56.69231
> mean(examAnxiety$Exam.Score[examAnxiety$Sex=="Female"])
[1] 56.45098
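Base R offers other ways to get the same per-group statistics as by(). The sketch below uses tapply() and aggregate() on a small made-up data frame; toy and its values are hypothetical, and only the column names mirror the lab data.

```r
# Hypothetical data frame, for illustration only; column names mirror the lab's
toy <- data.frame(
  Exam.Score = c(50, 70, 60, 80),
  Sex = c("Female", "Female", "Male", "Male")
)

# tapply(values, grouping variable, function) returns one named result per group
group_means <- tapply(toy$Exam.Score, toy$Sex, mean)
group_means

# aggregate() with a formula returns a data frame with one row per group
agg <- aggregate(Exam.Score ~ Sex, data = toy, FUN = mean)
agg
```

tapply() is convenient when a plain named vector is enough; aggregate() is handier when the result should stay a data frame.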
Missing data
Let's set a few scores to NA to simulate a dataset with missing values
> examAnxiety$Exam.Score[c(1,4,5)] <- NA
> mean (examAnxiety$Exam.Score)
[1] NA
mean() returns NA as soon as any value is missing; the workaround is the na.rm = TRUE argument, which drops the NAs before computing
> mean (examAnxiety$Exam.Score,na.rm = TRUE)
[1] 56.67
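Before dropping missing values it helps to count them, and na.rm works the same way in median() and sd(). A small sketch on a made-up vector (the values are hypothetical, not the lab data):

```r
# Hypothetical scores with two missing values, for illustration only
scores <- c(NA, 40, 60, NA, 80)

n_missing <- sum(is.na(scores))           # is.na() flags each NA; sum() counts the TRUEs
n_missing

mean(scores)                              # NA, because missing values propagate

clean_mean <- mean(scores, na.rm = TRUE)  # mean of 40, 60, 80
clean_median <- median(scores, na.rm = TRUE)
clean_sd <- sd(scores, na.rm = TRUE)
c(clean_mean, clean_median, clean_sd)
```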
Introduction to the "apply" function - running a function across multiple columns or rows of data
apply(data frame, 1 for rows / 2 for columns, function); the index -c(1,5) drops columns 1 and 5 so only the numeric columns remain
> apply(examAnxiety[,-c(1,5)],2, median)
Study.Hours Exam.Score Anxiety
15.000 60.000 79.044
or, without the comma (a single index on a data frame selects columns)
> apply(examAnxiety[-c(1,5)],2, median) # apply(data frame, 1 for rows / 2 for columns, function)
Study.Hours Exam.Score Anxiety
15.000 60.000 79.044
similarly, summary() works on several columns at once
> summary(examAnxiety[-c(1,5)])
Study.Hours Exam.Score Anxiety
Min. : 0.00 Min. : 2.00 Min. : 0.056
1st Qu.: 8.00 1st Qu.: 40.00 1st Qu.:69.775
Median :15.00 Median : 60.00 Median :79.044
Mean :19.85 Mean : 56.57 Mean :74.344
3rd Qu.:23.50 3rd Qu.: 80.00 3rd Qu.:84.686
Max. :98.00 Max. :100.00 Max. :97.582
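If the columns contain NA values, as after the missing-data step, median() returns NA for those columns unless na.rm is passed along. apply() forwards any extra arguments to the function it calls, and sapply() is a common alternative for looping over data frame columns. The toy data frame below is hypothetical; only its column names mirror the lab data.

```r
# Hypothetical data frame with one missing score, for illustration only
toy <- data.frame(
  Study.Hours = c(5, 10, 15, 20),
  Exam.Score  = c(NA, 40, 60, 80),
  Anxiety     = c(90, 80, 70, 60)
)

# Arguments after the function name are passed on to it, so na.rm reaches median()
col_medians <- apply(toy, 2, median, na.rm = TRUE)
col_medians

# sapply() visits each column of the data frame directly
col_means <- sapply(toy, mean, na.rm = TRUE)
col_means
```

apply() converts the data frame to a matrix first, which is why non-numeric columns are dropped before calling it; sapply() avoids that conversion.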
basic scatterplots and histograms in R
hist(examAnxiety$Study.Hours)  # histogram of a single variable
plot(examAnxiety$Study.Hours, examAnxiety$Exam.Score)  # scatterplot: x = study hours, y = exam score
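Both plotting functions accept titles and axis labels, and hist() also returns the bin counts it drew. A minimal sketch on made-up vectors (study_hours and exam_score are hypothetical values, not the lab data):

```r
# Hypothetical values, for illustration only
study_hours <- c(2, 5, 8, 12, 15, 15, 20, 24, 30, 45)
exam_score  <- c(30, 35, 50, 55, 60, 65, 70, 72, 85, 90)

# breaks sets the bin edges; main and xlab set the title and x-axis label
h <- hist(study_hours,
          breaks = seq(0, 50, by = 10),
          main = "Study hours",
          xlab = "Hours studied")
h$counts   # number of observations in each bin

# Scatterplot with labelled axes: first argument is x, second is y
plot(study_hours, exam_score,
     main = "Exam score by study hours",
     xlab = "Hours studied",
     ylab = "Exam score")
```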