Lab 3 - mayumispice/R GitHub Wiki

#3 - Central Tendency

examAnxiety <- read.csv("examAnxiety.csv", stringsAsFactors = FALSE)

The mean

mean (examAnxiety$Exam.Score)
mean (examAnxiety[,3])

The median

median (examAnxiety$Exam.Score)
median (examAnxiety[,3])

The mode

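# base R has no built-in statistical mode, so tabulate the scores and take the most frequent one
as.numeric(names(which.max(table(examAnxiety$Exam.Score))))
as.numeric(names(which.max(table(examAnxiety[,3]))))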
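
Base R has no function for the statistical mode (mode() reports the storage type of an object instead). Below is a minimal sketch of a helper that returns every value tied for the highest count. The name stat_mode and the scores are made up for illustration; they are not part of the lab data.

```r
# Hypothetical helper: returns all values tied for the highest frequency
stat_mode <- function(x) {
  x <- x[!is.na(x)]            # ignore missing values
  counts <- table(x)           # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])
}

scores <- c(40, 60, 60, 80, 80, 100)  # made-up scores, not examAnxiety
stat_mode(scores)
```

With real data, the same call would be stat_mode(examAnxiety$Exam.Score).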

The standard deviation (SD)

sd (examAnxiety$Exam.Score)
sd (examAnxiety[,3])

IQR and range - two ways

range(examAnxiety$Exam.Score)
min(examAnxiety$Exam.Score)
max(examAnxiety$Exam.Score)

also... IQR

> IQR(examAnxiety$Exam.Score)
[1] 40
> quantile(examAnxiety$Exam.Score)
  0%  25%  50%  75% 100% 
   2   40   60   80  100 
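
To see how these pieces relate, here is a small sketch on made-up values (not the lab data): the IQR is the 75% quantile minus the 25% quantile, and the spread of the range is max minus min.

```r
scores <- c(2, 40, 60, 80, 100)        # made-up values for illustration

q <- quantile(scores, c(0.25, 0.75))   # 1st and 3rd quartiles
iqr_by_hand <- unname(q[2] - q[1])     # 3rd quartile minus 1st quartile
iqr_by_hand
IQR(scores)                            # same value from the built-in

spread <- diff(range(scores))          # range() gives c(min, max); diff() subtracts them
spread
```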

Summary

> summary(examAnxiety$Exam.Score)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00   40.00   60.00   56.57   80.00  100.00 

Running commands by groups - introduction to the "by" function

by(variable of interest, grouping variable to split "by", function to run on each group)

> by(examAnxiety$Exam.Score, examAnxiety$Sex, mean)
examAnxiety$Sex: Female
[1] 56.45098
----------------------------------------------------------------- 
examAnxiety$Sex: Male
[1] 56.69231


> mean(examAnxiety$Exam.Score[examAnxiety$Sex=="Male"])
[1] 56.69231
> mean(examAnxiety$Exam.Score[examAnxiety$Sex=="Female"])
[1] 56.45098
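
Two other base R functions give the same group-wise result in a more compact form. This is a sketch on a tiny made-up data frame (df is a stand-in, not examAnxiety): tapply() returns a named vector, aggregate() returns a data frame.

```r
# Made-up data standing in for examAnxiety
df <- data.frame(
  Exam.Score = c(50, 70, 60, 80),
  Sex = c("Female", "Female", "Male", "Male")
)

# tapply(values, groups, function): one named result per group
group_means <- tapply(df$Exam.Score, df$Sex, mean)
group_means

# aggregate() with a formula: the same means, returned as a data frame
group_df <- aggregate(Exam.Score ~ Sex, data = df, FUN = mean)
group_df
```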

missing data

let's set a few scores to NA to simulate missing data

> examAnxiety$Exam.Score[c(1,4,5)] <- NA

> mean (examAnxiety$Exam.Score)
[1] NA

by default mean() returns NA if any value is missing; we can work around this with the na.rm argument

> mean (examAnxiety$Exam.Score,na.rm = TRUE)
[1] 56.67
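
The same idea on a small made-up vector (not the lab data): count the missing values first, then drop them either with na.rm = TRUE or by subsetting. median() and sd() accept na.rm too.

```r
scores <- c(NA, 40, 60, NA, 80)            # made-up scores with two missing values

n_missing <- sum(is.na(scores))            # how many values are NA
n_missing

m1 <- mean(scores, na.rm = TRUE)           # drop NAs inside mean()
m2 <- mean(scores[!is.na(scores)])         # or drop them by subsetting first
m1

median(scores, na.rm = TRUE)               # na.rm works the same way here
sd(scores, na.rm = TRUE)
```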

introduction to the "apply" function - running functions across multiple columns/rows of data!

apply(examAnxiety[,-c(1,5)],2, median) # apply(dataframe, row(1)/column(2), function)

> apply(examAnxiety[,-c(1,5)],2, median)
Study.Hours  Exam.Score     Anxiety 
     15.000      60.000      79.044 

or, without the comma (a data frame can also be indexed by column alone)

> apply(examAnxiety[-c(1,5)],2, median)

> apply(examAnxiety[-c(1,5)],2, median) # apply(dataframe, row(1)/column(2), function)
Study.Hours  Exam.Score     Anxiety 
     15.000      60.000      79.044 

similarly

> summary(examAnxiety[-c(1,5)])
  Study.Hours      Exam.Score        Anxiety      
 Min.   : 0.00   Min.   :  2.00   Min.   : 0.056  
 1st Qu.: 8.00   1st Qu.: 40.00   1st Qu.:69.775  
 Median :15.00   Median : 60.00   Median :79.044  
 Mean   :19.85   Mean   : 56.57   Mean   :74.344  
 3rd Qu.:23.50   3rd Qu.: 80.00   3rd Qu.:84.686  
 Max.   :98.00   Max.   :100.00   Max.   :97.582  
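
Extra arguments given to apply() are passed on to the function, which is handy when columns contain NA. Here is a sketch on a made-up data frame (df is hypothetical, not examAnxiety). sapply() is shown as an alternative that loops over the columns of a data frame directly.

```r
# Made-up data with a missing value in each column
df <- data.frame(
  Study.Hours = c(5, 10, NA, 20),
  Exam.Score = c(40, 60, 80, NA)
)

# na.rm = TRUE is handed through to median() for every column
col_medians <- apply(df, 2, median, na.rm = TRUE)
col_medians

# sapply() treats the data frame as a list of columns; no row/column argument needed
col_medians2 <- sapply(df, median, na.rm = TRUE)
col_medians2
```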

basic scatterplots and histograms in R

hist(examAnxiety$Study.Hours)

plot(examAnxiety$Study.Hours,examAnxiety$Exam.Score) # scatterplot: x = study hours, y = exam score