Math Underlying a One-Way Repeated Measures ANOVA
Here we take a look at the math that underlies a one-way repeated measures ANOVA. We also build a function that prints each part of the calculation step by step.
We are essentially taking the total variability of the outcome and identifying what proportion of it is explained by a single repeated factor (typically time) in the model versus what is left over. To do this we need to calculate five types of sums of squares:
- Total Sum of Squares
- Between-Subjects Sum of Squares
- Within-Subjects Sum of Squares
- Time (Within-Subjects) Sum of Squares
- Residual (Error) Sum of Squares
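The five quantities above can be written out as formulas. The notation is an assumption chosen to match the code further down: $Y_{ij}$ is the outcome for subject $i$ at time point $j$, with $n$ subjects and $a$ time points; $\bar{Y}_{..}$ is the grand mean, $\bar{Y}_{i.}$ the mean of subject $i$, and $\bar{Y}_{.j}$ the mean at time point $j$.

```latex
\begin{aligned}
SS_{\text{total}}    &= \sum_{i=1}^{n} \sum_{j=1}^{a} \left( Y_{ij} - \bar{Y}_{..} \right)^2 \\
SS_{\text{between}}  &= a \sum_{i=1}^{n} \left( \bar{Y}_{i.} - \bar{Y}_{..} \right)^2 \\
SS_{\text{within}}   &= SS_{\text{total}} - SS_{\text{between}} \\
SS_{\text{time}}     &= n \sum_{j=1}^{a} \left( \bar{Y}_{.j} - \bar{Y}_{..} \right)^2 \\
SS_{\text{residual}} &= SS_{\text{within}} - SS_{\text{time}}
\end{aligned}
```

Read top to bottom, the total variability is first split into a between-subjects part and a within-subjects part, and the within-subjects part is then split into the time effect and the residual.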
Then we will need to calculate three types of degrees of freedom:
- Degrees of Freedom for Time
- Degrees of Freedom for the Residuals
- Degrees of Freedom for Between Subjects
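As a sketch, with $n$ subjects and $a$ time points (the same symbols the function uses), the degrees of freedom work out as follows. The values $n = 5$ and $a = 3$ in the second block are made up purely for illustration.

```latex
\begin{aligned}
df_{\text{time}}     &= a - 1 \\
df_{\text{subjects}} &= n - 1 \\
df_{\text{residual}} &= (a - 1)(n - 1) \\
df_{\text{total}}    &= na - 1 = df_{\text{time}} + df_{\text{subjects}} + df_{\text{residual}}
\end{aligned}
\qquad
\begin{aligned}
n = 5,\; a = 3: \quad df_{\text{time}} &= 2 \\
df_{\text{subjects}} &= 4 \\
df_{\text{residual}} &= 8 \\
df_{\text{total}} &= 14 = 2 + 4 + 8
\end{aligned}
```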
Then we will use the sums of squares and degrees of freedom to calculate the mean squares:
- Mean Squares for Time
- Mean Squares for the Residuals
- Mean Squares for the Between-Subjects
Lastly, we will use the mean squares to calculate the F statistic:
- F for Time
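Each mean square is a sum of squares divided by its own degrees of freedom, and the F statistic compares the time effect against the residual. Written out, using the same subscripts as the variable names in the function:

```latex
\begin{aligned}
MS_{\text{time}}     &= \frac{SS_{\text{time}}}{df_{\text{time}}} \\
MS_{\text{residual}} &= \frac{SS_{\text{residual}}}{df_{\text{residual}}} \\
MS_{\text{between}}  &= \frac{SS_{\text{between}}}{df_{\text{subjects}}} \\
F_{\text{time}}      &= \frac{MS_{\text{time}}}{MS_{\text{residual}}}
\end{aligned}
```

Note that $MS_{\text{between}}$ is reported in the table but does not enter the F statistic for time.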
The function below is fairly long, but we will break down its components to explain exactly how it calculates each quantity needed for the repeated measures ANOVA results. It calculates everything except the p-value.
# pivot_wider() comes from tidyr; %>%, select() and mutate() come from dplyr
library(dplyr)
library(tidyr)

Comprehensive_one_way_repeated_ANOVA <- function(Subject, Outcome, Time) {
  # Create the dataset (long format: one row per subject per time point)
  dat <- data.frame(Subject, Outcome, Time)

  # Convert the data into wide format (one row per subject, one column per time point)
  df <- dat %>%
    pivot_wider(names_from = Time, values_from = Outcome)

  # Extract the matrix of values
  Y <- df[, -1]  # Remove Subject column
  n <- nrow(Y)   # Subjects
  a <- ncol(Y)   # Time points

  # Calculate the means
  df$grand_mean <- mean(dat$Outcome)
  df$subject_means <- rowMeans(Y)
  within_factor_means <- matrix(colMeans(Y), nrow = n, ncol = a, byrow = TRUE)
  colnames(within_factor_means) <- paste0(names(Y), "_mean")
  df <- cbind(df, within_factor_means)

  # Calculate total sums of squares
  # (grand_mean is constant across rows, so take its first element instead of relying on recycling)
  SS_total <- sum((dat$Outcome - df$grand_mean[1])^2)
  df2 <- paste0("Total Sums of Squares: sum((Outcome - grand_mean)^2) = ", round(SS_total, 3))

  # Calculate the Between-Subjects Sum of Squares
  df3 <- df %>%
    select(subject_means, grand_mean) %>%
    mutate(mean_diff = subject_means - grand_mean,
           mean_diff_sq = mean_diff^2,
           sum_mean_diff_sq = sum(mean_diff_sq),
           within_factor_lvls = ncol(Y),
           SS_between = sum_mean_diff_sq * within_factor_lvls)

  # Calculate Within-Subjects Sum of Squares
  SS_within <- SS_total - df3$SS_between[1]
  df4 <- paste0("Within Subjects Sum of Squares: SS_within <- SS_total - SS_between = ", round(SS_within, 3))

  # Calculate Time Effect Sum of Squares
  df5 <- df[, c(colnames(within_factor_means), "grand_mean")]
  within_grand_diff <- within_factor_means - df$grand_mean[1]
  colnames(within_grand_diff) <- paste0(colnames(within_grand_diff), "_diff")
  within_grand_diff_sq <- within_grand_diff^2
  colnames(within_grand_diff_sq) <- paste0(colnames(within_grand_diff), "_sq")
  # Every row holds the same time-point values, so sum a single row
  sum_within_grand_diff_sq <- sum(within_grand_diff_sq[1, ])
  SS_time <- n * sum_within_grand_diff_sq
  df5 <- cbind(df5, within_grand_diff, within_grand_diff_sq, sum_within_grand_diff_sq, n, SS_time)

  # Calculate Residual (Error) Sum of Squares
  SS_residual <- SS_within - SS_time
  df6 <- paste0("Residual (Error) Sum of Squares: SS_residual <- SS_within - SS_time = ", round(SS_residual, 3))

  # Calculate degrees of freedom
  df_time <- a - 1
  df_subjects <- n - 1
  df_residual <- (a - 1) * (n - 1)
  df_total <- n * a - 1

  # Create a dataframe to show how degrees of freedom are calculated
  df7 <- data.frame(
    Source = c("Within (Time)", "BetweenSubjects", "Residual", "Total"),
    Equation = c("a-1", "n-1", "(a-1)*(n-1)", "n*a-1"),
    df = c(df_time, df_subjects, df_residual, df_total)
  )

  # Calculate the Mean Squares and F statistic
  MS_time <- SS_time / df_time
  MS_between <- df3$SS_between[1] / df_subjects
  MS_residual <- SS_residual / df_residual
  F_time <- MS_time / MS_residual

  # Build an ANOVA table
  anova_table <- data.frame(
    Source = c("Time (Within)", "Residuals", "BetweenSubjects", "Total"),
    df = c(df_time, df_residual, df_subjects, df_total),
    SS = c(SS_time, SS_residual, df3$SS_between[1], SS_total),
    MS = c(MS_time, MS_residual, MS_between, NA),
    F = c(F_time, NA, NA, NA)
  )

  # Return list
  return_list <- list(
    All_means = df,
    TSS = df2,
    BSS = df3,
    WSS = df4,
    Time_SS = df5,
    RSS = df6,
    Degrees_of_Freedom = df7,
    Anova_Table = anova_table
  )
  return(return_list)
}
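The function stops at the F statistic. As a minimal sketch of the missing step, the p-value could be obtained from the upper tail of the F distribution with base R's `pf()`. The numbers below are hypothetical stand-ins for the function's `F_time`, `df_time`, and `df_residual`, and the result is the uncorrected p-value, which assumes sphericity holds.

```r
# Hypothetical values standing in for the quantities computed by the function
F_time      <- 4.5
df_time     <- 2   # a - 1 with a = 3 time points
df_residual <- 8   # (a - 1) * (n - 1) with n = 5 subjects

# Upper-tail probability of the F distribution: P(F >= F_time)
p_value <- pf(F_time, df1 = df_time, df2 = df_residual, lower.tail = FALSE)
p_value
```

Using `lower.tail = FALSE` is equivalent to `1 - pf(...)` but avoids losing precision when the p-value is very small.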
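# Run the custom function
# (assumes `dat` is a long-format data frame; the column names `subject`, `score`, and `time` are placeholders)
Comprehensive_one_way_repeated_ANOVA(dat$subject, dat$score, dat$time)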
[Images: function output, Part 1 and Part 2]
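As a cross-check of the steps above, here is a base-R sketch that repeats the same arithmetic on a small made-up dataset (4 subjects, 3 time points) and compares the resulting F statistic with the one reported by `aov()`. The data and variable names are invented for illustration; this is not the wiki's own dataset.

```r
# Made-up long-format data: 4 subjects, each measured at 3 time points
dat <- data.frame(
  Subject = factor(rep(1:4, each = 3)),
  Time    = factor(rep(c("T1", "T2", "T3"), times = 4)),
  Outcome = c(5, 6, 8,  4, 6, 7,  6, 7, 10,  5, 8, 9)
)

n <- nlevels(dat$Subject)  # number of subjects
a <- nlevels(dat$Time)     # number of time points

# Means needed for the sums of squares
grand_mean    <- mean(dat$Outcome)
subject_means <- tapply(dat$Outcome, dat$Subject, mean)
time_means    <- tapply(dat$Outcome, dat$Time, mean)

# Sums of squares, following the same decomposition as the function
SS_total    <- sum((dat$Outcome - grand_mean)^2)
SS_between  <- a * sum((subject_means - grand_mean)^2)
SS_within   <- SS_total - SS_between
SS_time     <- n * sum((time_means - grand_mean)^2)
SS_residual <- SS_within - SS_time

# Mean squares and F statistic
MS_time     <- SS_time / (a - 1)
MS_residual <- SS_residual / ((a - 1) * (n - 1))
F_time      <- MS_time / MS_residual

# Reference fit: repeated measures ANOVA with Subject as the error stratum
fit   <- aov(Outcome ~ Time + Error(Subject / Time), data = dat)
F_aov <- summary(fit)[["Error: Subject:Time"]][[1]][["F value"]][1]

# TRUE when the hand calculation agrees with aov()
all.equal(F_time, F_aov)
```

For this dataset the hand calculation gives `SS_total = 34.25` and `SS_time = 24.5`, which makes it easy to follow each step with a calculator.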