Math Underlying a one‐way repeated measures ANOVA - Private-Projects237/Statistics GitHub Wiki

Overview

Here we take a look at the math that underlies a one-way repeated measures ANOVA. We also create a function that prints the calculation for each part step by step.

Underlying Math Behind a Repeated Measures One-Way ANOVA

We are essentially taking the variance of the outcome and identifying what proportion of that variance is explained by a single repeated factor in the model (typically time) versus what is left over. To do this we need to calculate five types of sums of squares:

  1. Total Sums of Squares
  2. Between Subject Sum of Squares
  3. Within Subject Sum of Squares
  4. Time Within Subject Sum of Squares
  5. Residual (Error) Sums of Squares
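
As a compact reference, the five quantities can be written out as follows. The notation is an assumption made here for illustration: Y_ij is the outcome for subject i at time point j, with n subjects and a time points, and a dot in a subscript marks the index that has been averaged over. The formulas mirror what the custom function on this page computes.

```latex
\begin{aligned}
SS_{\text{total}}    &= \sum_{i=1}^{n}\sum_{j=1}^{a}\left(Y_{ij}-\bar{Y}_{..}\right)^2 \\
SS_{\text{between}}  &= a\sum_{i=1}^{n}\left(\bar{Y}_{i.}-\bar{Y}_{..}\right)^2 \\
SS_{\text{within}}   &= SS_{\text{total}} - SS_{\text{between}} \\
SS_{\text{time}}     &= n\sum_{j=1}^{a}\left(\bar{Y}_{.j}-\bar{Y}_{..}\right)^2 \\
SS_{\text{residual}} &= SS_{\text{within}} - SS_{\text{time}}
\end{aligned}
```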

Then we will need to calculate three types of degrees of freedom:

  1. Degrees of Freedom for Time
  2. Degrees of Freedom for the Residuals
  3. Degrees of Freedom for the Between Subjects
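
A minimal sketch of the degrees of freedom, assuming a hypothetical design with 5 subjects and 3 time points (these counts are made up for illustration). It uses the same formulas as the custom function on this page and also computes the total, which the three components should add up to.

```r
n <- 5  # hypothetical number of subjects
a <- 3  # hypothetical number of time points

df_time     <- a - 1              # Time (within-subjects factor)
df_subjects <- n - 1              # Between subjects
df_residual <- (a - 1) * (n - 1)  # Residual (error)
df_total    <- n * a - 1          # Total

# The three components partition the total degrees of freedom
print(c(time = df_time, subjects = df_subjects,
        residual = df_residual, total = df_total))
print(df_time + df_subjects + df_residual == df_total)
```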

Then we will use the sums of squares and degrees of freedom to calculate the mean squares:

  1. Mean Squares for Time
  2. Mean Squares for the Residuals
  3. Mean Squares for the Between-Subjects

Lastly, we will use the mean squares to calculate the F statistic:

  1. F for Time
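
Written out, each mean square is a sum of squares divided by its degrees of freedom, and the F statistic for time is the ratio of the time mean square to the residual mean square. The symbols n (number of subjects) and a (number of time points) are notation assumed here; the formulas follow the calculations in the custom function on this page.

```latex
\begin{aligned}
MS_{\text{time}}     &= \frac{SS_{\text{time}}}{a-1} \\
MS_{\text{residual}} &= \frac{SS_{\text{residual}}}{(a-1)(n-1)} \\
MS_{\text{between}}  &= \frac{SS_{\text{between}}}{n-1} \\
F_{\text{time}}      &= \frac{MS_{\text{time}}}{MS_{\text{residual}}}
\end{aligned}
```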

Custom Function

The function below is fairly long, but we will break down its components to explain exactly how it calculates each quantity needed for the repeated measures ANOVA results. It calculates essentially everything except the p-value.

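# Required packages: dplyr (%>%, select, mutate) and tidyr (pivot_wider)
library(dplyr)
library(tidyr)

Comprehensive_one_way_repeated_ANOVA <- function(Subject, Outcome, Time){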
  # Create the dataset
  dat <- data.frame(Subject, 
                    Outcome,
                    Time)
  
  # Convert the data into wide format
  df <- dat %>%
    pivot_wider(names_from = Time, values_from = Outcome)
  
  # Extract the matrix of values
  Y <- df[, -1]  # Remove Subject column
  n <- nrow(Y)                     # Subjects
  a <- ncol(Y)                     # Time points
  
  # Calculate the means
  df$grand_mean <- mean(dat$Outcome)
  df$subject_means <- rowMeans(Y)
  within_factor_means <- matrix(colMeans(Y), nrow = nrow(Y), ncol = length(Y), byrow = TRUE)
  colnames(within_factor_means) <- paste0(names(Y),"_mean")
  df <- cbind(df,within_factor_means)
  
  # Calculate total sums of squares
  SS_total <- sum((dat$Outcome - df$grand_mean)^2)
  df2 <- paste0("Total Sums of Squares: sum((Outcome - grand_mean)^2) = ", round(SS_total,3))
  
  # Calculate the Between-Subjects Sum of Squares
  df3 <- df %>%
    select(subject_means, grand_mean) %>%
    mutate(mean_diff = subject_means - grand_mean,
           mean_diff_sq = mean_diff^2,
           sum_mean_diff_sq = sum(mean_diff_sq),
           within_factor_lvls = ncol(Y),
           SS_between = sum_mean_diff_sq * within_factor_lvls)
  
  # Calculate Within-Subjects Sum of Squares
  SS_within <- SS_total - df3$SS_between[1]
  df4 <- paste0("Within Subjects Sum of Squares: SS_within <- SS_total - SS_between = ", round(SS_within,3))
  
  # Calculate Time Effect Sum of Squares
  df5 <- df[,c(colnames(within_factor_means),"grand_mean")]
  within_grand_diff <- within_factor_means - df$grand_mean
  colnames(within_grand_diff) <- paste0(colnames(within_grand_diff),"_diff") 
  within_grand_diff_sq <- within_grand_diff^2
  colnames(within_grand_diff_sq) <- paste0(colnames(within_grand_diff),"_sq") 
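  # Every row of this matrix is identical (one squared deviation per time point),
  # so unique() keeps a single row before the deviations are summed
  sum_within_grand_diff_sq <- sum(unique(within_grand_diff_sq)) 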
  SS_time <- n * sum_within_grand_diff_sq
  
  df5 <- cbind(df5, within_grand_diff, within_grand_diff_sq, sum_within_grand_diff_sq,n,SS_time)
  
  # Calculate Residual (Error) Sum of Squares
  SS_residual <- SS_within - SS_time
  df6 <- paste0("Residual (Error) Sum of Squares: SS_residual <- SS_within - SS_time = ", round(SS_residual,3))
  
  # Calculate degrees of freedom
  df_time <- a - 1
  df_subjects <- n - 1
  df_residual <- (a - 1) * (n - 1)
  df_total <- n * a - 1
  
  # Create a dataframe to show how degrees of freedom are calculated
  df7 <- data.frame(
    Source = c("Within (Time)","BetweenSubjects" , "Residual", "Total"),
    Equation = c("a-1","n-1","(a-1)*(n-1)","n*a-1"),
    df = c(df_time, df_subjects, df_residual, df_total)
  )
  
  # Calculate the Mean Squares and F statistic
  MS_time <- SS_time / df_time
  MS_between <- df3$SS_between[1] / df_subjects
  MS_residual <- SS_residual / df_residual
  F_time <- MS_time / MS_residual
  
  # Build an ANOVA table
  anova_table <- data.frame(
    Source = c("Time (Within)", "Residuals", "BetweenSubjects", "Total"),
    df = c(df_time, df_residual, df_subjects, df_total),
    SS = c(SS_time, SS_residual, df3$SS_between[1], SS_total),
    MS = c(MS_time, MS_residual, MS_between, NA),
    F = c(F_time, NA, NA, NA)
  )
  
  # Return list
  return_list <- list(
    All_means = df,
    TSS = df2,
    BSS = df3,
    WSS = df4,
    Time_SS = df5, 
    RSS = df6,
    Degrees_of_Freedom = df7,
    Anova_Table = anova_table
  )
  
  return(return_list)

}
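
The function stops short of the p-value. One way to obtain it, sketched here with made-up numbers rather than output from the function, is to take the upper tail of the F distribution with base R's `pf()`. The values of `F_time`, `df_time`, and `df_residual` below are hypothetical stand-ins for the corresponding entries in the ANOVA table.

```r
# Hypothetical values standing in for the F statistic and its degrees of freedom
F_time      <- 5
df_time     <- 2
df_residual <- 8

# Upper-tail probability: chance of an F at least this large under the null
p_value <- pf(F_time, df1 = df_time, df2 = df_residual, lower.tail = FALSE)
print(p_value)
```

Note that this p-value relies on the usual repeated measures assumptions (sphericity in particular), which this sketch does not check.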

Running the Comprehensive_one_way_repeated_ANOVA() function

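# Run the custom function
# (the column names dat$subject, dat$score, and dat$time are assumed; substitute your own)
Comprehensive_one_way_repeated_ANOVA(Subject = dat$subject, Outcome = dat$score, Time = dat$time)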
[Screenshots: Comprehensive_one_way_repeated_ANOVA() output, part 1 and part 2]
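
To sanity-check the arithmetic, the same sums of squares can be computed by hand on a small dataset and compared against base R's `aov()`. This is a minimal sketch in base R only: the `toy` data (4 subjects, 3 time points) is made up for illustration, and the code is a standalone re-derivation, not the custom function itself.

```r
# Made-up long-format data: 4 subjects, each measured at 3 time points
toy <- data.frame(
  Subject = factor(rep(1:4, each = 3)),
  Time    = factor(rep(c("T1", "T2", "T3"), times = 4)),
  Outcome = c(4, 6, 8,  5, 7, 7,  3, 6, 9,  6, 8, 9)
)

n <- nlevels(toy$Subject)  # number of subjects
a <- nlevels(toy$Time)     # number of time points

# Means needed for the decomposition
grand_mean    <- mean(toy$Outcome)
subject_means <- tapply(toy$Outcome, toy$Subject, mean)
time_means    <- tapply(toy$Outcome, toy$Time, mean)

# Sums of squares, following the same steps as the function above
SS_total    <- sum((toy$Outcome - grand_mean)^2)
SS_between  <- a * sum((subject_means - grand_mean)^2)
SS_within   <- SS_total - SS_between
SS_time     <- n * sum((time_means - grand_mean)^2)
SS_residual <- SS_within - SS_time

# F statistic for time
F_time <- (SS_time / (a - 1)) / (SS_residual / ((a - 1) * (n - 1)))

# Reference fit from base R; the Subject:Time stratum holds the time effect
fit     <- aov(Outcome ~ Time + Error(Subject/Time), data = toy)
aov_tab <- summary(fit)[["Error: Subject:Time"]][[1]]

print(aov_tab)
print(c(SS_time = SS_time, SS_residual = SS_residual, F_time = F_time))
```

The hand-computed `SS_time`, `SS_residual`, and `F_time` should agree with the "Sum Sq" and "F value" columns that `aov()` reports for the within-subject stratum.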