Testing for Moderation in a Meta Analysis - Private-Projects237/Statistics GitHub Wiki

Overview

This wiki page is straightforward: it demonstrates how to run moderation analyses for meta-analyses that have heterogeneous data.

Generating the data

Below we generate a dataset that contains the means, sample sizes, and standard deviations of a repeated-measures study in which subjects participate in a cognitive task before (pre) and after (post) some type of stressor. Testosterone was also measured before and after the stressor, and the change was converted into an effect size (test_ES). We also have characteristics of the studies, such as their quality (Bad, Okay, Good), and of the stressor (Acute, Chronic). These variables are important because we can test them as moderators.

We generated the dataset in a way that lets us know roughly how the models should turn out. For example, we use the following formula to calculate the mean performance after experiencing stress (mean_post):

$$mean_{post_i} = mean_{pre_i} - \ 0.4 \times SD_{pooled} - 7 \times stress + 8 \times test_{ES} + e_i$$

where:

  • $mean_{post_i}$: Mean performance in cognition after a stressor for study i
  • $mean_{pre_i}$: Mean performance in cognition before a stressor for study i
  • $-0.4$: The effect size of stress on cognitive performance (fixed)
  • $SD_{pooled}$: The pooled standard deviation of the before and after stress conditions
  • $stress$: Categorical variable where Chronic = 1, so its coefficient (-7) represents the effect of Chronic compared to Acute
  • $test_{ES}$: The testosterone effect size (continuous); its coefficient (8) is its effect on cognitive performance
  • $e_i$: Error for each study (SD = 3)
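
To make the formula concrete, here is a minimal sketch that traces it for a single study. All input values (a pre-stress mean of 75, a pooled SD of 10, a testosterone effect size of 0.5, and an error of 0) are hypothetical and chosen only to keep the arithmetic easy; `stress` is coded 0/1 as in the definitions above.

```r
# Hypothetical inputs for one study (not taken from the generated dataset)
mean_pre  <- 75    # pre-stress mean
SD_pooled <- 10    # pooled standard deviation
test_ES   <- 0.5   # testosterone effect size
e_i       <- 0     # study-level error, set to 0 for clarity

# mean_post under the formula, with stress = 0 (acute) or 1 (chronic)
post_mean <- function(stress) {
  mean_pre - 0.4 * SD_pooled - 7 * stress + 8 * test_ES + e_i
}

post_acute   <- post_mean(0)  # 75 - 4 - 0 + 4 = 75
post_chronic <- post_mean(1)  # 75 - 4 - 7 + 4 = 68
print(c(acute = post_acute, chronic = post_chronic))
```
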
# Load required packages
library(tidyverse)
library(ggplot2)
library(meta)

# Set seed for reproducibility
set.seed(135) 

# Number of studies
n_studies <- 24

# Generate study characteristics
dat <- data.frame(
  study_quality = rep(c("Good", "Okay", "Bad"), length.out = n_studies), 
  stress_type = rep(c("acute", "chronic"), length.out = n_studies),
  test_ES = round(runif(n_studies, 0.1, 1.5),2),
  n_pre = sample(25:55, n_studies, replace = TRUE)
)

# Rearrange the rows
dat <- dat[sample(1:nrow(dat), replace = FALSE),]

# Add a study id
dat$study_id <- paste0("Study_",1:nrow(dat))
dat <- select(dat, study_id, everything())
row.names(dat) <- 1:nrow(dat)

# Create the sample size after stress exposure
dat$n_post <- dat$n_pre - sample(0:10, n_studies, replace = TRUE)

# Create standard deviations for before and after conditions
dat$sd_pre <- round(runif(n = n_studies, min = 7, max = 12),2)
dat$sd_post <- round(runif(n = n_studies, min = 7, max = 12),2)

# Calculate the mean performance of cognition for pre stress
dat$mean_pre = round(rnorm(n = n_studies, mean = 75, sd = 12),2)

# Make stress_type a factor
dat$stress_type <- as.factor(dat$stress_type)
contrasts(dat$stress_type)

# Calculate the post-stress mean from the overall effect plus the moderator effects (stress type & testosterone)
# d = (mean_post - mean_pre) / SD_pooled
# mean_post = d * SD_pooled + mean_pre
# Note: as.numeric() on a factor returns the level index (acute = 1, chronic = 2), not 0/1,
# so acute shifts by -7 and chronic by -14; the chronic-vs-acute difference is still -7
dat <- dat %>%
  mutate(SD_pooled = round(sqrt(((n_pre - 1)*sd_pre^2 + (n_post - 1)*sd_post^2)/(n_pre + n_post - 2)),2),
         d = - .4,
         mean_post = round((d * SD_pooled + mean_pre) - 7 * as.numeric(stress_type) + 8 * test_ES + rnorm(n = n_studies, mean = 0, sd = 3),2))

# View the data as a table
library(kableExtra)

# full_width is a styling option, so it is passed to kable_minimal() rather than kbl()
dat %>%
  kbl() %>%
  kable_minimal(full_width = FALSE)
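
One detail in the code above is easy to miss: calling as.numeric() on a factor returns the position of each level, not a 0/1 dummy. The sketch below uses a small hypothetical vector rather than the generated dataset to show the difference between the level index and an explicit 0/1 indicator.

```r
# Hypothetical stress types for four studies
stress_type <- as.factor(c("acute", "chronic", "acute", "chronic"))

# Levels are sorted alphabetically, so acute is level 1 and chronic is level 2
level_index <- as.numeric(stress_type)

# An explicit indicator that matches the Chronic = 1 coding in the formula
chronic_dummy <- as.numeric(stress_type == "chronic")

print(data.frame(stress_type, level_index, chronic_dummy))
```

With the level index, every study receives an extra shift of -7, but the difference between chronic and acute studies is the same under both codings.
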

The table below shows that we have 24 unique studies (k = 24). These studies are of bad, okay, or good quality, and each one measured stress as either acute or chronic. The variable test_ES is an effect size representing how stress influenced testosterone. We also have sample sizes, standard deviations, and mean performance for the same subjects before and after a stressor. Visually, some values of mean_post look smaller than those of mean_pre, suggesting that stress likely affects cognitive performance. However, we need to run a random-effects meta-analysis to confirm this.

[Screenshot: Viewing the data as a table]

Visualize the data

Because we can, and because it is good practice, we can use ggplot() from the 'ggplot2' package to start viewing how different variables may have an effect on the effect sizes. For example, we can use boxplots to look at group mean differences in cognitive performance before and after the stressor. We can also look at how this relationship may differ at levels of moderators in our dataset. Below are examples of different plots we can generate from our data.

# Convert data to long format to then plot it
dat_long <- dat %>%
  pivot_longer(cols = c(mean_pre, mean_post), names_to = "Condition", values_to = "Mean_Performance") %>%
  mutate(Condition = factor(Condition, levels = c("mean_pre", "mean_post")),
         study_quality = factor(study_quality, levels = c("Bad", "Okay", "Good")))

# Visualize the unadjusted effects (stress, stress_type, testosterone)
dat_long %>%
  ggplot(aes(x = Condition, y = Mean_Performance)) +
  geom_boxplot() +
  theme_classic() +
  labs(title = "Unadjusted effect of Stress on Cognition")

dat_long %>%
  ggplot(aes(x = Condition, y = Mean_Performance, fill = stress_type)) +
  geom_boxplot() +
  theme_classic() +
  labs(title = "Effect of Stress on Cognition Moderated by Stress Type")

dat_long %>%
  ggplot(aes(x = test_ES, y = Mean_Performance, color = Condition)) +
  geom_point() +
  geom_smooth(method = "lm", se = F) +
  theme_classic() +
  labs(title = "Relationship Between Testosterone and Cognition\nBefore and After Stress")

dat_long %>%
  ggplot(aes(x = study_quality, y = Mean_Performance)) +
  geom_boxplot() +
  theme_classic() +
  labs(title = "Overall Mean Performance by Study Quality")

[Screenshots, in order: Mean Difference Before and After a Stressor; Mean Difference Before and After a Stressor at Levels of Stressor Type; Adjusted Mean Difference Before and After a Stressor at Varying Levels of Testosterone; Overall Mean Performance by Study Quality]

Running a Random Effects Model Meta

Running a meta-analysis is easy with the metacont() function from the 'meta' package: we supply the means, standard deviations, and sample sizes, and it fits a random-effects meta-analysis using Hedges' g as the effect size.

# Run a random-effects meta-analysis
meta_result <- metacont(
  n.e = n_post, mean.e = mean_post, sd.e = sd_post,
  n.c = n_pre, mean.c = mean_pre, sd.c = sd_pre,
  data = dat, sm = "SMD", studlab = study_id, method.smd = "Hedges")

# Summary of results
summary(meta_result)

# Show a forest plot
forest(meta_result)
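
To make the effect size less of a black box, here is a minimal sketch of how Hedges' g can be computed by hand for one study. The summary statistics are hypothetical, and the small-sample correction uses a common approximation; metacont() may use a slightly different (exact) correction depending on its settings, so its values can differ in the later decimals.

```r
# Hypothetical summary statistics for one study
n_pre <- 40; mean_pre <- 75; sd_pre <- 10
n_post <- 35; mean_post <- 68; sd_post <- 9

# Pooled standard deviation across the two conditions
sd_pooled <- sqrt(((n_pre - 1) * sd_pre^2 + (n_post - 1) * sd_post^2) /
                    (n_pre + n_post - 2))

# Cohen's d: post minus pre, in pooled-SD units
d <- (mean_post - mean_pre) / sd_pooled

# Small-sample correction factor (common approximation)
J <- 1 - 3 / (4 * (n_pre + n_post - 2) - 1)

# Hedges' g shrinks d slightly toward zero
g <- J * d
print(round(c(d = d, J = J, g = g), 3))
```
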

Quick takeaways from our model output below:

  • Our pooled effect size is -0.99 (p < .05), which is a very large effect. This means that, overall, stress has a strong negative effect on cognitive performance.
  • Our test for heterogeneity came back significant (p < .001) and our $I^2$ is 87.6%, which is very large and indicates a substantial amount of heterogeneity in our data. This makes the dataset appropriate for moderation analysis.

[Screenshots: model output and forest() plot]

Moderation Analysis: Subgroup Analysis

One type of moderation analysis is a subgroup analysis. It is used when we have reason to believe that a categorical variable influences the effect sizes. It generates a pooled effect size for each level of the categorical variable and tests whether these effect sizes differ significantly across levels (between-subgroup differences). The goal is to see whether the subgroups can explain heterogeneity in the data (variation in effect sizes). Testing for this with the 'meta' package is easy: we use the update() function to set an argument we did not use when we ran our meta-analysis, 'subgroup', and give it the categorical variable from the dataset. Below we do this for both study_quality and stress_type.

# Run a subgroup analysis (study_quality)
update(meta_result, subgroup = study_quality)

# Run a subgroup analysis (stress_type)
update(meta_result, subgroup = stress_type)

# Forest plots for each subgroup analysis
forest(update(meta_result, subgroup = study_quality), main = "Subgroup Analysis by Study Quality")
forest(update(meta_result, subgroup = stress_type), main = "Subgroup Analysis by Stress Type")

Study Quality: We have three pooled effect sizes, one for each level of the variable (Bad = -0.7193, Okay = -1.3281, Good = -0.9150). We are interested in the test for subgroup differences under the random-effects model, which is Q = 3.75, df = 2, p = 0.1536 (not significant). This means that the pooled effect sizes do not differ significantly across levels of study quality. Additionally, the model produces an $I^2$ for each subgroup, and we can see that a lot of heterogeneity is still present, suggesting that study quality does not do a good job of explaining this variation.

Stress Type: We have two pooled effect sizes, one for each level of the variable (Acute = -0.4648, Chronic = -1.5127). Just by looking at the values, which are in standard deviation units, we see that chronic stress leads to substantially poorer cognitive performance than acute stress. This is corroborated by the test for subgroup differences, Q = 24.78, df = 1, p < 0.0001, which is significant. This means that stress type significantly explains heterogeneity in our effect sizes. However, the $I^2$ for each level is still high (Acute = 78.2%, Chronic = 72.4%). A lot of heterogeneity therefore remains within each stress type, so more moderators need to be tested.
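
The logic of the test for subgroup differences can be sketched by hand. The example below takes the two subgroup estimates reported above but pairs them with hypothetical standard errors (the real ones are in the model output), so the resulting Q is only illustrative and is not expected to reproduce the reported value.

```r
# Subgroup pooled effects reported above (random-effects model)
est <- c(acute = -0.4648, chronic = -1.5127)

# Hypothetical standard errors; the real ones come from the model output
se <- c(acute = 0.15, chronic = 0.15)

w <- 1 / se^2                            # inverse-variance weights
pooled <- sum(w * est) / sum(w)          # weighted mean of the subgroup effects
Q_between <- sum(w * (est - pooled)^2)   # between-subgroup heterogeneity
df_between <- length(est) - 1            # number of subgroups minus 1
p_between <- pchisq(Q_between, df = df_between, lower.tail = FALSE)

print(round(c(Q = Q_between, df = df_between, p = p_between), 4))
```

The further apart the subgroup estimates are relative to their standard errors, the larger Q becomes and the smaller the p-value.
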

[Screenshot: Subgroup Analysis for Study Quality]
[Screenshot: Subgroup Analysis for Stress Type]

We can also run forest() on the updated models that include the subgroup analyses and display the result. However, these plots become impractical as the number of studies grows.

[Screenshot: Subgroup Analysis for Stress Type forest()]

Moderation Analysis: Meta-Regression (Simple)

The section above showed how we can explain heterogeneity in our data through categorical variables. However, that approach does not work if we want to investigate a continuous moderator. In these cases we use a meta-regression. It works much like a typical regression: it produces an intercept and a slope that represents the relationship between the continuous moderator and the effect sizes. We can run this analysis using the metareg() function from the 'meta' package, which relies on the 'metafor' package to fit the model. We just need to give it our random-effects meta-analysis object plus the moderator we want to test. Below we test testosterone effect size as a moderator.

# Load metafor, which metareg() (from 'meta') uses to fit the meta-regression
library(metafor)

# Meta-regression 
meta_reg <- metareg(meta_result, ~ test_ES)

# Summary of the meta-regression
summary(meta_reg)

test_ES: The testosterone effect size has a positive and significant relationship (p < 0.001) with our effect sizes. For every 1-unit increase in the testosterone effect size, we expect the effect size of stress on cognition to increase by 1.0750 standard deviations. In other words, a larger testosterone effect size is associated with better cognitive performance after stress, hinting that testosterone is a protective factor (higher testosterone protects an individual from the negative effects of stress on cognition). The testosterone effect size explains 37.14% of the heterogeneity in effect sizes ($R^2$), but residual heterogeneity remains high ($I^2$ = 81.97%).

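This is because $R^2$ is based on how much of $\tau^2$ the moderator was able to explain (the hand calculation below uses the rounded $\tau^2$ values, so it differs from the reported 37.14% in the last digit):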

$$R^2 = \frac{\tau_{null}^2 - \tau_{model}^2}{\tau_{null}^2} = \frac{0.4681 - 0.2943}{0.4681} \times 100 = 37.13\%$$

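$I^2$, in contrast, can be calculated from the formula below, so it depends on the Q value of the analysis. The Q-based value is slightly lower than the 81.97% in the model output, most likely because the software estimates $I^2$ for meta-regression models from $\tau^2$ rather than directly from Q.

$$I^2 = \left( \frac{Q-df}{Q} \right) \times 100\% = \frac{117.6273 - 22}{117.6273} \times 100\% = 81.3\%$$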
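
Both hand calculations are easy to check in R. This short sketch only reuses the numbers quoted in the formulas above; the variable names are illustrative.

```r
# Values copied from the formulas above
tau2_null  <- 0.4681    # tau^2 without the moderator
tau2_model <- 0.2943    # tau^2 with test_ES in the model
Q          <- 117.6273  # Q statistic from the meta-regression
df         <- 22        # degrees of freedom for Q

# Percentage of heterogeneity explained by the moderator
R2 <- (tau2_null - tau2_model) / tau2_null * 100

# Percentage of remaining variability attributed to heterogeneity
I2 <- (Q - df) / Q * 100

print(round(c(R2 = R2, I2 = I2), 2))
```
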

[Screenshot: Model Summary of Testosterone as a Moderator]

Moderation Analysis: Meta-Regression (Multiple)

We can also run a moderation analysis with more than one moderator. This is very useful because the model now estimates adjusted slopes, each representing the effect of one moderator on our effect sizes after controlling for the other moderators in the model. This gives a clearer picture of what is occurring in our data.
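
The code for this model is not shown on this page, so the following is only a sketch of how it could be fitted, assuming the same metareg() interface used for the simple meta-regression. The toy dataset, its values, and the object names toy, toy_meta, and toy_reg are all hypothetical and exist only to keep the example self-contained.

```r
library(meta)  # metareg() comes from 'meta' and uses 'metafor' internally

# Hypothetical toy dataset with one categorical and one continuous moderator
set.seed(1)
k <- 12
toy <- data.frame(
  study_id    = paste0("Study_", 1:k),
  stress_type = factor(rep(c("acute", "chronic"), length.out = k)),
  test_ES     = round(runif(k, 0.1, 1.5), 2),
  n_pre = 40, n_post = 35,
  sd_pre = 10, sd_post = 10,
  mean_pre = 75
)

# Post-stress means depend on both moderators plus random error
toy$mean_post <- toy$mean_pre - 4 - 7 * (toy$stress_type == "chronic") +
  8 * toy$test_ES + rnorm(k, mean = 0, sd = 3)

# Random-effects meta-analysis, as in the earlier section
toy_meta <- metacont(
  n.e = n_post, mean.e = mean_post, sd.e = sd_post,
  n.c = n_pre, mean.c = mean_pre, sd.c = sd_pre,
  data = toy, sm = "SMD", studlab = study_id, method.smd = "Hedges")

# Multiple meta-regression: both moderators enter the same formula
toy_reg <- metareg(toy_meta, ~ test_ES + stress_type)
summary(toy_reg)
```

Applied to the objects on this page, the equivalent call would presumably be metareg(meta_result, ~ test_ES + stress_type). Each slope is then interpreted as the effect of that moderator with the other held constant.
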

[Screenshot: Multiple Meta-Regression]