Meta Analysis Fixed Effects vs Random Effects Models
Overview
This wiki page details the differences between the fixed-effects and random-effects models used in meta-analysis. The guide includes AI-generated dummy data, model specification, and interpretation of the model output.
Equations for meta-analysis (Fixed Effects Model)
Below we will be covering equations that are good to know to run a fixed-effects meta-analysis. We will be focusing specifically on effect sizes that are used to compare mean differences between two levels of a factor (group differences). Since we are working with samples (not populations), notations for means and standard deviations will use sample notation.
1. Pooled Standard Deviation
The pooled standard deviation is essentially a combination of the standard deviations of the treatment group and the control group. For our meta-analysis, this will be a combination of the standard deviations of the before-stress condition and the after-stress condition. The purpose of this calculation is to produce a value that standardizes the mean difference when computing an effect size. We therefore calculate this value for each study, where it functions as the denominator of the effect size.
SD_{pooled} = \sqrt{\frac{(n_{pre} - 1)s_{pre}^2 + (n_{post} - 1)s_{post}^2}{n_{pre} + n_{post} - 2}}
Where:
- $
n_{pre}
$: The sample size before a stressor is introduced - $
n_{post}
$: The sample size after a stressor is introduced - $
s_{pre}
$: The standard deviation of cognitive performance before a stressor is introduced - $
s_{pst}
$: The standard deviation of cognitive performance after a stressor is introduced
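As a quick sanity check of this formula, here is a minimal base R sketch; the sample sizes and standard deviations below are made-up numbers, not values from our dataset.

```r
# Pooled SD for a single hypothetical study (made-up inputs, not our dataset)
pooled_sd <- function(n_pre, s_pre, n_post, s_post) {
  sqrt(((n_pre - 1) * s_pre^2 + (n_post - 1) * s_post^2) /
         (n_pre + n_post - 2))
}
pooled_sd(n_pre = 20, s_pre = 10, n_post = 22, s_post = 11)  # ~10.5
```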
2. Cohen's d
This is how we calculate our effect size, which represents how much influence a predictor has on our outcome of interest. In our meta-analysis, the main effect size of interest tells us how stress influences cognitive performance. We expect most of the effect sizes to be negative, meaning stress has a negative effect on cognition. The math is simple: we subtract the mean cognitive performance before stress from the mean cognitive performance after stress. However, a raw mean difference is not enough; it needs to be standardized to make effect sizes comparable across studies. Standardization essentially means removing the units of the outcome, or converting the units into standard deviations, and we accomplish this by dividing the mean difference by the pooled standard deviation ($SD_{pooled}$).
d = \frac{\bar{x}_{post} - \bar{x}_{pre}}{SD_{pooled}}
Where:
- $\bar{x}_{post}$: Mean cognitive performance after being stressed
- $\bar{x}_{pre}$: Mean cognitive performance before being stressed
- $SD_{pooled}$: The pooled standard deviation from equation 1
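A minimal sketch with made-up means, reusing the pooled SD from the previous sketch:

```r
# Cohen's d for one hypothetical study: standardized post - pre mean difference
cohens_d <- function(m_post, m_pre, sd_pooled) {
  (m_post - m_pre) / sd_pooled
}
cohens_d(m_post = 48, m_pre = 53, sd_pooled = 10.5)  # ~ -0.48
```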
3. Hedges' g
We have calculated effect sizes using the formula for Cohen's d, but there is a problem: these effect sizes are biased, meaning that if we use them as-is in our model, we will produce inaccurate estimates (i.e., our conclusions could be wrong). This is because Cohen's d does not fully account for sample size: in small studies it tends to overestimate the true effect, even though a smaller sample size means larger standard errors and a less reliable estimate (later on, inverse-variance weighting is what gives larger, more reliable studies more influence). Therefore, we introduce a correction factor into our Cohen's d estimate to produce Hedges' g ($g$), a bias-corrected version of Cohen's d ($d$). (IMPORTANT: Mathematically speaking, if the full sample size for a study is 20 or larger, this correction won't have much of an effect.)
g = d \times J
where:
- $g$: Hedges' g
- $d$: Cohen's d
- $J$: The correction factor (the formula below is an approximation of the exact correction, but it is close enough in practice and was proposed by Hedges himself)
J = 1 - \frac{3}{4(n_{post} + n_{pre}-2) - 1}
- $n_{post} + n_{pre} - 2$: The total degrees of freedom
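A minimal sketch of the correction, again with made-up inputs:

```r
# Correction factor J and Hedges' g for one hypothetical study
hedges_g <- function(d, n_pre, n_post) {
  J <- 1 - 3 / (4 * (n_pre + n_post - 2) - 1)  # small-sample correction
  d * J
}
hedges_g(d = -0.48, n_pre = 20, n_post = 22)  # J ~ 0.98, g ~ -0.47
```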
4. Standard Error of Hedges' g
Now that the bias has been addressed by calculating Hedges' g in the previous section, we need to calculate the standard error of this estimate. We cannot simply use the traditional standard error equation ($SE_{x} = \frac{SD_{x}}{\sqrt{n_{x}}}$); instead, we use the equation below. The standard error gives us a value that, when multiplied by roughly 2 and then subtracted from and added to Hedges' g, produces a range within which we are 95% confident the estimate lies. In other words, we use standard errors to calculate 95% confidence intervals, which will become important later. Additionally, by eye we can compare standard errors across the effect sizes from all the studies and see whether one study carries much more uncertainty than another.
SE_{g} = \sqrt{\frac{n_{pre} + n_{post}}{n_{pre}n_{post}} + \frac{g^2}{2(n_{pre} + n_{post})}}
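A minimal sketch using the same made-up inputs as before:

```r
# Standard error of Hedges' g for one hypothetical study
se_hedges_g <- function(g, n_pre, n_post) {
  sqrt((n_pre + n_post) / (n_pre * n_post) + g^2 / (2 * (n_pre + n_post)))
}
se_hedges_g(g = -0.47, n_pre = 20, n_post = 22)  # ~0.31
```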
5. Fixed-Effects Model: Pooled Effect Size (One value)
This is one type of model (and the easiest) that we can use to run a meta-analysis, the other being a random-effects model (bonus: there is a more complicated model called a multilevel model, which we will not cover in this wiki page). Many of the equations in the previous sections were really about getting the data ready, or you can think of it as cleaned, so we can run this fixed-effects model on it. The model calculates a pooled effect size, which is jargon for an average, or grand mean, effect size. For example, if we have 10 studies and calculate Hedges' g for each, producing 10 values of Hedges' g, the fixed-effects model takes those 10 effect sizes and produces one grand average effect size. However, it does not simply add them all up and divide by the number of effect sizes (i.e., the usual mean formula); instead it weights each effect size first and then averages them. Here is where the standard errors of Hedges' g come in: those values determine how influential each effect size is during the mean calculation, as we can see in the formula below:
\hat{\theta} = \frac{\Sigma_{i=1}^k w_i g_i}{\Sigma_{i=1}^k w_i}
where:
- $\hat{\theta}$: The pooled effect size (an average of weighted Hedges' g values)
- $g_i$: Hedges' g for study $i$
- $w_i$: Weight for study $i$, calculated as the inverse variance.
w_i = \frac{1}{SE_{gi}^2}
where $SE_{gi}$ is the standard error of Hedges' g for study $i$ (so $SE_{gi}^2$ is its variance)
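A minimal sketch of this weighted average using three made-up effect sizes and standard errors (not our simulated studies):

```r
# Fixed-effects pooled effect size: inverse-variance weighted average (toy values)
g  <- c(-0.40, -0.60, -0.50)      # hypothetical Hedges' g values
se <- c(0.30, 0.25, 0.35)         # their hypothetical standard errors
w  <- 1 / se^2                    # fixed-effects (inverse-variance) weights
theta_hat <- sum(w * g) / sum(w)  # pooled effect size
theta_hat                         # ~ -0.51
```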
6. Standard Error and 95% Confidence Intervals of the Pooled Effect Size (One value)
Above we calculated a pooled effect size, a grand mean effect size obtained after weighting each effect size by its inverse variance. This effect size by itself is not very informative because we do not yet know whether it is significant. For example, a pooled effect size of .30 is technically a moderate effect, but if it turns out not to be significant, we cannot distinguish it from 0. So we need to test this. We do so by first calculating the standard error using the formula below. Once we have this value, we can use it to calculate a 95% confidence interval for the pooled effect size. If the confidence interval does not contain 0, our effect size is significantly different from 0, meaning there is an effect present in our data.
SE_{\hat{\theta}} = \sqrt{\frac{1}{\Sigma_{i=1}^k w_i}}
where $w_i$ is the inverse-variance weight
CI = \hat{\theta} \pm z_{0.975} \times SE_{\hat{\theta}}
where:
- $\hat{\theta}$: Pooled effect size from equation 5
- $SE_{\hat{\theta}}$: Standard error of the pooled effect size
- $z_{0.975}$: The z-score below which 97.5% of the area under the standard normal curve lies, which is 1.96. This is the z-score we use when calculating 95% confidence intervals.
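Continuing the same made-up three-study example, a sketch of the standard error and 95% CI:

```r
# Standard error and 95% CI of the pooled effect size (same toy values as above)
g  <- c(-0.40, -0.60, -0.50)
w  <- 1 / c(0.30, 0.25, 0.35)^2
theta_hat <- sum(w * g) / sum(w)
se_theta  <- sqrt(1 / sum(w))
theta_hat + c(-1, 1) * qnorm(0.975) * se_theta  # CI excludes 0 here
```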
7. Z-Test for Significance (Is there an overall effect?)
Now that the fixed-effects model has given us all the information we need, we can start doing some hypothesis testing. Our main research question is whether there is an overall effect of stress on cognitive performance. The first thing to do is test whether our pooled effect size, which represents the weighted average of the effect sizes across all studies, is significant; more specifically, whether it is significantly different from 0. If it is, then by definition we reject the null hypothesis and conclude that we have enough evidence to be confident that stress affects cognitive performance. We can run the z-test with the simple formula below to calculate a z-statistic.
z = \frac{\hat{\theta}}{SE_{\hat{\theta}}}
where:
- $\hat{\theta}$: Pooled effect size
- $SE_{\hat{\theta}}$: Standard error of the pooled effect size

The outcome of this calculation is a z-score; we can then find the probability of observing that z-score by comparing it to the standard normal distribution.
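A sketch of the z-test with rough toy values; note that a two-sided p-value uses both tails of the normal distribution:

```r
# z-test for the pooled effect size (rough toy values)
theta_hat <- -0.51
se_theta  <- 0.17
z <- theta_hat / se_theta
p <- 2 * pnorm(-abs(z))  # two-sided p-value from the standard normal
c(z = z, p = p)
```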
8. Heterogeneity Test (Cochran’s Q-statistic)
Finding out whether we have an overall effect is not enough. Another big component is identifying whether there is heterogeneity in our data, that is, whether the effect sizes across all papers are fairly similar to each other or whether some of them differ considerably. If enough effect sizes deviate strongly from the average, we have heterogeneity. This is something we do not want in a fixed-effects model, because that model assumes there is no heterogeneity. Finding heterogeneity in our data indicates we should use a random-effects model instead and that we should follow up our analysis by investigating moderators.
Q = \Sigma_{i=1}^k w_i(g_i - \hat{\theta})^2
where:
- $w_i$: Weights
- $g_i$: Hedges' g for study $i$
- $\hat{\theta}$: Pooled effect size
The outcome of this test is a $Q$ statistic, which follows a chi-square distribution with $k - 1$ degrees of freedom (where $k$ is the number of studies). Just like the test above, this statistic is paired with a p-value, and if that p-value is smaller than 0.05, we have significant heterogeneity in our data.
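A sketch of Q using the same made-up three-study example:

```r
# Cochran's Q: weighted squared deviations from the pooled effect (toy values)
g <- c(-0.40, -0.60, -0.50)
w <- 1 / c(0.30, 0.25, 0.35)^2
theta_hat <- sum(w * g) / sum(w)
Q <- sum(w * (g - theta_hat)^2)
k <- length(g)
1 - pchisq(Q, df = k - 1)  # p-value; non-significant suggests homogeneity
```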
9. $I^2$ Statistic

Okay, so let's say we ran the heterogeneity test above and found that we do have heterogeneity; the test came back significant. From that information alone we do not know just how much heterogeneity there is in our data. This is where $I^2$ comes in. This statistic quantifies the proportion of variance that is due to heterogeneity. Conceptually this value cannot be less than 0; however, the formula can occasionally produce a negative value when there is a lot of sampling error or only a small number of studies. The max() portion simply states that if this rare negative case occurs, we replace the value with 0. For a fixed-effects model we want this value to be close to 0.
I^2 = max(0, \frac{Q - (k-1)}{Q}) \times 100\%
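A small sketch of the truncation, using Q values roughly in the ballpark of the ones we will encounter later in this wiki page:

```r
# I^2 with the max(0, ...) truncation
i2 <- function(Q, k) max(0, (Q - (k - 1)) / Q) * 100
i2(Q = 5.8, k = 10)   # Q below k - 1, so I^2 = 0 (homogeneous)
i2(Q = 21.2, k = 10)  # roughly 58%, substantial heterogeneity
```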
Part 1: Fixed-Effects Model (By Hand)
Any clown can learn a few functions and run a meta-analysis without really knowing what they are doing. It is much better practice in my opinion to learn how to interpret the formulas above, use them to get the outputs we want and generate results that way. Then, after we think we have a good understanding of the math and the logic for this type of analysis, we compare our results to the functions that already exist.
Generating the data
Below we create, or simulate, some dummy data for us to run a meta-analysis on. The data consist of means, standard deviations, and sample sizes for two conditions (before stress, after stress) across 10 studies. The outcome the effect size represents is the change in cognitive performance, so this simulated data should resemble the kind of data we would expect to collect. Additionally, we specified that the pooled effect size for this data should be around d = -0.5.
# Load required packages
library(tidyverse)
library(meta) # For meta-analysis
library(dplyr) # For data manipulation
# Set seed for reproducibility
set.seed(127)
####
######## Part 1: Generating the data
####
# Simulate data for 10 studies (fixed-effects model, low heterogeneity)
n_studies <- 10
effect_true <- -0.5 # Cohen's d
study <- paste0("Study_", 1:n_studies)
# Sample sizes of each groups across studies
n_after_stress <- sample(x = 5:35, size = n_studies, replace = TRUE)
n_before_stress <- sample(x = 5:35, size = n_studies, replace = TRUE)
# Standard deviation for each group across studies
sd_after_stress <- runif(n = n_studies, min = 8, max = 12)
sd_before_stress <- runif(n = n_studies, min = 8, max = 12)
# Calculate the mean of the before_stress group
mean_before_stress <- runif(n = n_studies, min = 50, max = 60)
# To calculate the mean of the after_stress group, we first need to calculate SD_pooled
pooled_sd <- sqrt(((n_after_stress - 1) * sd_after_stress^2 + (n_before_stress - 1) * sd_before_stress^2) / (n_after_stress + n_before_stress - 2))
# With SD_pooled calculated, we can rearrange Cohen's d formula to calculate the after_stress group means
mean_after_stress <- mean_before_stress + effect_true * pooled_sd
# Add some error into the means to simulate sampling error
mean_after_stress <- mean_after_stress + rnorm(n = n_studies, 0, 1.5)
mean_before_stress <- mean_before_stress + rnorm(n_studies, 0, 1.5)
# Create dataset
fixed_data <- data.frame(
study = study,
mean_after_stress = mean_after_stress,
mean_before_stress = mean_before_stress,
sd_after_stress = sd_after_stress,
sd_before_stress = sd_before_stress,
n_after_stress = n_after_stress,
n_before_stress = n_before_stress
)
# Round numeric columns for clarity
fixed_data <- fixed_data %>%
mutate(across(where(is.numeric), ~round(.x, 2)),
total_n = n_after_stress + n_before_stress)
# Save this dataset into another one for later testing
fixed_data3 <- fixed_data
# Visualize the data
library(kableExtra)
fixed_data %>%
kbl(caption = "Generated data to run a fixed effects model on", full_width = F) %>%
kable_minimal()
Below we can view our data and make some predictions about what we are going to find when we run our analyses. Starting with the means, we see that mean cognitive performance in the `mean_after_stress` column looks smaller than in the `mean_before_stress` column, which hints that the introduction of stress leads to poorer cognitive performance. Looking at the standard deviations, `sd_after_stress` and `sd_before_stress` appear very similar. This indicates that no matter the condition, people still tend to score near each other. The `n_after_stress` and `n_before_stress` columns do not make too much sense, because the sample size should not simply increase after stress; we will ignore this for now, but make note that sample sizes between time points differ. It shouldn't invalidate our results, though; it is just something the simulation code does not bother to make more realistic. However, we should take note that some studies clearly have smaller sample sizes than others, which will be important later.
*Visualizing the Generated Data*
Calculating Statistics to Prepare Data for the Model
As mentioned before, we need to apply some transformations to the data so that it is ready for the fixed-effects model. We essentially need to do the following, in this order, for each study:
- Calculate the pooled standard deviation
- Calculate effect sizes using Cohen's d
- Introduce a correction factor to Cohen's d to convert it into Hedges' g
- Calculate the standard errors for Hedges' g
# Calculate the pooled standard deviation
fixed_data2 <- fixed_data %>%
mutate(SD_pooled = round(sqrt(((n_after_stress - 1)*sd_after_stress^2 + (n_before_stress - 1)*sd_before_stress^2)/(n_after_stress + n_before_stress -2)),2))
# Calculate Cohen's d
fixed_data2 <- fixed_data2 %>%
mutate(Cohens_d = round((mean_after_stress - mean_before_stress)/SD_pooled,2))
# Calculate the correction factor (J)
fixed_data2 <- fixed_data2 %>%
mutate(J = round(1 - (3/(4*(n_after_stress + n_before_stress - 2) - 1)),2))
# Calculate Hedges' g
fixed_data2 <- fixed_data2 %>%
mutate(Hedges_g = round(J* Cohens_d,2))
# Calculate the standard errors for Hedges g
fixed_data2 <- fixed_data2 %>%
mutate(Hedges_g_SE = round(sqrt( ((n_after_stress + n_before_stress)/(n_after_stress * n_before_stress)) + (Hedges_g^2/(2*(n_after_stress + n_before_stress)))),2))
# Keep variables of interest
fixed_data2 <- select(fixed_data2, study, total_n, SD_pooled:Hedges_g_SE)
# View the table
fixed_data2 %>%
kbl(caption = "Calculate Effect Sizes and Standard Errors to Run Fixed Effects Model", full_width = F) %>%
kable_minimal()
Below is the information we calculated using custom code in R; we can basically consider these calculations as doing it by hand. A few things to note:
- `total_n`: The total sample size, obtained by adding the sample sizes before and after stress.
- `SD_pooled`: The pooled standard deviations. Notice that they are all similar across studies; this is because the standard deviations of both groups were similar to begin with, making these values consistent with our expectation.
- `Cohens_d`: Our effect size, representing the effect of stress on cognition. As hypothesized, we expect stress to have a negative effect on cognitive performance, and across the 10 studies that seems to be the case. However, there is variation (d = -.06 to -.77).
- `J`: The correction factor applied to Cohen's d to convert it into Hedges' g. If we go back to the table above, we can see that studies with smaller sample sizes, like `Study_2`, were given a smaller correction factor (stronger shrinkage)!
- `Hedges_g`: The effect size we will use for the fixed-effects model. Notice it is very similar to Cohen's d; this is to be expected since all but one study had a decent sample size (Note: Hedges' g starts to differ from Cohen's d when the total sample size is less than about 20).
- `Hedges_g_SE`: The standard errors of each Hedges' g. On their own they are not very meaningful, but they are needed to run the fixed-effects model.
*View Effect Sizes and Standard Errors*
Running a Fixed-Effects Model (by-hand)
Now that we have the data ready for analysis, we will do four tests. The first two tests are pretty redundant with each other, but they are fun to know:
- Testing significance of the pooled effect size using a 95% CI
- Testing significance of the pooled effect size using a z-test
- Testing homogeneity using Cochran's Q test
- Calculating $I^2$
# Calculate the pooled effect size (one value)
fixed_data2 <- fixed_data2 %>%
mutate(w = round(1/Hedges_g_SE^2,2),
theta = round(sum(w*Hedges_g)/sum(w),2))
# Calculate 95% CI for the pooled effect size
fixed_data2 <- fixed_data2 %>%
mutate(theta_SE = round(sqrt(1/sum(w)),2),
lower_bound = round(theta - 1.96 * theta_SE,2),
upper_bound = round(theta + 1.96 * theta_SE,2))
# Print the fixed_data2
fixed_data2
# Calculate a Z-test
z_stat = unique(fixed_data2$theta)/unique(fixed_data2$theta_SE)
z_stat
# Get the p-value associated with that statistic
round(pnorm(z_stat),8)
# Test heterogeneity
fixed_data2 <- fixed_data2 %>%
mutate(Q = round(sum(w * (Hedges_g - theta)^2),2))
(Q = unique(fixed_data2$Q))
(k = length(unique(fixed_data$study)))
# Get the p-value of the Q-statistic
round(1 - pchisq(Q, k-1),8)
# Calculating the I^2
max(0, round(((Q - (k - 1)) / Q) * 100,2))
The model output below is kinda ugly, but it does tell us everything we need to know:
- Inverse variance: We see the inverse-variance weight for each effect size (`w`).
- Pooled effect size: We see that the pooled effect size is -0.51 (`theta`).
- Pooled effect size standard error: We see that the standard error for the pooled effect size is 0.1 (`theta_SE`).
- Testing significance using the 95% CI: With the standard error of the pooled effect size we can construct a 95% CI; if the value 0 is not between the lower (-0.71) and upper (-0.31) bounds, then -0.51 is significant (which it is).
- Testing significance using the z-test: This is redundant with the 95% CI test, but if we take our pooled effect size (-0.51) and divide it by its standard error (0.1), we get a z-score (z = -5.1). Comparing this z-score to the normal distribution gives a significant p-value (p = 1.7e-07).
- Testing homogeneity: The null hypothesis for this test is that the effect sizes are homogeneous, so a significant p-value rejects this null hypothesis, which supports heterogeneity. We calculated Q, which we can think of as a Q-score (similar in spirit to a z-score), and compared it to a chi-square distribution with the correct degrees of freedom (k - 1 = 9) to obtain a p-value. The p-value (p = 0.7753375) was not significant, which means our data are homogeneous.
- Percentage of variance explained by heterogeneity: This test makes the most sense when the homogeneity test is significant, but we will do the math anyway. We ended up with a negative $I^2$, which is not possible, so instead we report it as 0.
*Fixed Effects Model Results*
Running a Fixed-Effects Model Through Respected Functions (Recommended)
Now that we have obtained our results from above, we can use two functions from the 'meta' package that do all of this math for us behind the scenes. The first function is `metacont()`, which runs the fixed-effects model directly from the means, standard deviations, and sample sizes. We can specify whether we want Hedges' g, which is what we will request through an argument. The output covers essentially every test above. We can then pass the result to the `forest()` function to display it as an image.
# Calculate everything we need to know
meta <- metacont(n.e = n_after_stress, mean.e = mean_after_stress, sd.e = sd_after_stress,
n.c = n_before_stress, mean.c = mean_before_stress, sd.c = sd_before_stress,
data = fixed_data, sm = "SMD", studlab = study, method.smd = "Hedges")
# Print out the summary
summary(meta)
# Plot it out
forest(meta)
Below we get confirmation that we were able to calculate nearly the same estimates as this output in the previous two sections. This is great since it shows that we know what we are doing and it should make interpreting this output much easier.
*Summary output of `metacont()`*
The information below is the same as the information from the summary table- it just includes a visualization for the effect sizes (Hedges' g)
*`forest()` of the summary output*
Another approach (though slightly less accurate here) is to use the `metagen()` function from the same 'meta' package. The advantage of this function is that we only need to input the effect sizes (such as Cohen's d or Hedges' g) and their corresponding standard errors, and then specify whether we want a fixed-effects or a random-effects model. The pro is that this gives us more freedom to simulate data and then fit the model on it, to see how well it explains our data (e.g., adding or removing heterogeneity from our effect sizes without having to change means, standard deviations, and sample sizes, which we will do later). A con is that rounding error can make the results slightly less accurate, but overall the same math is being done.
# Run fixed-effects meta-analysis (second way)
meta2 <- metagen(
TE = Hedges_g, seTE = Hedges_g_SE, data = fixed_data2,
studlab = study, common = TRUE, sm = "SMD", random = FALSE)
# Print out summary
summary(meta2)
*Summary output of `metagen()`*
Conclusion from our Fixed Effects Analysis
By looking at the plot (or the outputs of `metacont()` and `metagen()`) that describes the results of our meta-analysis, we see that our pooled effect size, the average effect of stress on cognition, is negative and moderate (-0.51). A negative effect was seen across all of our studies, although there was some variation in these effects (g = -.15 to -.76). However, when we followed this up with a test of heterogeneity, we did not get a significant p-value (p = 0.7603), indicating that our data are homogeneous. Together, these statistics tell us that there is a negative effect of stress on cognition and that it is consistent, so there is no need for moderation analysis. Therefore, the fixed-effects model was appropriate for this dataset. Had there been heterogeneity, we would have needed to 1) use a random-effects model and 2) conduct moderation analyses.
Equations for meta-analysis (Random Effects Model)
Now we will cover the equations needed to make sense of random-effects models. These models build on and depend on the statistics we calculated for the fixed-effects model; thus, we cannot calculate a random-effects model without first calculating a fixed-effects model (this applies when doing it by hand; in R it won't matter because all of this math happens behind the scenes). Essentially, from the statistics calculated with the fixed-effects model, we compute the between-study variance ($\tau^2$), a measure of how varied the effect sizes are in the data (a similar concept to Cochran’s Q), and then use that to recalculate the same statistics.
10. Between-Study Variance ($\tau^2$; One Value)

Random-effects models estimate the between-study variance ($\tau^2$, often called tau-squared) using the DerSimonian-Laird method. To calculate it, we need statistics from the fixed-effects model, namely the inverse-variance weights ($w_i$) and Cochran’s Q-statistic ($Q$). The biggest takeaway from the formula below is the numerator, $Q - (k-1)$, which essentially indicates whether there is between-study variance greater than 0. Q is a measure of heterogeneity, and if it exceeds $k - 1$ (the number of studies minus 1), there is heterogeneity to some extent; the math in the denominator then indicates just how much variability is present. If there is no between-study variability, the value produced by this formula can be negative. When that occurs, the negative value is replaced with 0, which is shown in the equation with the max() portion.
\tau^2 = max \left(0, \frac{Q - (k-1)}{\Sigma_{i=1}^k w_i - \frac{\Sigma_{i=1}^k w_i^2}{\Sigma_{i=1}^k w_i}} \right)
where:
- $Q$: Cochran’s Q-statistic
- $w_i$: Inverse-variance weight for study $i$
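A minimal sketch of this estimator with made-up weights, Q, and k:

```r
# DerSimonian-Laird estimate of tau^2 (made-up weights, Q, and k)
w <- c(11.1, 16.0, 8.2)  # fixed-effects (inverse-variance) weights
Q <- 4.5                 # Cochran's Q from the fixed-effects step
k <- 3                   # number of studies
max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))  # tau^2, truncated at 0
```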
11. Random-Effects Weights
Now that we have calculated $\tau^2$, we can use it to calculate the random-effects weights. The math is the same as for the fixed-effects weights, except that we now add $\tau^2$ to the within-study variance in the denominator. Let's think about what this means for a second. If a collection of studies in a meta-analysis has no between-study variability, like our fixed-effects dataset, then these random-effects weights will be identical to the fixed-effects weights, because we are adding 0 to the denominator. However, if we do have between-study variability, the denominator gets larger, which produces smaller weights. Therefore, the more between-study variability there is in the data, the smaller the random-effects weights will be compared to their fixed-effects counterparts!
w_i^* = \frac{1}{SE_{gi}^2 + \tau^2}
where:
- $SE_{gi}^2$: Within-study variance (squared standard error)
- $\tau^2$: Between-study variance
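A minimal sketch with made-up standard errors and a made-up $\tau^2$, showing how the weights shrink and even out:

```r
# Random-effects weights: add tau^2 to each study's within-study variance
se_g <- c(0.30, 0.25, 0.35)  # hypothetical standard errors of Hedges' g
tau2 <- 0.11                 # hypothetical between-study variance
w_star <- 1 / (se_g^2 + tau2)
w_star          # smaller and more even than the fixed-effects weights below
1 / se_g^2      # fixed-effects weights, for comparison
```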
12. Pooled Effect Size (Random-Effects)
These next sections are essentially the same as the fixed-effects part, except that we now do our calculations with the random-effects weights ($w_i^*$) rather than the fixed-effects weights ($w_i$). Thus, to calculate a pooled effect size for a random-effects model, we multiply each Hedges' g by its random-effects weight, add them up, and then divide by the sum of the random-effects weights.
\hat{\theta_{RE}} = \frac{\Sigma_{i=1}^k w_i^* g_i}{\Sigma_{i=1}^k w_i^*}
where:
- $w_i^*$: Random-effects weight for study $i$
- $g_i$: Hedges' g for study $i$
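Continuing the same made-up values, a sketch of the random-effects pooled effect size:

```r
# Random-effects pooled effect size (same made-up values)
g      <- c(-0.40, -0.60, -0.50)
w_star <- 1 / (c(0.30, 0.25, 0.35)^2 + 0.11)
sum(w_star * g) / sum(w_star)
```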
13. Standard Errors and 95% Confidence Intervals of Pooled Effect Size (Random-Effects)
Again, an average of all the effect sizes, like the pooled effect size, is not meaningful by itself. We need to know whether this value is significant (i.e., significantly different from 0), so we calculate its standard error as a precursor to hypothesis testing. We can then use the standard error to calculate a 95% CI, as we have done before.
SE_{\hat{\theta RE}} = \sqrt{\frac{1}{\Sigma_{i=1}^k w_i^*}}
CI_{RE} = \hat{\theta_{RE}} \pm z_{0.975} \times SE_{\hat{\theta RE}}
where:
- $\hat{\theta_{RE}}$: Pooled effect size for the random-effects model
- $SE_{\hat{\theta RE}}$: The standard error of the pooled effect size for the random-effects model
- $z_{0.975}$: The z-score below which 97.5% of the area under the standard normal curve lies (counting from the left side); this translates to a z-score of 1.96, which is what we use to calculate 95% CIs.
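A sketch of the random-effects standard error, 95% CI, and z-test (the z-test also covers equation 14 below), using the same made-up values:

```r
# Random-effects SE, 95% CI, and z-test (same made-up values)
g        <- c(-0.40, -0.60, -0.50)
w_star   <- 1 / (c(0.30, 0.25, 0.35)^2 + 0.11)
theta_re <- sum(w_star * g) / sum(w_star)
se_re    <- sqrt(1 / sum(w_star))
ci_re    <- theta_re + c(-1, 1) * qnorm(0.975) * se_re
z_re     <- theta_re / se_re
list(theta_re = theta_re, ci_re = ci_re, z = z_re, p = 2 * pnorm(-abs(z_re)))
```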
14. Z-Test for Significance

We take the pooled effect size and divide it by its corresponding standard error. This produces a z-score. We then check the p-value associated with that z-score; if it is less than .05, the pooled effect size is statistically significant.
z = \frac{\hat{\theta_{RE}}}{SE_{\hat{\theta RE}}}
15. Heterogeneity Statistics ($Q$ and $I^2$)

This part is completely identical to what we did for the fixed-effects model; we do not use any of the new statistics calculated from the random-effects weights here. Thus, these values should already be available from running the fixed-effects model. Additionally, for data that genuinely call for a random-effects model, we expect Q to be significant and $I^2$ to be a decently sized proportion.
Q = \Sigma_{i=1}^k w_i(g_i - \hat{\theta})^2
where:
- $w_i$: Weights
- $g_i$: Hedges' g for study $i$
- $\hat{\theta}$: Pooled effect size for the fixed-effects model
The outcome of this test is a $Q$ statistic, which follows a chi-square distribution with $k - 1$ degrees of freedom (where $k$ is the number of studies). Just like the test above, this statistic is paired with a p-value, and if that p-value is smaller than 0.05, we have significant heterogeneity in our data.
I^2 = max(0, \frac{Q - (k-1)}{Q}) \times 100\%
where $I^2$ represents the proportion of variance between effect sizes that is due to between-study variability rather than chance alone. A large value suggests that moderators may be driving the discrepancies.
Part 2: Random-Effects Model (By Hand)
For this section we will run a random-effects model on the same data that we generated for Part 1. As mentioned, we know that our Q = 5.79 and our (k - 1) = 9. Thus, when we calculate $\tau^2$, its value will be 0. Therefore, the following by-hand calculations should produce the same results that we got in the fixed-effects part.
# Generate a dataset for random-effects testing
random_data <- select(fixed_data2, study, Hedges_g, Hedges_g_SE, theta, w, Q)
# Calculating tau^2 (between-study variability)
random_data <- random_data %>%
mutate(tau2 = round(max(0, (Q - (k -1))/(sum(fixed_data2$w) - (sum(fixed_data2$w^2)/ sum(fixed_data2$w)))),2))
# Calculate random effect weights
random_data <- random_data %>%
mutate(`w*` = round(1 / (fixed_data2$Hedges_g_SE^2 + tau2),2))
# Calculate Pooled Effect Size (random-Effects)
random_data <- random_data %>%
mutate(theta = round(sum(`w*` * Hedges_g)/ sum(`w*`),2))
# Calculate the standard error of the pooled effect size + the 95% CI
random_data <- random_data %>%
mutate(theta_SE = round(sqrt(1/sum(`w*`)),2),
lower_bound = round(theta - 1.96 * theta_SE,2),
upper_bound = round(theta + 1.96 * theta_SE,2))
# Print random_data
random_data
# Calculate a Z-test
z_stat = unique(random_data$theta)/unique(random_data$theta_SE)
z_stat
# Get the p-value associated with that statistic
round(pnorm(z_stat),8)
# Test heterogeneity (Q-test remains the same)
random_data <- random_data %>%
mutate(Q = round(sum(w * (Hedges_g - theta)^2),2))
(Q = unique(fixed_data2$Q))
(k = length(unique(fixed_data$study)))
# Get the p-value of the Q-statistic
round(1 - pchisq(Q, k-1),8)
# Calculating the I^2
max(0, round(((Q - (k - 1)) / Q) * 100,2))
When comparing the model output between the fixed effects model and the random effects model for our data, we see that it produces the same results!
*Fixed Effects Model By Hand | Random Effects Model by Hand*
I didn't mention this the first time so as not to cause confusion, but notice that the output of `metacont()` actually shows the model outputs for both a fixed-effects model and a random-effects model! Pretty cool, huh? It makes sense that the function does this, because the fixed-effects model is needed to calculate a random-effects model, so why not include it in the output, especially since it is not computationally demanding. Anyway, when comparing the results from both models, we see that they are identical.
*Summary output of `metacont()`*
Part 3: Running a Random-Effects Model on a Heterogenous Dataset
Let's now explore what would happen if our dataset had more variation in the effect sizes. To do this, we can take the effect sizes we already calculated (our Hedges' g values) and add some random noise to them. After that, we can repeat most of the steps for both the fixed-effects and random-effects models, but first we will need to recalculate the standard errors for the new Hedges' g values.
Beginning with the fixed-effects model: we need to calculate the fixed-effects pooled effect size using the fixed-effects weights, followed by its standard error. From this we can calculate Q and $I^2$; after that we can move on to the random-effects calculations.
- $w_i$
- $\hat{\theta}$
- $SE_{\hat{\theta}}$
- $Q$
- $I^2$
Ending with the random-effects model: we now use Q and the fixed-effects weights ($w_i$) to calculate $\tau^2$. With tau-squared, we can calculate the random-effects weights, the pooled effect size for the random-effects model, and its standard error. With that information we can run hypothesis tests for the random-effects model.
- $\tau^2$
- $w_i^*$
- $\hat{\theta_{RE}}$
- $SE_{\hat{\theta RE}}$
- 95% CI
- Z-test
Below I tweaked our dataset by adding noise to the Hedges' g values using `+ rnorm(n = k, mean = 0, sd = .3),2)-.03`. This was chosen because it a) adds noise to the Hedges' g values and b) produces the same fixed-effects pooled effect size of -0.51, which makes these results comparable to our previous dataset. Additionally, the random-effects model will be done by hand to show that we are in fact doing it correctly.
# Generate a dataset for random-effects testing
random_data2 <- select(fixed_data, study, n_after_stress,n_before_stress)
# Adding noise into the effect size (Hedges_g)
set.seed(123)
random_data2$Hedges_g <- fixed_data2$Hedges_g
random_data2 <- random_data2 %>%
mutate(Hedges_g = round(Hedges_g + rnorm(n = k, mean = 0, sd = .3),2)-.03)
# Calculate Standard Errors for Hedges g
random_data2 <- random_data2 %>%
mutate(Hedges_g_SE = round(sqrt( ((n_after_stress + n_before_stress)/(n_after_stress * n_before_stress)) + (Hedges_g^2/(2*(n_after_stress + n_before_stress)))),2))
# Calculate the pooled effect size (one value)
random_data2 <- random_data2 %>%
mutate(w = round(1/Hedges_g_SE^2,2),
theta = round(sum(w*Hedges_g)/sum(w),2))
# Calculate the pooled effect size standard error
random_data2 <- random_data2 %>%
mutate(theta_SE = round(sqrt(1/sum(w)),2))
# Calculate Q
random_data2 <- random_data2 %>%
mutate(Q = round(sum(w * (Hedges_g - theta)^2),2))
# Calculating tau^2 (between-study variability)
random_data2 <- random_data2 %>%
mutate(tau2 = round(max(0, (Q - (k -1))/(sum(random_data2$w) - (sum(random_data2$w^2)/ sum(random_data2$w)))),2))
# Calculate random effect weights
random_data2 <- random_data2 %>%
mutate(`w*` = round(1 / (random_data2$Hedges_g_SE^2 + tau2),2))
# Calculate Pooled Effect Size (random-Effects)
random_data2 <- random_data2 %>%
mutate(theta = round(sum(`w*` * Hedges_g)/ sum(`w*`),2))
# Calculate the standard error of the pooled effect size + the 95% CI
random_data2 <- random_data2 %>%
mutate(theta_SE = round(sqrt(1/sum(`w*`)),2),
lower_bound = round(theta - 1.96 * theta_SE,2),
upper_bound = round(theta + 1.96 * theta_SE,2))
# Print random_data2
random_data2
# Calculate a Z-test
z_stat = unique(random_data2$theta)/unique(random_data2$theta_SE)
z_stat
# Get the p-value associated with that statistic
round(pnorm(z_stat),8)
# Test heterogeneity (Q-test remains the same)
random_data2 <- random_data2 %>%
mutate(Q = round(sum(w * (Hedges_g - theta)^2),2))
(Q = unique(random_data2$Q))
(k = length(unique(random_data2$study)))
# Get the p-value of the Q-statistic
round(1 - pchisq(Q, k-1),8)
# Calculating the I^2
max(0, round(((Q - (k - 1)) / Q) * 100,2))
Here we see exactly what happens when we run a random-effects model on data that are heterogeneous. We can compare the two datasets fairly because they have the same fixed-effects pooled effect size and the same number of studies (k). The first thing that should stand out is the value of Q, which, as mentioned, really starts the cascade of changes. We know that (k - 1) equals 9, so having a Q larger than 9 for our new dataset (Q = 21.2) produces a between-study variance greater than 0 ($\tau^2$ = 0.15). Tau-squared then has a direct effect on the random-effects weights (`w*`), and as expected they are smaller than their fixed-effects counterparts (`w`). Interestingly, the pooled effect size for the random-effects model (`theta`) came out the same as the fixed-effects one; this will not always happen, since the random-effects weights are spread more evenly across studies and can shift the average. However, there is a clear effect on the standard error, which is larger (0.16) than the fixed-effects one (0.10). This matters because larger standard errors make a significant effect less likely, as evident from how much wider our 95% CI became and how much smaller our z-score got (in absolute value). What remains after that are our tests of heterogeneity. With Q = 21.2, which is much larger than 9, we very likely have heterogeneity, and when we test this against a chi-square distribution we see that this is the case, since the p-value is significant (p = 0.01179..). As for how much of the variability in effect sizes is due to differences between studies rather than chance, $I^2$ = 58%.
*Random Effects Model Output: Original Dataset (Homogeneity) | New Dataset (Heterogeneity)*
Lastly, we can confirm that what we did above was correct by using the `metagen()` function. We use this function because we arbitrarily added noise directly to the effect sizes of the original dataset; `metagen()` will fit a random-effects model for us using each study's effect size and standard error.
# Run random-effects meta-analysis
meta_r <- metagen(
TE = Hedges_g,
seTE = Hedges_g_SE,
data = random_data2,
studlab = study,
sm = "SMD",
common = FALSE,
random = TRUE
)
# Print out the summary
summary(meta_r)
# Plot the summary
forest(meta_r)
From the plot below of running a random-effects model on our dataset, we see that we got the same estimates, showcasing that what we did was correct!
*Plot of our results*