Standard t‐test Overview - Private-Projects237/Statistics GitHub Wiki
Here we will discuss some general concepts around a simple statistical analysis, the t-test, and specifically the independent two-sample t-test, which is the most common version. This test is used to compare the means of a continuous outcome between two groups (or two levels of a factor).
Here are the general assumptions of a standard t-test. There are technically more assumptions, but they only apply to specific t-tests:
- The outcome variable is continuous
- The data were sampled at random
- The residuals are normally distributed (equivalently, the outcome is approximately normal within each group)
- The variance of the outcome is similar in both groups (homogeneity of variance)
- Observations are independent both within and between groups
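The normality and equal-variance assumptions can be checked in several ways. Below is a minimal base R sketch on simulated data (the data frame `dat` and its columns `x` and `y` are hypothetical): it fits a one-way model, tests the residuals for normality with `shapiro.test()`, and tests equality of variances with `var.test()`, the same F test used in the code below. The Shapiro-Wilk check is an added choice for illustration, not necessarily what the rest of this page does.

```r
# Hypothetical simulated data: two groups of 50 observations
set.seed(1)
dat <- data.frame(
  x = rep(c("group1", "group2"), each = 50),
  y = c(rnorm(50, mean = 100, sd = 15), rnorm(50, mean = 105, sd = 15))
)

# Residuals of a one-way model: each observation minus its group mean
fit <- lm(y ~ x, data = dat)
res <- residuals(fit)

# Normality of the residuals (Shapiro-Wilk test)
normality <- shapiro.test(res)

# Equality of variances between the two groups (F test)
equal_var <- var.test(y ~ x, data = dat)

c(shapiro_p = normality$p.value, var_test_p = equal_var$p.value)
```

A large p-value in either test means the data give no strong evidence against that assumption; it does not prove the assumption holds.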
```r
# Load in packages
library(tidyverse)  # dplyr, tidyr, ggplot2, tibble
library(effsize)    # cohen.d()

# Set seed for reproducibility
set.seed(123)

# Lists to store the plots and the one-row summary tables
plot_list <- list()
description_list <- list()

# Set parameters
n <- 50                     # observations per group
group1_mean <- 100
group2_mean <- group1_mean  # both groups are drawn from the same population

# Create dummy data
x <- rep(c("group1", "group2"), each = n)
y <- c(rnorm(n = n, mean = group1_mean, sd = 15),
       rnorm(n = n, mean = group2_mean, sd = 15))

for (ii in 1:10) {
  # Create (reset) the dataset
  dat <- data.frame(x, y)

  # Increase the mean of group2 by ii units
  dat <- dat %>%
    mutate(y = ifelse(x == "group2", y + ii, y))

  # Calculate descriptives and a 95% CI for each group mean
  dat <- dat %>%
    group_by(x) %>%
    mutate(group_mean = mean(y),
           sd_y = sd(y),
           se = sd_y / sqrt(n),
           lower.CI = group_mean - 1.96 * se,
           upper.CI = group_mean + 1.96 * se) %>%
    ungroup()

  # Check variances first (F test), then pick the t-test
  var_test <- var.test(y ~ x, data = dat)
  if (var_test$p.value > .05) {
    # Standard t-test (pooled variance)
    mod <- t.test(y ~ x, data = dat, var.equal = TRUE)
    t_test <- "Standard"
  } else {
    # Welch's t-test (separate variances)
    mod <- t.test(y ~ x, data = dat, var.equal = FALSE)
    t_test <- "Welch's"
  }

  # Calculate effect size
  cohen_d <- cohen.d(y ~ x, data = dat)

  # One-row table with all the info we need
  # (tibble() keeps column names such as `95% CI` as written, so no renaming is needed)
  df <- tibble(Group1_mean = round(dat$group_mean[dat$x == "group1"][1], 2),
               Group1_n = n,
               Group1_sd = round(dat$sd_y[dat$x == "group1"][1], 2),
               Group2_mean = round(dat$group_mean[dat$x == "group2"][1], 2),
               Group2_n = n,
               Group2_sd = round(dat$sd_y[dat$x == "group2"][1], 2),
               t_test = t_test,
               t.statistic = round(mod$statistic, 2),
               df = round(mod$parameter, 2),
               p.val = mod$p.value,
               `95% CI` = paste0("[", paste0(round(mod$conf.int, 2), collapse = ","), "]"),
               Cohen_d = round(cohen_d$estimate, 2),
               `Cohen d 95% CI` = paste0("[", paste0(round(cohen_d$conf.int, 2), collapse = ","), "]"))

  # Text used as the plot caption
  data_descr <- paste0("Group1 mean = ", df$Group1_mean,
                       ", Group2 mean = ", df$Group2_mean, " \n")
  model_descr <- paste0("t(", df$df, ") = ", df$t.statistic,
                        ", p = ", df$p.val,
                        ", 95% CI ", df$`95% CI`,
                        ", d = ", df$Cohen_d)
  full_desc <- paste0(data_descr, model_descr)

  # Save the one-row table
  description_list[[ii]] <- df

  # Bar plot of the group means with their 95% CIs
  plot_list[[ii]] <- dat %>%
    ggplot(aes(x = x, y = y)) +
    stat_summary(fun = "mean", geom = "bar", width = .6,
                 color = "black", fill = "white") +
    geom_errorbar(aes(ymin = lower.CI, ymax = upper.CI), width = .2) +
    theme_classic() +
    labs(title = paste0("Plot", ii), caption = full_desc) +
    theme(plot.caption = element_text(hjust = 0.5)) +
    coord_cartesian(ylim = c(group1_mean - 30, group2_mean + 20))
}

# Stack the 10 one-row tables
do.call(rbind, description_list)
# plot_list
```
Custom code was written to produce the output of 10 t-tests. The first 6 columns contain information about the original data (descriptives) and the last 7 columns give information about the t-test (inferential).
Descriptives: Let's begin with the descriptives. There are two groups (Group1 and Group2); these groups could represent anything, such as sex (M, F), marital status (Y, N), employment (Y, N), education (Y, N), etc. What matters is that there are only two groups. The data below are expressed in wide format. Each group has a mean score for some outcome. For Group1, that mean stays the same in every row (101.1), whereas for Group2 it increases by 1 with each iteration; this was done on purpose so we can compare the inferential statistics across rows. Both groups have the same sample size and a similar standard deviation. Thus, any changes in the inferential portion of the table below are strictly due to mean differences.
Inferential: This is the main output of the t-test. Every iteration used the same test (standard), and apart from the degrees of freedom (df), the values in each column change across rows. Let's explain each one:
- t.statistic: This is the ratio of the difference between the two group means to the standard error of that difference. The larger the absolute value of the t-statistic, the stronger the evidence that the means are different from each other.
- df: These are the degrees of freedom. For the standard t-test they are calculated as $n_1 + n_2 - 2$. Since both groups had 50 observations in every iteration, the degrees of freedom stay constant at 98.
- p.val: This is the most important part of the output. It is a probability that tells you whether or not you can reject the null hypothesis. The null hypothesis for this test is that the two population means are equal (the mean difference is zero). The p-value is the probability of observing a mean difference at least as large as the one in the sample if the null hypothesis were true. If it is smaller than 0.05, the observed difference would be very unlikely under the null hypothesis, so we reject it and accept the alternative that the means are different from each other. So the take-home message is: if p < .05, the means are significantly different; if p >= .05, we fail to reject the null hypothesis (which does not prove that the means are the same).
- 95% CI: This is the 95% confidence interval for the difference between the means, and it carries the same information as the p-value: if 0 is not between the two values, the means are significantly different from each other.
- Cohen's d: This is the effect size, a standardized way of reporting how different the means are from each other: the mean difference divided by the pooled standard deviation.
- Cohen's d 95% CI: This tells us how reliable our effect size estimate is.
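To make the definitions above concrete, here is a minimal sketch that computes the standard t-test quantities by hand on simulated data (the vectors `g1` and `g2` are hypothetical) and compares them with R's `t.test(..., var.equal = TRUE)`: the t-statistic as the mean difference over its standard error, df as $n_1 + n_2 - 2$, the two-sided p-value, and the 95% CI of the difference.

```r
# Hypothetical simulated data: two groups of 50, group2 mean 5 units higher
set.seed(123)
n1 <- 50
n2 <- 50
g1 <- rnorm(n1, mean = 100, sd = 15)
g2 <- rnorm(n2, mean = 105, sd = 15)

# Pooled variance (the standard t-test assumes equal variances)
sp2 <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)
se_diff <- sqrt(sp2 * (1 / n1 + 1 / n2))  # standard error of the difference

mean_diff <- mean(g1) - mean(g2)   # group1 minus group2, as t.test() does
t_stat <- mean_diff / se_diff      # t = difference / its standard error
dof <- n1 + n2 - 2                 # degrees of freedom
p_val <- 2 * pt(-abs(t_stat), dof) # two-sided p-value
ci <- mean_diff + c(-1, 1) * qt(0.975, dof) * se_diff  # 95% CI of the difference

# Compare with R's built-in standard t-test
builtin <- t.test(g1, g2, var.equal = TRUE)
c(manual_t = t_stat, builtin_t = unname(builtin$statistic))
```

The CI and the p-value come from the same t-distribution, which is why 0 falls outside the 95% CI exactly when p < .05.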
[Table: 10 t-test output]
Making sense of the table: As the mean of group 2 increases, the t-statistic and Cohen's d grow in magnitude (they become more negative), while the p-value gets smaller. At row 4, we obtain a significant mean difference (p < .05), when the mean of group 2 is 106 and that of group 1 is 101. In other words, once the means are about 5 units apart (in the units of the outcome), the test tells us they are statistically significantly different. This is also the first row in which the 95% CI no longer includes 0.
The t-statistic is negative because it subtracts the mean of group2 from the mean of group1, and group1 has the smaller mean. The sign only reflects the direction of the difference; a t-statistic farther from 0 in either direction means stronger evidence that the means differ.
When it comes to how different the means are, we do not use their raw difference in the outcome's units but a standardized difference instead. This is where Cohen's d comes into play. Traditionally, a small effect starts at 0.2, a medium effect at 0.5, and a large effect at 0.8. Interestingly, even in the first iteration, which is not significant, there is a small effect size for the mean difference between groups. However, we must also consider Cohen's d 95% CI, which tells us how reliable our effect size is. Even though the first iteration shows a small effect, its 95% CI ranges from -0.59 to 0.2 and includes 0, so we cannot rule out a true effect of zero; this estimate is not reliable.
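A minimal sketch of these points on hypothetical simulated data: Cohen's d computed as the mean difference divided by the pooled standard deviation (assumed here to match the default of `effsize::cohen.d()`, which this page uses), labelled with the conventional 0.2 / 0.5 / 0.8 benchmarks, plus a check that swapping the group order only flips the sign of the t-statistic.

```r
# Hypothetical simulated data: two groups of 50
set.seed(123)
g1 <- rnorm(50, mean = 100, sd = 15)
g2 <- rnorm(50, mean = 105, sd = 15)

# Cohen's d: mean difference divided by the pooled standard deviation
pooled_sd <- sqrt(((length(g1) - 1) * var(g1) + (length(g2) - 1) * var(g2)) /
                    (length(g1) + length(g2) - 2))
d <- (mean(g1) - mean(g2)) / pooled_sd

# Label |d| with the conventional benchmarks: [0.2, 0.5) small, [0.5, 0.8) medium, 0.8+ large
d_label <- cut(abs(d), breaks = c(0, 0.2, 0.5, 0.8, Inf),
               labels = c("negligible", "small", "medium", "large"),
               right = FALSE)

# The sign of t only reflects which group is subtracted from which
t_12 <- t.test(g1, g2, var.equal = TRUE)$statistic
t_21 <- t.test(g2, g1, var.equal = TRUE)$statistic

c(d = d, t_12 = unname(t_12), t_21 = unname(t_21))
d_label
```

Because d and t divide the same mean difference by positive quantities, they always share the same sign; for the standard test, d equals t multiplied by $\sqrt{1/n_1 + 1/n_2}$.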
We can also visualize the data with the plots below, one for each iteration:
[Figures: Plot 1 to Plot 10, a bar plot of the group means with 95% CI error bars for each iteration]
```r
# Load in packages
library(tidyverse)
library(effsize)

# Set seed
set.seed(123)

# Sample size per group for each of the three datasets
n_per_group <- c(25, 50, 100)
group1_mean <- 100

# Create three dummy datasets
dummy_data_list <- list()
for (ii in seq_along(n_per_group)) {
  n <- n_per_group[ii]
  x <- rep(c("group1", "group2"), each = n)
  # Both groups start with identical values, so the group2 - group1
  # difference is exactly the amount added later in the loop
  y <- rep(rnorm(n = n, mean = group1_mean, sd = 15), 2)
  dummy_data_list[[ii]] <- data.frame(dat = paste0("dataset", ii), x, y, n)
}

# Save outputs in lists
all_output_list <- list()

# For each dataset ...
for (iii in seq_along(dummy_data_list)) {
  output_list <- list()
  # ... run 10 iterations, adding 1 to 10 units to group2
  for (ii in 1:10) {
    # Reset to the original dataset
    dat <- dummy_data_list[[iii]]

    # Increase the mean of group2
    dat <- dat %>%
      mutate(y = ifelse(x == "group2", y + ii, y))

    # Calculate descriptives (n here is the column holding the group size)
    dat <- dat %>%
      group_by(x) %>%
      mutate(group_mean = mean(y),
             sd_y = sd(y),
             se = sd_y / sqrt(n),
             lower.CI = group_mean - 1.96 * se,
             upper.CI = group_mean + 1.96 * se) %>%
      ungroup()

    # Check variances first, then pick the t-test
    var_test <- var.test(y ~ x, data = dat)
    if (var_test$p.value > .05) {
      mod <- t.test(y ~ x, data = dat, var.equal = TRUE)   # standard t-test
      t_test <- "Standard"
    } else {
      mod <- t.test(y ~ x, data = dat, var.equal = FALSE)  # Welch's t-test
      t_test <- "Welch's"
    }

    # Calculate effect size
    cohen_d <- cohen.d(y ~ x, data = dat)

    # One-row summary of this iteration
    output_list[[ii]] <- tibble(
      dataset = dat$dat[1],
      iterations = ii,
      Group1_mean = round(dat$group_mean[dat$x == "group1"][1], 2),
      Group1_n = dat$n[1],
      Group1_sd = round(dat$sd_y[dat$x == "group1"][1], 2),
      Group2_mean = round(dat$group_mean[dat$x == "group2"][1], 2),
      Group2_n = dat$n[1],
      Group2_sd = round(dat$sd_y[dat$x == "group2"][1], 2),
      t_test = t_test,
      t.statistic = round(mod$statistic, 2),
      df = round(mod$parameter, 2),
      p.val = mod$p.value,
      `95% CI` = paste0("[", paste0(round(mod$conf.int, 2), collapse = ","), "]"),
      Cohen_d = round(cohen_d$estimate, 2),
      Cohen_lower.limit = round(cohen_d$conf.int[1], 2),
      Cohen_upper.limit = round(cohen_d$conf.int[2], 2))
  }
  # Stack the 10 iterations of this dataset
  all_output_list[[iii]] <- bind_rows(output_list)
}

# Combine the output of all datasets together
all_outputs <- bind_rows(all_output_list)

# Reshape to long format and label each dataset with its total sample size
all_outputs2 <- all_outputs %>%
  pivot_longer(cols = c(t.statistic, Cohen_d),
               names_to = "Statistic",
               values_to = "Statistic_value") %>%
  mutate(dataset = case_when(
           dataset == "dataset1" ~ "dataset1 (n=50)",
           dataset == "dataset2" ~ "dataset2 (n=100)",
           dataset == "dataset3" ~ "dataset3 (n=200)"),
         dataset = factor(dataset, levels = c("dataset1 (n=50)",
                                              "dataset2 (n=100)",
                                              "dataset3 (n=200)")),
         significant = ifelse(p.val < .05, "Yes", "No"),
         mean_difference = abs(Group1_mean - Group2_mean),
         Statistic = factor(Statistic, levels = c("t.statistic", "Cohen_d")))

# t-statistic against the mean difference, one panel per dataset
all_outputs2 %>%
  filter(Statistic == "t.statistic") %>%
  ggplot(aes(x = mean_difference, y = Statistic_value, color = significant)) +
  geom_point(size = .5) +
  theme_light() +
  labs(x = "Mean Difference", y = NULL) +
  facet_grid(~dataset)

# Cohen's d (with its 95% CI) against the mean difference
all_outputs2 %>%
  filter(Statistic == "Cohen_d") %>%
  ggplot(aes(x = mean_difference, y = Statistic_value, color = significant)) +
  geom_point(size = .5) +
  geom_errorbar(aes(ymin = Cohen_lower.limit, ymax = Cohen_upper.limit), width = 0.2) +
  theme_light() +
  labs(x = "Mean Difference", y = NULL) +
  facet_grid(~dataset)
```
[Figure: Effect of Sample Size on t-statistic]
The figure above shows how sample size influences the output of the t-test. Three datasets were used: the first contains 25 observations per group, the second 50, and the third 100. As in the previous example, we calculated the mean of each group and then used a t-test to check for a significant mean difference. This was done for 10 iterations, so the graph shows how increasing the mean of one group (group 2) produces a larger mean difference, and how the resulting t-statistics depend on sample size.
t-statistic: For all datasets, as the mean difference increases, the t-statistic becomes more negative, which really means it moves farther away from 0. The farther it is from 0, the stronger the evidence of a difference between the means. The color shows when the t-test becomes significant (blue), meaning that the p-value for that test was less than 0.05. Across datasets, all t-tests became significant once the t-statistic reached about -2.00 or lower. What sample size does is increase the slope of the t-statistic with respect to the mean difference (for equal group sizes, the t-statistic grows with the square root of the group size). In practice, this means that with a larger sample, the group mean difference does not have to be as big as with a smaller sample to be significant. For example, in dataset1 (50 subjects), the group mean difference needs to be around 8 for the test to reveal a significant difference. In dataset3 this happens at a difference of about 3.75, which is less than half that of the smaller dataset!
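The effect of sample size on the slope can be sketched directly. For two groups of equal size $n$ with a common standard deviation $s$, the standard t-statistic is the mean difference divided by $s\sqrt{2/n}$, so for a fixed difference it grows with $\sqrt{n}$. The hypothetical helper `min_sig_diff()` below returns the smallest mean difference that reaches p < .05, assuming $s = 15$ (the population SD used in the simulation). Because each simulated sample has its own sample SD, these values will not exactly match the figure.

```r
# Smallest mean difference reaching two-sided p < .05 in a standard t-test,
# assuming equal group sizes n and a common SD (default 15, the simulation's population SD)
min_sig_diff <- function(n, sd = 15, alpha = 0.05) {
  df <- 2 * n - 2
  t_crit <- qt(1 - alpha / 2, df)  # critical |t| value
  t_crit * sd * sqrt(2 / n)        # solve |t| = diff / (sd * sqrt(2 / n)) for diff
}

n_per_group <- c(25, 50, 100)
round(sapply(n_per_group, min_sig_diff), 2)
```

The threshold shrinks roughly in proportion to $1/\sqrt{n}$: quadrupling the group size about halves the difference needed for significance.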
[Figure: Effect of Sample Size on Cohen's d (Effect Size)]
This figure was created in the same way as the one above, but shows the effect size (Cohen's d) for the group mean differences in the three datasets with increasing sample sizes.
Cohen's d: Here we see that sample size has no direct effect on the size of the effect. It is a bit difficult to see, but if you held a ruler horizontally across the panels, you would see that the value of Cohen's d (the circle) is about the same at each mean difference for all three datasets. What sample size does influence is the certainty of the effect size, or how reliable it is: the confidence intervals for Cohen's d become narrower as the sample size increases. Also, the effects become significant at about the point where their error bars no longer cross 0.
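To illustrate why the intervals narrow, here is a sketch using a common large-sample approximation to the standard error of Cohen's d, $\sqrt{(n_1+n_2)/(n_1 n_2) + d^2/(2(n_1+n_2))}$. This formula, the helper name `approx_d_ci()`, and the illustrative value d = 0.3 are assumptions; `effsize::cohen.d()` may compute its interval somewhat differently. With the same d in all three cases, only the sample size changes the interval.

```r
# Approximate CI for Cohen's d from a large-sample standard error
# (an assumption for illustration; effsize::cohen.d() may differ slightly)
approx_d_ci <- function(d, n1, n2, level = 0.95) {
  se <- sqrt((n1 + n2) / (n1 * n2) + d^2 / (2 * (n1 + n2)))
  z <- qnorm(1 - (1 - level) / 2)
  c(lower = d - z * se, upper = d + z * se)
}

# Same effect size, three sample sizes per group (as in the three datasets)
n_per_group <- c(25, 50, 100)
cis <- t(sapply(n_per_group, function(n) approx_d_ci(d = 0.3, n1 = n, n2 = n)))
rownames(cis) <- paste0("n = ", n_per_group, " per group")
round(cis, 2)
```

Under these assumptions, the interval for 25 per group crosses 0 while the one for 100 per group does not, even though d is identical.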
```r
# Load in packages
library(tidyverse)
library(effsize)

# Set seed
set.seed(123)

# List to store the one-row summary tables
description_list <- list()

# Set parameters
n <- 50              # observations per group
group1_mean <- 100
group2_mean <- 107   # group2 will always sit 7 units above group1
iter_num <- 50

# Create dummy data: both groups start from the same draws
x <- rep(c("group1", "group2"), each = n)
y <- rep(rnorm(n = n, mean = group1_mean, sd = 15), 2)

# Standard deviations to impose on group2 (group1 keeps its sample SD)
desired_sd <- seq(from = 5, to = 25, length.out = iter_num)

for (ii in 1:iter_num) {
  # Create (reset) the dataset
  dat <- data.frame(x, y)

  # Rescale group2 so that its sample SD equals desired_sd[ii]
  group2_var <- var(dat$y[dat$x == "group2"])
  dat <- dat %>%
    mutate(y = ifelse(x == "group2", y * sqrt(desired_sd[ii]^2 / group2_var), y))

  # Rescaling also moved the group2 mean, so shift it back to exactly
  # (group2_mean - group1_mean) units above the group1 mean
  group_means <- aggregate(y ~ x, data = dat, FUN = mean)
  shift <- group_means$y[1] - group_means$y[2] + (group2_mean - group1_mean)
  dat <- dat %>%
    mutate(y = ifelse(x == "group2", y + shift, y))

  # QC: the group means should now differ by exactly 7 units
  stopifnot(isTRUE(all.equal(diff(aggregate(y ~ x, data = dat, FUN = mean)$y),
                             group2_mean - group1_mean)))

  # Calculate descriptives
  dat <- dat %>%
    group_by(x) %>%
    mutate(group_mean = mean(y),
           var_y = var(y),
           sd_y = sd(y),
           se = sd_y / sqrt(n),
           lower.CI = group_mean - 1.96 * se,
           upper.CI = group_mean + 1.96 * se) %>%
    ungroup()

  # Run the standard t-test (assumes equal variances) and Welch's t-test
  stand_mod <- t.test(y ~ x, data = dat, var.equal = TRUE)
  welch_mod <- t.test(y ~ x, data = dat, var.equal = FALSE)

  # Choose the "correct" test based on the F test for equal variances
  var_test <- var.test(y ~ x, data = dat)
  if (var_test$p.value > .05) {
    mod <- stand_mod
    t_test <- "Standard"
  } else {
    mod <- welch_mod
    t_test <- "Welch's"
  }

  # Calculate effect size
  cohen_d <- cohen.d(y ~ x, data = dat)

  # One-row summary of this iteration (p-values rounded the same way for all tests)
  description_list[[ii]] <- tibble(
    Group1_mean = round(dat$group_mean[dat$x == "group1"][1], 2),
    Group1_n = n,
    Group1_sd = round(dat$sd_y[dat$x == "group1"][1], 2),
    Group2_mean = round(dat$group_mean[dat$x == "group2"][1], 2),
    Group2_n = n,
    Group2_sd = round(dat$sd_y[dat$x == "group2"][1], 2),
    stand.t.test = round(stand_mod$statistic, 2),
    stand.df = round(stand_mod$parameter, 2),
    stand.p.val = round(stand_mod$p.value, 2),
    welch.t.test = round(welch_mod$statistic, 2),
    welch.df = round(welch_mod$parameter, 2),
    welch.p.val = round(welch_mod$p.value, 2),
    t_test = t_test,
    correct.t.test = round(mod$statistic, 2),
    correct.df = round(mod$parameter, 2),
    correct.p.val = round(mod$p.value, 2),
    `95% CI` = paste0("[", paste0(round(mod$conf.int, 2), collapse = ","), "]"),
    Cohen_d = round(cohen_d$estimate, 2),
    `Cohen d 95% CI` = paste0("[", paste0(round(cohen_d$conf.int, 2), collapse = ","), "]"))
}
full_dat <- bind_rows(description_list)

# Convert the data to long format
full_dat2 <- full_dat %>%
  pivot_longer(cols = c(stand.t.test, welch.t.test, correct.t.test,
                        stand.p.val, welch.p.val, correct.p.val),
               names_to = "Broad_Statistic_Type",
               values_to = "Statistic_Value") %>%
  mutate(sd_ratio_diff = (Group1_sd - Group2_sd) / Group1_sd,
         Test_Type = case_when(
           grepl("stand", Broad_Statistic_Type) ~ "Standard",
           grepl("welch", Broad_Statistic_Type) ~ "Welch",
           grepl("correct", Broad_Statistic_Type) ~ "Correct"),
         Statistic_Type = ifelse(grepl("p.val", Broad_Statistic_Type),
                                 "p.value", "t_statistic"))

# Plot each statistic against the SD of group2;
# the vertical line marks the SD of group1 (equal variances)
full_dat2 %>%
  ggplot(aes(x = Group2_sd, y = Statistic_Value)) +
  geom_point() +
  geom_vline(xintercept = unique(full_dat2$Group1_sd)) +
  facet_grid(Statistic_Type ~ Test_Type, scales = "free_y") +
  theme_light()
```
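The code above contrasts the standard and Welch t-tests. As a minimal sketch on hypothetical simulated data (`g1`, `g2`), Welch's t-statistic uses each group's own variance, and its degrees of freedom come from the Welch-Satterthwaite approximation; the hand computation is checked against `t.test(..., var.equal = FALSE)`.

```r
# Hypothetical simulated data: equal n, group2 SD larger than group1 SD
set.seed(42)
g1 <- rnorm(50, mean = 100, sd = 15)
g2 <- rnorm(50, mean = 107, sd = 25)

# Welch's t-test uses each group's own variance instead of a pooled one
v1 <- var(g1) / length(g1)  # squared standard error of the group1 mean
v2 <- var(g2) / length(g2)  # squared standard error of the group2 mean
t_welch <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

builtin <- t.test(g1, g2, var.equal = FALSE)
c(manual_df = df_welch, builtin_df = unname(builtin$parameter))
```

Unlike the fixed $n_1 + n_2 - 2$ of the standard test, the Welch df depend on the sample variances and always fall between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$, which is why they change across iterations as the SD of group2 changes.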

```r
# Load in packages
library(tidyverse)
library(effsize)

# Set seed
set.seed(123)

# List to store the one-row summary tables
description_list <- list()

# Set parameters: unequal group sizes this time
n1 <- 25
n2 <- 18
group1_mean <- 100
group2_mean <- 107   # group2 will always sit 7 units above group1
iter_num <- 50

# Create dummy data
x <- c(rep("group1", n1), rep("group2", n2))
y <- c(rnorm(n = n1, mean = group1_mean, sd = 15),
       rnorm(n = n2, mean = group2_mean, sd = 15))

# Standard deviations to impose on group2 (group1 keeps its sample SD)
desired_sd <- seq(from = 5, to = 25, length.out = iter_num)

for (ii in 1:iter_num) {
  # Create (reset) the dataset
  dat <- data.frame(x, y)

  # Rescale group2 so that its sample SD equals desired_sd[ii]
  group2_var <- var(dat$y[dat$x == "group2"])
  dat <- dat %>%
    mutate(y = ifelse(x == "group2", y * sqrt(desired_sd[ii]^2 / group2_var), y))

  # Shift group2 so that its mean is exactly
  # (group2_mean - group1_mean) units above the group1 mean
  group_means <- aggregate(y ~ x, data = dat, FUN = mean)
  shift <- group_means$y[1] - group_means$y[2] + (group2_mean - group1_mean)
  dat <- dat %>%
    mutate(y = ifelse(x == "group2", y + shift, y))

  # QC: the group means should now differ by exactly 7 units
  stopifnot(isTRUE(all.equal(diff(aggregate(y ~ x, data = dat, FUN = mean)$y),
                             group2_mean - group1_mean)))

  # Calculate descriptives (n() is the size of each group)
  dat <- dat %>%
    group_by(x) %>%
    mutate(group_mean = mean(y),
           var_y = var(y),
           sd_y = sd(y),
           se = sd_y / sqrt(n()),
           lower.CI = group_mean - 1.96 * se,
           upper.CI = group_mean + 1.96 * se) %>%
    ungroup()

  # Run the standard t-test (assumes equal variances) and Welch's t-test
  stand_mod <- t.test(y ~ x, data = dat, var.equal = TRUE)
  welch_mod <- t.test(y ~ x, data = dat, var.equal = FALSE)

  # Choose the "correct" test based on the F test for equal variances
  var_test <- var.test(y ~ x, data = dat)
  if (var_test$p.value > .05) {
    mod <- stand_mod
    t_test <- "Standard"
  } else {
    mod <- welch_mod
    t_test <- "Welch's"
  }

  # Calculate effect size
  cohen_d <- cohen.d(y ~ x, data = dat)

  # One-row summary of this iteration (p-values rounded the same way for all tests)
  description_list[[ii]] <- tibble(
    Group1_mean = round(dat$group_mean[dat$x == "group1"][1], 2),
    Group1_n = n1,
    Group1_sd = round(dat$sd_y[dat$x == "group1"][1], 2),
    Group2_mean = round(dat$group_mean[dat$x == "group2"][1], 2),
    Group2_n = n2,
    Group2_sd = round(dat$sd_y[dat$x == "group2"][1], 2),
    stand.t.test = round(stand_mod$statistic, 2),
    stand.df = round(stand_mod$parameter, 2),
    stand.p.val = round(stand_mod$p.value, 2),
    welch.t.test = round(welch_mod$statistic, 2),
    welch.df = round(welch_mod$parameter, 2),
    welch.p.val = round(welch_mod$p.value, 2),
    t_test = t_test,
    correct.t.test = round(mod$statistic, 2),
    correct.df = round(mod$parameter, 2),
    correct.p.val = round(mod$p.value, 2),
    `95% CI` = paste0("[", paste0(round(mod$conf.int, 2), collapse = ","), "]"),
    Cohen_d = round(cohen_d$estimate, 2),
    `Cohen d 95% CI` = paste0("[", paste0(round(cohen_d$conf.int, 2), collapse = ","), "]"))
}
full_dat <- bind_rows(description_list)

# Convert the data to long format
full_dat2 <- full_dat %>%
  pivot_longer(cols = c(stand.t.test, welch.t.test, correct.t.test,
                        stand.p.val, welch.p.val, correct.p.val),
               names_to = "Broad_Statistic_Type",
               values_to = "Statistic_Value") %>%
  mutate(sd_ratio_diff = (Group1_sd - Group2_sd) / Group1_sd,
         Test_Type = case_when(
           grepl("stand", Broad_Statistic_Type) ~ "Standard",
           grepl("welch", Broad_Statistic_Type) ~ "Welch",
           grepl("correct", Broad_Statistic_Type) ~ "Correct"),
         Statistic_Type = ifelse(grepl("p.val", Broad_Statistic_Type),
                                 "p.value", "t_statistic"))

# Plot each statistic against the SD of group2;
# the vertical line marks the SD of group1 (equal variances)
full_dat2 %>%
  ggplot(aes(x = Group2_sd, y = Statistic_Value)) +
  geom_point() +
  geom_vline(xintercept = unique(full_dat2$Group1_sd)) +
  facet_grid(Statistic_Type ~ Test_Type, scales = "free_y") +
  theme_light()
```
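With unequal group sizes, the pooled variance of the standard t-test is weighted toward the larger group. The sketch below compares the standard error of the mean difference under each test for n1 = 25 and n2 = 18, as in the simulation above. The helpers `se_pooled()` and `se_welch()` are hypothetical, and they plug in illustrative SDs rather than simulated data.

```r
# Standard error of the mean difference used by each test
se_pooled <- function(s1, s2, n1, n2) {
  # Standard t-test: pooled variance, weighted by each group's df
  sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
  sqrt(sp2 * (1 / n1 + 1 / n2))
}
se_welch <- function(s1, s2, n1, n2) {
  # Welch's t-test: each group keeps its own variance
  sqrt(s1^2 / n1 + s2^2 / n2)
}

n1 <- 25
n2 <- 18
s1 <- 15
s2_values <- c(5, 15, 25)  # smaller group has a smaller, equal, or larger SD
comparison <- data.frame(
  s2 = s2_values,
  se_standard = se_pooled(s1, s2_values, n1, n2),
  se_welch = se_welch(s1, s2_values, n1, n2)
)
comparison
```

When the smaller group has the larger SD, the pooled standard error is too small, which inflates the standard t-statistic; when the smaller group has the smaller SD, the opposite happens. With equal SDs the two standard errors coincide. Welch's test avoids the problem by not pooling.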
