怎么进行统计学分析? - ricket-sjtu/bi028 GitHub Wiki
应变量个数 | 自变量个数与类型 | 应变量类型 | 统计学方法 | R函数 |
---|---|---|---|---|
1 | 0 (单个总体) | 区间或正态分布 | 单样本t检验 | t.test() |
有序或区间 | Wilcoxon秩检验 | wilcox.test() | ||
二分类 | 二项式检验 | binom.test(), prop.test() | ||
多分类 | 卡方检验 | chisq.test() | ||
1 (单因素两独立水平) | 区间或正态分布 | 独立样本t检验 | t.test(.., paired=F) | |
有序或区间 | Wilcoxon Mann-Whitney检验 | wilcox.test(.., paired=F) | ||
分类数据 | 卡方检验或Fisher精确检验 | chisq.test(), fisher.test() | ||
1 (单因素多独立水平) | 区间或正态分布 | One-way ANOVA | anova() | |
有序或区间 | Kruskal-Wallis检验 | kruskal.test() | ||
分类数据 | 卡方检验 | chisq.test() | ||
1 (单因素两配对水平) | 区间或正态分布 | 配对t检验 | t.test(.., paired=T) | |
有序或区间 | Wilcoxon符号秩检验 | wilcox.test(.., paired=T) | ||
分类数据 | McNemar检验 | mcnemar.test() | ||
1 (单因素配对多水平) | 区间或正态分布 | One-way repeated measure ANOVA | car::Anova() | |
有序或区间 | Friedman检验 | friedman.test(.., paired=T) | ||
分类数据 | Repeated Measured Logistic回归 | lme4::glmer(.., family=binomial) | ||
2+ (多因素独立) | 区间或正态分布 | factorial ANOVA | car::Anova() | |
有序或区间 | ordered Logistic回归 | MASS::polr() | ||
分类数据 | factorial Logistic回归 | car::Anova() | ||
1 (区间或连续) | 区间或正态分布 | 相关分析或简单回归分析 | cor(.., method="pearson"), lm() | |
有序或区间 | 非参数相关 | cor(.., method="spearman") | ||
分类数据 | 简单Logistic回归 | glm(.., family=binomial) | ||
2+ (连续或分类变量) | 区间或正态分布 | 多重线性回归,协方差分析 | lm(), ancova() | |
分类 | 多重Logisitic回归,判别分析 | nnet::multinom(), MASS::lda() | ||
2+ | 单因素多水平 | 区间或正态连续 | MANOVA分析 | manova() |
2+ | 2+变量 | 区间或正态连续 | 多元线性回归 | manova(lm), car::Manova(lm) |
2 sets of 2+ | 0 | 区间或正态连续 | 典型相关分析 | cancor() |
2+ | 0 | 区间或正态连续 | 因子分析 | factanal() |
my.model <- lm(cbind(A, B) ~ c + d + e + f + g + H + I)
## uses sequential model comparisons for so-called Type I sum of squares
summary(manova(my.model))
## uses model comparisons for Type II sum of squares
library(car)
Manova(my.model, type="II")
Not having equal number of observations in each of the strata.
Consider a model that includes two factors A and B:
- there are therefore two main effects, and
- an interaction, AB
- full model:
SS(A, B, AB)
- model without interaction:
SS(A, B)
SS(AB | A, B) = SS(A, B, AB) - SS(A, B)
SS(A | B, AB) = SS(A, B, AB) - SS(B, AB)
SS(B | A, AB) = SS(A, B, AB) - SS(A, AB)
SS(A | B) = SS(A, B) - SS(B)
SS(B | A) = SS(A, B) - SS(A)
sequential sum of squares
SS(A)
SS(B | A)
SS(AB | A, B)
SS(A | B)
SS(B | A)
SS(A | B, AB)
SS(B | A, AB)
When data is balanced, the factors are orthogonal, and types I, II and III all give the same results.
N <- 100 # number of subjects
c <- rbinom(N, 1, 0.2) # dichotomous predictor c
H <- rnorm(N, -10, 2) # continuous predictor H
A <- -1.4*c + 0.6*H + rnorm(N, 0, 3) # Dependent Variable A
B <- 1.4*c - 0.6*H + rnorm(N, 0, 3) # Dependent Variable B
Y <- cbind(A, B) # Dependent Variable matrix
my.model <- lm(Y ~ c + H) # the multivariate model
summary(manova(my.model)) # base-R: SS type I
# Df Pillai approx F num Df den Df Pr(>F)
# c 1 0.06835 3.5213 2 96 0.03344 *
# H 1 0.32664 23.2842 2 96 5.7e-09 ***
# Residuals 97
library(car) # for Manova()
Manova(my.model, type="II")
# Type II MANOVA Tests: Pillai test statistic
# Df test stat approx F num Df den Df Pr(>F)
# c 1 0.05904 3.0119 2 96 0.05387 .
# H 1 0.32664 23.2842 2 96 5.7e-09 ***
所以,我们可以看到:
- 对于
stats::manova
,使用的是Type I SS,也就是先后计算SS(c)
和SS(H | c)
; - 对于
car::Manova
,使用的是Type II SS,也就是先后计算SS(c | H)
和SS(H | c)
; -
stats::anova
和car::Anova
的区别也是相同的。
- Build the design matrix
$X$
X <- cbind(1, c, H)
XR <- model.matrix(~ c + H)
all.equal(X, XR, check.attributes=FALSE)
- Define the orthogonal projection for the full model:
$P_f = X(X^T X)^{-1} X^T$ , and then get the matrix$W = Y^T (I - P_f) Y$ :
Pf <- X %*% solve(t(X) %*% X) %*% t(X)
Id <- diag(N)
WW <- t(Y) %*% (Id - Pf) %*% Y
- Type I SS for restricted and unrestricted models, plus their projections
$P_{rI}$ and$P_{uI}$ , gives the matrix$B_I = Y^T (P_{uI} - P_{rI}) Y$ :
XrI <- X[, 1]
PrI <- XrI %*% solve(t(XrI) %*% XrI) %*% t(XrI)
XuI <- X[, c(1,2)]
PuI <- XuI %*% solve(t(XuI) %*% XuI) %*% t(XuI)
Bi <- t(Y) %*% (PuI - PrI) %*% Y
- Type II SS for restricted and unrestricted models, plus their projections
$P_{rII}$ and$P_{uII}$ , leading to the matrix$B_{II} = Y^T (P_{uII} - P_{rII}) Y$ :
XrII <- X[, -2]
PrII <- XrII %*% solve(t(XrII) %*% XrII) %*% t(XrII)
PuII <- Pf
Bii <- t(Y) %*% (PuII - PrII) %*% Y
- Pillai-Bartlett trace for both types of SS:
$\operatorname{tr}((B+W)^{-1}B)$
(PBTi <- sum(diag(solve(Bi + WW) %*% Bi))) # SS type I
# [1] 0.0683467
(PBTii <- sum(diag(solve(Bii + WW) %*% Bii))) # SS type II
# [1] 0.05904288