Model Comparison Using Parametric Bootstrap - PennBBL/groupAnalysis GitHub Wiki
This entry contains the necessary code to use the pbkrtest package for model comparison using Generalized Additive Mixed Effects Models in R.
The code uses a toy dataset that can be found under the following path:
/import/monstrum/exampleData/TOY/TOY_CBF_DATA.csv
The code is broken down into three main section:
- Loading required packages
- Cleaning Data
- Model definition and comparison
If working with a different dataset, you can skip the "Cleaning Data" section since it is specific to this TOY dataset. the current model compares a big model, and a small model. The big model contains a main linear effect for SPQ, a spline for age, and an interaction term of the linear SPQ term and the spline of age. The small model does not contain the interaction term and only has a linear term for SPQ and a non-linear term for age. This entry only addresses comparing models with fixed regression splines.
It is important to note that you do not need to add a linear effect in the formula for gam() if you are already specifying a linear by non-linear interaction effect using the "by = " option in the s() function. In fact, if you add the linear effect, you will get an error from the gam function.
PBmodcomp should only be used if your models are nested (the variables in the smaller model are a subset of the variables in the big model).
Loading dataset and required libraries
################################################################################
library(pbkrtest)
library(lme4)
library(mccv)
################################################################################
data.analysis <- read.csv("TOY_CBF_DATA.csv")
data.analysis <- data.analysis[, -1]
Data Cleaning Section
### Define index of dependent variable
################################################################################
i=19
data.analysis <- data.analysis[!is.na(data.analysis$matedu), ]
var.name <- names(data.analysis)[i]
### Getting rid of outliers:
###anyone that has a change above 3SD of the mean
###anyone that doesn't have the two timepoints
################################################################################
temp.outlier <- data.analysis[, c(1,2,i)]
temp.outlier.wide <- reshape(direction="wide", temp.outlier,
timevar = "timepoint", idvar = "bblid")
temp.outlier.wide$diff <- (temp.outlier.wide[, 2]- temp.outlier.wide[, 3])
outliers <- temp.outlier.wide[is.na(temp.outlier.wide$diff), 1]
temp.outlier.wide <- temp.outlier.wide[!is.na(temp.outlier.wide$diff), ]
outliers <- union(outliers, temp.outlier.wide[which(temp.outlier.wide$diff > mean(temp.outlier.wide$diff) + 3*sd(temp.outlier.wide$diff)), 1])
outliers <- union(outliers, temp.outlier.wide[which(temp.outlier.wide$diff < mean(temp.outlier.wide$diff) - 3*sd(temp.outlier.wide$diff)), 1])
outliers <- union(outliers, 117595) #massICV is less for t2 than t1
for (j in 1:length(outliers)) {
data.analysis <- data.analysis[-which(data.analysis$bblid == outliers[j]), ]
}
### Changing variable type
################################################################################
data.analysis$sex <-ordered(as.character(data.analysis$sex), level=c("male", "female"))
data.analysis$go2DaysFromGo1Scan <- as.numeric(as.character(data.analysis$go2DaysFromGo1Scan))
data.analysis$go2DaysFromGo1Scan[which(data.analysis$time == 1)] <- 0
data.analysis$race[which(data.analysis$race == "black")] <- "other"
data.analysis$race <- as.factor(as.character(data.analysis$race))
data.analysis$dx_group <- as.factor(data.analysis$dx_group)
data.analysis$dx_diag <- ordered(as.character(data.analysis$dx_diag),
levels=c("CR","TD"))
data.analysis$dx_sex=0
data.analysis$dx_sex[which(data.analysis$sex == "male" & data.analysis$dx_diag == "TD")] <- 1
Model definition and comparison
In order to be able to run PBmodcomp using A Generalized Additive Mixed Model, we need to first call the model.matrix() function on a simple gam() in order to obtain the design matrix with the non-linear splines. Then we use this design matrix and lmer() to fit the model and at last we use PBmodcomp in the big and small model to compare them.
### Define Covariates for model
################################################################################
covariates=" ~ +sex +race +matedu +s(age) +s(age, by=Overall_SPQ_Bifactor_General) "
covariates2=" ~ +sex +race +matedu + s(age) + +Overall_SPQ_Bifactor_General "
m <- as.formula(paste(var.name, covariates, sep=""))
m2 <- as.formula(paste(var.name, covariates2, sep=""))
### Create design matrices for large and small models
################################################################################
matrix.big <- model.matrix(gam(m , data=data.analysis))
matrix.base <- model.matrix(gam(m2 , data=data.analysis))
### Create Random Effects and Dependent Variable Vectors
################################################################################
ROI <- as.numeric(data.analysis$gm)
fam.id<- as.factor(data.analysis$bblid)
### Run the linear mixed effects model with the GAM design matrix
################################################################################
base.model <- lmer(ROI ~ -1 + matrix.base + (1|fam.id))
big.model <- lmer(ROI ~ -1 + matrix.big + (1|fam.id))
### Model comparison using parametric bootstrap methods
################################################################################
pb.b <- PBmodcomp(big.model, base.model, nsim=500)
pb.b