Generalized Additive Mixed Effects Models - PennBBL/groupAnalysis GitHub Wiki

This entry addresses the analysis of repeated measures data (such as longitudinal data) by the use of the Generalized Additive Mixed Effect Model. We can do this in R using the gamm4() function.

The implementation of gamm4 is virtually the same as the one for gam() so please go to the Generalized Additive Model entry if you want to learn more about it here.

The code uses a toy dataset that can be found under the following path:

/import/monstrum/exampleData/TOY/TOY_VOL_DATA.csv

The biggest difference between using gam and gamm4 is that you have to specify the explicit random effects of the model by using the random argument in the gamm4 function. This is an example of the call

Read Data

data.analysis <- read.csv("TOY_VOL_DATA.csv")
data.analysis <- data.analysis[, -1]
data.analysis <- data.analysis[union(which(data.analysis$dx_group == "CRCR"), which(data.analysis$dx_group == "TDTD")), ]

Declare model

var.name <- "vol_wm"
covariates  <- " ~ +dx_group  +s(age)"
m    <- as.formula(paste(var.name, covariates, sep=""))
gam1 <- gamm4(m, data=data.analysis, random = ~ (1|bblid))

While this might seem like a weird and tedious way of declaring and constructing our formula, these lines can prove incredibly useful when running a regression across several dependent variables (i.e. ROI analysis). With this code, we could loop through each ROI save the name and declare a formula for each one. While this might seem counterproductive, just changing our dependent variable in the gamm4 function does not work and the function will break.

Plotting gamm4 models using ggplot2

The visualization of the gamm4 models tend to be a little challenging, since some of the functions that are commonly used in R to visualize a regression don't work correctly. However, the following code shows a relatively painless way of using ggplot to do this. In order to do this we need to do the following

  1. Run your model in the data you want to analyze
  2. Create a dummy dataset with all variables constant except the one you want to analyze (if we want to see the non-linear effect of age over brain volume, we create a dataset with our possible range of age values)
  3. Use the predict function on this dummy dataset
  4. Plot the results of the predict function using ggplot

To make this clearer, this is the code used to plot some of these non-linear effects:

Create dummy dataset

plot.df = data.frame( age = rep(seq(min(data.analysis$age), max(data.analysis$age), 
         length.out=200),nlevels(data.analysis$dx_group) ), 
         dx_group=rep(levels(data.analysis$dx_group) , each=200) )

The model we have chosen only contains age and diagnosis as predictors, therefore we only need to declare those two variables in our dataset. We start by declaring an age variable by using the sequence function and ranging our values from the minimum to the maximum age. The argument length.out is put to your discretion, and will only determine the difference in age across each point in the sequence. We also also the rep function to repeat these age values twice (since there are two main diagnosis groups). We also declare a main diagnosis, so that we end up with the same 200 age values repeated twice for each diagnosis group marked as such.

If you want to work with more variables, you can just declare them after this line (i.e. plot.df$sex <- "female"). Please be advised that if you include more than one categorical variable in your analysis, you might want to consider plotting the effects separately. For example, if we also included a sex by age non-linear interaction in this model, but we are still interested in visualizing the effects of diagnosis it might be a good idea to plot males and females separately.

It is important to note that you should only let the variables you want to plot vary, since you are trying to plot for the effect of these variables holding everything else constant. Therefore, if you are only interested to see the age effect, then all other used regressors should be either declared to be the mean, or one of the categories.

Use predict on the dummy dataset to obtain estimation and Standard Error

plot.df = cbind(plot.df, as.data.frame(predict( gam1$gam, plot.df, se.fit = T)))
pred.gamm.plot = plot.df
pred.gamm.plot$group = pred.gamm.plot$dx_diag
pred.gamm.plot$se = pred.gamm.plot$se.fit
data.analysis$prediction <- as.vector(predict.gam(gam1$gam))

Once we have created a dummy dataset with ALL regressors, we can use the predict function to determine what would be the expected values. We can also obtain the standard error of the fit in order to be able to plot confidence intervals.

Use ggplot to plot the estimated values

(ggplot(data=pred.gamm.plot, aes(x=age))   
       + geom_line(data=data.analysis, aes(x = age, 
                                       y = vol_wm, 
                                           group = bblid,
                                           col=factor(dx_group)))
       + geom_line(aes(y=fit, col=factor(dx_group)), size=1)
       + geom_line(aes(y=fit+2*se, col=factor(dx_group)), linetype="dashed")
       + geom_line(aes(y=fit-2*se, col=factor(dx_group)), linetype="dashed")
       + ylab(var.name)
       + ggtitle(paste("Estimated Value for",var.name,"Vs. Age"))
       + ylim(min(data.analysis$prediction)*.9,max(data.analysis$prediction)*1.1))

In here, we are creating a spaghetti plot using the raw values as the background, however we can could change this if need be to the predicted values of our regression (You can use predict on your real dataset). We chose 2 standard errors above and below the mean since this will reflect a 95% confidence interval.