Structured covariance matrices for lme4 - rstats-gsoc/gsoc2025 GitHub Wiki
Background
Mixed models, also known as mixed-effects models or hierarchical models, are a class of statistical model that incorporates fixed and random effects. These models are widely used where data are correlated or structured, e.g. repeated measures, longitudinal, clustered or hierarchical data, which are common across numerous disciplines. With the exception of the recommended package nlme
(which predates R itself), lme4
is the most widely used package for mixed models; it provides several capabilities (generalized models, crossed random effects) that are either impossible or difficult to achieve with nlme
.
However, unlike nlme
, lme4
does not allow for structured covariance matrices in random-effects terms (e.g. compound-symmetric, autoregressive, etc.); such matrices are useful both as a way to simplify overly complex models, and to incorporate known structure (e.g. temporal, spatial, or phylogenetic) when estimating random effects.
This project aims to add these capabilities to lme4
.
Related work
- as mentioned above,
nlme
provides access to some structured covariance matrices (and other packages, such asape
, extend these capabilities). Although the architecture ofnlme
andlme4
are completely different, we can use the stuff fromnlme
both as test/comparison cases, and as inspiration (e.g., the value of a modular/extensible/object-oriented framework for structured covariance matrices). - such a project was attempted before (see this branch), but was abandoned 11 years ago for a variety of reasons. We can re-use some of this material but will also take lessons from some of the reasons it failed (e.g., premature branching failed to maintain back-compatibility with "base"
lme4
) - The
glmmTMB
package supports a variety of structured covariance matrices (although not in a modular way). We will be able to re-use some of the formula-processing machinery (implemented in thereformulas
package). - The Bayesian/sampling-based
MCMCglmm
and (to some extent)brms
also provide covariance structures. These will be useful mostly for inspiration, since the internal architectures are very different. - The INLA package provides tools for constructing efficient, spatially structured precision matrices (the inverse of the covariance matrix); it would be nice to be able to take advantage of it, but may not be feasible because the architecture of
lme4
is heavily based on Cholesky factors (a matrix square root of the covariance matrix) rather than precision matrices (or Cholesky factors of precision matrices). - the pedigreemm package implements kinship/pedigree structures in mixed models on top of
lme4
machinery. We might be able to incorporate some of this machinery (although more of the code is written in C++, and the methods might be too specialized for easy re-use).
Details of your coding project
- We will start by reviewing the
flexLambda
branch to see what aspects of the design, and what parts of the original code, can be re-used vs. needing to be rethought/rewritten. - This is a good case for test-driven development; we will start by building tests based on fitting with covariance structures available in
nlme
andglmmTMB
- Our minimal goal is a completely backward-compatible modular structure that implements classes for (at least) diagonal and AR1 covariance structures, including documentation and tests
- Classes will need to include their own
print()
methods - An additional feature will be an
up2date()
function similar to glmmTMB::up2date() which allows re-use of saved models from previous versions oflme4
Expected impact
As discussed above, lme4
is one of the most widely used mixed model packages, with strong support from downstream methods. Covariance structures are one of its most important missing features. While other packages (glmmTMB
/MCMCglmm
/brms
) also provide these structures, lme4
is the best scaling method in some cases; its different internal architecture provides diversity in the mixed model ecosystem.
Mentors
- EVALUATING MENTOR: Ben Bolker [email protected] is the maintainer of
lme4
and an active contributor toglmmTMB
. He has previous experience with GSOC (phylobase
(as mentor, 2008),directlabels
(as co-/secondary mentor, 2021)) - Emi Tanaka [email protected] is the author of the
edibble
andnestr
CRAN packages and co-maintainer of the Mixed Models task view - Mikael Jagan [email protected] is an author of the recommended
Matrix
package and maintainer of several other CRAN packages, and is an active contributor of detailed bug reports and patches for base R.
Contributors, please contact mentors above after completing at least one of the tests below. Feel free to submit partial solutions of the "medium" or "hard" tests if necessary.
Tests
Contributors, please do one or more of the following tests before contacting the mentors above.
-
Easy: Write a shell script to be run in the top level directory of the lme4 sources (containing file
DESCRIPTION
). The script should ensure that all dependencies of lme4 are installed in the user library, then build a source tarball, then perform a check on the source tarball. Run the script and include in your submission the installation and check output from the check directory. If the output suggests problems with your set-up, then fix the problems and try again. -
Medium: Use the modular framework described in
?lme4::modular
to write an R function implementing a diagonal random effects covariance matrix by writing a wrapper for the objective function; the wrapper function should take as its argument a vector of the standard deviations of the covariance matrix, construct a new vector of zeros containing the elements of the scaled Cholesky factor of the blocks of the random effects covariance matrix, and pass this vector to the deviance function
(hint: to determine the indices of diagonal elements of the Cholesky factor in the vector, see the$reTrms$lower
element of the results oflFormula()
). Then callnloptwrap
orbobyqa
with this wrapper function to fit the model.As a test, try this example:
lmer(incidence/size ~ period + (period | herd), cbpp, control = lmerControl(check.nobs.vs.nRE = "ignore"))
but with the diagonal elements set to zero. (You can compare your results withglmmTMB(incidence/size ~ period + diag(period | herd), cbpp, REML = TRUE)
.) -
Hard: Submit a pull request fixing one of the following issues from the
lme4
GitHub issues list. An ideal patch will make minimal necessary changes to the R sources, improve the documentation (where appropriate), and add suitable regression tests. Some suggested issues: #705, #555, #227 [implement anautoscale
option inlmerControl
, which would apply to all numeric columns of the model matrix, but don't worry about back-transforming], #770, #724You may also contact the mentors to request permission to solve a different issue for your test.
Solutions of tests
Contributors, please post a link to your test results below.
Contributor Name | GitHub Profile | Test Results |
---|---|---|
Divendra Yadav | GitHub Profile | Easy, Medium |
Wenjie Lan | GitHub Profile | Easy & Medium |
Nicolò Raganato | GitHub Profile | Easy, Medium, Hard |
Greg Forkutza | GitHub Profile | Easy, Medium, Hard |