2. Regression Analysis - wmmurrah/lavaanFIML GitHub Wiki

This tutorial demonstrates how to use full information maximum likelihood (FIML) estimation to deal with missing data in a regression model using lavaan.

Import Data

In this post I use FIML to deal with missing data in a multiple regression framework. First, I import the data from a text file named 'employee.dat'. You can download a zip file of the data from the Applied Missing Data website. I also have a GitHub page for these examples. Remember to replace the file path in the read.table function with the path to the text file on your computer.

employee <- read.table("data/employee.dat")

Because the original text file does not include variable names, I name the variables in the new data frame:

names(employee) <- c("id", "age", "tenure", "female", "wbeing", "jobsat", 
                     "jobperf", "turnover", "iq")

Then I recode every value of -99, which marks a missing value in the original text file, to NA, the missing data value recognized by R.

employee[employee == -99] <- NA
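The three import steps above can also be folded into a single read.table call. The following is a minimal sketch, assuming the same column order. So that it runs without employee.dat, it writes a tiny two-row stand-in file to a temporary location; the values in that file are made up for illustration only.

```r
# Hypothetical two-row stand-in for employee.dat (values are invented).
tmp <- tempfile(fileext = ".dat")
writeLines(c("1 40 10 1 5 -99 6 0 100",
             "2 35 -99 0 -99 4 5 1 95"), tmp)

vars <- c("id", "age", "tenure", "female", "wbeing", "jobsat",
          "jobperf", "turnover", "iq")

# col.names assigns the variable names on import;
# na.strings turns fields that read exactly "-99" into NA.
demo <- read.table(tmp, col.names = vars, na.strings = "-99")

# Count the missing values per variable.
colSums(is.na(demo))
```

Note that na.strings matches the text of each field, so a file that stores the code as, say, -99.00 would need that exact string instead. The comparison used above (employee == -99) works on the numeric value and does not depend on how the code is formatted in the file.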

Create Regression Model Object

Now we are ready to create a character string containing the regression model using the lavaan model conventions. Note that b1 and b2 are labels that will be used later for the Wald test. These labels are equivalent to (b1) and (b2) after these variables in the Mplus code.

model <- '
# Regression model 
jobperf ~ b1*wbeing + b2*jobsat

# Variances
wbeing ~~ wbeing
jobsat ~~ jobsat

# Covariance/correlation
wbeing ~~ jobsat
'

In addition to the regression model, I also estimated the variances and covariances of the predictors. I did this to replicate the results of the original Mplus example. In Mplus you have to estimate the variances of all of the predictors if any of them have missing data that you would like to model. In lavaan the fixed.x=FALSE argument has the same effect (see below).
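To check how lavaan reads a model string before fitting anything, the string can be passed to lavaanify(), which returns the parameter table. This is only a quick sketch for inspecting the syntax; the column selection is my own choice.

```r
library(lavaan)

model <- '
# Regression model
jobperf ~ b1*wbeing + b2*jobsat

# Variances
wbeing ~~ wbeing
jobsat ~~ jobsat

# Covariance/correlation
wbeing ~~ jobsat
'

# Parse the model string into a parameter table (no data, no estimation).
ptab <- lavaanify(model, fixed.x = FALSE)

# One row per parameter; the label column should show b1 and b2.
ptab[, c("lhs", "op", "rhs", "label")]
```

The rows with op equal to "~" are the regression slopes, and the rows with "~~" are the variances and the covariance of the predictors.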

Fit the Model

Next, I use the sem function to fit the model.

fit <- sem(model, employee, missing='fiml', meanstructure=TRUE, 
           fixed.x=FALSE)

Listwise deletion is the default, so the missing='fiml' argument tells lavaan to use FIML instead. I also included the meanstructure=TRUE argument to include the means of the observed variables in the model, and the fixed.x=FALSE argument to estimate the means, variances, and covariances of the predictors. Again, I do this to replicate the results of the original Mplus example.
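To see what these arguments change, here is a small sketch on simulated data (not the employee data; the variable names y, x1, and x2 and all numbers are made up). It compares how many rows listwise deletion and FIML use, and checks that spelling out the predictor variances and covariance gives the same slopes as relying on fixed.x=FALSE alone.

```r
library(lavaan)

# Simulated stand-in data; not the employee data.
set.seed(123)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 1 + 0.5 * x1 + 0.2 * x2 + rnorm(n)
sim <- data.frame(y, x1, x2)
sim$x1[1:30] <- NA   # 30 rows are missing a predictor
sim$y[31:50] <- NA   # 20 rows are missing the outcome

short_model <- 'y ~ b1*x1 + b2*x2'
long_model  <- '
y ~ b1*x1 + b2*x2
x1 ~~ x1
x2 ~~ x2
x1 ~~ x2
'

fit_listwise <- sem(short_model, sim, meanstructure = TRUE, fixed.x = FALSE)
fit_fiml     <- sem(short_model, sim, missing = "fiml",
                    meanstructure = TRUE, fixed.x = FALSE)
fit_long     <- sem(long_model, sim, missing = "fiml",
                    meanstructure = TRUE, fixed.x = FALSE)

# Rows used: complete cases only vs. every row with some observed data.
lavInspect(fit_listwise, "nobs")
lavInspect(fit_fiml, "nobs")

# The slopes from the short and long FIML models should agree.
rbind(short = as.numeric(coef(fit_fiml)[c("b1", "b2")]),
      long  = as.numeric(coef(fit_long)[c("b1", "b2")]))
```

In this setup listwise deletion drops the 50 incomplete rows, while FIML keeps all 200.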

Generate Output

We are now ready to look at the results.

summary(fit, fit.measures=TRUE, rsquare=TRUE, standardized=TRUE)

Compared to what we learned in the last post, the only thing new to the summary function is the rsquare=TRUE argument, which, not surprisingly, results in the model R² being included in the summary output.
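If you would rather work with the estimates as a data frame than read them off the printed summary, parameterEstimates() and lavInspect() return the same information. The sketch below uses simulated stand-in data (the names y, x1, and x2 are made up) so that it runs on its own.

```r
library(lavaan)

# Simulated stand-in data; not the employee data.
set.seed(123)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 1 + 0.5 * x1 + 0.2 * x2 + rnorm(n)
sim <- data.frame(y, x1, x2)
sim$x1[1:30] <- NA
sim$y[31:50] <- NA

fit <- sem('y ~ b1*x1 + b2*x2', sim, missing = "fiml",
           meanstructure = TRUE, fixed.x = FALSE)

# All estimates as a data frame, with standardized columns and R-square rows.
est <- parameterEstimates(fit, standardized = TRUE, rsquare = TRUE)

# Keep only the regression slopes.
est[est$op == "~", c("lhs", "op", "rhs", "label", "est", "se", "z",
                     "pvalue", "std.all")]

# R-square of the outcome on its own.
lavInspect(fit, "r2")
```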

I only show the Parameter estimates section here:

Parameter estimates:

  Information                                 Observed
  Standard Errors                             Standard

                   Estimate  Std.err  Z-value  P(>|z|)   Std.lv  Std.all
Regressions:
  jobperf ~
    wbeing   (b1)     0.476    0.055    8.665    0.000    0.476    0.447
    jobsat   (b2)     0.027    0.060    0.444    0.657    0.027    0.025

Covariances:
  wbeing ~~
    jobsat            0.467    0.098    4.780    0.000    0.467    0.336

Intercepts:
    jobperf           2.869    0.382    7.518    0.000    2.869    2.289
    wbeing            6.286    0.063   99.692    0.000    6.286    5.338
    jobsat            5.959    0.065   91.836    0.000    5.959    5.055

Variances:
    wbeing            1.387    0.108                      1.387    1.000
    jobsat            1.390    0.109                      1.390    1.000
    jobperf           1.243    0.087                      1.243    0.792

R-Square:

    jobperf           0.208
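Several columns in this output can be reproduced from the others, which is a handy way to check that you are reading the table correctly. The sketch below uses the rounded values printed above, so small differences in the last digit are expected.

```r
# Values copied from the output above (rounded to three decimals).
b1_est     <- 0.476   # slope for wbeing
b1_se      <- 0.055   # its standard error
var_wbeing <- 1.387   # variance of wbeing
res_var    <- 1.243   # residual variance of jobperf
res_std    <- 0.792   # Std.all entry for the jobperf variance

# Z-value = estimate / standard error
# (the output shows 8.665, computed from unrounded inputs).
b1_est / b1_se

# R-square = 1 - standardized residual variance.
1 - res_std

# Total variance of jobperf implied by the residual variance.
tot_var <- res_var / res_std

# Std.all slope = slope * SD(predictor) / SD(outcome).
b1_est * sqrt(var_wbeing) / sqrt(tot_var)
```

The last two results should match the R-Square entry and the Std.all entry for wbeing up to rounding.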

Wald Test

In lavaan the Wald test is called separately from the estimation function. This function will use the labels assigned in the model object above.

# Wald test is called separately.
lavTestWald(fit,  constraints='b1 == 0
                               b2 == 0')

Results of Wald Test

$stat
[1] 95.88081

$df
[1] 2

$p.value
[1] 0
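The p-value prints as 0, but it is a very small number rather than exactly zero. The sketch below first recomputes the upper tail for the reported statistic, then rebuilds a Wald statistic by hand from the coefficients and their covariance matrix and compares it with lavTestWald. The second part uses simulated stand-in data (the names y, x1, and x2 are made up), and the quadratic form is the textbook Wald statistic for testing that both slopes are zero; I am assuming, not asserting, that this is how lavaan arrives at its value.

```r
library(lavaan)

# Upper-tail p-value for the statistic reported above.
pchisq(95.88081, df = 2, lower.tail = FALSE)

# Simulated stand-in data; not the employee data.
set.seed(123)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 1 + 0.5 * x1 + 0.2 * x2 + rnorm(n)
sim <- data.frame(y, x1, x2)
sim$x1[1:30] <- NA
sim$y[31:50] <- NA

fit <- sem('y ~ b1*x1 + b2*x2', sim, missing = "fiml",
           meanstructure = TRUE, fixed.x = FALSE)

# Wald statistic by hand: b' V^{-1} b for the two labeled slopes.
idx <- c("b1", "b2")
b <- as.numeric(coef(fit)[idx])
V <- unclass(vcov(fit))[idx, idx]
wald_by_hand <- as.numeric(t(b) %*% solve(V) %*% b)

# The same test through lavaan.
wald_lavaan <- lavTestWald(fit, constraints = 'b1 == 0
                                               b2 == 0')

c(by_hand = wald_by_hand, lavaan = wald_lavaan$stat)
```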

There you have it! Regression with FIML in R. But what if you have variables that you are not interested in including in your model, but that may carry information about the missingness in the variables that are in your model? I will talk about that in the next post.