Example 8: Missing Data Handling - simsem/simsem GitHub Wiki

Model Description

This example shows how to impose missing values on simulated datasets. The model is a multitrait-multimethod (MTMM) model with three trait factors and one method factor: Y1, Y4, and Y7 are measured by a common method. The model is shown below. Trivial model misspecification is introduced through cross-loadings and error correlations. Note that cross-loadings are specified only on the trait factors because cross-loadings on the method factor would not make sense. Approximately 20% of the values in every variable will be missing.

Example 8 Model

Syntax

The factor loading matrix with trivial misspecification in cross loadings is specified:

loading <- matrix(0, 9, 4)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
loading[7:9, 3] <- NA
loading[c(1, 4, 7), 4] <- NA
loading.v <- matrix(0, 9, 4)
loading.v[1:3, 1] <- "runif(1,.4,.9)"
loading.v[4:6, 2] <- "runif(1,.4,.9)"
loading.v[7:9, 3] <- "runif(1,.4,.9)"
loading.v[c(1, 4, 7), 4] <- "runif(1,.3,.6)"
loading.mis <- matrix("runif(1,-.2,.2)", 9, 4)
loading.mis[is.na(loading)] <- 0
loading.mis[,4] <- 0
LY <- bind(loading, loading.v, misspec=loading.mis)
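Before passing the matrices to bind, it can help to confirm which cells are free and which carry a trivial cross-loading. The following is a minimal sketch in base R only (it does not call simsem); the names n.free, has.mis, and n.mis are introduced here for illustration.

```r
# Rebuild the pattern and misspecification matrices from the text above.
loading <- matrix(0, 9, 4)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
loading[7:9, 3] <- NA
loading[c(1, 4, 7), 4] <- NA

loading.mis <- matrix("runif(1,-.2,.2)", 9, 4)
loading.mis[is.na(loading)] <- 0   # no misspecification on free loadings
loading.mis[, 4] <- 0              # no misspecification on the method factor

# Free loadings are the NA cells: 9 trait loadings plus 3 method loadings.
n.free <- sum(is.na(loading))

# Assigning 0 into a character matrix stores the string "0", so cells that
# still hold a distribution string are the ones that get a cross-loading.
has.mis <- loading.mis != "0"
n.mis <- sum(has.mis)

print(c(free = n.free, misspecified = n.mis))
```

The misspecified cells fall only in the three trait columns and never coincide with a free loading, which matches the description of the model.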

For some users, typing values into a matrix might be easier. Users might consider the data.entry function:

loading <- matrix(0, 9, 4)
data.entry(loading)

Then, users can edit each element of the loading matrix. The loading matrix should appear as:

Example 8 Data Entry

The syntax of the factor correlation matrix is:

faccor <- diag(4)
faccor[1, 2] <- faccor[2, 1] <- NA
faccor[1, 3] <- faccor[3, 1] <- NA
faccor[2, 3] <- faccor[3, 2] <- NA
faccor.v <- diag(4)
faccor.v[1, 2] <- faccor.v[2, 1] <- "rnorm(1,.4,.1)"
faccor.v[1, 3] <- faccor.v[3, 1] <- "rnorm(1,.2,.1)"
faccor.v[2, 3] <- faccor.v[3, 2] <- "rnorm(1,.3,.1)"
RPS <- binds(faccor, faccor.v)

The factor variances are set to 1 by default. The error correlation matrix with trivial misspecification in error correlations is specified:

error.cor.mis <- matrix("rnorm(1,0,.1)", 9, 9)
diag(error.cor.mis) <- 1
RTE <- binds(diag(9), misspec=error.cor.mis)

Thus, the MTMM model is set up:

mtmm.model <- model(LY=LY, RPS=RPS, RTE=RTE, modelType="CFA")

Next, we need to specify a missing object. This object will indicate both the amount of missingness imposed in the simulated data and the method to handle missing data. The missing object can be made by the miss function:

miss.model <- miss(pmMCAR=0.2, m=5)

The pmMCAR argument indicates the proportion of values in each variable that will be missing completely at random. The m argument is the number of imputations run on each simulated dataset; specifying it means that missing data are handled by multiple imputation. If the m argument is not specified (or is set to 0), missing data are handled by full information maximum likelihood. Users who want to analyze data with missing values by multiple imputation outside of a simulation may check the runMI function in the semTools package.
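To make the idea of missing completely at random concrete, here is a minimal base R sketch in which every value is deleted independently with probability 0.2. It is an illustration of the MCAR mechanism only, not the internal implementation of impose; the data are arbitrary standard normal draws and the names dat.complete, mask, and prop.miss are assumptions made for this example.

```r
set.seed(123)

n <- 500   # sample size, as in the example
p <- 9     # number of indicators
pm <- 0.2  # target proportion missing per variable

# Stand-in complete data (not generated from the MTMM model).
dat.complete <- matrix(rnorm(n * p), n, p,
                       dimnames = list(NULL, paste0("y", 1:p)))

# Each cell is set to NA with probability pm, independently of all values.
mask <- matrix(runif(n * p) < pm, n, p)
dat.miss <- dat.complete
dat.miss[mask] <- NA

# Observed proportion of missing values in each variable.
prop.miss <- colMeans(is.na(dat.miss))
print(round(prop.miss, 3))
```

Because deletion is random, the observed proportions scatter around 0.2 rather than matching it exactly, which is why the text describes the amount of missingness as approximate.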

We can also create a single dataset, impose missing values on it, and analyze the data:

dat <- generate(mtmm.model, 500)
dat <- impose(miss.model, dat) 
out <- analyze(mtmm.model, dat, miss=miss.model)

Calling the impose function with the missing object as the first argument and a dataset as the second argument imposes missing values on the data. The analysis result is provided in lavaan format. Users may use inspect(out, "impute") to check the fraction of missing information and related imputation results.

A full simulation can then be run and the result object investigated:

Output <- sim(1000, n=500, mtmm.model, miss=miss.model)
getCutoff(Output, 0.05)
plotCutoff(Output, 0.05)
summary(Output)

The figure below shows the graph provided by the plotCutoff function:

Example 8 SSD

The figures below show the output provided by the summary function:

Example 8 Summary Result 1

Example 8 Summary Result 2

Note that the simulation can be slow because five imputed copies of the dataset are created in each replication (i.e., multiple imputation). Therefore, the MTMM model must be fitted five times in each replication.

The summary of the simResult object will provide four new columns: the means and the standard deviations of FMI1 and FMI2 across replications.
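The fraction of missing information reflects how much of a parameter's sampling variance is attributable to the missing data. As a rough illustration, the sketch below pools one parameter across five imputations with Rubin's rules and computes the proportion of total variance due to missingness. The estimates and standard errors are made-up numbers, pool.rubin is a hypothetical helper, and the quantity lambda is a common textbook approximation; this sketch does not claim to reproduce the exact FMI1 and FMI2 formulas used by simsem.

```r
# Pool estimates and standard errors from m imputed-data analyses
# using Rubin's rules (hypothetical helper for illustration).
pool.rubin <- function(est, se) {
  m <- length(est)
  qbar <- mean(est)               # pooled point estimate
  W <- mean(se^2)                 # within-imputation variance
  B <- var(est)                   # between-imputation variance
  Tvar <- W + (1 + 1 / m) * B     # total variance
  lambda <- (1 + 1 / m) * B / Tvar  # share of variance due to missingness
  list(est = qbar, se = sqrt(Tvar), lambda = lambda)
}

# Made-up results for one loading across m = 5 imputations.
est <- c(0.52, 0.48, 0.55, 0.50, 0.47)
se <- c(0.06, 0.06, 0.07, 0.06, 0.06)

pooled <- pool.rubin(est, se)
print(pooled)
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation variance adds the uncertainty caused by the missing values.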

Here is the summary of the whole script in this example.

Remarks

Alternative imputation package

The default multiple-imputation package is Amelia. The mice package is also supported. The package argument of the miss function can be used to specify the desired multiple-imputation package:

miss.model <- miss(pmMCAR=0.2, m=5, package="mice")

Additional arguments in multiple imputation

Users may specify additional arguments of the main multiple-imputation function of each package. For example, users can look up the additional arguments available in the Amelia package with

library(Amelia)
?amelia

Assume that this example has multiple groups and some subjects have missing values on the group variable. In that case, the noms argument of the amelia function is needed to mark the group variable as nominal. The noms argument can be specified in the miss function:

miss.model <- miss(pmMCAR=0.2, m=5, package="Amelia", noms="group")

The noms argument will be saved in the missing object and passed to the amelia function.

Methods to combine chi-square values

Three methods are available for combining chi-square values across imputations. They can be selected with the chi argument of the miss function: "MR" for the method proposed by Meng & Rubin (1992), "Mplus" for the method used in Mplus (Asparouhov & Muthen, 2010), and "LMRR" for the method proposed by Li, Meng, Raghunathan, & Rubin (1991). The default is "all", which calculates all three methods in the output. The fit indices in the result object are based on the Mplus method. Here is an example that sets the chi-square combination to the method of Li, Meng, Raghunathan, & Rubin:

miss.model <- miss(pmMCAR=0.2, m=5, package="mice", chi="LMRR")

Multiple-imputation convergence cutoff

Sometimes, not all analyses of the multiply imputed datasets converge. Users need to decide how high the convergence rate must be for the overall analysis to count as converged. By default, if the convergence rate of the analyses of the multiply imputed datasets is less than 0.80, the aggregated result is considered nonconvergent. The convergentCutoff argument in the miss function can be used to change the cutoff:

miss.model <- miss(pmMCAR=0.2, m=5, package="mice", convergentCutoff=0.6)
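The decision rule described above can be sketched in a few lines of base R. The function name is.converged.overall and the convergence flags are hypothetical and only illustrate the rule; they are not part of simsem, and the example uses ten imputations rather than five so that the rate does not sit exactly on a cutoff.

```r
# Treat the aggregated result as converged only when the share of
# converged imputed-data analyses reaches the cutoff.
is.converged.overall <- function(converged, cutoff = 0.8) {
  mean(converged) >= cutoff
}

# Hypothetical flags: 7 of 10 imputed-data analyses converged.
converged <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

default.rule <- is.converged.overall(converged)                # cutoff 0.80
relaxed.rule <- is.converged.overall(converged, cutoff = 0.6)  # cutoff 0.60

print(c(default = default.rule, relaxed = relaxed.rule))
```

With a convergence rate of 0.7, the replication counts as nonconvergent under the default cutoff but as converged once the cutoff is lowered to 0.6.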

Function Review

miss Create a missing object
impose Impose missing values on a data set