Example 7: Analyzing Real Data and then Run Monte Carlo Simulation - simsem/simsem GitHub Wiki

Model Description

This example will show how to analyze data by this package. The aim of this package is not to fit structural equation models from real data. For fitting structural equation models in R we reccomend the sem, lavaan and OpenMx packages. This package, rather, will use the result for data analysis as parameters in data generation of a simulation study. As a benefit, this procedure will yield a distribution of fit indices from simulated data when the hypothesized model is correct. This distribution can compare with the result of the real data. This procedure is called a Monte Carlo approach for model evaluation.

This example will show how to analyze the Holzinger and Swineford (1939) data that have three factors with three indicators each. The data will be analyzed by confirmatory factor analysis with three factors with three indicators each. The fixed factor method of scale identification is used. The resulting standardized parameter estimates will be used for data generation. Then, cross-loadings with uniform distribution from -0.2 to 0.2 are added to the parameter estimates to generate trivial model misspecification. Then, the generated data (from both real parameters and trivial misspecification) is analyzed and provide the distribution of fit indices.

Example 7 Model

Syntax

We will use the Holzinger and Swineford (1939) data can be found from the lavaan package:

library(lavaan) 
summary(HolzingerSwineford1939)

In this example, we do not need to specify parameter values. Therefore, only parameter specifications are needed. Thus, the matrix object, symmetric matrix object, vector object, and the specification using model functions are not needed. Instead, the parameter specifications can be used and combined together by the estmodel function (alternatively, estmodel.cfa, estmodel.path, or estmodel.sem for an appropriate model type). In this example, the estmodel function is used:

loading <- matrix(0, 9, 3)
loading[1:3, 1] <- NA
loading[4:6, 2] <- NA
loading[7:9, 3] <- NA
cfamodel <- estmodel(LY=loading, modelType="CFA", indLab=paste("x", 1:9, sep=""))

The loading matrix shows which elements are freely estimated and which elements are fixed to 0. The estmodel function will build the analysis model. By default of this function, all factor variances are fixed to 1, all factor covariances are free, all error variances are free, all error covariances are fixed to 0, all factor means are fixed to 0, and all measurement intercepts are free. The modelType argument is specified as "CFA" for confirmatory factor analysis. The indLab argument is used to select a subset of variables for data analysis. The paste("x", 1:9, sep="")) is used to get a vector of x1, x2, , x9, which represents the target variables in the target data set. The order of the names must be the same as the order of indicators in the analysis model. The data is analyzed:

out <- analyze(cfamodel, HolzingerSwineford1939)

out is a lavaan object. We need to translate the lavaan object to model template in this package. The model.lavaan is used as a translator:

datamodel.nomis <- model.lavaan(out, std=TRUE)

The first argument is the lavaan object. The std argument is set as TRUE to use standardized coefficients from the output as the data generation model. The standardized coefficients are used here for convenience in specifying the trivial misspecification (as shown later) because the cross loadings will be in standardized scale (note that some models are not standardized, e.g., growth curve model). A Monte Carlo simulation can be run by the sim function:

output.nomis <- sim(1000, n=nrow(HolzingerSwineford1939), datamodel.nomis)
summary(output.nomis)

Note that this result object does not contain any trivial misspecification. The fit index cutoff using the alpha level of .05 can be visualized by the plotCutoff function:

plotCutoff(output.nomis, 0.05)

The figure below shows the graph provided by the plotCutoff function without trivial misspecification:

Example 7 SSD

Because we have the analysis output from the real data, we can compare the fit indices obtained by the real data analysis and the fit indices from the simulation study by the pValue function:

pValue(out, output.nomis)

The figure below shows the screen provided by the pValue function without trivial misspecification:

Example 7 pValue

The first argument of the pValue function is the lavaan output from the real data. The second argument is the result object from the simulation study. The usedFit argument can be used to select desired fit indices. If p < .05, the hypothesized model is rejected because the fit indices of the real data does not in the range of fit indices when the population model is true. The result provides two extra values: andRule and orRule. The andRule is based on the principle that the model is retained only when all fit indices provide good fit. The proportion is calculated from the number of replications that have all fit indices indicating a better model than the observed data. The proportion from the andRule is the most stringent rule in retaining a hypothesized model. The orRule is based on the principle that the model is retained only when at least one fit index provides good fit. The proportion is calculated from the number of replications that have at least one fit index indicating a better model than the observed data. The proportion from the orRule is the most lenient rule in retaining a hypothesized model. Both rules are based on only selected fit indices. The default uses seven fit indices: Chi-square, AIC, BIC, RMSEA, CFI, TLI, and SRMR.

The previous code does not account for trivial model misspecification yet. The simulated data are generated from the observed parameter estimates only. Trivial model misspecification can be added by adding trivial misspecification in the model.lavaan function:

loading.mis <- matrix("runif(1, -0.2, 0.2)", 9, 3)
loading.mis[is.na(loading)] <- 0
datamodel.mis <- model.lavaan(out, std=TRUE, LY=loading.mis)

As shown above, the trivial misspecification in cross loadings are added into the model. A Monte Carlo simulation can be run. The fit indices can be plotted and the p value can be investigated:

output.mis <- sim(1000, n=nrow(HolzingerSwineford1939), datamodel.mis)
plotCutoff(output.mis, 0.05)
pValue(out, output.mis)

The figure below shows the graph provided by the plotCutoff function with trivial misspecification:

Example 7 SSD 2

The figure below shows the screen provided by the pValue function with trivial misspecification:

Example 7 pValue 2

From here, the pValue from the model with trivial misspecification will be larger than the model without trivial misspecification. Therefore, the chance to retain the hypothesized model is higher when accounting for trivial model misspecification.

Here is the summary of the whole script in this example.

Remarks

Explicitly Specifying Parameter Object

All parameters can be explicitly specified in the simParamCFA function by changing Line 7 as:

facCov <- matrix(NA, 3, 3)
diag(facCov) <- 1
errorCov <- diag(NA, 9)
intercept <- rep(NA, 9)
facMean <- rep(0, 3)
cfamodel <- estmodel(LY=loading, PS=facCov, TE=errorCov, TY=intercept, AL=facMean, modelType="CFA")

Use Unstandardized Parameters

The simulated data can be generated by unstandardized parameters. The std argument in the model.lavaan function can be specified as FALSE to use unstandardized parameters for data generation. The Line 11 can be changed to:

datamodel.nomis <- model.lavaan(out, std=FALSE)

Line 18 can be changed similarly.

Select Desired Fit Indices

The usedFit argument in the pValue function can be specified so the function will find and calculate all p values based on only interested fit indices. For example, if we are interested in only RMSEA and CFI, Line 14 can be changed:

pValue(out, output.nomis, usedFit=c("RMSEA", "CFI"))

Line 22 can be changed similarly.

Function Review

  • estmodel Create a model template for data analysis only
  • estmodel.cfa The shortcut for the estmodel function where CFA is the type of model.
  • estmodel.path The shortcut for the estmodel function where Path is the type of model.
  • estmodel.sem The shortcut for the estmodel function where SEM is the type of model.
  • model.lavaan Create a model template from the lavaan output
  • pValue Find the proportion of the fit indices values across replications that indicates equal or worse fit than the real-data fit indices.