Example 26: Nested Model Comparison with Varying Sample Size and Percent Missing - simsem/simsem GitHub Wiki

Model Description

This example will show how to find the fit indices cutoff for nested model comparison and find the statistical power for nested model comparison when we vary sample size (50 to 500) and percent missing (0 to 0.3) in simulation. The model in this example is the simplex model that the same variable is measured five times and the relationship among variables is only the first-order direct effect between the variables in adjacent time. In the parent model, the four direct effects are drawn from the uniform distribution ranged from 0.3 to 0.7. The residual variances have the values that the total variances of each variable are 1. In the nested model, the four direct effects are constrained to be equal across time. The residual variances also have the values that the total variances of each variable are 1. The trivial model misspecification is normal distribution with the mean of 0 and standard deviation of 0.05 added on top of the direct effect so the effects are approximately (but not exactly) equal across time. We will find the cutoffs, as well as statistical power, to decide whether to select the nested (equal direct effects) or parent (unequal direct effects) models when sample size and percent missing in a simulation study are varied.

Example 26 Model

Syntax

The nested model with the equality constraint can be specified:

pathNested <- matrix(0, 5, 5)
pathNested[2, 1] <- "con"
pathNested[3, 2] <- "con"
pathNested[4, 3] <- "con"
pathNested[5, 4] <- "con"
pathMis <- matrix(0, 5, 5)
pathMis[2:5, 1] <- "rnorm(1, 0, 0.05)"
pathMis[3:5, 2] <- "rnorm(1, 0, 0.05)"
pathMis[4:5, 3] <- "rnorm(1, 0, 0.05)"
pathMis[5, 4] <- "rnorm(1, 0, 0.05)"
BEnested <- bind(pathNested, "runif(1, 0.3, 0.7)", misspec = pathMis)
    
residual <- diag(5)
RPS <- binds(residual)
    
modelNested <- model(RPS = RPS, BE = BEnested, modelType="Path")

The parent model without the equality constraint can be specified:

pathParent <- matrix(0, 5, 5)
pathParent[2, 1] <- NA
pathParent[3, 2] <- NA
pathParent[4, 3] <- NA
pathParent[5, 4] <- NA
pathMis <- matrix(0, 5, 5)
pathMis[2:5, 1] <- "rnorm(1, 0, 0.05)"
pathMis[3:5, 2] <- "rnorm(1, 0, 0.05)"
pathMis[4:5, 3] <- "rnorm(1, 0, 0.05)"
pathMis[5, 4] <- "rnorm(1, 0, 0.05)"
BEparent <- bind(pathParent, "runif(1, 0.3, 0.7)", misspec = pathMis)
    
modelParent <- model(RPS = RPS, BE = BEparent, modelType="Path")

The result objects fitting the nested and parent models onto the datasets created from the nested model. Note that if the seed number is the same (the seed number is fixed by default), the same data are generated by both models. The sample size, n, increases from 50 to 500 and the percent missing completely at random, pmMCAR, increases from 0 to 0.3 in both result objects.

outDatNestedModNested <- sim(NULL, n=50:500, modelNested, generate = modelNested, pmMCAR=seq(0, 0.3, 0.1))
outDatNestedModParent <- sim(NULL, n=50:500, modelParent, generate = modelNested, pmMCAR=seq(0, 0.3, 0.1))

We can find the average difference in fit indices, as well as the proportion of replications that provides significant chi-square difference test (power), by the anova function:

anova(outDatNestedModNested, outDatNestedModParent)

Notice that the results contain the information of the power of using chi-square difference test in multiple combinations of sample size and percent missing. Because both result objects are based on the datasets from the nested model, the power should be low.

The figure below shows the graph provided by the anova function when datasets are from the nested model (Note that the first 25 rows in the varyParam elements are shown):

Example 26 anova null

To find the cutoff of the difference in fit indices using the Monte Carlo approach given a combination of sample size, nVal, and percent missing completely at random, pmMCARval, the getCutoffNested function can be used:

cutoff <- getCutoffNested(outDatNestedModNested, outDatNestedModParent, nVal=250, pmMCARval=0.2)

The figure below shows the graph provided by the getCutoffNested function:

Example 26 get cutoff

The sampling distribution of the difference between fit indices from the nested and parent models given each value of sample size can be plotted by the plotCutoffNested function:

plotCutoffNested(outDatNestedModNested, outDatNestedModParent, alpha=0.05)

The figure below shows the graph provided by the plotCutoffNested function:

Example 26 fit

We can find the sampling distribution of the difference in fit indices if the nested model is false (the parent model is true) by specifying the generate argument as the model template from the parent model:

outDatParentModNested <- sim(NULL, n=50:500, modelNested, generate = modelParent, pmMCAR=seq(0, 0.3, 0.1))
outDatParentModParent <- sim(NULL, n=50:500, modelParent, generate = modelParent, pmMCAR=seq(0, 0.3, 0.1))

We can also use the anova function to compare two simulation results of fitting nested and parent models on the datasets from the parent model:

anova(outDatParentModNested, outDatParentModParent)

The output will provide the power of using chi-square difference test in multiple combinations of sample size and percent missing. Because both result objects are based on the datasets from the parent model, the power should be high.

The figure below shows the graph provided by the anova function when datasets are from the parent model:

Example 26 anova alt

We can also find the statistical power of the Monte Carlo approach for the nested model comparison given a combination of sample size and percent missing by the getPowerFitNested function:

getPowerFitNested(outDatParentModNested, outDatParentModParent, nullNested=outDatNestedModNested, 
    nullParent=outDatNestedModParent, nVal=250, pmMCARval=0.2)

The figure below shows the graph provided by the getPowerFitNested function when datasets from the nested model are specified:

Example 26 get power 1

We can also find the statistical power of the Monte Carlo approach for the nested model comparison given a priori cutoffs in the getPowerFitNested function:

getPowerFitNested(outDatParentModNested, outDatParentModParent, cutoff=cutoff, nVal=250, pmMCARval=0.2)

The figure below shows the graph provided by the getPowerFitNested function when the derived cutoffs are specified:

Example 26 get power 2

We can plot the statistical power given different values of sample size and percent missing by logistic curves in contour plot using the plotPowerFitNested function:

plotPowerFitNested(outDatParentModNested, outDatParentModParent, nullNested=outDatNestedModNested, 
    nullParent=outDatNestedModParent)

The figure below shows the graph provided by the plotPowerFitNested function when the simulation results of the datasets from both parent and nested models are specified:

Example 26 plot power 1

We can select only the logistic plot of the difference in RMSEA in the plotPowerFitNested function:

plotPowerFitNested(outDatParentModNested, outDatParentModParent, nullNested=outDatNestedModNested, 
    nullParent=outDatNestedModParent, usedFit="RMSEA")

The figure below shows the graph provided by the plotPowerFitNested function when the simulation results of the datasets from both parent and nested models are specified and the only RMSEA plot is selected:

Example 26 plot power 2

We can make a perspective plot to visualize the logistic curve by specifying the useContour argument as FALSE in the plotPowerFitNested function:

plotPowerFitNested(outDatParentModNested, outDatParentModParent, nullNested=outDatNestedModNested, 
    nullParent=outDatNestedModParent, useContour=FALSE)

The figure below shows the perspective plots provided by the plotPowerFitNested function when the simulation results of the datasets from both parent and nested models are specified:

Example 26 plot power 3

We can also use our theoretical cutoffs as the cutoff to retain or reject the nested model. For example, when the chi-square value is greater than 3.84 (chi-square cutoff when df = 1 and alpha = .05) or the difference in CFI is greater than 0.1, the nested model will be rejected. This cutoff can be used to find the power in a specific combination of sample size and percent missing by the getPowerFitNested function:

cutoff2 <- c(Chi=3.84, CFI=-0.01)
getPowerFitNested(outDatParentModNested, outDatParentModParent, cutoff=cutoff2, nVal=250, pmMCARval=0.2, 
    condCutoff=FALSE)

The condCutoff is specified as FALSE because the input cutoffs are applicable on any values of sample size and percent missing.

The figure below shows the graph provided by the getPowerFitNested function when the a priori cutoffs are specified:

Example 26 get power 3

The cutoff can be used to find the statistical power given each combination of sample size and percent missing and plot a logistic curve by the plotPowerFitNested function:

plotPowerFitNested(outDatParentModNested, outDatParentModParent, cutoff=cutoff2)

The figure below shows the logistic contour plots provided by the plotPowerFitNested function when the a priori cutoff is specified:

Example 26 plot power 4

We can use the perspective plot to visualize the logistic curve:

plotPowerFitNested(outDatParentModNested, outDatParentModParent, cutoff=cutoff2, useContour=FALSE)

The figure below shows the perspective plots provided by the plotPowerFitNested function when the a priori cutoff is specified:

Example 26 plot power 5

Here is the summary of the whole script in this example.