# Introduction to Heywood Cases and the Consequences of Constraining
## Overview
This page discusses Heywood cases and the concept of constraining in CFA. The purpose is to build a better understanding of how model fit is influenced by Heywood cases, by constraining, and by constraining in response to Heywood cases. The goal is to use this knowledge to aid scale development studies. To keep this GitHub wiki at a reasonable length, most of the code is omitted; only the output needed to make each point is shown.
> **Note (after speaking with Dr. Francis):** a Heywood case is when the estimated variance of a factor is negative; it is not the same as a factor loading being greater than 1. He also does not agree with "fixing" Heywood cases by constraining the offending variance to 0; instead, call it a bad model and move on.
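Since the page defines a Heywood case as a negative variance estimate, here is a minimal sketch of how one might check a fitted lavaan model for one. The object name `fit` and the check itself are illustrative assumptions, not code from the original analysis.

```r
# Sketch (not from the original page): a Heywood case shows up as a
# negative variance estimate in the fitted parameter matrices.
library(lavaan)
est <- lavInspect(fit, "est")    # list of estimated parameter matrices
any(diag(est$psi) < 0)           # negative factor variance?
any(diag(est$theta) < 0)         # negative residual variance?
```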
## Testing the Effect of a 0 Constraint on a One-Factor Model
A dataset with 500 observations was created, containing 5 variables with a 5-point Likert response each. First, a vector of latent scores was drawn from a normal distribution (mean = 0, sd = 1). Each of the 5 variables was then generated by multiplying those scores by the same factor loading (.8) and adding some random error. This produced 5 continuous variables that correlate strongly with the latent factor and moderately with one another. These numeric values were then converted into integers (1, 2, 3, 4, 5) that can be treated as categorical and correlated using polychoric correlations.
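A minimal sketch of this data-generating process is below; the seed and the error standard deviation are assumptions made for illustration, not values taken from the original simulation.

```r
# Sketch of the data generation described above.
# The seed and the error SD are assumptions for illustration.
set.seed(1)
n <- 500
latent <- rnorm(n, mean = 0, sd = 1)   # latent factor scores

# Each item = .8 * latent + random error -> 5 correlated continuous variables
items <- sapply(1:5, function(i) 0.8 * latent + rnorm(n, mean = 0, sd = 0.6))
colnames(items) <- paste0("item", 1:5)

# Discretize into 5 ordered categories so the items read as 1-5 Likert responses
dat <- as.data.frame(apply(items, 2, function(x) as.integer(cut(x, breaks = 5))))
```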
*(Figures shown here: correlations between the latent factor and the variables; the data viewed as categorical; polychoric correlations between the items.)*
Seven models were specified below to investigate the consequences of:
- a) constraining an item's factor loading to 0, and
- b) not specifying an item in the model.
```r
mod0 <- '
# Latent factor: all five items load freely (first loading fixed to 1)
F1 =~ 1*item1 + item2 + item3 + item4 + item5
'

mod1 <- '
# Latent factor: item5 kept in the model, loading constrained to 0
F1 =~ 1*item1 + item2 + item3 + item4 + 0*item5
'

mod1.1 <- '
# Latent factor: item5 omitted from the model entirely
F1 =~ 1*item1 + item2 + item3 + item4
'

mod2 <- '
# Latent factor: item4 and item5 constrained to 0
F1 =~ 1*item1 + item2 + item3 + 0*item4 + 0*item5
'

mod2.1 <- '
# Latent factor: item4 and item5 omitted
F1 =~ 1*item1 + item2 + item3
'

mod3 <- '
# Latent factor: item3, item4, and item5 constrained to 0
F1 =~ 1*item1 + item2 + 0*item3 + 0*item4 + 0*item5
'

mod3.1 <- '
# Latent factor: item3, item4, and item5 omitted
F1 =~ 1*item1 + item2
'
```
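The fitting code is omitted from this page, but a sketch of how each model might be estimated is shown below; the object names (`dat`, `fit0`, `fit1`, ...) are assumptions for illustration. Declaring the items as ordered makes lavaan estimate the models from polychoric correlations.

```r
# Sketch of how the models might be fit (object names are assumptions).
# ordered = TRUE tells lavaan to treat the items as ordinal, so the
# models are estimated from polychoric correlations.
library(lavaan)
fit0   <- cfa(mod0,   data = dat, ordered = TRUE)
fit1   <- cfa(mod1,   data = dat, ordered = TRUE)
fit1.1 <- cfa(mod1.1, data = dat, ordered = TRUE)
summary(fit1, fit.measures = TRUE, standardized = TRUE)
```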
There are some interesting results below to learn from. The first is about model specification: there is a huge difference between constraining an item's loading to 0 and not including the item in the model. In the former, the item stays in the model and its factor loading is forced to be 0. In the latter, the item is not given a factor loading at all, so it is as if the dataset never contained the item. To confirm this, mod1.1 was also fit to a subset dataset from which 'item5' was removed, and it produced exactly the same factor loadings and model fit indices (fit1.1 vs. fit1.1.1) as mod1.1 fit to the full 5-item dataset.
The conclusion is that CFA models being compared must include the same items. Simply leaving an item out of the model specification is, in some cases, equivalent to dropping that variable from the dataset, which makes the comparison inappropriate. The better approach is to keep the item in the model and constrain its factor loading, so that model comparison remains valid.
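As a sketch of what a valid comparison might look like (the fit object names are assumptions): mod0 and mod1 contain the same items, and mod1 is mod0 with item5's loading fixed to 0, so the models are nested and can be compared directly.

```r
# Sketch: comparing models that include the same items (names assumed).
fitMeasures(fit0, c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))
fitMeasures(fit1, c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))

# mod1 is mod0 with item5's loading fixed to 0, so the models are nested
# and a chi-square difference test is meaningful.
anova(fit0, fit1)
```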
The other takeaway is the effect of constraining loadings to 0: this produces drastically worse model fit indices. So what is going on here? The model is trying to reproduce, as closely as possible, the observed sample variance-covariance matrix. The four models fit to the same dataset share an identical observed matrix, but their fitted (model-implied) variance-covariance matrices differ across models, which makes sense, and this is where it becomes obvious why constraining factor loadings to 0 produces awful fit. Under a one-factor model, the implied covariance between two items is the product of their loadings and the factor variance, so forcing an item's loading to 0 forces every implied covariance involving that item to be 0. The fitted matrix therefore departs greatly from the observed one, in which those items do correlate. This can be seen even more clearly in the residual covariance matrix.
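A sketch of how these three matrices might be pulled from a lavaan fit (the object `fit1` is an assumed name):

```r
# Sketch: observed, model-implied, and residual matrices (fit1 assumed).
lavInspect(fit1, "sampstat")$cov   # observed sample variance-covariance matrix
fitted(fit1)$cov                   # model-implied variance-covariance matrix
resid(fit1)$cov                    # residual matrix: observed minus implied
# Rows/columns involving item5 show large residuals, since its implied
# covariances were forced to 0 by the 0 loading.
```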
*(Figures shown here: model fit comparison across the models; factor loadings of mod1; factor loadings of mod1.1; factor loadings of mod1.1 in the subset data.)*
*(Figures shown here: the observed sample variance-covariance matrix; the model-implied variance-covariance matrix; the residual variance-covariance matrix.)*