Higher order factor model HolzingerSwineford1939

Overview

Here we will be exploring the higher order factor model using the sample data from HolzingerSwineford1939.

Generate the data


corr_matrix <- matrix(c(
  1.000, 0.297, 0.441, 0.373, 0.293, 0.357, 0.067, 0.224, 0.390,
  0.297, 1.000, 0.340, 0.153, 0.139, 0.193, -0.076, 0.092, 0.206,
  0.441, 0.340, 1.000, 0.159, 0.077, 0.198, 0.072, 0.186, 0.329,
  0.373, 0.153, 0.159, 1.000, 0.733, 0.704, 0.174, 0.107, 0.208,
  0.293, 0.139, 0.077, 0.733, 1.000, 0.720, 0.102, 0.139, 0.227,
  0.357, 0.193, 0.198, 0.704, 0.720, 1.000, 0.121, 0.150, 0.214,
  0.067, -0.076, 0.072, 0.174, 0.102, 0.121, 1.000, 0.487, 0.341,
  0.224, 0.092, 0.186, 0.107, 0.139, 0.150, 0.487, 1.000, 0.449,
  0.390, 0.206, 0.329, 0.208, 0.227, 0.214, 0.341, 0.449, 1.000
), nrow=9, ncol=9, byrow=TRUE)

# Set row and column names
rownames(corr_matrix) <- c("y1", "y2", "y3", "y4", "y5", "y6", "y7", "y8", "y9")
colnames(corr_matrix) <- c("y1", "y2", "y3", "y4", "y5", "y6", "y7", "y8", "y9")
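
Because the matrix is typed in by hand, a quick sanity check before fitting can catch transcription errors. The sketch below uses only base R; `is_valid_corr` is a hypothetical helper name (not part of the original page or of lavaan), and the small example matrix is just the y1 to y3 block of the matrix above.

```r
# Hypothetical helper: TRUE if m looks like a valid correlation matrix
is_valid_corr <- function(m, tol = 1e-8) {
  is.matrix(m) &&
    nrow(m) == ncol(m) &&
    isSymmetric(unname(m), tol = tol) &&          # symmetric
    all(abs(diag(m) - 1) < tol) &&                # unit diagonal
    all(eigen(m, symmetric = TRUE,
              only.values = TRUE)$values > 0)     # positive definite
}

# y1-y3 block of the correlation matrix above
sub_matrix <- matrix(c(
  1.000, 0.297, 0.441,
  0.297, 1.000, 0.340,
  0.441, 0.340, 1.000
), nrow = 3, byrow = TRUE)

is_valid_corr(sub_matrix)

# A deliberately broken copy (asymmetric entry) should fail the check
broken <- sub_matrix
broken[1, 2] <- 0.9
is_valid_corr(broken)
```

The same call, `is_valid_corr(corr_matrix)`, can be applied to the full 9 x 9 matrix.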

Specify the model

without labels

# Specify the model
mod <- '
  # measurement model (fixed = 3; free = 6)
  F1 =~ 1*y1 + y2 + y3
  F2 =~ 1*y4 + y5 + y6
  F3 =~ 1*y7 + y8 + y9
  
  # structural model (fixed = 0; free = 3)
  H =~ NA*F1 + F2 + F3
  
  # first order factor residual variances (fixed = 0; free = 3)
  F1 ~~ F1
  F2 ~~ F2
  F3 ~~ F3
  
  # second order factor variance (fixed=1; free=0)
  H ~~ 1*H
  
  # residual variance of indicators (fixed = 0; free = 9)
  y1 ~~ y1
  y2 ~~ y2
  y3 ~~ y3
  y4 ~~ y4
  y5 ~~ y5
  y6 ~~ y6
  y7 ~~ y7
  y8 ~~ y8
  y9 ~~ y9
  
  # Unique variances and covariances among the 9 indicators = 9*10/2 = 45
  # Fixed parameters = 3 + 1 = 4
  # Freely estimated parameters = 6 + 3 + 3 + 9  = 21
  # df = 45 - 21 = 24
'
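
The parameter counting in the comments above can be reproduced with a few lines of base R. This is only a bookkeeping sketch; the variable names (`n_moments`, `free`, and so on) are illustrative and not part of lavaan.

```r
# Number of observed indicators
p <- 9

# Unique elements of the 9 x 9 covariance matrix (variances + covariances)
n_moments <- p * (p + 1) / 2

# Freely estimated parameters, grouped as in the model comments
free <- c(
  first_order_loadings         = 6,  # two free loadings per factor
  second_order_loadings        = 3,  # H =~ F1 + F2 + F3
  factor_residual_variances    = 3,  # F1, F2, F3
  indicator_residual_variances = 9   # y1 ... y9
)
n_free <- sum(free)

# Degrees of freedom = knowns - freely estimated parameters
df <- n_moments - n_free

c(moments = n_moments, free = n_free, df = df)
# moments = 45, free = 21, df = 24
```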

with labels

mod <- '
  # measurement model (fixed = 3; free = 6)
  F1 =~ paste(1)*1*y1 + a*y2 + b*y3
  F2 =~ paste(1)*1*y4 + c*y5 + d*y6
  F3 =~ paste(1)*1*y7 + e*y8 + f*y9
  
  # structural model (fixed = 0; free = 3)
  H =~ j*NA*F1 + k*F2 + l*F3
  
  # first order factor residual variances (fixed = 0; free = 3)
  F1 ~~ g*F1
  F2 ~~ h*F2
  F3 ~~ i*F3
  
  # second order factor variance (fixed = 1; free = 0)
  H ~~ paste(1)*1*H
  
  # residual variance of indicators (fixed = 0; free = 9)
  y1 ~~ m*y1
  y2 ~~ n*y2
  y3 ~~ o*y3
  y4 ~~ p*y4
  y5 ~~ q*y5
  y6 ~~ r*y6
  y7 ~~ s*y7
  y8 ~~ t*y8
  y9 ~~ u*y9
'

Run the CFA

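# Load lavaan, which provides cfa() and the output helpers used below
library(lavaan)

# Fit the CFA model to the correlation matrix (n = 301)
fit <- cfa(mod, sample.cov = corr_matrix, std.lv = FALSE, sample.nobs = 301)

# Obtain model output
summary(fit, fit.measures = TRUE, rsquare = TRUE)
parameterEstimates(fit)
standardizedSolution(fit)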

Model Specification

1. Show the path diagram

without labels

# Path diagram
library(semPlot)
p <- semPaths(fit, whatLabels = "std",
              sizeMan = 5,
              nCharNodes = 0, nCharEdges = 0,
              edge.width = 1.5, node.width = 1.4,
              rotation = 4,
              edge.color = "black",
              edge.label.cex = 1,
              style = "ram",
              mar = c(2, 4, 2, 4)) # D L U R

with labels

# Path diagram
library(semPlot)
p <- semPaths(fit, whatLabels = "name",
              sizeMan = 5,
              nCharNodes = 0, nCharEdges = 0,
              edge.width = 1.5, node.width = 1.4,
              rotation = 4,
              edge.color = "black",
              edge.label.cex = 1,
              style = "ram",
              mar = c(2, 4, 2, 4)) # D L U R

Figures: Path Diagram of the Higher Order Model (without labels) and Path Diagram of the Higher Order Model (with labels).

What this model is telling us

Here we have a fully standardized hierarchical model. It was created from 9 observed (manifest) variables and has the following parts.

Measurement part: We start with the first order factors (F1, F2, F3), which are standardized to have a total variance of 1. The factor loadings determine how much of each indicator's variance the model explains. To get the explained variance of an indicator, square its factor loading and multiply by the variance of its first order factor (1), ex: .77^2 * 1 = .59. Subtracting this from 1 gives the residual (unexplained) variance of the indicator (ex: 1 - .77^2 = .41 with the rounded loading).
Structural part: Next comes the structural part of the model. Here we have a second order factor (H), and its beta coefficients (the second order loadings) determine how much variance of each first order factor it explains. As mentioned, the total variance of each first order factor is 1; the value attached to F1, F2, and F3 in the path diagram is the residual variance, that is, the proportion of variance not explained by the higher order factor. Using the same math as in the measurement part, the variance of a first order factor explained by H is the squared beta coefficient (ex: .87^2 = .76), and the residual variance is the remainder (ex: 1 - .87^2 = .24).
Specific vs General effects: At first glance the higher order factor seems to have no effect on the indicators, but this is not the case. The explained variance of an indicator computed in the measurement part can be split into a part due to the general factor (H) and a part due to the residual of the first order factor (the specific factor). For the general part, multiply the squared factor loading by the squared beta coefficient and by the variance of the higher order factor, which is 1 (ex: .77^2 * .87^2 = 0.45). For the specific part, use the same math as in the measurement part but with the residual variance of the first order factor (ex: .77^2 * .24 = 0.14). Adding the two gives the total explained variance (ex: .45 + .14 = 0.59), and subtracting that from 1 gives the residual variance of the indicator (ex: 1 - (.45 + .14) = 0.41).
Residual variance: As touched on in the previous three parts, whatever the model does not explain is residual variance. For the indicators, the values to their left in the path diagram are the proportions of variance not explained by the model. We can use them to judge how well the model accounts for each indicator. For example, 82% of the variance of y2 is not explained, so the model accounts poorly for this observed variable.
Labels: When looking at the labels version of the path diagram, keep in mind that it refers to the unstandardized model. The math still works out the same way. The factor loadings are a,b,c,d,e,f, the residual variances of the first order factors are g,h,i, and the beta coefficients are j,k,l. Because var(H) is fixed to 1, the total variances of the first order factors can be written in terms of these parameters: var(F1) = $j^2 + g$, var(F2) = $k^2 + h$, var(F3) = $l^2 + i$. Each residual variance is a single number. This is different from the error terms themselves, such as $e_1$ or $\xi_1$, which are random variables with one value per observation and which we use when writing the model equations.
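
The general versus specific decomposition described above can be reproduced numerically. The sketch below plugs in the rounded standardized values quoted in the text (.77 for the factor loading, .87 for the beta coefficient); the variable names are illustrative only.

```r
lambda <- 0.77  # standardized loading of the indicator on its first order factor
beta   <- 0.87  # standardized loading of the first order factor on H
var_H  <- 1     # variance of the higher order factor (fixed)

# Residual variance of the first order factor (its total variance is 1)
resid_F <- 1 - beta^2 * var_H

# Indicator variance explained by the general factor H
general <- lambda^2 * beta^2 * var_H

# Indicator variance explained by the specific part of the first order factor
specific <- lambda^2 * resid_F

# Total explained variance and what is left over
explained <- general + specific   # equals lambda^2
residual  <- 1 - explained

round(c(resid_F = resid_F, general = general, specific = specific,
        explained = explained, residual = residual), 2)
# resid_F 0.24, general 0.45, specific 0.14, explained 0.59, residual 0.41
```

Note that `general + specific` always equals `lambda^2` here, which is why the higher order model explains exactly as much indicator variance as the first order factor does on its own.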

2. Measurement and Structural Model Equations

Measurement Model

$y_1 = 1 \times F1 + e_1$; $var(e_1) = m$
$y_2 = a \times F1 + e_2$; $var(e_2) = n$
$y_3 = b \times F1 + e_3$; $var(e_3) = o$
$y_4 = 1 \times F2 + e_4$; $var(e_4) = p$
$y_5 = c \times F2 + e_5$; $var(e_5) = q$
$y_6 = d \times F2 + e_6$; $var(e_6) = r$
$y_7 = 1 \times F3 + e_7$; $var(e_7) = s$
$y_8 = e \times F3 + e_8$; $var(e_8) = t$
$y_9 = f \times F3 + e_9$; $var(e_9) = u$

Structural Model

$F1 = j \times H + \xi_1$; $var(\xi_1) = g$
$F2 = k \times H + \xi_2$; $var(\xi_2) = h$
$F3 = l \times H + \xi_3$; $var(\xi_3) = i$
$var(H) = 1$
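
One way to build intuition for these equations is to simulate from them and compare sample variances with the algebraic expressions. The sketch below does this for F1 and y2 in base R. The parameter values in `vals` are hypothetical, chosen only for illustration; they are not estimates from the fitted model.

```r
set.seed(1)
n_obs <- 1e5  # simulated sample size

# Hypothetical parameter values (illustration only, not model estimates)
vals <- list(j = 0.7, g = 0.3, a = 0.6, n = 0.8)

# Structural model: F1 = j * H + xi_1, with var(H) = 1 and var(xi_1) = g
H   <- rnorm(n_obs, mean = 0, sd = 1)
xi1 <- rnorm(n_obs, mean = 0, sd = sqrt(vals$g))
F1  <- vals$j * H + xi1

# Measurement model: y2 = a * F1 + e_2, with var(e_2) = n
e2 <- rnorm(n_obs, mean = 0, sd = sqrt(vals$n))
y2 <- vals$a * F1 + e2

# Compare simulated variances with the expressions implied by the equations
implied_F1 <- vals$j^2 + vals$g
implied_y2 <- vals$a^2 * (vals$j^2 + vals$g) + vals$n

rbind(
  F1 = c(simulated = var(F1), implied = implied_F1),
  y2 = c(simulated = var(y2), implied = implied_y2)
)
```

With a large simulated sample the two columns should agree closely, differing only by sampling noise.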

3. Calculate the Variances and Covariances Using the Estimated Parameters

Calculating Variances (Indicators)

var($y_1$) = cov($1 \times F1 + e_1, 1 \times F1 + e_1$) =
- $var(F1) + var(e_1)$ =
- $(j^2 + g) + m$
var($y_2$) = cov($a \times F1 + e_2, a \times F1 + e_2$) =
- $a^2var(F1) + var(e_2)$ =
- $a^2 \times (j^2 + g) + n$
var($y_3$) = cov($b \times F1 + e_3, b \times F1 + e_3$) =
- $b^2var(F1) + var(e_3)$ =
- $b^2 \times (j^2 + g) + o$
var($y_4$) = cov($1 \times F2 + e_4, 1 \times F2 + e_4$) =
- $var(F2) + var(e_4)$ =
- $(k^2 + h) + p$
var($y_5$) = cov($c \times F2 + e_5, c \times F2 + e_5$) =
- $c^2var(F2) + var(e_5)$ =
- $c^2 \times (k^2 + h) + q$
var($y_6$) = cov($d \times F2 + e_6, d \times F2 + e_6$) =
- $d^2var(F2) + var(e_6)$ =
- $d^2 \times (k^2 + h) + r$
var($y_7$) = cov($1 \times F3 + e_7, 1 \times F3 + e_7$) =
- $var(F3) + var(e_7)$ =
- $(l^2 + i) + s$
var($y_8$) = cov($e \times F3 + e_8, e \times F3 + e_8$) =
- $e^2var(F3) + var(e_8)$ =
- $e^2 \times (l^2 + i) + t$
var($y_9$) = cov($f \times F3 + e_9, f \times F3 + e_9$) =
- $f^2var(F3) + var(e_9)$ =
- $f^2 \times (l^2 + i) + u$

Calculating Variances (First Order Factors)

var($F1$) = cov($j \times H + \xi_1, j \times H + \xi_1$)
- $j^2 \times var(H) + var(\xi_1)$
- $j^2 \times 1 + g$
var($F2$) = cov($k \times H + \xi_2, k \times H + \xi_2$)
- $k^2 \times var(H) + var(\xi_2)$
- $k^2 \times 1 + h$
var($F3$) = cov($l \times H + \xi_3, l \times H + \xi_3$)
- $l^2 \times var(H) + var(\xi_3)$
- $l^2 \times 1 + i$

Calculating Variances (Higher Order Factor)

var($H$) = 1
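
The variance expressions above can also be checked in matrix form. Writing the loadings as a matrix $\Lambda$, the second order loadings as $\Gamma$, and the residual variances as diagonal matrices $\Psi$ and $\Theta$ (this notation is an assumption introduced here, not taken from the page), the model-implied covariance matrix is $\Lambda(\Gamma\Gamma^\top + \Psi)\Lambda^\top + \Theta$ when var(H) = 1. The sketch below builds it in base R with hypothetical parameter values and confirms that its diagonal matches the hand-derived formulas.

```r
# Hypothetical values for the labelled parameters a ... u (illustration only)
vals <- c(a = 0.6, b = 0.8, c = 1.1, d = 0.9, e = 1.2, f = 1.0,
          g = 0.3, h = 0.5, i = 0.4,
          j = 0.7, k = 0.6, l = 0.5,
          m = 0.5, n = 0.8, o = 0.6, p = 0.4, q = 0.3,
          r = 0.35, s = 0.6, t = 0.5, u = 0.55)

# First order loadings (9 indicators x 3 factors); marker loadings fixed to 1
Lambda <- matrix(0, nrow = 9, ncol = 3)
Lambda[1:3, 1] <- c(1, vals[["a"]], vals[["b"]])
Lambda[4:6, 2] <- c(1, vals[["c"]], vals[["d"]])
Lambda[7:9, 3] <- c(1, vals[["e"]], vals[["f"]])

# Second order loadings, factor residual variances, indicator residual variances
Gamma <- matrix(vals[c("j", "k", "l")], ncol = 1)
Psi   <- diag(vals[c("g", "h", "i")])
Theta <- diag(vals[c("m", "n", "o", "p", "q", "r", "s", "t", "u")])

# Covariance matrix of the first order factors, with var(H) = 1
Phi <- tcrossprod(Gamma) + Psi

# Model-implied covariance matrix of the indicators
Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta

# Hand-derived expressions for var(F1) and var(y2) from the text
var_F1_by_hand <- vals[["j"]]^2 + vals[["g"]]
var_y2_by_hand <- vals[["a"]]^2 * (vals[["j"]]^2 + vals[["g"]]) + vals[["n"]]

all.equal(Phi[1, 1], var_F1_by_hand)
all.equal(Sigma[2, 2], var_y2_by_hand)
```

The off-diagonal elements of `Sigma` are the model-implied covariances between indicators.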

Calculating Covariances (Indicators)