Getting Started with Categorical Confirmatory Factor Analysis
Here we will simulate some dummy data, run a categorical confirmatory factor analysis, and then interpret the results.
```r
# Load packages
library(MASS)
library(lavaan)

# Set seed
set.seed(123)
N <- 500

# Simulate correlated latent traits
Sigma <- matrix(c(1, 0.5, 0.4,
                  0.5, 1, 0.3,
                  0.4, 0.3, 1), nrow = 3)
latent_factors <- mvrnorm(N, mu = c(0, 0, 0), Sigma = Sigma)
colnames(latent_factors) <- c("F1", "F2", "F3")

# Ordinal item generator: item_response = factor_loading * latent_factor + error,
# then cut into the categories 0/1/2 at the given thresholds
make_ordinal <- function(theta, loading = 1, thresholds = c(-0.5, 0.5)) {
  z <- loading * theta + rnorm(length(theta))
  cut(z, breaks = c(-Inf, thresholds, Inf), labels = 0:2)
}

# 5 items per factor
items <- data.frame(
  item1  = make_ordinal(latent_factors[, "F1"], 1),
  item2  = make_ordinal(latent_factors[, "F1"], 0.9),
  item3  = make_ordinal(latent_factors[, "F1"], 0.8),
  item4  = make_ordinal(latent_factors[, "F1"], 1.1),
  item5  = make_ordinal(latent_factors[, "F1"], 1),
  item6  = make_ordinal(latent_factors[, "F2"], 1),
  item7  = make_ordinal(latent_factors[, "F2"], 1.2),
  item8  = make_ordinal(latent_factors[, "F2"], 0.9),
  item9  = make_ordinal(latent_factors[, "F2"], 1),
  item10 = make_ordinal(latent_factors[, "F2"], 1.1),
  item11 = make_ordinal(latent_factors[, "F3"], 1),
  item12 = make_ordinal(latent_factors[, "F3"], 0.9),
  item13 = make_ordinal(latent_factors[, "F3"], 1.2),
  item14 = make_ordinal(latent_factors[, "F3"], 1),
  item15 = make_ordinal(latent_factors[, "F3"], 1)
)

# Convert to ordered factors
items[] <- lapply(items, ordered)
```
We will now walk through the code step by step to show why this simulation works. We start by creating a matrix (`Sigma`) that encodes 1) the number of factors in the data; 2) their variances; and 3) how strongly they correlate with each other. Here we created three factors, each with a variance of 1 and with moderate correlations to one another.
Then we use the `mvrnorm()` function from MASS to produce a dataset with 500 rows (observations) and three columns (one per factor). These values are the theta values, i.e., the latent trait scores. They were drawn to match the covariance structure we set with `Sigma`, and the mean of each factor was set to 0 via the `mu` argument.
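As a quick sanity check (not part of the original script), we can verify that the simulated theta values reproduce the correlation structure we specified:

```r
# The sample correlations should be close to the values in Sigma
round(cor(latent_factors), 2)
```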
Next we create a custom function named `make_ordinal()`, which takes a vector of theta values, multiplies it by a factor loading that we specify, and adds random noise. The resulting continuous values are then cut into the categories 0, 1, or 2 using the two thresholds. This is done five times per factor, simulating responses on 15 items, which will be the data we run the categorical CFA on.
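To see the thresholding in action (a small illustration, not part of the original script), here is the same `cut()` logic applied to a few fixed theta values with the noise term removed:

```r
# With loading = 1 and no noise: values below -0.5 become 0,
# values between -0.5 and 0.5 become 1, and values above 0.5 become 2
theta_demo <- c(-2, -0.4, 0, 0.6, 2)
cut(theta_demo, breaks = c(-Inf, -0.5, 0.5, Inf), labels = 0:2)
# [1] 0 1 1 2 2
# Levels: 0 1 2
```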
*(Screenshots: the Sigma correlation matrix; theta values for each factor; ordinal items generated from the theta values.)*
```r
# Visualizing the data (the tidyverse is needed for piping, reshaping, and plotting)
library(tidyverse)

# Count how often each response category was chosen for each item
items_summarised <- items %>%
  pivot_longer(item1:item15, names_to = "items", values_to = "response") %>%
  mutate(factors = case_when(
    items %in% paste0("item", 1:5)   ~ "F1",
    items %in% paste0("item", 6:10)  ~ "F2",
    items %in% paste0("item", 11:15) ~ "F3"
  )) %>%
  mutate(response = as.numeric(as.character(response))) %>%  # keep the 0/1/2 coding
  group_by(factors, items, response) %>%
  summarise(response_freq = n())

# Stacked bar chart of response frequencies per item
item_order <- paste0("item", 15:1)
items_summarised %>%
  mutate(response = factor(response),
         items = factor(items, levels = item_order)) %>%
  ggplot(aes(x = items, y = response_freq, fill = response)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  theme_classic()

# Raw response counts per item
sapply(items, table)
```
*(Screenshots: distribution of responses by item; averaged within-factor item correlations.)*
```r
# Measurement model: the first loading per factor is fixed to 1 to set the scale
model <- '
  F1 =~ 1*item1 + item2 + item3 + item4 + item5
  F2 =~ 1*item6 + item7 + item8 + item9 + item10
  F3 =~ 1*item11 + item12 + item13 + item14 + item15
'

# Fit with robust weighted least squares, declaring every item as ordered
fit <- cfa(model, data = items, estimator = "WLSMV", ordered = colnames(items))

summary(fit, standardized = TRUE, fit.measures = TRUE)  # full output with fit indices
parameterEstimates(fit)                                 # unstandardized estimates
standardizedSolution(fit)                               # standardized estimates
inspect(fit, "r2")                                      # R^2 per item
```
*(Screenshots: unstandardized parameter estimates, part 1 and part 2.)*
We start by looking at the unstandardized factor loadings of each item on its corresponding factor (`=~`). The first item of each factor was fixed to 1 to set the latent scale. Then we see something new: the `|` operator. This gives the threshold at which a response crosses into the next sequential category for that item. Since each item has three response categories, there are two thresholds per item. The next operator, `~~`, gives the residual variance of an item or the unstandardized variance of a factor; these rows also show the covariances between the factors. Then come the scaling factors (`~*~`); under lavaan's default delta parameterization for ordered indicators, these are fixed to 1 so that each item's underlying latent response variable has unit variance. Lastly, the `~1` operator represents the intercepts/means; notice they have been fixed to 0.
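If it helps to see which operators the fitted model actually contains, we can tabulate them (a small check, not in the original write-up):

```r
# Count each parameter type in the fitted model:
# =~ loadings, | thresholds, ~~ (co)variances, ~*~ scaling factors, ~1 intercepts
table(parameterEstimates(fit)$op)
```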
Even though the output is fairly extensive, we really only care about the same pieces we examined in a regular CFA: the factor loadings and the variances and covariances between the factors. The residual variances of the items matter as well, but they become more interpretable once the model is standardized.
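Since those are the rows of interest, one convenient shortcut (a small sketch, assuming dplyr is available from the earlier `library(tidyverse)` call) is to filter the parameter table down to just the loadings and (co)variances:

```r
# Keep only the loadings (=~) and the variances/covariances (~~)
parameterEstimates(fit) %>%
  filter(op %in% c("=~", "~~")) %>%
  select(lhs, op, rhs, est, se, pvalue)
```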
*(Screenshots: standardized factor loadings; standardized residual variances, factor variances, and factor correlations; item R².)*
This is the more meaningful part of the output; here the `|`, `~1`, and `~*~` rows are redundant. Looking at the standardized factor loadings, we see that each item has a strong relationship with its respective factor. The variance of each factor is now set to 1, which is what we want. The residual variances of the items give an even better sense of how well each item is explained by its factor; we can think of them as 1 - R². Lastly, the standardized covariances between the factors are their correlations. Notice how similar they are to what we programmed into `Sigma` back in step 1.
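We can check the 1 - R² reading directly (a quick verification, not part of the original script): each item's standardized residual variance should equal one minus its squared standardized loading, matching what `inspect(fit, "r2")` reports.

```r
std <- standardizedSolution(fit)
loadings  <- std[std$op == "=~", ]
residuals <- std[std$op == "~~" & std$lhs == std$rhs & std$lhs %in% colnames(items), ]

# For each item: residual variance = 1 - loading^2 (and R^2 = loading^2)
data.frame(item         = loadings$rhs,
           loading      = loadings$est.std,
           residual     = residuals$est.std,
           one_minus_r2 = 1 - loadings$est.std^2)
```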
```r
# Plot it
library(semPlot)
semPaths(fit,
         what = "std",          # Standardized estimates
         whatLabels = "std",    # Show estimated values
         layout = "tree",       # Hierarchical layout
         edge.label.cex = 1,    # Edge label size
         node.label.cex = 1,    # Node label size
         sizeMan = 7,           # Width of item rectangles
         sizeMan2 = 4,          # Height of item rectangles
         sizeLat = 10,          # Size of factor ovals
         edge.color = "black",  # Black edges for clarity
         nodeLabels = c("Item 1", "Item 2", "Item 3", "Item 4", "Item 5",
                        "Item 6", "Item 7", "Item 8", "Item 9", "Item 10",
                        "Item 11", "Item 12", "Item 13", "Item 14", "Item 15",
                        "Factor 1", "Factor 2", "Factor 3"),  # Custom labels
         residuals = TRUE,      # Show residual variances
         exoCov = TRUE,         # Show factor correlations
         intercepts = FALSE,    # Exclude intercepts
         title = FALSE,         # Custom title added later
         curve = 3.75,          # Curved arrows for correlations
         rotation = 2,          # Vertical orientation for items
         normalize = FALSE,     # Mitigates overlap between indicators
         weighted = FALSE,      # Uniform edge thickness
         mar = c(3, 15, 3, 10)) # Margins: bottom, left, top, right
```
*(Screenshot: path diagram of the categorical CFA, standardized.)*
```r
# Run a Graded Response Model (one per factor)
library(ltm)  # provides grm() and factor.scores(); dplyr::select is used explicitly to avoid masking
F1_irt <- grm(dplyr::select(items, item1:item5),   constrained = FALSE, Hessian = TRUE, control = list(iter.max = 200))
F2_irt <- grm(dplyr::select(items, item6:item10),  constrained = FALSE, Hessian = TRUE, control = list(iter.max = 200))
F3_irt <- grm(dplyr::select(items, item11:item15), constrained = FALSE, Hessian = TRUE, control = list(iter.max = 200))

# Extract the thresholds (the Extrmt1/Extrmt2 columns of the coefficient matrix)
thresh_discr_F1 <- coef(F1_irt, simplify = TRUE)
thresh_F1 <- thresh_discr_F1[, c("Extrmt1", "Extrmt2")]
thresh_discr_F2 <- coef(F2_irt, simplify = TRUE)
thresh_F2 <- thresh_discr_F2[, c("Extrmt1", "Extrmt2")]
thresh_discr_F3 <- coef(F3_irt, simplify = TRUE)
thresh_F3 <- thresh_discr_F3[, c("Extrmt1", "Extrmt2")]

irt_thresh <- data.frame(rbind(thresh_F1, thresh_F2, thresh_F3))
irt_thresh$item <- row.names(irt_thresh)
irt_thresh$model <- "irt_GRM"

# Obtain threshold estimates from the CFA (the op == "|" rows)
cfa_thresh <- parameterEstimates(fit) %>%
  filter(op == "|") %>%
  select(lhs, rhs, est) %>%
  pivot_wider(names_from = rhs, values_from = est) %>%
  rename(item = lhs, Extrmt1 = t1, Extrmt2 = t2)
cfa_thresh$model <- "cat_CFA"

# Now combine the two models' thresholds and plot them side by side
rbind(irt_thresh, cfa_thresh) %>%
  mutate(item = rep(1:15, 2),
         item = factor(item)) %>%
  pivot_longer(cols = c(Extrmt1, Extrmt2), names_to = "threshold_type", values_to = "value") %>%
  ggplot(aes(x = item, y = value, shape = threshold_type, color = model)) +
  geom_point(size = 2.5) +
  #facet_grid(threshold_type ~ ., scales = "free_y") +
  theme_linedraw()
```
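One caveat when reading this comparison: the CFA thresholds live on the standardized latent response scale, while the GRM's `Extrmt` parameters live on the theta scale, so the two sets are not automatically in the same metric. Below is a hedged sketch of the usual conversion (in the style of Kamata & Bauer, 2008; it assumes lavaan's default delta parameterization and a standard normal theta):

```r
# Map standardized CFA estimates into IRT-style parameters (probit metric):
#   difficulty      b = tau / lambda                  (theta units, comparable to Extrmt)
#   discrimination  a = lambda / sqrt(1 - lambda^2)   (times ~1.7 to approximate the logistic GRM metric)
std <- standardizedSolution(fit)
lam <- std %>% filter(op == "=~") %>% transmute(item = rhs, lambda = est.std)
tau <- std %>% filter(op == "|")  %>% transmute(item = lhs, threshold = rhs, tau = est.std)

tau %>%
  left_join(lam, by = "item") %>%
  mutate(b        = tau / lambda,
         a_probit = lambda / sqrt(1 - lambda^2))
```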
```r
# Comparing theta values: factor scores from the CFA...
theta_scores_CFA <- data.frame(lavPredict(fit, type = "lv", method = "EBM"))
theta_scores_CFA$model <- "cat_CFA"

# ...and EAP factor scores from the three GRMs
theta_scores_GRM <- data.frame(
  F1 = factor.scores(F1_irt, resp.patterns = dplyr::select(items, item1:item5),   method = "EAP")$score.dat$z1,
  F2 = factor.scores(F2_irt, resp.patterns = dplyr::select(items, item6:item10),  method = "EAP")$score.dat$z1,
  F3 = factor.scores(F3_irt, resp.patterns = dplyr::select(items, item11:item15), method = "EAP")$score.dat$z1  # F3_irt, not F2_irt
)
theta_scores_GRM$model <- "irt_GRM"

# Bind them together to plot
rbind(theta_scores_CFA, theta_scores_GRM) %>%
  pivot_longer(-model,
               names_to = "Factor",
               values_to = "Theta_values") %>%
  ggplot(aes(x = Theta_values)) +
  geom_histogram() +
  facet_grid(model ~ Factor) +
  theme_linedraw()
```
*(Screenshots: estimated threshold comparisons between the CFA and IRT models; theta values compared between models.)*