08.Latent variable modeling01.Factor analysis - sporedata/researchdesigneR GitHub Wiki

1. Use cases: in which situations should I use this method?

Whenever a collection of questions (items) is believed to represent a latent variable (also called factor or dimension). Examples include quality of life, depression, anxiety, physical function, among others.

2. Input: what kind of data does the method require?

Multiple variables representing questions that are believed to relate to one or more latent variables.

3. Algorithm: how does the method work?

Model mechanics

Describing in words

Factor analysis describes how observed variables vary together (for example, through their correlations). The shared variability is hypothesized to reflect a common latent variable. Factor analysis is usually classified as:

Exploratory factor analysis: when there is no prespecified hypothesis of the latent variable structure.
Confirmatory factor analysis: when there is a prespecified hypothesis regarding the underlying structure of the latent variable.
Bifactor modeling: when the factor structure can be explained by one general factor (also known as g) on which every item loads, plus specific factors that each load on a subset of the items. It partitions common and construct-specific variance in a way that other common factor-based methods cannot. Common variance is the variance in items that can be explained by the general factor. Construct-specific variance is the additional variance, not explained by the general factor, that is shared by a limited number of items within the larger set. Bifactor modeling has limitations that need to be addressed, for example factor collapse, a phenomenon that occurs when an excessive amount of variance is shifted away from one or more of the specific factors toward the general factor.
Higher-order factor modeling: addresses two questions. What higher-order factor structure best explains the covariance among the factors (not the items) in the measurement model? What should the structural model of the factors look like? Higher-order factors aim to explain covariance among the measurement model factors, just as the measurement model factors aim to explain covariance among the items. Similar correlations across the measurement model factors would suggest a single higher-order factor.
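Multilevel confirmatory factor analysis: used when observations are nested within clusters (for example, patients within hospitals), so that the factor structure is modeled separately at the within-cluster and between-cluster levels.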
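
As a rough illustration of the confirmatory and higher-order models described in the list above, the sketch below uses the lavaan package and its bundled HolzingerSwineford1939 dataset. Both are assumptions: neither is referenced on this page, and the factor names (visual, textual, speed, g) are only illustrative. The code is skipped when lavaan is not installed.

```r
# Assumption: lavaan is available; it is not one of the packages listed on this page.
if (requireNamespace("lavaan", quietly = TRUE)) {
  hs <- lavaan::HolzingerSwineford1939  # example data shipped with lavaan

  # Confirmatory factor analysis: three correlated first-order factors,
  # each measured by three items (the hypothesized structure).
  cfa_model <- "
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed   =~ x7 + x8 + x9
  "
  fit_cfa <- lavaan::cfa(cfa_model, data = hs)

  # Higher-order model: one second-order factor g explains the
  # covariance among the three first-order factors (not the items).
  ho_model <- paste(cfa_model, "g =~ visual + textual + speed", sep = "\n")
  fit_ho <- lavaan::cfa(ho_model, data = hs)

  # Compare a few common fit indices for the two models.
  print(lavaan::fitMeasures(fit_cfa, c("chisq", "df", "cfi", "rmsea")))
  print(lavaan::fitMeasures(fit_ho,  c("chisq", "df", "cfi", "rmsea")))
}
```

With only three first-order factors, the higher-order part is just-identified, so the two models have the same degrees of freedom; a higher-order structure becomes testable with four or more first-order factors.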

According to Revelle [1], determining the number of factors to extract is an unsolved problem in psychometrics. However, many techniques aim to solve this problem, although none of them is uniformly the best. The most common techniques include:

Plotting the magnitude of the successive eigenvalues and applying the scree test. The scree represents a sudden drop in eigenvalues, analogous to the change in slope seen when approaching the rock face of a mountain [2]. Criticism: the scree test is appealing but can lead to differences of interpretation as to where the scree "breaks".
Extracting factors as long as they are interpretable. Criticism: the number of extracted factors reflects the investigator's creativity more than the data.
Using the Very Simple Structure (VSS) Criterion [3]. The VSS Criterion compares the fit of several factor analyses with the loading matrix "simplified" by deleting all but the c greatest loadings per item, where c is a measure of factor complexity. Criticism: VSS, while very simple to understand, will not work very well if the data are very factorially complex. (Simulations suggest it will work fine if the complexities of some of the items are no more than 2) [1].
Using Wayne Velicer’s Minimum Average Partial (MAP) criterion [4].
Extracting principal components until the eigenvalue is < 1. Criticism: the eigenvalue of 1 rule, although the default for many programs, seems to be a rough way of dividing the number of variables by 3 and is probably the worst of all criteria.

Case study: If asked to explain the loading, h2, and u2 columns in the table for the chosen factor solution, a basic explanation is as follows. A factor loading is the correlation between a variable and a factor (when the factors are uncorrelated); the squared loading is the proportion of the variable's variance explained by that factor. The h2 column holds the communalities: the proportion of each variable's variance explained by all extracted factors together, with a higher communality indicating that more of the variable's variance has been captured by the factor solution. The u2 column is the uniqueness, 1 - h2.
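
The eigenvalue-based criteria listed above (the scree test and the eigenvalue-greater-than-1 rule) can be sketched in base R as follows. The data are simulated purely for illustration: six hypothetical items (q1 to q6) driven by two latent factors. The item names, loadings, and sample size are assumptions, not values from any real scale.

```r
# Simulate 300 respondents answering 6 items driven by 2 latent factors.
# All values are synthetic and purely illustrative.
set.seed(42)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
items <- data.frame(
  q1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  q2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  q3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  q4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  q5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  q6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Eigenvalues of the item correlation matrix, largest first.
eigenvalues <- eigen(cor(items), symmetric = TRUE, only.values = TRUE)$values

# Eigenvalue-greater-than-1 rule: count the eigenvalues above 1.
n_kaiser <- sum(eigenvalues > 1)

# Scree test: plot successive eigenvalues and look for the "break".
# Drawn only in interactive sessions so the script also runs in batch mode.
if (interactive()) {
  plot(eigenvalues, type = "b", xlab = "Component", ylab = "Eigenvalue",
       main = "Scree plot")
  abline(h = 1, lty = 2)  # reference line for the eigenvalue-of-1 rule
}

print(round(eigenvalues, 2))
print(n_kaiser)
```

Because the eigenvalues of a correlation matrix sum to the number of items, an eigenvalue above 1 means the component accounts for more variance than a single item would on its own.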

Describing in images

Describing with code
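
A minimal sketch of an exploratory factor analysis in base R, using `factanal()` (maximum likelihood extraction with varimax rotation). The data are simulated for illustration only; the item names, loadings, and sample size are assumptions. The last lines rebuild the loading, h2 (communality), and u2 (uniqueness) columns that factor analysis tables typically report.

```r
# Simulate 500 respondents answering 6 items driven by 2 latent factors.
# All values are synthetic and purely illustrative.
set.seed(123)
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
items <- data.frame(
  q1 = 0.8 * f1 + rnorm(n, sd = 0.6),
  q2 = 0.7 * f1 + rnorm(n, sd = 0.7),
  q3 = 0.6 * f1 + rnorm(n, sd = 0.8),
  q4 = 0.8 * f2 + rnorm(n, sd = 0.6),
  q5 = 0.7 * f2 + rnorm(n, sd = 0.7),
  q6 = 0.6 * f2 + rnorm(n, sd = 0.8)
)

# Exploratory factor analysis: 2 factors, orthogonal (varimax) rotation.
efa_fit <- factanal(items, factors = 2, rotation = "varimax")

# Loadings as a plain items-by-factors matrix.
loadings_mat <- unclass(loadings(efa_fit))

# h2: communality, the sum of squared loadings per item.
h2 <- rowSums(loadings_mat^2)

# u2: uniqueness as estimated by factanal(); h2 + u2 is approximately 1.
u2 <- efa_fit$uniquenesses

print(round(cbind(loadings_mat, h2 = h2, u2 = u2), 2))
```

With an oblique rotation the factors are correlated, so loadings are no longer simple item-factor correlations and the sum of squared loadings no longer equals the communality.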

Breaking down equations
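
A sketch of the common factor model in standard notation. The symbols are assumptions introduced here, not taken from this page: x is the vector of p observed (standardized) items, f the vector of k latent factors, Λ the p × k matrix of loadings λ_ij, Φ the factor correlation matrix, and Ψ the diagonal matrix of unique variances.

```latex
% Each item is a weighted sum of the factors plus a unique part
x = \Lambda f + \varepsilon

% Implied covariance (correlation) matrix of the items
\Sigma = \Lambda \Phi \Lambda^{\top} + \Psi

% Communality of item i, for uncorrelated factors (\Phi = I)
h_i^2 = \sum_{j=1}^{k} \lambda_{ij}^2

% Uniqueness of item i, for standardized items
u_i^2 = \psi_{ii} = 1 - h_i^2
```

Exploratory factor analysis estimates all entries of Λ freely, whereas confirmatory factor analysis fixes some entries (typically to zero) according to the hypothesized structure.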

Reporting guidelines

Data science packages

mirt: Multidimensional Item Response Theory
mirtCAT: Computerized Adaptive Testing with Multidimensional Item Response Theory
RatingScaleReduction: Stepwise Rating Scale Item Reduction Without Predictability Loss. This package provides a stepwise method for reducing the number of rating scale items without predictability loss, based on the area under the receiver operating characteristic curve (AUC ROC).

Suggested companion methods

Learning materials

Books

Articles
- Common references for latent variable
- Best Practices for Developing and Validating Scales for Health, Social, and Behavioral Research: A Primer - Article with an overview of scale development and evaluation, including information on domain identification, item generation, content validity, expert evaluation, cognitive interviews, survey administration, item reduction analyses, extraction of factors, dimensionality tests, reliability, and validity.

4. Output: how do I interpret this method's results?

Mock conclusions or most frequent format for conclusions reached at the end of a typical analysis.

Tables, plots, and their interpretation

Figure: table of factor analysis results with different factor solutions [5].

Figure: scree plot [6].

Figure: correlation plot [7].

5. SporeData-specific

Templates

List of psychometric properties commonly used scales

Data science functions

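# bfi: 25 personality items; assumed here to be the dataset shipped with psych
data(bfi, package = "psych")
# Agreeableness (A), Conscientiousness (C), and Neuroticism (N) items
vars <- c("A1", "A2", "A3", "A4", "A5", "C1", "C2", "C3", "C4", "C5",
          "N1", "N2", "N3", "N4", "N5")
# Run the exploratory factor analysis on the selected items
efa_analysis <- sdatools::ExploratoryFactorAnalysis(bfi, vars)

# Correlation plot for the analysis object
sdatools::correlationPlot(efa_analysis)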

References

[1] Revelle, W. How To: Use the psych package for Factor Analysis and data reduction. Department of Psychology, Northwestern University. 2021.

[2] Cattell, R. B. The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276. 1966.

[3] Revelle, W. and Rocklin, T. Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414. 1979.

[4] Velicer, W. Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327. 1976.

[5] Knekta E, Runyon C, Eddy S. One size doesn’t fit all: Using factor analysis to gather validity evidence when using surveys in your research. CBE—Life Sciences Education. 2019;18(1):rm1.

[6] Kassambara A. Practical guide to principal component methods in R: PCA, M(CA), FAMD, MFA, HCPC, factoextra. STHDA. 2017.

[7] https://analyticsbuddhu.wordpress.com/2017/03/05/how-to-do-factor-analysis-using-r/