Lab 04 — CFA: measurement, local misfit, and ω reliability

Author

Tommaso Feraco

Goals

In this lab you will:

  • Fit and compare CFA models (1-factor vs correlated factors)
  • Inspect local misfit for CFA (residual correlations, MI + EPC/SEPC)
  • Make theory-filtered respecification choices (and document them)
  • Compute and report reliability from CFA (ω family via semTools)
  • (Optional) Fit a bifactor model and evaluate interpretability (not just fit)

Setup

Show code
library(lavaan)
library(semTools)
library(semPlot)

We’ll use the built-in dataset HolzingerSwineford1939 (9 tests; classic CFA example).

Show code
dat <- HolzingerSwineford1939
vars <- paste0("x", 1:9)
dat <- dat[, vars]
summary(dat)
       x1              x2             x3             x4             x5      
 Min.   :0.667   Min.   :2.25   Min.   :0.25   Min.   :0.00   Min.   :1.00  
 1st Qu.:4.167   1st Qu.:5.25   1st Qu.:1.38   1st Qu.:2.33   1st Qu.:3.50  
 Median :5.000   Median :6.00   Median :2.12   Median :3.00   Median :4.50  
 Mean   :4.936   Mean   :6.09   Mean   :2.25   Mean   :3.06   Mean   :4.34  
 3rd Qu.:5.667   3rd Qu.:6.75   3rd Qu.:3.12   3rd Qu.:3.67   3rd Qu.:5.25  
 Max.   :8.500   Max.   :9.25   Max.   :4.50   Max.   :6.33   Max.   :7.00  
       x6              x7             x8              x9      
 Min.   :0.143   Min.   :1.30   Min.   : 3.05   Min.   :2.78  
 1st Qu.:1.429   1st Qu.:3.48   1st Qu.: 4.85   1st Qu.:4.75  
 Median :2.000   Median :4.09   Median : 5.50   Median :5.42  
 Mean   :2.186   Mean   :4.19   Mean   : 5.53   Mean   :5.37  
 3rd Qu.:2.714   3rd Qu.:4.91   3rd Qu.: 6.10   3rd Qu.:6.08  
 Max.   :6.143   Max.   :7.43   Max.   :10.00   Max.   :9.25  

Part A — Competing measurement hypotheses

Model 1: One general factor

Show code
m1 <- '
  g =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9
'
fit1 <- cfa(m1, data = dat, std.lv = TRUE)
summary(fit1, fit.measures = TRUE, standardized = TRUE)
lavaan 0.6-19 ended normally after 18 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        18

  Number of observations                           301

Model Test User Model:
                                                      
  Test statistic                               312.264
  Degrees of freedom                                27
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               918.852
  Degrees of freedom                                36
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.677
  Tucker-Lewis Index (TLI)                       0.569

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3851.224
  Loglikelihood unrestricted model (H1)      -3695.092
                                                      
  Akaike (AIC)                                7738.448
  Bayesian (BIC)                              7805.176
  Sample-size adjusted Bayesian (SABIC)       7748.091

Root Mean Square Error of Approximation:

  RMSEA                                          0.187
  90 Percent confidence interval - lower         0.169
  90 Percent confidence interval - upper         0.206
  P-value H_0: RMSEA <= 0.050                    0.000
  P-value H_0: RMSEA >= 0.080                    1.000

Standardized Root Mean Square Residual:

  SRMR                                           0.143

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  g =~                                                                  
    x1                0.510    0.068    7.550    0.000    0.510    0.438
    x2                0.259    0.071    3.649    0.000    0.259    0.220
    x3                0.252    0.068    3.689    0.000    0.252    0.223
    x4                0.985    0.057   17.375    0.000    0.985    0.848
    x5                1.084    0.063   17.181    0.000    1.084    0.841
    x6                0.917    0.054   17.092    0.000    0.917    0.838
    x7                0.196    0.066    2.975    0.003    0.196    0.180
    x8                0.203    0.061    3.323    0.001    0.203    0.201
    x9                0.309    0.060    5.143    0.000    0.309    0.307

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1                1.098    0.092   11.895    0.000    1.098    0.808
   .x2                1.315    0.108   12.188    0.000    1.315    0.951
   .x3                1.212    0.099   12.186    0.000    1.212    0.950
   .x4                0.380    0.048    7.963    0.000    0.380    0.281
   .x5                0.486    0.059    8.193    0.000    0.486    0.293
   .x6                0.356    0.043    8.295    0.000    0.356    0.298
   .x7                1.145    0.094   12.215    0.000    1.145    0.967
   .x8                0.981    0.080   12.202    0.000    0.981    0.960
   .x9                0.919    0.076   12.105    0.000    0.919    0.906
    g                 1.000                               1.000    1.000

Model 2: Three correlated factors (the classic HS structure)

Show code
m3 <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit3 <- cfa(m3, data = dat, std.lv = TRUE)
summary(fit3, fit.measures = TRUE, standardized = TRUE)
lavaan 0.6-19 ended normally after 20 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        21

  Number of observations                           301

Model Test User Model:
                                                      
  Test statistic                                85.306
  Degrees of freedom                                24
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                               918.852
  Degrees of freedom                                36
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.931
  Tucker-Lewis Index (TLI)                       0.896

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -3737.745
  Loglikelihood unrestricted model (H1)      -3695.092
                                                      
  Akaike (AIC)                                7517.490
  Bayesian (BIC)                              7595.339
  Sample-size adjusted Bayesian (SABIC)       7528.739

Root Mean Square Error of Approximation:

  RMSEA                                          0.092
  90 Percent confidence interval - lower         0.071
  90 Percent confidence interval - upper         0.114
  P-value H_0: RMSEA <= 0.050                    0.001
  P-value H_0: RMSEA >= 0.080                    0.840

Standardized Root Mean Square Residual:

  SRMR                                           0.065

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual =~                                                             
    x1                0.900    0.081   11.128    0.000    0.900    0.772
    x2                0.498    0.077    6.429    0.000    0.498    0.424
    x3                0.656    0.074    8.817    0.000    0.656    0.581
  textual =~                                                            
    x4                0.990    0.057   17.474    0.000    0.990    0.852
    x5                1.102    0.063   17.576    0.000    1.102    0.855
    x6                0.917    0.054   17.082    0.000    0.917    0.838
  speed =~                                                              
    x7                0.619    0.070    8.903    0.000    0.619    0.570
    x8                0.731    0.066   11.090    0.000    0.731    0.723
    x9                0.670    0.065   10.305    0.000    0.670    0.665

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  visual ~~                                                             
    textual           0.459    0.064    7.189    0.000    0.459    0.459
    speed             0.471    0.073    6.461    0.000    0.471    0.471
  textual ~~                                                            
    speed             0.283    0.069    4.117    0.000    0.283    0.283

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .x1                0.549    0.114    4.833    0.000    0.549    0.404
   .x2                1.134    0.102   11.146    0.000    1.134    0.821
   .x3                0.844    0.091    9.317    0.000    0.844    0.662
   .x4                0.371    0.048    7.779    0.000    0.371    0.275
   .x5                0.446    0.058    7.642    0.000    0.446    0.269
   .x6                0.356    0.043    8.277    0.000    0.356    0.298
   .x7                0.799    0.081    9.823    0.000    0.799    0.676
   .x8                0.488    0.074    6.573    0.000    0.488    0.477
   .x9                0.566    0.071    8.003    0.000    0.566    0.558
    visual            1.000                               1.000    1.000
    textual           1.000                               1.000    1.000
    speed             1.000                               1.000    1.000

Exercise 1 — Compare the two models

1a) Compare global fit

Extract a minimal set of indices:

Show code
get_fit <- function(f) {
  fitMeasures(f, c("chisq","df","pvalue","cfi","tli","rmsea","rmsea.ci.lower","rmsea.ci.upper","srmr"))
}

rbind(one_factor = get_fit(fit1),
      three_factor = get_fit(fit3))
             chisq df  pvalue   cfi   tli  rmsea rmsea.ci.lower rmsea.ci.upper
one_factor   312.3 27 0.0e+00 0.677 0.569 0.1874         0.1690          0.206
three_factor  85.3 24 8.5e-09 0.931 0.896 0.0921         0.0714          0.114
               srmr
one_factor   0.1431
three_factor 0.0652

Questions

  1. Which model fits better globally? Which indices support that conclusion?
  2. Do you expect the 3-factor model to always fit better? Why/why not?

1b) Compare with a χ² difference test (nested?)

Show code
anova(fit1, fit3)

Chi-Squared Difference Test

     Df  AIC  BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)    
fit3 24 7517 7595  85.3                                        
fit1 27 7738 7805 312.3        227 0.498       3     <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Question

  • Is this comparison a valid nested χ² test here? Why (think: is the 1-factor model nested in the 3-factor model)?

Even when anova() runs, you must reason about nesting. Not every model comparison is a proper LRT.
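One way to convince yourself the two models are nested: with `std.lv = TRUE`, fixing all three factor correlations to 1 collapses the three factors into a single factor, so the constrained 3-factor model is equivalent to the 1-factor model. A minimal sketch (reusing `dat` and `fit1` from above; the constrained model name is just illustrative):

```r
library(lavaan)

# Constrain the 3-factor model so all factor correlations equal 1:
# the three factors become perfectly collinear, i.e., one general factor.
m3_constrained <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9

  visual  ~~ 1*textual
  visual  ~~ 1*speed
  textual ~~ 1*speed
'
fit3_constrained <- cfa(m3_constrained, data = dat, std.lv = TRUE)

# The constrained model should reproduce the 1-factor chi-square (up to rounding):
fitMeasures(fit3_constrained, "chisq")
fitMeasures(fit1, "chisq")
```

Because the 1-factor model can be obtained from the 3-factor model by fixing parameters, the χ² difference test is a proper LRT here.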


Part B — Interpret key parameters (measurement-level thinking)

Exercise 2 — Loadings, residual variances, and factor correlations

2a) Standardized loadings

Show code
pe3 <- parameterEstimates(fit3, standardized = TRUE)
load3 <- subset(pe3, op == "=~")[, c("lhs","rhs","est","se","pvalue","std.all")]
load3
      lhs rhs   est    se pvalue std.all
1  visual  x1 0.900 0.081      0   0.772
2  visual  x2 0.498 0.077      0   0.424
3  visual  x3 0.656 0.074      0   0.581
4 textual  x4 0.990 0.057      0   0.852
5 textual  x5 1.102 0.063      0   0.855
6 textual  x6 0.917 0.054      0   0.838
7   speed  x7 0.619 0.070      0   0.570
8   speed  x8 0.731 0.066      0   0.723
9   speed  x9 0.670 0.065      0   0.665

Tasks

  1. Identify the weakest indicator(s) by std.all.
  2. Identify any signs that look strange (e.g., negative loadings, extremely low/high).
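To spot the weakest indicators without scanning by eye, you can simply sort the loadings table built above (assumes the `load3` data frame from 2a):

```r
# Rank indicators from weakest to strongest standardized loading
load3[order(load3$std.all), c("lhs", "rhs", "std.all")]
```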

2b) Residual variances

Show code
resvar3 <- subset(pe3, op == "~~" & lhs %in% vars & rhs %in% vars)[, c("lhs","est","std.all")]
resvar3
   lhs   est std.all
10  x1 0.549   0.404
11  x2 1.134   0.821
12  x3 0.844   0.662
13  x4 0.371   0.275
14  x5 0.446   0.269
15  x6 0.356   0.298
16  x7 0.799   0.676
17  x8 0.488   0.477
18  x9 0.566   0.558

Question

  • Do any residual variances look “too small” or “too large”? What might that mean substantively?
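A useful sanity check: for a congeneric indicator (one loading, no cross-loadings), the standardized residual variance equals 1 minus the squared standardized loading. A quick check for x1 against the tables above:

```r
# std. residual variance = 1 - (std. loading)^2 for a single-loading indicator
1 - 0.772^2   # ≈ 0.404, the std.all residual variance of x1 reported above
```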

2c) Factor correlations

Show code
phi3 <- subset(pe3, op == "~~" & lhs %in% c("visual","textual","speed") &
                 rhs %in% c("visual","textual","speed") & lhs != rhs)[, c("lhs","rhs","est","std.all")]
phi3
       lhs     rhs   est std.all
22  visual textual 0.459   0.459
23  visual   speed 0.471   0.471
24 textual   speed 0.283   0.283

Questions

  1. Are factor correlations high? What would make you worry about discriminant validity?
  2. In your area, what theory would justify correlated factors?
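Two quick follow-ups, sketched below under the assumption that `fit3`, `m3`, and `dat` are still in the workspace: the model-implied latent correlation matrix can be pulled directly with `lavInspect()`, and `semTools::htmt()` gives the HTMT ratio, a common heuristic for discriminant validity (values above roughly .85 are often treated as a red flag, though the cutoff is a rule of thumb, not a test):

```r
library(lavaan)
library(semTools)

# Model-implied latent correlation matrix
lavInspect(fit3, "cor.lv")

# Heterotrait-monotrait (HTMT) ratio as a discriminant-validity heuristic
htmt(m3, data = dat)
```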

Part C — Local misfit: residual correlations and MI (CFA-specific)

Global fit is a summary. Now locate misfit.

Exercise 3 — Residual correlations (where does the model fail?)

Show code
res_std <- residuals(fit3, type = "standardized")$cov
res_std
       x1     x2     x3     x4     x5     x6     x7     x8     x9
x1  0.000                                                        
x2 -1.996  0.000                                                 
x3 -0.997  2.689  0.000                                          
x4  2.679 -0.284 -1.899  0.000                                   
x5 -0.359 -0.591 -4.157  1.545  0.000                            
x6  2.155  0.681 -0.711 -2.588  0.941  0.000                     
x7 -3.773 -3.654 -1.858  0.865 -0.842 -0.326  0.000              
x8 -1.380 -1.119 -0.300 -2.021 -1.099 -0.641  4.823  0.000       
x9  4.077  1.606  3.518  1.225  1.701  1.423 -2.325 -4.132  0.000

Tasks

  1. Find the largest absolute standardized residual correlations (top 3–5).
    (Tip: scan the matrix or use code below.)
Show code
# helper: get top absolute residuals (excluding diagonal)
R <- res_std
diag(R) <- NA
top <- as.data.frame(as.table(R))
top <- top[order(abs(top$Freq), decreasing = TRUE), ]
head(top, 10)
   Var1 Var2  Freq
62   x8   x7  4.82
70   x7   x8  4.82
23   x5   x3 -4.16
39   x3   x5 -4.16
72   x9   x8 -4.13
80   x8   x9 -4.13
9    x9   x1  4.08
73   x1   x9  4.08
7    x7   x1 -3.77
55   x1   x7 -3.77
  2. For each large residual pair, write a measurement-level hypothesis:
    • shared wording/content?
    • method effect?
    • plausible cross-loading?
    • local dependence (same stimulus format, similar items)?

Exercise 4 — MI + EPC/SEPC (use with a theory filter)

Show code
mi <- modificationIndices(fit3, sort. = TRUE)
head(mi[, c("lhs","op","rhs","mi","epc","sepc.all")], 15)
       lhs op rhs    mi    epc sepc.all
30  visual =~  x9 36.41  0.519    0.515
76      x7 ~~  x8 34.15  0.536    0.859
28  visual =~  x7 18.63 -0.380   -0.349
78      x8 ~~  x9 14.95 -0.423   -0.805
33 textual =~  x3  9.15 -0.269   -0.238
55      x2 ~~  x7  8.92 -0.183   -0.192
31 textual =~  x1  8.90  0.347    0.297
51      x2 ~~  x3  8.53  0.218    0.223
59      x3 ~~  x5  7.86 -0.130   -0.212
26  visual =~  x5  7.44 -0.189   -0.147
50      x1 ~~  x9  7.33  0.138    0.247
65      x4 ~~  x6  6.22 -0.235   -0.646
66      x4 ~~  x7  5.92  0.098    0.180
48      x1 ~~  x7  5.42 -0.129   -0.195
77      x7 ~~  x9  5.18 -0.187   -0.278

Tasks

  1. Compare MI results with your largest residual pairs. Do they point to the same area?
  2. Pick one candidate modification, but only if you can justify it conceptually.
  3. Decide: is it more plausible as:
    • correlated residual (x_i ~~ x_j) OR
    • cross-loading (factor =~ x_j)?

Correlated residuals are not “free fit”: they change the measurement story (local dependence). Cross-loadings threaten simple structure and interpretation.


Exercise 5 — Respecify (one change), refit, document

Create m3b by copying m3 and adding exactly one modification you justified.

Examples:

  • correlated residual: x1 ~~ x2
  • cross-loading: visual =~ x4 (be careful: interpretability!)
Show code
m3b <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9

  # ADD ONE THEORY-JUSTIFIED PARAMETER HERE
  # x1 ~~ x2
'
fit3b <- cfa(m3b, data = dat, std.lv = TRUE)

Compare fit:

Show code
rbind(original = get_fit(fit3),
      modified = get_fit(fit3b))
         chisq df  pvalue   cfi   tli  rmsea rmsea.ci.lower rmsea.ci.upper
original  85.3 24 8.5e-09 0.931 0.896 0.0921         0.0714          0.114
modified  85.3 24 8.5e-09 0.931 0.896 0.0921         0.0714          0.114
           srmr
original 0.0652
modified 0.0652

Compare nested models (if nested):

Show code
anova(fit3, fit3b)

Chi-Squared Difference Test

      Df  AIC  BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit3  24 7517 7595  85.3                                    
fit3b 24 7517 7595  85.3          0     0       0           

Documentation (write 3 bullet points)

  • What did you add and why?
  • What diagnostic evidence supported it (residuals? MI+EPC?)?
  • Does the modification change the interpretation of the factor(s)?

Part D — Reliability from CFA (ω family)

Exercise 6 — Compute ω reliability

Show code
reliability(fit3)
       visual textual speed
alpha   0.626   0.883 0.688
omega   0.625   0.885 0.688
omega2  0.625   0.885 0.688
omega3  0.612   0.885 0.686
avevar  0.371   0.721 0.424

Questions

  1. Which ω coefficients are reported? What do they mean conceptually?
  2. Compare ω across the three factors. Which factor seems most reliable and why?
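To see where ω comes from, you can compute it "by hand" for the textual factor from the unstandardized estimates printed in the `fit3` summary. With `std.lv = TRUE` the factor variance is fixed to 1, so ω = (Σλ)² / ((Σλ)² + Σθ). This is an illustrative check, not a replacement for `semTools`:

```r
lam <- c(0.990, 1.102, 0.917)  # loadings of x4, x5, x6 (from the summary above)
th  <- c(0.371, 0.446, 0.356)  # residual variances of x4, x5, x6

# omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)
sum(lam)^2 / (sum(lam)^2 + sum(th))  # ≈ 0.885, matching reliability(fit3)
```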

Optional Part E — Bifactor model (advanced)

This part is not required, but if you want hands-on practice with the bifactor material, work through it once.

A bifactor model: a general factor g loads on all items, plus domain-specific factors. Domain factors are often set orthogonal to g (and sometimes mutually orthogonal).

Show code
mbi <- '
  # group factors
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9

  # general factor
  g =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9
'
fit_bi <- cfa(mbi, data = dat, std.lv = TRUE, orthogonal = TRUE)
get_fit(fit_bi)
         chisq             df         pvalue            cfi            tli 
        34.880         18.000          0.010          0.981          0.962 
         rmsea rmsea.ci.lower rmsea.ci.upper           srmr 
         0.056          0.027          0.083          0.047 

Tasks

  1. Does bifactor fit “better”? (It often does.)
  2. Are all factors interpretable? Inspect standardized loadings.
Show code
pe_bi <- parameterEstimates(fit_bi, standardized = TRUE)
subset(pe_bi, op == "=~")[, c("lhs","rhs","std.all")]
       lhs rhs std.all
1   visual  x1  -0.999
2   visual  x2   0.161
3   visual  x3   0.143
4  textual  x4   0.761
5  textual  x5   0.824
6  textual  x6   0.742
7    speed  x7   0.660
8    speed  x8   0.715
9    speed  x9   0.476
10       g  x1   0.918
11       g  x2   0.499
12       g  x3   0.635
13       g  x4   0.371
14       g  x5   0.287
15       g  x6   0.377
16       g  x7   0.060
17       g  x8   0.247
18       g  x9   0.440
  3. Compute reliability indices.
Show code
reliability(fit_bi)
       visual textual speed     g
alpha  0.6261   0.883 0.688 0.760
omega  0.5266   0.869 0.686 0.846
omega2 0.0961   0.744 0.620 0.551
omega3 0.0961   0.744 0.620 0.535
avevar     NA      NA    NA    NA

Interpret bifactor with diagnostics (e.g., ωH/ECV/H). Fit alone is not enough. If you want more, see the extras module on bifactor/ESEM/method factors.
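One of those diagnostics, explained common variance (ECV), can be sketched directly from the standardized loadings: ECV is the share of common variance attributable to g, Σλ²(g) / [Σλ²(g) + Σλ²(specific)]. A rough version using the `pe_bi` data frame from above (dedicated functions in packages such as `semTools` are preferable for reporting):

```r
# ECV of the general factor from standardized bifactor loadings
std_load <- subset(pe_bi, op == "=~")[, c("lhs", "std.all")]

g_var    <- sum(std_load$std.all[std_load$lhs == "g"]^2)    # common variance via g
spec_var <- sum(std_load$std.all[std_load$lhs != "g"]^2)    # via group factors

g_var / (g_var + spec_var)  # proportion of common variance explained by g
```

A low ECV (well below, say, .70) suggests the general factor does not dominate the common variance, which should temper a "mostly unidimensional" interpretation.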


Wrap-up

Take-home

  • CFA is a measurement hypothesis: zeros and constraints are theory.
  • Pair global fit with CFA-specific local diagnostics (residual correlations + MI/EPC).
  • Respecify with discipline: one change at a time, theory filter, transparent reporting.
  • Reliability should match your measurement model: ω from CFA is usually a better default than α.

Solutions (instructor version)

#| echo: false
if (SHOW) {
  # Typical modification: correlate residuals within a domain (often x1 ~~ x2 in HS)
  m3b_sol <- '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed   =~ x7 + x8 + x9

    x1 ~~ x2
  '
  fit3b_sol <- cfa(m3b_sol, data = dat, std.lv = TRUE)

  list(
    fit1          = get_fit(fit1),
    fit3          = get_fit(fit3),
    top_residuals = head(top, 10),
    top_mi        = head(mi[, c("lhs","op","rhs","mi","epc","sepc.all")], 10),
    fit3b         = get_fit(fit3b_sol),
    lrt           = anova(fit3, fit3b_sol),
    omega3        = reliability(fit3)
  )
}