library(lavaan)
library(semTools)
library(semPlot)
Tommaso Feraco
In this lab you will:
We’ll use the built-in dataset HolzingerSwineford1939 (9 tests; a classic CFA example).
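The descriptive statistics below can be reproduced with something along these lines (the object name `dat` is an assumption carried through the rest of the lab):

```r
library(lavaan)                     # ships with HolzingerSwineford1939
dat <- HolzingerSwineford1939
summary(dat[, paste0("x", 1:9)])    # the nine cognitive test scores
```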
x1 x2 x3 x4 x5
Min. :0.667 Min. :2.25 Min. :0.25 Min. :0.00 Min. :1.00
1st Qu.:4.167 1st Qu.:5.25 1st Qu.:1.38 1st Qu.:2.33 1st Qu.:3.50
Median :5.000 Median :6.00 Median :2.12 Median :3.00 Median :4.50
Mean :4.936 Mean :6.09 Mean :2.25 Mean :3.06 Mean :4.34
3rd Qu.:5.667 3rd Qu.:6.75 3rd Qu.:3.12 3rd Qu.:3.67 3rd Qu.:5.25
Max. :8.500 Max. :9.25 Max. :4.50 Max. :6.33 Max. :7.00
x6 x7 x8 x9
Min. :0.143 Min. :1.30 Min. : 3.05 Min. :2.78
1st Qu.:1.429 1st Qu.:3.48 1st Qu.: 4.85 1st Qu.:4.75
Median :2.000 Median :4.09 Median : 5.50 Median :5.42
Mean :2.186 Mean :4.19 Mean : 5.53 Mean :5.37
3rd Qu.:2.714 3rd Qu.:4.91 3rd Qu.: 6.10 3rd Qu.:6.08
Max. :6.143 Max. :7.43 Max. :10.00 Max. :9.25
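The one-factor output that follows can be produced with a call like this (model and object names `m1`/`fit1` are assumptions; `std.lv = TRUE` matches the fixed latent variance of 1.000 in the printout):

```r
# One-factor model: a single general factor g for all nine tests
m1 <- ' g =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 '
fit1 <- cfa(m1, data = dat, std.lv = TRUE)   # std.lv = TRUE fixes latent variances to 1
summary(fit1, fit.measures = TRUE, standardized = TRUE)
```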
lavaan 0.6-19 ended normally after 18 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 18
Number of observations 301
Model Test User Model:
Test statistic 312.264
Degrees of freedom 27
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 918.852
Degrees of freedom 36
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.677
Tucker-Lewis Index (TLI) 0.569
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3851.224
Loglikelihood unrestricted model (H1) -3695.092
Akaike (AIC) 7738.448
Bayesian (BIC) 7805.176
Sample-size adjusted Bayesian (SABIC) 7748.091
Root Mean Square Error of Approximation:
RMSEA 0.187
90 Percent confidence interval - lower 0.169
90 Percent confidence interval - upper 0.206
P-value H_0: RMSEA <= 0.050 0.000
P-value H_0: RMSEA >= 0.080 1.000
Standardized Root Mean Square Residual:
SRMR 0.143
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
g =~
x1 0.510 0.068 7.550 0.000 0.510 0.438
x2 0.259 0.071 3.649 0.000 0.259 0.220
x3 0.252 0.068 3.689 0.000 0.252 0.223
x4 0.985 0.057 17.375 0.000 0.985 0.848
x5 1.084 0.063 17.181 0.000 1.084 0.841
x6 0.917 0.054 17.092 0.000 0.917 0.838
x7 0.196 0.066 2.975 0.003 0.196 0.180
x8 0.203 0.061 3.323 0.001 0.203 0.201
x9 0.309 0.060 5.143 0.000 0.309 0.307
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 1.098 0.092 11.895 0.000 1.098 0.808
.x2 1.315 0.108 12.188 0.000 1.315 0.951
.x3 1.212 0.099 12.186 0.000 1.212 0.950
.x4 0.380 0.048 7.963 0.000 0.380 0.281
.x5 0.486 0.059 8.193 0.000 0.486 0.293
.x6 0.356 0.043 8.295 0.000 0.356 0.298
.x7 1.145 0.094 12.215 0.000 1.145 0.967
.x8 0.981 0.080 12.202 0.000 0.981 0.960
.x9 0.919 0.076 12.105 0.000 0.919 0.906
g 1.000 1.000 1.000
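The three-factor output that follows corresponds to a specification like this (names `m3`/`fit3` are assumptions; factor variances are again fixed to 1 so the factor covariances are correlations):

```r
# Three-factor model: visual, textual, speed
m3 <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit3 <- cfa(m3, data = dat, std.lv = TRUE)
summary(fit3, fit.measures = TRUE, standardized = TRUE)
```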
lavaan 0.6-19 ended normally after 20 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 21
Number of observations 301
Model Test User Model:
Test statistic 85.306
Degrees of freedom 24
P-value (Chi-square) 0.000
Model Test Baseline Model:
Test statistic 918.852
Degrees of freedom 36
P-value 0.000
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.931
Tucker-Lewis Index (TLI) 0.896
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3737.745
Loglikelihood unrestricted model (H1) -3695.092
Akaike (AIC) 7517.490
Bayesian (BIC) 7595.339
Sample-size adjusted Bayesian (SABIC) 7528.739
Root Mean Square Error of Approximation:
RMSEA 0.092
90 Percent confidence interval - lower 0.071
90 Percent confidence interval - upper 0.114
P-value H_0: RMSEA <= 0.050 0.001
P-value H_0: RMSEA >= 0.080 0.840
Standardized Root Mean Square Residual:
SRMR 0.065
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
visual =~
x1 0.900 0.081 11.128 0.000 0.900 0.772
x2 0.498 0.077 6.429 0.000 0.498 0.424
x3 0.656 0.074 8.817 0.000 0.656 0.581
textual =~
x4 0.990 0.057 17.474 0.000 0.990 0.852
x5 1.102 0.063 17.576 0.000 1.102 0.855
x6 0.917 0.054 17.082 0.000 0.917 0.838
speed =~
x7 0.619 0.070 8.903 0.000 0.619 0.570
x8 0.731 0.066 11.090 0.000 0.731 0.723
x9 0.670 0.065 10.305 0.000 0.670 0.665
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
visual ~~
textual 0.459 0.064 7.189 0.000 0.459 0.459
speed 0.471 0.073 6.461 0.000 0.471 0.471
textual ~~
speed 0.283 0.069 4.117 0.000 0.283 0.283
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.x1 0.549 0.114 4.833 0.000 0.549 0.404
.x2 1.134 0.102 11.146 0.000 1.134 0.821
.x3 0.844 0.091 9.317 0.000 0.844 0.662
.x4 0.371 0.048 7.779 0.000 0.371 0.275
.x5 0.446 0.058 7.642 0.000 0.446 0.269
.x6 0.356 0.043 8.277 0.000 0.356 0.298
.x7 0.799 0.081 9.823 0.000 0.799 0.676
.x8 0.488 0.074 6.573 0.000 0.488 0.477
.x9 0.566 0.071 8.003 0.000 0.566 0.558
visual 1.000 1.000 1.000
textual 1.000 1.000 1.000
speed 1.000 1.000 1.000
Extract a minimal set of indices:
chisq df pvalue cfi tli rmsea rmsea.ci.lower rmsea.ci.upper
one_factor 312.3 27 0.0e+00 0.677 0.569 0.1874 0.1690 0.206
three_factor 85.3 24 8.5e-09 0.931 0.896 0.0921 0.0714 0.114
srmr
one_factor 0.1431
three_factor 0.0652
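The fit-index table above can be assembled with `fitMeasures()` (a sketch; `fit1` and `fit3` are the fitted models from earlier in the lab):

```r
# Collect a compact fit-index table for both models
idx <- c("chisq", "df", "pvalue", "cfi", "tli", "rmsea",
         "rmsea.ci.lower", "rmsea.ci.upper", "srmr")
round(rbind(one_factor   = fitMeasures(fit1, idx),
            three_factor = fitMeasures(fit3, idx)), 4)
```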
Questions
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit3 24 7517 7595 85.3
fit1 27 7738 7805 312.3 227 0.498 3 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
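The chi-squared difference test above comes from a call like (a sketch; `fit1` and `fit3` as fitted earlier):

```r
anova(fit1, fit3)   # chi-squared difference test of the nested one-factor model
```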
Question
Even when anova() runs, you must reason about nesting. Not every model comparison is a proper LRT.
lhs rhs est se pvalue std.all
1 visual x1 0.900 0.081 0 0.772
2 visual x2 0.498 0.077 0 0.424
3 visual x3 0.656 0.074 0 0.581
4 textual x4 0.990 0.057 0 0.852
5 textual x5 1.102 0.063 0 0.855
6 textual x6 0.917 0.054 0 0.838
7 speed x7 0.619 0.070 0 0.570
8 speed x8 0.731 0.066 0 0.723
9 speed x9 0.670 0.065 0 0.665
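The loadings table above can be extracted with `parameterEstimates()` (a sketch; `fit3` from the lab):

```r
# Standardized loadings of the three-factor model
pe <- parameterEstimates(fit3, standardized = TRUE)
pe[pe$op == "=~", c("lhs", "rhs", "est", "se", "pvalue", "std.all")]
```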
Tasks
lhs est std.all
10 x1 0.549 0.404
11 x2 1.134 0.821
12 x3 0.844 0.662
13 x4 0.371 0.275
14 x5 0.446 0.269
15 x6 0.356 0.298
16 x7 0.799 0.676
17 x8 0.488 0.477
18 x9 0.566 0.558
Question
lhs rhs est std.all
22 visual textual 0.459 0.459
23 visual speed 0.471 0.471
24 textual speed 0.283 0.283
Questions
Global fit is a summary. Now locate misfit.
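The standardized residual matrix below can be obtained with `lavResiduals()` (a sketch; `fit3` is the three-factor fit from the lab):

```r
# Standardized residuals: z-scores for each residual covariance
resid_std <- lavResiduals(fit3)$cov.z
round(resid_std, 3)
```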
x1 x2 x3 x4 x5 x6 x7 x8 x9
x1 0.000
x2 -1.996 0.000
x3 -0.997 2.689 0.000
x4 2.679 -0.284 -1.899 0.000
x5 -0.359 -0.591 -4.157 1.545 0.000
x6 2.155 0.681 -0.711 -2.588 0.941 0.000
x7 -3.773 -3.654 -1.858 0.865 -0.842 -0.326 0.000
x8 -1.380 -1.119 -0.300 -2.021 -1.099 -0.641 4.823 0.000
x9 4.077 1.606 3.518 1.225 1.701 1.423 -2.325 -4.132 0.000
Tasks
Var1 Var2 Freq
62 x8 x7 4.82
70 x7 x8 4.82
23 x5 x3 -4.16
39 x3 x5 -4.16
72 x9 x8 -4.13
80 x8 x9 -4.13
9 x9 x1 4.08
73 x1 x9 4.08
7 x7 x1 -3.77
55 x1 x7 -3.77
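The ranked list above (largest absolute residuals first) can be produced with a sketch like this; the `Var1 Var2 Freq` columns come from `as.data.frame.table()`:

```r
# Rank residual z-scores by absolute magnitude (fit3 from the lab)
rz  <- lavResiduals(fit3)$cov.z
tab <- as.data.frame.table(round(rz, 2))
tab <- tab[order(-abs(tab$Freq)), ]
head(tab, 10)
```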
lhs op rhs mi epc sepc.all
30 visual =~ x9 36.41 0.519 0.515
76 x7 ~~ x8 34.15 0.536 0.859
28 visual =~ x7 18.63 -0.380 -0.349
78 x8 ~~ x9 14.95 -0.423 -0.805
33 textual =~ x3 9.15 -0.269 -0.238
55 x2 ~~ x7 8.92 -0.183 -0.192
31 textual =~ x1 8.90 0.347 0.297
51 x2 ~~ x3 8.53 0.218 0.223
59 x3 ~~ x5 7.86 -0.130 -0.212
26 visual =~ x5 7.44 -0.189 -0.147
50 x1 ~~ x9 7.33 0.138 0.247
65 x4 ~~ x6 6.22 -0.235 -0.646
66 x4 ~~ x7 5.92 0.098 0.180
48 x1 ~~ x7 5.42 -0.129 -0.195
77 x7 ~~ x9 5.18 -0.187 -0.278
Tasks
Should you add a correlated residual (x_i ~~ x_j) or a cross-loading (factor =~ x_j)? Correlated residuals are not “free fit”: they change the measurement story (local dependence). Cross-loadings threaten simple structure and interpretation.
Create m3b by copying m3 and adding exactly one modification you justified.
Examples:
- x1 ~~ x2
- visual =~ x4 (be careful: interpretability!)

Compare fit:
chisq df pvalue cfi tli rmsea rmsea.ci.lower rmsea.ci.upper
original 85.3 24 8.5e-09 0.931 0.896 0.0921 0.0714 0.114
modified 85.3 24 8.5e-09 0.931 0.896 0.0921 0.0714 0.114
srmr
original 0.0652
modified 0.0652
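A sketch of fitting one modified model and running the comparison (the `x1 ~~ x2` residual correlation is just one example of a justified modification; `dat` and `fit3` come from earlier in the lab):

```r
# Fit one justified modification and compare fit
m3b <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
  x1 ~~ x2
'
fit3b <- cfa(m3b, data = dat, std.lv = TRUE)
idx <- c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr")
round(rbind(original = fitMeasures(fit3,  idx),
            modified = fitMeasures(fit3b, idx)), 4)
anova(fit3, fit3b)   # a proper LRT: m3b adds exactly one parameter to m3
```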
Compare nested models (if nested):
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit3 24 7517 7595 85.3
fit3b 24 7517 7595 85.3 0 0 0
Documentation (write 3 bullet points)
visual textual speed
alpha 0.626 0.883 0.688
omega 0.625 0.885 0.688
omega2 0.625 0.885 0.688
omega3 0.612 0.885 0.686
avevar 0.371 0.721 0.424
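The reliability table above (alpha, omega variants, average variance extracted per factor) comes from semTools; `reliability()` produces this layout, though recent semTools versions point users to `compRelSEM()` instead:

```r
# Alpha, omega variants, and AVE for each factor of the three-factor model
semTools::reliability(fit3)
```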
Questions
This is not required, but if you want to keep the bifactor material “real”, do it once.
A bifactor model: a general factor g loads on all items, plus domain-specific factors. Domain factors are often set orthogonal to g (and sometimes mutually orthogonal).
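A minimal bifactor specification matching that description (object names are assumptions; `orthogonal = TRUE` makes g and the domain factors mutually uncorrelated, which yields the df = 18 in the output below):

```r
# Bifactor sketch: g loads on all items; domain factors orthogonal to g
mbif <- '
  g       =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fitb <- cfa(mbif, data = dat, std.lv = TRUE, orthogonal = TRUE)
fitMeasures(fitb, c("chisq", "df", "pvalue", "cfi", "tli",
                    "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "srmr"))
```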
chisq df pvalue cfi tli
34.880 18.000 0.010 0.981 0.962
rmsea rmsea.ci.lower rmsea.ci.upper srmr
0.056 0.027 0.083 0.047
Tasks
lhs rhs std.all
1 visual x1 -0.999
2 visual x2 0.161
3 visual x3 0.143
4 textual x4 0.761
5 textual x5 0.824
6 textual x6 0.742
7 speed x7 0.660
8 speed x8 0.715
9 speed x9 0.476
10 g x1 0.918
11 g x2 0.499
12 g x3 0.635
13 g x4 0.371
14 g x5 0.287
15 g x6 0.377
16 g x7 0.060
17 g x8 0.247
18 g x9 0.440
visual textual speed g
alpha 0.6261 0.883 0.688 0.760
omega 0.5266 0.869 0.686 0.846
omega2 0.0961 0.744 0.620 0.551
omega3 0.0961 0.744 0.620 0.535
avevar NA NA NA NA
Interpret the bifactor model with dedicated diagnostics (e.g., ωH, ECV, H); fit alone is not enough. If you want more, see the extras module on bifactor/ESEM/method factors.
#| echo: false
if (SHOW) {
  # Typical modification: correlate residuals within a domain (often x1 ~~ x2 in HS)
  m3b_sol <- '
    visual  =~ x1 + x2 + x3
    textual =~ x4 + x5 + x6
    speed   =~ x7 + x8 + x9
    x1 ~~ x2
  '
  fit3b_sol <- cfa(m3b_sol, data = dat, std.lv = TRUE)

  list(
    fit1          = get_fit(fit1),
    fit3          = get_fit(fit3),
    top_residuals = head(top, 10),
    top_mi        = head(mi[, c("lhs", "op", "rhs", "mi", "epc", "sepc.all")], 10),
    fit3b         = get_fit(fit3b_sol),
    lrt           = anova(fit3, fit3b_sol),
    omega3        = reliability(fit3)
  )
}