Tommaso Feraco

library(lavaan)
library(semTools)
In this lab you will fit a three-factor CFA to ordinal versions of the Holzinger–Swineford items, compare the results with a model that (wrongly) treats the ordinal codes as continuous, inspect thresholds, and test measurement invariance across schools.
We use the classic HolzingerSwineford1939 dataset and discretize the indicators into 5 ordered categories (a “Likert-ification”).
dat_full <- HolzingerSwineford1939
vars <- paste0("x", 1:9)
make_ordinal <- function(x, k = 5) {
  # quantile-based cutpoints (roughly balanced categories)
  qs <- quantile(x, probs = seq(0, 1, length.out = k + 1), na.rm = TRUE, type = 8)
  qs <- unique(qs)
  # fall back to equally spaced breaks if too many quantiles coincide
  if (length(qs) < 3) {
    qs <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = k + 1)
  }
  cut(x, breaks = qs, include.lowest = TRUE, ordered_result = TRUE)
}
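As a quick sanity check, here is a self-contained sketch of how make_ordinal behaves on simulated data (the function is repeated so the chunk runs on its own):

```r
# Repeat of make_ordinal from above, so this chunk is self-contained
make_ordinal <- function(x, k = 5) {
  qs <- quantile(x, probs = seq(0, 1, length.out = k + 1), na.rm = TRUE, type = 8)
  qs <- unique(qs)
  if (length(qs) < 3) {
    qs <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = k + 1)
  }
  cut(x, breaks = qs, include.lowest = TRUE, ordered_result = TRUE)
}

set.seed(1)
x <- rnorm(300)
x_ord <- make_ordinal(x, k = 5)
nlevels(x_ord)   # 5 ordered categories
table(x_ord)     # roughly 60 observations per category
```

Quantile-based breaks give approximately equal category sizes, which is why the item tables below are fairly balanced.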
dat_ord <- dat_full
for (v in vars) dat_ord[[v]] <- make_ordinal(dat_full[[v]], k = 5)
# A "wrong-on-purpose" version that treats ordinal codes as continuous numbers
dat_ord_num <- dat_ord
for (v in vars) dat_ord_num[[v]] <- as.numeric(dat_ord[[v]])
# sanity check
# sanity check: category counts for the first two items
lapply(dat_ord[vars], table) |> head(2)

$x1
  [0.667,4]    (4,4.67] (4.67,5.33]  (5.33,5.9]   (5.9,8.5] 
         65          60          72          44          60 

$x2
[2.25,5.25] (5.25,5.75] (5.75,6.25]    (6.25,7]    (7,9.25] 
         90          54          53          53          51 
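The model string m3 is defined earlier in the lab and not shown in this chunk; judging from the factor and item names in the output that follows, it is presumably the standard three-factor Holzinger–Swineford specification:

```r
# Assumed specification, inferred from the loadings reported in the output
m3 <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
```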
fit_ord <- cfa(
  m3,
  data = dat_ord,
  ordered = vars,                # tells lavaan these indicators are ordinal
  std.lv = TRUE,
  parameterization = "theta"     # common choice for categorical indicators
  # estimator is set automatically to WLSMV when ordered= is used
)
fitMeasures(fit_ord, c("chisq","df","cfi","tli","rmsea","srmr"))

 chisq     df    cfi    tli  rmsea   srmr 
56.708 24.000  0.987  0.980  0.067  0.065 
Questions (answer in words)
pe_cont <- parameterEstimates(fit_cont, standardized = TRUE)
pe_ord  <- parameterEstimates(fit_ord,  standardized = TRUE)

load_cont <- subset(pe_cont, op == "=~")[, c("lhs","rhs","est","se","pvalue","std.all")]
load_ord  <- subset(pe_ord,  op == "=~")[, c("lhs","rhs","est","se","pvalue","std.all")]

fac <- c("visual","textual","speed")
phi_cont <- subset(pe_cont, op == "~~" & lhs %in% fac & rhs %in% fac & lhs != rhs)[, c("lhs","rhs","est","std.all")]
phi_ord  <- subset(pe_ord,  op == "~~" & lhs %in% fac & rhs %in% fac & lhs != rhs)[, c("lhs","rhs","est","std.all")]

list(load_cont = load_cont, load_ord = load_ord,
     phi_cont = phi_cont, phi_ord = phi_ord)

$load_cont
lhs rhs est se pvalue std.all
1 visual x1 1.060 0.097 0 0.750
2 visual x2 0.634 0.097 0 0.432
3 visual x3 0.815 0.094 0 0.568
4 textual x4 1.056 0.070 0 0.774
5 textual x5 1.178 0.070 0 0.844
6 textual x6 1.228 0.072 0 0.849
7 speed x7 0.689 0.095 0 0.486
8 speed x8 0.837 0.096 0 0.596
9 speed x9 1.032 0.102 0 0.729
$load_ord
lhs rhs est se pvalue std.all
1 visual x1 1.520 0.394 0 0.835
2 visual x2 0.495 0.088 0 0.443
3 visual x3 0.674 0.106 0 0.559
4 textual x4 1.472 0.160 0 0.827
5 textual x5 1.819 0.226 0 0.876
6 textual x6 1.767 0.195 0 0.870
7 speed x7 0.624 0.096 0 0.529
8 speed x8 0.820 0.121 0 0.634
9 speed x9 1.430 0.308 0 0.820
$phi_cont
lhs rhs est std.all
22 visual textual 0.488 0.488
23 visual speed 0.516 0.516
24 textual speed 0.235 0.235
$phi_ord
lhs rhs est std.all
58 visual textual 0.490 0.490
59 visual speed 0.491 0.491
60 textual speed 0.232 0.232
Questions
For ordinal indicators, the model is expressed in terms of an underlying continuous latent response \(y^*\). The observed category reflects which thresholds \(y^*\) crosses: the item takes value \(k\) when \(\tau_{k-1} < y^* \le \tau_k\).
In lavaan, thresholds appear with op == "|".
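The threshold table below can be extracted from the fitted model; a minimal sketch, assuming the fit_ord object from above (this call is not shown in the original chunk):

```r
pe <- parameterEstimates(fit_ord)
subset(pe, op == "|")[, c("lhs", "op", "rhs", "est", "se")]
```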
lhs op rhs est se
10 x1 | t1 -1.430 0.280
11 x1 | t2 -0.389 0.147
12 x1 | t3 0.723 0.189
13 x1 | t4 1.536 0.315
14 x2 | t1 -0.588 0.086
15 x2 | t2 -0.060 0.081
16 x2 | t3 0.443 0.084
17 x2 | t4 1.067 0.097
18 x3 | t1 -0.868 0.100
19 x3 | t2 -0.289 0.089
20 x3 | t3 0.383 0.090
21 x3 | t4 1.047 0.106
22 x4 | t1 -1.419 0.159
23 x4 | t2 -0.230 0.128
24 x4 | t3 0.921 0.144
25 x4 | t4 1.702 0.173
26 x5 | t1 -1.703 0.215
27 x5 | t2 -0.533 0.153
You can also inspect sample thresholds:

  x1|t1   x1|t2   x1|t3   x1|t4   x2|t1   x2|t2 
-0.7860 -0.2140  0.3975  0.8440 -0.5273 -0.0542 
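These sample (empirical) thresholds are presumably obtained from the saturated summary statistics; a sketch, assuming fit_ord from above:

```r
# $th holds the sample thresholds when indicators are declared ordered
lavInspect(fit_ord, "sampstat")$th
```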
Tasks
x1) and report its thresholds (how many? why?).

Residual correlations:

     x1     x2     x3     x4     x5     x6     x7     x8     x9
x1 0.000
x2 -0.052 0.000
x3 -0.006 0.101 0.000
x4 0.090 0.050 -0.079 0.000
x5 -0.016 0.020 -0.133 -0.004 0.000
x6 0.017 -0.001 -0.048 -0.012 0.010 0.000
x7 -0.144 -0.165 -0.073 0.056 -0.013 0.001 0.000
x8 -0.065 -0.043 0.008 -0.058 -0.054 -0.033 0.161 0.000
x9 0.068 0.032 0.150 0.046 0.061 -0.028 -0.074 -0.072 0.000
Find the largest absolute residuals:
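One way to produce the sorted list below, assuming fit_ord from above (lavaan's residuals() returns the residual correlation matrix in $cov when type = "cor"; the Var1/Var2/Freq column names come from as.table):

```r
rc <- residuals(fit_ord, type = "cor")$cov   # residual correlation matrix
rc_long <- as.data.frame(as.table(rc))       # columns: Var1, Var2, Freq
rc_long <- subset(rc_long, Var1 != Var2)     # drop the zero diagonal
head(rc_long[order(-abs(rc_long$Freq)), ], 10)
```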
Var1 Var2 Freq
16 x7 x2 -0.165
56 x2 x7 -0.165
62 x8 x7 0.161
70 x7 x8 0.161
27 x9 x3 0.150
75 x3 x9 0.150
7 x7 x1 -0.144
55 x1 x7 -0.144
23 x5 x3 -0.133
39 x3 x5 -0.133
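The modification indices below were presumably obtained along these lines (assuming fit_ord from above):

```r
mi <- modindices(fit_ord, sort. = TRUE)      # sorted by the mi column
head(mi[, c("lhs", "op", "rhs", "mi", "epc", "sepc.all")], 15)
```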
lhs op rhs mi epc sepc.all
133 x7 ~~ x8 29.83 -0.651 -0.651
87 visual =~ x9 29.44 -1.156 -0.662
135 x8 ~~ x9 12.48 1.043 1.043
85 visual =~ x7 12.04 0.348 0.295
90 textual =~ x3 11.41 0.338 0.280
120 x3 ~~ x9 9.73 -0.446 -0.446
112 x2 ~~ x7 7.30 0.239 0.239
105 x1 ~~ x7 7.03 0.407 0.407
93 textual =~ x9 6.51 -0.314 -0.180
86 visual =~ x8 5.60 0.309 0.239
88 textual =~ x1 5.60 -0.496 -0.273
134 x7 ~~ x9 5.50 0.522 0.522
116 x3 ~~ x5 5.10 0.413 0.413
96 speed =~ x3 4.72 -0.242 -0.201
92 textual =~ x8 4.30 0.158 0.122
Questions
We test invariance across school (Pasteur vs Grant-White).
We keep the same 3-factor model and treat items as ordinal.
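The three fits compared below were presumably specified along these lines; the object names fit_g0, fit_g1, and fit_g2 come from the output, and the group.equal choices are inferred from the task description (configural, equal thresholds, equal thresholds + loadings):

```r
# configural: same structure in both schools, no equality constraints
fit_g0 <- cfa(m3, data = dat_ord, ordered = vars, std.lv = TRUE,
              parameterization = "theta", group = "school")

# equal thresholds across schools
fit_g1 <- cfa(m3, data = dat_ord, ordered = vars, std.lv = TRUE,
              parameterization = "theta", group = "school",
              group.equal = "thresholds")

# equal thresholds + loadings
fit_g2 <- cfa(m3, data = dat_ord, ordered = vars, std.lv = TRUE,
              parameterization = "theta", group = "school",
              group.equal = c("thresholds", "loadings"))

lavTestLRT(fit_g0, fit_g1, fit_g2)
```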
Scaled Chi-Squared Difference Test (method = "satorra.2000")

lavaan->lavTestLRT():
   lavaan NOTE: The "Chisq" column contains standard test statistics, not the
   robust test that should be reported per model. A robust difference test is
   a function of two standard (not robust) statistics.

       Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)    
fit_g0 48          75.7                                  
fit_g1 72         143.7       93.7      24    3.4e-10 ***
fit_g2 78         144.6        1.0       6       0.98    
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation tasks
Across school: compare the configural model, the equal-thresholds model, and the equal-thresholds-plus-loadings model. Which level of invariance do the difference tests above support?