Goals
In this lab you will learn to:
Extract and report core global fit indices (χ², CFI/TLI, RMSEA+CI, SRMR)
Inspect local misfit using residual matrices (raw + standardized)
Use modification indices (MI) together with EPC / SEPC (not MI alone)
Apply a disciplined respecification protocol: one change at a time, filtered by theory, and documented
Important constraint for this lab:
You are not allowed to add paths “because MI says so”. Each change needs a short substantive rationale.
Setup
library(lavaan)
library(semPlot)
Part A — Generate a dataset with known structure
We simulate data from a “true” model and then fit a misspecified model to create diagnostic signals.
True data-generating process (DGP):
lifeSatisfaction depends on attachment, selfEsteem, parentalSupport, salary
selfEsteem depends on parentalSupport and attachment
attachment covaries with parentalSupport
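The DGP above fully determines the population covariance matrix of the five variables. The base-R sketch below computes it with the standard reduced-form identity Σ = (I − B)⁻¹ Ψ (I − B)⁻ᵀ. Here B holds the path coefficients and Ψ holds the variances and covariances of the exogenous variables and residuals. It assumes that every variance not given in m_true equals 1, which is our understanding of simulateData()'s defaults. Treat that as an assumption rather than documented behaviour.

```r
# Variable order used for both matrices
vars <- c("parentalSupport", "attachment", "salary", "selfEsteem", "lifeSatisfaction")

# B[i, j] = direct effect of the column variable j on the row variable i
B <- matrix(0, 5, 5, dimnames = list(vars, vars))
B["selfEsteem", c("parentalSupport", "attachment")] <- c(0.40, 0.20)
B["lifeSatisfaction", c("attachment", "selfEsteem", "parentalSupport", "salary")] <-
  c(0.05, 0.25, 0.40, 0.30)

# Psi: (co)variances of exogenous variables and residuals
# ASSUMPTION: every variance not specified in m_true is 1
Psi <- diag(5)
dimnames(Psi) <- list(vars, vars)
Psi["attachment", "parentalSupport"] <- Psi["parentalSupport", "attachment"] <- 0.30

# Reduced form y = (I - B)^{-1} zeta, hence Sigma = (I - B)^{-1} Psi (I - B)^{-T}
A <- solve(diag(5) - B)
Sigma <- A %*% Psi %*% t(A)
round(Sigma, 3)
```

With the lab's dat in memory, round(cov(dat[, vars]) - Sigma, 2) should show only sampling noise if the variance assumption holds.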
N <- 483
m_true <- "
lifeSatisfaction ~ .05*attachment + .25*selfEsteem + .40*parentalSupport + .30*salary
selfEsteem ~ .40*parentalSupport + .20*attachment
attachment ~~ .30*parentalSupport
"
dat <- simulateData(m_true, sample.nobs = N, seed = 2026)
summary(dat)
 lifeSatisfaction    selfEsteem         attachment         parentalSupport
 Min.   :-3.7002   Min.   :-3.1965   Min.   :-2.66785   Min.   :-2.9075
 1st Qu.:-0.8613   1st Qu.:-0.7769   1st Qu.:-0.72205   1st Qu.:-0.7266
 Median :-0.0683   Median : 0.0113   Median : 0.00539   Median :-0.0732
 Mean   :-0.0308   Mean   :-0.0184   Mean   :-0.03731   Mean   :-0.0314
 3rd Qu.: 0.7516   3rd Qu.: 0.7091   3rd Qu.: 0.64163   3rd Qu.: 0.6940
 Max.   : 3.4794   Max.   : 3.0854   Max.   : 2.80907   Max.   : 3.1298
     salary
 Min.   :-2.9048
 1st Qu.:-0.6525
 Median :-0.0525
 Mean   :-0.0287
 3rd Qu.: 0.6024
 Max.   : 2.4550
Part B — Fit an intentionally misspecified model
We omit:
some predictors of lifeSatisfaction
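the covariance attachment ~~ parentalSupport (caution: both are exogenous predictors, and under lavaan's default fixed.x = TRUE, sem() still fixes their covariance at its sample value, so leaving this line out creates no misfit; note the 0.000 residual for this pair in Exercise 3)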
m0 <- "
lifeSatisfaction ~ selfEsteem + salary
selfEsteem ~ parentalSupport + attachment
"
fit0 <- sem(m0, data = dat, meanstructure = TRUE)
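Part B says the covariance attachment ~~ parentalSupport is omitted, but lavaan handles exogenous observed predictors specially. This sketch re-creates dat and fit0 so that it runs on its own. It lists the variables lavaan treats as fixed covariates. It then shows that their (co)variances appear in the parameter table with free = 0, meaning they are set to the sample values rather than dropped.

```r
library(lavaan)

# Re-create the lab's data and misspecified model so this block runs on its own
m_true <- "
lifeSatisfaction ~ .05*attachment + .25*selfEsteem + .40*parentalSupport + .30*salary
selfEsteem ~ .40*parentalSupport + .20*attachment
attachment ~~ .30*parentalSupport
"
dat <- simulateData(m_true, sample.nobs = 483, seed = 2026)
m0 <- "
lifeSatisfaction ~ selfEsteem + salary
selfEsteem ~ parentalSupport + attachment
"
fit0 <- sem(m0, data = dat, meanstructure = TRUE)

# Observed variables that lavaan treats as exogenous covariates
lavNames(fit0, type = "ov.x")

# Their (co)variances: exo == 1 marks them, and free == 0 means they are not
# estimated but fixed at the sample values (default fixed.x = TRUE)
pt <- parTable(fit0)
exo_cov <- pt[pt$op == "~~" & pt$exo == 1, c("lhs", "op", "rhs", "free", "est")]
exo_cov
```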
Exercise 2 — Visual sanity check (diagram)
semPaths(fit0, what = "std", layout = "tree", residuals = TRUE,
         nCharNodes = 0, sizeMan = 10)
Question
Which substantive relations are missing relative to the DGP described in Part A?
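After you have answered from the diagram, you can check your answer mechanically. user_paths() below is a hypothetical helper of our own, not part of lavaan. It parses a syntax string with lavaanify() and keeps only the user-written ~ and ~~ lines. setdiff() then returns the relations that appear in the DGP but not in m0.

```r
library(lavaan)

m_true <- "
lifeSatisfaction ~ .05*attachment + .25*selfEsteem + .40*parentalSupport + .30*salary
selfEsteem ~ .40*parentalSupport + .20*attachment
attachment ~~ .30*parentalSupport
"
m0 <- "
lifeSatisfaction ~ selfEsteem + salary
selfEsteem ~ parentalSupport + attachment
"

# Hypothetical helper: user-written regressions/covariances as "lhs op rhs" strings
user_paths <- function(syntax) {
  pt <- lavaanify(syntax)
  pt <- pt[pt$user == 1 & pt$op %in% c("~", "~~"), ]
  paste(pt$lhs, pt$op, pt$rhs)
}

# Relations in the DGP that m0 leaves out
missing_rel <- setdiff(user_paths(m_true), user_paths(m0))
missing_rel
```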
Exercise 3 — Local misfit via residuals
3a) Residual covariances
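# Try the different residual types via the type argument:
# "cor.bentler" (the default), "cor.bollen", "cor", "raw"
res_cov <- lavResiduals(fit0)
# res_cov$cov holds the residual matrix. With the default type = "cor.bentler" it is
# S - Sigma_hat rescaled by the observed standard deviations (a correlation metric);
# lavResiduals(fit0, type = "raw") gives the raw S - Sigma_hat
res_cov$cov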
                 lfStsf slfEst salary prntlS attchm
lifeSatisfaction -0.005
selfEsteem       -0.005  0.000
salary           -0.008 -0.018  0.000
parentalSupport   0.288  0.000  0.000  0.000
attachment        0.076  0.000  0.000  0.000  0.000
Tasks
Identify the 3 largest absolute off-diagonal residuals. (With the default type = "cor.bentler", these are correlation residuals, not z-standardized residuals.)
For each, propose a candidate explanation (omitted path? omitted covariance? shared cause?).
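Ranking residuals by eye becomes error-prone in larger models. top_residuals() is a small hypothetical helper written in base R. It takes a symmetric residual matrix, keeps the strictly lower triangle, and sorts the entries by absolute size. To keep the sketch self-contained, it runs on the values printed in 3a. In the lab you would pass lavResiduals(fit0)$cov instead.

```r
# Hypothetical helper: the k largest off-diagonal residuals by absolute value
top_residuals <- function(R, k = 3) {
  idx <- which(lower.tri(R), arr.ind = TRUE)   # strictly below the diagonal
  out <- data.frame(var1  = rownames(R)[idx[, "row"]],
                    var2  = colnames(R)[idx[, "col"]],
                    resid = R[idx])
  out <- out[order(-abs(out$resid)), ]
  rownames(out) <- NULL
  head(out, k)
}

# Residual matrix from 3a (lower triangle copied from the printed output)
vars <- c("lifeSatisfaction", "selfEsteem", "salary", "parentalSupport", "attachment")
R0 <- matrix(0, 5, 5, dimnames = list(vars, vars))
R0[lower.tri(R0, diag = TRUE)] <- c(-0.005, -0.005, -0.008, 0.288, 0.076,
                                     0.000, -0.018,  0.000, 0.000,
                                     0.000,  0.000,  0.000,
                                     0.000,  0.000,
                                     0.000)
R0[upper.tri(R0)] <- t(R0)[upper.tri(R0)]   # mirror to make it symmetric

top_residuals(R0)
```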
Exercise 4 — Modification indices + EPC/SEPC
Compute MI and inspect the top candidates.
mi <- modificationIndices(fit0, sort. = TRUE)
head(mi[, c("lhs", "op", "rhs", "mi", "epc", "sepc.all")], 12)
                lhs op              rhs     mi    epc sepc.all
19 lifeSatisfaction  ~  parentalSupport 67.193  0.428    0.351
18 lifeSatisfaction ~~       selfEsteem 58.870 -0.764   -0.773
27  parentalSupport  ~ lifeSatisfaction 45.447  0.263    0.321
21       selfEsteem  ~ lifeSatisfaction 41.326 -0.469   -0.545
20 lifeSatisfaction  ~       attachment  4.161  0.099    0.082
23           salary  ~ lifeSatisfaction  0.407 -0.045   -0.059
22       selfEsteem  ~           salary  0.204 -0.021   -0.018
31       attachment  ~ lifeSatisfaction  0.142 -0.014   -0.017
28  parentalSupport  ~       selfEsteem  0.126  0.610    0.640
24           salary  ~       selfEsteem  0.126 -0.013   -0.014
32       attachment  ~       selfEsteem  0.005 -0.895   -0.925
11  parentalSupport ~~       attachment  0.000  0.000       NA
Tasks
Compare your residual-based guesses from Exercise 3 with the MI list. Do they agree?
Pick one candidate modification that is:
theoretically plausible, and
supported by both residual patterns and MI/EPC.
Write a one-sentence justification.
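The goal of reading MI together with EPC/SEPC can be made concrete with a minimal screen, written as a hypothetical helper. It keeps candidates whose MI exceeds the 1-df χ² critical value (3.84 at α = .05) and whose |sepc.all| reaches a minimum size. The 0.10 cutoff is a rule of thumb chosen for illustration, not a lavaan default. The rows below are copied from the MI table above. modificationIndices(fit0) returns the same mi and sepc.all columns, so its output could be passed in directly.

```r
# A subset of the MI output shown above (values copied from the printed table)
mi_tab <- data.frame(
  lhs = c("lifeSatisfaction", "lifeSatisfaction", "parentalSupport", "selfEsteem",
          "lifeSatisfaction", "attachment", "parentalSupport"),
  op  = c("~", "~~", "~", "~", "~", "~", "~~"),
  rhs = c("parentalSupport", "selfEsteem", "lifeSatisfaction", "lifeSatisfaction",
          "attachment", "selfEsteem", "attachment"),
  mi  = c(67.193, 58.870, 45.447, 41.326, 4.161, 0.005, 0.000),
  epc = c(0.428, -0.764, 0.263, -0.469, 0.099, -0.895, 0.000),
  sepc.all = c(0.351, -0.773, 0.321, -0.545, 0.082, -0.925, NA)
)

# Hypothetical helper: statistical screen only; theory still has to choose
screen_mi <- function(mi, alpha = 0.05, sepc_cut = 0.10) {
  mi_cut <- qchisq(1 - alpha, df = 1)   # 3.84 for alpha = .05
  keep <- mi$mi > mi_cut & !is.na(mi$sepc.all) & abs(mi$sepc.all) >= sepc_cut
  mi[keep, ]
}

screen_mi(mi_tab)
```

On these rows the screen keeps four candidates. Two of them reverse the direction of a regression (parentalSupport ~ lifeSatisfaction and selfEsteem ~ lifeSatisfaction), and one is a residual covariance. The screen drops lifeSatisfaction ~ attachment, a path that is in the DGP but has a small coefficient (.05). The row attachment ~ selfEsteem shows why neither number suffices alone: it has a large SEPC (-0.925) but a negligible MI (0.005). A statistical screen narrows the list, but choosing among the survivors is the theory filter's job.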
Exercise 5 — Respecify (one change), refit, and re-check
5a) Create model m1 by adding exactly one theoretically justified parameter
Common modifications include:
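adding a covariance among exogenous variables (~~); caution: under lavaan's default fixed.x = TRUE, covariances among exogenous observed predictors are already fixed at their sample values, so writing one explicitly does not add a free parameter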
adding a missing regression (~)
Create m1 below and refit.
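m1 <- "
lifeSatisfaction ~ selfEsteem + salary
selfEsteem ~ parentalSupport + attachment
# ADD ONE PARAMETER HERE (exactly one line)
attachment ~~ parentalSupport
# OR: lifeSatisfaction ~ parentalSupport
# OR: lifeSatisfaction ~ attachment
"
# Caution: attachment and parentalSupport are exogenous predictors, so under the default
# fixed.x = TRUE their covariance was already fixed at its sample value in fit0.
# Writing it explicitly stops lavaan from treating these two variables as fixed
# covariates; their covariances with salary are then fixed to 0, so df rises (see 5b)
fit1 <- sem(m1, data = dat, meanstructure = TRUE)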
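Because the rule is "exactly one parameter", it helps to check that a new line really frees one df. add_one_line() is a hypothetical helper, not part of lavaan. It refits the model with one extra syntax line and warns if df did not drop by exactly 1. The block re-creates dat so that it runs on its own.

```r
library(lavaan)

m_true <- "
lifeSatisfaction ~ .05*attachment + .25*selfEsteem + .40*parentalSupport + .30*salary
selfEsteem ~ .40*parentalSupport + .20*attachment
attachment ~~ .30*parentalSupport
"
dat <- simulateData(m_true, sample.nobs = 483, seed = 2026)
m0 <- "
lifeSatisfaction ~ selfEsteem + salary
selfEsteem ~ parentalSupport + attachment
"

# Hypothetical helper: add one syntax line, refit, and report how many df it freed
add_one_line <- function(base_syntax, new_line, data) {
  fit_base <- sem(base_syntax, data = data, meanstructure = TRUE)
  fit_new  <- sem(paste(base_syntax, new_line, sep = "\n"),
                  data = data, meanstructure = TRUE)
  df_freed <- as.numeric(fitMeasures(fit_base, "df")) -
              as.numeric(fitMeasures(fit_new, "df"))
  if (df_freed != 1) {
    warning(sprintf("'%s' changed df by %+g, not by -1", new_line, -df_freed))
  }
  list(fit = fit_new, df_freed = df_freed)
}

a <- add_one_line(m0, "lifeSatisfaction ~ attachment", dat)
a$df_freed  # 1: one new free regression coefficient
b <- add_one_line(m0, "attachment ~~ parentalSupport", dat)
b$df_freed
```

According to the df values in 5b, the covariance line moves df from 3 to 5. In that case df_freed is -2 and the helper warns.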
5b) Compare global fit
fitMeasures(fit0, c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))
 chisq     df    cfi    tli  rmsea   srmr
72.623  3.000  0.788  0.505  0.219  0.067
fitMeasures(fit1, c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))
 chisq     df    cfi    tli  rmsea   srmr
72.859  5.000  0.820  0.640  0.168  0.068
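Goal 1 asks for RMSEA with its confidence interval. In lavaan, fitMeasures() returns the bounds as rmsea.ci.lower and rmsea.ci.upper. To show where those numbers come from, here is a base-R sketch. It recomputes the point estimate from the χ², df, and N reported above, then derives a 90% CI by inverting the noncentral χ² distribution in its noncentrality parameter. We assume N (not N − 1) in the denominator. With N = 483, both choices round to the same three decimals here.

```r
# RMSEA point estimate (ASSUMPTION: N, not N - 1, in the denominator)
rmsea_point <- function(chisq, df, n) sqrt(max(chisq - df, 0) / (df * n))

# CI by inverting the noncentral chi-square in its noncentrality parameter (ncp)
rmsea_ci <- function(chisq, df, n, level = 0.90) {
  solve_ncp <- function(p) {
    # pchisq() decreases as ncp grows; find ncp with pchisq(chisq, df, ncp) = p
    f <- function(ncp) pchisq(chisq, df, ncp = ncp) - p
    if (f(0) < 0) return(0)            # no solution above 0: bound truncated at 0
    upper <- max(10 * chisq, 100)      # search range large enough for f(upper) < 0
    uniroot(f, c(0, upper), tol = 1e-10)$root
  }
  alpha <- (1 - level) / 2
  lambda <- c(lower = solve_ncp(1 - alpha), upper = solve_ncp(alpha))
  sqrt(lambda / (df * n))
}

# chi-square and df as printed above; N = 483 from Part A
fits <- data.frame(model = c("fit0", "fit1"), chisq = c(72.623, 72.859), df = c(3, 5))
fits$rmsea <- mapply(rmsea_point, fits$chisq, fits$df, MoreArgs = list(n = 483))
fits
rmsea_ci(72.623, 3, 483)
```

The sketch makes one feature of 5b explicit. χ² barely changed (72.6 vs. 72.9), yet RMSEA fell from .219 to .168 because df in the denominator grew from 3 to 5.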
5c) χ² difference test (nested models)
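lavTestLRT(fit0, fit1)  # anova(fit0, fit1) prints the same table
Chi-Squared Difference Test

     Df  AIC  BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit0  3 2746 2779  72.6
fit1  5 5476 5530  72.9      0.237     0       2       0.89
Note that fit1 has more degrees of freedom than fit0 (5 vs. 3), so the added line constrained the model rather than freeing a parameter. The jump in AIC and BIC (2746 to 5476) does not mean fit1 is worse: the two log-likelihoods are not on the same footing, because fit1 also models the distribution of attachment and parentalSupport, which fit0 treats as fixed covariates.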
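The difference test in 5c needs only each model's χ² and df. This base-R sketch uses the rounded values printed in 5b, which is why the difference comes out as 0.236 rather than lavaan's 0.237. The naive test assumes ML estimation, normally distributed data, and nested models.

```r
# chi-square and df as printed in 5b (rounded)
chisq <- c(fit0 = 72.623, fit1 = 72.859)
dfs   <- c(fit0 = 3, fit1 = 5)

# fit1 has more df, so it is the more constrained of the two models
d_chisq <- unname(chisq["fit1"] - chisq["fit0"])
d_df    <- unname(dfs["fit1"] - dfs["fit0"])
p_value <- pchisq(d_chisq, df = d_df, lower.tail = FALSE)

round(c(chisq_diff = d_chisq, df_diff = d_df, p = p_value), 3)
```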
5d) Re-check local misfit
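# residuals() defaults to type = "raw" (S - Sigma_hat on the covariance scale), so these
# values are not on the same metric as the correlation residuals in 3a;
# lavResiduals(fit1) gives the comparable matrix
res1_raw <- residuals(fit1)$cov
res1_raw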
                 lfStsf slfEst salary prntlS attchm
lifeSatisfaction -0.003
selfEsteem       -0.003  0.000
salary           -0.005 -0.009  0.000
parentalSupport   0.364  0.000  0.021  0.000
attachment        0.098  0.000  0.006  0.000  0.000
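residuals() returns raw residuals by default, while 3a printed correlation residuals from lavResiduals(). The two matrices are therefore not directly comparable. This sketch re-creates dat, fit0, and fit1 so that it runs on its own. It puts both fits on the default cor.bentler metric and prints the change in each residual.

```r
library(lavaan)

# Re-create the data and both fits so this block runs on its own
m_true <- "
lifeSatisfaction ~ .05*attachment + .25*selfEsteem + .40*parentalSupport + .30*salary
selfEsteem ~ .40*parentalSupport + .20*attachment
attachment ~~ .30*parentalSupport
"
dat <- simulateData(m_true, sample.nobs = 483, seed = 2026)
m0 <- "
lifeSatisfaction ~ selfEsteem + salary
selfEsteem ~ parentalSupport + attachment
"
m1 <- paste(m0, "attachment ~~ parentalSupport", sep = "\n")
fit0 <- sem(m0, data = dat, meanstructure = TRUE)
fit1 <- sem(m1, data = dat, meanstructure = TRUE)

# Correlation residuals on the same ("cor.bentler") metric for both fits
r0 <- unclass(lavResiduals(fit0, type = "cor.bentler")$cov)
r1 <- unclass(lavResiduals(fit1, type = "cor.bentler")$cov)
r1 <- r1[rownames(r0), colnames(r0)]   # align variable order before subtracting

# Change in each residual from fit0 to fit1
round(r1 - r0, 3)
```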
Interpretation questions
Did the single change reduce the largest residuals you identified?
Did it improve the global indices meaningfully?
Is the change defensible theoretically, or did you “chase fit”?
Exercise 6 — Iterate once (optional, but recommended)
Repeat Exercise 5 one more time:
create m2 by adding one additional parameter (only one) to m1
refit and compare fit1 vs fit2
m2 <- "
# paste m1 and add one more line
"
# fit2 <- sem(m2, data = dat, meanstructure = TRUE)
Task
Document your respecification path:
m0 → m1: what changed and why
m1 → m2: what changed and why
what diagnostics supported each change (residuals? MI+EPC?)
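One way to keep the respecification path documented as you go is a small table with one row per change. log_step() is a hypothetical helper that appends a row and refuses an empty rationale. That check mirrors this lab's rule that no path is added "because MI says so". The entries shown are placeholders for your own.

```r
# Hypothetical helper: append one documented change; an empty rationale is an error
log_step <- function(log, from, to, change, rationale, evidence) {
  if (!nzchar(trimws(rationale))) {
    stop("every change needs a short substantive rationale")
  }
  rbind(log, data.frame(from = from, to = to, change = change,
                        rationale = rationale, evidence = evidence))
}

resp_log <- data.frame(from = character(), to = character(), change = character(),
                       rationale = character(), evidence = character())

# Placeholders: replace with your own m0 -> m1 and m1 -> m2 entries
resp_log <- log_step(resp_log, "m0", "m1",
                     change    = "<the one syntax line you added>",
                     rationale = "<one-sentence substantive reason>",
                     evidence  = "<residuals and MI/EPC/SEPC that pointed here>")
resp_log
```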
Part C — A “truth check”: fit the generating model
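Fit the model that generated the data. You cannot do this with real data, but it calibrates your intuition. Note that the numeric premultipliers in m_true (e.g., .25*selfEsteem) fix those parameters at their population values. fit_true therefore estimates no path coefficients at all: only variances and means (intercepts) are free, which is why it has df = 10.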
fit_true <- sem(m_true, data = dat, meanstructure = TRUE)
fitMeasures(fit_true, c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))
chisq    df   cfi   tli rmsea  srmr
 7.76 10.00  1.00  1.01  0.00  0.03
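Because fit_true fixes every coefficient, it asks whether the population values reproduce this sample, not whether the structure is right. To ask the second question, fit the same structure with free coefficients. This sketch re-creates dat so that it runs on its own. It drops the attachment ~~ parentalSupport line, since under fixed.x = TRUE lavaan already fixes that covariance at its sample value. By our count only 1 df remains, so this test has very little power.

```r
library(lavaan)

m_true <- "
lifeSatisfaction ~ .05*attachment + .25*selfEsteem + .40*parentalSupport + .30*salary
selfEsteem ~ .40*parentalSupport + .20*attachment
attachment ~~ .30*parentalSupport
"
dat <- simulateData(m_true, sample.nobs = 483, seed = 2026)

# Same structure as m_true, but every coefficient is free (no numeric premultipliers)
m_true_free <- "
lifeSatisfaction ~ attachment + selfEsteem + parentalSupport + salary
selfEsteem ~ parentalSupport + attachment
"
fit_free <- sem(m_true_free, data = dat, meanstructure = TRUE)
fitMeasures(fit_free, c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))
```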
Questions
Does the true model always have “perfect fit”? Why or why not?
What does this tell you about interpreting fit indices?
Wrap-up
What you should take away
Global fit summarizes mismatch; it doesn’t locate problems.
Local diagnostics (residuals + MI/EPC) tell you where mismatch lives.
Respecification must be theory-filtered, done one step at a time, and reported transparently.
What’s next
Next we shift to measurement models (CFA), where local diagnostics include:
cross-loadings (often fixed to 0)
correlated errors
weak items / low loadings
Solutions (instructor version)