Lab 03 — Fit & diagnostics: residuals, MI, and disciplined respecification

Author

Tommaso Feraco

Goals

In this lab you will learn to:

  • Extract and report core global fit indices (χ², CFI/TLI, RMSEA+CI, SRMR)
  • Inspect local misfit using residual matrices (raw + standardized)
  • Use modification indices (MI) together with EPC / SEPC (not MI alone)
  • Apply a disciplined respecification protocol: one change at a time, theory filter, document

Important constraint for this lab:
You are not allowed to add paths “because MI says so”. Each change needs a short substantive rationale.


Setup

library(lavaan)
library(semPlot)

Part A — Generate a dataset with known structure

We simulate data from a “true” model and then fit a misspecified model to create diagnostic signals.

True data-generating process (DGP):

  • lifeSatisfaction depends on attachment, selfEsteem, parentalSupport, salary
  • selfEsteem depends on parentalSupport and attachment
  • attachment covaries with parentalSupport
N <- 483

m_true <- "
  lifeSatisfaction ~ .05*attachment + .25*selfEsteem + .40*parentalSupport + .30*salary
  selfEsteem       ~ .40*parentalSupport + .20*attachment
  attachment ~~ .30*parentalSupport
"

dat <- simulateData(m_true, sample.nobs = N, seed = 2026)
summary(dat)
 lifeSatisfaction    selfEsteem        attachment       parentalSupport  
 Min.   :-3.7002   Min.   :-3.1965   Min.   :-2.66785   Min.   :-2.9075  
 1st Qu.:-0.8613   1st Qu.:-0.7769   1st Qu.:-0.72205   1st Qu.:-0.7266  
 Median :-0.0683   Median : 0.0113   Median : 0.00539   Median :-0.0732  
 Mean   :-0.0308   Mean   :-0.0184   Mean   :-0.03731   Mean   :-0.0314  
 3rd Qu.: 0.7516   3rd Qu.: 0.7091   3rd Qu.: 0.64163   3rd Qu.: 0.6940  
 Max.   : 3.4794   Max.   : 3.0854   Max.   : 2.80907   Max.   : 3.1298  
     salary       
 Min.   :-2.9048  
 1st Qu.:-0.6525  
 Median :-0.0525  
 Mean   :-0.0287  
 3rd Qu.: 0.6024  
 Max.   : 2.4550  
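
A quick way to confirm that the simulated data carry the intended structure is to estimate each structural equation with ordinary regression. The sketch below re-creates `dat` so it runs on its own; the object names `b_ls` and `b_se` are illustrative and not part of the lab. It assumes each equation's residual is independent of its predictors, which holds for this recursive DGP.

```r
library(lavaan)

# Re-create the Part A data so this block is self-contained
m_true <- "
  lifeSatisfaction ~ .05*attachment + .25*selfEsteem + .40*parentalSupport + .30*salary
  selfEsteem       ~ .40*parentalSupport + .20*attachment
  attachment ~~ .30*parentalSupport
"
dat <- simulateData(m_true, sample.nobs = 483, seed = 2026)

# Each structural equation is a linear regression, so lm() estimates
# should land near the generating coefficients (up to sampling error)
b_ls <- coef(lm(lifeSatisfaction ~ attachment + selfEsteem + parentalSupport + salary,
                data = dat))
b_se <- coef(lm(selfEsteem ~ parentalSupport + attachment, data = dat))

# Side-by-side comparison: estimate vs. generating value (intercepts dropped)
round(cbind(estimate = b_ls[-1], generating = c(.05, .25, .40, .30)), 2)
round(cbind(estimate = b_se[-1], generating = c(.40, .20)), 2)

# The generating covariance between the two exogenous predictors was .30
cov(dat$attachment, dat$parentalSupport)
```

Estimates will not equal the generating values exactly; with N = 483 some sampling error is expected.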

Part B — Fit an intentionally misspecified model

We omit:

  • some predictors of lifeSatisfaction
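  • the explicit covariance attachment ~~ parentalSupport (note: under sem()'s default fixed.x = TRUE, covariances among exogenous observed predictors are fixed to their sample values, so leaving this line out does not constrain the covariance to zero)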
m0 <- "
  lifeSatisfaction ~ selfEsteem + salary
  selfEsteem       ~ parentalSupport + attachment
"
fit0 <- sem(m0, data = dat, meanstructure = TRUE)

Exercise 1 — Global fit (extract + interpret)

1a) Extract key indices

fitMeasures(
  fit0,
  c("npar","chisq","df","pvalue","cfi","tli",
    "rmsea","rmsea.ci.lower","rmsea.ci.upper","srmr")
)
          npar          chisq             df         pvalue            cfi 
         8.000         72.623          3.000          0.000          0.788 
           tli          rmsea rmsea.ci.lower rmsea.ci.upper           srmr 
         0.505          0.219          0.177          0.264          0.067 
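
To see where the RMSEA value comes from, it can be recomputed by hand from χ², df, and N. This is a minimal sketch of the usual point-estimate formula, using the numbers printed above; it assumes the N-based version of the formula (some software divides by N − 1, which makes no visible difference here).

```r
# Values copied from the fitMeasures() output above
chisq <- 72.623
df    <- 3
N     <- 483   # sample size from Part A

# RMSEA point estimate: misfit in excess of df, per degree of freedom and per case
rmsea <- sqrt(max(chisq - df, 0) / (df * N))
round(rmsea, 3)

# Chi-square relative to its degrees of freedom
chisq / df
```

The rounded result agrees with the 0.219 reported by lavaan. Note how strongly the small df (3) drives the value: the same excess χ² spread over more degrees of freedom would give a much smaller RMSEA.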

1b) Interpret

Answer in 3–5 sentences:

  1. Does the model seem globally plausible?
  2. Which indices are most informative here and why?
  3. If χ² is significant, what are two non-mutually-exclusive reasons?

Exercise 2 — Visual sanity check (diagram)

semPaths(fit0, what = "std", layout = "tree", residuals = TRUE,
         nCharNodes = 0, sizeMan = 10)

Question

  • Which substantive relations are missing relative to the true DGP described in Part A?

Exercise 3 — Local misfit via residuals

3a) Residual covariances

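# Try the different residual types via the `type` argument:
# "raw", "cor" (same as "cor.bollen"), "cor.bentler"
res_cov <- lavResiduals(fit0)  # default type is "cor.bentler"
# res_cov$cov holds observed minus model-implied values on the chosen scale
# (with type = "raw" this is S - Sigma_hat); res_cov$cov.z holds the z-statistics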
res_cov$cov
                 lfStsf slfEst salary prntlS attchm
lifeSatisfaction -0.005                            
selfEsteem       -0.005  0.000                     
salary           -0.008 -0.018  0.000              
parentalSupport   0.288  0.000  0.000  0.000       
attachment        0.076  0.000  0.000  0.000  0.000

Tasks

  1. Identify the three standardized residual covariances with the largest absolute values (see res_cov$cov.z).
  2. For each, propose a candidate explanation (omitted path? omitted covariance? shared cause?).
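
One possible way to carry out the ranking in task 1 is sketched below. It re-creates `dat` and `fit0` so that it runs on its own; the helper names `z`, `idx`, and `ranked` are illustrative and not part of the lab.

```r
library(lavaan)

# Re-create the Part A data and the Part B model so this block is self-contained
m_true <- "
  lifeSatisfaction ~ .05*attachment + .25*selfEsteem + .40*parentalSupport + .30*salary
  selfEsteem       ~ .40*parentalSupport + .20*attachment
  attachment ~~ .30*parentalSupport
"
dat  <- simulateData(m_true, sample.nobs = 483, seed = 2026)
fit0 <- sem("
  lifeSatisfaction ~ selfEsteem + salary
  selfEsteem       ~ parentalSupport + attachment
", data = dat, meanstructure = TRUE)

# Standardized residuals (z-statistics) for the covariance part of the model
z <- unclass(lavResiduals(fit0)$cov.z)

# Keep each variable pair once: lower triangle, diagonal excluded
idx <- which(lower.tri(z), arr.ind = TRUE)
ranked <- data.frame(
  var1 = rownames(z)[idx[, "row"]],
  var2 = colnames(z)[idx[, "col"]],
  z    = z[idx]
)

# Largest absolute z first; pairs without a usable z-statistic sort last
ranked <- ranked[order(-abs(ranked$z)), ]
head(ranked, 3)
```

The ranking only tells you where the misfit is concentrated. The candidate explanation in task 2 still has to come from theory.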

Exercise 4 — Modification indices + EPC/SEPC

Compute MI and inspect the top candidates.

mi <- modificationIndices(fit0, sort. = TRUE)
head(mi[, c("lhs","op","rhs","mi","epc","sepc.all")], 12)
                lhs op              rhs     mi    epc sepc.all
19 lifeSatisfaction  ~  parentalSupport 67.193  0.428    0.351
18 lifeSatisfaction ~~       selfEsteem 58.870 -0.764   -0.773
27  parentalSupport  ~ lifeSatisfaction 45.447  0.263    0.321
21       selfEsteem  ~ lifeSatisfaction 41.326 -0.469   -0.545
20 lifeSatisfaction  ~       attachment  4.161  0.099    0.082
23           salary  ~ lifeSatisfaction  0.407 -0.045   -0.059
22       selfEsteem  ~           salary  0.204 -0.021   -0.018
31       attachment  ~ lifeSatisfaction  0.142 -0.014   -0.017
28  parentalSupport  ~       selfEsteem  0.126  0.610    0.640
24           salary  ~       selfEsteem  0.126 -0.013   -0.014
32       attachment  ~       selfEsteem  0.005 -0.895   -0.925
11  parentalSupport ~~       attachment  0.000  0.000       NA

Tasks

  1. Compare your residual-based guesses from Exercise 3 with the MI list. Do they agree?
  2. Pick one candidate modification that is:
    • theoretically plausible, and
    • supported by both residual patterns and MI/EPC.

Write a one-sentence justification.
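
To see why MI should not be read alone, it helps to put a few rows of the table next to the 1-df χ² critical value. The sketch below copies three MI/EPC pairs from the output above; the data frame `cand` is an illustrative helper, and treating an MI as an approximate 1-df χ² is the usual large-sample reading.

```r
# MI and EPC values copied from the modification-index table above
cand <- data.frame(
  path = c("lifeSatisfaction ~ parentalSupport",
           "lifeSatisfaction ~ attachment",
           "selfEsteem ~ salary"),
  mi   = c(67.193, 4.161, 0.204),
  epc  = c(0.428, 0.099, -0.021)
)

# An MI approximates the drop in model chi-square if that one parameter were freed,
# so it can be compared with a chi-square distribution on 1 df
crit <- qchisq(0.95, df = 1)
cand$p_value <- pchisq(cand$mi, df = 1, lower.tail = FALSE)
cand$exceeds <- cand$mi > crit

cand
```

The second row is the instructive one: its MI exceeds the critical value, yet the EPC is only about 0.10. A "significant" MI with a small expected change is weak grounds for adding a path unless theory supports it.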


Exercise 5 — Respecify (one change), refit, and re-check

5a) Create model m1 by adding exactly one theoretically justified parameter

Common modifications include:

  • adding a covariance among exogenous variables (~~)
  • adding a missing regression (~)

Create m1 below and refit.

m1 <- "
  lifeSatisfaction ~ selfEsteem + salary
  selfEsteem       ~ parentalSupport + attachment

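  # ADD ONE PARAMETER HERE (exactly one line)
  # Caution: naming a covariance between exogenous predictors can make lavaan
  # stop treating them as fixed.x, which changes df; check df after refitting.
  attachment ~~ parentalSupport
  # OR: lifeSatisfaction ~ parentalSupport
  # OR: lifeSatisfaction ~ attachment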
"
fit1 <- sem(m1, data = dat, meanstructure = TRUE)

5b) Compare global fit

fitMeasures(fit0, c("chisq","df","cfi","tli","rmsea","srmr"))
 chisq     df    cfi    tli  rmsea   srmr 
72.623  3.000  0.788  0.505  0.219  0.067 
fitMeasures(fit1, c("chisq","df","cfi","tli","rmsea","srmr"))
 chisq     df    cfi    tli  rmsea   srmr 
72.859  5.000  0.820  0.640  0.168  0.068 

5c) χ² difference test (nested models)

anova(fit0, fit1)

Chi-Squared Difference Test

     Df  AIC  BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit0  3 2746 2779  72.6                                    
fit1  5 5476 5530  72.9      0.237     0       2       0.89
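
The difference test can be reproduced by hand from the two χ² values and their df. The sketch below uses the rounded values printed in 5b, so the difference may differ from the table in the third decimal. Note that in this output fit1 has more df than fit0 (5 vs 3), so the test treats fit1 as the more restricted model.

```r
# Values copied from the fitMeasures() output in 5b
chisq0 <- 72.623; df0 <- 3   # fit0
chisq1 <- 72.859; df1 <- 5   # fit1

# Difference in chi-square and in degrees of freedom
d_chisq <- chisq1 - chisq0
d_df    <- df1 - df0

# Upper-tail p-value of the difference on d_df degrees of freedom
p_diff <- pchisq(d_chisq, df = d_df, lower.tail = FALSE)
round(c(d_chisq = d_chisq, d_df = d_df, p = p_diff), 3)
```

Always check the sign of the df difference before interpreting the test: "adding a parameter" should lower df, and when it does not, something else in the specification has changed.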

5d) Re-check local misfit

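# residuals() defaults to type = "raw", so this is S - Sigma_hat (not standardized)
# and is not on the same scale as the lavResiduals() output in Exercise 3
res1_raw <- residuals(fit1)$cov
res1_raw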
                 lfStsf slfEst salary prntlS attchm
lifeSatisfaction -0.003                            
selfEsteem       -0.003  0.000                     
salary           -0.005 -0.009  0.000              
parentalSupport   0.364  0.000  0.021  0.000       
attachment        0.098  0.000  0.006  0.000  0.000

Interpretation questions

  1. Did the single change reduce the largest residuals you identified?
  2. Did it improve the global indices meaningfully?
  3. Is the change defensible theoretically, or did you “chase fit”?

Part C — A “truth check”: fit the generating model

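Fit the model that generated the data. Because m_true carries numeric prefixes (e.g. .40*parentalSupport), lavaan fixes those coefficients at their generating values rather than estimating them. This is not something you can do in real life, but it calibrates your intuition.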

fit_true <- sem(m_true, data = dat, meanstructure = TRUE)
fitMeasures(fit_true, c("chisq","df","cfi","tli","rmsea","srmr"))
chisq    df   cfi   tli rmsea  srmr 
 7.76 10.00  1.00  1.01  0.00  0.03 

Questions

  1. Does the true model always have “perfect fit”? Why or why not?
  2. What does this tell you about interpreting fit indices?
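
A small simulation can help with these questions. The sketch below assumes the standard large-sample result that, for a correctly specified model with normal data, the test statistic follows a χ² distribution with the model's df. The seed and the number of draws are arbitrary choices made for illustration.

```r
set.seed(1)      # arbitrary seed for this illustration
df_true <- 10    # df of fit_true in the output above
N       <- 483   # sample size from Part A

# Chi-square statistics expected across many samples when the model is correct
chisq_draws <- rchisq(10000, df = df_true)

# Share of samples in which a correct model is still rejected at alpha = .05
reject_rate <- mean(chisq_draws > qchisq(0.95, df = df_true))

# Share of samples with RMSEA > 0, i.e. chi-square larger than its df
rmsea_draws   <- sqrt(pmax(chisq_draws - df_true, 0) / (df_true * N))
nonzero_rmsea <- mean(rmsea_draws > 0)

c(reject_rate = reject_rate, nonzero_rmsea = nonzero_rmsea)

# Where the observed statistic (7.76) falls in that distribution
pchisq(7.76, df = df_true, lower.tail = FALSE)
```

Even a correct model is rejected in roughly 5% of samples at α = .05, and a sizeable share of samples yields a nonzero RMSEA. Fit indices are sample statistics, not verdicts.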

Wrap-up

What you should take away

  • Global fit summarizes mismatch; it doesn’t locate problems.
  • Local diagnostics (residuals + MI/EPC) tell you where mismatch lives.
  • Respecification must be theory-filtered, done one step at a time, and reported transparently.

What’s next

  • Next we shift to measurement models (CFA), where local diagnostics include:
    • cross-loadings (often fixed to 0)
    • correlated errors
    • weak items / low loadings

Solutions (instructor version)