Lab 10 — Multilevel CFA (clustered / repeated data) in lavaan

Author

Tommaso Feraco

Goals

By the end of this lab, you should be able to:

  • Recognize when non-independence (clusters / repeated measures) matters
  • Decompose variability into within vs between components (conceptually + via lavaan)
  • Fit a two-level CFA in lavaan using level: blocks and cluster=
  • Read level-specific fit, especially srmr_within and srmr_between
  • Diagnose (and think through) Heywood cases / negative level-2 variances
  • (Optional) test cross-level invariance of loadings (and partial invariance)

Setup

Packages

Show code
library(lavaan)
library(semTools)

Helper: discretize to Likert

Show code
cut_likert <- function(x, k = 5) {
  # Discretize a continuous variable into k ordered categories,
  # using equally spaced quantiles as cut points
  qs <- quantile(x, probs = seq(0, 1, length.out = k + 1), na.rm = TRUE)
  # unique() guards against tied quantiles; with heavy ties this can
  # yield fewer than k categories
  as.integer(cut(x, breaks = unique(qs), include.lowest = TRUE, labels = FALSE))
}

Exercise 1 — Simulate clustered (ESM-like) ordinal items

We simulate:

  • 139 persons (clusters)
  • 21 observations per person
  • 4 ordinal items measuring a latent factor (Task Demand)

This is not “true” ordinal data generation (threshold model), but it is adequate for practicing MCFA workflow.

Show code
# Note: no RNG seed is set, so your simulated values (and the example
# output shown below) will differ across runs
n_id <- 139
n_rep <- 21
N <- n_id * n_rep
ID <- rep(sprintf("S%03d", 1:n_id), each = n_rep)

# Between-person factor (trait-like)
eta_b <- rnorm(n_id, 0, 0.7)
eta_b_long <- rep(eta_b, each = n_rep)

# Within-person factor (state-like)
eta_w <- rnorm(N, 0, 1.0)

lambda <- c(0.80, 0.70, 0.75, 0.65)
eps_sd <- c(0.70, 0.80, 0.75, 0.85)

# Continuous responses (before discretizing to Likert)
y_cont <- sapply(seq_along(lambda), function(j) {
  lambda[j] * (eta_b_long + eta_w) + rnorm(N, 0, eps_sd[j])
})

ESMdata <- data.frame(
  ID = ID,
  d1 = cut_likert(y_cont[,1]),
  d2 = cut_likert(y_cont[,2]),
  d3 = cut_likert(y_cont[,3]),
  d4 = cut_likert(y_cont[,4])
)

head(ESMdata, 8)
    ID d1 d2 d3 d4
1 S001  1  1  1  2
2 S001  3  5  5  4
3 S001  2  1  2  1
4 S001  1  2  1  1
5 S001  1  1  1  2
6 S001  1  1  1  2
7 S001  4  4  1  2
8 S001  1  1  1  1

Your task

  1. Check the marginal distributions of d1–d4 (are all categories used?).
  2. Compute the number of clusters and average cluster size.
Show code
# Write your code here
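If you get stuck, here is one possible sketch (it assumes the ESMdata object simulated above):

```r
# Marginal distributions: are all five categories used for each item?
lapply(ESMdata[, c("d1", "d2", "d3", "d4")], table)

# Number of clusters and average cluster size
cluster_sizes <- table(ESMdata$ID)
n_clusters <- length(cluster_sizes)  # 139 persons
avg_size   <- mean(cluster_sizes)    # 21 occasions (balanced design)
c(n_clusters = n_clusters, avg_size = avg_size)
```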

Exercise 2 — Fit a two-level CFA (MCFA)

We specify a one-factor model at both levels:

  • Level 1: TD_w
  • Level 2: TD_b
Show code
m_td <- '
  level: 1
    TD_w =~ d1 + d2 + d3 + d4
  level: 2
    TD_b =~ d1 + d2 + d3 + d4
'

Your task

  1. Fit the model with estimator = "MLR" and cluster = "ID".
  2. Extract these fit measures: rmsea.robust, cfi.robust, tli.robust, srmr_within, srmr_between.
  3. Interpret what it would mean if srmr_between were much worse than srmr_within.
Show code
# Write your code here
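One possible solution sketch; the fit-measure names follow lavaan's fitMeasures() output for two-level models:

```r
library(lavaan)

# Two-level CFA: 'cluster' identifies persons, 'level:' blocks in m_td
# define the within and between measurement models
fit_mcfa <- sem(m_td, data = ESMdata, cluster = "ID", estimator = "MLR")

fitMeasures(fit_mcfa,
            c("rmsea.robust", "cfi.robust", "tli.robust",
              "srmr_within", "srmr_between"))

# If srmr_between is much worse than srmr_within, the within-person
# structure is reproduced well but the between-person covariance
# structure is not (e.g., Level-2 residual covariances or a different
# factor structure at Level 2).
```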

Exercise 3 — Compare single-level vs multilevel thinking

A common (bad) habit is to ignore clustering and just fit a single-level CFA.

Your task

  1. Fit a single-level CFA to the same ordinal indicators (no cluster=).
  2. Compare the story you would tell about fit vs the two-level fit.
  3. (Conceptual) What is the risk of relying on the single-level fit here?
Show code
# Write your code here
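A possible sketch (single-level model name m_single is ours, not part of the lab):

```r
library(lavaan)

# Single-level CFA that (wrongly) treats all N = 139 * 21 rows as independent
m_single <- 'TD =~ d1 + d2 + d3 + d4'
fit_single <- cfa(m_single, data = ESMdata, estimator = "MLR")

fitMeasures(fit_single,
            c("rmsea.robust", "cfi.robust", "tli.robust", "srmr"))

# Risk: the single-level fit blends the within- and between-person
# covariance structures into one matrix. It can look acceptable overall
# while hiding misfit at one level, and it ignores the non-independence
# of the 21 repeated rows per person.
```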

Exercise 4 — Level-specific parameters (loadings)

Your task

  1. Extract standardized loadings at Level 1 and Level 2.
  2. Do you see stronger loadings at Level 2? Why could that happen?
Show code
# Write your code here
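One way to do this (refitting the model here so the chunk runs on its own):

```r
library(lavaan)

fit_mcfa <- sem(m_td, data = ESMdata, cluster = "ID", estimator = "MLR")

# Standardized loadings; TD_w rows are Level 1, TD_b rows are Level 2
std <- standardizedSolution(fit_mcfa)
std[std$op == "=~", c("lhs", "rhs", "est.std")]

# Level-2 loadings are often larger: occasion-specific noise averages
# out across the 21 repeated measures, so the between-level parts of
# the indicators are more reliable.
```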

Exercise 5 — Stress test: induce Level-2 misfit

We now create a dataset in which residual correlations exist at the between level (not captured by the Level-2 factor).

Show code
# Create a new dataset ESMdata2 with extra between correlations among d1 and d2
eta_b2 <- rnorm(n_id, 0, 0.7)
eta_b2_long <- rep(eta_b2, each = n_rep)

eta_w2 <- rnorm(N, 0, 1.0)

# Add a shared between-only disturbance u_b to items 1 and 2
u_b <- rnorm(n_id, 0, 0.6)
u_b_long <- rep(u_b, each = n_rep)

y2_cont <- sapply(seq_along(lambda), function(j) {
  base <- lambda[j] * (eta_b2_long + eta_w2) + rnorm(N, 0, eps_sd[j])
  if (j %in% c(1,2)) base <- base + u_b_long
  base
})

ESMdata2 <- data.frame(
  ID = ID,
  d1 = cut_likert(y2_cont[,1]),
  d2 = cut_likert(y2_cont[,2]),
  d3 = cut_likert(y2_cont[,3]),
  d4 = cut_likert(y2_cont[,4])
)

Your task

  1. Fit the same two-level CFA on ESMdata2.
  2. Compare srmr_within and srmr_between to the original dataset.
  3. Propose one model modification at Level 2 that could improve srmr_between.
Show code
# Write your code here
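A possible sketch; the modified model string m_td_mod is one candidate fix, matching the shared between-only disturbance we simulated for d1 and d2:

```r
library(lavaan)

fit2 <- sem(m_td, data = ESMdata2, cluster = "ID", estimator = "MLR")
fitMeasures(fit2, c("srmr_within", "srmr_between"))
# Compared with ESMdata, srmr_between should now be noticeably worse

# Targeted modification: a residual covariance between d1 and d2
# at Level 2 only
m_td_mod <- '
  level: 1
    TD_w =~ d1 + d2 + d3 + d4
  level: 2
    TD_b =~ d1 + d2 + d3 + d4
    d1 ~~ d2
'
fit2_mod <- sem(m_td_mod, data = ESMdata2, cluster = "ID", estimator = "MLR")
fitMeasures(fit2_mod, c("srmr_within", "srmr_between"))
```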

Exercise 6 — Heywood cases / negative Level-2 variances (diagnostic mindset)

In MCFA, it is common to encounter:

  • negative residual variances at Level 2
  • extremely large Level-2 loadings
  • warnings about non-positive definite matrices

Your task

  1. Check the estimated variances at Level 2 for ESMdata2.
  2. If any variance is negative or near 0, list two possible explanations (substantive/statistical).
Show code
# Write your code here
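One way to inspect the Level-2 variances (in multilevel models, lavaan's parameter table carries a level column we filter on here):

```r
library(lavaan)

fit2 <- sem(m_td, data = ESMdata2, cluster = "ID", estimator = "MLR")

# Variance/covariance parameters at Level 2 (op == "~~")
pe <- parameterEstimates(fit2)
pe[pe$op == "~~" & pe$level == 2, c("lhs", "rhs", "est", "se")]

# If a Level-2 residual variance is negative or near 0, two readings:
# 1. Substantive: the item has (almost) no stable between-person
#    variance beyond the factor, so its Level-2 residual is truly ~0.
# 2. Statistical: with only 139 clusters, sampling error can push a
#    small variance below zero (a Heywood case); constraining it to 0
#    or simplifying the Level-2 model are common remedies.
```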

Optional Exercise 7 — Cross-level invariance of loadings

We test whether loadings are equal across levels:

  • Configural: same pattern
  • Weak (cross-level): equal loadings across levels
Show code
conf <- '
  level: 1
    TD_w =~ d1 + d2 + d3 + d4
  level: 2
    TD_b =~ d1 + d2 + d3 + d4
'

weak <- '
  level: 1
    TD_w =~ a*d1 + b*d2 + c*d3 + d*d4
  level: 2
    TD_b =~ a*d1 + b*d2 + c*d3 + d*d4
'

Your task

  1. Fit conf and weak on the original dataset (ESMdata) and compare fit (focus on srmr_between).
  2. If the weak model fits worse, inspect modification indices and propose a partial invariance model.
Show code
# Write your code here
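A possible sketch; the partial-invariance string below is one illustration (freeing d2's loading at Level 2 via the hypothetical label b2), not the answer you should expect from the modification indices:

```r
library(lavaan)

fit_conf <- sem(conf, data = ESMdata, cluster = "ID", estimator = "MLR")
fit_weak <- sem(weak, data = ESMdata, cluster = "ID", estimator = "MLR")

# With MLR, lavTestLRT() applies a scaled chi-square difference test
lavTestLRT(fit_conf, fit_weak)
rbind(conf = fitMeasures(fit_conf, c("srmr_within", "srmr_between")),
      weak = fitMeasures(fit_weak, c("srmr_within", "srmr_between")))

# Partial invariance: keep a, c, d equal across levels, free d2's loading
partial <- '
  level: 1
    TD_w =~ a*d1 + b*d2 + c*d3 + d*d4
  level: 2
    TD_b =~ a*d1 + b2*d2 + c*d3 + d*d4
'
fit_partial <- sem(partial, data = ESMdata, cluster = "ID", estimator = "MLR")
```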

Wrap-up

What you should be able to say now

  • “This dataset has a within covariance structure and a between covariance structure.”
  • “My MCFA fits within/between simultaneously; SRMR tells me where misfit is.”
  • “Cross-level invariance connects modeling constraints to construct interpretation.”

Next steps

  • Migrate the key citations into refs/references.bib and cite them in the slides/lab.
  • (If you want a true ordinal data-generating model) simulate via latent response + thresholds and fit with WLSMV.