Lab 10 — Multilevel CFA (clustered / repeated data) in lavaan

Author

Tommaso Feraco

Goals

By the end of this lab, you should be able to:

  • Recognize when non-independence (clusters / repeated measures) matters
  • Decompose variability into within vs between components (conceptually + via lavaan)
  • Fit a two-level CFA in lavaan using level: blocks and cluster=
  • Read level-specific fit, especially srmr_within and srmr_between
  • Diagnose (and think through) Heywood cases / negative level-2 variances
  • (Optional) test cross-level invariance of loadings (and partial invariance)

Setup

Packages

Show code
library(lavaan)
library(semTools)

Helper: discretize to Likert

Show code
cut_likert <- function(x, k = 5) {
  # Discretize a continuous variable into k ordered categories,
  # using equally spaced quantiles as cut points
  qs <- quantile(x, probs = seq(0, 1, length.out = k + 1), na.rm = TRUE)
  # unique() guards against tied quantiles; with heavy ties this can
  # yield fewer than k categories
  as.integer(cut(x, breaks = unique(qs), include.lowest = TRUE, labels = FALSE))
}

Exercise 1 — Simulate clustered (ESM-like) ordinal items

We simulate:

  • 139 persons (clusters)
  • 21 observations per person
  • 4 ordinal items measuring a latent factor (Task Demand)

This is not “true” ordinal data generation (threshold model), but it is adequate for practicing MCFA workflow.

Show code
# Note: no RNG seed is set, so your simulated values (and the example
# output shown below) will differ across runs
n_id <- 139
n_rep <- 21
N <- n_id * n_rep
ID <- rep(sprintf("S%03d", 1:n_id), each = n_rep)

# Between-person factor (trait-like)
eta_b <- rnorm(n_id, 0, 0.7)
eta_b_long <- rep(eta_b, each = n_rep)

# Within-person factor (state-like)
eta_w <- rnorm(N, 0, 1.0)

lambda <- c(0.80, 0.70, 0.75, 0.65)
eps_sd <- c(0.70, 0.80, 0.75, 0.85)

# Continuous responses (before discretizing to Likert)
y_cont <- sapply(seq_along(lambda), function(j) {
  lambda[j] * (eta_b_long + eta_w) + rnorm(N, 0, eps_sd[j])
})

ESMdata <- data.frame(
  ID = ID,
  d1 = cut_likert(y_cont[,1]),
  d2 = cut_likert(y_cont[,2]),
  d3 = cut_likert(y_cont[,3]),
  d4 = cut_likert(y_cont[,4])
)

head(ESMdata, 8)
    ID d1 d2 d3 d4
1 S001  1  1  1  2
2 S001  3  5  5  4
3 S001  2  1  2  1
4 S001  1  2  1  1
5 S001  1  1  1  2
6 S001  1  1  1  2
7 S001  4  4  1  2
8 S001  1  1  1  1

Your task

  1. Check the marginal distributions of d1–d4 (are all categories used?).
  2. Compute the number of clusters and average cluster size.
Show code
# Write your code here
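If you get stuck, here is one possible sketch (it assumes the ESMdata object simulated above):

```r
# Marginal distributions: are all five categories used for each item?
lapply(ESMdata[, c("d1", "d2", "d3", "d4")], table)

# Number of clusters and average cluster size
cluster_sizes <- table(ESMdata$ID)
n_clusters <- length(cluster_sizes)  # 139 persons
avg_size   <- mean(cluster_sizes)    # 21 occasions (balanced design)
c(n_clusters = n_clusters, avg_size = avg_size)
```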

Exercise 2 — Fit a two-level CFA (MCFA)

We specify a one-factor model at both levels:

  • Level 1: TD_w
  • Level 2: TD_b
Show code
m_td <- '
  level: 1
    TD_w =~ d1 + d2 + d3 + d4
  level: 2
    TD_b =~ d1 + d2 + d3 + d4
'

Your task

  1. Fit the model with estimator = "MLR" and cluster = "ID".
  2. Extract these fit measures: rmsea.robust, cfi.robust, tli.robust, srmr_within, srmr_between.
  3. Interpret what it would mean if srmr_between were much worse than srmr_within.
Show code
# Write your code here
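One possible solution sketch; the fit-measure names follow lavaan's fitMeasures() output for two-level models:

```r
library(lavaan)

# Two-level CFA: 'cluster' identifies persons, 'level:' blocks in m_td
# define the within and between measurement models
fit_mcfa <- sem(m_td, data = ESMdata, cluster = "ID", estimator = "MLR")

fitMeasures(fit_mcfa,
            c("rmsea.robust", "cfi.robust", "tli.robust",
              "srmr_within", "srmr_between"))

# If srmr_between is much worse than srmr_within, the within-person
# structure is reproduced well but the between-person covariance
# structure is not (e.g., Level-2 residual covariances or a different
# factor structure at Level 2).
```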

Exercise 3 — Compare single-level vs multilevel thinking

A common (bad) habit is to ignore clustering and just fit a single-level CFA.

Your task

  1. Fit a single-level CFA to the same ordinal indicators (no cluster=).
  2. Compare the story you would tell about fit vs the two-level fit.
  3. (Conceptual) What is the risk of relying on the single-level fit here?
Show code
# Write your code here
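A possible sketch (single-level model name m_single is ours, not part of the lab):

```r
library(lavaan)

# Single-level CFA that (wrongly) treats all N = 139 * 21 rows as independent
m_single <- 'TD =~ d1 + d2 + d3 + d4'
fit_single <- cfa(m_single, data = ESMdata, estimator = "MLR")

fitMeasures(fit_single,
            c("rmsea.robust", "cfi.robust", "tli.robust", "srmr"))

# Risk: the single-level fit blends the within- and between-person
# covariance structures into one matrix. It can look acceptable overall
# while hiding misfit at one level, and it ignores the non-independence
# of the 21 repeated rows per person.
```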

Exercise 4 — Level-specific parameters (loadings)

Your task

  1. Extract standardized loadings at Level 1 and Level 2.
  2. Do you see stronger loadings at Level 2? Why could that happen?
Show code
# Write your code here
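One way to do this (refitting the model here so the chunk runs on its own):

```r
library(lavaan)

fit_mcfa <- sem(m_td, data = ESMdata, cluster = "ID", estimator = "MLR")

# Standardized loadings; TD_w rows are Level 1, TD_b rows are Level 2
std <- standardizedSolution(fit_mcfa)
std[std$op == "=~", c("lhs", "rhs", "est.std")]

# Level-2 loadings are often larger: occasion-specific noise averages
# out across the 21 repeated measures, so the between-level parts of
# the indicators are more reliable.
```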

Exercise 5 — Stress test: induce Level-2 misfit

We now create a dataset in which residual correlations exist at the between level (not captured by the Level-2 factor).

Show code
# Create a new dataset ESMdata2 with extra between correlations among d1 and d2
eta_b2 <- rnorm(n_id, 0, 0.7)
eta_b2_long <- rep(eta_b2, each = n_rep)

eta_w2 <- rnorm(N, 0, 1.0)

# Add a shared between-only disturbance u_b to items 1 and 2
u_b <- rnorm(n_id, 0, 0.6)
u_b_long <- rep(u_b, each = n_rep)

y2_cont <- sapply(seq_along(lambda), function(j) {
  base <- lambda[j] * (eta_b2_long + eta_w2) + rnorm(N, 0, eps_sd[j])
  if (j %in% c(1,2)) base <- base + u_b_long
  base
})

ESMdata2 <- data.frame(
  ID = ID,
  d1 = cut_likert(y2_cont[,1]),
  d2 = cut_likert(y2_cont[,2]),
  d3 = cut_likert(y2_cont[,3]),
  d4 = cut_likert(y2_cont[,4])
)

Your task

  1. Fit the same two-level CFA on ESMdata2.
  2. Compare srmr_within and srmr_between to the original dataset.
  3. Propose one model modification at Level 2 that could improve srmr_between.
Show code
# Write your code here
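A possible sketch; the modified model string m_td_mod is one candidate fix, matching the shared between-only disturbance we simulated for d1 and d2:

```r
library(lavaan)

fit2 <- sem(m_td, data = ESMdata2, cluster = "ID", estimator = "MLR")
fitMeasures(fit2, c("srmr_within", "srmr_between"))
# Compared with ESMdata, srmr_between should now be noticeably worse

# Targeted modification: a residual covariance between d1 and d2
# at Level 2 only
m_td_mod <- '
  level: 1
    TD_w =~ d1 + d2 + d3 + d4
  level: 2
    TD_b =~ d1 + d2 + d3 + d4
    d1 ~~ d2
'
fit2_mod <- sem(m_td_mod, data = ESMdata2, cluster = "ID", estimator = "MLR")
fitMeasures(fit2_mod, c("srmr_within", "srmr_between"))
```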

Exercise 6 — Heywood cases / negative Level-2 variances (diagnostic mindset)

In MCFA, it is common to encounter:

  • negative residual variances at Level 2
  • extremely large Level-2 loadings
  • warnings about non-positive definite matrices

Your task

  1. Check the estimated variances at Level 2 for ESMdata2.
  2. If any variance is negative or near 0, list two possible explanations (substantive/statistical).
Show code
# Write your code here
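One way to inspect the Level-2 variances (in multilevel models, lavaan's parameter table carries a level column we filter on here):

```r
library(lavaan)

fit2 <- sem(m_td, data = ESMdata2, cluster = "ID", estimator = "MLR")

# Variance/covariance parameters at Level 2 (op == "~~")
pe <- parameterEstimates(fit2)
pe[pe$op == "~~" & pe$level == 2, c("lhs", "rhs", "est", "se")]

# If a Level-2 residual variance is negative or near 0, two readings:
# 1. Substantive: the item has (almost) no stable between-person
#    variance beyond the factor, so its Level-2 residual is truly ~0.
# 2. Statistical: with only 139 clusters, sampling error can push a
#    small variance below zero (a Heywood case); constraining it to 0
#    or simplifying the Level-2 model are common remedies.
```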

Optional Exercise 7 — Cross-level invariance of loadings

We test whether loadings are equal across levels:

  • Configural: same pattern
  • Weak (cross-level): equal loadings across levels
Show code
conf <- '
  level: 1
    TD_w =~ d1 + d2 + d3 + d4
  level: 2
    TD_b =~ d1 + d2 + d3 + d4
'

weak <- '
  level: 1
    TD_w =~ a*d1 + b*d2 + c*d3 + d*d4
  level: 2
    TD_b =~ a*d1 + b*d2 + c*d3 + d*d4
'

Your task

  1. Fit conf and weak on the original dataset (ESMdata) and compare fit (focus on srmr_between).
  2. If the weak model fits worse, inspect modification indices and propose a partial invariance model.
Show code
# Write your code here
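A possible sketch; the partial-invariance string below is one illustration (freeing d2's loading at Level 2 via the hypothetical label b2), not the answer you should expect from the modification indices:

```r
library(lavaan)

fit_conf <- sem(conf, data = ESMdata, cluster = "ID", estimator = "MLR")
fit_weak <- sem(weak, data = ESMdata, cluster = "ID", estimator = "MLR")

# With MLR, lavTestLRT() applies a scaled chi-square difference test
lavTestLRT(fit_conf, fit_weak)
rbind(conf = fitMeasures(fit_conf, c("srmr_within", "srmr_between")),
      weak = fitMeasures(fit_weak, c("srmr_within", "srmr_between")))

# Partial invariance: keep a, c, d equal across levels, free d2's loading
partial <- '
  level: 1
    TD_w =~ a*d1 + b*d2 + c*d3 + d*d4
  level: 2
    TD_b =~ a*d1 + b2*d2 + c*d3 + d*d4
'
fit_partial <- sem(partial, data = ESMdata, cluster = "ID", estimator = "MLR")
```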

Wrap-up

What you should be able to say now

  • “This dataset has a within covariance structure and a between covariance structure.”
  • “My MCFA fits within/between simultaneously; SRMR tells me where misfit is.”
  • “Cross-level invariance connects modeling constraints to construct interpretation.”

Next steps

  • Migrate the key citations into refs/references.bib and cite them in the slides/lab.
  • (If you want a true ordinal data-generating model) simulate via latent response + thresholds and fit with WLSMV.