Today: observed-variable path models and mediation (direct, indirect, total effects) + equivalence/pitfalls. Next (03): model fit & diagnostics (global vs local, residuals, MI, disciplined respecification).
Learning objectives
By the end of this session you should be able to:
Write a path model as a system of regressions
Define direct, indirect, and total effects (and compute them in lavaan)
Explain why indirect effects have non-normal sampling distributions
Recognize just-identified path models (why “fit” can be uninformative)
Explain (and fear, a little) model equivalence and causal interpretation limits
Path analysis
A path model is a set of linear regressions estimated jointly, with an explicit covariance structure and with at least one variable acting as a mediator.
As usual, drawing the model is the best way to understand it.
Actually, this looks like a full SEM. Why? And why isn’t it, given the way latent variables should be represented?
From “effects” to equations
A path diagram is shorthand for a system like:
\[
\begin{aligned}
M &= i_M + aX + \varepsilon_M\\
Y &= i_Y + c'X + bM + \varepsilon_Y
\end{aligned}
\]
\(X\): exogenous (predictor)
\(M\): mediator (endogenous)
\(Y\): outcome (endogenous)
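As a minimal sketch of the "system of regressions" reading, the two equations can be fitted one at a time with ordinary least squares on simulated data. The sample size, the seed, and the true coefficient values (a = 0.5, b = 0.4, c' = 0.2) are illustrative assumptions, not values from the course data.

```r
# Simulated data for illustration only; all numeric values are assumptions.
set.seed(123)
n <- 5000
X <- rnorm(n)
M <- 1.0 + 0.5 * X + rnorm(n)            # M = i_M + a X + e_M
Y <- 2.0 + 0.2 * X + 0.4 * M + rnorm(n)  # Y = i_Y + c' X + b M + e_Y
dat_sim <- data.frame(X, M, Y)

# One regression per endogenous variable
fit_M <- lm(M ~ X, data = dat_sim)
fit_Y <- lm(Y ~ X + M, data = dat_sim)

a_hat      <- unname(coef(fit_M)["X"])
cprime_hat <- unname(coef(fit_Y)["X"])
b_hat      <- unname(coef(fit_Y)["M"])

round(c(a = a_hat, b = b_hat, cprime = cprime_hat), 3)
```

Each endogenous variable gets its own equation; a path model estimates the same equations jointly rather than one by one.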
Mediation: three effects
\[
\text{Indirect} = ab \qquad
\text{Direct} = c' \qquad
\text{Total} = c = c' + ab
\]
Interpretation (linear, continuous case):
\(a\): expected change in \(M\) per unit change in \(X\)
\(b\): expected change in \(Y\) per unit change in \(M\) (holding \(X\) fixed)
\(c'\): expected change in \(Y\) per unit change in \(X\), holding \(M\) fixed (the part of the effect not transmitted through \(M\))
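To see where the decomposition of the total effect comes from in the linear case, substitute the equation for \(M\) into the equation for \(Y\):

\[
\begin{aligned}
Y &= i_Y + c'X + b\,(i_M + aX + \varepsilon_M) + \varepsilon_Y\\
  &= (i_Y + b\,i_M) + (c' + ab)\,X + (b\,\varepsilon_M + \varepsilon_Y)
\end{aligned}
\]

The coefficient on \(X\) in this reduced form is the total effect \(c = c' + ab\). The step relies on linearity and on the absence of an \(X\)-by-\(M\) interaction; it does not generally carry over to nonlinear models.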
Diagram: mediation template
“Indirect effect” is not automatically “causal mediation”. Causal language requires assumptions!
Why SEM for mediation?
SEM makes it easy to:
estimate the system jointly (including residual covariance if justified)
compute functions of parameters (e.g., \(ab\), totals, contrasts)
add covariates, multiple mediators, constraints, and (later) measurement models
Indirect effects are just products
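The indirect effect is a product, \(ab\). Even if \(\hat a\) and \(\hat b\) are each approximately normal, their product generally is not: its sampling distribution is typically skewed, most visibly in small samples or when the effects are small relative to their standard errors.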
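A small simulation can make the non-normality visible. The sketch below draws \(\hat a\) and \(\hat b\) from independent normal distributions (an idealization; the means and standard deviations are arbitrary assumptions) and compares the skewness of each estimate with the skewness of their product.

```r
# Illustration only: the means and SDs below are arbitrary assumptions.
set.seed(2024)
n_draws <- 100000
a_draws <- rnorm(n_draws, mean = 0.3, sd = 0.15)  # stand-in for the sampling distribution of a-hat
b_draws <- rnorm(n_draws, mean = 0.3, sd = 0.15)  # stand-in for the sampling distribution of b-hat
ab_draws <- a_draws * b_draws

# Sample skewness (third standardized moment); about 0 for a symmetric distribution
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

skew_a  <- skewness(a_draws)
skew_ab <- skewness(ab_draws)
round(c(skew_a = skew_a, skew_ab = skew_ab), 2)

# hist(ab_draws, breaks = 100)  # uncomment to see the asymmetry
```

Under these assumed values the skewness of the product is clearly positive, while that of \(\hat a\) alone stays close to zero.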
A common large-sample approximation (delta method):
\[
\mathrm{Var}(\widehat{ab}) \approx
b^2\mathrm{Var}(\hat a) + a^2\mathrm{Var}(\hat b) + 2ab\,\mathrm{Cov}(\hat a,\hat b)
\]
Sobel test (same idea, historically popular):
\[
z = \frac{\widehat{ab}}{\sqrt{\widehat{\mathrm{Var}}(\widehat{ab})}}
\]
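A minimal sketch of the delta-method variance and the resulting z statistic, using two separate lm() fits on simulated data. The data-generating values are assumptions, and the covariance term is set to zero here because the two coefficients come from separately fitted regressions; a jointly fitted model could supply an estimated covariance instead.

```r
# Simulated data for illustration only; all numeric values are assumptions.
set.seed(42)
n <- 1000
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.2 * X + 0.4 * M + rnorm(n)

fit_M <- lm(M ~ X)
fit_Y <- lm(Y ~ X + M)

a_hat  <- unname(coef(fit_M)["X"])
b_hat  <- unname(coef(fit_Y)["M"])
var_a  <- vcov(fit_M)["X", "X"]
var_b  <- vcov(fit_Y)["M", "M"]
cov_ab <- 0  # assumption: a-hat and b-hat are treated as uncorrelated

# Delta-method variance of the product
ab_hat <- a_hat * b_hat
var_ab <- b_hat^2 * var_a + a_hat^2 * var_b + 2 * a_hat * b_hat * cov_ab
se_ab  <- sqrt(var_ab)

# Sobel-type z statistic and two-sided p value from the normal approximation
z_sobel <- ab_hat / se_ab
p_sobel <- 2 * pnorm(-abs(z_sobel))

round(c(ab = ab_hat, se = se_ab, z = z_sobel, p = p_sobel), 4)
```

The p value inherits the normal approximation, which is the weak point of this approach when the product is visibly skewed.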
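In practice, bootstrap confidence intervals are often preferred for indirect effects (especially with small to moderate \(N\)), because they do not assume a normal sampling distribution for \(\widehat{ab}\).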
lavaan: mediation as a model + defined parameters
You already know ~ and ~~. The new piece is:
labels (a*X) and defined parameters (ind := a*b)
library(lavaan)

mod_med <- '
  # structural regressions
  M ~ a*X
  Y ~ cprime*X + b*M

  # (optional) exogenous variance
  X ~~ X

  # defined effects
  ind := a*b
  tot := cprime + (a*b)
'

fit <- sem(mod_med, data = dat, meanstructure = TRUE)
summary(fit, standardized = TRUE)
Note
In lavaan model syntax, * does not create an interaction term; it attaches a label (or a fixed value) to a parameter.
Bootstrap the indirect effect
fit_b <- sem(
  mod_med, data = dat,
  se = "bootstrap", bootstrap = 2000,
  meanstructure = TRUE
)

parameterEstimates(fit_b, ci = TRUE, level = .95, standardized = TRUE) |>
  subset(op %in% c("~", ":="))
Interpretation:
focus on the \(ab\) estimate and its CI
bootstrap CI is not magic; it’s still conditional on model assumptions
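To make the resampling idea concrete, here is a hand-rolled percentile bootstrap for \(ab\) in base R. It is a sketch of the general technique on simulated data (the sample size, coefficients, and number of resamples are assumptions), not a description of how lavaan implements its bootstrap internally.

```r
# Simulated data for illustration only; all numeric values are assumptions.
set.seed(7)
n <- 200
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)
Y <- 0.2 * X + 0.4 * M + rnorm(n)
dat_sim <- data.frame(X, M, Y)

# Indirect effect from two OLS fits on a given data set
indirect <- function(d) {
  a <- coef(lm(M ~ X, data = d))[["X"]]
  b <- coef(lm(Y ~ X + M, data = d))[["M"]]
  a * b
}

ab_hat <- indirect(dat_sim)

# Resample rows with replacement and re-estimate ab each time
n_boot <- 1000
ab_boot <- replicate(n_boot, {
  idx <- sample(nrow(dat_sim), replace = TRUE)
  indirect(dat_sim[idx, ])
})

# Percentile interval: empirical 2.5% and 97.5% quantiles of the bootstrap estimates
ci_perc <- quantile(ab_boot, probs = c(0.025, 0.975))
round(c(estimate = ab_hat, ci_perc), 3)
```

The interval is read directly from the quantiles of the resampled estimates, so it can be asymmetric around the point estimate, unlike a symmetric estimate ± 1.96 SE interval.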
A richer example: multiple indirect paths
Sometimes the theory implies more than one mediated route.
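One way to express two parallel mediated routes in lavaan syntax is sketched below. The variable names (M1, M2), the labels, the simulated data, and the choice to let the mediator residuals covary are all illustrative assumptions rather than a model taken from the course materials.

```r
library(lavaan)

# Simulated data for illustration only; all numeric values are assumptions.
set.seed(99)
n  <- 1000
X  <- rnorm(n)
M1 <- 0.5 * X + rnorm(n)
M2 <- 0.3 * X + rnorm(n)
Y  <- 0.2 * X + 0.4 * M1 + 0.3 * M2 + rnorm(n)
dat_sim <- data.frame(X, M1, M2, Y)

mod_par <- '
  # structural regressions
  M1 ~ a1*X
  M2 ~ a2*X
  Y  ~ cprime*X + b1*M1 + b2*M2

  # residual covariance between the mediators (include only if justified)
  M1 ~~ M2

  # defined effects
  ind1    := a1*b1
  ind2    := a2*b2
  ind_tot := a1*b1 + a2*b2
  ind_dif := a1*b1 - a2*b2
  tot     := cprime + a1*b1 + a2*b2
'

fit_par <- sem(mod_par, data = dat_sim)
pe <- parameterEstimates(fit_par)
subset(pe, op == ":=")
```

With the residual covariance included, this model is just-identified (0 degrees of freedom), so global fit statistics carry no information about it. For inference on the defined effects, the same se = "bootstrap" argument can be passed to sem().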
A paper, again Rohrer et al. (2022): That’s a Lot to Process! Pitfalls of Popular Path Models.
References
Rohrer, J. M., Hünermund, P., Arslan, R. C., & Elson, M. (2022). That’s a Lot to Process! Pitfalls of Popular Path Models. Advances in Methods and Practices in Psychological Science, 5(2), 25152459221095827. https://doi.org/10.1177/25152459221095827