library(lavaan)Lab 01 — lavaan basics with observed variables
Goals
By the end of this lab, you can:
- Fit observed-variable models in lavaan (no latent variables yet)
- Express (multi)regression models using lavaan syntax (
~,~~) - Extract and interpret key output (estimates, SE, standardized estimates)
- Test a simple equality constraint using nested model comparison
Setup
Packages
Data
We will use the built-in dataset mtcars (motor trends cars; a classic toy dataset).
Show code
dat <- mtcars
# Keep only variables we use (and make names explicit)
dat <- dat[, c("mpg", "qsec", "wt", "hp")]
# Quick look
summary(dat) mpg qsec wt hp
Min. :10.40 Min. :14.50 Min. :1.513 Min. : 52.0
1st Qu.:15.43 1st Qu.:16.89 1st Qu.:2.581 1st Qu.: 96.5
Median :19.20 Median :17.71 Median :3.325 Median :123.0
Mean :20.09 Mean :17.85 Mean :3.217 Mean :146.7
3rd Qu.:22.80 3rd Qu.:18.90 3rd Qu.:3.610 3rd Qu.:180.0
Max. :33.90 Max. :22.90 Max. :5.424 Max. :335.0
Exercise 1 — A regression in lm()
Fit a standard multiple regression predicting miles per gallon (mpg) from:
- car weight (
wt) - horsepower (
hp)
Show code
fit_lm <- lm(mpg ~ wt + hp, data = dat)
summary(fit_lm)
Call:
lm(formula = mpg ~ wt + hp, data = dat)
Residuals:
Min 1Q Median 3Q Max
-3.941 -1.600 -0.182 1.050 5.854
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.22727 1.59879 23.285 < 2e-16 ***
wt -3.87783 0.63273 -6.129 1.12e-06 ***
hp -0.03177 0.00903 -3.519 0.00145 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared: 0.8268, Adjusted R-squared: 0.8148
F-statistic: 69.21 on 2 and 29 DF, p-value: 9.109e-12
Questions
- What is the direction of the effect of
wtonmpg? - What is the direction of the effect of
hponmpg? - Roughly, how much variance is explained?
Exercise 2 — The same regression in lavaan
In lavaan, you should write the model as a string, preferably in an object and not within the formula.
Show code
mod_sem <- '
mpg ~ wt + hp
'
fit_sem <- sem(mod_sem, data = dat, meanstructure = TRUE)
summary(fit_sem, standardized = TRUE)lavaan 0.6-19 ended normally after 1 iteration
Estimator ML
Optimization method NLMINB
Number of model parameters 4
Number of observations 32
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
mpg ~
wt -3.878 0.602 -6.438 0.000 -3.878 -0.630
hp -0.032 0.009 -3.696 0.000 -0.032 -0.361
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.mpg 37.227 1.522 24.459 0.000 37.227 6.276
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.mpg 6.095 1.524 4.000 0.000 6.095 0.173
Tasks
- Find the regression coefficients for
wtandhp. Do they matchlm()? - Find the intercept for
mpg(lavaan reports it explicitly whenmeanstructure = TRUE). - Find the residual variance of
mpg.
Extracting parameters programmatically
Use the function parameterEstimates() to obtain the estimates. How do you extract standardized estimate?
Show code
pe <- parameterEstimates(fit_sem, standardized = TRUE)
head(pe, 10) lhs op rhs est se z pvalue ci.lower ci.upper std.lv std.all
1 mpg ~ wt -3.878 0.602 -6.438 0 -5.058 -2.697 -3.878 -0.630
2 mpg ~ hp -0.032 0.009 -3.696 0 -0.049 -0.015 -0.032 -0.361
3 mpg ~~ mpg 6.095 1.524 4.000 0 3.109 9.082 6.095 0.173
4 wt ~~ wt 0.927 0.000 NA NA 0.927 0.927 0.927 1.000
5 wt ~~ hp 42.812 0.000 NA NA 42.812 42.812 42.812 0.659
6 hp ~~ hp 4553.965 0.000 NA NA 4553.965 4553.965 4553.965 1.000
7 mpg ~1 37.227 1.522 24.459 0 34.244 40.210 37.227 6.276
8 wt ~1 3.217 0.000 NA NA 3.217 3.217 3.217 3.341
9 hp ~1 146.688 0.000 NA NA 146.688 146.688 146.688 2.174
std.nox
1 -0.654
2 -0.005
3 0.173
4 0.927
5 42.812
6 4553.965
7 6.276
8 3.217
9 146.688
optells you the operator (~regression,~~variance/covariance,~1intercept)estis the estimateseis the standard errorzandpvalueare the usual Wald test outputsstd.allis the fully standardized estimate
Exercise 3 — Multivariate regression (two outcomes)
Now predict two outcomes from the same predictors:
mpg ~ wt + hpqsec ~ wt + hp
And allow:
- the predictors to covary (
wt ~~ hp) - the residuals of the two outcomes to covary (
mpg ~~ qsec)
Adding covariances requires additional degrees of freedom!
Show code
mod_mv <- '
# regressions (two outcomes)
mpg ~ wt + hp
qsec ~ wt + hp
# covariance among predictors (exogenous variables)
wt ~~ hp
# residual covariance among outcomes (endogenous disturbances)
mpg ~~ qsec
'
fit_mv <- sem(mod_mv, data = dat, meanstructure = TRUE)
summary(fit_mv, standardized = TRUE)lavaan 0.6-19 ended normally after 43 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 14
Number of observations 32
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
mpg ~
wt -3.878 0.602 -6.438 0.000 -3.878 -0.630
hp -0.032 0.009 -3.696 0.000 -0.032 -0.361
qsec ~
wt 0.942 0.253 3.720 0.000 0.942 0.516
hp -0.027 0.004 -7.560 0.000 -0.027 -1.048
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
wt ~~
hp 42.812 13.757 3.112 0.002 42.812 0.659
.mpg ~~
.qsec 0.550 0.463 1.187 0.235 0.550 0.215
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.mpg 37.227 1.522 24.459 0.000 37.227 6.276
.qsec 18.826 0.640 29.433 0.000 18.826 10.704
wt 3.217 0.170 18.898 0.000 3.217 3.341
hp 146.688 11.929 12.296 0.000 146.688 2.174
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.mpg 6.095 1.524 4.000 0.000 6.095 0.173
.qsec 1.076 0.269 4.000 0.000 1.076 0.348
wt 0.927 0.232 4.000 0.000 0.927 1.000
hp 4553.965 1138.491 4.000 0.000 4553.965 1.000
Questions
- Why is
wt ~~ hpa reasonable parameter here? - What does
mpg ~~ qsecrepresent in this model (hint: both have predictors)? - Compare the regression estimates for
mpgwith Exercise 2. Do they change? Why/why not?
Compare to two separate regressions
Show code
fit_lm_mpg <- lm(mpg ~ wt + hp, data = dat)
fit_lm_qsec <- lm(qsec ~ wt + hp, data = dat)
summary(fit_lm_mpg)$coef Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.22727012 1.59878754 23.284689 2.565459e-20
wt -3.87783074 0.63273349 -6.128695 1.119647e-06
hp -0.03177295 0.00902971 -3.518712 1.451229e-03
Show code
summary(fit_lm_qsec)$coef Estimate Std. Error t value Pr(>|t|)
(Intercept) 18.82558525 0.671867025 28.019808 1.493575e-22
wt 0.94153237 0.265896975 3.540967 1.368578e-03
hp -0.02730962 0.003794603 -7.196964 6.362107e-08
Exercise 4 — A simple equality constraint
Test whether the effect of wt is the same for both outcomes (mpg and qsec).
4a) Unconstrained model
(You already fit it as fit_mv.)
4b) Constrained model
Label the two wt slopes with the same name (e.g., b_wt):
Show code
mod_mv_equal <- '
mpg ~ b_wt*wt + hp
qsec ~ b_wt*wt + hp
wt ~~ hp
mpg ~~ qsec
'
fit_mv_equal <- sem(mod_mv_equal, data = dat, meanstructure = TRUE)
summary(fit_mv_equal, standardized = TRUE)lavaan 0.6-19 ended normally after 64 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 14
Number of equality constraints 1
Number of observations 32
Model Test User Model:
Test statistic 35.243
Degrees of freedom 1
P-value (Chi-square) 0.000
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
mpg ~
wt (b_wt) 0.524 0.248 2.114 0.035 0.524 0.082
hp -0.073 0.011 -6.762 0.000 -0.073 -0.805
qsec ~
wt (b_wt) 0.524 0.248 2.114 0.035 0.524 0.298
hp -0.023 0.004 -6.378 0.000 -0.023 -0.932
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
wt ~~
hp 42.812 8.643 4.953 0.000 42.812 0.659
.mpg ~~
.qsec -0.416 0.774 -0.537 0.591 -0.416 -0.095
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.mpg 29.136 1.766 16.502 0.000 29.136 4.751
.qsec 19.594 0.645 30.368 0.000 19.594 11.579
wt 3.217 0.170 18.898 0.000 3.217 3.341
hp 146.687 11.929 12.296 0.000 146.687 2.174
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.mpg 16.266 4.066 4.000 0.000 16.266 0.432
.qsec 1.168 0.292 4.000 0.000 1.168 0.408
wt 0.927 0.209 4.440 0.000 0.927 1.000
hp 4553.971 0.081 56034.919 0.000 4553.971 1.000
4c) Compare nested models
Show code
anova(fit_mv, fit_mv_equal)
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit_mv 0 698.87 719.40 0.000
fit_mv_equal 1 732.12 751.17 35.243 35.243 1.0345 1 2.911e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation questions
- If the constrained model fits significantly worse, what does that imply?
- If there is little/no difference, what does that imply?
- Which model would you prefer and why (think theory + parsimony)?
Exercise 5 — Reading model structure (sanity checks)
Use lavInspect() to view the sample covariances and the model-implied covariances.
Show code
S <- lavInspect(fit_mv, "sampstat")$cov
Sigma_hat <- fitted(fit_mv)$cov
S mpg qsec wt hp
mpg 35.189
qsec 4.368 3.093
wt -4.957 -0.296 0.927
hp -310.709 -84.059 42.812 4553.965
Show code
Sigma_hat mpg qsec wt hp
mpg 35.189
qsec 4.368 3.093
wt -4.957 -0.296 0.927
hp -310.709 -84.059 42.812 4553.965
Questions
- Are the matrices the same? Should they be?
- Which parts of the model influence the off-diagonal elements of
Sigma_hat?
Wrap-up
What you should take away
- lavaan models are written with a small “grammar”:
~,~~,~1(and later=~) - Even with manifest variables only, SEM lets you model systems of equations and covariances
- Equality constraints are easy: reuse parameter labels
What’s next
- In the next lecture we move from “multivariate regression” to path models and mediation
- You will reuse the same syntax you practiced today (plus parameter labels for effects)