Lab 01 — lavaan basics with observed variables

Author

Tommaso Feraco

Goals

By the end of this lab, you can:

  • Fit observed-variable models in lavaan (no latent variables yet)
  • Express (multi)regression models using lavaan syntax (~, ~~)
  • Extract and interpret key output (estimates, SE, standardized estimates)
  • Test a simple equality constraint using nested model comparison

Setup

Packages

library(lavaan)

Data

We will use the built-in dataset mtcars (motor trends cars; a classic toy dataset).

Show code
dat <- mtcars

# Keep only variables we use (and make names explicit)
dat <- dat[, c("mpg", "qsec", "wt", "hp")]

# Quick look
summary(dat)
      mpg             qsec             wt              hp       
 Min.   :10.40   Min.   :14.50   Min.   :1.513   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:16.89   1st Qu.:2.581   1st Qu.: 96.5  
 Median :19.20   Median :17.71   Median :3.325   Median :123.0  
 Mean   :20.09   Mean   :17.85   Mean   :3.217   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:18.90   3rd Qu.:3.610   3rd Qu.:180.0  
 Max.   :33.90   Max.   :22.90   Max.   :5.424   Max.   :335.0  

Why this dataset? It’s small, clean, and lets us practice the SEM workflow with manifest variables only.

Exercise 1 — A regression in lm()

Fit a standard multiple regression predicting miles per gallon (mpg) from:

  • car weight (wt)
  • horsepower (hp)
Show code
fit_lm <- lm(mpg ~ wt + hp, data = dat)
summary(fit_lm)

Call:
lm(formula = mpg ~ wt + hp, data = dat)

Residuals:
   Min     1Q Median     3Q    Max 
-3.941 -1.600 -0.182  1.050  5.854 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
hp          -0.03177    0.00903  -3.519  0.00145 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,    Adjusted R-squared:  0.8148 
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

Questions

  1. What is the direction of the effect of wt on mpg?
  2. What is the direction of the effect of hp on mpg?
  3. Roughly, how much variance is explained?

Keep this as your “reference”: we’ll reproduce the same model in lavaan.

Exercise 2 — The same regression in lavaan

In lavaan, you should write the model as a string, preferably in an object and not within the formula.

Show code
mod_sem <- '
  mpg ~ wt + hp
'

fit_sem <- sem(mod_sem, data = dat, meanstructure = TRUE)

summary(fit_sem, standardized = TRUE)
lavaan 0.6-19 ended normally after 1 iteration

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         4

  Number of observations                            32

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  mpg ~                                                                 
    wt               -3.878    0.602   -6.438    0.000   -3.878   -0.630
    hp               -0.032    0.009   -3.696    0.000   -0.032   -0.361

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .mpg              37.227    1.522   24.459    0.000   37.227    6.276

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .mpg               6.095    1.524    4.000    0.000    6.095    0.173

Tasks

  1. Find the regression coefficients for wt and hp. Do they match lm()?
  2. Find the intercept for mpg (lavaan reports it explicitly when meanstructure = TRUE).
  3. Find the residual variance of mpg.

Extracting parameters programmatically

Use the function parameterEstimates() to obtain the estimates. How do you extract standardized estimate?

Show code
pe <- parameterEstimates(fit_sem, standardized = TRUE)
head(pe, 10)
  lhs op rhs      est    se      z pvalue ci.lower ci.upper   std.lv std.all
1 mpg  ~  wt   -3.878 0.602 -6.438      0   -5.058   -2.697   -3.878  -0.630
2 mpg  ~  hp   -0.032 0.009 -3.696      0   -0.049   -0.015   -0.032  -0.361
3 mpg ~~ mpg    6.095 1.524  4.000      0    3.109    9.082    6.095   0.173
4  wt ~~  wt    0.927 0.000     NA     NA    0.927    0.927    0.927   1.000
5  wt ~~  hp   42.812 0.000     NA     NA   42.812   42.812   42.812   0.659
6  hp ~~  hp 4553.965 0.000     NA     NA 4553.965 4553.965 4553.965   1.000
7 mpg ~1       37.227 1.522 24.459      0   34.244   40.210   37.227   6.276
8  wt ~1        3.217 0.000     NA     NA    3.217    3.217    3.217   3.341
9  hp ~1      146.688 0.000     NA     NA  146.688  146.688  146.688   2.174
   std.nox
1   -0.654
2   -0.005
3    0.173
4    0.927
5   42.812
6 4553.965
7    6.276
8    3.217
9  146.688
  • op tells you the operator (~ regression, ~~ variance/covariance, ~1 intercept)
  • est is the estimate
  • se is the standard error
  • z and pvalue are the usual Wald test outputs
  • std.all is the fully standardized estimate

At this stage, focus on estimates only. We’ll talk about modeling strategy and diagnostics later.

Check-in: You just used lavaan as a “regression engine”. SEM becomes interesting when we connect multiple equations and allow covariances explicitly.

Exercise 3 — Multivariate regression (two outcomes)

Now predict two outcomes from the same predictors:

  • mpg ~ wt + hp
  • qsec ~ wt + hp

And allow:

  • the predictors to covary (wt ~~ hp)
  • the residuals of the two outcomes to covary (mpg ~~ qsec)

Adding covariances requires additional degrees of freedom!

Show code
mod_mv <- '
  # regressions (two outcomes)
  mpg  ~ wt + hp
  qsec ~ wt + hp

  # covariance among predictors (exogenous variables)
  wt ~~ hp

  # residual covariance among outcomes (endogenous disturbances)
  mpg ~~ qsec
'

fit_mv <- sem(mod_mv, data = dat, meanstructure = TRUE)
summary(fit_mv, standardized = TRUE)
lavaan 0.6-19 ended normally after 43 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        14

  Number of observations                            32

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  mpg ~                                                                 
    wt               -3.878    0.602   -6.438    0.000   -3.878   -0.630
    hp               -0.032    0.009   -3.696    0.000   -0.032   -0.361
  qsec ~                                                                
    wt                0.942    0.253    3.720    0.000    0.942    0.516
    hp               -0.027    0.004   -7.560    0.000   -0.027   -1.048

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  wt ~~                                                                 
    hp               42.812   13.757    3.112    0.002   42.812    0.659
 .mpg ~~                                                                
   .qsec              0.550    0.463    1.187    0.235    0.550    0.215

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .mpg              37.227    1.522   24.459    0.000   37.227    6.276
   .qsec             18.826    0.640   29.433    0.000   18.826   10.704
    wt                3.217    0.170   18.898    0.000    3.217    3.341
    hp              146.688   11.929   12.296    0.000  146.688    2.174

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .mpg               6.095    1.524    4.000    0.000    6.095    0.173
   .qsec              1.076    0.269    4.000    0.000    1.076    0.348
    wt                0.927    0.232    4.000    0.000    0.927    1.000
    hp             4553.965 1138.491    4.000    0.000 4553.965    1.000

Questions

  1. Why is wt ~~ hp a reasonable parameter here?
  2. What does mpg ~~ qsec represent in this model (hint: both have predictors)?
  3. Compare the regression estimates for mpg with Exercise 2. Do they change? Why/why not?

Compare to two separate regressions

Show code
fit_lm_mpg  <- lm(mpg  ~ wt + hp, data = dat)
fit_lm_qsec <- lm(qsec ~ wt + hp, data = dat)

summary(fit_lm_mpg)$coef
               Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 37.22727012 1.59878754 23.284689 2.565459e-20
wt          -3.87783074 0.63273349 -6.128695 1.119647e-06
hp          -0.03177295 0.00902971 -3.518712 1.451229e-03
Show code
summary(fit_lm_qsec)$coef
               Estimate  Std. Error   t value     Pr(>|t|)
(Intercept) 18.82558525 0.671867025 28.019808 1.493575e-22
wt           0.94153237 0.265896975  3.540967 1.368578e-03
hp          -0.02730962 0.003794603 -7.196964 6.362107e-08

Separate lm() fits ignore the joint structure (e.g., residual covariance). lavaan makes it explicit.

Exercise 4 — A simple equality constraint

Test whether the effect of wt is the same for both outcomes (mpg and qsec).

4a) Unconstrained model

(You already fit it as fit_mv.)

4b) Constrained model

Label the two wt slopes with the same name (e.g., b_wt):

Show code
mod_mv_equal <- '
  mpg  ~ b_wt*wt + hp
  qsec ~ b_wt*wt + hp

  wt ~~ hp
  mpg ~~ qsec
'

fit_mv_equal <- sem(mod_mv_equal, data = dat, meanstructure = TRUE)
summary(fit_mv_equal, standardized = TRUE)
lavaan 0.6-19 ended normally after 64 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        14
  Number of equality constraints                     1

  Number of observations                            32

Model Test User Model:
                                                      
  Test statistic                                35.243
  Degrees of freedom                                 1
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value   P(>|z|)   Std.lv  Std.all
  mpg ~                                                                  
    wt      (b_wt)    0.524    0.248     2.114    0.035    0.524    0.082
    hp               -0.073    0.011    -6.762    0.000   -0.073   -0.805
  qsec ~                                                                 
    wt      (b_wt)    0.524    0.248     2.114    0.035    0.524    0.298
    hp               -0.023    0.004    -6.378    0.000   -0.023   -0.932

Covariances:
                   Estimate  Std.Err  z-value   P(>|z|)   Std.lv  Std.all
  wt ~~                                                                  
    hp               42.812    8.643     4.953    0.000   42.812    0.659
 .mpg ~~                                                                 
   .qsec             -0.416    0.774    -0.537    0.591   -0.416   -0.095

Intercepts:
                   Estimate  Std.Err  z-value   P(>|z|)   Std.lv  Std.all
   .mpg              29.136    1.766    16.502    0.000   29.136    4.751
   .qsec             19.594    0.645    30.368    0.000   19.594   11.579
    wt                3.217    0.170    18.898    0.000    3.217    3.341
    hp              146.687   11.929    12.296    0.000  146.687    2.174

Variances:
                   Estimate  Std.Err  z-value   P(>|z|)   Std.lv  Std.all
   .mpg              16.266    4.066     4.000    0.000   16.266    0.432
   .qsec              1.168    0.292     4.000    0.000    1.168    0.408
    wt                0.927    0.209     4.440    0.000    0.927    1.000
    hp             4553.971    0.081 56034.919    0.000 4553.971    1.000

4c) Compare nested models

Show code
anova(fit_mv, fit_mv_equal)

Chi-Squared Difference Test

             Df    AIC    BIC  Chisq Chisq diff  RMSEA Df diff Pr(>Chisq)    
fit_mv        0 698.87 719.40  0.000                                         
fit_mv_equal  1 732.12 751.17 35.243     35.243 1.0345       1  2.911e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation questions

  1. If the constrained model fits significantly worse, what does that imply?
  2. If there is little/no difference, what does that imply?
  3. Which model would you prefer and why (think theory + parsimony)?

We’ll later discuss when χ² difference tests are appropriate, and what changes under robust estimators.

Exercise 5 — Reading model structure (sanity checks)

Use lavInspect() to view the sample covariances and the model-implied covariances.

Show code
S <- lavInspect(fit_mv, "sampstat")$cov
Sigma_hat <- fitted(fit_mv)$cov

S
          mpg     qsec       wt       hp
mpg    35.189                           
qsec    4.368    3.093                  
wt     -4.957   -0.296    0.927         
hp   -310.709  -84.059   42.812 4553.965
Show code
Sigma_hat
          mpg     qsec       wt       hp
mpg    35.189                           
qsec    4.368    3.093                  
wt     -4.957   -0.296    0.927         
hp   -310.709  -84.059   42.812 4553.965

Questions

  1. Are the matrices the same? Should they be?
  2. Which parts of the model influence the off-diagonal elements of Sigma_hat?

Wrap-up

What you should take away

  • lavaan models are written with a small “grammar”: ~, ~~, ~1 (and later =~)
  • Even with manifest variables only, SEM lets you model systems of equations and covariances
  • Equality constraints are easy: reuse parameter labels

What’s next

  • In the next lecture we move from “multivariate regression” to path models and mediation
  • You will reuse the same syntax you practiced today (plus parameter labels for effects)

Solutions (instructor version)