In this lab you will:
We start with a controlled world:
\[ \begin{aligned} M &= aX + \varepsilon_M \\ Y &= c'X + bM + \varepsilon_Y \end{aligned} \qquad \text{Indirect}=ab,\ \text{Total}=c'+ab \]
We generate data with:
N <- 400
a <- 0.50
b <- 0.60
cprime <- 0.10
X <- rnorm(N, 0, 1)
eM <- rnorm(N, 0, 1)
eY <- rnorm(N, 0, 1)
M <- a*X + eM
Y <- cprime*X + b*M + eY
dat <- data.frame(X, M, Y)
summary(dat) X M Y
Min. :-3.39606 Min. :-3.3910 Min. :-4.0705
1st Qu.:-0.64903 1st Qu.:-0.7711 1st Qu.:-0.8969
Median :-0.02071 Median :-0.0589 Median :-0.0280
Mean : 0.00775 Mean :-0.0580 Mean :-0.0693
3rd Qu.: 0.65036 3rd Qu.: 0.6764 3rd Qu.: 0.7699
Max. : 3.04377 Max. : 3.5447 Max. : 3.1250
Using simulated data is not “making it easier”: it lets you verify whether estimation and inference behave as theory predicts.
ind := a*b and tot := cprime + a*blavaan 0.6-19 ended normally after 1 iteration
Estimator ML
Optimization method NLMINB
Number of model parameters 7
Number of observations 400
Model Test User Model:
Test statistic 0.000
Degrees of freedom 0
Parameter Estimates:
Standard errors Standard
Information Expected
Information saturated (h1) model Structured
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
M ~
X (a) 0.463 0.049 9.521 0.000 0.463 0.430
Y ~
X (cprm) -0.010 0.052 -0.185 0.853 -0.010 -0.008
M (b) 0.659 0.048 13.604 0.000 0.659 0.603
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.M -0.062 0.049 -1.248 0.212 -0.062 -0.056
.Y -0.031 0.048 -0.648 0.517 -0.031 -0.026
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.M 0.973 0.069 14.142 0.000 0.973 0.815
.Y 0.914 0.065 14.142 0.000 0.914 0.641
Defined Parameters:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
ind 0.305 0.039 7.800 0.000 0.305 0.259
tot 0.296 0.057 5.185 0.000 0.296 0.251
pe <- parameterEstimates(fit_med, standardized = TRUE)
pe[pe$op %in% c("~",":="), c("lhs","op","rhs","label","est","se","z","pvalue","std.all")] lhs op rhs label est se z pvalue std.all
1 M ~ X a 0.463 0.049 9.521 0.000 0.430
2 Y ~ X cprime -0.010 0.052 -0.185 0.853 -0.008
3 Y ~ M b 0.659 0.048 13.604 0.000 0.603
10 ind := a*b ind 0.305 0.039 7.800 0.000 0.259
11 tot := cprime+(a*b) tot 0.296 0.057 5.185 0.000 0.251
Questions (answer in words)
ind := a*b?Indirect effects are products; their sampling distribution is often skewed. Let’s bootstrap.
fit_med_boot <- sem(
mod_med, data = dat,
se = "bootstrap", bootstrap = 2000,
meanstructure = TRUE
)
peb <- parameterEstimates(fit_med_boot, ci = TRUE, level = .95,
standardized = TRUE)
peb[peb$op %in% c("~",":="), c("lhs","op","rhs","label","est","se","ci.lower","ci.upper","std.all")] lhs op rhs label est se ci.lower ci.upper std.all
1 M ~ X a 0.463 0.047 0.376 0.561 0.430
2 Y ~ X cprime -0.010 0.058 -0.125 0.105 -0.008
3 Y ~ M b 0.659 0.048 0.563 0.753 0.603
10 ind := a*b ind 0.305 0.039 0.237 0.387 0.259
11 tot := cprime+(a*b) tot 0.296 0.060 0.176 0.414 0.251
Questions
ind under delta method (Exercise 1) vs bootstrap (Exercise 2). Are they similar?Here we compare:
Chi-Squared Difference Test
Df AIC BIC Chisq Chisq diff RMSEA Df diff Pr(>Chisq)
fit_med 0 2237 2265 0.00
fit_full 1 2235 2259 0.03 0.0343 0 1 0.85
Interpretation questions
Now we simulate a covariate \((Z)\) that affects both \((M)\) and \((Y)\). Think of \((Z)\) as a plausible confounder.
\[ \begin{aligned} M &= aX + dZ + \varepsilon_M\\ Y &= c'X + bM + eZ + \varepsilon_Y \end{aligned} \]
fit_noZ <- sem(mod_med, data = dat2, meanstructure = TRUE)
parameterEstimates(fit_noZ)[parameterEstimates(fit_noZ)$op %in% c("~",":="),
c("lhs","op","rhs","label","est","se","pvalue")] lhs op rhs label est se pvalue
1 M ~ X a 0.513 0.057 0.000
2 Y ~ X cprime -0.020 0.063 0.758
3 Y ~ M b 0.846 0.050 0.000
10 ind := a*b ind 0.434 0.055 0.000
11 tot := cprime+(a*b) tot 0.415 0.075 0.000
mod_withZ <- '
M ~ a*X + d*Z
Y ~ cprime*X + b*M + e*Z
ind := a*b
tot := cprime + (a*b)
'
fit_withZ <- sem(mod_withZ, data = dat2, meanstructure = TRUE)
parameterEstimates(fit_withZ)[parameterEstimates(fit_withZ)$op %in% c("~",":="),
c("lhs","op","rhs","label","est","se","pvalue")] lhs op rhs label est se pvalue
1 M ~ X a 0.500 0.049 0.0
2 M ~ Z d 0.578 0.049 0.0
3 Y ~ X cprime 0.097 0.059 0.1
4 Y ~ M b 0.594 0.053 0.0
5 Y ~ Z e 0.558 0.060 0.0
15 ind := a*b ind 0.297 0.040 0.0
16 tot := cprime+(a*b) tot 0.394 0.060 0.0
Questions
fit_noZ vs fit_withZ. Which one looks more biased and why?Now simulate two mediators \((M_1)\) and \((M_2)\) in parallel:
\[ \begin{aligned} M_1 &= a_1X + \varepsilon_1\\ M_2 &= a_2X + \varepsilon_2\\ Y &= c'X + b_1M_1 + b_2M_2 + \varepsilon_Y \end{aligned} \]
mod_par <- '
M1 ~ a1*X
M2 ~ a2*X
Y ~ cprime*X + b1*M1 + b2*M2
M1 ~~ M2 # allow mediator covariance (often realistic)
ind1 := a1*b1
ind2 := a2*b2
indT := ind1 + ind2
tot := cprime + indT
'
fit_par <- sem(mod_par, data = dat3, se = "bootstrap", bootstrap = 2000,
meanstructure = TRUE)
pe3 <- parameterEstimates(fit_par, ci = TRUE, standardized = TRUE)
pe3[pe3$op %in% c("~",":="), c("lhs","op","rhs","label","est","ci.lower","ci.upper","std.all")] lhs op rhs label est ci.lower ci.upper std.all
1 M1 ~ X a1 0.517 0.421 0.618 0.447
2 M2 ~ X a2 0.303 0.193 0.406 0.274
3 Y ~ X cprime 0.140 0.024 0.267 0.103
4 Y ~ M1 b1 0.556 0.456 0.653 0.473
5 Y ~ M2 b2 0.343 0.241 0.444 0.279
15 ind1 := a1*b1 ind1 0.288 0.218 0.364 0.212
16 ind2 := a2*b2 ind2 0.104 0.060 0.155 0.076
17 indT := ind1+ind2 indT 0.392 0.307 0.480 0.288
18 tot := cprime+indT tot 0.531 0.416 0.654 0.391
Questions
ind1 or ind2? Does that match the generating values?M1 ~~ M2 change \((b_1)\) and \((b_2)\) much? Why might it in real data?With only \((X,M,Y)\) and a mediation structure, the model is often just-identified \((df=0)\).
That means global fit cannot distinguish among different directional stories.
Fit an alternative model where Y predicts M (reversing the middle relation), still allowing a direct X → Y path:
\[ \begin{aligned} Y &= i_Y + \alpha X + \varepsilon_Y\\ M &= i_M + \gamma X + \beta Y + \varepsilon_M \end{aligned} \]
mod_alt <- '
Y ~ alpha*X
M ~ gamma*X + beta*Y
ind_alt := alpha*beta
'
fit_alt <- sem(mod_alt, data = dat, meanstructure = TRUE)
c(
df_med = fitMeasures(fit_med, "df"),
df_alt = fitMeasures(fit_alt, "df"),
chisq_med = fitMeasures(fit_med, "chisq"),
chisq_alt = fitMeasures(fit_alt, "chisq")
) df_med.df df_alt.df chisq_med.chisq chisq_alt.chisq
0 0 0 0
Questions
If reversing a path makes it significant, the model is not better or more true