In this tutorial, we demonstrate how to use blavaan (Merkle and Rosseel 2015) to fit structural equation models (SEMs) and to assess the corresponding model fit. The newest version of blavaan is equipped with an efficient estimation approach based on Stan (Merkle et al. 2020).

Confirmatory Factor Analysis (CFA)

As a measurement model and probably one of the most popular special cases of a SEM, CFA is often used to 1) validate a hypothesized factor structure among multiple variables, 2) estimate the correlation between factors, and 3) obtain factor scores. For example, consider a two-factor (\(\eta_{1j}, \eta_{2j}\)) model with each factor measured by six items (\(y_{1j}, \dots, y_{6j}\)) for person \(j\): \[ \underbrace{\left[\begin{array}{l} y_{1 j} \\ y_{2 j} \\ y_{3 j} \\ y_{4 j} \\ y_{5 j} \\ y_{6 j} \end{array}\right]}_{\boldsymbol{y}_{j}}=\underbrace{\left[\begin{array}{c} \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \beta_{4} \\ \beta_{5} \\ \beta_{6} \end{array}\right]}_{\boldsymbol{\beta}}+\underbrace{\left[\begin{array}{cc} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{array}\right]}_{\Lambda}\underbrace{\left[\begin{array}{l} \eta_{1 j} \\ \eta_{2 j} \end{array}\right]}_{\boldsymbol{\eta}_j}+\underbrace{\left[\begin{array}{c} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{array}\right]}_{\boldsymbol{\epsilon}_j} \]

\[ \boldsymbol{\epsilon}_{j} \sim N_{I}(\mathbf{0}, \mathbf{\Theta}) \]

\[ \boldsymbol{\eta}_{j} \sim N_{K}(\mathbf{0}, \boldsymbol{\Psi}), \] where the number of items or variables is \(I = 6\), the number of factors is \(K = 2\) and \(\mathbf{\Theta}\) is often assumed to be a diagonal matrix.

  • \(y_{ij}\) is the response of person \(j\) \((j=1,\dots,J)\) on item \(i\) \((i=1,\dots,I)\).

  • \(\beta_i\) is the intercept for item \(i\).

  • \(\eta_{kj}\) is the \(k\)th common factor for person \(j\).

  • \(\lambda_{ik}\) is the factor loading of item \(i\) on factor \(k\).

  • \(\epsilon_{ij}\) is the random error term for person \(j\) on item \(i\).

  • \(\boldsymbol{\Psi}\) is the variance-covariance matrix of the common factors \(\boldsymbol{\eta}_{j}\) .

  • \(\mathbf{\Theta}\) is the variance-covariance matrix of the residuals (or unique factors) \(\boldsymbol{\epsilon}_{j}\).

Suppose the errors or residuals \(\epsilon_{ij}\) are independent of each other. Then:

\(\psi_{kk}\) is the variance of the \(k\)th factor, \(\psi_{kk'}\) is the covariance between the \(k\)th and \(k'\)th factors, \(\theta_{ii}\) is the variance of the \(i\)th residual, and \(\theta_{ii'}=0\) for \(i\neq i'\). Specifically,

\[ \boldsymbol{\Psi}=\mathrm{Cov}\begin{pmatrix} \eta_{1j} \\ \eta_{2j} \end{pmatrix}= \left[\begin{array}{cc} \psi_{1 1}&\psi_{1 2} \\ \psi_{2 1}&\psi_{2 2} \end{array}\right] \]

\[\mathbf{\Theta}=\mathrm{Cov}\begin{pmatrix} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{pmatrix}= \left[\begin{array}{cccccc} \theta_{1 1} &0&0&0&0&0\\ 0&\theta_{2 2}&0&0&0&0 \\ 0&0&\theta_{3 3}&0&0&0 \\ 0&0&0&\theta_{4 4}&0&0 \\ 0&0&0&0&\theta_{5 5}&0 \\ 0&0&0&0&0&\theta_{6 6} \end{array}\right] \]

To get started, we load the following R packages:

knitr::opts_chunk$set(echo = TRUE, comment = "", cache = T)
library(rstan)
library(knitr)
library(blavaan)
library(lavaan)
library(MASS)
library(mvtnorm)
library(tidyverse)
library(semPlot)
library(magrittr)
library(Matrix)
options(mc.cores = parallel::detectCores())

Simulation

To better illustrate the use of blavaan, we simulate data so that we know the data-generating parameters. In our simulation, we set \(\beta_i = 0\), \(\psi_{11} = 1\), \(\psi_{12} = \psi_{21} = .5\), \(\psi_{22} = .8\), \(\lambda_{21} = 1.5\), \(\lambda_{31} = 2\), \(\lambda_{52} = 1.5\), \(\lambda_{62} = 2\), and \(\theta_{ii} = .3\). We simulate data from the above model for \(J = 1000\) persons.

# setup
J <- 1000
I <- 6
K <- 2
psi <- matrix(c(1, 0.5,
                0.5, 0.8), nrow = K)  
beta <- rep(0, I)   # item intercepts (all zero, so omitted from the data generation below)

# loading matrix
Lambda <- cbind(c(1, 1.5, 2, 0, 0, 0), c(0, 0, 0, 1, 1.5, 2))

# error covariance
Theta <- diag(0.3, nrow = I)

# factor scores
eta <- mvrnorm(J, mu = c(0, 0), Sigma = psi)

# error term
epsilon <- mvrnorm(J, mu = rep(0, ncol(Theta)),Sigma = Theta)

dat <- tcrossprod(eta, Lambda) + epsilon
dat_cfa  <-  dat %>% as.data.frame() %>% setNames(c("Y1", "Y2", "Y3", "Y4", "Y5", "Y6"))

We define the model for lavaan as follows:

lavaan_cfa <- 'eta1 =~ Y1 + Y2 + Y3
               eta2 =~ Y4 + Y5 + Y6'

Two latent variables, eta1 and eta2, are specified to be measured by three items each, written as eta1 =~ Y1 + Y2 + Y3 and similarly for eta2. By not specifying other parts of the model, we rely on the defaults: the error terms of the items are uncorrelated with each other, while the covariance between the latent variables is freely estimated.
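For reference, the compact syntax above is equivalent to writing these defaults out explicitly; the sketch below (an illustration, not used in the fits that follow) fixes the first loading of each factor to 1 and frees the latent covariance:

# sketch: the cfa() defaults written out explicitly
lavaan_cfa_explicit <- 'eta1 =~ 1*Y1 + Y2 + Y3
                        eta2 =~ 1*Y4 + Y5 + Y6
                        eta1 ~~ eta2'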

We represent the CFA model in a path diagram and then fit the model by maximum likelihood estimation using the cfa function in the lavaan package. By convention, latent variables \(\eta_1\) and \(\eta_2\) are represented by circles, and observed variables \(Y_{1}\) to \(Y_{6}\) by rectangles. Straight arrows represent linear relations (here with coefficients given by the factor loadings \(\lambda\)), and double-headed arrows represent variances and covariances. We could make the diagram by simply using the function call semPaths(semPlotModel_lavaanModel(lavaan_cfa)). Below is the more complex syntax to display Greek letters, subscripts, etc.

FIT <-
  semPlotModel_lavaanModel(
    lavaan_cfa,
    auto.var = TRUE,
    auto.fix.first = TRUE,
    auto.cov.lv.x = TRUE
  )
semPaths(
  FIT,
  what = "paths",
  whatLabels = "par",
  ,
  nodeLabels = c(
    expression(paste(Y[1])),
    expression(paste(Y[2])),
    expression(paste(Y[3])),
    expression(paste(Y[4])),
    expression(paste(Y[5])),
    expression(paste(Y[6])),
    expression(paste(eta[1])),
    expression(paste(eta[2]))
  ),
  edge.label.cex = 0.8,
  edgeLabels = c(
    expression(paste(lambda[1])),
    expression(paste(lambda[2])),
    expression(paste(lambda[3])),
    expression(paste(lambda[4])),
    expression(paste(lambda[5])),
    expression(paste(lambda[6])),
    expression(paste("Covariance")),
    expression(paste(epsilon[1])),
    expression(paste(epsilon[2])),
    expression(paste(epsilon[3])),
    expression(paste(epsilon[4])),
    expression(paste(epsilon[5])),
    expression(paste(epsilon[6])),
    expression(paste(psi[1])),
    expression(paste(psi[2]))
  )
)

# lavaan
lav_cfa_fit <- cfa(lavaan_cfa, data = dat_cfa, meanstructure = TRUE)
summary(lav_cfa_fit, fit.measures = TRUE)
lavaan 0.6.15 ended normally after 29 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        19

  Number of observations                          1000

Model Test User Model:
                                                      
  Test statistic                                 1.553
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.992

Model Test Baseline Model:

  Test statistic                              6005.574
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.002

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -7909.932
  Loglikelihood unrestricted model (H1)      -7909.155
                                                      
  Akaike (AIC)                               15857.863
  Bayesian (BIC)                             15951.111
  Sample-size adjusted Bayesian (SABIC)      15890.766

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value H_0: RMSEA <= 0.050                    1.000
  P-value H_0: RMSEA >= 0.080                    0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.002

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  eta1 =~                                             
    Y1                1.000                           
    Y2                1.448    0.032   45.487    0.000
    Y3                1.964    0.042   46.845    0.000
  eta2 =~                                             
    Y4                1.000                           
    Y5                1.513    0.036   41.878    0.000
    Y6                1.969    0.045   43.298    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  eta1 ~~                                             
    eta2              0.523    0.037   14.100    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Y1               -0.016    0.037   -0.435    0.664
   .Y2               -0.010    0.049   -0.194    0.846
   .Y3               -0.004    0.066   -0.055    0.956
   .Y4               -0.086    0.033   -2.581    0.010
   .Y5               -0.078    0.047   -1.662    0.097
   .Y6               -0.110    0.059   -1.852    0.064
    eta1              0.000                           
    eta2              0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Y1                0.328    0.017   19.053    0.000
   .Y2                0.271    0.022   12.417    0.000
   .Y3                0.351    0.037    9.537    0.000
   .Y4                0.296    0.016   18.809    0.000
   .Y5                0.297    0.023   12.952    0.000
   .Y6                0.340    0.035    9.711    0.000
    eta1              1.025    0.059   17.262    0.000
    eta2              0.822    0.049   16.788    0.000

By default, lavaan uses maximum likelihood estimation. Estimates of the loadings \(\lambda_{ik}\) are given under “Latent Variables”; the estimated covariance between the common factors is given under “Covariances”; estimates of the intercepts \(\beta_i\) of the measurement models, as well as the means of the common factors (fixed at 0, not estimated), are given under “Intercepts”; and estimates of the residual variances of the responses (variances \(\theta_{ii}\) of \(\epsilon_{ij}\)) and of the common factors (variances \(\psi_{kk}\) of \(\eta_{kj}\)) are given under “Variances.”

blavaan borrows its syntax from lavaan and acts as a wrapper for Bayesian estimation using Stan. The underlying Stan object created by blavaan can be inspected using blavInspect().

By default, the bcfa function in blavaan uses an anchoring item for each factor for identification, so the factor loading of the first item that loads on a factor is fixed at 1. blavaan also includes intercepts for all observed variables by default, which is different from the default in lavaan (we specified meanstructure = TRUE in the cfa function call above so that the two parameterizations match).
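The MCMC settings can also be controlled directly through the n.chains, burnin, and sample arguments of bcfa, which set the number of chains, the number of warmup iterations, and the number of post-warmup draws per chain. The sketch below is commented out so the defaults are used; the values shown correspond to the settings reported in the Stan output later in this section:

# sketch (not run): explicit MCMC settings for bcfa
# blav_cfa_fit_alt <- bcfa(lavaan_cfa, data = dat_cfa,
#                          n.chains = 3, burnin = 500, sample = 1000)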

# blavaan
blav_cfa_fit <- bcfa(lavaan_cfa, data=dat_cfa, mcmcfile = T)
Computing posterior predictives...
summary(blav_cfa_fit)
blavaan (0.3-15) results of 1000 samples after 500 adapt/burnin iterations

  Number of observations                          1000

  Number of missing patterns                         1

  Statistic                                 MargLogLik         PPP
  Value                                      -8007.416       0.744

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  eta1 =~                                                                      
    Y1                1.000                                  NA                
    Y2                1.449    0.031     1.39    1.512    0.999    normal(0,10)
    Y3                1.967    0.042    1.887     2.05    1.000    normal(0,10)
  eta2 =~                                                                      
    Y4                1.000                                  NA                
    Y5                1.516    0.036    1.446    1.587    1.000    normal(0,10)
    Y6                1.973    0.044    1.888    2.062    1.000    normal(0,10)

Covariances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  eta1 ~~                                                                      
    eta2              0.522    0.037    0.453    0.598    1.000     lkj_corr(1)

Intercepts:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .Y1               -0.015    0.038    -0.09     0.06    1.000    normal(0,32)
   .Y2               -0.009    0.051   -0.108    0.088    1.000    normal(0,32)
   .Y3               -0.002    0.067   -0.137     0.13    1.000    normal(0,32)
   .Y4               -0.086    0.034   -0.154   -0.022    1.000    normal(0,32)
   .Y5               -0.077    0.047    -0.17    0.014    1.001    normal(0,32)
   .Y6               -0.109    0.061    -0.23    0.009    1.001    normal(0,32)
    eta1              0.000                                  NA                
    eta2              0.000                                  NA                

Variances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .Y1                0.330    0.017    0.297    0.365    1.000 gamma(1,.5)[sd]
   .Y2                0.272    0.022     0.23    0.318    1.000 gamma(1,.5)[sd]
   .Y3                0.351    0.039    0.279    0.431    1.000 gamma(1,.5)[sd]
   .Y4                0.298    0.016    0.269    0.333    1.000 gamma(1,.5)[sd]
   .Y5                0.299    0.023    0.255    0.345    1.000 gamma(1,.5)[sd]
   .Y6                0.342    0.036    0.272    0.412    1.000 gamma(1,.5)[sd]
    eta1              1.027    0.059    0.919    1.151    1.000 gamma(1,.5)[sd]
    eta2              0.823    0.049    0.729    0.921    0.999 gamma(1,.5)[sd]
fitmeasures(blav_cfa_fit)
      npar       logl        ppp        bic        dic      p_dic       waic 
    19.000  -7909.981      0.744  15951.190  15858.125     19.081  15858.403 
    p_waic    se_waic      looic      p_loo     se_loo margloglik 
    19.272    111.599  15858.456     19.299    111.600  -8007.416 

We can see that the output from blavaan resembles that from lavaan, although it reports posterior means and posterior standard deviations for the model parameters rather than maximum likelihood estimates and standard errors.

We present the maximum likelihood estimates from lavaan and the Bayesian estimates from blavaan next to each other for comparison. Since by default blavaan uses non-informative priors and the dataset is large, the two sets of estimates are very similar:

bind_cols(parameterEstimates(lav_cfa_fit)[, 1:4], parameterEstimates(blav_cfa_fit)[, 4]) %>% rename(ML = est, Bayes = ...5) %>% knitr::kable()
New names:
• `` -> `...5`
lhs op rhs ML Bayes
eta1 =~ Y1 1.0000000 1.0000000
eta1 =~ Y2 1.4479658 1.4494496
eta1 =~ Y3 1.9641409 1.9666819
eta2 =~ Y4 1.0000000 1.0000000
eta2 =~ Y5 1.5126562 1.5157573
eta2 =~ Y6 1.9690845 1.9727406
Y1 ~~ Y1 0.3276724 0.3296146
Y2 ~~ Y2 0.2709558 0.2724962
Y3 ~~ Y3 0.3507402 0.3512957
Y4 ~~ Y4 0.2962317 0.2984597
Y5 ~~ Y5 0.2972492 0.2991214
Y6 ~~ Y6 0.3398565 0.3415060
eta1 ~~ eta1 1.0248042 1.0265839
eta2 ~~ eta2 0.8222773 0.8226690
eta1 ~~ eta2 0.5225991 0.5223931
Y1 ~1 -0.0159813 -0.0152329
Y2 ~1 -0.0095513 -0.0086645
Y3 ~1 -0.0036208 -0.0020447
Y4 ~1 -0.0863222 -0.0856052
Y5 ~1 -0.0775607 -0.0767313
Y6 ~1 -0.1099804 -0.1088010
eta1 ~1 0.0000000 0.0000000
eta2 ~1 0.0000000 0.0000000

To check the settings for the underlying Stan program (i.e., the number of chains, the number of warm-up iterations, and the number of post-warm-up draws):

blavInspect(blav_cfa_fit, "mcobj")
Inference for Stan model: stanmarg.
3 chains, each with iter=1500; warmup=500; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

                 mean se_mean   sd     2.5%      25%      50%      75%    97.5%
ly_sign[1]       1.45    0.00 0.03     1.39     1.43     1.45     1.47     1.51
ly_sign[2]       1.97    0.00 0.04     1.89     1.94     1.97     2.00     2.05
ly_sign[3]       1.52    0.00 0.04     1.45     1.49     1.52     1.54     1.59
ly_sign[4]       1.97    0.00 0.04     1.89     1.94     1.97     2.00     2.06
Theta_var[1]     0.33    0.00 0.02     0.30     0.32     0.33     0.34     0.36
Theta_var[2]     0.27    0.00 0.02     0.23     0.26     0.27     0.29     0.32
Theta_var[3]     0.35    0.00 0.04     0.28     0.32     0.35     0.38     0.43
Theta_var[4]     0.30    0.00 0.02     0.27     0.29     0.30     0.31     0.33
Theta_var[5]     0.30    0.00 0.02     0.26     0.28     0.30     0.31     0.35
Theta_var[6]     0.34    0.00 0.04     0.27     0.32     0.34     0.37     0.41
Psi_cov[1]       0.52    0.00 0.04     0.45     0.50     0.52     0.55     0.60
Psi_var[1]       1.03    0.00 0.06     0.92     0.99     1.02     1.06     1.15
Psi_var[2]       0.82    0.00 0.05     0.73     0.79     0.82     0.86     0.92
Nu_free[1]      -0.02    0.00 0.04    -0.09    -0.04    -0.02     0.01     0.06
Nu_free[2]      -0.01    0.00 0.05    -0.11    -0.04    -0.01     0.03     0.09
Nu_free[3]       0.00    0.00 0.07    -0.14    -0.05     0.00     0.04     0.13
Nu_free[4]      -0.09    0.00 0.03    -0.15    -0.11    -0.09    -0.06    -0.02
Nu_free[5]      -0.08    0.00 0.05    -0.17    -0.11    -0.08    -0.04     0.01
Nu_free[6]      -0.11    0.00 0.06    -0.23    -0.15    -0.11    -0.07     0.01
lp__         -7971.62    0.09 3.19 -7978.79 -7973.49 -7971.28 -7969.34 -7966.40
             n_eff Rhat
ly_sign[1]    3120    1
ly_sign[2]    2922    1
ly_sign[3]    3137    1
ly_sign[4]    3137    1
Theta_var[1]  4755    1
Theta_var[2]  3581    1
Theta_var[3]  3796    1
Theta_var[4]  4847    1
Theta_var[5]  4220    1
Theta_var[6]  4103    1
Psi_cov[1]    3418    1
Psi_var[1]    3003    1
Psi_var[2]    3284    1
Nu_free[1]    1591    1
Nu_free[2]    1514    1
Nu_free[3]    1529    1
Nu_free[4]    1769    1
Nu_free[5]    1597    1
Nu_free[6]    1564    1
lp__          1219    1

Samples were drawn using NUTS(diag_e) at Thu Apr 20 18:43:35 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

We see that there were 3 chains with 1500 iterations each, the first 500 of which were warmup, giving a total of 3000 post-warmup draws. In the Stan output above, se_mean is the Monte Carlo error that arises because we use a finite number of posterior draws to empirically estimate posterior expectations. These Monte Carlo errors are given by \(\frac{\mathrm{sd}}{\sqrt{n_{\mathrm{eff}}}}\), where sd refers to the posterior standard deviation, which can be viewed as a Bayesian counterpart of the frequentist standard error. Because there is autocorrelation among the draws within each chain, the effective sample size, n_eff, used to estimate the Monte Carlo errors is smaller than the number of post-warmup draws. The effective sample size can be viewed as the number of independent draws that would provide the same amount of information, in terms of Monte Carlo error, as the correlated draws we have; the Monte Carlo errors go to zero as the number of draws goes to infinity. Rhat quantifies how well the different chains mix (overlap) after warmup, with values close to 1 (conventionally below 1.1) indicating adequate mixing. The purpose of running multiple chains with different starting values for the parameters is to assess whether the chains have reached the stationary distribution, i.e., the correct posterior distribution, by checking that they converge to the same distribution and hence mix with each other.
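To make the relation between se_mean, sd, and n_eff concrete, the following sketch extracts the summary of the underlying Stan fit and recomputes the Monte Carlo standard errors as sd divided by the square root of n_eff; small discrepancies relative to the printed output are due to rounding:

# sketch: verify that se_mean is approximately sd / sqrt(n_eff)
stan_obj <- blavInspect(blav_cfa_fit, "mcobj")
stan_summ <- rstan::summary(stan_obj)$summary
head(cbind(se_mean = stan_summ[, "se_mean"],
           sd_over_sqrt_neff = stan_summ[, "sd"] / sqrt(stan_summ[, "n_eff"])))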

One benefit of running a Bayesian analysis is that we can incorporate prior information when we have it. blavaan allows users to specify priors for all model parameters via the argument dp, and uses relatively non-informative default priors (with large variances) when this argument is not specified. When non-informative priors are used for all parameters, the Bayesian estimates will be similar to their maximum likelihood counterparts.

To check the default priors,

(default_prior <- dpriors())
               nu             alpha            lambda              beta 
   "normal(0,32)"    "normal(0,10)"    "normal(0,10)"    "normal(0,10)" 
            theta               psi               rho             ibpsi 
"gamma(1,.5)[sd]" "gamma(1,.5)[sd]"       "beta(1,1)" "wishart(3,iden)" 
              tau             delta 
"normal(0,10^.5)" "gamma(1,.5)[sd]" 

To change a prior, we can just substitute the desired one: for example, if we would like to change the prior for beta to be normal(0, 1), then we can form a new prior vector:

(new_prior <- dpriors(beta = "normal(0, 1)"))
               nu             alpha            lambda              beta 
   "normal(0,32)"    "normal(0,10)"    "normal(0,10)"    "normal(0, 1)" 
            theta               psi               rho             ibpsi 
"gamma(1,.5)[sd]" "gamma(1,.5)[sd]"       "beta(1,1)" "wishart(3,iden)" 
              tau             delta 
"normal(0,10^.5)" "gamma(1,.5)[sd]" 
# new cfa with updated prior for beta
# bcfa(lavaan_cfa, data=dat_cfa, dp = new_prior)

For further information on how to change the priors for individual parameters, check https://faculty.missouri.edu/~merklee/blavaan/prior.html.
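For example, based on the prior() modifier described in the blavaan documentation linked above, a prior can be attached to a single parameter directly in the model syntax; the sketch below (commented out, not run here) puts an informative prior on the loading of Y2 only:

# sketch (not run): an informative prior on one loading via the prior() modifier
# lavaan_cfa_prior <- 'eta1 =~ Y1 + prior("normal(1, 0.5)")*Y2 + Y3
#                      eta2 =~ Y4 + Y5 + Y6'
# bcfa(lavaan_cfa_prior, data = dat_cfa)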

Latent Growth Curve Models

As a popular method for modeling growth, in this section we discuss another special case of SEM: latent growth curve models. This model is useful for modeling change over time within individuals and variability among individual trajectories. A latent growth curve model for six occasions can be written as:

\[ \left[\begin{array}{l} y_{1 j} \\ y_{2 j} \\ y_{3 j} \\ y_{4 j} \\ y_{5 j} \\ y_{6 j} \end{array}\right]=\left[\begin{array}{ll} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5 \end{array}\right]\left[\begin{array}{l} \eta_{1 j} \\ \eta_{2 j} \end{array}\right]+\left[\begin{array}{l} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{array}\right]=\left[\begin{array}{l} \eta_{1 j}+0 \eta_{2 j}+\epsilon_{1 j} \\ \eta_{1 j}+1 \eta_{2 j}+\epsilon_{2 j} \\ \eta_{1 j}+2 \eta_{2 j}+\epsilon_{3 j} \\ \eta_{1 j}+3 \eta_{2 j}+\epsilon_{4 j} \\ \eta_{1 j}+4 \eta_{2 j}+\epsilon_{5 j} \\ \eta_{1 j}+5 \eta_{2 j}+\epsilon_{6 j} \end{array}\right] \]

\[ \boldsymbol{\epsilon}_{j} \sim N_{p}(\mathbf{0}, \mathbf{\Theta}) \]

Where

  • \(y_{ij}\) is the response at occasion \(i\) for individual \(j\).

  • \(\eta_{1j}\) is an individual-specific intercept and \(\eta_{2j}\) is an individual-specific slope of time. These intercepts and slopes are latent variables that follow a bivariate normal distribution with free means and an unstructured covariance matrix.

  • \(\epsilon_{ij}\) is an occasion-specific error. The errors \(\epsilon_{1j}\) to \(\epsilon_{6j}\) have zero means and a diagonal \(6 \times 6\) covariance matrix:

\[\mathbf{\Theta}=\mathrm{Cov}\begin{pmatrix} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{pmatrix}= \left[\begin{array}{cccccc} \theta_{1 1} &0&0&0&0&0\\ 0&\theta_{2 2}&0&0&0&0 \\ 0&0&\theta_{3 3}&0&0&0 \\ 0&0&0&\theta_{4 4}&0&0 \\ 0&0&0&0&\theta_{5 5}&0 \\ 0&0&0&0&0&\theta_{6 6} \end{array}\right] \] The fixed loadings (0, 1, 2, 3, 4, 5) for the individual-specific slope represent the times associated with the occasions, and this kind of model works only if the time variable takes on the same set of values for all individuals. Here the occasions are additionally assumed to be equally spaced.

We simulate data for 500 individuals from the model above as follows:

N <- 500
psi <- matrix(c(1, 0.5, 
                    0.5, 1), nrow = 2)  
# loading matrix
Lambda <- matrix(c(1, 1, 1, 1, 1, 1, 0:5), nrow = 6)

# error covariance
Theta <- diag(0.3, nrow = 6)

# exogenous latent variables
eta <- mvrnorm(N, mu = c(2, 1), Sigma = psi)

# error term
epsilon <- mvrnorm(N, mu = rep(0, ncol(Theta)), Sigma = Theta)

dat <- tcrossprod(eta, Lambda) + epsilon
dat  <-  dat %>% as.matrix() %>% as.data.frame() %>% setNames(paste0("Y", 1:6))

To estimate the parameters of our model (visualized in the path diagram below) using lavaan, we define two latent variables: ri, the random intercept, and rc, the random slope (coefficient) of time. The syntax 1*Y1 means that we fix the factor loading of Y1 to 1. We also constrain the residual variances of all the observed variables to be equal by giving them the same label c, as in Y1 ~~ c*Y1, which says that Y1 has a residual variance labeled c to be estimated (and the same label is used for the other response variables).

lavaan_lgm <- 'ri =~ 1*Y1 + 1*Y2 + 1*Y3 + 1*Y4 + 1*Y5 + 1*Y6
               rc =~ 0*Y1 + 1*Y2 + 2*Y3 + 3*Y4 + 4*Y5 + 5*Y6
               Y1 ~~ c*Y1
               Y2 ~~ c*Y2
               Y3 ~~ c*Y3
               Y4 ~~ c*Y4
               Y5 ~~ c*Y5
               Y6 ~~ c*Y6
               ri ~~ rc
               '
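If the occasions were not equally spaced, the slope loadings would simply be fixed to the actual time scores. As a hypothetical sketch, suppose the measurements had been taken at months 0, 1, 3, 6, 9, and 12:

# hypothetical sketch: unequally spaced occasions entered as fixed slope loadings
lavaan_lgm_unequal <- 'ri =~ 1*Y1 + 1*Y2 + 1*Y3 + 1*Y4 + 1*Y5 + 1*Y6
                       rc =~ 0*Y1 + 1*Y2 + 3*Y3 + 6*Y4 + 9*Y5 + 12*Y6'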

We can make a path diagram like this:

semPaths(semPlotModel_lavaanModel(lavaan_lgm))

To display Greek letters and subscripts, we can use the following syntax:

FIT2 <-
  semPlotModel_lavaanModel(
    lavaan_lgm,
    auto.var = TRUE,
    auto.fix.first = TRUE,
    auto.cov.lv.x = TRUE
  )
semPaths(
  FIT2,
  what = "paths",
  whatLabels = "par",
  nodeLabels = c(
    expression(paste(Y[1])),
    expression(paste(Y[2])),
    expression(paste(Y[3])),
    expression(paste(Y[4])),
    expression(paste(Y[5])),
    expression(paste(Y[6])),
    expression(paste(eta[1])),
    expression(paste(eta[2]))
  ),
  edge.label.cex = 0.6,
  edgeLabels = c(
    expression(paste(1)),
    expression(paste(1)),
    expression(paste(1)),
    expression(paste(1)),
    expression(paste(1)),
    expression(paste(1)),
    expression(paste(0)),
    expression(paste(1)),
    expression(paste(2)),
    expression(paste(3)),
    expression(paste(4)),
    expression(paste(5)),
    expression(paste(epsilon[1])),
    expression(paste(epsilon[2])),
    expression(paste(epsilon[3])),
    expression(paste(epsilon[4])),
    expression(paste(epsilon[5])),
    expression(paste(epsilon[6])),
    expression(paste("Covariance")),
    expression(paste(psi[1])),
    expression(paste(psi[2]))
  )
)

Following the above model specification, we fit a latent growth curve model using the growth function in lavaan. By default, growth fixes the intercepts of the observed variables to zero and estimates the latent means, as required for this model:

# lavaan
lav_lgm_fit <- growth(lavaan_lgm, data = dat)
summary(lav_lgm_fit)
lavaan 0.6.15 ended normally after 26 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        11
  Number of equality constraints                     5

  Number of observations                           500

Model Test User Model:
                                                      
  Test statistic                                30.656
  Degrees of freedom                                21
  P-value (Chi-square)                           0.080

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  ri =~                                               
    Y1                1.000                           
    Y2                1.000                           
    Y3                1.000                           
    Y4                1.000                           
    Y5                1.000                           
    Y6                1.000                           
  rc =~                                               
    Y1                0.000                           
    Y2                1.000                           
    Y3                2.000                           
    Y4                3.000                           
    Y5                4.000                           
    Y6                5.000                           

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  ri ~~                                               
    rc                0.484    0.051    9.506    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Y1                0.000                           
   .Y2                0.000                           
   .Y3                0.000                           
   .Y4                0.000                           
   .Y5                0.000                           
   .Y6                0.000                           
    ri                2.038    0.048   42.629    0.000
    rc                1.058    0.044   24.113    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Y1         (c)    0.300    0.010   31.623    0.000
   .Y2         (c)    0.300    0.010   31.623    0.000
   .Y3         (c)    0.300    0.010   31.623    0.000
   .Y4         (c)    0.300    0.010   31.623    0.000
   .Y5         (c)    0.300    0.010   31.623    0.000
   .Y6         (c)    0.300    0.010   31.623    0.000
    ri                0.986    0.072   13.602    0.000
    rc                0.945    0.061   15.529    0.000
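The growth function is a convenience wrapper around the generic lavaan interface. As a sketch (commented out, not run here, with argument names taken from the lavaan documentation), roughly the same fit could be requested by making the relevant defaults explicit:

# sketch (not run): observed-variable intercepts fixed to 0, latent means free
# lav_lgm_fit_alt <- lavaan(lavaan_lgm, data = dat, meanstructure = TRUE,
#                           int.ov.free = FALSE, int.lv.free = TRUE,
#                           auto.var = TRUE, auto.cov.lv.x = TRUE)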

In this model, we are mainly interested in the means and variances of the random intercept and slope. We can use the bgrowth function in the blavaan package to fit the same model:

# blavaan
blav_lgm_fit <- bgrowth(lavaan_lgm, data = dat)
Computing posterior predictives...
summary(blav_lgm_fit)
blavaan (0.3-15) results of 1000 samples after 500 adapt/burnin iterations

  Number of observations                           500

  Number of missing patterns                         1

  Statistic                                 MargLogLik         PPP
  Value                                      -4216.911       0.116

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  ri =~                                                                        
    Y1                1.000                                  NA                
    Y2                1.000                                  NA                
    Y3                1.000                                  NA                
    Y4                1.000                                  NA                
    Y5                1.000                                  NA                
    Y6                1.000                                  NA                
  rc =~                                                                        
    Y1                0.000                                  NA                
    Y2                1.000                                  NA                
    Y3                2.000                                  NA                
    Y4                3.000                                  NA                
    Y5                4.000                                  NA                
    Y6                5.000                                  NA                

Covariances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  ri ~~                                                                        
    rc                0.485    0.051    0.393     0.59    1.000     lkj_corr(1)

Intercepts:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .Y1                0.000                                  NA                
   .Y2                0.000                                  NA                
   .Y3                0.000                                  NA                
   .Y4                0.000                                  NA                
   .Y5                0.000                                  NA                
   .Y6                0.000                                  NA                
    ri                2.038    0.048    1.946     2.13    1.000    normal(0,10)
    rc                1.058    0.044     0.97    1.146    0.999    normal(0,10)

Variances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .Y1         (c)    0.301    0.010    0.283     0.32    0.999 gamma(1,.5)[sd]
   .Y2         (c)    0.301    0.010    0.283     0.32    0.999                
   .Y3         (c)    0.301    0.010    0.283     0.32    0.999                
   .Y4         (c)    0.301    0.010    0.283     0.32    0.999                
   .Y5         (c)    0.301    0.010    0.283     0.32    0.999                
   .Y6         (c)    0.301    0.010    0.283     0.32    0.999                
    ri                0.992    0.072    0.857    1.143    1.001 gamma(1,.5)[sd]
    rc                0.952    0.061    0.839    1.079    1.001 gamma(1,.5)[sd]

For latent growth curve models, we can also view the responses at the different time points as a univariate outcome (long format), instead of treating them as multivariate (wide format). From this multilevel perspective, responses at the different time points (level 1) are viewed as nested within persons (level 2). The growth curve model can then be specified as a two-level linear mixed model (or hierarchical linear model, HLM) and estimated using the lmer function in the lme4 package. To do this, we first reshape our data from wide format to long format:

# wide to long
dat_long <-
  dat %>% add_column(id = 1:500) %>% gather(key = "time", value = "y",-id) %>% mutate(time = dplyr::recode(
    time,
    Y1 = 0,
    Y2 = 1,
    Y3 = 2,
    Y4 = 3,
    Y5 = 4,
    Y6 = 5
  ))
dat_long %>% head %>% kable()
id time y
1 0 2.9577313
2 0 3.2700705
3 0 2.6845096
4 0 1.4890722
5 0 0.5877364
6 0 2.6755304
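The gather function used above has been superseded by pivot_longer in recent versions of tidyr; an equivalent reshape, as a sketch, would be:

# sketch: the same wide-to-long reshape using tidyr::pivot_longer
dat_long2 <- dat %>%
  add_column(id = 1:500) %>%
  pivot_longer(cols = starts_with("Y"), names_to = "time", values_to = "y") %>%
  mutate(time = as.integer(sub("Y", "", time)) - 1)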

For the random part of the model, (1 + time)|id indicates that the model includes a random intercept and a random slope of time for clusters (persons in the current example) whose identifiers are in the variable id. For a detailed tutorial on Bayesian multilevel regression and multilevel regression in general, see https://mc-stan.org/users/documentation/case-studies/tutorial_rstanarm.html.

# mlm framework
lav_lgm_fit_hlm <- lme4::lmer(y ~ 1 + time + (1 + time|id), data = dat_long, REML = FALSE)
summary(lav_lgm_fit_hlm)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y ~ 1 + time + (1 + time | id)
   Data: dat_long

     AIC      BIC   logLik deviance df.resid 
  8398.1   8434.1  -4193.0   8386.1     2994 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.3512 -0.5844  0.0021  0.5514  2.9637 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 id       (Intercept) 0.9856   0.9928       
          time        0.9452   0.9722   0.50
 Residual             0.3005   0.5481       
Number of obs: 3000, groups:  id, 500

Fixed effects:
            Estimate Std. Error t value
(Intercept)  2.03815    0.04781   42.63
time         1.05787    0.04387   24.11

Correlation of Fixed Effects:
     (Intr)
time 0.420 
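For completeness, the same two-level model could also be fit in a Bayesian way outside the SEM framework, for instance with the stan_lmer function from the rstanarm package used in the tutorial linked above. A sketch (commented out, not run here, and assuming rstanarm is installed):

# sketch (not run): Bayesian multilevel fit of the same growth model via rstanarm
# library(rstanarm)
# bayes_hlm_fit <- stan_lmer(y ~ 1 + time + (1 + time | id), data = dat_long)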

We present the frequentist SEM estimates from lavaan, the frequentist HLM estimates from lme4, and the Bayesian estimates from blavaan. As with the CFA, the Bayesian and frequentist estimates from the different functions are similar:

hlm_beta <- lav_lgm_fit_hlm@beta   # fixed effects: mean intercept and mean slope
hlm_var_RE <-
  lme4::VarCorr(lav_lgm_fit_hlm) %>%
  as.data.frame(., comp = c("Variance"), order = "lower.tri")
# arrange the HLM estimates in the same order as lavaan's parameter table:
# 12 fixed loadings (NA), 6 equal residual variances, the random-effect
# covariance and variances (ri~~rc, ri~~ri, rc~~rc), 6 observed-variable
# intercepts fixed at 0 (NA), and the two latent means (fixed effects)
hlm_estimates <-
  c(rep(NA, 12),
    rep(hlm_var_RE[4, 4], 6),
    hlm_var_RE[c(2, 1, 3), 4],
    rep(NA, 6),
    hlm_beta)
bind_cols(
  parameterEstimates(lav_lgm_fit)[, 1:5],
  hlm_estimates,
  parameterEstimates(blav_lgm_fit)[, 5]
) %>% rename(ML_SEM = est,
             ML_HLM = ...6,
             Bayes = ...7) %>% knitr::kable()
New names:
• `` -> `...6`
• `` -> `...7`
lhs op rhs label ML_SEM ML_HLM Bayes
ri =~ Y1 1.0000000 NA 1.0000000
ri =~ Y2 1.0000000 NA 1.0000000
ri =~ Y3 1.0000000 NA 1.0000000
ri =~ Y4 1.0000000 NA 1.0000000
ri =~ Y5 1.0000000 NA 1.0000000
ri =~ Y6 1.0000000 NA 1.0000000
rc =~ Y1 0.0000000 NA 0.0000000
rc =~ Y2 1.0000000 NA 1.0000000
rc =~ Y3 2.0000000 NA 2.0000000
rc =~ Y4 3.0000000 NA 3.0000000
rc =~ Y5 4.0000000 NA 4.0000000
rc =~ Y6 5.0000000 NA 5.0000000
Y1 ~~ Y1 c 0.3004538 0.3004539 0.3008866
Y2 ~~ Y2 c 0.3004538 0.3004539 0.3008866
Y3 ~~ Y3 c 0.3004538 0.3004539 0.3008866
Y4 ~~ Y4 c 0.3004538 0.3004539 0.3008866
Y5 ~~ Y5 c 0.3004538 0.3004539 0.3008866
Y6 ~~ Y6 c 0.3004538 0.3004539 0.3008866
ri ~~ rc 0.4838138 0.4838131 0.4853817
ri ~~ ri 0.9855904 0.9855833 0.9921259
rc ~~ rc 0.9451540 0.9451574 0.9524373
Y1 ~1 0.0000000 NA 0.0000000
Y2 ~1 0.0000000 NA 0.0000000
Y3 ~1 0.0000000 NA 0.0000000
Y4 ~1 0.0000000 NA 0.0000000
Y5 ~1 0.0000000 NA 0.0000000
Y6 ~1 0.0000000 NA 0.0000000
ri ~1 2.0381506 2.0381506 2.0380207
rc ~1 1.0578727 1.0578727 1.0576254

Structural Equation Modeling (SEM)

SEM is useful for testing hypothesized covariance structures among variables based on substantive theories. It models the relations among multiple latent (unobserved) variables or constructs that are measured by manifest (observed) variables or indicators, and it allows us to evaluate the corresponding model fit. SEM can be thought of as a combination of regression analysis and factor analysis. It has two parts: a so-called measurement model, which describes the relations between observed indicators and latent variables, and a so-called structural model, which specifies a set of simultaneous linear relations among latent variables and possibly observed variables. SEM involves two main types of relations: 1) correlational relations between latent variables, represented by two-headed arrows in the path diagram; and 2) cause-and-effect type relations, represented by single-headed arrows. For example, a cause-and-effect type SEM can be written by starting with the measurement model for the exogenous latent variables:

\[ \left[\begin{array}{l} x_{1 j} \\ x_{2 j} \\ x_{3 j} \\ z_{4 j} \\ z_{5 j} \\ z_{6 j} \end{array}\right]=\left[\begin{array}{c} \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \beta_{4} \\ \beta_{5} \\ \beta_{6} \end{array}\right]+\left[\begin{array}{cc} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{array}\right]\left[\begin{array}{l} \eta_{1 j} \\ \eta_{2 j} \end{array}\right]+\left[\begin{array}{c} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{array}\right] \]

where

  • \(x_{i j}\) represents an indicator of \(\eta_{1 j}\) and \(z_{i j}\) represents an indicator of \(\eta_{2 j}\).

  • \(\eta_{1 j}\) and \(\eta_{2 j}\) represent exogenous latent variables in the model, with zero means and an unstructured covariance matrix.

  • \(\epsilon_{i j}\) represents errors of measurement for \(x_{i j}\) or \(z_{i j}\).

  • \(\beta_i\) represents an intercept for \(x_{ij}\) or \(z_{ij}\).

  • \(\lambda_{ik}\) represents a factor loading in the measurement model, i.e., the expected change in the observed variable for a one-unit change in the latent variable.

We assume that the errors of measurement \(\epsilon_{i j}\) have an expected value of zero, that they are uncorrelated with all latent variables and uncorrelated with each other for all pairs of items.

Next, we have a measurement model for a latent endogenous variable: \[ \left[\begin{array}{l} y_{1 j} \\ y_{2 j} \\ y_{3 j} \end{array}\right]=\left[\begin{array}{c} \gamma_{1} \\ \gamma_{2} \\ \gamma_{3} \end{array}\right]+\left[\begin{array}{cc} 1 \\ \lambda_{8} \\ \lambda_{9} \end{array}\right]\left[\begin{array}{l} \xi_{1 j} \end{array}\right]+\left[\begin{array}{c} \delta_{1 j} \\ \delta_{2 j} \\ \delta_{3 j} \end{array}\right] \]

where

  • \(y_{i j}\) represents an indicator of \(\xi_{1 j}\).

  • \(\xi_{1 j}\) represents a latent endogenous variable with zero mean.

  • \(\delta_{i j}\) represents measurement error for \(y_{ij}\).

  • \(\gamma_i\) represents an intercept for \(y_{ij}\).

  • \(\lambda_{i}\) represents a factor loading for the measurement model.

We assume that the measurement errors have zero means and are uncorrelated with the exogenous and endogenous latent variables. We also assume that \(\delta_{i j}\) is homoscedastic and non-autocorrelated.

Finally, the structural relations among the latent variables are specified as: \[ \left[\begin{array}{l} \xi_{1 j} \end{array}\right]=\left[\begin{array}{cc} \gamma_{11} & \gamma_{12} \end{array}\right] \left[\begin{array}{l} \eta_{1 j} \\ \eta_{2 j} \end{array}\right] + \zeta_{1j}, \] where \(\zeta_{1j}\) is a disturbance with zero mean that is uncorrelated with \(\eta_{1j}\) and \(\eta_{2j}\). This specifies a linear relation between the endogenous latent variable and the exogenous latent variables. We simulate data from this model as follows:

N <- 500
lat_cov <- matrix(c(1, 0, 
                    0, 0.8), nrow = 2)  
# loading matrix
Lambda <- bdiag(c(1, 1.5, 2), c(1, 1.5, 2), c(1, 1.5, 2))


# error covariance
Theta <- diag(0.3, nrow = 9)

# exogenous latent variables
eta <- mvrnorm(N, mu = c(0, 0), Sigma = lat_cov)

# disturbance for latent outcome
zeta <- rnorm(N)

# structural model
xi <- tcrossprod(eta, t(c(1, 2))) + zeta

# error term
epsilon <- mvrnorm(N, mu = rep(0, ncol(Theta)), Sigma = Theta)
cbind(eta, xi) %>% dim
[1] 500   3
dat <- tcrossprod(cbind(eta, xi), Lambda) + epsilon
dat  <-  dat %>% as.matrix() %>% as.data.frame() %>% setNames(c("X1", "X2", "X3", "Z1", "Z2", "Z3", "Y1", "Y2", "Y3"))

To fit the SEM, the measurement part is specified as before, with =~ connecting each hypothesized latent variable to its observed indicators. What is new here is the structural relation among the latent variables, indicated by ~, although "structural relation" is loosely defined here and does not necessarily mean a causal relation.

lavaan_sem <- 'eta1 =~ X1 + X2 + X3
               eta2 =~ Z1 + Z2 + Z3
               xi =~ Y1 + Y2 + Y3
               xi ~ eta1 + eta2'


# lavaan
lav_sem_fit <- cfa(lavaan_sem, data = dat, meanstructure = TRUE)
summary(lav_sem_fit)
lavaan 0.6.15 ended normally after 64 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        30

  Number of observations                           500

Model Test User Model:
                                                      
  Test statistic                                22.985
  Degrees of freedom                                24
  P-value (Chi-square)                           0.521

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  eta1 =~                                             
    X1                1.000                           
    X2                1.547    0.045   34.389    0.000
    X3                2.087    0.059   35.315    0.000
  eta2 =~                                             
    Z1                1.000                           
    Z2                1.511    0.051   29.545    0.000
    Z3                2.037    0.064   31.647    0.000
  xi =~                                               
    Y1                1.000                           
    Y2                1.539    0.022   70.267    0.000
    Y3                2.010    0.026   76.904    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  xi ~                                                
    eta1              1.023    0.056   18.201    0.000
    eta2              2.010    0.081   24.871    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  eta1 ~~                                             
    eta2             -0.024    0.040   -0.591    0.554

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .X1               -0.080    0.051   -1.587    0.113
   .X2               -0.084    0.072   -1.165    0.244
   .X3               -0.100    0.097   -1.034    0.301
   .Z1                0.029    0.045    0.650    0.515
   .Z2                0.001    0.063    0.018    0.986
   .Z3                0.057    0.082    0.692    0.489
   .Y1               -0.015    0.102   -0.143    0.886
   .Y2               -0.023    0.155   -0.151    0.880
   .Y3               -0.025    0.200   -0.123    0.902
    eta1              0.000                           
    eta2              0.000                           
   .xi                0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .X1                0.287    0.021   13.626    0.000
   .X2                0.251    0.029    8.763    0.000
   .X3                0.338    0.048    7.017    0.000
   .Z1                0.268    0.019   13.736    0.000
   .Z2                0.292    0.026   11.063    0.000
   .Z3                0.280    0.037    7.485    0.000
   .Y1                0.323    0.024   13.366    0.000
   .Y2                0.386    0.038   10.137    0.000
   .Y3                0.312    0.053    5.892    0.000
    eta1              0.992    0.080   12.471    0.000
    eta2              0.739    0.062   11.894    0.000
   .xi                0.946    0.078   12.056    0.000
# blavaan; mcmcfile = T saves the blavaan-generated Stan code to a file
blav_sem_fit <- bcfa(lavaan_sem, data = dat, mcmcfile = T)
Computing posterior predictives...
summary(blav_sem_fit)
blavaan (0.3-15) results of 1000 samples after 500 adapt/burnin iterations

  Number of observations                           500

  Number of missing patterns                         1

  Statistic                                 MargLogLik         PPP
  Value                                      -6227.716       0.541

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  eta1 =~                                                                      
    X1                1.000                                  NA                
    X2                1.552    0.046    1.467    1.645    1.000    normal(0,10)
    X3                2.094    0.060    1.979    2.215    1.000    normal(0,10)
  eta2 =~                                                                      
    Z1                1.000                                  NA                
    Z2                1.518    0.053    1.417    1.625    1.000    normal(0,10)
    Z3                2.047    0.067    1.915     2.18    1.001    normal(0,10)
  xi =~                                                                        
    Y1                1.000                                  NA                
    Y2                1.540    0.023    1.496    1.585    1.001    normal(0,10)
    Y3                2.010    0.026    1.959    2.064    1.000    normal(0,10)

Regressions:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  xi ~                                                                         
    eta1              1.025    0.058    0.916    1.148    1.001    normal(0,10)
    eta2              2.019    0.081    1.859    2.175    1.001    normal(0,10)

Covariances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  eta1 ~~                                                                      
    eta2             -0.023    0.041   -0.104    0.056    0.999       beta(1,1)

Intercepts:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .X1               -0.079    0.051   -0.178    0.025    1.000    normal(0,32)
   .X2               -0.081    0.074   -0.225    0.064    1.000    normal(0,32)
   .X3               -0.096    0.099   -0.285    0.099    1.001    normal(0,32)
   .Z1                0.028    0.046   -0.062    0.118    1.000    normal(0,32)
   .Z2                0.000    0.063   -0.121    0.125    1.000    normal(0,32)
   .Z3                0.055    0.082   -0.106    0.214    1.001    normal(0,32)
   .Y1               -0.013    0.104   -0.216    0.198    1.000    normal(0,32)
   .Y2               -0.022    0.158   -0.326     0.29    1.001    normal(0,32)
   .Y3               -0.025    0.204   -0.416    0.382    1.001    normal(0,32)
    eta1              0.000                                  NA                
    eta2              0.000                                  NA                
   .xi                0.000                                  NA                

Variances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .X1                0.291    0.022    0.251    0.336    1.000 gamma(1,.5)[sd]
   .X2                0.254    0.028    0.202     0.31    1.001 gamma(1,.5)[sd]
   .X3                0.340    0.048    0.253     0.44    1.000 gamma(1,.5)[sd]
   .Z1                0.271    0.020    0.233    0.314    1.000 gamma(1,.5)[sd]
   .Z2                0.295    0.027    0.246    0.351    0.999 gamma(1,.5)[sd]
   .Z3                0.283    0.037    0.214    0.362    0.999 gamma(1,.5)[sd]
   .Y1                0.328    0.025    0.281    0.382    0.999 gamma(1,.5)[sd]
   .Y2                0.392    0.038    0.321     0.47    0.999 gamma(1,.5)[sd]
   .Y3                0.314    0.052    0.214    0.415    1.000 gamma(1,.5)[sd]
    eta1              0.998    0.084    0.844    1.177    1.000 gamma(1,.5)[sd]
    eta2              0.741    0.063    0.626     0.87    1.000 gamma(1,.5)[sd]
   .xi                0.960    0.079    0.812    1.123    0.999 gamma(1,.5)[sd]

The output now has a new section, “Regressions”, giving the path coefficients among the latent variables, which can be interpreted as in linear regression.
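For instance, the posterior draws of these path coefficients can be summarized directly. The sketch below assumes that blavInspect(..., "mcmc") returns an mcmc.list whose column names use lavaan-style labels such as "xi~eta1"; check coef(blav_sem_fit) for the exact names in your version:

# sketch: credible intervals for the structural coefficients from the posterior draws
draws <- do.call(rbind, blavInspect(blav_sem_fit, "mcmc"))
reg_cols <- grep("^xi~eta", colnames(draws), value = TRUE)  # assumed labeling
apply(draws[, reg_cols, drop = FALSE], 2, quantile, probs = c(0.025, 0.5, 0.975))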

FIT3 <-
  semPlotModel_lavaanModel(lavaan_sem, auto.var = TRUE, auto.fix.first = TRUE)
semPaths(
  FIT3,
  what = "paths",
  whatLabels = "par",
  nodeLabels = c(
    expression(paste(x[1])),
    expression(paste(x[2])),
    expression(paste(x[3])),
    expression(paste(z[1])),
    expression(paste(z[2])),
    expression(paste(z[3])),
    expression(paste(y[1])),
    expression(paste(y[2])),
    expression(paste(y[3])),
    expression(paste(eta[1])),
    expression(paste(eta[2])),
    expression(paste(xi))
  ),
  edge.label.cex = 0.6,
  edgeLabels = c(
    expression(paste(1)),
    expression(paste(lambda[21])),
    expression(paste(lambda[31])),
    expression(paste(1)),
    expression(paste(lambda[52])),
    expression(paste(lambda[62])),
    expression(paste(1)),
    expression(paste(lambda[8])),
    expression(paste(lambda[9])),
    expression(paste(gamma[1])),
    expression(paste(gamma[2])),
    expression(paste(epsilon[1])),
    expression(paste(epsilon[2])),
    expression(paste(epsilon[3])),
    expression(paste(epsilon[4])),
    expression(paste(epsilon[5])),
    expression(paste(epsilon[6])),
    expression(paste(delta[1])),
    expression(paste(delta[2])),
    expression(paste(delta[3])),
    expression(paste(psi[1])),
    expression(paste(psi[2])),
    expression(paste(psi[3]))
  )
)

We again compare the estimates from lavaan and blavaan:

bind_cols(parameterEstimates(lav_sem_fit)[, 1:4],
          parameterEstimates(blav_sem_fit)[, 4]) %>% rename(ML_SEM = est, Bayes = ...5) %>% knitr::kable()
New names:
• `` -> `...5`
lhs op rhs ML_SEM Bayes
eta1 =~ X1 1.0000000 1.0000000
eta1 =~ X2 1.5468897 1.5518487
eta1 =~ X3 2.0867574 2.0940599
eta2 =~ Z1 1.0000000 1.0000000
eta2 =~ Z2 1.5111137 1.5179009
eta2 =~ Z3 2.0369639 2.0465995
xi =~ Y1 1.0000000 1.0000000
xi =~ Y2 1.5394007 1.5395212
xi =~ Y3 2.0096801 2.0102847
xi ~ eta1 1.0229308 1.0254598
xi ~ eta2 2.0102597 2.0188056
X1 ~~ X1 0.2870573 0.2908129
X2 ~~ X2 0.2508078 0.2538065
X3 ~~ X3 0.3376840 0.3404139
Z1 ~~ Z1 0.2678232 0.2712257
Z2 ~~ Z2 0.2922366 0.2952527
Z3 ~~ Z3 0.2798114 0.2833929
Y1 ~~ Y1 0.3233531 0.3278058
Y2 ~~ Y2 0.3862745 0.3917939
Y3 ~~ Y3 0.3124156 0.3137321
eta1 ~~ eta1 0.9924733 0.9980795
eta2 ~~ eta2 0.7392622 0.7408280
xi ~~ xi 0.9457461 0.9603592
eta1 ~~ eta2 -0.0236847 -0.0234200
X1 ~1 -0.0802587 -0.0786916
X2 ~1 -0.0844009 -0.0808092
X3 ~1 -0.0998296 -0.0958554
Z1 ~1 0.0291862 0.0283963
Z2 ~1 0.0011052 0.0000806
Z3 ~1 0.0565958 0.0549565
Y1 ~1 -0.0145914 -0.0132324
Y2 ~1 -0.0233656 -0.0220552
Y3 ~1 -0.0246824 -0.0246455
eta1 ~1 0.0000000 0.0000000
eta2 ~1 0.0000000 0.0000000
xi ~1 0.0000000 0.0000000

Bayesian Model Evaluation

When using SEMs to model multivariate linear relations in education, sociology, and psychology, evaluating model fit is usually considered to be a vital step before drawing sensible conclusions from the results.

Statistical fit indices provide a way to quantify how well our model fits the data, and there are many choices available in blavaan ranging from those specifically proposed in the SEM context to some more general, information-theoretic indices.

In frequentist SEM, many common fit indices are available in lavaan:

fitmeasures(lav_cfa_fit)
                 npar                  fmin                 chisq 
               19.000                 0.001                 1.553 
                   df                pvalue        baseline.chisq 
                8.000                 0.992              6005.574 
          baseline.df       baseline.pvalue                   cfi 
               15.000                 0.000                 1.000 
                  tli                  nnfi                   rfi 
                1.002                 1.002                 1.000 
                  nfi                  pnfi                   ifi 
                1.000                 0.533                 1.001 
                  rni                  logl     unrestricted.logl 
                1.001             -7909.932             -7909.155 
                  aic                   bic                ntotal 
            15857.863             15951.111              1000.000 
                 bic2                 rmsea        rmsea.ci.lower 
            15890.766                 0.000                 0.000 
       rmsea.ci.upper        rmsea.ci.level          rmsea.pvalue 
                0.000                 0.900                 1.000 
       rmsea.close.h0 rmsea.notclose.pvalue     rmsea.notclose.h0 
                0.050                 0.000                 0.080 
                  rmr            rmr_nomean                  srmr 
                0.005                 0.005                 0.002 
         srmr_bentler   srmr_bentler_nomean                  crmr 
                0.002                 0.002                 0.002 
          crmr_nomean            srmr_mplus     srmr_mplus_nomean 
                0.003                 0.002                 0.002 
                cn_05                 cn_01                   gfi 
             9989.555             12941.502                 0.999 
                 agfi                  pgfi                   mfi 
                0.998                 0.296                 1.003 
                 ecvi 
                0.040 

In Bayesian latent variable modeling, there are generally two versions of fit indices, depending on which version of the likelihood function the software uses. blavaan utilizes the marginal likelihood (marginalized over the latent variables), so fit indices such as the DIC and WAIC reflect the model’s predictive validity for a future person’s responses, while conditional likelihood-based fit indices concern future responses from the persons currently in our sample (see Merkle, Furr, and Rabe-Hesketh (2019) for a more detailed discussion). In almost all applications of SEM, the goal is to make inferences about an underlying population of individuals, so the conditional likelihood-based indices are not appropriate. blavaan uses the marginal version of the likelihood function by default, and we illustrate the difference in implementation below using the raw Stan syntax.

To assess how well a model fits the data with blavaan, we can check the fit indices in a similar fashion, taking the fitted CFA model as an example:

# NULL model needed for relative indices such as CFI, TLI, and NFI
cfa_null <-
  c(paste0("Y", 1:6, " ~~ Y", 1:6), paste0("Y", 1:6, " ~ 1"))
blav_cfa_null_fit <- bcfa(cfa_null, data = dat_cfa, mcmcfile = T)
Computing posterior predictives...
## The default method mimics fit indices derived from ML estimation
fitinx <-
  blavFitIndices(blav_cfa_fit, baseline.model = blav_cfa_null_fit)
fitinx
Posterior mean (EAP) of devm-based fit indices:

      BRMSEA    BGammaHat adjBGammaHat          BMc         BCFI         BTLI 
       0.003        1.000        0.999        1.000        1.000        1.002 
        BNFI 
       1.000 
summary(fitinx)

Posterior summary statistics and highest posterior density (HPD) 90% credible intervals for devm-based fit indices:

               EAP Median   MAP    SD lower upper
BRMSEA       0.003  0.000 0.000 0.009 0.000 0.018
BGammaHat    1.000  1.000 1.000 0.001 0.999 1.000
adjBGammaHat 0.999  1.000 1.000 0.003 0.997 1.000
BMc          1.000  1.000 1.000 0.001 0.999 1.000
BCFI         1.000  1.000 1.000 0.000 1.000 1.000
BTLI         1.002  1.002 1.003 0.002 0.999 1.005
BNFI         1.000  1.000 1.000 0.001 0.998 1.001
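
The summary method for blavFitIndices objects also allows other summary statistics and credible levels; the sketch below assumes the central.tendency and prob arguments available in recent blavaan versions.

# Sketch: posterior medians with 95% HPD intervals for the same indices
# (argument names assume a recent blavaan release)
summary(fitinx, central.tendency = "median", prob = .95)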

Notice that the fit indices available in both lavaan and blavaan tend to be similar here, just as the parameter estimates were similar in the examples above: the posterior means simply take the place of the maximum likelihood estimates of the same parameters, so the resulting indices are close.

BRMSEA was proposed by Hoofs et al. (2018), and Garnier-Villarreal and Jorgensen (2020) proposed Bayesian counterparts of many widely used indices such as the CFI, NFI, and TLI, which are available when specifying fit.measures = "all".
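
The sketch below spells out the corresponding call with the fit.measures argument made explicit; how it interacts with the defaults may depend on your installed blavaan version.

# Explicitly request the full set of devm-based indices
fitinx_all <- blavFitIndices(blav_cfa_fit,
                             baseline.model = blav_cfa_null_fit,
                             fit.measures = "all")
fitinx_all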

Fit measures based on information criteria, computed from the marginal likelihood, are also available:

fitmeasures(blav_cfa_fit)
      npar       logl        ppp        bic        dic      p_dic       waic 
    19.000  -7909.981      0.744  15951.190  15858.125     19.081  15858.403 
    p_waic    se_waic      looic      p_loo     se_loo margloglik 
    19.272    111.599  15858.456     19.299    111.600  -8007.416 
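
These criteria are most useful for comparing competing models, for which blavaan provides blavCompare(). In the sketch below, blav_cfa_fit2 is a hypothetical alternative model (e.g., a one-factor version fitted with bcfa()); it is not estimated in this tutorial.

# Sketch: compare two fitted blavaan models on WAIC/LOO
# `blav_cfa_fit2` is a hypothetical second model, not fitted above
blavCompare(blav_cfa_fit, blav_cfa_fit2)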

Data Example

In this section, we analyze an example dataset using blavaan. The dataset is publicly available through the lavaan.survey package. It contains Belgian school children’s responses to math self-efficacy and math self-concept questions, as well as measures of their math ability from the Programme for International Student Assessment (PISA) study conducted in 2003. Ferla, Valcke, and Cai (2009) investigated the association between several math-related constructs and math achievement. In this example, we construct a simplified model and investigate the relations among (negative) math self-efficacy, (negative) math self-concept, and math achievement.

  • math achievement is ‘measured’ by four plausible values (PVs). PVs are random draws from the empirical Bayes posterior distribution of each individual’s math achievement, given their item responses on a math test and background variables. The posterior distribution is based on an item response model with a latent regression.

  • (negative) math self-efficacy is measured by eight self-reported items on perceived ability to do a given math task, e.g., “Feel confident doing task:” 1 (very) - 4 (not at all).

  • (negative) math self-concept is measured by four self-reported items, e.g., “I have always believed that Mathematics is one of my best subjects” 1 (strongly agree) - 4 (strongly disagree).

library(lavaan.survey)
data(pisa.be.2003)
# Simplified version of Ferla et al. (2009) model.
model.pisa <- "
math =~ PV1MATH1 + PV1MATH2 + PV1MATH3 + PV1MATH4
neg.efficacy =~ ST31Q01 + ST31Q02 + ST31Q03 + ST31Q04 +
ST31Q05 + ST31Q06 + ST31Q07 + ST31Q08
neg.selfconcept =~ ST32Q04 + ST32Q06 + ST32Q07 + ST32Q09
math ~ neg.selfconcept + neg.efficacy
"

semPaths(semPlotModel_lavaanModel(model.pisa))

# Fit the model using lavaan
fit <- sem(model.pisa, data = pisa.be.2003)
summary(fit)
lavaan 0.6.15 ended normally after 87 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        35

                                                  Used       Total
  Number of observations                          7890        8796

Model Test User Model:
                                                      
  Test statistic                              5581.954
  Degrees of freedom                               101
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                     Estimate  Std.Err  z-value  P(>|z|)
  math =~                                               
    PV1MATH1            1.000                           
    PV1MATH2            1.110    0.005  204.520    0.000
    PV1MATH3            0.994    0.006  174.365    0.000
    PV1MATH4            0.964    0.006  170.700    0.000
  neg.efficacy =~                                       
    ST31Q01             1.000                           
    ST31Q02             1.238    0.032   38.580    0.000
    ST31Q03             1.425    0.035   40.178    0.000
    ST31Q04             1.174    0.032   37.176    0.000
    ST31Q05             1.379    0.035   39.729    0.000
    ST31Q06             1.390    0.035   39.397    0.000
    ST31Q07             1.396    0.036   38.635    0.000
    ST31Q08             1.058    0.031   33.716    0.000
  neg.selfconcept =~                                    
    ST32Q04             1.000                           
    ST32Q06             1.217    0.018   68.174    0.000
    ST32Q07             1.248    0.019   64.541    0.000
    ST32Q09             1.024    0.017   61.616    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  math ~                                              
    neg.selfconcpt    0.001    0.004    0.140    0.889
    neg.efficacy     -0.238    0.008  -30.632    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  neg.efficacy ~~                                     
    neg.selfconcpt    0.112    0.004   25.691    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .PV1MATH1          0.006    0.000   57.948    0.000
   .PV1MATH2          0.001    0.000   16.184    0.000
   .PV1MATH3          0.003    0.000   51.940    0.000
   .PV1MATH4          0.004    0.000   53.499    0.000
   .ST31Q01           0.462    0.008   58.312    0.000
   .ST31Q02           0.427    0.008   55.328    0.000
   .ST31Q03           0.442    0.008   53.216    0.000
   .ST31Q04           0.466    0.008   56.656    0.000
   .ST31Q05           0.445    0.008   53.895    0.000
   .ST31Q06           0.476    0.009   54.350    0.000
   .ST31Q07           0.539    0.010   55.268    0.000
   .ST31Q08           0.578    0.010   58.796    0.000
   .ST32Q04           0.341    0.006   52.790    0.000
   .ST32Q06           0.200    0.005   37.223    0.000
   .ST32Q07           0.335    0.007   46.470    0.000
   .ST32Q09           0.291    0.006   50.354    0.000
   .math              0.028    0.001   51.228    0.000
    neg.efficacy      0.172    0.008   22.910    0.000
    neg.selfconcpt    0.362    0.010   34.841    0.000
fitmeasures(fit)
                 npar                  fmin                 chisq 
               35.000                 0.354              5581.954 
                   df                pvalue        baseline.chisq 
              101.000                 0.000             89388.597 
          baseline.df       baseline.pvalue                   cfi 
              120.000                 0.000                 0.939 
                  tli                  nnfi                   rfi 
                0.927                 0.927                 0.926 
                  nfi                  pnfi                   ifi 
                0.938                 0.789                 0.939 
                  rni                  logl     unrestricted.logl 
                0.939            -73960.921            -71169.944 
                  aic                   bic                ntotal 
           147991.842            148235.909              7890.000 
                 bic2                 rmsea        rmsea.ci.lower 
           148124.686                 0.083                 0.081 
       rmsea.ci.upper        rmsea.ci.level          rmsea.pvalue 
                0.085                 0.900                 0.000 
       rmsea.close.h0 rmsea.notclose.pvalue     rmsea.notclose.h0 
                0.050                 0.996                 0.080 
                  rmr            rmr_nomean                  srmr 
                0.035                 0.035                 0.056 
         srmr_bentler   srmr_bentler_nomean                  crmr 
                0.056                 0.056                 0.060 
          crmr_nomean            srmr_mplus     srmr_mplus_nomean 
                0.060                 0.056                 0.056 
                cn_05                 cn_01                   gfi 
              178.333               194.606                 0.914 
                 agfi                  pgfi                   mfi 
                0.885                 0.679                 0.707 
                 ecvi 
                0.716 
# Fit the model using blavaan
bfit <- bsem(model.pisa, data = pisa.be.2003)
blavaan NOTE: Posterior predictives with missing data are currently very slow.
    Consider setting test="none".

Computing posterior predictives...
summary(bfit)
blavaan (0.3-15) results of 1000 samples after 500 adapt/burnin iterations

  Number of observations                          8796

  Number of missing patterns                       119

  Statistic                                 MargLogLik         PPP
  Value                                     -79503.055       0.000

Latent Variables:
                     Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  math =~                                                                        
    PV1MATH1            1.000                                  NA                
    PV1MATH2            1.118    0.005    1.108    1.128    1.000    normal(0,10)
    PV1MATH3            0.983    0.005    0.973    0.993    1.000    normal(0,10)
    PV1MATH4            1.003    0.005    0.993    1.014    1.000    normal(0,10)
  neg.efficacy =~                                                                
    ST31Q01             1.000                                  NA                
    ST31Q02             1.224    0.030    1.168    1.287    1.000    normal(0,10)
    ST31Q03             1.394    0.034    1.331    1.461    1.000    normal(0,10)
    ST31Q04             1.159    0.030    1.104    1.219    1.000    normal(0,10)
    ST31Q05             1.388    0.035    1.322    1.462    1.000    normal(0,10)
    ST31Q06             1.366    0.034    1.303    1.435    1.000    normal(0,10)
    ST31Q07             1.395    0.037    1.327    1.469    1.001    normal(0,10)
    ST31Q08             1.038    0.029    0.982    1.096    1.000    normal(0,10)
  neg.selfconcept =~                                                             
    ST32Q04             1.000                                  NA                
    ST32Q06             1.215    0.018    1.182     1.25    1.002    normal(0,10)
    ST32Q07             1.248    0.019    1.211    1.288    1.000    normal(0,10)
    ST32Q09             1.020    0.017    0.987    1.053    1.000    normal(0,10)

Regressions:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  math ~                                                                       
    neg.selfconcpt    0.005    0.004   -0.004    0.013    0.999    normal(0,10)
    neg.efficacy     -0.251    0.008   -0.267   -0.236    0.999    normal(0,10)

Covariances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  neg.efficacy ~~                                                              
    neg.selfconcpt    0.113    0.004    0.104    0.121    1.000       beta(1,1)

Intercepts:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .PV1MATH1          1.068    0.002    1.063    1.072    1.001    normal(0,32)
   .PV1MATH2          1.079    0.002    1.074    1.084    1.001    normal(0,32)
   .PV1MATH3          1.058    0.002    1.054    1.062    1.001    normal(0,32)
   .PV1MATH4          1.068    0.002    1.063    1.072    1.000    normal(0,32)
   .ST31Q01           1.888    0.009     1.87    1.905    1.001    normal(0,32)
   .ST31Q02           1.888    0.009    1.871    1.905    1.000    normal(0,32)
   .ST31Q03           2.145    0.010    2.127    2.164    1.001    normal(0,32)
   .ST31Q04           2.041    0.009    2.023    2.058    1.000    normal(0,32)
   .ST31Q05           1.726    0.010    1.706    1.745    1.000    normal(0,32)
   .ST31Q06           2.169    0.010     2.15    2.188    1.000    normal(0,32)
   .ST31Q07           2.159    0.010    2.139    2.178    1.000    normal(0,32)
   .ST31Q08           2.398    0.009    2.378    2.416    1.001    normal(0,32)
   .ST32Q04           2.362    0.009    2.345     2.38    0.999    normal(0,32)
   .ST32Q06           2.526    0.009    2.508    2.543    0.999    normal(0,32)
   .ST32Q07           2.908    0.010    2.888    2.928    0.999    normal(0,32)
   .ST32Q09           2.900    0.009    2.883    2.917    0.999    normal(0,32)
   .math              0.000                                  NA                
    neg.efficacy      0.000                                  NA                
    neg.selfconcpt    0.000                                  NA                

Variances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .PV1MATH1          0.007    0.000    0.006    0.007    1.000 gamma(1,.5)[sd]
   .PV1MATH2          0.001    0.000        0    0.001    1.001 gamma(1,.5)[sd]
   .PV1MATH3          0.004    0.000    0.004    0.004    1.000 gamma(1,.5)[sd]
   .PV1MATH4          0.004    0.000    0.004    0.005    1.001 gamma(1,.5)[sd]
   .ST31Q01           0.472    0.008    0.457    0.487    0.999 gamma(1,.5)[sd]
   .ST31Q02           0.431    0.008    0.416    0.446    1.000 gamma(1,.5)[sd]
   .ST31Q03           0.445    0.008     0.43    0.461    1.000 gamma(1,.5)[sd]
   .ST31Q04           0.469    0.008    0.454    0.485    1.000 gamma(1,.5)[sd]
   .ST31Q05           0.452    0.009    0.435    0.469    1.000 gamma(1,.5)[sd]
   .ST31Q06           0.479    0.008    0.462    0.496    1.000 gamma(1,.5)[sd]
   .ST31Q07           0.538    0.010    0.518    0.558    0.999 gamma(1,.5)[sd]
   .ST31Q08           0.581    0.010    0.563      0.6    1.000 gamma(1,.5)[sd]
   .ST32Q04           0.346    0.006    0.333    0.359    0.999 gamma(1,.5)[sd]
   .ST32Q06           0.202    0.005    0.191    0.213    1.000 gamma(1,.5)[sd]
   .ST32Q07           0.332    0.007    0.318    0.347    1.000 gamma(1,.5)[sd]
   .ST32Q09           0.294    0.006    0.283    0.305    1.001 gamma(1,.5)[sd]
   .math              0.030    0.001    0.029    0.031    1.000 gamma(1,.5)[sd]
    neg.efficacy      0.179    0.008    0.164    0.193    1.000 gamma(1,.5)[sd]
    neg.selfconcpt    0.360    0.010     0.34    0.381    1.002 gamma(1,.5)[sd]
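
Beyond the Rhat column in the summary, convergence diagnostics can be extracted directly. The sketch below assumes that the "rhat" and "neff" options of blavInspect() are available in your blavaan version.

# Sketch: convergence diagnostics for the free parameters
blavInspect(bfit, "rhat")  # potential scale reduction factors
blavInspect(bfit, "neff")  # effective sample sizes
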
fitmeasures(bfit)
      npar       logl        ppp        bic        dic      p_dic       waic 
    51.000 -79141.585      0.000 158746.349 158384.787     50.808 158388.655 
    p_waic    se_waic      looic      p_loo     se_loo margloglik 
    54.617    801.011 158388.792     54.685    801.013 -79503.055 
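
As the note printed after bsem() suggests, computing posterior predictives is slow when data are missing. If the posterior predictive p-value (PPP) is not needed, the model can be refit without it; a sketch:

# Skip the posterior predictive computation (faster with missing data)
bfit_fast <- bsem(model.pisa, data = pisa.be.2003, test = "none")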

As we can see, the estimates from lavaan and blavaan are similar. In particular, the latent construct “math achievement” is estimated to be negatively predicted by negative math self-efficacy (equivalently, positively associated with math self-efficacy), after accounting for negative math self-concept.
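
For a quick numerical check of this similarity, the free-parameter estimates from the two fits can be placed side by side. The sketch below matches parameters by name, because blavaan estimates intercepts by default and therefore has more free parameters than the lavaan fit.

# Sketch: ML point estimates vs. posterior means for shared parameters
ml_est  <- coef(fit)
bay_est <- coef(bfit)
shared  <- intersect(names(ml_est), names(bay_est))
round(cbind(ML = ml_est[shared], Bayes = bay_est[shared]), 3)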

Ferla, Johan, Martin Valcke, and Yonghong Cai. 2009. “Academic Self-Efficacy and Academic Self-Concept: Reconsidering Structural Relationships.” Learning and Individual Differences 19 (4): 499–505.
Garnier-Villarreal, Mauricio, and Terrence D Jorgensen. 2020. “Adapting Fit Indices for Bayesian Structural Equation Modeling: Comparison to Maximum Likelihood.” Psychological Methods 25 (1): 46.
Hoofs, Huub, Rens van de Schoot, Nicole WH Jansen, and IJmert Kant. 2018. “Evaluating Model Fit in Bayesian Confirmatory Factor Analysis with Large Samples: Simulation Study Introducing the BRMSEA.” Educational and Psychological Measurement 78 (4): 537–68.
Merkle, Edgar C, Ellen Fitzsimmons, James Uanhoro, and Ben Goodrich. 2020. “Efficient Bayesian Structural Equation Modeling in Stan.” Journal of Statistical Software.
Merkle, Edgar C, Daniel Furr, and Sophia Rabe-Hesketh. 2019. “Bayesian Comparison of Latent Variable Models: Conditional Versus Marginal Likelihoods.” Psychometrika 84 (3): 802–29.
Merkle, Edgar C, and Yves Rosseel. 2015. “Blavaan: Bayesian Structural Equation Models via Parameter Expansion.” Journal of Statistical Software.