In this tutorial, we demonstrate how to use blavaan (Merkle and Rosseel 2015) to fit structural equation models (SEMs) and assess the resulting models. The newest version of blavaan is equipped with an efficient approach based on Stan (Merkle et al. 2020).

Confirmatory Factor Analysis (CFA)

As a measurement model and probably one of the most popular special cases of a SEM, CFA is often used to 1) validate a hypothesized factor structure among multiple variables, 2) estimate the correlation between factors, and 3) obtain factor scores. For example, consider a two-factor (\(\eta_{1j}, \eta_{2j}\)) model with each factor measured by six items (\(y_{1j}, \dots, y_{6j}\)) for person \(j\): \[ \underbrace{\left[\begin{array}{l} y_{1 j} \\ y_{2 j} \\ y_{3 j} \\ y_{4 j} \\ y_{5 j} \\ y_{6 j} \end{array}\right]}_{\boldsymbol{y}_{j}}=\underbrace{\left[\begin{array}{c} \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \beta_{4} \\ \beta_{5} \\ \beta_{6} \end{array}\right]}_{\boldsymbol{\beta}}+\underbrace{\left[\begin{array}{cc} 1 & 0 \\ \lambda_{21} & 0 \\ \lambda_{31} & 0 \\ 0 & 1 \\ 0 & \lambda_{52} \\ 0 & \lambda_{62} \end{array}\right]}_{\Lambda}\underbrace{\left[\begin{array}{l} \eta_{1 j} \\ \eta_{2 j} \end{array}\right]}_{\boldsymbol{\eta}_j}+\underbrace{\left[\begin{array}{c} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{array}\right]}_{\boldsymbol{\epsilon}_j} \]

\[ \boldsymbol{\epsilon}_{j} \sim N_{I}(\mathbf{0}, \mathbf{\Theta}) \]

\[ \boldsymbol{\eta}_{j} \sim N_{K}(\mathbf{0}, \boldsymbol{\Psi}), \] where the number of items or variables is \(I = 6\), the number of factors is \(K = 2\) and \(\mathbf{\Theta}\) is often assumed to be a diagonal matrix.

  • \(y_{ij}\) is the response of person \(j\) \((j=1,...,J)\) on item \(i\) \((i=1,...,I)\).

  • \(\beta_i\) is the intercept for item \(i\).

  • \(\eta_{kj}\) is the \(k\)th common factor for person \(j\).

  • \(\lambda_{ik}\) is the factor loading of item \(i\) on factor \(k\).

  • \(\epsilon_{ij}\) is the random error term for person \(j\) on item \(i\).

  • \(\boldsymbol{\Psi}\) is the variance-covariance matrix of the common factors \(\boldsymbol{\eta}_{j}\).

  • \(\mathbf{\Theta}\) is the variance-covariance matrix of the residuals (or unique factors) \(\boldsymbol{\epsilon}_{j}\).

Suppose the errors or residuals \(\epsilon_{ij}\) are independent of each other. Then:

\(\psi_{kk}\) is the variance of the \(k\)th factor, \(\psi_{kk'}\) is the covariance between the \(k\)th and \(k'\)th factors, \(\theta_{ii}\) is the variance of the \(i\)th residual, and \(\theta_{ii'}=0\) for \(i\neq i'\). Specifically,

\[ \boldsymbol{\Psi}=\mathrm{Cov}\begin{pmatrix} \eta_{1j} \\ \eta_{2j} \end{pmatrix}= \left[\begin{array}{cc} \psi_{1 1}&\psi_{1 2} \\ \psi_{2 1}&\psi_{2 2} \end{array}\right] \]

\[\mathbf{\Theta}=\mathrm{Cov}\begin{pmatrix} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{pmatrix}= \left[\begin{array}{cccccc} \theta_{1 1} &0&0&0&0&0\\ 0&\theta_{2 2}&0&0&0&0 \\ 0&0&\theta_{3 3}&0&0&0 \\ 0&0&0&\theta_{4 4}&0&0 \\ 0&0&0&0&\theta_{5 5}&0 \\ 0&0&0&0&0&\theta_{6 6} \end{array}\right] \]
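Under this model, the implied covariance matrix of the responses is \(\boldsymbol{\Sigma} = \Lambda \boldsymbol{\Psi} \Lambda^\top + \mathbf{\Theta}\). As a quick sanity check, we can compute it in base R (a sketch using the data-generating values from the simulation below):

```r
# Model-implied covariance of the responses: Sigma = Lambda Psi Lambda' + Theta
# (values taken from the simulation below; any conformable matrices work)
Lambda <- cbind(c(1, 1.5, 2, 0, 0, 0),
                c(0, 0, 0, 1, 1.5, 2))      # 6 x 2 loading matrix
Psi    <- matrix(c(1.0, 0.5,
                   0.5, 0.8), nrow = 2)     # factor covariance matrix
Theta  <- diag(0.3, nrow = 6)               # diagonal residual covariance
Sigma  <- Lambda %*% Psi %*% t(Lambda) + Theta
round(Sigma, 2)
```

For instance, the implied variance of \(y_{1j}\) is \(\psi_{11} + \theta_{11} = 1.3\), and the implied covariance between \(y_{1j}\) and \(y_{4j}\) (which load on different factors) is \(\psi_{12} = 0.5\).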

To get started, we load the following R packages:

knitr::opts_chunk$set(echo = TRUE, comment = "", cache = TRUE)
library(rstan)
library(knitr)
library(blavaan)
library(lavaan)
library(MASS)
library(mvtnorm)
library(tidyverse)
library(semPlot)
library(magrittr)
library(Matrix)
options(mc.cores = parallel::detectCores())

Simulation

To better illustrate the use of blavaan, we simulate data so that we know the data generating parameters. In our simulation, we set \(\beta_i = 0\), \(\psi_{11} = 1\), \(\psi_{12} = \psi_{21} =.5\), \(\psi_{22} = .8\), \(\lambda_{21} = 1.5\), \(\lambda_{31} = 2\), \(\lambda_{52} = 1.5\), \(\lambda_{62} = 2\), \(\theta_{ii} =.3\). We simulate data from the above model for \(J = 1000\) units.

# setup
J <- 1000
I <- 6
K <- 2
psi <- matrix(c(1, 0.5,
                0.5, 0.8), nrow = K)  

# loading matrix
Lambda <- cbind(c(1, 1.5, 2, 0, 0, 0), c(0, 0, 0, 1, 1.5, 2))

# error covariance
Theta <- diag(0.3, nrow = I)

# factor scores
eta <- mvrnorm(J, mu = c(0, 0), Sigma = psi)

# error term
epsilon <- mvrnorm(J, mu = rep(0, ncol(Theta)),Sigma = Theta)

dat <- tcrossprod(eta, Lambda) + epsilon
dat_cfa <- dat %>% as.data.frame() %>% setNames(paste0("Y", 1:6))

We define the model for lavaan as follows:

lavaan_cfa <- 'eta1 =~ Y1 + Y2 + Y3
               eta2 =~ Y4 + Y5 + Y6'

Two latent variables eta1 and eta2 are specified to be measured by three items each, denoted eta1 =~ Y1 + Y2 + Y3, and similarly for eta2. By not specifying other parts of the model, we assume by default that the error terms for items are uncorrelated with each other while the covariances between latent variables are free to be estimated.

We represent the CFA model in a path diagram and then fit the model by maximum likelihood estimation using the cfa function in the lavaan package. By convention, latent variables \(\eta_1\) and \(\eta_2\) are represented by circles, and observed variables \(Y_{1}\) to \(Y_{6}\) by rectangles. Straight arrows represent linear relations (here with coefficients given by the factor loadings \(\lambda\)), and double-headed arrows represent variances and covariances. We could make the diagram by simply using the function call semPaths(semPlotModel_lavaanModel(lavaan_cfa)). Below is the more complex syntax to display Greek letters, subscripts, etc.

FIT <-
  semPlotModel_lavaanModel(
    lavaan_cfa,
    auto.var = TRUE,
    auto.fix.first = TRUE,
    auto.cov.lv.x = TRUE
  )
semPaths(
  FIT,
  what = "paths",
  whatLabels = "par",
  nodeLabels = c(
    expression(paste(Y[1])),
    expression(paste(Y[2])),
    expression(paste(Y[3])),
    expression(paste(Y[4])),
    expression(paste(Y[5])),
    expression(paste(Y[6])),
    expression(paste(eta[1])),
    expression(paste(eta[2]))
  ),
  edge.label.cex = 0.8,
  edgeLabels = c(
    expression(paste(lambda[1])),
    expression(paste(lambda[2])),
    expression(paste(lambda[3])),
    expression(paste(lambda[4])),
    expression(paste(lambda[5])),
    expression(paste(lambda[6])),
    expression(paste("Covariance")),
    expression(paste(epsilon[1])),
    expression(paste(epsilon[2])),
    expression(paste(epsilon[3])),
    expression(paste(epsilon[4])),
    expression(paste(epsilon[5])),
    expression(paste(epsilon[6])),
    expression(paste(psi[1])),
    expression(paste(psi[2]))
  )
)

# lavaan
lav_cfa_fit <- cfa(lavaan_cfa, data = dat_cfa, meanstructure = TRUE)
summary(lav_cfa_fit, fit.measures = TRUE)
lavaan 0.6.15 ended normally after 29 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        19

  Number of observations                          1000

Model Test User Model:
                                                      
  Test statistic                                 1.553
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.992

Model Test Baseline Model:

  Test statistic                              6005.574
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000
  Tucker-Lewis Index (TLI)                       1.002

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -7909.932
  Loglikelihood unrestricted model (H1)      -7909.155
                                                      
  Akaike (AIC)                               15857.863
  Bayesian (BIC)                             15951.111
  Sample-size adjusted Bayesian (SABIC)      15890.766

Root Mean Square Error of Approximation:

  RMSEA                                          0.000
  90 Percent confidence interval - lower         0.000
  90 Percent confidence interval - upper         0.000
  P-value H_0: RMSEA <= 0.050                    1.000
  P-value H_0: RMSEA >= 0.080                    0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.002

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  eta1 =~                                             
    Y1                1.000                           
    Y2                1.448    0.032   45.487    0.000
    Y3                1.964    0.042   46.845    0.000
  eta2 =~                                             
    Y4                1.000                           
    Y5                1.513    0.036   41.878    0.000
    Y6                1.969    0.045   43.298    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  eta1 ~~                                             
    eta2              0.523    0.037   14.100    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Y1               -0.016    0.037   -0.435    0.664
   .Y2               -0.010    0.049   -0.194    0.846
   .Y3               -0.004    0.066   -0.055    0.956
   .Y4               -0.086    0.033   -2.581    0.010
   .Y5               -0.078    0.047   -1.662    0.097
   .Y6               -0.110    0.059   -1.852    0.064
    eta1              0.000                           
    eta2              0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .Y1                0.328    0.017   19.053    0.000
   .Y2                0.271    0.022   12.417    0.000
   .Y3                0.351    0.037    9.537    0.000
   .Y4                0.296    0.016   18.809    0.000
   .Y5                0.297    0.023   12.952    0.000
   .Y6                0.340    0.035    9.711    0.000
    eta1              1.025    0.059   17.262    0.000
    eta2              0.822    0.049   16.788    0.000

By default, lavaan uses maximum likelihood estimation. Estimates of the loadings \(\lambda_{ik}\) appear under “Latent Variables”; the estimated covariance between the common factors appears under “Covariances”; estimates of the intercepts \(\beta_i\) of the measurement models, together with the means of the common factors (fixed at 0, not estimated), appear under “Intercepts”; and estimates of the residual variances of the responses (variances \(\theta_{ii}\) of \(\epsilon_{ij}\)) and of the common factors (variances \(\psi_{kk}\) of \(\eta_{kj}\)) appear under “Variances.”

blavaan borrows its syntax from lavaan and is a wrapper for Bayesian estimation using Stan. The Stan object created by blavaan can be inspected using blavInspect().

By default, the bcfa function in blavaan uses an anchoring item for each factor for identification, so the factor loading of the first item that loads on a factor is fixed at 1. blavaan also includes intercepts for all observed variables by default, which is different from the default in lavaan (we specified meanstructure = TRUE in the cfa function call above so that the two parameterizations match).

# blavaan
blav_cfa_fit <- bcfa(lavaan_cfa, data=dat_cfa, mcmcfile = T)
Computing posterior predictives...
summary(blav_cfa_fit)
blavaan (0.3-15) results of 1000 samples after 500 adapt/burnin iterations

  Number of observations                          1000

  Number of missing patterns                         1

  Statistic                                 MargLogLik         PPP
  Value                                      -8007.416       0.744

Latent Variables:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  eta1 =~                                                                      
    Y1                1.000                                  NA                
    Y2                1.449    0.031     1.39    1.512    0.999    normal(0,10)
    Y3                1.967    0.042    1.887     2.05    1.000    normal(0,10)
  eta2 =~                                                                      
    Y4                1.000                                  NA                
    Y5                1.516    0.036    1.446    1.587    1.000    normal(0,10)
    Y6                1.973    0.044    1.888    2.062    1.000    normal(0,10)

Covariances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
  eta1 ~~                                                                      
    eta2              0.522    0.037    0.453    0.598    1.000     lkj_corr(1)

Intercepts:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .Y1               -0.015    0.038    -0.09     0.06    1.000    normal(0,32)
   .Y2               -0.009    0.051   -0.108    0.088    1.000    normal(0,32)
   .Y3               -0.002    0.067   -0.137     0.13    1.000    normal(0,32)
   .Y4               -0.086    0.034   -0.154   -0.022    1.000    normal(0,32)
   .Y5               -0.077    0.047    -0.17    0.014    1.001    normal(0,32)
   .Y6               -0.109    0.061    -0.23    0.009    1.001    normal(0,32)
    eta1              0.000                                  NA                
    eta2              0.000                                  NA                

Variances:
                   Estimate  Post.SD pi.lower pi.upper     Rhat    Prior       
   .Y1                0.330    0.017    0.297    0.365    1.000 gamma(1,.5)[sd]
   .Y2                0.272    0.022     0.23    0.318    1.000 gamma(1,.5)[sd]
   .Y3                0.351    0.039    0.279    0.431    1.000 gamma(1,.5)[sd]
   .Y4                0.298    0.016    0.269    0.333    1.000 gamma(1,.5)[sd]
   .Y5                0.299    0.023    0.255    0.345    1.000 gamma(1,.5)[sd]
   .Y6                0.342    0.036    0.272    0.412    1.000 gamma(1,.5)[sd]
    eta1              1.027    0.059    0.919    1.151    1.000 gamma(1,.5)[sd]
    eta2              0.823    0.049    0.729    0.921    0.999 gamma(1,.5)[sd]
fitmeasures(blav_cfa_fit)
      npar       logl        ppp        bic        dic      p_dic       waic 
    19.000  -7909.981      0.744  15951.190  15858.125     19.081  15858.403 
    p_waic    se_waic      looic      p_loo     se_loo margloglik 
    19.272    111.599  15858.456     19.299    111.600  -8007.416 

We can see that the output from blavaan resembles that from lavaan, although it reports posterior means and posterior standard deviations for the model parameters.

We present the maximum likelihood estimates from lavaan and the Bayesian estimates from blavaan next to each other for comparison. Since by default blavaan uses non-informative priors and the data set is large, the two sets of estimates are very similar:

parameterEstimates(lav_cfa_fit)[, 1:4] %>% mutate(Bayes = parameterEstimates(blav_cfa_fit)[, 4]) %>% rename(ML = est) %>% knitr::kable()
lhs op rhs ML Bayes
eta1 =~ Y1 1.0000000 1.0000000
eta1 =~ Y2 1.4479658 1.4494496
eta1 =~ Y3 1.9641409 1.9666819
eta2 =~ Y4 1.0000000 1.0000000
eta2 =~ Y5 1.5126562 1.5157573
eta2 =~ Y6 1.9690845 1.9727406
Y1 ~~ Y1 0.3276724 0.3296146
Y2 ~~ Y2 0.2709558 0.2724962
Y3 ~~ Y3 0.3507402 0.3512957
Y4 ~~ Y4 0.2962317 0.2984597
Y5 ~~ Y5 0.2972492 0.2991214
Y6 ~~ Y6 0.3398565 0.3415060
eta1 ~~ eta1 1.0248042 1.0265839
eta2 ~~ eta2 0.8222773 0.8226690
eta1 ~~ eta2 0.5225991 0.5223931
Y1 ~1 -0.0159813 -0.0152329
Y2 ~1 -0.0095513 -0.0086645
Y3 ~1 -0.0036208 -0.0020447
Y4 ~1 -0.0863222 -0.0856052
Y5 ~1 -0.0775607 -0.0767313
Y6 ~1 -0.1099804 -0.1088010
eta1 ~1 0.0000000 0.0000000
eta2 ~1 0.0000000 0.0000000

To check the settings for the underlying Stan program (i.e., the number of chains, the number of warm-up iterations, and the number of post-warm-up draws):

blavInspect(blav_cfa_fit, "mcobj")
Inference for Stan model: stanmarg.
3 chains, each with iter=1500; warmup=500; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

                 mean se_mean   sd     2.5%      25%      50%      75%    97.5%
ly_sign[1]       1.45    0.00 0.03     1.39     1.43     1.45     1.47     1.51
ly_sign[2]       1.97    0.00 0.04     1.89     1.94     1.97     2.00     2.05
ly_sign[3]       1.52    0.00 0.04     1.45     1.49     1.52     1.54     1.59
ly_sign[4]       1.97    0.00 0.04     1.89     1.94     1.97     2.00     2.06
Theta_var[1]     0.33    0.00 0.02     0.30     0.32     0.33     0.34     0.36
Theta_var[2]     0.27    0.00 0.02     0.23     0.26     0.27     0.29     0.32
Theta_var[3]     0.35    0.00 0.04     0.28     0.32     0.35     0.38     0.43
Theta_var[4]     0.30    0.00 0.02     0.27     0.29     0.30     0.31     0.33
Theta_var[5]     0.30    0.00 0.02     0.26     0.28     0.30     0.31     0.35
Theta_var[6]     0.34    0.00 0.04     0.27     0.32     0.34     0.37     0.41
Psi_cov[1]       0.52    0.00 0.04     0.45     0.50     0.52     0.55     0.60
Psi_var[1]       1.03    0.00 0.06     0.92     0.99     1.02     1.06     1.15
Psi_var[2]       0.82    0.00 0.05     0.73     0.79     0.82     0.86     0.92
Nu_free[1]      -0.02    0.00 0.04    -0.09    -0.04    -0.02     0.01     0.06
Nu_free[2]      -0.01    0.00 0.05    -0.11    -0.04    -0.01     0.03     0.09
Nu_free[3]       0.00    0.00 0.07    -0.14    -0.05     0.00     0.04     0.13
Nu_free[4]      -0.09    0.00 0.03    -0.15    -0.11    -0.09    -0.06    -0.02
Nu_free[5]      -0.08    0.00 0.05    -0.17    -0.11    -0.08    -0.04     0.01
Nu_free[6]      -0.11    0.00 0.06    -0.23    -0.15    -0.11    -0.07     0.01
lp__         -7971.62    0.09 3.19 -7978.79 -7973.49 -7971.28 -7969.34 -7966.40
             n_eff Rhat
ly_sign[1]    3120    1
ly_sign[2]    2922    1
ly_sign[3]    3137    1
ly_sign[4]    3137    1
Theta_var[1]  4755    1
Theta_var[2]  3581    1
Theta_var[3]  3796    1
Theta_var[4]  4847    1
Theta_var[5]  4220    1
Theta_var[6]  4103    1
Psi_cov[1]    3418    1
Psi_var[1]    3003    1
Psi_var[2]    3284    1
Nu_free[1]    1591    1
Nu_free[2]    1514    1
Nu_free[3]    1529    1
Nu_free[4]    1769    1
Nu_free[5]    1597    1
Nu_free[6]    1564    1
lp__          1219    1

Samples were drawn using NUTS(diag_e) at Thu Apr 20 18:43:35 2023.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

We see that there were 3 chains with 1500 iterations each, the first 500 of which were warmup, giving a total of 3000 post-warmup draws. In the Stan output above, se_mean is the Monte Carlo error that arises because we use a finite number of posterior draws to empirically estimate posterior expectations. It is given by \(\frac{sd}{\sqrt{n_{eff}}}\), where sd is the posterior standard deviation, which can be viewed as a Bayesian counterpart of the frequentist standard error. Because the Monte Carlo draws are autocorrelated within chains, the effective sample size, n_eff, used to estimate the Monte Carlo error is smaller than the number of post-warmup draws. The effective sample size can be viewed as the number of independent draws that would give the same amount of information in terms of the Monte Carlo error, and the Monte Carlo error goes to zero as the number of draws goes to infinity. Rhat quantifies how well the different chains mix with each other (overlap) after warmup, with values close to 1 (conventionally below 1.1) indicating adequate mixing. The purpose of running multiple chains with different starting values for the parameters is to assess whether the chains have reached the stationary distribution, i.e., the required correct posterior distribution, by checking that the chains converge to the same distribution and hence mix with each other.
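The relation se_mean = sd / sqrt(n_eff) can be verified directly; a minimal sketch using simulated draws (for independent draws, n_eff simply equals the number of draws):

```r
# Monte Carlo standard error of a posterior mean: se_mean = sd / sqrt(n_eff).
# Autocorrelated chains have n_eff below the number of draws, inflating se_mean.
set.seed(1)
draws   <- rnorm(3000)          # stand-in for 3000 post-warmup posterior draws
sd_post <- sd(draws)            # posterior standard deviation
n_eff   <- length(draws)        # independent draws, so n_eff = 3000
se_mean <- sd_post / sqrt(n_eff)
se_mean                         # small relative to sd_post
```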

One benefit of running a Bayesian analysis is that we can incorporate prior information if we have it. blavaan allows users to specify priors for all the model parameters via the argument dp and uses non-informative priors (with flat densities/large variances) if this argument is left blank. When non-informative priors are used for all parameters, the Bayesian estimates will be similar to their maximum likelihood counterparts.

To check the default priors,

(default_prior <- dpriors())
               nu             alpha            lambda              beta 
   "normal(0,32)"    "normal(0,10)"    "normal(0,10)"    "normal(0,10)" 
            theta               psi               rho             ibpsi 
"gamma(1,.5)[sd]" "gamma(1,.5)[sd]"       "beta(1,1)" "wishart(3,iden)" 
              tau             delta 
"normal(0,10^.5)" "gamma(1,.5)[sd]" 

To change a prior, we can just substitute the desired one: for example, if we would like to change the prior for beta to be normal(0, 1), then we can form a new prior vector:

(new_prior <- dpriors(beta = "normal(0, 1)"))
               nu             alpha            lambda              beta 
   "normal(0,32)"    "normal(0,10)"    "normal(0,10)"    "normal(0, 1)" 
            theta               psi               rho             ibpsi 
"gamma(1,.5)[sd]" "gamma(1,.5)[sd]"       "beta(1,1)" "wishart(3,iden)" 
              tau             delta 
"normal(0,10^.5)" "gamma(1,.5)[sd]" 
# new cfa with updated prior for beta
# bcfa(lavaan_cfa, data=dat_cfa, dp = new_prior)

For further information on how to change the priors for individual parameters, check https://faculty.missouri.edu/~merklee/blavaan/prior.html.
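For example (an illustrative sketch; see the page above for the authoritative syntax), blavaan allows a prior to be attached to an individual parameter with a prior() modifier inside the model string:

```r
# Hypothetical example: give the loading of Y2 on eta1 its own prior,
# leaving all other parameters at the blavaan defaults.
lavaan_cfa_prior <- 'eta1 =~ Y1 + prior("normal(1.5, 0.5)")*Y2 + Y3
                     eta2 =~ Y4 + Y5 + Y6'
# blav_fit <- bcfa(lavaan_cfa_prior, data = dat_cfa)  # not run here
```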

Latent Growth Curve Models

As a popular method for modeling growth, in this section we discuss another special case of SEM: the latent growth curve model. This model is useful for describing change over time within individuals and variability among individual trajectories. A latent growth curve model for six occasions can be written as:

\[ \left[\begin{array}{l} y_{1 j} \\ y_{2 j} \\ y_{3 j} \\ y_{4 j} \\ y_{5 j} \\ y_{6 j} \end{array}\right]=\left[\begin{array}{ll} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \\ 1 & 5 \end{array}\right]\left[\begin{array}{l} \eta_{1 j} \\ \eta_{2 j} \end{array}\right]+\left[\begin{array}{l} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{array}\right]=\left[\begin{array}{l} \eta_{1 j}+0 \eta_{2 j}+\epsilon_{1 j} \\ \eta_{1 j}+1 \eta_{2 j}+\epsilon_{2 j} \\ \eta_{1 j}+2 \eta_{2 j}+\epsilon_{3 j} \\ \eta_{1 j}+3 \eta_{2 j}+\epsilon_{4 j} \\ \eta_{1 j}+4 \eta_{2 j}+\epsilon_{5 j} \\ \eta_{1 j}+5 \eta_{2 j}+\epsilon_{6 j} \end{array}\right] \]

\[ \boldsymbol{\epsilon}_{j} \sim N_{6}(\mathbf{0}, \mathbf{\Theta}), \]

where

  • \(y_{ij}\) is the response at occasion \(i\) for individual \(j\).

  • \(\eta_{1j}\) is an individual-specific intercept and \(\eta_{2j}\) is an individual-specific slope of time. These intercepts and slopes are latent variables that follow a bivariate normal distribution with free means and an unstructured covariance matrix.

  • \(\epsilon_{ij}\) is an occasion-specific error. The errors \(\epsilon_{1j}\) to \(\epsilon_{6j}\) have zero means and a diagonal \(6 \times 6\) covariance matrix:

\[\mathbf{\Theta}=\mathrm{Cov}\begin{pmatrix} \epsilon_{1 j} \\ \epsilon_{2 j} \\ \epsilon_{3 j} \\ \epsilon_{4 j} \\ \epsilon_{5 j} \\ \epsilon_{6 j} \end{pmatrix}= \left[\begin{array}{cccccc} \theta_{1 1} &0&0&0&0&0\\ 0&\theta_{2 2}&0&0&0&0 \\ 0&0&\theta_{3 3}&0&0&0 \\ 0&0&0&\theta_{4 4}&0&0 \\ 0&0&0&0&\theta_{5 5}&0 \\ 0&0&0&0&0&\theta_{6 6} \end{array}\right] \] The fixed loadings (0, 1, 2, 3, 4, 5) for the individual-specific slope represent the times associated with the occasions, so this kind of model works only if the time variable takes on the same set of values for all individuals. Here the occasions are additionally assumed to be equally spaced.
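Given the fixed loadings, the model-implied mean at occasion \(i\) is \(E(y_{ij}) = E(\eta_{1j}) + t_i\,E(\eta_{2j})\), a straight line in the occasion times \(t_i = 0, \dots, 5\). A quick sketch using the factor means from the simulation that follows (2 for the intercept, 1 for the slope):

```r
# Implied mean trajectory: E(y_i) = mean(intercept) + t_i * mean(slope)
times        <- 0:5   # fixed slope loadings = occasion times
mu_intercept <- 2     # factor means used in the simulation below
mu_slope     <- 1
implied_means <- mu_intercept + times * mu_slope
implied_means         # one implied mean per occasion: 2, 3, 4, 5, 6, 7
```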

We simulate data for 500 individuals based on the model above as follows:

N <- 500
psi <- matrix(c(1, 0.5,
                0.5, 1), nrow = 2)
# loading matrix
Lambda <- matrix(c(1, 1, 1, 1, 1, 1, 0:5), nrow = 6)

# error covariance
Theta <- diag(0.3, nrow = 6)

# exogenous latent variables
eta <- mvrnorm(N, mu = c(2, 1), Sigma = psi)

# error term
epsilon <- mvrnorm(N, mu = rep(0, ncol(Theta)), Sigma = Theta)

dat <- tcrossprod(eta, Lambda) + epsilon
dat <- dat %>% as.data.frame() %>% setNames(paste0("Y", 1:6))

To estimate the parameters of our model (visualized in the path diagram below) using lavaan, we define two latent variables: ri, the random intercept, and rc, the random slope (coefficient). The syntax 1*Y1 means that we fix the factor loading for Y1 to 1. We also constrain the residual variances of all the observed variables to be equal by writing Y1 ~~ c*Y1, which gives Y1 a residual variance labeled c to be estimated; using the same label for the other response variables forces their residual variances to equal c as well.

lavaan_lgm <- 'ri =~ 1*Y1 + 1*Y2 + 1*Y3 + 1*Y4 + 1*Y5 + 1*Y6
               rc =~ 0*Y1 + 1*Y2 + 2*Y3 + 3*Y4 + 4*Y5 + 5*Y6
               Y1 ~~ c*Y1
               Y2 ~~ c*Y2
               Y3 ~~ c*Y3
               Y4 ~~ c*Y4
               Y5 ~~ c*Y5
               Y6 ~~ c*Y6
               ri ~~ rc
               '

We can make a path diagram like this:

semPaths(semPlotModel_lavaanModel(lavaan_lgm))

To display Greek letters and subscripts, we can use the following syntax:

FIT2 <-
  semPlotModel_lavaanModel(
    lavaan_lgm,
    auto.var = TRUE,
    auto.fix.first = TRUE,
    auto.cov.lv.x = TRUE
  )
semPaths(
  FIT2,
  what = "paths",
  whatLabels = "par",
  nodeLabels = c(
    expression(paste(Y[1])),
    expression(paste(Y[2])),
    expression(paste(Y[3])),
    expression(paste(Y[4])),
    expression(paste(Y[5])),
    expression(paste(Y[6])),
    expression(paste(eta[1])),
    expression(paste(eta[2]))
  ),
  edge.label.cex = 0.6,
  edgeLabels = c(
    expression(paste(1)),
    expression(paste(1)),
    expression(paste(1)),
    expression(paste(1)),
    expression(paste(1)),
    expression(paste(1)),
    expression(paste(0)),
    expression(paste(1)),
    expression(paste(2)),
    expression(paste(3)),
    expression(paste(4)),
    expression(paste(5)),
    expression(paste(epsilon[1])),
    expression(paste(epsilon[2])),
    expression(paste(epsilon[3])),
    expression(paste(epsilon[4])),
    expression(paste(epsilon[5])),
    expression(paste(epsilon[6])),
    expression(paste("Covariance")),
    expression(paste(psi[1])),
    expression(paste(psi[2]))
  )
)