Bayesian multivariate generalized linear models with correlated group-specific terms via Stan
Source: R/stan_mvmer.R
stan_mvmer.Rd

Bayesian inference for multivariate GLMs with group-specific coefficients that are assumed to be correlated across the GLM submodels.
Usage
stan_mvmer(
formula,
data,
family = gaussian,
weights,
prior = normal(autoscale = TRUE),
prior_intercept = normal(autoscale = TRUE),
prior_aux = cauchy(0, 5, autoscale = TRUE),
prior_covariance = lkj(autoscale = TRUE),
prior_PD = FALSE,
algorithm = c("sampling", "meanfield", "fullrank"),
adapt_delta = NULL,
max_treedepth = 10L,
init = "random",
QR = FALSE,
sparse = FALSE,
...
)

Arguments
- formula
A two-sided linear formula object describing both the fixed-effects and random-effects parts of the longitudinal submodel, similar in style to the formula specification in the lme4 package (see
glmer or the lme4 vignette for details). Note however that the double bar (||) notation is not allowed when specifying the random-effects parts of the formula, and neither are nested grouping factors (e.g. (1 | g1/g2) or (1 | g1:g2), where g1 and g2 are grouping factors). For a multivariate GLM this should be a list of such formula objects, with each element of the list providing the formula for one of the GLM submodels (a sketch showing how the list-valued arguments line up across submodels follows this argument list).
- data
A data frame containing the variables specified in
formula. For a multivariate GLM, this can be either a single data frame which contains the data for all GLM submodels, or it can be a list of data frames where each element of the list provides the data for one of the GLM submodels.
- family
The family (and possibly also the link function) for the GLM submodel(s). See
glmer for details. If fitting a multivariate GLM, then this can optionally be a list of families, in which case each element of the list specifies the family for one of the GLM submodels. In other words, a different family can be specified for each GLM submodel.
- weights
Same as in
glm, except that when fitting a multivariate GLM and a list of data frames is provided in data, then a corresponding list of weights must be provided. If weights are provided for one of the GLM submodels, then they must be provided for all GLM submodels.
- prior, prior_intercept, prior_aux
Same as in
stan_glmer, except that for a multivariate GLM a list of priors can be provided for any of the prior, prior_intercept or prior_aux arguments. That is, different priors can optionally be specified for each of the GLM submodels. If a list is not provided, then the same prior distributions are used for each GLM submodel. Note that the "product_normal" prior is not allowed for stan_mvmer.
- prior_covariance
Cannot be
NULL; see priors for more information about the prior distributions on covariance matrices. Note however that the default prior for covariance matrices in stan_mvmer is slightly different from that in stan_glmer (the details of which are described on the priors page).
- prior_PD
A logical scalar (defaulting to
FALSE) indicating whether to draw from the prior predictive distribution instead of conditioning on the outcome.
- algorithm
A string (possibly abbreviated) indicating the estimation approach to use. Can be
"sampling"for MCMC (the default),"optimizing"for optimization,"meanfield"for variational inference with independent normal distributions, or"fullrank"for variational inference with a multivariate normal distribution. Seerstanarm-packagefor more details on the estimation algorithms. NOTE: not all fitting functions support all four algorithms.- adapt_delta
Only relevant if
algorithm = "sampling". See the adapt_delta help page for details.
- max_treedepth
A positive integer specifying the maximum treedepth for the No-U-Turn Sampler (NUTS). See the
control argument in stan.
- init
The method for generating initial values. See
stan.
- QR
A logical scalar defaulting to
FALSE, but if TRUE a scaled qr decomposition is applied to the design matrix. The transformation does not change the likelihood of the data but is recommended for computational reasons when there are multiple predictors. See the QR-argument documentation page for details on how rstanarm does the transformation and important information about how to interpret the prior distributions of the model parameters when using QR = TRUE.
- sparse
A logical scalar (defaulting to
FALSE) indicating whether to use a sparse representation of the design (X) matrix. If TRUE, the design matrix is not centered (since that would destroy the sparsity) and likewise it is not possible to specify both QR = TRUE and sparse = TRUE. Depending on how many zeros there are in the design matrix, setting sparse = TRUE may make the code run faster and can consume much less RAM.
- ...
Further arguments passed to the function in the rstan package (
sampling, vb, or optimizing), corresponding to the estimation method named by algorithm. For example, if algorithm is "sampling" it is possible to specify iter, chains, cores, and other MCMC controls.
Another useful argument that can be passed to rstan via ... is refresh, which specifies how often to print updates when sampling (i.e., show the progress every refresh iterations). refresh = 0 turns off the iteration updates.
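To make the correspondence between list-valued arguments and submodels concrete, here is a sketch (not run on this page) of a two-submodel call; the specific prior choices are illustrative assumptions rather than the defaults:

library(rstanarm)
# Element k of each list-valued argument applies to the k-th submodel.
# The priors shown are arbitrary illustrative choices, not the defaults.
fit <- stan_mvmer(
  formula = list(
    logBili ~ year + (1 | id),            # submodel 1
    albumin ~ sex + year + (year | id)),  # submodel 2
  data = pbcLong,                         # one data frame shared by both submodels
  family = list(gaussian, gaussian),      # could differ, e.g. list(binomial, gaussian)
  prior = list(normal(0, 2, autoscale = TRUE),           # submodel 1 coefficients
               student_t(4, 0, 2.5, autoscale = TRUE)),  # submodel 2 coefficients
  prior_intercept = normal(0, 5, autoscale = TRUE),      # not a list: used for both submodels
  chains = 1, iter = 500, refresh = 100)  # passed via ... to rstan::sampling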
Value
A stanmvreg object is returned.
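The returned object can be inspected with rstanarm's print and summary methods; the sketch below also uses the lme4-style accessors, which (per the stanmvreg methods documentation) return one component per submodel. Treat the exact output structure as an assumption to check against that help page:

# Sketch, assuming 'fit' is a stanmvreg object returned by stan_mvmer
print(fit)      # brief overview of each submodel
summary(fit)    # posterior summaries and convergence diagnostics
fixef(fit)      # regression coefficient point estimates, per submodel
ranef(fit)      # group-specific parameter estimates, per submodel
VarCorr(fit)    # variance-covariance of the group-specific terms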
Details
The stan_mvmer function can be used to fit a multivariate
generalized linear model (GLM) with group-specific terms. The model consists
of distinct GLM submodels, each of which contains group-specific terms; within
a grouping factor (for example, patient ID) the group-specific terms are
assumed to be correlated across the different GLM submodels. It is
possible to specify a different outcome type (for example a different
family and/or link function) for each of the GLM submodels.
Bayesian estimation of the model is performed via MCMC, in the same way as
for stan_glmer. Also, similar to stan_glmer,
an unstructured covariance matrix is used for the group-specific terms
within a given grouping factor, with priors on the terms of a decomposition
of the covariance matrix. See priors for more information about
the prior distributions that are available for the covariance matrices,
the regression coefficients, and the intercept and auxiliary parameters.
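As an illustration of adjusting the covariance prior, the sketch below increases the LKJ regularization so that the prior on the correlations between the group-specific terms concentrates more mass near zero correlation; the lkj() argument names follow the priors help page and should be checked there:

# Sketch: a more regularized LKJ prior on the correlations between the
# group-specific terms (argument names as documented on the priors page)
fit_reg <- stan_mvmer(
  formula = list(
    logBili ~ year + (1 | id),
    albumin ~ sex + year + (year | id)),
  data = pbcLong,
  prior_covariance = lkj(regularization = 2, autoscale = TRUE),
  chains = 1, iter = 500)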
Examples
if (.Platform$OS.type != "windows" || .Platform$r_arch !="i386") {
# \donttest{
#####
# A multivariate GLM with two submodels. For the grouping factor 'id', the
# group-specific intercept from the first submodel (logBili) is assumed to
# be correlated with the group-specific intercept and linear slope in the
# second submodel (albumin)
f1 <- stan_mvmer(
formula = list(
logBili ~ year + (1 | id),
albumin ~ sex + year + (year | id)),
data = pbcLong,
# this next line is only to keep the example small in size!
chains = 1, cores = 1, seed = 12345, iter = 1000)
summary(f1)
#####
# A multivariate GLM with one bernoulli outcome and one
# gaussian outcome. We will artificially create the bernoulli
# outcome by dichotomising log serum bilirubin
pbcLong$ybern <- as.integer(pbcLong$logBili >= mean(pbcLong$logBili))
f2 <- stan_mvmer(
formula = list(
ybern ~ year + (1 | id),
albumin ~ sex + year + (year | id)),
data = pbcLong,
family = list(binomial, gaussian),
chains = 1, cores = 1, seed = 12345, iter = 1000)
# }
}
#> Fitting a multivariate glmer model.
#>
#> Please note the warmup may be much slower than later iterations!
#>
#> SAMPLING FOR MODEL 'mvmer' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 9.4e-05 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.94 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 3.733 seconds (Warm-up)
#> Chain 1: 2.1 seconds (Sampling)
#> Chain 1: 5.833 seconds (Total)
#> Chain 1:
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Fitting a multivariate glmer model.
#>
#> Please note the warmup may be much slower than later iterations!
#>
#> SAMPLING FOR MODEL 'mvmer' NOW (CHAIN 1).
#> Chain 1:
#> Chain 1: Gradient evaluation took 0.000104 seconds
#> Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.04 seconds.
#> Chain 1: Adjust your expectations accordingly!
#> Chain 1:
#> Chain 1:
#> Chain 1: Iteration: 1 / 1000 [ 0%] (Warmup)
#> Chain 1: Iteration: 100 / 1000 [ 10%] (Warmup)
#> Chain 1: Iteration: 200 / 1000 [ 20%] (Warmup)
#> Chain 1: Iteration: 300 / 1000 [ 30%] (Warmup)
#> Chain 1: Iteration: 400 / 1000 [ 40%] (Warmup)
#> Chain 1: Iteration: 500 / 1000 [ 50%] (Warmup)
#> Chain 1: Iteration: 501 / 1000 [ 50%] (Sampling)
#> Chain 1: Iteration: 600 / 1000 [ 60%] (Sampling)
#> Chain 1: Iteration: 700 / 1000 [ 70%] (Sampling)
#> Chain 1: Iteration: 800 / 1000 [ 80%] (Sampling)
#> Chain 1: Iteration: 900 / 1000 [ 90%] (Sampling)
#> Chain 1: Iteration: 1000 / 1000 [100%] (Sampling)
#> Chain 1:
#> Chain 1: Elapsed Time: 4.113 seconds (Warm-up)
#> Chain 1: 2.241 seconds (Sampling)
#> Chain 1: 6.354 seconds (Total)
#> Chain 1:
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess