For models fit using MCMC, compute approximate leave-one-out
cross-validation (LOO, LOOIC) or, less preferably, the Widely Applicable
Information Criterion (WAIC) using the loo
package. (For \(K\)-fold cross-validation see `kfold.stanreg`

.)
Functions for model comparison, and model weighting/averaging are also
provided.

**Note**: these functions are not guaranteed to work
properly unless the `data`

argument was specified when the model was
fit. Also, as of loo version `2.0.0`

the default number of cores
is now only 1, but we recommend using as many (or close to as many) cores
as possible by setting the `cores`

argument or using
`options(mc.cores = VALUE)`

to set it for an entire session.

# S3 method for stanreg loo( x, ..., cores = getOption("mc.cores", 1), save_psis = FALSE, k_threshold = NULL ) # S3 method for stanreg waic(x, ...) # S3 method for stanreg loo_compare(x, ..., criterion = c("loo", "kfold", "waic"), detail = FALSE) # S3 method for stanreg_list loo_compare(x, ..., criterion = c("loo", "kfold", "waic"), detail = FALSE) # S3 method for stanreg_list loo_model_weights(x, ..., cores = getOption("mc.cores", 1), k_threshold = NULL) compare_models(..., loos = list(), detail = FALSE)

x | For For the |
---|---|

... | For For |

cores, save_psis | Passed to |

k_threshold | Threshold for flagging estimates of the Pareto shape
parameters \(k\) estimated by |

criterion | For |

detail | For |

loos | a list of objects produced by the |

The structure of the objects returned by `loo`

and `waic`

methods are documented in detail in the **Value** section in
`loo`

and `waic`

(from the loo
package).

`loo_compare`

returns a matrix with class 'compare.loo'. See the
**Comparing models** section below for more details.

The `loo`

method for stanreg objects
provides an interface to the loo package for
approximate leave-one-out cross-validation (LOO). The LOO Information
Criterion (LOOIC) has the same purpose as the Akaike Information Criterion
(AIC) that is used by frequentists. Both are intended to estimate the
expected log predictive density (ELPD) for a new dataset. However, the AIC
ignores priors and assumes that the posterior distribution is multivariate
normal, whereas the functions from the loo package do not make this
distributional assumption and integrate over uncertainty in the parameters.
This only assumes that any one observation can be omitted without having a
major effect on the posterior distribution, which can be judged using the
diagnostic plot provided by the `plot.loo`

method and the
warnings provided by the `print.loo`

method (see the
*How to Use the rstanarm Package* vignette for an example of this
process).

`loo`

gives warnings (k_threshold)The `k_threshold`

argument to the `loo`

method for rstanarm
models is provided as a possible remedy when the diagnostics reveal
problems stemming from the posterior's sensitivity to particular
observations. Warnings about Pareto \(k\) estimates indicate observations
for which the approximation to LOO is problematic (this is described in
detail in Vehtari, Gelman, and Gabry (2017) and the
loo package documentation). The
`k_threshold`

argument can be used to set the \(k\) value above
which an observation is flagged. If `k_threshold`

is not `NULL`

and there are \(J\) observations with \(k\) estimates above
`k_threshold`

then when `loo`

is called it will refit the
original model \(J\) times, each time leaving out one of the \(J\)
problematic observations. The pointwise contributions of these observations
to the total ELPD are then computed directly and substituted for the
previous estimates from these \(J\) observations that are stored in the
object created by `loo`

. Another option to consider is K-fold
cross-validation, which is documented on a separate page (see
`kfold`

).

**Note**: in the warning messages issued by `loo`

about large
Pareto \(k\) estimates we recommend setting `k_threshold`

to at
least \(0.7\). There is a theoretical reason, explained in Vehtari,
Gelman, and Gabry (2017), for setting the threshold to the stricter value
of \(0.5\), but in practice they find that errors in the LOO
approximation start to increase non-negligibly when \(k > 0.7\).

"loo" (or "waic" or "kfold") objects can be passed
to the `loo_compare`

function in the loo package to
perform model comparison. rstanarm also provides a
`loo_compare.stanreg`

method that can be used if the "loo" (or "waic"
or "kfold") object has been added to the fitted model object (see the
**Examples** section below for how to do this). This second method
allows rstanarm to perform some extra checks that can't be done by
the loo package itself (e.g., verifying that all models to be
compared were fit using the same outcome variable).

`loo_compare`

will return a matrix with one row per model and columns
containing the ELPD difference and the standard error of the difference. In
the first row of the matrix will be the model with the largest ELPD
(smallest LOOIC) and will contain zeros (there is no difference between
this model and itself). For each of the remaining models the ELPD
difference and SE are reported relative to the model with the best ELPD
(the first row). See the **Details** section at the
`loo_compare`

page in the loo package for more
information.

The `loo_model_weights`

method can be used to
compute model weights for a `"stanreg_list"`

object, which is a list
of fitted model objects made with `stanreg_list`

. The end of
the **Examples** section has a demonstration. For details see the
`loo_model_weights`

documentation in the loo
package.

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical
Bayesian model evaluation using leave-one-out cross-validation and WAIC.
*Statistics and Computing*. 27(5), 1413--1432.
doi:10.1007/s11222-016-9696-4. arXiv preprint:
http://arxiv.org/abs/1507.04544/

Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018) Using
stacking to average Bayesian predictive distributions. *Bayesian
Analysis*, advance publication, doi:10.1214/17-BA1091.
(online).

Gabry, J. , Simpson, D. , Vehtari, A. , Betancourt, M. and
Gelman, A. (2019), Visualization in Bayesian workflow. *J. R. Stat.
Soc. A*, 182: 389-402. doi:10.1111/rssa.12378,
(journal version,
arXiv preprint,
code on GitHub)

The new loo package vignettes and various rstanarm vignettes for more examples using

`loo`

and related functions with rstanarm models.`pareto-k-diagnostic`

in the loo package for more on Pareto \(k\) diagnostics.`log_lik.stanreg`

to directly access the pointwise log-likelihood matrix.

# \donttest{ fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0) fit2 <- stan_glm(mpg ~ wt + cyl, data = mtcars, refresh = 0) # (for bigger models use as many cores as possible) loo1 <- loo(fit1, cores = 2) print(loo1)#> #> Computed from 4000 by 32 log-likelihood matrix #> #> Estimate SE #> elpd_loo -83.6 4.3 #> p_loo 3.3 1.2 #> looic 167.1 8.6 #> ------ #> Monte Carlo SE of elpd_loo is 0.1. #> #> All Pareto k estimates are good (k < 0.5). #> See help('pareto-k-diagnostic') for details.#> #> Computed from 4000 by 32 log-likelihood matrix #> #> Estimate SE #> elpd_loo -78.6 4.6 #> p_loo 4.1 1.3 #> looic 157.1 9.2 #> ------ #> Monte Carlo SE of elpd_loo is 0.1. #> #> Pareto k diagnostic values: #> Count Pct. Min. n_eff #> (-Inf, 0.5] (good) 31 96.9% 711 #> (0.5, 0.7] (ok) 1 3.1% 456 #> (0.7, 1] (bad) 0 0.0% <NA> #> (1, Inf) (very bad) 0 0.0% <NA> #> #> All Pareto k estimates are ok (k < 0.7). #> See help('pareto-k-diagnostic') for details.# when comparing models the loo objects can be passed to loo_compare # as individual arguments or as a list of loo objects loo_compare(loo1, loo2)#> elpd_diff se_diff #> fit2 0.0 0.0 #> fit1 -5.0 2.8#> elpd_diff se_diff #> fit2 0.0 0.0 #> fit1 -5.0 2.8# if the fitted model objects contain a loo object in the component "loo" # then the model objects can be passed directly or as a stanreg_list fit1$loo <- loo1 fit2$loo <- loo2 loo_compare(fit1, fit2)#> Model comparison based on LOO-CV: #> elpd_diff se_diff #> fit2 0.0 0.0 #> fit1 -5.0 2.8# if the fitted model objects contain a loo object _and_ a waic or kfold # object, then the criterion argument determines which of them the comparison # is based on fit1$waic <- waic(fit1)#> Warning: #> 3 (9.4%) p_waic estimates greater than 0.4. We recommend trying loo instead.fit2$waic <- waic(fit2)#> Warning: #> 3 (9.4%) p_waic estimates greater than 0.4. We recommend trying loo instead.loo_compare(fit1, fit2, criterion = "waic")#> Model comparison based on WAIC: #> elpd_diff se_diff #> fit2 0.0 0.0 #> fit1 -5.0 2.8# the models can also be combined into a stanreg_list object, and more # informative model names can be provided to use when printing model_list <- stanreg_list(fit1, fit2, model_names = c("Fewer predictors", "More predictors")) loo_compare(model_list)#> Model comparison based on LOO-CV: #> elpd_diff se_diff #> More predictors 0.0 0.0 #> Fewer predictors -5.0 2.8fit3 <- stan_glm(mpg ~ disp * as.factor(cyl), data = mtcars, refresh = 0) loo3 <- loo(fit3, cores = 2, k_threshold = 0.7)#>#>loo_compare(loo1, loo2, loo3)#> elpd_diff se_diff #> fit3 0.0 0.0 #> fit2 -1.0 4.2 #> fit1 -6.0 5.1# setting detail=TRUE will also print model formulas if used with # loo_compare.stanreg or loo_compare.stanreg_list fit3$loo <- loo3 model_list <- stanreg_list(fit1, fit2, fit3) loo_compare(model_list, detail=TRUE)#> Model formulas: #> fit1: mpg ~ wt #> fit2: mpg ~ wt + cyl #> fit3: mpg ~ disp * as.factor(cyl) #> #> Model comparison based on LOO-CV: #> elpd_diff se_diff #> fit3 0.0 0.0 #> fit2 -1.0 4.2 #> fit1 -6.0 5.1# Computing model weights # # if the objects in model_list already have 'loo' components then those # will be used. otherwise loo will be computed for each model internally # (in which case the 'cores' argument may also be used and is passed to loo()) loo_model_weights(model_list) # defaults to method="stacking"#> Method: stacking #> ------ #> weight #> fit1 0.000 #> fit2 0.494 #> fit3 0.506loo_model_weights(model_list, method = "pseudobma")#> Method: pseudo-BMA+ with Bayesian bootstrap #> ------ #> weight #> fit1 0.034 #> fit2 0.396 #> fit3 0.570loo_model_weights(model_list, method = "pseudobma", BB = FALSE)#> Method: pseudo-BMA #> ------ #> weight #> fit1 0.002 #> fit2 0.273 #> fit3 0.725# you can also pass precomputed loo objects directly to loo_model_weights loo_list <- list(A = loo1, B = loo2, C = loo3) # names optional (affects printing) loo_model_weights(loo_list)#> Method: stacking #> ------ #> weight #> A 0.000 #> B 0.494 #> C 0.506# }