For models fit using MCMC, compute approximate leave-one-out cross-validation (LOO, LOOIC) or, less preferably, the Widely Applicable Information Criterion (WAIC) using the loo package. Functions for \(K\)-fold cross-validation, model comparison, and model weighting/averaging are also provided. Note: these functions are not guaranteed to work properly unless the data argument was specified when the model was fit. Also, as of loo version 2.0.0 the default number of cores is now only 1, but we recommend using as many (or close to as many) cores as possible by setting the cores argument or using options(mc.cores = VALUE) to set it for an entire session.
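
For example, a minimal sketch of setting the number of cores for an entire session (parallel::detectCores is just one way to pick a value; choose whatever is appropriate for your machine):

options(mc.cores = parallel::detectCores())  # use (close to) all available cores for the session
# or set the number of cores per call, e.g. loo(fit, cores = 4)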

# S3 method for stanreg
loo(x, ..., cores = getOption("mc.cores", 1),
  save_psis = FALSE, k_threshold = NULL)

# S3 method for stanreg
waic(x, ...)

kfold(x, K = 10, save_fits = FALSE, folds = NULL)

compare_models(..., loos = list(), detail = FALSE)

# S3 method for stanreg_list
loo_model_weights(x, ..., cores = getOption("mc.cores",
  1), k_threshold = NULL)

Arguments

x

For loo, waic, and kfold methods, a fitted model object returned by one of the rstanarm modeling functions. See stanreg-objects.

For loo_model_weights, x should be a "stanreg_list" object, which is a list of fitted model objects created by stanreg_list.

...

For compare_models, ... should contain two or more objects returned by the loo, kfold, or waic method (see the Examples section, below).

For loo_model_weights, ... should contain arguments (e.g. method) to pass to the default loo_model_weights method from the loo package.

cores, save_psis

Passed to loo.

k_threshold

Threshold for flagging estimates of the Pareto shape parameters \(k\) estimated by loo. See the How to proceed when loo gives warnings section, below, for details.

K

For kfold, the number of subsets (folds) into which the data will be partitioned for performing \(K\)-fold cross-validation. The model is refit K times, each time leaving out one of the K folds. If K is equal to the total number of observations in the data then \(K\)-fold cross-validation is equivalent to exact leave-one-out cross-validation.

save_fits

For kfold, if TRUE, a component 'fits' is added to the returned object to store the cross-validated stanreg objects and the indices of the omitted observations for each fold. Defaults to FALSE.
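
A minimal sketch of using save_fits (fit1 is assumed to be a fitted stanreg object, as in the Examples below):

kf <- kfold(fit1, K = 10, save_fits = TRUE)
kf$fits  # the refitted stanreg objects and the indices of the omitted observations, one entry per fold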

folds

For kfold, an optional integer vector with one element per observation in the data used to fit the model. Each element of the vector is an integer in 1:K indicating to which of the K folds the corresponding observation belongs. There are some convenience functions available in the loo package that create integer vectors to use for this purpose (see the Examples section below and also the kfold-helpers page).

If folds is not specified then the default is to call loo::kfold_split_random to randomly partition the data into K subsets of equal (or as close to equal as possible) size.
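
For example, a sketch of supplying the folds manually via loo::kfold_split_random (the same helper kfold uses by default; fit1 is a fitted stanreg object as in the Examples below):

folds_random <- loo::kfold_split_random(K = 10, N = nrow(mtcars))  # integer vector with values in 1:10
kfold(fit1, K = 10, folds = folds_random)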

loos

For compare_models, a list of two or more objects returned by the loo, kfold, or waic method. This argument can be used as an alternative to passing these objects via ....

detail

For compare_models, if TRUE then extra information about each model (currently just the model formulas) will be printed with the output.

Value

The structure of the objects returned by the loo and waic methods is documented in detail in the Value sections of loo and waic (from the loo package).

kfold returns an object with classes 'kfold' and 'loo' whose structure is similar to that of the objects returned by the loo and waic methods.

compare_models returns a vector or matrix with class 'compare.loo'. See the Comparing models section below for more details.

Approximate LOO CV

The loo method for stanreg objects provides an interface to the loo package for approximate leave-one-out cross-validation (LOO). The LOO Information Criterion (LOOIC) has the same purpose as the Akaike Information Criterion (AIC) that is used by frequentists. Both are intended to estimate the expected log predictive density (ELPD) for a new dataset. However, the AIC ignores priors and assumes that the posterior distribution is multivariate normal, whereas the functions from the loo package do not make this distributional assumption and integrate over uncertainty in the parameters. The only assumption is that any one observation can be omitted without having a major effect on the posterior distribution, which can be judged using the diagnostic plot provided by the plot.loo method and the warnings provided by the print.loo method (see the How to Use the rstanarm Package vignette for an example of this process).
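
For example, a sketch of this diagnostic workflow (fit1 is a fitted stanreg object as in the Examples below):

loo1 <- loo(fit1, save_psis = TRUE)
print(loo1)                      # warns about any problematic Pareto k estimates
plot(loo1, label_points = TRUE)  # diagnostic plot of the pointwise Pareto k estimates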

How to proceed when loo gives warnings (k_threshold)

The k_threshold argument to the loo method for rstanarm models is provided as a possible remedy when the diagnostics reveal problems stemming from the posterior's sensitivity to particular observations. Warnings about Pareto \(k\) estimates indicate observations for which the approximation to LOO is problematic (this is described in detail in Vehtari, Gelman, and Gabry (2017) and the loo package documentation). The k_threshold argument can be used to set the \(k\) value above which an observation is flagged. If k_threshold is not NULL and there are \(J\) observations with \(k\) estimates above k_threshold, then when loo is called it will refit the original model \(J\) times, each time leaving out one of the \(J\) problematic observations. The pointwise contributions of these observations to the total ELPD are then computed directly and substituted for the previous estimates from these \(J\) observations that are stored in the object created by loo. Note: in the warning messages issued by loo about large Pareto \(k\) estimates, the recommendation is to set k_threshold to at least \(0.7\). There is a theoretical reason, explained in Vehtari, Gelman, and Gabry (2017), for setting the threshold to the stricter value of \(0.5\), but in practice the authors find that errors in the LOO approximation start to increase non-negligibly when \(k > 0.7\).
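
A minimal sketch of this remedy (fit is a placeholder for a fitted stanreg object; the value 0.7 follows the recommendation above):

loo_fit <- loo(fit, k_threshold = 0.7)
# observations with k > 0.7 trigger a refit of the model without that observation,
# and their exact pointwise ELPD contributions replace the problematic approximations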

K-fold CV

The kfold function performs exact \(K\)-fold cross-validation. First the data are randomly partitioned into \(K\) subsets of equal (or as close to equal as possible) size (unless the folds are specified manually). Then the model is refit \(K\) times, each time leaving out one of the \(K\) subsets. If \(K\) is equal to the total number of observations in the data then \(K\)-fold cross-validation is equivalent to exact leave-one-out cross-validation (to which loo is an efficient approximation). The compare_models function is also compatible with the objects returned by kfold.
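
As a sketch, when refitting the model once per observation is feasible, exact LOO can be obtained by setting K to the number of observations (fit1 is as in the Examples below):

kfold_exact <- kfold(fit1, K = nrow(mtcars))  # refits the model once per observation (32 times for mtcars)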

Comparing models

compare_models is a method for the compare function in the loo package that performs some extra checks to make sure the rstanarm models are suitable for comparison. These extra checks include verifying that all models to be compared were fit using the same outcome variable and likelihood family.

If exactly two models are being compared then compare_models returns a vector containing the difference in expected log predictive density (ELPD) between the models and the standard error of that difference (the documentation for compare in the loo package has additional details about the calculation of the standard error of the difference). The difference in ELPD will be negative if the expected out-of-sample predictive accuracy of the first model is higher. If the difference is positive then the second model is preferred.

If more than two models are being compared then compare_models returns a matrix with one row per model. This matrix summarizes the objects and arranges them in descending order according to expected out-of-sample predictive accuracy. That is, the first row of the matrix will be for the model with the largest ELPD (smallest LOOIC). The columns containing the ELPD difference and the standard error of the difference contain values relative to the model with the best ELPD. See the Details section at the compare page in the loo package for more information.

Model weights

The loo_model_weights method can be used to compute model weights for a "stanreg_list" object, which is a list of fitted model objects made with stanreg_list. The end of the Examples section has a demonstration. For details see the loo_model_weights documentation in the loo package.
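
For example, a sketch of requesting pseudo-BMA(+) weights rather than the default stacking weights by passing method through the ... argument (fit1, fit2, and fit3 are as in the Examples below; see the loo package documentation for the available methods):

model_list <- stanreg_list(fit1, fit2, fit3)
loo_model_weights(model_list, method = "pseudobma")  # default is method = "stacking"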

References

Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413--1432. doi:10.1007/s11222-016-9696-4. arXiv preprint: http://arxiv.org/abs/1507.04544/

Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018) Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication, doi:10.1214/17-BA1091. (online).

Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., and Gelman, A. (2018). Visualization in Bayesian workflow. Journal of the Royal Statistical Society Series A, accepted for publication. arXiv preprint: http://arxiv.org/abs/1709.01449.

Examples

fit1 <- stan_glm(mpg ~ wt, data = mtcars)
#> (Stan sampling progress output omitted)
fit2 <- stan_glm(mpg ~ wt + cyl, data = mtcars)
#> (Stan sampling progress output omitted)
# compare on LOOIC
# (for bigger models use as many cores as possible)
loo1 <- loo(fit1, cores = 2)
print(loo1)
#> 
#> Computed from 4000 by 32 log-likelihood matrix
#> 
#>          Estimate  SE
#> elpd_loo    -83.5 4.3
#> p_loo         3.3 1.2
#> looic       166.9 8.6
#> ------
#> Monte Carlo SE of elpd_loo is 0.1.
#> 
#> All Pareto k estimates are good (k < 0.5).
#> See help('pareto-k-diagnostic') for details.
loo2 <- loo(fit2, cores = 2)
print(loo2)
#> 
#> Computed from 4000 by 32 log-likelihood matrix
#> 
#>          Estimate  SE
#> elpd_loo    -78.8 4.7
#> p_loo         4.3 1.3
#> looic       157.5 9.3
#> ------
#> Monte Carlo SE of elpd_loo is 0.1.
#> 
#> Pareto k diagnostic values:
#>                         Count Pct.  Min. n_eff
#> (-Inf, 0.5]  (good)     30    93.8% 673
#> (0.5, 0.7]   (ok)        2     6.2% 251
#>   (0.7, 1]   (bad)       0     0.0% <NA>
#>   (1, Inf)   (very bad)  0     0.0% <NA>
#> 
#> All Pareto k estimates are ok (k < 0.7).
#> See help('pareto-k-diagnostic') for details.
# when comparing exactly two models, the reported 'elpd_diff'
# will be positive if the expected predictive accuracy for the
# second model is higher. the approximate standard error of the
# difference is also reported.
compare_models(loo1, loo2)
#> 
#> Model comparison:
#> (negative 'elpd_diff' favors 1st model, positive favors 2nd)
#> 
#> elpd_diff        se
#>       4.7       2.8
compare_models(loos = list(loo1, loo2)) # can also provide list
#> 
#> Model comparison:
#> (negative 'elpd_diff' favors 1st model, positive favors 2nd)
#> 
#> elpd_diff        se
#>       4.7       2.8
# when comparing three or more models they are ordered by
# expected predictive accuracy. elpd_diff and se_diff are relative
# to the model with best elpd_loo (first row)
fit3 <- stan_glm(mpg ~ disp * as.factor(cyl), data = mtcars)
#> (Stan sampling progress output omitted)
loo3 <- loo(fit3, cores = 2, k_threshold = 0.7)
#> All pareto_k estimates below user-specified threshold of 0.7.
#> Returning loo object.
compare_models(loo1, loo2, loo3)
#> 
#> Model comparison:
#> (ordered by highest ELPD)
#> 
#>      elpd_diff elpd_loo se_elpd_loo p_loo se_p_loo looic se_looic
#> fit3    0.0     -77.7       3.3       5.6    1.1   155.5    6.7
#> fit2   -1.0     -78.8       4.7       4.3    1.3   157.5    9.3
#> fit1   -5.7     -83.5       4.3       3.3    1.2   166.9    8.6
# setting detail=TRUE will also print model formulas
compare_models(loo1, loo2, loo3, detail=TRUE)
#> Model formulas: 
#>  fit1: mpg ~ wt
#>  fit2: mpg ~ wt + cyl
#>  fit3: mpg ~ disp * as.factor(cyl)
#> 
#> Model comparison:
#> (ordered by highest ELPD)
#> 
#>      elpd_diff elpd_loo se_elpd_loo p_loo se_p_loo looic se_looic
#> fit3    0.0     -77.7       3.3       5.6    1.1   155.5    6.7
#> fit2   -1.0     -78.8       4.7       4.3    1.3   157.5    9.3
#> fit1   -5.7     -83.5       4.3       3.3    1.2   166.9    8.6
# Computing model weights
model_list <- stanreg_list(fit1, fit2, fit3)
loo_model_weights(model_list, cores = 2)  # can specify k_threshold=0.7 if necessary
#> Method: stacking
#> ------
#>      weight
#> fit1 0.000 
#> fit2 0.477 
#> fit3 0.523 
# if you have already computed loo then it's more efficient to pass a list
# of precomputed loo objects than a "stanreg_list", avoiding the need
# for loo_model_weights to call loo() internally
loo_list <- list(fit1 = loo1, fit2 = loo2, fit3 = loo3)  # names optional (affects printing)
loo_model_weights(loo_list)
#> Method: stacking
#> ------
#>      weight
#> fit1 0.000 
#> fit2 0.477 
#> fit3 0.523 
# averaging predictions
wts <- loo_model_weights(loo_list)
yrep1 <- posterior_predict(fit1)
yrep2 <- posterior_predict(fit2)
yrep3 <- posterior_predict(fit3)
wt_avg_yrep <- wts[1] * yrep1 + wts[2] * yrep2 + wts[3] * yrep3

# 10-fold cross-validation
(kfold1 <- kfold(fit1, K = 10))
#> Fitting model 1 out of 10
#> Fitting model 2 out of 10
#> Fitting model 3 out of 10
#> Fitting model 4 out of 10
#> Fitting model 5 out of 10
#> Fitting model 6 out of 10
#> Fitting model 7 out of 10
#> Fitting model 8 out of 10
#> Fitting model 9 out of 10
#> Fitting model 10 out of 10
#> 
#> 10-fold cross-validation
#> 
#>            Estimate  SE
#> elpd_kfold    -84.7 5.4
kfold2 <- kfold(fit2, K = 10)
#> Fitting model 1 out of 10
#> Fitting model 2 out of 10
#> Fitting model 3 out of 10
#> Fitting model 4 out of 10
#> Fitting model 5 out of 10
#> Fitting model 6 out of 10
#> Fitting model 7 out of 10
#> Fitting model 8 out of 10
#> Fitting model 9 out of 10
#> Fitting model 10 out of 10
compare_models(kfold1, kfold2, detail=TRUE)
#> Model formulas: 
#>  fit1: mpg ~ wt
#>  fit2: mpg ~ wt + cyl
#> 
#> Model comparison:
#> (negative 'elpd_diff' favors 1st model, positive favors 2nd)
#> 
#> elpd_diff        se
#>       6.2       2.8
# Cross-validation stratifying by a grouping variable
# (note: might get some divergence warnings with this model but
# this is just intended as a quick example of how to code this)
library(loo)
#> This is loo version 2.0.0.
#> **NOTE: As of version 2.0.0 loo defaults to 1 core but we recommend using as many as possible.
#> Use the 'cores' argument or set options(mc.cores = NUM_CORES) for an entire session.
#> Visit mc-stan.org/loo/news for details on other changes.
#> 
#> Attaching package: ‘loo’
#> The following object is masked from ‘package:rstanarm’:
#> 
#>     nlist
fit4 <- stan_lmer(mpg ~ disp + (1|cyl), data = mtcars)
#> (Stan sampling progress output omitted)
#> Warning: There were 2 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See
#> http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> Warning: Examine the pairs() plot to diagnose sampling problems
table(mtcars$cyl)
#> 
#>  4  6  8 
#> 11  7 14 
folds_cyl <- kfold_split_stratified(K = 3, x = mtcars$cyl)
table(cyl = mtcars$cyl, fold = folds_cyl)
#>    fold
#> cyl  1  2  3
#>   4 11  0  0
#>   6  0  7  0
#>   8  0  0 14
kfold4 <- kfold(fit4, K = 3, folds = folds_cyl)
#> Fitting model 1 out of 3
#> Warning: There were 7 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See
#> http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> Warning: Examine the pairs() plot to diagnose sampling problems
#> Fitting model 2 out of 3
#> Warning: There were 12 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See
#> http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> Warning: Examine the pairs() plot to diagnose sampling problems
#> Fitting model 3 out of 3
#> Warning: There were 16 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See
#> http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> Warning: Examine the pairs() plot to diagnose sampling problems