The kfold method performs exact \(K\)-fold cross-validation. First
the data are randomly partitioned into \(K\) subsets of equal size (or as close
to equal as possible), or the user can specify the folds argument
to determine the partitioning. Then the model is refit \(K\) times, each time
leaving out one of the \(K\) subsets. If \(K\) is equal to the total
number of observations in the data then \(K\)-fold cross-validation is
equivalent to exact leave-one-out cross-validation (to which loo
is an efficient approximation).
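The partition-and-refit procedure described above can be sketched in base R. This is only an illustration, not the rstanarm implementation: lm() stands in for the Stan model, the held-out score is a plug-in Gaussian log density rather than a posterior predictive density, and the names folds, elpd_pointwise, and sigma_k are hypothetical.

```r
# Illustrative sketch only: lm() stands in for the model that kfold() would refit.
set.seed(123)
dat <- mtcars
N <- nrow(dat)
K <- 10

# Randomly assign each observation to one of K folds of (nearly) equal size
folds <- sample(rep_len(seq_len(K), N))

# Refit K times, each time leaving out one fold and scoring the held-out rows
elpd_pointwise <- numeric(N)
for (k in seq_len(K)) {
  held_out <- folds == k
  fit_k <- lm(mpg ~ wt, data = dat[!held_out, ])
  mu <- predict(fit_k, newdata = dat[held_out, ])
  sigma_k <- summary(fit_k)$sigma
  # plug-in log density of the held-out observations under the refit model
  elpd_pointwise[held_out] <- dnorm(dat$mpg[held_out], mu, sigma_k, log = TRUE)
}

table(folds)          # fold sizes differ by at most one
sum(elpd_pointwise)   # plug-in analogue of elpd_kfold
```

Setting K to N in this sketch makes every fold a single observation, which is the leave-one-out case mentioned above.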
# S3 method for stanreg
kfold(
  x,
  K = 10,
  ...,
  folds = NULL,
  save_fits = FALSE,
  cores = getOption("mc.cores", 1)
)
x  A fitted model object returned by one of the rstanarm modeling functions. See stanreg-objects.

K  For 
...  Currently ignored. 
folds  For 
save_fits  For 
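cores  The number of cores to use for parallelization. Instead of fitting
separate Markov chains for the same model on different cores, by default
kfold spreads the K models to be fit across the cores (see the commented
examples at the end of this page).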

An object with classes 'kfold' and 'loo' that has a similar structure
as the objects returned by the loo
and waic
methods and is compatible with the loo_compare
function for
comparing models.
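The example output further down this page reports elpd_kfold, its standard error, and a kfoldic value of twice the magnitude. A minimal sketch of how such summaries can be derived from pointwise held-out log densities is given here, assuming the usual convention of the loo package (sum of pointwise values, standard error from their sample variance, information criterion as -2 times the elpd). The vector elpd_i holds simulated stand-in values, not real kfold output.

```r
# Hypothetical pointwise held-out log densities (simulated stand-ins)
set.seed(1)
elpd_i <- dnorm(rnorm(32), log = TRUE)
N <- length(elpd_i)

# Summaries, assuming the loo-package convention
elpd_kfold    <- sum(elpd_i)
se_elpd_kfold <- sqrt(N * var(elpd_i))
kfoldic       <- -2 * elpd_kfold
se_kfoldic    <- 2 * se_elpd_kfold

round(c(elpd_kfold = elpd_kfold, se_elpd = se_elpd_kfold,
        kfoldic = kfoldic, se_ic = se_kfoldic), 1)
```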
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), 1413-1432. doi:10.1007/s11222-016-9696-4. arXiv preprint: http://arxiv.org/abs/1507.04544/
Yao, Y., Vehtari, A., Simpson, D., and Gelman, A. (2018). Using stacking to average Bayesian predictive distributions. Bayesian Analysis, advance publication. doi:10.1214/17-BA1091.
# \donttest{
fit1 <- stan_glm(mpg ~ wt, data = mtcars, refresh = 0)
fit2 <- stan_glm(mpg ~ wt + cyl, data = mtcars, refresh = 0)
fit3 <- stan_glm(mpg ~ disp * as.factor(cyl), data = mtcars, refresh = 0)

# 10-fold cross-validation
# (if possible also specify the 'cores' argument to use multiple cores)
(kfold1 <- kfold(fit1, K = 10))
#>
#> Based on 10-fold cross-validation
#>
#>            Estimate  SE
#> elpd_kfold    -83.6 4.2
#> p_kfold          NA  NA
#> kfoldic       167.2 8.5
kfold2 <- kfold(fit2, K = 10)
kfold3 <- kfold(fit3, K = 10)
loo_compare(kfold1, kfold2, kfold3)
#>      elpd_diff se_diff
#> fit3  0.0       0.0
#> fit2 -4.6       6.1
#> fit1 -6.0       5.0

# stratifying by a grouping variable
# (note: might get some divergences warnings with this model but
# this is just intended as a quick example of how to code this)
fit4 <- stan_lmer(mpg ~ disp + (1|cyl), data = mtcars, refresh = 0)
#> Warning: There were 5 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help. See
#> http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
#> Warning: Examine the pairs() plot to diagnose sampling problems
table(mtcars$cyl)
#>
#>  4  6  8
#> 11  7 14
folds_cyl <- loo::kfold_split_stratified(K = 3, x = mtcars$cyl)
table(cyl = mtcars$cyl, fold = folds_cyl)
#>    fold
#> cyl 1 2 3
#>   4 4 4 3
#>   6 2 2 3
#>   8 5 5 4
kfold4 <- kfold(fit4, folds = folds_cyl, cores = 2)
print(kfold4)
#>
#> Based on 3-fold cross-validation
#>
#>            Estimate  SE
#> elpd_kfold    -86.4 4.2
#> p_kfold          NA  NA
#> kfoldic       172.7 8.5
# }

# Example code demonstrating the different ways to specify the number
# of cores and how the cores are used
#
# options(mc.cores = NULL)
#
# # spread the K models over N_CORES cores (method 1)
# kfold(fit, K, cores = N_CORES)
#
# # spread the K models over N_CORES cores (method 2)
# options(mc.cores = N_CORES)
# kfold(fit, K)
#
# # fit K models sequentially using N_CORES cores for the Markov chains each time
# options(mc.cores = N_CORES)
# kfold(fit, K, cores = 1)