Efficient approximate leave-one-out cross-validation (LOO) using subsampling: a cheaper, more approximate computation is made for all N LOO folds, and the more costly, more accurate computation is made only for m < N LOO folds.
Source: R/loo_subsample.R
Usage
loo_subsample(x, ...)
# S3 method for class '`function`'
loo_subsample(
  x,
  ...,
  data = NULL,
  draws = NULL,
  observations = 400,
  log_p = NULL,
  log_g = NULL,
  r_eff = 1,
  save_psis = FALSE,
  cores = getOption("mc.cores", 1),
  loo_approximation = "plpd",
  loo_approximation_draws = NULL,
  estimator = "diff_srs",
  llgrad = NULL,
  llhess = NULL
)
Arguments
- x
A function. The Methods (by class) section, below, has detailed descriptions of how to specify the inputs.
- data, draws, ...
For loo_subsample.function(), these are the data, posterior draws, and other arguments to pass to the log-likelihood function. Note that for some loo_approximations, the draws will be replaced by summary statistics of the posterior to compute the loo approximations. See the loo_approximation argument for details.
- observations
The subsample observations to use. The argument can take four types of values:
NULL to use all observations. The algorithm then just uses standard loo() or loo_approximate_posterior().
A single integer to specify the number of observations to be subsampled.
A vector of integers to provide the indices used to subset the data. These observations need to be subsampled with the same scheme as given by the estimator argument.
A psis_loo_ss object to use the same observations that were used in a previous call to loo_subsample().
- log_p, log_g
Should be supplied only if approximate posterior draws are used. The default (NULL) indicates draws are from the "true" posterior (i.e. using MCMC). If not NULL then they should be specified as described in loo_approximate_posterior().
- r_eff
Vector of relative effective sample size estimates for the likelihood (exp(log_lik)) of each observation. This is related to the relative efficiency of estimating the normalizing term in self-normalized importance sampling when using posterior draws obtained with MCMC. If MCMC draws are used and r_eff is not provided then the reported PSIS effective sample sizes and Monte Carlo error estimates can be over-optimistic. If the posterior draws are (near) independent then r_eff = 1 can be used. r_eff has to be a scalar (the same value is used for all observations) or a vector with length equal to the number of observations. The default value is 1. See the relative_eff() helper functions for help computing r_eff.
- save_psis
Should the "psis" object created internally by loo_subsample() be saved in the returned object? See loo() for details.
- cores
The number of cores to use for parallelization. This defaults to the option mc.cores which can be set for an entire R session by options(mc.cores = NUMBER). The old option loo.cores is now deprecated but will be given precedence over mc.cores until loo.cores is removed in a future release. As of version 2.0.0 the default is now 1 core if mc.cores is not set, but we recommend using as many (or close to as many) cores as possible.
Note for Windows 10 users: it is strongly recommended to avoid using the .Rprofile file to set mc.cores (using the cores argument or setting mc.cores interactively or in a script is fine).
- loo_approximation
What type of approximation of the loo_i's should be used? The default is "plpd" (the log predictive density using the posterior expectation). Seven methods are implemented to approximate the loo_i's (see the references for more details):
"plpd": uses the lpd based on point estimates (i.e., \(p(y_i|\hat{\theta})\)).
"lpd": uses the lpds (i.e., \(p(y_i|y)\)).
"tis": uses truncated importance sampling to approximate PSIS-LOO.
"waic": uses waic (i.e., \(p(y_i|y) - p_{waic}\)).
"waic_grad_marginal": uses waic approximation using first order delta method and posterior marginal variances to approximate \(p_{waic}\) (i.e., \(p(y_i|\hat{\theta})\)-p_waic_grad_marginal). Requires gradient of likelihood function.
"waic_grad": uses waic approximation using first order delta method and posterior covariance to approximate \(p_{waic}\) (i.e., \(p(y_i|\hat{\theta})\)-p_waic_grad). Requires gradient of likelihood function.
"waic_hess": uses waic approximation using second order delta method and posterior covariance to approximate \(p_{waic}\) (i.e., \(p(y_i|\hat{\theta})\)-p_waic_grad). Requires gradient and Hessian of likelihood function.
As point estimates of \(\hat{\theta}\), the posterior expectations of the parameters are used.
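To make the difference between the cheapest approximations concrete, here is a minimal base-R sketch of the "plpd", "lpd" and "waic" quantities for a toy normal model. It is an illustration under stated assumptions, not the package's internal code: the data y, the stand-in posterior draws mu_draws and the helper log_mean_exp() are all hypothetical, and p_waic is taken to be the usual pointwise posterior variance of the log-likelihood.

```r
# Toy model (assumption): y_i ~ Normal(mu, 1) with unknown mean mu.
set.seed(2)
y <- rnorm(50, mean = 1)

# Stand-in for posterior draws of mu; in practice these come from MCMC.
mu_draws <- rnorm(400, mean = mean(y), sd = 1 / sqrt(length(y)))

# S x N matrix of pointwise log-likelihoods (rows: draws, columns: observations).
log_lik <- sapply(y, function(y_i) dnorm(y_i, mean = mu_draws, sd = 1, log = TRUE))

# "plpd": log density evaluated at the posterior mean of the parameters.
plpd <- dnorm(y, mean = mean(mu_draws), sd = 1, log = TRUE)

# "lpd": log of the density averaged over all draws (numerically stable).
log_mean_exp <- function(v) max(v) + log(mean(exp(v - max(v))))
lpd <- apply(log_lik, 2, log_mean_exp)

# "waic": lpd minus the pointwise posterior variance of the log-likelihood.
p_waic <- apply(log_lik, 2, var)
waic_i <- lpd - p_waic
```

The "plpd" values need only one likelihood evaluation per observation, whereas "lpd" and "waic" need one per draw and observation, which is why the point-estimate version is attractive for large data.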
- loo_approximation_draws
The number of posterior draws used when integrating over the posterior. This is used if loo_approximation is set to "lpd", "waic", or "tis".
- estimator
How should elpd_loo, p_loo and looic be estimated? The default is "diff_srs".
"diff_srs": uses the difference estimator with simple random sampling without replacement (srs). p_loo is estimated using standard srs. (Magnusson et al., 2020)
"hh": uses the Hansen-Hurwitz estimator with sampling with replacement proportional to size, where the absolute value (abs) of loo_approximation is used as size. (Magnusson et al., 2019)
"srs": uses simple random sampling and ordinary estimation.
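The idea behind the difference estimator can be sketched in a few lines of base R. This is a simplified illustration of the textbook survey-sampling estimator, not the package's implementation: the vectors elpd_exact and elpd_approx are simulated placeholders (in practice the exact values are known only for the subsample), and the standard error shown covers only the subsampling uncertainty.

```r
set.seed(3)
N <- 10000   # total number of observations
m <- 400     # subsample size

# Hypothetical pointwise values: "exact" LOO terms and a cheap approximation of them.
elpd_exact  <- rnorm(N, mean = -1.2, sd = 0.5)
elpd_approx <- elpd_exact + rnorm(N, mean = 0.02, sd = 0.05)

# Simple random sample without replacement.
idx <- sample.int(N, m)

# Difference estimator: sum of all approximations plus a scaled correction
# estimated from the subsample errors.
e <- elpd_exact[idx] - elpd_approx[idx]
est_diff <- sum(elpd_approx) + N / m * sum(e)
se_diff  <- sqrt(N^2 * (1 - m / N) * var(e) / m)

# Ordinary SRS estimator for comparison: ignores the approximations.
est_srs <- N * mean(elpd_exact[idx])
se_srs  <- sqrt(N^2 * (1 - m / N) * var(elpd_exact[idx]) / m)

c(diff_srs = se_diff, srs = se_srs)
```

When the approximation tracks the exact values closely, the errors e vary much less than the values themselves, so the difference estimator needs far fewer accurate computations for the same precision.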
- llgrad
The gradient of the log-likelihood. This is only used when loo_approximation is "waic_grad", "waic_grad_marginal", or "waic_hess". The default is NULL.
- llhess
The Hessian of the log-likelihood. This is only used with loo_approximation = "waic_hess". The default is NULL.
Value
loo_subsample() returns a named list with class c("psis_loo_ss", "psis_loo", "loo"). This has the same structure as objects returned by loo() but with the additional slot:
loo_subsampling: A list with two vectors, log_p and log_g, of the same length containing the posterior density and the approximation density for the individual draws.
Details
The loo_subsample() function is an S3 generic and a method is
currently provided for log-likelihood functions. The implementation works
for both MCMC and for posterior approximations where it is possible to
compute the log density for the approximation.
Methods (by class)
loo_subsample(`function`): A function f() that takes arguments data_i and draws and returns a vector containing the log-likelihood for a single observation i evaluated at each posterior draw. The function should be written such that, for each observation i in 1:N, evaluating
f(data_i = data[i, , drop = FALSE], draws = draws)
results in a vector of length S (size of posterior sample). The log-likelihood function can also have additional arguments but data_i and draws are required.
If using the function method then the arguments data and draws must also be specified in the call to loo_subsample():
data: A data frame or matrix containing the data (e.g. observed outcome and predictors) needed to compute the pointwise log-likelihood. For each observation i, the ith row of data will be passed to the data_i argument of the log-likelihood function.
draws: An object containing the posterior draws for any parameters needed to compute the pointwise log-likelihood. Unlike data, which is indexed by observation, for each observation the entire object draws will be passed to the draws argument of the log-likelihood function.
The ... can be used if your log-likelihood function takes additional arguments. These arguments are used like the draws argument in that they are recycled for each observation.
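As an illustration of the function method, the following sketch defines a log-likelihood function for a logistic regression and passes it to loo_subsample(). The data, the draws matrix (random numbers standing in for real MCMC output) and the name llfun_logistic are hypothetical and chosen only for the example; the call to loo_subsample() is guarded so the block also runs where the loo package is not installed.

```r
# Hypothetical data: binary outcome y, an intercept column and one predictor x.
set.seed(1)
N <- 1000
S <- 200
x <- rnorm(N)
data <- data.frame(y = rbinom(N, 1, plogis(0.5 + x)), intercept = 1, x = x)

# Stand-in for an S x 2 matrix of posterior draws (intercept, slope).
draws <- cbind(rnorm(S, 0.5, 0.05), rnorm(S, 1, 0.05))

# Log-likelihood of one observation (a one-row data frame) at every draw.
llfun_logistic <- function(data_i, draws) {
  x_i <- as.matrix(data_i[, c("intercept", "x"), drop = FALSE])
  eta <- draws %*% t(x_i)  # S x 1 matrix of linear predictors
  as.vector(dbinom(data_i$y, size = 1, prob = plogis(eta), log = TRUE))
}

# Check the contract: one value per posterior draw for a single observation.
ll_1 <- llfun_logistic(data_i = data[1, , drop = FALSE], draws = draws)
length(ll_1) == S

# Subsampled LOO using 100 of the N observations (requires the loo package).
if (requireNamespace("loo", quietly = TRUE)) {
  loo_ss <- loo::loo_subsample(
    llfun_logistic,
    data = data,
    draws = draws,
    observations = 100
  )
  print(loo_ss)
}
```

Checking the function on a single row before calling loo_subsample() is a cheap way to catch shape mistakes, since the same function is evaluated once per observation.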
References
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2019). Leave-One-Out Cross-Validation for Large Data. In Thirty-sixth International Conference on Machine Learning, PMLR 97:4244-4253.
Magnusson, M., Riis Andersen, M., Jonasson, J. and Vehtari, A. (2020). Leave-One-Out Cross-Validation for Model Comparison in Large Data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 108:341-351.