The E_loo() function computes weighted expectations (means, variances,
quantiles) using the importance weights obtained from the
PSIS smoothing procedure. The expectations estimated by the
E_loo() function assume that the PSIS approximation is working well.
A small Pareto k estimate is necessary,
but not sufficient, for E_loo() to give reliable estimates. Additional
diagnostic checks for gauging the reliability of the estimates are in
development and will be added in a future release.
A numeric vector or matrix.
An object returned by psis().
Arguments passed to individual methods.
The type of expectation to compute. The options are
"mean", "variance", "sd", and "quantile".
For computing quantiles, a vector of probabilities.
Optionally, a vector or matrix (the same dimensions as x)
of raw (not smoothed) log ratios. If working with log-likelihood values,
the log ratios are the negative of those values. If log_ratios is
specified we are able to compute more accurate Pareto k
diagnostics specific to E_loo().
A named list with the following components:
valueThe result of the computation.
For the matrix method, value is a vector with ncol(x)
elements, with one exception: when type="quantile" and
multiple values are specified in probs the value component of
the returned object is a length(probs) by ncol(x) matrix.
For the default/vector method the value component is scalar, with
one exception: when type="quantile" and multiple values
are specified in probs the value component is a vector with
length(probs) elements.
pareto_kFunction-specific diagnostic.
For the matrix method it will be a vector of length ncol(x)
containing estimates of the shape parameter \(k\) of the
generalized Pareto distribution. For the default/vector method,
the estimate is a scalar. If log_ratios is not specified when
calling E_loo(), the smoothed log-weights are used to estimate
Pareto-k's, which may produce optimistic estimates.
For type="mean", type="var", and type="sd", the returned Pareto-k is
the maximum of the Pareto-k's for the left and right tail of \(hr\) and
the right tail of \(r\), where \(r\) is the importance ratio and
\(h=x\) for type="mean" and \(h=x^2\) for type="var" and
type="sd". For type="quantile", the returned Pareto-k is the Pareto-k
for the right tail of \(r\).
# \donttest{
# Use rstanarm package to quickly fit a model and get both a log-likelihood
# matrix and draws from the posterior predictive distribution
library("rstanarm")
#> Loading required package: Rcpp
#> This is rstanarm version 2.32.1
#> - See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
#> - Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores())
# data from help("lm")
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
d <- data.frame(
weight = c(ctl, trt),
group = gl(2, 10, 20, labels = c("Ctl","Trt"))
)
fit <- stan_glm(weight ~ group, data = d, refresh = 0)
yrep <- posterior_predict(fit)
dim(yrep)
#> [1] 4000 20
log_ratios <- -1 * log_lik(fit)
dim(log_ratios)
#> [1] 4000 20
r_eff <- relative_eff(exp(-log_ratios), chain_id = rep(1:4, each = 1000))
psis_object <- psis(log_ratios, r_eff = r_eff, cores = 2)
E_loo(yrep, psis_object, type = "mean")
#> Error in is_constant(x_i): could not find function "is_constant"
E_loo(yrep, psis_object, type = "var")
#> Error in is_constant(x_i): could not find function "is_constant"
E_loo(yrep, psis_object, type = "sd")
#> Error in is_constant(x_i): could not find function "is_constant"
E_loo(yrep, psis_object, type = "quantile", probs = 0.5) # median
#> $value
#> [1] 5.143318 4.990419 5.045964 4.893522 5.078653 5.088694 4.995732 5.106827
#> [9] 4.985569 5.044546 4.638359 4.719482 4.693008 4.770725 4.530231 4.727474
#> [17] 4.532588 4.644159 4.692495 4.642926
#>
#> $pareto_k
#> [1] 0.15656938 0.21659000 0.13761689 0.30380728 0.15870293 0.13844575
#> [7] 0.13090888 0.13384083 0.20959799 0.11930881 0.18298672 0.21539893
#> [13] 0.07619255 0.26787598 0.33112990 0.27326698 0.42694305 0.16502194
#> [19] 0.11863410 0.12681907
#>
E_loo(yrep, psis_object, type = "quantile", probs = c(0.1, 0.9))
#> $value
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 4.202947 4.024490 4.069320 3.975519 4.072362 4.100821 3.999408 4.086386
#> [2,] 6.058854 5.971274 6.083626 5.812561 6.065001 6.079247 5.974741 6.080411
#> [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16]
#> [1,] 3.974495 3.985170 3.685336 3.742536 3.720220 3.838895 3.661197 3.758993
#> [2,] 6.000139 6.057105 5.667853 5.731417 5.646776 5.683174 5.429682 5.687527
#> [,17] [,18] [,19] [,20]
#> [1,] 3.652822 3.657359 3.702788 3.663643
#> [2,] 5.394626 5.655873 5.676459 5.635945
#>
#> $pareto_k
#> [1] 0.15656938 0.21659000 0.13761689 0.30380728 0.15870293 0.13844575
#> [7] 0.13090888 0.13384083 0.20959799 0.11930881 0.18298672 0.21539893
#> [13] 0.07619255 0.26787598 0.33112990 0.27326698 0.42694305 0.16502194
#> [19] 0.11863410 0.12681907
#>
# We can get more accurate Pareto k diagnostic if we also provide
# the log_ratios argument
E_loo(yrep, psis_object, type = "mean", log_ratios = log_ratios)
#> Error in is_constant(x_i): could not find function "is_constant"
# }