Compute weighted expectations

The E_loo() function computes weighted expectations (means, variances, quantiles) using the importance weights obtained from the PSIS smoothing procedure. The expectations estimated by the E_loo() function assume that the PSIS approximation is working well. A small Pareto k estimate is necessary, but not sufficient, for E_loo() to give reliable estimates. Additional diagnostic checks for gauging the reliability of the estimates are in development and will be added in a future release.

E_loo(x, psis_object, ...)

# S3 method for default
E_loo(
  x,
  psis_object,
  ...,
  type = c("mean", "variance", "sd", "quantile"),
  probs = NULL,
  log_ratios = NULL
)

# S3 method for matrix
E_loo(
  x,
  psis_object,
  ...,
  type = c("mean", "variance", "sd", "quantile"),
  probs = NULL,
  log_ratios = NULL
)

Arguments

x: A numeric vector or matrix.
psis_object: An object returned by psis().
...: Arguments passed to individual methods.
type: The type of expectation to compute. The options are "mean", "variance", "sd", and "quantile".
probs: For computing quantiles, a vector of probabilities.
log_ratios: Optionally, a vector or matrix (the same dimensions as x) of raw (not smoothed) log ratios. If working with log-likelihood values, the log ratios are the negative of those values. If log_ratios is specified we are able to compute more accurate Pareto k diagnostics specific to E_loo().

Value

A named list with the following components:

value

The result of the computation.

For the matrix method, value is a vector with ncol(x) elements, with one exception: when type="quantile" and multiple values are specified in probs the value component of the returned object is a length(probs) by ncol(x) matrix.

For the default/vector method the value component is scalar, with one exception: when type="quantile" and multiple values are specified in probs the value component is a vector with length(probs) elements.

pareto_k

Function-specific diagnostic.

For the matrix method it will be a vector of length ncol(x) containing estimates of the shape parameter $k$ of the generalized Pareto distribution. For the default/vector method, the estimate is a scalar. If log_ratios is not specified when calling E_loo(), the smoothed log-weights are used to estimate Pareto-k's, which may produce optimistic estimates.

For type="mean", type="var", and type="sd", the returned Pareto-k is the maximum of the Pareto-k's for the left and right tail of $hr$ and the right tail of $r$ , where $r$ is the importance ratio and $h=x$ for type="mean" and $h=x^2$ for type="var" and type="sd". For type="quantile", the returned Pareto-k is the Pareto-k for the right tail of $r$ .

Examples

# \donttest{
# Use rstanarm package to quickly fit a model and get both a log-likelihood
# matrix and draws from the posterior predictive distribution
library("rstanarm")
#> Loading required package: Rcpp
#> This is rstanarm version 2.32.1
#> - See https://mc-stan.org/rstanarm/articles/priors for changes to default priors!
#> - Default priors may change, so it's safest to specify priors, even if equivalent to the defaults.
#> - For execution on a local, multicore CPU with excess RAM we recommend calling
#>   options(mc.cores = parallel::detectCores())

# data from help("lm")
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
d <- data.frame(
  weight = c(ctl, trt),
  group = gl(2, 10, 20, labels = c("Ctl","Trt"))
)
fit <- stan_glm(weight ~ group, data = d, refresh = 0)
yrep <- posterior_predict(fit)
dim(yrep)
#> [1] 4000   20

log_ratios <- -1 * log_lik(fit)
dim(log_ratios)
#> [1] 4000   20

r_eff <- relative_eff(exp(-log_ratios), chain_id = rep(1:4, each = 1000))
psis_object <- psis(log_ratios, r_eff = r_eff, cores = 2)

E_loo(yrep, psis_object, type = "mean")
#> Error in is_constant(x_i): could not find function "is_constant"
E_loo(yrep, psis_object, type = "var")
#> Error in is_constant(x_i): could not find function "is_constant"
E_loo(yrep, psis_object, type = "sd")
#> Error in is_constant(x_i): could not find function "is_constant"
E_loo(yrep, psis_object, type = "quantile", probs = 0.5) # median
#> $value
#>  [1] 5.143318 4.990419 5.045964 4.893522 5.078653 5.088694 4.995732 5.106827
#>  [9] 4.985569 5.044546 4.638359 4.719482 4.693008 4.770725 4.530231 4.727474
#> [17] 4.532588 4.644159 4.692495 4.642926
#> 
#> $pareto_k
#>  [1] 0.15656938 0.21659000 0.13761689 0.30380728 0.15870293 0.13844575
#>  [7] 0.13090888 0.13384083 0.20959799 0.11930881 0.18298672 0.21539893
#> [13] 0.07619255 0.26787598 0.33112990 0.27326698 0.42694305 0.16502194
#> [19] 0.11863410 0.12681907
#> 
E_loo(yrep, psis_object, type = "quantile", probs = c(0.1, 0.9))
#> $value
#>          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
#> [1,] 4.202947 4.024490 4.069320 3.975519 4.072362 4.100821 3.999408 4.086386
#> [2,] 6.058854 5.971274 6.083626 5.812561 6.065001 6.079247 5.974741 6.080411
#>          [,9]    [,10]    [,11]    [,12]    [,13]    [,14]    [,15]    [,16]
#> [1,] 3.974495 3.985170 3.685336 3.742536 3.720220 3.838895 3.661197 3.758993
#> [2,] 6.000139 6.057105 5.667853 5.731417 5.646776 5.683174 5.429682 5.687527
#>         [,17]    [,18]    [,19]    [,20]
#> [1,] 3.652822 3.657359 3.702788 3.663643
#> [2,] 5.394626 5.655873 5.676459 5.635945
#> 
#> $pareto_k
#>  [1] 0.15656938 0.21659000 0.13761689 0.30380728 0.15870293 0.13844575
#>  [7] 0.13090888 0.13384083 0.20959799 0.11930881 0.18298672 0.21539893
#> [13] 0.07619255 0.26787598 0.33112990 0.27326698 0.42694305 0.16502194
#> [19] 0.11863410 0.12681907
#> 

# We can get more accurate Pareto k diagnostic if we also provide
# the log_ratios argument
E_loo(yrep, psis_object, type = "mean", log_ratios = log_ratios)
#> Error in is_constant(x_i): could not find function "is_constant"
# }