Estimate the Pareto k value by fitting a generalized Pareto distribution (GPD) to one or both tails of x. The estimate can be used to gauge the number of finite fractional moments, which is useful for convergence diagnostics. For further details see Vehtari et al. (2024).
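As a rough illustration of the link between k and moments: for a generalized Pareto distribution with shape k > 0, moments of order below 1/k are finite, and for k <= 0 all moments are finite. The helper below is hypothetical (it is not part of the package) and only sketches that arithmetic.

```r
# Hypothetical helper, not a package function: the order below which
# moments are finite for a given k-hat (Inf when k-hat <= 0).
n_finite_moments <- function(khat) {
  ifelse(khat <= 0, Inf, 1 / khat)
}

n_finite_moments(0.1979001)  # roughly 5.05
n_finite_moments(0.7)        # roughly 1.43: finite mean, infinite variance
n_finite_moments(-0.08)      # Inf
```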
pareto_khat(x, ...)
# S3 method for default
pareto_khat(
  x,
  tail = c("both", "right", "left"),
  r_eff = NULL,
  ndraws_tail = NULL,
  verbose = FALSE,
  are_log_weights = FALSE,
  ...
)
# S3 method for rvar
pareto_khat(x, ...)
x
(multiple options) One of:
  A matrix of draws for a single variable (iterations x chains). See extract_variable_matrix().
  An rvar.
...
Arguments passed to individual methods (if applicable).
tail
(string) The tail to diagnose/smooth:
  "right": diagnose/smooth only the right (upper) tail
  "left": diagnose/smooth only the left (lower) tail
  "both": diagnose/smooth both tails and return the maximum k-hat value
The default is "both".
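A minimal sketch of diagnosing each tail separately, assuming the posterior package is installed and using its bundled example draws. The variable names k_right, k_left, and k_both are illustrative only.

```r
library(posterior)

# Draws for one variable as an iterations x chains matrix
mu <- extract_variable_matrix(example_draws(), "mu")

k_right <- pareto_khat(mu, tail = "right")  # upper tail only
k_left  <- pareto_khat(mu, tail = "left")   # lower tail only
k_both  <- pareto_khat(mu, tail = "both")   # default

# Per the description of "both", k_both should be the larger of the two
c(right = k_right, left = k_left, both = k_both)
```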
r_eff
(numeric) Relative effective sample size estimate. If r_eff is NULL, it will be calculated assuming the draws are from MCMC. Default is NULL.
ndraws_tail
(numeric) Number of draws for the tail. If ndraws_tail is not specified, it will be calculated as ceiling(3 * sqrt(length(x) / r_eff)) if length(x) > 225 and length(x) / 5 otherwise (see Appendix H in Vehtari et al. (2024)).
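The default tail size rule above can be worked through by hand. The function below is a hypothetical helper written only to mirror the documented formula. It is not the package's internal code, and the argument name n stands in for length(x).

```r
# Hypothetical helper mirroring the documented default for ndraws_tail.
# n is the total number of draws, r_eff the relative effective sample size.
default_ndraws_tail <- function(n, r_eff = 1) {
  if (n > 225) {
    ceiling(3 * sqrt(n / r_eff))
  } else {
    n / 5
  }
}

default_ndraws_tail(400)                # 3 * sqrt(400) = 60
default_ndraws_tail(100)                # 100 / 5 = 20
default_ndraws_tail(4000, r_eff = 0.5)  # ceiling(3 * sqrt(8000)) = 269
```

A smaller r_eff (more autocorrelation) enlarges the tail, so that the tail still carries a comparable amount of information.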
verbose
(logical) Should diagnostic messages be printed? If TRUE, messages related to Pareto diagnostics will be printed. Default is FALSE.
are_log_weights
(logical) Are the draws log weights? Default is FALSE. If TRUE, the computation takes into account that the draws are log weights, and only the right tail is smoothed.
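A minimal sketch of the log-weights case, assuming the posterior package is installed. The weights here are simulated and purely hypothetical. Because they are drawn independently, r_eff = 1 is passed explicitly rather than estimated as if the draws came from MCMC.

```r
library(posterior)

set.seed(1)
# Hypothetical log importance weights, arranged as iterations x chains
log_w <- matrix(rnorm(4000), nrow = 1000, ncol = 4)

# Only the right tail is diagnosed when are_log_weights = TRUE
k_lw <- pareto_khat(log_w, r_eff = 1, are_log_weights = TRUE)
k_lw
```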
If the input is an array, returns a single numeric value. If any of the draws is non-finite, that is, NA, NaN, Inf, or -Inf, the returned output will be (numeric) NA. Also, if all draws within any of the chains of a variable are the same (constant), the returned output will be (numeric) NA as well. The reason for the latter is that, for constant draws, we cannot distinguish between variables that are supposed to be constant (e.g., a diagonal element of a correlation matrix is always 1) and variables that just happened to be constant because of a failure of convergence or other problems in the sampling process.
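The NA cases described above can be checked with a short sketch, assuming the posterior package is installed. The names mu_bad and mu_const are illustrative, and the results noted in the comments are what the text above documents rather than verified output.

```r
library(posterior)

mu <- extract_variable_matrix(example_draws(), "mu")

# One non-finite draw
mu_bad <- mu
mu_bad[1, 1] <- Inf
k_bad <- pareto_khat(mu_bad)      # documented to be NA

# All draws identical (constant in every chain)
mu_const <- mu
mu_const[] <- 1
k_const <- pareto_khat(mu_const)  # documented to be NA
```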
If the input is an rvar, returns an array of the same dimensions as the rvar, where each element is equal to the value that would be returned by passing the draws array for that element of the rvar to this function.
Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao and Jonah Gabry (2024). Pareto Smoothed Importance Sampling. Journal of Machine Learning Research, 25(72):1-58.
pareto_diags for additional related diagnostics, and pareto_smooth for Pareto smoothed draws.
Other diagnostics: ess_basic(), ess_bulk(), ess_quantile(), ess_sd(), ess_tail(), mcse_mean(), mcse_quantile(), mcse_sd(), pareto_diags(), rhat(), rhat_basic(), rhat_nested(), rstar()
mu <- extract_variable_matrix(example_draws(), "mu")
pareto_khat(mu)
#> [1] 0.1979001
d <- as_draws_rvars(example_draws("multi_normal"))
pareto_khat(d$Sigma)
#>            [,1]       [,2]        [,3]
#> [1,] 0.05601935 0.04156719  0.05091481
#> [2,] 0.04156719 0.10157218  0.06191862
#> [3,] 0.05091481 0.06191862 -0.08123058