Compute diagnostics for Pareto smoothing the tail draws of x, which replaces the tail draws with order statistics of a generalized Pareto distribution fit to the tail(s).
pareto_diags(x, ...)
# S3 method for default
pareto_diags(
x,
tail = c("both", "right", "left"),
r_eff = NULL,
ndraws_tail = NULL,
verbose = FALSE,
are_log_weights = FALSE,
...
)
# S3 method for rvar
pareto_diags(x, ...)
pareto_khat_threshold(x, ...)
# S3 method for default
pareto_khat_threshold(x, ...)
# S3 method for rvar
pareto_khat_threshold(x, ...)
pareto_min_ss(x, ...)
# S3 method for default
pareto_min_ss(x, ...)
# S3 method for rvar
pareto_min_ss(x, ...)
pareto_convergence_rate(x, ...)
# S3 method for default
pareto_convergence_rate(x, ...)
# S3 method for rvar
pareto_convergence_rate(x, ...)
x: (multiple options) One of: a matrix of draws for a single variable (iterations x chains), see extract_variable_matrix(); or an rvar.
...: Arguments passed to individual methods (if applicable).
tail: (string) The tail to diagnose/smooth:
"right": diagnose/smooth only the right (upper) tail
"left": diagnose/smooth only the left (lower) tail
"both": diagnose/smooth both tails and return the maximum k-hat value
The default is "both".
r_eff: (numeric) Relative effective sample size estimate. If r_eff is NULL, it will be calculated assuming the draws are from MCMC. Default is NULL.
ndraws_tail: (numeric) Number of draws for the tail. If ndraws_tail is not specified, it will be calculated as ceiling(3 * sqrt(length(x) / r_eff)) if length(x) > 225 and length(x) / 5 otherwise (see Appendix H in Vehtari et al. (2024)).
verbose: (logical) Should diagnostic messages be printed? If TRUE, messages related to Pareto diagnostics will be printed. Default is FALSE.
are_log_weights: (logical) Are the draws log weights? Default is FALSE. If TRUE, the computation takes into account that the draws are log weights, and only the right tail is smoothed.
List of Pareto smoothing diagnostics:
khat: estimated Pareto k shape parameter,
min_ss: minimum sample size for a reliable Pareto smoothed estimate,
khat_threshold: k-hat threshold for a reliable Pareto smoothed estimate,
convergence_rate: Pareto smoothed estimate RMSE convergence rate.
When the fitted Generalized Pareto Distribution is used to smooth the tail values and these smoothed values are used to compute expectations, the following diagnostics can give further information about the reliability of these estimates.
min_ss: Minimum sample size for a reliable Pareto smoothed estimate. If the actual sample size is greater than min_ss, then Pareto smoothed estimates can be considered reliable. If the actual sample size is lower than min_ss, increasing the sample size might result in more reliable estimates. For further details, see Section 3.2.3, Equation 11 in Vehtari et al. (2024).
khat_threshold: Threshold below which k-hat values result in reliable Pareto smoothed estimates. The threshold is lower for smaller sample sizes. If k-hat is larger than the threshold, increasing the total sample size may improve the reliability of estimates. For further details, see Section 3.2.4, Equation 13 in Vehtari et al. (2024).
convergence_rate: Relative convergence rate compared to the central limit theorem. It is applicable only if the actual sample size is sufficiently large (greater than min_ss). The convergence rate describes how fast the variance of an estimate decreases as the sample size increases, relative to the central limit theorem convergence rate. See Appendix B in Vehtari et al. (2024).
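The min_ss and khat_threshold diagnostics above, and the reliability rule built from them, can be sketched in base R. Three formulas are assumptions chosen because they reproduce the numbers in the example output on this page (400 draws):

- min_ss = 10^(1 / (1 - max(0, k)))
- khat_threshold = 1 - 1 / log10(S), with S the total number of draws
- a closed-form convergence rate that covers only k <= 0 and 0 < k < 0.5

None of them is copied from the package source. All function names are hypothetical.

```r
# Hypothetical helpers (not part of posterior). The formulas are assumptions
# that reproduce the example output shown on this page.

# Minimum sample size for a reliable Pareto smoothed estimate.
min_ss_sketch <- function(k) {
  if (k < 1) 10^(1 / (1 - max(0, k))) else Inf
}

# k-hat threshold; depends only on the total number of draws S.
khat_threshold_sketch <- function(S) {
  1 - 1 / log10(S)
}

# Relative RMSE convergence rate. Only k <= 0 (rate 1) and 0 < k < 0.5
# are sketched; any other k returns NA.
convergence_rate_sketch <- function(k, S) {
  if (k <= 0) return(1)
  if (k >= 0.5) return(NA_real_)
  num <- 2 * (k - 1) * S^(2 * k + 1) + (1 - 2 * k) * S^(2 * k) + S^2
  den <- (S - 1) * (S - S^(2 * k))
  max(0, num / den)
}

# Reliability rule from the text: k-hat below its threshold and
# more draws than min_ss.
is_reliable_sketch <- function(k, S) {
  k < khat_threshold_sketch(S) && S > min_ss_sketch(k)
}

k <- 0.1979001  # khat for "mu" in the example output
S <- 400        # 4 chains x 100 iterations
min_ss_sketch(k)               # approx. 17.6493
khat_threshold_sketch(S)       # approx. 0.6156891
convergence_rate_sketch(k, S)  # approx. 0.9858796
is_reliable_sketch(k, S)       # TRUE
```

Under these assumed formulas, the conditions S > min_ss and k < khat_threshold coincide for 0 <= k < 1. The reason is that 10^(1 / (1 - k)) > S rearranges to k > 1 - 1 / log10(S).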
Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao and Jonah Gabry (2024). Pareto Smoothed Importance Sampling. Journal of Machine Learning Research, 25(72):1-58.
pareto_khat, pareto_min_ss, pareto_khat_threshold, and pareto_convergence_rate for individual diagnostics; and pareto_smooth for Pareto smoothing draws.
Other diagnostics: ess_basic(), ess_bulk(), ess_quantile(), ess_sd(), ess_tail(), mcse_mean(), mcse_quantile(), mcse_sd(), pareto_khat(), rhat(), rhat_basic(), rhat_nested(), rstar()
mu <- extract_variable_matrix(example_draws(), "mu")
pareto_diags(mu)
#> $khat
#> [1] 0.1979001
#>
#> $min_ss
#> [1] 17.6493
#>
#> $khat_threshold
#> [1] 0.6156891
#>
#> $convergence_rate
#> [1] 0.9858796
#>
d <- as_draws_rvars(example_draws("multi_normal"))
pareto_diags(d$Sigma)
#> $khat
#> [,1] [,2] [,3]
#> [1,] 0.05601935 0.04156719 0.05091481
#> [2,] 0.04156719 0.10157218 0.06191862
#> [3,] 0.05091481 0.06191862 -0.08123058
#>
#> $min_ss
#> [,1] [,2] [,3]
#> [1,] 11.46420 11.05020 11.31478
#> [2,] 11.05020 12.97345 11.64141
#> [3,] 11.31478 11.64141 10.00000
#>
#> $khat_threshold
#> [,1] [,2] [,3]
#> [1,] 0.6156891 0.6156891 0.6156891
#> [2,] 0.6156891 0.6156891 0.6156891
#> [3,] 0.6156891 0.6156891 0.6156891
#>
#> $convergence_rate
#> [,1] [,2] [,3]
#> [1,] 0.9981412 0.9987187 0.9983542
#> [2,] 0.9987187 0.9957205 0.9978820
#> [3,] 0.9983542 0.9978820 1.0000000
#>