Pareto smoothing diagnostics

Compute diagnostics for Pareto smoothing of the tail draws of x, in which the tail draws are replaced with order statistics of a generalized Pareto distribution fitted to the tail(s).

pareto_diags(x, ...)

# S3 method for default
pareto_diags(
  x,
  tail = c("both", "right", "left"),
  r_eff = NULL,
  ndraws_tail = NULL,
  verbose = FALSE,
  are_log_weights = FALSE,
  ...
)

# S3 method for rvar
pareto_diags(x, ...)

pareto_khat_threshold(x, ...)

# S3 method for default
pareto_khat_threshold(x, ...)

# S3 method for rvar
pareto_khat_threshold(x, ...)

pareto_min_ss(x, ...)

# S3 method for default
pareto_min_ss(x, ...)

# S3 method for rvar
pareto_min_ss(x, ...)

pareto_convergence_rate(x, ...)

# S3 method for default
pareto_convergence_rate(x, ...)

# S3 method for rvar
pareto_convergence_rate(x, ...)

Arguments

x

(multiple options) One of:

A matrix of draws for a single variable (iterations x chains). See extract_variable_matrix().
An rvar.

...

Arguments passed to individual methods (if applicable).

tail

(string) The tail to diagnose/smooth:

"right": diagnose/smooth only the right (upper) tail
"left": diagnose/smooth only the left (lower) tail
"both": diagnose/smooth both tails and return the maximum k-hat value

The default is "both".

r_eff

(numeric) relative effective sample size estimate. If r_eff is NULL, it will be calculated assuming the draws are from MCMC. Default is NULL.

ndraws_tail

(numeric) number of draws for the tail. If ndraws_tail is not specified, it will be calculated as ceiling(3 * sqrt(length(x) / r_eff)) if length(x) > 225, and as length(x) / 5 otherwise (see Appendix H in Vehtari et al. (2024)).
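
The default rule can be written out as a small helper. This is a minimal sketch rather than the package's internal code: the function name default_ndraws_tail is hypothetical, and ndraws stands in for length(x).

```r
# Hypothetical helper illustrating the documented default for ndraws_tail.
# ndraws: total number of draws, i.e. length(x)
# r_eff:  relative effective sample size estimate
default_ndraws_tail <- function(ndraws, r_eff = 1) {
  if (ndraws > 225) {
    ceiling(3 * sqrt(ndraws / r_eff))
  } else {
    ndraws / 5
  }
}

default_ndraws_tail(400)               # 60
default_ndraws_tail(400, r_eff = 0.5)  # 85: lower r_eff gives a longer tail
default_ndraws_tail(100)               # 20: short-sample rule, length(x) / 5
```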

verbose

(logical) Should diagnostic messages be printed? If TRUE, messages related to Pareto diagnostics will be printed. Default is FALSE.

are_log_weights

(logical) Are the draws log weights? Default is FALSE. If TRUE, the computation takes into account that the draws are log weights, and only the right tail is smoothed.

Value

List of Pareto smoothing diagnostics:

khat: estimated Pareto k shape parameter,
min_ss: minimum sample size for reliable Pareto smoothed estimate,
khat_threshold: khat-threshold for reliable Pareto smoothed estimate,
convergence_rate: Pareto smoothed estimate RMSE convergence rate.

Details

When the fitted generalized Pareto distribution is used to smooth the tail values and these smoothed values are used to compute expectations, the following diagnostics give further information about the reliability of these estimates.

min_ss: Minimum sample size for reliable Pareto smoothed estimate. If the actual sample size is greater than min_ss, then Pareto smoothed estimates can be considered reliable. If the actual sample size is lower than min_ss, increasing the sample size might result in more reliable estimates. For further details, see Section 3.2.3, Equation 11 in Vehtari et al. (2024).
khat_threshold: Threshold below which k-hat values result in reliable Pareto smoothed estimates. The threshold is lower for smaller effective sample sizes. If k-hat is larger than the threshold, increasing the total sample size may improve reliability of estimates. For further details, see Section 3.2.4, Equation 13 in Vehtari et al. (2024).
convergence_rate: Relative convergence rate compared to the central limit theorem. Applicable only if the actual sample size is sufficiently large (greater than min_ss). The convergence rate tells the rate at which the variance of an estimate reduces when the sample size is increased, compared to the central limit theorem convergence rate. See Appendix B in Vehtari et al. (2024).
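
As a rough illustration, the first two diagnostics can be reproduced from k-hat and the number of draws. The sketch below assumes the closed forms min_ss = 10^(1 / (1 - max(0, k))) and khat_threshold = 1 - 1 / log10(S), where S is the total number of draws; these reproduce the values printed in the Examples, assuming 400 draws. The helper names are hypothetical, and this is not the package's internal code. convergence_rate has a longer expression and is omitted here.

```r
# Hypothetical helpers; the formulas are assumptions checked against the
# example output on this page, not copied from the package source.
sketch_min_ss <- function(khat) {
  k <- pmax(0, khat)                      # negative k-hat is treated as 0
  ifelse(k < 1, 10^(1 / (1 - k)), Inf)    # no finite size suffices if k-hat >= 1
}
sketch_khat_threshold <- function(ndraws) {
  1 - 1 / log10(ndraws)
}

khat <- 0.1979001   # k-hat reported for mu in the Examples
ndraws <- 400       # assumed: 100 iterations x 4 chains

min_ss <- sketch_min_ss(khat)                # about 17.65
threshold <- sketch_khat_threshold(ndraws)   # about 0.6157

# Both conditions hold here, so the smoothed estimate is considered reliable.
ndraws > min_ss && khat < threshold
```

Because the threshold depends only on the number of draws in this sketch, it is the same for every variable computed from the same set of draws, which is consistent with the constant khat_threshold matrix shown in the Examples.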

References

Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao and Jonah Gabry (2024). Pareto Smoothed Importance Sampling. Journal of Machine Learning Research, 25(72):1-58.

Examples

library(posterior)

mu <- extract_variable_matrix(example_draws(), "mu")
pareto_diags(mu)
#> $khat
#> [1] 0.1979001
#> 
#> $min_ss
#> [1] 17.6493
#> 
#> $khat_threshold
#> [1] 0.6156891
#> 
#> $convergence_rate
#> [1] 0.9858796
#> 

d <- as_draws_rvars(example_draws("multi_normal"))
pareto_diags(d$Sigma)
#> $khat
#>            [,1]       [,2]        [,3]
#> [1,] 0.05601935 0.04156719  0.05091481
#> [2,] 0.04156719 0.10157218  0.06191862
#> [3,] 0.05091481 0.06191862 -0.08123058
#> 
#> $min_ss
#>          [,1]     [,2]     [,3]
#> [1,] 11.46420 11.05020 11.31478
#> [2,] 11.05020 12.97345 11.64141
#> [3,] 11.31478 11.64141 10.00000
#> 
#> $khat_threshold
#>           [,1]      [,2]      [,3]
#> [1,] 0.6156891 0.6156891 0.6156891
#> [2,] 0.6156891 0.6156891 0.6156891
#> [3,] 0.6156891 0.6156891 0.6156891
#> 
#> $convergence_rate
#>           [,1]      [,2]      [,3]
#> [1,] 0.9981412 0.9987187 0.9983542
#> [2,] 0.9987187 0.9957205 0.9978820
#> [3,] 0.9983542 0.9978820 1.0000000
#>
