Estimate the Pareto k value by fitting a Generalized Pareto Distribution to one or both tails of x. The estimate can be used to gauge the number of fractional moments, which is useful for convergence diagnostics. For further details see Vehtari et al. (2024).

pareto_khat(x, ...)

# S3 method for default
pareto_khat(
  x,
  tail = c("both", "right", "left"),
  r_eff = NULL,
  ndraws_tail = NULL,
  verbose = FALSE,
  are_log_weights = FALSE,
  ...
)

# S3 method for rvar
pareto_khat(x, ...)

Arguments

x

(multiple options) One of:

...

Arguments passed to individual methods (if applicable).

tail

(string) The tail to diagnose/smooth:

  • "right": diagnose/smooth only the right (upper) tail

  • "left": diagnose/smooth only the left (lower) tail

  • "both": diagnose/smooth both tails and return the maximum k-hat value

The default is "both".
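As an illustrative sketch (assuming the posterior package is installed and using its bundled example_draws() data), the three tail options can be compared on the same draws. Per the description above, the "both" result is expected to match the larger of the two single-tail values; no output is shown here because it depends on the package version.

```r
library(posterior)

# Draws of "mu" from the package's example data (iterations x chains matrix)
mu <- extract_variable_matrix(example_draws(), "mu")

k_right <- pareto_khat(mu, tail = "right")  # upper tail only
k_left  <- pareto_khat(mu, tail = "left")   # lower tail only
k_both  <- pareto_khat(mu, tail = "both")   # default: both tails

# Compare the "both" result with the larger single-tail estimate
c(right = k_right, left = k_left, both = k_both,
  max_single = max(k_right, k_left))
```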

r_eff

(numeric) relative effective sample size estimate. If r_eff is NULL, it will be calculated assuming the draws are from MCMC. Default is NULL.

ndraws_tail

(numeric) number of draws for the tail. If ndraws_tail is not specified, it will be calculated as ceiling(3 * sqrt(length(x) / r_eff)) if length(x) > 225 and length(x) / 5 otherwise (see Appendix H in Vehtari et al. (2024)).
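The default rule above can be written out as a small function to see how the tail length grows with the number of draws. This is a sketch only: default_tail_length is a hypothetical helper name, not part of the package API, and it simply transcribes the formula stated in the argument description.

```r
# Hypothetical helper (not part of the package API) that mirrors the
# documented default for ndraws_tail.
default_tail_length <- function(ndraws, r_eff = 1) {
  if (ndraws > 225) {
    ceiling(3 * sqrt(ndraws / r_eff))
  } else {
    ndraws / 5
  }
}

default_tail_length(400, r_eff = 1)     # 3 * sqrt(400)  = 60
default_tail_length(400, r_eff = 0.25)  # 3 * sqrt(1600) = 120
default_tail_length(100)                # 100 / 5        = 20
```

A lower r_eff (more autocorrelated draws) yields a longer tail, since the same number of draws carries less independent information.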

verbose

(logical) Should diagnostic messages be printed? If TRUE, messages related to Pareto diagnostics will be printed. Default is FALSE.

are_log_weights

(logical) Are the draws log weights? Default is FALSE. If TRUE, the computation will take into account that the draws are log weights, and only the right tail will be smoothed.
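A minimal sketch of the log-weights case follows. The importance sampling setup (a standard normal proposal and a Student-t target with 3 degrees of freedom) is an assumption chosen purely for illustration and does not come from the package documentation. Because the draws are independent, r_eff = 1 is passed rather than relying on the MCMC-based default, and tail = "right" is given explicitly to match the behavior described above.

```r
library(posterior)

set.seed(1)

# Illustrative setup (an assumption): independent draws from a standard
# normal proposal, with a Student-t (df = 3) target density.
theta <- rnorm(1000)
log_w <- dt(theta, df = 3, log = TRUE) - dnorm(theta, log = TRUE)

# Diagnose the right tail of the importance weights, given on the log scale
k_w <- pareto_khat(log_w, tail = "right", r_eff = 1, are_log_weights = TRUE)
k_w
```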

Value

If the input is an array, returns a single numeric value. If any of the draws is non-finite, that is, NA, NaN, Inf, or -Inf, the returned output will be (numeric) NA. Also, if all draws within any of the chains of a variable are the same (constant), the returned output will be (numeric) NA as well. The reason for the latter is that, for constant draws, we cannot distinguish between variables that are supposed to be constant (e.g., a diagonal element of a correlation matrix is always 1) and variables that just happened to be constant because of a failure of convergence or other problems in the sampling process.
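The two NA cases described above can be checked with a short sketch, assuming the posterior package is installed. The calls are wrapped in suppressWarnings() in case the package emits a warning for such input; whether it does is not stated here and may vary by version.

```r
library(posterior)

set.seed(1)
x <- rnorm(1000)

# Constant draws: expected to return NA according to the text above
k_const <- suppressWarnings(pareto_khat(rep(1, 1000)))

# A single non-finite draw: also expected to return NA
x_bad <- x
x_bad[1] <- Inf
k_bad <- suppressWarnings(pareto_khat(x_bad))

c(constant = k_const, non_finite = k_bad)
```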

If the input is an rvar, returns an array of the same dimensions as the rvar, where each element is equal to the value that would be returned by passing the draws array for that element of the rvar to this function.

References

Aki Vehtari, Daniel Simpson, Andrew Gelman, Yuling Yao and Jonah Gabry (2024). Pareto Smoothed Importance Sampling. Journal of Machine Learning Research, 25(72):1-58.

See also

pareto_diags for additional related diagnostics, and pareto_smooth for Pareto smoothed draws.

Other diagnostics: ess_basic(), ess_bulk(), ess_quantile(), ess_sd(), ess_tail(), mcse_mean(), mcse_quantile(), mcse_sd(), pareto_diags(), rhat(), rhat_basic(), rhat_nested(), rstar()

Examples

mu <- extract_variable_matrix(example_draws(), "mu")
pareto_khat(mu)
#> [1] 0.1979001

d <- as_draws_rvars(example_draws("multi_normal"))
pareto_khat(d$Sigma)
#>            [,1]       [,2]        [,3]
#> [1,] 0.05601935 0.04156719  0.05091481
#> [2,] 0.04156719 0.10157218  0.06191862
#> [3,] 0.05091481 0.06191862 -0.08123058