Compute PIT values using the empirical CDF, then refine values in the tails by fitting a generalized Pareto distribution (GPD) to the tail draws. This gives smoother, more accurate PIT values in the tails where the ECDF is coarse, and avoids PIT values of 0 and 1. Due to use of generalized Pareto distribution CDF in tails, the PIT values are not anymore rank based and continuous uniformity test is appropriate.
Usage
pareto_pit(x, y, ...)
# Default S3 method
pareto_pit(x, y, weights = NULL, log = FALSE, ndraws_tail = NULL, ...)
# S3 method for class 'draws_matrix'
pareto_pit(x, y, weights = NULL, log = FALSE, ndraws_tail = NULL, ...)
# S3 method for class 'rvar'
pareto_pit(x, y, weights = NULL, log = FALSE, ndraws_tail = NULL, ...)Arguments
- x
(draws) A
draws_matrixobject or one coercible to adraws_matrixobject, or anrvarobject.- y
(observations) A 1D vector, or an array of dim(x), if x is
rvar. Each element ofycorresponds to a variable inx.- ...
Arguments passed to individual methods (if applicable).
- weights
A matrix of weights for each draw and variable.
weightsshould have one column per variable inx, andndraws(x)rows.- log
(logical) Are the weights passed already on the log scale? The default is
FALSE, that is, expectingweightsto be on the standard (non-log) scale.- ndraws_tail
(integer) Number of tail draws to use for GPD fitting. If
NULL(the default), computed usingps_tail_length().
Value
A numeric vector of length length(y) containing the PIT values, or
an array of shape dim(y), if x is an rvar.
Details
The function first computes raw PIT values identically to
pit() (including support for weighted draws). It then fits a
GPD to both tails of the draws (using the same approach as
pareto_smooth()) and replaces PIT values for observations falling in
the tail regions:
For a right-tail observation \(y_i > c_R\) (where \(c_R\) is the right-tail cutoff):
$$PIT(y_i) = 1 - p_{tail}(1 - F_{GPD}(y_i; c_R, \sigma_R, k_R))$$
For a left-tail observation \(y_i < c_L\):
$$PIT(y_i) = p_{tail}(1 - F_{GPD}(-y_i; -c_L, \sigma_L, k_L))$$
where \(p_{tail}\) is the proportion of (weighted) mass in the tail.
When (log-)weights in weights are provided, they are used for
the raw PIT computation (as in pit()) and for GPD fit.
See also
pit() for the unsmoothed version, pareto_smooth() for
Pareto smoothing of draws.
Examples
x <- example_draws()
y <- rnorm(nvariables(x), 5, 5)
pareto_pit(x, y)
#> mu tau theta[1] theta[2] theta[3] theta[4] theta[5] theta[6]
#> 0.1117828 0.1575000 0.3200000 0.9680722 0.8300000 0.8707061 0.5875000 0.1074550
#> theta[7] theta[8]
#> 0.7025000 0.4475000