`sbc.Rd`

Check whether a model is well-calibrated with respect to the prior distribution and hence possibly amenable to obtaining a posterior distribution conditional on observed data.

sbc(stanmodel, data, M, ..., save_progress, load_incomplete=FALSE) # S3 method for sbc plot(x, thin = 3, ...) # S3 method for sbc print(x, ...)

stanmodel | An object of |
---|---|

data | A named |

M | The number of times to condition on draws from the prior predictive distribution |

... | Additional arguments that are passed to |

x | An object produced by |

thin | An integer vector of length one indicating the thinning interval when plotting, which defaults to 3 |

save_progress | If a directory is provided, stanfit objects
are saved to disk making it easy to resume a partial |

load_incomplete | When |

This function assumes adherence to the following conventions in the underlying Stan program:

Realizations of the unknown parameters are drawn in the

`transformed data`

block of the Stan program and are postfixed with an underscore, such as`theta_`

. These are considered the “true” parameters being estimated by the corresponding symbol declared in the`parameters`

block, which should have the same name except for the trailing underscore, such as`theta`

.The realizations of the unknown parameters are then conditioned on when drawing from the prior predictive distribution, also in the

`transformed data`

block. There is no restriction on the symbol name that holds the realizations from the prior predictive distribution but for clarity, it should not end with a trailing underscore.The realizations of the unknown parameters should be copied into a

`vector`

in the`generated quantities`

block named`pars_`

.The realizations from the prior predictive distribution should be copied into an object (of the same type) in the

`generated quantities`

block named`y_`

. Technically, this step is optional and could be omitted to conserve RAM, but inspecting the realizations from the prior predictive distribution is a good way to judge whether the priors are reasonable.The

`generated quantities`

block must contain an integer array named`ranks_`

whose only values are zero or one, depending on whether the realization of a parameter from the posterior distribution exceeds the corresponding “true” realization, such as`theta > theta_;`

. These are not actually "ranks" but can be used afterwards to reconstruct (thinned) ranks.The

`generated quantities`

block may contain a vector named`log_lik`

whose values are the contribution to the log-likelihood by each observation. This is optional but facilitates calculating Pareto k shape parameters to judge whether the posterior distribution is sensitive to particular observations.

Although the user can pass additional arguments to `sampling`

through the ...,
the following arguments are hard-coded and should not be passed through the ...:

`pars = "ranks_"`

because nothing else needs to be stored for each posterior draw`include = TRUE`

to ensure that`"ranks_"`

is included rather than excluded`chains = 1`

because only one chain is run for each integer less than`M`

`seed`

because a sequence of seeds is used across the`M`

runs to preserve independence across runs`save_warmup = FALSE`

because the warmup realizations are not relevant`thin = 1`

because thinning can and should be done after the Markov Chain is finished, as is done by the`thin`

argument to the`plot`

method in order to make the histograms consist of approximately independent realizations

Other arguments will take the default values used by `sampling`

unless
passed through the .... Specifying `refresh = 0`

is recommended to avoid printing
a lot of intermediate progress reports to the screen. It may be necessary to pass a
list to the `control`

argument of `sampling`

with elements `adapt_delta`

and / or `max_treedepth`

in order to obtain adequate results.

Ideally, users would want to see the absence of divergent transitions (which is shown
by the `print`

method) and other warnings, plus an approximately uniform histogram
of the ranks for each parameter (which are shown by the `plot`

method). See the
vignette for more details.

The `sbc`

function outputs a list of S3 class `"sbc"`

, which contains the
following elements:

`ranks`

A list of`M`

matrices, each with number of rows equal to the number of saved iterations and number of columns equal to the number of unknown parameters. These matrices contain the realizations of the`ranks_`

object from the`generated quantities`

block of the Stan program.`Y`

If present, a matrix of realizations from the prior predictive distribution whose rows are equal to the number of observations and whose columns are equal to`M`

, which are taken from the`y_`

object in the`generated quantities`

block of the Stan program.`pars`

A matrix of realizations from the prior distribution whose rows are equal to the number of parameters and whose columns are equal to`M`

, which are taken from the`pars_`

object in the`generated quantities`

block of the Stan program.`pareto_k`

A matrix of Pareto k shape parameter estimates or`NULL`

if there is no`log_lik`

symbol in the`generated quantities`

block of the Stan program`sampler_params`

A three-dimensional array that results from combining calls to`get_sampler_params`

for each of the`M`

runs. The resulting matrix has rows equal to the number of post-warmup iterations, columns equal to six, and`M`

floors. The columns are named`"accept_stat__"`

,`"stepsize__"`

,`"treedepth__"`

,`"n_leapfrog__"`

,`"divergent__"`

, and`"energy__"`

. The most important of which is`"divergent__"`

, which should be all zeros and perhaps`"treedepth__"`

, which should only rarely get up to the value of`max_treedepth`

passed as an element of the`control`

list to`sampling`

or otherwise defaults to \(10\).

The `print`

method outputs the number of divergent transitions and
returns `NULL`

invisibly.
The `plot`

method returns a `ggplot`

object
with histograms whose appearance can be further customized.

The Stan Development Team
*Stan Modeling Language User's Guide and Reference Manual*.
http://mc-stan.org.

Talts, S., Betancourt, M., Simpson, D., Vehtari, A., and Gelman, A. (2018). Validating Bayesian Inference Algorithms with Simulation-Based Calibration. arXiv preprint arXiv:1804.06788. https://arxiv.org/abs/1804.06788

`stan_model`

and `sampling`

scode <- " data { int<lower = 1> N; real<lower = 0> a; real<lower = 0> b; } transformed data { // these adhere to the conventions above real pi_ = beta_rng(a, b); int y = binomial_rng(N, pi_); } parameters { real<lower = 0, upper = 1> pi; } model { target += beta_lpdf(pi | a, b); target += binomial_lpmf(y | N, pi); } generated quantities { // these adhere to the conventions above int y_ = y; vector[1] pars_; int ranks_[1] = {pi > pi_}; vector[N] log_lik; pars_[1] = pi_; for (n in 1:y) log_lik[n] = bernoulli_lpmf(1 | pi); for (n in (y + 1):N) log_lik[n] = bernoulli_lpmf(0 | pi); } "