27.3 SBC in Stan
Running simulation-based calibration in Stan will test whether Stan’s sampling algorithm can sample from the posterior associated with data generated according to the model. The data simulation and posterior fitting and rank calculation can all be done within a single Stan program. Then Stan’s posterior sampler has to be run multiple times. Each run produces a rank for each parameter being assessed for uniformity. The total set of ranks can then be tested for uniformity.
27.3.1 Example model
For illustration, a very simple model will suffice. Suppose there are two parameters \((\mu, \sigma)\) with independent priors, \[ \mu \sim \textrm{normal}(0, 1), \] and \[ \sigma \sim \textrm{lognormal}(0, 1). \] The data \(y = y_1, \ldots, y_N\) is drawn conditionally independently given the parameters, \[ y_n \sim \textrm{normal}(\mu, \sigma). \] The joint prior density is thus \[ p(\mu, \sigma) = \textrm{normal}(\mu \mid 0, 1) \cdot \textrm{lognormal}(\sigma \mid 0, 1), \] and the sampling density is \[ p(y \mid \mu, \sigma) = \prod_{n=1}^N \textrm{normal}(y_n \mid \mu, \sigma). \]
For example, suppose the following two parameter values are drawn from the prior in the first simulation, \[ (\mu^{\textrm{sim(1)}}, \sigma^{\textrm{sim(1)}}) = (1.01, 0.23). \] Then data \(y^{\textrm{sim}(1)} \sim p(y \mid \mu^{\textrm{sim(1)}}, \sigma^{\textrm{sim(1)}})\) is drawn according to the sampling distribution. Next, \(M = 4\) draws are taken from the posterior \(\mu^{(1,m)}, \sigma^{(1,m)} \sim p(\mu, \sigma \mid y^{\textrm{sim}(1)})\), \[ \begin{array}{r|rr} m & \mu^{(1,m)} & \sigma^{(1,m)} \\ \hline 1 & 1.07 & 0.33 \\ 2 & -0.32 & 0.14 \\ 3 & -0.99 & 0.26 \\ 4 & 1.51 & 0.31 \end{array} \] Then the comparisons on which ranks are based look as follows, \[ \begin{array}{r|cc} m & \textrm{I}(\mu^{(1,m)} < \mu^{\textrm{sim}(1)}) & \textrm{I}(\sigma^{(1,m)} < \sigma^{\textrm{sim}(1)}) \\ \hline 1 & 0 & 0 \\ 2 & 1 & 1 \\ 3 & 1 & 0 \\ 4 & 0 & 0 \end{array} \] The ranks are the column sums, \(r_{1,1} = 2\) and \(r_{1,2} = 1\). Because the simulated parameters are distributed according to the posterior, these ranks should be distributed uniformly between \(0\) and \(M\), the number of posterior draws.
27.3.2 Testing a Stan program with simulation-based calibration
To code simulation-based calibration in a Stan program, the transformed data block can be used to simulate parameters and data from the model. The parameters, transformed parameters, and model block then define the model over the simulated data. Then, in the generated quantities block, the program records an indicator for whether each parameter is less than the simulated value. As shown above, the rank is then the sum of the simulated indicator variables.
transformed data {
real mu_sim = normal_rng(0, 1);
real<lower=0> sigma_sim = lognormal_rng(0, 1);
int<lower=0> J = 10;
for (j in 1:J) {
y_sim[j] = normal_rng(mu_sim, sigma_sim);
}
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
mu ~ normal(0, 1);
sigma ~ lognormal(0, 1);
y_sim ~ normal(mu, sigma);
}
generated quantities {
array[2] int<lower=0, upper=1> lt_sim
= { mu < mu_sim, sigma < sigma_sim };
}
To avoid confusion with the number of simulated data sets used
for simulation-based calibration, J
is used for the number of
simulated data points.
The model is implemented twice—once as a data generating process using random number generators in the transformed data block, then again in the parameters and model block. This duplication is a blessing and a curse. The curse is that it’s more work and twice the chance for errors. The blessing is that by implementing the model twice and comparing results, the chance of there being a mistake in the model is reduced.
27.3.3 Pseudocode for simulation-based calibration
The entire simulation-based calibration process is as follows, where
p(theta)
is the prior densityp(y | theta)
is the sampling densityK
is the number of parametersN
is the total number of simulated data sets and fitsM
is the number of posterior draws per simulated data set
SBC(p(theta), p(y | theta), K, N, M)
------------------------------------
for (n in 1:N) {
// simulate parameters and data
theta(sim(n)) ~ p(theta)
y(sim(n)) ~ p(y | theta(sim(n)))
// posterior draws given simulated data
for (m in 1:M) {
theta(n, m) ~ p(theta | y(sim(n)))
}
// calculate rank of sim among posterior draws
for (k in 1:K) {
rank(n, k) = SUM_m I(theta[k](n,m) < theta[k](sim(n)))
}
}
// test uniformity of each parameter
for (k in 1:K) {
test uniformity of rank(1:N, k)
}
27.3.4 The importance of thinning
The draws from the posterior are assumed to be roughly independent.
If they are not, artifacts may arise in the uniformity tests due to
correlation in the posterior draws. Thus it is best to think the
posterior draws down to the point where the effective sample size is
roughly the same as the number of thinned draws. This may require
running the code a few times to judge the number of draws required to
produce a target effective sample size. This operation that can be
put into a loop that doubles the number of iterations until all
parameters have an effective sample size of M
, then thinning down to
M
draws.