25.2 Simulation-based calibration

Suppose the Bayesian model to test has joint density
$$p(y, \theta) = p(y \mid \theta) \cdot p(\theta),$$
with data $y$ and parameters $\theta$ (both are typically multivariate). Simulation-based calibration works by generating $N$ simulated parameter and data pairs according to the joint density,
$$(y^{\textrm{sim}(1)}, \theta^{\textrm{sim}(1)}), \ldots, (y^{\textrm{sim}(N)}, \theta^{\textrm{sim}(N)}) \sim p(y, \theta).$$

For each simulated data set $y^{\textrm{sim}(n)}$, use the algorithm to be tested to generate $M$ posterior draws, which, if everything is working properly, will be distributed marginally as
$$\theta^{(n, 1)}, \ldots, \theta^{(n, M)} \sim p(\theta \mid y^{\textrm{sim}(n)}).$$

For a simulation $n$ and parameter $k$, the rank of the simulated parameter among the posterior draws is
$$\begin{aligned} r_{n, k} & = \textrm{rank}(\theta_k^{\textrm{sim}(n)}, (\theta_k^{(n, 1)}, \ldots, \theta_k^{(n,M)})) \\[4pt] & = \sum_{m = 1}^M \textrm{I}[\theta_k^{(n,m)} < \theta_k^{\textrm{sim}(n)}]. \end{aligned}$$
That is, the rank is the number of posterior draws $\theta^{(n,m)}_k$ that are less than the simulated draw $\theta^{\textrm{sim}(n)}_k.$
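The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used by any particular package. The toy model (a standard normal prior on a scalar $\theta$ with unit-variance normal observations), the function name `sbc_ranks`, and its arguments are all hypothetical. Because this model is conjugate, exact posterior draws stand in for the algorithm to be tested.

```python
import random


def sbc_ranks(num_sims=1000, num_draws=99, num_obs=10, seed=1234):
    """Return one SBC rank per simulation for a toy conjugate normal model.

    Assumed model (not from the text): theta ~ normal(0, 1) and
    y[j] ~ normal(theta, 1) for j = 1..num_obs.  Its posterior is
    normal(sum(y) / (num_obs + 1), sd = sqrt(1 / (num_obs + 1))).
    """
    rng = random.Random(seed)
    post_sd = (1.0 / (num_obs + 1)) ** 0.5
    ranks = []
    for _ in range(num_sims):
        # Draw (theta_sim, y_sim) from the joint density p(y, theta).
        theta_sim = rng.gauss(0.0, 1.0)
        y_sim = [rng.gauss(theta_sim, 1.0) for _ in range(num_obs)]
        # Stand-in for the algorithm under test: exact posterior draws.
        post_mean = sum(y_sim) / (num_obs + 1)
        draws = [rng.gauss(post_mean, post_sd) for _ in range(num_draws)]
        # Rank = number of posterior draws below the simulated parameter.
        ranks.append(sum(d < theta_sim for d in draws))
    return ranks


ranks = sbc_ranks()
```

Each entry of `ranks` is an integer between 0 and `num_draws`. To test a real sampler, the line that generates `draws` would be replaced by a call to that sampler on `y_sim`.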

If the algorithm generates posterior draws according to the posterior, the ranks should be uniformly distributed over the integers from $0$ to $M$, so that the ranks plus one are uniformly distributed from $1$ to $M + 1$,
$$r_{n, k} + 1 \sim \textrm{categorical}\! \left(\frac{1}{M + 1}, \ldots, \frac{1}{M + 1}\right).$$
Simulation-based calibration uses this expected behavior to test the calibration of each parameter of a model on simulated data. Talts et al. (2018) suggest plotting binned counts of $r_{1:N, k}$ for different parameters $k$; Cook, Gelman, and Rubin (2006) automate the process with a hypothesis test for uniformity.
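As a rough illustration of checking uniformity, the sketch below bins ranks and computes a Pearson chi-square statistic against equal expected counts. This is a generic goodness-of-fit statistic chosen for illustration and is not necessarily the test used in either cited paper. The helper names and the synthetic rank data are hypothetical: the uniform ranks mimic a calibrated algorithm, and the skewed ranks mimic a miscalibrated one.

```python
import random


def binned_rank_counts(ranks, num_draws, num_bins):
    """Count ranks in equal-width bins covering the integers 0..num_draws."""
    num_ranks = num_draws + 1
    if num_ranks % num_bins != 0:
        raise ValueError("num_bins must divide num_draws + 1")
    width = num_ranks // num_bins
    counts = [0] * num_bins
    for r in ranks:
        counts[r // width] += 1
    return counts


def pearson_chi_square(counts):
    """Pearson statistic against equal expected counts in every bin."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(5678)
M = 99  # posterior draws per simulation, so ranks lie in 0..99

# Synthetic ranks, uniform on 0..M, as a calibrated algorithm would give.
uniform_ranks = [rng.randint(0, M) for _ in range(2000)]
# Synthetic ranks skewed toward 0, standing in for a miscalibrated algorithm.
skewed_ranks = [min(rng.randint(0, M), rng.randint(0, M)) for _ in range(2000)]

counts_ok = binned_rank_counts(uniform_ranks, M, num_bins=10)
counts_bad = binned_rank_counts(skewed_ranks, M, num_bins=10)
stat_ok = pearson_chi_square(counts_ok)
stat_bad = pearson_chi_square(counts_bad)
```

Under uniformity the statistic is approximately chi-square distributed with `num_bins - 1` degrees of freedom, so `stat_ok` should be on the order of 9 while `stat_bad` should be far larger. Choosing a number of bins that divides $M + 1$ keeps the expected count equal across bins.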

References

Cook, Samantha R., Andrew Gelman, and Donald B. Rubin. 2006. “Validation of Software for Bayesian Models Using Posterior Quantiles.” Journal of Computational and Graphical Statistics 15 (3): 675–92. https://doi.org/10.1198/106186006X136976.

Talts, Sean, Michael Betancourt, Daniel Simpson, Aki Vehtari, and Andrew Gelman. 2018. “Validating Bayesian Inference Algorithms with Simulation-Based Calibration.” arXiv, no. 1804.06788.