
28.2 Simulation-based calibration

Suppose the Bayesian model to test has joint density
\[ p(y, \theta) = p(y \mid \theta) \, p(\theta), \]
with data $y$ and parameters $\theta$ (both are typically multivariate). Simulation-based calibration works by generating $N$ simulated parameter and data pairs according to the joint density,
\[ (y^{\textrm{sim}(1)}, \theta^{\textrm{sim}(1)}), \ldots, (y^{\textrm{sim}(N)}, \theta^{\textrm{sim}(N)}) \sim p(y, \theta). \]
For each simulated data set $y^{\textrm{sim}(n)}$, use the algorithm to be tested to generate $M$ posterior draws, which, if everything is working properly, will be distributed marginally as
\[ \theta^{(n, 1)}, \ldots, \theta^{(n, M)} \sim p(\theta \mid y^{\textrm{sim}(n)}). \]
For a simulation $n$ and parameter $k$, the rank of the simulated parameter among the posterior draws is
\[ r_{n, k} = \textrm{rank}\!\left(\theta_k^{\textrm{sim}(n)}, \left(\theta_k^{(n, 1)}, \ldots, \theta_k^{(n, M)}\right)\right) = \sum_{m = 1}^M \textrm{I}\!\left[\theta_k^{(n, m)} < \theta_k^{\textrm{sim}(n)}\right]. \]
That is, the rank is the number of posterior draws $\theta_k^{(n, m)}$ that are less than the simulated draw $\theta_k^{\textrm{sim}(n)}$.
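The simulate-then-rank procedure above can be sketched in a few lines. This is a minimal sketch, not the algorithm under test: it assumes a hypothetical conjugate model (a normal prior $\theta \sim \textrm{normal}(0, 1)$ with one observation $y \mid \theta \sim \textrm{normal}(\theta, 1)$), chosen only because its exact posterior, $\textrm{normal}(y / 2, \sqrt{1/2})$, lets a "correct" sampler stand in for the algorithm being validated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conjugate model for illustration:
#   prior:      theta ~ normal(0, 1)
#   likelihood: y | theta ~ normal(theta, 1)   (single observation)
# The exact posterior is normal(y / 2, sqrt(1/2)), so drawing from it
# directly plays the role of a correctly working algorithm.
N, M = 1000, 99  # number of simulations and posterior draws per simulation

ranks = np.empty(N, dtype=int)
for n in range(N):
    theta_sim = rng.normal(0.0, 1.0)    # theta^sim(n) ~ p(theta)
    y_sim = rng.normal(theta_sim, 1.0)  # y^sim(n) ~ p(y | theta^sim(n))
    # M draws from the exact posterior p(theta | y^sim(n))
    theta_post = rng.normal(y_sim / 2.0, np.sqrt(0.5), size=M)
    # rank r_{n,k}: number of posterior draws below the simulated parameter
    ranks[n] = np.sum(theta_post < theta_sim)
```

Because the draws here really do come from the posterior, the resulting ranks fall in $\{0, \ldots, M\}$ and should look uniform; replacing the exact posterior draws with output from the algorithm under test gives the actual calibration check.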

If the algorithm generates posterior draws according to the posterior, the ranks should be uniformly distributed from $0$ to $M$, so that the ranks plus one are uniformly distributed from $1$ to $M + 1$,
\[ r_{n, k} + 1 \sim \textrm{categorical}\!\left(\frac{1}{M + 1}, \ldots, \frac{1}{M + 1}\right). \]
Simulation-based calibration uses this expected behavior to test the calibration of each parameter of a model on simulated data. Talts et al. (2018) suggest plotting binned counts of $r_{1:N, k}$ for different parameters $k$; Cook, Gelman, and Rubin (2006) automate the process with a hypothesis test for uniformity.
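A hypothesis-test-style uniformity check along the lines of Cook, Gelman, and Rubin (2006) can be sketched as follows. This is an illustration under assumptions, not their exact procedure: the rank statistics are simulated directly as uniform draws (standing in for output from a calibrated algorithm), the bin count `J = 20` is an arbitrary choice, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(1)

# Stand-in rank statistics: for a correctly calibrated algorithm,
# the N ranks are uniform on {0, ..., M}, so simulate that directly.
N, M = 1000, 99
ranks = rng.integers(0, M + 1, size=N)

# Bin the ranks plus one into J equal-width bins over {1, ..., M + 1}
# and compare observed counts to the uniform expectation N / J.
J = 20
counts, _ = np.histogram(ranks + 1, bins=J, range=(1, M + 2))
stat, p_value = chisquare(counts)  # default expected counts are uniform
# A small p-value would flag a miscalibrated algorithm for this parameter.
```

In practice the same test would be run once per parameter $k$ on the ranks $r_{1:N, k}$ produced by the algorithm under test, with some correction for multiple comparisons across parameters.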

References

Cook, Samantha R., Andrew Gelman, and Donald B. Rubin. 2006. "Validation of Software for Bayesian Models Using Posterior Quantiles." Journal of Computational and Graphical Statistics 15 (3): 675–92. https://doi.org/10.1198/106186006X136976.
Talts, Sean, Michael Betancourt, Daniel Simpson, Aki Vehtari, and Andrew Gelman. 2018. "Validating Bayesian Inference Algorithms with Simulation-Based Calibration." arXiv preprint arXiv:1804.06788.