26.1 Posterior predictive distribution

Given a full Bayesian model \(p(y, \theta)\), the posterior predictive density for new data \(\tilde{y}\) given observed data \(y\) is \[ p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta) \cdot p(\theta \mid y) \, \textrm{d}\theta. \] Because the new data \(\tilde{y}\) are assumed to be conditionally independent of the observed data \(y\) given the parameters \(\theta\), we have \(p(\tilde{y} \mid \theta) = p(\tilde{y} \mid \theta, y)\). The product under the integral is therefore the joint posterior density \(p(\tilde{y}, \theta \mid y)\), and the integral simply marginalizes out the parameters \(\theta\). What remains is the predictive density \(p(\tilde{y} \mid y)\) of future observations given past observations.
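The integral rarely has a closed form. It is usually approximated by simulation: for each posterior draw \(\theta^{(m)} \sim p(\theta \mid y)\), draw \(\tilde{y}^{(m)} \sim p(\tilde{y} \mid \theta^{(m)})\). The resulting \(\tilde{y}^{(m)}\) are draws from \(p(\tilde{y} \mid y)\). The sketch below illustrates this two-step procedure. It assumes a hypothetical Beta-Bernoulli model, chosen only because its predictive probability is known exactly and gives a check: \(\theta \sim \textrm{Beta}(a, b)\), each \(y_i \sim \textrm{Bernoulli}(\theta)\), and \(k\) successes in \(n\) trials. The posterior is then \(\textrm{Beta}(a + k, b + n - k)\). The function names are illustrative and are not part of any library.

```python
import random

def posterior_predictive_draws(k, n, a=1.0, b=1.0, num_draws=100_000, seed=1234):
    """Simulate new outcomes from the posterior predictive p(y_tilde | y).

    Hypothetical model (for illustration only):
      theta ~ Beta(a, b), y_i ~ Bernoulli(theta), k successes in n trials,
    so the posterior p(theta | y) is Beta(a + k, b + n - k).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(num_draws):
        # Step 1: draw the parameter from the posterior p(theta | y).
        theta = rng.betavariate(a + k, b + n - k)
        # Step 2: draw a new observation from the sampling distribution p(y_tilde | theta).
        y_tilde = 1 if rng.random() < theta else 0
        draws.append(y_tilde)
    return draws

def exact_predictive_prob(k, n, a=1.0, b=1.0):
    """Closed-form p(y_tilde = 1 | y) for the Beta-Bernoulli model."""
    return (a + k) / (a + b + n)

draws = posterior_predictive_draws(k=7, n=10)
mc_estimate = sum(draws) / len(draws)
print(f"Monte Carlo: {mc_estimate:.3f}, exact: {exact_predictive_prob(7, 10):.3f}")
```

With a uniform \(\textrm{Beta}(1, 1)\) prior and 7 successes in 10 trials, the exact predictive probability of a success is \(8/12 \approx 0.667\). The simulated frequency should agree with it up to Monte Carlo error.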