27.1 Simulating from the posterior predictive distribution
The posterior predictive distribution is the distribution over new observations given previous observations. It’s predictive in the sense that it’s predicting behavior on new data that is not part of the training set. It’s posterior in that everything is conditioned on observed data \(y\).
The posterior predictive distribution for replications \(y^{\textrm{rep}}\) of the original data set \(y\) given model parameters \(\theta\) is defined by \[ p(y^{\textrm{rep}} \mid y) = \int p(y^{\textrm{rep}} \mid \theta) \cdot p(\theta \mid y) \, \textrm{d}\theta. \]
As with other posterior predictive quantities, generating a replicated data set \(y^{\textrm{rep}}\) from the posterior predictive distribution is straightforward using the generated quantities block. Consider a simple regression model with parameters \(\theta = (\alpha, \beta, \sigma).\)
data {
int<lower = 0> N;
vector[N] x;
vector[N] y;
}
parameters {
real alpha;
real beta;
real<lower = 0> sigma;
}
model {
alpha ~ normal(0, 2);
beta ~ normal(0, 1);
sigma ~ normal(0, 1);
y ~ normal(alpha + beta * x, sigma);
}
To generate a replicated data set y_rep
for this simple model, the
following generated quantities block suffices.
generated quantities {
real y_rep[N] = normal_rng(alpha + beta * x, sigma);
}
The vectorized form of the normal random
number generator is used with the original predictors x
and the
model parameters alpha, beta
, and sigma.
The replicated data variable y_rep
is declared to be the same size
as the original data y
, but instead of a vector type, it is
declared to be an array of reals to match
the return type of the function normal_rng
.
Because the vector and real array types have the same dimensions and layout,
they can be plotted against one another and otherwise compared during
downstream processing.
The posterior predictive sampling for posterior predictive checks is different from usual posterior predictive sampling discussed in the chapter on posterior predictions in that the original predictors \(x\) are used. That is, the posterior predictions are for the original data.