28.1 Simulating from the posterior predictive distribution

The posterior predictive distribution is the distribution over new observations given previous observations. It’s predictive in the sense that it’s predicting behavior on new data that is not part of the training set. It’s posterior in that everything is conditioned on observed data \(y\).

The posterior predictive distribution for replications \(y^{\textrm{rep}}\) of the original data set \(y\) given model parameters \(\theta\) is defined by \[ p(y^{\textrm{rep}} \mid y) = \int p(y^{\textrm{rep}} \mid \theta) \cdot p(\theta \mid y) \, \textrm{d}\theta. \]

As with other posterior predictive quantities, generating a replicated data set \(y^{\textrm{rep}}\) from the posterior predictive distribution is straightforward using the generated quantities block. Consider a simple regression model with parameters \(\theta = (\alpha, \beta, \sigma).\)

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 2);
  beta ~ normal(0, 1);
  sigma ~ normal(0, 1);
  y ~ normal(alpha + beta * x, sigma);
}

To generate a replicated data set y_rep for this simple model, the following generated quantities block suffices.

generated quantities {
  array[N] real y_rep = normal_rng(alpha + beta * x, sigma);
}

The vectorized form of the normal random number generator is used with the original predictors x and the model parameters alpha, beta, and sigma. The replicated data variable y_rep is declared to be the same size as the original data y, but instead of a vector type, it is declared to be an array of reals to match the return type of the function normal_rng. Because the vector and real array types have the same dimensions and layout, they can be plotted against one another and otherwise compared during downstream processing.

The posterior predictive sampling for posterior predictive checks is different from usual posterior predictive sampling discussed in the chapter on posterior predictions in that the original predictors \(x\) are used. That is, the posterior predictions are for the original data.