27.5 Posterior prediction for regressions
27.5.1 Posterior predictive distributions for regressions
Consider a regression with a single predictor \(x_n\) for the training outcome \(y_n\) and \(\tilde{x}_n\) for the test outcome \(\tilde{y}_n.\) Without considering the parametric form of any of the distributions, the posterior predictive distribution for a general regression is \[\begin{eqnarray} p(\tilde{y} \mid \tilde{x}, y, x) & = & \int p(\tilde{y} \mid \tilde{x}, \theta) \cdot p(\theta \mid y, x) \, \textrm{d}\theta \\[4pt] & \approx & \frac{1}{M} \sum_{m=1}^M \, p(\tilde{y} \mid \tilde{x}, \theta^{(m)}), \end{eqnarray}\] where \(\theta^{(m)} \sim p(\theta \mid x, y).\)
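For instance, given a set of posterior draws and a function evaluating the likelihood \(p(\tilde{y} \mid \tilde{x}, \theta)\), the Monte Carlo approximation is just an average over the draws. The following Python sketch illustrates the computation; the names theta_draws and likelihood are placeholders for whatever posterior sample and regression likelihood are at hand, not part of any library.

import numpy as np

def posterior_predictive_density(y_tilde, x_tilde, theta_draws, likelihood):
    # approximate p(y_tilde | x_tilde, y, x) by averaging the likelihood
    # p(y_tilde | x_tilde, theta) over the posterior draws theta^(m)
    return np.mean([likelihood(y_tilde, x_tilde, theta) for theta in theta_draws])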
27.5.2 Stan program
The following program defines a Poisson regression with a single predictor. Both the training predictors \(x\) and the test predictors \(\tilde{x}\) are coded as data, as are their sizes. Only the observed outcomes \(y\) are coded as data; the predictive quantities \(\tilde{y}\) appear in the generated quantities block, where they are drawn by random number generation.
data {
  int<lower=0> N;
  vector[N] x;
  array[N] int<lower=0> y;
  int<lower=0> N_tilde;
  vector[N_tilde] x_tilde;
}
parameters {
  real alpha;
  real beta;
}
model {
  y ~ poisson_log(alpha + beta * x);
  { alpha, beta } ~ normal(0, 1);
}
generated quantities {
  array[N_tilde] int<lower=0> y_tilde
    = poisson_log_rng(alpha + beta * x_tilde);
}
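One way to compile and fit this program and extract the predictive draws is with CmdStanPy; the file name poisson_pred.stan and the data values below are purely illustrative.

from cmdstanpy import CmdStanModel

# training data and test predictors (illustrative values)
data = {
    "N": 5, "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [0, 3, 4, 7, 13],
    "N_tilde": 2, "x_tilde": [6.0, 7.0],
}

model = CmdStanModel(stan_file="poisson_pred.stan")
fit = model.sample(data=data)
y_tilde = fit.stan_variable("y_tilde")  # array of shape (M draws, N_tilde)

Each row of y_tilde is one draw \(\tilde{y}^{(m)}\) from the posterior predictive distribution, which is all that is needed for the summaries discussed below.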
The Poisson distributions in both the model and generated quantities blocks are coded using the log rate as a parameter (that's poisson_log vs. poisson, with the suffixes defining the scale of the parameter). The regression coefficients, an intercept alpha and slope beta, are given standard normal priors.
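The two parameterizations define the same distribution: poisson_log applied to the linear predictor \(\alpha + \beta x\) is a Poisson with rate \(\exp(\alpha + \beta x)\). A quick numerical illustration in Python with SciPy (the values are arbitrary):

import numpy as np
from scipy.stats import poisson

alpha, beta, x = 0.5, 0.3, 2.0
eta = alpha + beta * x      # linear predictor on the log scale, as used by poisson_log
lam = np.exp(eta)           # rate on the natural scale, as used by poisson
print(poisson(lam).pmf(3))  # P(y = 3) under either parameterization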
In the model block, the log rate for the Poisson is a linear function of the training data \(x\), whereas in the generated quantities block it is a function of the test data \(\tilde{x}\). Because the generated quantities block does not affect the posterior draws, the model fits \(\alpha\) and \(\beta\) using only the training data, reserving \(\tilde{x}\) to generate \(\tilde{y}.\)
The result from running Stan is a predictive sample \(\tilde{y}^{(1)}, \ldots, \tilde{y}^{(M)}\) where each \(\tilde{y}^{(m)} \sim p(\tilde{y} \mid \tilde{x}, x, y).\)
The mean of the posterior predictive distribution is the expected value \[\begin{eqnarray} \mathbb{E}[\tilde{y} \mid \tilde{x}, x, y] & = & \int \int \tilde{y} \cdot p(\tilde{y} \mid \tilde{x}, \theta) \cdot p(\theta \mid x, y) \, \textrm{d}\theta \, \textrm{d}\tilde{y} \\[4pt] & \approx & \frac{1}{M} \sum_{m = 1}^M \tilde{y}^{(m)}, \end{eqnarray}\]
where the \(\tilde{y}^{(m)} \sim p(\tilde{y} \mid \tilde{x}, x, y)\) are drawn from the posterior predictive distribution. Thus the posterior mean of y_tilde[n] after running Stan is the expected value of \(\tilde{y}_n\) conditioned on the training data \(x, y\) and the predictor \(\tilde{x}_n.\) This is the Bayesian estimate for \(\tilde{y}\) with minimum expected squared error. The posterior draws can also be used to estimate quantiles for the median and any posterior intervals of interest for \(\tilde{y}\), as well as the covariance of the \(\tilde{y}_n.\)
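Given the matrix of predictive draws extracted above (one row per draw, one column per test point), these summaries are column-wise reductions over the draws; a minimal numpy sketch, assuming y_tilde has shape \((M, \tilde{N})\):

import numpy as np

post_mean = y_tilde.mean(axis=0)                          # estimate of E[y_tilde_n | x_tilde, x, y]
post_median = np.median(y_tilde, axis=0)                  # posterior predictive medians
interval_90 = np.quantile(y_tilde, [0.05, 0.95], axis=0)  # central 90% posterior intervals
cov = np.cov(y_tilde, rowvar=False)                       # covariance across the test points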
The posterior draws \(\tilde{y}^{(m)}\) may also be used to estimate predictive event probabilities, such as \(\mbox{Pr}[\tilde{y}_1 > 0]\) or \(\mbox{Pr}[\prod_{n = 1}^{\tilde{N}} (\tilde{y}_n) > 1],\) as expectations of indicator functions.
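Each such probability is estimated by averaging the corresponding indicator over the draws; continuing the sketch above:

import numpy as np

pr_y1_positive = np.mean(y_tilde[:, 0] > 0)         # Pr[y_tilde_1 > 0]
pr_prod_gt_one = np.mean(y_tilde.prod(axis=1) > 1)  # Pr[prod_n y_tilde_n > 1]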
All of this can be carried out by running Stan only a single time to draw a single sample of \(M\) draws, \[ \tilde{y}^{(1)}, \ldots, \tilde{y}^{(M)} \sim p(\tilde{y} \mid \tilde{x}, x, y). \] It is only when moving to cross-validation that multiple runs are required.