This is an old version, view current version.

## 24.4 Posterior predictive simulation in Stan

Posterior predictive quantities can be coded in Stan using the generated quantities block.

### 24.4.1 Simple Poisson model

For example, consider a simple Poisson model for count data with a rate parameter $$\lambda > 0$$ following a gamma-distributed prior, $\lambda \sim \textrm{gamma}(1, 1).$ The likelihood for $$N$$ observations $$y_1, \ldots, y_N$$ is modeled as Poisson, $y_n \sim \textrm{poisson}(\lambda).$

### 24.4.2 Stan code

The following Stan program defines a variable for $$\tilde{y}$$ by random number generation in the generated quantities block.

data {
int<lower = 0> N;
int<lower = 0> y[N];
}
parameters {
real<lower = 0> lambda;
}
model {
lambda ~ gamma(1, 1);
y ~ poisson(lambda);
}
generated quantities {
int<lower = 0> y_tilde = poisson_rng(lambda);
}

The random draw from the sampling distribution for $$\tilde{y}$$ is coded using Stan’s Poisson random number generator in the generated quantities block. This accounts for the sampling component of the uncertainty; Stan’s posterior sampler will account for the estimation uncertainty, generating a new $$\tilde{y}^{(m)} \sim p(y \mid \lambda^{(m)})$$ for each posterior draw $$\lambda^{(m)} \sim p(\theta \mid y).$$

The posterior draws $$\tilde{y}^{(m)}$$ may be used to estimate the expected value of $$\tilde{y}$$ or any of its quantiles or posterior intervals, as well as event probabilities involving $$\tilde{y}$$. In general, $$\mathbb{E}[f(\tilde{y}, \theta) \mid y]$$ may be evaluated as $\mathbb{E}[f(\tilde{y}, \theta) \mid y] \approx \frac{1}{M} \sum_{m=1}^M f(\tilde{y}^{(m)}, \theta^{(m)}),$ which is just the posterior mean of $$f(\tilde{y}, \theta).$$ This quantity is computed by Stan if the value of $$f(\tilde{y}, \theta)$$ is assigned to a variable in the generated quantities block. That is, if we have

generated quantities {
real f = f(y_tilde, theta);
...

in the Stan program, then the posterior mean of f will be the expectation $$\mathbb{E}[f(\tilde{y}, \theta) \mid y]$$.

### 24.4.3 Analytic posterior and posterior predictive

The gamma distribution is the conjugate prior distribution for the Poisson distribution, so the posterior density $$p(\lambda \mid y)$$ will also follow a gamma distribution.

Because the posterior follows a gamma distribution and the sampling distribution is Poisson, the posterior predictive $$p(\tilde{y} \mid y)$$ will follow a negative binomial distribution, because the negative binomial is defined as a compound gamma-Poisson. That is, $$y \sim \textrm{negative-binomial}(\alpha, \beta)$$ if $$\lambda \sim \textrm{gamma}(\alpha, \beta)$$ and $$y \sim \textrm{poisson}(\lambda).$$ Rather than marginalizing out the rate parameter $$\lambda$$ analytically as can be done to define the negative binomial probability mass function, the rate $$\lambda^{(m)} \sim p(\lambda \mid y)$$ is sampled from the posterior and then used to generate a draw of $$\tilde{y}^{(m)} \sim p(y \mid \lambda^{(m)}).$$