26.6 Estimating event probabilities
Event probabilities involving either parameters or predictions or both may be coded in the generated quantities block. For example, to evaluate \(\textrm{Pr}[\lambda > 5 \mid y]\) in the simple Poisson example with only a rate parameter \(\lambda\), it suffices to define a generated quantity
generated quantities {
  int<lower=0, upper=1> lambda_gt_5 = lambda > 5;
  // ...
}
The value of the expression lambda > 5 is 1 if the condition is true and 0 otherwise. The posterior mean of this generated quantity is the event probability
\[\begin{eqnarray*}
\textrm{Pr}[\lambda > 5 \mid y]
& = &
\int \textrm{I}(\lambda > 5) \cdot p(\lambda \mid y)
\, \textrm{d}\lambda
\\[4pt]
& \approx &
\frac{1}{M} \sum_{m = 1}^M \textrm{I}(\lambda^{(m)} > 5),
\end{eqnarray*}\]
where each \(\lambda^{(m)} \sim p(\lambda \mid y)\) is a draw from the posterior. In Stan, this is recovered as the posterior mean of the generated quantity lambda_gt_5.
In general, event probabilities may be expressed as expectations of indicator functions. For example, \[\begin{eqnarray*} \textrm{Pr}[\lambda > 5 \mid y] & = & \mathbb{E}[\textrm{I}(\lambda > 5) \mid y] \\[4pt] & = & \int \textrm{I}(\lambda > 5) \cdot p(\lambda \mid y) \, \textrm{d}\lambda \\[4pt] & \approx & \frac{1}{M} \sum_{m = 1}^M \textrm{I}(\lambda^{(m)} > 5). \end{eqnarray*}\] The last line above is the posterior mean of the indicator function as coded in Stan.
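As a minimal sketch of how indicators for more than one event might sit in a complete program, the following assumes N Poisson observations y and a gamma(1, 1) prior on lambda. The data names N and y, the prior, and the compound event \(5 < \lambda < 10\) are illustrative assumptions rather than part of the example above, and the array syntax assumes a recent version of Stan.

```stan
data {
  int<lower=0> N;
  array[N] int<lower=0> y;
}
parameters {
  real<lower=0> lambda;
}
model {
  lambda ~ gamma(1, 1);  // assumed prior, for illustration only
  y ~ poisson(lambda);
}
generated quantities {
  // indicator of the event lambda > 5, as in the text
  int<lower=0, upper=1> lambda_gt_5 = lambda > 5;
  // hypothetical compound event: 5 < lambda < 10
  int<lower=0, upper=1> lambda_in_5_10 = (lambda > 5) && (lambda < 10);
}
```

The posterior mean of each indicator, as reported in the fit summary, estimates the corresponding event probability.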
Event probabilities involving posterior predictive quantities \(\tilde{y}\) work exactly the same way as those for parameters. For example, if \(\tilde{y}_n\) is the prediction for the \(n\)-th unobserved outcome (such as the score of a team in a game or a level of expression of a protein in a cell), then \[\begin{eqnarray*} \textrm{Pr}[\tilde{y}_3 > \tilde{y}_7 \mid \tilde{x}, x, y] & = & \mathbb{E}\!\left[\textrm{I}(\tilde{y}_3 > \tilde{y}_7) \mid \tilde{x}, x, y\right] \\[4pt] & = & \int \textrm{I}(\tilde{y}_3 > \tilde{y}_7) \cdot p(\tilde{y} \mid \tilde{x}, x, y) \, \textrm{d}\tilde{y} \\[4pt] & \approx & \frac{1}{M} \sum_{m = 1}^M \textrm{I}(\tilde{y}^{(m)}_3 > \tilde{y}^{(m)}_7), \end{eqnarray*}\] where \(\tilde{y}^{(m)} \sim p(\tilde{y} \mid \tilde{x}, x, y)\).
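The predictive case might be coded along the following lines. This is a sketch under simplifying assumptions: it reuses the Poisson model with a single rate lambda and no predictors, so the conditioning on \(x\) and \(\tilde{x}\) drops out. The names N_tilde, y_tilde, and y3_gt_y7 and the gamma(1, 1) prior are hypothetical, and the array syntax assumes a recent version of Stan.

```stan
data {
  int<lower=0> N;
  array[N] int<lower=0> y;
  int<lower=7> N_tilde;  // at least 7 predictions so y_tilde[7] exists
}
parameters {
  real<lower=0> lambda;
}
model {
  lambda ~ gamma(1, 1);  // assumed prior, for illustration only
  y ~ poisson(lambda);
}
generated quantities {
  // one posterior predictive draw per unobserved outcome
  array[N_tilde] int<lower=0> y_tilde;
  for (n in 1:N_tilde) {
    y_tilde[n] = poisson_rng(lambda);
  }
  // indicator of the event y_tilde[3] > y_tilde[7]
  int<lower=0, upper=1> y3_gt_y7 = y_tilde[3] > y_tilde[7];
}
```

Because y_tilde is drawn afresh in every iteration from the current value of lambda, the posterior mean of y3_gt_y7 averages over uncertainty in both the parameter and the predictive sampling.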