24.6 Estimating event probabilities


Event probabilities involving parameters, predictions, or both may be coded in the generated quantities block. For example, to evaluate $\textrm{Pr}[\lambda > 5 \mid y]$ in the simple Poisson example with only a rate parameter $\lambda$, it suffices to define a generated quantity:

generated quantities {
  int<lower = 0, upper = 1> lambda_gt_5 = lambda > 5;
  ...
}

The value of the expression lambda > 5 is 1 if the condition is true and 0 otherwise. The posterior mean of this generated quantity is the event probability $\begin{eqnarray*} \textrm{Pr}[\lambda > 5 \mid y] & = & \int \textrm{I}(\lambda > 5) \cdot p(\lambda \mid y) \, \textrm{d}\lambda \\[4pt] & \approx & \frac{1}{M} \sum_{m = 1}^M \textrm{I}(\lambda^{(m)} > 5), \end{eqnarray*}$ where each $\lambda^{(m)} \sim p(\lambda \mid y)$ is a draw from the posterior. In Stan, this estimate is the posterior mean of the generated quantity lambda_gt_5.
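The Monte Carlo estimate above can also be computed outside Stan from any set of posterior draws. The following is a minimal Python sketch, not part of the Stan program: it assumes, purely for illustration, that the posterior for $\lambda$ is a gamma distribution with a hypothetical shape of 12 and rate of 2, so that draws can be simulated directly and the estimate compared with a closed-form tail probability. The helper name `event_probability` is likewise an assumption.

```python
import math
import random


def event_probability(draws, event):
    """Monte Carlo estimate: the mean of the 0/1 indicator over the draws."""
    return sum(1 for d in draws if event(d)) / len(draws)


# Stand-in for posterior draws of lambda (assumption): a Gamma(shape, rate)
# posterior with hypothetical values. In practice the draws come from Stan.
rng = random.Random(1234)
shape, rate = 12, 2.0
M = 20000
# random.gammavariate takes (shape, scale), so scale = 1 / rate.
lambda_draws = [rng.gammavariate(shape, 1.0 / rate) for _ in range(M)]

# Posterior mean of the indicator I(lambda > 5).
estimate = event_probability(lambda_draws, lambda lam: lam > 5)

# Closed-form check, valid only for integer shape (Erlang tail):
# Pr[lambda > c] = sum_{k < shape} exp(-rate * c) * (rate * c)^k / k!
c = 5
exact = sum(
    math.exp(-rate * c) * (rate * c) ** k / math.factorial(k)
    for k in range(shape)
)

print(f"Monte Carlo estimate: {estimate:.4f}")
print(f"Closed-form value:    {exact:.4f}")
```

The two printed values should agree up to Monte Carlo error, which shrinks at the rate $1 / \sqrt{M}$ for independent draws.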

In general, event probabilities may be expressed as expectations of indicator functions. For example, $\begin{eqnarray*} \textrm{Pr}[\lambda > 5 \mid y] & = & \mathbb{E}[\textrm{I}(\lambda > 5) \mid y] \\[4pt] & = & \int \textrm{I}(\lambda > 5) \cdot p(\lambda \mid y) \, \textrm{d}\lambda \\[4pt] & \approx & \frac{1}{M} \sum_{m = 1}^M \textrm{I}(\lambda^{(m)} > 5). \end{eqnarray*}$ The last line above is the posterior mean of the indicator function as coded in Stan.

Event probabilities involving posterior predictive quantities $\tilde{y}$ work exactly the same way as those for parameters. For example, if $\tilde{y}_n$ is the prediction for the $n$-th unobserved outcome (such as the score of a team in a game or a level of expression of a protein in a cell), then $\begin{eqnarray*} \textrm{Pr}[\tilde{y}_3 > \tilde{y}_7 \mid \tilde{x}, x, y] & = & \mathbb{E}\!\left[\textrm{I}(\tilde{y}_3 > \tilde{y}_7) \mid \tilde{x}, x, y\right] \\[4pt] & = & \int \textrm{I}(\tilde{y}_3 > \tilde{y}_7) \cdot p(\tilde{y} \mid \tilde{x}, x, y) \, \textrm{d}\tilde{y} \\[4pt] & \approx & \frac{1}{M} \sum_{m = 1}^M \textrm{I}(\tilde{y}^{(m)}_3 > \tilde{y}^{(m)}_7), \end{eqnarray*}$ where $\tilde{y}^{(m)} \sim p(\tilde{y} \mid \tilde{x}, x, y)$.
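The predictive case can be sketched the same way. The Python example below is an illustration under stated assumptions rather than a model from this guide: outcomes 3 and 7 are taken to be Poisson counts whose rates have hypothetical, independent gamma posteriors, and the covariates $\tilde{x}$ and $x$ are omitted. For each posterior draw it simulates one pair of predictions and then averages the indicator $\textrm{I}(\tilde{y}_3 > \tilde{y}_7)$. The helper `poisson_rng` here is a local Python function written for this sketch.

```python
import math
import random


def poisson_rng(rng, rate):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(2024)
M = 20000  # number of posterior draws

y_tilde_3, y_tilde_7 = [], []
for _ in range(M):
    # Hypothetical posterior draws of the two rates (assumption, not from the guide).
    rate_3 = rng.gammavariate(16, 1 / 2.0)  # shape 16, scale 0.5, mean 8
    rate_7 = rng.gammavariate(10, 1 / 2.0)  # shape 10, scale 0.5, mean 5
    # One predictive draw per posterior draw, so parameter uncertainty
    # propagates into the predictions.
    y_tilde_3.append(poisson_rng(rng, rate_3))
    y_tilde_7.append(poisson_rng(rng, rate_7))

# Mean of the indicator I(y_tilde_3 > y_tilde_7) over the M paired draws.
pr_y3_gt_y7 = sum(int(a > b) for a, b in zip(y_tilde_3, y_tilde_7)) / M

print(f"Estimated Pr[y_tilde_3 > y_tilde_7]: {pr_y3_gt_y7:.3f}")
```

Pairing the two predictions within each draw matters: the indicator is evaluated on $\tilde{y}^{(m)}_3$ and $\tilde{y}^{(m)}_7$ from the same draw $m$, which preserves any posterior dependence between them.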