7.4 Sampling statements

Stan supports writing probability statements also in sampling notation, such as

y ~ normal(mu, sigma);

The name “sampling statement” is meant to be suggestive, not interpreted literally. Conceptually, the variable y, which may be an unknown parameter or known, modeled data, is being declared to have the distribution indicated by the right-hand side of the sampling statement.

Executing such a statement does not perform any sampling. In Stan, a sampling statement is merely a notational convenience. The above sampling statement could be expressed as a direct increment on the total log probability as

target += normal_lpdf(y | mu, sigma);

In general, a sampling statement of the form

y ~ dist(theta1, ..., thetaN);

involving subexpressions y and theta1 through thetaN (including the case where N is zero) will be well formed if and only if the corresponding assignment statement is well-formed. For densities allowing real y values, the log probability density function is used,

target += dist_lpdf(y | theta1, ..., thetaN);

For those restricted to integer y values, the log probability mass function is used,

target += dist_lpmf(y | theta1, ..., thetaN);

This will be well formed if and only if dist_lpdf(y | theta1, ..., thetaN) or dist_lpmf(y | theta1, ..., thetaN) is a well-formed expression of type real.

Log probability increment vs. sampling statement

Although both lead to the same sampling behavior in Stan, there is one critical difference between using the sampling statement, as in

y ~ normal(mu, sigma);

and explicitly incrementing the log probability function, as in

target += normal_lpdf(y | mu, sigma);

The sampling statement drops all the terms in the log probability function that are constant, whereas the explicit call to normal_lpdf adds all of the terms in the definition of the log normal probability function, including all of the constant normalizing terms. Therefore, the explicit increment form can be used to recreate the exact log probability values for the model. Otherwise, the sampling statement form will be faster if any of the input expressions, y, mu, or sigma, involve only constants, data variables, and transformed data variables.

User-transformed variables

The left-hand side of a sampling statement may be a complex expression. For instance, it is legal syntactically to write

parameters {
  real<lower=0> beta;
// ...
model {
  log(beta) ~ normal(mu, sigma);

Unfortunately, this is not enough to properly model beta as having a lognormal distribution. Whenever a nonlinear transform is applied to a parameter, such as the logarithm function being applied to beta here, and then used on the left-hand side of a sampling statement or on the left of a vertical bar in a log pdf function, an adjustment must be made to account for the differential change in scale and ensure beta gets the correct distribution. The correction required is to add the log Jacobian of the transform to the target log density; see the change of variables section for full definitions. For the case above, the following adjustment will account for the log transform.7

target += - log(fabs(y));

Truncated distributions

Stan supports truncating distributions with lower bounds, upper bounds, or both.

Truncating with lower and upper bounds

A probability density function \(p(x)\) for a continuous distribution may be truncated to an interval \([a, b]\) to define a new density \(p_{[a, b]}(x)\) with support \([a, b]\) by setting

\[ p_{[a, b]}(x) = \frac{p(x)} {\int_a^b p(u) \, du}. \]

A probability mass function \(p(x)\) for a discrete distribution may be truncated to the closed interval \([a, b]\) by

\[ p_{[a, b]}(x) = \frac{p(x)} {\sum_{u = a}^b p(u)}. \]

Truncating with a lower bound

A probability density function \(p(x)\) can be truncated to \([a, \infty]\) by defining

\[ p_{[a, \infty]}(x) = \frac{p(x)} {\int_a^{\infty} p(u) \, du}. \]

A probability mass function \(p(x)\) is truncated to \([a, \infty]\) by defining

\[ p_{[a, \infty]}(x) = \frac{p(x)} {\sum_{a <= u} p(u)}. \]

Truncating with an upper bound

A probability density function \(p(x)\) can be truncated to \([-\infty, b]\) by defining

\[ p_{[-\infty, b]}(x) = \frac{p(x)} {\int_{-\infty}^b p(u) \, du}. \]

A probability mass function \(p(x)\) is truncated to \([-\infty, b]\) by defining

\[ p_{[-\infty,b]}(x) = \frac{p(x)} {\sum_{u <= b} p(u)}. \]

Cumulative distribution functions

Given a probability function \(p_X(x)\) for a random variable \(X\), its cumulative distribution function (cdf) \(F_X(x)\) is defined to be the probability that \(X \leq x\),

\[ F_X(x) = \mathrm{Pr}[X \leq x]. \]

The upper-case variable \(X\) is the random variable whereas the lower-case variable \(x\) is just an ordinary bound variable. For continuous random variables, the definition of the cdf works out to

\[ F_X(x) \ = \ \int_{-\infty}^{x} p_X(u) \, du, \]

For discrete variables, the cdf is defined to include the upper bound given by the argument,

\[ F_X(x) = \sum_{u \leq x} p_X(u). \]

Complementary cumulative distribution functions

The complementary cumulative distribution function (ccdf) in both the continuous and discrete cases is given by

\[ F^C_X(x) \ = \ \mathrm{Pr}[X > x] \ = \ 1 - F_X(x). \]

Unlike the cdf, the ccdf is exclusive of the bound, hence the event \(X > x\) rather than the cdf’s event \(X \leq x\).

For continuous distributions, the ccdf works out to

\[ F^C_X(x) \ = \ 1 - \int_{-\infty}^x p_X(u) \, du \ = \ \int_x^{\infty} p_X(u) \, du. \]

The lower boundary can be included in the integration bounds because it is a single point on a line and hence has no probability mass. For the discrete case, the lower bound must be excluded in the summation explicitly by summing over \(u > x\),

\[ F^C_X(x) \ = \ 1 - \sum_{u \leq x} p_X(u) \ = \ \sum_{u > x} p_X(u). \]

Cumulative distribution functions provide the necessary integral calculations to define truncated distributions. For truncation with lower and upper bounds, the denominator is defined by \[ \int_a^b p(u) \, du = F_X(b) - F_X(a). \] This allows truncated distributions to be defined as \[ p_{[a,b]}(x) = \frac{p_X(x)} {F_X(b) - F_X(a)}. \]

For discrete distributions, a slightly more complicated form is required to explicitly insert the lower truncation point, which is otherwise excluded from \(F_X(b) - F_X(a)\),

\[ p_{[a,b]}(x) = \frac{p_X(x)} {F_X(b) - F_X(a) + p_X(a)}. \]

Truncation with lower and upper bounds in Stan

Stan allows probability functions to be truncated. For example, a truncated unit normal distributions restricted to \([-0.5, 2.1]\) can be coded with the following sampling statement.

y ~ normal(0, 1) T[-0.5, 2.1];

Truncated distributions are translated as an additional term in the accumulated log density function plus error checking to make sure the variate in the sampling statement is within the bounds of the truncation.

In general, the truncation bounds and parameters may be parameters or local variables.

Because the example above involves a continuous distribution, it behaves the same way as the following more verbose form.

y ~ normal(0, 1);
if (y < -0.5 || y > 2.1) {
  target += negative_infinity();
} else {
  target += -log_diff_exp(normal_lcdf(2.1 | 0, 1),
                          normal_lcdf(-0.5 | 0, 1));

Because a Stan program defines a log density function, all calculations are on the log scale. The function normal_lcdf is the log of the cumulative normal distribution function and the function log_diff_exp(a, b) is a more arithmetically stable form of log(exp(a) - exp(b)).

For a discrete distribution, another term is necessary in the denominator to account for the excluded boundary. The truncated discrete distribution

y ~ poisson(3.7) T[2, 10];

behaves in the same way as the following code.

y ~ poisson(3.7);
if (y < 2 || y > 10) {
  target += negative_infinity();
} else {
  target += -log_sum_exp(poisson_lpmf(2 | 3.7),
                         log_diff_exp(poisson_lcdf(10 | 3.7),
                                      poisson_lcdf(2 | 3.7)));

Recall that log_sum_exp(a, b) is just the arithmetically stable form of log(exp(a) + exp(b)).

Truncation with lower bounds in Stan

For truncating with only a lower bound, the upper limit is left blank.

y ~ normal(0, 1) T[-0.5, ];

This truncated sampling statement has the same behavior as the following code.

y ~ normal(0, 1);
if (y < -0.5) {
  target += negative_infinity();
} else {
  target += -normal_lccdf(-0.5 | 0, 1);

The normal_lccdf function is the normal complementary cumulative distribution function.

As with lower and upper truncation, the discrete case requires a more complicated denominator to add back in the probability mass for the lower bound. Thus

y ~ poisson(3.7) T[2, ];

behaves the same way as

y ~ poisson(3.7);
if (y < 2) {
  target += negative_infinity();
} else {
  target += -log_sum_exp(poisson_lpmf(2 | 3.7),
                         poisson_lccdf(2 | 3.7));

Truncation with upper bounds in Stan

To truncate with only an upper bound, the lower bound is left blank. The upper truncated sampling statement

y ~ normal(0, 1) T[ , 2.1];

produces the same result as the following code.

target += normal_lpdf(y | 0, 1);
if (y > 2.1) {
  target += negative_infinity();
} else {
  target += -normal_lcdf(2.1 | 0, 1);

With only an upper bound, the discrete case does not need a boundary adjustment. The upper-truncated sampling statement

y ~ poisson(3.7) T[ , 10];

behaves the same way as the following code.

y ~ poisson(3.7);
if (y > 10) {
  target += negative_infinity();
} else {
  target += -poisson_lcdf(10 | 3.7);

Cumulative distributions must be defined

In all cases, the truncation is only well formed if the appropriate log density or mass function and necessary log cumulative distribution functions are defined. Not every distribution built into Stan has log cdf and log ccdfs defined, nor will every user-defined distribution. The discrete probability function documentations describes the available discrete and continuous cumulative distribution functions; most univariate distributions have log cdf and log ccdf functions.

Type constraints on bounds

For continuous distributions, truncation points must be expressions of type int or real. For discrete distributions, truncation points must be expressions of type int.

Variates outside of truncation bounds

For a truncated sampling statement, if the value sampled is not within the bounds specified by the truncation expression, the result is zero probability and the entire statement adds \(-\infty\) to the total log probability, which in turn results in the sample being rejected.

Vectorizing truncated distributions

Vectorization of distribution functions with truncation is available if the underlying distribution, lcdf, and lccdf functions meet the required signatures.

The equivalent code for a vectorized truncation depends on which of the variables are non-scalars (arrays, vectors, etc.):

  1. If the variate y is the only non-scalar, the result is the same as described in the above sections, but the lcdf/lccdf calculation is multiplied by size(y).

  2. If the other arguments to the distribution are non-scalars, then the vectorized version of the lcdf/lccdf is used. These functions return the sum of their terms, so no multiplication by the size is needed.

  3. The exception to the above is when a non-variate is a vector and both a lower and upper bound are specified in the truncation. In this case, a for loop is generated over the elements of the non-scalar arguments. This is required since the log_diff_exp of two sums is not the same as the sum of the pairwise log_diff_exp operations.

Note that while a lower-and-upper truncated distrubution may generate a for-loop internally as part of translating the truncation statement, this is still preferable to manually constructing a loop, since the distribution function itself can still be evaluated in a vectorized manner.

  1. Because \(\log | \frac{d}{dy} \log y | = \log | 1/y | = - \log |y|\).↩︎