## 7.4 Sampling statements

Stan supports writing probability statements also in sampling notation, such as

` y ~ normal(mu, sigma);`

The name “sampling statement” is meant to be suggestive, not
interpreted literally. Conceptually, the variable `y`

, which may
be an unknown parameter or known, modeled data, is being declared
to have the distribution indicated by the right-hand side of the
sampling statement.

Executing such a statement does not perform any sampling. In Stan, a sampling statement is merely a notational convenience. The above sampling statement could be expressed as a direct increment on the total log probability as

`target += normal_lpdf(y | mu, sigma);`

In general, a sampling statement of the form

` y ~ dist(theta1, ..., thetaN);`

involving subexpressions `y`

and `theta1`

through
`thetaN`

(including the case where `N`

is zero) will be well
formed if and only if the corresponding assignment statement is
well-formed. For densities allowing real `y`

values, the log
probability density function is used,

`target += dist_lpdf(y | theta1, ..., thetaN);`

For those restricted to integer `y`

values, the log probability
mass function is used,

`target += dist_lpmf(y | theta1, ..., thetaN);`

This will be well formed if and only if `dist_lpdf(y | theta1, ..., thetaN)`

or `dist_lpmf(y | theta1, ..., thetaN)`

is a
well-formed expression of type `real`

.

### Log probability increment vs. sampling statement

Although both lead to the same sampling behavior in Stan, there is one critical difference between using the sampling statement, as in

` y ~ normal(mu, sigma);`

and explicitly incrementing the log probability function, as in

`target += normal_lpdf(y | mu, sigma);`

The sampling statement drops all the terms in the log probability
function that are constant, whereas the explicit call to
`normal_lpdf`

adds all of the terms in the definition of the log
normal probability function, including all of the constant normalizing
terms. Therefore, the explicit increment form can be used to recreate
the exact log probability values for the model. Otherwise, the
sampling statement form will be faster if any of the input expressions,
`y`

, `mu`

, or `sigma`

, involve only constants, data
variables, and transformed data variables.

### User-transformed variables

The left-hand side of a sampling statement may be a complex expression. For instance, it is legal syntactically to write

```
parameters {
real<lower=0> beta;
}// ...
model {
log(beta) ~ normal(mu, sigma); }
```

Unfortunately, this is not enough to properly model `beta`

as
having a lognormal distribution. Whenever a nonlinear transform is
applied to a parameter, such as the logarithm function being applied
to `beta`

here, and then used on the left-hand side of a sampling
statement or on the left of a vertical bar in a log pdf function, an
adjustment must be made to account for the differential change in
scale and ensure `beta`

gets the correct distribution. The
correction required is to add the log Jacobian of the transform to the
target log density; see the change of variables section for full
definitions. For the case above, the following adjustment will
account for the log transform.^{7}

`target += - log(fabs(y));`

### Truncated distributions

Stan supports truncating distributions with lower bounds, upper bounds, or both.

#### Truncating with lower and upper bounds

A probability density function \(p(x)\) for a continuous distribution may be truncated to an interval \([a, b]\) to define a new density \(p_{[a, b]}(x)\) with support \([a, b]\) by setting

\[ p_{[a, b]}(x) = \frac{p(x)} {\int_a^b p(u) \, du}. \]

A probability mass function \(p(x)\) for a discrete distribution may be truncated to the closed interval \([a, b]\) by

\[ p_{[a, b]}(x) = \frac{p(x)} {\sum_{u = a}^b p(u)}. \]

#### Truncating with a lower bound

A probability density function \(p(x)\) can be truncated to \([a, \infty]\) by defining

\[ p_{[a, \infty]}(x) = \frac{p(x)} {\int_a^{\infty} p(u) \, du}. \]

A probability mass function \(p(x)\) is truncated to \([a, \infty]\) by defining

\[ p_{[a, \infty]}(x) = \frac{p(x)} {\sum_{a <= u} p(u)}. \]

#### Truncating with an upper bound

A probability density function \(p(x)\) can be truncated to \([-\infty, b]\) by defining

\[ p_{[-\infty, b]}(x) = \frac{p(x)} {\int_{-\infty}^b p(u) \, du}. \]

A probability mass function \(p(x)\) is truncated to \([-\infty, b]\) by defining

\[ p_{[-\infty,b]}(x) = \frac{p(x)} {\sum_{u <= b} p(u)}. \]

#### Cumulative distribution functions

Given a probability function \(p_X(x)\) for a random variable \(X\), its cumulative distribution function (cdf) \(F_X(x)\) is defined to be the probability that \(X \leq x\),

\[ F_X(x) = \mathrm{Pr}[X \leq x]. \]

The upper-case variable \(X\) is the random variable whereas the lower-case variable \(x\) is just an ordinary bound variable. For continuous random variables, the definition of the cdf works out to

\[ F_X(x) \ = \ \int_{-\infty}^{x} p_X(u) \, du, \]

For discrete variables, the cdf is defined to include the upper bound given by the argument,

\[ F_X(x) = \sum_{u \leq x} p_X(u). \]

#### Complementary cumulative distribution functions

The complementary cumulative distribution function (ccdf) in both the continuous and discrete cases is given by

\[ F^C_X(x) \ = \ \mathrm{Pr}[X > x] \ = \ 1 - F_X(x). \]

Unlike the cdf, the ccdf is exclusive of the bound, hence the event \(X > x\) rather than the cdf’s event \(X \leq x\).

For continuous distributions, the ccdf works out to

\[ F^C_X(x) \ = \ 1 - \int_{-\infty}^x p_X(u) \, du \ = \ \int_x^{\infty} p_X(u) \, du. \]

The lower boundary can be included in the integration bounds because it is a single point on a line and hence has no probability mass. For the discrete case, the lower bound must be excluded in the summation explicitly by summing over \(u > x\),

\[ F^C_X(x) \ = \ 1 - \sum_{u \leq x} p_X(u) \ = \ \sum_{u > x} p_X(u). \]

Cumulative distribution functions provide the necessary integral calculations to define truncated distributions. For truncation with lower and upper bounds, the denominator is defined by \[ \int_a^b p(u) \, du = F_X(b) - F_X(a). \] This allows truncated distributions to be defined as \[ p_{[a,b]}(x) = \frac{p_X(x)} {F_X(b) - F_X(a)}. \]

For discrete distributions, a slightly more complicated form is required to explicitly insert the lower truncation point, which is otherwise excluded from \(F_X(b) - F_X(a)\),

\[ p_{[a,b]}(x) = \frac{p_X(x)} {F_X(b) - F_X(a) + p_X(a)}. \]

#### Truncation with lower and upper bounds in Stan

Stan allows probability functions to be truncated. For example, a truncated unit normal distributions restricted to \([-0.5, 2.1]\) can be coded with the following sampling statement.

`0, 1) T[-0.5, 2.1]; y ~ normal(`

Truncated distributions are translated as an additional term in the accumulated log density function plus error checking to make sure the variate in the sampling statement is within the bounds of the truncation.

In general, the truncation bounds and parameters may be parameters or local variables.

Because the example above involves a continuous distribution, it behaves the same way as the following more verbose form.

```
0, 1);
y ~ normal(if (y < -0.5 || y > 2.1) {
target += negative_infinity();
else {
} target += -log_diff_exp(normal_lcdf(2.1 | 0, 1),
}0.5 | 0, 1)); normal_lcdf(-
```

Because a Stan program defines a log density function, all
calculations are on the log scale. The function `normal_lcdf`

is the log of the cumulative normal distribution function and the
function `log_diff_exp(a, b)`

is a more arithmetically stable
form of `log(exp(a) - exp(b))`

.

For a discrete distribution, another term is necessary in the denominator to account for the excluded boundary. The truncated discrete distribution

`3.7) T[2, 10]; y ~ poisson(`

behaves in the same way as the following code.

```
3.7);
y ~ poisson(if (y < 2 || y > 10) {
target += negative_infinity();
else {
} target += -log_sum_exp(poisson_lpmf(2 | 3.7),
10 | 3.7),
log_diff_exp(poisson_lcdf(2 | 3.7)));
poisson_lcdf( }
```

Recall that `log_sum_exp(a, b)`

is just the arithmetically stable form
of `log(exp(a) + exp(b))`

.

#### Truncation with lower bounds in Stan

For truncating with only a lower bound, the upper limit is left blank.

`0, 1) T[-0.5, ]; y ~ normal(`

This truncated sampling statement has the same behavior as the following code.

```
0, 1);
y ~ normal(if (y < -0.5) {
target += negative_infinity();
else {
} target += -normal_lccdf(-0.5 | 0, 1);
}
```

The `normal_lccdf`

function is the normal complementary cumulative
distribution function.

As with lower and upper truncation, the discrete case requires a more complicated denominator to add back in the probability mass for the lower bound. Thus

`3.7) T[2, ]; y ~ poisson(`

behaves the same way as

```
3.7);
y ~ poisson(if (y < 2) {
target += negative_infinity();
else {
} target += -log_sum_exp(poisson_lpmf(2 | 3.7),
2 | 3.7));
poisson_lccdf( }
```

#### Truncation with upper bounds in Stan

To truncate with only an upper bound, the lower bound is left blank. The upper truncated sampling statement

`0, 1) T[ , 2.1]; y ~ normal(`

produces the same result as the following code.

```
target += normal_lpdf(y | 0, 1);
if (y > 2.1) {
target += negative_infinity();
else {
} target += -normal_lcdf(2.1 | 0, 1);
}
```

With only an upper bound, the discrete case does not need a boundary adjustment. The upper-truncated sampling statement

`3.7) T[ , 10]; y ~ poisson(`

behaves the same way as the following code.

```
3.7);
y ~ poisson(if (y > 10) {
target += negative_infinity();
else {
} target += -poisson_lcdf(10 | 3.7);
}
```

#### Cumulative distributions must be defined

In all cases, the truncation is only well formed if the appropriate log density or mass function and necessary log cumulative distribution functions are defined. Not every distribution built into Stan has log cdf and log ccdfs defined, nor will every user-defined distribution. The discrete probability function documentations describes the available discrete and continuous cumulative distribution functions; most univariate distributions have log cdf and log ccdf functions.

#### Type constraints on bounds

For continuous distributions, truncation points must be expressions of
type `int`

or `real`

. For discrete distributions, truncation
points must be expressions of type `int`

.

#### Variates outside of truncation bounds

For a truncated sampling statement, if the value sampled is not within the bounds specified by the truncation expression, the result is zero probability and the entire statement adds \(-\infty\) to the total log probability, which in turn results in the sample being rejected.

#### Vectorizing truncated distributions

Vectorization of distribution functions with truncation
is available if the underlying distribution, `lcdf`

, and `lccdf`

functions
meet the required signatures.

The equivalent code for a vectorized truncation depends on which of the variables are non-scalars (arrays, vectors, etc.):

If the variate

`y`

is the only non-scalar, the result is the same as described in the above sections, but the`lcdf`

/`lccdf`

calculation is multiplied by`size(y)`

.If the other arguments to the distribution are non-scalars, then the vectorized version of the

`lcdf`

/`lccdf`

is used. These functions return the sum of their terms, so no multiplication by the size is needed.The exception to the above is when a non-variate is a vector and both a lower and upper bound are specified in the truncation. In this case, a for loop is generated over the elements of the non-scalar arguments. This is required since the

`log_diff_exp`

of two sums is not the same as the sum of the pairwise`log_diff_exp`

operations.

Note that while a lower-and-upper truncated distrubution may generate a for-loop internally as part of translating the truncation statement, this is still preferable to manually constructing a loop, since the distribution function itself can still be evaluated in a vectorized manner.

Because \(\log | \frac{d}{dy} \log y | = \log | 1/y | = - \log |y|\).↩︎