7.4 Sampling Statements
Stan supports writing probability statements also in sampling notation, such as
y ~ normal(mu,sigma);
The name “sampling statement” is meant to be suggestive, not
interpreted literally. Conceptually, the variable y
, which may
be an unknown parameter or known, modeled data, is being declared
to have the distribution indicated by the right-hand side of the
sampling statement.
Executing such a statement does not perform any sampling. In Stan, a sampling statement is merely a notational convenience. The above sampling statement could be expressed as a direct increment on the total log probability as
target += normal_lpdf(y | mu, sigma);
In general, a sampling statement of the form
y ~ dist(theta1, ..., thetaN);
involving subexpressions y
and theta1
through
thetaN
(including the case where N
is zero) will be well
formed if and only if the corresponding assignment statement is
well-formed. For densities allowing real y
values, the log
probability density function is used,
target += dist_lpdf(y | theta1, ..., thetaN);
For those restricted to integer y
values, the log probability
mass function is used,
target += dist_lpmf(y | theta1, ..., thetaN);
This will be well formed if and only if dist_lpdf(y | theta1, ..., thetaN)
or dist_lpmf(y | theta1, ..., thetaN)
is a
well-formed expression of type real
.
Log Probability Increment vs. Sampling Statement
Although both lead to the same sampling behavior in Stan, there is one critical difference between using the sampling statement, as in
y ~ normal(mu, sigma);
and explicitly incrementing the log probability function, as in
target += normal_lpdf(y | mu,sigma);
The sampling statement drops all the terms in the log probability
function that are constant, whereas the explicit call to
normal_lpdf
adds all of the terms in the definition of the log
normal probability function, including all of the constant normalizing
terms. Therefore, the explicit increment form can be used to recreate
the exact log probability values for the model. Otherwise, the
sampling statement form will be faster if any of the input expressions,
y
, mu
, or sigma
, involve only constants, data
variables, and transformed data variables.
User-Transformed Variables
The left-hand side of a sampling statement may be a complex expression. For instance, it is legal syntactically to write
parameters {
real<lower=0> beta;
}
// ...
model {
log(beta) ~ normal(mu, sigma);
}
Unfortunately, this is not enough to properly model beta
as
having a lognormal distribution. Whenever a nonlinear transform is
applied to a parameter, such as the logarithm function being applied
to beta
here, and then used on the left-hand side of a sampling
statement or on the left of a vertical bar in a log pdf function, an
adjustment must be made to account for the differential change in
scale and ensure beta
gets the correct distribution. The
correction required is to add the log Jacobian of the transform to the
target log density; see the change of variables section for full
definitions. For the case above, the following adjustment will
account for the log transform.8
target += - log(fabs(y));
Truncated Distributions
Stan supports truncating distributions with lower bounds, upper bounds, or both.
Truncating with lower and upper bounds
A probability density function \(p(x)\) for a continuous distribution may be truncated to an interval \([a, b]\) to define a new density \(p_{[a, b]}(x)\) with support \([a, b]\) by setting
\[ p_{[a, b]}(x) = \frac{p(x)} {\int_a^b p(u) \, du}. \]
A probability mass function \(p(x)\) for a discrete distribution may be truncated to the closed interval \([a, b]\) by
\[ p_{[a, b]}(x) = \frac{p(x)} {\sum_{u = a}^b p(u)}. \]
Truncating with a lower bound
A probability density function \(p(x)\) can be truncated to \([a, \infty]\) by defining
\[ p_{[a, \infty]}(x) = \frac{p(x)} {\int_a^{\infty} p(u) \, du}. \]
A probability mass function \(p(x)\) is truncated to \([a, \infty]\) by defining
\[ p_{[a, \infty]}(x) = \frac{p(x)} {\sum_{a <= u} p(u)}. \]
Truncating with an upper bound
A probability density function \(p(x)\) can be truncated to \([-\infty, b]\) by defining
\[ p_{[-\infty, b]}(x) = \frac{p(x)} {\int_{-\infty}^b p(u) \, du}. \]
A probability mass function \(p(x)\) is truncated to \([-\infty, b]\) by defining
\[ p_{[-\infty,b]}(x) = \frac{p(x)} {\sum_{u <= b} p(u)}. \]
Cumulative distribution functions
Given a probability function \(p_X(x)\) for a random variable \(X\), its cumulative distribution function (cdf) \(F_X(x)\) is defined to be the probability that \(X \leq x\),
\[ F_X(x) = \mathrm{Pr}[X \leq x]. \]
The upper-case variable \(X\) is the random variable whereas the lower-case variable \(x\) is just an ordinary bound variable. For continuous random variables, the definition of the cdf works out to
\[ F_X(x) \ = \ \int_{-\infty}^{x} p_X(u) \, du, \]
For discrete variables, the cdf is defined to include the upper bound given by the argument,
\[ F_X(x) = \sum_{u \leq x} p_X(u). \]
Complementary cumulative distribution functions
The complementary cumulative distribution function (ccdf) in both the continuous and discrete cases is given by
\[ F^C_X(x) \ = \ \mathrm{Pr}[X > x] \ = \ 1 - F_X(x). \]
Unlike the cdf, the ccdf is exclusive of the bound, hence the event \(X > x\) rather than the cdf’s event \(X \leq x\).
For continuous distributions, the ccdf works out to
\[ F^C_X(x) \ = \ 1 - \int_{-\infty}^x p_X(u) \, du \ = \ \int_x^{\infty} p_X(u) \, du. \]
The lower boundary can be included in the integration bounds because it is a single point on a line and hence has no probability mass. For the discrete case, the lower bound must be excluded in the summation explicitly by summing over \(u > x\),
\[ F^C_X(x) \ = \ 1 - \sum_{u \leq x} p_X(u) \ = \ \sum_{u > x} p_X(u). \]
Cumulative distribution functions provide the necessary integral calculations to define truncated distributions. For truncation with lower and upper bounds, the denominator is defined by \[ \int_a^b p(u) \, du = F_X(b) - F_X(a). \] This allows truncated distributions to be defined as \[ p_{[a,b]}(x) = \frac{p_X(x)} {F_X(b) - F_X(a)}. \]
For discrete distributions, a slightly more complicated form is required to explicitly insert the lower truncation point, which is otherwise excluded from \(F_X(b) - F_X(a)\),
\[ p_{[a,b]}(x) = \frac{p_X(x)} {F_X(b) - F_X(a) + p_X(a)}. \]
Truncation with lower and upper bounds in Stan
Stan allows probability functions to be truncated. For example, a truncated unit normal distributions restricted to \([-0.5, 2.1]\) can be coded with the following sampling statement.
y ~ normal(0, 1) T[-0.5, 2.1];
Truncated distributions are translated as an additional term in the accumulated log density function plus error checking to make sure the variate in the sampling statement is within the bounds of the truncation.
In general, the truncation bounds and parameters may be parameters or local variables.
Because the example above involves a continuous distribution, it behaves the same way as the following more verbose form.
y ~ normal(0, 1);
if (y < -0.5 || y > 2.1)
target += negative_infinity();
else
target += -log_diff_exp(normal_lcdf(2.1 | 0, 1),
normal_lcdf(-0.5 | 0, 1));
Because a Stan program defines a log density function, all
calculations are on the log scale. The function normal_lcdf
is the log of the cumulative normal distribution function and the
function log_diff_exp(a, b)
is a more arithmetically stable
form of log(exp(a) - exp(b))
.
For a discrete distribution, another term is necessary in the denominator to account for the excluded boundary. The truncated discrete distribution
y ~ poisson(3.7) T[2, 10];
behaves in the same way as the following code.
y ~ poisson(3.7);
if (y < 2 || y > 10)
target += negative_infinity();
else
target += -log_sum_exp(poisson_lpmf(2 | 3.7),
log_diff_exp(poisson_lcdf(10 | 3.7),
poisson_lcdf(2 | 3.7)));
Recall that log_sum_exp(a, b)
is just the arithmetically stable form
of log(exp(a) + exp(b))
.
Truncation with lower bounds in Stan
For truncating with only a lower bound, the upper limit is left blank.
y ~ normal(0, 1) T[-0.5, ];
This truncated sampling statement has the same behavior as the following code.
y ~ normal(0, 1);
if (y < -0.5)
target += negative_infinity();
else
target += -normal_lccdf(-0.5 | 0, 1);
The normal_lccdf
function is the normal complementary cumulative
distribution function.
As with lower and upper truncation, the discrete case requires a more complicated denominator to add back in the probability mass for the lower bound. Thus
y ~ poisson(3.7) T[2, ];
behaves the same way as
y ~ poisson(3.7);
if (y < 2)
target += negative_infinity();
else
target += -log_sum_exp(poisson_lpmf(2 | 3.7),
poisson_lccdf(2 | 3.7));
Truncation with upper bounds in Stan
To truncate with only an upper bound, the lower bound is left blank. The upper truncated sampling statement
y ~ normal(0, 1) T[ , 2.1];
produces the same result as the following code.
target += normal_lpdf(y | 0, 1);
if (y > 2.1)
target += negative_infinity();
else
target += -normal_lcdf(2.1 | 0, 1);
With only an upper bound, the discrete case does not need a boundary adjustment. The upper-truncated sampling statement
y ~ poisson(3.7) T[ , 10];
behaves the same way as the following code.
y ~ poisson(3.7);
if (y > 10)
target += negative_infinity();
else
target += -poisson_lcdf(10 | 3.7);
Cumulative distributions must be defined
In all cases, the truncation is only well formed if the appropriate log density or mass function and necessary log cumulative distribution functions are defined. Not every distribution built into Stan has log cdf and log ccdfs defined, nor will every user-defined distribution. The discrete probability function documentations describes the available discrete and continuous cumulative distribution functions; most univariate distributions have log cdf and log ccdf functions.
Type constraints on bounds
For continuous distributions, truncation points must be expressions of
type int
or real
. For discrete distributions, truncation
points must be expressions of type int
.
Variates outside of truncation bounds
For a truncated sampling statement, if the value sampled is not within the bounds specified by the truncation expression, the result is zero probability and the entire statement adds \(-\infty\) to the total log probability, which in turn results in the sample being rejected.
Vectorizing Truncated Distributions
Stan does not (yet) support vectorization of distribution functions with truncation.
Because \(\log | \frac{d}{dy} \log y | = \log | 1/y | = - \log |y|\).↩