20.1 Examples
Triangle distribution
A simple example is the triangle distribution, whose density is shaped like an isosceles triangle with corners at specified bounds and height determined by the constraint that a density integrate to 1. If \(\alpha \in \mathbb{R}\) and \(\beta \in \mathbb{R}\) are the bounds, with \(\alpha < \beta\), then \(y \in (\alpha,\beta)\) has a density defined as follows. \[ \textsf{triangle}(y \mid \alpha,\beta) = \frac{2}{\beta - \alpha} \ \left( 1 - \left| y - \frac{\alpha + \beta}{\beta - \alpha} \right| \right) \]
If \(\alpha = -1\), \(\beta = 1\), and \(y \in (-1,1)\), this reduces to \[ \textsf{triangle}(y \mid -1,1) = 1 - |y|. \] Consider the following Stan implementation of \(\textsf{triangle}(-1,1)\) for sampling.
parameters {
real<lower=-1, upper=1> y;
}
model {
target += log1m(fabs(y));
}
The single scalar parameter y
is declared as lying in the
interval (-1,1)
. The total log probability is
incremented with the joint log probability of all parameters, i.e.,
\(\log \mathsf{Triangle}(y \mid -1,1)\). This value is coded in Stan as
log1m(fabs(y))
. The function log1m
is defined so
that log1m(x)
has the same value as \(\log(1-x)\), but the
computation is faster, more accurate, and more stable.
The constrained type real<lower=-1, upper=1>
declared for y
is
critical for correct sampling behavior. If the constraint on y
is removed from the program, say by declaring y
as having the
unconstrained scalar type real
, the program would compile, but
it would produce arithmetic exceptions at run time when the sampler
explored values of y
outside of \((-1,1)\).
Now suppose the log probability function were extended to all of
\(\mathbb{R}\) as follows by defining the probability to be log(0.0)
,
i.e., \(-\infty\), for values outside of \((-1,1)\).
target += log(fmax(0.0,1 - fabs(y)));
With the constraint on y
in place, this is just a less
efficient, slower, and less arithmetically stable version of the
original program. But if the constraint on y
is removed,
the model will compile and run without arithmetic errors, but will not
sample properly.32
Exponential distribution
If Stan didn’t happen to include the exponential distribution, it
could be coded directly using the following assignment statement,
where lambda
is the inverse scale and y
the sampled
variate.
target += log(lambda) - y * lambda;
This encoding will work for any lambda
and y
; they can
be parameters, data, or one of each, or even local variables.
The assignment statement in the previous paragraph generates C++ code that is similar to that generated by the following sampling statement.
y ~ exponential(lambda);
There are two notable differences. First, the sampling statement will
check the inputs to make sure both lambda
is positive and
y
is non-negative (which includes checking that neither is the
special not-a-number value).
The second difference is that if lambda
is not a parameter,
transformed parameter, or local model variable, the sampling statement
is clever enough to drop the log(lambda)
term. This results in
the same posterior because Stan only needs the log probability up to
an additive constant. If lambda
and y
are both
constants, the sampling statement will drop both terms (but still
check for out-of-domain errors on the inputs).
Bivariate normal cumulative distribution function
For another example of user-defined functions, consider the following
definition of the bivariate normal cumulative distribution function
(CDF) with location zero, unit variance, and correlation rho
.
That is, it computes
\[
\texttt{binormal}\mathtt{\_}\texttt{cdf}(z_1, z_2, \rho) = \textsf{Pr}[Z_1 > z_1 \text{ and } Z_2 > z_2]
\]
where the random 2-vector \(Z\) has the distribution
\[
Z \sim \textsf{multivariate normal}\left(
\begin{bmatrix}
0 \\
0
\end{bmatrix}, \
\begin{bmatrix}
1 & \rho
\\
\rho & 1
\end{bmatrix}
]\right).
\]
The following Stan program implements this function,
real binormal_cdf(real z1, real z2, real rho) {
if (z1 != 0 || z2 != 0) {
real denom = fabs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho))
: not_a_number();
real a1 = (z2 / z1 - rho) / denom;
real a2 = (z1 / z2 - rho) / denom;
real product = z1 * z2;
real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
return 0.5 * (Phi(z1) + Phi(z2) - delta)
- owens_t(z1, a1) - owens_t(z2, a2);
}
return 0.25 + asin(rho) / (2 * pi());
}
The problem is the (extremely!) light tails of the triangle distribution. The standard HMC and NUTS samplers can’t get into the corners of the triangle properly. Because the Stan code declares
y
to be of typereal<lower=-1, upper=1>
, the inverse logit transform is applied to the unconstrained variable and its log absolute derivative added to the log probability. The resulting distribution on the logit-transformedy
is well behaved.↩︎