Custom Probability Functions
Custom distributions may also be implemented directly within Stan’s programming language. The only thing that is needed is to increment the total log probability. The rest of the chapter provides examples.
Examples
Triangle distribution
A simple example is the triangle distribution, whose density is shaped like an isosceles triangle with corners at specified bounds and height determined by the constraint that a density integrate to 1. If \(\alpha \in \mathbb{R}\) and \(\beta \in \mathbb{R}\) are the bounds, with \(\alpha < \beta\), then \(y \in (\alpha,\beta)\) has a density defined as follows. \[ \textsf{triangle}(y \mid \alpha,\beta) = \frac{2}{\beta - \alpha} \ \left( 1 - \left| y - \frac{\alpha + \beta}{\beta - \alpha} \right| \right) \]
If \(\alpha = -1\), \(\beta = 1\), and \(y \in (-1,1)\), this reduces to \[ \textsf{triangle}(y \mid -1,1) = 1 - |y|. \] Consider the following Stan implementation of \(\textsf{triangle}(-1,1)\) for sampling.
parameters {
real<lower=-1, upper=1> y;
}model {
target += log1m(abs(y));
}
The single scalar parameter y
is declared as lying in the interval (-1,1)
. The total log probability is incremented with the joint log probability of all parameters, i.e., \(\log \mathsf{Triangle}(y \mid -1,1)\). This value is coded in Stan as log1m(abs(y))
. The function log1m
is defined so that log1m(x)
has the same value as \(\log(1-x)\), but the computation is faster, more accurate, and more stable.
The constrained type real<lower=-1, upper=1>
declared for y
is critical for correct sampling behavior. If the constraint on y
is removed from the program, say by declaring y
as having the unconstrained scalar type real
, the program would compile, but it would produce arithmetic exceptions at run time when the sampler explored values of y
outside of \((-1,1)\).
Now suppose the log probability function were extended to all of \(\mathbb{R}\) as follows by defining the probability to be log(0.0)
, i.e., \(-\infty\), for values outside of \((-1,1)\).
target += log(fmax(0.0,1 - abs(y)));
With the constraint on y
in place, this is just a less efficient, slower, and less arithmetically stable version of the original program. But if the constraint on y
is removed, the model will compile and run without arithmetic errors, but will not sample properly.1
Exponential distribution
If Stan didn’t happen to include the exponential distribution, it could be coded directly using the following assignment statement, where lambda
is the inverse scale and y
the sampled variate.
target += log(lambda) - y * lambda;
This encoding will work for any lambda
and y
; they can be parameters, data, or one of each, or even local variables.
The assignment statement in the previous paragraph generates C++ code that is similar to that generated by the following distribution statement.
y ~ exponential(lambda);
There are two notable differences. First, the distribution statement will check the inputs to make sure both lambda
is positive and y
is non-negative (which includes checking that neither is the special not-a-number value).
The second difference is that if lambda
is not a parameter, transformed parameter, or local model variable, the distribution statement is clever enough to drop the log(lambda)
term. This results in the same posterior because Stan only needs the log probability up to an additive constant. If lambda
and y
are both constants, the distribution statement will drop both terms (but still check for out-of-domain errors on the inputs).
Bivariate normal cumulative distribution function
For another example of user-defined functions, consider the following definition of the bivariate normal cumulative distribution function (CDF) with location zero, unit variance, and correlation rho
. That is, it computes \[
\texttt{binormal}\mathtt{\_}\texttt{cdf}(z_1, z_2, \rho) = \Pr[Z_1 > z_1 \text{ and } Z_2 > z_2]
\] where the random 2-vector \(Z\) has the distribution \[
Z \sim \textsf{multivariate normal}\left(
\begin{bmatrix}
0 \\
0
\end{bmatrix}, \
\begin{bmatrix}
1 & \rho
\\
\rho & 1
\end{bmatrix}
\right).
\]
The following Stan program implements this function,
real binormal_cdf(real z1, real z2, real rho) {
if (z1 != 0 || z2 != 0) {
real denom = abs(rho) < 1.0 ? sqrt((1 + rho) * (1 - rho))
: not_a_number();real a1 = (z2 / z1 - rho) / denom;
real a2 = (z1 / z2 - rho) / denom;
real product = z1 * z2;
real delta = product < 0 || (product == 0 && (z1 + z2) < 0);
return 0.5 * (Phi(z1) + Phi(z2) - delta)
- owens_t(z1, a1) - owens_t(z2, a2);
}return 0.25 + asin(rho) / (2 * pi());
}
Footnotes
The problem is the (extremely!) light tails of the triangle distribution. The standard HMC and NUTS samplers can’t get into the corners of the triangle properly. Because the Stan code declares
y
to be of typereal<lower=-1, upper=1>
, the inverse logit transform is applied to the unconstrained variable and its log absolute derivative added to the log probability. The resulting distribution on the logit-transformedy
is well behaved.↩︎