Conventions for Probability Functions
Functions associated with distributions are set up to follow the same naming conventions for both built-in distributions and for user-defined distributions.
Suffix marks type of function
The suffix is determined by the type of function according to the following table.
function | outcome | suffix |
---|---|---|
log probability mass function | discrete | _lpmf |
log probability density function | continuous | _lpdf |
log cumulative distribution function | any | _lcdf |
log complementary cumulative distribution function | any | _lccdf |
random number generator | any | _rng |
For example, normal_lpdf
is the log of the normal probability density function (pdf) and bernoulli_lpmf
is the log of the bernoulli probability mass function (pmf). The log of the corresponding cumulative distribution functions (cdf) use the same suffix, normal_lcdf
and bernoulli_lcdf
.
Argument order and the vertical bar
Each probability function has a specific outcome value and a number of parameters. Following conditional probability notation, probability density and mass functions use a vertical bar to separate the outcome from the parameters of the distribution. For example, normal_lpdf(y | mu, sigma)
returns the value of mathematical formula \(\log \text{Normal}(y \, | \, \mu, \sigma)\). Cumulative distribution functions separate the outcome from the parameters in the same way (e.g., normal_lcdf(y_low | mu, sigma)
Sampling notation
The notation
y ~ normal(mu, sigma);
provides the same (proportional) contribution to the model log density as the explicit target density increment,
target += normal_lpdf(y | mu, sigma);
In both cases, the effect is to add terms to the target log density. The only difference is that the example with the sampling (~
) notation drops all additive constants in the log density; the constants are not necessary for any of Stan’s sampling, approximation, or optimization algorithms.
Finite inputs
All of the distribution functions are configured to throw exceptions (effectively rejecting samples or optimization steps) when they are supplied with non-finite arguments. The two cases of non-finite arguments are the infinite values and not-a-number value—these are standard in floating-point arithmetic.
Boundary conditions
Many distributions are defined with support or constraints on parameters forming an open interval. For example, the normal density function accepts a scale parameter \(\sigma > 0\). If \(\sigma = 0\), the probability function will throw an exception.
This is true even for (complementary) cumulative distribution functions, which will throw exceptions when given input that is out of the support.
Pseudorandom number generators
For most of the probability functions, there is a matching pseudorandom number generator (PRNG) with the suffix _rng
. For example, the function normal_rng(real, real)
accepts two real arguments, an unconstrained location \(\mu\) and positive scale \(\sigma > 0\), and returns an unconstrained pseudorandom value drawn from \(\text{Normal}(\mu,\sigma)\). There are also vectorized forms of random number generators which return more than one random variate at a time.
Restricted to transformed data and generated quantities
Unlike regular functions, the PRNG functions may only be used in the transformed data or generated quantities blocks.
Limited vectorization
Unlike the probability functions, only some of the PRNG functions are vectorized.
Cumulative distribution functions
For most of the univariate probability functions, there is a corresponding cumulative distribution function, log cumulative distribution function, and log complementary cumulative distribution function.
For a univariate random variable \(Y\) with probability function \(p_Y(y \, | \, \theta)\), the cumulative distribution function (CDF) \(F_Y\) is defined by \[\begin{equation*} F_Y(y) \ = \ \text{Pr}[Y \le y] \ = \ \int_{-\infty}^y p(y\, | \, \theta) \ \text{d}y. \end{equation*}\] The complementary cumulative distribution function (CCDF) is defined as \[\begin{equation*} \text{Pr}[Y > y] \ = \ 1 - F_Y(y). \end{equation*}\] The reason to use CCDFs instead of CDFs in floating-point arithmetic is that it is possible to represent numbers very close to 0 (the closest you can get is roughly \(10^{-300}\)), but not numbers very close to 1 (the closest you can get is roughly \(1 - 10^{-15}\)).
In Stan, there is a cumulative distribution function for each probability function. For instance, normal_cdf(y | mu, sigma)
is defined by \[\begin{equation*}
\int_{-\infty}^y \text{Normal}(y \, | \, \mu, \sigma) \ \text{d}y.
\end{equation*}\] There are also log forms of the CDF and CCDF for most univariate distributions. For example, normal_lcdf(y | mu, sigma)
is defined by \[\begin{equation*}
\log \left( \int_{-\infty}^y \text{Normal}(y \, | \, \mu, \sigma) \ \text{d}y \right)
\end{equation*}\] and normal_lccdf(y | mu, sigma)
is defined by \[\begin{equation*}
\log \left( 1 - \int_{-\infty}^y \text{Normal}(y \, | \, \mu, \sigma) \ \text{d}y \right).
\end{equation*}\]
Vectorization
Stan’s univariate log probability functions, including the log density functions, log mass functions, log CDFs, and log CCDFs, all support vectorized function application, with results defined to be the sum of the elementwise application of the function. Some of the PRNG functions support vectorization, see section vectorized PRNG functions for more details.
In all cases, matrix operations are at least as fast and usually faster than loops and vectorized log probability functions are faster than their equivalent form defined with loops. This isn’t because loops are slow in Stan, but because more efficient automatic differentiation can be used. The efficiency comes from the fact that a vectorized log probability function only introduces one new node into the expression graph, thus reducing the number of virtual function calls required to compute gradients in C++, as well as from allowing caching of repeated computations.
Stan also overloads the multivariate normal distribution, including the Cholesky-factor form, allowing arrays of row vectors or vectors for the variate and location parameter. This is a huge savings in speed because the work required to solve the linear system for the covariance matrix is only done once.
Stan also overloads some scalar functions, such as log
and exp
, to apply to vectors (arrays) and return vectors (arrays). These vectorizations are defined elementwise and unlike the probability functions, provide only minimal efficiency speedups over repeated application and assignment in a loop.
Vectorized function signatures
Vectorized scalar arguments
The normal probability function is specified with the signature
normal_lpdf(reals | reals, reals);
The pseudotype reals
is used to indicate that an argument position may be vectorized. Argument positions declared as reals
may be filled with a real, a one-dimensional array, a vector, or a row-vector. If there is more than one array or vector argument, their types can be anything but their size must match. For instance, it is legal to use normal_lpdf(row_vector | vector, real)
as long as the vector and row vector have the same size.
Vectorized vector and row vector arguments
The multivariate normal distribution accepting vector or array of vector arguments is written as
matrix); multi_normal_lpdf(vectors | vectors,
These arguments may be row vectors, column vectors, or arrays of row vectors or column vectors.
Vectorized integer arguments
The pseudotype ints
is used for vectorized integer arguments. Where it appears either an integer or array of integers may be used.
Evaluating vectorized log probability functions
The result of a vectorized log probability function is equivalent to the sum of the evaluations on each element. Any non-vector argument, namely real
or int
, is repeated. For instance, if y
is a vector of size N
, mu
is a vector of size N
, and sigma
is a scalar, then
ll = normal_lpdf(y | mu, sigma);
is just a more efficient way to write
0;
ll = for (n in 1:N) {
ll = ll + normal_lpdf(y[n] | mu[n], sigma); }
With the same arguments, the vectorized sampling statement
y ~ normal(mu, sigma);
has the same effect on the total log probability as
for (n in 1:N) {
y[n] ~ normal(mu[n], sigma); }
Evaluating vectorized PRNG functions
Some PRNG functions accept sequences as well as scalars as arguments. Such functions are indicated by argument pseudotypes reals
or ints
. In cases of sequence arguments, the output will also be a sequence. For example, the following is allowed in the transformed data and generated quantities blocks.
vector[3] mu = // ...
array[3] real x = normal_rng(mu, 3);
Argument types
In the case of PRNG functions, arguments marked ints
may be integers or integer arrays, whereas arguments marked reals
may be integers or reals, integer or real arrays, vectors, or row vectors.
pseudotype | allowable PRNG arguments |
---|---|
ints |
int , array[] int |
reals |
int , array[] int , real , array[] real , vector , row_vector |
Dimension matching
In general, if there are multiple non-scalar arguments, they must all have the same dimensions, but need not have the same type. For example, the normal_rng
function may be called with one vector argument and one real array argument as long as they have the same number of elements.
vector[3] mu = // ...
array[3] real sigma = // ...
array[3] real x = normal_rng(mu, sigma);
Return type
The result of a vectorized PRNG function depends on the size of the arguments and the distribution’s support. If all arguments are scalars, then the return type is a scalar. For a continuous distribution, if there are any non-scalar arguments, the return type is a real array (array[] real
) matching the size of any of the non-scalar arguments, as all non-scalar arguments must have matching size. Discrete distributions return ints
and continuous distributions return reals
, each of appropriate size. The symbol R
denotes such a return type.