# Conventions for Probability Functions

Functions associated with distributions are set up to follow the same naming conventions for both built-in distributions and for user-defined distributions.

## Suffix marks type of function

The suffix is determined by the type of function according to the following table.

function outcome suffix
log probability mass function discrete _lpmf
log probability density function continuous _lpdf
log cumulative distribution function any _lcdf
log complementary cumulative distribution function any _lccdf
random number generator any _rng

For example, normal_lpdf is the log of the normal probability density function (pdf) and bernoulli_lpmf is the log of the bernoulli probability mass function (pmf). The log of the corresponding cumulative distribution functions (cdf) use the same suffix, normal_lcdf and bernoulli_lcdf.

## Argument order and the vertical bar

Each probability function has a specific outcome value and a number of parameters. Following conditional probability notation, probability density and mass functions use a vertical bar to separate the outcome from the parameters of the distribution. For example, normal_lpdf(y | mu, sigma) returns the value of mathematical formula $$\log \text{Normal}(y \, | \, \mu, \sigma)$$. Cumulative distribution functions separate the outcome from the parameters in the same way (e.g., normal_lcdf(y_low | mu, sigma)

## Sampling notation

The notation

 y ~ normal(mu, sigma);

provides the same (proportional) contribution to the model log density as the explicit target density increment,

 target += normal_lpdf(y | mu, sigma);

In both cases, the effect is to add terms to the target log density. The only difference is that the example with the sampling (~) notation drops all additive constants in the log density; the constants are not necessary for any of Stan’s sampling, approximation, or optimization algorithms.

## Finite inputs

All of the distribution functions are configured to throw exceptions (effectively rejecting samples or optimization steps) when they are supplied with non-finite arguments. The two cases of non-finite arguments are the infinite values and not-a-number value—these are standard in floating-point arithmetic.

## Boundary conditions

Many distributions are defined with support or constraints on parameters forming an open interval. For example, the normal density function accepts a scale parameter $$\sigma > 0$$. If $$\sigma = 0$$, the probability function will throw an exception.

This is true even for (complementary) cumulative distribution functions, which will throw exceptions when given input that is out of the support.

## Pseudorandom number generators

For most of the probability functions, there is a matching pseudorandom number generator (PRNG) with the suffix _rng. For example, the function normal_rng(real, real) accepts two real arguments, an unconstrained location $$\mu$$ and positive scale $$\sigma > 0$$, and returns an unconstrained pseudorandom value drawn from $$\text{Normal}(\mu,\sigma)$$. There are also vectorized forms of random number generators which return more than one random variate at a time.

### Restricted to transformed data and generated quantities

Unlike regular functions, the PRNG functions may only be used in the transformed data or generated quantities blocks.

### Limited vectorization

Unlike the probability functions, only some of the PRNG functions are vectorized.

## Cumulative distribution functions

For most of the univariate probability functions, there is a corresponding cumulative distribution function, log cumulative distribution function, and log complementary cumulative distribution function.

For a univariate random variable $$Y$$ with probability function $$p_Y(y \, | \, \theta)$$, the cumulative distribution function (CDF) $$F_Y$$ is defined by $\begin{equation*} F_Y(y) \ = \ \text{Pr}[Y \le y] \ = \ \int_{-\infty}^y p(y\, | \, \theta) \ \text{d}y. \end{equation*}$ The complementary cumulative distribution function (CCDF) is defined as $\begin{equation*} \text{Pr}[Y > y] \ = \ 1 - F_Y(y). \end{equation*}$ The reason to use CCDFs instead of CDFs in floating-point arithmetic is that it is possible to represent numbers very close to 0 (the closest you can get is roughly $$10^{-300}$$), but not numbers very close to 1 (the closest you can get is roughly $$1 - 10^{-15}$$).

In Stan, there is a cumulative distribution function for each probability function. For instance, normal_cdf(y | mu, sigma) is defined by $\begin{equation*} \int_{-\infty}^y \text{Normal}(y \, | \, \mu, \sigma) \ \text{d}y. \end{equation*}$ There are also log forms of the CDF and CCDF for most univariate distributions. For example, normal_lcdf(y | mu, sigma) is defined by $\begin{equation*} \log \left( \int_{-\infty}^y \text{Normal}(y \, | \, \mu, \sigma) \ \text{d}y \right) \end{equation*}$ and normal_lccdf(y | mu, sigma) is defined by $\begin{equation*} \log \left( 1 - \int_{-\infty}^y \text{Normal}(y \, | \, \mu, \sigma) \ \text{d}y \right). \end{equation*}$

## Vectorization

Stan’s univariate log probability functions, including the log density functions, log mass functions, log CDFs, and log CCDFs, all support vectorized function application, with results defined to be the sum of the elementwise application of the function. Some of the PRNG functions support vectorization, see section vectorized PRNG functions for more details.

In all cases, matrix operations are at least as fast and usually faster than loops and vectorized log probability functions are faster than their equivalent form defined with loops. This isn’t because loops are slow in Stan, but because more efficient automatic differentiation can be used. The efficiency comes from the fact that a vectorized log probability function only introduces one new node into the expression graph, thus reducing the number of virtual function calls required to compute gradients in C++, as well as from allowing caching of repeated computations.

Stan also overloads the multivariate normal distribution, including the Cholesky-factor form, allowing arrays of row vectors or vectors for the variate and location parameter. This is a huge savings in speed because the work required to solve the linear system for the covariance matrix is only done once.

Stan also overloads some scalar functions, such as log and exp, to apply to vectors (arrays) and return vectors (arrays). These vectorizations are defined elementwise and unlike the probability functions, provide only minimal efficiency speedups over repeated application and assignment in a loop.

### Vectorized function signatures

#### Vectorized scalar arguments

The normal probability function is specified with the signature

 normal_lpdf(reals | reals, reals);

The pseudotype reals is used to indicate that an argument position may be vectorized. Argument positions declared as reals may be filled with a real, a one-dimensional array, a vector, or a row-vector. If there is more than one array or vector argument, their types can be anything but their size must match. For instance, it is legal to use normal_lpdf(row_vector | vector, real) as long as the vector and row vector have the same size.

#### Vectorized vector and row vector arguments

The multivariate normal distribution accepting vector or array of vector arguments is written as

 multi_normal_lpdf(vectors | vectors, matrix);

These arguments may be row vectors, column vectors, or arrays of row vectors or column vectors.

#### Vectorized integer arguments

The pseudotype ints is used for vectorized integer arguments. Where it appears either an integer or array of integers may be used.

### Evaluating vectorized log probability functions

The result of a vectorized log probability function is equivalent to the sum of the evaluations on each element. Any non-vector argument, namely real or int, is repeated. For instance, if y is a vector of size N, mu is a vector of size N, and sigma is a scalar, then

 ll = normal_lpdf(y | mu, sigma);

is just a more efficient way to write

 ll = 0;
for (n in 1:N) {
ll = ll + normal_lpdf(y[n] | mu[n], sigma);
}

With the same arguments, the vectorized sampling statement

 y ~ normal(mu, sigma);

has the same effect on the total log probability as

 for (n in 1:N) {
y[n] ~ normal(mu[n], sigma);
}

### Evaluating vectorized PRNG functions

Some PRNG functions accept sequences as well as scalars as arguments. Such functions are indicated by argument pseudotypes reals or ints. In cases of sequence arguments, the output will also be a sequence. For example, the following is allowed in the transformed data and generated quantities blocks.

 vector[3] mu = // ...
array[3] real x = normal_rng(mu, 3);

#### Argument types

In the case of PRNG functions, arguments marked ints may be integers or integer arrays, whereas arguments marked reals may be integers or reals, integer or real arrays, vectors, or row vectors.

pseudotype allowable PRNG arguments
ints int, array[] int
reals int, array[] int, real, array[] real, vector, row_vector

#### Dimension matching

In general, if there are multiple non-scalar arguments, they must all have the same dimensions, but need not have the same type. For example, the normal_rng function may be called with one vector argument and one real array argument as long as they have the same number of elements.

 vector[3] mu = // ...
array[3] real sigma = // ...
array[3] real x = normal_rng(mu, sigma);

#### Return type

The result of a vectorized PRNG function depends on the size of the arguments and the distribution’s support. If all arguments are scalars, then the return type is a scalar. For a continuous distribution, if there are any non-scalar arguments, the return type is a real array (array[] real) matching the size of any of the non-scalar arguments, as all non-scalar arguments must have matching size. Discrete distributions return ints and continuous distributions return reals, each of appropriate size. The symbol R denotes such a return type.