10.8 Vectorization

This is an old version, view current version.

Stan’s univariate log probability functions, including the log density functions, log mass functions, log CDFs, and log CCDFs, all support vectorized function application, with results defined to be the sum of the elementwise application of the function. Some of the PRNG functions support vectorization, see section vectorized PRNG functions for more details.

In all cases, matrix operations are at least as fast and usually faster than loops and vectorized log probability functions are faster than their equivalent form defined with loops. This isn’t because loops are slow in Stan, but because more efficient automatic differentiation can be used. The efficiency comes from the fact that a vectorized log probability function only introduces one new node into the expression graph, thus reducing the number of virtual function calls required to compute gradients in C++, as well as from allowing caching of repeated computations.

Stan also overloads the multivariate normal distribution, including the Cholesky-factor form, allowing arrays of row vectors or vectors for the variate and location parameter. This is a huge savings in speed because the work required to solve the linear system for the covariance matrix is only done once.

Stan also overloads some scalar functions, such as log and exp, to apply to vectors (arrays) and return vectors (arrays). These vectorizations are defined elementwise and unlike the probability functions, provide only minimal efficiency speedups over repeated application and assignment in a loop.

10.8.1 Vectorized Function Signatures

10.8.1.1 Vectorized Scalar Arguments

The normal probability function is specified with the signature

 normal_lpdf(reals | reals, reals);

The pseudotype reals is used to indicate that an argument position may be vectorized. Argument positions declared as reals may be filled with a real, a one-dimensional array, a vector, or a row-vector. If there is more than one array or vector argument, their types can be anything but their size must match. For instance, it is legal to use normal_lpdf(row_vector | vector, real) as long as the vector and row vector have the same size.

10.8.1.2 Vectorized Vector and Row Vector Arguments

The multivariate normal distribution accepting vector or array of vector arguments is written as

 multi_normal_lpdf(vectors | vectors, matrix);

These arguments may be row vectors, column vectors, or arrays of row vectors or column vectors.

10.8.1.3 Vectorized Integer Arguments

The pseudotype ints is used for vectorized integer arguments. Where it appears either an integer or array of integers may be used.

10.8.2 Evaluating Vectorized Log Probability Functions

The result of a vectorized log probability function is equivalent to the sum of the evaluations on each element. Any non-vector argument, namely real or int, is repeated. For instance, if y is a vector of size N, mu is a vector of size N, and sigma is a scalar, then

 ll = normal_lpdf(y | mu, sigma);

is just a more efficient way to write

 ll = 0;
 for (n in 1:N)
   ll = ll + normal_lpdf(y[n] | mu[n], sigma);

With the same arguments, the vectorized sampling statement

 y ~ normal(mu, sigma);

has the same effect on the total log probability as

 for (n in 1:N)
   y[n] ~ normal(mu[n], sigma);

10.8.3 Evaluating Vectorized PRNG Functions

Some PRNG functions accept sequences as well as scalars as arguments. Such functions are indicated by argument pseudotypes reals or ints. In cases of sequence arguments, the output will also be a sequence. For example, the following is allowed in the generated quantities block.

 vector[3] mu = ...;
 real x[3] = normal_rng(mu, 3);

10.8.3.1 Argument types

In the case of PRNG functions, arguments marked ints may be integers or integer arrays, whereas arguments marked reals may be integers or reals, integer or real arrays, vectors, or row vectors.

pseudotype	allowable PRNG arguments
`ints`	`int`, `int[]`
`reals`	`int`, `int[]`, `real`, `real[]`, `vector`, `row_vector`

10.8.3.2 Dimension matching

In general, if there are multiple non-scalar arguments, they must all have the same dimensions, but need not have the same type. For example, the normal_rng function may be called with one vector argument and one real array argument as long as they have the same number of elements.

 vector[3] mu = ...;
 real sigma[3] = ...;
 real x[3] = normal_rng(mu, sigma);

10.8.3.3 Return type

The result of a vectorized PRNG function depends on the size of the arguments and the distribution’s support. If all arguments are scalars, then the return type is a scalar. For a continuous distribution, if there are any non-scalar arguments, the return type is a real array (real[]) matching the size of any of the non-scalar arguments, as all non-scalar arguments must have matching size. Discrete distributions return ints and continuous distributions return reals, each of appropriate size. The symbol R denotes such a return type.