5.1 Overview of data types

Arguments to built-in and user-defined functions, as well as local variables, must be of basic data types: an unconstrained primitive, vector, or matrix type, or an array of such types.

Passing arguments to functions in Stan works just like assignment to basic types. Stan function signatures specify only the basic types of their arguments, including array dimensionality, but not sizes or constraints. Of course, functions often check constraints as part of their behavior.
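As a minimal sketch, the functions-only Stan program below shows argument types written without sizes or constraints. The function names sum_of_squares, total, and checked_log are hypothetical, chosen only for illustration. The last function checks a constraint inside its body with reject, because the argument type itself cannot carry one.

```stan
functions {
  // A vector argument of any length: the size is not part of the type.
  real sum_of_squares(vector v) {
    return dot_self(v);
  }

  // A two-dimensional array of reals: the number of dimensions
  // (marked by the comma) is part of the type, the sizes are not.
  real total(array[,] real x) {
    real s = 0;
    for (i in 1:size(x)) {
      s += sum(x[i]);
    }
    return s;
  }

  // The argument is an unconstrained real, so positivity is
  // checked explicitly in the body of the (hypothetical) function.
  real checked_log(real x) {
    if (x <= 0) {
      reject("checked_log: x must be positive; found x = ", x);
    }
    return log(x);
  }
}
```

Any vector, of any length, can be passed to sum_of_squares, and any two-dimensional real array to total.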

Primitive types

Stan provides two primitive data types, real for continuous values and int for integer values.
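A small sketch of the two primitive types, using illustrative variable names in a transformed data block. An int value may be assigned to a real variable, but not the other way around.

```stan
transformed data {
  int n = 7;       // integer value
  real x = 2.5;    // continuous value
  real y = n;      // allowed: the int is promoted to real on assignment
  // int m = x;    // not allowed: a real cannot be assigned to an int
}
```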

Vector and matrix types

Stan provides three matrix-based data types, vector for column vectors, row_vector for row vectors, and matrix for matrices.
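The distinction between column and row vectors matters for arithmetic. In this sketch (variable names are illustrative), the orientation of the operands determines whether a product is a matrix or a scalar.

```stan
transformed data {
  vector[3] v = [1, 2, 3]';                 // column vector: transpose of a row-vector literal
  row_vector[3] r = [4, 5, 6];              // row vector literal
  matrix[3, 3] A = v * r;                   // column times row: 3 x 3 outer product
  real d = r * v;                           // row times column: inner product, 32
  matrix[2, 3] B = [[1, 2, 3], [4, 5, 6]];  // matrix literal with 2 rows and 3 columns
}
```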

Array types

Any type (including the constrained types discussed below) can be made into an array type by declaring array dimensions with the array keyword. For example,

array[10] real x;
array[6, 7] matrix[3, 3] m;

declares x to be a one-dimensional array of size 10 containing real values, and declares m to be a two-dimensional array of size \(6 \times 7\) containing values that are \(3 \times 3\) matrices.
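Indexing removes array dimensions first and matrix dimensions after that. In this sketch, x and m are zero-initialized so that the program is self-contained, and the comments give the type of each indexed expression. The other variable names are illustrative.

```stan
transformed data {
  array[10] real x = rep_array(0.0, 10);
  array[6, 7] matrix[3, 3] m = rep_array(rep_matrix(0, 3, 3), 6, 7);

  real x1 = x[1];                    // a single real
  array[7] matrix[3, 3] m2 = m[2];   // one row of the 6 x 7 array of matrices
  matrix[3, 3] m23 = m[2, 3];        // a single 3 x 3 matrix
  row_vector[3] m23r1 = m[2, 3, 1];  // the first row of that matrix
  real m23_12 = m[2, 3, 1, 2];       // a single matrix entry
}
```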

Prior to Stan 2.26, models used a different syntax, in which the array dimensions were given after the variable identifier. The equivalent declarations of x and m in the pre-2.26 syntax are:

real x[10];
matrix[3, 3] m[6, 7];

Using the syntax with the array keyword is advised.

Constrained data types

Declarations of variables other than local variables may include constraints. These constraints are not part of a variable's underlying data type. Instead, they determine error checking in the data, transformed data, transformed parameters, and generated quantities blocks, and they determine the transform from unconstrained to constrained space in the parameters block.

All of the basic data types may be given lower and upper bounds using syntax such as

int<lower = 1> N;
real<upper = 0> log_p;
vector<lower = -1, upper = 1>[3] rho;
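The sketch below places bounded declarations in several blocks to show where each constraint takes effect. The model itself (a normal model with hypothetical data N and y) is only an illustration. The local variable in the model block has a size but no constraint.

```stan
data {
  int<lower = 1> N;                        // checked when the data are read
  vector[N] y;
}
parameters {
  real mu;
  real<lower = 0> sigma;                   // sampled on the unconstrained scale
}
transformed parameters {
  real<lower = 0> tau = 1 / square(sigma); // validated at the end of the block
}
model {
  vector[N] resid = y - mu;                // local variable: sized, unconstrained
  // vector<lower = 0>[N] bad;             // not allowed: local variables take no constraints
  mu ~ normal(0, 10);
  sigma ~ exponential(1);
  target += normal_lpdf(resid | 0, sigma);
}
```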

There are also special data types for structured vectors and matrices. There are four constrained vector data types: simplex for unit simplexes (non-negative entries that sum to one), unit_vector for unit-length vectors, ordered for vectors whose entries are in ascending order, and positive_ordered for ordered vectors of positive values.

There are specialized matrix data types corr_matrix and cov_matrix for correlation matrices (symmetric, positive definite, unit diagonal) and covariance matrices (symmetric, positive definite). The type cholesky_factor_cov is for Cholesky factors of covariance matrices (lower triangular, positive diagonal, product with own transpose is a covariance matrix). The type cholesky_factor_corr is for Cholesky factors of correlation matrices (lower triangular, positive diagonal, unit-length rows).
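For reference, here is how each structured type is declared, sized by a hypothetical dimension K. This is a declarations-only sketch. A usable model would also need a model block with suitable priors.

```stan
data {
  int<lower = 2> K;
}
parameters {
  simplex[K] theta;                 // non-negative entries summing to one
  unit_vector[K] u;                 // Euclidean length one
  ordered[K] c;                     // entries in ascending order
  positive_ordered[K] p;            // positive entries in ascending order
  corr_matrix[K] Omega;             // correlation matrix
  cov_matrix[K] Sigma;              // covariance matrix
  cholesky_factor_corr[K] L_Omega;  // Cholesky factor of a correlation matrix
  cholesky_factor_cov[K] L_Sigma;   // Cholesky factor of a covariance matrix
}
```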

Constraints provide error checking for variables defined in the data, transformed data, transformed parameters, and generated quantities blocks. Constraints are critical for variables declared in the parameters block, where they determine the transformation from constrained variables (those satisfying the declared constraint) to unconstrained variables (those ranging over all of \(\mathbb{R}^n\)).
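As one example of such a transformation, consider a scalar declared with a lower bound \(a\), as in real<lower = a> x. The constrained value \(X > a\) is mapped to an unconstrained value \(Y \in \mathbb{R}\) by a logarithm and mapped back by the exponential. The log absolute Jacobian of the inverse map is added to the log density, so that sampling over \(Y\) targets the intended density over \(X\):

\[
Y = \log(X - a), \qquad X = a + \exp(Y), \qquad \log\left|\frac{dX}{dY}\right| = Y.
\]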

It is worth calling out the most important aspect of constrained data types:

The model must have support (non-zero density, equivalently finite log density) at parameter values that satisfy the declared constraints.

If this condition is violated, that is, if some parameter values satisfy the declared constraints but have zero density (a log density of negative infinity), then the samplers and optimizers may exhibit a number of pathologies. These include getting stuck, failing to initialize, excessive Metropolis rejection, and biased draws caused by an inability to explore the tails of the distribution.
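A common way to violate this condition is to declare a bound that is looser than the support of the prior. In this sketch, which uses hypothetical binary data y, declaring only real<lower = 0> theta would allow values of theta greater than 1. Those values satisfy the declaration but have zero density under uniform(0, 1). Declaring both bounds makes the constraint match the support.

```stan
data {
  int<lower = 0> N;
  array[N] int<lower = 0, upper = 1> y;
}
parameters {
  // Problematic alternative: real<lower = 0> theta;
  // theta > 1 would satisfy that constraint but have zero density.
  real<lower = 0, upper = 1> theta;   // bounds match the prior's support
}
model {
  theta ~ uniform(0, 1);
  y ~ bernoulli(theta);
}
```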