5.1 Overview of data types
Arguments for built-in and user-defined functions and local variables are required to be basic data types, meaning an unconstrained primitive, vector, or matrix type or an array of such.
Passing arguments to functions in Stan works just like assignment to basic types. Stan functions are only specified for the basic data types of their arguments, including array dimensionality, but not for sizes or constraints. Of course, functions often check constraints as part of their behavior.
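As a minimal sketch of the point above, the following program declares a function whose argument types carry array dimensionality but no sizes or constraints. The function name sum_both and the variable names are hypothetical, chosen only for illustration.

```stan
functions {
  // The signature names only basic types and array dimensionality:
  // no sizes (vector, not vector[N]) and no constraints.
  real sum_both(vector v, array[] real xs) {
    return sum(v) + sum(xs);
  }
}
data {
  int<lower=0> N;
  vector<lower=0>[N] v;  // sized and constrained at declaration
  array[N] real xs;
}
transformed data {
  // The lower bound on v is not part of the argument type, so v is
  // passed as a plain vector.
  real t = sum_both(v, xs);
}
```

If sum_both needed v to be non-negative, it would have to check that inside the function body, since the signature cannot express it.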
Primitive types
Stan provides two primitive data types, real for continuous values and int for integer values.
Vector and matrix types
Stan provides three matrix-based data types, vector for column vectors, row_vector for row vectors, and matrix for matrices.
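A short sketch of how the three types might appear together, assuming a hypothetical regression-style setup (the names N, K, X, x_new, beta, mu, and mu_new are illustrative and not from this section):

```stan
data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[N, K] X;        // N x K matrix
  row_vector[K] x_new;   // row vector of size K
}
parameters {
  vector[K] beta;        // column vector of size K
}
transformed parameters {
  vector[N] mu = X * beta;      // matrix times vector yields a vector
  real mu_new = x_new * beta;   // row_vector times vector yields a real
}
```

The result types in the last block show why column and row vectors are kept as distinct types: the type of a product depends on the orientation of its operands.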
Array types
Any type (including the constrained types discussed in the next section) can be made into an array type by declaring it with array dimensions. For example,
array[10] real x;
array[6,7] matrix[3, 3] m;
declares x to be a one-dimensional array of size 10 containing real values, and declares m to be a two-dimensional array of size \(6 \times 7\) containing values that are \(3 \times 3\) matrices.
Prior to version 2.26, Stan models used a different syntax, with the array dimensions defined after the variable identifier.
Equivalent declarations of x and m with the pre-2.26 syntax:
real x[10];
matrix[3, 3] m[6, 7];
Using the syntax with the array keyword is advised.
Constrained data types
Declarations of variables other than local variables may be provided with constraints. These constraints are not part of the underlying data type for a variable, but determine error checking in the transformed data, transformed parameters, and generated quantities blocks, and the transform from unconstrained to constrained space in the parameters block.
All of the basic data types may be given lower and upper bounds using syntax such as
int<lower = 1> N;
real<upper = 0> log_p;
vector<lower = -1, upper = 1>[3] rho;
There are also special data types for structured vectors and matrices. There are four constrained vector data types, simplex for unit simplexes, unit_vector for unit-length vectors, ordered for ordered vectors of scalars, and positive_ordered for vectors of positive ordered scalars. There are specialized matrix data types corr_matrix and cov_matrix for correlation matrices (symmetric, positive definite, unit diagonal) and covariance matrices (symmetric, positive definite). The type cholesky_factor_cov is for Cholesky factors of covariance matrices (lower triangular, positive diagonal, product with own transpose is a covariance matrix). The type cholesky_factor_corr is for Cholesky factors of correlation matrices (lower triangular, positive diagonal, unit-length rows).
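For illustration, the structured types named above could be declared as follows. This is a sketch only; the variable names are hypothetical, and a real model would rarely need all of these at once.

```stan
data {
  int<lower=2> K;  // common dimension, assumed for this sketch
}
parameters {
  simplex[K] theta;                 // non-negative entries summing to 1
  unit_vector[K] u;                 // Euclidean length 1
  ordered[K] c;                     // c[1] < c[2] < ... < c[K]
  positive_ordered[K] d;            // 0 < d[1] < ... < d[K]
  corr_matrix[K] Omega;             // K x K correlation matrix
  cov_matrix[K] Sigma;              // K x K covariance matrix
  cholesky_factor_cov[K] L_Sigma;   // Cholesky factor of a covariance matrix
  cholesky_factor_corr[K] L_Omega;  // Cholesky factor of a correlation matrix
}
```

Each of these takes a single size argument here. The matrix types are square, so one dimension determines both the row and column counts.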
Constraints provide error checking for variables defined in the data, transformed data, transformed parameters, and generated quantities blocks. Constraints are critical for variables declared in the parameters block, where they determine the transformation from constrained variables (those satisfying the declared constraint) to unconstrained variables (those ranging over all of \(\mathbb{R}^n\)).
It is worth calling out the most important aspect of constrained data types:
The model must have support (non-zero density, equivalently finite log density) at parameter values that satisfy the declared constraints.
If this condition is violated, so that some parameter values satisfy the declared constraints but do not have finite log density, then the samplers and optimizers may exhibit any of a number of pathologies, including getting stuck, failure to initialize, excessive Metropolis rejection, or biased draws due to inability to explore the tails of the distribution.
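A minimal sketch of the support condition, using a hypothetical parameter alpha with a uniform prior on \((-1, 1)\). The declared bounds match the prior's support, so every value that satisfies the constraint has finite log density.

```stan
parameters {
  // The bounds match the support of the prior below. Declaring this as
  // a plain `real alpha;` would admit values outside (-1, 1), where
  // the uniform log density is not finite.
  real<lower=-1, upper=1> alpha;
}
model {
  alpha ~ uniform(-1, 1);
}
```

With the unconstrained declaration, the same model statement would leave most of the declared parameter space without support, which is the situation the paragraph above warns against.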