5.4 Vector and Matrix Data Types
Stan provides three types of container objects: arrays, vectors, and matrices. Vectors and matrices are more limited kinds of data structures than arrays. Vectors are intrinsically one-dimensional collections of reals, whereas matrices are intrinsically two dimensional. Vectors, matrices, and arrays are not assignable to one another, even if their dimensions are identical. A \(3 \times 4\) matrix is a different kind of object in Stan than a \(3 \times 4\) array.
The intention of using matrix types is to call out their usage in the code. There are three situations in Stan where only vectors and matrices may be used,
- matrix arithmetic operations (e.g., matrix multiplication)
- linear algebra functions (e.g., eigenvalues and determinants), and
- multivariate function parameters and outcomes (e.g., multivariate normal distribution arguments).
Vectors and matrices cannot be typed to return integer values. They
are restricted to real
values.2
Indexing from 1
Vectors and matrices, as well as arrays, are indexed starting from one in Stan. This follows the convention in statistics and linear algebra as well as their implementations in the statistical software packages R, MATLAB, BUGS, and JAGS. General computer programming languages, on the other hand, such as C++ and Python, index arrays starting from zero.
Vectors
Vectors in Stan are column vectors; see the next subsection for
information on row vectors. Vectors are declared with a size (i.e., a
dimensionality). For example, a 3-dimensional vector is declared with
the keyword vector
, as follows.
vector[3] u;
Vectors may also be declared with constraints, as in the following declaration of a 3-vector of non-negative values.
vector<lower=0>[3] u;
Unit Simplexes
A unit simplex is a vector with non-negative values whose entries sum
to 1. For instance, \([0.2,0.3,0.4,0.1]^{\top}\) is a unit 4-simplex.
Unit simplexes are most often used as parameters in categorical
or multinomial distributions, and they are also the sampled variate in
a Dirichlet distribution. Simplexes are declared with their full
dimensionality. For instance, theta
is declared to
be a unit \(5\)-simplex by
simplex[5] theta;
Unit simplexes are implemented as vectors and may be assigned to other vectors and vice-versa. Simplex variables, like other constrained variables, are validated to ensure they contain simplex values; for simplexes, this is only done up to a statically specified accuracy threshold \(\epsilon\) to account for errors arising from floating-point imprecision.
In high dimensional problems, simplexes may require smaller step sizes in the inference algorithms in order to remain stable; this can be achieved through higher target acceptance rates for samplers and longer warmup periods, tighter tolerances for optimization with more iterations, and in either case, with less dispersed parameter initialization or custom initialization if there are informative priors for some parameters.
Unit Vectors
A unit vector is a vector with a norm of one. For instance, \([0.5, 0.5, 0.5, 0.5]^{\top}\) is a unit 4-vector. Unit vectors are sometimes
used in directional statistics. Unit vectors are declared with their
full dimensionality. For instance, theta
is declared to be a unit
\(5\)-vector by
unit_vector[5] theta;
Unit vectors are implemented as vectors and may be assigned to other vectors and vice-versa. Unit vector variables, like other constrained variables, are validated to ensure that they are indeed unit length; for unit vectors, this is only done up to a statically specified accuracy threshold \(\epsilon\) to account for errors arising from floating-point imprecision.
Ordered Vectors
An ordered vector type in Stan represents a vector whose entries are sorted in ascending order. For instance, \((-1.3,2.7,2.71)^{\top}\) is an ordered 3-vector. Ordered vectors are most often employed as cut points in ordered logistic regression models (see section).
The variable c
is declared as an ordered 5-vector by
ordered[5] c;
After their declaration, ordered vectors, like unit simplexes, may be assigned to other vectors and other vectors may be assigned to them. Constraints will be checked after executing the block in which the variables were declared.
Positive, Ordered Vectors
There is also a positive, ordered vector type which operates similarly to ordered vectors, but all entries are constrained to be positive. For instance, \((2,3.7,4,12.9)\) is a positive, ordered 4-vector.
The variable d
is declared as a positive, ordered 5-vector by
positive_ordered[5] d;
Like ordered vectors, after their declaration, positive ordered vectors may be assigned to other vectors and other vectors may be assigned to them. Constraints will be checked after executing the block in which the variables were declared.
Row Vectors
Row vectors are declared with the keyword row_vector
.
Like (column) vectors, they are declared with a size. For example,
a 1093-dimensional row vector u
would be declared as
row_vector[1093] u;
Constraints are declared as for vectors, as in the following example of a 10-vector with values between -1 and 1.
row_vector<lower=-1,upper=1>[10] u;
Row vectors may not be assigned to column vectors, nor may column vectors be assigned to row vectors. If assignments are required, they may be accommodated through the transposition operator.
Matrices
Matrices are declared with the keyword matrix
along with a
number of rows and number of columns. For example,
matrix[3, 3] A;
matrix[M, N] B;
declares A
to be a \(3 \times 3\) matrix and B
to be a \(M \times N\) matrix. For the second declaration to be well formed, the
variables M
and N
must be declared as integers in either
the data or transformed data block and before the matrix declaration.
Matrices may also be declared with constraints, as in this (\(3 \times 4\)) matrix of non-positive values.
matrix<upper=0>[3, 4] B;
Assigning to Rows of a Matrix
Rows of a matrix can be assigned by indexing the left-hand side of an assignment statement. For example, this is possible.
matrix[M, N] a;
row_vector[N] b;
// ...
a[1] = b;
This copies the values from row vector b
to a[1]
, which
is the first row of the matrix a
. If the number of columns in
a
is not the same as the size of b
, a run-time error is
raised; the number of columns of a
is N
, which is also the
number of columns of b
.
Assignment works by copying values in Stan. That means any subsequent
assignment to a[1]
does not affect b
, nor does an
assignment to b
affect a
.
Covariance Matrices
Matrix variables may be constrained to represent covariance matrices. A matrix is a covariance matrix if it is symmetric and positive definite. Like correlation matrices, covariance matrices only need a single dimension in their declaration. For instance,
cov_matrix[K] Omega;
declares Omega
to be a \(K \times K\) covariance matrix, where
\(K\) is the value of the data variable K
.
Correlation Matrices
Matrix variables may be constrained to represent correlation matrices. A matrix is a correlation matrix if it is symmetric and positive definite, has entries between \(-1\) and \(1\), and has a unit diagonal. Because correlation matrices are square, only one dimension needs to be declared. For example,
corr_matrix[3] Sigma;
declares Sigma
to be a \(3 \times 3\) correlation matrix.
Correlation matrices may be assigned to other matrices, including unconstrained matrices, if their dimensions match, and vice-versa.
Cholesky Factors of Covariance Matrices
Matrix variables may be constrained to represent the Cholesky factors of a covariance matrix. This is often more convenient or more efficient than representing covariance matrices directly.
A Cholesky factor \(L\) is an \(M \times N\) lower-triangular matrix (if \(m < n\) then \(L[m, n] =0\)) with a strictly positive diagonal (\(L[k, k] > 0\)) and \(M \geq N\). If \(L\) is a Cholesky factor, then \(\Sigma = L \, L^{\top}\) is a covariance matrix (i.e., it is positive definite). The mapping between positive definite matrixes and their Cholesky factors is bijective—every covariance matrix has a unique Cholesky factorization.
The typical case of a square Cholesky factor may be declared with a single dimension,
cholesky_factor_cov[4] L;
Cholesky factors of positive semi-definite matrices
In general, two dimensions may be declared, with the above being equal to
cholesky_factor_cov[4, 4]
. The
type cholesky_factor_cov[M, N]
may be used for the general
\(M \times N\) case to produce positive semi-definite matrices of rank \(M\).
Cholesky Factors of Correlation Matrices
Matrix variables may be constrained to represent the Cholesky factors of a correlation matrix.
A Cholesky factor for a correlation matrix \(L\) is a \(K \times K\) lower-triangular matrix with positive diagonal entries and rows that are of length 1 (i.e., \(\sum_{n=1}^K L_{m,n}^2 = 1\)). If \(L\) is a Cholesky factor for a correlation matrix, then \(L\,L^{\top}\) is a correlation matrix (i.e., symmetric positive definite with a unit diagonal).
To declare the variable L
to be a K
by K
Cholesky factor of a
correlation matrix, the following code may be used.
cholesky_factor_corr[K] L;
Assigning Constrained Variables
Constrained variables of all types may be assigned to other variables
of the same unconstrained type and vice-versa. Matching is interpreted strictly
as having the same basic type and number of array dimensions.
Constraints are not considered, but basic data types are. For instance, a
variable declared to be real<lower=0,upper=1>
could be assigned
to a variable declared as real
and vice-versa. Similarly, a
variable declared as matrix[3, 3]
may be assigned to a variable
declared as cov_matrix[3]
or
cholesky_factor_cov[3]
, and vice-versa.
Checks are carried out at the end of each relevant block of statements to ensure constraints are enforced. This includes run-time size checks. The Stan compiler isn’t able to catch the fact that an attempt may be made to assign a matrix of one dimensionality to a matrix of mismatching dimensionality.
Expressions as Size Declarations
Variables may be declared with sizes given by expressions. Such expressions are constrained to only contain data or transformed data variables. This ensures that all sizes are determined once the data is read in and transformed data variables defined by their statements. For example, the following is legal.
data {
int<lower=0> N_observed; int<lower=0> N_missing;
// ...
transformed parameters {
vector[N_observed + N_missing] y;
// ...
Accessing Vector and Matrix Elements
If v
is a column vector or row vector, then v[2]
is the
second element in the vector. If m
is a matrix, then
m[2, 3]
is the value in the second row and third column.
Providing a matrix with a single index returns the specified row. For
instance, if m
is a matrix, then m[2]
is the second row.
This allows Stan blocks such as
matrix[M, N] m;
row_vector[N] v;
real x;
// ...
v = m[2];
x = v[3]; // x == m[2][3] == m[2, 3]
The type of m[2]
is row_vector
because it is the second
row of m
. Thus it is possible to write m[2][3]
instead
of m[2, 3]
to access the third element in the second row. When
given a choice, the form m[2, 3]
is preferred.
Array index style
The form m[2, 3]
is more efficient because it does not require the creation and use of an intermediate expression template for m[2]
. In later versions, explicit calls to m[2][3]
may be optimized to be as efficient as m[2, 3]
by the Stan compiler.
Size Declaration Restrictions
An integer expression is used to pick out the sizes of vectors,
matrices, and arrays. For instance, we can declare a vector of size
M + N
using
vector[M + N] y;
Any integer-denoting expression may be used for the size declaration, providing all variables involved are either data, transformed data, or local variables. That is, expressions used for size declarations may not include parameters or transformed parameters or generated quantities.
This may change if Stan is called upon to do complicated integer matrix operations or boolean matrix operations. Integers are not appropriate inputs for linear algebra functions.↩