Matrices, Vectors, Arrays, and Tuples
This chapter provides pointers as to how to choose among the various container types (matrix, vector, array, and tuple) provided by Stan.
Basic motivation
Stan provides three basic scalar types, int, real, and complex, as well as three basic linear algebra types, vector, row_vector, and matrix. Stan allows arrays of any dimensionality, containing any type of element (though that type must be declared and must be the same for all elements).
This leaves us in the awkward situation of having three one-dimensional containers, as exemplified by the following declarations.
array[N] real a;
vector[N] a;
row_vector[N] a;
These distinctions matter. Matrix types, like vector and row vector, are required for linear algebra operations. There is no automatic promotion of arrays to vectors because the target, row vector or column vector, is ambiguous. Similarly, row vectors are separated from column vectors because multiplying a row vector by a column vector produces a scalar, whereas multiplying in the opposite order produces a matrix.
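To make the distinction concrete, here is a minimal sketch (the variable names are hypothetical) showing the result types of the two multiplication orders.
row_vector[3] r = [1, 2, 3];
vector[3] c = [4, 5, 6]';
real ip = r * c;          // row vector times column vector: a scalar (inner product)
matrix[3, 3] op = c * r;  // column vector times row vector: a 3 x 3 matrix (outer product)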
The following code fragment shows all four ways to declare a two-dimensional container of size \(M \times N\).
array[M, N] real b; // b[m] : array[] real (efficient)
array[M] vector[N] b; // b[m] : vector (efficient)
array[M] row_vector[N] b; // b[m] : row_vector (efficient)
matrix[M, N] b; // b[m] : row_vector (inefficient)
The main differences among these choices involve efficiency for various purposes and the type of b[m], which is shown in the comments to the right of the declarations. Thus the only way to efficiently iterate over row vectors is to use the third declaration, but if you need linear algebra on matrices, the only way to use matrix operations is to use the fourth declaration.
The inefficiencies due to any manual reshaping of containers are usually slight compared to everything else going on in a Stan program (typically a lot of gradient calculations).
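For instance, a hedged sketch of such a manual reshape (the names are hypothetical) is copying an array of row vectors into a matrix before doing matrix algebra.
array[M] row_vector[N] x;  // efficient for row-by-row iteration
matrix[M, N] x_mat;        // required for matrix operations
for (m in 1:M) {
  x_mat[m] = x[m];         // copy row m; the cost is usually negligible
}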
Tuple types
Arrays may contain entries of any type, but the types must be the same for all entries. Matrices and vectors contain either real numbers or complex numbers, but all the contained types are the same (e.g., if a vector has a single complex-typed entry, all the entries are complex).
With arrays or vectors, we can represent pairs of real numbers or pairs of complex numbers. For example, a complex_vector[3] holds exactly three complex numbers. With arrays and vectors, there is no way to represent a pair consisting of an integer and a real number.
Tuples provide a way to represent a sequence of values of heterogeneous types. For example, tuple(int, real) is the type of a pair consisting of an integer and a real number, and tuple(array[5] int, vector[6]) is the type of pairs whose first element is a five-element array of integers and whose second element is a six-element vector.
Tuple syntax
Tuples are declared using the keyword tuple followed by a sequence of type declarations in parentheses. Tuples are constructed using only parentheses. The following example illustrates both declaration and construction.
tuple(int, vector[3]) ny = (5, [3, 2.9, 1.8]');
The elements of a tuple are accessed by position, starting from 1. For example, we can extract the elements of the tuple above using
int n = ny.1;
vector[3] y = ny.2;
We can also assign into the elements of a tuple.
tuple(int, vector[3], complex) abc;
abc.1 = 5;
abc.2[1] = 3;
abc.2[2] = 2.9;
abc.2[3] = 1.4798;
abc.3 = 2 + 1.9i;
As the cascaded indexing example shows, the result of abc.1 is an lvalue (i.e., something to which values may be assigned), and we can further index into it to create new lvalues (e.g., abc.2[1] pulls out the first element of the vector value of the second element of the tuple).
There are two efficiency considerations for tuples. First, like the other container types, tuples are passed to functions by constant reference, which means only a pointer gets passed rather than copying the data. Second, like the array types, creating a tuple requires copying the data for all of its elements. For example, in the following code, the matrix is copied, entailing 1000 copies of scalar values.
int a = 5;
matrix[10, 100] b = ...;
tuple(int, matrix[10, 100]) ab = (a, b); // COPIES b
b[1, 1] = 10.3; // does NOT change ab
Applications of tuples
Tuples are primarily useful for two things. First, they provide a way to encapsulate a group of heterogeneous items so that they may be passed as a group. This lets us define arrays of structures as well as structures of arrays. For example, array[N] tuple(int, real, vector[5]) is an array of tuples, each of which has an integer, real, and vector component. Alternatively, we can represent the same information using a tuple of parallel arrays, tuple(array[N] int, array[N] real, array[N] vector[5]).
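As a small sketch (with hypothetical names), the same integer component is accessed in the two layouts as follows.
array[N] tuple(int, real, vector[5]) aos;                     // array of structures
tuple(array[N] int, array[N] real, array[N] vector[5]) soa;   // structure of arrays
// ...
int k1 = aos[1].1;   // integer component of the first tuple
int k2 = soa.1[1];   // first entry of the integer array in the tuple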
The second use is for function return values. Here, if a function computes two different things with different types, and the computation shares work, it's best to write one function that returns both things. For example, an eigendecomposition returns a pair consisting of a vector of eigenvalues and a matrix of eigenvectors, whereas a singular value decomposition returns three matrices of different shapes. Consider the QR decomposition of a matrix, \(A = Q \cdot R\), where \(Q\) is orthonormal and \(R\) is upper triangular. Before tuples were introduced in version 2.33, this required two function calls.
matrix[M, N] A = ...;
matrix[M, M] Q = qr_Q(A);
matrix[M, N] R = qr_R(A);
With tuples, this can be simplified to the following,
tuple(matrix[M, M], matrix[M, N]) QR = qr(A);
with QR.1 being Q and QR.2 being R.
Fixed sizes and indexing out of bounds
Stan’s matrix, vector, and array variables are sized when they are declared and may not be dynamically resized. Function arguments do not have declared sizes; their sizes are fixed when the function is called and the container is instantiated. Declarations may also appear inside loops, so the declared sizes may change over the course of running a program, but each time a declaration is executed, it creates a fixed-size object.
When an index that is out of bounds is provided, Stan raises a rejection error: evaluation of the current log density and gradient is halted and the algorithm is left to clean up the error. All of Stan’s containers check the sizes of all indexes.
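For example, the following sketch (hypothetical) shows an access that would trigger such a rejection.
vector[3] v = [1, 2, 3]';
// real bad = v[4];  // if uncommented: index 4 is out of bounds for a
//                   // size-3 vector, so this raises a rejection at runtime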
Data type and indexing efficiency
The underlying matrix and linear algebra operations are implemented in terms of data types from the Eigen C++ library. By having vectors and matrices as basic types, no conversion is necessary when invoking matrix operations or calling linear algebra functions.
Arrays, on the other hand, are implemented as instances of the C++ std::vector class (not to be confused with Eigen's Eigen::Vector class or Stan vectors). By implementing arrays this way, indexing is efficient because values can be returned by reference rather than copied by value.
Matrices vs. two-dimensional arrays
In Stan models, there are a few minor efficiency considerations in deciding between a two-dimensional array and a matrix, which may seem interchangeable at first glance.
First, matrices use a bit less memory than two-dimensional arrays. This is because they don’t store a sequence of arrays, but just the data and the two dimensions.
Second, matrices store their data in column-major order. Furthermore, all of the data in a matrix is guaranteed to be contiguous in memory. This is an important consideration for optimized code because bringing in data from memory to cache is much more expensive than performing arithmetic operations with contemporary CPUs. Arrays, on the other hand, only guarantee that the values of primitive types are contiguous in memory; otherwise, they hold copies of their values (which are returned by reference wherever possible).
Third, both data structures are best traversed in the order in which they are stored, which also helps with memory locality. For matrices this is column-major order, so the following loop order is appropriate.
matrix[M, N] a;
//...
for (n in 1:N) {
  for (m in 1:M) {
    // ... do something with a[m, n] ...
  }
}
Arrays, on the other hand, should be traversed in row-major order (i.e., last index fastest), as in the following example.
array[M, N] real a;
// ...
for (m in 1:M) {
  for (n in 1:N) {
    // ... do something with a[m, n] ...
  }
}
The first use of a[m, n] should bring a[m] into memory. Overall, traversing matrices is more efficient than traversing arrays.
This is true even for arrays of matrices. For example, the ideal order in which to traverse a two-dimensional array of matrices is
array[I, J] matrix[M, N] b;
// ...
for (i in 1:I) {
  for (j in 1:J) {
    for (n in 1:N) {
      for (m in 1:M) {
        // ... do something with b[i, j, m, n] ...
      }
    }
  }
}
If a is a matrix, the notation a[m] picks out row m of that matrix. This is a rather inefficient operation for matrices. If indexing of vectors is needed, it is much better to declare an array of vectors. That is, this
array[M] row_vector[N] b;
// ...
for (m in 1:M) {
  // ... do something with row vector b[m] ...
}
is much more efficient than the pure matrix version
matrix[M, N] b;
// ...
for (m in 1:M) {
  // ... do something with row vector b[m] ...
}
Similarly, indexing an array of column vectors is more efficient than using the col function to pick out a column of a matrix.
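A brief sketch of the contrast (the names are hypothetical):
matrix[M, N] a;
array[N] vector[M] b;
// ...
for (n in 1:N) {
  vector[M] u = col(a, n);  // copies column n out of the matrix
  vector[M] v = b[n];       // indexes the array of vectors directly
}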
In contrast, whatever can be done as pure matrix algebra will be the fastest. So if we want to create a vector of predictor-coefficient dot products, it's more efficient to do this
matrix[N, K] x; // predictors (aka covariates)
// ...
vector[K] beta; // coeffs
// ...
vector[N] y_hat; // linear prediction
// ...
y_hat = x * beta;
than it is to do this
array[N] row_vector[K] x; // predictors (aka covariates)
// ...
vector[K] beta; // coeffs
// ...
vector[N] y_hat; // linear prediction
// ...
for (n in 1:N) {
  y_hat[n] = x[n] * beta;
}
(Row) vectors vs. one-dimensional arrays
For use purely as a container, there is really nothing to decide among vectors, row vectors, and one-dimensional arrays. The Eigen::Vector template specialization and the std::vector template class are implemented similarly as containers of double values (the type real in Stan). Only arrays in Stan are allowed to store integer values.
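For example, integer counts must be stored in an array; there is no integer vector type (the names below are hypothetical).
array[N] int counts;    // arrays may hold integers
vector[N] counts_real;  // vectors hold real values only; integers assigned
                        // to its elements are promoted to real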
Memory locality
The key to understanding efficiency of matrix and vector representations is memory locality and reference passing versus copying.
Memory locality
CPUs on computers bring in memory in blocks through layers of caches. Fetching from memory is much slower than performing arithmetic operations. The only way to make container operations fast is to respect memory locality and access elements that are close together in memory sequentially in the program.
Matrices
Matrices are stored internally in column-major order. That is, an \(M \times N\) matrix stores its elements in the order \[ (1,1), (2, 1), \dotsc, (M, 1), (1, 2), \dotsc, (M, 2), \dotsc, (1, N), \dotsc, (M, N). \]
This means that it’s much more efficient to write loops over matrices column by column, as in the following example.
matrix[M, N] a;
// ...
for (n in 1:N) {
  for (m in 1:M) {
    // ... do something with a[m, n] ...
  }
}
It also follows that pulling a row out of a matrix is not memory local, because accessing a row has to stride over the whole sequence of stored values. Pulling out a row also requires a copy into a new data structure, since a row is not stored internally as a unit within a matrix. For sequential access to row vectors in a matrix, it is much better to use an array of row vectors, as in the following example.
array[M] row_vector[N] a;
// ...
for (m in 1:M) {
  // ... do something with row vector a[m] ...
}
Even if what is done involves a function call, the row vector a[m] will not have to be copied.
Arrays
Arrays are stored internally following their data structure. That means a two dimensional array is stored in row-major order. Thus it is efficient to pull out a “row” of a two-dimensional array.
array[M, N] real a;
// ...
for (m in 1:M) {
  // ... do something with a[m] ...
}
A difference from matrices is that the entries a[m] in the two-dimensional array are not necessarily adjacent in memory, so there is no guarantee that iterating over all the elements of a two-dimensional array will provide memory locality across the "rows."
Converting among matrix, vector, and array types
There is no automatic conversion among matrices, vectors, and arrays in Stan. But there is a wide range of conversion functions to convert a matrix into a vector, a multi-dimensional array into a one-dimensional array, or a vector into an array. See the section on mixed matrix and array operations in the functions reference manual for a complete list of conversion operators, and the multi-indexing chapter for some reshaping operations involving multiple indexing and range indexing.
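A few of these conversions are sketched below (see the functions reference for the full set of signatures; the sizes and names here are illustrative).
matrix[M, N] a;
// ...
vector[M * N] v = to_vector(a);         // flatten the matrix in column-major order
array[M * N] real b = to_array_1d(a);   // flatten into a one-dimensional array
matrix[M, N] c = to_matrix(v, M, N);    // reshape the vector back into a matrix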
Aliasing in Stan containers
Stan expressions are all evaluated before assignment happens, so there is no danger of so-called aliasing in array, vector, or matrix operations. Contrast the behavior of the assignments to u and x, which start with the same values: the loop assigning to u and the compound slicing assignment to x in the following trivial Stan program.
transformed data {
  vector[4] x = [ 1, 2, 3, 4 ]';
  vector[4] u = [ 1, 2, 3, 4 ]';
  for (t in 2:4) {
    u[t] = u[t - 1] * 3;
  }
  x[2:4] = x[1:3] * 3;
  print("u = ", u);
  print("x = ", x);
}
The output it produces is,
u = [1, 3, 9, 27]
x = [1, 3, 6, 9]
In the loop version assigning to u, the values are updated before being used to define subsequent values; in the sliced expression assigning to x, the entire right-hand side is evaluated before being assigned to the left-hand side.