Matrix Operations
Integer-valued matrix size functions
int num_elements(vector x)
The total number of elements in the vector x (same as function rows)
int num_elements(row_vector x)
The total number of elements in the row vector x (same as function cols)
int num_elements(matrix x)
The total number of elements in the matrix x. For example, if x is a \(5 \times 3\) matrix, then num_elements(x) is 15
int rows(vector x)
The number of rows in the vector x
int rows(row_vector x)
The number of rows in the row vector x, namely 1
int rows(matrix x)
The number of rows in the matrix x
int cols(vector x)
The number of columns in the vector x, namely 1
int cols(row_vector x)
The number of columns in the row vector x
int cols(matrix x)
The number of columns in the matrix x
int size(vector x)
The size of x, i.e., the number of elements
int size(row_vector x)
The size of x, i.e., the number of elements
int size(matrix x)
The size of the matrix x. For example, if x is a \(5 \times 3\) matrix, then size(x) is 15
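For example, given a \(5 \times 3\) matrix and a size 5 vector (declarations illustrative), the size functions relate as follows:
matrix[5, 3] m;
vector[5] v;
// num_elements(m) == size(m) == 15; rows(m) == 5; cols(m) == 3
// num_elements(v) == rows(v) == size(v) == 5; cols(v) == 1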
Matrix arithmetic operators
Stan supports the basic matrix operations using infix, prefix, and postfix operators. This section lists the operations supported by Stan along with their argument and result types.
Negation prefix operators
vector operator-(vector x)
The negation of the vector x.
row_vector operator-(row_vector x)
The negation of the row vector x.
matrix operator-(matrix x)
The negation of the matrix x.
T operator-(T x)
Vectorized version of operator-. If T x is a (possibly nested) array of matrix types, -x is the same shape array where each individual value is negated.
Infix matrix operators
vector operator+(vector x, vector y)
The sum of the vectors x and y.
row_vector operator+(row_vector x, row_vector y)
The sum of the row vectors x and y.
matrix operator+(matrix x, matrix y)
The sum of the matrices x and y
vector operator-(vector x, vector y)
The difference between the vectors x and y.
row_vector operator-(row_vector x, row_vector y)
The difference between the row vectors x and y
matrix operator-(matrix x, matrix y)
The difference between the matrices x and y
vector operator*(real x, vector y)
The product of the scalar x and vector y
row_vector operator*(real x, row_vector y)
The product of the scalar x and the row vector y
matrix operator*(real x, matrix y)
The product of the scalar x and the matrix y
vector operator*(vector x, real y)
The product of the scalar y and vector x
matrix operator*(vector x, row_vector y)
The product of the vector x and row vector y
row_vector operator*(row_vector x, real y)
The product of the scalar y and row vector x
real operator*(row_vector x, vector y)
The product of the row vector x and vector y
row_vector operator*(row_vector x, matrix y)
The product of the row vector x and matrix y
matrix operator*(matrix x, real y)
The product of the scalar y and matrix x
vector operator*(matrix x, vector y)
The product of the matrix x and vector y
matrix operator*(matrix x, matrix y)
The product of the matrices x and y
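As a minimal sketch of how the result types combine (dimensions illustrative):
matrix[3, 2] A;
vector[2] u;
row_vector[3] w;
vector[3] b = A * u;       // matrix * vector yields a vector
row_vector[2] c = w * A;   // row_vector * matrix yields a row_vector
real d = w * b;            // row_vector * vector yields a real
matrix[3, 3] E = b * w;    // vector * row_vector yields a matrix (outer product)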
Broadcast infix operators
vector operator+(vector x, real y)
The result of adding y to every entry in the vector x
vector operator+(real x, vector y)
The result of adding x to every entry in the vector y
row_vector operator+(row_vector x, real y)
The result of adding y to every entry in the row vector x
row_vector operator+(real x, row_vector y)
The result of adding x to every entry in the row vector y
matrix operator+(matrix x, real y)
The result of adding y to every entry in the matrix x
matrix operator+(real x, matrix y)
The result of adding x to every entry in the matrix y
vector operator-(vector x, real y)
The result of subtracting y from every entry in the vector x
vector operator-(real x, vector y)
The result of adding x to every entry in the negation of the vector y
row_vector operator-(row_vector x, real y)
The result of subtracting y from every entry in the row vector x
row_vector operator-(real x, row_vector y)
The result of adding x to every entry in the negation of the row vector y
matrix operator-(matrix x, real y)
The result of subtracting y from every entry in the matrix x
matrix operator-(real x, matrix y)
The result of adding x to every entry in the negation of the matrix y
vector operator/(vector x, real y)
The result of dividing each entry in the vector x by y
row_vector operator/(row_vector x, real y)
The result of dividing each entry in the row vector x by y
matrix operator/(matrix x, real y)
The result of dividing each entry in the matrix x by y
Transposition operator
Matrix transposition is represented using a postfix operator.
matrix operator'(matrix x)
The transpose of the matrix x, written as x'
row_vector operator'(vector x)
The transpose of the vector x, written as x'
vector operator'(row_vector x)
The transpose of the row vector x, written as x'
Elementwise functions
Elementwise functions apply a function to each element of a vector or matrix, returning a result of the same shape as the argument. There are many functions that are vectorized in addition to the ad hoc cases listed in this section; see section function vectorization for the general cases.
vector operator.*(vector x, vector y)
The elementwise product of y and x
row_vector operator.*(row_vector x, row_vector y)
The elementwise product of y and x
matrix operator.*(matrix x, matrix y)
The elementwise product of y and x
vector operator./(vector x, vector y)
The elementwise quotient of y and x
vector operator./(vector x, real y)
The elementwise quotient of y and x
vector operator./(real x, vector y)
The elementwise quotient of y and x
row_vector operator./(row_vector x, row_vector y)
The elementwise quotient of y and x
row_vector operator./(row_vector x, real y)
The elementwise quotient of y and x
row_vector operator./(real x, row_vector y)
The elementwise quotient of y and x
matrix operator./(matrix x, matrix y)
The elementwise quotient of y and x
matrix operator./(matrix x, real y)
The elementwise quotient of y and x
matrix operator./(real x, matrix y)
The elementwise quotient of y and x
vector operator.^(vector x, vector y)
The elementwise power of y and x
vector operator.^(vector x, real y)
The elementwise power of y and x
vector operator.^(real x, vector y)
The elementwise power of y and x
row_vector operator.^(row_vector x, row_vector y)
The elementwise power of y and x
row_vector operator.^(row_vector x, real y)
The elementwise power of y and x
row_vector operator.^(real x, row_vector y)
The elementwise power of y and x
matrix operator.^(matrix x, matrix y)
The elementwise power of y and x
matrix operator.^(matrix x, real y)
The elementwise power of y and x
matrix operator.^(real x, matrix y)
The elementwise power of y and x
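For example, with illustrative three-element vectors:
vector[3] x = [1, 2, 3]';
vector[3] y = [4, 5, 6]';
vector[3] p = x .* y;   // [4, 10, 18]'
vector[3] q = x ./ y;   // [0.25, 0.4, 0.5]'
vector[3] r = x .^ 2;   // [1, 4, 9]'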
Dot products and specialized products
real dot_product(vector x, vector y)
The dot product of x and y
real dot_product(vector x, row_vector y)
The dot product of x and y
real dot_product(row_vector x, vector y)
The dot product of x and y
real dot_product(row_vector x, row_vector y)
The dot product of x and y
row_vector columns_dot_product(vector x, vector y)
The dot product of the columns of x and y
row_vector columns_dot_product(row_vector x, row_vector y)
The dot product of the columns of x and y
row_vector columns_dot_product(matrix x, matrix y)
The dot product of the columns of x and y
vector rows_dot_product(vector x, vector y)
The dot product of the rows of x and y
vector rows_dot_product(row_vector x, row_vector y)
The dot product of the rows of x and y
vector rows_dot_product(matrix x, matrix y)
The dot product of the rows of x and y
real dot_self(vector x)
The dot product of the vector x with itself
real dot_self(row_vector x)
The dot product of the row vector x with itself
row_vector columns_dot_self(vector x)
The dot product of the columns of x with themselves
row_vector columns_dot_self(row_vector x)
The dot product of the columns of x with themselves
row_vector columns_dot_self(matrix x)
The dot product of the columns of x with themselves
vector rows_dot_self(vector x)
The dot product of the rows of x with themselves
vector rows_dot_self(row_vector x)
The dot product of the rows of x with themselves
vector rows_dot_self(matrix x)
The dot product of the rows of x with themselves
Specialized products
matrix tcrossprod(matrix x)
The product of x postmultiplied by its own transpose, similar to the tcrossprod(x) function in R. The result is a symmetric matrix \(\text{x}\,\text{x}^{\top}\).
matrix crossprod(matrix x)
The product of x premultiplied by its own transpose, similar to the crossprod(x) function in R. The result is a symmetric matrix \(\text{x}^{\top}\,\text{x}\).
The following functions all provide shorthand forms for common expressions, which are also much more efficient.
matrix quad_form(matrix A, matrix B)
The quadratic form, i.e., B' * A * B.
real quad_form(matrix A, vector B)
The quadratic form, i.e., B' * A * B.
matrix quad_form_diag(matrix m, vector v)
The quadratic form using the column vector v as a diagonal matrix, i.e., diag_matrix(v) * m * diag_matrix(v).
matrix quad_form_diag(matrix m, row_vector rv)
The quadratic form using the row vector rv as a diagonal matrix, i.e., diag_matrix(rv) * m * diag_matrix(rv).
matrix quad_form_sym(matrix A, matrix B)
Similarly to quad_form, gives B' * A * B, but additionally checks if A is symmetric and ensures that the result is also symmetric.
real quad_form_sym(matrix A, vector B)
Similarly to quad_form, gives B' * A * B, but additionally checks if A is symmetric and ensures that the result is also symmetric.
real trace_quad_form(matrix A, matrix B)
The trace of the quadratic form, i.e., trace(B' * A * B).
real trace_gen_quad_form(matrix D, matrix A, matrix B)
The trace of a generalized quadratic form, i.e., trace(D * B' * A * B).
matrix multiply_lower_tri_self_transpose(matrix x)
The product of the lower triangular portion of x (including the diagonal) times its own transpose; that is, if L is a matrix of the same dimensions as x with L(m,n) equal to x(m,n) for \(\text{n} \leq \text{m}\) and L(m,n) equal to 0 if \(\text{n} > \text{m}\), the result is the symmetric matrix \(\text{L}\,\text{L}^{\top}\). This is a specialization of tcrossprod(x) for lower-triangular matrices. The input matrix does not need to be square.
matrix diag_pre_multiply(vector v, matrix m)
Return the product of the diagonal matrix formed from the vector v and the matrix m, i.e., diag_matrix(v) * m.
matrix diag_pre_multiply(row_vector rv, matrix m)
Return the product of the diagonal matrix formed from the row vector rv and the matrix m, i.e., diag_matrix(rv) * m.
matrix diag_post_multiply(matrix m, vector v)
Return the product of the matrix m and the diagonal matrix formed from the vector v, i.e., m * diag_matrix(v).
matrix diag_post_multiply(matrix m, row_vector rv)
Return the product of the matrix m and the diagonal matrix formed from the row vector rv, i.e., m * diag_matrix(rv).
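For example, a common use of these shorthands is building a covariance matrix from a correlation matrix and a vector of scales; a minimal sketch (the names Omega and tau are illustrative):
matrix[3, 3] Omega;   // a correlation matrix
vector[3] tau;        // scales
matrix[3, 3] Sigma = quad_form_diag(Omega, tau);
// same value as diag_matrix(tau) * Omega * diag_matrix(tau), but more efficient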
Reductions
Log sum of exponents
real log_sum_exp(vector x)
The natural logarithm of the sum of the exponentials of the elements in x
real log_sum_exp(row_vector x)
The natural logarithm of the sum of the exponentials of the elements in x
real log_sum_exp(matrix x)
The natural logarithm of the sum of the exponentials of the elements in x
Minimum and maximum
real min(vector x)
The minimum value in x, or \(+\infty\) if x is empty
real min(row_vector x)
The minimum value in x, or \(+\infty\) if x is empty
real min(matrix x)
The minimum value in x, or \(+\infty\) if x is empty
real max(vector x)
The maximum value in x, or \(-\infty\) if x is empty
real max(row_vector x)
The maximum value in x, or \(-\infty\) if x is empty
real max(matrix x)
The maximum value in x, or \(-\infty\) if x is empty
Sums and products
real sum(vector x)
The sum of the values in x, or 0 if x is empty
real sum(row_vector x)
The sum of the values in x, or 0 if x is empty
real sum(matrix x)
The sum of the values in x, or 0 if x is empty
real prod(vector x)
The product of the values in x, or 1 if x is empty
real prod(row_vector x)
The product of the values in x, or 1 if x is empty
real prod(matrix x)
The product of the values in x, or 1 if x is empty
Sample moments
Full definitions are provided for sample moments in section array reductions.
real mean(vector x)
The sample mean of the values in x; see section array reductions for details.
real mean(row_vector x)
The sample mean of the values in x; see section array reductions for details.
real mean(matrix x)
The sample mean of the values in x; see section array reductions for details.
real variance(vector x)
The sample variance of the values in x; see section array reductions for details.
real variance(row_vector x)
The sample variance of the values in x; see section array reductions for details.
real variance(matrix x)
The sample variance of the values in x; see section array reductions for details.
real sd(vector x)
The sample standard deviation of the values in x; see section array reductions for details.
real sd(row_vector x)
The sample standard deviation of the values in x; see section array reductions for details.
real sd(matrix x)
The sample standard deviation of the values in x; see section array reductions for details.
Quantile
Produces sample quantiles corresponding to the given probabilities. The smallest observation corresponds to a probability of 0 and the largest to a probability of 1.
Implements algorithm 7 from Hyndman, R. J. and Fan, Y. (1996), “Sample Quantiles in Statistical Packages” (R’s default quantile function).
real quantile(data vector x, data real p)
The p-th quantile of x
array[] real quantile(data vector x, data array[] real p)
An array containing the quantiles of x given by the array of probabilities p
real quantile(data row_vector x, data real p)
The p-th quantile of x
array[] real quantile(data row_vector x, data array[] real p)
An array containing the quantiles of x given by the array of probabilities p
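A minimal sketch of quantile in use (x must be data, as the signatures above require; sizes illustrative):
data {
  vector[100] x;
}
generated quantities {
  real med = quantile(x, 0.5);                    // median
  array[2] real q = quantile(x, {0.25, 0.75});    // lower and upper quartiles
}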
Broadcast functions
The following broadcast functions allow vectors, row vectors and matrices to be created by copying a single element into all of their cells. Matrices may also be created by stacking copies of row vectors vertically or stacking copies of column vectors horizontally.
vector rep_vector(real x, int m)
Return the size m (column) vector consisting of copies of x.
row_vector rep_row_vector(real x, int n)
Return the size n row vector consisting of copies of x.
matrix rep_matrix(real x, int m, int n)
Return the m by n matrix consisting of copies of x.
matrix rep_matrix(vector v, int n)
Return the m by n matrix consisting of n copies of the (column) vector v of size m.
matrix rep_matrix(row_vector rv, int m)
Return the m by n matrix consisting of m copies of the row vector rv of size n.
Unlike the situation with array broadcasting (see section array broadcasting), where there is a distinction between integer and real arguments, the following two statements produce the same result for vector broadcasting; row vector and matrix broadcasting behave similarly.
vector[3] x;
x = rep_vector(1.0, 3);
x = rep_vector(1, 3);
There are no integer vector or matrix types, so integer values are automatically promoted.
Symmetrization
matrix symmetrize_from_lower_tri(matrix A)
Construct a symmetric matrix from the lower triangle of A.
Available since 2.26
Diagonal matrix functions
matrix add_diag(matrix m, row_vector d)
Add row_vector d to the diagonal of matrix m.
matrix add_diag(matrix m, vector d)
Add vector d to the diagonal of matrix m.
matrix add_diag(matrix m, real d)
Add scalar d to every diagonal element of matrix m.
vector diagonal(matrix x)
The diagonal of the matrix x
matrix diag_matrix(vector x)
The diagonal matrix with diagonal x
Although the diag_matrix function is available, it is unlikely to ever show up in an efficient Stan program. For example, rather than converting a diagonal to a full matrix for use as a covariance matrix,
y ~ multi_normal(mu, diag_matrix(square(sigma)));
it is much more efficient to just use a univariate normal, which produces the same density,
y ~ normal(mu, sigma);
Rather than writing m * diag_matrix(v) where m is a matrix and v is a vector, it is much more efficient to write diag_post_multiply(m, v) (and similarly for pre-multiplication). By the same token, it is better to use quad_form_diag(m, v) rather than quad_form(m, diag_matrix(v)).
matrix identity_matrix(int k)
Create an identity matrix of size \(k \times k\)
Container construction functions
array[] real linspaced_array(int n, data real lower, data real upper)
Create a real array of length n of equidistantly spaced elements between lower and upper
array[] int linspaced_int_array(int n, int lower, int upper)
Create a regularly spaced, increasing integer array of length n between lower and upper, inclusive. If (upper - lower) / (n - 1) is less than one, repeat each output (n - 1) / (upper - lower) times. If neither (upper - lower) / (n - 1) nor (n - 1) / (upper - lower) is an integer, upper is reduced until one of these holds.
vector linspaced_vector(int n, data real lower, data real upper)
Create an n-dimensional vector of equidistantly spaced elements between lower and upper
row_vector linspaced_row_vector(int n, data real lower, data real upper)
Create an n-dimensional row vector of equidistantly spaced elements between lower and upper
array[] int one_hot_int_array(int n, int k)
Create a one-hot encoded int array of length n with array[k] = 1
array[] real one_hot_array(int n, int k)
Create a one-hot encoded real array of length n with array[k] = 1
vector one_hot_vector(int n, int k)
Create an n-dimensional one-hot encoded vector with vector[k] = 1
row_vector one_hot_row_vector(int n, int k)
Create an n-dimensional one-hot encoded row vector with row_vector[k] = 1
array[] int ones_int_array(int n)
Create an int array of length n of all ones
array[] real ones_array(int n)
Create a real array of length n of all ones
vector ones_vector(int n)
Create an n-dimensional vector of all ones
row_vector ones_row_vector(int n)
Create an n-dimensional row vector of all ones
array[] int zeros_int_array(int n)
Create an int array of length n of all zeros
array[] real zeros_array(int n)
Create a real array of length n of all zeros
vector zeros_vector(int n)
Create an n-dimensional vector of all zeros
row_vector zeros_row_vector(int n)
Create an n-dimensional row vector of all zeros
vector uniform_simplex(int n)
Create an n-dimensional simplex with elements vector[i] = 1 / n for all \(i \in 1, \dots, n\)
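For example (sizes and values illustrative):
vector[5] u = ones_vector(5);               // [1, 1, 1, 1, 1]'
vector[4] g = linspaced_vector(4, 0, 1);    // [0, 1.0 / 3, 2.0 / 3, 1]'
vector[3] e2 = one_hot_vector(3, 2);        // [0, 1, 0]'
vector[4] s = uniform_simplex(4);           // [0.25, 0.25, 0.25, 0.25]'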
Slicing and blocking functions
Stan provides several functions for generating slices, blocks, or diagonal entries for matrices.
Columns and rows
vector
col
(matrix x, int n)
The n-th column of matrix x
row_vector
row
(matrix x, int m)
The m-th row of matrix x
The row
function is special in that it may be used as an lvalue in an assignment statement (i.e., something to which a value may be assigned). The row function is also special in that the indexing notation x[m]
is just an alternative way of writing row(x,m)
. The col
function may not, be used as an lvalue, nor is there an indexing based shorthand for it.
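For example (dimensions illustrative):
matrix[3, 2] m;
row_vector[2] r = m[2];   // same as row(m, 2)
m[2] = r;                 // a row may be assigned to, as an lvalue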
Block operations
Matrix slicing operations
Block operations may be used to extract a sub-block of a matrix.
matrix block(matrix x, int i, int j, int n_rows, int n_cols)
Return the submatrix of x that starts at row i and column j and extends n_rows rows and n_cols columns.
The sub-row and sub-column operations may be used to extract a slice of a row or column from a matrix.
vector sub_col(matrix x, int i, int j, int n_rows)
Return the sub-column of x that starts at row i and column j and extends n_rows rows and 1 column.
row_vector sub_row(matrix x, int i, int j, int n_cols)
Return the sub-row of x that starts at row i and column j and extends 1 row and n_cols columns.
Vector and array slicing operations
The head operation extracts the first \(n\) elements of a vector and the tail operation the last. The segment operation extracts an arbitrary subvector.
vector head(vector v, int n)
Return the vector consisting of the first n elements of v.
row_vector head(row_vector rv, int n)
Return the row vector consisting of the first n elements of rv.
array[] T head(array[] T sv, int n)
Return the array consisting of the first n elements of sv; applies to up to three-dimensional arrays containing any type of elements T.
vector tail(vector v, int n)
Return the vector consisting of the last n elements of v.
row_vector tail(row_vector rv, int n)
Return the row vector consisting of the last n elements of rv.
array[] T tail(array[] T sv, int n)
Return the array consisting of the last n elements of sv; applies to up to three-dimensional arrays containing any type of elements T.
vector segment(vector v, int i, int n)
Return the vector consisting of the n elements of v starting at i; i.e., elements i through i + n - 1.
row_vector segment(row_vector rv, int i, int n)
Return the row vector consisting of the n elements of rv starting at i; i.e., elements i through i + n - 1.
array[] T segment(array[] T sv, int i, int n)
Return the array consisting of the n elements of sv starting at i; i.e., elements i through i + n - 1. Applies to up to three-dimensional arrays containing any type of elements T.
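A sketch of the slicing functions (dimensions illustrative):
matrix[4, 3] m;
matrix[2, 2] b = block(m, 2, 1, 2, 2);   // rows 2-3, columns 1-2
vector[3] c = sub_col(m, 2, 1, 3);       // rows 2-4 of column 1
vector[6] v;
vector[2] h = head(v, 2);                // v[1], v[2]
vector[2] t = tail(v, 2);                // v[5], v[6]
vector[3] s = segment(v, 2, 3);          // v[2], v[3], v[4]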
Matrix concatenation
Stan’s matrix concatenation operations append_col and append_row are like the operations cbind and rbind in R.
Horizontal concatenation
matrix append_col(matrix x, matrix y)
Combine matrices x and y by column. The matrices must have the same number of rows.
matrix append_col(matrix x, vector y)
Combine matrix x and vector y by column. The matrix and the vector must have the same number of rows.
matrix append_col(vector x, matrix y)
Combine vector x and matrix y by column. The vector and the matrix must have the same number of rows.
matrix append_col(vector x, vector y)
Combine vectors x and y by column. The vectors must have the same number of rows.
row_vector append_col(row_vector x, row_vector y)
Combine row vectors x and y of any size into another row vector by appending y to the end of x.
row_vector append_col(real x, row_vector y)
Append x to the front of y, returning another row vector.
row_vector append_col(row_vector x, real y)
Append y to the end of x, returning another row vector.
Vertical concatenation
matrix append_row(matrix x, matrix y)
Combine matrices x and y by row. The matrices must have the same number of columns.
matrix append_row(matrix x, row_vector y)
Combine matrix x and row vector y by row. The matrix and the row vector must have the same number of columns.
matrix append_row(row_vector x, matrix y)
Combine row vector x and matrix y by row. The row vector and the matrix must have the same number of columns.
matrix append_row(row_vector x, row_vector y)
Combine row vectors x and y by row. The row vectors must have the same number of columns.
vector append_row(vector x, vector y)
Concatenate vectors x and y of any size into another vector.
vector append_row(real x, vector y)
Append x to the top of y, returning another vector.
vector append_row(vector x, real y)
Append y to the bottom of x, returning another vector.
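For example (values illustrative):
vector[2] a = [1, 2]';
vector[2] b = [3, 4]';
vector[4] ab = append_row(a, b);     // [1, 2, 3, 4]'
matrix[2, 2] m = append_col(a, b);   // columns a and b, side by side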
Special matrix functions
Softmax
The softmax function maps¹ \(y \in \mathbb{R}^K\) to the \(K\)-simplex by \[\begin{equation*} \text{softmax}(y) = \frac{\exp(y)} {\sum_{k=1}^K \exp(y_k)}, \end{equation*}\] where \(\exp(y)\) is the componentwise exponentiation of \(y\). Softmax is usually calculated on the log scale, \[\begin{eqnarray*} \log \text{softmax}(y) & = & y - \log \sum_{k=1}^K \exp(y_k) \\[4pt] & = & y - \mathrm{log\_sum\_exp}(y). \end{eqnarray*}\] where the vector \(y\) minus the scalar \(\mathrm{log\_sum\_exp}(y)\) subtracts the scalar from each component of \(y\).
Stan provides the following functions for softmax and its log.
vector softmax(vector x)
The softmax of x
vector log_softmax(vector x)
The natural logarithm of the softmax of x
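A typical use is a categorical outcome parameterized by log odds; a minimal sketch (the names K, y, and alpha are illustrative):
data {
  int<lower=2> K;             // number of categories
  int<lower=1, upper=K> y;    // a single categorical outcome
}
parameters {
  vector[K] alpha;            // log odds
}
model {
  y ~ categorical(softmax(alpha));
  // equivalent and more numerically stable:
  // y ~ categorical_logit(alpha);
}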
Cumulative sums
The cumulative sum of a sequence \(x_1,\ldots,x_N\) is the sequence \(y_1,\ldots,y_N\), where \[\begin{equation*} y_n = \sum_{m = 1}^{n} x_m. \end{equation*}\]
array[] int cumulative_sum(array[] int x)
The cumulative sum of x
array[] real cumulative_sum(array[] real x)
The cumulative sum of x
vector cumulative_sum(vector v)
The cumulative sum of v
row_vector cumulative_sum(row_vector rv)
The cumulative sum of rv
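For example:
vector[3] v = [1, 2, 3]';
vector[3] cs = cumulative_sum(v);   // [1, 3, 6]'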
Gaussian process covariance functions
The Gaussian process covariance functions compute the covariance between observations in an input data set or the cross-covariance between two input data sets.
For one dimensional GPs, the input data sets are arrays of scalars. The covariance matrix is given by \(K_{ij} = k(x_i, x_j)\) (where \(x_i\) is the \(i^{th}\) element of the array \(x\)) and the cross-covariance is given by \(K_{ij} = k(x_i, y_j)\).
For multi-dimensional GPs, the input data sets are arrays of vectors. The covariance matrix is given by \(K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)\) (where \(\mathbf{x}_i\) is the \(i^{th}\) vector in the array \(x\)) and the cross-covariance is given by \(K_{ij} = k(\mathbf{x}_i, \mathbf{y}_j)\).
Exponentiated quadratic kernel
With magnitude \(\sigma\) and length scale \(l\), the exponentiated quadratic kernel is:
\[ k(\mathbf{x}_i, \mathbf{x}_j) = \sigma^2 \exp \left( -\frac{|\mathbf{x}_i - \mathbf{x}_j|^2}{2l^2} \right) \]
matrix gp_exp_quad_cov(array[] real x, real sigma, real length_scale)
Gaussian process covariance with exponentiated quadratic kernel in one dimension.
Available since 2.20
matrix gp_exp_quad_cov(array[] real x1, array[] real x2, real sigma, real length_scale)
Gaussian process cross-covariance of x1 and x2 with exponentiated quadratic kernel in one dimension.
matrix gp_exp_quad_cov(vectors x, real sigma, real length_scale)
Gaussian process covariance with exponentiated quadratic kernel in multiple dimensions.
Available since 2.20
matrix gp_exp_quad_cov(vectors x, real sigma, array[] real length_scale)
Gaussian process covariance with exponentiated quadratic kernel in multiple dimensions with a length scale for each dimension.
Available since 2.20
matrix gp_exp_quad_cov(vectors x1, vectors x2, real sigma, real length_scale)
Gaussian process cross-covariance of x1 and x2 with exponentiated quadratic kernel in multiple dimensions.
matrix gp_exp_quad_cov(vectors x1, vectors x2, real sigma, array[] real length_scale)
Gaussian process cross-covariance of x1 and x2 with exponentiated quadratic kernel in multiple dimensions with a length scale for each dimension.
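A typical GP model builds the covariance matrix from the data locations, stabilizes the diagonal, and takes a Cholesky factor; a minimal sketch, assuming N, x, sigma, and length_scale (illustrative names) are declared elsewhere:
transformed parameters {
  // small jitter on the diagonal for numerical stability
  matrix[N, N] K = add_diag(gp_exp_quad_cov(x, sigma, length_scale), 1e-9);
  matrix[N, N] L_K = cholesky_decompose(K);
}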
Dot product kernel
With bias \(\sigma_0\), the dot product kernel is:
\[ k(\mathbf{x}_i, \mathbf{x}_j) = \sigma_0^2 + \mathbf{x}_i^T \mathbf{x}_j \]
matrix gp_dot_prod_cov(array[] real x, real sigma)
Gaussian process covariance with dot product kernel in one dimension.
Available since 2.20
matrix gp_dot_prod_cov(array[] real x1, array[] real x2, real sigma)
Gaussian process cross-covariance of x1 and x2 with dot product kernel in one dimension.
matrix gp_dot_prod_cov(vectors x, real sigma)
Gaussian process covariance with dot product kernel in multiple dimensions.
Available since 2.20
matrix gp_dot_prod_cov(vectors x1, vectors x2, real sigma)
Gaussian process cross-covariance of x1 and x2 with dot product kernel in multiple dimensions.
Exponential kernel
With magnitude \(\sigma\) and length scale \(l\), the exponential kernel is:
\[ k(\mathbf{x}_i, \mathbf{x}_j) = \sigma^2 \exp \left( -\frac{|\mathbf{x}_i - \mathbf{x}_j|}{l} \right) \]
matrix gp_exponential_cov(array[] real x, real sigma, real length_scale)
Gaussian process covariance with exponential kernel in one dimension.
Available since 2.20
matrix gp_exponential_cov(array[] real x1, array[] real x2, real sigma, real length_scale)
Gaussian process cross-covariance of x1 and x2 with exponential kernel in one dimension.
matrix gp_exponential_cov(vectors x, real sigma, real length_scale)
Gaussian process covariance with exponential kernel in multiple dimensions.
Available since 2.20
matrix gp_exponential_cov(vectors x, real sigma, array[] real length_scale)
Gaussian process covariance with exponential kernel in multiple dimensions with a length scale for each dimension.
Available since 2.20
matrix gp_exponential_cov(vectors x1, vectors x2, real sigma, real length_scale)
Gaussian process cross-covariance of x1 and x2 with exponential kernel in multiple dimensions.
matrix gp_exponential_cov(vectors x1, vectors x2, real sigma, array[] real length_scale)
Gaussian process cross-covariance of x1 and x2 with exponential kernel in multiple dimensions with a length scale for each dimension.
Matern 3/2 kernel
With magnitude \(\sigma\) and length scale \(l\), the Matern 3/2 kernel is:
\[ k(\mathbf{x}_i, \mathbf{x}_j) = \sigma^2 \left( 1 + \frac{\sqrt{3}|\mathbf{x}_i - \mathbf{x}_j|}{l} \right) \exp \left( -\frac{\sqrt{3}|\mathbf{x}_i - \mathbf{x}_j|}{l} \right) \]
matrix gp_matern32_cov(array[] real x, real sigma, real length_scale)
Gaussian process covariance with Matern 3/2 kernel in one dimension.
Available since 2.20
matrix gp_matern32_cov(array[] real x1, array[] real x2, real sigma, real length_scale)
Gaussian process cross-covariance of x1 and x2 with Matern 3/2 kernel in one dimension.
matrix gp_matern32_cov(vectors x, real sigma, real length_scale)
Gaussian process covariance with Matern 3/2 kernel in multiple dimensions.
Available since 2.20
matrix gp_matern32_cov(vectors x, real sigma, array[] real length_scale)
Gaussian process covariance with Matern 3/2 kernel in multiple dimensions with a length scale for each dimension.
Available since 2.20
matrix gp_matern32_cov(vectors x1, vectors x2, real sigma, real length_scale)
Gaussian process cross-covariance of x1 and x2 with Matern 3/2 kernel in multiple dimensions.
matrix gp_matern32_cov(vectors x1, vectors x2, real sigma, array[] real length_scale)
Gaussian process cross-covariance of x1 and x2 with Matern 3/2 kernel in multiple dimensions with a length scale for each dimension.
Matern 5/2 kernel
With magnitude \(\sigma\) and length scale \(l\), the Matern 5/2 kernel is:
\[ k(\mathbf{x}_i, \mathbf{x}_j) = \sigma^2 \left( 1 + \frac{\sqrt{5}|\mathbf{x}_i - \mathbf{x}_j|}{l} + \frac{5 |\mathbf{x}_i - \mathbf{x}_j|^2}{3l^2} \right) \exp \left( -\frac{\sqrt{5} |\mathbf{x}_i - \mathbf{x}_j|}{l} \right) \]
matrix gp_matern52_cov(array[] real x, real sigma, real length_scale)
Gaussian process covariance with Matern 5/2 kernel in one dimension.
Available since 2.20
matrix gp_matern52_cov(array[] real x1, array[] real x2, real sigma, real length_scale)
Gaussian process cross-covariance of x1 and x2 with Matern 5/2 kernel in one dimension.
matrix gp_matern52_cov(vectors x, real sigma, real length_scale)
Gaussian process covariance with Matern 5/2 kernel in multiple dimensions.
Available since 2.20
matrix gp_matern52_cov(vectors x, real sigma, array[] real length_scale)
Gaussian process covariance with Matern 5/2 kernel in multiple dimensions with a length scale for each dimension.
Available since 2.20
matrix gp_matern52_cov(vectors x1, vectors x2, real sigma, real length_scale)
Gaussian process cross-covariance of x1 and x2 with Matern 5/2 kernel in multiple dimensions.
matrix gp_matern52_cov(vectors x1, vectors x2, real sigma, array[] real length_scale)
Gaussian process cross-covariance of x1 and x2 with Matern 5/2 kernel in multiple dimensions with a length scale for each dimension.
Periodic kernel
With magnitude \(\sigma\), length scale \(l\), and period \(p\), the periodic kernel is:
\[ k(\mathbf{x}_i, \mathbf{x}_j) = \sigma^2 \exp \left(-\frac{2 \sin^2 \left( \pi \frac{|\mathbf{x}_i - \mathbf{x}_j|}{p} \right) }{l^2} \right) \]
matrix gp_periodic_cov(array[] real x, real sigma, real length_scale, real period)
Gaussian process covariance with periodic kernel in one dimension.
Available since 2.20
matrix gp_periodic_cov(array[] real x1, array[] real x2, real sigma, real length_scale, real period)
Gaussian process cross-covariance of x1 and x2 with periodic kernel in one dimension.
matrix gp_periodic_cov(vectors x, real sigma, real length_scale, real period)
Gaussian process covariance with periodic kernel in multiple dimensions.
Available since 2.20
matrix gp_periodic_cov(vectors x1, vectors x2, real sigma, real length_scale, real period)
Gaussian process cross-covariance of x1 and x2 with periodic kernel in multiple dimensions.
Linear algebra functions and solvers
Matrix division operators and functions
In general, it is much more efficient and also more arithmetically stable to use matrix division than to multiply by an inverse. There are specialized forms for lower triangular matrices and for symmetric, positive-definite matrices.
Matrix division operators
row_vector operator/(row_vector b, matrix A)
The right division of b by A; equivalently b * inverse(A)
matrix operator/(matrix B, matrix A)
The right division of B by A; equivalently B * inverse(A)
vector operator\(matrix A, vector b)
The left division of b by A; equivalently inverse(A) * b
matrix operator\(matrix A, matrix B)
The left division of B by A; equivalently inverse(A) * B
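For example, solving linear systems with the division operators rather than an explicit inverse (dimensions illustrative):
matrix[3, 3] A;
vector[3] b;
vector[3] x = A \ b;        // solves A * x = b; preferable to inverse(A) * b
row_vector[3] y = b' / A;   // solves y * A = b'; preferable to b' * inverse(A)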
Lower-triangular matrix division functions
There are four division functions which use lower triangular views of a matrix. The lower triangular view of a matrix \(\text{tri}(A)\) is used in the definitions and defined by \[\begin{equation*} \text{tri}(A)[m,n] = \left\{ \begin{array}{ll} A[m,n] & \text{if } m \geq n, \text{ and} \\[4pt] 0 & \text{otherwise}. \end{array} \right. \end{equation*}\] When a lower triangular view of a matrix is used, the elements above the diagonal are ignored.
vector mdivide_left_tri_low(matrix A, vector b)
The left division of b by a lower-triangular view of A; algebraically equivalent to the less efficient and less stable form inverse(tri(A)) * b, where tri(A) is the lower-triangular portion of A with the above-diagonal entries set to zero.
matrix mdivide_left_tri_low(matrix A, matrix B)
The left division of B by a lower-triangular view of A; algebraically equivalent to the less efficient and less stable form inverse(tri(A)) * B, where tri(A) is the lower-triangular portion of A with the above-diagonal entries set to zero.
row_vector mdivide_right_tri_low(row_vector b, matrix A)
The right division of b by a lower-triangular view of A; algebraically equivalent to the less efficient and less stable form b * inverse(tri(A)), where tri(A) is the lower-triangular portion of A with the above-diagonal entries set to zero.
matrix mdivide_right_tri_low(matrix B, matrix A)
The right division of B by a lower-triangular view of A; algebraically equivalent to the less efficient and less stable form B * inverse(tri(A)), where tri(A) is the lower-triangular portion of A with the above-diagonal entries set to zero.
Symmetric positive-definite matrix division functions
There are four division functions which are specialized for efficiency and stability for symmetric positive-definite matrix dividends. If the matrix dividend argument is not symmetric and positive definite, these will reject and print warnings.
vector mdivide_left_spd(matrix A, vector b)
The left division of b by the symmetric, positive-definite matrix A; algebraically equivalent to the less efficient and less stable form inverse(A) * b.
matrix mdivide_left_spd(matrix A, matrix B)
The left division of B by the symmetric, positive-definite matrix A; algebraically equivalent to the less efficient and less stable form inverse(A) * B.
row_vector mdivide_right_spd(row_vector b, matrix A)
The right division of b by the symmetric, positive-definite matrix A; algebraically equivalent to the less efficient and less stable form b * inverse(A).
matrix mdivide_right_spd(matrix B, matrix A)
The right division of B by the symmetric, positive-definite matrix A; algebraically equivalent to the less efficient and less stable form B * inverse(A).
Matrix exponential
The exponential of the matrix \(A\) is formally defined by the convergent power series: \[\begin{equation*} e^A = \sum_{n=0}^{\infty} \dfrac{A^n}{n!} \end{equation*}\]
matrix matrix_exp(matrix A)
The matrix exponential of A
matrix matrix_exp_multiply(matrix A, matrix B)
The product of the matrix exponential of A and the matrix B; algebraically equivalent to the less efficient form matrix_exp(A) * B.
matrix scale_matrix_exp_multiply(real t, matrix A, matrix B)
The product of the matrix exponential of t * A and the matrix B; algebraically equivalent to the less efficient form matrix_exp(t * A) * B.
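For example, the linear ODE \(\frac{d}{dt} x(t) = A \, x(t)\) has solution \(x(t) = e^{tA} x(0)\), which scale_matrix_exp_multiply computes directly; a minimal sketch (to_matrix and to_vector convert between the vector state and the one-column matrix the function expects):
matrix[2, 2] A;   // system matrix
vector[2] x0;     // initial state
real t;
vector[2] x_t = to_vector(scale_matrix_exp_multiply(t, A, to_matrix(x0)));
// same value as matrix_exp(t * A) * to_matrix(x0), computed more efficiently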
Matrix power
Returns the nth power of the specified matrix: \[\begin{equation*} M^n = \underbrace{M \, M \cdots M}_{n \text{ factors}} \end{equation*}\]
matrix matrix_power(matrix A, int B)
Matrix A raised to the power B.
Linear algebra functions
Trace
real trace(matrix A)
The trace of A, or 0 if A is empty; A is not required to be diagonal
Determinants
real determinant(matrix A)
The determinant of A
real log_determinant(matrix A)
The log of the absolute value of the determinant of A
real log_determinant_spd(matrix A)
The log of the absolute value of the determinant of the symmetric, positive-definite matrix A.
Inverses
It is almost never a good idea to use matrix inverses directly, because they are both inefficient and arithmetically unstable compared to the alternatives. Rather than inverting a matrix m and post-multiplying by a vector or matrix a, as in inverse(m) * a, it is better to code this using matrix division, as in m \ a. The pre-multiplication case is similar, with b * inverse(m) being more efficiently coded as b / m. There are also useful special cases for triangular and symmetric, positive-definite matrices that use more efficient solvers.
Warning: The function inv(m) is the elementwise inverse function, which returns 1 / m[i, j] for each element.
matrix inverse(matrix A)
Compute the inverse of A
matrix inverse_spd(matrix A)
Compute the inverse of A where A is symmetric, positive definite. This version is faster and more arithmetically stable when the input is symmetric and positive definite.
matrix chol2inv(matrix L)
Compute the inverse of the matrix whose Cholesky factorization is L. That is, for \(A = L L^T\), return \(A^{-1}\).
Generalized Inverse
The generalized inverse \(M^+\) of a matrix \(M\) is a matrix that satisfies \(M M^+ M = M\). For an invertible, square matrix \(M\), \(M^+\) is equivalent to \(M^{-1}\). The dimensions of \(M^+\) are equivalent to the dimensions of \(M^T\). The generalized inverse exists for any matrix, so \(M\) may be singular or less than full rank.
Even though the generalized inverse exists for any arbitrary matrix, the derivatives of this function only exist on matrices of locally constant rank (Golub and Pereyra 1973), meaning the derivatives do not exist if small perturbations make the matrix change rank. For example, consider the rank of the matrix \(A\) as a function of \(\epsilon\):
\[ A = \left( \begin{array}{ccc} 1 + \epsilon & 2 & 1 \\ 2 & 4 & 2 \end{array} \right) \]
When \(\epsilon = 0\), \(A\) is rank 1 because the second row is twice the first (and so there is only one linearly independent row). If \(\epsilon \neq 0\), the rows are no longer linearly dependent, and the matrix is rank 2. This matrix does not have locally constant rank at \(\epsilon = 0\), and so the derivatives do not exist at zero. Because HMC depends on the derivatives existing, this lack of differentiability creates undefined behavior.
matrix generalized_inverse(matrix A)
The generalized inverse of A
Eigendecomposition
complex_vector eigenvalues(matrix A)
The complex-valued vector of eigenvalues of the matrix A. The eigenvalues are repeated according to their algebraic multiplicity, so there are as many eigenvalues as rows in the matrix. The eigenvalues are not sorted in any particular order.
complex_matrix eigenvectors(matrix A)
The matrix with the complex-valued (column) eigenvectors of the matrix A in the same order as returned by the function eigenvalues
tuple(complex_matrix, complex_vector) eigendecompose(matrix A)
Return the matrix of (column) eigenvectors and vector of eigenvalues of the matrix A. This function is equivalent to (eigenvectors(A), eigenvalues(A)) but with a lower computational cost due to the shared work between the two results.
vector eigenvalues_sym(matrix A)
The vector of eigenvalues of a symmetric matrix A in ascending order
matrix eigenvectors_sym(matrix A)
The matrix with the (column) eigenvectors of symmetric matrix A in the same order as returned by the function eigenvalues_sym
tuple(matrix, vector) eigendecompose_sym(matrix A)
Return the matrix of (column) eigenvectors and vector of eigenvalues of the symmetric matrix A. This function is equivalent to (eigenvectors_sym(A), eigenvalues_sym(A)) but with a lower computational cost due to the shared work between the two results.
Because multiplying an eigenvector by \(-1\) results in an eigenvector, eigenvectors returned by a decomposition are only identified up to a sign change. In order to compare the eigenvectors produced by Stan’s eigendecomposition to others, signs may need to be normalized in some way, such as by fixing the sign of a component, or doing comparisons allowing a multiplication by \(-1\).
The condition number of a symmetric matrix is defined to be the ratio of the largest eigenvalue to the smallest eigenvalue. Large condition numbers lead to difficulty in numerical algorithms such as computing inverses, and such matrices are thus known as “ill conditioned.” The ratio can even be infinite in the case of singular matrices (i.e., those with eigenvalues of 0).
QR decomposition
matrix qr_thin_Q(matrix A)
The orthogonal matrix in the thin QR decomposition of A, which implies that the resulting matrix has the same dimensions as A
matrix qr_thin_R(matrix A)
The upper triangular matrix in the thin QR decomposition of A, which implies that the resulting matrix is square with the same number of columns as A
tuple(matrix, matrix) qr_thin(matrix A)
Returns both portions of the QR decomposition of A. The first element (“Q”) is the orthonormal matrix in the thin QR decomposition and the second element (“R”) is upper triangular. This function is equivalent to (qr_thin_Q(A), qr_thin_R(A)) but with a lower computational cost due to the shared work between the two results.
matrix qr_Q(matrix A)
The orthogonal matrix in the fat QR decomposition of A, which implies that the resulting matrix is square with the same number of rows as A
matrix qr_R(matrix A)
The upper trapezoidal matrix in the fat QR decomposition of A, which implies that the resulting matrix will be rectangular with the same dimensions as A
tuple(matrix, matrix) qr(matrix A)
Returns both portions of the QR decomposition of A. The first element (“Q”) is the orthonormal matrix in the fat QR decomposition and the second element (“R”) is upper trapezoidal. This function is equivalent to (qr_Q(A), qr_R(A)) but with a lower computational cost due to the shared work between the two results.
The thin QR decomposition is always preferable because it will consume much less memory when the input matrix is large than will the fat QR decomposition. Both versions of the decomposition represent the input matrix as \[\begin{equation*} A = Q \, R. \end{equation*}\] Multiplying a column of an orthogonal matrix by \(-1\) still results in an orthogonal matrix, and you can multiply the corresponding row of the upper trapezoidal matrix by \(-1\) without changing the product. Thus, Stan adopts the normalization that the diagonal elements of the upper trapezoidal matrix are strictly positive and the columns of the orthogonal matrix are reflected if necessary. Also, these QR decomposition algorithms do not utilize pivoting and thus may be numerically unstable on input matrices that have less than full rank.
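A common application is the QR reparameterization of a linear regression, which decorrelates the design matrix; a sketch, with the conventional \(\sqrt{N-1}\) rescaling so that Q and R are on comparable scales (all names illustrative):
data {
  int<lower=0> N;
  int<lower=0> K;
  matrix[N, K] X;
  vector[N] y;
}
transformed data {
  matrix[N, K] Q = qr_thin_Q(X) * sqrt(N - 1);
  matrix[K, K] R = qr_thin_R(X) / sqrt(N - 1);
}
parameters {
  vector[K] theta;          // coefficients on the Q scale
  real<lower=0> sigma;
}
model {
  y ~ normal(Q * theta, sigma);
}
generated quantities {
  vector[K] beta = R \ theta;   // coefficients on the original scale
}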
Cholesky decomposition
Every symmetric, positive-definite matrix (such as a correlation or covariance matrix) has a Cholesky decomposition. If \(\Sigma\) is a symmetric, positive-definite matrix, its Cholesky decomposition is the lower-triangular matrix \(L\) such that \[\begin{equation*} \Sigma = L \, L^{\top}. \end{equation*}\]
matrix cholesky_decompose(matrix A)
The lower-triangular Cholesky factor of the symmetric positive-definite matrix A
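The Cholesky factor is commonly paired with multi_normal_cholesky, which avoids refactoring the covariance matrix at every evaluation; a minimal sketch, assuming Sigma (a K x K symmetric, positive-definite matrix), mu, and y are declared elsewhere:
matrix[K, K] L = cholesky_decompose(Sigma);
y ~ multi_normal_cholesky(mu, L);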
Singular value decomposition
The matrix A can be decomposed into a diagonal matrix of singular values, D, and matrices of its left and right singular vectors, U and V, \[\begin{equation*} A = U D V^T. \end{equation*}\] The matrices of singular vectors here are thin. That is, for an \(N\) by \(P\) input A, \(M = \min(N, P)\), U is size \(N\) by \(M\) and V is size \(P\) by \(M\).
vector singular_values(matrix A)
The singular values of A in descending order
matrix svd_U(matrix A)
The left-singular vectors of A
matrix svd_V(matrix A)
The right-singular vectors of A
tuple(matrix, vector, matrix) svd(matrix A)
Returns a tuple containing the left-singular vectors of A, the singular values of A in descending order, and the right-singular vectors of A. This function is equivalent to (svd_U(A), singular_values(A), svd_V(A)) but with a lower computational cost due to the shared work between the different components.
Sort functions
See the sorting functions section for examples of how the functions work.
vector sort_asc(vector v)
Sort the elements of v in ascending order
row_vector sort_asc(row_vector v)
Sort the elements of v in ascending order
vector sort_desc(vector v)
Sort the elements of v in descending order
row_vector sort_desc(row_vector v)
Sort the elements of v in descending order
array[] int sort_indices_asc(vector v)
Return an array of indices between 1 and the size of v, sorted to index v in ascending order.
array[] int sort_indices_asc(row_vector v)
Return an array of indices between 1 and the size of v, sorted to index v in ascending order.
array[] int sort_indices_desc(vector v)
Return an array of indices between 1 and the size of v, sorted to index v in descending order.
array[] int sort_indices_desc(row_vector v)
Return an array of indices between 1 and the size of v, sorted to index v in descending order.
int rank(vector v, int s)
Number of components of v less than v[s]
int rank(row_vector v, int s)
Number of components of v less than v[s]
Reverse functions
vector reverse(vector v)
Return a new vector containing the elements of the argument in reverse order.
row_vector reverse(row_vector v)
Return a new row vector containing the elements of the argument in reverse order.
References
Golub, G. H., and V. Pereyra. 1973. “The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate.” SIAM Journal on Numerical Analysis 10 (2): 413–32.
Footnotes
1. The softmax function is so called because in the limit as \(y_n \rightarrow \infty\) with \(y_m\) for \(m \neq n\) held constant, the result tends toward the “one-hot” vector \(\theta\) with \(\theta_n = 1\) and \(\theta_m = 0\) for \(m \neq n\), thus providing a “soft” version of the maximum function.