10.10 Covariance Matrices
A \(K \times K\) matrix is a covariance matrix if it is symmetric and positive definite (see the previous section for definitions). It requires \(K + \binom{K}{2}\) free parameters to specify a \(K \times K\) covariance matrix.
Covariance Matrix Transform
Stan’s covariance transform is based on a Cholesky decomposition composed with a log transform of the positive-constrained diagonal elements.16
If \(x\) is a covariance matrix (i.e., a symmetric, positive definite matrix), then there is a unique lower-triangular matrix \(z = \mathrm{chol}(x)\) with positive diagonal entries, called a Cholesky factor, such that
\[ x = z \, z^{\top}. \]
The off-diagonal entries of the Cholesky factor \(z\) are unconstrained, but the diagonal entries \(z_{k,k}\) must be positive for \(1 \leq k \leq K\).
To complete the transform, the diagonal is log-transformed to produce a fully unconstrained lower-triangular matrix \(y\) defined by
\[ y_{m,n} = \left\{ \begin{array}{cl} 0 & \mbox{if } m < n, \\ \log z_{m,m} & \mbox{if } m = n, \mbox{ and} \\ z_{m,n} & \mbox{if } m > n. \end{array} \right. \]
Covariance Matrix Inverse Transform
The inverse transform reverses the two steps of the transform. Given an unconstrained lower-triangular \(K \times K\) matrix \(y\), the first step is to recover the intermediate matrix \(z\) by reversing the log transform,
\[ z_{m,n} = \left\{ \begin{array}{cl} 0 & \mbox{if } m < n, \\ \exp(y_{m,m}) & \mbox{if } m = n, \mbox{ and} \\ y_{m,n} & \mbox{if } m > n. \end{array} \right. \]
The covariance matrix \(x\) is recovered from its Cholesky factor \(z\) by taking
\[ x = z \, z^{\top}. \]
Absolute Jacobian Determinant of the Covariance Matrix Inverse Transform
The Jacobian is the product of the Jacobians of the exponential transform from the unconstrained lower-triangular matrix \(y\) to matrix \(z\) with positive diagonals and the product transform from the Cholesky factor \(z\) to \(x\).
The transform from unconstrained \(y\) to Cholesky factor \(z\) has a diagonal Jacobian matrix, the absolute determinant of which is thus
\[ \prod_{k=1}^K \frac{\partial}{\partial_{y_{k,k}}} \, \exp(y_{k,k}) \ = \ \prod_{k=1}^K \exp(y_{k,k}) \ = \ \prod_{k=1}^K z_{k,k}. \]
The Jacobian matrix of the second transform from the Cholesky factor \(z\) to the covariance matrix \(x\) is also triangular, with diagonal entries corresponding to pairs \((m,n)\) with \(m \geq n\), defined by
\[ \frac{\partial}{\partial z_{m,n}} \left( z \, z^{\top} \right)_{m,n} \ = \ \frac{\partial}{\partial z_{m,n}} \left( \sum_{k=1}^K z_{m,k} \, z_{n,k} \right) \ = \ \left\{ \begin{array}{cl} 2 \, z_{n,n} & \mbox{if } m = n \mbox{ and } \\ z_{n,n} & \mbox{if } m > n. \end{array} \right. \]
The absolute Jacobian determinant of the second transform is thus
\[ 2^{K} \ \prod_{m = 1}^{K} \ \prod_{n=1}^{m} z_{n,n} \ = \ \prod_{n=1}^K \ \prod_{m=n}^K z_{n,n} \ = \ 2^{K} \ \prod_{k=1}^K z_{k,k}^{K - k + 1}. \]
Finally, the full absolute Jacobian determinant of the inverse of the covariance matrix transform from the unconstrained lower-triangular \(y\) to a symmetric, positive definite matrix \(x\) is the product of the Jacobian determinants of the exponentiation and product transforms,
\[ \left( \prod_{k=1}^K z_{k,k} \right) \left( 2^{K} \ \prod_{k=1}^K z_{k,k}^{K - k + 1} \right) \ = \ 2^K \, \prod_{k=1}^K z_{k,k}^{K-k+2}. \]
Let \(f^{-1}\) be the inverse transform from a \(K + \binom{K}{2}\)-vector \(y\) to the \(K \times K\) covariance matrix \(x\). A density function \(p_X(x)\) defined on \(K \times K\) covariance matrices is transformed to the density \(p_Y(y)\) over \(K + \binom{K}{2}\) vectors \(y\) by
\[ p_Y(y) = p_X(f^{-1}(y)) \ 2^K \ \prod_{k=1}^K z_{k,k}^{K-k+2}. \]
An alternative to the transform in this section, which can be coded directly in Stan, is to parameterize a covariance matrix as a scaled correlation matrix. An arbitrary \(K \times K\) covariance matrix \(\Sigma\) can be expressed in terms of a \(K\)-vector \(\sigma\) and correlation matrix \(\Omega\) as \[\Sigma = \mbox{diag}(\sigma) \times \Omega \times \mbox{diag}(\sigma),\] so that each entry is just a deviation-scaled correlation, \[\Sigma_{m,n} = \sigma_m \times \sigma_n \times \Omega_{m,n}.\]↩