10.10 Covariance matrices

This is an old version, view current version.

A $K \times K$ matrix is a covariance matrix if it is symmetric and positive definite (see the previous section for definitions). It requires $K + \binom{K}{2}$ free parameters to specify a $K \times K$ covariance matrix.

Covariance matrix transform

Stan’s covariance transform is based on a Cholesky decomposition composed with a log transform of the positive-constrained diagonal elements.¹⁶

If $x$ is a covariance matrix (i.e., a symmetric, positive definite matrix), then there is a unique lower-triangular matrix $z = \mathrm{chol}(x)$ with positive diagonal entries, called a Cholesky factor, such that

$x = z \, z^{\top}.$

The off-diagonal entries of the Cholesky factor $z$ are unconstrained, but the diagonal entries $z_{k,k}$ must be positive for $1 \leq k \leq K$ .

To complete the transform, the diagonal is log-transformed to produce a fully unconstrained lower-triangular matrix $y$ defined by

$y_{m,n} = \left\{ \begin{array}{cl} 0 & \mbox{if } m < n, \\ \log z_{m,m} & \mbox{if } m = n, \mbox{ and} \\ z_{m,n} & \mbox{if } m > n. \end{array} \right.$

Covariance matrix inverse transform

The inverse transform reverses the two steps of the transform. Given an unconstrained lower-triangular $K \times K$ matrix $y$ , the first step is to recover the intermediate matrix $z$ by reversing the log transform,

$z_{m,n} = \left\{ \begin{array}{cl} 0 & \mbox{if } m < n, \\ \exp(y_{m,m}) & \mbox{if } m = n, \mbox{ and} \\ y_{m,n} & \mbox{if } m > n. \end{array} \right.$

The covariance matrix $x$ is recovered from its Cholesky factor $z$ by taking

$x = z \, z^{\top}.$

Absolute Jacobian determinant of the covariance matrix inverse transform

The Jacobian is the product of the Jacobians of the exponential transform from the unconstrained lower-triangular matrix $y$ to matrix $z$ with positive diagonals and the product transform from the Cholesky factor $z$ to $x$ .

The transform from unconstrained $y$ to Cholesky factor $z$ has a diagonal Jacobian matrix, the absolute determinant of which is thus

$\prod_{k=1}^K \frac{\partial}{\partial_{y_{k,k}}} \, \exp(y_{k,k}) \ = \ \prod_{k=1}^K \exp(y_{k,k}) \ = \ \prod_{k=1}^K z_{k,k}.$

The Jacobian matrix of the second transform from the Cholesky factor $z$ to the covariance matrix $x$ is also triangular, with diagonal entries corresponding to pairs $(m,n)$ with $m \geq n$ , defined by

$\frac{\partial}{\partial z_{m,n}} \left( z \, z^{\top} \right)_{m,n} \ = \ \frac{\partial}{\partial z_{m,n}} \left( \sum_{k=1}^K z_{m,k} \, z_{n,k} \right) \ = \ \left\{ \begin{array}{cl} 2 \, z_{n,n} & \mbox{if } m = n \mbox{ and } \\ z_{n,n} & \mbox{if } m > n. \end{array} \right.$

The absolute Jacobian determinant of the second transform is thus

$2^{K} \ \prod_{m = 1}^{K} \ \prod_{n=1}^{m} z_{n,n} \ = \ \prod_{n=1}^K \ \prod_{m=n}^K z_{n,n} \ = \ 2^{K} \ \prod_{k=1}^K z_{k,k}^{K - k + 1}.$

Finally, the full absolute Jacobian determinant of the inverse of the covariance matrix transform from the unconstrained lower-triangular $y$ to a symmetric, positive definite matrix $x$ is the product of the Jacobian determinants of the exponentiation and product transforms,

$\left( \prod_{k=1}^K z_{k,k} \right) \left( 2^{K} \ \prod_{k=1}^K z_{k,k}^{K - k + 1} \right) \ = \ 2^K \, \prod_{k=1}^K z_{k,k}^{K-k+2}.$

Let $f^{-1}$ be the inverse transform from a $K + \binom{K}{2}$ -vector $y$ to the $K \times K$ covariance matrix $x$ . A density function $p_X(x)$ defined on $K \times K$ covariance matrices is transformed to the density $p_Y(y)$ over $K + \binom{K}{2}$ vectors $y$ by

$p_Y(y) = p_X(f^{-1}(y)) \ 2^K \ \prod_{k=1}^K z_{k,k}^{K-k+2}.$

An alternative to the transform in this section, which can be coded directly in Stan, is to parameterize a covariance matrix as a scaled correlation matrix. An arbitrary $K \times K$ covariance matrix $\Sigma$ can be expressed in terms of a $K$ -vector $\sigma$ and correlation matrix $\Omega$ as $\Sigma = \mbox{diag}(\sigma) \times \Omega \times \mbox{diag}(\sigma),$ so that each entry is just a deviation-scaled correlation, $\Sigma_{m,n} = \sigma_m \times \sigma_n \times \Omega_{m,n}.$ ↩