## 10.10 Covariance matrices

A $$K \times K$$ matrix is a covariance matrix if it is symmetric and positive definite (see the previous section for definitions). Specifying a $$K \times K$$ covariance matrix requires $$K + \binom{K}{2}$$ free parameters: $$K$$ for the diagonal and $$\binom{K}{2}$$ for the strictly lower triangle, which determines the upper triangle by symmetry.
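This parameter count equals $$K(K+1)/2$$; a minimal sketch in Python (the function name `cov_matrix_free_params` is illustrative, not part of Stan):

```python
from math import comb

def cov_matrix_free_params(K):
    """Number of unconstrained parameters for a K x K covariance matrix:
    K diagonal entries plus one per strictly lower-triangular entry."""
    return K + comb(K, 2)  # equals K * (K + 1) // 2

print(cov_matrix_free_params(2))  # 3
print(cov_matrix_free_params(3))  # 6
```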

### Covariance matrix transform

Stan’s covariance transform is based on a Cholesky decomposition composed with a log transform of the positive-constrained diagonal elements.[^1]

If $$x$$ is a covariance matrix (i.e., a symmetric, positive definite matrix), then there is a unique lower-triangular matrix $$z = \mathrm{chol}(x)$$ with positive diagonal entries, called a Cholesky factor, such that

$x = z \, z^{\top}.$

The off-diagonal entries of the Cholesky factor $$z$$ are unconstrained, but the diagonal entries $$z_{k,k}$$ must be positive for $$1 \leq k \leq K$$.

To complete the transform, the diagonal is log-transformed to produce a fully unconstrained lower-triangular matrix $$y$$ defined by

$y_{m,n} = \left\{ \begin{array}{cl} 0 & \mbox{if } m < n, \\ \log z_{m,m} & \mbox{if } m = n, \mbox{ and} \\ z_{m,n} & \mbox{if } m > n. \end{array} \right.$
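The two steps of the transform can be sketched in Python with numpy; the function name `cov_matrix_unconstrain` and the example matrix are illustrative, not part of Stan's implementation:

```python
import numpy as np

def cov_matrix_unconstrain(x):
    """Map a symmetric positive-definite matrix x to an unconstrained
    lower-triangular matrix y: Cholesky-factor x, then log the diagonal."""
    z = np.linalg.cholesky(x)  # lower triangular with positive diagonal
    y = np.tril(z, k=-1)       # strictly lower entries carry over unchanged
    y[np.diag_indices_from(y)] = np.log(np.diag(z))  # log-transform diagonal
    return y

x = np.array([[4.0, 2.0],
              [2.0, 5.0]])
y = cov_matrix_unconstrain(x)
# Here z = [[2, 0], [1, 2]], so y = [[log 2, 0], [1, log 2]]
```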

### Covariance matrix inverse transform

The inverse transform reverses the two steps of the transform. Given an unconstrained lower-triangular $$K \times K$$ matrix $$y$$, the first step is to recover the intermediate matrix $$z$$ by reversing the log transform,

$z_{m,n} = \left\{ \begin{array}{cl} 0 & \mbox{if } m < n, \\ \exp(y_{m,m}) & \mbox{if } m = n, \mbox{ and} \\ y_{m,n} & \mbox{if } m > n. \end{array} \right.$

The covariance matrix $$x$$ is recovered from its Cholesky factor $$z$$ by taking

$x = z \, z^{\top}.$
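The inverse transform admits an equally short numpy sketch; the function name `cov_matrix_constrain` and the starting value of $$y$$ (whose diagonal holds $$\log 2$$) are illustrative:

```python
import numpy as np

def cov_matrix_constrain(y):
    """Map an unconstrained lower-triangular y back to a covariance matrix:
    exponentiate the diagonal to recover z, then form z @ z.T."""
    z = np.tril(y, k=-1)
    z[np.diag_indices_from(z)] = np.exp(np.diag(y))
    return z @ z.T

y = np.array([[np.log(2.0), 0.0],
              [1.0,         np.log(2.0)]])
x = cov_matrix_constrain(y)
# Recovers the symmetric positive-definite matrix [[4, 2], [2, 5]]
```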

### Absolute Jacobian determinant of the covariance matrix inverse transform

The absolute Jacobian determinant is the product of two factors: that of the exponential transform taking the unconstrained lower-triangular matrix $$y$$ to the Cholesky factor $$z$$ with positive diagonal, and that of the product transform taking the Cholesky factor $$z$$ to the covariance matrix $$x = z \, z^{\top}$$.

The transform from unconstrained $$y$$ to Cholesky factor $$z$$ has a diagonal Jacobian matrix, the absolute determinant of which is thus

$\prod_{k=1}^K \frac{\partial}{\partial y_{k,k}} \, \exp(y_{k,k}) \ = \ \prod_{k=1}^K \exp(y_{k,k}) \ = \ \prod_{k=1}^K z_{k,k}.$

The Jacobian matrix of the second transform from the Cholesky factor $$z$$ to the covariance matrix $$x$$ is also triangular under an appropriate ordering of the entries, with diagonal entries corresponding to pairs $$(m,n)$$ with $$m \geq n$$, defined by

$\frac{\partial}{\partial z_{m,n}} \left( z \, z^{\top} \right)_{m,n} \ = \ \frac{\partial}{\partial z_{m,n}} \left( \sum_{k=1}^K z_{m,k} \, z_{n,k} \right) \ = \ \left\{ \begin{array}{cl} 2 \, z_{n,n} & \mbox{if } m = n, \mbox{ and} \\ z_{n,n} & \mbox{if } m > n. \end{array} \right.$

The absolute Jacobian determinant of the second transform is thus

$2^{K} \ \prod_{m = 1}^{K} \ \prod_{n=1}^{m} z_{n,n} \ = \ 2^{K} \ \prod_{n=1}^K \ \prod_{m=n}^K z_{n,n} \ = \ 2^{K} \ \prod_{k=1}^K z_{k,k}^{K - k + 1}.$

Finally, the full absolute Jacobian determinant of the inverse of the covariance matrix transform from the unconstrained lower-triangular $$y$$ to a symmetric, positive definite matrix $$x$$ is the product of the Jacobian determinants of the exponentiation and product transforms,

$\left( \prod_{k=1}^K z_{k,k} \right) \left( 2^{K} \ \prod_{k=1}^K z_{k,k}^{K - k + 1} \right) \ = \ 2^K \, \prod_{k=1}^K z_{k,k}^{K-k+2}.$
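As a sanity check, this determinant formula can be compared against a finite-difference Jacobian of the inverse transform in numpy; the helper `constrain`, the vectorization order, and the test values are all illustrative assumptions:

```python
import numpy as np

def constrain(y_vec, K):
    """Unpack a length-K(K+1)/2 vector into lower-triangular y, apply the
    inverse transform, and return the lower-triangular entries of x along
    with the diagonal of the Cholesky factor z."""
    y = np.zeros((K, K))
    y[np.tril_indices(K)] = y_vec
    z = np.tril(y, k=-1)
    z[np.diag_indices(K)] = np.exp(np.diag(y))
    x = z @ z.T
    return x[np.tril_indices(K)], np.diag(z)

K = 2
y_vec = np.array([0.3, -0.5, 0.1])  # arbitrary unconstrained values

# Analytic absolute Jacobian determinant: 2^K * prod_k z_kk^(K - k + 2)
_, z_diag = constrain(y_vec, K)
analytic = 2.0**K * np.prod(z_diag ** (K - np.arange(1, K + 1) + 2))

# Central finite-difference Jacobian of y_vec -> lower-tri entries of x
eps = 1e-6
D = K * (K + 1) // 2
J = np.zeros((D, D))
for j in range(D):
    e = np.zeros(D)
    e[j] = eps
    hi, _ = constrain(y_vec + e, K)
    lo, _ = constrain(y_vec - e, K)
    J[:, j] = (hi - lo) / (2.0 * eps)
numeric = abs(np.linalg.det(J))
# numeric matches analytic up to finite-difference error
```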

Let $$f^{-1}$$ be the inverse transform from a $$K + \binom{K}{2}$$-vector $$y$$ to the $$K \times K$$ covariance matrix $$x$$. A density function $$p_X(x)$$ defined on $$K \times K$$ covariance matrices is transformed to the density $$p_Y(y)$$ over $$K + \binom{K}{2}$$ vectors $$y$$ by

$p_Y(y) = p_X(f^{-1}(y)) \ 2^K \ \prod_{k=1}^K z_{k,k}^{K-k+2}.$

[^1]: An alternative to the transform in this section, which can be coded directly in Stan, is to parameterize a covariance matrix as a scaled correlation matrix. An arbitrary $$K \times K$$ covariance matrix $$\Sigma$$ can be expressed in terms of a $$K$$-vector $$\sigma$$ and a correlation matrix $$\Omega$$ as $\Sigma = \mbox{diag}(\sigma) \times \Omega \times \mbox{diag}(\sigma),$ so that each entry is just a deviation-scaled correlation, $\Sigma_{m,n} = \sigma_m \times \sigma_n \times \Omega_{m,n}.$
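The scaled-correlation decomposition in the footnote can be sketched numerically; the particular values of $$\sigma$$ and $$\Omega$$ below are illustrative:

```python
import numpy as np

# Illustrative scales sigma and correlation matrix Omega
sigma = np.array([2.0, 0.5])
Omega = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

# Sigma = diag(sigma) @ Omega @ diag(sigma)
Sigma = np.diag(sigma) @ Omega @ np.diag(sigma)

# Entry-wise: Sigma[m, n] = sigma[m] * sigma[n] * Omega[m, n]
```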