Stan Reference Manual

This is an old version, view current version.

10.11 Cholesky Factors of Correlation Matrices

A \(K \times K\) correlation matrix \(\Omega\) is positive definite and has a unit diagonal. Because it is positive definite, it can be Cholesky factored to a \(K \times K\) lower-triangular matrix \(L\) with positive diagonal elements such that \(\Omega = L\,L^{\top}\). Because the correlation matrix has a unit diagonal,

\[ \Omega_{k,k} = L_k\,L_k^{\top} = 1, \]

each row vector \(L_k\) of the Cholesky factor is of unit length. The length and positivity constraint allow the diagonal elements of \(L\) to be calculated from the off-diagonal elements, so that a Cholesky factor for a \(K \times K\) correlation matrix requires only \(\binom{K}{2}\) unconstrained parameters.

Cholesky Factor of Correlation Matrix Inverse Transform

It is easiest to start with the inverse transform from the \(\binom{K}{2}\) unconstrained parameters \(y\) to the \(K \times K\) lower-triangular Cholesky factor \(x\). The inverse transform is based on the hyperbolic tangent function, \(\tanh\), which satisfies \(\tanh(x) \in (-1,1)\). Here it will function like an inverse logit with a sign to pick out the direction of an underlying canonical partial correlation; see the section on correlation matrix transforms for more information on the relation between canonical partial correlations and the Cholesky factors of correlation matrices.

Suppose \(y\) is a vector of \(\binom{K}{2}\) unconstrained values. Let \(z\) be a lower-triangular matrix with zero diagonal and below diagonal entries filled by row. For example, in the \(3 \times 3\) case,

\[ z = \left[ \begin{array}{ccc} 0 & 0 & 0 \\ \tanh y_1 & 0 & 0 \\ \tanh y_2 & \tanh y_3 & 0 \end{array} \right] \]

The matrix \(z\), with entries in the range \((-1, 1)\), is then transformed to the Cholesky factor \(x\), by taking¹⁷

\[ x_{i,j} = \left\{ \begin{array}{lll} 0 & \mbox{ if } i < j & \mbox{ [above diagonal]} \\ \sqrt{1 - \sum_{j' < j} x_{i,j'}^2} & \mbox{ if } i = j & \mbox{ [on diagonal]} \\ z_{i,j} \ \sqrt{1 - \sum_{j' < j} x_{i,j'}^2} & \mbox{ if } i > j & \mbox{ [below diagonal]} \end{array} \right. \]

In the \(3 \times 3\) case, this yields

\[ x = \left[ \begin{array}{ccc} 1 & 0 & 0 \\ z_{2,1} & \sqrt{1 - x_{2,1}^2} & 0 \\ z_{3,1} & z_{3,2} \sqrt{1 - x_{3,1}^2} & \sqrt{1 - (x_{3,1}^2 + x_{3,2}^2)} \end{array} \right], \]

where the \(z_{i,j} \in (-1,1)\) are the \(\tanh\)-transformed \(y\).

The approach is a signed stick-breaking process on the quadratic (Euclidean length) scale. Starting from length 1 at \(j=1\), each below-diagonal entry \(x_{i,j}\) is determined by the (signed) fraction \(z_{i,j}\) of the remaining length for the row that it consumes. The diagonal entries \(x_{i,i}\) get any leftover length from earlier entries in their row. The above-diagonal entries are zero.

Cholesky Factor of Correlation Matrix Transform

Suppose \(x\) is a \(K \times K\) Cholesky factor for some correlation matrix. The first step of the transform reconstructs the intermediate values \(z\) from \(x\),

\[ z_{i,j} = \frac{x_{i,j}}{\sqrt{1 - \sum_{j' < j}x_{i,j'}^2}}. \]

The mapping from the resulting \(z\) to \(y\) inverts \(\tanh\),

\[ y \ = \ \tanh^{-1} z \ = \ \frac{1}{2} \left( \log (1 + z) - \log (1 - z) \right). \]

Absolute Jacobian Determinant of Inverse Transform

The Jacobian of the full transform is the product of the Jacobians of its component transforms.

First, for the inverse transform \(z = \tanh y\), the derivative is

\[ \frac{d}{dy} \tanh y = \frac{1}{(\cosh y)^2}. \]

Second, for the inverse transform of \(z\) to \(x\), the resulting Jacobian matrix \(J\) is of dimension \(\binom{K}{2} \times \binom{K}{2}\), with indexes \((i,j)\) for \((i > j)\). The Jacobian matrix is lower triangular, so that its determinant is the product of its diagonal entries, of which there is one for each \((i,j)\) pair,

\[ \left| \, \mbox{det} \, J \, \right| \ = \ \prod_{i > j} \left| \frac{d}{dz_{i,j}} x_{i,j} \right|, \]

where

\[ \frac{d}{dz_{i,j}} x_{i,j} = \sqrt{1 - \sum_{j' < j} x^2_{i,j'}}. \]

So the combined density for unconstrained \(y\) is

\[ p_Y(y) = p_X(f^{-1}(y)) \ \ \prod_{n < \binom{K}{2}} \frac{1}{(\cosh y)^2} \ \ \prod_{i > j} \left( 1 - \sum_{j' < j} x_{i,j'}^2 \right)^{1/2}, \]

where \(x = f^{-1}(y)\) is used for notational convenience. The log Jacobian determinant of the complete inverse transform \(x = f^{-1}(y)\) is given by

\[ \log \left| \, \det J \, \right| = -2 \sum_{n \leq \binom{K}{2}} \log \cosh y \ + \ \frac{1}{2} \ \sum_{i > j} \log \left( 1 - \sum_{j' < j} x_{i,j'}^2 \right) . \]

For convenience, a summation with no terms, such as \(\sum_{j' < 1} x_{i,j'}\), is defined to be 0. This implies \(x_{1,1} = 1\) and that \(x_{i,1} = z_{i,1}\) for \(i > 1\).↩