Stan User’s Guide


1.7 Parameterizing Centered Vectors

It is often convenient to define a parameter vector \(\beta\) that is centered in the sense of satisfying the sum-to-zero constraint,

\[ \sum_{k=1}^K \beta_k = 0. \]

Such a parameter vector may be used to identify a multi-logit regression parameter vector (see the multi-logit section for details), or may be used for ability or difficulty parameters (but not both) in an IRT model (see the item-response model section for details).

\(K-1\) Degrees of Freedom

There is more than one way to enforce a sum-to-zero constraint on a parameter vector, the most efficient of which is to define the \(K\)-th element as the negation of the sum of the elements \(1\) through \(K-1\).

parameters {
  vector[K - 1] beta_raw;
  ...
}
transformed parameters {
  // the last element is the negated sum of the first K - 1 elements
  vector[K] beta = append_row(beta_raw, -sum(beta_raw));
  ...
}

Placing a prior on beta_raw in this parameterization leads to a subtly different posterior than that resulting from the same prior on beta in the original parameterization without the sum-to-zero constraint. Most notably, a simple prior on each component of beta_raw produces different results than putting the same prior on each component of an unconstrained \(K\)-vector beta. For example, providing a \(\mathsf{normal}(0,5)\) prior on beta will produce a different posterior mode than placing the same prior on beta_raw.
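To see where the difference comes from, consider (as an illustrative assumption) independent \(\mathsf{normal}(0,\sigma)\) priors on the \(K-1\) components of beta_raw, with \(\sigma\) denoting the scale. The last component of beta is then a negated sum of \(K-1\) independent normal variates, so its implied prior is wider than that of the other components:

\[ \beta_K = -\sum_{k=1}^{K-1} \beta^{\mathrm{raw}}_k \sim \mathsf{normal}\left(0, \sigma \sqrt{K-1}\right). \]

With \(\sigma = 5\) and \(K = 10\), for instance, the first nine components have prior scale 5 while \(\beta_{10}\) has prior scale \(5\sqrt{9} = 15\).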

Marginal distribution of sum-to-zero components

On the Stan forums, Aaron Goodman provided the following code to produce a prior with standard normal marginals on the components of beta,

model {
  beta ~ normal(0, inv(sqrt(1 - inv(K))));
  ...
}

The components are not independent, as they must sum to zero. No Jacobian adjustment is required because summation and negation are linear operations (and thus have constant Jacobians).
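The choice of scale can be checked with a short calculation, sketched here with \(s\) as shorthand (not used in the code) for the scale \(1/\sqrt{1 - 1/K}\). Because beta is a linear function of beta_raw, the prior is an isotropic \(\mathsf{normal}(0, s)\) density restricted to the hyperplane of vectors that sum to zero, and its covariance is \(s^2\) times the projection onto that hyperplane:

\[ \operatorname{Cov}[\beta] = s^2 \left( \mathrm{I} - \tfrac{1}{K} \mathbf{1} \mathbf{1}^{\top} \right), \qquad \operatorname{Var}[\beta_k] = s^2 \left( 1 - \tfrac{1}{K} \right) = 1, \qquad \operatorname{Cov}[\beta_j, \beta_k] = -\frac{s^2}{K} = -\frac{1}{K-1} \quad (j \neq k). \]

Each row of this covariance matrix sums to zero, which is the sum-to-zero constraint expressed at the level of second moments.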

To generate distributions with marginals other than standard normal, the resulting beta may be scaled by some factor sigma and translated to some new location mu.
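A minimal sketch of the scaling and translation might look as follows. The names mu, sigma, and gamma and the priors placed on mu and sigma are illustrative assumptions, not part of the original code. Note that the translated vector gamma sums to K * mu rather than to zero.

```stan
data {
  int<lower=2> K;
}
parameters {
  vector[K - 1] beta_raw;
  real mu;              // hypothetical location
  real<lower=0> sigma;  // hypothetical scale
}
transformed parameters {
  // sum-to-zero vector with standard normal marginals
  vector[K] beta = append_row(beta_raw, -sum(beta_raw));
  // scaled and translated: marginally normal(mu, sigma) given mu and sigma
  vector[K] gamma = mu + sigma * beta;
}
model {
  beta ~ normal(0, inv(sqrt(1 - inv(K))));
  mu ~ normal(0, 5);     // illustrative prior, an assumption
  sigma ~ normal(0, 2);  // illustrative half-normal prior, an assumption
}
```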

QR Decomposition

Aaron Goodman, on the Stan forums, also provided this approach, which calculates a QR decomposition in the transformed data block, then uses it to transform beta_raw to a sum-to-zero parameter beta,

transformed data {
  // identity matrix with the last row replaced by (-1, ..., -1, 0)
  matrix[K, K] A = diag_matrix(rep_vector(1, K));
  matrix[K, K - 1] A_qr;
  for (i in 1:(K - 1)) A[K, i] = -1;
  A[K, K] = 0;
  // keep the first K - 1 columns of the Q factor
  A_qr = qr_Q(A)[ , 1:(K - 1)];
}
parameters {
  vector[K - 1] beta_raw;
}
transformed parameters {
  vector[K] beta = A_qr * beta_raw;
}
model {
  beta_raw ~ normal(0, inv(sqrt(1 - inv(K))));
}

This produces a marginal standard normal distribution on the values of beta, which will sum to zero by construction of the QR decomposition.
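A sketch of why this works, writing \(Q\) for A_qr and \(s = 1/\sqrt{1 - 1/K}\) for the prior scale (both shorthand introduced here): the first \(K-1\) columns of A are \(e_i - e_K\), each of which sums to zero, and the first \(K-1\) columns of the Q factor form an orthonormal basis for their span. Hence \(Q^{\top} Q = \mathrm{I}_{K-1}\) and \(\mathbf{1}^{\top} Q = 0\), which gives

\[ Q Q^{\top} = \mathrm{I}_K - \tfrac{1}{K} \mathbf{1} \mathbf{1}^{\top}, \qquad \operatorname{Cov}[\beta] = s^2 \, Q Q^{\top}, \qquad \operatorname{Var}[\beta_k] = s^2 \left( 1 - \tfrac{1}{K} \right) = 1. \]

The sum-to-zero property follows from \(\mathbf{1}^{\top} \beta = \mathbf{1}^{\top} Q \, \beta^{\mathrm{raw}} = 0\).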

Translated and Scaled Simplex

An alternative approach that’s less efficient, but amenable to a symmetric prior, is to offset and scale a simplex.

parameters {
  simplex[K] beta_raw;
  real beta_scale;
  ...
}
transformed parameters {
  vector[K] beta;
  beta = beta_scale * (beta_raw - inv(K));
  ...
}

Here inv(K) is just a short way to write 1.0 / K. Given that beta_raw sums to 1 because it is a simplex, the result of subtracting inv(K) from each element is guaranteed to sum to zero. Because the magnitude of the elements of the simplex is bounded, a scaling factor is required to provide beta with \(K\) degrees of freedom necessary to take on every possible value that sums to zero.
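Written out, with \(b\) standing for beta_raw and \(s\) for beta_scale (shorthand introduced here), the sum of the transformed vector is

\[ \sum_{k=1}^K s \left( b_k - \frac{1}{K} \right) = s \left( \sum_{k=1}^K b_k - K \cdot \frac{1}{K} \right) = s \, (1 - 1) = 0. \]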

With this parameterization, a Dirichlet prior can be placed on beta_raw, perhaps uniform, and another prior put on beta_scale, typically for “shrinkage.”
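As a minimal sketch, a complete program with these priors might look as follows. The uniform Dirichlet concentration and the normal(0, 5) prior on beta_scale are illustrative assumptions, not recommendations from the text.

```stan
data {
  int<lower=2> K;
}
parameters {
  simplex[K] beta_raw;
  real beta_scale;
}
transformed parameters {
  // centered simplex, scaled; sums to zero by construction
  vector[K] beta = beta_scale * (beta_raw - inv(K));
}
model {
  beta_raw ~ dirichlet(rep_vector(1, K));  // uniform over the simplex
  beta_scale ~ normal(0, 5);               // illustrative shrinkage prior, an assumption
}
```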

Soft Centering

Adding a prior such as \(\beta \sim \mathsf{normal}(0,\sigma)\) will provide a kind of soft centering of a parameter vector \(\beta\) by preferring, all else being equal, that \(\sum_{k=1}^K \beta_k = 0\). This approach is only guaranteed to roughly center if \(\beta\) and the elementwise addition \(\beta + c\) for a scalar constant \(c\) produce the same likelihood (perhaps by another vector \(\alpha\) being transformed to \(\alpha - c\), as in the IRT models). This is another way of achieving a symmetric prior.
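The sense in which the prior prefers centered vectors can be made explicit. Writing \(\bar{\beta} = \frac{1}{K} \sum_{k=1}^K \beta_k\) (notation introduced here), the normal log prior of the shifted vector depends on \(c\) only through

\[ \sum_{k=1}^K (\beta_k + c)^2 = \sum_{k=1}^K (\beta_k - \bar{\beta})^2 + K (\bar{\beta} + c)^2, \]

which is minimized at \(c = -\bar{\beta}\), that is, when the shifted vector \(\beta + c\) sums to zero. When the likelihood is unchanged by the shift, the posterior therefore favors centered configurations, although it does not enforce the constraint exactly.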