1.7 Parameterizing Centered Vectors
It is often convenient to define a parameter vector \(\beta\) that is centered in the sense of satisfying the sum-to-zero constraint,
\[ \sum_{k=1}^K \beta_k = 0. \]
Such a parameter vector may be used to identify a multi-logit regression parameter vector (see the multi-logit section for details), or may be used for ability or difficulty parameters (but not both) in an IRT model (see the item-response model section for details).
\(K-1\) Degrees of Freedom
There is more than one way to enforce a sum-to-zero constraint on a parameter vector, the most efficient of which is to define the \(K\)-th element as the negation of the sum of the elements \(1\) through \(K-1\).
```stan
parameters {
  vector[K - 1] beta_raw;
  // ...
}
transformed parameters {
  vector[K] beta = append_row(beta_raw, -sum(beta_raw));
  // ...
}
```
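The construction is easy to sketch outside of Stan; here is a minimal Python analogue (the helper name `center` is ours, purely for illustration):

```python
# The K-1 degrees-of-freedom construction: the last component is
# the negated sum of the first K - 1 free components.

def center(beta_raw):
    """Append -sum(beta_raw) so the full vector sums to zero."""
    return list(beta_raw) + [-sum(beta_raw)]

beta = center([0.3, -1.2, 0.5])   # K = 4, three free components
print(beta)                       # last entry is 0.4 up to rounding
print(sum(beta))                  # zero up to floating-point rounding
```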
Placing a prior on `beta_raw` in this parameterization leads to a subtly different posterior than that resulting from the same prior on `beta` in the original parameterization without the sum-to-zero constraint. Most notably, a simple prior on each component of `beta_raw` produces different results than putting the same prior on each component of an unconstrained \(K\)-vector `beta`. For example, providing a \(\mathsf{normal}(0,5)\) prior on `beta` will produce a different posterior mode than placing the same prior on `beta_raw`.
Marginal distribution of sum-to-zero components
On the Stan forums, Aaron Goodman provided the following code to produce a prior with standard normal marginals on the components of `beta`:

```stan
model {
  beta ~ normal(0, inv(sqrt(1 - inv(K))));
  // ...
}
```

The components are not independent, as they must sum to zero. No Jacobian adjustment is required because summation and negation are linear operations (and thus have constant Jacobians).
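The standard-normal-marginals claim can be checked with a little linear algebra rather than sampling. Viewed as a function of the \(K-1\) free components, the log prior is \(-\tfrac{1}{2s^2}\bigl(\sum_k \beta_k^2 + (\sum_k \beta_k)^2\bigr)\) with \(s^2 = 1/(1 - 1/K)\), so the implied precision matrix is \((I + \mathbf{1}\mathbf{1}^\top)/s^2\). Inverting it shows every component of `beta`, including the negated-sum last one, has unit variance. A numpy sketch of that check:

```python
import numpy as np

K = 7
s2 = 1.0 / (1.0 - 1.0 / K)      # square of the scale inv(sqrt(1 - inv(K)))

# Precision of the K-1 free components implied by the prior on beta.
ones = np.ones((K - 1, K - 1))
precision = (np.eye(K - 1) + ones) / s2
cov = np.linalg.inv(precision)   # covariance of beta_raw

# Marginal variances of the first K-1 components of beta: all 1.
diag = np.diag(cov)

# Variance of the last component beta_K = -sum(beta_raw): also 1.
var_last = np.ones(K - 1) @ cov @ np.ones(K - 1)
print(diag, var_last)
```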
To generate distributions with marginals other than standard normal, the resulting `beta` may be scaled by some factor `sigma` and translated to some new location `mu`.
QR Decomposition
Aaron Goodman, on the Stan forums, also provided this approach, which calculates a QR decomposition in the transformed data block, then uses it to transform to a sum-to-zero parameter `beta`:
```stan
transformed data {
  matrix[K, K] A = diag_matrix(rep_vector(1, K));
  matrix[K, K - 1] A_qr;
  for (i in 1:K - 1) A[K, i] = -1;
  A[K, K] = 0;
  A_qr = qr_Q(A)[ , 1:(K - 1)];
}
parameters {
  vector[K - 1] beta_raw;
}
transformed parameters {
  vector[K] beta = A_qr * beta_raw;
}
model {
  beta_raw ~ normal(0, inv(sqrt(1 - inv(K))));
}
```
This produces a marginal standard normal distribution on the values of `beta`, which will sum to zero by construction of the QR decomposition.
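Both properties can be verified numerically. Each of the first \(K-1\) columns of \(A\) sums to zero, so the orthonormal basis extracted by the QR decomposition spans the sum-to-zero subspace; numpy's reduced QR stands in for Stan's `qr_Q` in this sketch:

```python
import numpy as np

K = 5

# Build A exactly as in the transformed data block above.
A = np.eye(K)
A[K - 1, :K - 1] = -1.0
A[K - 1, K - 1] = 0.0

# numpy's QR plays the role of Stan's qr_Q; keep the first K-1 columns.
Q, _ = np.linalg.qr(A)
A_qr = Q[:, :K - 1]

# Each basis column sums to zero, so beta = A_qr @ beta_raw does too.
col_sums = np.abs(A_qr.sum(axis=0)).max()

# With independent normal(0, s) priors on beta_raw, cov(beta) is
# s^2 * A_qr @ A_qr.T, whose diagonal is s^2 * (1 - 1/K); the scale
# s^2 = 1/(1 - 1/K) therefore yields unit marginal variances.
s2 = 1.0 / (1.0 - 1.0 / K)
marginal_vars = np.diag(s2 * A_qr @ A_qr.T)
print(col_sums, marginal_vars)
```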
Translated and Scaled Simplex
An alternative approach that’s less efficient, but amenable to a symmetric prior, is to offset and scale a simplex.
```stan
parameters {
  simplex[K] beta_raw;
  real beta_scale;
  // ...
}
transformed parameters {
  vector[K] beta = beta_scale * (beta_raw - inv(K));
  // ...
}
```
Here `inv(K)` is just a short way to write `1.0 / K`. Given that `beta_raw` sums to 1 because it is a simplex, the elementwise subtraction of `inv(K)` guarantees that `beta` sums to zero. Because the elements of a simplex are bounded between 0 and 1, a scaling factor is required for `beta` to be able to take on every possible value that sums to zero. Note that this uses \(K\) parameters (the \(K-1\) degrees of freedom of the simplex plus the scale) to describe the \((K-1)\)-dimensional space of sum-to-zero vectors, which is one source of the parameterization's inefficiency. With this parameterization, a Dirichlet prior can be placed on `beta_raw`, perhaps uniform, and another prior put on `beta_scale`, typically for "shrinkage."
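The centering step is easy to check by hand; a minimal Python sketch (the example simplex values are arbitrary):

```python
# Hypothetical example values; any simplex (nonnegative, summing to 1) works.
K = 4
beta_raw = [0.1, 0.2, 0.3, 0.4]   # a point on the K-simplex
beta_scale = 2.5

# beta = beta_scale * (beta_raw - inv(K)): subtracting 1/K recenters the
# simplex so its elements sum to zero; the scale restores unbounded magnitude.
beta = [beta_scale * (p - 1.0 / K) for p in beta_raw]
print(beta)               # sums to zero regardless of beta_scale
```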
Soft Centering
Adding a prior such as \(\beta \sim \mathsf{normal}(0,\sigma)\) will provide a kind of soft centering of a parameter vector \(\beta\) by preferring, all else being equal, that \(\sum_{k=1}^K \beta_k = 0\). This approach is only guaranteed to roughly center if \(\beta\) and the elementwise addition \(\beta + c\) for a scalar constant \(c\) produce the same likelihood (perhaps by another vector \(\alpha\) being transformed to \(\alpha - c\), as in the IRT models). This is another way of achieving a symmetric prior.
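The mechanism behind soft centering can be seen directly: among the likelihood-equivalent translates \(\beta + c\), the \(\mathsf{normal}(0,\sigma)\) prior density is highest where \(\sum_k (\beta_k + c)^2\) is smallest, which occurs at \(c = -\bar{\beta}\), i.e., at the translate whose components sum to zero. A small Python sketch with hypothetical values:

```python
# Among translates beta + c of a hypothetical uncentered vector, the
# normal(0, sigma) prior penalty sum((beta_k + c)^2) is minimized at
# c = -mean(beta), which recenters beta to sum to zero.
beta = [1.0, 3.0, -2.0, 6.0]
mean = sum(beta) / len(beta)

def penalty(c):
    return sum((b + c) ** 2 for b in beta)

centered = [b - mean for b in beta]
print(sum(centered))                       # zero up to rounding
print(penalty(-mean) < penalty(-mean + 0.1))   # True: centering is preferred
```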