
10.7 Unit Simplex

Variables constrained to the unit simplex show up in multivariate discrete models as both parameters (categorical and multinomial) and as variates generated by their priors (Dirichlet and multivariate logistic).

The unit K-simplex is the set of points x \in \mathbb{R}^K such that for 1 \leq k \leq K,

x_k > 0,

and

\sum_{k=1}^K x_k = 1 .

An alternative definition is to take the convex hull of the vertices. For instance, in two dimensions, the simplex vertices are the extreme values (0, 1) and (1, 0), and the unit 2-simplex is the line segment connecting these two points; values such as (0.3, 0.7) and (0.99, 0.01) lie on the segment. In three dimensions, the vertices are the standard basis vectors (0, 0, 1), (0, 1, 0) and (1, 0, 0), and the unit 3-simplex is the boundary and interior of the triangle with these vertices. Points in the 3-simplex include (0.5, 0.5, 0), (0.2, 0.7, 0.1) and all other triplets of non-negative values summing to 1.

As these examples illustrate, the simplex always picks out a region of dimension K-1 within \mathbb{R}^K. Therefore a point x in the K-simplex is fully determined by its first K-1 elements x_1, x_2, \ldots, x_{K-1}, with

x_K = 1 - \sum_{k=1}^{K-1} x_k .
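As a small illustration of the definition, the Python sketch below checks simplex membership using the strict positivity condition x_k > 0 from the first definition, and rebuilds x_K from the first K-1 coordinates. The function names and the tolerance `tol` are hypothetical choices for this example, not part of Stan.

```python
import math

def is_unit_simplex(x, tol=1e-9):
    """Return True if every x_k > 0 and the entries sum to 1 (within tol)."""
    return all(xk > 0 for xk in x) and math.isclose(sum(x), 1.0, abs_tol=tol)

def complete_simplex(x_head):
    """Given x_1, ..., x_{K-1}, append x_K = 1 - (x_1 + ... + x_{K-1})."""
    return list(x_head) + [1.0 - sum(x_head)]

x = complete_simplex([0.2, 0.7])   # last entry is 0.1 up to rounding
print(x)
print(is_unit_simplex(x))
```

Because only K-1 numbers are free, storing x_1, ..., x_{K-1} is enough; the last coordinate is always recoverable.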

Unit Simplex Inverse Transform

Stan’s unit simplex inverse transform may be understood using the following stick-breaking metaphor.[15]

  1. Take a stick of unit length (i.e., length 1).
  2. Break a piece off and label it as x_1, and set it aside, keeping what’s left.
  3. Next, break a piece off what’s left, label it x_2, and set it aside, keeping what’s left.
  4. Continue breaking off pieces of what’s left, labeling them, and setting them aside for pieces x_3, \ldots, x_{K-1}.
  5. Label what’s left x_K.

The resulting vector x = [x_1, \ldots, x_K] is a unit simplex because each piece has non-negative length and the sum of the stick lengths is one by construction.
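The metaphor can be sketched directly in Python, assuming the break fractions are already given as numbers in (0, 1), each measured relative to what is left of the stick (the convention used for the break proportions z_k in this section). The name `break_stick` is hypothetical.

```python
def break_stick(fractions):
    """Stick-breaking: each fraction in (0, 1) is the share of the *remaining*
    stick broken off at that step; the leftover becomes the last piece."""
    remaining = 1.0
    pieces = []
    for z in fractions:
        piece = remaining * z    # break off a share of what is left
        pieces.append(piece)
        remaining -= piece       # keep the rest of the stick
    pieces.append(remaining)     # x_K is whatever is left over
    return pieces

print(break_stick([0.5, 0.5, 0.5]))  # [0.5, 0.25, 0.125, 0.125]
```

K-1 fractions always produce K pieces that sum to one, which is why the simplex needs only K-1 free parameters.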

This full inverse mapping requires the breaks to be represented as the fraction in (0, 1) of the remaining stick that is broken off. These break ratios are themselves derived from unconstrained values in (-\infty, \infty) using the inverse logit transform, as described above for unidimensional variables with lower and upper bounds.

More formally, an intermediate vector z \in \mathbb{R}^{K-1}, whose coordinates z_k represent the proportion of the remaining stick broken off in step k, is defined elementwise for 1 \leq k < K by

z_k = \mathrm{logit}^{-1}\left( y_k + \log\left( \frac{1}{K-k} \right) \right) .

The offset term \log\left( \frac{1}{K-k} \right) (i.e., \mathrm{logit}\left( \frac{1}{K-k+1} \right)) in the above definition adjusts the transform so that a zero vector y is mapped to the simplex point x = (1/K, \ldots, 1/K). For instance, if y_1 = 0, then z_1 = 1/K; if y_2 = 0, then z_2 = 1/(K-1); and if y_{K-1} = 0, then z_{K-1} = 1/2.
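The effect of the offset can be checked numerically. This Python sketch, with hypothetical helper names, computes z_k from y_k using the formula above and confirms that a zero input gives z_k = 1/(K-k+1); the two-branch `inv_logit` is just one numerically stable way to evaluate the logistic function.

```python
import math

def inv_logit(u):
    """Numerically stable logistic function 1 / (1 + exp(-u))."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def break_proportions(y):
    """z_k = inv_logit(y_k + log(1 / (K - k))) for k = 1, ..., K-1, with K = len(y) + 1."""
    K = len(y) + 1
    return [inv_logit(y[k - 1] + math.log(1.0 / (K - k))) for k in range(1, K)]

K = 5
print(break_proportions([0.0] * (K - 1)))  # approximately [1/5, 1/4, 1/3, 1/2]
```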

The break proportions z are applied to determine the stick sizes and resulting value of x_k for 1 \leq k < K by

x_k = \left( 1 - \sum_{k'=1}^{k-1} x_{k'} \right) z_k .

The parenthesized term, one minus the total length of the pieces already broken off, is the length of the original stick left at stage k. This is multiplied by the break proportion z_k to yield x_k. Only K-1 unconstrained parameters are required, with the last dimension’s value x_K set to the length of the remaining piece of the original stick,

x_K = 1 - \sum_{k=1}^{K-1} x_k .
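Putting the pieces together, here is a minimal Python sketch of the inverse transform from an unconstrained y of length K-1 to a unit K-simplex x. It is an illustration of the equations above, not Stan's actual C++ implementation; the function names are hypothetical, and the running variable `remaining` holds the parenthesized stick length.

```python
import math

def inv_logit(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def simplex_inverse_transform(y):
    """Map unconstrained y (length K-1) to a point x on the unit K-simplex."""
    K = len(y) + 1
    x = []
    remaining = 1.0                    # 1 - sum_{k' < k} x_{k'}
    for k in range(1, K):
        z_k = inv_logit(y[k - 1] + math.log(1.0 / (K - k)))  # break proportion
        x_k = remaining * z_k          # piece broken off at stage k
        x.append(x_k)
        remaining -= x_k
    x.append(remaining)                # x_K = 1 - sum_{k < K} x_k
    return x

print(simplex_inverse_transform([0.0, 0.0, 0.0]))  # approximately [0.25, 0.25, 0.25, 0.25]
print(simplex_inverse_transform([2.0, -1.0, 0.5]))
```

Tracking the remaining length in a single variable avoids re-summing the earlier pieces at every step, so the whole transform is linear in K.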

Absolute Jacobian Determinant of the Unit-Simplex Inverse Transform

The Jacobian J of the inverse transform f^{-1} is lower triangular, with diagonal entries

J_{k,k} = \frac{\partial x_k}{\partial y_k} = \frac{\partial x_k}{\partial z_k} \, \frac{\partial z_k}{\partial y_k} ,

where

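\frac{\partial z_k}{\partial y_k} = \frac{\partial}{\partial y_k} \mathrm{logit}^{-1}\left( y_k + \log\left( \frac{1}{K-k} \right) \right) = z_k \, (1 - z_k) ,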

and

\frac{\partial x_k}{\partial z_k} = \left( 1 - \sum_{k'=1}^{k-1} x_{k'} \right) .

This definition is recursive, defining x_k in terms of x_1, \ldots, x_{k-1}, so x_k depends only on y_1, \ldots, y_k; this is why J is lower triangular.
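The lower-triangular structure and the diagonal formula can be checked numerically. The Python sketch below, assuming a central-difference step h = 1e-6 and hypothetical helper names, builds the (K-1) x (K-1) Jacobian of the map y to (x_1, ..., x_{K-1}) by finite differences and compares its diagonal with the analytic product z_k (1 - z_k) (1 - sum of x_{k'} for k' < k).

```python
import math

def inv_logit(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def head_and_diag(y):
    """Return (x_1..x_{K-1}, analytic diagonal J_kk = z_k (1 - z_k) * remaining_k)."""
    K = len(y) + 1
    x_head, diag = [], []
    remaining = 1.0
    for k in range(1, K):
        z = inv_logit(y[k - 1] + math.log(1.0 / (K - k)))
        diag.append(z * (1.0 - z) * remaining)   # (dz_k/dy_k) * (dx_k/dz_k)
        x_k = remaining * z
        x_head.append(x_k)
        remaining -= x_k
    return x_head, diag

def numeric_jacobian(y, h=1e-6):
    """Central-difference Jacobian with J[i][j] = d x_{i+1} / d y_{j+1}."""
    n = len(y)
    J = [[0.0] * n for _ in range(n)]
    for j in range(n):
        y_plus, y_minus = list(y), list(y)
        y_plus[j] += h
        y_minus[j] -= h
        x_plus, _ = head_and_diag(y_plus)
        x_minus, _ = head_and_diag(y_minus)
        for i in range(n):
            J[i][j] = (x_plus[i] - x_minus[i]) / (2 * h)
    return J

y = [0.3, -1.2, 0.8]
J = numeric_jacobian(y)
_, diag = head_and_diag(y)
for i in range(len(y)):
    print(J[i][i], diag[i])   # the two values should agree closely
```

Entries above the diagonal come out exactly zero, since perturbing y_j for j > k leaves the computation of x_k untouched.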

Because the Jacobian J of f^{-1} is lower triangular with positive diagonal entries, its absolute determinant reduces to the product of those entries,

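\left| \det J \right| = \prod_{k=1}^{K-1} J_{k,k} = \prod_{k=1}^{K-1} z_k \, (1 - z_k) \, \left( 1 - \sum_{k'=1}^{k-1} x_{k'} \right) .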

Thus the transformed variable Y = f(X) has a density given by

p_Y(y) = p_X(f^{-1}(y)) \, \prod_{k=1}^{K-1} z_k \, (1 - z_k) \ \left( 1 - \sum_{k'=1}^{k-1} x_{k'} \right) .

Even when written in terms of the intermediate values z_k, this expression looks more complex than it is. The exponential function need only be evaluated once for each unconstrained parameter y_k; everything else is basic arithmetic that can be computed incrementally along with the transform.
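To show the incremental computation, the Python sketch below accumulates log|det J| in the same loop as the transform and adds it to a log density on the simplex, which is the log form of the density of Y above. The helper names and the choice of a uniform target (a Dirichlet with all parameters equal to 1, whose density is (K-1)! with respect to the first K-1 coordinates) are illustrative assumptions.

```python
import math

def inv_logit(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def simplex_inverse_with_log_jacobian(y):
    """Return (x, log|det J|), accumulating the log Jacobian alongside the transform."""
    K = len(y) + 1
    x = []
    remaining = 1.0
    log_det = 0.0
    for k in range(1, K):
        z = inv_logit(y[k - 1] + math.log(1.0 / (K - k)))
        # log J_kk = log z_k + log(1 - z_k) + log(remaining stick length)
        log_det += math.log(z) + math.log1p(-z) + math.log(remaining)
        x_k = remaining * z
        x.append(x_k)
        remaining -= x_k
    x.append(remaining)
    return x, log_det

def log_density_y(y, log_p_x):
    """log p_Y(y) = log p_X(f^{-1}(y)) + log|det J|."""
    x, log_det = simplex_inverse_with_log_jacobian(y)
    return log_p_x(x) + log_det

def uniform_log_p(x):
    """Log density of the uniform distribution on the K-simplex: log((K-1)!)."""
    return math.lgamma(len(x))

print(log_density_y([0.0, 0.0, 0.0], uniform_log_p))  # log(3!) - 4 log(4), up to rounding
```

At y = 0 every diagonal factor telescopes, giving |det J| = K^{-K}, a convenient sanity check for any implementation.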

Unit Simplex Transform

The transform Y = f(X) can be derived by reversing the stages of the inverse transform. Working backwards, given the break proportions z, y is defined elementwise by

y_k = \mathrm{logit}(z_k) - \mbox{log}\left( \frac{1}{K-k} \right) .

The break proportions z_k are defined to be the ratio of x_k to the length of stick left after the first k-1 pieces have been broken off,

z_k = \frac{x_k} {1 - \sum_{k' = 1}^{k-1} x_{k'}} .
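A minimal Python sketch of the forward transform, assuming x is strictly inside the simplex so that every z_k lies in (0, 1). The function names are hypothetical; the inverse transform is repeated so the block is self-contained and the round trip can be checked.

```python
import math

def logit(p):
    """log(p / (1 - p)), written with log1p for accuracy near 0."""
    return math.log(p) - math.log1p(-p)

def inv_logit(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def simplex_transform(x):
    """Map a point x on the unit K-simplex to unconstrained y (length K-1)."""
    K = len(x)
    y = []
    remaining = 1.0                    # 1 - sum_{k' < k} x_{k'}
    for k in range(1, K):
        z_k = x[k - 1] / remaining     # break proportion of the remaining stick
        y.append(logit(z_k) - math.log(1.0 / (K - k)))
        remaining -= x[k - 1]
    return y

def simplex_inverse_transform(y):
    """Inverse of simplex_transform (stick-breaking)."""
    K = len(y) + 1
    x, remaining = [], 1.0
    for k in range(1, K):
        x_k = remaining * inv_logit(y[k - 1] + math.log(1.0 / (K - k)))
        x.append(x_k)
        remaining -= x_k
    x.append(remaining)
    return x

x = [0.1, 0.2, 0.3, 0.4]
y = simplex_transform(x)
print(y)
print(simplex_inverse_transform(y))  # recovers x up to rounding
```

The uniform point (1/K, ..., 1/K) maps to the zero vector, mirroring the role of the offset term in the inverse transform.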


  15. For an alternative derivation of the same transform using hyperspherical coordinates, see Betancourt (2010).