1.8 Ordered Logistic and Probit Regression

Ordered regression for an outcome \(y_n \in \{ 1,\ldots, k \}\) with predictors \(x_n \in \mathbb{R}^D\) is determined by a single coefficient vector \(\beta \in \mathbb{R}^D\) along with a sequence of cutpoints \(c \in \mathbb{R}^{K-1}\) sorted so that \(c_d < c_{d+1}\). The discrete output is \(k\) if the linear predictor \(x_n \beta\) falls between \(c_{k-1}\) and \(c_k\), assuming \(c_0 = -\infty\) and \(c_K = \infty\). The noise term is fixed by the form of regression, with examples for ordered logistic and ordered probit models.

Ordered Logistic Regression

The ordered logistic model can be coded in Stan using the ordered data type for the cutpoints and the built-in ordered_logistic distribution.

data {
  int<lower=2> K;
  int<lower=0> N;
  int<lower=1> D;
  int<lower=1,upper=K> y[N];
  row_vector[D] x[N];
}
parameters {
  vector[D] beta;
  ordered[K-1] c;
}
model {
  for (n in 1:N)
    y[n] ~ ordered_logistic(x[n] * beta, c);
}

The vector of cutpoints c is declared as ordered[K-1], which guarantees that c[k] is less than c[k+1].

If the cutpoints were assigned independent priors, the constraint effectively truncates the joint prior to support over points that satisfy the ordering constraint. Luckily, Stan does not need to compute the effect of the constraint on the normalizing term because the probability is needed only up to a proportion.

Ordered Probit

An ordered probit model could be coded in exactly the same way by swapping the cumulative logistic (inv_logit) for the cumulative normal (Phi).

data {
  int<lower=2> K;
  int<lower=0> N;
  int<lower=1> D;
  int<lower=1,upper=K> y[N];
  row_vector[D] x[N];
}
parameters {
  vector[D] beta;
  ordered[K-1] c;
}
model {
  vector[K] theta;
  for (n in 1:N) {
    real eta;
    eta = x[n] * beta;
    theta[1] = 1 - Phi(eta - c[1]);
    for (k in 2:(K-1))
      theta[k] = Phi(eta - c[k-1]) - Phi(eta - c[k]);
    theta[K] = Phi(eta - c[K-1]);
    y[n] ~ categorical(theta);
  }
}

The logistic model could also be coded this way by replacing Phi with inv_logit, though the built-in encoding based on the softmax transform is more efficient and more numerically stable. A small efficiency gain could be achieved by computing the values Phi(eta - c[k]) once and storing them for re-use.