1.8 Ordered Logistic and Probit Regression
Ordered regression for an outcome \(y_n \in \{ 1,\ldots, K \}\) with predictors \(x_n \in \mathbb{R}^D\) is determined by a single coefficient vector \(\beta \in \mathbb{R}^D\) along with a sequence of cutpoints \(c \in \mathbb{R}^{K-1}\) sorted so that \(c_k < c_{k+1}\). The discrete output is \(k\) if the linear predictor \(x_n \beta\), perturbed by a noise term, falls between \(c_{k-1}\) and \(c_k\), assuming \(c_0 = -\infty\) and \(c_K = \infty\). The distribution of the noise term is fixed by the form of regression: logistic for the ordered logistic model and standard normal for the ordered probit model.
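The category probabilities implied by this description can be written out explicitly. The following is a sketch in which \(\eta_n = x_n \beta\) and \(F\) are symbols introduced here for illustration, with \(F\) standing for the cumulative distribution function of the noise term (the inverse logit for the ordered logistic model, \(\Phi\) for the ordered probit model):

\[
\Pr[y_n = k] = F(\eta_n - c_{k-1}) - F(\eta_n - c_k), \qquad k = 1, \ldots, K,
\]

where the boundary conventions \(c_0 = -\infty\) and \(c_K = \infty\) give \(F(\eta_n - c_0) = 1\) and \(F(\eta_n - c_K) = 0\). The first category therefore has probability \(1 - F(\eta_n - c_1)\) and the last has probability \(F(\eta_n - c_{K-1})\).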
Ordered Logistic Regression
The ordered logistic model can be coded in Stan using the ordered data type for the cutpoints and the built-in ordered_logistic distribution.
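data {
  int<lower=2> K;                      // number of outcome categories
  int<lower=0> N;                      // number of observations
  int<lower=1> D;                      // number of predictors
  array[N] int<lower=1, upper=K> y;    // ordinal outcomes
  array[N] row_vector[D] x;            // predictor rows
}
parameters {
  vector[D] beta;                      // regression coefficients
  ordered[K - 1] c;                    // cutpoints, increasing
}
model {
  for (n in 1:N) {
    y[n] ~ ordered_logistic(x[n] * beta, c);
  }
}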
The vector of cutpoints c is declared as ordered[K-1], which guarantees that c[k] is less than c[k+1].
If the cutpoints are assigned independent priors, the ordering constraint effectively truncates the joint prior to the region where the cutpoints are in increasing order. Luckily, Stan does not need to compute the effect of this truncation on the normalizing term, because the density is needed only up to a constant of proportionality.
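As an illustration of independent priors on the cutpoints, here is a minimal sketch of the ordered logistic program with priors added. The normal(0, 5) priors are assumptions chosen only for the example, not recommendations from the text, and the array declarations assume a Stan version that supports the array[N] syntax.

```stan
data {
  int<lower=2> K;
  int<lower=0> N;
  int<lower=1> D;
  array[N] int<lower=1, upper=K> y;
  array[N] row_vector[D] x;
}
parameters {
  vector[D] beta;
  ordered[K - 1] c;
}
model {
  // Assumed priors, for illustration only.
  beta ~ normal(0, 5);
  // Independent priors on each cutpoint; the ordered type restricts
  // support to c[1] < ... < c[K-1], truncating the joint prior.
  // No explicit normalization for the truncation is written here.
  c ~ normal(0, 5);
  for (n in 1:N) {
    y[n] ~ ordered_logistic(x[n] * beta, c);
  }
}
```

The sampling statement on c is vectorized, so each element of c receives the same marginal prior before the ordering constraint is taken into account.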
Ordered Probit
An ordered probit model could be coded in exactly the same way by swapping the cumulative logistic (inv_logit) for the cumulative normal (Phi).
data {
  int<lower=2> K;
  int<lower=0> N;
  int<lower=1> D;
  array[N] int<lower=1, upper=K> y;
  array[N] row_vector[D] x;
}
parameters {
  vector[D] beta;
  ordered[K - 1] c;
}
model {
  vector[K] theta;                     // category probabilities
  for (n in 1:N) {
    real eta;
    eta = x[n] * beta;                 // linear predictor
    // differences of normal CDFs evaluated at adjacent cutpoints
    theta[1] = 1 - Phi(eta - c[1]);
    for (k in 2:(K - 1)) {
      theta[k] = Phi(eta - c[k - 1]) - Phi(eta - c[k]);
    }
    theta[K] = Phi(eta - c[K - 1]);
    y[n] ~ categorical(theta);
  }
}
The logistic model could also be coded this way by replacing Phi with inv_logit, though the built-in ordered_logistic distribution is more efficient and more numerically stable. A small efficiency gain could be achieved by computing the values Phi(eta - c[k]) once and storing them for re-use.
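One way the re-use of the values Phi(eta - c[k]) might look is sketched below. The local variable p is a name introduced here for illustration, the sketch relies on Phi being applied elementwise to a vector, and the array declarations assume a Stan version that supports the array[N] syntax.

```stan
data {
  int<lower=2> K;
  int<lower=0> N;
  int<lower=1> D;
  array[N] int<lower=1, upper=K> y;
  array[N] row_vector[D] x;
}
parameters {
  vector[D] beta;
  ordered[K - 1] c;
}
model {
  for (n in 1:N) {
    real eta = x[n] * beta;
    // p[k] holds Phi(eta - c[k]); each value is computed exactly once.
    vector[K - 1] p = Phi(eta - c);
    vector[K] theta;
    theta[1] = 1 - p[1];
    for (k in 2:(K - 1)) {
      theta[k] = p[k - 1] - p[k];
    }
    theta[K] = p[K - 1];
    y[n] ~ categorical(theta);
  }
}
```

In the original loop each interior value Phi(eta - c[k]) is evaluated twice, once as the subtracted term for category k and once as the leading term for category k + 1; storing the values in p removes the repeated evaluations. Stan also provides a built-in ordered_probit distribution, which may be preferable to the explicit construction in practice.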