1.5 Logistic and probit regression


For binary outcomes, either of the closely related logistic or probit regression models may be used. These generalized linear models differ only in the inverse link function they use to map linear predictions in $(-\infty,\infty)$ to probability values in $(0,1)$. Their respective inverse link functions, the logistic function and the standard normal cumulative distribution function, are both sigmoid functions (i.e., they are both S-shaped).

A logistic regression model with one predictor and an intercept is coded as follows.

data {
  int<lower=0> N;
  vector[N] x;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real alpha;
  real beta;
}
model {
  y ~ bernoulli_logit(alpha + beta * x);
}

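Unlike linear regression, the model has no separate noise parameter: the variability of each outcome is built into the Bernoulli distribution and is determined entirely by its success probability.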

Logistic regression is a kind of generalized linear model with binary outcomes and the log odds (logit) link function, defined by $\operatorname{logit}(v) = \log \left( \frac{v}{1-v} \right).$

The inverse of the link function appears in the model: $\operatorname{logit}^{-1}(u) = \texttt{inv}\mathtt{\_}\texttt{logit}(u) = \frac{1}{1 + \exp(-u)}.$
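As a quick check that these two functions undo each other, here is a worked example. The probability 0.75 is an arbitrary illustrative value, not one taken from the model above.

```latex
\begin{align*}
\operatorname{logit}(0.75)
  &= \log\left(\frac{0.75}{1 - 0.75}\right) = \log 3 \approx 1.0986, \\
\operatorname{logit}^{-1}(\log 3)
  &= \frac{1}{1 + \exp(-\log 3)} = \frac{1}{1 + 1/3} = 0.75.
\end{align*}
```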

The model formulation above uses the logit-parameterized version of the Bernoulli distribution, which is defined by $\texttt{bernoulli}\mathtt{\_}\texttt{logit}\left(y \mid \alpha \right) = \texttt{bernoulli}\left(y \mid \operatorname{logit}^{-1}(\alpha)\right).$
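One way to see this relationship inside a Stan program is through generated quantities. The following is a minimal sketch, assuming the data and parameters blocks of the model above; the names eta, p, and y_rep are hypothetical and introduced only for illustration. It computes the success probabilities implied by the linear predictor and draws replicated outcomes directly on the logit scale.

```stan
generated quantities {
  // linear predictor on the log odds scale
  vector[N] eta = alpha + beta * x;
  // success probabilities obtained by applying the inverse link
  vector[N] p = inv_logit(eta);
  // replicated outcomes drawn with the logit-parameterized RNG
  array[N] int<lower=0, upper=1> y_rep = bernoulli_logit_rng(eta);
}
```

Drawing with bernoulli_rng(p) instead would target the same distribution; the logit-parameterized form simply skips the intermediate probability.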

The formulation is also vectorized in the sense that alpha and beta are scalars and x is a vector, so that alpha + beta * x is a vector. The vectorized formulation is equivalent to the less efficient version

for (n in 1:N) {
  y[n] ~ bernoulli_logit(alpha + beta * x[n]);
}

Expanding out the Bernoulli logit, the model is equivalent to the more explicit, but less efficient and less arithmetically stable, version

for (n in 1:N) {
  y[n] ~ bernoulli(inv_logit(alpha + beta * x[n]));
}
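The loss of stability comes from forming the probability before taking its log: for a large positive linear predictor, inv_logit rounds to 1 in floating point, so the log probability of an outcome of 0 evaluates to negative infinity. As a sketch of what staying on the log scale looks like, the following model block increments the log density with Stan's log_inv_logit and log1m_inv_logit functions. It is meant only to illustrate the idea, not to describe how bernoulli_logit is implemented internally; the local variable eta is a hypothetical name.

```stan
model {
  for (n in 1:N) {
    // linear predictor for observation n
    real eta = alpha + beta * x[n];
    // log Bernoulli mass computed without first forming the probability:
    // log_inv_logit(eta) is log(p), log1m_inv_logit(eta) is log(1 - p)
    target += y[n] * log_inv_logit(eta)
              + (1 - y[n]) * log1m_inv_logit(eta);
  }
}
```

In practice the vectorized bernoulli_logit statement remains the preferable form; this expansion is useful mainly for seeing what that statement computes.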

Other link functions may be used in the same way. For example, probit regression uses the standard normal cumulative distribution function, which is typically written as

$\Phi(x) = \int_{-\infty}^x \textsf{normal}\left(y \mid 0,1 \right) \,\textrm{d}y.$

The standard normal cumulative distribution function $\Phi$ is implemented in Stan as the function Phi. The probit regression model may be coded in Stan by replacing the sampling statement inside the logistic model's loop over n with the following.

y[n] ~ bernoulli(Phi(alpha + beta * x[n]));
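Putting the pieces together, a complete probit program might look like the following sketch. It reuses the data and parameters blocks of the logistic model and assumes the vectorized form of Phi, which applies elementwise to the vector alpha + beta * x, so no explicit loop is needed.

```stan
data {
  int<lower=0> N;
  vector[N] x;
  array[N] int<lower=0, upper=1> y;
}
parameters {
  real alpha;
  real beta;
}
model {
  // Phi maps each linear prediction to a probability in (0, 1)
  y ~ bernoulli(Phi(alpha + beta * x));
}
```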

A fast approximation to the cumulative standard normal distribution function $\Phi$ is implemented in Stan as the function Phi_approx.² The approximate probit regression model may be coded with the following.

y[n] ~ bernoulli(Phi_approx(alpha + beta * x[n]));
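To see how closely the approximation tracks $\Phi$, both functions can be evaluated side by side. The following standalone sketch has no parameters, so it would need to be run with a fixed-parameter sampler; the grid u and the output names exact, approx, and err are hypothetical choices made for illustration.

```stan
transformed data {
  // illustrative grid of linear predictor values
  vector[7] u = [-3, -2, -1, 0, 1, 2, 3]';
}
generated quantities {
  // exact standard normal CDF on the grid
  vector[7] exact = Phi(u);
  // fast approximation on the same grid
  vector[7] approx = Phi_approx(u);
  // signed approximation error at each grid point
  vector[7] err = approx - exact;
}
```

Extending the grid further into the tails is a simple way to inspect where the two functions begin to diverge.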

² The Phi_approx function is a rescaled version of the inverse logit function, so while the scale is roughly the same as $\Phi$, the tails do not match.