
12.6 Categorical Logit Generalized Linear Model (Softmax Regression)

Stan also supplies a single function for a generalized linear model with a categorical likelihood and logit link function, i.e., a function for softmax regression. This provides a more efficient implementation of softmax regression than a manually written regression in terms of a categorical likelihood and matrix multiplication.
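For concreteness, the following model-block sketch shows the two formulations side by side; it assumes declarations matrix[M, K] x, int y[M] with outcomes in 1:N, vector[N] alpha, and matrix[K, N] beta elsewhere in the program (these names are illustrative, not part of the function's interface). Either statement alone defines the likelihood; the GLM form is the more efficient of the two.

// efficient GLM form
y ~ categorical_logit_glm(x, alpha, beta);

// equivalent manual form: categorical likelihood plus matrix multiplication
for (m in 1:M) {
  y[m] ~ categorical_logit(alpha + (x[m] * beta)');
}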

Note that the implementation does not put any restrictions on the coefficient matrix \(\beta\). It is up to the user to ensure identifiability, for example by using a reference category, a suitable prior, or some other constraint, as sketched below. See Multi-logit in the Stan User’s Guide.
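One common identification strategy is to fix a reference category by pinning the last column of beta and the last intercept to zero. A minimal sketch, assuming N, K, x, and y are declared as in the example above and using illustrative normal(0, 5) priors:

parameters {
  vector[N - 1] alpha_raw;
  matrix[K, N - 1] beta_raw;
}
transformed parameters {
  // last category is the reference: zero intercept and zero coefficients
  vector[N] alpha = append_row(alpha_raw, 0);
  matrix[K, N] beta = append_col(beta_raw, rep_vector(0, K));
}
model {
  alpha_raw ~ normal(0, 5);
  to_vector(beta_raw) ~ normal(0, 5);
  y ~ categorical_logit_glm(x, alpha, beta);
}

Alternatively, proper priors on all of alpha and beta soft-identify the model without a hard constraint.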

12.6.1 Probability Mass Functions

If \(N,M,K \in \mathbb{N}\), \(N,M,K > 0\), and if \(x\in \mathbb{R}^{M\times K}\), \(\alpha \in \mathbb{R}^N\), \(\beta\in \mathbb{R}^{K\times N}\), then for \(y \in \{1,\ldots,N\}^M\), \[ \text{CategoricalLogitGLM}(y~|~x,\alpha,\beta) \\[5pt] = \prod_{1\leq i \leq M}\text{CategoricalLogit}(y_i~|~\alpha+x_i\cdot\beta) \\[5pt] = \prod_{1\leq i \leq M}\text{Categorical}(y_i~|~\text{softmax}(\alpha+x_i\cdot\beta)), \] where \(x_i\) denotes the \(i\)-th row of \(x\). See softmax for the definition of the softmax function.
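Expanding the softmax gives the corresponding log probability mass function that the _lpmf forms below accumulate: \[ \log\text{CategoricalLogitGLM}(y~|~x,\alpha,\beta) = \sum_{1\leq i \leq M}\left[(\alpha + x_i\cdot\beta)_{y_i} - \log\sum_{n=1}^{N}\exp\big((\alpha + x_i\cdot\beta)_n\big)\right]. \]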

12.6.2 Sampling Statement

y ~ categorical_logit_glm(x, alpha, beta)

Increment target log probability density with categorical_logit_glm_lpmf(y | x, alpha, beta) dropping constant additive terms.
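A complete minimal program using this sampling statement might look as follows; the variable names and the normal(0, 5) priors are illustrative choices, not requirements.

data {
  int<lower=1> N;                  // number of outcome categories
  int<lower=1> M;                  // number of observations
  int<lower=1> K;                  // number of predictors
  matrix[M, K] x;                  // design matrix
  int<lower=1, upper=N> y[M];      // observed categories
}
parameters {
  vector[N] alpha;                 // intercepts
  matrix[K, N] beta;               // coefficients
}
model {
  alpha ~ normal(0, 5);
  to_vector(beta) ~ normal(0, 5);
  y ~ categorical_logit_glm(x, alpha, beta);
}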

12.6.3 Stan Functions

real categorical_logit_glm_lpmf(int y | row_vector x, vector alpha, matrix beta)
The log categorical probability mass function with outcome y in \(1:N\) given \(N\)-vector of log-odds of outcomes alpha + x * beta. The size of the independent variable row vector x needs to match the number of rows of the coefficient matrix beta. The size of the intercept vector alpha must match the number of columns of the coefficient matrix beta.
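As an illustration, this signature can accumulate the likelihood one observation at a time; the array-of-row-vectors name x_row below is a hypothetical stand-in for per-observation predictors.

// assumes int y[M], row_vector[K] x_row[M], vector[N] alpha, matrix[K, N] beta
for (m in 1:M) {
  target += categorical_logit_glm_lpmf(y[m] | x_row[m], alpha, beta);
}

The vectorized signatures below are generally preferable, since they avoid redundant per-iteration computation.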

real categorical_logit_glm_lpmf(int y | matrix x, vector alpha, matrix beta)
The log categorical probability mass function with outcome y in \(1:N\) given \(N\)-vector of log-odds of outcomes alpha + x * beta. The same vector of intercepts alpha and the same dependent variable value y are used for all instances. The number of columns of the independent variable x needs to match the number of rows of the coefficient matrix beta. The size of the intercept vector alpha must match the number of columns of the coefficient matrix beta. If x and y are data (not parameters), this function can be executed on a GPU.
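For example, scoring a single shared outcome against every row of a predictor matrix (the name y_shared is hypothetical):

// assumes int y_shared, matrix[M, K] x, vector[N] alpha, matrix[K, N] beta
target += categorical_logit_glm_lpmf(y_shared | x, alpha, beta);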

real categorical_logit_glm_lpmf(int[] y | row_vector x, vector alpha, matrix beta)
The log categorical probability mass function with outcomes y in \(1:N\) given \(N\)-vector of log-odds of outcomes alpha + x * beta. The same vector of intercepts alpha and the same row vector of independent variables x are used for all instances. The size of the independent variable row vector x needs to match the number of rows of the coefficient matrix beta. The size of the intercept vector alpha must match the number of columns of the coefficient matrix beta.
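For example, scoring every element of an outcome array against one shared predictor row (the name x_shared is hypothetical):

// assumes int y[M], row_vector[K] x_shared, vector[N] alpha, matrix[K, N] beta
target += categorical_logit_glm_lpmf(y | x_shared, alpha, beta);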

real categorical_logit_glm_lpmf(int[] y | matrix x, vector alpha, matrix beta)
The log categorical probability mass function with outcomes y in \(1:N\) given \(N\)-vector of log-odds of outcomes alpha + x * beta. The same vector of intercepts alpha is used for all instances. The number of rows of the independent variable matrix x needs to match the size of the dependent variable array y. The number of columns of the independent variable x needs to match the number of rows of the coefficient matrix beta. The size of the intercept vector alpha must match the number of columns of the coefficient matrix beta. If x and y are data (not parameters), this function can be executed on a GPU.
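This is the signature used by the sampling statement above; the explicit target increment form, under the same illustrative declarations as the full program sketch, is:

// assumes int y[M], matrix[M, K] x, vector[N] alpha, matrix[K, N] beta
target += categorical_logit_glm_lpmf(y | x, alpha, beta);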