12.6 Categorical Logit Generalized Linear Model (Softmax Regression)
Stan also supplies a single function for a generalized linear model with categorical likelihood and logit link function, i.e. a function for a softmax regression. This provides a more efficient implementation of softmax regression than a manually written regression in terms of a Categorical likelihood and matrix multiplication.
Note that the implementation does not put any restrictions on the coefficient matrix \(\beta\). It is up to the user to ensure identifiability, for example by fixing a reference category or by using a suitable prior. See Multi-logit in the Stan User’s Guide.
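One way to meet the identifiability requirement is to fix a reference category. The following is a minimal sketch rather than a recommended default: it pins the intercept and the coefficients of category N to zero and estimates only the remaining N - 1 columns. The names alpha_raw and beta_raw and the normal(0, 5) priors are assumptions made for illustration. The array declaration follows the `int[]` style used by the signatures in this section; newer Stan releases declare the same array as `array[M] int`.

```stan
data {
  int<lower=2> N;                 // number of categories
  int<lower=1> M;                 // number of observations
  int<lower=1> K;                 // number of predictors
  int<lower=1, upper=N> y[M];     // outcomes (newer syntax: array[M] int<lower=1, upper=N> y;)
  matrix[M, K] x;                 // predictor matrix
}
parameters {
  vector[N - 1] alpha_raw;        // free intercepts (hypothetical name)
  matrix[K, N - 1] beta_raw;      // free coefficients (hypothetical name)
}
transformed parameters {
  // Category N is the reference: its intercept and coefficients are fixed at 0.
  vector[N] alpha = append_row(alpha_raw, 0);
  matrix[K, N] beta = append_col(beta_raw, rep_vector(0, K));
}
model {
  // Illustrative priors only; choose priors suited to the application.
  alpha_raw ~ normal(0, 5);
  to_vector(beta_raw) ~ normal(0, 5);
  y ~ categorical_logit_glm(x, alpha, beta);
}
```

With this parameterization, each estimated column of beta is read as a contrast against category N.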
12.6.1 Probability Mass Functions
If \(N,M,K \in \mathbb{N}\), \(N,M,K > 0\), and if \(x\in \mathbb{R}^{M\cdot K}, \alpha \in \mathbb{R}^N, \beta\in \mathbb{R}^{K\cdot N}\), then for \(y \in \{1,\ldots,N\}^M\), \[ \text{CategoricalLogitGLM}(y~|~x,\alpha,\beta) = \\[5pt] \prod_{1\leq i \leq M}\text{CategoricalLogit}(y_i~|~\alpha+x_i\cdot\beta) = \\[15pt] \prod_{1\leq i \leq M}\text{Categorical}(y_i~|~\text{softmax}(\alpha+x_i\cdot\beta)), \] where \(x_i\) denotes the \(i\)-th row of \(x\). See the definition of softmax for the definition of the softmax function.
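Taking logarithms of the product above gives the quantity on the log scale. Writing \(\eta_{i,n} = \alpha_n + \sum_{1\leq k \leq K} x_{i,k}\,\beta_{k,n}\) for the log-odds of category \(n\) in observation \(i\) (the symbol \(\eta\) is introduced here only for this expansion), \[ \log \text{CategoricalLogitGLM}(y~|~x,\alpha,\beta) = \sum_{1\leq i \leq M}\left(\eta_{i,y_i} - \log\sum_{1\leq n \leq N}\exp(\eta_{i,n})\right). \] Adding the same constant to \(\eta_{i,1},\ldots,\eta_{i,N}\) for a fixed \(i\) leaves the corresponding term unchanged, which is why \(\alpha\) and \(\beta\) are not identified without a further restriction.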
12.6.2 Sampling Statement
y ~ categorical_logit_glm(x, alpha, beta)
Increment target log probability density with categorical_logit_glm_lpmf(y | x, alpha, beta) dropping constant additive terms.
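For comparison, the manually written regression that this sampling statement replaces can be sketched as below. It is an illustration under assumptions: the normal(0, 5) priors are placeholders, and the array declaration uses the `int[]` style of the signatures in this section (newer Stan releases write `array[M] int`). The loop and the commented-out single statement define the same log density up to constant terms; the GLM form is the one described above as more efficient.

```stan
data {
  int<lower=1> N;                 // number of categories
  int<lower=1> M;                 // number of observations
  int<lower=1> K;                 // number of predictors
  int<lower=1, upper=N> y[M];     // outcomes (newer syntax: array[M] int<lower=1, upper=N> y;)
  matrix[M, K] x;                 // predictor matrix
}
parameters {
  vector[N] alpha;                // intercepts
  matrix[K, N] beta;              // coefficients
}
model {
  // Placeholder priors (an assumption of this sketch).
  alpha ~ normal(0, 5);
  to_vector(beta) ~ normal(0, 5);

  // Manual form: one categorical_logit term per observation.
  // x[i] is row i of x (a row_vector of size K), so x[i] * beta is a
  // row_vector of size N and is transposed before adding alpha.
  for (i in 1:M)
    y[i] ~ categorical_logit(alpha + (x[i] * beta)');

  // Single-statement form (use one form or the other, not both):
  // y ~ categorical_logit_glm(x, alpha, beta);
}
```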
12.6.3 Stan Functions
real categorical_logit_glm_lpmf(int y | row_vector x, vector alpha, matrix beta)
The log categorical probability mass function with outcome y in \(1:N\) given the \(N\)-vector of log-odds of outcomes alpha + x * beta. The size of the independent variable row vector x needs to match the number of rows of the coefficient matrix beta. The size of the intercept vector alpha must match the number of columns of the coefficient matrix beta.
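The single-outcome, single-row signature is convenient when a separate log probability is needed for each observation. The sketch below stores pointwise log-likelihood values in generated quantities, for instance for later model comparison. The variable name log_lik and the normal(0, 5) priors are assumptions made for illustration, and the array declaration uses the `int[]` style of the signatures in this section (newer Stan releases write `array[M] int`).

```stan
data {
  int<lower=1> N;                 // number of categories
  int<lower=1> M;                 // number of observations
  int<lower=1> K;                 // number of predictors
  int<lower=1, upper=N> y[M];     // outcomes (newer syntax: array[M] int<lower=1, upper=N> y;)
  matrix[M, K] x;                 // predictor matrix
}
parameters {
  vector[N] alpha;                // intercepts
  matrix[K, N] beta;              // coefficients
}
model {
  // Placeholder priors (an assumption of this sketch).
  alpha ~ normal(0, 5);
  to_vector(beta) ~ normal(0, 5);
  y ~ categorical_logit_glm(x, alpha, beta);
}
generated quantities {
  vector[M] log_lik;              // hypothetical name for the pointwise values
  for (i in 1:M) {
    // y[i] is an int and x[i] is a row_vector of size K,
    // matching the (int y | row_vector x, ...) signature.
    log_lik[i] = categorical_logit_glm_lpmf(y[i] | x[i], alpha, beta);
  }
}
```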
real categorical_logit_glm_lpmf(int y | matrix x, vector alpha, matrix beta)
The log categorical probability mass function with outcome y in \(1:N\) given the \(N\)-vector of log-odds of outcomes alpha + x * beta. The same vector of intercepts alpha and the same dependent variable value y are used for all instances. The number of columns of the independent variable matrix x needs to match the number of rows of the coefficient matrix beta. The size of the intercept vector alpha must match the number of columns of the coefficient matrix beta. If x and y are data (not parameters), this function can be executed on a GPU.
real categorical_logit_glm_lpmf(int[] y | row_vector x, vector alpha, matrix beta)
The log categorical probability mass function with outcomes y in \(1:N\) given the \(N\)-vector of log-odds of outcomes alpha + x * beta. The same vector of intercepts alpha and the same row vector of independent variables x are used for all instances. The size of the independent variable row vector x needs to match the number of rows of the coefficient matrix beta. The size of the intercept vector alpha must match the number of columns of the coefficient matrix beta.
real categorical_logit_glm_lpmf(int[] y | matrix x, vector alpha, matrix beta)
The log categorical probability mass function with outcomes y in \(1:N\) given the \(N\)-vector of log-odds of outcomes alpha + x * beta. The same vector of intercepts alpha is used for all instances. The number of rows of the independent variable matrix x needs to match the size of the dependent variable array y. The number of columns of the independent variable matrix x needs to match the number of rows of the coefficient matrix beta. The size of the intercept vector alpha must match the number of columns of the coefficient matrix beta. If x and y are data (not parameters), this function can be executed on a GPU.