Multivariate Discrete Distributions
The multivariate discrete distributions are over multiple integer values, which are expressed in Stan as arrays.
Multinomial distribution
Probability mass function
If \(K \in \mathbb{N}\), \(N \in \mathbb{N}\), and \(\theta \in \text{$K$-simplex}\), then for \(y \in \mathbb{N}^K\) such that \(\sum_{k=1}^K y_k = N\), \[\begin{equation*} \text{Multinomial}(y|\theta) = \binom{N}{y_1,\ldots,y_K} \prod_{k=1}^K \theta_k^{y_k}, \end{equation*}\] where the multinomial coefficient is defined by \[\begin{equation*} \binom{N}{y_1,\ldots,y_k} = \frac{N!}{\prod_{k=1}^K y_k!}. \end{equation*}\]
Distribution statement
y ~ multinomial(theta)
Increment target log probability density with multinomial_lupmf(y | theta).
Stan functions
real multinomial_lpmf(array[] int y | vector theta)
The log multinomial probability mass function with outcome array y of size \(K\) given the \(K\)-simplex distribution parameter theta and (implicit) total count N = sum(y)
real multinomial_lupmf(array[] int y | vector theta)
The log multinomial probability mass function with outcome array y of size \(K\) given the \(K\)-simplex distribution parameter theta and (implicit) total count N = sum(y) dropping constant additive terms
array[] int multinomial_rng(vector theta, int N)
Generate a multinomial variate with simplex distribution parameter theta and total count \(N\); may only be used in transformed data and generated quantities blocks
Multinomial distribution, logit parameterization
Stan also provides a version of the multinomial probability mass function distribution with the \(\text{$K$-simplex}\) for the event count probabilities per category given on the unconstrained logistic scale.
Probability mass function
If \(K \in \mathbb{N}\), \(N \in \mathbb{N}\), and \(\text{softmax}(\theta) \in \text{$K$-simplex}\), then for \(y \in \mathbb{N}^K\) such that \(\sum_{k=1}^K y_k = N\), \[\begin{equation*} \begin{split} \text{MultinomialLogit}(y \mid \gamma) & = \text{Multinomial}(y \mid \text{softmax}(\gamma)) \\ & = \binom{N}{y_1,\ldots,y_K} \prod_{k=1}^K [\text{softmax}(\gamma_k)]^{y_k}, \end{split} \end{equation*}\] where the multinomial coefficient is defined by \[\begin{equation*} \binom{N}{y_1,\ldots,y_k} = \frac{N!}{\prod_{k=1}^K y_k!}. \end{equation*}\]
Distribution statement
y ~ multinomial_logit(gamma)
Increment target log probability density with multinomial_logit_lupmf(y | gamma).
Stan functions
real multinomial_logit_lpmf(array[] int y | vector gamma)
The log multinomial probability mass function with outcome array y of size \(K\) given the log \(K\)-simplex distribution parameter \(\gamma\) and (implicit) total count N = sum(y)
real multinomial_logit_lupmf(array[] int y | vector gamma)
The log multinomial probability mass function with outcome array y of size \(K\) given the log \(K\)-simplex distribution parameter \(\gamma\) and (implicit) total count N = sum(y) dropping constant additive terms
array[] int multinomial_logit_rng(vector gamma, int N)
Generate a variate from a multinomial distribution with probabilities softmax(gamma) and total count N; may only be used in transformed data and generated quantities blocks.
Dirichlet-multinomial distribution
Stan also provides the Dirichlet-multinomial distribution, which generalizes the Beta-binomial distribution to more than two categories. As such, it is an overdispersed version of the multinomial distribution.
Probability mass function
If \(K \in \mathbb{N}\), \(N \in \mathbb{N}\), and \(\alpha \in \mathbb{R}_{+}^K\), then for \(y \in \mathbb{N}^K\) such that \(\sum_{k=1}^K y_k = N\), the PMF of the Dirichlet-multinomial distribution is defined as \[\begin{equation*} \text{DirMult}(y|\theta) = \frac{\Gamma(\alpha_0)\Gamma(N+1)}{\Gamma(N+\alpha_0)} \prod_{k=1}^K \frac{\Gamma(y_k + \alpha_k)}{\Gamma(\alpha_k)\Gamma(y_k+1)}, \end{equation*}\] where \(\alpha_0\) is defined as \(\alpha_0 = \sum_{k=1}^K \alpha_k\).
Distribution statement
y ~ dirichlet_multinomial(alpha)
Increment target log probability density with dirichlet_multinomial_lupmf(y | alpha).
Stan functions
real dirichlet_multinomial_lpmf(array[] int y | vector alpha)
The log multinomial probability mass function with outcome array y with \(K\) elements given the positive \(K\)-vector distribution parameter alpha and (implicit) total count N = sum(y).
real dirichlet_multinomial_lupmf(array[] int y | vector alpha)
The log multinomial probability mass function with outcome array y with \(K\) elements, given the positive \(K\)-vector distribution parameter alpha and (implicit) total count N = sum(y) dropping constant additive terms.
array[] int dirichlet_multinomial_rng(vector alpha, int N)
Generate a multinomial variate with positive vector distribution parameter alpha and total count N; may only be used in transformed data and generated quantities blocks. This is equivalent to multinomial_rng(dirichlet_rng(alpha), N).