6.12 Special matrix functions
6.12.1 Softmax
The softmax function maps\(^3\) \(y \in \mathbb{R}^K\) to the \(K\)-simplex by \[ \text{softmax}(y) = \frac{\exp(y)} {\sum_{k=1}^K \exp(y_k)}, \] where \(\exp(y)\) is the componentwise exponentiation of \(y\). Softmax is usually calculated on the log scale, \[\begin{eqnarray*} \log \text{softmax}(y) & = & y - \log \sum_{k=1}^K \exp(y_k) \\[4pt] & = & y - \mathrm{log\_sum\_exp}(y), \end{eqnarray*}\] where subtracting the scalar \(\mathrm{log\_sum\_exp}(y)\) from the vector \(y\) subtracts it from each component of \(y\).
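The log-scale calculation can be illustrated with a minimal pure-Python sketch. This is not Stan code and makes no claim about how Stan implements these functions. The helper names mirror the formulas above, and the shift by the maximum inside `log_sum_exp` is a common stabilization technique assumed here for illustration.

```python
import math

def log_sum_exp(y):
    """Return log(sum_k exp(y_k)).

    Assumption for this sketch: shift by max(y) before exponentiating
    so that no term overflows.
    """
    m = max(y)
    return m + math.log(sum(math.exp(v - m) for v in y))

def log_softmax(y):
    """Subtract the scalar log_sum_exp(y) from each component of y."""
    lse = log_sum_exp(y)
    return [v - lse for v in y]

def softmax(y):
    """Exponentiate log_softmax(y) componentwise."""
    return [math.exp(v) for v in log_softmax(y)]

# math.exp(1000.0) raises OverflowError, so the naive formula fails here,
# while the log-scale route still yields a point on the simplex.
y = [1000.0, 1001.0, 1002.0]
theta = softmax(y)
```

Because softmax is unchanged when a constant is added to every component, `theta` here agrees with `softmax([0.0, 1.0, 2.0])` up to floating-point rounding.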
Stan provides the following functions for softmax and its log.
vector softmax(vector x)
The softmax of x
Available since 2.0
vector log_softmax(vector x)
The natural logarithm of the softmax of x
Available since 2.0
6.12.2 Cumulative sums
The cumulative sum of a sequence \(x_1,\ldots,x_N\) is the sequence \(y_1,\ldots,y_N\), where \[ y_n = \sum_{m = 1}^{n} x_m. \]
array[] real cumulative_sum(array[] real x)
The cumulative sum of x
Available since 2.0
vector cumulative_sum(vector v)
The cumulative sum of v
Available since 2.0
row_vector cumulative_sum(row_vector rv)
The cumulative sum of rv
Available since 2.0
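The definition \(y_n = \sum_{m=1}^{n} x_m\) can be checked with a short Python sketch. It is an illustration of the arithmetic only, not Stan code, and the function below is a hypothetical stand-in that merely shares its name with the Stan function.

```python
from itertools import accumulate

def cumulative_sum(x):
    """Return [y_1, ..., y_N] where y_n = x_1 + ... + x_n."""
    # accumulate yields the running total after each element.
    return list(accumulate(x))

x = [1.0, 2.0, 3.0, 4.0]
y = cumulative_sum(x)  # [1.0, 3.0, 6.0, 10.0]
```

The output has the same length as the input, and its last element equals the sum of all the inputs.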
3. The softmax function is so called because, in the limit as \(y_n \rightarrow \infty\) with \(y_m\) for \(m \neq n\) held constant, the result tends toward the “one-hot” vector \(\theta\) with \(\theta_n = 1\) and \(\theta_m = 0\) for \(m \neq n\), thus providing a “soft” version of the maximum function.