5.11 Special Matrix Functions

5.11.1 Softmax

The softmax function maps\(^{1}\) \(y \in \mathbb{R}^K\) to the \(K\)-simplex by \[ \text{softmax}(y) = \frac{\exp(y)} {\sum_{k=1}^K \exp(y_k)}, \] where \(\exp(y)\) is the componentwise exponentiation of \(y\). Softmax is usually calculated on the log scale, \[\begin{eqnarray*} \log \text{softmax}(y) & = & y - \log \sum_{k=1}^K \exp(y_k) \\[4pt] & = & y - \mathrm{log\_sum\_exp}(y), \end{eqnarray*}\] where the vector \(y\) minus the scalar \(\mathrm{log\_sum\_exp}(y)\) subtracts the scalar from each component of \(y\).
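
As a small worked illustration of the two definitions, take \(K = 3\) and \(y = (0, \log 2, \log 3)\). This input is chosen here only because it gives exact values; it does not come from the text above. \[ \exp(y) = (1, 2, 3), \qquad \sum_{k=1}^{3} \exp(y_k) = 6, \qquad \text{softmax}(y) = \left(\tfrac{1}{6}, \tfrac{1}{3}, \tfrac{1}{2}\right). \] On the log scale, \(\mathrm{log\_sum\_exp}(y) = \log 6\), so \[ \log \text{softmax}(y) = (0 - \log 6,\ \log 2 - \log 6,\ \log 3 - \log 6) = (-\log 6,\ -\log 3,\ -\log 2), \] and exponentiating each component recovers \(\left(\tfrac{1}{6}, \tfrac{1}{3}, \tfrac{1}{2}\right)\), whose entries sum to one.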

Stan provides the following functions for softmax and its log.

vector softmax(vector x)
The softmax of x

vector log_softmax(vector x)
The natural logarithm of the softmax of x
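
The following is a minimal sketch of how the two functions might be called, not a recommended model. The data names `K` and `y` and the output names `theta`, `log_theta`, and `theta_check` are hypothetical and chosen only for illustration.

```stan
data {
  int<lower=1> K;   // number of components (hypothetical input)
  vector[K] y;      // unconstrained scores (hypothetical input)
}
generated quantities {
  // theta lies on the simplex: entries are positive and sum to 1
  vector[K] theta = softmax(y);

  // the same quantity on the log scale, equal to y - log_sum_exp(y)
  vector[K] log_theta = log_softmax(y);

  // exponentiating log_theta recovers theta up to floating-point error
  vector[K] theta_check = exp(log_theta);
}
```

Because this sketch has no parameters, it would be run with the fixed-parameter sampler. When the logarithm of a softmax is what is ultimately needed, calling `log_softmax(y)` directly is generally preferable to `log(softmax(y))`, since it stays on the log scale throughout.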

5.11.2 Cumulative Sums

The cumulative sum of a sequence \(x_1,\ldots,x_N\) is the sequence \(y_1,\ldots,y_N\), where \[ y_n = \sum_{m = 1}^{n} x_m. \]

real[] cumulative_sum(real[] x)
The cumulative sum of x

vector cumulative_sum(vector v)
The cumulative sum of v

row_vector cumulative_sum(row_vector rv)
The cumulative sum of rv
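
As a brief sketch of the vector signature, the program below applies `cumulative_sum` to a fixed input. The variable names `x` and `y` mirror the definition above; the values are chosen here only for illustration.

```stan
transformed data {
  // input sequence x_1, ..., x_4
  vector[4] x = [1, 2, 3, 4]';

  // y_n is the sum of x_1 through x_n, so y holds the values 1, 3, 6, 10
  vector[4] y = cumulative_sum(x);

  print("y = ", y);
}
```

The result has the same type and size as the argument, and its last element equals `sum(x)`.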

  1. The softmax function is so called because in the limit as \(y_n \rightarrow \infty\) with \(y_m\) for \(m \neq n\) held constant, the result tends toward the “one-hot” vector \(\theta\) with \(\theta_n = 1\) and \(\theta_m = 0\) for \(m \neq n\), thus providing a “soft” version of the maximum function.