6.12 Special matrix functions

6.12.1 Softmax

The softmax function maps³ $y \in \mathbb{R}^K$ to the $K$-simplex by $\text{softmax}(y) = \frac{\exp(y)}{\sum_{k=1}^K \exp(y_k)},$ where $\exp(y)$ is the componentwise exponentiation of $y$. Softmax is usually calculated on the log scale, $\begin{eqnarray*} \log \text{softmax}(y) & = & y - \log \sum_{k=1}^K \exp(y_k) \\[4pt] & = & y - \mathrm{log\_sum\_exp}(y), \end{eqnarray*}$ where the vector $y$ minus the scalar $\mathrm{log\_sum\_exp}(y)$ subtracts the scalar from each component of $y$.

Stan provides the following functions for softmax and its log.

vector softmax(vector x)
The softmax of x
Available since 2.0

vector log_softmax(vector x)
The natural logarithm of the softmax of x
Available since 2.0
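
As a minimal sketch of the log-scale identity given earlier in this section, the following program computes the softmax and its log for a data vector and compares the built-in log_softmax with the manual expression. The names K, y, theta, log_theta, and log_theta_manual are illustrative assumptions and do not come from the reference itself.

```stan
data {
  int<lower=1> K;   // number of components (assumed name)
  vector[K] y;      // unconstrained input vector (assumed name)
}
generated quantities {
  // softmax(y) has positive components that sum to 1
  vector[K] theta = softmax(y);

  // log of the softmax, computed on the log scale
  vector[K] log_theta = log_softmax(y);

  // same values via the identity: subtract the scalar
  // log_sum_exp(y) from each component of y
  vector[K] log_theta_manual = y - log_sum_exp(y);
}
```

Because the program declares no parameters, it would typically be run with a fixed-parameter sampler. Up to floating-point error, log_theta and log_theta_manual should agree, and exp(log_theta) should match theta.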

6.12.2 Cumulative sums

The cumulative sum of a sequence $x_1,\ldots,x_N$ is the sequence $y_1,\ldots,y_N$ , where $y_n = \sum_{m = 1}^{n} x_m.$

array[] real cumulative_sum(array[] real x)
The cumulative sum of x
Available since 2.0

vector cumulative_sum(vector v)
The cumulative sum of v
Available since 2.0

row_vector cumulative_sum(row_vector rv)
The cumulative sum of rv
Available since 2.0
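
The definition above can be checked against an explicit loop. The following is a sketch only: the names N, x, y, y_loop, and running are illustrative assumptions.

```stan
data {
  int<lower=1> N;   // length of the sequence (assumed name)
  vector[N] x;      // input sequence (assumed name)
}
generated quantities {
  // built-in cumulative sum: y[n] = x[1] + ... + x[n]
  vector[N] y = cumulative_sum(x);

  // the same result from an explicit running total
  vector[N] y_loop;
  {
    real running = 0;
    for (n in 1:N) {
      running += x[n];
      y_loop[n] = running;
    }
  }
}
```

For example, with N = 3 and x = (1, 2, 3), both y and y_loop are (1, 3, 6).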

³ The softmax function is so called because in the limit as $y_n \rightarrow \infty$ with $y_m$ for $m \neq n$ held constant, the result tends toward the “one-hot” vector $\theta$ with $\theta_n = 1$ and $\theta_m = 0$ for $m \neq n$, thus providing a “soft” version of the maximum function.