17.4 Log sum of exponentials

This is an old version, view current version.

Working on the log scale, multiplication is converted to addition, \[ \log (a \cdot b) = \log a + \log b. \] Thus sequences of multiplication operations can remain on the log scale. But what about addition? Given \(\log a\) and \(\log b\), how do we get \(\log (a + b)\)? Working out the algebra, \[ \log (a + b) = \log (\exp(\log a) + \exp(\log b)). \]

17.4.1 Log-sum-exp function

The nested log of sum of exponentials is so common, it has its own name, “log-sum-exp,” \[ \textrm{log-sum-exp}(u, v) = \log (\exp(u) + \exp(v)). \] so that \[ \log (a + b) = \textrm{log-sum-exp}(\log a, \log b). \]

Although it appears this might overflow as soon as exponentiation is introduced, evaluation does not proceed by evaluating the terms as written. Instead, with a little algebra, the terms are rearranged into a stable form, \[ \textrm{log-sum-exp}(u, v) = \max(u, v) + \log\big( \exp(u - \max(u, v)) + \exp(v - \max(u, v)) \big). \]

Because the terms inside the exponentiations are \(u - \max(u, v)\) and \(v - \max(u, v)\), one will be zero and the other will be negative. Because the operation is symmetric, it may be assumed without loss of generality that \(u \geq v\), so that \[ \textrm{log-sum-exp}(u, v) = u + \log\big(1 + \exp(v - u)\big). \]

Although the inner term may itself be evaluated using the built-in function log1p, there is only limited gain because \(\exp(v - u)\) is only near zero when \(u\) is much larger than \(v\), meaning the final result is likely to round to \(u\) anyway.

To conclude, when evaluating \(\log (a + b)\) given \(\log a\) and \(\log b\), and assuming \(\log a > \log b\), return

\[ \log (a + b) = \log a + \textrm{log1p}\big(\exp(\log b - \log a)\big). \]

17.4.2 Applying log-sum-exp to a sequence

The log sum of exponentials function may be generalized to sequences in the obvious way, so that if \(v = v_1, \ldots, v_N\), then \[\begin{eqnarray*} \textrm{log-sum-exp}(v) & = & \log \sum_{n = 1}^N \exp(v_n) \\[4pt] & = & \max(v) + \log \sum_{n = 1}^N \exp(v_n - \max(v)). \end{eqnarray*}\] The exponent cannot overflow because its argument is either zero or negative. This form makes it easy to calculate \(\log (u_1 + \cdots + u_N)\) given only \(\log u_n\).

17.4.3 Calculating means with log-sum-exp

An immediate application is to computing the mean of a vector \(u\) entirely on the log scale. That is, given \(\log u\) and returning \(\log \textrm{mean}(u)\). \[\begin{eqnarray*} \log \left( \frac{1}{N} \sum_{n = 1}^N u_n \right) & = & \log \frac{1}{N} + \log \sum_{n = 1}^N \exp(\log u_n) \\[4pt] & = & -\log N + \textrm{log-sum-exp}(\log u). \end{eqnarray*}\] where \(\log u = (\log u_1, \ldots, \log u_N)\) is understood elementwise.