17.4 Log sum of exponentials
Working on the log scale, multiplication is converted to addition, log(a⋅b)=loga+logb. Thus sequences of multiplication operations can remain on the log scale. But what about addition? Given loga and logb, how do we get log(a+b)? Working out the algebra, log(a+b)=log(exp(loga)+exp(logb)).
17.4.1 Log-sum-exp function
The nested log of sum of exponentials is so common, it has its own name, “log-sum-exp”, log-sum-exp(u,v)=log(exp(u)+exp(v)). so that log(a+b)=log-sum-exp(loga,logb).
Although it appears this might overflow as soon as exponentiation is introduced, evaluation does not proceed by evaluating the terms as written. Instead, with a little algebra, the terms are rearranged into a stable form, log-sum-exp(u,v)=max
Because the terms inside the exponentiations are u - \max(u, v) and v - \max(u, v), one will be zero and the other will be negative. Because the operation is symmetric, it may be assumed without loss of generality that u \geq v, so that \textrm{log-sum-exp}(u, v) = u + \log\big(1 + \exp(v - u)\big).
Although the inner term may itself be evaluated using the built-in
function log1p
, there is only limited gain because \exp(v - u) is
only near zero when u is much larger than v, meaning the final
result is likely to round to u anyway.
To conclude, when evaluating \log (a + b) given \log a and \log b, and assuming \log a > \log b, return
\log (a + b) = \log a + \textrm{log1p}\big(\exp(\log b - \log a)\big).
17.4.2 Applying log-sum-exp to a sequence
The log sum of exponentials function may be generalized to sequences in the obvious way, so that if v = v_1, \ldots, v_N, then \begin{eqnarray*} \textrm{log-sum-exp}(v) & = & \log \sum_{n = 1}^N \exp(v_n) \\[4pt] & = & \max(v) + \log \sum_{n = 1}^N \exp(v_n - \max(v)). \end{eqnarray*} The exponent cannot overflow because its argument is either zero or negative. This form makes it easy to calculate \log (u_1 + \cdots + u_N) given only \log u_n.
17.4.3 Calculating means with log-sum-exp
An immediate application is to computing the mean of a vector u entirely on the log scale. That is, given \log u and returning \log \textrm{mean}(u). \begin{eqnarray*} \log \left( \frac{1}{N} \sum_{n = 1}^N u_n \right) & = & \log \frac{1}{N} + \log \sum_{n = 1}^N \exp(\log u_n) \\[4pt] & = & -\log N + \textrm{log-sum-exp}(\log u). \end{eqnarray*} where \log u = (\log u_1, \ldots, \log u_N) is understood elementwise.