18.1 Stochastic gradient ascent
ADVI optimizes the ELBO in the real-coordinate space using stochastic gradient ascent. We obtain noisy (yet unbiased) gradients of the variational objective using automatic differentiation and Monte Carlo integration. The algorithm ascends these gradients using an adaptive stepsize sequence. We evaluate the ELBO also using Monte Carlo integration and measure convergence similar to the relative tolerance scheme in Stan’s optimization feature.
Monte Carlo approximation of the ELBO
ADVI uses Monte Carlo integration to approximate the variational
objective function, the ELBO. The number of draws used to
approximate the ELBO is denoted by
recommend a default value of \(100\), as we only evaluate the ELBO every
eval_elbo iterations, which also defaults to \(100\).
Monte Carlo approximation of the gradients
ADVI uses Monte Carlo integration to approximate the gradients of the
ELBO. The number of draws used to approximate the gradients is
grad_samples. We recommend a default value of
\(1\), as this is the most efficient. It also a very noisy estimate of
the gradient, but stochastic gradient ascent is capable of following
Adaptive stepsize sequence
ADVI uses a finite-memory version of adaGrad Duchi, Hazan, and Singer (2011). This
has a single parameter that we expose, denoted
eta. We now
have a warmup adaptation phase that selects a good value for
eta. The procedure does a heuristic search over
values that span 5 orders of magnitude.
ADVI tracks the progression of the ELBO through the stochastic
optimization. Specifically, ADVI heuristically determines a rolling
window over which it computes the average and the median change of the
ELBO. Should either number fall below a threshold, denoted by
tol_rel_obj, we consider the algorithm to have converged. The change
in ELBO is calculated the same way as in Stan’s optimization module.