Variational Inference using ADVI
Stan implements an automatic variational inference algorithm, called Automatic Differentiation Variational Inference (ADVI) Kucukelbir et al. (2017). ADVI uses Monte Carlo integration to approximate the variational objective function, the ELBO (evidence lower bound). ADVI optimizes the ELBO in the real-coordinate space using stochastic gradient ascent. The measures of convergence are similar to the relative tolerance scheme of Stan’s optimization algorithms.
The algorithm progression consists of an adaptation phase followed by a sampling phase. The adaptation phase finds a good value for the step size scaling parameter eta
. The evidence lower bound (ELBO) is the variational objective function and is evaluated based on a Monte Carlo estimate. The variational inference algorithm in Stan is stochastic, which makes it challenging to assess convergence. The algorithm runs until the mean change in ELBO drops below the specified tolerance.
The full set of configuration options available for the variational
method is available by using the variational help-all
subcommand. The arguments with their requested values or defaults are also reported at the beginning of the algorithm’s console output and in the output CSV file’s comments.
The following is a minimal call to Stan’s variational inference algorithm using defaults for everything but the location of the data file.
> ./bernoulli variational data file=bernoulli.data.R
Executing this command prints both output to the console and to a csv file.
The first part of the console output reports on the configuration used: the default option algorithm=meanfield
and the default tolerances for monitoring the algorithm’s convergence.
method = variational
variational
algorithm = meanfield (Default)
meanfield
iter = 10000 (Default)
grad_samples = 1 (Default)
elbo_samples = 100 (Default)
eta = 1 (Default)
adapt
engaged = 1 (Default)
iter = 50 (Default)
tol_rel_obj = 0.01 (Default)
eval_elbo = 100 (Default)
output_samples = 1000 (Default)
id = 0 (Default)
data
file = bernoulli.data.json
init = 2 (Default)
random
seed = 3323783840 (Default)
output
file = output.csv (Default)
diagnostic_file = (Default)
refresh = 100 (Default)
After the configuration has been displayed, informational and timing messages are output:
------------------------------------------------------------
EXPERIMENTAL ALGORITHM:
This procedure has not been thoroughly tested and may be unstable
or buggy. The interface is subject to change.
------------------------------------------------------------
Gradient evaluation took 2.1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
Adjust your expectations accordingly!
The rest of the output describes the progression of the algorithm. An adaptation phase finds a good value for the step size scaling parameter eta
. The evidence lower bound (ELBO) is the variational objective function and is evaluated based on a Monte Carlo estimate. The variational inference algorithm in Stan is stochastic, which makes it challenging to assess convergence. That is, while the algorithm appears to have converged in \(\sim\) 250 iterations, the algorithm runs for another few thousand iterations until mean change in ELBO drops below the default tolerance of 0.01.
Begin eta adaptation.
Iteration: 1 / 250 [ 0%] (Adaptation)
Iteration: 50 / 250 [ 20%] (Adaptation)
Iteration: 100 / 250 [ 40%] (Adaptation)
Iteration: 150 / 250 [ 60%] (Adaptation)
Iteration: 200 / 250 [ 80%] (Adaptation)
Success! Found best value [eta = 1] earlier than expected.
Begin stochastic gradient ascent.
iter ELBO delta_ELBO_mean delta_ELBO_med notes
100 -6.131 1.000 1.000
200 -6.458 0.525 1.000
300 -6.300 0.359 0.051
400 -6.137 0.276 0.051
500 -6.243 0.224 0.027
600 -6.305 0.188 0.027
700 -6.289 0.162 0.025
800 -6.402 0.144 0.025
900 -6.103 0.133 0.025
1000 -6.314 0.123 0.027
1100 -6.348 0.024 0.025
1200 -6.244 0.020 0.018
1300 -6.293 0.019 0.017
1400 -6.250 0.017 0.017
1500 -6.241 0.015 0.010 MEDIAN ELBO CONVERGED
Drawing a sample of size 1000 from the approximate posterior...
COMPLETED.
Variational algorithms
Stan implements two variational algorithms. They differ in the approximating distribution used in the unconstrained variable space. By default, ADVI uses option algorithm=meanfield
. The algorithm
argument specifies the variational algorithm.
algorithm=meanfield
- Use a fully factorized Gaussian for the approximation. This is the default algorithm.algorithm=fullrank
Use a Gaussian with a full-rank covariance matrix for the approximation.
Configuration
iter=<int>
Maximum number of iterations. Must be \(> 0\). Default is \(10000\).grad_samples=<int>
Number of samples for Monte Carlo estimate of gradients. Must be \(> 0\). Default is \(1\).elbo_samples=<int>
Number of samples for Monte Carlo estimate of ELBO (objective function). Must be \(> 0\). Default is \(100\).eta=<double>
Stepsize weighting parameter for adaptive stepsize sequence. Must be \(> 0\). Default is \(1.0\).adapt
Warmup Adaptation keyword, takes sub-arguments:engaged=<boolean>
Adaptation engaged? Valid values: \((0, 1)\). Default is \(1\).iter=<int>
Maximum number of adaptation iterations. Must be \(> 0\). Default is \(50\).
tol_rel_obj=<double>
Convergence tolerance on the relative norm of the objective. Must be \(> 0\). Default is \(0.01\).eval_elbo=<int>
Evaluate ELBO every Nth iteration. Must be \(> 0\). Default is 100.output_samples=<int>
Number of posterior samples to draw and save. Must be \(> 0\). Default is 1000.
CSV output
The output file consists of the following pieces of information:
The full set of configuration options available for the
variational
method is reported at the beginning of the sampler output file as CSV comments.The first three output columns are labelled
lp__
,log_p__
,log_g__
, the rest are the model parameters.The stepsize adaptation information is output as CSV comments following column header row.
The following line contains the mean of the variational approximation.
The rest of the output contains
output_samples
number of samples drawn from the variational approximation.
To illustrate, we call Stan’s variational inference on the example model and data:
> ./bernoulli variational data file=bernoulli.data.R
By default, the output file is output.csv
.
The output follows the same pattern as the output for sampling, first dumping the entire set of parameters used as CSV comments:
# stan_version_major = 2
# stan_version_minor = 23
# stan_version_patch = 0
# model = bernoulli_model
# method = variational
# variational
# algorithm = meanfield (Default)
# meanfield
# iter = 10000 (Default)
# grad_samples = 1 (Default)
# elbo_samples = 100 (Default)
# eta = 1 (Default)
# adapt
# engaged = 1 (Default)
# iter = 50 (Default)
# tol_rel_obj = 0.01 (Default)
# eval_elbo = 100 (Default)
# output_samples = 1000 (Default)
...
Next, the column header row:
lp__,log_p__,log_g__,theta
Additional comments provide stepsize adaptation information:
# Stepsize adaptation complete.
# eta = 1
Followed by the data rows. The first line is special — it is the mean of the variational approximation.
0,0,0,0.214911
That is, the estimate for theta
given the data is 0.2
.
The rest of the output contains output_samples
number of samples drawn from the variational approximation.
The following is a sample based on this approximation:
0,-14.0252,-5.21718,0.770397
0,-7.05063,-0.10025,0.162061
0,-6.75031,-0.0191099,0.241606
...
The header indicates the unnormalized log probability with lp__
. This is a legacy feature that we do not use for variational inference. The ELBO is not stored unless a diagnostic option is given.