7 Variational Inference using ADVI
The CmdStan method variational
uses the Automatic Differentiation Variational Inference (ADVI) algorithm of Kucukelbir et al. (2017)
to provide an approximate posterior distribution of the model conditioned on the data.
The approximating distribution is a Gaussian in the unconstrained variable space:
either a fully factorized Gaussian approximation,
specified by argument algorithm=meanfield,
or a Gaussian approximation using a
full-rank covariance matrix, specified by argument algorithm=fullrank.
By default, ADVI uses algorithm=meanfield.
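For example, the full-rank approximation can be selected by passing the algorithm argument explicitly; a sketch of such a call (the examples below use the meanfield default):
> ./bernoulli variational algorithm=fullrank data file=bernoulli.data.json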
The following is a minimal call to Stan’s variational inference algorithm using defaults for everything but the location of the data file.
> ./bernoulli variational data file=bernoulli.data.json
Executing this command prints output both to the console and to a CSV file.
The first part of the console output reports on the configuration used:
the default option algorithm=meanfield and the default
tolerances for monitoring the algorithm's convergence.
method = variational
variational
algorithm = meanfield (Default)
meanfield
iter = 10000 (Default)
grad_samples = 1 (Default)
elbo_samples = 100 (Default)
eta = 1 (Default)
adapt
engaged = 1 (Default)
iter = 50 (Default)
tol_rel_obj = 0.01 (Default)
eval_elbo = 100 (Default)
output_samples = 1000 (Default)
id = 0 (Default)
data
file = bernoulli.data.json
init = 2 (Default)
random
seed = 3323783840 (Default)
output
file = output.csv (Default)
diagnostic_file = (Default)
refresh = 100 (Default)
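Any of the settings shown above can be overridden on the command line. As a sketch, the following call tightens the convergence tolerance, requests more approximate draws, and writes to a different file (the name vb_output.csv is arbitrary):
> ./bernoulli variational tol_rel_obj=0.001 output_samples=2000 data file=bernoulli.data.json output file=vb_output.csv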
After the configuration has been displayed, informational and timing messages are output:
------------------------------------------------------------
EXPERIMENTAL ALGORITHM:
This procedure has not been thoroughly tested and may be unstable
or buggy. The interface is subject to change.
------------------------------------------------------------
Gradient evaluation took 2.1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
Adjust your expectations accordingly!
The rest of the output describes the progression of the algorithm.
An adaptation phase finds a good value for the step size scaling
parameter eta. The evidence lower bound (ELBO) is the variational
objective function and is evaluated based on a Monte Carlo estimate.
The variational inference algorithm in Stan is stochastic, which makes
it challenging to assess convergence. That is, while the ELBO
appears to have stabilized after roughly 250 iterations, the algorithm
keeps running until the relative change in the ELBO drops below the
default tolerance tol_rel_obj=0.01; in this run the median change
reaches 0.010 at iteration 1500.
Begin eta adaptation.
Iteration: 1 / 250 [ 0%] (Adaptation)
Iteration: 50 / 250 [ 20%] (Adaptation)
Iteration: 100 / 250 [ 40%] (Adaptation)
Iteration: 150 / 250 [ 60%] (Adaptation)
Iteration: 200 / 250 [ 80%] (Adaptation)
Success! Found best value [eta = 1] earlier than expected.
Begin stochastic gradient ascent.
iter ELBO delta_ELBO_mean delta_ELBO_med notes
100 -6.131 1.000 1.000
200 -6.458 0.525 1.000
300 -6.300 0.359 0.051
400 -6.137 0.276 0.051
500 -6.243 0.224 0.027
600 -6.305 0.188 0.027
700 -6.289 0.162 0.025
800 -6.402 0.144 0.025
900 -6.103 0.133 0.025
1000 -6.314 0.123 0.027
1100 -6.348 0.024 0.025
1200 -6.244 0.020 0.018
1300 -6.293 0.019 0.017
1400 -6.250 0.017 0.017
1500 -6.241 0.015 0.010 MEDIAN ELBO CONVERGED
Drawing a sample of size 1000 from the approximate posterior...
COMPLETED.
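The run above reports the adapted value eta = 1. If adaptation is not wanted, it can be disabled and eta fixed by hand; a sketch of such a call (the value 0.2 is arbitrary):
> ./bernoulli variational eta=0.2 adapt engaged=0 data file=bernoulli.data.json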
The output from variational is written to the file output.csv
by default. The output follows the same pattern as the output for
sampling, first dumping the entire configuration used
as CSV comments:
# stan_version_major = 2
# stan_version_minor = 23
# stan_version_patch = 0
# model = bernoulli_model
# method = variational
# variational
# algorithm = meanfield (Default)
# meanfield
# iter = 10000 (Default)
# grad_samples = 1 (Default)
# elbo_samples = 100 (Default)
# eta = 1 (Default)
# adapt
# engaged = 1 (Default)
# iter = 50 (Default)
# tol_rel_obj = 0.01 (Default)
# eval_elbo = 100 (Default)
# output_samples = 1000 (Default)
...
Next is the column header line, followed by more CSV comments reporting the
adapted value of the step size scaling parameter eta, followed by the values themselves.
The first line of values is special: it is the mean of the variational approximation.
The remaining lines are output_samples draws
from the variational approximation.
lp__,log_p__,log_g__,theta
# Stepsize adaptation complete.
# eta = 1
0,0,0,0.236261
0,-6.82318,-0.0929121,0.300415
0,-6.89701,-0.158687,0.321982
0,-6.99391,-0.23916,0.343643
0,-7.35801,-0.51787,0.401554
0,-7.4668,-0.539473,0.123081
...
The header indicates the unnormalized log probability with lp__.
This is a legacy feature that we do not use for variational inference;
in this output the lp__ column is always zero.
The ELBO is not stored unless a diagnostic file is given.
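To record the ELBO trajectory, pass a diagnostic file on the command line, for example:
> ./bernoulli variational data file=bernoulli.data.json output diagnostic_file=diagnostic.csv
As a quick check on the draws themselves, the approximate posterior mean of theta can be computed directly from output.csv. A minimal sketch using standard shell tools, assuming the layout shown above (the first two non-comment rows are the header and the mean of the approximation, and theta is the fourth column):
> grep -v '^#' output.csv | tail -n +3 | awk -F, '{ sum += $4 } END { print "mean theta =", sum/NR }'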