6 Variational Inference
CmdStan can approximate the posterior distribution using variational
inference. The approximation is a Gaussian in the unconstrained variable space.
Stan implements two variational algorithms. The algorithm=meanfield
option
uses a fully factorized Gaussian for the approximation.
The algorithm=fullrank
option uses a Gaussian with a full-rank
covariance matrix for the approximation.
The executable does not need to be recompiled in order to switch to variational inference, and the data input format is the same. The following is a minimal call to Stan’s variational inference algorithm using defaults for everything but the location of the data file.
> ./bernoulli variational data file=bernoulli.data.R
Executing this command prints both output to the console and to a csv file.
The first part of the console output reports on the configuration used. Here it indicates the default mean-field setting of the variational inference algorithm. It also indicates the default parameter sizes and tolerances for monitoring the algorithm’s convergence.
method = variational
variational
algorithm = meanfield (Default)
meanfield
iter = 10000 (Default)
grad_samples = 1 (Default)
elbo_samples = 100 (Default)
eta = 1 (Default)
adapt
engaged = 1 (Default)
iter = 50 (Default)
tol_rel_obj = 0.01 (Default)
eval_elbo = 100 (Default)
output_samples = 1000 (Default)
id = 0 (Default)
data
file = bernoulli.data.json
init = 2 (Default)
random
seed = 3323783840 (Default)
output
file = output.csv (Default)
diagnostic_file = (Default)
refresh = 100 (Default)
After the configuration has been displayed, informational and timing messages are output:
------------------------------------------------------------
EXPERIMENTAL ALGORITHM:
This procedure has not been thoroughly tested and may be unstable
or buggy. The interface is subject to change.
------------------------------------------------------------
Gradient evaluation took 2.1e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.21 seconds.
Adjust your expectations accordingly!
The rest of the output describes the progression of the algorithm.
An adaptation phase finds a good value for the step size scaling
parameter eta
. The evidence lower bound (ELBO) is the variational
objective function and is evaluated based on a Monte Carlo estimate.
The variational inference algorithm in Stan is stochastic, which makes
it challenging to assess convergence. That is, while the algorithm
appears to have converged in \(\sim\) 250 iterations, the algorithm runs
for another few thousand iterations until mean change in ELBO drops
below the default tolerance of 0.01.
Begin eta adaptation.
Iteration: 1 / 250 [ 0%] (Adaptation)
Iteration: 50 / 250 [ 20%] (Adaptation)
Iteration: 100 / 250 [ 40%] (Adaptation)
Iteration: 150 / 250 [ 60%] (Adaptation)
Iteration: 200 / 250 [ 80%] (Adaptation)
Success! Found best value [eta = 1] earlier than expected.
Begin stochastic gradient ascent.
iter ELBO delta_ELBO_mean delta_ELBO_med notes
100 -6.131 1.000 1.000
200 -6.458 0.525 1.000
300 -6.300 0.359 0.051
400 -6.137 0.276 0.051
500 -6.243 0.224 0.027
600 -6.305 0.188 0.027
700 -6.289 0.162 0.025
800 -6.402 0.144 0.025
900 -6.103 0.133 0.025
1000 -6.314 0.123 0.027
1100 -6.348 0.024 0.025
1200 -6.244 0.020 0.018
1300 -6.293 0.019 0.017
1400 -6.250 0.017 0.017
1500 -6.241 0.015 0.010 MEDIAN ELBO CONVERGED
Drawing a sample of size 1000 from the approximate posterior...
COMPLETED.
The output from variational is written into the file output.csv
by default. The output follows the same pattern as the output for
sampling, first dumping the entire set of parameters used
as CSV comments:
# stan_version_major = 2
# stan_version_minor = 23
# stan_version_patch = 0
# model = bernoulli_model
# method = variational
# variational
# algorithm = meanfield (Default)
# meanfield
# iter = 10000 (Default)
# grad_samples = 1 (Default)
# elbo_samples = 100 (Default)
# eta = 1 (Default)
# adapt
# engaged = 1 (Default)
# iter = 50 (Default)
# tol_rel_obj = 0.01 (Default)
# eval_elbo = 100 (Default)
# output_samples = 1000 (Default)
...
Next is the column header line, followed more CSV comments reporting the
adapted value for the stepsize, followed by the values.
The first line is special: it is the mean of the variational approximation.
The rest of the output contains output_samples
number of samples
drawn from the variational approximation.
lp__,log_p__,log_g__,theta
# Stepsize adaptation complete.
# eta = 1
0,0,0,0.236261
0,-6.82318,-0.0929121,0.300415
0,-6.89701,-0.158687,0.321982
0,-6.99391,-0.23916,0.343643
0,-7.35801,-0.51787,0.401554
0,-7.4668,-0.539473,0.123081
...
The header indicates the unnormalized log probability with lp__
.
This is a legacy feature that we do not use for variational inference.
The ELBO is not stored unless a diagnostic option is given.
For further details, see Kucukelbir, Alp, Rajesh Ranganath, Andrew Gelman, and David M. Blei. 2015. Automatic Variational Inference in Stan. arXiv 1506.03431. http://arxiv.org/abs/1506.03431.