6 Variational Inference using Pathfinder
The CmdStan method pathfinder
uses the Pathfinder algorithm of
Zhang et al. (2022). Pathfinder is a variational method for
approximately sampling from differentiable log densities. Starting
from a random initialization, Pathfinder locates normal approximations
to the target density along a quasi-Newton optimization path, with
local covariance estimated using the negative inverse Hessian estimates
produced by the L-BFGS optimizer. Pathfinder returns draws from the
Gaussian approximation with the lowest estimated Kullback-Leibler (KL)
divergence to the true posterior.
Pathfinder differs from the ADVI method in that it uses quasi-Newton optimization on the log posterior instead of stochastic gradient descent (SGD) on the Monte Carlo computation of the evidence lower bound (ELBO). Pathfinder’s approach is both faster and more stable than that of ADVI. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors.
A single run of the Pathfinder algorithm generates a set of
approximate draws. Inference is improved by running multiple
Pathfinder instances and using Pareto-smoothed importance resampling
(PSIS) of the resulting sets of draws. This better matches non-normal
target densities and also eliminates minor modes. By default, the
pathfinder
method uses 4 independent Pathfinder runs, each of which
produces 1000 approximate draws, which are then importance resampled
down to 1000 final draws.
The following is a minimal call the Pathfinder algorithm using defaults for everything but the location of the data file.
> ./bernoulli pathfinder data file=bernoulli.data.R
Executing this command prints both output to the console and csv files.
The first part of the console output reports on the configuration used.
method = pathfinder
pathfinder
init_alpha = 0.001 (Default)
tol_obj = 9.9999999999999998e-13 (Default)
tol_rel_obj = 10000 (Default)
tol_grad = 1e-08 (Default)
tol_rel_grad = 10000000 (Default)
tol_param = 1e-08 (Default)
history_size = 5 (Default)
num_psis_draws = 1000 (Default)
num_paths = 4 (Default)
save_single_paths = 0 (Default)
max_lbfgs_iters = 1000 (Default)
num_draws = 1000 (Default)
num_elbo_draws = 25 (Default)
id = 1 (Default)
data
file = examples/bernoulli/bernoulli.data.json
init = 2 (Default)
random
seed = 1995513073 (Default)
output
file = output.csv (Default)
diagnostic_file = (Default)
refresh = 100 (Default)
sig_figs = -1 (Default)
profile_file = profile.csv (Default)
num_threads = 1 (Default)
The rest of the output describes the progression of the algorithm.
By default, the Pathfinder algorithm runs 4 single-path Pathfinders in parallel, the uses importance resampling on the set of returned draws to produce the specified number of draws.
Path [1] :Initial log joint density = -11.543343
Path [1] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
5 -6.748e+00 1.070e-03 1.707e-05 1.000e+00 1.000e+00 126 -6.220e+00 -6.220e+00
Path [1] :Best Iter: [5] ELBO (-6.219833) evaluations: (126)
Path [2] :Initial log joint density = -7.443345
Path [2] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
5 -6.748e+00 9.936e-05 3.738e-07 1.000e+00 1.000e+00 126 -6.164e+00 -6.164e+00
Path [2] :Best Iter: [5] ELBO (-6.164015) evaluations: (126)
Path [3] :Initial log joint density = -18.986308
Path [3] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
5 -6.748e+00 2.996e-04 4.018e-06 1.000e+00 1.000e+00 126 -6.201e+00 -6.201e+00
Path [3] :Best Iter: [5] ELBO (-6.200559) evaluations: (126)
Path [4] :Initial log joint density = -8.304453
Path [4] : Iter log prob ||dx|| ||grad|| alpha alpha0 # evals ELBO Best ELBO Notes
5 -6.748e+00 2.814e-04 2.034e-06 1.000e+00 1.000e+00 126 -6.221e+00 -6.221e+00
Path [4] :Best Iter: [3] ELBO (-6.161276) evaluations: (126)
Total log probability function evaluations:8404
Pathfinder outputs a StanCSV file file which
contains the importance resampled draws from multi-path Pathfinder.
The initial CSV comment rows contain the complete set of CmdStan
configuration options. Next is the column header line, followed the
set of approximate draws. The Pathfinder algorithm first outputs
lp_approx__
, the log density in the approximating distribution, and
lp__
, the log density in the target distribution, followed by
estimates of the model parameters, transformed parameters, and
generated quantities.
lp_approx__,lp__,theta
-2.4973, -8.2951, 0.0811852
-0.87445, -7.06526, 0.160207
-0.812285, -7.07124, 0.35819
...
The final lines are comment lines which give timing information.
# Elapsed Time: 0.016000 seconds (Pathfinders)
# 0.003000 seconds (PSIS)
# 0.019000 seconds (Total)
Pathfinder provides option save_single_paths
which will save output
from the single-path Pathfinder runs.
See section Pathfinder Method for details.