This is an old version, view current version.

6 Variational Inference using Pathfinder

The CmdStan method pathfinder uses the Pathfinder algorithm of Zhang et al. (2022). Pathfinder is a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the negative inverse Hessian estimates produced by the L-BFGS optimizer. Pathfinder returns draws from the Gaussian approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior.

Pathfinder differs from the ADVI method in that it uses quasi-Newton optimization on the log posterior instead of stochastic gradient descent (SGD) on the Monte Carlo computation of the evidence lower bound (ELBO). Pathfinder’s approach is both faster and more stable than that of ADVI. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors.

A single run of the Pathfinder algorithm generates a set of approximate draws. Inference is improved by running multiple Pathfinder instances and using Pareto-smoothed importance resampling (PSIS) of the resulting sets of draws. This better matches non-normal target densities and also eliminates minor modes. By default, the pathfinder method uses 4 independent Pathfinder runs, each of which produces 1000 approximate draws, which are then importance resampled down to 1000 final draws.

The following is a minimal call the Pathfinder algorithm using defaults for everything but the location of the data file.

> ./bernoulli pathfinder data file=bernoulli.data.R

Executing this command prints both output to the console and csv files.

The first part of the console output reports on the configuration used.

method = pathfinder
  pathfinder
    init_alpha = 0.001 (Default)
    tol_obj = 9.9999999999999998e-13 (Default)
    tol_rel_obj = 10000 (Default)
    tol_grad = 1e-08 (Default)
    tol_rel_grad = 10000000 (Default)
    tol_param = 1e-08 (Default)
    history_size = 5 (Default)
    num_psis_draws = 1000 (Default)
    num_paths = 4 (Default)
    save_single_paths = 0 (Default)
    max_lbfgs_iters = 1000 (Default)
    num_draws = 1000 (Default)
    num_elbo_draws = 25 (Default)
id = 1 (Default)
data
  file = examples/bernoulli/bernoulli.data.json
init = 2 (Default)
random
  seed = 1995513073 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)
  sig_figs = -1 (Default)
  profile_file = profile.csv (Default)
num_threads = 1 (Default)

The rest of the output describes the progression of the algorithm.

By default, the Pathfinder algorithm runs 4 single-path Pathfinders in parallel, the uses importance resampling on the set of returned draws to produce the specified number of draws.

Path [1] :Initial log joint density = -11.543343
Path [1] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes 
              5      -6.748e+00      1.070e-03   1.707e-05    1.000e+00  1.000e+00       126 -6.220e+00 -6.220e+00                  
Path [1] :Best Iter: [5] ELBO (-6.219833) evaluations: (126)
Path [2] :Initial log joint density = -7.443345
Path [2] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes 
              5      -6.748e+00      9.936e-05   3.738e-07    1.000e+00  1.000e+00       126 -6.164e+00 -6.164e+00                  
Path [2] :Best Iter: [5] ELBO (-6.164015) evaluations: (126)
Path [3] :Initial log joint density = -18.986308
Path [3] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes 
              5      -6.748e+00      2.996e-04   4.018e-06    1.000e+00  1.000e+00       126 -6.201e+00 -6.201e+00                  
Path [3] :Best Iter: [5] ELBO (-6.200559) evaluations: (126)
Path [4] :Initial log joint density = -8.304453
Path [4] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes 
              5      -6.748e+00      2.814e-04   2.034e-06    1.000e+00  1.000e+00       126 -6.221e+00 -6.221e+00                  
Path [4] :Best Iter: [3] ELBO (-6.161276) evaluations: (126)
Total log probability function evaluations:8404

Pathfinder outputs a StanCSV file file which contains the importance resampled draws from multi-path Pathfinder. The initial CSV comment rows contain the complete set of CmdStan configuration options. Next is the column header line, followed the set of approximate draws. The Pathfinder algorithm first outputs lp_approx__, the log density in the approximating distribution, and lp__, the log density in the target distribution, followed by estimates of the model parameters, transformed parameters, and generated quantities.

lp_approx__,lp__,theta
-2.4973, -8.2951, 0.0811852
-0.87445, -7.06526, 0.160207
-0.812285, -7.07124, 0.35819
...

The final lines are comment lines which give timing information.

# Elapsed Time: 0.016000 seconds (Pathfinders)
#               0.003000 seconds (PSIS)
#               0.019000 seconds (Total)

Pathfinder provides option save_single_paths which will save output from the single-path Pathfinder runs. See section Pathfinder Method for details.

Bibliography

Zhang, Lu, Bob Carpenter, Andrew Gelman, and Aki Vehtari. 2022. “Pathfinder: Parallel Quasi-Newton Variational Inference.” Journal of Machine Learning Research 23 (306): 1–49. http://jmlr.org/papers/v23/21-0889.html.