Variational Inference using Pathfinder

Stan supports the Pathfinder algorithm (Zhang, 2022). Pathfinder is a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the negative inverse Hessian estimates produced by the LBFGS optimizer. Pathfinder returns draws from the Gaussian approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior.

There are two Stan implementations of the Pathfinder algorithm: single-path Pathfinder and multi-path Pathfinder. Single-path Pathfinder generates a set of approximate draws from one run of the basic Pathfinder algorithm. Multi-path Pathfinder uses importance resampling over the draws from multiple runs of Pathfinder. This better matches non-normal target densities and also mitigates the problem of L-BFGS getting stuck at local optima or in saddle points on plateaus.

Example: variational inference with Pathfinder for model bernoulli.stan

The CmdStanModel pathfinder method wraps the CmdStan pathfinder method.

By default, CmdStanPy runs multi-path Pathfinder which returns an importance-resampled set of draws over the outputs of 4 independent single-path Pathfinders.

import os
from cmdstanpy.model import CmdStanModel, cmdstan_path
bernoulli_dir = os.path.join(cmdstan_path(), 'examples', 'bernoulli')
stan_file = os.path.join(bernoulli_dir, 'bernoulli.stan')
data_file = os.path.join(bernoulli_dir, '')
# instantiate, compile bernoulli model
model = CmdStanModel(stan_file=stan_file)
# run CmdStan's pathfinder method, returns object `CmdStanPathfinder`
pathfinder = model.pathfinder(data=data_file)
14:28:45 - cmdstanpy - INFO - Chain [1] start processing
14:28:45 - cmdstanpy - INFO - Chain [1] done processing
CmdStanPathfinder: model=bernoulli['method=pathfinder']
{'stan_version_major': 2, 'stan_version_minor': 35, 'stan_version_patch': 0, 'model': 'bernoulli_model', 'start_datetime': '2024-06-17 14:28:45 UTC', 'method': 'pathfinder', 'init_alpha': 0.001, 'tol_obj': 1e-12, 'tol_rel_obj': 10000, 'tol_grad': 1e-08, 'tol_rel_grad': 10000000.0, 'tol_param': 1e-08, 'history_size': 5, 'num_psis_draws': 1000, 'num_paths': 4, 'save_single_paths': 0, 'psis_resample': 1, 'calculate_lp': 1, 'max_lbfgs_iters': 1000, 'num_draws': 1000, 'num_elbo_draws': 25, 'id': 1, 'data_file': '/home/runner/.cmdstan/cmdstan-2.35.0/examples/bernoulli/', 'init': 2, 'seed': 97812, 'diagnostic_file': '', 'refresh': 100, 'sig_figs': -1, 'profile_file': 'profile.csv', 'save_cmdstan_config': 0, 'num_threads': 1, 'stanc_version': 'stanc3 v2.35.0', 'stancflags': '', 'raw_header': 'lp_approx__,lp__,theta', 'column_names': ('lp_approx__', 'lp__', 'theta')}

The pathfinder method returns a CmdStanPathfinder object, which provides access to the disparate information from the Stan CSV files.

  • The stan_variable and stan_variables methods return a Python numpy.ndarray containing all draws from the sample where the structure of each draw corresponds to the structure of the Stan variable.

  • The draws method returns the sample as a numpy.ndarray.

('lp_approx__', 'lp__', 'theta')
(1000, 3)

Pathfinders as initialization for the MCMC sampler

The method create_inits returns a Python Dict containing a set of per-chain initializations for the model parameters. Each set of initializations is a random draw from the Pathfinder sample. These initializations can be used as the initial parameter values for Stan’s NUTS-HMC sampler, which will reduce the number of warmup iterations needed.

inits = pathfinder.create_inits()
[{'theta': array(0.369154)}, {'theta': array(0.1015)}, {'theta': array(0.168767)}, {'theta': array(0.0580899)}]

The create_inits takes two arguments:

  • seed - used for random selection.

  • chains - the number of draws to return, default is 4. This should match the number of sampler chains to run.

inits = pathfinder.create_inits(chains=3)
[{'theta': array(0.214343)}, {'theta': array(0.168289)}, {'theta': array(0.213014)}]