Variational Inference using Pathfinder

Stan supports the Pathfinder algorithm (Zhang, 2022). Pathfinder is a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the negative inverse Hessian estimates produced by the LBFGS optimizer. Pathfinder returns draws from the Gaussian approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior.

There are two Stan implementations of the Pathfinder algorithm: single-path Pathfinder and multi-path Pathfinder. Single-path Pathfinder generates a set of approximate draws from one run of the basic Pathfinder algorithm. Multi-path Pathfinder uses importance resampling over the draws from multiple runs of Pathfinder. This better matches non-normal target densities and also mitigates the problem of L-BFGS getting stuck at local optima or in saddle points on plateaus.

Example: variational inference with Pathfinder for model bernoulli.stan

The CmdStanModel pathfinder method wraps the CmdStan pathfinder method.

By default, CmdStanPy runs multi-path Pathfinder which returns an importance-resampled set of draws over the outputs of 4 independent single-path Pathfinders.

[1]:
import os
from cmdstanpy.model import CmdStanModel, cmdstan_path
[2]:
bernoulli_dir = os.path.join(cmdstan_path(), 'examples', 'bernoulli')
stan_file = os.path.join(bernoulli_dir, 'bernoulli.stan')
data_file = os.path.join(bernoulli_dir, 'bernoulli.data.json')
# instantiate, compile bernoulli model
model = CmdStanModel(stan_file=stan_file)
# run CmdStan's pathfinder method, returns object `CmdStanPathfinder`
pathfinder = model.pathfinder(data=data_file)
20:13:40 - cmdstanpy - INFO - Chain [1] start processing
20:13:40 - cmdstanpy - INFO - Chain [1] done processing
[3]:
print(pathfinder)
print(pathfinder.metadata)
CmdStanPathfinder: model=bernoulli['method=pathfinder']
 csv_files:
        /tmp/tmpbblpkqzz/bernoulliinr4ztho/bernoulli-20241211201340.csv
 output_files:
        /tmp/tmpbblpkqzz/bernoulliinr4ztho/bernoulli-20241211201340_0-stdout.txt
Metadata:
{'stan_version_major': 2, 'stan_version_minor': 36, 'stan_version_patch': 0, 'model': 'bernoulli_model', 'start_datetime': '2024-12-11 20:13:40 UTC', 'method': 'pathfinder', 'init_alpha': 0.001, 'tol_obj': 1e-12, 'tol_rel_obj': 10000, 'tol_grad': 1e-08, 'tol_rel_grad': 10000000.0, 'tol_param': 1e-08, 'history_size': 5, 'num_psis_draws': 1000, 'num_paths': 4, 'save_single_paths': 0, 'psis_resample': 1, 'calculate_lp': 1, 'max_lbfgs_iters': 1000, 'num_draws': 1000, 'num_elbo_draws': 25, 'id': 1, 'data_file': '/home/runner/.cmdstan/cmdstan-2.36.0/examples/bernoulli/bernoulli.data.json', 'init': 2, 'seed': 40737, 'diagnostic_file': '', 'refresh': 100, 'sig_figs': -1, 'profile_file': 'profile.csv', 'save_cmdstan_config': 0, 'num_threads': 1, 'stanc_version': 'stanc3 v2.36.0', 'stancflags': '', 'raw_header': 'lp_approx__,lp__,theta', 'column_names': ('lp_approx__', 'lp__', 'theta')}

The pathfinder method returns a CmdStanPathfinder object, which provides access to the disparate information from the Stan CSV files.

  • The stan_variable and stan_variables methods return a Python numpy.ndarray containing all draws from the sample where the structure of each draw corresponds to the structure of the Stan variable.

  • The draws method returns the sample as a numpy.ndarray.

[4]:
pathfinder.stan_variable("theta").shape
[4]:
(1000,)
[5]:
pathfinder.column_names
[5]:
('lp_approx__', 'lp__', 'theta')
[6]:
pathfinder.draws().shape
[6]:
(1000, 3)

Pathfinders as initialization for the MCMC sampler

The method create_inits returns a Python Dict containing a set of per-chain initializations for the model parameters. Each set of initializations is a random draw from the Pathfinder sample. These initializations can be used as the initial parameter values for Stan’s NUTS-HMC sampler, which will reduce the number of warmup iterations needed.

[7]:
inits = pathfinder.create_inits()
print(inits)
[{'theta': array(0.336095)}, {'theta': array(0.0803562)}, {'theta': array(0.217729)}, {'theta': array(0.304769)}]

The create_inits takes two arguments:

  • seed - used for random selection.

  • chains - the number of draws to return, default is 4. This should match the number of sampler chains to run.

[8]:
inits = pathfinder.create_inits(chains=3)
print(inits)
[{'theta': array(0.125923)}, {'theta': array(0.135961)}, {'theta': array(0.270804)}]