Variational Inference using Pathfinder¶
Stan supports the Pathfinder algorithm (Zhang, 2022). Pathfinder is a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the negative inverse Hessian estimates produced by the LBFGS optimizer. Pathfinder returns draws from the Gaussian approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior.
There are two Stan implementations of the Pathfinder algorithm: single-path Pathfinder and multi-path Pathfinder. Single-path Pathfinder generates a set of approximate draws from one run of the basic Pathfinder algorithm. Multi-path Pathfinder uses importance resampling over the draws from multiple runs of Pathfinder. This better matches non-normal target densities and also mitigates the problem of L-BFGS getting stuck at local optima or in saddle points on plateaus.
Example: variational inference with Pathfinder for model bernoulli.stan
¶
The CmdStanModel pathfinder method wraps the CmdStan pathfinder method.
By default, CmdStanPy runs multi-path Pathfinder which returns an importance-resampled set of draws over the outputs of 4 independent single-path Pathfinders.
[1]:
import os
from cmdstanpy.model import CmdStanModel, cmdstan_path
[2]:
bernoulli_dir = os.path.join(cmdstan_path(), 'examples', 'bernoulli')
stan_file = os.path.join(bernoulli_dir, 'bernoulli.stan')
data_file = os.path.join(bernoulli_dir, 'bernoulli.data.json')
# instantiate, compile bernoulli model
model = CmdStanModel(stan_file=stan_file)
# run CmdStan's pathfinder method, returns object `CmdStanPathfinder`
pathfinder = model.pathfinder(data=data_file)
14:28:45 - cmdstanpy - INFO - Chain [1] start processing
14:28:45 - cmdstanpy - INFO - Chain [1] done processing
[3]:
print(pathfinder)
print(pathfinder.metadata)
CmdStanPathfinder: model=bernoulli['method=pathfinder']
csv_files:
/tmp/tmpcggr6zzu/bernoulli0sflv0au/bernoulli-20240617142845.csv
output_files:
/tmp/tmpcggr6zzu/bernoulli0sflv0au/bernoulli-20240617142845_0-stdout.txt
Metadata:
{'stan_version_major': 2, 'stan_version_minor': 35, 'stan_version_patch': 0, 'model': 'bernoulli_model', 'start_datetime': '2024-06-17 14:28:45 UTC', 'method': 'pathfinder', 'init_alpha': 0.001, 'tol_obj': 1e-12, 'tol_rel_obj': 10000, 'tol_grad': 1e-08, 'tol_rel_grad': 10000000.0, 'tol_param': 1e-08, 'history_size': 5, 'num_psis_draws': 1000, 'num_paths': 4, 'save_single_paths': 0, 'psis_resample': 1, 'calculate_lp': 1, 'max_lbfgs_iters': 1000, 'num_draws': 1000, 'num_elbo_draws': 25, 'id': 1, 'data_file': '/home/runner/.cmdstan/cmdstan-2.35.0/examples/bernoulli/bernoulli.data.json', 'init': 2, 'seed': 97812, 'diagnostic_file': '', 'refresh': 100, 'sig_figs': -1, 'profile_file': 'profile.csv', 'save_cmdstan_config': 0, 'num_threads': 1, 'stanc_version': 'stanc3 v2.35.0', 'stancflags': '', 'raw_header': 'lp_approx__,lp__,theta', 'column_names': ('lp_approx__', 'lp__', 'theta')}
The pathfinder
method returns a CmdStanPathfinder object, which provides access to the disparate information from the Stan CSV files.
The stan_variable and stan_variables methods return a Python numpy.ndarray containing all draws from the sample where the structure of each draw corresponds to the structure of the Stan variable.
The draws method returns the sample as a numpy.ndarray.
[4]:
pathfinder.stan_variable("theta").shape
[4]:
(1000,)
[5]:
pathfinder.column_names
[5]:
('lp_approx__', 'lp__', 'theta')
[6]:
pathfinder.draws().shape
[6]:
(1000, 3)
Pathfinders as initialization for the MCMC sampler¶
The method create_inits returns a Python Dict containing a set of per-chain initializations for the model parameters. Each set of initializations is a random draw from the Pathfinder sample. These initializations can be used as the initial parameter values for Stan’s NUTS-HMC sampler, which will reduce the number of warmup iterations needed.
[7]:
inits = pathfinder.create_inits()
print(inits)
[{'theta': array(0.369154)}, {'theta': array(0.1015)}, {'theta': array(0.168767)}, {'theta': array(0.0580899)}]
The create_inits
takes two arguments:
seed
- used for random selection.chains
- the number of draws to return, default is 4. This should match the number of sampler chains to run.
[8]:
inits = pathfinder.create_inits(chains=3)
print(inits)
[{'theta': array(0.214343)}, {'theta': array(0.168289)}, {'theta': array(0.213014)}]