Pathfinder
Stan supports the Pathfinder algorithm (Zhang et al. 2022). Pathfinder is a variational method for approximately sampling from differentiable log densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the negative inverse Hessian estimates produced by the LBFGS optimizer. Pathfinder returns draws from the Gaussian approximation with the lowest estimated Kullback-Leibler (KL) divergence to the true posterior.
Stan provides two versions of the Pathfinder algorithm: single-path Pathfinder and multi-path Pathfinder. Single-path Pathfinder generates a set of approximate draws from one run of the basic Pathfinder algorithm. Multi-path Pathfinder uses importance resampling over the draws from multiple runs of Pathfinder. This better matches non-normal target densities and also mitigates the problem of L-BFGS getting stuck at local optima or in saddle points on plateaus. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. While the evaluations by Zhang et al. (2022) found that single-path and multi-path Pathfinder outperform ADVI for most of the models in the PosteriorDB (Magnusson et al. 2024) evaluation set, we recognize the need for further experiments on a wider range of models.
Diagnosing Pathfinder
Pathfinder diagnoses the accuracy of the approximation by computing the density ratio of the true posterior and the approximation and using Pareto-\(\hat{k}\) diagnostic (Vehtari et al. 2024) to assess whether these ratios can be used to improve the approximation via resampling. The normalization for the posterior can be estimated reliably (Vehtari et al. 2024, sec. 3), which is the first requirement for reliable resampling. If estimated Pareto-\(\hat{k}\) for the ratios is smaller than 0.7, there is still need to further diagnose reliability of importance sampling estimate for all quantities of interest (Vehtari et al. 2024, sec. 2.2). If estimated Pareto-\(\hat{k}\) is larger than 0.7, then the estimate for the normalization is unreliable and any Monte Carlo estimate may have a big error. The resampled draws can still contain some useful information about the location and shape of the posterior which can be used in early parts of Bayesian workflow (Gelman et al. 2020).
Using Pathfinder for initializing MCMC
If estimated Pareto-\(\hat{k}\) for the ratios is smaller than 0.7, the resampled posterior draws are almost as good for initializing MCMC as would independent draws from the posterior be. If estimated Pareto-\(\hat{k}\) for the ratios is larger than 0.7, the Pathfinder draws are not reliable for posterior inference directly, but they are still very likely better for initializing MCMC than random draws from an arbitrary pre-defined distribution (e.g. uniform from -2 to 2 used by Stan by default). If Pareto-\(\hat{k}\) is larger than 0.7, it is likely that one of the ratios is much bigger than others and the default resampling with replacement would produce copies of one unique draw. For initializing several Markov chains, it is better to use resampling without replacement to guarantee unique initialization for each chain. At the moment Stan allows turning off the resampling completely, and then the resampling without replacement can be done outside of Stan.