This is an old version, view current version.

## 15.1 Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) method that uses the derivatives of the density function being sampled to generate efficient transitions spanning the posterior (see, e.g., , Neal (2011) for more details). It uses an approximate Hamiltonian dynamics simulation based on numerical integration which is then corrected by performing a Metropolis acceptance step.

This section translates the presentation of HMC by into the notation of .

### Target density

The goal of sampling is to draw from a density $$p(\theta)$$ for parameters $$\theta$$. This is typically a Bayesian posterior $$p(\theta|y)$$ given data $$y$$, and in particular, a Bayesian posterior coded as a Stan program.

### Auxiliary momentum variable

HMC introduces auxiliary momentum variables $$\rho$$ and draws from a joint density

$p(\rho, \theta) = p(\rho | \theta) p(\theta).$

In most applications of HMC, including Stan, the auxiliary density is a multivariate normal that does not depend on the parameters $$\theta$$,

$\rho \sim \mathsf{MultiNormal}(0, M).$

$$M$$ is the Euclidean metric. It can be seen as a transform of parameter space that makes sampling more efficient; see for details.

By default Stan sets $$M^{-1}$$ equal to a diagonal estimate of the covariance computed during warmup.

### The Hamiltonian

The joint density $$p(\rho, \theta)$$ defines a Hamiltonian

$\begin{array}{rcl} H(\rho, \theta) & = & - \log p(\rho, \theta) \\[3pt] & = & - \log p(\rho | \theta) - \log p(\theta). \\[3pt] & = & T(\rho | \theta) + V(\theta), \end{array}$

where the term

$T(\rho | \theta) = - \log p(\rho | \theta)$

is called the “kinetic energy” and the term

$V(\theta) = - \log p(\theta)$

is called the “potential energy.” The potential energy is specified by the Stan program through its definition of a log density.

### Generating transitions

Starting from the current value of the parameters $$\theta$$, a transition to a new state is generated in two stages before being subjected to a Metropolis accept step.

First, a value for the momentum is drawn independently of the current parameter values,

$\rho \sim \mathsf{MultiNormal}(0, M).$

Thus momentum does not persist across iterations.

Next, the joint system $$(\theta,\rho)$$ made up of the current parameter values $$\theta$$ and new momentum $$\rho$$ is evolved via Hamilton’s equations,

$\begin{array}{rcccl} \displaystyle \frac{d\theta}{dt} & = & \displaystyle + \frac{\partial H}{\partial \rho} & = & \displaystyle + \frac{\partial T}{\partial \rho} \\[12pt] \displaystyle \frac{d\rho}{dt} & = & \displaystyle - \frac{\partial H}{\partial \theta } & = & \displaystyle - \frac{\partial T}{\partial \theta} - \frac{\partial V}{\partial \theta}. \end{array}$

With the momentum density being independent of the target density, i.e., $$p(\rho | \theta) = p(\rho)$$, the first term in the momentum time derivative, $${\partial T} / {\partial \theta}$$ is zero, yielding the pair time derivatives

$\begin{array}{rcl} \frac{d \theta}{d t} & = & +\frac{\partial T}{\partial \rho} \\[2pt] \frac{d \rho}{d t} & = & -\frac{\partial V}{\partial \theta}. \end{array}$

### Leapfrog integrator

The last section leaves a two-state differential equation to solve. Stan, like most other HMC implementations, uses the leapfrog integrator, which is a numerical integration algorithm that’s specifically adapted to provide stable results for Hamiltonian systems of equations.

Like most numerical integrators, the leapfrog algorithm takes discrete steps of some small time interval $$\epsilon$$. The leapfrog algorithm begins by drawing a fresh momentum term independently of the parameter values $$\theta$$ or previous momentum value.

$\rho \sim \mathsf{MultiNormal}(0, M).$ It then alternates half-step updates of the momentum and full-step updates of the position.

$\begin{array}{rcl} \rho & \leftarrow & \rho \, - \, \frac{\epsilon}{2} \frac{\partial V}{\partial \theta} \\[6pt] \theta & \leftarrow & \theta \, + \, \epsilon \, M^{-1} \, \rho \\[6pt] \rho & \leftarrow & \rho \, - \, \frac{\epsilon}{2} \frac{\partial V}{\partial \theta}. \end{array}$

By applying $$L$$ leapfrog steps, a total of $$L \, \epsilon$$ time is simulated. The resulting state at the end of the simulation ($$L$$ repetitions of the above three steps) will be denoted $$(\rho^{*}, \theta^{*})$$.

The leapfrog integrator’s error is on the order of $$\epsilon^3$$ per step and $$\epsilon^2$$ globally, where $$\epsilon$$ is the time interval (also known as the step size); provide a detailed analysis of numerical integration for Hamiltonian systems, including a derivation of the error bound for the leapfrog integrator.

### Metropolis accept step

If the leapfrog integrator were perfect numerically, there would no need to do any more randomization per transition than generating a random momentum vector. Instead, what is done in practice to account for numerical errors during integration is to apply a Metropolis acceptance step, where the probability of keeping the proposal $$(\rho^{*}, \theta^{*})$$ generated by transitioning from $$(\rho, \theta)$$ is

$\min \! \left( 1, \ \exp \! \left( H(\rho, \theta) - H(\rho^{*}, \theta^{*}) \right) \right).$

If the proposal is not accepted, the previous parameter value is returned for the next draw and used to initialize the next iteration.

### Algorithm summary

The Hamiltonian Monte Carlo algorithm starts at a specified initial set of parameters $$\theta$$; in Stan, this value is either user-specified or generated randomly. Then, for a given number of iterations, a new momentum vector is sampled and the current value of the parameter $$\theta$$ is updated using the leapfrog integrator with discretization time $$\epsilon$$ and number of steps $$L$$ according to the Hamiltonian dynamics. Then a Metropolis acceptance step is applied, and a decision is made whether to update to the new state $$(\theta^{*}, \rho^{*})$$ or keep the existing state.

### References

———. 2017. “A Conceptual Introduction to Hamiltonian Monte Carlo.” arXiv 1701.02434. https://arxiv.org/abs/1701.02434.
Betancourt, Michael, and Mark Girolami. 2013. Hamiltonian Monte Carlo for Hierarchical Models.” arXiv 1312.0906. http://arxiv.org/abs/1312.0906.
Gelman, Andrew, J. B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. Third Edition. London: Chapman & Hall / CRC Press.
Leimkuhler, Benedict, and Sebastian Reich. 2004. Simulating Hamiltonian Dynamics. Cambridge: Cambridge University Press.
Neal, Radford. 2011. MCMC Using Hamiltonian Dynamics.” In Handbook of Markov Chain Monte Carlo, edited by Steve Brooks, Andrew Gelman, Galin L. Jones, and Xiao-Li Meng, 116–62. Chapman; Hall/CRC.