16.1 Markov chains

A Markov chain is a sequence of random variables \(\theta^{(1)}, \theta^{(2)},\ldots\) where each variable is conditionally independent of all earlier variables given the value of the immediately preceding variable. Thus if \(\theta = \theta^{(1)}, \theta^{(2)},\ldots, \theta^{(N)}\), then

\[ p(\theta) = p(\theta^{(1)}) \prod_{n=2}^N p(\theta^{(n)}|\theta^{(n-1)}). \]
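As a rough illustration of this factorization, the sketch below evaluates the joint probability of a short path through a discrete two-state chain. The state labels, the initial distribution, the transition probabilities, and the function name `joint_probability` are all hypothetical and chosen only for the example; Stan's chains live on continuous parameter spaces and are not built this way.

```python
# Hypothetical two-state chain; all numbers are illustrative assumptions.
initial = {"a": 0.5, "b": 0.5}          # p(theta^(1))
transition = {                           # p(theta^(n) | theta^(n-1))
    "a": {"a": 0.9, "b": 0.1},
    "b": {"a": 0.2, "b": 0.8},
}


def joint_probability(path, initial, transition):
    """Return p(theta^(1)) times the product of one-step transition probabilities."""
    prob = initial[path[0]]
    for prev, curr in zip(path, path[1:]):
        prob *= transition[prev][curr]
    return prob


# 0.5 * 0.9 * 0.1 * 0.8
p = joint_probability(["a", "a", "b", "b"], initial, transition)
print(p)
```

Each factor depends only on the current state and the one before it, which is exactly the conditional independence assumption stated above.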

Stan uses Hamiltonian Monte Carlo to generate a next state in a manner described in the Hamiltonian Monte Carlo chapter.

The Markov chains Stan and other MCMC samplers generate are ergodic in the sense required by the Markov chain central limit theorem, meaning roughly that there is a reasonable chance of reaching one value of \(\theta\) from another. The Markov chains are also stationary, meaning that the transition probabilities do not change at different positions in the chain, so that for \(n, n' \geq 1\), the probability function \(p(\theta^{(n+1)}|\theta^{(n)})\) is the same as \(p(\theta^{(n'+1)}|\theta^{(n')})\) (following the convention of overloading random and bound variables and picking out a probability function by its arguments).

Stationary Markov chains have an equilibrium distribution over states under which every position in the chain has the same marginal probability function, so that \(p(\theta^{(n)})\) is the same probability function as \(p(\theta^{(n+1)})\). In Stan, this equilibrium distribution \(p(\theta^{(n)})\) is the target density \(p(\theta)\) defined by a Stan program, which is typically a proper Bayesian posterior density \(p(\theta | y)\) defined on the log scale up to a constant.
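The equilibrium property can be checked numerically on a toy example. The sketch below assumes a hypothetical two-state chain with a fixed transition matrix (the numbers and the helper name `step` are illustrative, not part of Stan). It pushes a marginal distribution through one transition at a time and shows two things: a chain started far from equilibrium drifts toward it, and a chain started at equilibrium keeps the same marginal after a transition.

```python
# Hypothetical transition matrix: transition[i][j] = p(next = j | current = i).
transition = [
    [0.9, 0.1],
    [0.2, 0.8],
]


def step(marginal, transition):
    """Apply one transition: new_marginal[j] = sum_i marginal[i] * transition[i][j]."""
    k = len(marginal)
    return [
        sum(marginal[i] * transition[i][j] for i in range(k))
        for j in range(k)
    ]


# Start with all mass on state 0 and apply the transition repeatedly.
marginal = [1.0, 0.0]
for _ in range(200):
    marginal = step(marginal, transition)

# For this two-state matrix the equilibrium distribution is (2/3, 1/3).
equilibrium = [2.0 / 3.0, 1.0 / 3.0]
after_one_step = step(equilibrium, transition)

print(marginal)        # close to the equilibrium distribution
print(after_one_step)  # unchanged by a transition, up to rounding
```

In a discrete chain like this one the equilibrium can be written down exactly; for the continuous targets defined by Stan programs it is only accessible through the draws themselves, which is why convergence has to be diagnosed rather than computed.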

Using MCMC methods introduces two difficulties that are not faced by independent sample Monte Carlo methods. The first problem is determining when a randomly initialized Markov chain has converged to its equilibrium distribution. The second problem is that the draws from a Markov chain may be correlated or even anti-correlated, and thus the central limit theorem’s bound on estimation error no longer applies. These problems are addressed in the next two sections.
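The effect of correlation on estimation error can be seen in a small simulation. The sketch below does not use Stan's sampler; it assumes a first-order autoregressive chain, \(\theta^{(n+1)} = \rho\,\theta^{(n)} + \sqrt{1-\rho^2}\,\epsilon_n\) with standard normal \(\epsilon_n\), whose equilibrium distribution is standard normal for any \(|\rho| < 1\). For such a chain the variance of the sample mean is asymptotically \(\frac{1}{N}\frac{1+\rho}{1-\rho}\) rather than \(\frac{1}{N}\), so positive correlation inflates the error and negative correlation reduces it. The function names, chain length, replication count, and seed are all arbitrary choices for the illustration.

```python
import random
import statistics


def ar1_chain(n, rho, rng):
    """Generate n draws from an AR(1) chain with standard normal equilibrium."""
    theta = rng.gauss(0.0, 1.0)          # initialize from the equilibrium distribution
    draws = [theta]
    scale = (1.0 - rho * rho) ** 0.5     # keeps the marginal variance at 1
    for _ in range(n - 1):
        theta = rho * theta + scale * rng.gauss(0.0, 1.0)
        draws.append(theta)
    return draws


def variance_of_mean(rho, n=500, replications=200, seed=1234):
    """Estimate the variance of the sample mean across repeated chains."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(ar1_chain(n, rho, rng))
        for _ in range(replications)
    ]
    return statistics.variance(means)


var_independent = variance_of_mean(0.0)      # rho = 0 gives independent draws
var_correlated = variance_of_mean(0.9)       # positively correlated draws
var_anticorrelated = variance_of_mean(-0.9)  # anti-correlated draws

print(var_independent, var_correlated, var_anticorrelated)
```

All three chains have the same marginal distribution, so the difference in the spread of their sample means comes entirely from the dependence between successive draws.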

Stan’s posterior analysis tools compute a number of summary statistics, estimates, and diagnostics for Markov chain Monte Carlo (MCMC) samples. Stan’s estimators and diagnostics are more robust in the face of non-convergence, antithetical sampling, and long-term Markov chain correlations than most of the other tools available. The algorithms Stan uses to achieve this are described in this chapter.