36.1 Some differences in how BUGS and Stan work
BUGS is interpreted, Stan is compiled
Stan is compiled in two steps, first a model is translated to templated C++ and then to a platform-specific executable. Stan, unlike BUGS, allows the user to directly program in C++, but we do not describe how to do this in this Stan manual (see the getting started with C++ section of https://mc-stan.org for more information on using Stan directly from C++).
BUGS performs MCMC updating one scalar parameter at a time, Stan uses HMC which moves in the entire space of all the parameters at each step
BUGS performs MCMC updating one scalar parameter at a time, (with some exceptions such as JAGS’s implementation of regression and generalized linear models and some conjugate multivariate parameters), using conditional distributions (Gibbs sampling) where possible and otherwise using adaptive rejection sampling, slice sampling, and Metropolis jumping. BUGS figures out the dependence structure of the joint distribution as specified in its modeling language and uses this information to compute only what it needs at each step. Stan moves in the entire space of all the parameters using Hamiltonian Monte Carlo (more precisely, the no-U-turn sampler), thus avoiding some difficulties that occur with one-dimension-at-a-time sampling in high dimensions but at the cost of requiring the computation of the entire log density at each step.
Differences in tuning during warmup
BUGS tunes its adaptive jumping (if necessary) during its warmup phase (traditionally referred to as “burn-in”). Stan uses its warmup phase to tune the no-U-turn sampler (NUTS).
The Stan language is directly executable, the BUGS modeling language is not
The BUGS modeling language is not directly executable. Rather, BUGS parses its model to determine the posterior density and then decides on a sampling scheme. In contrast, the statements in a Stan model are directly executable: they translate exactly into C++ code that is used to compute the log posterior density (which in turn is used to compute the gradient).
Differences in statement order
In BUGS, statements are executed according to the directed graphical
model so that variables are always defined when needed. A side effect
of the direct execution of Stan’s modeling language is that statements
execute in the order in which they are written. For instance, the
following Stan program, which sets mu
before using it to sample y
:
mu = a + b * x; y ~ normal(mu, sigma);
translates to the following C++ code:
mu = a + b * x;target += normal_lpdf(y | mu, sigma);
Contrast this with the following Stan program:
y ~ normal(mu, sigma); mu = a + b * x;
This program is well formed, but is almost certainly
a coding error, because it attempts to use mu
before
it is set. The direct translation to C++ code
highlights the potential error of using mu
in the first
statement:
target += normal_lpdf(y | mu, sigma);
mu = a + b * x;
To trap these kinds of errors, variables are initialized to the
special not-a-number (NaN
) value. If NaN
is passed to a
log probability function, it will raise a domain exception, which will
in turn be reported by the sampler. The sampler will reject the
sample out of hand as if it had zero probability.
Stan computes the gradient of the log density, BUGS computes the log density but not its gradient
Stan uses its own C++ algorithmic differentiation packages to compute the gradient of the log density (up to a proportion). Gradients are required during the Hamiltonian dynamics simulations within the leapfrog algorithm of the Hamiltonian Monte Carlo and NUTS samplers.
Both BUGS and Stan are semi-automatic
Both BUGS and Stan are semi-automatic in that they run by themselves with no outside tuning required. Nevertheless, the user needs to pick the number of chains and number of iterations per chain. We usually pick 4 chains and start with 10 iterations per chain (to make sure there are no major bugs and to approximately check the timing), then go to 100, 1000, or more iterations as necessary. Compared to Gibbs or Metropolis, Hamiltonian Monte Carlo can take longer per iteration (as it typically takes many “leapfrog steps” within each iteration), but the iterations typically have lower autocorrelation. So Stan might work fine with 1000 iterations in an example where BUGS would require 100,000 for good mixing. We recommend monitoring potential scale reduction statistics (\(\hat{R}\)) and the effective sample size to judge when to stop (stopping when \(\hat{R}\) values do not counter-indicate convergence and when enough effective samples have been collected).
Licensing
WinBUGS is closed source. OpenBUGS and JAGS are both licensed under the Gnu Public License (GPL), otherwise known as copyleft due to the restrictions it places on derivative works. Stan is licensed under the much more liberal new BSD license.