Reproducibility

Floating point operations on modern computers are notoriously difficult to replicate because the fundamental arithmetic operations, right down to the IEEE 754 encoding level, are not fully specified. The primary problem is that the precision of operations varies across different hardware platforms and software implementations.

Stan is designed to allow full reproducibility. However, this is only possible up to the external constraints imposed by floating point arithmetic.

Stan results will only be exactly reproducible if all of the following components are identical:

  • Stan version
  • Stan interface (RStan, PyStan, CmdStan) and version, plus version of interface language (R, Python, shell)
  • versions of included libraries (Boost and Eigen)
  • operating system version
  • computer hardware including CPU, motherboard and memory
  • C++ compiler, including version, compiler flags, and linked libraries
  • same configuration of call to Stan, including random seed, chain ID, initialization and data

It doesn’t matter if you use a stable release version of Stan or the version with a particular Git hash tag. The same goes for all of the interfaces, compilers, and so on. The point is that if any of these moving parts changes in some way, floating point results may change.

Concretely, if you compile a single Stan program using the same CmdStan code base, but changed the optimization flag (-O3 vs. -O2 or -O0), the two programs may not return the identical stream of results. Thus it is very hard to guarantee reproducibility on externally managed hardware, like in a cluster or even a desktop managed by an IT department or with automatic updates turned on.

If, however, you compiled a Stan program today using one set of flags, took the computer away from the internet and didn’t allow it to update anything, then came back in a decade and recompiled the Stan program in the same way, you should get the same results.

The data needs to be the same down to the bit level. For example, if you are running in RStan, Rcpp handles the conversion between R’s floating point numbers and C++ doubles. If Rcpp changes the conversion process or use different types, the results are not guaranteed to be the same down to the bit level.

The compiler and compiler settings can also be an issue. There is a nice discussion of the issues and how to control reproducibility in Intel’s proprietary compiler by Corden and Kreitzer (2014).

Notable changes across versions

As noted above, there is no guarantee that the same results will be reproducible between two different versions of Stan, even if the same settings and environment are used.

However, there are occassionally notable changes which would affect many if not all users, and these are noted here. The absence of a version from this list still does not guarantee exact reproducibility between it and other versions.

  • Stan 2.28 changed the default chain ID for MCMC from 0 to 1. Users who had set a seed but not a chain ID would observe completely different outputs.
  • Stan 2.35 changed the default pseudo-random number generator used by the Stan algorithms. There is no relationship between seeds in versions pre-2.35 and version 2.35 and on.
Back to top

References

Corden, Martyn J., and David Kreitzer. 2014. “Consistency of Floating-Point Results Using the Intel Compiler or Why Doesn’t My Application Always Give the Same Answer?” Intel Corporation. https://software.intel.com/en-us/articles/consistency-of-floating-point-results-using-the-intel-compiler.