15.4 General Configuration Options
Stan’s interfaces provide a number of configuration options that are shared among the MCMC algorithms (this chapter), the optimization algorithms chapter, and the diagnostics chapter.
Random Number Generator
The random-number generator’s behavior is fully determined by the unsigned seed (positive integer) it is started with. If a seed is not specified, or a seed of 0 or less is specified, the system time is used to generate a seed. The seed is recorded and included with Stan’s output regardless of whether it was specified or generated randomly from the system time.
Stan also allows a chain identifier to be specified, which is useful when running multiple Markov chains for sampling. The chain identifier is used to advance the random number generator a very large number of random variates so that two chains with different identifiers draw from non-overlapping subsequences of the random-number sequence determined by the seed. When running multiple chains from a single command, Stan’s interfaces will manage the chain identifiers.
Replication
Together, the seed and chain identifier determine the behavior of the underlying random number generator. For complete reproducibility, every aspect of the environment needs to be locked down from the OS and version to the C++ compiler and version to the version of Stan and all dependent libraries.
Initialization
The initial parameter values for Stan’s algorithms (MCMC, optimization, or diagnostic) may be either specified by the user or generated randomly. If user-specified values are provided, all parameters must be given initial values or Stan will abort with an error message.
User-Defined Initialization
If the user specifies initial values, they must satisfy the constraints declared in the model (i.e., they are on the constrained scale).
System Constant Zero Initialization
It is also possible to provide an initialization of 0, which causes all variables to be initialized with zero values on the unconstrained scale. The transforms are arranged in such a way that zero initialization provides reasonable variable initializations for most parameters, such as 0 for unconstrained parameters, 1 for parameters constrained to be positive, 0.5 for variables to constrained to lie between 0 and 1, a symmetric (uniform) vector for simplexes, unit matrices for both correlation and covariance matrices, and so on.
System Random Initialization
Random initialization by default initializes the parameter values with values drawn at random from a \(\mathsf{Uniform}(-2, 2)\) distribution. Alternatively, a value other than 2 may be specified for the absolute bounds. These values are on the unconstrained scale, so must be inverse transformed back to satisfy the constraints declared for parameters.
Because zero is chosen to be a reasonable default initial value for most parameters, the interval around zero provides a fairly diffuse starting point. For instance, unconstrained variables are initialized randomly in \((-2, 2)\), variables constrained to be positive are initialized roughly in \((0.14, 7.4)\), variables constrained to fall between 0 and 1 are initialized with values roughly in \((0.12, 0.88)\).