4 MCMC Sampling
4.1 Running the sampler
To generate a sample from the posterior distribution of
the model conditioned on the data,
we run the executable program with the argument sample
or method=sample
together with the input data.
The executable can be run from any directory.
Here, we run it in the directory which contains the Stan program and input data,
<cmdstan-home>/examples/bernoulli
:
> cd examples/bernoulli
To execute sampling of the model under Linux or Mac, use:
> ./bernoulli sample data file=bernoulli.data.json
In Windows, the ./
prefix is not needed:
> bernoulli.exe sample data file=bernoulli.data.json
The output is the same across all supported platforms. First, the configuration of the program is echoed to the standard output:
method = sample (Default)
sample
num_samples = 1000 (Default)
num_warmup = 1000 (Default)
save_warmup = 0 (Default)
thin = 1 (Default)
adapt
engaged = 1 (Default)
gamma = 0.050000000000000003 (Default)
delta = 0.80000000000000004 (Default)
kappa = 0.75 (Default)
t0 = 10 (Default)
init_buffer = 75 (Default)
term_buffer = 50 (Default)
window = 25 (Default)
algorithm = hmc (Default)
hmc
engine = nuts (Default)
nuts
max_depth = 10 (Default)
metric = diag_e (Default)
metric_file = (Default)
stepsize = 1 (Default)
stepsize_jitter = 0 (Default)
id = 0 (Default)
data
file = bernoulli.data.json
init = 2 (Default)
random
seed = 3252652196 (Default)
output
file = output.csv (Default)
diagnostic_file = (Default)
refresh = 100 (Default)
After the configuration has been displayed, a short timing message is given.
Gradient evaluation took 1.2e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Adjust your expectations accordingly!
Next, the sampler reports the iteration number, reporting the percentage complete.
Iteration: 1 / 2000 [ 0%] (Warmup)
....
Iteration: 2000 / 2000 [100%] (Sampling)
Finally, the sampler reports timing information:
Elapsed Time: 0.007 seconds (Warm-up)
0.017 seconds (Sampling)
0.024 seconds (Total)
4.2 Running multiple chains
A Markov chain generates samples from the target distribution only after it has converged to equilibrium. In theory, convergence is only guaranteed asymptotically as the number of draws grows without bound. In practice, diagnostics must be applied to monitor convergence for the finite number of draws actually available. One way to monitor whether a chain has converged to the equilibrium distribution is to compare its behavior to other randomly initialized chains. For robust diagnostics, we recommend running 4 chains.
There are two different ways of running multiple chains, with the num_chains
argument using a single executable
and by using the Unix and DOS shell to run multiple executables. The former is currently supported and
recommended when using the NUTS sampling algorithm with either the diagonal (diag_e
) on dense (dense_e
) metric.
4.2.1 Using the num_chains argument to run multiple chains
The num_chains
argument can be used with the NUTS sampling algorihtm with either the diagonal (diag_e
) or dense
(dense_e
) metric.
Example that will run 4 chains:
./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv
If the model was not compiled with STAN_THREADS=true
, the above command will run 4 chains sequentially and will
produce the sample in output_1.csv
, output_2.csv
, output_3.csv
, output_4.csv
. A suffix with the chain id
is appended to the provided output filename (output.csv
in the above command).
If the model was compiled with STAN_THREADS=true
, the chains can run in parallel, with the num_threads
argument
defining the maximum number of threads used to run the chains. If the model uses no within-chain parallelization
(map_rect
or reduce_sum
calls), the below command will run 4 chains in parallel, provided there are cores
available:
./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=4
If the model uses within-chain parallelization (map_rect
or reduce_sum
calls), the threads are automatically
scheduled to run the parallel parts of a single chain or run the sequential parts of another chains. The below
call starts 4 chains that can use 16 threads. At a given moment a single chain may use all 16 threads,
1 thread, anything in between, or can wait for a thread to be available. The scheduling is left to the Threading
Building Blocks scheduler.
./bernoulli_par sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=16
4.2.2 Using shell for running multiple chains
To run multiple chains given a model and data, either sequentially or in parallel,
we can also use the Unix or DOS shell for
loop to set up index variables needed to identify
each chain and its outputs.
On MacOS or Linux, the for-loop syntax for both the bash and zsh interpreters is:
for NAME [in LIST]; do COMMANDS; done
The list can be a simple sequence of numbers, or you can use the shell expansion syntax {1..N}
which expands to the sequence from \(1\) to \(N\), e.g. {1..4}
expands to 1 2 3 4
.
Note that the expression {1..N}
cannot contain spaces.
To run 4 chains for the example bernoulli model on MacOS or Linux:
> for i in {1..4}
do
./bernoulli sample data file=bernoulli.data.json \
output file=output_${i}.csv
done
The backslash (\
) indicates a line continuation in Unix.
The expression ${i}
substitutes in the value of loop index variable i
.
To run chains in parallel, put an ampersand (&
) at the end of the nested sampler command:
> for i in {1..4}
do
./bernoulli sample data file=bernoulli.data.json \
output file=output_${i}.csv &
done
This pushes each process into the background which allows the loop to continue without waiting for the current chain to finish.
On Windows, the DOS for-loop syntax is one of:
for %i in (SET) do COMMAND COMMAND-ARGUMENTS
for /l %i in (START, STEP, END) do COMMAND COMMAND-ARGUMENTS
To run 4 chains in parallel on Windows:
>for /l %i in (1, 1, 4) do start /b bernoulli.exe sample ^
data file=bernoulli.data.json my_data ^
output file=output_%i.csv
The caret (^
) indicates a line continuation in DOS.
4.3 Stan CSV output file
Each execution of the model results in draws from a single Markov
chain being written to a file in comma-separated value (CSV) format.
The default name of the output file is output.csv
.
The first part of the output file records the version of the
underlying Stan library and the configuration as comments (i.e., lines
beginning with the pound sign (#
)).
# stan_version_major = 2
# stan_version_minor = 23
# stan_version_patch = 0
# model = bernoulli_model
# method = sample (Default)
# sample
# num_samples = 1000 (Default)
# num_warmup = 1000 (Default)
...
# output
# file = output.csv (Default)
# diagnostic_file = (Default)
# refresh = 100 (Default)
This is followed by a CSV header indicating the names of the values sampled.
lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta
The first output columns report the HMC sampler information:
lp__
- the total log probability density (up to an additive constant) at each sampleaccept_stat__
- the average Metropolis acceptance probability over each simulated Hamiltonian trajectorystepsize__
- integrator step sizetreedepth__
- depth of tree used by NUTS (NUTS sampler)n_leapfrog__
- number of leapfrog calculations (NUTS sampler)divergent__
- has value1
if trajectory diverged, otherwise0
. (NUTS sampler)energy__
- value of the Hamiltonianint_time__
- total integration time (static HMC sampler)
Because the above header is from the NUTS sampler, it has columns treedepth__
, n_leapfrog__
, and divergent__
and doesn’t have column int_time__
.
The remaining columns correspond to model parameters. For the
Bernoulli model, it is just the final column, theta
.
The header line is written to the output file before warmup begins.
If option save_warmup
is set to 1
, the warmup draws are output directly after the header.
The total number of warmup draws saved is num_warmup
divided by thin
, rounded up (i.e., ceiling
).
Following the warmup draws (if any), are comments which record the results of adaptation: the stepsize, and inverse mass metric used during sampling:
# Adaptation terminated
# Step size = 0.884484
# Diagonal elements of inverse mass matrix:
# 0.535006
The default sampler is NUTS with an adapted step size and a diagonal
inverse mass matrix. For this example, the step size is 0.884484, and
the inverse mass contains the single entry 0.535006 corresponding to
the parameter theta
.
Draws from the posterior distribution are printed out next, each line containing a single draw with the columns corresponding to the header.
-6.84097,0.974135,0.884484,1,3,0,6.89299,0.198853
-6.91767,0.985167,0.884484,1,1,0,6.92236,0.182295
-7.04879,0.976609,0.884484,1,1,0,7.05641,0.162299
-6.88712,1,0.884484,1,1,0,7.02101,0.188229
-7.22917,0.899446,0.884484,1,3,0,7.73663,0.383596
...
The output ends with timing details:
# Elapsed Time: 0.007 seconds (Warm-up)
# 0.017 seconds (Sampling)
# 0.024 seconds (Total)
4.4 Summarizing sampler output(s) with stansummary
The stansummary
utility processes one or more output files from a run
or set of runs of Stan’s HMC sampler given a model and data.
For all columns in the Stan CSV output file stansummary
reports a set of statistics
including mean, standard deviation, percentiles, effective number of samples, and \(\hat{R}\) values.
To run stansummary
on the output files generated by the for loop above,
by the above run of the bernoulli
model on Mac or Linux:
<cmdstan-home>/bin/stansummary output_*.csv
On Windows, use backslashes to call the stansummary.exe
.
<cmdstan-home>\bin\stansummary.exe output_*.csv
The stansummary output consists of one row of statistics per column
in the Stan CSV output file. Therefore, the first rows in the
stansummary report statistics over the sampler state.
The final row of output summarizes the estimates of the model variable theta
:
Inference for Stan model: bernoulli_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.
Warmup took (0.0070, 0.0070, 0.0070, 0.0070) seconds, 0.028 seconds total
Sampling took (0.020, 0.017, 0.021, 0.019) seconds, 0.077 seconds total
Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat
lp__ -7.3 1.8e-02 0.75 -8.8 -7.0 -6.8 1.8e+03 2.4e+04 1.0e+00
accept_stat__ 0.89 2.7e-03 0.17 0.52 0.96 1.0 3.9e+03 5.1e+04 1.0e+00
stepsize__ 1.1 7.5e-02 0.11 0.93 1.2 1.2 2.0e+00 2.6e+01 2.5e+13
treedepth__ 1.4 8.1e-03 0.49 1.0 1.0 2.0 3.6e+03 4.7e+04 1.0e+00
n_leapfrog__ 2.3 1.7e-02 0.98 1.0 3.0 3.0 3.3e+03 4.3e+04 1.0e+00
divergent__ 0.00 nan 0.00 0.00 0.00 0.00 nan nan nan
energy__ 7.8 2.6e-02 1.0 6.8 7.5 9.9 1.7e+03 2.2e+04 1.0e+00
theta 0.25 2.9e-03 0.12 0.079 0.23 0.46 1.7e+03 2.1e+04 1.0e+00
Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at
convergence, R_hat=1).
In this example, we conditioned the model on a dataset consisting of the outcomes of
10 bernoulli trials, where only 2 trials reported success. The 5%, 50%, and 95%
percentile values for theta
reflect the uncertainty in our estimate, due to the
small amount of data, given the prior of beta(1, 1)