4 MCMC Sampling

4.1 Running the sampler

To generate a sample from the posterior distribution of the model conditioned on the data, we run the executable program with the argument sample or method=sample together with the input data. The executable can be run from any directory. Here, we run it in the directory which contains the Stan program and input data, <cmdstan-home>/examples/bernoulli:

> cd examples/bernoulli

To execute sampling of the model under Linux or Mac, use:

> ./bernoulli sample data file=bernoulli.data.json

In Windows, the ./ prefix is not needed:

> bernoulli.exe sample data file=bernoulli.data.json

The output is the same across all supported platforms. First, the configuration of the program is echoed to the standard output:

method = sample (Default)
  sample
    num_samples = 1000 (Default)
    num_warmup = 1000 (Default)
    save_warmup = 0 (Default)
    thin = 1 (Default)
    adapt
      engaged = 1 (Default)
      gamma = 0.050000000000000003 (Default)
      delta = 0.80000000000000004 (Default)
      kappa = 0.75 (Default)
      t0 = 10 (Default)
      init_buffer = 75 (Default)
      term_buffer = 50 (Default)
      window = 25 (Default)
    algorithm = hmc (Default)
      hmc
        engine = nuts (Default)
          nuts
            max_depth = 10 (Default)
        metric = diag_e (Default)
        metric_file =  (Default)
        stepsize = 1 (Default)
        stepsize_jitter = 0 (Default)
id = 0 (Default)
data
  file = bernoulli.data.json
init = 2 (Default)
random
  seed = 3252652196 (Default)
output
  file = output.csv (Default)
  diagnostic_file =  (Default)
  refresh = 100 (Default)

After the configuration has been displayed, a short timing message is given.

Gradient evaluation took 1.2e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.12 seconds.
Adjust your expectations accordingly!

Next, the sampler reports the iteration number, reporting the percentage complete.

Iteration:    1 / 2000 [  0%]  (Warmup)
....
Iteration: 2000 / 2000 [100%]  (Sampling)

Finally, the sampler reports timing information:

 Elapsed Time: 0.007 seconds (Warm-up)
               0.017 seconds (Sampling)
               0.024 seconds (Total)

4.2 Running multiple chains

A Markov chain generates samples from the target distribution only after it has converged to equilibrium. In theory, convergence is only guaranteed asymptotically as the number of draws grows without bound. In practice, diagnostics must be applied to monitor convergence for the finite number of draws actually available. One way to monitor whether a chain has converged to the equilibrium distribution is to compare its behavior to other randomly initialized chains. For robust diagnostics, we recommend running 4 chains.

There are two different ways of running multiple chains, with the num_chains argument using a single executable and by using the Unix and DOS shell to run multiple executables. The former is currently supported and recommended when using the NUTS sampling algorithm with either the diagonal (diag_e) on dense (dense_e) metric.

4.2.1 Using the num_chains argument to run multiple chains

The num_chains argument can be used with the NUTS sampling algorihtm with either the diagonal (diag_e) or dense (dense_e) metric.

Example that will run 4 chains:

./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv

If the model was not compiled with STAN_THREADS=true, the above command will run 4 chains sequentially and will produce the sample in output_1.csv, output_2.csv, output_3.csv, output_4.csv. A suffix with the chain id is appended to the provided output filename (output.csv in the above command).

If the model was compiled with STAN_THREADS=true, the chains can run in parallel, with the num_threads argument defining the maximum number of threads used to run the chains. If the model uses no within-chain parallelization (map_rect or reduce_sum calls), the below command will run 4 chains in parallel, provided there are cores available:

./bernoulli sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=4

If the model uses within-chain parallelization (map_rect or reduce_sum calls), the threads are automatically scheduled to run the parallel parts of a single chain or run the sequential parts of another chains. The below call starts 4 chains that can use 16 threads. At a given moment a single chain may use all 16 threads, 1 thread, anything in between, or can wait for a thread to be available. The scheduling is left to the Threading Building Blocks scheduler.

./bernoulli_par sample num_chains=4 data file=bernoulli.data.json output file=output.csv num_threads=16

4.2.2 Using shell for running multiple chains

To run multiple chains given a model and data, either sequentially or in parallel, we can also use the Unix or DOS shell for loop to set up index variables needed to identify each chain and its outputs.

On MacOS or Linux, the for-loop syntax for both the bash and zsh interpreters is:

for NAME [in LIST]; do COMMANDS; done

The list can be a simple sequence of numbers, or you can use the shell expansion syntax {1..N} which expands to the sequence from \(1\) to \(N\), e.g. {1..4} expands to 1 2 3 4. Note that the expression {1..N} cannot contain spaces.

To run 4 chains for the example bernoulli model on MacOS or Linux:

> for i in {1..4}
    do
      ./bernoulli sample data file=bernoulli.data.json \
      output file=output_${i}.csv
    done

The backslash (\) indicates a line continuation in Unix. The expression ${i} substitutes in the value of loop index variable i. To run chains in parallel, put an ampersand (&) at the end of the nested sampler command:

> for i in {1..4}
    do
      ./bernoulli sample data file=bernoulli.data.json \
      output file=output_${i}.csv &
    done

This pushes each process into the background which allows the loop to continue without waiting for the current chain to finish.

On Windows, the DOS for-loop syntax is one of:

for %i in (SET) do COMMAND COMMAND-ARGUMENTS
for /l %i in (START, STEP, END) do COMMAND COMMAND-ARGUMENTS

To run 4 chains in parallel on Windows:

>for /l %i in (1, 1, 4) do start /b bernoulli.exe sample ^
                                    data file=bernoulli.data.json my_data ^
                                    output file=output_%i.csv

The caret (^) indicates a line continuation in DOS.

4.3 Stan CSV output file

Each execution of the model results in draws from a single Markov chain being written to a file in comma-separated value (CSV) format. The default name of the output file is output.csv.

The first part of the output file records the version of the underlying Stan library and the configuration as comments (i.e., lines beginning with the pound sign (#)).

# stan_version_major = 2
# stan_version_minor = 23
# stan_version_patch = 0
# model = bernoulli_model
# method = sample (Default)
#   sample
#     num_samples = 1000 (Default)
#     num_warmup = 1000 (Default)
...
# output
#   file = output.csv (Default)
#   diagnostic_file =  (Default)
#   refresh = 100 (Default)

This is followed by a CSV header indicating the names of the values sampled.

lp__,accept_stat__,stepsize__,treedepth__,n_leapfrog__,divergent__,energy__,theta

The first output columns report the HMC sampler information:

  • lp__ - the total log probability density (up to an additive constant) at each sample
  • accept_stat__ - the average Metropolis acceptance probability over each simulated Hamiltonian trajectory
  • stepsize__ - integrator step size
  • treedepth__ - depth of tree used by NUTS (NUTS sampler)
  • n_leapfrog__ - number of leapfrog calculations (NUTS sampler)
  • divergent__ - has value 1 if trajectory diverged, otherwise 0. (NUTS sampler)
  • energy__ - value of the Hamiltonian
  • int_time__ - total integration time (static HMC sampler)

Because the above header is from the NUTS sampler, it has columns treedepth__, n_leapfrog__, and divergent__ and doesn’t have column int_time__. The remaining columns correspond to model parameters. For the Bernoulli model, it is just the final column, theta.

The header line is written to the output file before warmup begins. If option save_warmup is set to 1, the warmup draws are output directly after the header. The total number of warmup draws saved is num_warmup divided by thin, rounded up (i.e., ceiling).

Following the warmup draws (if any), are comments which record the results of adaptation: the stepsize, and inverse mass metric used during sampling:

# Adaptation terminated
# Step size = 0.884484
# Diagonal elements of inverse mass matrix:
# 0.535006

The default sampler is NUTS with an adapted step size and a diagonal inverse mass matrix. For this example, the step size is 0.884484, and the inverse mass contains the single entry 0.535006 corresponding to the parameter theta.

Draws from the posterior distribution are printed out next, each line containing a single draw with the columns corresponding to the header.

-6.84097,0.974135,0.884484,1,3,0,6.89299,0.198853
-6.91767,0.985167,0.884484,1,1,0,6.92236,0.182295
-7.04879,0.976609,0.884484,1,1,0,7.05641,0.162299
-6.88712,1,0.884484,1,1,0,7.02101,0.188229
-7.22917,0.899446,0.884484,1,3,0,7.73663,0.383596
...

The output ends with timing details:

#  Elapsed Time: 0.007 seconds (Warm-up)
#                0.017 seconds (Sampling)
#                0.024 seconds (Total)

4.4 Summarizing sampler output(s) with stansummary

The stansummary utility processes one or more output files from a run or set of runs of Stan’s HMC sampler given a model and data. For all columns in the Stan CSV output file stansummary reports a set of statistics including mean, standard deviation, percentiles, effective number of samples, and \(\hat{R}\) values.

To run stansummary on the output files generated by the for loop above, by the above run of the bernoulli model on Mac or Linux:

<cmdstan-home>/bin/stansummary output_*.csv

On Windows, use backslashes to call the stansummary.exe.

<cmdstan-home>\bin\stansummary.exe output_*.csv

The stansummary output consists of one row of statistics per column in the Stan CSV output file. Therefore, the first rows in the stansummary report statistics over the sampler state. The final row of output summarizes the estimates of the model variable theta:

Inference for Stan model: bernoulli_model
4 chains: each with iter=(1000,1000,1000,1000); warmup=(0,0,0,0); thin=(1,1,1,1); 4000 iterations saved.

Warmup took (0.0070, 0.0070, 0.0070, 0.0070) seconds, 0.028 seconds total
Sampling took (0.020, 0.017, 0.021, 0.019) seconds, 0.077 seconds total

                Mean     MCSE  StdDev     5%   50%   95%    N_Eff  N_Eff/s    R_hat
lp__            -7.3  1.8e-02    0.75   -8.8  -7.0  -6.8  1.8e+03  2.4e+04  1.0e+00
accept_stat__   0.89  2.7e-03    0.17   0.52  0.96   1.0  3.9e+03  5.1e+04  1.0e+00
stepsize__       1.1  7.5e-02    0.11   0.93   1.2   1.2  2.0e+00  2.6e+01  2.5e+13
treedepth__      1.4  8.1e-03    0.49    1.0   1.0   2.0  3.6e+03  4.7e+04  1.0e+00
n_leapfrog__     2.3  1.7e-02    0.98    1.0   3.0   3.0  3.3e+03  4.3e+04  1.0e+00
divergent__     0.00      nan    0.00   0.00  0.00  0.00      nan      nan      nan
energy__         7.8  2.6e-02     1.0    6.8   7.5   9.9  1.7e+03  2.2e+04  1.0e+00
theta           0.25  2.9e-03    0.12  0.079  0.23  0.46  1.7e+03  2.1e+04  1.0e+00

Samples were drawn using hmc with nuts.
For each parameter, N_Eff is a crude measure of effective sample size,
and R_hat is the potential scale reduction factor on split chains (at 
convergence, R_hat=1).

In this example, we conditioned the model on a dataset consisting of the outcomes of 10 bernoulli trials, where only 2 trials reported success. The 5%, 50%, and 95% percentile values for theta reflect the uncertainty in our estimate, due to the small amount of data, given the prior of beta(1, 1)