The $sample() method of a CmdStanModel object runs the default MCMC algorithm in CmdStan (algorithm=hmc engine=nuts), to produce a set of draws from the posterior distribution of a model conditioned on some data.

Usage

$sample(
  data = NULL,
  seed = NULL,
  refresh = NULL,
  init = NULL,
  save_latent_dynamics = FALSE,
  output_dir = NULL,
  chains = 4,
  cores = getOption("mc.cores", 1),
  iter_warmup = NULL,
  iter_sampling = NULL,
  save_warmup = FALSE,
  thin = NULL,
  max_treedepth = NULL,
  adapt_engaged = TRUE,
  adapt_delta = NULL,
  step_size = NULL,
  metric = NULL,
  metric_file = NULL,
  inv_metric = NULL,
  init_buffer = NULL,
  term_buffer = NULL,
  window = NULL,
  fixed_param = FALSE,
  validate_csv = TRUE
)

Arguments shared by all fitting methods

The following arguments can be specified for any of the fitting methods (sample, optimize, variational). Arguments left at NULL default to the default used by the installed version of CmdStan.

  • data: (multiple options) The data to use. One of the following:

    • A named list of R objects (like for RStan). Internally this list is then written to JSON for CmdStan using write_stan_json().

    • A path to a data file compatible with CmdStan (JSON or R dump). See the appendices in the CmdStan manual for details on using these formats.

  • seed: (positive integer) A seed for the (P)RNG to pass to CmdStan.

  • refresh: (non-negative integer) The number of iterations between printed screen updates.

  • init: (multiple options) The initialization method:

    • A real number x>0 initializes randomly between [-x,x] (on the unconstrained parameter space);

    • 0 initializes to 0;

    • A character vector of paths (one per chain) to JSON or Rdump files. See write_stan_json() to write R objects to JSON files compatible with CmdStan.

  • save_latent_dynamics: (logical) Should auxiliary diagnostic information about the latent dynamics be written to temporary diagnostic CSV files? This argument replaces CmdStan's diagnostic_file argument and the content written to CSV is controlled by the user's CmdStan installation and not CmdStanR (and for some algorithms no content may be written). The default is save_latent_dynamics=FALSE, which is appropriate for almost every use case (all diagnostics recommended for users to check are always saved, e.g., divergences for HMC). To save the temporary files created when save_latent_dynamics=TRUE see the $save_latent_dynamics_files() method.

  • output_dir: (string) A path to a directory where CmdStan should write its output CSV files. For interactive use this can typically be left at NULL (temporary directory) since CmdStanR makes the CmdStan output (e.g., posterior draws and diagnostics) available in R via methods of the fitted model objects. The behavior of output_dir is as follows:

    • If NULL (the default) then the CSV files are written to a temporary directory and only saved permanently if the user calls one of the $save_* methods of the fitted model object (e.g., $save_output_files()).

    • If a path then the files are created in output_dir with names corresponding the defaults used by $save_output_files() (and similar methods like $save_latent_dynamics_files()).

Arguments unique to the sample method

In addition to the arguments above, the $sample() method also has its own set of arguments.

The following two arguments are offered by CmdStanR but do not correspond to arguments in CmdStan because all CmdStan arguments pertain to the execution of a single run only.

  • chains: (positive integer) The number of Markov chains to run. The default is 4.

  • cores: (positive integer) The maximum number of cores to use for running parallel chains. If cores is not specified then the default is to look for the option "mc.cores", which can be set for an entire R session by options(mc.cores=value). If the "mc.cores" option has not been set then the default is 1.

The rest of the arguments correspond to arguments offered by CmdStan. They are described briefly here and in greater detail in the CmdStan manual. Arguments left at NULL default to the default used by the installed version of CmdStan.

  • iter_sampling: (positive integer) The number of post-warmup iterations to run per chain.

  • iter_warmup: (positive integer) The number of warmup iterations to run per chain.

  • save_warmup: (logical) Should warmup iterations be saved? The default is FALSE.

  • thin: (positive integer) The period between saved samples. This should typically be left at its default (no thinning) unless memory is a problem.

  • max_treedepth: (positive integer) The maximum allowed tree depth for the NUTS engine. See the Tree Depth section of the CmdStan manual for more details.

  • adapt_engaged: (logical) Do warmup adaptation? The default is TRUE. If a precomputed inverse metric is specified via the inv_metric argument (or metric_file) then, if adapt_engaged=TRUE, Stan will use the provided inverse metric just as an initial guess during adaptation. To turn off adaptation when using a precomputed inverse metric set adapt_engaged=FALSE.

  • adapt_delta: (real in (0,1)) The adaptation target acceptance statistic.

  • step_size: (positive real) The initial step size for the discrete approximation to continuous Hamiltonian dynamics. This is further tuned during warmup.

  • metric: (character) One of "diag_e", "dense_e", or "unit_e", specifying the geometry of the base manifold. See the Euclidean Metric section of the CmdStan documentation for more details. To specify a precomputed (inverse) metric, see the inv_metric argument below.

  • metric_file: (character) A character vector containing paths to JSON or Rdump files (one per chain) compatible with CmdStan that contain precomputed inverse metrics. The metric_file argument is inherited from CmdStan but is confusing in that the entry in JSON or Rdump file(s) must be named inv_metric, referring to the inverse metric. We recommend instead using CmdStanR's inv_metric argument (see below) to specify an inverse metric directly using a vector or matrix from your R session.

  • inv_metric: (vector, matrix) A vector (if metric='diag_e') or a matrix (if metric='dense_e') for initializing the inverse metric, which can be used as an alternative to the metric_file argument. A vector is interpreted as a diagonal metric. The inverse metric is usually set to an estimate of the posterior covariance. See the adapt_engaged argument above for details on (and control over) how specifying a precomputed inverse metric interacts with adaptation.

  • init_buffer: (nonnegative integer) Width of initial fast timestep adaptation interval during warmup.

  • term_buffer: (nonnegative integer) Width of final fast timestep adaptation interval during warmup.

  • window: (nonnegative integer) Initial width of slow timestep/metric adaptation interval.

  • fixed_param: (logical) When TRUE, call CmdStan with argument "algorithm=fixed_param". The default is FALSE.

  • validate_csv: (logical) When TRUE (the default), validate the sampling results in the csv files. Disable if you wish to manually read in the sampling results and validate them.

Value

The $sample() method returns a CmdStanMCMC object.

See also

The CmdStanR website (mc-stan.org/cmdstanr) for online documentation and tutorials.

The Stan and CmdStan documentation:

Other CmdStanModel methods: model-method-compile, model-method-optimize, model-method-variational

Examples

# \dontrun{ # Set path to cmdstan # (Note: if you installed CmdStan via install_cmdstan() with default settings # then setting the path is unnecessary but the default below should still work. # Otherwise use the `path` argument to specify the location of your # CmdStan installation.) set_cmdstan_path(path = NULL)
#> CmdStan path set to: /Users/jgabry/.cmdstanr/cmdstan-2.23.0
# Create a CmdStanModel object from a Stan program, # here using the example model that comes with CmdStan stan_program <- file.path(cmdstan_path(), "examples/bernoulli/bernoulli.stan") mod <- cmdstan_model(stan_program)
#> Model executable is up to date!
mod$print()
#> data { #> int<lower=0> N; #> int<lower=0,upper=1> y[N]; #> } #> parameters { #> real<lower=0,upper=1> theta; #> } #> model { #> theta ~ beta(1,1); // uniform prior on interval 0,1 #> y ~ bernoulli(theta); #> }
# data as a named list (like RStan) stan_data <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1)) # run MCMC using the 'sample' method fit_mcmc <- mod$sample( data = stan_data, seed = 123, chains = 2, cores = 2 )
#> Running MCMC with 2 chain(s) on 2 core(s)... #> #> Running ./bernoulli 'id=1' random 'seed=123' data \ #> 'file=/var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/standata-29296abb0b4f.json' \ #> output \ #> 'file=/var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/bernoulli-202005121301-1-e3a647.csv' \ #> 'method=sample' 'save_warmup=0' 'algorithm=hmc' 'engine=nuts' adapt \ #> 'engaged=1' #> Running ./bernoulli 'id=2' random 'seed=124' data \ #> 'file=/var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/standata-29296abb0b4f.json' \ #> output \ #> 'file=/var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/bernoulli-202005121301-2-e3a647.csv' \ #> 'method=sample' 'save_warmup=0' 'algorithm=hmc' 'engine=nuts' adapt \ #> 'engaged=1' #> Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup) #> Chain 1 Iteration: 100 / 2000 [ 5%] (Warmup) #> Chain 1 Iteration: 200 / 2000 [ 10%] (Warmup) #> Chain 1 Iteration: 300 / 2000 [ 15%] (Warmup) #> Chain 1 Iteration: 400 / 2000 [ 20%] (Warmup) #> Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup) #> Chain 1 Iteration: 600 / 2000 [ 30%] (Warmup) #> Chain 1 Iteration: 700 / 2000 [ 35%] (Warmup) #> Chain 1 Iteration: 800 / 2000 [ 40%] (Warmup) #> Chain 1 Iteration: 900 / 2000 [ 45%] (Warmup) #> Chain 1 Iteration: 1000 / 2000 [ 50%] (Warmup) #> Chain 1 Iteration: 1001 / 2000 [ 50%] (Sampling) #> Chain 1 Iteration: 1100 / 2000 [ 55%] (Sampling) #> Chain 1 Iteration: 1200 / 2000 [ 60%] (Sampling) #> Chain 1 Iteration: 1300 / 2000 [ 65%] (Sampling) #> Chain 1 Iteration: 1400 / 2000 [ 70%] (Sampling) #> Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling) #> Chain 1 Iteration: 1600 / 2000 [ 80%] (Sampling) #> Chain 1 Iteration: 1700 / 2000 [ 85%] (Sampling) #> Chain 1 Iteration: 1800 / 2000 [ 90%] (Sampling) #> Chain 1 Iteration: 1900 / 2000 [ 95%] (Sampling) #> Chain 1 Iteration: 2000 / 2000 [100%] (Sampling) #> Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup) #> Chain 2 Iteration: 100 / 2000 [ 5%] (Warmup) #> Chain 2 Iteration: 200 / 2000 [ 10%] (Warmup) #> Chain 2 Iteration: 300 / 2000 [ 15%] (Warmup) #> Chain 2 Iteration: 400 / 2000 [ 20%] (Warmup) #> Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup) #> Chain 2 Iteration: 600 / 2000 [ 30%] (Warmup) #> Chain 2 Iteration: 700 / 2000 [ 35%] (Warmup) #> Chain 2 Iteration: 800 / 2000 [ 40%] (Warmup) #> Chain 2 Iteration: 900 / 2000 [ 45%] (Warmup) #> Chain 2 Iteration: 1000 / 2000 [ 50%] (Warmup) #> Chain 2 Iteration: 1001 / 2000 [ 50%] (Sampling) #> Chain 2 Iteration: 1100 / 2000 [ 55%] (Sampling) #> Chain 2 Iteration: 1200 / 2000 [ 60%] (Sampling) #> Chain 2 Iteration: 1300 / 2000 [ 65%] (Sampling) #> Chain 2 Iteration: 1400 / 2000 [ 70%] (Sampling) #> Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling) #> Chain 2 Iteration: 1600 / 2000 [ 80%] (Sampling) #> Chain 2 Iteration: 1700 / 2000 [ 85%] (Sampling) #> Chain 2 Iteration: 1800 / 2000 [ 90%] (Sampling) #> Chain 2 Iteration: 1900 / 2000 [ 95%] (Sampling) #> Chain 2 Iteration: 2000 / 2000 [100%] (Sampling) #> Chain 1 finished in 0.1 seconds. #> Chain 2 finished in 0.1 seconds. #> #> Both chains finished successfully. #> Mean chain execution time: 0.1 seconds. #> Total execution time: 0.2 seconds.
# Use 'posterior' package for summaries fit_mcmc$summary()
#> # A tibble: 2 x 10 #> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail #> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 lp__ -7.28 -7.00 0.739 0.342 -8.80 -6.75 1.00 815. 621. #> 2 theta 0.254 0.235 0.123 0.123 0.0809 0.485 1.00 752. 589.
# Call CmdStan's diagnose and stansummary utilities fit_mcmc$cmdstan_diagnose()
#> Running bin/diagnose \ #> /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/bernoulli-202005121301-1-e3a647.csv \ #> /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/bernoulli-202005121301-2-e3a647.csv #> Processing csv files: /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/bernoulli-202005121301-1-e3a647.csv, /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/bernoulli-202005121301-2-e3a647.csv #> #> Checking sampler transitions treedepth. #> Treedepth satisfactory for all transitions. #> #> Checking sampler transitions for divergences. #> No divergent transitions found. #> #> Checking E-BFMI - sampler transitions HMC potential energy. #> E-BFMI satisfactory for all transitions. #> #> Effective sample size satisfactory. #> #> Split R-hat values satisfactory all parameters. #> #> Processing complete, no problems detected.
fit_mcmc$cmdstan_summary()
#> Running bin/stansummary \ #> /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/bernoulli-202005121301-1-e3a647.csv \ #> /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/bernoulli-202005121301-2-e3a647.csv #> Inference for Stan model: bernoulli_model #> 2 chains: each with iter=(1000,1000); warmup=(0,0); thin=(1,1); 2000 iterations saved. #> #> Warmup took (0.0066, 0.0066) seconds, 0.013 seconds total #> Sampling took (0.014, 0.013) seconds, 0.027 seconds total #> #> Mean MCSE StdDev 5% 50% 95% N_Eff N_Eff/s R_hat #> lp__ -7.3 3.1e-02 0.74 -8.8 -7.0 -6.8 5.9e+02 2.1e+04 1.0e+00 #> accept_stat__ 0.92 5.0e-03 0.14 0.61 0.97 1.0 7.3e+02 2.7e+04 1.0e+00 #> stepsize__ 1.0 9.0e-02 0.090 0.93 1.1 1.1 1.0e+00 3.7e+01 2.6e+13 #> treedepth__ 1.4 1.2e-02 0.52 1.0 1.0 2.0 1.9e+03 6.8e+04 1.0e+00 #> n_leapfrog__ 2.6 4.0e-01 1.5 1.0 3.0 7.0 1.4e+01 5.1e+02 1.0e+00 #> divergent__ 0.00 nan 0.00 0.00 0.00 0.00 nan nan nan #> energy__ 7.8 4.0e-02 1.0 6.8 7.4 10.0 6.9e+02 2.5e+04 1.0e+00 #> theta 0.25 4.5e-03 0.12 0.081 0.23 0.49 7.6e+02 2.8e+04 1.0e+00 #> #> Samples were drawn using hmc with nuts. #> For each parameter, N_Eff is a crude measure of effective sample size, #> and R_hat is the potential scale reduction factor on split chains (at #> convergence, R_hat=1). #>
# For models fit using MCMC, if you like working with RStan's stanfit objects # then you can create one with rstan::read_stan_csv() if (require(rstan, quietly = TRUE)) { stanfit <- rstan::read_stan_csv(fit_mcmc$output_files()) print(stanfit) }
#> Inference for Stan model: bernoulli-202005121301-1-e3a647. #> 2 chains, each with iter=2000; warmup=1000; thin=1; #> post-warmup draws per chain=1000, total post-warmup draws=2000. #> #> mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat #> theta 0.25 0.00 0.12 0.06 0.16 0.23 0.33 0.53 754 1 #> lp__ -7.28 0.03 0.74 -9.50 -7.45 -7.00 -6.80 -6.75 586 1 #> #> Samples were drawn using NUTS(diag_e) at Tue May 12 13:01:47 2020. #> For each parameter, n_eff is a crude measure of effective sample size, #> and Rhat is the potential scale reduction factor on split chains (at #> convergence, Rhat=1).
# Run 'optimize' method to get a point estimate (default is Stan's LBFGS algorithm) # and also demonstrate specifying data as a path to a file instead of a list my_data_file <- file.path(cmdstan_path(), "examples/bernoulli/bernoulli.data.json") fit_optim <- mod$optimize(data = my_data_file, seed = 123)
#> method = optimize #> optimize #> algorithm = lbfgs (Default) #> lbfgs #> init_alpha = 0.001 (Default) #> tol_obj = 9.9999999999999998e-13 (Default) #> tol_rel_obj = 10000 (Default) #> tol_grad = 1e-08 (Default) #> tol_rel_grad = 10000000 (Default) #> tol_param = 1e-08 (Default) #> history_size = 5 (Default) #> iter = 2000 (Default) #> save_iterations = 0 (Default) #> id = 1 #> data #> file = /Users/jgabry/.cmdstanr/cmdstan-2.23.0/examples/bernoulli/bernoulli.data.json #> init = 2 (Default) #> random #> seed = 123 #> output #> file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/bernoulli-202005121301-1-b98ea5.csv #> diagnostic_file = (Default) #> refresh = 100 (Default) #> #> Initial log joint probability = -9.51104 #> Iter log prob ||dx|| ||grad|| alpha alpha0 # evals Notes #> 6 -5.00402 0.000103557 2.55661e-07 1 1 9 #> Optimization terminated normally: #> Convergence detected: relative gradient magnitude is below tolerance
#> Optimization method is experimental and the structure of returned object may change.
fit_optim$summary()
#> # A tibble: 2 x 2 #> variable estimate #> <chr> <dbl> #> 1 lp__ -5.00 #> 2 theta 0.2
# Run 'variational' to approximate the posterior (default is meanfield ADVI) fit_vb <- mod$variational(data = stan_data, seed = 123)
#> method = variational #> variational #> algorithm = meanfield (Default) #> meanfield #> iter = 10000 (Default) #> grad_samples = 1 (Default) #> elbo_samples = 100 (Default) #> eta = 1 (Default) #> adapt #> engaged = 1 (Default) #> iter = 50 (Default) #> tol_rel_obj = 0.01 (Default) #> eval_elbo = 100 (Default) #> output_samples = 1000 (Default) #> id = 1 #> data #> file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/standata-29292a4419fd.json #> init = 2 (Default) #> random #> seed = 123 #> output #> file = /var/folders/h6/14xy_35x4wd2tz542dn0qhtc0000gn/T/RtmpMwcfcG/bernoulli-202005121301-1-0646b8.csv #> diagnostic_file = (Default) #> refresh = 100 (Default) #> #> ------------------------------------------------------------ #> EXPERIMENTAL ALGORITHM: #> This procedure has not been thoroughly tested and may be unstable #> or buggy. The interface is subject to change. #> ------------------------------------------------------------ #> #> #> #> Gradient evaluation took 1.8e-05 seconds #> 1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds. #> Adjust your expectations accordingly! #> #> #> Begin eta adaptation. #> Iteration: 1 / 250 [ 0%] (Adaptation) #> Iteration: 50 / 250 [ 20%] (Adaptation) #> Iteration: 100 / 250 [ 40%] (Adaptation) #> Iteration: 150 / 250 [ 60%] (Adaptation) #> Iteration: 200 / 250 [ 80%] (Adaptation) #> Success! Found best value [eta = 1] earlier than expected. #> #> Begin stochastic gradient ascent. #> iter ELBO delta_ELBO_mean delta_ELBO_med notes #> 100 -6.262 1.000 1.000 #> 200 -6.263 0.500 1.000 #> 300 -6.307 0.336 0.007 MEDIAN ELBO CONVERGED #> #> Drawing a sample of size 1000 from the approximate posterior... #> COMPLETED.
#> Variational method is experimental and the structure of returned object may change.
fit_vb$summary()
#> # A tibble: 3 x 7 #> variable mean median sd mad q5 q95 #> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 lp__ -7.18 -6.94 0.588 0.259 -8.36 -6.75 #> 2 lp_approx__ -0.515 -0.221 0.692 0.303 -2.06 -0.00257 #> 3 theta 0.263 0.246 0.115 0.113 0.106 0.481
# }