read_cmdstan_csv() is used internally by CmdStanR to read CmdStan's output CSV files into R. It can also be used by CmdStan users as a more flexible and efficient alternative to rstan::read_stan_csv(). See the Value section for details on the structure of the returned list.

It is also possible to create CmdStanR's fitted model objects directly from CmdStan CSV files using the as_cmdstan_fit() function.

read_cmdstan_csv(
  files,
  variables = NULL,
  sampler_diagnostics = NULL,
  format = getOption("cmdstanr_draws_format", NULL)
)

as_cmdstan_fit(
  files,
  check_diagnostics = TRUE,
  format = getOption("cmdstanr_draws_format")
)

Arguments

files

(character vector) The paths to the CmdStan CSV files. These can be files generated by running CmdStanR or running CmdStan directly.

variables

(character vector) Optionally, the names of the variables (parameters, transformed parameters, and generated quantities) to read in.

  • If NULL (the default) then all variables are included.

  • If an empty string (variables = "") then none are included.

  • For non-scalar variables all elements or specific elements can be selected:

    • variables = "theta" selects all elements of theta;

    • variables = c("theta[1]", "theta[3]") selects only the 1st and 3rd elements.

sampler_diagnostics

(character vector) Works the same way as variables but for sampler diagnostic variables (e.g., "treedepth__", "accept_stat__", etc.). Ignored if the model was not fit using MCMC.

format

(string) The format for storing the draws or point estimates. The default depends on the method used to fit the model. See the $draws() method documentation for details, in particular the note about speed and memory for models with many parameters.
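
As a minimal sketch of the format argument (assuming cmdstanr and a working CmdStan installation; the object names x_df, x_mat and old are purely illustrative), the format can be requested per call or changed for the session through the cmdstanr_draws_format option shown in the usage above:

```r
library(cmdstanr)

# Fit the built-in logistic regression example to obtain some CSV files
fit <- cmdstanr_example("logistic", method = "sample")
csv_files <- fit$output_files()

# Request a draws_df for this call only
x_df <- read_cmdstan_csv(csv_files, variables = "beta", format = "draws_df")
class(x_df$post_warmup_draws)

# Or change the default for the session via the option
old <- options(cmdstanr_draws_format = "draws_matrix")
x_mat <- read_cmdstan_csv(csv_files, variables = "beta")
class(x_mat$post_warmup_draws)
options(old)  # restore the previous setting
```

Restoring the option afterwards keeps later calls in the same session on their method-specific defaults.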

check_diagnostics

(logical) For models fit using MCMC, should diagnostic checks be performed after reading in the files? The default is TRUE. Set it to FALSE to skip the checks for divergences and treedepth.
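
The following sketch shows one way to skip the automatic checks and inspect the same quantities by hand. It assumes cmdstanr with a working CmdStan installation and the default NUTS sampler; the names diag, n_divergent and max_depth are illustrative only.

```r
library(cmdstanr)

fit1 <- cmdstanr_example("logistic", method = "sample")

# Rebuild the fitted model object from the CSV files without running the checks
fit2 <- as_cmdstan_fit(fit1$output_files(), check_diagnostics = FALSE)

# Sampler diagnostics as a draws_array: iterations x chains x diagnostics
diag <- fit2$sampler_diagnostics()

# Count divergent transitions and find the largest tree depth reached
n_divergent <- sum(diag[, , "divergent__"])
max_depth <- max(diag[, , "treedepth__"])
n_divergent
max_depth
```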

Value

as_cmdstan_fit() returns a CmdStanMCMC, CmdStanMLE, or CmdStanVB object. Some methods typically defined for those objects will not work (e.g. $save_data_file()), but the important methods like $summary(), $draws(), $sampler_diagnostics() and others will work fine.

read_cmdstan_csv() returns a named list with the following components:

  • metadata: A list of the meta information from the run that produced the CSV file(s). See Examples below.

The other components in the returned list depend on the method that produced the CSV file(s).

For sampling the returned list also includes the following components:

  • time: Run time information for the individual chains. The returned object is the same as for the $time() method except the total run time can't be inferred from the CSV files (the chains may have been run in parallel) and is therefore NA.

  • inv_metric: A list (one element per chain) of inverse mass matrices or their diagonals, depending on the type of metric used.

  • step_size: A list (one element per chain) of the step sizes used.

  • warmup_draws: If save_warmup was TRUE when fitting the model, a draws_array (or different format if format is specified) of warmup draws.

  • post_warmup_draws: A draws_array (or different format if format is specified) of post-warmup draws.

  • warmup_sampler_diagnostics: If save_warmup was TRUE when fitting the model, a draws_array (or different format if format is specified) of warmup draws of the sampler diagnostic variables.

  • post_warmup_sampler_diagnostics: A draws_array (or different format if format is specified) of post-warmup draws of the sampler diagnostic variables.
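
The components listed above can be used directly with the posterior package. The sketch below assumes cmdstanr, posterior and a working CmdStan installation, and relies on cmdstanr_example() running four chains by default; it is an illustration rather than a recommended workflow.

```r
library(cmdstanr)

fit <- cmdstanr_example("logistic", method = "sample", save_warmup = TRUE)
x <- read_cmdstan_csv(fit$output_files(), variables = c("alpha", "beta"))

# Per-chain timing; the total is NA because it cannot be inferred from the CSVs
x$time$chains
x$time$total

# Adaptation results, one list element per chain
x$step_size
x$inv_metric[[1]]

# Draws are posterior objects, so posterior functions apply directly
posterior::summarise_draws(x$post_warmup_draws)
posterior::niterations(x$warmup_draws)
```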

For optimization the returned list also includes the following components:

  • point_estimates: Point estimates for the model parameters.

For variational inference the returned list also includes the following components:

  • draws: A draws_matrix (or different format if format is specified) of draws from the approximate posterior distribution.
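
For the optimization and variational cases described above, a short sketch (assuming cmdstanr with a working CmdStan installation, and using cmdstanr_example() only to produce CSV files; the names x_mle and x_vb are illustrative) might look like this:

```r
library(cmdstanr)

# Optimization: the returned list carries point_estimates
fit_mle <- cmdstanr_example("logistic", method = "optimize")
x_mle <- read_cmdstan_csv(fit_mle$output_files())
names(x_mle)
x_mle$point_estimates

# Variational inference: the returned list carries draws
fit_vb <- cmdstanr_example("logistic", method = "variational")
x_vb <- read_cmdstan_csv(fit_vb$output_files())
names(x_vb)
class(x_vb$draws)
```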

For standalone generated quantities the returned list also includes the following components:

  • generated_quantities: A draws_array of the generated quantities.

Examples

# \dontrun{
# Generate some CSV files to use for demonstration
fit1 <- cmdstanr_example("logistic", method = "sample", save_warmup = TRUE)
csv_files <- fit1$output_files()
print(csv_files)
#> [1] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/RtmpmzUYEz/logistic-202203181227-1-726257.csv"
#> [2] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/RtmpmzUYEz/logistic-202203181227-2-726257.csv"
#> [3] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/RtmpmzUYEz/logistic-202203181227-3-726257.csv"
#> [4] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/RtmpmzUYEz/logistic-202203181227-4-726257.csv"

# Creating fitted model objects

# Create a CmdStanMCMC object from the CSV files
fit2 <- as_cmdstan_fit(csv_files)
fit2$print("beta")
#>  variable  mean median   sd  mad    q5   q95 rhat ess_bulk ess_tail
#>   beta[1] -0.67  -0.66 0.24 0.24 -1.08 -0.28 1.00     4045     3169
#>   beta[2] -0.27  -0.27 0.22 0.22 -0.64  0.08 1.00     3894     2721
#>   beta[3]  0.69   0.68 0.26 0.26  0.27  1.13 1.00     3754     2922

# Using read_cmdstan_csv
#
# Read in everything
x <- read_cmdstan_csv(csv_files)
str(x)
#> List of 8
#> $ metadata :List of 40
#> ..$ stan_version_major : num 2
#> ..$ stan_version_minor : num 29
#> ..$ stan_version_patch : num 1
#> ..$ start_datetime : chr "2022-03-18 18:27:08 UTC"
#> ..$ method : chr "sample"
#> ..$ save_warmup : num 1
#> ..$ thin : num 1
#> ..$ gamma : num 0.05
#> ..$ kappa : num 0.75
#> ..$ t0 : num 10
#> ..$ init_buffer : num 75
#> ..$ term_buffer : num 50
#> ..$ window : num 25
#> ..$ algorithm : chr "hmc"
#> ..$ engine : chr "nuts"
#> ..$ metric : chr "diag_e"
#> ..$ stepsize_jitter : num 0
#> ..$ num_chains : num 1
#> ..$ id : num [1:4] 1 2 3 4
#> ..$ init : num [1:4] 2 2 2 2
#> ..$ seed : num 27467875
#> ..$ refresh : num 100
#> ..$ sig_figs : num -1
#> ..$ profile_file : chr "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/RtmpmzUYEz/logistic-profile-202203181227-1-051c26.csv"
#> ..$ stanc_version : chr "stanc3 v2.29.1"
#> ..$ sampler_diagnostics : chr [1:6] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__" ...
#> ..$ variables : chr [1:105] "lp__" "alpha" "beta[1]" "beta[2]" ...
#> ..$ step_size_adaptation: num [1:4] 0.729 0.767 0.747 0.752
#> ..$ model_name : chr "logistic_model"
#> ..$ adapt_engaged : num 1
#> ..$ adapt_delta : num 0.8
#> ..$ max_treedepth : num 10
#> ..$ step_size : num [1:4] 1 1 1 1
#> ..$ iter_warmup : num 1000
#> ..$ iter_sampling : num 1000
#> ..$ threads_per_chain : num 1
#> ..$ time :'data.frame': 4 obs. of 4 variables:
#> .. ..$ chain_id: num [1:4] 1 2 3 4
#> .. ..$ warmup : num [1:4] 0.093 0.087 0.149 0.092
#> .. ..$ sampling: num [1:4] 0.081 0.096 0.1 0.087
#> .. ..$ total : num [1:4] 0.174 0.183 0.249 0.179
#> ..$ stan_variable_sizes :List of 4
#> .. ..$ lp__ : num 1
#> .. ..$ alpha : num 1
#> .. ..$ beta : num 3
#> .. ..$ log_lik: num 100
#> ..$ stan_variables : chr [1:4] "lp__" "alpha" "beta" "log_lik"
#> ..$ model_params : chr [1:105] "lp__" "alpha" "beta[1]" "beta[2]" ...
#> $ time :List of 2
#> ..$ total : int NA
#> ..$ chains:'data.frame': 4 obs. of 4 variables:
#> .. ..$ chain_id: num [1:4] 1 2 3 4
#> .. ..$ warmup : num [1:4] 0.093 0.087 0.149 0.092
#> .. ..$ sampling: num [1:4] 0.081 0.096 0.1 0.087
#> .. ..$ total : num [1:4] 0.174 0.183 0.249 0.179
#> $ inv_metric :List of 4
#> ..$ 1: num [1:4] 0.046 0.0637 0.0532 0.0736
#> ..$ 2: num [1:4] 0.0421 0.0566 0.0523 0.0756
#> ..$ 3: num [1:4] 0.0493 0.0528 0.0523 0.0753
#> ..$ 4: num [1:4] 0.0365 0.0565 0.0397 0.0632
#> $ step_size :List of 4
#> ..$ 1: num 0.729
#> ..$ 2: num 0.767
#> ..$ 3: num 0.747
#> ..$ 4: num 0.752
#> $ warmup_draws : 'draws_array' num [1:1000, 1:4, 1:105] -66.8 -66.8 -66.8 -65.8 -66.3 ...
#> ..- attr(*, "dimnames")=List of 3
#> .. ..$ iteration: chr [1:1000] "1" "2" "3" "4" ...
#> .. ..$ chain : chr [1:4] "1" "2" "3" "4"
#> .. ..$ variable : chr [1:105] "lp__" "alpha" "beta[1]" "beta[2]" ...
#> $ post_warmup_draws : 'draws_array' num [1:1000, 1:4, 1:105] -65 -65.7 -64.4 -64.2 -65.3 ...
#> ..- attr(*, "dimnames")=List of 3
#> .. ..$ iteration: chr [1:1000] "1" "2" "3" "4" ...
#> .. ..$ chain : chr [1:4] "1" "2" "3" "4"
#> .. ..$ variable : chr [1:105] "lp__" "alpha" "beta[1]" "beta[2]" ...
#> $ warmup_sampler_diagnostics : 'draws_array' num [1:1000, 1:4, 1:6] 1 0 0 0.941 0.933 ...
#> ..- attr(*, "dimnames")=List of 3
#> .. ..$ iteration: chr [1:1000] "1" "2" "3" "4" ...
#> .. ..$ chain : chr [1:4] "1" "2" "3" "4"
#> .. ..$ variable : chr [1:6] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__" ...
#> $ post_warmup_sampler_diagnostics: 'draws_array' num [1:1000, 1:4, 1:6] 1 0.916 0.998 0.972 0.911 ...
#> ..- attr(*, "dimnames")=List of 3
#> .. ..$ iteration: chr [1:1000] "1" "2" "3" "4" ...
#> .. ..$ chain : chr [1:4] "1" "2" "3" "4"
#> .. ..$ variable : chr [1:6] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__" ...

# Don't read in any of the sampler diagnostic variables
x <- read_cmdstan_csv(csv_files, sampler_diagnostics = "")

# Don't read in any of the parameters or generated quantities
x <- read_cmdstan_csv(csv_files, variables = "")

# Read in only specific parameters and sampler diagnostics
x <- read_cmdstan_csv(
  csv_files,
  variables = c("alpha", "beta[2]"),
  sampler_diagnostics = c("n_leapfrog__", "accept_stat__")
)

# For non-scalar parameters all elements can be selected or only some elements,
# e.g. all of the vector "beta" but only one element of the vector "log_lik"
x <- read_cmdstan_csv(
  csv_files,
  variables = c("beta", "log_lik[3]")
)
# }