vignettes/cmdstanr-internals.Rmd
cmdstanr-internals.Rmd
This vignette is intended to be read after the Getting started with CmdStanR vignette. Please read that first for important background. In this document we provide additional details about compiling models, passing in data, and how CmdStan output is saved and read back into R.
We will only use the $sample()
method in examples, but
all model fitting methods work in a similar way under the hood.
library(cmdstanr)
check_cmdstan_toolchain(fix = TRUE, quiet = TRUE)
The cmdstan_model()
function creates a new
CmdStanModel
object. The CmdStanModel
object
stores the path to a Stan program as well as the path to a compiled
executable.
stan_file <- file.path(cmdstan_path(), "examples", "bernoulli", "bernoulli.stan")
mod <- cmdstan_model(stan_file)
mod$print()
data {
int<lower=0> N;
array[N] int<lower=0, upper=1> y;
}
parameters {
real<lower=0, upper=1> theta;
}
model {
theta ~ beta(1, 1); // uniform prior on interval 0,1
y ~ bernoulli(theta);
}
mod$stan_file()
[1] "/Users/jgabry/.cmdstan/cmdstan-2.35.0/examples/bernoulli/bernoulli.stan"
mod$exe_file()
[1] "/Users/jgabry/.cmdstan/cmdstan-2.35.0/examples/bernoulli/bernoulli"
Subsequently, if you create a CmdStanModel
object from
the same Stan file then compilation will be skipped (assuming the file
hasn’t changed).
mod <- cmdstan_model(stan_file)
Internally, cmdstan_model()
first creates the
CmdStanModel
object from just the Stan file and then calls
its $compile()
method. Optional arguments to the $compile()
method can be
passed via ...
.
mod <- cmdstan_model(
stan_file,
force_recompile = TRUE,
include_paths = "paths/to/directories/with/included/files",
cpp_options = list(stan_threads = TRUE, STANC2 = TRUE)
)
It is also possible to delay compilation when creating the
CmdStanModel
object by specifying
compile=FALSE
and then later calling the
$compile()
method directly.
unlink(mod$exe_file())
mod <- cmdstan_model(stan_file, compile = FALSE)
mod$exe_file() # not yet created
character(0)
mod$compile()
mod$exe_file()
[1] "/Users/jgabry/.cmdstan/cmdstan-2.35.0/examples/bernoulli/bernoulli"
If you are using CmdStan version 2.24 or later and CmdStanR version 0.2.1 or later, you can run a pedantic check for your model. CmdStanR will always check that your Stan program does not contain any invalid syntax but with pedantic mode enabled the check will also warn you about other potential issues in your model, for example:
~
symbols).For the latest information on the checks performed in pedantic mode see the Pedantic mode chapter in the Stan Reference Manual.
Pedantic mode is available when compiling the model or when using the
separate $check_syntax()
method of a
CmdStanModel
object. Internally this corresponds to setting
the stanc
(Stan transpiler) option
warn-pedantic
. Here we demonstrate pedantic mode with a
Stan program that is syntactically correct but is missing a lower bound
and a prior for a parameter.
stan_file_pedantic <- write_stan_file("
data {
int N;
array[N] int y;
}
parameters {
// should have <lower=0> but omitting to demonstrate pedantic mode
real lambda;
}
model {
y ~ poisson(lambda);
}
")
To turn on pedantic mode at compile time you can set
pedantic=TRUE
in the call to cmdstan_model()
(or when calling the $compile()
method directly if using
the delayed compilation approach described above).
mod_pedantic <- cmdstan_model(stan_file_pedantic, pedantic = TRUE)
Warning in '/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model-9a1418059d6f.stan', line 11, column 14: A
poisson distribution is given parameter lambda as a rate parameter
(argument 1), but lambda was not constrained to be strictly positive.
Warning: The parameter lambda has no priors. This means either no prior is
provided, or the prior(s) depend on data variables. In the later case,
this may be a false positive.
To turn on pedantic mode separately from compilation use the
pedantic
argument to the $check_syntax()
method.
mod_pedantic$check_syntax(pedantic = TRUE)
Warning in '/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model_febb1e69c7387a0e64cf13583e078104.stan', line 11, column 14: A
poisson distribution is given parameter lambda as a rate parameter
(argument 1), but lambda was not constrained to be strictly positive.
Warning: The parameter lambda has no priors. This means either no prior is
provided, or the prior(s) depend on data variables. In the later case,
this may be a false positive.
Stan program is syntactically correct
Using pedantic=TRUE
via the $check_syntax()
method also has the advantage that it can be used even if the model
hasn’t been compiled yet. This can be helpful because the pedantic and
syntax checks themselves are much faster than compilation.
rm(mod_pedantic)
mod_pedantic <- cmdstan_model(stan_file_pedantic, compile = FALSE)
mod_pedantic$check_syntax(pedantic = TRUE)
Warning in '/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model_febb1e69c7387a0e64cf13583e078104.stan', line 11, column 14: A
poisson distribution is given parameter lambda as a rate parameter
(argument 1), but lambda was not constrained to be strictly positive.
Warning: The parameter lambda has no priors. This means either no prior is
provided, or the prior(s) depend on data variables. In the later case,
this may be a false positive.
Stan program is syntactically correct
If using CmdStan 2.27 or newer, you can obtain the names, types and
dimensions of the data, parameters, transformed parameters and generated
quantities variables of a Stan model using the $variables()
method of the CmdStanModel
object.
stan_file_variables <- write_stan_file("
data {
int<lower=1> J;
vector<lower=0>[J] sigma;
vector[J] y;
}
parameters {
real mu;
real<lower=0> tau;
vector[J] theta_raw;
}
transformed parameters {
vector[J] theta = mu + tau * theta_raw;
}
model {
target += normal_lpdf(tau | 0, 10);
target += normal_lpdf(mu | 0, 10);
target += normal_lpdf(theta_raw | 0, 1);
target += normal_lpdf(y | theta, sigma);
}
")
mod_v <- cmdstan_model(stan_file_variables)
variables <- mod_v$variables()
The $variables()
method returns a list with
data
, parameters
,
transformed_parameters
and
generated_quantities
elements, each corresponding to
variables in their respective block of the program. Transformed data
variables are not listed as they are not used in the model’s input or
output.
names(variables)
[1] "parameters" "included_files" "data"
[4] "transformed_parameters" "generated_quantities"
names(variables$data)
[1] "J" "sigma" "y"
names(variables$parameters)
[1] "mu" "tau" "theta_raw"
names(variables$transformed_parameters)
[1] "theta"
names(variables$generated_quantities)
character(0)
Each variable is represented as a list containing the type
information (currently limited to real
or int
)
and the number of dimensions.
variables$data$J
$type
[1] "int"
$dimensions
[1] 0
variables$data$sigma
$type
[1] "real"
$dimensions
[1] 1
variables$parameters$tau
$type
[1] "real"
$dimensions
[1] 0
variables$transformed_parameters$theta
$type
[1] "real"
$dimensions
[1] 1
By default, the executable is created in the same directory as the
file containing the Stan program. You can also specify a different
location with the dir
argument.
mod <- cmdstan_model(stan_file, dir = "path/to/directory/for/executable")
There are three data formats that CmdStanR allows when fitting a model:
Like the RStan interface, CmdStanR accepts a named list of R objects
where the names correspond to variables declared in the data block of
the Stan program. In the Bernoulli model the data is N
, the
number of data points, and y
an integer array of
observations.
mod$print()
data {
int<lower=0> N;
array[N] int<lower=0, upper=1> y;
}
parameters {
real<lower=0, upper=1> theta;
}
model {
theta ~ beta(1, 1); // uniform prior on interval 0,1
y ~ bernoulli(theta);
}
# data block has 'N' and 'y'
data_list <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1))
fit <- mod$sample(data = data_list)
Because CmdStan doesn’t accept lists of R objects, CmdStanR will
first write the data to a temporary JSON file using
write_stan_json()
. This happens internally, but it is also
possible to call write_stan_json()
directly.
data_list <- list(N = 10, y = c(0,1,0,0,0,0,0,0,0,1))
json_file <- tempfile(fileext = ".json")
write_stan_json(data_list, json_file)
cat(readLines(json_file), sep = "\n")
{
"N": 10,
"y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
}
If you already have your data in a JSON file you can just pass that
file directly to CmdStanR instead of using a list of R objects. For
example, we could pass in the JSON file we created above using
write_stan_json()
:
fit <- mod$sample(data = json_file)
Finally, it is also possible to use the R dump file format. This is
not recommended because CmdStan can process JSON faster than R
dump, but CmdStanR allows it because CmdStan will accept files created
by rstan::stan_rdump()
:
When fitting a model, the default behavior is to write the output from CmdStan to CSV files in a temporary directory.
fit$output_files()
[1] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-202407021546-1-2808db.csv"
[2] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-202407021546-2-2808db.csv"
[3] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-202407021546-3-2808db.csv"
[4] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-202407021546-4-2808db.csv"
These files will be lost if you end your R session or if you remove
the fit
object and force (or wait for) garbage
collection.
files <- fit$output_files()
file.exists(files)
[1] TRUE TRUE TRUE TRUE
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 1155803 61.8 2350794 125.6 NA 1542121 82.4
Vcells 2048173 15.7 8388608 64.0 32768 2851724 21.8
file.exists(files)
[1] FALSE FALSE FALSE FALSE
To save these files to a non-temporary location there are two
options. You can either specify the output_dir
argument to
mod$sample()
or use fit$save_output_files()
after fitting the model.
# see ?save_output_files for info on optional arguments
fit$save_output_files(dir = "path/to/directory")
fit <- mod$sample(
data = data_list,
output_dir = "path/to/directory"
)
With the exception of some diagnostic information, the CSV files are
not read into R until their contents are requested by calling a method
that requires them (e.g., fit$draws()
,
fit$summary()
, etc.). If we examine the structure of the
fit
object, notice how the Private
slot
draws_
is NULL
, indicating that the CSV files
haven’t yet been read into R.
str(fit)
Classes 'CmdStanMCMC', 'CmdStanFit', 'R6' <CmdStanMCMC>
Inherits from: <CmdStanFit>
Public:
clone: function (deep = FALSE)
cmdstan_diagnose: function ()
cmdstan_summary: function (flags = NULL)
code: function ()
config_files: function (include_failed = FALSE)
constrain_variables: function (unconstrained_variables, transformed_parameters = TRUE,
data_file: function ()
diagnostic_summary: function (diagnostics = c("divergences", "treedepth", "ebfmi"),
draws: function (variables = NULL, inc_warmup = FALSE, format = getOption("cmdstanr_draws_format",
expose_functions: function (global = FALSE, verbose = FALSE)
functions: environment
grad_log_prob: function (unconstrained_variables, jacobian = TRUE, jacobian_adjustment = NULL)
hessian: function (unconstrained_variables, jacobian = TRUE, jacobian_adjustment = NULL)
init: function ()
init_model_methods: function (seed = 1, verbose = FALSE, hessian = FALSE)
initialize: function (runset)
inv_metric: function (matrix = TRUE)
latent_dynamics_files: function (include_failed = FALSE)
log_prob: function (unconstrained_variables, jacobian = TRUE, jacobian_adjustment = NULL)
loo: function (variables = "log_lik", r_eff = TRUE, moment_match = FALSE,
lp: function ()
metadata: function ()
metric_files: function (include_failed = FALSE)
num_chains: function ()
num_procs: function ()
output: function (id = NULL)
output_files: function (include_failed = FALSE)
print: function (variables = NULL, ..., digits = 2, max_rows = getOption("cmdstanr_max_rows",
profile_files: function (include_failed = FALSE)
profiles: function ()
return_codes: function ()
runset: CmdStanRun, R6
sampler_diagnostics: function (inc_warmup = FALSE, format = getOption("cmdstanr_draws_format",
save_config_files: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
save_data_file: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
save_latent_dynamics_files: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
save_metric_files: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
save_object: function (file, ...)
save_output_files: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
save_profile_files: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
summary: function (variables = NULL, ...)
time: function ()
unconstrain_draws: function (files = NULL, draws = NULL, format = getOption("cmdstanr_draws_format",
unconstrain_variables: function (variables)
variable_skeleton: function (transformed_parameters = TRUE, generated_quantities = TRUE)
Private:
draws_: NULL
init_: NULL
inv_metric_: list
metadata_: list
model_methods_env_: environment
profiles_: NULL
read_csv_: function (variables = NULL, sampler_diagnostics = NULL, format = getOption("cmdstanr_draws_format",
return_codes_: 0 0 0 0
sampler_diagnostics_: 1 1 2 1 1 2 1 2 2 2 2 1 2 1 1 1 2 2 1 1 1 2 1 2 1 2 2 1 ...
warmup_draws_: NULL
warmup_sampler_diagnostics_: NULL
After we call a method that requires the draws then if we reexamine
the structure of the object we will see that the draws_
slot in Private
is no longer empty.
draws <- fit$draws() # force CSVs to be read into R
str(fit)
Classes 'CmdStanMCMC', 'CmdStanFit', 'R6' <CmdStanMCMC>
Inherits from: <CmdStanFit>
Public:
clone: function (deep = FALSE)
cmdstan_diagnose: function ()
cmdstan_summary: function (flags = NULL)
code: function ()
config_files: function (include_failed = FALSE)
constrain_variables: function (unconstrained_variables, transformed_parameters = TRUE,
data_file: function ()
diagnostic_summary: function (diagnostics = c("divergences", "treedepth", "ebfmi"),
draws: function (variables = NULL, inc_warmup = FALSE, format = getOption("cmdstanr_draws_format",
expose_functions: function (global = FALSE, verbose = FALSE)
functions: environment
grad_log_prob: function (unconstrained_variables, jacobian = TRUE, jacobian_adjustment = NULL)
hessian: function (unconstrained_variables, jacobian = TRUE, jacobian_adjustment = NULL)
init: function ()
init_model_methods: function (seed = 1, verbose = FALSE, hessian = FALSE)
initialize: function (runset)
inv_metric: function (matrix = TRUE)
latent_dynamics_files: function (include_failed = FALSE)
log_prob: function (unconstrained_variables, jacobian = TRUE, jacobian_adjustment = NULL)
loo: function (variables = "log_lik", r_eff = TRUE, moment_match = FALSE,
lp: function ()
metadata: function ()
metric_files: function (include_failed = FALSE)
num_chains: function ()
num_procs: function ()
output: function (id = NULL)
output_files: function (include_failed = FALSE)
print: function (variables = NULL, ..., digits = 2, max_rows = getOption("cmdstanr_max_rows",
profile_files: function (include_failed = FALSE)
profiles: function ()
return_codes: function ()
runset: CmdStanRun, R6
sampler_diagnostics: function (inc_warmup = FALSE, format = getOption("cmdstanr_draws_format",
save_config_files: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
save_data_file: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
save_latent_dynamics_files: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
save_metric_files: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
save_object: function (file, ...)
save_output_files: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
save_profile_files: function (dir = ".", basename = NULL, timestamp = TRUE, random = TRUE)
summary: function (variables = NULL, ...)
time: function ()
unconstrain_draws: function (files = NULL, draws = NULL, format = getOption("cmdstanr_draws_format",
unconstrain_variables: function (variables)
variable_skeleton: function (transformed_parameters = TRUE, generated_quantities = TRUE)
Private:
draws_: -7.37069 -7.97509 -8.91967 -8.36551 -6.96118 -10.4896 -6 ...
init_: NULL
inv_metric_: list
metadata_: list
model_methods_env_: environment
profiles_: NULL
read_csv_: function (variables = NULL, sampler_diagnostics = NULL, format = getOption("cmdstanr_draws_format",
return_codes_: 0 0 0 0
sampler_diagnostics_: 1 1 2 1 1 2 1 2 2 2 2 1 2 1 1 1 2 2 1 1 1 2 1 2 1 2 2 1 ...
warmup_draws_: NULL
warmup_sampler_diagnostics_: NULL
For models with many parameters, transformed parameters, or generated
quantities, if only some are requested (e.g., by specifying the
variables
argument to fit$draws()
) then
CmdStanR will only read in the requested variables (unless they have
already been read in).
Internally, the read_cmdstan_csv()
function is used to
read the CmdStan CSV files into R. This function is exposed to users, so
you can also call it directly.
# see ?read_cmdstan_csv for info on optional arguments controlling
# what information is read in
csv_contents <- read_cmdstan_csv(fit$output_files())
str(csv_contents)
List of 8
$ metadata :List of 42
..$ stan_version_major : num 2
..$ stan_version_minor : num 35
..$ stan_version_patch : num 0
..$ start_datetime : chr "2024-07-02 21:46:33 UTC"
..$ method : chr "sample"
..$ save_warmup : int 0
..$ thin : num 1
..$ gamma : num 0.05
..$ kappa : num 0.75
..$ t0 : num 10
..$ init_buffer : num 75
..$ term_buffer : num 50
..$ window : num 25
..$ save_metric : int 0
..$ algorithm : chr "hmc"
..$ engine : chr "nuts"
..$ metric : chr "diag_e"
..$ stepsize_jitter : num 0
..$ num_chains : num 1
..$ id : num [1:4] 1 2 3 4
..$ init : num [1:4] 2 2 2 2
..$ seed : num 1.84e+09
..$ refresh : num 100
..$ sig_figs : num -1
..$ profile_file : chr "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-profile-202407021546-1-27d079.csv"
..$ save_cmdstan_config : int 0
..$ stanc_version : chr "stanc3 v2.35.0"
..$ sampler_diagnostics : chr [1:6] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__" ...
..$ variables : chr [1:2] "lp__" "theta"
..$ step_size_adaptation: num [1:4] 1.068 0.891 1.039 1.129
..$ model_name : chr "bernoulli_model"
..$ adapt_engaged : int 1
..$ adapt_delta : num 0.8
..$ max_treedepth : num 10
..$ step_size : num [1:4] 1 1 1 1
..$ iter_warmup : num 1000
..$ iter_sampling : num 1000
..$ threads_per_chain : num 1
..$ time :'data.frame': 4 obs. of 4 variables:
.. ..$ chain_id: num [1:4] 1 2 3 4
.. ..$ warmup : num [1:4] 0.006 0.006 0.006 0.007
.. ..$ sampling: num [1:4] 0.018 0.018 0.017 0.017
.. ..$ total : num [1:4] 0.024 0.024 0.023 0.024
..$ stan_variable_sizes :List of 2
.. ..$ lp__ : num 1
.. ..$ theta: num 1
..$ stan_variables : chr [1:2] "lp__" "theta"
..$ model_params : chr [1:2] "lp__" "theta"
$ time :List of 2
..$ total : int NA
..$ chains:'data.frame': 4 obs. of 4 variables:
.. ..$ chain_id: num [1:4] 1 2 3 4
.. ..$ warmup : num [1:4] 0.006 0.006 0.006 0.007
.. ..$ sampling: num [1:4] 0.018 0.018 0.017 0.017
.. ..$ total : num [1:4] 0.024 0.024 0.023 0.024
$ inv_metric :List of 4
..$ 1: num 0.42
..$ 2: num 0.491
..$ 3: num 0.51
..$ 4: num 0.49
$ step_size :List of 4
..$ 1: num 1.07
..$ 2: num 0.891
..$ 3: num 1.04
..$ 4: num 1.13
$ warmup_draws : NULL
$ post_warmup_draws : 'draws_array' num [1:1000, 1:4, 1:2] -7.37 -7.98 -8.92 -8.37 -6.96 ...
..- attr(*, "dimnames")=List of 3
.. ..$ iteration: chr [1:1000] "1" "2" "3" "4" ...
.. ..$ chain : chr [1:4] "1" "2" "3" "4"
.. ..$ variable : chr [1:2] "lp__" "theta"
$ warmup_sampler_diagnostics : NULL
$ post_warmup_sampler_diagnostics: 'draws_array' num [1:1000, 1:4, 1:6] 0.819 0.915 0.905 1 0.991 ...
..- attr(*, "dimnames")=List of 3
.. ..$ iteration: chr [1:1000] "1" "2" "3" "4" ...
.. ..$ chain : chr [1:4] "1" "2" "3" "4"
.. ..$ variable : chr [1:6] "accept_stat__" "stepsize__" "treedepth__" "n_leapfrog__" ...
If you need to manually create fitted model objects from CmdStan CSV
files use as_cmdstan_fit()
.
fit2 <- as_cmdstan_fit(fit$output_files())
This is pointless in our case since we have the original
fit
object, but this function can be used to create fitted
model objects (CmdStanMCMC
, CmdStanMLE
, etc.)
from any CmdStan CSV files.
If save_latent_dynamics
is set to TRUE
when
running the $sample()
method then additional CSV files are
created (one per chain) that provide access to quantities used under the
hood by Stan’s implementation of dynamic Hamiltonian Monte Carlo.
CmdStanR does not yet provide a special method for processing these files but they can be read into R using R’s standard CSV reading functions.
fit <- mod$sample(data = data_list, save_latent_dynamics = TRUE)
fit$latent_dynamics_files()
[1] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-diagnostic-202407021546-1-54a823.csv"
[2] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-diagnostic-202407021546-2-54a823.csv"
[3] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-diagnostic-202407021546-3-54a823.csv"
[4] "/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-diagnostic-202407021546-4-54a823.csv"
# read one of the files in
x <- utils::read.csv(fit$latent_dynamics_files()[1], comment.char = "#")
head(x)
lp__ accept_stat__ stepsize__ treedepth__ n_leapfrog__ divergent__
1 -6.77025 0.987691 0.980824 2 3 0
2 -7.20991 0.932461 0.980824 2 3 0
3 -7.66830 0.933065 0.980824 1 1 0
4 -7.37078 1.000000 0.980824 1 1 0
5 -7.59961 0.948892 0.980824 2 3 0
6 -7.95801 0.940559 0.980824 2 3 0
energy__ theta p_theta g_theta
1 7.08007 -0.959636 -1.160790 0.323414
2 7.21726 -1.779880 0.178790 -1.268180
3 7.67261 -2.089350 -0.136971 -1.678370
4 7.66302 -1.898700 -1.127370 -1.436940
5 8.20617 -2.047820 -1.624180 -1.628720
6 8.67205 -2.252980 1.762220 -1.858880
The column lp__
is also provided via
fit$draws()
, and the columns accept_stat__
,
stepsize__
, treedepth__
,
n_leapfrog__
, divergent__
, and
energy__
are also provided by
fit$sampler_diagnostics()
, but there are several columns
unique to the latent dynamics file.
theta p_theta g_theta
1 -0.959636 -1.160790 0.323414
2 -1.779880 0.178790 -1.268180
3 -2.089350 -0.136971 -1.678370
4 -1.898700 -1.127370 -1.436940
5 -2.047820 -1.624180 -1.628720
6 -2.252980 1.762220 -1.858880
Our model has a single parameter theta
and the three
columns above correspond to theta
in the
unconstrained space (theta
on the constrained
space is accessed via fit$draws()
), the auxiliary momentum
p_theta
, and the gradient g_theta
. In general,
each of these three columns will exist for every parameter in
the model.
CmdStanR can of course be used for developing other packages that require compiling and running Stan models as well as using new or custom Stan features available through CmdStan.
You may compile a Stan model at runtime (e.g. just before sampling), or you may compile all the models inside the package file system in advance at installation time. The latter avoids compilations at runtime, which matters in centrally managed R installations where users should not compile their own software.
To pre-compile all the models in a package, you may create top-level
scripts configure
and configure.win
which run
cmdstan_model()
with compile = TRUE
and save
the compiled executables somewhere inside the inst/
folder
of the package source. The instantiate
package helps developers configure packages this way, and it documents
other topics such as submitting to CRAN and administering CmdStan. Kevin
Ushey’s configure
package helps create and manage package configuration files in
general.
When developing or testing new features it might be useful to have
more information on how CmdStan is called internally and to see more
information printed when compiling or running models. This can be
enabled for an entire R session by setting the option
"cmdstanr_verbose"
to TRUE
.
options("cmdstanr_verbose"=TRUE)
mod <- cmdstan_model(stan_file, force_recompile = TRUE)
Running make \
/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model-9a14560da326 \
"STANCFLAGS += --name='bernoulli_model'"
--- Translating Stan model to C++ code ---
bin/stanc --name='bernoulli_model' --o=/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model-9a14560da326.hpp /var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model-9a14560da326.stan
--- Compiling C++ code ---
clang++ -Wno-deprecated-declarations -Wno-deprecated-declarations -std=c++17 -Wno-unknown-warning-option -Wno-tautological-compare -Wno-sign-compare -D_REENTRANT -Wno-ignored-attributes -I stan/lib/stan_math/lib/tbb_2020.3/include -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.4.0 -I stan/lib/stan_math/lib/boost_1.84.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials -DBOOST_DISABLE_ASSERTS -c -include-pch stan/src/stan/model/model_header.hpp.gch/model_header_15_0.hpp.gch -x c++ -o /var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model-9a14560da326.o /var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model-9a14560da326.hpp
--- Linking model ---
clang++ -Wno-deprecated-declarations -Wno-deprecated-declarations -std=c++17 -Wno-unknown-warning-option -Wno-tautological-compare -Wno-sign-compare -D_REENTRANT -Wno-ignored-attributes -I stan/lib/stan_math/lib/tbb_2020.3/include -O3 -I src -I stan/src -I stan/lib/rapidjson_1.1.0/ -I lib/CLI11-1.9.1/ -I stan/lib/stan_math/ -I stan/lib/stan_math/lib/eigen_3.4.0 -I stan/lib/stan_math/lib/boost_1.84.0 -I stan/lib/stan_math/lib/sundials_6.1.1/include -I stan/lib/stan_math/lib/sundials_6.1.1/src/sundials -DBOOST_DISABLE_ASSERTS -Wl,-L,"/Users/jgabry/.cmdstan/cmdstan-2.35.0/stan/lib/stan_math/lib/tbb" -Wl,-rpath,"/Users/jgabry/.cmdstan/cmdstan-2.35.0/stan/lib/stan_math/lib/tbb" /var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model-9a14560da326.o src/cmdstan/main.o -ltbb stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_nvecserial.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_cvodes.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_idas.a stan/lib/stan_math/lib/sundials_6.1.1/lib/libsundials_kinsol.a stan/lib/stan_math/lib/tbb/libtbb.dylib stan/lib/stan_math/lib/tbb/libtbbmalloc.dylib stan/lib/stan_math/lib/tbb/libtbbmalloc_proxy.dylib -o /var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model-9a14560da326
rm /var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model-9a14560da326.hpp /var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/model-9a14560da326.o
fit <- mod$sample(
data = data_list,
chains = 1,
iter_warmup = 100,
iter_sampling = 100
)
Running MCMC with 1 chain...
Running ./bernoulli 'id=1' random 'seed=707272264' data \
'file=/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/standata-9a144d3dc244.json' \
output \
'file=/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-202407021546-1-38ac27.csv' \
'profile_file=/var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-profile-202407021546-1-910909.csv' \
'method=sample' 'num_samples=100' 'num_warmup=100' 'save_warmup=0' \
'algorithm=hmc' 'engine=nuts' adapt 'engaged=1'
Chain 1 method = sample (Default)
Chain 1 sample
Chain 1 num_samples = 100
Chain 1 num_warmup = 100
Chain 1 save_warmup = false (Default)
Chain 1 thin = 1 (Default)
Chain 1 adapt
Chain 1 engaged = true (Default)
Chain 1 gamma = 0.05 (Default)
Chain 1 delta = 0.8 (Default)
Chain 1 kappa = 0.75 (Default)
Chain 1 t0 = 10 (Default)
Chain 1 init_buffer = 75 (Default)
Chain 1 term_buffer = 50 (Default)
Chain 1 window = 25 (Default)
Chain 1 save_metric = false (Default)
Chain 1 algorithm = hmc (Default)
Chain 1 hmc
Chain 1 engine = nuts (Default)
Chain 1 nuts
Chain 1 max_depth = 10 (Default)
Chain 1 metric = diag_e (Default)
Chain 1 metric_file = (Default)
Chain 1 stepsize = 1 (Default)
Chain 1 stepsize_jitter = 0 (Default)
Chain 1 num_chains = 1 (Default)
Chain 1 id = 1 (Default)
Chain 1 data
Chain 1 file = /var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/standata-9a144d3dc244.json
Chain 1 init = 2 (Default)
Chain 1 random
Chain 1 seed = 707272264
Chain 1 output
Chain 1 file = /var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-202407021546-1-38ac27.csv
Chain 1 diagnostic_file = (Default)
Chain 1 refresh = 100 (Default)
Chain 1 sig_figs = -1 (Default)
Chain 1 profile_file = /var/folders/s0/zfzm55px2nd2v__zlw5xfj2h0000gn/T/Rtmp2hQnhs/bernoulli-profile-202407021546-1-910909.csv
Chain 1 save_cmdstan_config = false (Default)
Chain 1 num_threads = 1 (Default)
Chain 1 Gradient evaluation took 1.8e-05 seconds
Chain 1 1000 transitions using 10 leapfrog steps per transition would take 0.18 seconds.
Chain 1 Adjust your expectations accordingly!
Chain 1 WARNING: There aren't enough warmup iterations to fit the
Chain 1 three stages of adaptation as currently configured.
Chain 1 Reducing each adaptation stage to 15%/75%/10% of
Chain 1 the given number of warmup iterations:
Chain 1 init_buffer = 15
Chain 1 adapt_window = 75
Chain 1 term_buffer = 10
Chain 1 Iteration: 1 / 200 [ 0%] (Warmup)
Chain 1 Iteration: 100 / 200 [ 50%] (Warmup)
Chain 1 Iteration: 101 / 200 [ 50%] (Sampling)
Chain 1 Iteration: 200 / 200 [100%] (Sampling)
Chain 1 Elapsed Time: 0.001 seconds (Warm-up)
Chain 1 0.001 seconds (Sampling)
Chain 1 0.002 seconds (Total)
Chain 1 finished in 0.0 seconds.