API Reference
The following documents the public API of CmdStanPy. It is expected to be stable between versions, with backwards compatibility between minor versions and deprecation warnings preceding breaking changes. The documentation for the internal API is also provided, but the internal API guarantees neither stability nor backwards compatibility.
Classes
CmdStanModel
A CmdStanModel object encapsulates the Stan program. It manages program compilation and provides the following inference methods:
sample()
runs the HMC-NUTS sampler to produce a set of draws from the posterior distribution.
optimize()
produces a penalized maximum likelihood estimate or maximum a posteriori estimate (point estimate) of the model parameters.
laplace_sample()
draws from a Laplace approximation centered at the posterior mode found by optimize.
pathfinder()
runs the Pathfinder variational inference algorithm to obtain approximate draws from the posterior.
variational()
runs CmdStan’s automatic differentiation variational inference (ADVI) algorithm to approximate the posterior distribution.
generate_quantities()
runs CmdStan’s generate_quantities method to produce additional quantities of interest based on draws from an existing sample.
- class cmdstanpy.CmdStanModel(model_name=None, stan_file=None, exe_file=None, force_compile=False, stanc_options=None, cpp_options=None, user_header=None, *, compile=None)
The constructor method allows model instantiation given either the Stan program source file or the compiled executable, or both. This will compile the model if provided a Stan file and no executable.
- Parameters:
model_name (Optional[str]) – Model name, used for output file names. Optional, default is the base filename of the Stan program file. Deprecated: In version 2.0.0, model name cannot be specified and will always be taken from executable.
stan_file (Optional[Union[str, PathLike]]) – Path to Stan program file.
exe_file (Optional[Union[str, PathLike]]) – Path to compiled executable file. Optional, unless no Stan program file is specified. If both the program file and the compiled executable file are specified, the base filenames must match (but different directory locations are allowed).
force_compile (bool) – If True, always compile, even if there is an existing executable file for this model.
stanc_options (Optional[Dict[str, Any]]) – Options for stanc compiler, specified as a Python dictionary containing Stanc3 compiler option name, value pairs. Optional.
cpp_options (Optional[Dict[str, Any]]) – Options for C++ compiler, specified as a Python dictionary containing C++ compiler option name, value pairs. Optional.
user_header (Optional[Union[str, PathLike]]) – A path to a header file to include during C++ compilation. Optional.
compile (Optional[Union[bool, Literal['force']]]) – Whether or not to compile the model. Default is True. If set to the string "force", it will always compile even if an existing executable is found. Deprecated: Use force_compile instead. The ability to instantiate a CmdStanModel without an executable will be removed in version 2.0.0.
- compile(force=False, stanc_options=None, cpp_options=None, user_header=None, override_options=False, *, _internal=False)
Deprecated: To compile a model, use the CmdStanModel constructor or cmdstanpy.compile_stan_file().
Compile the given Stan program file. Translates the Stan code to C++, then calls the C++ compiler.
By default, this function compares the timestamps on the source and executable files; if the executable is newer than the source file, it will not recompile the file, unless argument force is True or unless the compiler options have been changed.
- Parameters:
force (bool) – When True, always compile, even if the executable file is newer than the source file. Used for Stan models which have #include directives in order to force recompilation when changes are made to the included files.
stanc_options (Optional[Dict[str, Any]]) – Options for stanc compiler.
cpp_options (Optional[Dict[str, Any]]) – Options for C++ compiler.
user_header (Optional[Union[str, PathLike]]) – A path to a header file to include during C++ compilation.
override_options (bool) – When True, override existing option. When False, add/replace existing options. Default is False.
_internal (bool) –
- Return type:
None
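The timestamp rule described for compile() can be pictured with a small standalone sketch. This is not CmdStanPy's implementation: the helper name needs_recompile is hypothetical, only the standard library is used, and the comparison of compiler options mentioned in the description is left out.

```python
import os


def needs_recompile(stan_file, exe_file, force=False):
    """Hypothetical helper: decide whether a Stan program should be rebuilt."""
    # A forced build or a missing executable always triggers compilation.
    if force or not os.path.exists(exe_file):
        return True
    # Skip the build only when the executable is strictly newer than the source.
    return os.path.getmtime(exe_file) <= os.path.getmtime(stan_file)
```

Because only the main source file's timestamp is inspected in this sketch, edits to an #include'd file would go unnoticed, which is the situation the force argument is documented to cover.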
- exe_info()
Run model with option ‘info’. Parse output statements, which all have the form ‘key = value’, into a Dict. If the executable was compiled with CmdStan < 2.27, option ‘info’ isn’t available and the method returns an empty dictionary.
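Turning ‘key = value’ lines into a dictionary can be sketched as follows. The function name parse_info and the sample text are illustrative assumptions, not CmdStanPy internals or real CmdStan output, and all values are kept as strings.

```python
def parse_info(text):
    """Hypothetical helper: collect 'key = value' lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        # Lines without '=' (blank lines, banners) are ignored.
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


# Made-up sample text in the 'key = value' form.
sample_output = "alpha = 1\nbeta = some text\n\nnot a pair"
print(parse_info(sample_output))
```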
- format(overwrite_file=False, canonicalize=False, max_line_length=78, *, backup=True)
Deprecated: Use cmdstanpy.format_stan_file() instead.
Run stanc’s auto-formatter on the model code. Either saves directly back to the file or prints for inspection.
- Parameters:
overwrite_file (bool) – If True, save the updated code to disk, rather than printing it. By default False
canonicalize (Union[bool, str, Iterable[str]]) – Whether or not the compiler should ‘canonicalize’ the Stan model, removing things like deprecated syntax. Default is False. If True, all canonicalizations are run. If it is a list of strings, those options are passed to stanc (new in Stan 2.29)
max_line_length (int) – Set the wrapping point for the formatter. The default value is 78, which wraps most lines by the 80th character.
backup (bool) – If True, create a stanfile.bak backup before writing to the file. Only disable this if you’re sure you have other copies of the file or are using a version control system like Git.
- Return type:
None
- generate_quantities(data=None, previous_fit=None, seed=None, gq_output_dir=None, sig_figs=None, show_console=False, refresh=None, time_fmt='%Y%m%d%H%M%S', timeout=None, *, mcmc_sample=None)
Run CmdStan’s generate_quantities method which runs the generated quantities block of a model given an existing sample.
This function takes one of the Stan fit objects CmdStanMCMC, CmdStanMLE, or CmdStanVB and the data required for the model and calls the CmdStan generate_quantities method to generate additional quantities of interest.
The CmdStanGQ object records the command, the return code, and the paths to the generate method output CSV and console files. The output files are written either to a specified output directory or to a temporary directory which is deleted upon session exit.
Output filenames correspond to the template ‘<model_name>-<YYYYMMDDHHMM>-<chain_id>’ plus the file suffix which is either ‘.csv’ for the CmdStan output or ‘.txt’ for the console messages, e.g. ‘bernoulli-201912081451-1.csv’. Output files written to the temporary directory contain an additional 8-character random string, e.g. ‘bernoulli-201912081451-1-5nm6as7u.csv’.
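The filename template can be illustrated with a short sketch. The helper output_filename is hypothetical and is not how CmdStanPy builds its paths; it only combines the documented pieces (model name, a strftime()-formatted timestamp, chain id, suffix) and leaves out the random string added for temporary directories.

```python
from datetime import datetime


def output_filename(model_name, chain_id, when, time_fmt="%Y%m%d%H%M%S", suffix=".csv"):
    """Hypothetical helper: build '<model_name>-<timestamp>-<chain_id><suffix>'."""
    return f"{model_name}-{when.strftime(time_fmt)}-{chain_id}{suffix}"


stamp = datetime(2019, 12, 8, 14, 51)
# A minutes-only format reproduces the example name from the text.
print(output_filename("bernoulli", 1, stamp, time_fmt="%Y%m%d%H%M"))
# The default time_fmt also appends seconds to the timestamp.
print(output_filename("bernoulli", 1, stamp, suffix=".txt"))
```

Note that the default time_fmt shown in the signature includes seconds, whereas the example names in the text show a timestamp down to minutes only.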
- Parameters:
data (Optional[Union[Mapping[str, Any], str, PathLike]]) – Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.
previous_fit (Optional[Union[Fit, List[str]]]) – Can be either a CmdStanMCMC, CmdStanMLE, or CmdStanVB or a list of stan-csv files generated by fitting the model to the data using any Stan interface.
seed (Optional[int]) – The seed for random number generator. Must be an integer between 0 and 2^32 - 1. If unspecified, numpy.random.default_rng() is used to generate a seed which will be used for all chains. NOTE: Specifying the seed will guarantee the same result for multiple invocations of this method with the same inputs. However this will not reproduce results from the sample method given the same inputs because the RNG will be in a different state.
gq_output_dir (Optional[Union[str, PathLike]]) – Name of the directory in which the CmdStan output files are saved. If unspecified, files will be written to a temporary directory which is deleted upon session exit.
sig_figs (Optional[int]) – Numerical precision used for output CSV and text files. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. Introduced in CmdStan-2.25.
show_console (bool) – If True, stream CmdStan messages sent to stdout and stderr to the console. Default is False.
refresh (Optional[int]) – Specify the number of iterations CmdStan will take between progress messages. Default value is 100.
time_fmt (str) – A format string passed to strftime() to decide the file names for output CSVs. Defaults to “%Y%m%d%H%M%S”.
timeout (Optional[float]) – Duration at which generation times out in seconds.
mcmc_sample (Optional[Union[CmdStanMCMC, List[str]]]) –
- Returns:
CmdStanGQ object
- Return type:
CmdStanGQ[Fit]
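The range stated for seed (an integer between 0 and 2^32 - 1) can be checked before a run with a few lines of standard-library Python. The helper check_seed is a hypothetical illustration and not part of CmdStanPy.

```python
def check_seed(seed):
    """Hypothetical helper: validate a seed against the documented range."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(seed, bool) or not isinstance(seed, int):
        raise TypeError("seed must be an integer")
    if not 0 <= seed <= 2**32 - 1:
        raise ValueError("seed must be between 0 and 2^32 - 1")
    return seed
```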
- laplace_sample(data=None, mode=None, draws=None, *, jacobian=True, seed=None, output_dir=None, sig_figs=None, save_profile=False, show_console=False, refresh=None, time_fmt='%Y%m%d%H%M%S', timeout=None, opt_args=None)
Run a Laplace approximation around the posterior mode.
- Parameters:
data (Optional[Union[Mapping[str, Any], str, PathLike]]) – Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.
mode (Optional[Union[CmdStanMLE, str, PathLike]]) –
The mode around which to place the approximation, either
A CmdStanMLE object
A path to a CSV file containing the output of an optimization run.
None - use default optimizer settings and/or any opt_args.
draws (Optional[int]) – Number of approximate draws to return. Defaults to 1000
jacobian (bool) – Whether or not to enable the Jacobian adjustment for constrained parameters. Defaults to True. Note: This must match the argument used in the creation of mode, if supplied.
output_dir (Optional[Union[str, PathLike]]) – Name of the directory to which CmdStan output files are written. If unspecified, output files will be written to a temporary directory which is deleted upon session exit.
sig_figs (Optional[int]) – Numerical precision used for output CSV and text files. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. Introduced in CmdStan-2.25.
save_profile (bool) – Whether or not to profile auto-diff operations in labelled blocks of code. If True, CSV outputs are written to file ‘<model_name>-<YYYYMMDDHHMM>-profile-<path_id>’. Introduced in CmdStan-2.26, see https://mc-stan.org/docs/cmdstan-guide/stan_csv.html, section “Profiling CSV output file” for details.
show_console (bool) – If True, stream CmdStan messages sent to stdout and stderr to the console. Default is False.
refresh (Optional[int]) – Specify the number of iterations CmdStan will take between progress messages. Default value is 100.
time_fmt (str) – A format string passed to strftime() to decide the file names for output CSVs. Defaults to “%Y%m%d%H%M%S”.
timeout (Optional[float]) – Duration at which Laplace sampling times out in seconds. Defaults to None.
opt_args (Optional[Dict[str, Any]]) – Dictionary of additional arguments which will be passed to optimize().
- Returns:
A CmdStanLaplace object.
- Return type:
CmdStanLaplace
- log_prob(params, data=None, *, jacobian=True, sig_figs=None)
Calculate the log probability and gradient at the given parameter values.
Note
This function is NOT an efficient way to evaluate the log density of the model. It should be used for diagnostics ONLY. Please, do not use this for other purposes such as testing new sampling algorithms!
- Parameters:
params (Union[Dict[str, Any], str, PathLike]) –
Values for all parameters in the model, specified either as a dictionary with entries matching the parameter variables, or as the path of a data file in JSON or Rdump format.
These should be given on the constrained (natural) scale.
data (Optional[Union[Mapping[str, Any], str, PathLike]]) – Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.
jacobian (bool) – Whether or not to enable the Jacobian adjustment for constrained parameters. Defaults to True.
sig_figs (Optional[int]) – Numerical precision used for output CSV and text files. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6.
- Returns:
A pandas.DataFrame containing columns “lp__” and additional columns for the gradient values. These gradients will be for the unconstrained parameters of the model.
- Return type:
pandas.DataFrame
- optimize(data=None, seed=None, inits=None, output_dir=None, sig_figs=None, save_profile=False, algorithm=None, init_alpha=None, tol_obj=None, tol_rel_obj=None, tol_grad=None, tol_rel_grad=None, tol_param=None, history_size=None, iter=None, save_iterations=False, require_converged=True, show_console=False, refresh=None, time_fmt='%Y%m%d%H%M%S', timeout=None, jacobian=False)
Run the specified CmdStan optimize algorithm to produce a penalized maximum likelihood estimate of the model parameters.
This function validates the specified configuration, composes a call to the CmdStan optimize method and spawns one subprocess to run the optimizer and waits for it to run to completion. Unspecified arguments are not included in the call to CmdStan, i.e., those arguments will have CmdStan default values.
The CmdStanMLE object records the command, the return code, and the paths to the optimize method output CSV and console files. The output files are written either to a specified output directory or to a temporary directory which is deleted upon session exit.
Output filenames correspond to the template ‘<model_name>-<YYYYMMDDHHMM>-<chain_id>’ plus the file suffix which is either ‘.csv’ for the CmdStan output or ‘.txt’ for the console messages, e.g. ‘bernoulli-201912081451-1.csv’. Output files written to the temporary directory contain an additional 8-character random string, e.g. ‘bernoulli-201912081451-1-5nm6as7u.csv’.
- Parameters:
data (Optional[Union[Mapping[str, Any], str, PathLike]]) – Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.
seed (Optional[int]) – The seed for random number generator. Must be an integer between 0 and 2^32 - 1. If unspecified, numpy.random.default_rng() is used to generate a seed.
inits (Optional[Union[Mapping[str, Any], float, str, PathLike]]) –
Specifies how the sampler initializes parameter values. Initialization is either uniform random on a range centered on 0, exactly 0, or a dictionary or file of initial values for some or all parameters in the model. The default initialization behavior will initialize all parameter values on range [-2, 2] on the unconstrained support. If the expected parameter values are too far from this range, this option may improve estimation. The following value types are allowed:
Single number, n > 0 - initialization range is [-n, n].
0 - all parameters are initialized to 0.
dictionary - pairs parameter name : initial value.
string - pathname to a JSON or Rdump data file.
output_dir (Optional[Union[str, PathLike]]) – Name of the directory to which CmdStan output files are written. If unspecified, output files will be written to a temporary directory which is deleted upon session exit.
sig_figs (Optional[int]) – Numerical precision used for output CSV and text files. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. Introduced in CmdStan-2.25.
save_profile (bool) – Whether or not to profile auto-diff operations in labelled blocks of code. If True, CSV outputs are written to file ‘<model_name>-<YYYYMMDDHHMM>-profile-<chain_id>’. Introduced in CmdStan-2.26.
algorithm (Optional[str]) – Algorithm to use. One of: ‘BFGS’, ‘LBFGS’, ‘Newton’
init_alpha (Optional[float]) – Line search step size for first iteration
tol_obj (Optional[float]) – Convergence tolerance on changes in objective function value
tol_rel_obj (Optional[float]) – Convergence tolerance on relative changes in objective function value
tol_grad (Optional[float]) – Convergence tolerance on the norm of the gradient
tol_rel_grad (Optional[float]) – Convergence tolerance on the relative norm of the gradient
tol_param (Optional[float]) – Convergence tolerance on changes in parameter value
history_size (Optional[int]) – Size of the history for LBFGS Hessian approximation. The value should be less than the dimensionality of the parameter space. 5-10 is usually sufficient
save_iterations (bool) – When True, save intermediate approximations to the output CSV file. Default is False.
require_converged (bool) – Whether or not to raise an error if Stan reports that “The algorithm may not have converged”.
show_console (bool) – If True, stream CmdStan messages sent to stdout and stderr to the console. Default is False.
refresh (Optional[int]) – Specify the number of iterations CmdStan will take between progress messages. Default value is 100.
time_fmt (str) – A format string passed to strftime() to decide the file names for output CSVs. Defaults to “%Y%m%d%H%M%S”.
timeout (Optional[float]) – Duration at which optimization times out in seconds.
jacobian (bool) – Whether or not to use the Jacobian adjustment for constrained variables in optimization. By default this is false, meaning optimization yields the Maximum Likelihood Estimate (MLE). Setting it to true yields the Maximum A Posteriori Estimate (MAP).
- Returns:
CmdStanMLE object
- Return type:
CmdStanMLE
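The value types accepted by inits for optimize() (a non-negative number, a dictionary, or a path to a data file) can be summarised in a small dispatch sketch. The helper describe_inits and its messages are hypothetical and written only to restate the documented cases; CmdStanPy's own validation may differ.

```python
import os
from collections.abc import Mapping


def describe_inits(inits):
    """Hypothetical helper: name the initialization mode for an inits value."""
    if isinstance(inits, bool):
        raise TypeError("inits must not be a bool")
    if isinstance(inits, (int, float)):
        if inits < 0:
            raise ValueError("a numeric inits value must be >= 0")
        if inits == 0:
            return "all parameters initialized to 0"
        # A single number n > 0 gives the range [-n, n].
        return f"uniform random on [-{inits}, {inits}]"
    if isinstance(inits, Mapping):
        return "initial values from a dictionary"
    if isinstance(inits, (str, os.PathLike)):
        return "initial values from a JSON or Rdump file"
    raise TypeError(f"unsupported inits type: {type(inits).__name__}")
```

For instance, describe_inits(2) corresponds to the documented default range [-2, 2] on the unconstrained support.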
- pathfinder(data=None, *, init_alpha=None, tol_obj=None, tol_rel_obj=None, tol_grad=None, tol_rel_grad=None, tol_param=None, history_size=None, num_paths=None, max_lbfgs_iters=None, draws=None, num_single_draws=None, num_elbo_draws=None, psis_resample=True, calculate_lp=True, seed=None, inits=None, output_dir=None, sig_figs=None, save_profile=False, show_console=False, refresh=None, time_fmt='%Y%m%d%H%M%S', timeout=None, num_threads=None)
Run CmdStan’s Pathfinder variational inference algorithm.
- Parameters:
data (Optional[Union[Mapping[str, Any], str, PathLike]]) – Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.
num_paths (Optional[int]) – Number of single-path Pathfinders to run. Default is 4. When the number of paths is 1, no importance sampling is done.
draws (Optional[int]) – Number of approximate draws to return.
num_single_draws (Optional[int]) – Number of draws each single-pathfinder will draw. If num_paths is 1, only one of this and draws should be used.
max_lbfgs_iters (Optional[int]) – Maximum number of L-BFGS iterations.
num_elbo_draws (Optional[int]) – Number of Monte Carlo draws to evaluate ELBO.
psis_resample (bool) – Whether or not to use Pareto Smoothed Importance Sampling on the result of the individual Pathfinders. If False, the result contains the draws from each path.
calculate_lp (bool) – Whether or not to calculate the log probability for approximate draws. If False, this also implies that psis_resample will be set to False.
seed (Optional[int]) – The seed for random number generator. Must be an integer between 0 and 2^32 - 1. If unspecified, numpy.random.default_rng() is used to generate a seed.
inits (Optional[Union[Dict[str, float], float, str, PathLike]]) –
Specifies how the algorithm initializes parameter values. Initialization is either uniform random on a range centered on 0, exactly 0, or a dictionary or file of initial values for some or all parameters in the model. The default initialization behavior will initialize all parameter values on range [-2, 2] on the unconstrained support. If the expected parameter values are too far from this range, this option may improve adaptation. The following value types are allowed:
Single number n > 0 - initialization range is [-n, n].
0 - all parameters are initialized to 0.
dictionary - pairs parameter name : initial value.
string - pathname to a JSON or Rdump data file.
list of strings - per-path pathname to data file.
list of dictionaries - per-path initial values.
init_alpha (Optional[float]) – For internal L-BFGS: Line search step size for first iteration
tol_obj (Optional[float]) – For internal L-BFGS: Convergence tolerance on changes in objective function value
tol_rel_obj (Optional[float]) – For internal L-BFGS: Convergence tolerance on relative changes in objective function value
tol_grad (Optional[float]) – For internal L-BFGS: Convergence tolerance on the norm of the gradient
tol_rel_grad (Optional[float]) – For internal L-BFGS: Convergence tolerance on the relative norm of the gradient
tol_param (Optional[float]) – For internal L-BFGS: Convergence tolerance on changes in parameter value
history_size (Optional[int]) – For internal L-BFGS: Size of the history for LBFGS Hessian approximation. The value should be less than the dimensionality of the parameter space. 5-10 is usually sufficient
output_dir (Optional[Union[str, PathLike]]) – Name of the directory to which CmdStan output files are written. If unspecified, output files will be written to a temporary directory which is deleted upon session exit.
sig_figs (Optional[int]) – Numerical precision used for output CSV and text files. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. Introduced in CmdStan-2.25.
save_profile (bool) – Whether or not to profile auto-diff operations in labelled blocks of code. If True, CSV outputs are written to file ‘<model_name>-<YYYYMMDDHHMM>-profile-<path_id>’. Introduced in CmdStan-2.26, see https://mc-stan.org/docs/cmdstan-guide/stan_csv.html, section “Profiling CSV output file” for details.
show_console (bool) – If True, stream CmdStan messages sent to stdout and stderr to the console. Default is False.
refresh (Optional[int]) – Specify the number of iterations CmdStan will take between progress messages. Default value is 100.
time_fmt (str) – A format string passed to strftime() to decide the file names for output CSVs. Defaults to “%Y%m%d%H%M%S”.
timeout (Optional[float]) – Duration at which Pathfinder times out in seconds. Defaults to None.
num_threads (Optional[int]) – Number of threads to request for parallel execution. A number other than 1 requires the model to have been compiled with STAN_THREADS=True.
- Returns:
A CmdStanPathfinder object
- Return type:
CmdStanPathfinder
References
Zhang, L., Carpenter, B., Gelman, A., & Vehtari, A. (2022). Pathfinder: Parallel quasi-Newton variational inference. Journal of Machine Learning Research, 23(306), 1–49. Retrieved from http://jmlr.org/papers/v23/21-0889.html
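The interaction between num_paths, psis_resample and calculate_lp described in the parameter list can be restated as a small predicate. This is a reading of the documentation, not CmdStan's or CmdStanPy's code: the helper name psis_resampling_active is hypothetical, and combining the two documented rules into one condition is an assumption.

```python
def psis_resampling_active(num_paths=4, psis_resample=True, calculate_lp=True):
    """Hypothetical helper: will importance resampling be applied?"""
    # Documented rule: calculate_lp=False implies psis_resample=False.
    if not calculate_lp:
        psis_resample = False
    # Documented rule: with a single path, no importance sampling is done.
    return psis_resample and num_paths > 1
```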
- sample(data=None, chains=None, parallel_chains=None, threads_per_chain=None, seed=None, chain_ids=None, inits=None, iter_warmup=None, iter_sampling=None, save_warmup=False, thin=None, max_treedepth=None, metric=None, step_size=None, adapt_engaged=True, adapt_delta=None, adapt_init_phase=None, adapt_metric_window=None, adapt_step_size=None, fixed_param=False, output_dir=None, sig_figs=None, save_latent_dynamics=False, save_profile=False, show_progress=True, show_console=False, refresh=None, time_fmt='%Y%m%d%H%M%S', timeout=None, *, force_one_process_per_chain=None)
Run one or more chains of the NUTS-HMC sampler to produce a set of draws from the posterior distribution of a model conditioned on some data.
This function validates the specified configuration, composes a call to the CmdStan sample method and spawns one subprocess per chain to run the sampler and waits for all chains to run to completion. Unspecified arguments are not included in the call to CmdStan, i.e., those arguments will have CmdStan default values.
For each chain, the CmdStanMCMC object records the command, the return code, the sampler output file paths, and the corresponding console outputs, if any. The output files are written either to a specified output directory or to a temporary directory which is deleted upon session exit.
Output filenames correspond to the template ‘<model_name>-<YYYYMMDDHHMM>-<chain_id>’ plus the file suffix which is either ‘.csv’ for the CmdStan output or ‘.txt’ for the console messages, e.g. ‘bernoulli-201912081451-1.csv’. Output files written to the temporary directory contain an additional 8-character random string, e.g. ‘bernoulli-201912081451-1-5nm6as7u.csv’.
- Parameters:
data (Optional[Union[Mapping[str, Any], str, PathLike]]) – Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.
chains (Optional[int]) – Number of sampler chains, must be a positive integer.
parallel_chains (Optional[int]) – Number of processes to run in parallel. Must be a positive integer. Defaults to multiprocessing.cpu_count(), i.e., it will only run as many chains in parallel as there are cores on the machine. Note that CmdStan 2.28 and higher can run all chains in parallel providing that the model was compiled with threading support.
threads_per_chain (Optional[int]) – The number of threads to use in parallelized sections within an MCMC chain (e.g., when using the Stan functions reduce_sum() or map_rect()). This will only have an effect if the model was compiled with threading support. For such models, CmdStan version 2.28 and higher will run all chains in parallel from within a single process. The total number of threads used will be parallel_chains * threads_per_chain, where the default value for parallel_chains is the number of cpus, not chains.
seed (Optional[Union[int, List[int]]]) – The seed for random number generator. Must be an integer between 0 and 2^32 - 1. If unspecified, numpy.random.default_rng() is used to generate a seed which will be used for all chains. When the same seed is used across all chains, the chain-id is used to advance the RNG to avoid dependent samples.
chain_ids (Optional[Union[int, List[int]]]) – The offset for the random number generator, either an integer or a list of unique per-chain offsets. If unspecified, chain ids are numbered sequentially starting from 1.
inits (Optional[Union[Mapping[str, Any], float, str, List[str], List[Mapping[str, Any]]]]) –
Specifies how the sampler initializes parameter values. Initialization is either uniform random on a range centered on 0, exactly 0, or a dictionary or file of initial values for some or all parameters in the model. The default initialization behavior will initialize all parameter values on range [-2, 2] on the unconstrained support. If the expected parameter values are too far from this range, this option may improve adaptation. The following value types are allowed:
Single number n > 0 - initialization range is [-n, n].
0 - all parameters are initialized to 0.
dictionary - pairs parameter name : initial value.
string - pathname to a JSON or Rdump data file.
list of strings - per-chain pathname to data file.
list of dictionaries - per-chain initial values.
iter_warmup (Optional[int]) – Number of warmup iterations for each chain.
iter_sampling (Optional[int]) – Number of draws from the posterior for each chain.
save_warmup (bool) – When True, sampler saves warmup draws as part of the Stan CSV output file.
thin (Optional[int]) – Period between recorded iterations. Default is 1, i.e., all iterations are recorded.
max_treedepth (Optional[int]) – Maximum depth of trees evaluated by NUTS sampler per iteration.
metric (Optional[Union[str, Dict[str, Any], List[str], List[Dict[str, Any]]]]) –
Specification of the mass matrix, either as a vector consisting of the diagonal elements of the covariance matrix (‘diag’ or ‘diag_e’) or the full covariance matrix (‘dense’ or ‘dense_e’).
If the value of the metric argument is a string other than ‘diag’, ‘diag_e’, ‘dense’, or ‘dense_e’, it must be a valid filepath to a JSON or Rdump file which contains an entry ‘inv_metric’ whose value is either the diagonal vector or the full covariance matrix.
If the value of the metric argument is a list of paths, its length must match the number of chains and all paths must be unique.
If the value of the metric argument is a Python dict object, it must contain an entry ‘inv_metric’ which specifies either the diagonal or dense matrix.
If the value of the metric argument is a list of Python dicts, its length must match the number of chains, all dicts must contain an entry ‘inv_metric’, and all ‘inv_metric’ entries must have the same shape.
step_size (Optional[Union[float, List[float]]]) – Initial step size for HMC sampler. The value is either a single number or a list of numbers which will be used as the global or per-chain initial step size, respectively. The length of the list of step sizes must match the number of chains.
adapt_engaged (bool) – When True, adapt step size and metric.
adapt_delta (Optional[float]) – Adaptation target Metropolis acceptance rate. The default value is 0.8. Increasing this value, which must be strictly less than 1, causes adaptation to use smaller step sizes which improves the effective sample size, but may increase the time per iteration.
adapt_init_phase (Optional[int]) – Iterations for initial phase of adaptation during which step size is adjusted so that the chain converges towards the typical set.
adapt_metric_window (Optional[int]) – The second phase of adaptation tunes the metric and step size in a series of intervals. This parameter specifies the number of iterations used for the first tuning interval; window size increases for each subsequent interval.
adapt_step_size (Optional[int]) – Number of iterations given over to adjusting the step size given the tuned metric during the final phase of adaptation.
fixed_param (bool) – When True, call CmdStan with argument algorithm=fixed_param which runs the sampler without updating the Markov Chain, thus the values of all parameters and transformed parameters are constant across all draws and only those values in the generated quantities block that are produced by RNG functions may change. This provides a way to use Stan programs to generate simulated data via the generated quantities block. Default value is False.
output_dir (Optional[Union[str, PathLike]]) – Name of the directory to which CmdStan output files are written. If unspecified, output files will be written to a temporary directory which is deleted upon session exit.
sig_figs (Optional[int]) – Numerical precision used for output CSV and text files. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. Introduced in CmdStan-2.25.
save_latent_dynamics (bool) – Whether or not to output the position and momentum information for the model parameters (unconstrained). If
True
, CSV outputs are written to an output file ‘<model_name>-<YYYYMMDDHHMM>-diagnostic-<chain_id>’, e.g. ‘bernoulli-201912081451-diagnostic-1.csv’, see https://mc-stan.org/docs/cmdstan-guide/stan_csv.html, section “Diagnostic CSV output file” for details.save_profile (bool) – Whether or not to profile auto-diff operations in labelled blocks of code. If
True
, CSV outputs are written to file ‘<model_name>-<YYYYMMDDHHMM>-profile-<chain_id>’. Introduced in CmdStan-2.26, see https://mc-stan.org/docs/cmdstan-guide/stan_csv.html, section “Profiling CSV output file” for details.show_progress (bool) – If
True
, display progress bar to track progress for warmup and sampling iterations. Default isTrue
, unless package tqdm progress bar encounter errors.show_console (bool) – If
True
, stream CmdStan messages sent to stdout and stderr to the console. Default isFalse
.refresh (Optional[int]) – Specify the number of iterations CmdStan will take between progress messages. Default value is 100.
time_fmt (str) – A format string passed to strftime() to decide the file names for output CSVs. Defaults to “%Y%m%d%H%M%S”.
force_one_process_per_chain (Optional[bool]) – If True, run multiple chains in distinct processes regardless of the model's ability to run parallel chains (CmdStan 2.28+ feature). If False, always run multiple chains in one process (does not check that this is valid). If None (the default), check that the CmdStan version is >= 2.28 and that the model was compiled with STAN_THREADS=True, and use the parallel chain functionality if both conditions are met.
timeout (Optional[float]) – Duration at which sampling times out in seconds.
- Returns:
CmdStanMCMC object
- Return type:
CmdStanMCMC
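The three-way behavior of force_one_process_per_chain described above can be summarized as a small decision function. This is only a sketch of the documented rule, not CmdStanPy's internal code; the function name and its arguments (a version tuple and a flag saying whether the model was built with STAN_THREADS) are hypothetical.

```python
from typing import Optional, Tuple


def use_one_process_per_chain(
    force_one_process_per_chain: Optional[bool],
    cmdstan_version: Tuple[int, int],
    compiled_with_stan_threads: bool,
) -> bool:
    """Return True if each chain should run in its own process."""
    if force_one_process_per_chain is not None:
        # An explicit True or False is taken at face value, with no checks.
        return force_one_process_per_chain
    # None (the default): run all chains in one process only when CmdStan
    # is at least 2.28 and the model was compiled with STAN_THREADS.
    can_share_one_process = (
        cmdstan_version >= (2, 28) and compiled_with_stan_threads
    )
    return not can_share_one_process


print(use_one_process_per_chain(None, (2, 33), True))
print(use_one_process_per_chain(None, (2, 27), True))
```

The first call models a threaded build on a recent CmdStan, the second an older CmdStan where the sketch falls back to one process per chain.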
- src_info()[source]¶
Run stanc with option ‘--info’.
If stanc is older than 2.27 or if the Stan file cannot be found, returns an empty dictionary.
- variational(data=None, seed=None, inits=None, output_dir=None, sig_figs=None, save_latent_dynamics=False, save_profile=False, algorithm=None, iter=None, grad_samples=None, elbo_samples=None, eta=None, adapt_engaged=True, adapt_iter=None, tol_rel_obj=None, eval_elbo=None, draws=None, require_converged=True, show_console=False, refresh=None, time_fmt='%Y%m%d%H%M%S', timeout=None, *, output_samples=None)[source]¶
Run CmdStan’s variational inference algorithm to approximate the posterior distribution of the model conditioned on the data.
This function validates the specified configuration, composes a call to the CmdStan variational method, spawns one subprocess to run the optimizer, and waits for it to run to completion. Unspecified arguments are not included in the call to CmdStan, i.e., those arguments will have CmdStan default values.
The CmdStanVB object records the command, the return code, and the paths to the variational method output CSV and console files. The output files are written either to a specified output directory or to a temporary directory which is deleted upon session exit.
Output filenames correspond to the template ‘<model_name>-<YYYYMMDDHHMM>-<chain_id>’ plus the file suffix, which is either ‘.csv’ for the CmdStan output or ‘.txt’ for the console messages, e.g. ‘bernoulli-201912081451-1.csv’. Output files written to the temporary directory contain an additional 8-character random string, e.g. ‘bernoulli-201912081451-1-5nm6as7u.csv’.
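The filename template above can be reproduced with the standard library. The following is a minimal sketch, not CmdStanPy's own code: output_basename and random_suffix are hypothetical helpers, and the way the random string is generated is an assumption made only for illustration.

```python
import random
import string
from datetime import datetime
from typing import Optional


def output_basename(
    model_name: str,
    chain_id: int,
    when: datetime,
    time_fmt: str = "%Y%m%d%H%M%S",
    temp_suffix: Optional[str] = None,
) -> str:
    """Build '<model_name>-<timestamp>-<chain_id>' plus an optional suffix."""
    name = f"{model_name}-{when.strftime(time_fmt)}-{chain_id}"
    if temp_suffix is not None:
        # Files in the temporary directory carry an extra random string.
        name = f"{name}-{temp_suffix}"
    return name


def random_suffix(length: int = 8) -> str:
    """Hypothetical stand-in for the 8-character random string."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))


stamp = datetime(2019, 12, 8, 14, 51)
# Minute-resolution format, matching the example filenames in the text.
print(output_basename("bernoulli", 1, stamp, "%Y%m%d%H%M") + ".csv")
print(output_basename("bernoulli", 1, stamp, "%Y%m%d%H%M", random_suffix()) + ".csv")
```

With the default time_fmt of “%Y%m%d%H%M%S” the timestamp would also include seconds.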
- Parameters:
data (Optional[Union[Mapping[str, Any], str, PathLike]]) – Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.
seed (Optional[int]) – The seed for the random number generator. Must be an integer between 0 and 2^32 - 1. If unspecified, numpy.random.default_rng() is used to generate a seed which will be used for all chains.
inits (Optional[float]) – Specifies how the sampler initializes parameter values. Initialization is uniform random on a range centered on 0 with default range of 2. Specifying a single number n > 0 changes the initialization range to [-n, n].
output_dir (Optional[Union[str, PathLike]]) – Name of the directory to which CmdStan output files are written. If unspecified, output files will be written to a temporary directory which is deleted upon session exit.
sig_figs (Optional[int]) – Numerical precision used for output CSV and text files. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. Introduced in CmdStan-2.25.
save_latent_dynamics (bool) – Whether or not to save diagnostics. If True, CSV outputs are written to output file ‘<model_name>-<YYYYMMDDHHMM>-diagnostic-<chain_id>’, e.g. ‘bernoulli-201912081451-diagnostic-1.csv’.
save_profile (bool) – Whether or not to profile auto-diff operations in labelled blocks of code. If True, CSV outputs are written to file ‘<model_name>-<YYYYMMDDHHMM>-profile-<chain_id>’. Introduced in CmdStan-2.26.
algorithm (Optional[str]) – Algorithm to use. One of: ‘meanfield’, ‘fullrank’.
grad_samples (Optional[int]) – Number of MC draws for computing the gradient. Default is 10. If problems arise, try doubling current value.
elbo_samples (Optional[int]) – Number of MC draws for estimate of ELBO.
adapt_engaged (bool) – Whether eta adaptation is engaged.
adapt_iter (Optional[int]) – Number of iterations for eta adaptation.
tol_rel_obj (Optional[float]) – Relative tolerance parameter for convergence.
eval_elbo (Optional[int]) – Number of iterations between ELBO evaluations.
draws (Optional[int]) – Number of approximate posterior output draws to save.
require_converged (bool) – Whether or not to raise an error if Stan reports that “The algorithm may not have converged”.
show_console (bool) – If True, stream CmdStan messages sent to stdout and stderr to the console. Default is False.
refresh (Optional[int]) – Specify the number of iterations CmdStan will take between progress messages. Default value is 100.
time_fmt (str) – A format string passed to strftime() to decide the file names for output CSVs. Defaults to “%Y%m%d%H%M%S”.
timeout (Optional[float]) – Duration at which variational Bayesian inference times out in seconds.
- Returns:
CmdStanVB object
- Return type:
CmdStanVB
CmdStanMCMC¶
- class cmdstanpy.CmdStanMCMC(runset)[source]¶
Container for outputs from CmdStan sampler run. Provides methods to summarize and diagnose the model fit and accessor methods to access the entire sample or individual items. Created by CmdStanModel.sample().
The sample is lazily instantiated on first access of either the resulting sample or the HMC tuning parameters, i.e., the step size and metric.
- Parameters:
runset (RunSet) –
- diagnose()[source]¶
Run cmdstan/bin/diagnose over all output CSV files, return console output.
The diagnose utility reads the outputs of all chains and checks for the following potential problems:
Transitions that hit the maximum treedepth
Divergent transitions
Low E-BFMI values (sampler transitions HMC potential energy)
Low effective sample sizes
High R-hat values
- draws(*, inc_warmup=False, concat_chains=False)[source]¶
Returns a numpy.ndarray over all draws from all chains, stored in column-major order so that the values for a parameter are contiguous in memory; likewise, all draws from a chain are contiguous. By default, returns a 3D array arranged (draws, chains, columns); parameter concat_chains=True will return a 2D array where all chains are flattened into a single column, preserving chain order, so that given M chains of N draws, the first N draws are from chain 1, up through the last N draws from chain M.
- Parameters:
inc_warmup (bool) – When True and the warmup draws are present in the output, i.e., the sampler was run with save_warmup=True, then the warmup draws are included. Default value is False.
concat_chains (bool) – When True, return a 2D array flattening all draws from all chains. Default value is False.
- Return type:
numpy.ndarray
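The chain-preserving flattening performed by concat_chains=True can be illustrated without NumPy. The sketch below uses nested lists as a stand-in for the (draws, chains, columns) array; the helper name concat_chains and the toy values are illustrative only and are not part of CmdStanPy.

```python
from typing import List

Row = List[float]


def concat_chains(draws: List[List[Row]]) -> List[Row]:
    """Flatten a (draws, chains, columns) nested list to (draws * chains, columns).

    Chain order is preserved: all rows of chain 1 come first, then chain 2, ...
    """
    if not draws:
        return []
    num_chains = len(draws[0])
    return [
        draws[d][c]
        for c in range(num_chains)  # outer loop: chains
        for d in range(len(draws))  # inner loop: draws within a chain
    ]


# 3 draws, 2 chains, 1 column; each value is written as <chain>.<draw>
sample = [
    [[1.1], [2.1]],
    [[1.2], [2.2]],
    [[1.3], [2.3]],
]
print(concat_chains(sample))
```

For a real NumPy array arr, the same row ordering can be obtained with a column-major reshape such as arr.reshape((-1, arr.shape[2]), order="F"); whether CmdStanPy does exactly this internally is not stated here.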
- draws_pd(vars=None, inc_warmup=False)[source]¶
Returns the sample draws as a pandas DataFrame. Flattens all chains into single column. Container variables (array, vector, matrix) will span multiple columns, one column per element. E.g. variable ‘matrix[2,2] foo’ spans 4 columns: ‘foo[1,1], … foo[2,2]’.
- method_variables()[source]¶
Returns a dictionary of all sampler variables, i.e., all output column names ending in __. Assumes that all variables are scalar variables where column name is variable name. Maps each column name to a numpy.ndarray (draws x chains x 1) containing per-draw diagnostic values.
- save_csvfiles(dir=None)[source]¶
Move output CSV files to specified directory. If files were written to the temporary session directory, clean filename. E.g., save ‘bernoulli-201912081451-1-5nm6as7u.csv’ as ‘bernoulli-201912081451-1.csv’.
- stan_variable(var, inc_warmup=False)[source]¶
Return a numpy.ndarray which contains the set of draws for the named Stan program variable. Flattens the chains, leaving the draws in chain order. The first array dimension corresponds to the number of draws or post-warmup draws in the sample, per argument inc_warmup. The remaining dimensions correspond to the shape of the Stan program variable.
Internally, draws are in chain order, i.e., for a sample with N chains of M draws each, the first M array elements are from chain 1, the next M are from chain 2, and the last M elements are from chain N.
If the variable is a scalar variable, the return array has shape ( draws * chains, 1).
If the variable is a vector, the return array has shape ( draws * chains, len(vector))
If the variable is a matrix, the return array has shape ( draws * chains, size(dim 1), size(dim 2) )
If the variable is an array with N dimensions, the return array has shape ( draws * chains, size(dim 1), …, size(dim N))
For example, if the Stan program variable theta is a 3x3 matrix, and the sample consists of 4 chains with 1000 post-warmup draws, this function will return a numpy.ndarray with shape (4000,3,3).
This functionality is also available via a shortcut using the dot operator: writing fit.a is a synonym for fit.stan_variable("a").
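Because the flattened draws stay in chain order, they can be split back into per-chain blocks by simple slicing. The sketch below uses a plain list of integers as a stand-in for the first axis of the returned array; split_by_chain is a hypothetical helper, not a CmdStanPy function.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_by_chain(flat_draws: Sequence[T], chains: int) -> List[List[T]]:
    """Split chain-ordered draws back into one list per chain."""
    if chains <= 0 or len(flat_draws) % chains != 0:
        raise ValueError("chains must be positive and divide the number of draws")
    per_chain = len(flat_draws) // chains
    return [
        list(flat_draws[i * per_chain:(i + 1) * per_chain])
        for i in range(chains)
    ]


# 4 chains of 1000 draws each; integers stand in for the draws themselves.
flat = list(range(4000))
by_chain = split_by_chain(flat, chains=4)
print(len(by_chain), len(by_chain[0]), by_chain[1][0])
```

Here element 1000 of the flattened sequence is the first draw of chain 2, matching the ordering described above.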
- stan_variables()[source]¶
Return a dictionary mapping Stan program variable names to the corresponding numpy.ndarray containing the inferred values.
- summary(percentiles=(5, 50, 95), sig_figs=6)[source]¶
Run cmdstan/bin/stansummary over all output CSV files, assemble summary into DataFrame object. The first row contains statistics for the total joint log probability lp__, but is omitted when the Stan model has no parameters. The remaining rows contain summary statistics for all parameters, transformed parameters, and generated quantities variables, in program declaration order.
- Parameters:
percentiles (Sequence[int]) – Ordered non-empty sequence of percentiles to report. Must be integers from 1 to 99, inclusive. Defaults to (5, 50, 95).
sig_figs (int) – Number of significant figures to report. Must be an integer between 1 and 18. If unspecified, the default precision for the system file I/O is used; the usual value is 6. If precision above 6 is requested, the sample must have been produced by CmdStan version 2.25 or later and the sampler output precision must be equal to or greater than the requested summary precision.
- Returns:
pandas.DataFrame
- Return type:
pandas.DataFrame
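The constraints on percentiles can be checked up front before calling summary(). The validator below is a sketch under stated assumptions: check_percentiles is a hypothetical helper, and "ordered" is read as strictly increasing, which the text does not spell out.

```python
from typing import Sequence


def check_percentiles(percentiles: Sequence[int]) -> None:
    """Raise ValueError unless percentiles are non-empty, in 1..99, and increasing."""
    if len(percentiles) == 0:
        raise ValueError("percentiles must be a non-empty sequence")
    previous = 0
    for pct in percentiles:
        if not isinstance(pct, int) or pct < 1 or pct > 99:
            raise ValueError(f"percentile {pct!r} is not an integer in 1..99")
        if pct <= previous:
            # Assumption: "ordered" is read as strictly increasing.
            raise ValueError("percentiles must be in increasing order")
        previous = pct


check_percentiles((5, 50, 95))  # the documented default passes silently
```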
- property column_names: Tuple[str, ...]¶
Names of all outputs from the sampler, comprising sampler parameters and all components of all model parameters, transformed parameters, and quantities of interest. Corresponds to Stan CSV file header row, with names munged to array notation, e.g. beta[1] not beta.1.
- property divergences: Optional[ndarray]¶
Per-chain total number of post-warmup divergent iterations. When sampler algorithm ‘fixed_param’ is specified, returns None.
- property max_treedepths: Optional[ndarray]¶
Per-chain total number of post-warmup iterations where the NUTS sampler reached the maximum allowed treedepth. When sampler algorithm ‘fixed_param’ is specified, returns None.
- property metadata: InferenceMetadata¶
Returns object which contains CmdStan configuration as well as information about the names and structure of the inference method and model output variables.
- property metric: Optional[ndarray]¶
Metric used by sampler for each chain. When sampler algorithm ‘fixed_param’ is specified, metric is None.
- property metric_type: Optional[str]¶
Metric type used for adaptation, either ‘diag_e’ or ‘dense_e’, according to CmdStan arg ‘metric’. When sampler algorithm ‘fixed_param’ is specified, metric_type is None.
- property num_draws_sampling: int¶
Number of sampling (post-warmup) draws per chain, i.e., thinned sampling iterations.
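The divergences and max_treedepths properties can be combined into a quick health check. The sketch below only relies on those two documented attributes and treats None as "not applicable"; sampler_warnings is a hypothetical helper, and the stand-in object with made-up counts replaces a real CmdStanMCMC so the example runs without CmdStan.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def sampler_warnings(fit: Any) -> Dict[str, Optional[int]]:
    """Total post-warmup divergences and max-treedepth hits across chains.

    `fit` is expected to expose `divergences` and `max_treedepths`, each a
    per-chain sequence of counts or None (as with 'fixed_param').
    """
    def total(per_chain: Any) -> Optional[int]:
        return None if per_chain is None else int(sum(per_chain))

    return {
        "divergences": total(fit.divergences),
        "max_treedepths": total(fit.max_treedepths),
    }


# Stand-in object with made-up counts for 4 chains; not real sampler output.
fake_fit = SimpleNamespace(divergences=[0, 2, 0, 1], max_treedepths=[0, 0, 0, 0])
print(sampler_warnings(fake_fit))
```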
CmdStanMLE¶
- class cmdstanpy.CmdStanMLE(runset)[source]¶
Container for outputs from CmdStan optimization. Created by CmdStanModel.optimize().
- Parameters:
runset (RunSet) –
- save_csvfiles(dir=None)[source]¶
Move output CSV files to specified directory. If files were written to the temporary session directory, clean filename. E.g., save ‘bernoulli-201912081451-1-5nm6as7u.csv’ as ‘bernoulli-201912081451-1.csv’.
- stan_variable(var, *, inc_iterations=False, warn=True)[source]¶
Return a numpy.ndarray which contains the estimates for the named Stan program variable where the dimensions of the numpy.ndarray match the shape of the Stan program variable.
This functionality is also available via a shortcut using the dot operator: writing fit.a is a synonym for fit.stan_variable("a").
- stan_variables(inc_iterations=False)[source]¶
Return a dictionary mapping Stan program variable names to the corresponding numpy.ndarray containing the inferred values.
- property column_names: Tuple[str, ...]¶
Names of estimated quantities, including the joint log probability and all parameters, transformed parameters, and generated quantities.
- property metadata: InferenceMetadata¶
Returns object which contains CmdStan configuration as well as information about the names and structure of the inference method and model output variables.
- property optimized_iterations_np: Optional[ndarray]¶
Returns all saved iterations from the optimizer and final estimate as a numpy.ndarray which contains all optimizer outputs, i.e., the value for lp__ as well as all Stan program variables.
- property optimized_iterations_pd: Optional[DataFrame]¶
Returns all saved iterations from the optimizer and final estimate as a pandas.DataFrame which contains all optimizer outputs, i.e., the value for lp__ as well as all Stan program variables.
- property optimized_params_dict: Dict[str, float64]¶
Returns all estimates from the optimizer, including lp__, as a Python Dict. Only returns the estimate from the final iteration.
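Since optimized_params_dict includes lp__ alongside the model variables, a common follow-up is to separate the two. The sketch below assumes only the documented dictionary property; point_estimates is a hypothetical helper and the stand-in object with made-up numbers replaces a real CmdStanMLE.

```python
from types import SimpleNamespace
from typing import Any, Dict


def point_estimates(mle: Any) -> Dict[str, float]:
    """Final-iteration estimates without the lp__ entry."""
    return {
        name: float(value)
        for name, value in mle.optimized_params_dict.items()
        if name != "lp__"
    }


# Stand-in with made-up numbers; a real CmdStanMLE would come from optimize().
fake_mle = SimpleNamespace(optimized_params_dict={"lp__": -5.0, "theta": 0.2})
print(point_estimates(fake_mle))
```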
CmdStanLaplace¶
- class cmdstanpy.CmdStanLaplace(runset, mode)[source]¶
- Parameters:
runset (RunSet) –
mode (CmdStanMLE) –
- draws()[source]¶
Return a numpy.ndarray containing the draws from the approximate posterior distribution. This is a 2-D array of shape (draws, parameters).
- Return type:
numpy.ndarray
- draws_xr(vars=None)[source]¶
Returns the sampler draws as an xarray Dataset.
- method_variables()[source]¶
Returns a dictionary of all sampler variables, i.e., all output column names ending in __. Assumes that all variables are scalar variables where column name is variable name. Maps each column name to a numpy.ndarray (draws x chains x 1) containing per-draw diagnostic values.
- save_csvfiles(dir=None)[source]¶
Move output CSV files to specified directory. If files were written to the temporary session directory, clean filename. E.g., save ‘bernoulli-201912081451-1-5nm6as7u.csv’ as ‘bernoulli-201912081451-1.csv’.
- stan_variable(var)[source]¶
Return a numpy.ndarray which contains the estimates for the named Stan program variable where the dimensions of the numpy.ndarray match the shape of the Stan program variable.
This functionality is also available via a shortcut using the dot operator: writing fit.a is a synonym for fit.stan_variable("a").
- stan_variables()[source]¶
Return a dictionary mapping Stan program variable names to the corresponding numpy.ndarray containing the inferred values.
- property column_names: Tuple[str, ...]¶
Names of all outputs from the sampler, comprising sampler parameters and all components of all model parameters, transformed parameters, and quantities of interest. Corresponds to Stan CSV file header row, with names munged to array notation, e.g. beta[1] not beta.1.
- property metadata: InferenceMetadata¶
Returns object which contains CmdStan configuration as well as information about the names and structure of the inference method and model output variables.
- property mode: CmdStanMLE¶
Return the maximum a posteriori estimate (mode) as a CmdStanMLE object.
CmdStanPathfinder¶
- class cmdstanpy.CmdStanPathfinder(runset)[source]¶
Container for outputs from the Pathfinder algorithm. Created by CmdStanModel.pathfinder().
- Parameters:
runset (RunSet) –
- create_inits(seed=None, chains=4)[source]¶
Create initial values for the parameters of the model by randomly selecting draws from the Pathfinder approximation.
- Returns:
The initial values for the parameters of the model. If chains is 1, a dictionary is returned; otherwise, a list of dictionaries is returned, in the format expected for the inits argument of CmdStanModel.sample().
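Because the return value is either a single dictionary or a list of dictionaries, downstream code that wants one entry per chain may find it convenient to normalize it. A minimal sketch, using a hypothetical helper name and made-up values:

```python
from typing import Any, Dict, List, Union

Inits = Dict[str, Any]


def inits_per_chain(inits: Union[Inits, List[Inits]]) -> List[Inits]:
    """Always return a list with one dictionary of initial values per chain."""
    if isinstance(inits, dict):
        # chains == 1: a bare dictionary was returned.
        return [inits]
    return list(inits)


print(len(inits_per_chain({"theta": 0.3})))
print(len(inits_per_chain([{"theta": 0.3}, {"theta": 0.4}])))
```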
- draws()[source]¶
Return a numpy.ndarray containing the draws from the approximate posterior distribution. This is a 2-D array of shape (draws, parameters).
- Return type:
numpy.ndarray
- method_variables()[source]¶
Returns a dictionary of all sampler variables, i.e., all output column names ending in __. Assumes that all variables are scalar variables where column name is variable name. Maps each column name to a numpy.ndarray (draws x chains x 1) containing per-draw diagnostic values.
- save_csvfiles(dir=None)[source]¶
Move output CSV files to specified directory. If files were written to the temporary session directory, clean filename. E.g., save ‘bernoulli-201912081451-1-5nm6as7u.csv’ as ‘bernoulli-201912081451-1.csv’.
- stan_variable(var)[source]¶
Return a numpy.ndarray which contains the estimates for the named Stan program variable where the dimensions of the numpy.ndarray match the shape of the Stan program variable.
This functionality is also available via a shortcut using the dot operator: writing fit.a is a synonym for fit.stan_variable("a").
- stan_variables()[source]¶
Return a dictionary mapping Stan program variable names to the corresponding numpy.ndarray containing the inferred values.
- property column_names: Tuple[str, ...]¶
Names of all outputs from the sampler, comprising sampler parameters and all components of all model parameters, transformed parameters, and quantities of interest. Corresponds to Stan CSV file header row, with names munged to array notation, e.g. beta[1] not beta.1.
- property is_resampled: bool¶
Returns True if the draws were resampled from several Pathfinder approximations, False otherwise.
- property metadata: InferenceMetadata¶
Returns object which contains CmdStan configuration as well as information about the names and structure of the inference method and model output variables.
CmdStanVB¶
- class cmdstanpy.CmdStanVB(runset)[source]¶
Container for outputs from CmdStan variational run. Created by CmdStanModel.variational().
- Parameters:
runset (RunSet) –
- save_csvfiles(dir=None)[source]¶
Move output CSV files to specified directory. If files were written to the temporary session directory, clean filename. E.g., save ‘bernoulli-201912081451-1-5nm6as7u.csv’ as ‘bernoulli-201912081451-1.csv’.
- stan_variable(var, *, mean=None)[source]¶
Return a numpy.ndarray which contains the estimates for the named Stan program variable where the dimensions of the numpy.ndarray match the shape of the Stan program variable, with a leading axis added for the number of draws from the variational approximation.
If the variable is a scalar variable, the return array has shape ( draws, ).
If the variable is a vector, the return array has shape ( draws, len(vector))
If the variable is a matrix, the return array has shape ( draws, size(dim 1), size(dim 2) )
If the variable is an array with N dimensions, the return array has shape ( draws, size(dim 1), …, size(dim N))
This functionality is also available via a shortcut using the dot operator: writing fit.a is a synonym for fit.stan_variable("a").
- stan_variables(*, mean=None)[source]¶
Return a dictionary mapping Stan program variable names to the corresponding numpy.ndarray containing the inferred values.
- property column_names: Tuple[str, ...]¶
Names of information items returned by sampler for each draw. Includes approximation information and names of model parameters and computed quantities.
- property columns: int¶
Total number of information items returned by sampler. Includes approximation information and names of model parameters and computed quantities.
- property metadata: InferenceMetadata¶
Returns object which contains CmdStan configuration as well as information about the names and structure of the inference method and model output variables.
CmdStanGQ¶
- class cmdstanpy.CmdStanGQ(runset, previous_fit)[source]¶
Container for outputs from CmdStan generate_quantities run. Created by CmdStanModel.generate_quantities().
- Parameters:
runset (RunSet) –
previous_fit (Fit) –
- draws(*, inc_warmup=False, inc_iterations=False, concat_chains=False, inc_sample=False)[source]¶
Returns a numpy.ndarray over the generated quantities draws from all chains, stored in column-major order so that the values for a parameter are contiguous in memory; likewise, all draws from a chain are contiguous. By default, returns a 3D array arranged (draws, chains, columns); parameter concat_chains=True will return a 2D array where all chains are flattened into a single column, preserving chain order, so that given M chains of N draws, the first N draws are from chain 1, …, and the last N draws are from chain M.
- Parameters:
inc_warmup (bool) – When True and the warmup draws are present in the output, i.e., the sampler was run with save_warmup=True, then the warmup draws are included. Default value is False.
concat_chains (bool) – When True, return a 2D array flattening all draws from all chains. Default value is False.
inc_sample (bool) – When True, include all columns in the previous_fit draws array as well, excepting columns for variables already present in the generated quantities drawset. Default value is False.
inc_iterations (bool) –
- Return type:
numpy.ndarray
- draws_pd(vars=None, inc_warmup=False, inc_sample=False)[source]¶
Returns the generated quantities draws as a pandas DataFrame. Flattens all chains into single column. Container variables (array, vector, matrix) will span multiple columns, one column per element. E.g. variable ‘matrix[2,2] foo’ spans 4 columns: ‘foo[1,1], … foo[2,2]’.
- Return type:
pandas.DataFrame
- draws_xr(vars: Optional[Union[str, List[str]]] = None, inc_warmup: bool = False, inc_sample: bool = False) NoReturn [source]¶
- draws_xr(vars: Optional[Union[str, List[str]]] = None, inc_warmup: bool = False, inc_sample: bool = False) Dataset
Returns the generated quantities draws as an xarray Dataset.
This method can only be called when the underlying fit was made through sampling; it cannot be used on MLE or VB outputs.
- Parameters:
vars – optional list of variable names.
inc_warmup – When True and the warmup draws are present in the MCMC sample, then the warmup draws are included. Default value is False.
- save_csvfiles(dir=None)[source]¶
Move output CSV files to specified directory. If files were written to the temporary session directory, clean filename. E.g., save ‘bernoulli-201912081451-1-5nm6as7u.csv’ as ‘bernoulli-201912081451-1.csv’.
- stan_variable(var, **kwargs)[source]¶
Return a numpy.ndarray which contains the set of draws for the named Stan program variable. Flattens the chains, leaving the draws in chain order. The first array dimension corresponds to the number of draws in the sample. The remaining dimensions correspond to the shape of the Stan program variable.
Internally, draws are in chain order, i.e., for a sample with N chains of M draws each, the first M array elements are from chain 1, the next M are from chain 2, and the last M elements are from chain N.
If the variable is a scalar variable, the return array has shape ( draws * chains, 1).
If the variable is a vector, the return array has shape ( draws * chains, len(vector))
If the variable is a matrix, the return array has shape ( draws * chains, size(dim 1), size(dim 2) )
If the variable is an array with N dimensions, the return array has shape ( draws * chains, size(dim 1), …, size(dim N))
For example, if the Stan program variable theta is a 3x3 matrix, and the sample consists of 4 chains with 1000 post-warmup draws, this function will return a numpy.ndarray with shape (4000,3,3).
This functionality is also available via a shortcut using the dot operator: writing fit.a is a synonym for fit.stan_variable("a").
- stan_variables(**kwargs)[source]¶
Return a dictionary mapping Stan program variable names to the corresponding numpy.ndarray containing the inferred values.
- property metadata: InferenceMetadata¶
Returns object which contains CmdStan configuration as well as information about the names and structure of the inference method and model output variables.
Functions¶
compile_stan_file¶
- cmdstanpy.compile_stan_file(src, force=False, stanc_options=None, cpp_options=None, user_header=None)[source]¶
Compile the given Stan program file. Translates the Stan code to C++, then calls the C++ compiler.
By default, this function compares the timestamps on the source and executable files; if the executable is newer than the source file, it will not recompile the file, unless argument force is True or unless the compiler options have been changed.
- Parameters:
force (bool) – When True, always compile, even if the executable file is newer than the source file. Used for Stan models which have #include directives in order to force recompilation when changes are made to the included files.
stanc_options (Optional[Dict[str, Any]]) – Options for stanc compiler.
cpp_options (Optional[Dict[str, Any]]) – Options for C++ compiler.
user_header (Optional[Union[str, PathLike]]) – A path to a header file to include during C++ compilation.
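The timestamp rule described above can be sketched with the standard library. This is an illustration of the documented behavior only, not CmdStanPy's implementation: needs_compile is a hypothetical helper, it ignores the "compiler options have been changed" case, and the file names and timestamps are made up.

```python
import os
import tempfile


def needs_compile(src: str, exe: str, force: bool = False) -> bool:
    """Return True if `src` should be (re)compiled into `exe`."""
    if force or not os.path.exists(exe):
        return True
    # Recompile unless the executable is newer than the source.
    return os.path.getmtime(exe) <= os.path.getmtime(src)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "bernoulli.stan")
    exe = os.path.join(tmp, "bernoulli")
    for path in (src, exe):
        with open(path, "w") as handle:
            handle.write("")
    os.utime(src, (1_000, 1_000))  # source last modified at t = 1000
    os.utime(exe, (2_000, 2_000))  # executable built later, at t = 2000
    print(needs_compile(src, exe))              # executable is newer
    print(needs_compile(src, exe, force=True))  # force overrides timestamps
```

The force=True case matters for models with #include directives, because a change to an included file does not update the timestamp of the main source file.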
format_stan_file¶
- cmdstanpy.format_stan_file(stan_file, *, overwrite_file=False, canonicalize=False, max_line_length=78, backup=True, stanc_options=None)[source]¶
Run stanc’s auto-formatter on the model code. Either saves directly back to the file or prints for inspection.
- Parameters:
stan_file (Union[str, PathLike]) – Path to Stan program file.
overwrite_file (bool) – If True, save the updated code to disk, rather than printing it. Default is False.
canonicalize (Union[bool, str, Iterable[str]]) – Whether or not the compiler should ‘canonicalize’ the Stan model, removing things like deprecated syntax. Default is False. If True, all canonicalizations are run. If it is a list of strings, those options are passed to stanc (new in Stan 2.29).
max_line_length (int) – Set the wrapping point for the formatter. The default value is 78, which wraps most lines by the 80th character.
backup (bool) – If True, create a stanfile.bak backup before writing to the file. Only disable this if you’re sure you have other copies of the file or are using a version control system like Git.
stanc_options (Optional[Dict[str, Any]]) – Additional options to pass to the stanc compiler.
- Return type:
None
show_versions¶
cmdstan_path¶
install_cmdstan¶
- cmdstanpy.install_cmdstan(version=None, dir=None, overwrite=False, compiler=False, progress=False, verbose=False, cores=1, *, interactive=False)[source]¶
Download and install a CmdStan release from GitHub. Downloads the release tar.gz file to temporary storage. Retries GitHub requests in order to allow for transient network outages. Builds CmdStan executables and tests the compiler by building example model bernoulli.stan.
- Parameters:
version (Optional[str]) – CmdStan version string, e.g. “2.29.2”. Defaults to latest CmdStan release. If git is installed, a git tag or branch of stan-dev/cmdstan can be specified, e.g. “git:develop”.
dir (Optional[str]) – Path to install directory. Defaults to hidden directory $HOME/.cmdstan. If no directory is specified and the above directory does not exist, directory $HOME/.cmdstan will be created and populated.
overwrite (bool) – Boolean value; when True, will overwrite and rebuild an existing CmdStan installation. Default is False.
compiler (bool) – Boolean value; when True on WINDOWS ONLY, use the C++ compiler from the install_cxx_toolchain command or install one if none is found.
progress (bool) – Boolean value; when True, show a progress bar for downloading and unpacking CmdStan. Default is False.
verbose (bool) – Boolean value; when True, show console output from all installation steps, i.e., download, build, and test CmdStan release. Default is False.
cores (int) – Integer, number of cores to use in the make command. Default is 1 core.
interactive (bool) – Boolean value; if true, ignore all other arguments to this function and run in an interactive mode, prompting the user to provide the other information manually through the standard input.
This flag should only be used in interactive environments, e.g. on the command line.
- Returns:
Boolean value; True for success.
- Return type:
bool
rebuild_cmdstan¶
set_cmdstan_path¶
cmdstan_version¶
- cmdstanpy.cmdstan_version()[source]¶
Parses version string out of CmdStan makefile variable CMDSTAN_VERSION, returns Tuple(Major, minor).
If the CmdStan installation is not found or the version cannot be parsed from the makefile, logs a warning and returns None. Lenient behavior required for CI tests, per comment: https://github.com/stan-dev/cmdstanpy/pull/321#issuecomment-733817554
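The lenient parse-or-None behavior can be sketched as follows. The exact layout of the makefile line is an assumption here (a line such as "CMDSTAN_VERSION := 2.33.1"), and parse_cmdstan_version is a hypothetical helper rather than the function CmdStanPy uses.

```python
import re
from typing import Optional, Tuple

# Assumed makefile line, e.g. "CMDSTAN_VERSION := 2.33.1"
_VERSION_LINE = re.compile(
    r"^\s*CMDSTAN_VERSION\s*:?=\s*(\d+)\.(\d+)", re.MULTILINE
)


def parse_cmdstan_version(makefile_text: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor), or None when no version line is found."""
    match = _VERSION_LINE.search(makefile_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_cmdstan_version("CMDSTAN_VERSION := 2.33.1\n"))
print(parse_cmdstan_version("# no version here\n"))
```

Returning a tuple of integers makes version checks such as version >= (2, 28) straightforward.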
set_make_env¶
from_csv¶
- cmdstanpy.from_csv(path=None, method=None)[source]¶
Instantiate a CmdStan object from the Stan CSV files from a CmdStan run. CSV files are specified from either a list of Stan CSV files or a single filepath which can be either a directory name, a Stan CSV filename, or a pathname pattern (i.e., a Python glob). The optional argument ‘method’ checks that the CSV files were produced by that method. Stan CSV files from CmdStan methods ‘sample’, ‘optimize’, and ‘variational’ result in objects of class CmdStanMCMC, CmdStanMLE, and CmdStanVB, respectively.
either a CmdStanMCMC, CmdStanMLE, or CmdStanVB object
- Return type:
Optional[Union[CmdStanMCMC, CmdStanMLE, CmdStanVB, CmdStanPathfinder, CmdStanLaplace]]
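The accepted forms of the path argument (list, directory, single file, or glob pattern) can be expanded into a list of files as sketched below. resolve_csv_paths is a hypothetical helper written for illustration; it does not read or validate the CSV contents, and treating a directory as "all *.csv files in it" is an assumption.

```python
import glob
import os
import tempfile
from typing import List, Union


def resolve_csv_paths(path: Union[str, List[str]]) -> List[str]:
    """Expand a list, directory, filename, or glob pattern into file paths."""
    if isinstance(path, list):
        found = list(path)
    elif os.path.isdir(path):
        found = sorted(glob.glob(os.path.join(path, "*.csv")))
    elif os.path.isfile(path):
        found = [path]
    else:
        # Anything else is treated as a glob pattern.
        found = sorted(glob.glob(path))
    if not found:
        raise ValueError(f"no CSV files found for {path!r}")
    return found


with tempfile.TemporaryDirectory() as tmp:
    for chain_id in (1, 2):
        with open(os.path.join(tmp, f"bernoulli-{chain_id}.csv"), "w") as handle:
            handle.write("")
    by_dir = resolve_csv_paths(tmp)
    by_glob = resolve_csv_paths(os.path.join(tmp, "bernoulli-*.csv"))
    print(len(by_dir), by_dir == by_glob)
```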
write_stan_json¶
- cmdstanpy.write_stan_json(path, data)[source]¶
Dump a mapping of strings to data to a JSON file.
Values can be any numeric type, a boolean (converted to int), or any collection compatible with numpy.asarray(), e.g. a pandas.Series.
Produces a file compatible with the JSON Format for CmdStan.
- Parameters:
path (str) – File path for the created json. Will be overwritten if already in existence.
data (Mapping[str, Any]) – A mapping from strings to values. This can be a dictionary or something more exotic like an xarray.Dataset. This will be copied before type conversion, not modified.
- Return type:
None
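The conversion rules above (booleans become ints, the input mapping is copied rather than modified) can be illustrated with the standard json module. This is a simplified sketch and not the library's implementation: write_json_sketch and to_jsonable are hypothetical names, and only plain Python numbers, booleans, lists, and tuples are handled, with no NumPy or pandas support.

```python
import json
import os
import tempfile
from typing import Any, Mapping


def to_jsonable(value: Any) -> Any:
    """Convert booleans to ints, recursing into lists and tuples."""
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    return value


def write_json_sketch(path: str, data: Mapping[str, Any]) -> None:
    """Write a converted copy of `data`; the input mapping is not modified."""
    converted = {name: to_jsonable(value) for name, value in data.items()}
    with open(path, "w") as handle:
        json.dump(converted, handle)


data = {"N": 3, "y": [0, 1, 1], "flag": True}
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "data.json")
    write_json_sketch(out, data)
    with open(out) as handle:
        print(handle.read())
```

For real models, cmdstanpy.write_stan_json(path, data) is the function to call; the sketch only makes the boolean-to-int and copy-before-convert behavior concrete.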