8 Command-Line Interface Overview
A CmdStan executable is built from the Stan model concept and the CmdStan command line parser. The command line argument syntax consists of sets of keywords and keyword-value pairs. Arguments are grouped by the following keywords:
method
- specifies the kind of inference done on the model.
Each kind of inference requires further configuration via sub-arguments. Themethod
argument is required. It can be specified overtly as the a keyword-value pairmethod=<inference>
or implicitly as one of the following:sample
- obtain a sample from the posterior using HMCoptimize
- penalized maximum likelihood estimationvariational
- automatic variational inferencegenerate_quantities
- run model’sgenerated quantities
block on existing sample to obtain new quantities of interest.
diagnose
- compute and compare sampler gradient calculations to finite differences.
data
- specifies the input data file, if any.output
- specifies program outputs, both disk files and terminal window outputs.init
- specifies initial values for the model parameters, if any.random
- specifies the seed for the psuedo-random number.
The remainder of this chapter covers the general configuration options used for all processing. The following chapters cover the per-inference configuration options.
8.1 Input Data Argument
The values for all variables declared in the data block of the model are read in from an input data file in either JSON or Rdump format. The syntax for the input data argument is:
data file=<filepath>
The keyword data
must be followed directly by the keyword-value pair file=<filepath>
.
If the model doesn’t declare any data variables, this argument is ignored.
The input data file must contain definitions for all data variables declared in the data block.
If one or more data block variables are missing from the input data file, the program will
print and error message to the terminal.
For example, the model bernoulli.stan
defines two data variables N
and y
.
If the input data file doesn’t include both variables, or if the data variable
doesn’t match the declared type and dimensions, the program will exit with an error message
at the point where it first encounters missing data.
For example if the input data file doesn’t include the definition for variable y
,
the executable exits with the following message:
Exception: variable does not exist; processing stage=data initialization; variable name=y; base type=int (in 'examples/bernoulli/bernoulli.stan', line 3, column 2 to column 28)
8.2 Output Control Arguments
The output
keyword is used to specify non-default options for
output files and messages written to the terminal window.
The output
keyword takes several keyword-value pair sub-arguments.
The keyword value pair file=<filepath>
specifies the location of the
Stan CSV output file. If unspecified, the output file is written to a file named output.csv
in the current working directory.
The keyword value pair diagnostic_file=<filepath>
specifies the location of the
auxiliary output file. By default, no auxiliary output file is produced.
This option is only valid for the iterative algorithms sample
and variational
.
The keyword value pair refresh=<int>
specifies the
number of iterations between progress messages written to the terminal window.
The default value is 100 iterations.
Note: Numeric values in the Stan CSV output and diagnostics file provide only 6 decimal places of precision.
8.3 Initialize Model Parameters Argument
Initialization is only applied to parameters defined in the parameters block. By default, all parameters are initialized to random draws from a uniform distribution over the range \([-2, 2]\). These values are on the unconstrained scale, so must be inverse transformed back to satisfy the constraints declared for parameters. Because zero is chosen to be a reasonable default initial value for most parameters, the interval around zero provides a fairly diffuse starting point. For instance, unconstrained variables are initialized randomly in \((-2, 2)\), variables constrained to be positive are initialized roughly in \((0.14, 7.4)\), variables constrained to fall between 0 and 1 are initialized with values roughly in \((0.12, 0.88)\).
The initialization argument is specified as keyword-value pair with keyword init
.
The value can be one of the following:
positive real number \(x\). All parameters will be initialized to random draws from a uniform distribution over the range \([-x, x]\).
\(0\) - All parameters will be initialized to zero values on the unconstrained scale. The transforms are arranged in such a way that zero initialization provides reasonable variable initializations: \(0\) for unconstrained parameters; \(1\) for parameters constrained to be positive; \(0.5\) for variables to constrained to lie between \(0\) and \(1\); a symmetric (uniform) vector for simplexes; unit matrices for both correlation and covariance matrices; and so on.
filepath - A data file in JSON or Rdump format containing initial parameters values for some or all of the model parameters. User specified initial values must satisfy the constraints declared in the model (i.e., they are on the constrained scale). Parameters which aren’t explicitly initializied will be initialized randomly over the range \([-2, 2]\).
8.4 Random Number Generator Arguments
The random-number generator’s behavior is determined by the unsigned seed (positive integer) it is started with. If a seed is not specified, or a seed of 0 or less is specified, the system time is used to generate a seed. The seed is recorded and included with Stan’s output regardless of whether it was specified or generated randomly from the system time.
The syntax for the random seed argument is:
random seed=<int>
The keyword random
must be followed directly by the keyword-value pair seed=<int>
.
8.5 Chain Identifier Argument: id
The chain identifier argument is used in conjunction with the random seed
argument when running multiple Markov chains for sampling.
The chain identifier is used to advance the random number generator a very large number of random variates so that two chains
with the same seed and different identifiers draw from non-overlapping subsequences
of the random-number sequence determined by the seed.
Together, the seed and chain identifier determine the behavior of the random number generator.
The syntax for the random seed argument is:
id=<int>
The default value is 0.
When running a set of chains from the command line with a specified seed, this argument should be set to the chain index. E.g., when running 4 chains, the value should be 1,..,4, successively. When running multiple chains from a single command, Stan’s interfaces manage the chain identifier arguments automatically.
For complete reproducibility, every aspect of the environment needs to be locked down from the OS and version to the C++ compiler and version to the version of Stan and all dependent libraries. See the Stan Reference Manual Reproducibility chapter for further details.
8.6 Command Line Help
CmdStan provides a help
and help-all
mechanism that displays either the
available top-level or keyword-specific key-value argument pairs.
To display top-level help, call the CmdStan executable with keyword help
:
> ./bernoulli help
Usage: ./bernoulli <arg1> <subarg1_1> ... <subarg1_m> ... <arg_n> <subarg_n_1> ... <subarg_n_m>
Begin by selecting amongst the following inference methods and diagnostics,
sample Bayesian inference with Markov Chain Monte Carlo
optimize Point estimation
variational Variational inference
diagnose Model diagnostics
generate_quantities Generate quantities of interest
Or see help information with
help Prints help
help-all Prints entire argument tree
Additional configuration available by specifying
id Unique process identifier
data Input data options
init Initialization method: "x" initializes randomly between [-x, x], "0" initializes to 0, anything else identifies a file of values
random Random number configuration
output File output options
See ./bernoulli <arg1> [ help | help-all ] for details on individual arguments.