8.1 Overview of Stan’s program blocks
The full set of named program blocks is exemplified in the following skeletal Stan program.
functions {
  // ... function declarations and definitions ...
}
data {
  // ... declarations ...
}
transformed data {
  // ... declarations ... statements ...
}
parameters {
  // ... declarations ...
}
transformed parameters {
  // ... declarations ... statements ...
}
model {
  // ... declarations ... statements ...
}
generated quantities {
  // ... declarations ... statements ...
}
The functions block contains user-defined functions. The data block declares the data required by the model. The transformed data block allows constants and transforms of the data to be defined. The parameters block declares the model’s parameters; the unconstrained version of the parameters is what is sampled or optimized. The transformed parameters block allows variables to be defined in terms of data and parameters; these variables may be used later and will be saved. The model block is where the log probability function is defined. The generated quantities block allows derived quantities to be computed from parameters, data, and optionally (pseudo) random number generation.
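As a concrete counterpart to the skeleton, the following is a minimal sketch of a program that uses all seven blocks. The model (a normal likelihood with a data-scaled prior) and every name in it, including the helper function signal_to_noise, are illustrative assumptions rather than anything prescribed by Stan.

```stan
functions {
  // Hypothetical helper used only for illustration.
  real signal_to_noise(real mu, real sigma) {
    return mu / sigma;
  }
}
data {
  int<lower=2> N;   // number of observations (at least 2 so sd() is defined)
  vector[N] y;      // observations, read from the external data source
}
transformed data {
  real y_sd = sd(y);  // constant derived from the data
}
parameters {
  real mu;
  real<lower=0> sigma;  // constrained here; sampled on the unconstrained scale
}
transformed parameters {
  real snr = signal_to_noise(mu, sigma);  // saved along with the parameters
}
model {
  mu ~ normal(0, 10 * y_sd);
  sigma ~ exponential(1);
  y ~ normal(mu, sigma);
}
generated quantities {
  real y_new = normal_rng(mu, sigma);  // one predictive draw per saved iteration
}
```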
Optionality and ordering
All of the blocks are optional. A consequence of this is that the empty string is a valid Stan program, although it will trigger a warning message from the Stan compiler. The Stan program blocks that occur must occur in the order presented in the skeletal program above. Within each block, both declarations and statements are optional, subject to the restriction that the declarations come before the statements.
Variable scope
The variables declared in each block have scope over all subsequent statements. Thus a variable declared in the transformed data block may be used in the model block. But a variable declared in the generated quantities block may not be used in any earlier block, including the model block. The exception to this rule is that variables declared in the model block are always local to the model block and may not be accessed in the generated quantities block; to make a variable accessible in both the model and generated quantities blocks, it must be declared as a transformed parameter.
Variables declared as function parameters have scope only within that function definition’s body, and may not be assigned to (they are constant).
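The scoping rules above can be illustrated with a short sketch. The model and all variable and function names (twice, tau, z, sigma_doubled) are hypothetical; the commented-out lines mark uses that the rules above forbid.

```stan
functions {
  real twice(real a) {
    // a = 2 * a;  // not allowed: function arguments are constant
    return 2 * a;
  }
}
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> tau;  // precision
}
transformed parameters {
  real sigma = inv_sqrt(tau);  // in scope for model and generated quantities
}
model {
  vector[N] z = (y - mu) / sigma;  // local to the model block
  tau ~ gamma(2, 2);
  // Normal log density of y, up to an additive constant.
  target += -0.5 * dot_self(z) - N * log(sigma);
}
generated quantities {
  real sigma_doubled = twice(sigma);  // sigma is a transformed parameter
  // real z1 = z[1];  // not allowed: z was declared in the model block
}
```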
Function scope
Functions defined in the functions block may be used in any appropriate block. Most functions can be used in any block and applied to a mixture of parameters and data (including constants or program literals).
Random-number-generating functions are restricted to the transformed data and generated quantities blocks and to the bodies of user-defined functions ending in _rng; such functions are suffixed with _rng.
Log-probability-modifying functions are restricted to blocks where the log probability accumulator is in scope (transformed parameters and model); such functions are suffixed with _lp.
Density functions defined in the program may be used in sampling statements.
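The sketch below shows one user-defined function of each restricted kind and a block in which it may be called. The function names (centered_normal_lpdf, shrink_lp, predict_rng) and the model itself are assumptions made for illustration.

```stan
functions {
  // Density function: the _lpdf suffix allows use in a sampling statement.
  real centered_normal_lpdf(real y, real sigma) {
    return normal_lpdf(y | 0, sigma);
  }
  // _lp suffix: may modify the log probability accumulator,
  // so it is callable only in transformed parameters and model.
  void shrink_lp(vector theta, real scale) {
    target += normal_lpdf(theta | 0, scale);
  }
  // _rng suffix: may call other random number generators,
  // so it is callable only in transformed data and generated quantities.
  real predict_rng(real mu, real sigma) {
    return normal_rng(mu, sigma);
  }
}
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  vector[N] theta;
  real<lower=0> sigma;
}
model {
  sigma ~ centered_normal(1);  // uses centered_normal_lpdf(sigma | 1)
  shrink_lp(theta, 5);         // adds to target
  y ~ normal(theta, sigma);
}
generated quantities {
  real y_new = predict_rng(theta[1], sigma);
}
```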
Automatic variable definitions
The variables declared in the data and parameters blocks are treated differently than other variables in that they are automatically defined by the context in which they are used. This is why there are no statements allowed in the data or parameters blocks. The variables in the data block are read from an external input source such as a file or a designated R data structure. The variables in the parameters block are read from the sampler’s current parameter values (either standard HMC or NUTS). The initial values may be provided through an external input source, which is also typically a file or a designated R data structure. In each case, the parameters are instantiated to the values for which the model defines a log probability function.
Transformed variables
The transformed data and transformed parameters blocks behave similarly to each other. Both allow new variables to be declared and then defined through a sequence of statements. Because variables scope over every statement that follows them, transformed data variables may be defined in terms of the data variables.
Before generating any draws, data variables are read in, then the transformed data variables are declared and the associated statements executed to define them. This means the statements in the transformed data block are only ever evaluated once.[10]
Transformed parameters work the same way, being defined in terms of the parameters, transformed data, and data variables. The difference is the frequency of evaluation. Parameters are read in and (inverse) transformed to constrained representations on their natural scales once per log probability and gradient evaluation. This means the inverse transforms and their log absolute Jacobian determinants are evaluated once per leapfrog step. Transformed parameters are then declared and their defining statements executed once per leapfrog step.
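A minimal sketch of the difference in evaluation frequency follows, assuming a simple regression on a standardized predictor; the model and all names are hypothetical. The print statement is included only to make the single evaluation of the transformed data block visible.

```stan
data {
  int<lower=2> N;
  vector[N] x;
  vector[N] y;
}
transformed data {
  real x_mean = mean(x);
  real x_sd = sd(x);
  vector[N] x_std = (x - x_mean) / x_sd;
  // Executed with the rest of this block, before any draws are generated.
  print("transformed data evaluated");
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
transformed parameters {
  // Re-evaluated at every log probability and gradient evaluation.
  vector[N] mu = alpha + beta * x_std;
}
model {
  y ~ normal(mu, sigma);
}
```

Because the standardization does not depend on parameters, placing it in transformed data avoids repeating it at every leapfrog step, which is what would happen if the same statements were written in the transformed parameters or model block.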
Generated quantities
The generated quantity variables are defined once per sample after all the leapfrog steps have been completed. These may be random quantities, so the block must be rerun even if the Metropolis adjustment of HMC or NUTS rejects the update proposal.
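As an illustration, the following sketch draws a replicated data set once per saved iteration. The model and the names y_rep, max_rep, and draws are assumptions; the nested braces show how a local variable in this block can hold intermediate values that are not written to the output.

```stan
data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);
}
generated quantities {
  vector[N] y_rep;  // written to the output each iteration
  real max_rep;     // written to the output each iteration
  {
    vector[N] draws;  // local to the nested block: not written
    for (n in 1:N) {
      draws[n] = normal_rng(mu, sigma);
    }
    y_rep = draws;
    max_rep = max(draws);
  }
}
```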
Variable read, write, and definition summary
A table summarizing the point at which variables are read, written, and defined is given in the block actions table.
Block Actions Table. The read, write, transform, and evaluate actions and periodicities listed in the last column correspond to the Stan program blocks in the first column. The middle column indicates whether the block allows statements. The last row indicates that parameter initialization requires a read and transform operation applied once per chain.
block | statement | action / period |
---|---|---|
data | no | read / chain |
transformed data | yes | evaluate / chain |
parameters | no | inv. transform, Jacobian / leapfrog |
 | | inv. transform, write / sample |
transformed parameters | yes | evaluate / leapfrog |
 | | write / sample |
model | yes | evaluate / leapfrog |
generated quantities | yes | evaluate / sample |
 | | write / sample |
(initialization) | n/a | read, transform / chain |
Variable Declaration Table. This table indicates where variables that are not basic data or parameters should be declared, based on whether they are defined in terms of parameters, whether they are used in the log probability function defined in the model block, and whether they are saved (printed). The two lines marked with asterisks (*) should not be used as there is no need to print a variable every iteration that does not depend on the value of any parameters.
param depend | in target | save | declare in |
---|---|---|---|
+ | + | + | transformed parameters |
+ | + | - | model (local) |
+ | - | + | generated quantities |
+ | - | - | generated quantities (local) |
- | + | + | transformed data and generated quantities * |
- | + | - | transformed data |
- | - | + | generated quantities * |
- | - | - | transformed data (local) |
Another way to look at the variables is in terms of their function. To decide which variable to use, consult the charts in the variable declaration table. The last line has no corresponding location, as there is no need to print a variable every iteration that does not depend on parameters.[11]
The rest of this chapter provides full details on when and how the variables and statements in each block are executed.
[10] If the C++ code is configured for concurrent threads, the data and transformed data blocks can be executed once and reused for multiple chains.
[11] It is possible to print a variable every iteration that does not depend on parameters: just define it (or redefine it if it is transformed data) in the generated quantities block.