7 Generating Quantities of Interest from a Fitted Model
The generated quantities block computes quantities of interest (QOIs) based on the data, transformed data, parameters, and transformed parameters. It can be used to:
- generate simulated data for model testing by forward sampling
- generate predictions for new data
- calculate posterior event probabilities, including multiple comparisons, sign tests, etc.
- calculate posterior expectations
- transform parameters for reporting
- apply full Bayesian decision theory
- calculate log likelihoods, deviances, etc. for model comparison
The generate_quantities
method allows you to generate additional quantities
of interest from a fitted model without re-running the sampler.
Instead, you write a modified version of the original Stan program
and add a generated quantities block or modify the existing one
which specifies how to compute the new quantities of interest.
Running the generate_quantities
method on the new program
together with sampler outputs (i.e., a set of draws)
from the fitted model runs the generated quantities block
of the new program using the existing sample by plugging
in the per-draw parameter estimates for the computations in
the generated quantities block.
See the Stan User’s Guide section
Stand-alone generated quantities and ongoing prediction
for further details.
To illustrate how this works we use the generate_quantities method
to do posterior predictive checks using the estimate of theta from
the example bernoulli model and data, following the
posterior predictive simulation
procedure in the Stan User’s Guide.
We write a program bernoulli_ppc.stan
which contains
the following generated quantities block, with comments
to explain the procedure:
generated quantities {
  real<lower=0, upper=1> theta_rep;
  array[N] int y_sim;
  // use current estimate of theta to generate new sample
  for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
  }
  // estimate theta_rep from new sample
  theta_rep = sum(y_sim) * 1.0 / N;
}
The rest of the program is the same as in bernoulli.stan.
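As a rough illustration of what this block computes for each draw, the following Python sketch mimics the same forward simulation outside of Stan. It is not how CmdStan implements the method; the function name `simulate_ppc`, the seed, and the example theta values are all assumptions made for illustration.

```python
import random


def simulate_ppc(theta_draws, n_obs, seed=1234):
    """For each draw of theta, simulate n_obs Bernoulli outcomes (y_sim)
    and compute the proportion of successes (theta_rep)."""
    rng = random.Random(seed)
    results = []
    for theta in theta_draws:
        # analogous to: y_sim[n] = bernoulli_rng(theta)
        y_sim = [1 if rng.random() < theta else 0 for _ in range(n_obs)]
        # analogous to: theta_rep = sum(y_sim) * 1.0 / N
        theta_rep = sum(y_sim) / n_obs
        results.append((y_sim, theta_rep))
    return results


# Hypothetical theta values standing in for the theta column of a fitted sample.
example_draws = [0.12, 0.25, 0.31]
for y_sim, theta_rep in simulate_ppc(example_draws, n_obs=10):
    print(y_sim, theta_rep)
```

Each input draw yields one simulated data set and one replicated estimate, which mirrors the one-row-per-draw layout of the generate_quantities output.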
The generate_quantities method
requires the sub-argument fitted_params
which takes as its value the name of a Stan CSV file.
The per-draw parameter estimates from the fitted_params
file will
be used to run the generated quantities block.
First, we run the bernoulli.stan
program for a single chain to
generate a sample in file bernoulli_fit.csv:
> ./bernoulli sample data file=bernoulli.data.json output file=bernoulli_fit.csv
Then we can run the bernoulli_ppc.stan
program to carry out the posterior predictive checks:
> ./bernoulli_ppc generate_quantities fitted_params=bernoulli_fit.csv \
data file=bernoulli.data.json \
output file=bernoulli_ppc.csv
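If you prefer to drive this step from a script, one possible approach is to assemble the same argument list in Python and pass it to `subprocess`. This is only a sketch: the helper names `generate_quantities_cmd` and `run_generate_quantities` are hypothetical, and it assumes the compiled `bernoulli_ppc` executable and the input files are in the working directory.

```python
import subprocess


def generate_quantities_cmd(exe, fitted_params, data_file, output_file):
    """Build the argument list for the generate_quantities call shown above."""
    return [
        exe,
        "generate_quantities", f"fitted_params={fitted_params}",
        "data", f"file={data_file}",
        "output", f"file={output_file}",
    ]


def run_generate_quantities(exe, fitted_params, data_file, output_file):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    cmd = generate_quantities_cmd(exe, fitted_params, data_file, output_file)
    return subprocess.run(cmd, check=True)


cmd = generate_quantities_cmd(
    "./bernoulli_ppc", "bernoulli_fit.csv",
    "bernoulli.data.json", "bernoulli_ppc.csv",
)
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting issues when file names contain spaces.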
The output file bernoulli_ppc.csv
consists of just the values for the variables declared in the generated quantities block, i.e., theta_rep
and the elements of y_sim:
# model = bernoulli_ppc_model
# method = generate_quantities
#   generate_quantities
#     fitted_params = bernoulli_fit.csv
# id = 0 (Default)
# data
#   file = bernoulli.data.json
# init = 2 (Default)
# random
#   seed = 2135140492 (Default)
# output
#   file = bernoulli_ppc.csv
#   diagnostic_file = (Default)
#   refresh = 100 (Default)
Note: the only relevant analysis of the resulting CSV output is computing per-column statistics; this can easily be done in Python, R, Excel or similar, or you can use the CmdStanPy and CmdStanR interfaces which provide a better user experience for this workflow.
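For example, per-column means and standard deviations can be computed with the Python standard library alone. The sketch below assumes only that comment lines in the Stan CSV file start with `#` and that the first remaining line is the column header; the function name `summarize_stan_csv` and the small inline data set are made up for illustration and are not actual CmdStan output.

```python
import csv
import statistics


def summarize_stan_csv(lines):
    """Return {column: (mean, stdev)} for a Stan CSV given as an iterable of lines."""
    # Drop blank lines and the '#' comment lines that hold the configuration.
    rows = [ln for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]
    reader = csv.DictReader(rows)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return {
        name: (
            statistics.mean(vals),
            statistics.stdev(vals) if len(vals) > 1 else 0.0,
        )
        for name, vals in columns.items()
    }


# Made-up miniature input, used only to show the calling convention.
example = """# model = bernoulli_ppc_model
theta_rep,y_sim.1,y_sim.2
0.5,1,0
0.0,0,0
1.0,1,1
""".splitlines()

for name, (mean, sd) in summarize_stan_csv(example).items():
    print(name, mean, sd)
```

To summarize a real output file, pass an open file handle instead of the inline example, e.g. `summarize_stan_csv(open("bernoulli_ppc.csv"))`.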
Given the current implementation, to see the fitted parameter values for each draw, create a copy variable in the generated quantities block, e.g.:
generated quantities {
  real<lower=0, upper=1> theta_cp = theta;
  real<lower=0, upper=1> theta_rep;
  array[N] int y_sim;
  // use current estimate of theta to generate new sample
  for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
  }
  // estimate theta_rep from new sample
  theta_rep = sum(y_sim) * 1.0 / N;
}
Now the output is slightly more interpretable: theta_cp
is the same as the theta
used to generate the values y_sim[1]
through y_sim[N].
Comparing columns theta_cp
and theta_rep
allows us to see how the
uncertainty in our estimate of theta
is carried forward
into our predictions: