7 Generating Quantities of Interest from a Fitted Model

The generated quantities block computes quantities of interest (QOIs) based on the data, transformed data, parameters, and transformed parameters. It can be used to:

  • generate simulated data for model testing by forward sampling
  • generate predictions for new data
  • calculate posterior event probabilities, including multiple comparisons, sign tests, etc.
  • calculating posterior expectations
  • transform parameters for reporting
  • apply full Bayesian decision theory
  • calculate log likelihoods, deviances, etc. for model comparison

The generate_quantities method allows you to generate additional quantities of interest from a fitted model without re-running the sampler. Instead, you write a modified version of the original Stan program and add a generated quantities block or modify the existing one which specifies how to compute the new quantities of interest. Running the generate_quantities method on the new program together with sampler outputs (i.e., a set of draws) from the fitted model runs the generated quantities block of the new program using the the existing sample by plugging in the per-draw parameter estimates for the computations in the generated quantities block. See the Stan User’s Guide section Stand-alone generated quantities and ongoing prediction for further details.

To illustrate how this works we use the generate_quantities method to do posterior predictive checks using the estimate of theta given the example bernoulli model and data, following the posterior predictive simulation procedure in the Stan User’s Guide.

We write a program bernoulli_ppc.stan which contains the following generated quantities block, with comments to explain the procedure:

generated quantities {
  real<lower=0, upper=1> theta_rep;
  array[N] int y_sim;
  // use current estimate of theta to generate new sample
  for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
  }
  // estimate theta_rep from new sample
  theta_rep = sum(y_sim) * 1.0 / N;
}

The rest of the program is the same as in bernoulli.stan.

The generate_method requires the sub-argument fitted_params which takes as its value the name of a Stan CSV file. The per-draw parameter estimates from the fitted_params file will be used to run the generated quantities block.

If we run the bernoulli.stan program for a single chain to generate a sample in file bernoulli_fit.csv:

> ./bernoulli sample data file=bernoulli.data.json output file=bernoulli_fit.csv

Then we can run the bernoulli_ppc.stan to carry out the posterior predictive checks:

> ./bernoulli_ppc generate_quantities fitted_params=bernoulli_fit.csv \
                  data file=bernoulli.data.json \
                  output file=bernoulli_ppc.csv

The output file bernoulli_ppc.csv consists of just the values for the variables declared in the generated quantities block, i.e., theta_rep and the elements of y_sim:

# model = bernoulli_ppc_model
# method = generate_quantities
#   generate_quantities
#     fitted_params = bernoulli_fit.csv
# id = 0 (Default)
# data
#   file = bernoulli.data.json
# init = 2 (Default)
# random
#   seed = 2135140492 (Default)
# output
#   file = bernoulli_ppc.csv
#   diagnostic_file =  (Default)
#   refresh = 100 (Default)
theta_rep,y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10
0.2,0,0,1,0,0,0,0,0,1,0
0.3,1,0,0,1,0,1,0,0,0,0
0.8,1,0,1,1,1,1,1,1,1,0
0.1,0,0,0,0,0,1,0,0,0,0
0.3,0,0,0,0,0,0,1,1,1,0

Note: the only relevant analysis of the resulting CSV output is computing per-column statistics; this can easily be done in Python, R, Excel or similar, or you can use the CmdStanPy and CmdStanR interfaces which provide a better user experience for this workflow.

Given the current implementation, to see the fitted parameter values for each draw, create a copy variable in the generated quantities block, e.g.:

generated quantities {
  real<lower=0, upper=1> theta_cp = theta;
  real<lower=0, upper=1> theta_rep;
  array[N] int y_sim;
  // use current estimate of theta to generate new sample
  for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
  }
  // estimate theta_rep from new sample
  theta_rep = sum(y_sim) * 1.0 / N;
}

Now the output is slightly more interpretable: theta_cp is the same as the theta used to generate the values y_sim[1] through y_sim[1]. Comparing columns theta_cp and theta_rep allows us to see how the uncertainty in our estimate of theta is carried forward into our predictions:

theta_cp,theta_rep,y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10
0.102391,0,0,0,0,0,0,0,0,0,0,0
0.519567,0.2,0,1,0,0,1,0,0,0,0,0
0.544634,0.6,1,0,0,0,0,1,1,1,1,1
0.167651,0,0,0,0,0,0,0,0,0,0,0
0.167651,0.1,1,0,0,0,0,0,0,0,0,0