Generating Quantities of Interest from a Fitted Model

The generate_quantities method allows you to generate additional quantities of interest from a fitted model without re-running the sampler. Instead, you write a modified version of the original Stan program and add a generated quantities block or modify the existing one which specifies how to compute the new quantities of interest. Running the generate_quantities method on the new program together with sampler outputs (i.e., a set of draws) from the fitted model runs the generated quantities block of the new program using the the existing sample by plugging in the per-draw parameter estimates for the computations in the generated quantities block.

This method requires sub-argument fitted_params which takes as its value an existing Stan CSV file that contains a parameter values from an equivalent model, i.e., a model with the same parameters block, conditioned on the same data.

The generated quantities block computes quantities of interest (QOIs) based on the data, transformed data, parameters, and transformed parameters. It can be used to:

generate simulated data for model testing by forward sampling
generate predictions for new data
calculate posterior event probabilities, including multiple comparisons, sign tests, etc.
calculate posterior expectations
transform parameters for reporting
apply full Bayesian decision theory
calculate log likelihoods, deviances, etc. for model comparison

For an overview of the uses of this feature, see the Stan User’s Guide section on Stand-alone generated quantities and ongoing prediction.

Example

To illustrate how this works we use the generate_quantities method to do posterior predictive checks using the estimate of theta given the example bernoulli model and data, following the posterior predictive simulation procedure in the Stan User’s Guide.

We write a program bernoulli_ppc.stan which contains the following generated quantities block, with comments to explain the procedure:

generated quantities {
  array[N] int y_sim;
  // use current estimate of theta to generate new sample
  for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
  }
  // estimate theta_rep from new sample
  real<lower=0, upper=1> theta_rep = sum(y_sim) * 1.0 / N;
}

The rest of the program is the same as in bernoulli.stan.

The generate_method requires the sub-argument fitted_params which takes as its value the name of a Stan CSV file. The per-draw parameter values from the fitted_params file will be used to run the generated quantities block.

If we run the bernoulli.stan program for a single chain to generate a sample in file bernoulli_fit.csv:

> ./bernoulli sample data file=bernoulli.data.json output file=bernoulli_fit.csv

Then we can run the bernoulli_ppc.stan to carry out the posterior predictive checks:

> ./bernoulli_ppc generate_quantities fitted_params=bernoulli_fit.csv \
                  data file=bernoulli.data.json \
                  output file=bernoulli_ppc.csv

The output file bernoulli_ppc.csv contains only the values for the variables declared in the generated quantities block, i.e., theta_rep and the elements of y_sim:

# model = bernoulli_ppc_model
# method = generate_quantities
#   generate_quantities
#     fitted_params = bernoulli_fit.csv
# id = 1 (Default)
# data
#   file = bernoulli.data.json
# init = 2 (Default)
# random
#   seed = 2983956445 (Default)
# output
#   file = output.csv (Default)
y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_rep
1,1,1,0,0,0,1,1,0,1,0.6
1,1,0,1,0,0,1,0,1,0,0.5
1,0,1,1,1,1,1,1,0,1,0.8
0,1,0,1,0,1,0,1,0,0,0.4
1,0,0,0,0,0,0,0,0,0,0.1
0,0,0,0,0,1,1,1,0,0,0.3
0,0,1,0,1,0,0,0,0,0,0.2
1,0,1,0,1,1,0,1,1,0,0.6
...

Given the current implementation, to see the fitted parameter values for each draw, create a copy variable in the generated quantities block, e.g.:

generated quantities {
  array[N] int y_sim;
  // use current estimate of theta to generate new sample
  for (n in 1:N) {
    y_sim[n] = bernoulli_rng(theta);
  }
  real<lower=0, upper=1> theta_cp = theta;
  // estimate theta_rep from new sample
  real<lower=0, upper=1> theta_rep = sum(y_sim) * 1.0 / N;
}

Now the output is slightly more interpretable: theta_cp is the same as the theta used to generate the values y_sim[1] through y_sim[1]. Comparing columns theta_cp and theta_rep allows us to see how the uncertainty in our estimate of theta is carried forward into our predictions:

y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_cp,theta_rep
0,1,1,0,1,0,0,1,1,0,0.545679,0.5
1,1,1,1,1,1,0,1,1,0,0.527164,0.8
1,1,1,1,0,1,1,1,1,0,0.529116,0.8
1,0,1,1,1,1,0,0,1,0,0.478844,0.6
0,1,0,0,0,0,1,0,1,0,0.238793,0.3
0,0,0,0,0,1,1,0,0,0,0.258294,0.2
1,1,1,0,0,0,0,0,0,0,0.258465,0.3

Errors

The fitted_params file must be a Stan CSV file; attempts to use a regular CSV file will result an error message of the form:

Error reading fitted param names from sample csv file <filename.csv>

The fitted_params file must contain columns corresponding to legal values for all parameters defined in the model. If any parameters are missing, the program will exit with an error message of the form:

Error reading fitted param names from sample csv file <filename.csv>

The parameter values of the fitted_params are on the constrained scale and must obey all constraints. For example, if we modify the contents of the first reported draw in bernoulli_fit.csv so that the value of theta is outside the declared bounds real<lower=0, upper=1>, the program will return the following error message:

Exception: lub_free: Bounded variable is 1.21397, but must be in the interval [0, 1] \
(in 'bernoulli_ppc.stan', line 5, column 2 to column 30)