# Generating Quantities of Interest from a Fitted Model

The `generate_quantities` method allows you to generate additional quantities of interest from a fitted model without re-running the sampler. Instead, you write a modified version of the original Stan program and add a generated quantities block or modify the existing one which specifies how to compute the new quantities of interest. Running the `generate_quantities` method on the new program together with sampler outputs (i.e., a set of draws) from the fitted model runs the generated quantities block of the new program using the the existing sample by plugging in the per-draw parameter estimates for the computations in the generated quantities block.

This method requires sub-argument `fitted_params` which takes as its value an existing Stan CSV file that contains a parameter values from an equivalent model, i.e., a model with the same parameters block, conditioned on the same data.

The generated quantities block computes quantities of interest (QOIs) based on the data, transformed data, parameters, and transformed parameters. It can be used to:

• generate simulated data for model testing by forward sampling
• generate predictions for new data
• calculate posterior event probabilities, including multiple comparisons, sign tests, etc.
• calculate posterior expectations
• transform parameters for reporting
• apply full Bayesian decision theory
• calculate log likelihoods, deviances, etc. for model comparison

For an overview of the uses of this feature, see the Stan User’s Guide section on Stand-alone generated quantities and ongoing prediction.

## Example

To illustrate how this works we use the `generate_quantities` method to do posterior predictive checks using the estimate of `theta` given the example bernoulli model and data, following the posterior predictive simulation procedure in the Stan User’s Guide.

We write a program `bernoulli_ppc.stan` which contains the following generated quantities block, with comments to explain the procedure:

``````generated quantities {
array[N] int y_sim;
// use current estimate of theta to generate new sample
for (n in 1:N) {
y_sim[n] = bernoulli_rng(theta);
}
// estimate theta_rep from new sample
real<lower=0, upper=1> theta_rep = sum(y_sim) * 1.0 / N;
}``````

The rest of the program is the same as in `bernoulli.stan`.

The `generate_method` requires the sub-argument `fitted_params` which takes as its value the name of a Stan CSV file. The per-draw parameter values from the `fitted_params` file will be used to run the generated quantities block.

If we run the `bernoulli.stan` program for a single chain to generate a sample in file `bernoulli_fit.csv`:

``> ./bernoulli sample data file=bernoulli.data.json output file=bernoulli_fit.csv``

Then we can run the `bernoulli_ppc.stan` to carry out the posterior predictive checks:

``````> ./bernoulli_ppc generate_quantities fitted_params=bernoulli_fit.csv \
data file=bernoulli.data.json \
output file=bernoulli_ppc.csv``````

The output file `bernoulli_ppc.csv` contains only the values for the variables declared in the `generated quantities` block, i.e., `theta_rep` and the elements of `y_sim`:

``````# model = bernoulli_ppc_model
# method = generate_quantities
#   generate_quantities
#     fitted_params = bernoulli_fit.csv
# id = 1 (Default)
# data
#   file = bernoulli.data.json
# init = 2 (Default)
# random
#   seed = 2983956445 (Default)
# output
#   file = output.csv (Default)
y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_rep
1,1,1,0,0,0,1,1,0,1,0.6
1,1,0,1,0,0,1,0,1,0,0.5
1,0,1,1,1,1,1,1,0,1,0.8
0,1,0,1,0,1,0,1,0,0,0.4
1,0,0,0,0,0,0,0,0,0,0.1
0,0,0,0,0,1,1,1,0,0,0.3
0,0,1,0,1,0,0,0,0,0,0.2
1,0,1,0,1,1,0,1,1,0,0.6
...``````

Given the current implementation, to see the fitted parameter values for each draw, create a copy variable in the generated quantities block, e.g.:

``````generated quantities {
array[N] int y_sim;
// use current estimate of theta to generate new sample
for (n in 1:N) {
y_sim[n] = bernoulli_rng(theta);
}
real<lower=0, upper=1> theta_cp = theta;
// estimate theta_rep from new sample
real<lower=0, upper=1> theta_rep = sum(y_sim) * 1.0 / N;
}``````

Now the output is slightly more interpretable: `theta_cp` is the same as the `theta` used to generate the values `y_sim[1]` through `y_sim[1]`. Comparing columns `theta_cp` and `theta_rep` allows us to see how the uncertainty in our estimate of `theta` is carried forward into our predictions:

``````y_sim.1,y_sim.2,y_sim.3,y_sim.4,y_sim.5,y_sim.6,y_sim.7,y_sim.8,y_sim.9,y_sim.10,theta_cp,theta_rep
0,1,1,0,1,0,0,1,1,0,0.545679,0.5
1,1,1,1,1,1,0,1,1,0,0.527164,0.8
1,1,1,1,0,1,1,1,1,0,0.529116,0.8
1,0,1,1,1,1,0,0,1,0,0.478844,0.6
0,1,0,0,0,0,1,0,1,0,0.238793,0.3
0,0,0,0,0,1,1,0,0,0,0.258294,0.2
1,1,1,0,0,0,0,0,0,0,0.258465,0.3``````

## Errors

The `fitted_params` file must be a Stan CSV file; attempts to use a regular CSV file will result an error message of the form:

``Error reading fitted param names from sample csv file <filename.csv>``

The `fitted_params` file must contain columns corresponding to legal values for all parameters defined in the model. If any parameters are missing, the program will exit with an error message of the form:

``Error reading fitted param names from sample csv file <filename.csv>``

The parameter values of the `fitted_params` are on the constrained scale and must obey all constraints. For example, if we modify the contents of the first reported draw in `bernoulli_fit.csv` so that the value of `theta` is outside the declared bounds `real<lower=0, upper=1>`, the program will return the following error message:

``````Exception: lub_free: Bounded variable is 1.21397, but must be in the interval [0, 1] \
(in 'bernoulli_ppc.stan', line 5, column 2 to column 30)``````