Run Generated Quantities

The generated quantities block computes quantities of interest (QOIs) based on the data, transformed data, parameters, and transformed parameters. It can be used to:

  • generate simulated data for model testing by forward sampling

  • generate predictions for new data

  • calculate posterior event probabilities, including multiple comparisons, sign tests, etc.

  • calculating posterior expectations

  • transform parameters for reporting

  • apply full Bayesian decision theory

  • calculate log likelihoods, deviances, etc. for model comparison

The CmdStanModel class generate_quantities method is useful once you have successfully fit a model to your data and have a valid sample from the posterior and a version of the original model where the generated quantities block contains the necessary statements to compute additional quantities of interest.

By running the generate_quantities method on the new model with a sample generated by the existing model, the sampler uses the per-draw parameter estimates from the sample to compute the generated quantities block of the new model.

The generate_quantities method returns a CmdStanGQ object which provides properties to retrieve information about the sample:

  • chains

  • column_names

  • generated_quantities

  • generated_quantities_pd

  • sample_plus_quantities

  • save_csvfiles()

The sample_plus_quantities combines the existing sample and new quantities of interest into a pandas DataFrame object which can be used for downstream analysis and visualization. In this way you add more columns of information to an existing sample.

Configuration

  • mcmc_sample - either a CmdStanMCMC object or a list of stan-csv files

  • data: Values for all data variables in the model, specified either as a dictionary with entries matching the data variables, or as the path of a data file in JSON or Rdump format.

  • seed: The seed for random number generator.

  • gq_output_dir: A path or file name which will be used as the basename for the CmdStan output files.

Example: add posterior predictive checks to bernoulli.stan

In this example we use the CmdStan example model bernoulli.stan and data file bernoulli.data.json as our existing model and data. We create the program bernoulli_ppc.stan by adding a generated quantities block to bernoulli.stan which generates a new data vector y_rep using the current estimate of theta.

generated quantities {
  int y_sim[N];
  real<lower=0,upper=1> theta_rep;
  for (n in 1:N)
    y_sim[n] = bernoulli_rng(theta);
  theta_rep = sum(y) / N;
}

The first step is to fit model bernoulli to the data:

import os
from cmdstanpy import CmdStanModel, cmdstan_path

bernoulli_dir = os.path.join(cmdstan_path(), 'examples', 'bernoulli')
bernoulli_path = os.path.join(bernoulli_dir, 'bernoulli.stan')
bernoulli_data = os.path.join(bernoulli_dir, 'bernoulli.data.json')

# instantiate, compile bernoulli model
bernoulli_model = CmdStanModel(stan_file=bernoulli_path)

# fit the model to the data
bern_fit = bernoulli_model.sample(data=bernoulli_data)

Then we compile the model bernoulli_ppc and use the fit parameter estimates to generate quantities of interest:

bernoulli_ppc_model = CmdStanModel(stan_file='bernoulli_ppc.stan')
new_quantities = bernoulli_ppc_model.generate_quantities(data=bern_data, mcmc_sample=bern_fit)

The generate_quantities method returns a CmdStanGQ object which contains the values for all variables in the generated quantitites block of the program bernoulli_ppc.stan. Unlike the output from the sample method, it doesn’t contain any information on the joint log probability density, sampler state, or parameters or transformed parameter values.

new_quantities.column_names
new_quantities.generated_quantities.shape
for i in range(len(new_quantities.column_names)):
    print(new_quantities.generated_quantities[:,i].mean())

The method sample_plus_quantities returns a pandas DataFrame which combines the input drawset with the generated quantities.

sample_plus = new_quantities.sample_plus_quantities
print(sample_plus.shape)
print(sample_plus.columns)