`posterior_predict.stanreg.Rd`

The posterior predictive distribution is the distribution of the outcome implied by the model after using the observed data to update our beliefs about the unknown parameters in the model. Simulating data from the posterior predictive distribution using the observed predictors is useful for checking the fit of the model. Drawing from the posterior predictive distribution at interesting values of the predictors also lets us visualize how a manipulation of a predictor affects (a function of) the outcome(s). With new observations of predictor variables we can use the posterior predictive distribution to generate predicted outcomes.

```r
# S3 method for stanreg
posterior_predict(object, newdata = NULL, draws = NULL,
                  re.form = NULL, fun = NULL, seed = NULL,
                  offset = NULL, ...)

# S3 method for stanmvreg
posterior_predict(object, m = 1, newdata = NULL, draws = NULL,
                  re.form = NULL, fun = NULL, seed = NULL, ...)
```

| Argument | Description |
|---|---|
| `object` | A fitted model object returned by one of the rstanarm modeling functions. See `stanreg-objects`. |
| `newdata` | Optionally, a data frame in which to look for variables with which to predict. If omitted, the model matrix is used. If `newdata` is provided and any variables were transformed (e.g. rescaled) in the data used to fit the model, then these variables must also be transformed in `newdata`. |
| `draws` | An integer indicating the number of draws to return. The default and maximum number of draws is the size of the posterior sample. |
| `re.form` | If `object` contains group-level parameters, a formula indicating which group-level parameters to condition on when making predictions. `re.form` is specified in the same form as for `predict.merMod`. The default, `NULL`, indicates that all estimated group-level parameters are conditioned on. To refrain from conditioning on any group-level parameters, specify `NA` or `~0`. |
| `fun` | An optional function to apply to the results. `fun` can be specified as a function object or as a string naming a function. |
| `seed` | An optional seed to use. |
| `offset` | A vector of offsets. Only required if `newdata` is specified and an `offset` argument was specified when fitting the model. |
| `...` | Currently unused. |
| `m` | Integer specifying the number or name of the submodel (for `stanmvreg` objects). |

A `draws` by `nrow(newdata)` matrix of simulations from the
posterior predictive distribution. Each row of the matrix is a vector of
predictions generated using a single draw of the model parameters from the
posterior distribution. The returned matrix will also have class
`"ppd"` to indicate it contains draws from the posterior predictive
distribution.
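The returned matrix can be summarized with ordinary matrix operations. A minimal sketch, using a fake matrix of draws in place of a real `posterior_predict()` call so that it runs standalone:

```r
# Fake a draws-by-N matrix standing in for posterior_predict() output;
# real draws would come from a fitted stanreg object.
draws <- 500
N <- 3
ytilde <- matrix(rnorm(draws * N), nrow = draws, ncol = N)

point_est <- colMeans(ytilde)                           # one prediction per observation
intervals <- apply(ytilde, 2, quantile, c(0.05, 0.95))  # 90% predictive intervals
stopifnot(all(dim(ytilde) == c(draws, N)))
```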

For binomial models with a number of trials greater than one (i.e., not
Bernoulli models), if `newdata` is specified then it must include all
variables needed for computing the number of binomial trials to use for the
predictions. For example, if the left-hand side of the model formula is
`cbind(successes, failures)` then both `successes` and `failures` must be in
`newdata`. The particular values of `successes` and `failures` in `newdata`
do not matter so long as their sum is the desired number of trials. If the
left-hand side of the model formula were `cbind(successes, trials - successes)`
then both `trials` and `successes` would need to be in `newdata`,
probably with `successes` set to `0` and `trials` specifying the number of
trials. See the Examples section below and the
*How to Use the rstanarm Package* vignette for examples.
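As a concrete sketch of the `cbind(successes, failures)` case (the formula, the predictor `x`, and the fitted object `fit` here are hypothetical, not from the examples below):

```r
# Hypothetical model fit with formula cbind(successes, failures) ~ x.
# To request predictions based on 20 trials per row, include both
# columns so that successes + failures equals 20; how the total is
# split between them does not matter.
nd <- data.frame(
  x = c(-1, 0, 1),
  successes = 0,   # arbitrary split ...
  failures  = 20   # ... as long as the sum is the trial count
)
stopifnot(all(nd$successes + nd$failures == 20))
# ytilde <- posterior_predict(fit, newdata = nd)  # fit: hypothetical stanreg
```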

For models estimated with `stan_clogit`, the number of successes per stratum
is ostensibly fixed by the research design. Thus, when doing posterior
prediction with new data, the `data.frame` passed to the `newdata` argument
must contain an outcome variable and a stratifying factor, both with the same
names as in the original `data.frame`. The posterior predictions will then
condition on this outcome in the new data.
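For instance, a `stan_clogit` fit whose original data had columns `case` (the outcome) and `strata` (the stratifying factor), both names hypothetical, would need new data shaped like:

```r
# New data for a hypothetical stan_clogit fit; column names must match
# those in the original data. Predictions condition on the supplied
# outcomes, here one success in each stratum of three observations.
nd <- data.frame(
  case   = c(1, 0, 0, 1, 0, 0),
  strata = factor(rep(1:2, each = 3)),
  x      = c(0.5, -0.2, 1.1, 0.3, -0.7, 0.9)  # hypothetical predictor
)
stopifnot(all(tapply(nd$case, nd$strata, sum) == 1))
# ytilde <- posterior_predict(fit_clogit, newdata = nd)  # fit_clogit: hypothetical
```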

See `pp_check` for graphical posterior predictive checks.
Examples of posterior predictive checking can also be found in the
rstanarm vignettes and demos.

```r
if (!exists("example_model")) example(example_model)
yrep <- posterior_predict(example_model)
table(yrep)

# Using newdata
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
dat <- data.frame(counts, outcome, treatment)
fit3 <- stan_glm(counts ~ outcome + treatment, data = dat,
                 family = poisson(link = "log"),
                 prior = normal(0, 1), prior_intercept = normal(0, 5))
nd <- data.frame(treatment = factor(rep(1, 3)), outcome = factor(1:3))
ytilde <- posterior_predict(fit3, nd, draws = 500)
print(dim(ytilde))  # 500 by 3 matrix (draws by nrow(nd))
ytilde <- data.frame(count = c(ytilde),
                     outcome = rep(nd$outcome, each = 500))
ggplot2::ggplot(ytilde, ggplot2::aes(x = outcome, y = count)) +
  ggplot2::geom_boxplot() +
  ggplot2::ylab("predicted count")

# Using newdata with a binomial model.
# example_model is binomial so we need to set
# the number of trials to use for prediction.
# This could be a different number for each
# row of newdata or the same for all rows.
# Here we'll use the same value for all.
nd <- lme4::cbpp
print(formula(example_model))  # cbind(incidence, size - incidence) ~ ...
nd$size <- max(nd$size) + 1L   # number of trials
nd$incidence <- 0              # set to 0 so size - incidence = number of trials
ytilde <- posterior_predict(example_model, newdata = nd)

# Using fun argument to transform predictions
mtcars2 <- mtcars
mtcars2$log_mpg <- log(mtcars2$mpg)
fit <- stan_glm(log_mpg ~ wt, data = mtcars2)
ytilde <- posterior_predict(fit, fun = exp)
```