Various plots of predictive errors y - yrep. See the Details and Plot Descriptions sections, below.
ppc_error_hist(
  y,
  yrep,
  ...,
  facet_args = list(),
  binwidth = NULL,
  bins = NULL,
  breaks = NULL,
  freq = TRUE
)

ppc_error_hist_grouped(
  y,
  yrep,
  group,
  ...,
  facet_args = list(),
  binwidth = NULL,
  bins = NULL,
  breaks = NULL,
  freq = TRUE
)

ppc_error_scatter(y, yrep, ..., facet_args = list(), size = 2.5, alpha = 0.8)

ppc_error_scatter_avg(y, yrep, ..., size = 2.5, alpha = 0.8)

ppc_error_scatter_avg_grouped(
  y,
  yrep,
  group,
  ...,
  facet_args = list(),
  size = 2.5,
  alpha = 0.8
)

ppc_error_scatter_avg_vs_x(y, yrep, x, ..., size = 2.5, alpha = 0.8)

ppc_error_binned(
  y,
  yrep,
  ...,
  facet_args = list(),
  bins = NULL,
  size = 1,
  alpha = 0.25
)

ppc_error_data(y, yrep, group = NULL)
y
A vector of observations. See Details.
yrep
An S by N matrix of draws from the posterior (or prior) predictive distribution. The number of rows, S, is the size of the posterior (or prior) sample used to generate yrep. The number of columns, N, is the number of predicted observations (length(y)). The columns of yrep should be in the same order as the data points in y for the plots to make sense. See the Details and Plot Descriptions sections for additional advice specific to particular plots.
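As a quick illustration of the expected shapes, the following base R sketch simulates a small y and yrep and checks that yrep has one column per observation. The simulated values, the sizes S and N, and the name yrep_small are assumptions made only for this illustration.

```r
# Simulated data; every value here is an illustrative assumption.
set.seed(1)
N <- 50   # number of observations
S <- 200  # number of posterior (or prior) predictive draws
y <- rnorm(N)

# Each row of yrep is one simulated dataset of length N.
yrep <- matrix(rnorm(S * N), nrow = S, ncol = N)

dim(yrep)  # S rows, N columns
stopifnot(ncol(yrep) == length(y))

# Keep only a few draws for plots that show one panel per row.
yrep_small <- yrep[1:3, , drop = FALSE]
```

Subsetting rows as in the last line is the same pattern the Examples use (for instance yrep[1:3, ]).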
...
Currently unused.
facet_args
A named list of arguments (other than facets) passed to ggplot2::facet_wrap() or ggplot2::facet_grid() to control faceting. Note: if scales is not included in facet_args then bayesplot may use scales="free" as the default (depending on the plot) instead of the ggplot2 default of scales="fixed".
binwidth
Passed to ggplot2::geom_histogram() to override the default binwidth.
bins
For ppc_error_binned(), the number of bins to use (approximately).
breaks
Passed to ggplot2::geom_histogram() as an alternative to binwidth.
freq
For histograms, freq=TRUE (the default) puts count on the y-axis. Setting freq=FALSE puts density on the y-axis. (For many plots the y-axis text is off by default. To view the count or density labels on the y-axis see the yaxis_text() convenience function.)
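To make the difference between the two y-axis scales concrete, the sketch below uses base R's hist() on simulated errors (an assumption; it does not call bayesplot) to show the usual relationship between counts and densities for equal-width bins.

```r
# Simulated "errors"; illustrative assumption only.
set.seed(2)
errors <- rnorm(100)

# Equal-width bins that cover the whole range of the data.
brks <- seq(floor(min(errors)), ceiling(max(errors)), by = 0.5)
h <- hist(errors, breaks = brks, plot = FALSE)
binwidth <- diff(h$breaks)[1]

# For equal-width bins: density = count / (n * binwidth).
all.equal(h$density, h$counts / (length(errors) * binwidth))

# Densities integrate to 1, whereas counts sum to n.
sum(h$density * binwidth)
sum(h$counts)
```

Density scaling is mostly useful when panels contain very different numbers of observations, as in the grouped histogram example in the Examples.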
group
A grouping variable of the same length as y. Will be coerced to factor if not already a factor. Each value in group is interpreted as the group level pertaining to the corresponding observation.
size, alpha
For scatterplots, arguments passed to ggplot2::geom_point() to control the appearance of the points. For the binned error plot, arguments controlling the size of the outline and opacity of the shaded region indicating the 2-SE bounds.
x
A numeric vector the same length as y to use as the x-axis variable.
A ggplot object that can be further customized using the ggplot2 package.
All of these functions (aside from the *_scatter_avg functions) compute and plot predictive errors for each row of the matrix yrep, so it is usually a good idea for yrep to contain only a small number of draws (rows). See Examples, below.
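The per-row computation described above can be sketched in base R as follows. The data are simulated and the variable name errors is an assumption; this illustrates the quantity y - yrep rather than the package's internal code.

```r
# Small simulated example: 3 draws (rows), 4 observations (columns).
set.seed(3)
y <- c(1.2, 0.4, -0.7, 2.1)
yrep <- matrix(rnorm(3 * length(y)), nrow = 3)

# sweep() with its default FUN = "-" gives yrep[s, n] - y[n];
# negating it gives errors[s, n] = y[n] - yrep[s, n].
errors <- -sweep(yrep, 2, y)

dim(errors)  # same shape as yrep: one row of errors per draw
errors[2, 3] # error for observation 3 under draw 2
y[3] - yrep[2, 3]
```

Because there is one full set of N errors per draw, a plot with one panel per row quickly becomes unreadable when yrep has many rows.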
For binomial and Bernoulli data the ppc_error_binned() function can be used to generate binned error plots. Bernoulli data can be input as a vector of 0s and 1s, whereas for binomial data y and yrep should contain "success" proportions (not counts). See the Examples section, below.
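For binomial data, the conversion from counts to proportions can be sketched without fitting a model. Everything below is simulated: trials, y_counts, yrep_counts, and the success probability of 0.3 are assumptions for illustration.

```r
# Simulated binomial counts; all values are illustrative assumptions.
set.seed(6)
N <- 5  # observations
S <- 4  # draws
trials <- c(10, 20, 15, 30, 25)  # number of trials per observation

y_counts <- rbinom(N, trials, 0.3)
# matrix() fills column by column, so repeat each trial count S times
# to give every column (observation) its own number of trials.
yrep_counts <- matrix(rbinom(S * N, rep(trials, each = S), 0.3), nrow = S)

# Divide observed counts elementwise, and each column of yrep by its trials.
y_prop <- y_counts / trials
yrep_prop <- sweep(yrep_counts, 2, trials, "/")

range(yrep_prop)  # proportions lie in [0, 1]
```

Bernoulli outcomes coded as 0s and 1s are already on the proportion scale, so they need no conversion.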
ppc_error_hist()
A separate histogram is plotted for the predictive errors computed from y and each dataset (row) in yrep. For this plot yrep should have only a small number of rows.
ppc_error_hist_grouped()
Like ppc_error_hist(), except errors are computed within levels of a grouping variable. The number of histograms is therefore equal to the product of the number of rows in yrep and the number of groups (unique values of group).
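The panel count for the grouped histograms follows directly from the shapes of the inputs. The base R sketch below uses simulated data and hypothetical group labels (both are assumptions) to count the panels and split one draw's errors by group.

```r
# Simulated data with two hypothetical groups of unequal size.
set.seed(7)
y <- rnorm(12)
yrep <- matrix(rnorm(3 * 12), nrow = 3)  # 3 draws
group <- factor(rep(c("GroupA", "GroupB"), times = c(4, 8)))

# One histogram per (draw, group) combination.
n_panels <- nrow(yrep) * nlevels(group)
n_panels

# Errors for the first draw, split by group level.
errors <- -sweep(yrep, 2, y)  # errors[s, n] = y[n] - yrep[s, n]
errors_by_group <- split(errors[1, ], group)
lengths(errors_by_group)
```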
ppc_error_scatter()
A separate scatterplot is displayed for y vs. the predictive errors computed from y and each dataset (row) in yrep. For this plot yrep should have only a small number of rows.
ppc_error_scatter_avg()
A single scatterplot of y vs. the average of the errors computed from y and each dataset (row) in yrep. For each individual data point y[n] the average error is the average of the errors for y[n] computed over the draws from the posterior predictive distribution.
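The average error for each data point can be sketched in base R as a column mean of the error matrix. The simulated data and the name avg_error are assumptions; the sketch only illustrates the averaging described above.

```r
# Simulated data; illustrative assumption only.
set.seed(4)
N <- 30
S <- 100
y <- rnorm(N)
yrep <- matrix(rnorm(S * N, mean = 0.2), nrow = S)

# Per-draw errors, then the average over draws for each observation.
errors <- -sweep(yrep, 2, y)  # errors[s, n] = y[n] - yrep[s, n]
avg_error <- colMeans(errors)

# Equivalent shortcut: y minus the mean prediction for each observation.
all.equal(avg_error, y - colMeans(yrep))
length(avg_error)  # one value per data point
```

Since averaging reduces the S rows to a single value per observation, this plot can use all draws in yrep rather than a small subset.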
ppc_error_scatter_avg_vs_x()
Same as ppc_error_scatter_avg(), except the average is plotted on the y-axis and a predictor variable x is plotted on the x-axis.
ppc_error_binned()
Intended for use with binomial data. A separate binned error plot (similar to arm::binnedplot()) is generated for each dataset (row) in yrep. For this plot y and yrep should contain proportions rather than counts, and yrep should have only a small number of rows.
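To give a rough idea of what a binned error summary contains, the base R sketch below bins one row of predicted proportions and computes the mean error and an approximate 2-SE bound per bin. This is a generic binned-residual computation under stated assumptions (simulated data, quantile-based bins, bounds from the within-bin standard deviation); it is not necessarily how ppc_error_binned() computes its bins or bounds.

```r
# Simulated predicted proportions and observed proportions (assumptions).
set.seed(5)
N <- 200
trials <- rep(20, N)
p_hat <- runif(N, 0.1, 0.9)                  # stands in for one row of yrep
y_prop <- rbinom(N, trials, p_hat) / trials  # observed proportions
err <- y_prop - p_hat

# Assign each observation to one of n_bins bins of roughly equal size.
n_bins <- 10
brks <- quantile(p_hat, probs = seq(0, 1, length.out = n_bins + 1))
bin <- cut(p_hat, breaks = brks, include.lowest = TRUE)

# Per-bin summaries: mean prediction, mean error, and 2 standard errors.
binned <- data.frame(
  pred_mean = as.vector(tapply(p_hat, bin, mean)),
  err_mean  = as.vector(tapply(err, bin, mean)),
  se2       = 2 * as.vector(tapply(err, bin, sd)) / sqrt(as.vector(table(bin)))
)
binned
```

In a plot of err_mean against pred_mean, most points falling within plus or minus se2 would be consistent with the errors being centered on zero across the range of predictions.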
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis. Chapman & Hall/CRC Press, London, third edition. (Ch. 6)
library(bayesplot)
y <- example_y_data()
yrep <- example_yrep_draws()
ppc_error_hist(y, yrep[1:3, ])
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# errors within groups
group <- example_group_data()
(p1 <- ppc_error_hist_grouped(y, yrep[1:3, ], group))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
p1 + yaxis_text() # defaults to showing counts on y-axis
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# \donttest{
table(group) # more obs in GroupB, can set freq=FALSE to show density on y-axis
#> group
#> GroupA GroupB
#> 93 341
(p2 <- ppc_error_hist_grouped(y, yrep[1:3, ], group, freq = FALSE))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
p2 + yaxis_text()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# }
# scatterplots
ppc_error_scatter(y, yrep[10:14, ])
ppc_error_scatter_avg(y, yrep)
x <- example_x_data()
ppc_error_scatter_avg_vs_x(y, yrep, x)
# \dontrun{
# binned error plot with binomial model from rstanarm
suppressPackageStartupMessages(library(rstanarm))
suppressWarnings(example("example_model", package = "rstanarm"))
#>
#> exmpl_> if (.Platform$OS.type != "windows" || .Platform$r_arch != "i386") {
#> exmpl_+ example_model <-
#> exmpl_+ stan_glmer(cbind(incidence, size - incidence) ~ size + period + (1|herd),
#> exmpl_+ data = lme4::cbpp, family = binomial, QR = TRUE,
#> exmpl_+ # this next line is only to keep the example small in size!
#> exmpl_+ chains = 2, cores = 1, seed = 12345, iter = 1000, refresh = 0)
#> exmpl_+ example_model
#> exmpl_+ }
#> stan_glmer
#> family: binomial [logit]
#> formula: cbind(incidence, size - incidence) ~ size + period + (1 | herd)
#> observations: 56
#> ------
#> Median MAD_SD
#> (Intercept) -1.5 0.6
#> size 0.0 0.0
#> period2 -1.0 0.3
#> period3 -1.1 0.4
#> period4 -1.6 0.4
#>
#> Error terms:
#> Groups Name Std.Dev.
#> herd (Intercept) 0.8
#> Num. levels: herd 15
#>
#> ------
#> * For help interpreting the printed output see ?print.stanreg
#> * For info on the priors used see ?prior_summary.stanreg
formula(example_model)
#> cbind(incidence, size - incidence) ~ size + period + (1 | herd)
# get observed proportion of "successes"
y <- example_model$y # matrix of "success" and "failure" counts
trials <- rowSums(y)
y_prop <- y[, 1] / trials # proportions
# get predicted success proportions
yrep <- posterior_predict(example_model)
yrep_prop <- sweep(yrep, 2, trials, "/")
ppc_error_binned(y_prop, yrep_prop[1:6, ])
# }