Parameter selection in the style of dplyr and other tidyverse packages.
param_range(prefix, range, vars = NULL)
param_glue(pattern, ..., vars = NULL)
For param_range() only, prefix is a string naming a parameter and range is an integer vector providing the indices of a subset of elements to select. For example, using param_range("beta", c(1, 2, 8)) would select parameters named beta[1], beta[2], and beta[8]. param_range() is only designed for the case that the indices are integers surrounded by brackets. If there are no brackets, use num_range().
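As an illustration only, the names that param_range() targets can be reproduced in base R. This is a sketch of the naming convention described above, not bayesplot's internal code; the object names prefix, range, and target_names are chosen here for the example.

```r
# Sketch: build the bracketed names that a prefix of "beta" and indices
# 1, 2, 8 correspond to. Base R only; not bayesplot's implementation.
prefix <- "beta"
range <- c(1, 2, 8)

# Paste the prefix, an opening bracket, each index, and a closing bracket.
target_names <- paste0(prefix, "[", range, "]")
print(target_names)
```

Any parameter whose name is not of the form prefix[integer] falls outside this pattern, which is why num_range() is the better fit for names without brackets.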
The vars argument is NULL or a character vector of parameter names to choose from. This is only needed for the atypical use case of calling the function as a standalone function outside of vars(), select(), etc. Typically this is left as NULL and will be set automatically for the user.
For param_glue() only, pattern is a string containing expressions enclosed in braces and ... should be named arguments providing one character vector per expression in braces in pattern. It is easiest to describe how to use these arguments with an example: param_glue("beta_{var}[{level}]", var = c("age", "income"), level = c(3, 8)) would select parameters with names "beta_age[3]", "beta_income[3]", "beta_age[8]", "beta_income[8]".
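The four names listed above come from crossing two name fragments with two indices. The following base R sketch reproduces that expansion by hand; it is not how bayesplot implements param_glue(), and the column names var and level are illustrative assumptions.

```r
# Sketch: enumerate every combination of the two brace expressions.
# expand.grid() varies its first argument fastest, which gives the same
# ordering as the names listed in the text.
combos <- expand.grid(
  var = c("age", "income"),
  level = c(3, 8),
  stringsAsFactors = FALSE
)

# Fill in the pattern by hand: beta_<var>[<level>]
target_names <- paste0("beta_", combos$var, "[", combos$level, "]")
print(target_names)
```

Each additional expression in braces multiplies the number of generated names, so two vectors of length two yield four names.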
As of version 1.7.0, bayesplot allows the pars argument for MCMC plots to use "tidy" variable selection (in the style of the dplyr package). The vars() function is re-exported from dplyr for this purpose.
Features of tidy selection include direct selection (vars(alpha, sigma)), everything-but selection (vars(-alpha)), ranged selection (vars(`beta[1]`:`beta[3]`)), support for selection functions (vars(starts_with("beta"))), and combinations of these features. See the Examples section, below.
When using pars for tidy parameter selection, the regex_pars argument is ignored because bayesplot supports using tidyselect helper functions (starts_with(), contains(), num_range(), etc.) for the same purpose. bayesplot also exports some additional helper functions to help with parameter selection:
param_range(): like num_range() but used when parameter indexes are in brackets (e.g., beta[2]).
param_glue(): for more complicated parameter names with multiple indexes (including variable names) inside the brackets (e.g., beta[(Intercept) age_group:3]).
These functions can be used inside of vars(), dplyr::select(), and similar functions, just like the tidyselect helper functions.
Parameter names in vars() are not quoted. When the names contain special characters like brackets, they should be wrapped in backticks, as in vars(`beta[1]`).
To exclude a range of variables, wrap the sequence in parentheses and then negate it. For example, vars(-(`beta[1]`:`beta[3]`)) would exclude beta[1], beta[2], and beta[3].
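The same negated-range syntax can be tried with dplyr::select() on an ordinary data frame. The sketch below assumes dplyr is installed and uses a made-up one-row data frame (draws) in place of real posterior draws.

```r
library(dplyr)

# Toy one-row data frame standing in for posterior draws (made-up values).
# check.names = FALSE keeps the bracketed column names intact.
draws <- data.frame(
  alpha = 0.1, sigma = 1.2,
  `beta[1]` = 0.3, `beta[2]` = 0.4, `beta[3]` = 0.5, `beta[4]` = 0.6,
  check.names = FALSE
)

# Negating the parenthesized range drops beta[1] through beta[3]
# and keeps every other column.
kept <- names(select(draws, -(`beta[1]`:`beta[3]`)))
print(kept)
```

The parentheses matter because the negation has to apply to the whole range rather than to its first endpoint only.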
vars() is a helper function. It holds onto the names and expressions used to select columns. When selecting variables inside a bayesplot function, use vars(...): mcmc_hist(data, pars = vars(alpha)). When using select() to prepare a data frame for a bayesplot function, do not use vars(): data %>% select(alpha) %>% mcmc_hist().
Internally, tidy selection works by converting names and expressions into position numbers. As a result, integers will select parameters; vars(1, 3) selects the first and third ones. We do not endorse this approach because positions might change as variables are added and removed from models. To select a parameter that happens to be called 1, use backticks to escape it: vars(`1`).
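The fragility of positional selection can be seen with a small base R sketch. It only mimics the idea of turning names into positions with match(); it is not the tidyselect implementation, and the parameter names (including gamma) are made up for the example.

```r
# Sketch: name-based lookup versus fixed positions, in base R.
params <- c("alpha", "sigma", "beta[1]")

# Converting names to positions, as tidy selection does conceptually.
positions <- match(c("alpha", "beta[1]"), params)

# After a new parameter is added at the front, position 3 refers to a
# different parameter, while lookup by name still finds the intended one.
params_new <- c("gamma", params)
by_position <- params_new[3]
by_name <- params_new[match("beta[1]", params_new)]
print(c(by_position = by_position, by_name = by_name))
```

Here position 3 silently switches from beta[1] to sigma once the model gains a parameter, which is the reason selecting by name is preferred.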
x <- example_mcmc_draws(params = 6)
dimnames(x)
#> $Iteration
#> NULL
#>
#> $Chain
#> [1] "chain:1" "chain:2" "chain:3" "chain:4"
#>
#> $Parameter
#> [1] "alpha" "sigma" "beta[1]" "beta[2]" "beta[3]" "beta[4]"
#>
mcmc_hex(x, pars = vars(alpha, `beta[2]`))
mcmc_dens(x, pars = vars(sigma, contains("beta")))
mcmc_hist(x, pars = vars(-contains("beta")))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# using the param_range() helper
mcmc_hist(x, pars = vars(param_range("beta", c(1, 3, 4))))
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# \donttest{
#############################
## Examples using rstanarm ##
#############################
if (requireNamespace("rstanarm", quietly = TRUE)) {
  # see ?rstanarm::example_model
  fit <- example("example_model", package = "rstanarm", local = TRUE)$value
  print(fit)
  posterior <- as.data.frame(fit)
  str(posterior)

  color_scheme_set("brightblue")
  mcmc_hist(posterior, pars = vars(size, contains("period")))

  # same as previous but using dplyr::select() and piping
  library("dplyr")
  posterior %>%
    select(size, contains("period")) %>%
    mcmc_hist()

  mcmc_intervals(posterior, pars = vars(contains("herd")))
  mcmc_intervals(posterior, pars = vars(contains("herd"), -contains("Sigma")))

  bayesplot_theme_set(ggplot2::theme_dark())
  color_scheme_set("viridisC")
  mcmc_areas_ridges(posterior, pars = vars(starts_with("b[")))

  bayesplot_theme_set()
  color_scheme_set("purple")
  not_789 <- vars(starts_with("b["), -matches("[7-9]"))
  mcmc_intervals(posterior, pars = not_789)

  # using the param_glue() helper
  just_149 <- vars(param_glue("b[(Intercept) herd:{level}]", level = c(1, 4, 9)))
  mcmc_intervals(posterior, pars = just_149)

  # same but using param_glue() with dplyr::select()
  # before passing to bayesplot
  posterior %>%
    select(param_glue("b[(Intercept) herd:{level}]",
                      level = c(1, 4, 9))) %>%
    mcmc_intervals()
}
#>
#> exmpl_> if (.Platform$OS.type != "windows" || .Platform$r_arch != "i386") {
#> exmpl_+ example_model <-
#> exmpl_+ stan_glmer(cbind(incidence, size - incidence) ~ size + period + (1|herd),
#> exmpl_+ data = lme4::cbpp, family = binomial, QR = TRUE,
#> exmpl_+ # this next line is only to keep the example small in size!
#> exmpl_+ chains = 2, cores = 1, seed = 12345, iter = 1000, refresh = 0)
#> exmpl_+ example_model
#> exmpl_+ }
#> stan_glmer
#> family: binomial [logit]
#> formula: cbind(incidence, size - incidence) ~ size + period + (1 | herd)
#> observations: 56
#> ------
#> Median MAD_SD
#> (Intercept) -1.5 0.6
#> size 0.0 0.0
#> period2 -1.0 0.3
#> period3 -1.1 0.4
#> period4 -1.6 0.4
#>
#> Error terms:
#> Groups Name Std.Dev.
#> herd (Intercept) 0.8
#> Num. levels: herd 15
#>
#> ------
#> * For help interpreting the printed output see ?print.stanreg
#> * For info on the priors used see ?prior_summary.stanreg
#> stan_glmer
#> family: binomial [logit]
#> formula: cbind(incidence, size - incidence) ~ size + period + (1 | herd)
#> observations: 56
#> ------
#> Median MAD_SD
#> (Intercept) -1.5 0.6
#> size 0.0 0.0
#> period2 -1.0 0.3
#> period3 -1.1 0.4
#> period4 -1.6 0.4
#>
#> Error terms:
#> Groups Name Std.Dev.
#> herd (Intercept) 0.8
#> Num. levels: herd 15
#>
#> ------
#> * For help interpreting the printed output see ?print.stanreg
#> * For info on the priors used see ?prior_summary.stanreg
#> 'data.frame': 1000 obs. of 21 variables:
#> $ (Intercept) : num -1.47 -1.78 -1.72 -1.6 -1.53 ...
#> $ size : num 0.00888 0.01476 0.01494 0.01283 -0.01193 ...
#> $ period2 : num -0.667 -0.684 -0.883 -0.924 -0.825 ...
#> $ period3 : num -1.2 -1.05 -1.26 -1.34 -0.66 ...
#> $ period4 : num -1.89 -1.67 -1.46 -1.83 -1.44 ...
#> $ b[(Intercept) herd:1] : num 0.297 0.562 1.261 0.392 0.7 ...
#> $ b[(Intercept) herd:2] : num -0.451 -0.1626 -0.8416 -0.8944 0.0944 ...
#> $ b[(Intercept) herd:3] : num -0.211 -0.3071 0.0957 0.5282 0.5645 ...
#> $ b[(Intercept) herd:4] : num -0.109 0.213 -0.212 0.384 -0.357 ...
#> $ b[(Intercept) herd:5] : num -0.297 -0.17 -1.254 -0.591 0.155 ...
#> $ b[(Intercept) herd:6] : num -0.1702 0.0193 -0.7049 -0.6454 -0.1529 ...
#> $ b[(Intercept) herd:7] : num 1.702 1.663 1.069 0.75 0.968 ...
#> $ b[(Intercept) herd:8] : num -0.313 0.287 0.861 0.466 1.233 ...
#> $ b[(Intercept) herd:9] : num -0.783 -0.088 0.871 -0.407 -0.296 ...
#> $ b[(Intercept) herd:10] : num -0.6759 -1.0561 -0.8712 -0.0397 -0.7307 ...
#> $ b[(Intercept) herd:11] : num -0.541 -0.688 -0.137 -0.13 0.273 ...
#> $ b[(Intercept) herd:12] : num -0.2329 -0.1093 0.5054 0.0264 -0.3146 ...
#> $ b[(Intercept) herd:13] : num -1.411 -1.178 -0.714 -0.927 -0.386 ...
#> $ b[(Intercept) herd:14] : num 1.479 1.682 1.474 1.065 0.391 ...
#> $ b[(Intercept) herd:15] : num -0.691 -1.16 -0.526 -0.629 -0.484 ...
#> $ Sigma[herd:(Intercept),(Intercept)]: num 1.24 1.588 1.993 0.832 0.312 ...
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
# }
# \dontrun{
###################################
## More examples of param_glue() ##
###################################
library(dplyr)
posterior <- tibble(
  b_Intercept = rnorm(1000),
  sd_condition__Intercept = rexp(1000),
  sigma = rexp(1000),
  `r_condition[A,Intercept]` = rnorm(1000),
  `r_condition[B,Intercept]` = rnorm(1000),
  `r_condition[C,Intercept]` = rnorm(1000),
  `r_condition[A,Slope]` = rnorm(1000),
  `r_condition[B,Slope]` = rnorm(1000)
)
posterior
#> # A tibble: 1,000 × 8
#> b_Intercept sd_condition__Intercept sigma `r_condition[A,Intercept]`
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1.29 0.540 0.300 -0.780
#> 2 1.80 0.124 0.426 -0.110
#> 3 0.588 3.73 0.630 0.0687
#> 4 -1.27 0.623 1.19 -0.223
#> 5 -0.695 0.696 0.322 0.809
#> 6 -0.0812 0.112 0.731 0.0854
#> 7 -0.246 0.781 2.84 0.125
#> 8 1.62 2.40 1.59 -2.54
#> 9 -0.111 0.441 0.896 -0.244
#> 10 -0.846 0.764 1.85 0.798
#> # ℹ 990 more rows
#> # ℹ 4 more variables: `r_condition[B,Intercept]` <dbl>,
#> # `r_condition[C,Intercept]` <dbl>, `r_condition[A,Slope]` <dbl>,
#> # `r_condition[B,Slope]` <dbl>
# using one expression in braces
posterior %>%
  select(
    param_glue("r_condition[{level},Intercept]", level = c("A", "B"))
  ) %>%
  mcmc_hist()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# using multiple expressions in braces
posterior %>%
  select(
    param_glue(
      "r_condition[{level},{type}]",
      level = c("A", "B"),
      type = c("Intercept", "Slope"))
  ) %>%
  mcmc_hist()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# }