We can easily customize the summary statistics reported by
$summary()
and $print()
.
fit <- cmdstanr::cmdstanr_example("schools", method = "sample")
Warning: 124 of 4000 (3.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.
Warning: 1 of 4 chains had an E-BFMI less than 0.3.
See https://mc-stan.org/misc/warnings for details.
fit$summary()
variable mean median sd mad q5 q95
1 lp__ -58.697829 -58.911800 5.056327 5.134318 -66.6875850 -49.89660
2 mu 6.353763 6.597480 4.381424 4.372565 -1.2281605 13.23484
3 tau 5.604807 4.871790 3.537844 3.371217 1.4006485 12.28454
4 theta[1] 9.416434 8.789030 7.469694 6.297677 -1.7534945 22.77244
5 theta[2] 6.708629 6.756395 5.775008 5.428570 -2.7675325 16.09375
6 theta[3] 5.013658 5.633410 7.082039 6.055554 -7.5991140 15.69620
7 theta[4] 6.479275 6.629210 6.113995 5.602212 -3.7474250 16.47548
8 theta[5] 4.415845 4.751705 5.837104 5.557445 -5.7668200 13.46467
9 theta[6] 5.316444 5.776025 6.132648 5.695771 -5.1632545 14.89951
10 theta[7] 9.159755 8.682990 6.276413 5.676549 -0.3210604 20.38977
11 theta[8] 7.114358 7.154305 6.994382 6.108230 -4.2959180 18.25591
rhat ess_bulk ess_tail
1 1.018582 172.4200 135.9782
2 1.008425 635.3354 1168.1366
3 1.021625 165.4672 120.1282
4 1.003830 1073.3788 1319.6507
5 1.004990 1050.8445 1693.6799
6 1.007712 871.0003 1537.3018
7 1.003560 1027.3246 1869.0154
8 1.007430 780.1467 1587.7245
9 1.007162 917.8167 1453.6511
10 1.003741 956.3136 1718.3496
11 1.002369 1338.3238 1986.1357
By default all variables are summaries with the follow functions:
posterior::default_summary_measures()
[1] "mean" "median" "sd" "mad" "quantile2"
To change the variables summarized, we use the variables argument
fit$summary(variables = c("mu", "tau"))
variable mean median sd mad q5 q95 rhat
1 mu 6.353763 6.59748 4.381424 4.372565 -1.228160 13.23484 1.008425
2 tau 5.604807 4.87179 3.537844 3.371217 1.400649 12.28454 1.021625
ess_bulk ess_tail
1 635.3354 1168.1366
2 165.4672 120.1282
We can additionally change which functions are used
fit$summary(variables = c("mu", "tau"), mean, sd)
variable mean sd
1 mu 6.353763 4.381424
2 tau 5.604807 3.537844
To summarize all variables with non-default functions, it is
necessary to set explicitly set the variables argument, either to
NULL
or the full vector of variable names.
fit$metadata()$model_params
[1] "lp__" "mu" "tau" "theta[1]" "theta[2]" "theta[3]"
[7] "theta[4]" "theta[5]" "theta[6]" "theta[7]" "theta[8]"
fit$summary(variables = NULL, "mean", "median")
variable mean median
1 lp__ -58.697829 -58.911800
2 mu 6.353763 6.597480
3 tau 5.604807 4.871790
4 theta[1] 9.416434 8.789030
5 theta[2] 6.708629 6.756395
6 theta[3] 5.013658 5.633410
7 theta[4] 6.479275 6.629210
8 theta[5] 4.415845 4.751705
9 theta[6] 5.316444 5.776025
10 theta[7] 9.159755 8.682990
11 theta[8] 7.114358 7.154305
Summary functions can be specified by character string, function, or
using a formula (or anything else supported by
rlang::as_function()
). If these arguments are named, those
names will be used in the tibble output. If the summary results are
named they will take precedence.
my_sd <- function(x) c(My_SD = sd(x))
fit$summary(
c("mu", "tau"),
MEAN = mean,
"median",
my_sd,
~quantile(.x, probs = c(0.1, 0.9)),
Minimum = function(x) min(x)
)
variable MEAN median My_SD 10% 90% Minimum
1 mu 6.353763 6.59748 4.381424 0.756247 11.71914 -9.340540
2 tau 5.604807 4.87179 3.537844 1.723530 10.46083 0.749924
Arguments to all summary functions can also be specified with
.args
.
variable 2.5% 5% 95% 97.5%
1 mu -2.926103 -1.228160 13.23484 14.96275
2 tau 1.106000 1.400649 12.28454 14.09220
The summary functions are applied to the array of sample values, with
dimension iter_sampling
xchains
.
fit$summary(variables = NULL, dim, colMeans)
variable dim.1 dim.2 1 2 3 4
1 lp__ 1000 4 -57.746884 -58.396713 -59.488281 -59.159440
2 mu 1000 4 6.886773 6.098293 6.275602 6.154385
3 tau 1000 4 4.969696 5.504146 6.035791 5.909594
4 theta[1] 1000 4 9.584919 8.855289 9.788069 9.437459
5 theta[2] 1000 4 7.218764 6.441512 6.701321 6.472920
6 theta[3] 1000 4 6.039431 4.880572 4.532884 4.601744
7 theta[4] 1000 4 6.923818 6.048362 6.708068 6.236851
8 theta[5] 1000 4 5.339809 4.243510 3.959413 4.120650
9 theta[6] 1000 4 5.971513 4.851822 5.284337 5.158102
10 theta[7] 1000 4 9.310560 8.904282 9.350879 9.073298
11 theta[8] 1000 4 7.380020 7.069916 7.245772 6.761726
For this reason users may have unexpected results if they use
stats::var()
directly, as it will return a covariance
matrix. An alternative is the distributional::variance()
function, which can also be accessed via
posterior::variance()
.
variable posterior::variance ~var(as.vector(.x))
1 mu 19.19687 19.19687
2 tau 12.51634 12.51634
Summary functions need not be numeric, but these won’t work with
$print()
.
strict_pos <- function(x) if (all(x > 0)) "yes" else "no"
fit$summary(variables = NULL, "Strictly Positive" = strict_pos)
variable Strictly Positive
1 lp__ no
2 mu no
3 tau yes
4 theta[1] no
5 theta[2] no
6 theta[3] no
7 theta[4] no
8 theta[5] no
9 theta[6] no
10 theta[7] no
11 theta[8] no
# fit$print(variables = NULL, "Strictly Positive" = strict_pos)
For more information, see posterior::summarise_draws()
,
which is called by $summary()
.
The $draws()
method can be used to extract the posterior draws in formats provided by
the posterior
package. Here we demonstrate only the draws_array
and
draws_df
formats, but the posterior
package supports other useful formats as well.
# default is a 3-D draws_array object from the posterior package
# iterations x chains x variables
draws_arr <- fit$draws() # or format="array"
str(draws_arr)
'draws_array' num [1:1000, 1:4, 1:11] -66.1 -61.6 -58.3 -57.5 -55.3 ...
- attr(*, "dimnames")=List of 3
..$ iteration: chr [1:1000] "1" "2" "3" "4" ...
..$ chain : chr [1:4] "1" "2" "3" "4"
..$ variable : chr [1:11] "lp__" "mu" "tau" "theta[1]" ...
# draws x variables data frame
draws_df <- fit$draws(format = "df")
str(draws_df)
draws_df [4,000 × 14] (S3: draws_df/draws/tbl_df/tbl/data.frame)
$ lp__ : num [1:4000] -66.1 -61.6 -58.3 -57.5 -55.3 ...
$ mu : num [1:4000] 1.93 8.57 4.05 9.53 9.94 ...
$ tau : num [1:4000] 10.73 6.28 6.1 4.65 3.2 ...
$ theta[1] : num [1:4000] 9.1 21.74 11.18 13.36 9.68 ...
$ theta[2] : num [1:4000] 11.006 5.842 -0.815 17.943 11.148 ...
$ theta[3] : num [1:4000] 0.4 12.17 7.94 6.17 3.94 ...
$ theta[4] : num [1:4000] -3.74 14.86 6.59 9.74 7.95 ...
$ theta[5] : num [1:4000] 12.97 5.5 1.32 8.77 12.83 ...
$ theta[6] : num [1:4000] -11.04 1.81 6.18 6.33 6.56 ...
$ theta[7] : num [1:4000] 22.21 10.53 7.75 14.3 11.54 ...
$ theta[8] : num [1:4000] 14.77 -2.98 11.19 7.25 10.37 ...
$ .chain : int [1:4000] 1 1 1 1 1 1 1 1 1 1 ...
$ .iteration: int [1:4000] 1 2 3 4 5 6 7 8 9 10 ...
$ .draw : int [1:4000] 1 2 3 4 5 6 7 8 9 10 ...
print(draws_df)
# A draws_df: 1000 iterations, 4 chains, and 11 variables
lp__ mu tau theta[1] theta[2] theta[3] theta[4] theta[5]
1 -66 1.9 10.7 9.1 11.01 0.4 -3.7 13.0
2 -62 8.6 6.3 21.7 5.84 12.2 14.9 5.5
3 -58 4.1 6.1 11.2 -0.82 7.9 6.6 1.3
4 -58 9.5 4.6 13.4 17.94 6.2 9.7 8.8
5 -55 9.9 3.2 9.7 11.15 3.9 7.9 12.8
6 -55 9.9 4.0 8.5 8.81 6.6 9.8 12.0
7 -54 9.7 2.7 14.5 10.87 5.8 6.8 10.2
8 -52 8.7 2.2 10.5 10.42 9.4 9.8 9.4
9 -57 10.1 5.0 13.4 10.29 11.4 5.8 12.0
10 -58 10.5 5.5 14.8 9.45 13.9 5.6 12.0
# ... with 3990 more draws, and 3 more variables
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}
To convert an existing draws object to a different format use the
posterior::as_draws_*()
functions.
To manipulate the draws
objects use the various methods
described in the posterior package vignettes
and documentation.
rstan::extract()
The posterior package’s rvar
format
provides a multidimensional, sample-based representation of random
variables. See https://mc-stan.org/posterior/articles/rvar.html for
details. In addition to being useful in its own right, this format also
allows CmdStanR users to obtain draws in a similar format to
rstan::extract()
.
Suppose we have a parameter matrix[2,3] x
. The
rvar
format lets you interact with x
as if
it’s a 2 x 3
matrix and automatically applies operations
over the many posterior draws of x
. To instead directly
access the draws of x
while maintaining the structure of
the matrix use posterior::draws_of()
. For example:
draws <- posterior::as_draws_rvars(fit$draws())
x_rvar <- draws$x
x_array <- posterior::draws_of(draws$x)
The object x_rvar
will be an rvar
that can
be used like a 2 x 3
matrix, with the draws handled behind
the scenes. The object x_array
will be a
4000 x 2 x 3
array (assuming 4000
posterior
draws), which is the same as it would be after being extracted from the
list returned by rstan::extract()
.