We can easily customize the summary statistics reported by
$summary()
and $print()
.
fit <- cmdstanr::cmdstanr_example("schools", method = "sample")
Warning: 145 of 4000 (4.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.
fit$summary()
variable mean median sd mad q5 q95
1 lp__ -58.189615 -58.233950 4.983507 5.199033 -66.09672000 -49.48569
2 mu 6.770868 6.720615 4.160694 4.104430 0.05392642 13.50031
3 tau 5.360230 4.524270 3.536662 3.128657 1.41195350 12.01772
4 theta[1] 9.538150 8.703025 6.992732 6.139536 -0.46183645 21.76395
5 theta[2] 7.037750 7.084940 5.672420 5.463211 -1.89471350 16.19477
6 theta[3] 5.865357 5.940265 6.567071 5.598943 -4.94042600 15.87788
7 theta[4] 7.096898 7.021110 5.846662 5.507303 -2.11556450 16.63653
8 theta[5] 4.905340 5.104415 5.567945 5.102183 -4.69825650 13.42720
9 theta[6] 5.743860 5.979900 5.965230 5.476702 -4.29255000 15.06765
10 theta[7] 9.332145 8.785955 5.911919 5.517199 0.73314065 20.11738
11 theta[8] 7.283320 7.100935 6.510593 5.937568 -3.18226350 18.23657
rhat ess_bulk ess_tail
1 1.029985 154.8234 166.9326
2 1.004450 669.4002 1293.6704
3 1.029747 147.4551 128.0359
4 1.006213 1068.8761 2337.7828
5 1.002627 1067.0310 2241.5417
6 1.004902 1217.1050 1912.7529
7 1.002033 1030.8586 1665.2932
8 1.009740 637.5400 1725.4378
9 1.003866 1129.6591 2064.6751
10 1.003642 989.5674 1661.8675
11 1.005570 1163.3998 1970.5629
By default all variables are summaries with the follow functions:
posterior::default_summary_measures()
[1] "mean" "median" "sd" "mad" "quantile2"
To change the variables summarized, we use the variables argument
fit$summary(variables = c("mu", "tau"))
variable mean median sd mad q5 q95 rhat
1 mu 6.770868 6.720615 4.160694 4.104430 0.05392642 13.50031 1.004450
2 tau 5.360230 4.524270 3.536662 3.128657 1.41195350 12.01772 1.029747
ess_bulk ess_tail
1 669.4002 1293.6704
2 147.4551 128.0359
We can additionally change which functions are used
fit$summary(variables = c("mu", "tau"), mean, sd)
variable mean sd
1 mu 6.770868 4.160694
2 tau 5.360230 3.536662
To summarize all variables with non-default functions, it is
necessary to set explicitly set the variables argument, either to
NULL
or the full vector of variable names.
fit$metadata()$model_params
[1] "lp__" "mu" "tau" "theta[1]" "theta[2]" "theta[3]"
[7] "theta[4]" "theta[5]" "theta[6]" "theta[7]" "theta[8]"
fit$summary(variables = NULL, "mean", "median")
variable mean median
1 lp__ -58.189615 -58.233950
2 mu 6.770868 6.720615
3 tau 5.360230 4.524270
4 theta[1] 9.538150 8.703025
5 theta[2] 7.037750 7.084940
6 theta[3] 5.865357 5.940265
7 theta[4] 7.096898 7.021110
8 theta[5] 4.905340 5.104415
9 theta[6] 5.743860 5.979900
10 theta[7] 9.332145 8.785955
11 theta[8] 7.283320 7.100935
Summary functions can be specified by character string, function, or
using a formula (or anything else supported by
rlang::as_function()
). If these arguments are named, those
names will be used in the tibble output. If the summary results are
named they will take precedence.
my_sd <- function(x) c(My_SD = sd(x))
fit$summary(
c("mu", "tau"),
MEAN = mean,
"median",
my_sd,
~quantile(.x, probs = c(0.1, 0.9)),
Minimum = function(x) min(x)
)
variable MEAN median My_SD 10% 90% Minimum
1 mu 6.770868 6.720615 4.160694 1.577812 12.27689 -7.905940
2 tau 5.360230 4.524270 3.536662 1.751609 10.20606 0.836362
Arguments to all summary functions can also be specified with
.args
.
variable 2.5% 5% 95% 97.5%
1 mu -1.320617 0.05392642 13.50031 14.99084
2 tau 1.176680 1.41195350 12.01772 13.99329
The summary functions are applied to the array of sample values, with
dimension iter_sampling
xchains
.
fit$summary(variables = NULL, dim, colMeans)
variable dim.1 dim.2 1 2 3 4
1 lp__ 1000 4 -59.512825 -57.501953 -58.519416 -57.224267
2 mu 1000 4 6.687580 7.323996 6.485391 6.586506
3 tau 1000 4 6.113202 4.991316 5.617574 4.718827
4 theta[1] 1000 4 9.845526 10.063756 9.480842 8.762475
5 theta[2] 1000 4 7.109487 7.340212 6.950864 6.750437
6 theta[3] 1000 4 5.614556 6.465821 5.368069 6.012981
7 theta[4] 1000 4 7.095362 7.575608 6.862193 6.854430
8 theta[5] 1000 4 4.335314 5.915169 4.347114 5.023764
9 theta[6] 1000 4 5.495814 6.275650 5.270746 5.933231
10 theta[7] 1000 4 9.739179 9.685921 9.144607 8.758873
11 theta[8] 1000 4 7.154340 7.740947 7.134879 7.103115
For this reason users may have unexpected results if they use
stats::var()
directly, as it will return a covariance
matrix. An alternative is the distributional::variance()
function, which can also be accessed via
posterior::variance()
.
variable posterior::variance ~var(as.vector(.x))
1 mu 17.31137 17.31137
2 tau 12.50798 12.50798
Summary functions need not be numeric, but these won’t work with
$print()
.
strict_pos <- function(x) if (all(x > 0)) "yes" else "no"
fit$summary(variables = NULL, "Strictly Positive" = strict_pos)
variable Strictly Positive
1 lp__ no
2 mu no
3 tau yes
4 theta[1] no
5 theta[2] no
6 theta[3] no
7 theta[4] no
8 theta[5] no
9 theta[6] no
10 theta[7] no
11 theta[8] no
# fit$print(variables = NULL, "Strictly Positive" = strict_pos)
For more information, see posterior::summarise_draws()
,
which is called by $summary()
.
The $draws()
method can be used to extract the posterior draws in formats provided by
the posterior
package. Here we demonstrate only the draws_array
and
draws_df
formats, but the posterior
package supports other useful formats as well.
# default is a 3-D draws_array object from the posterior package
# iterations x chains x variables
draws_arr <- fit$draws() # or format="array"
str(draws_arr)
'draws_array' num [1:1000, 1:4, 1:11] -60.3 -61 -59.5 -58.6 -64.2 ...
- attr(*, "dimnames")=List of 3
..$ iteration: chr [1:1000] "1" "2" "3" "4" ...
..$ chain : chr [1:4] "1" "2" "3" "4"
..$ variable : chr [1:11] "lp__" "mu" "tau" "theta[1]" ...
# draws x variables data frame
draws_df <- fit$draws(format = "df")
str(draws_df)
draws_df [4,000 × 14] (S3: draws_df/draws/tbl_df/tbl/data.frame)
$ lp__ : num [1:4000] -60.3 -61 -59.5 -58.6 -64.2 ...
$ mu : num [1:4000] 13.44 3.8 5.99 5.51 10.76 ...
$ tau : num [1:4000] 5.55 7.85 6.33 5.48 8.88 ...
$ theta[1] : num [1:4000] 18.07 3.52 12.12 4.46 13.9 ...
$ theta[2] : num [1:4000] 16.476 -0.445 4.075 3.544 4.102 ...
$ theta[3] : num [1:4000] 11.7279 5.8634 -0.0525 -0.6551 3.4664 ...
$ theta[4] : num [1:4000] 14.84 3.94 -1.12 6.6 -2.9 ...
$ theta[5] : num [1:4000] 12.108 0.652 -2.931 7.275 -6.486 ...
$ theta[6] : num [1:4000] 7.16 9.55 9.69 3.59 6.77 ...
$ theta[7] : num [1:4000] 11.31 11.43 4.64 11.3 5.55 ...
$ theta[8] : num [1:4000] 24.13 -7.81 9.13 -3.8 20.05 ...
$ .chain : int [1:4000] 1 1 1 1 1 1 1 1 1 1 ...
$ .iteration: int [1:4000] 1 2 3 4 5 6 7 8 9 10 ...
$ .draw : int [1:4000] 1 2 3 4 5 6 7 8 9 10 ...
print(draws_df)
# A draws_df: 1000 iterations, 4 chains, and 11 variables
lp__ mu tau theta[1] theta[2] theta[3] theta[4] theta[5]
1 -60 13.4 5.6 18.1 16.48 11.728 14.8 12.11
2 -61 3.8 7.9 3.5 -0.45 5.863 3.9 0.65
3 -59 6.0 6.3 12.1 4.08 -0.053 -1.1 -2.93
4 -59 5.5 5.5 4.5 3.54 -0.655 6.6 7.28
5 -64 10.8 8.9 13.9 4.10 3.466 -2.9 -6.49
6 -61 6.6 11.3 19.3 10.54 6.469 5.6 -3.64
7 -49 7.2 2.1 5.6 6.76 6.771 7.5 7.51
8 -53 6.8 1.5 6.1 6.97 7.150 3.8 8.49
9 -53 6.8 1.5 6.1 6.97 7.150 3.8 8.49
10 -56 6.5 3.8 4.8 3.31 9.891 8.5 4.73
# ... with 3990 more draws, and 3 more variables
# ... hidden reserved variables {'.chain', '.iteration', '.draw'}
To convert an existing draws object to a different format use the
posterior::as_draws_*()
functions.
To manipulate the draws
objects use the various methods
described in the posterior package vignettes
and documentation.
rstan::extract()
The posterior package’s rvar
format
provides a multidimensional, sample-based representation of random
variables. See https://mc-stan.org/posterior/articles/rvar.html for
details. In addition to being useful in its own right, this format also
allows CmdStanR users to obtain draws in a similar format to
rstan::extract()
.
Suppose we have a parameter matrix[2,3] x
. The
rvar
format lets you interact with x
as if
it’s a 2 x 3
matrix and automatically applies operations
over the many posterior draws of x
. To instead directly
access the draws of x
while maintaining the structure of
the matrix use posterior::draws_of()
. For example:
draws <- posterior::as_draws_rvars(fit$draws())
x_rvar <- draws$x
x_array <- posterior::draws_of(draws$x)
The object x_rvar
will be an rvar
that can
be used like a 2 x 3
matrix, with the draws handled behind
the scenes. The object x_array
will be a
4000 x 2 x 3
array (assuming 4000
posterior
draws), which is the same as it would be after being extracted from the
list returned by rstan::extract()
.