Project the posterior of the reference model onto the parameter space of a single submodel consisting of a specific combination of predictor terms or (after variable selection) onto the parameter space of one or more submodels of specific sizes.
Usage
project(
object,
nterms = NULL,
solution_terms = predictor_terms,
predictor_terms = NULL,
refit_prj = TRUE,
ndraws = 400,
nclusters = NULL,
seed = NA,
verbose = getOption("projpred.verbose", as.integer(interactive())),
...
)
Arguments
- object
An object which can be used as input to get_refmodel() (in particular, objects of class refmodel).
- nterms
Only relevant if object is of class vsel (returned by varsel() or cv_varsel()). Ignored if !is.null(predictor_terms). Number of terms for the submodel (the corresponding combination of predictor terms is taken from object). If a numeric vector, then the projection is performed for each element of this vector. If NULL (and is.null(predictor_terms)), then the value suggested by suggest_size() is taken (with default arguments for suggest_size(), implying that this suggested size is based on the ELPD). Note that nterms does not count the intercept, so use nterms = 0 for the intercept-only model.
- solution_terms
Deprecated. Please use argument predictor_terms instead.
- predictor_terms
If not NULL, then this needs to be a character vector of predictor terms for the submodel onto which the projection will be performed. Argument nterms is ignored in that case. For an object which is not of class vsel, predictor_terms must not be NULL.
- refit_prj
A single logical value indicating whether to fit the submodels (again) (TRUE) or, if object is of class vsel, to re-use the submodel fits from the full-data search that was run when creating object (FALSE). For an object which is not of class vsel, refit_prj must be TRUE. See also section "Details" below.
- ndraws
Only relevant if refit_prj is TRUE. Number of posterior draws to be projected. Ignored if nclusters is not NULL or if the reference model is of class datafit (in which case one cluster is used). If both (nclusters and ndraws) are NULL, the number of posterior draws from the reference model is used for ndraws. See also section "Details" below.
- nclusters
Only relevant if refit_prj is TRUE. Number of clusters of posterior draws to be projected. Ignored if the reference model is of class datafit (in which case one cluster is used). For the meaning of NULL, see argument ndraws. See also section "Details" below.
- seed
Pseudorandom number generation (PRNG) seed by which the same results can be obtained again if needed. Passed to argument seed of set.seed(), but can also be NA to not call set.seed() at all. If not NA, then the PRNG state is reset (to the state before calling project()) upon exiting project(). Here, seed is used for clustering the reference model's posterior draws (if !is.null(nclusters)) and for drawing new group-level effects when predicting from a multilevel submodel (however, not yet in case of a GAMM) and having global option projpred.mlvl_pred_new set to TRUE. (Such a prediction takes place when calculating output elements dis and ce.)
- verbose
A single integer value from the set \(\{0, 1, 2\}\) (if !is.null(predictor_terms), \(1\) and \(2\) have the same effect), indicating how much information (if any) to print out during the computations. Higher values indicate that more information should be printed; 0 deactivates the verbose mode. Internally, argument verbose is coerced to integer via as.integer(), so technically, a single logical value or a single numeric value works as well.
- ...
Arguments passed to get_refmodel() (if get_refmodel() is actually used; see argument object) as well as to the divergence minimizer (if refit_prj is TRUE).
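The interplay of object, predictor_terms, and nterms described above can be sketched as follows. This is only an illustrative sketch: it assumes that the rstanarm package is installed and that the installed projpred version already provides argument predictor_terms. The small values for chains, iter, nterms_max, nclusters_pred, ndraws, and nclusters are chosen for speed only and are not recommendations.

```r
library(projpred)

# Example data shipped with projpred:
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)

# Reference model fit (deliberately short chains, for illustration only):
fit <- rstanarm::stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
  QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
)

# `fit` is not of class `vsel`, so `predictor_terms` must be supplied and
# `refit_prj` has to stay at its default (`TRUE`). Here, `ndraws` posterior
# draws are projected:
prj_terms <- project(fit, predictor_terms = c("X1", "X3"), ndraws = 50,
                     seed = 9182)

# With a `vsel` object, `nterms` selects the submodel size along the
# solution path; `nterms = 0` requests the intercept-only submodel:
vs <- varsel(fit, method = "L1", nterms_max = 3, nclusters_pred = 10,
             seed = 5555)
prj_intercept <- project(vs, nterms = 0, nclusters = 5, seed = 9182)
```

Because nclusters is not NULL in the second call, ndraws is ignored there, in line with the argument descriptions above.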
Value
If the projection is performed onto a single submodel (i.e.,
length(nterms) == 1 || !is.null(predictor_terms)), an object of class
projection which is a list containing the following elements:
- dis
Projected draws for the dispersion parameter.
- ce
The cross-entropy part of the Kullback-Leibler (KL) divergence from the reference model to the submodel. For some families, this is not the actual cross-entropy, but a reduced one where terms which would cancel out when calculating the KL divergence have been dropped. In case of the Gaussian family, that reduced cross-entropy is further modified, yielding merely a proxy.
- wdraws_prj
Weights for the projected draws.
- predictor_terms
A character vector of the submodel's predictor terms.
- outdmin
A list containing the submodel fits (one fit per projected draw). This is the same as the return value of the div_minimizer function (see init_refmodel()), except if project() was used with an object of class vsel based on an L1 search as well as with refit_prj = FALSE, in which case this is the output from an internal L1-penalized divergence minimizer.
- cl_ref
A numeric vector of length equal to the number of posterior draws in the reference model, containing the cluster indices of these draws.
- wdraws_ref
A numeric vector of length equal to the number of posterior draws in the reference model, giving the weights of these draws. These weights should be treated as not being normalized (i.e., they don't necessarily sum to 1).
- const_wdraws_prj
A single logical value indicating whether the projected draws have constant weights (TRUE) or not (FALSE).
- refmodel
The reference model object.
If the projection is performed onto more than one submodel, the output from
above is returned for each submodel, giving a list with one element for
each submodel.
The elements of an object of class projection are not meant to be
accessed directly but instead via helper functions (see the main vignette
and projpred-package; see also as_draws_matrix.projection(), argument
return_draws_matrix of proj_linpred(), and argument
nresample_clusters of proj_predict() for the intended use of the
weights stored in element wdraws_prj).
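A minimal sketch of the return value and of the recommended access via helper functions is given below. It assumes that rstanarm is installed; the object names (prjs, prj_size3, prj_mat, linpred) are made up for this illustration. Argument ndraws is used instead of nclusters so that the projected draws have constant weights, which keeps the plain as.matrix() method applicable.

```r
library(projpred)

dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
fit <- rstanarm::stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
  QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
)
vs <- varsel(fit, method = "L1", nterms_max = 3, nclusters_pred = 10,
             seed = 5555)

# A vector for `nterms` yields a list with one `projection` object per
# requested submodel size:
prjs <- project(vs, nterms = c(1, 3), ndraws = 50, seed = 9182)
length(prjs)

# Pick the projection onto the submodel with 3 predictor terms:
prj_size3 <- prjs[[2]]

# Access the projected draws through helper functions instead of reading
# the list elements directly:
prj_mat <- as.matrix(prj_size3)   # projected parameter draws as a matrix
linpred <- proj_linpred(prj_size3) # linear predictor based on the projection
dim(prj_mat)
str(linpred, max.level = 1)
```

For projections based on clustered draws (nonconstant weights), the helpers mentioned above (as_draws_matrix.projection(), return_draws_matrix, nresample_clusters) are the intended way to take the weights into account.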
Details
Arguments ndraws and nclusters are automatically truncated at
the number of posterior draws in the reference model (which is 1 for
datafits). Setting ndraws or nclusters to fewer draws or clusters than
there are posterior draws in the reference model may result in slightly
inaccurate projection performance. Increasing these arguments increases the
computation time linearly.
If refit_prj = FALSE (which is only possible if object is of class
vsel), project() retrieves the submodel fits from the full-data search
that was run when creating object. Usually, the search relies on a rather
coarse clustering or thinning of the reference model's posterior draws (by
default, varsel() and cv_varsel() use nclusters = 20). Consequently,
project() with refit_prj = FALSE then inherits this coarse clustering
or thinning.
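The difference between re-using the search fits and refitting can be sketched as follows. This is a hedged illustration, assuming that rstanarm is installed; a forward search with a deliberately small nclusters is used so that the coarse clustering inherited by refit_prj = FALSE is easy to see, and the object names are made up for this example.

```r
library(projpred)

dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
fit <- rstanarm::stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
  QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
)

# Forward search based on only 5 clusters of posterior draws (small values
# for the sake of speed; not recommended in general):
vs_fw <- varsel(fit, method = "forward", nclusters = 5, nterms_max = 3,
                nclusters_pred = 10, seed = 5555)

# Re-use the submodel fits from the search: `ndraws` and `nclusters` are
# not relevant here, and the projection inherits the 5 clusters from above.
prj_search <- project(vs_fw, nterms = 2, refit_prj = FALSE, seed = 9182)

# Refit the same submodel, this time projecting 100 posterior draws:
prj_refit <- project(vs_fw, nterms = 2, ndraws = 100, seed = 9182)
```

Re-using the search fits saves computation time, whereas refitting allows a finer resolution of the reference model's posterior.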
Examples
# Data:
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
# The `stanreg` fit which will be used as the reference model (with small
# values for `chains` and `iter`, but only for technical reasons in this
# example; this is not recommended in general):
fit <- rstanarm::stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
  QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
)
# Run varsel() (here without cross-validation, with L1 search, and with small
# values for `nterms_max` and `nclusters_pred`, but only for the sake of
# speed in this example; this is not recommended in general):
vs <- varsel(fit, method = "L1", nterms_max = 3, nclusters_pred = 10,
             seed = 5555)
# Projection onto the best submodel with 2 predictor terms (with a small
# value for `nclusters`, but only for the sake of speed in this example;
# this is not recommended in general):
prj_from_vs <- project(vs, nterms = 2, nclusters = 10, seed = 9182)
# Projection onto an arbitrary combination of predictor terms (with a small
# value for `nclusters`, but only for the sake of speed in this example;
# this is not recommended in general):
prj <- project(fit, predictor_terms = c("X1", "X3", "X5"), nclusters = 10,
               seed = 9182)