After the projection of the reference model onto a submodel, the linear
predictors (for the original or a new dataset) based on that submodel can be
calculated by proj_linpred(). These linear predictors can also be
transformed to response scale and averaged across the projected parameter
draws. Furthermore, proj_linpred() returns the corresponding log predictive
density values if the (original or new) dataset contains response values. The
proj_predict() function draws from the predictive distributions (there is
one such distribution for each observation from the original or new dataset)
of the submodel that the reference model has been projected onto. If the
projection has not been performed yet, both functions call project()
internally to perform the projection. Both functions can also handle multiple
submodels at once (for objects of class vsel or objects returned by a
project() call to an object of class vsel; see project()).
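As a rough illustration of what "transformed to response scale" and "averaged across the projected parameter draws" mean, the base-R sketch below starts from a hypothetical \(S_{\mathrm{prj}} \times N\) matrix of linear predictors. It applies an inverse link (in the spirit of transform = TRUE) and then averages across draws (in the spirit of integrated = TRUE). The draw weights, the logit link, the binary responses, and the log-mean-exp averaging of the log densities are all assumptions made for this sketch. It does not call projpred and is not its implementation.

```r
# Hypothetical projected linear predictors: S = 4 draws (rows), N = 3 observations (columns)
set.seed(1)
S <- 4
N <- 3
eta <- matrix(rnorm(S * N), nrow = S, ncol = N)

# Assumed weights of the projected draws (constant here, as without clustering)
w <- rep(1 / S, S)

# Inverse link for an assumed logit link (cf. transform = TRUE)
mu <- plogis(eta)

# Weighted average across draws (cf. integrated = TRUE): one value per observation
mu_integrated <- drop(w %*% mu)

# Per-draw log predictive densities (S x N) for assumed binary responses y
y <- c(0, 1, 1)
lpd_draws <- sapply(seq_len(N), function(n) {
  dbinom(y[n], size = 1, prob = mu[, n], log = TRUE)
})

# Average the densities (not the log densities) in a numerically stable way
log_weighted_mean_exp <- function(x, w) {
  m <- max(x)
  m + log(sum(w * exp(x - m)))
}
lpd_integrated <- apply(lpd_draws, 2, log_weighted_mean_exp, w = w)
```

With nonconstant weights (e.g., after clustering), only `w` would change in this sketch; both averaging steps stay the same.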
Usage
proj_linpred(
object,
newdata = NULL,
offsetnew = NULL,
weightsnew = NULL,
filter_nterms = NULL,
transform = FALSE,
integrated = FALSE,
allow_nonconst_wdraws_prj = return_draws_matrix,
return_draws_matrix = FALSE,
.seed = NA,
...
)
proj_predict(
object,
newdata = NULL,
offsetnew = NULL,
weightsnew = NULL,
filter_nterms = NULL,
nresample_clusters = 1000,
return_draws_matrix = FALSE,
.seed = NA,
resp_oscale = TRUE,
...
)
Arguments
- object
An object returned by project() or an object that can be passed to argument object of project().
- newdata
Passed to argument newdata of the reference model's extract_model_data function (see init_refmodel()). Provides the predictor (and possibly also the response) data for the new (or old) observations. May also be NULL to use the original dataset. If not NULL, any NAs will trigger an error.
- offsetnew
Passed to argument orhs of the reference model's extract_model_data function (see init_refmodel()). Used to get the offsets for the new (or old) observations.
- weightsnew
Passed to argument wrhs of the reference model's extract_model_data function (see init_refmodel()). Used to get the weights for the new (or old) observations.
- filter_nterms
Only applies if object is an object returned by project(). In that case, filter_nterms can be used to restrict object to those elements (submodels) whose number of predictor terms is contained in filter_nterms. It must therefore be a numeric vector or NULL. If NULL, all submodels are used.
- transform
For proj_linpred() only. A single logical value indicating whether the linear predictor should be transformed to response scale using the inverse-link function (TRUE) or not (FALSE). In case of the latent projection, argument transform is similar in spirit to argument resp_oscale from other functions and affects the scale of both output elements pred and lpd (see sections "Details" and "Value" below).
- integrated
For proj_linpred() only. A single logical value indicating whether the output should be averaged across the projected posterior draws (TRUE) or not (FALSE).
- allow_nonconst_wdraws_prj
Only relevant for proj_linpred() and only if integrated is FALSE. A single logical value indicating whether to allow projected draws with different (i.e., nonconstant) weights (TRUE) or not (FALSE). If return_draws_matrix is TRUE, allow_nonconst_wdraws_prj is internally set to TRUE as well. CAUTION: Expert use only, because if set to TRUE, the weights of the projected draws are stored in an attribute called wdraws_prj of the returned matrices, and handling these attributes requires special care (e.g., when subsetting the returned matrices).
- return_draws_matrix
A single logical value indicating whether to return an object (in case of proj_predict()) or objects (in case of proj_linpred()) of class draws_matrix (see posterior::draws_matrix()). In case of proj_linpred() and projected draws with nonconstant weights (as well as integrated being FALSE), posterior::weight_draws() is applied internally.
- .seed
Pseudorandom number generation (PRNG) seed by which the same results can be obtained again if needed. Passed to argument seed of set.seed(), but can also be NA to not call set.seed() at all. If not NA, then the PRNG state is reset (to the state before calling proj_linpred() or proj_predict()) upon exiting proj_linpred() or proj_predict(). Here, .seed is used for drawing new group-level effects in case of a multilevel submodel (however, not yet in case of a GAMM) and for drawing from the predictive distributions of the submodel(s) in case of proj_predict(). If a clustered projection was performed, then in proj_predict(), .seed is also used for drawing from the set of projected clusters of posterior draws (see argument nresample_clusters). If project() is called internally with seed = NA (or with seed being a lazily evaluated expression that uses the PRNG), then .seed also affects the PRNG usage there.
- ...
Arguments passed to project() if object is not already an object returned by project().
- nresample_clusters
For proj_predict() with clustered projection (and nonconstant weights for the projected draws) only. Number of draws to return from the predictive distributions of the submodel(s). Not to be confused with argument nclusters of project(): nresample_clusters gives the number of draws (with replacement) from the set of clustered posterior draws after projection (with this set being determined by argument nclusters of project()).
- resp_oscale
Only relevant for the latent projection. A single logical value indicating whether to draw from the posterior-projection predictive distributions on the original response scale (TRUE) or on latent scale (FALSE).
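The documented .seed behavior (call set.seed() only if .seed is not NA, and restore the previous PRNG state on exit) and the resampling controlled by nresample_clusters can be sketched in base R as follows. The helper names with_seed_restore() and resample_cluster_indices() are hypothetical, and drawing cluster indices with probabilities equal to the cluster weights is an assumption of this sketch, not a description of projpred's internals.

```r
# Hypothetical helper: evaluate `expr` under `.seed`, then restore the PRNG state
with_seed_restore <- function(.seed, expr) {
  if (!is.na(.seed)) {
    genv <- globalenv()
    had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
    if (had_seed) {
      old_seed <- get(".Random.seed", envir = genv, inherits = FALSE)
    }
    on.exit({
      if (had_seed) {
        assign(".Random.seed", old_seed, envir = genv)
      } else {
        rm(".Random.seed", envir = genv)
      }
    })
    set.seed(.seed)
  }
  expr  # lazily evaluated, so the random draws happen after set.seed()
}

# Hypothetical helper: draw `nresample_clusters` cluster indices with replacement,
# using the (assumed) cluster weights as sampling probabilities
resample_cluster_indices <- function(wdraws_prj, nresample_clusters, .seed = NA) {
  with_seed_restore(.seed, sample.int(
    length(wdraws_prj), size = nresample_clusters,
    replace = TRUE, prob = wdraws_prj
  ))
}
```

Because the state is restored on exit, repeated calls with the same non-NA .seed give identical indices and leave the caller's random number stream untouched.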
Value
In the following, \(S_{\mathrm{prj}}\), \(N\),
\(C_{\mathrm{cat}}\), and \(C_{\mathrm{lat}}\) from help
topic refmodel-init-get are used. (For proj_linpred() with integrated = TRUE, we have \(S_{\mathrm{prj}} = 1\).) Furthermore, let
\(C\) denote either \(C_{\mathrm{cat}}\) (if transform = TRUE)
or \(C_{\mathrm{lat}}\) (if transform = FALSE). Then, if the
prediction is done for one submodel only (i.e., length(nterms) == 1 || !is.null(predictor_terms) in the explicit or implicit call to project(),
see argument object):
proj_linpred() returns a list with the following elements:
- Element pred contains the actual predictions, i.e., the linear predictors, possibly transformed to response scale (depending on argument transform).
- Element lpd is non-NULL only if newdata is NULL or if newdata contains response values in the corresponding column. In that case, it contains the log predictive density values (conditional on each of the projected parameter draws if integrated = FALSE and averaged across the projected parameter draws if integrated = TRUE).
In case of (i) the traditional projection, (ii) the latent projection with transform = FALSE, or (iii) the latent projection with transform = TRUE and <refmodel>$family$cats (where <refmodel> is an object resulting from init_refmodel(); see also extend_family()'s argument latent_y_unqs) being NULL, both elements are \(S_{\mathrm{prj}} \times N\) matrices (converted to a, possibly weighted, draws_matrix if argument return_draws_matrix is TRUE; see the description of that argument). In case of (i) the augmented-data projection or (ii) the latent projection with transform = TRUE and <refmodel>$family$cats being non-NULL, pred is an \(S_{\mathrm{prj}} \times N \times C\) array and lpd is an \(S_{\mathrm{prj}} \times N\) matrix. If argument return_draws_matrix is TRUE, the pred array is "compressed" to an \(S_{\mathrm{prj}} \times (N \cdot C)\) matrix, with the columns consisting of \(C\) consecutive blocks of \(N\) columns (one column per observation within each block), and then converted to a (possibly weighted) draws_matrix; lpd is converted to a (possibly weighted) draws_matrix as well. If return_draws_matrix is FALSE, allow_nonconst_wdraws_prj is TRUE, integrated is FALSE, and the projected draws have nonconstant weights, then both list elements have the weights of these draws stored in an attribute wdraws_prj. (If return_draws_matrix, allow_nonconst_wdraws_prj, and integrated are all FALSE, then projected draws with nonconstant weights cause an error.)

proj_predict() returns an \(S_{\mathrm{prj}} \times N\) matrix of predictions, where \(S_{\mathrm{prj}}\) denotes nresample_clusters in case of clustered projection (or, more generally, in case of projected draws with nonconstant weights). If argument return_draws_matrix is TRUE, the returned matrix is converted to a draws_matrix (see posterior::draws_matrix()).
In case of (i) the augmented-data projection or (ii) the latent projection with resp_oscale = TRUE and <refmodel>$family$cats being non-NULL, the returned matrix (or draws_matrix) has an attribute called cats (the character vector of response categories), and the values of the matrix (or draws_matrix) are the predicted indices of the response categories (these indices refer to the order of the response categories in attribute cats).
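Two details of the categorical case can be illustrated in base R with hypothetical objects. First, the "compression" of an \(S_{\mathrm{prj}} \times N \times C\) array into an \(S_{\mathrm{prj}} \times (N \cdot C)\) matrix is sketched assuming R's column-major order, which yields \(C\) consecutive blocks of \(N\) columns; this layout is an assumption, and the subsequent conversion to a draws_matrix is omitted. Second, the category indices returned by proj_predict() can be mapped to labels via attribute cats; the labels "low", "mid", and "high" are made up.

```r
# Hypothetical S x N x C array of predictions (S = 2 draws, N = 3 observations, C = 4)
S <- 2
N <- 3
n_cat <- 4
pred_arr <- array(seq_len(S * N * n_cat), dim = c(S, N, n_cat))

# Column-major "compression": column (c - 1) * N + n holds pred_arr[, n, c]
pred_mat <- matrix(pred_arr, nrow = S, ncol = N * n_cat)

# Hypothetical proj_predict()-style output: category indices plus attribute `cats`
pred_idx <- matrix(c(1L, 3L, 2L, 2L), nrow = 2)
attr(pred_idx, "cats") <- c("low", "mid", "high")

# Map indices to labels (as.vector() drops the dim and `cats` attributes)
cats <- attr(pred_idx, "cats")
pred_labels <- matrix(cats[as.vector(pred_idx)],
                      nrow = nrow(pred_idx), ncol = ncol(pred_idx))
```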
If the prediction is done for more than one submodel, the output from above
is returned for each submodel, giving a named list with one element for
each submodel (the names of this list being the numbers of predictor
terms of the submodels when counting the intercept, too).
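Because the list names are character strings (the numbers of terms, counting the intercept), indexing by name and indexing by position can select different submodels. The mock list below stands in for a multi-submodel proj_linpred() result; its structure follows the description above, but its contents are made up.

```r
# Mock multi-submodel output: submodels with 2, 3, and 4 terms (counting the intercept)
res <- list(
  `2` = list(pred = matrix(0, nrow = 2, ncol = 3), lpd = NULL),
  `3` = list(pred = matrix(1, nrow = 2, ncol = 3), lpd = NULL),
  `4` = list(pred = matrix(2, nrow = 2, ncol = 3), lpd = NULL)
)

pred_3_terms <- res[["3"]]$pred      # selected by name: the 3-term submodel
pred_third <- res[[3]]$pred          # selected by position: here the 4-term submodel
nterms_available <- as.integer(names(res))
```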
Details
Currently, proj_predict() ignores observation weights that are not
equal to 1. A corresponding warning is thrown if this is the case.
In case of the latent projection and transform = FALSE:
- Output element pred contains the linear predictors without any modifications that may be due to the original response distribution (e.g., for a brms::cumulative() model, the ordered thresholds are not taken into account).
- Output element lpd contains the latent log predictive density values, i.e., those corresponding to the latent Gaussian distribution. If newdata is not NULL, this requires the latent response values to be supplied in a column called .<response_name> of newdata, where <response_name> needs to be replaced by the name of the original response variable (if <response_name> contained parentheses, these have been stripped off by init_refmodel(); see the left-hand side of formula(<refmodel>)). For technical reasons, newdata must also contain a column <response_name> (even though only .<response_name> is actually used).
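For the latent projection with a non-NULL newdata, the column requirements for lpd can be met as in the following base-R sketch. The response name y, the predictor columns X1 and X3, and all values are hypothetical. Filling the original-response column with placeholder values is an assumption based on the statement that only .<response_name> is actually used; since NAs in newdata trigger an error, the placeholders are not NA.

```r
resp_name <- "y"                       # hypothetical name of the original response
latent_name <- paste0(".", resp_name)  # column holding the latent responses: ".y"

newdata_lat <- data.frame(X1 = c(0.2, -1.1, 0.5), X3 = c(1.5, 0.3, -0.8))
# Column <response_name> must exist (placeholder values, assumed to be unused)
newdata_lat[[resp_name]] <- rep(1L, nrow(newdata_lat))
# Column .<response_name> holds the latent response values used for `lpd`
newdata_lat[[latent_name]] <- c(0.7, -0.4, 1.2)

# Any NA in `newdata` would trigger an error, so check up front
stopifnot(!anyNA(newdata_lat))
```

Such a data frame would then be passed as newdata to proj_linpred() with transform = FALSE.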
Examples
library(projpred)
# Data:
dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
# The `stanreg` fit which will be used as the reference model (with small
# values for `chains` and `iter`, but only for technical reasons in this
# example; this is not recommended in general):
fit <- rstanarm::stan_glm(
y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
QR = TRUE, chains = 2, iter = 500, refresh = 0, seed = 9876
)
# Projection onto an arbitrary combination of predictor terms (with a small
# value for `ndraws`, but only for the sake of speed in this example; this
# is not recommended in general):
prj <- project(fit, predictor_terms = c("X1", "X3", "X5"), ndraws = 21,
seed = 9182)
# Predictions (at the training points) from the submodel onto which the
# reference model was projected:
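prjl <- proj_linpred(prj)
# `prjl` is a list with elements `pred` and `lpd` (here both S_prj x N matrices)
prjp <- proj_predict(prj, .seed = 7364)
# `prjp` is an S_prj x N matrix of draws from the submodel's predictive distributions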