projpred 2.9.1
CRAN release: 2025-10-28
Major changes
- Fixed a major bug in forward search with 3-way or higher-order interactions. See “Bug fixes” below for details.
Bug fixes
- Fixed a bug that caused forward search not to place lower-order terms of a 3-way or higher-order interaction term before that interaction term. (GitHub: #531, #532)
- Fixed a bug that caused project() to construct an incorrect internal table of predictor terms (from which argument predictor_terms can select one or several terms) in case of a group-level term that does not contain a group-level intercept. (GitHub: #533, #534)
- Relaxed a test to prevent spurious failures and unblock reverse dependencies. No functional changes.
projpred 2.9.0
CRAN release: 2025-07-08
Major changes
- Subsampled PSIS-LOO CV (usable via argument nloo of cv_varsel()) has been fixed and is not experimental anymore. There are a few restrictions: Performance statistic "auc" (see argument stats of summary.vsel() and plot.vsel(); argument stat of suggest_size() is concerned as well) is not supported in case of subsampled PSIS-LOO CV. Furthermore, baseline = "best" (in summary.vsel() and plot.vsel()) is not supported in case of subsampled PSIS-LOO CV either. (GitHub: #94, #496)
- The uncertainty interval for performance statistic "mse" is now based on a log-normal approximation (instead of a normal approximation) if argument deltas of summary.vsel() or plot.vsel() is FALSE. (GitHub: #496)
- The standard error for performance statistic "rmse" is now computed via the delta method (instead of bootstrapping). The uncertainty interval for "rmse" is now based on a log-normal approximation (instead of bootstrapping) if argument deltas of summary.vsel() or plot.vsel() is FALSE and based on a normal approximation (instead of bootstrapping) if deltas is TRUE. (GitHub: #496)
- Performance statistic "R2" (R-squared) has been added, see argument stats of summary.vsel() and plot.vsel(); argument stat of suggest_size() supports it as well. (GitHub: #483, #496)
- The performance evaluation part of cv_varsel() with cv_method = "LOO" and validate_search = FALSE now always applies Pareto smoothing when computing the importance sampling weights (as long as the number of importance ratios in the tail is large enough; otherwise, no Pareto smoothing is applied). Previously, in case of projected draws with nonconstant weights (i.e., in case of clustering), no Pareto smoothing had been applied. (GitHub: #496, #507)
- The threshold for high Pareto-\(\hat{k}\) values was updated to the one presented by Vehtari et al. (2024, “Pareto smoothed importance sampling”, Journal of Machine Learning Research, 25(72):1-58, https://www.jmlr.org/papers/v25/19-556.html). This threshold depends on the Monte Carlo sample size and is often close to the former fixed threshold of 0.7 (a short introduction may also be found in the LOO glossary). Correspondingly, the former “secondary” threshold of 0.5 is not used anymore either. (GitHub: #490, #498)
- Argument type of summary.vsel() has gained options "diff.lower" and "diff.upper" (see the documentation for details). (GitHub: #511)
- Argument deltas of plot.vsel() has gained option "mixed" which combines the point estimates from deltas = FALSE with the uncertainty bars from deltas = TRUE. (GitHub: #511)
- For the latent projection, the function passed to argument latent_ll_oscale of extend_family() now needs to have an argument dis (at the second position). Similarly, the function passed to argument latent_ppd_oscale of extend_family() now needs to have an argument dis_resamp (again at the second position). This makes it possible, e.g., to use the latent projection for a log-normal response family. (GitHub: #513)
- Argument verbose of project(), varsel(), and cv_varsel() has been changed from logical to integer. However, logical values continue to work (since as.integer() is applied internally). Global options projpred.extra_verbose and projpred.verbose_project are now deprecated because additional verbosity can be achieved via higher integer values for argument verbose. The new global option projpred.verbose may be used to set argument verbose of project(), varsel(), and cv_varsel() globally. (GitHub: #519)
- Some global options have been renamed, so please use their new names from now on (although the old names will continue to work for a while) (GitHub: #500, #521):
  - Global option projpred.prll_cv has been renamed to projpred.parallel_cv.
  - Global option projpred.warn_prj_drawwise has been renamed to projpred.warn_proj_drawwise.
  - Global option projpred.check_conv has been renamed to projpred.check_convergence.
  - Global option projpred.prll_prj_trigger has been renamed to projpred.parallel_proj_trigger.
- In plot.vsel(), several defaults have been changed (GitHub: #517, #522):
  - Argument text_angle now defaults to 45 (previously, the default was NULL).
  - Argument size_position now defaults to "primary_x_top" (previously, the default was "primary_x_bottom").
  - Argument show_cv_proportions now defaults to FALSE (previously, the default was TRUE).

  These arguments can now also be controlled via global options, see section “Usage” of ?plot.vsel for their names and the main vignette for an illustration.
- The changelog for version 2.6.0 did not contain a notification that cvfolds() had been deprecated and that the new name cv_folds() should be used instead. This changelog entry has been added now (see below), but is also mentioned here to make users aware of it (although a deprecation warning was already added in version 2.6.0 and will be kept until cvfolds() is eventually removed in a future release). (GitHub: #411)
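For the renamed global options listed under “Major changes” above, existing setup code can be migrated with a small helper. The following is a minimal sketch in base R: the helper name migrate_projpred_options() is hypothetical (it is not part of projpred), and only the old and new option names are taken from this changelog.

```r
# Old and new option names, as listed in the changelog entry above.
renamed_options <- c(
  projpred.prll_cv = "projpred.parallel_cv",
  projpred.warn_prj_drawwise = "projpred.warn_proj_drawwise",
  projpred.check_conv = "projpred.check_convergence",
  projpred.prll_prj_trigger = "projpred.parallel_proj_trigger"
)

# Hypothetical helper (not part of projpred): copies the value of each old
# option to its new name, unless the new option has already been set.
migrate_projpred_options <- function(map = renamed_options) {
  for (old_nm in names(map)) {
    old_val <- getOption(old_nm)
    new_nm <- map[[old_nm]]
    if (!is.null(old_val) && is.null(getOption(new_nm))) {
      options(stats::setNames(list(old_val), new_nm))
    }
  }
  invisible(map)
}
```

Calling migrate_projpred_options() once at the top of a script leaves explicitly set new-style options untouched, so it should be safe to keep during a transition period.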
Minor changes
- When using the doFuture backend for parallelization, progression updates can now be received via the progressr package, see ?`projpred-package` (section “Parallelization”). (GitHub: #504)
- Several enhancements concerning verbosity, e.g., the number of projected draws (resulting from clustering or thinning) is now printed out during the different steps of the computations and verbose-mode output is redirected to stderr() instead of stdout(). (GitHub: #506, #518)
- For the CV parallelization (see argument parallel of cv_varsel()), a new global option projpred.export_to_workers may be set to a character vector of names of objects to export from the global environment to the parallel workers. (GitHub: #497, #510)
- Added global options projpred.foreach_errorhandling and projpred.foreach_verbose whose values are passed to foreach::foreach()’s arguments .errorhandling and .verbose, respectively. The defaults for these new global options are the same as those for the respective foreach::foreach() arguments: "stop" for global option projpred.foreach_errorhandling and FALSE for global option projpred.foreach_verbose. (GitHub: commit 3231d13)
- Added global options to control several arguments of plot.vsel() and plot.cv_proportions() (see section “Usage” of the help pages of these two functions). (GitHub: commit 3333043)
- Changed the maintainer to Osvaldo Martin.
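As a sketch of how the parallelization-related global options mentioned above might be set in a script: the values for the two foreach options are the defaults stated in this changelog, whereas the object name "my_helper_fun" is purely hypothetical and stands for any object in the global environment that the parallel workers need.

```r
options(
  # Defaults stated in the changelog; they mirror foreach::foreach()'s
  # arguments .errorhandling and .verbose.
  projpred.foreach_errorhandling = "stop",
  projpred.foreach_verbose = FALSE,
  # Hypothetical object name: replace with the names of the objects that
  # should be exported from the global environment to the workers.
  projpred.export_to_workers = c("my_helper_fun")
)
```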
Bug fixes
- Fixed a bug that caused an error when using the augmented-data or latent projection in combination with a single projected draw for performance evaluation in cv_varsel() with cv_method = "LOO" and validate_search = FALSE. (GitHub: #512)
- Previously, in case of PSIS-LOO CV with validate_search = TRUE and thinned posterior draws for projection (i.e., argument(s) ndraws or ndraws_pred being used, not nclusters or nclusters_pred), print.vselsummary() incorrectly reported that the posterior draws had been clustered. This has now been fixed, so thinning is reported in such cases. (GitHub: #516)
- Fixed the internal default extract_model_data function when using the latent projection for a custom reference model object. (GitHub: #523)
projpred 2.8.0
CRAN release: 2023-12-14
Major changes
- Search results generated in an earlier varsel() or cv_varsel() call can now be re-used with the help of the new varsel.vsel() and cv_varsel.vsel() methods (i.e., by applying varsel() or cv_varsel() to the output of the earlier varsel() or cv_varsel() call). This can save a lot of time when re-running the predictive performance evaluation part multiple times based on the same search results. An illustration may be found in the updated main vignette (section “Preliminary cv_varsel() run”; a more general description may also be found in section “Speed”). (GitHub: #461, #463, #465, #466)
- K-fold CV can now be combined with validate_search = FALSE. Related to this is an internal change which may cause subsampled PSIS-LOO CV (an experimental feature controlled by argument nloo of cv_varsel()) with clustered projection during the search (i.e., 1 < nclusters && nclusters < S, where S denotes the number of posterior draws in the reference model) to yield slightly different results due to different internal pseudorandom number generator (PRNG) states. Furthermore, if is.na(seed), then the PRNG state for code downstream of such a cv_varsel() call will be different due to this internal change. (GitHub: #464)
- print.vselsummary() (and hence also print.vsel()) now prints the reference model’s performance evaluation results as well (not just those of the submodels). Correspondingly, a new helper function performances() has been added which gives access to the reference model’s (as well as the submodels’) performance evaluation results. (GitHub: #471)
- Argument solution_terms of project() has been deprecated. Please use the new argument predictor_terms instead. (GitHub: #472)
- For expert users of the augmented-data projection only: Objects of class augmat or augvec do not need to have an attribute called nobs_orig anymore, but a new attribute called ndiscrete, giving the number of (possibly latent) response categories instead of the number of observations (see ?`augdat-internals`). This simplifies the subsetting of such objects. (GitHub: #473)
- By default, projpred now catches messages and warnings from the draw-wise divergence minimizers and throws their unique collection after performing all draw-wise divergence minimizations (i.e., draw-wise projections). This can be deactivated by setting global option projpred.warn_prj_drawwise to FALSE. Previously, projpred suppressed such messages and warnings. (GitHub: #478)
- By default, projpred now checks the convergence of the draw-wise divergence minimizers and throws a warning in case of potential convergence problems. This can be deactivated by setting global option projpred.check_conv to FALSE. (GitHub: #478)
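The re-use of search results described under “Major changes” above can be sketched as follows. This is an illustration only, not the vignette's code: refm_fit is a hypothetical fitted reference model (e.g., from rstanarm or brms), the wrapper name reuse_search_results() is made up, and the argument values are arbitrary.

```r
# Sketch only: `refm_fit` is a hypothetical fitted reference model and the
# argument values are illustrative. Requires the projpred package when called.
reuse_search_results <- function(refm_fit, nterms_max = 5) {
  # Preliminary, fast run: the search is performed once here.
  cvvs_fast <- projpred::cv_varsel(
    refm_fit,
    cv_method = "LOO",
    validate_search = FALSE,
    nterms_max = nterms_max
  )
  # Applying cv_varsel() to the earlier output dispatches to cv_varsel.vsel(),
  # which re-uses the search results and redoes the performance evaluation.
  cvvs_refit <- projpred::cv_varsel(cvvs_fast, nclusters_pred = 20)
  # performances() returns the performance evaluation results of the
  # reference model and the submodels.
  projpred::performances(cvvs_refit)
}
```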
Minor changes
- In as.matrix.projection(), nm_scheme = "auto" is deprecated. Please use nm_scheme = NULL instead.
- The plot produced by plot.vsel() now includes a title and a subtitle, with the subtitle mentioning the nominal coverage as well as the type of the confidence intervals (CIs) explicitly. However, in case of a facetted plot (i.e., in case of multiple stats) and some stats implying a different CI type than other stats, the CI types are omitted (because mentioning them would make the subtitle too complicated). Note that title and subtitle can always be omitted with <plot.vsel() output object> + ggplot2::labs(title = NULL, subtitle = NULL). (GitHub: #468)
- plot.vsel() has gained a new argument show_cv_proportions, which makes it possible to omit the CV ranking proportions. (GitHub: #470)
- Renamed summary.vsel()’s output element selection to perf_sub and made the names of this data.frame’s columns more consistent so that it is easier to handle that data.frame programmatically. This should not be a breaking change because elements of vselsummary objects (i.e., elements of objects returned by summary.vsel()) are not meant to be accessed directly (for elements perf_sub and perf_ref, the new helper function performances() has been added, see “Major changes” above). (GitHub: #471)
- In summary.vsel() and plot.vsel(), the NA_character_ “string” (which was previously used as a placeholder for the predictor term of the intercept-only model at size 0) was replaced by the string "(Intercept)". (GitHub: #471)
- Renamed project()’s output element solution_terms to predictor_terms. This should not be a breaking change because that element is meant to be accessed via predictor_terms(). (GitHub: #472)
- Renamed elements solution_terms and solution_terms_cv of vsel objects (returned by varsel() and cv_varsel()) to predictor_ranking and predictor_ranking_cv, respectively. This should not be a breaking change because those elements are meant to be accessed via ranking(). (GitHub: #472)
- Global option projpred.verbose_project now affects the verbosity of all projections performed by the built-in divergence minimizers (except for the built-in L1-projection divergence minimizer). In particular, the divergence minimizer (no matter whether built-in or user-specified) is also employed when calling varsel() or cv_varsel(), so setting option projpred.verbose_project to TRUE now shows the progress of the projections during a varsel() or cv_varsel() call. Previously, that option only affected the projections performed through project() (see the default for project()’s argument verbose). Usually, setting projpred.verbose_project to TRUE only makes sense when setting global option projpred.extra_verbose and argument verbose (of varsel() or cv_varsel()) to TRUE as well.
- Added print() methods for objects of class refmodel and projection, mainly to avoid cluttering the console when printing such objects accidentally.
- Argument extract_model_data of init_refmodel() is now allowed to be NULL for using an internal default.
- print.vselsummary() and print.vsel() now use a minimum number of significant digits of 2 by default. The previous behavior can be restored by setting options(projpred.digits = getOption("digits")).
- Added a new performance statistic, the geometric mean predictive density (GMPD). This is particularly useful for discrete outcomes because there, the GMPD is a geometric mean of probabilities and hence bounded by zero and one. For details, see argument stats of the ?summary.vsel help. (GitHub: #476)
- project()’s argument verbose now gets passed to argument verbose_divmin (not projpred_verbose) of the divergence minimizer function (see argument div_minimizer of init_refmodel()).
- Arguments lambda_min_ratio, nlambda, and thresh of varsel() and cv_varsel() have been deprecated. Instead, varsel() and cv_varsel() have gained a new argument called search_control which accepts control arguments for the search as a list. Thus, former arguments lambda_min_ratio, nlambda, and thresh should now be specified via search_control (but note that search_control is more general because it also accepts control arguments for a forward search). (GitHub: #477)
- run_cvfun() has gained a new argument folds, accepting a vector of fold indices (the default is NULL, meaning that the folds are constructed internally, as before). This new argument is helpful, for example, to perform a stratified K-fold CV in a convenient manner (an example of this has been added to the ?run_cvfun help). (GitHub: #480)
- plot.vsel() has gained a new argument size_position. Setting it to "primary_x_top" moves the text for the submodel sizes above the x-axis. Setting it to "secondary_x" moves that text into a secondary x-axis located at the top of the plot. (GitHub: #484)
- The default alignments of the x-axis text in plot.vsel() have been changed: x-axis text is now right-aligned (left-aligned) for text_angle > 0 (< 0) and also top-aligned for -90 < text_angle && text_angle < 90 && text_angle != 0. We emphasize that alignments can always be customized with <plot.vsel() output object> + ggplot2::theme(axis.text.x.bottom = ggplot2::element_text(hjust = <hjust_value>, vjust = <vjust_value>)). (GitHub: #484)
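To illustrate the kind of input that the new folds argument of run_cvfun() accepts (a vector of fold indices), the following base-R sketch constructs stratified folds. It is not the example from the ?run_cvfun help; the helper name stratified_folds() and its arguments are assumptions made for illustration.

```r
# Hypothetical helper (not part of projpred): assigns each observation a fold
# index in 1..K such that every stratum is spread as evenly as possible
# across the K folds.
stratified_folds <- function(strata, K, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  folds <- integer(length(strata))
  # Loop over the observation indices belonging to each stratum.
  for (idx in split(seq_along(strata), strata)) {
    # Shuffle the stratum's observations, then cycle through folds 1..K.
    shuffled <- idx[sample.int(length(idx))]
    folds[shuffled] <- rep_len(seq_len(K), length(idx))
  }
  folds
}
```

The resulting integer vector has one entry per observation and could then be supplied as folds = stratified_folds(<grouping variable>, K = <K>) in a run_cvfun() call; whether K has to be passed to run_cvfun() as well should be checked in its documentation.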
Bug fixes
- Fixed a bug sometimes causing plot.vsel() to produce extra (“empty”) ticks on the x-axis. (GitHub: #462)
- Fixed a bug in summary.vsel() and plot.vsel() causing bootstrap results (i.e., standard error and confidence interval for RMSE and AUC) to be incorrect if deltas = TRUE. (GitHub: #474)
- Fixed several bugs in summary.vsel() and plot.vsel() sometimes causing incorrect predictive performance results in case of subsampled PSIS-LOO CV (an experimental feature controlled by argument nloo of cv_varsel()). (GitHub: #475)
- Fixed backward compatibility for the legacy structure of cvfits (the new structure was introduced by version 2.7.0, see GitHub pull request #456).
projpred 2.7.0
CRAN release: 2023-09-30
Major changes
- The default search method is now "forward" search for all kinds of models (previously, "L1" search was used by default where available). The reason for this change is that in general, forward search is more favorable compared to L1 search (see section “Details” in ?varsel or ?cv_varsel). (GitHub: #453, #459)
- Several enhancements with respect to projected draws with different (i.e., nonconstant) weights, which typically occurs in case of clustered projection (GitHub: #206, #439):
  - as.matrix.projection() now throws an error if the projected draws have nonconstant weights. (This error is the default behavior; it can be avoided by setting the new argument allow_nonconst_wdraws_prj to TRUE, but this is for expert use only because in that case, the weights of the projected draws are stored in an attribute wdraws_prj and handling this attribute requires special care, e.g., when subsetting the returned matrix.) Instead, a posterior::as_draws_matrix() method (as_draws_matrix.projection()) has been added which allows for a safer handling of these weights (e.g., with the help of posterior::resample_draws(), see section “Examples” of the ?as_draws_matrix.projection help). Just like as.matrix.projection(), as_draws_matrix.projection() also works for the more common case of projected draws with constant weights. A posterior::as_draws() method (as_draws.projection()) has also been added, but this is merely a wrapper for as_draws_matrix.projection().
  - proj_linpred() now also throws an error (by default) if the projected draws have nonconstant weights and has gained the new arguments allow_nonconst_wdraws_prj and return_draws_matrix. As in as.matrix.projection(), argument allow_nonconst_wdraws_prj is for expert use only. Instead, return_draws_matrix is the intended argument in case of projected draws with nonconstant weights. Similarly to as_draws_matrix.projection(), it requires the posterior package and returns a draws_matrix (with weighted draws if the projected draws have nonconstant weights and integrated is FALSE).
  - proj_predict() has gained an argument return_draws_matrix for converting the returned matrix to a draws_matrix (which again requires the posterior package). For proj_predict(), no further modifications were necessary because its argument nresample_clusters already takes weights of projected draws appropriately into account.
- Added helper function run_cvfun() which can be used to create input for cv_varsel.refmodel()’s new argument cvfits (which is the same as init_refmodel()’s argument cvfits, but avoids having to call init_refmodel() or get_refmodel() twice). See the documentation of run_cvfun() for details. (GitHub: #458)
- Users applying varsel() or cv_varsel() to an object of class vsel now need to use varsel(get_refmodel(<vsel_object>), <...>) or cv_varsel(get_refmodel(<vsel_object>), <...>) instead of varsel(<vsel_object>, <...>) and cv_varsel(<vsel_object>, <...>), respectively. The reason is that new methods varsel.vsel() and cv_varsel.vsel() have been added. Currently, these are only placeholders, but in a future release, they will offer new functionality.
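The handling of projected draws with nonconstant weights described under “Major changes” above might look roughly like this. The sketch assumes that prj is the output of a project() call with clustered projection (so that the projected draws carry weights) and that the projpred and posterior packages are installed; the wrapper name resampled_projection_draws() is hypothetical.

```r
# Sketch only: `prj` is assumed to be a projection object from project() with
# clustered (i.e., weighted) projected draws.
resampled_projection_draws <- function(prj, ...) {
  # Dispatches to as_draws_matrix.projection(), which keeps track of the
  # weights of the projected draws.
  prj_draws <- posterior::as_draws_matrix(prj)
  # Resample according to the stored weights to obtain unweighted draws;
  # further arguments (e.g., ndraws) are passed on.
  posterior::resample_draws(prj_draws, ...)
}
```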
Minor changes
- If an L1 search selects an interaction term before all involved lower-order interaction terms (including main-effect terms) have been selected, the predictor ranking is now automatically modified so that the lower-order interaction terms come before this interaction term. A corresponding warning is thrown, which may be deactivated by setting the global option projpred.warn_L1_interactions to FALSE. Previously, beginning with version 2.5.0, only a warning was thrown and this only if an L1 search selected an interaction term before all involved main-effect terms had been selected. (GitHub: #420)
- Added a progress bar for project() (when using the built-in divergence minimizers). For this, project() has gained a new argument verbose which can also be controlled via the global option projpred.verbose_project. By default, the new progress bar is activated. (GitHub: #421)
- Added a new argument parallel to cv_varsel(). With parallel = TRUE, costly parts of projpred’s cross-validation (CV) can be run in parallel. See the documentation of that new argument (and section “Note” of cv_varsel()’s documentation) for details. (GitHub: #422)
- Added a warning for issue #323 (for multilevel Gaussian models, the projection onto the full model can be unstable). (GitHub: #426)
- plot.vsel() has gained the new arguments point_size and bar_thickness which control the size of the points and the thickness of the uncertainty bars, respectively. By default, the points are slightly larger now and the uncertainty bars slightly thicker than before. The previous appearance can be achieved by setting point_size = 1.5 and bar_thickness = 0.5. (GitHub: #429, #443)
- plot.vsel(): Added argument ranking_colored for coloring the points and the uncertainty bars according to the magnitude of the (possibly cumulated) CV ranking proportions. (GitHub: #430; thanks to @yannmclatchie for the suggestion)
- Added warnings for most of the problems described in section “Troubleshooting” of the main vignette. (GitHub: #431)
- Output element p_type of project() has been removed. Instead, output element const_wdraws_prj has been added, but its definition is essentially the inverse of former element p_type (see the updated documentation of project()’s output). This should not be a breaking change for users (as p_type was mainly intended for internal use and the new element const_wdraws_prj is so, too) but this slightly enhances the cases where as.matrix.projection() used to throw a warning (and now throws an error; see “Major changes” above) concerning the weights of the projected draws and the cases where proj_predict() resamples from the projected draws using argument nresample_clusters. (GitHub: #432)
- Improved handling of PSIS-LOO CV warnings. (GitHub: #438, #451)
- Reduced peak memory usage during forward search. A global option projpred.run_gc has also been added, see the general package documentation (available online or by typing ?`projpred-package`). (GitHub: #442)
- Slightly improved efficiency in K-fold and PSIS-LOO CV, especially in case of a large number of observations. Under very special conditions (refit_prj = FALSE, 1 < nclusters && nclusters < S, and 1 < nclusters_pred && nclusters_pred < S; note that 1 < nclusters requires forward search, S denotes the number of posterior draws in the reference model, and nclusters_pred is essentially unused if refit_prj = FALSE), this change might affect K-fold CV results, due to a different pseudorandom number generator (PRNG) state in folds other than the first one. Under similarly special conditions (refit_prj = FALSE and 1 < nclusters_pred && nclusters_pred < S), the PRNG state for LOO subsampling (see argument nloo) is affected. Furthermore, if is.na(seed), then the PRNG state for code downstream of such cv_varsel() calls will be different due to this change. (GitHub: #446)
- Slightly improved efficiency at the end of cv_varsel(), especially in case of a large number of observations. If is.na(seed), then the PRNG state for code downstream of a cv_varsel() call with refit_prj = TRUE and 1 < nclusters_pred && nclusters_pred < S (where S denotes the number of posterior draws in the reference model) will be different due to this change. (GitHub: #447)
- Slightly improved memory usage in varsel(), cv_varsel(), and project(). In case of LOO subsampling (see argument nloo) with clustered projection (i.e., 1 < nclusters && nclusters < S or 1 < nclusters_pred && nclusters_pred < S, where S denotes the number of posterior draws in the reference model), this change may lead to slightly different results due to different internal PRNG states. Furthermore, if is.na(seed), then the PRNG state for code downstream of such a cv_varsel() call will be different due to this change. (GitHub: #448)
- The internal function .extract_model_data has been removed. As an alternative (with some differences compared to .extract_model_data), the new function y_wobs_offs() is exported.
- Fixes/enhancements with respect to observation weights and offsets (GitHub: #449):
  - In case of an rstanarm reference model, the defaults for arguments weightsnew and offsetnew (see proj_linpred(), proj_predict(), and predict.refmodel()) now cause the original observation weights and offsets to be used if possible (instead of ones and zeros, respectively, which could even be considered to have been a bug, hence why this is mentioned under “Bug fixes” as well). For brms reference models, this behavior had already been implemented before.
  - An error is now thrown if a length-zero element weights or offset is returned by the function supplied to argument extract_model_data of init_refmodel() (before, a vector of ones or zeros was used silently for the observation weights and offsets, respectively).
- Added the helper function force_search_terms() which makes it possible to construct search_terms where certain predictor terms are forced to be included (i.e., they are forced to be selected first) whereas other predictor terms are optional (i.e., they are subject to the variable selection, but only after the inclusion of the “forced” terms). (GitHub: #346)
- Reduced peak memory usage during performance evaluation (more precisely, during the re-projections done for the performance evaluation). This reduction is considerable especially for multilevel submodels, but possibly also for additive submodels. (GitHub: #440, #450)
- A message is now thrown when cutting off the search at nterms_max’s internal default of (currently) 19. (GitHub: #452)
- Added sub-section “Speed” to the main vignette’s “Troubleshooting” section. (GitHub: #455)
- In case of K-fold CV, the list passed to argument cvfits of init_refmodel() should not have a sub-list called fits anymore. Instead, the content of this former sub-list called fits should be moved one level up, i.e., should be placed directly in the list passed to cvfits (the empty element fits should then be removed). For some time, the old structure will continue to work, but this possibility is deprecated and will be removed in the future. (GitHub: #456)
- In case of K-fold CV, the K reference model fits (i.e., the elements of the return value of the function passed to argument cvfun of init_refmodel() or the elements of the list supplied to argument cvfits of init_refmodel()) do not need to be lists anymore (see the documentation for argument cvrefbuilder of init_refmodel()). (GitHub: #457)
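The change to the structure of cvfits described above (moving the content of the former sub-list fits one level up) can be sketched in base R as follows. The helper name upgrade_cvfits() is hypothetical, and the assumption that any attributes of the old list should simply be carried over to the new list is mine, not something this changelog states.

```r
# Hypothetical helper (not part of projpred): converts a `cvfits` list with
# the legacy sub-list `fits` into the structure without that sub-list.
upgrade_cvfits <- function(cvfits) {
  legacy_fits <- cvfits[["fits"]]
  if (is.null(legacy_fits)) {
    # No sub-list called `fits`: nothing to convert.
    return(cvfits)
  }
  # Assumption: attributes of the old list (other than its names) are
  # carried over unchanged to the new list.
  extra_attrs <- attributes(cvfits)
  extra_attrs[["names"]] <- NULL
  for (nm in names(extra_attrs)) {
    attr(legacy_fits, nm) <- extra_attrs[[nm]]
  }
  legacy_fits
}
```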
Bug fixes
- Fixed a bug in the printed number of projected draws for the performance evaluation when calling print.vselsummary() based on output from varsel() with refit_prj = FALSE.
- Fixed a bug sometimes causing an error when predicting from a submodel that is a GLM and has interactions. (GitHub: #420)
- Fixed a bug introduced in version 2.6.0, causing an incompatibility of K-fold CV with R versions < 4.2.0. (GitHub: #423, #427)
- Fixed a bug for the augmented-data projection in combination with subsampled PSIS-LOO CV. (GitHub: #433)
- cv_varsel() with validate_search = FALSE used to call loo::psis() (for the submodel performance evaluation PSIS-LOO CV) even in case of draws with different (i.e., nonconstant) weights. In such cases, loo::sis() is called now (with a warning). (GitHub: #438)
- Fixed a bug for rstanarm (and custom) multilevel reference models with interactions (: syntax) between grouping variables, caused by missing columns in the reference model’s data.frame (for brms reference models, this was already done correctly). (GitHub: #445)
- In case of an rstanarm reference model, the defaults for arguments weightsnew and offsetnew (see proj_linpred(), proj_predict(), and predict.refmodel()) now cause the original observation weights and offsets to be used if possible (instead of ones and zeros, respectively, which could be considered to have been a bug). For brms reference models, this behavior had already been implemented before. (GitHub: #449)
- Fixed a bug causing PSIS-LOO CV with validate_search = FALSE to fail in case of a single projected draw. (GitHub: #454)
projpred 2.6.0
CRAN release: 2023-06-01
Major changes
- In anticipation of a larger overhaul of the projpred user interface, this release comes with several new functions for accessing and investigating solution paths (which are now termed predictor rankings by these new functions, a term that is hopefully easier to grasp for new users):
  - Added a new function called ranking() which returns the predictor ranking from the full-data search and possibly also the predictor rankings from fold-wise searches in case of cross-validation (CV). (More precisely, ranking() is a generic. The only method is ranking.vsel(), applicable to objects returned by varsel() or cv_varsel(). The output is of class ranking.)
  - Added a new function called cv_proportions() which computes ranking proportions (across CV folds, see ?cv_proportions for details) from fold-wise predictor rankings. (More precisely, cv_proportions() is a generic. The main method is cv_proportions.ranking(), but as a shortcut, cv_proportions.vsel() has also been added. The output is of class cv_proportions.)
  - Added a new plot() method called plot.cv_proportions() for plotting ranking proportions from fold-wise predictor rankings. (As a shortcut, plot.ranking() has also been added.)

  Because of these new functions, a message has been added to print.vselsummary(), mentioning how to access and investigate the fold-wise predictor rankings (if they exist). Furthermore, due to these changes, element pct_solution_terms_cv of vsel objects has been replaced with element solution_terms_cv which contains the fold-wise predictor rankings instead of the corresponding ranking proportions. However, elements of vsel objects are not meant to be accessed directly, so this replacement should not be a breaking change for most users. Finally, method solution_terms.vsel() (which, until now, was the only possibility to extract the full-data predictor ranking) has now been deprecated and will be removed in a future release. Please use the new function ranking() instead (more precisely, ranking()’s output element fulldata contains the full-data predictor ranking that is also extracted by solution_terms.vsel(); ranking()’s output element foldwise contains the fold-wise predictor rankings, if available, which were previously not accessible via a built-in function). (GitHub: #289, #406, #411)
Added function
predictor_terms()which retrieves the predictor terms used in aproject()run. Correspondingly, methodsolution_terms.projection()has now been deprecated and will be removed in a future release. Please usepredictor_terms()instead. (GitHub: #411)Renamed function
cvfolds()tocv_folds()(more precisely, the former variant still exists, but is deprecated and will be removed in a future release). (GitHub: #411)seed(and.seed) arguments now have a default ofNAinstead ofsample.int(.Machine$integer.max, 1)and the pseudorandom number generator (PRNG) state is reset only if the user-supplied seed is notNA. This allows setting a seed once at the beginning of any projpred-related code and then leaving allseed(and.seed) arguments at their default. Previously, such practice could lead to results which were “less random” than they should have been because the former default ofsample.int(.Machine$integer.max, 1)caused projpred functions with aseed(or.seed) argument to reset the PRNG state upon exit, meaning that two repeated calls tocv_varsel()(for example) with no PRNG-using code between them would use the same seed internally. (GitHub: #412)Added the main diagonal of the matrix returned by
cv_proportions()to a new column calledcv_proportions_diagof the summary table computed bysummary.vsel(). The purpose of this new column is to give a basic sense for the (CV) variability in the ranking of the predictors. Argumentcumulateofcv_proportions()has been added tosummary.vsel()as well (to allow the ranking proportions in the newly added column to be cumulated ranking proportions, if desired). (GitHub: #289, #413)Added the full-data predictor ranking and the main diagonal of the matrix returned by
cv_proportions()to the plot created byplot.vsel(). These new elements can be omitted by settingplot.vsel()’s new argumentranking_nterms_maxtoNA(setting it to some specific submodel size causes the full-data predictor ranking and the corresponding ranking proportions to be omitted after that size). Argumentcumulateofcv_proportions()has been added toplot.vsel()as well (to allow the ranking proportions to be cumulated ranking proportions, if desired). Other new arguments areranking_abbreviate(together withranking_abbreviate_args),ranking_repel(together withranking_repel_args), andtext_angle(see theplot.vsel()documentation for details). (GitHub: #289, #414, #416, #417)
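To make the notion of ranking proportions and their main diagonal more concrete, here is a minimal base-R sketch. It does not call projpred and is not its implementation: the fold-wise rankings, the predictor names, and the variable names are made up for illustration, and the matrix layout (rows are ranking positions, columns are predictors in full-data order) is an assumption.

```r
# Hypothetical fold-wise rankings from a 4-fold CV (one row per fold,
# one column per ranking position). These values are invented.
foldwise <- rbind(
  c("x1", "x2", "x3"),
  c("x1", "x3", "x2"),
  c("x2", "x1", "x3"),
  c("x1", "x2", "x3")
)
# Hypothetical full-data ranking, used to order the columns.
fulldata <- c("x1", "x2", "x3")

# Proportion of folds in which each predictor sits at each position.
props <- sapply(fulldata, function(term) colMeans(foldwise == term))
rownames(props) <- seq_along(fulldata)

# Main diagonal: how often the fold-wise rankings agree with the
# full-data ranking at each position.
props_diag <- diag(props)

# Cumulated variant: proportion of folds in which a predictor is ranked
# at or before a given position.
props_cum <- apply(props, 2, cumsum)
props_cum_diag <- diag(props_cum)
```

In this toy example, the diagonal is 0.75, 0.5, 0.75 and the cumulated diagonal is 0.75, 0.75, 1. Values well below 1 would indicate that the predictor ranking varies considerably across folds.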
Minor changes
- Enhancements in the vignettes. In particular, the new functions ranking(), cv_proportions(), and plot.cv_proportions() (see “Major changes” above) are now illustrated in the main vignette. (GitHub: #407, #411)
- Reduced the peak memory usage of cv_varsel() with cv_method = "kfold". This may slightly change results from such a cv_varsel() run compared to older projpred versions due to different pseudorandom number generator (PRNG) states when clustering posterior draws. (GitHub: #419)
- The cvfits list (see init_refmodel()) does not need to have an attribute called K anymore.
Bug fixes
- Fixed a bug causing L1 search to throw an error in case of some I() terms. (GitHub: #404, #408)
- Fixed a bug causing L1 search to throw an error in case of poly() or polym() terms. Note that just like step() and MASS::stepAIC(), projpred’s search algorithms do not split up a poly() or polym() term into its lower-degree polynomial terms (which would be helpful, for example, if the linear part of a poly() term with degree = 2 was relevant but the quadratic part not). Such a split-up of a poly() or polym() term needs to be performed manually (if desired). (GitHub: #183, #409)
- Fixed a bug causing some non-smooth predictor terms to be treated as smooth terms. (GitHub: #182, #410)
- See “Major changes” above: Fixed a bug causing projpred functions with a seed (or .seed) argument to use the same seed internally when users set a seed once at the beginning (via set.seed()) and then had two or more calls to such projpred functions with their seed (or .seed) argument being at its default and no PRNG-using code between those calls. (GitHub: #412)
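The seed behavior described above can be illustrated with a small base-R sketch. The function draw_with_seed() is hypothetical and only mimics the described convention (restore the PRNG state on exit only if the seed is not NA); it is not projpred code.

```r
# Hypothetical helper mimicking the described `seed` convention.
draw_with_seed <- function(n, seed = NA) {
  # Make sure a PRNG state exists, then remember it *before* `seed` is
  # evaluated (a default such as sample.int(...) itself uses the PRNG).
  if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE)) {
    runif(1)
  }
  old_state <- get(".Random.seed", envir = .GlobalEnv)
  if (!is.na(seed)) {
    # Only a non-NA seed triggers the reset of the PRNG state upon exit.
    on.exit(assign(".Random.seed", old_state, envir = .GlobalEnv))
    set.seed(seed)
  }
  runif(n)
}

# New convention: set a seed once, leave `seed` at NA afterwards.
set.seed(123)
a <- draw_with_seed(3)
b <- draw_with_seed(3)
identical(a, b) # FALSE: the second call continues the PRNG stream

# Emulating the former default: the seed is drawn from the PRNG and the
# state is reset upon exit, so both calls draw the same internal seed.
set.seed(123)
old_a <- draw_with_seed(3, seed = sample.int(.Machine$integer.max, 1))
old_b <- draw_with_seed(3, seed = sample.int(.Machine$integer.max, 1))
identical(old_a, old_b) # TRUE: the "less random" behavior
```

The second part shows why repeated calls with the former default could reuse the same seed when no PRNG-using code ran between them.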
projpred 2.5.0
CRAN release: 2023-04-05
Minor changes
- Setting the new global option projpred.extra_verbose to TRUE will print out which submodel projpred is currently projecting onto. Furthermore, if method = "forward" and verbose = TRUE in varsel() or cv_varsel(), this new option will also make projpred print out which submodel has been selected at those steps of the forward search for which a percentage is printed (the percentage refers to the maximum submodel size that the search is run up to). In general, however, we cannot recommend setting this new global option to TRUE for cv_varsel() with validate_search = TRUE (simply due to the amount of information that will be printed, but also due to the progress bar which will not work anymore as intended). (GitHub: #363; thanks to @jtimonen)
- Enhanced verbose output. In particular, varsel() is now more verbose, similarly to how cv_varsel() has already been for a long time. The verbose output for cv_varsel() has also been updated, with the aim of giving users a better understanding of the methodology behind projpred. (GitHub: #382)
- Slightly improved the calculation of predictive variances to make them less prone to numerical inaccuracies. (GitHub: #199)
- Improved computational efficiency by avoiding an unnecessary final full-data performance evaluation (including costly re-projections if refit_prj = TRUE, which is the default for non-datafit reference models) in cv_varsel() with validate_search = TRUE. Due to this change, results from cv_varsel() (with validate_search = TRUE) may slightly change due to a different pseudorandom number generator (PRNG) state when clustering posterior draws. The different PRNG state was necessary to make the PRNG state for the full-data search in the validate_search = TRUE case consistent with the PRNG state for the full-data search in the validate_search = FALSE case. (GitHub: #385)
- Reduced dependencies. (GitHub: #388)
- Argument digits of print.vselsummary() which used to be passed to an internal round() call was removed. Instead, digits can now be passed to print.data.frame() via ..., thereby determining the minimum number of significant digits to be printed. (GitHub: #389)
- Although bad practice (in general), a reference model lacking an intercept can now be used within projpred. However, it will always be projected onto submodels which include an intercept. The reason is that even if the true intercept in the reference model is zero, this does not need to hold for the submodels. An informational message mentioning the projection onto intercept-including submodels is thrown when projpred encounters a reference model lacking an intercept. (GitHub: #96, #391)
- In case of non-predictor arguments of s() or t2(), projpred now throws an error. (This had already been documented before, but a suitable error message was missing.) (GitHub: #393, based on #156 and #269)
- In case of the brms::categorical() family (supported since version 2.4.0), projpred now strips underscores from response category names in as.matrix.projection() output, as done by brms. (GitHub: #394)
- L1 search now throws a warning if an interaction term is selected before all involved main-effect terms have been selected. (GitHub: #395)
- Documented that in multilevel (group-level) terms, function calls on the right-hand side of the | character (e.g., (1 | gr(group_variable)), which is possible in brms) are currently not allowed in projpred. A corresponding error message has also been added. (GitHub: #319)
- Due to internal refactoring:
  - project()’s output elements submodl and weights have been renamed to outdmin and wdraws_prj, respectively.
  - varsel()’s and cv_varsel()’s output element d_test has been replaced with new output elements type_test and y_wobs_test.

  Apart from project()’s output element wdraws_prj, these elements are not meant to be accessed manually, so changes are mentioned here only for the sake of completeness. Output element wdraws_prj of project() is only needed if project() was used for a clustered projection, which is not the default (and discouraged in most applied cases, at least with a small number of clusters). Thus, these renamings are breaking changes only in very rare cases.
- print.vselsummary() now also prints K in case of K-fold CV.
- The print.vselsummary() output has been slightly improved, e.g., by adding a remark on what “search included” or “search not included” means.
- print.vselsummary() now also prints whether deltas = TRUE or deltas = FALSE was used.
- Output element pct_solution_terms_cv has now also been added to vsel objects returned by varsel(), but in that case, it is simply NULL. This (pct_solution_terms_cv being NULL) is now also the case if validate_search = FALSE was used in cv_varsel().
- Minor enhancements in the documentation.
- Enhancements in the vignettes. In particular, section “Troubleshooting” of the main vignette has been revised.
- If proj_predict() is used with observation weights that are not all equal to 1, a warning is now thrown. (GitHub: starts to address #402)
Bug fixes
- Fixed a long-standing bug (existing at least from version 2.0.2 on) causing predict.refmodel() to require newdata to contain the response variable in case of a brms reference model. This is similar to paul-buerkner/brms#1457, but concerns predict.refmodel() (paul-buerkner/brms#1457 referred to predictions from the submodels). In order to make this predict.refmodel() fix work, brms version 2.19.0 or later is needed. (GitHub: #381)
- Fixed a long-standing bug (existing from version 2.1.0 on) causing output element p_type of project() to be incorrect in case of refit_prj = FALSE, !is.null(nclusters), and an object of class vsel that was created with a non-clustered (thinned) projection during the search phase. The fix comes with a slightly different behavior of proj_predict() for datafits: It will not draw nresample_clusters times from the posterior-projection predictive distribution (which is based on the same single projected draw), but only once. (GitHub: #211, #401)
- When performing predictions from submodels which are GLMs (or from submodels which are L1-penalized GLMs, which is only possible in case of refit_prj = FALSE after an L1 search), a new dataset containing a character predictor variable with only a single unique value (or a new dataset containing a factor predictor variable with a single level) used to cause an error. The case of a character (not factor) predictor variable with only a single unique value occurred, e.g., during the performance evaluation in a LOO CV if a character predictor got selected into a fold’s solution path. The character issue existed from version 2.1.0 on (in earlier versions, however, there were other issues which caused character predictors to throw an error). Now, all issues with respect to character predictor variables should be resolved. The issue with single-level factor predictor variables is resolved now as well. (GitHub: #403)
- When performing predictions from submodels which are GLMs (or from submodels which are L1-penalized GLMs, which is only possible in case of refit_prj = FALSE after an L1 search), a new dataset containing a factor predictor with re-ordered levels (compared to this same factor in the original dataset) used to lead to incorrect predictions. This bug existed at least from version 2.0.2 on (possibly even in earlier versions), but has been resolved now. (GitHub: #403)
- Fixed an error thrown by projpred’s internal GLM submodel fitter in case of unused levels of a factor. This issue existed at least from version 2.0.2 on (possibly even in earlier versions), but should have only affected rstanarm reference model fits (brms reference model fits were only affected in case of a brms::brm() call with drop_unused_levels = FALSE, which is not the default). (GitHub: #403)
- Fixed a bug that caused an L1 search combined with refit_prj = FALSE (which is the default only for datafits, not for the reference model objects of class refmodel that are usually employed in practice) to lead to incorrect predictions from the L1-searched submodels (which are L1-penalized GLMs) if the solution path had a main effect ranked after an interaction term. This bug existed at least from version 2.0.2 on (possibly even in earlier versions). The mentioned submodel predictions did not only affect the performance evaluation, but also the projected dispersion parameter and the returned Kullback-Leibler divergence (and the corresponding cross-entropy). (GitHub: #403)
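The factor-related fixes above concern how factor levels in new data line up with those in the original data. The following base-R sketch only illustrates the general pitfall with model.matrix(); it does not reproduce projpred's internal code, and the data frames train and new are made up.

```r
# Made-up original data and new data whose factor has re-ordered levels.
train <- data.frame(g = factor(c("a", "b", "c", "a")))
new <- data.frame(g = factor(c("a", "b", "c"), levels = c("c", "b", "a")))

# Dummy coding depends on the level order: the reference level differs.
cols_train <- colnames(model.matrix(~ g, data = train))
cols_naive <- colnames(model.matrix(~ g, data = new))
identical(cols_train, cols_naive) # FALSE

# Re-creating the factor with the original levels aligns the coding.
new$g <- factor(new$g, levels = levels(train$g))
cols_aligned <- colnames(model.matrix(~ g, data = new))
identical(cols_train, cols_aligned) # TRUE
```

If coefficients estimated under one coding were matched by position against columns built under the other, predictions would be wrong, which is the kind of problem the re-ordered-levels fix addresses.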
projpred 2.4.0
CRAN release: 2023-02-12
Major changes
- Introduction of the augmented-data projection (Weber et al., 2023) (see section “Supported types of models” of the main vignette for details). (GitHub: #70, #322)
- Introduction of the latent projection (Catalina et al., 2021) (see section “Supported types of models” of the main vignette and the new latent-projection vignette for details). A consequence of the latent projection (more precisely, of the resp_oscale = TRUE default in summary.vsel()) is that varsel() and cv_varsel() no longer call suggest_size() internally at the end. Thus, print()-ing an object of class vsel no longer includes the suggested projection size in the output (the stat for this suggested size was fixed to "elpd" anyway, a fact that many users were probably not aware of). (GitHub: #372)
- In case of multilevel models, projpred now has two global options for “integrating out” group-level effects: projpred.mlvl_pred_new and projpred.mlvl_proj_ref_new. These are explained in detail in the general package documentation (available online or by typing ?`projpred-package`). (GitHub: #379)
Minor changes
- Improvements in the numerical stability of internal link and inverse-link functions. (GitHub: #376)
Bug fixes
- Fix a bug for offsets in cases where family (see init_refmodel()) has a non-identity link function: After clustering the reference model’s posterior draws, we need to aggregate (within a given cluster) the reference model’s fitted values which already take the offsets into account instead of taking the offsets into account after aggregating the fitted values which do not take the offsets into account. This fix should affect results only in a very slight manner. Due to projpred’s internal adjustment for numerical stability when averaging a quantity across the draws within a given cluster, this also changes the projected residual standard deviations in Gaussian models on the order of 1e-10. (GitHub: #374)
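Why the order of aggregation and offset handling matters for a non-identity link can be seen in a small numeric sketch. This is only one possible reading of the two orders, using a logit link and invented values; it is not projpred's internal computation.

```r
# Invented linear predictors (without offset) of two posterior draws
# belonging to the same cluster, for a single observation.
eta <- c(-1, 2)
# Invented offset for that observation.
off <- 1

# Aggregate fitted values that already include the offset.
mu_offset_first <- mean(plogis(eta + off))

# Aggregate fitted values without the offset, then add the offset on the
# link scale afterwards.
mu_offset_after <- plogis(qlogis(mean(plogis(eta))) + off)

# With an identity link, both orders would coincide.
id_offset_first <- mean(eta + off)
id_offset_after <- mean(eta) + off
```

Here, mu_offset_first and mu_offset_after differ (roughly 0.73 versus 0.79), whereas the two identity-link results are equal.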
projpred 2.3.0
CRAN release: 2023-01-10
Major changes
- In plot.vsel() and summary.vsel(), the default of alpha = 0.32 is replaced by alpha = 2 * pnorm(-1) (= 1 - diff(pnorm(c(-1, 1))), which is only approximately 0.32) so that now, a normal-approximation confidence interval with default alpha stretches by exactly one standard error on either side of the point estimate. Typically, this changes results only slightly. In some cases, however, the new default may lead to a different suggested size, explaining why this is regarded as a major change. (GitHub: #371)
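The arithmetic behind the new default can be checked directly in base R. The variable names below are chosen for illustration only.

```r
# New default and its equivalent form given above.
alpha_new <- 2 * pnorm(-1)
alpha_alt <- 1 - diff(pnorm(c(-1, 1)))

# Half-width of a normal-approximation interval, in standard errors.
z_new <- qnorm(1 - alpha_new / 2) # one standard error
z_old <- qnorm(1 - 0.32 / 2)      # slightly less than one standard error
```

With alpha = 0.32, the interval is marginally narrower than one standard error on either side, which is why the suggested size can change in borderline cases.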
Minor changes
- The deprecated function ggplot2::aes_string() is not used anymore, thereby avoiding an occasional soft-deprecation warning thrown by ggplot2 3.4.0. (GitHub: #367)
- The KL divergence from the reference model to a submodel is simplified to the corresponding cross-entropy (i.e., the reference model’s entropy is dropped), with some caveats described in the documentation for output element ce of project(). The reason for this change is that the former KL divergence assumed the reference model’s family to be the same as the submodel’s family, which does not need to be the case for custom reference models. This should not be a user-facing change as users are discouraged from making use of specific output elements (like the former element kl of objects of class projection or vsel) directly. (GitHub: #369)
- Improvements in the documentation (especially for argument family of init_refmodel() and get_refmodel.default()).
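The relation between KL divergence, cross-entropy, and entropy mentioned above can be verified numerically. The sketch below uses two made-up discrete distributions, with p standing in for the reference model and q for a submodel; it is unrelated to how projpred computes the element ce.

```r
# Made-up discrete distributions over three outcomes.
p <- c(0.5, 0.3, 0.2) # plays the role of the reference model
q <- c(0.4, 0.4, 0.2) # plays the role of a submodel

kl <- sum(p * log(p / q))   # KL divergence from p to q
ce <- -sum(p * log(q))      # cross-entropy of q relative to p
ent <- -sum(p * log(p))     # entropy of p

# KL divergence equals cross-entropy minus the entropy of p.
all.equal(kl, ce - ent) # TRUE
```

Since the entropy term does not depend on q, dropping it does not change which submodel minimizes the divergence for a fixed reference model.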
projpred 2.2.2
CRAN release: 2022-11-09
Minor changes
- Improvements in documentation and vignette, especially to emphasize the generality of the reference model object resulting from get_refmodel() and init_refmodel() (thereby also distinguishing more clearly between “typical” and “custom” reference model objects) in (i) the description and several arguments of get_refmodel() and init_refmodel(), (ii) sections “Reference model” and “Supported types of models” of the vignette. (GitHub: #357, #359, #364, #365, #366)
- Minor improvement in terms of efficiency in the validate_search = FALSE case of cv_varsel().
- Improvement in terms of efficiency in case of a forward search with custom search_terms (at least in some instances), also affecting the output of solution_terms(<vsel_object>) in those cases. (GitHub: #360; thanks to @sor16)
- Update Catalina et al. (2020) to Catalina et al. (2022). (GitHub: #364)
Bug fixes
- Fix a bug causing offsets not to be taken into account appropriately when calculating the PSIS weights (those used for the submodels) in the validate_search = FALSE case of cv_varsel(). This bug was introduced in v2.2.0 (and existed up to and including v2.2.1).
- Fix a (long-standing) bug causing offsets not to be taken into account appropriately when calculating the predictive variances for a reference model that has a dispersion parameter and a non-identity link function. (GitHub: #186 (partly), #355)
- Fix a (long-standing) bug causing offsets not to be taken into account appropriately when calculating the reference model’s summary statistics in case of cv_varsel() with cv_method = "LOO" (more precisely, only the LOO posterior predictive expected values <vsel_object>$summaries$ref$mu were affected, not the (pointwise) LOO log posterior predictive density values <vsel_object>$summaries$ref$lppd). (GitHub: #186 (partly), #356)
- Fix a (long-standing) bug leading to an error when trying to use cv_varsel() with custom search_terms (in some instances). (GitHub: #345, #360; thanks to @sor16)
projpred 2.2.1
CRAN release: 2022-09-20
Minor changes
- Several improvements in the documentation.
- For the RMSE as well as the AUC (see argument stats of summary.vsel()), the bootstrapping results are now also used for inferring the lower and upper confidence interval bounds. (GitHub: #318, #347; thanks to @awd97 and @VisionResearchBlog)
- For datafits, offsets are not supported anymore. (GitHub: #186 (partly), #351)
Bug fixes
- Fix GitHub issue #348 (L1 search in the presence of interaction terms). This bug was introduced in v2.1.0 (and existed up to—including—v2.2.0).
- Fix incorrectly thrown messages in case of datafits (and other, unlikely, cases where nclusters == S and S <= 20, with S denoting the number of draws in the reference model).
- Fix GitHub issue #349 (only concerned datafits). (GitHub: #350)
projpred 2.2.0
CRAN release: 2022-08-19
Major changes
- In the validate_search = FALSE case of cv_varsel() (with cv_method = "LOO"), the PSIS weights are now calculated based on the reference model (they used to be calculated based on the submodels, which is incorrect). (GitHub: #325)
- Some long-standing severe bugs (GitHub issues #329, #330, and #342) have been fixed, concerning the performance evaluation of models with nontrivial observation weights (i.e., models where at least one observation had a weight differing from 1). Concerned performance statistics were "mse", "rmse", "acc" (= "pctcorr"), and "auc" (i.e., all performance statistics except for "elpd" and "mlpd").
- plot.vsel() and suggest_size() gain a new argument thres_elpd. By default, this argument doesn’t have any impact, but a non-NA value can be used for a customized model size selection rule (see ?suggest_size for details). (GitHub: #335)
Minor changes
- Several improvements in the documentation (especially in the explanation of the suggest_size() heuristic).
- Improvement of the numerical stability for some link functions, achieved by avoiding unnecessary back-and-forth transformations between latent space and response space. (GitHub: #337, #338)
- All arguments seed and .seed are now allowed to be NA for not calling set.seed() internally at all.
- Argument d_test of varsel() is not considered as an internal feature anymore. This was possible after fixing a bug for d_test (see below). (GitHub: #341)
- The order of the observations in the sub-elements of <vsel_object>$summaries and <vsel_object>$d_test now corresponds to the order of the observations in the original dataset if <vsel_object> was created by a call to cv_varsel(<...>, cv_method = "kfold") (formerly, in that case, the observations in those sub-elements were ordered by fold). Thereby, the order of the observations in those sub-elements now always corresponds to the order of the observations in the original dataset, except if <vsel_object> was created by a call to varsel(<...>, d_test = <non-NULL_d_test_object>), in which case the order of the observations in those sub-elements corresponds to the order of the observations in <non-NULL_d_test_object>. (GitHub: #341)
Bug fixes
- Fix GitHub issue #324 (large search_terms caused the R session to crash).
- Fix GitHub issue #204. (GitHub: #325)
- Fix the validate_search = FALSE bug described above in “Major changes”: The PSIS weights are now calculated based on the reference model (they used to be calculated based on the submodels, which is incorrect). (GitHub: #325)
- Fix \mbox{} commands displayed incorrectly in the HTML help from R version 4.2.0 on. (GitHub: #326)
- Fix GitHub issue #329 (see also “Major changes” above).
- Fix GitHub issue #331.
- plot.vsel() now draws the dashed red horizontal line for the reference model (and, if present, the dotted black horizontal line for the baseline model) first (i.e., before the submodel-specific graphical elements), to avoid overplotting.
- Fix GitHub issue #339. (GitHub: #340)
- Fix argument d_test of varsel(): Not only the predictive performance of the reference model needs to be evaluated on the test data, but also the predictive performance of the submodels. (GitHub: #341)
- Fix GitHub issue #342 (see also “Major changes” above).
- Fix GitHub issue #330 (see also “Major changes” above). (GitHub: #344, commit 23e7101)
projpred 2.1.2
CRAN release: 2022-05-13
Minor changes
- Account for changes concerning the handling of offsets in rstanarm version 2.21.3. In particular, issue stan-dev/rstanarm#542 was fixed in rstanarm 2.21.3.
- Show the output of the vignette on CRAN.
- In the vignette, use cv_varsel() with LOO CV and validate_search = FALSE instead of K-fold CV. (GitHub: #305)
- Improve the documentation for argument search_terms of varsel() and cv_varsel(). (GitHub: #155, #308)
- In case of user-specified (non-NULL) search_terms, method = NULL is internally changed to method = "forward" and method = "L1" throws a warning. This is done because search_terms only takes effect in case of a forward search. (GitHub: #155, #308)
- Internally, the intercept is now always included in search_terms. This is necessary to prevent a bug described below. (GitHub: #308)
- When fitting multilevel submodels via lme4, projpred now tries to handle PIRLS loop resulted in NaN value errors automatically. (GitHub: #314)
- The fix for GitHub issue #320 (see below) required renaming argument b of projpred:::bootstrap() to B.
Bug fixes
- Throw a more informative error message in case of special group-level terms which are currently not supported (in particular, nested ones).
- Previously, using a search_terms vector which excluded the intercept in conjunction with refit_prj = FALSE (the latter in project(), varsel(), or cv_varsel()) led to incorrect submodels being fetched from the search or to an error while doing so. This has been fixed now by internally forcing the inclusion of the intercept in search_terms. (GitHub: #308)
- Fix GitHub issues #147 and #202. (GitHub: #312)
- Fix GitHub issue #320. (GitHub: #321)
projpred 2.1.1
CRAN release: 2022-04-03
Bug fixes
- Fix the order of the package authors.
- Fix failing CRAN checks.
- Add an input check for argument solution_terms of project() to fix a test failure in R versions >= 4.2.
projpred 2.1.0
CRAN release: 2022-04-01
Major changes
- Added support for weighted LOO proportional-to-size subsampling based on Magnusson et al. (2019). However, subsampled PSIS-LOO CV is currently regarded as experimental. Therefore, a corresponding warning is thrown when calling cv_varsel() with nloo < n where n denotes the number of observations. (GitHub: #94, #252, commit feea39e)
- Automatically explore both linear and smooth components in GAM models. This allows the user to gauge the impact of the smooth term against its linear counterpart.
- Fast approximate LOO computation for validate_search = FALSE in cv_varsel().
- Formerly, the defaults for arguments nclusters (= 1) and nclusters_pred (= 5) of varsel() and cv_varsel() were set internally (the user-visible defaults were NULL). Now, nclusters and ndraws_pred (note the ndraws_pred, not nclusters_pred) have non-NULL user-visible defaults of 20 and 400, respectively. In general, this increases the runtime of these functions a lot. With respect to cv_varsel(), the new vignette (see vignettes) mentions two ways to quickly obtain some rough preliminary results which in general should not be used as final results, though: (i) varsel() and (ii) cv_varsel() with validate_search = FALSE (which only takes effect for cv_method = "LOO"). (GitHub: #291 and several commits beforehand, in particular bbd0f0a, babe031, 4ef95d3, and ce7d1e0)
- For proj_linpred() and proj_predict(), arguments nterms, ndraws, and seed have been removed to allow the user to pass them to project(). New arguments filter_nterms, nresample_clusters, and .seed have been introduced (see the documentation for details). (GitHub: #92, #135)
- Reference models lacking an intercept are not supported anymore (actually, the previous implementation for such models was incomplete). Support might be re-introduced in the future (when fixed), but for now it is withdrawn as it requires some larger changes. (GitHub: #124, but see also #96 and #100)
- In the output of proj_linpred(), dimensions are not dropped anymore (i.e., output elements pred and lpd are always S x N matrices now). (GitHub: #143)
- In case of integrated = TRUE, proj_linpred() now averages the LPD (across the projected posterior draws) instead of taking the LPD at the averaged linear predictors. (GitHub: #143)
- If newdata does not contain the response variable, proj_linpred() now returns NULL for output element lpd. (GitHub: #143)
- The fix for the offset issues (listed below under “Bug fixes”) requires reference model fits of class stanreg (from package rstanarm) with offsets to have these offsets specified via an offset() term in the model formula (and not via argument offset).
- Improved handling of errors when fitting multilevel submodels. (GitHub: #201)
- Some defaults have been changed from NULL to a user-visible value (and NULL is not allowed anymore).
- Argument data of get_refmodel.stanreg() has been removed. (GitHub: #219)
- The function passed to argument div_minimizer of init_refmodel() now always needs to return a list of submodels (see the documentation for details). Correspondingly, the function passed to argument proj_predfun of init_refmodel() can now always expect a list as input for argument fits (see the documentation for details). (GitHub: #230)
- The function passed to argument proj_predfun of init_refmodel() now always needs to return a matrix (see the documentation for details). (GitHub: #230)
- The projection can be run in parallel now. However, we cannot recommend this for all kinds of platforms and all kinds of models. For more information, see the general package documentation available at ?`projpred-package`. (GitHub: #235)
- Support for the Student_t() family is regarded as experimental. Therefore, a corresponding warning is thrown when creating the reference model. (GitHub: #233, #252)
- Support for additive models (i.e., GAMs and GAMMs) is regarded as experimental. Therefore, a corresponding warning is thrown when creating the reference model. (GitHub: #237, #252)
- Support for the Gamma() family is regarded as experimental. Therefore, a corresponding warning is thrown when creating the reference model. (GitHub: paul-buerkner/brms#1255, #240, #252)
- The previous behavior of init_refmodel() in case of argument dis being NULL (the default) was dangerous for custom reference models with a family having a dispersion parameter (in that case, dis values of all-zeros were used silently). The new behavior now requires a non-NULL argument dis in that case. (GitHub: #254)
- Argument cv_search has been renamed to refit_prj. (GitHub: #154, #265)
- as.matrix.projection() has gained a new argument nm_scheme which allows choosing the naming scheme for the column names of the returned matrix. The default ("auto") follows the naming scheme of the reference model fit (and uses the "rstanarm" naming scheme if the reference model fit is of an unknown class). (GitHub: #82, #279)
- seed (and .seed) arguments now have a default of sample.int(.Machine$integer.max, 1) instead of NULL. Furthermore, the value supplied to these arguments is now used to generate new seeds internally on-the-fly. In many cases, this will change results compared to older projpred versions. Also note that now, the internal seeds are never fixed to a specific value if seed (and .seed) arguments are set to NULL. (GitHub: #84, #286)
Minor changes
- Improved summary output with important details.
- For group-level effects, the as.matrix.projection() method now also returns the estimated group-level effects themselves. (GitHub: #75)
- For group-level effects, the as.matrix.projection() method now returns the variance components (population SD(s) and population correlation(s)) instead of the empirical SD(s) of the group-level effects. (GitHub: #74)
- Improved documentation. (GitHub: especially #233)
- Replaced the two vignettes by a single one which also has new content. (GitHub: #237)
- Updated the README file. (GitHub: #245)
- Some error and warning messages have been improved and added. (GitHub: especially #219, #221, #223, #252, #263)
- For K-fold cross-validation, an internally hard-coded value of 5 for nclusters_pred was removed. (GitHub: commit 5062f2f)
- Throw a proper error message for unsupported families. (GitHub: #140)
- Show the README also on the CRAN website. (GitHub: #140)
- project(): Warn if elements of solution_terms are not found in the reference model (and therefore ignored). (GitHub: #140)
- get_refmodel.default() now passes arguments via the ellipsis (...) to init_refmodel(). (GitHub: #153, commit dd3716e)
- Remove dependency on package rngtools (version 2.0.0 of projpred re-introduced this dependency after it was already removed in version 1.1.2). (GitHub: #189)
- init_refmodel(): The default (NULL) for argument extract_model_data has been removed as it wasn’t meaningful anyway. (GitHub: #219)
- Argument folds of init_refmodel() has been removed as it was effectively unused. (GitHub: #220)
- Use the S3 system for solution_terms(). This allowed the introduction of a solution_terms.projection() method. (GitHub: #223)
- predict.refmodel() now uses a default of newdata = NULL. (GitHub: #223)
- Argument weights of init_refmodel()’s argument proj_predfun has been removed. (GitHub: #163, #224)
- projpred’s internal div_minimizer functions have been unified into a single div_minimizer which chooses an appropriate submodel fitter based on the formula of the submodel, not based on that of the reference model. Furthermore, the automatic handling of errors in the submodel fitters has been improved. (GitHub: #230)
- Improve the axis labels in plot.vsel(). (GitHub: #234, #270)
- Handle rstanarm’s GitHub issue #551. This implies that projpred’s default cvfun for stanreg fits will now always use inner parallelization in rstanarm::kfold.stanreg() (i.e., across chains, not across CV folds), with getOption("mc.cores", 1) cores. We do so on all systems (not only Windows). (GitHub: #249)
- Argument fit of init_refmodel()’s argument proj_predfun was renamed to fits. This is a non-breaking change since all calls to proj_predfun in projpred have that argument unnamed. However, this cannot be guaranteed in the future, so we strongly encourage users with a custom proj_predfun to rename argument fit to fits. (GitHub: #263)
- init_refmodel() has gained argument cvrefbuilder which may be a custom function for constructing the K reference models in a K-fold CV. (GitHub: #271)
- Allow arguments to be passed from project(), varsel(), and cv_varsel() to the divergence minimizer. (GitHub: #278)
- In init_refmodel(), any contrasts attributes of the dataset’s columns are silently removed. (GitHub: #284)
- NAs in data supplied to newdata arguments now trigger an error. (GitHub: #285)
Bug fixes
- Fixed a bug in
as.matrix.projection()(causing incorrect column names for the returned matrix). (GitHub: #72, #73) - Fixed a bug raising an error when not projecting from a
vselobject. (GitHub: #79, #80) - Fixed a bug in the calculation of the Gaussian deviance. (GitHub: #81)
- Fixed a bug in the calculation of the predictive statistics of the reference model on test data in
varsel(). (GitHub #90) - Fixed a bug in an input check for argument
nlooofcv_varsel(). (GitHub: #93) - Fixed a bug in
cv_varsel(), causing an error in case of!validate_search && cv_method != "LOO". (GitHub: #95) - Fixed bugs related to the setting of the seed. (GitHub: commit 02cd50d)
- Fixed a bug causing proj_linpred() to raise an error if argument newdata was NULL. (GitHub: #97)
- Fixed an incorrect usage of the dispersion parameter values when calculating output element lpd in proj_linpred() (for integrated = TRUE as well as for integrated = FALSE). (GitHub: #105)
- Fixed bugs in proj_linpred()’s calculation of output element lpd (for integrated = TRUE). (GitHub: #106, #112)
- Fixed an inconsistency in the dimensions of proj_linpred()’s output elements pred and lpd (for integrated = FALSE): Now, they are both S x N matrices, with S denoting the number of (possibly clustered) posterior draws and N denoting the number of observations. (GitHub: #107, #112)
- Fixed a bug causing proj_predict()’s output matrix to be transposed in case of nrow(newdata) == 1. (GitHub: #112)
- Fixed a bug when using weights or offsets e.g. in proj_linpred(). (GitHub: #114)
- Fixed a bug causing varsel()/make_formula to fail with multidimensional interaction terms. (GitHub: #102, #103)
- Fixed an indexing bug in cv_varsel() for models with a single predictor. (GitHub: #115)
- Fixed bugs for argument nterms of proj_linpred() and proj_predict(). (GitHub: #110)
- Fixed an inconsistency for some intercept-only submodels. (GitHub: #119)
- Fix a bug for as.matrix.projection() in case of 1 (clustered) draw after projection. (GitHub: #130)
- For submodels of class subfit, make the column names of as.matrix.projection()’s output matrix consistent with other classes of submodels. (GitHub: #132)
- Fix a bug for argument nterms_max of plot.vsel() if there is just the intercept-only submodel. (GitHub: #138)
- Throw an appropriate error message when trying to apply an L1 search to an empty (i.e. intercept-only) reference model. (GitHub: #139)
- Fix the list names of element search_path in, e.g., varsel()’s output. (GitHub: #140)
- Fix a bug (error unused argument) when initializing the K reference models in a K-fold CV with CV fits not of class brmsfit or stanreg. (GitHub: #140)
- In get_refmodel.default(), remove old defunct arguments fetch_data, wobs, and offset. (GitHub: #140)
- Fix a bug in get_refmodel.stanreg(). (GitHub: #142, #184)
- Fix a possible bug related to extract_model_data()’s argument extract_y in get_refmodel.default(). (GitHub: #153, commit 39fece8)
- Fix a possible bug related to extract_model_data() in K-fold CV. (GitHub: #153, commit 4f32195)
- Fix GitHub issue #161.
- Fix GitHub issue #162.
- Fix GitHub issue #164.
- Fix GitHub issue #160.
- Fix GitHub issue #159.
- Fix GitHub issue #158.
- Fix GitHub issue #157.
- Fix GitHub issue #144.
- Fix GitHub issue #146.
- Fix GitHub issue #169.
- Fix GitHub issue #167.
- Fix a bug in the default proj_predfun() for GLMMs. (GitHub: #174)
- Fix GitHub issue #171.
- Fix GitHub issue #172.
- Fix a bug in the default proj_predfun() for datafits. (GitHub: #177)
- Fix the names of summary.vsel()$selection for objects of class vsel created by varsel(). (GitHub: #179)
- Fix forward search when search_terms are not consecutive in size. (GitHub: commit 34e24de)
- Fix a bug in cv_varsel()$pct_solution_terms_cv. (GitHub: #188, commit e529ec1)
- Fix GitHub issue #185. (GitHub: #193, #194)
- Fix a bug in forward searches with interaction terms. (GitHub: #191)
- Fix offset issues. (GitHub: #196, #203, #228)
- Fix a bug in glm_elnet() (the workhorse for L1 search), causing the grid for lambda to be constructed without taking observation weights into account. (GitHub: #198; note that the second part of #198 did not have any consequences for users)
- Fix GitHub issue #136. (GitHub: #221)
- Fix a bug in print.vsel() causing argument digits to be ignored. (GitHub: #222)
- Fix a bug causing the default of argument cv_search in varsel() and cv_varsel() to be TRUE for datafits, although it should be FALSE in that case. (GitHub: #223)
- Fix a bug (Error: Levels '<...>' of grouping factor '<...>' cannot be found in the fitted model. Consider setting argument 'allow_new_levels' to TRUE.) when predicting from submodels which are GLMMs for newdata containing new levels for grouping factors. (GitHub: #223)
- predict.refmodel(): Fix a bug for integer ynew. (GitHub: #223)
- predict.refmodel(): Fix input checks for offsetnew and weightsnew. (GitHub: #223)
- After all calls to extract_model_data(), the weights and offsets are now checked if they are of length 0 (and if yes, then they are set to vectors of ones and zeros, respectively). This is important for extract_model_data() functions which return weights and offsets of length 0 (see, e.g., brms version <= 2.16.1). (GitHub: #223)
- Handle rstanarm’s GitHub issue #546. (GitHub: #227)
- Fix a bug causing the internal submodel fitter for GLMMs to not pass arguments var (the predictive variances) and regul (amount of ridge regularization) to the internal submodel fitter for GLMs. (GitHub: #230)
- Fix GitHub issue #210. (GitHub: #234)
- Fix GitHub issue #242. (GitHub: #253)
- Fix GitHub issue #244. (GitHub: #255)
- Fix GitHub issue #243. (GitHub: #262)
- Fix GitHub issue #213. (GitHub: #264)
- Fix GitHub issue #215. (GitHub: #266)
- Fix GitHub issue #212. (GitHub: #267)
- Fix GitHub issue #156. (GitHub: #269)
- If the data used for the reference model contains NAs, an appropriate error is now thrown. Previously, the reference model was created successfully, but this caused opaque errors in downstream code such as project(). (GitHub: #274)
- Fix GitHub issue #268. (GitHub: #287)
- Fix GitHub issue #149. (GitHub: #288)
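The length-0 check for weights and offsets described in the list above (GitHub: #223) can be illustrated with a minimal base-R sketch. The helper name fill_empty_wobs_offset() and its argument n_obs are hypothetical and chosen for illustration only; they are not part of projpred, and projpred's internal implementation may differ.

```r
# Hypothetical helper (NOT part of projpred's API): if the extracted weights
# or offsets have length 0, replace them by vectors of ones and zeros,
# respectively, with one entry per observation.
fill_empty_wobs_offset <- function(weights, offset, n_obs) {
  if (length(weights) == 0) {
    weights <- rep(1, n_obs)
  }
  if (length(offset) == 0) {
    offset <- rep(0, n_obs)
  }
  list(weights = weights, offset = offset)
}

# Usage: an extraction function returned empty weights and a NULL offset
# for a dataset with 3 observations.
res <- fill_empty_wobs_offset(weights = numeric(0), offset = NULL, n_obs = 3)
```

Non-empty weights or offsets pass through unchanged, so the helper can be applied unconditionally after every extraction call.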
projpred 2.0.2
CRAN release: 2020-10-28
We have fully rewritten the internals in several ways. Most importantly, we now delegate maximum likelihood estimation to third-party packages, depending on the reference model’s family. This allows a lot of flexibility and extensibility for various models. Functionality-wise, the major updates since the last release are:
- Added support for GLMMs and GAMMs via lme4 and gamm4.
- Added internal support for formula syntax, which makes it easier to build upon projections.
- Thanks to the above point, we save some computation by only considering sensible projections during forward search instead of fitting every possible submodel.
- We have added a new argument search_terms that allows the user to specify custom unit building blocks of the projections. New vignette coming up.
- We have fully changed the way to define custom reference models. The user now provides projection fitting and prediction functions (more information in a new upcoming vignette).
projpred 1.1.3
Added print methods for vsel and cvsel objects. Added AUC statistics for binomial family. A few additional minor patches.
projpred 1.1.1
CRAN release: 2019-03-12
This version contains only a few patches and no new user-facing features.
projpred 1.1.0
CRAN release: 2018-10-23
New features
- Added support for brms models.
Bug fixes
- The program crashed with rstanarm models fitted with syntax like stan_glm(log(y) ~ log(x), ...), that is, it did not allow transformation for y.
projpred 1.0.0
CRAN release: 2018-09-18
New features and improvements
- Changed the internals so that now all fit objects (such as rstanarm fits) are converted to refmodel-objects using the generic get_refmodel-function, and all the functions use only this object. This makes it much easier to use projpred with other reference models by writing a new get_refmodel-function for them. The syntax is now changed so that varsel and cv_varsel both always return an object with a similar structure, and the reference model is stored in this object.
- Added more examples to the vignette.
- Added the possibility to change the baseline in plot/summary. Now it is also possible to compare to the best submodel found, not only to the reference model.
- Bug fix: RMSE was previously computed incorrectly; this is now fixed.
- Small changes: nloo = n by default in cv_varsel. regul=1e-4 now by default in all functions.
projpred 0.9.0
New features and improvements
- Added the cv_search argument for the main functions (varsel, cv_varsel, project and the prediction functions). Now it is possible to make predictions also with those parameter estimates that were computed during the L1-penalized search. This change also allows the user to compute the Lasso-solution by providing the observed data as the ‘reference fit’ for init_refmodel. An example will be added to the vignette.