Perform cross-validation for the projective variable selection for a generalized linear model.

cv_varsel(fit, method = NULL, cv_method = NULL, ns = NULL,
  nc = NULL, nspred = NULL, ncpred = NULL, relax = NULL,
  nv_max = NULL, intercept = NULL, penalty = NULL, verbose = TRUE,
  nloo = NULL, K = NULL, lambda_min_ratio = 1e-05, nlambda = 150,
  thresh = 1e-06, regul = 1e-04, validate_search = TRUE, seed = NULL,
  ...)

Arguments

fit

Same as in varsel.

method

Same as in varsel.

cv_method

The cross-validation method, either 'LOO' or 'kfold'. Default is 'LOO'.

ns

Number of samples used for selection. Ignored if nc is provided or if method='L1'.

nc

Number of clusters used for selection. Default is 1. Ignored if method='L1' (L1 search always uses one cluster).

nspred

Number of samples used for prediction (after selection). Ignored if ncpred is given.

ncpred

Number of clusters used for prediction (after selection). Default is 5.

relax

Same as in varsel.

nv_max

Same as in varsel.

intercept

Same as in varsel.

penalty

Same as in varsel.

verbose

Whether to print out some information during the validation. Default is TRUE.

nloo

Number of observations used to compute the LOO validation (anything between 1 and the total number of observations). Smaller values lead to faster computation but higher uncertainty (larger error bars) in the accuracy estimation. Default is to use all observations, but for faster experimentation, one can set this to a small value such as 100. Only applicable if cv_method = 'LOO'.
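The trade-off described for nloo can be illustrated with a small base-R sketch. It is an illustration only: the simulated `pointwise` values stand in for per-observation accuracy contributions, the helper `estimate_from_subsample` is hypothetical, and uniform subsampling is an assumption rather than a description of how cv_varsel actually picks observations.

```r
# Illustration only: simulated values stand in for per-observation
# LOO contributions; this is NOT projpred's internal subsampling scheme.
set.seed(1)
n <- 1000
pointwise <- rnorm(n, mean = -0.5, sd = 1)

# Hypothetical helper: estimate the mean from `nloo` randomly chosen
# observations and report a simple standard error for that estimate.
estimate_from_subsample <- function(values, nloo) {
  idx <- sample(length(values), nloo)  # subsample without replacement
  sub <- values[idx]
  c(mean = mean(sub), se = sd(sub) / sqrt(nloo))
}

est_small <- estimate_from_subsample(pointwise, nloo = 100)
est_full <- estimate_from_subsample(pointwise, nloo = n)

# The smaller subsample is cheaper to evaluate but has a larger standard error.
print(rbind(nloo_100 = est_small, nloo_all = est_full))
```

Assuming a fitted reference model `fit`, the corresponding call would be `cv_varsel(fit, cv_method = 'LOO', nloo = 100)`, which uses only arguments listed in the usage above.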

K

Number of folds in the k-fold cross-validation. Only applicable if cv_method = 'kfold' and k_fold = NULL.
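To make the role of K concrete, the base-R sketch below partitions observations into K folds of nearly equal size. The names `fold_id`, `train_idx`, and `test_idx` are hypothetical, and the random partition is a common convention assumed here for illustration, not necessarily the scheme cv_varsel uses internally.

```r
# Illustration only: a generic K-fold partition in base R.
set.seed(1)
n <- 23  # number of observations
K <- 5   # number of folds

# Assign each observation to one of K folds, with sizes differing by at most 1.
fold_id <- sample(rep_len(seq_len(K), n))
fold_sizes <- table(fold_id)
print(fold_sizes)

for (k in seq_len(K)) {
  test_idx <- which(fold_id == k)   # held-out observations for fold k
  train_idx <- which(fold_id != k)  # observations used for fitting
  # A model would be fitted on train_idx and evaluated on test_idx here.
  stopifnot(length(intersect(test_idx, train_idx)) == 0)
}
```

Assuming a fitted reference model `fit`, the corresponding call would be `cv_varsel(fit, cv_method = 'kfold', K = 5)`.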

lambda_min_ratio

Same as in varsel.

nlambda

Same as in varsel.

thresh

Same as in varsel.

regul

Amount of regularization in the projection. Usually there is no need for regularization, but sometimes for some models the projection can be ill-behaved and we need to add some regularization to avoid numerical problems.

validate_search

Whether to also cross-validate the selection process, that is, whether to perform the selection separately for each fold. Default is TRUE, and we strongly recommend not setting this to FALSE, because doing so is known to bias the accuracy estimates for the selected submodels. However, setting this to FALSE can sometimes be useful: comparing the results to the case where this parameter is TRUE gives an idea of how strongly the feature selection is (over)fitted to the data (the difference corresponds to the search degrees of freedom, or the effective number of parameters introduced by the selection process).
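The comparison described for validate_search could be set up roughly as follows. This is a sketch under assumptions: the synthetic data frame `d` and the sampler settings are made up for illustration, rstanarm is assumed to be installed, and only cv_varsel arguments listed in the usage above are used. The resulting objects would then be compared through the package's helper functions rather than by inspecting their fields.

```r
library(rstanarm)
library(projpred)

# Hypothetical synthetic data: only x1 truly affects the outcome.
set.seed(1)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, size = 1, prob = plogis(1.5 * d$x1))

# Reference model; short chains keep the sketch fast (assumed settings).
fit <- stan_glm(y ~ x1 + x2 + x3, family = binomial(), data = d,
                chains = 2, iter = 1000, seed = 1, refresh = 0)

# Recommended: the search is repeated within the cross-validation.
cvs_search <- cv_varsel(fit, validate_search = TRUE)

# For comparison only: the search is done once on the full data, so the
# accuracy estimates for the selected submodels may be optimistic.
cvs_fixed <- cv_varsel(fit, validate_search = FALSE)
```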

seed

Random seed used in the subsampling LOO. A fixed seed is used by default.

...

Additional arguments to be passed to the get_refmodel function.

Value

An object of type cvsel that contains information about the feature selection. The fields are not meant to be accessed directly by the user but instead via the helper functions (see the vignettes, or type ?projpred to see the main functions in the package).

Examples

### Usage with stanreg objects
library(rstanarm)  # provides stan_glm
library(projpred)  # provides cv_varsel
fit <- stan_glm(y~x, binomial())
cvs <- cv_varsel(fit)