
31.4 Bagging

When bootstrapping is carried through inference, it is known as bootstrap aggregation, or bagging, in the machine-learning literature (Breiman 1996). In the simplest case, this involves bootstrapping the original data, fitting a model to each bootstrapped data set, and then averaging the predictions. For instance, rather than using an estimate \(\hat{\sigma}\) from the original data set, bootstrapped data sets \(y^{\textrm{boot}(1)}, \ldots, y^{\textrm{boot}(N)}\) are generated, and each is used to produce an estimate \(\hat{\sigma}^{\textrm{boot}(n)}.\) The final estimate is
\[
\hat{\sigma} = \frac{1}{N} \sum_{n = 1}^N \hat{\sigma}^{\textrm{boot}(n)}.
\]
The same is done to estimate a predictive quantity \(\tilde{y}\) for as-yet-unseen data,
\[
\hat{\tilde{y}} = \frac{1}{N} \sum_{n = 1}^N \hat{\tilde{y}}^{\textrm{boot}(n)}.
\]
For discrete parameters, voting across the bootstrapped estimates is used to select the final outcome.
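The following is a minimal sketch of this procedure in Python with NumPy, under illustrative assumptions: the data `y` are simulated, the estimator of \(\sigma\) is the sample standard deviation, and the discrete quantity subject to voting is a hypothetical indicator of whether \(\sigma\) exceeds a threshold. It is meant only to show the resample-estimate-average (or resample-estimate-vote) pattern, not any particular model fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data (stand-in for the original data set y).
y = rng.normal(loc=0.0, scale=2.0, size=100)

N = 1000  # number of bootstrap replicates

# Bagged point estimate of sigma: resample y with replacement,
# estimate sigma on each bootstrapped data set, then average.
sigma_boot = np.empty(N)
for n in range(N):
    y_boot = rng.choice(y, size=y.size, replace=True)
    sigma_boot[n] = y_boot.std(ddof=1)

sigma_hat = sigma_boot.mean()

# For a discrete quantity, vote: take the majority outcome across
# bootstrap replicates.  Here the (hypothetical) discrete outcome is
# the indicator that sigma exceeds 2.
votes = sigma_boot > 2.0
indicator_hat = votes.sum() > N / 2

print(sigma_hat, indicator_hat)
```

The same pattern applies to predictions for unseen data: replace the estimate of \(\sigma\) with a prediction \(\hat{\tilde{y}}^{\textrm{boot}(n)}\) computed from each bootstrapped fit and average (or vote over) those.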

One way of viewing bagging is as a classical attempt to approximate averaging over parameter-estimation uncertainty.

References

Breiman, Leo. 1996. “Bagging Predictors.” Machine Learning 24 (2): 123–40.