What Is Bagging In Machine Learning?


Bagging is a Machine Learning method used to improve the performance and stability of algorithms. It can be applied to regression as well as classification. It reduces the variance of the model and limits overfitting: the final prediction takes all of the trained models into account. In classification, this is referred to as a “model vote”.


Definition Of Bagging

Bagging is a meta-algorithm that belongs to the family of ensemble methods: starting from a Machine Learning algorithm, it applies that algorithm multiple times to obtain a more reliable result. Concretely, bagging draws several samples from the data, trains the algorithm separately on each sample, and then aggregates the results of the resulting models.
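As a minimal sketch, this mechanism can be written in a few lines of Python (assuming NumPy and scikit-learn are available; the function names here are illustrative, not a standard API):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=10, seed=0):
    """Train one tree per bootstrap sample (rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # bootstrap sample indices
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Aggregate by majority vote (assumes integer class labels)."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_rows)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```

Each model sees a slightly different bootstrap sample, so their individual errors partially cancel out when the votes are aggregated.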

Combine Predictions

The word bagging is a contraction of “Bootstrap Aggregating”. It is a concept applied in Machine Learning and predictive data mining. It combines the predictions made by several models, each obtained by running the same algorithm on a different sample of the training data. Bagging is also used to address the instability of results that arises when complex models are applied to small datasets.
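In practice, this pattern is available off the shelf: scikit-learn ships it as BaggingClassifier. A hedged usage example (the dataset and parameter values below are arbitrary choices for the demo):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 50 decision trees, each trained on its own bootstrap sample of X_train.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bag.fit(X_train, y_train)
print(bag.score(X_test, y_test))  # accuracy of the aggregated "model vote"
```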

Weak And Strong Learner

Bagging is an artificial intelligence technique that consists of assembling a large number of algorithms with low individual performance in order to build a more powerful one. The term “weak learners” refers to these low-performance algorithms, which together form a single large model called a “strong learner”.

Bagging is therefore particularly used to improve the learning of decision trees, which are considered “weak classifiers” because they have limited performance and are quite unstable (small changes in the data can strongly modify the learned model). A toy demonstration of this instability is sketched below.
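The following sketch (assuming NumPy and scikit-learn; the synthetic dataset is arbitrary) shows that two trees fit on almost-identical data can disagree on the same points, which is exactly the variance that bagging averages away:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Train on the full set, and on the same set minus its first 5 rows.
tree_a = DecisionTreeClassifier(random_state=0).fit(X, y)
tree_b = DecisionTreeClassifier(random_state=0).fit(X[5:], y[5:])

# The disagreement rate is typically nonzero despite the tiny data change.
disagreement = np.mean(tree_a.predict(X) != tree_b.predict(X))
print(f"fraction of points where the two trees disagree: {disagreement:.2%}")
```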

Detailed Bagging Method

With bagging-type methods, it is possible to build several instances of an estimator, each computed on a random sample drawn from the training set. The individual predictions are then combined, typically by averaging them, in order to reduce the variance of the estimator. This yields a better version of the base algorithm without modifying the algorithm itself. Bagging methods also work well with “strong” predictors.
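For regression, the averaging step is explicit. A short sketch of the variance reduction, assuming scikit-learn (the synthetic dataset and the resulting scores are illustrative only), compares a single regression tree with a bagged ensemble whose predictions are averaged:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, noise=10.0, random_state=0)

single = DecisionTreeRegressor(random_state=0)
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0)

# Bagging typically yields a higher mean score with a smaller spread across
# folds, reflecting the variance reduction obtained by averaging.
for name, model in [("single tree", single), ("bagged trees", bagged)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean().round(3), "+/-", scores.std().round(3))
```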

Bagging And Decision Trees: The Random Forest

It is the randomness of bagging that gives the random forest its name. The Random Forest algorithm is essentially the bagging of decision trees (regression or classification trees), with one extra source of randomness: each split in a tree considers only a random subset of the features.

Each tree is trained on a subset of the dataset and yields a result. The results of all the decision trees are then combined to give a final answer. To aid understanding, we can say that each tree “votes” for a class, and the final answer is the one that receives the majority of the votes.
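For instance, with scikit-learn's RandomForestClassifier (assumed available; the iris dataset is just a convenient demo):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# 100 decision trees, each trained on a bootstrap sample of the data.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# predict() aggregates across the trees; note that scikit-learn averages the
# per-tree class probabilities (a soft variant of the vote described above).
print(forest.predict(X[:3]))
print(forest.predict_proba(X[:3]))
```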

