Ensemble Learning
1. What is ensembling?
- In general, ensembling is a technique for combining two or more algorithms, of similar or dissimilar types, called base learners.
- This is done to make a more robust system which incorporates the predictions from all the base learners.
- Consider the example of a candidate going through multiple rounds of job interviews. The final decision on the candidate's ability is generally based on the feedback of all the interviewers. A single interviewer might not be able to test the candidate on every required skill and trait, but the combined feedback of multiple interviewers usually gives a better assessment of the candidate.
2. Types of ensembling
- **Bagging:**
- Bagging is also referred to as bootstrap aggregation. To understand bagging, we first need to understand bootstrapping. Bootstrapping is a sampling technique in which we draw 'n' observations or rows from an original dataset that itself has 'n' rows. The key is that each row is selected with replacement from the original dataset, so that every row is equally likely to be picked on each draw.
- One important thing to note here is that bagging is done mainly to **reduce the variance**. Random forest uses this concept but goes a step further: to reduce the variance even more, it also randomly chooses a subset of features for each bootstrapped sample when making the splits during training; see the sketch below.
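To make this concrete, here is a minimal sketch in Python with scikit-learn, assuming a toy `make_classification` dataset chosen purely for illustration. It draws one bootstrap sample by hand, then compares a plain bagging ensemble of decision trees with a random forest, which additionally samples a subset of features at each split.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy dataset, assumed here only for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Bootstrapping by hand: draw n rows with replacement from an n-row dataset
n = len(X)
idx = np.random.default_rng(0).integers(0, n, size=n)  # rows may repeat
X_boot, y_boot = X[idx], y[idx]
print("distinct rows in one bootstrap sample:", len(np.unique(idx)))  # roughly 63% of n

# Bagging: many trees, each trained on its own bootstrap sample (variance reduction)
bagging = BaggingClassifier(n_estimators=100, random_state=42)

# Random forest: bagging plus a random subset of features considered at each split
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)

for name, model in [("bagging", bagging), ("random forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```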
- **Boosting:**
- Boosting is a sequential technique in which the first algorithm is trained on the entire dataset and each subsequent algorithm is built by fitting the residuals of the previous one, thus giving higher weight to those observations that were poorly predicted by the previous model.
- It relies on creating a series of weak learners, each of which might not be good for the entire dataset but is good for some part of it. Thus, each model actually boosts the performance of the ensemble.
- It’s really important to note that boosting is focused on **reducing the bias**. This makes boosting algorithms prone to overfitting. Thus, parameter tuning becomes a crucial part of boosting to keep the models from overfitting; see the sketch below.
- Some examples of boosting are XGBoost, GBM, AdaBoost, etc.
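As a rough illustration, the sketch below trains two boosting models from scikit-learn on an assumed toy dataset: `GradientBoostingClassifier` fits each new tree to the residual errors of the ensemble so far, while `AdaBoostClassifier` re-weights poorly predicted observations; `n_estimators`, `learning_rate`, and `max_depth` are the kind of parameters that need tuning to avoid overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy dataset, assumed here only for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Gradient boosting: each new tree is fit to the residual errors of the ensemble so far;
# a small learning_rate and shallow trees are the usual levers against overfitting
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                 max_depth=3, random_state=42)

# AdaBoost: observations misclassified in one round get higher weight in the next
ada = AdaBoostClassifier(n_estimators=200, random_state=42)

for name, model in [("GBM", gbm), ("AdaBoost", ada)]:
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {acc:.3f}")
```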
- **Stacking:**
- In stacking, multiple layers of machine learning models are placed one over another: each model passes its predictions to the model in the layer above it, and the top-layer model makes its decision based on the outputs of the models in the layers below it.
- The basic idea is to train several machine learning algorithms on the training dataset and then generate a new dataset from their predictions. This new dataset is then used as input for the combiner machine learning algorithm; a minimal sketch follows below.
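The sketch below shows this idea with scikit-learn's `StackingClassifier`, again on an assumed toy dataset: the base learners' out-of-fold predictions form the new dataset, and the combiner (a logistic regression here, chosen only as an example) is trained on top of it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy dataset, assumed here only for illustration
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

# Lower layer: diverse base learners whose out-of-fold predictions become the new dataset
base_learners = [
    ("tree", DecisionTreeClassifier(max_depth=5, random_state=42)),
    ("svm", SVC(probability=True, random_state=42)),
]

# Top layer: the combiner (meta-learner) trained on the base learners' predictions
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(),
                           cv=5)

scores = cross_val_score(stack, X, y, cv=5)
print(f"stacking: mean CV accuracy = {scores.mean():.3f}")
```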
3. Advantages and Disadvantages of ensembling
- **Advantages**
- Ensembling is a proven method for improving model accuracy and works in most cases.
- It is the key ingredient for winning almost all of the machine learning hackathons.
- Ensembling makes the model more robust and stable thus ensuring decent performance on the test cases in most scenarios.
- You can use ensembling to capture simple linear as well as complex non-linear relationships in the data. This can be done by using two different models and forming an ensemble of the two.
- **Disadvantages**
- Ensembling reduces model interpretability and makes it very difficult to draw crucial business insights from the final model.
- It is computationally expensive and time-consuming, and thus might not be the best idea for real-time applications.
- The selection of models for creating an ensemble is an art which is really hard to master.