ML2 ‐ Lec (5)

🧠 Ensemble Learning

Definition: Combining multiple models (base learners) to improve accuracy and generalization.

Key Benefits:

  • ✅ More robust predictions
  • ✅ Reduces overfitting
  • ✅ Improves stability

🏆 Types of Ensemble Learning

1️⃣ Homogeneous Ensembles

🔹 Use the same algorithm but train on different data.
Examples:

  • Bagging (Bootstrap Aggregating)
  • Boosting (Sequential Learning)

2️⃣ Heterogeneous Ensembles

🔹 Use different algorithms on the same data.
Examples:

  • Voting (Majority decision)
  • Stacking (Meta-learning)
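
For a concrete picture of a heterogeneous ensemble, here is a minimal hard-voting sketch in scikit-learn; the dataset, base models, and hyperparameters are illustrative choices, not from the lecture.

```python
# Minimal hard-voting sketch: different algorithms trained on the same data.
# Dataset, base models, and hyperparameters are illustrative.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=42)),
    ],
    voting="hard",  # majority vote over the predicted class labels
)
voter.fit(X_train, y_train)
print("Voting accuracy:", voter.score(X_test, y_test))
```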

🏗 Bagging (Bootstrap Aggregating)

Goal: Reduce variance, stabilize predictions.
How it works:

  1. Bootstrap: Create random subsets of data (with replacement).
  2. Train multiple models independently on these subsets.
  3. Aggregate predictions (majority voting for classification, averaging for regression).

🔹 Common Algorithm: Random Forest 🌳
🔹 Reduces overfitting, works well for high-variance models (e.g., decision trees).

📌 Formula:
For classification, the final prediction is the majority vote.
For regression, the final prediction is the average:

$$\hat{y} = \frac{1}{M} \sum_{m=1}^{M} G_m(x)$$

where $G_m(x)$ is the prediction of the $m$-th model and $M$ is the number of models.
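
The recipe and formula above can be written out by hand in a few lines. This is a minimal sketch, assuming scikit-learn decision trees as base learners and a synthetic regression dataset; M and all settings are illustrative.

```python
# Hand-rolled bagging for regression: bootstrap, train M trees independently, average.
# Dataset, M, and hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
rng = np.random.default_rng(0)
M = 25  # number of base models

models = []
for _ in range(M):
    # 1. Bootstrap: sample indices with replacement.
    idx = rng.integers(0, len(X), size=len(X))
    # 2. Train a base learner G_m independently on its bootstrap sample.
    models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# 3. Aggregate: y_hat = (1/M) * sum_m G_m(x)  (averaging for regression).
y_hat = np.mean([m.predict(X) for m in models], axis=0)
print("Ensemble prediction for first sample:", y_hat[0])
```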


🚀 Boosting (Sequential Learning)

Goal: Reduce bias & variance by focusing on misclassified samples.
How it works:

  1. Train a weak learner.
  2. Identify misclassified samples and increase their weight.
  3. Train the next model to correct these mistakes.
  4. Final prediction: weighted combination of all models.

🔹 Common Algorithms:

  • AdaBoost (Adjusts sample weights)
  • Gradient Boosting (Optimizes loss function)
  • XGBoost, LightGBM, CatBoost (Advanced versions)

📌 Formula:
Final prediction:

$$F(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$$

where $\alpha_m$ is the weight of the $m$-th model and $G_m(x)$ its prediction.

🚨 Boosting can overfit! Needs careful tuning.
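
A minimal AdaBoost sketch with scikit-learn, using depth-1 decision stumps as weak learners; the dataset and hyperparameters are illustrative assumptions (the parameter is named `estimator` in scikit-learn ≥ 1.2, `base_estimator` in older releases).

```python
# AdaBoost sketch: weak learners trained sequentially, each round reweighting
# the samples the previous models got wrong. Settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

booster = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump weak learner
    n_estimators=100,
    learning_rate=0.5,  # shrinks each model's contribution; helps against overfitting
    random_state=0,
)
booster.fit(X_train, y_train)
print("AdaBoost accuracy:", booster.score(X_test, y_test))
```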


🔄 Comparison of Bagging vs Boosting vs Stacking

| Criteria | Bagging 🏗 | Boosting | Stacking 🏆 |
|---|---|---|---|
| Approach | Parallel training | Sequential training | Meta-learning |
| Goal | Reduce variance | Reduce bias & variance | Improve accuracy |
| Base Models | Homogeneous | Homogeneous | Heterogeneous |
| Final Prediction | Voting/Averaging | Weighted sum | Meta-model |

🎯 Key Takeaways

Bagging → Best for reducing overfitting
Boosting → Best for improving accuracy
Stacking → Best for combining different models

  • Use Bagging (Random Forest) when overfitting is a problem.
  • Use Boosting (AdaBoost, XGBoost) for high accuracy but watch for overfitting.
  • Use Stacking when different models capture different aspects of data.


1. Ensemble Learning 🤝

  • What?: Combine multiple models (weak learners) to create a stronger model.
  • Goal: Improve accuracy, reduce overfitting, and increase robustness.
  • Types:
    • Homogeneous: Same algorithm, different data (e.g., Bagging, Boosting).
    • Heterogeneous: Different algorithms, same data (e.g., Stacking).

2. Bagging (Bootstrap Aggregating) 🎒

  • What?: Train multiple models on different subsets of data (sampled with replacement).
  • Aggregation: Average (regression) or majority vote (classification).
  • Example: Random Forest 🌳 (ensemble of decision trees).
  • Advantages:
    • Reduces variance and overfitting.
    • Improves accuracy and stability.
    • Easy to parallelize.
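
A minimal Random Forest sketch (the canonical bagging ensemble of decision trees); `n_jobs=-1` illustrates the "easy to parallelize" point, and the dataset plus other settings are illustrative assumptions.

```python
# Random Forest = bagging over decision trees, plus a random feature subset per split.
# n_jobs=-1 trains the independent trees in parallel; settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,     # M independent trees
    max_features="sqrt",  # random feature subset at each split adds diversity
    n_jobs=-1,            # trees are independent, so training parallelizes easily
    random_state=0,
)
print("CV accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```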

3. Boosting 🚀

  • What?: Sequentially train models, focusing on misclassified samples.
  • How?: Increase weights of misclassified samples in each iteration.
  • Example: AdaBoost (Adaptive Boosting).
  • Advantages:
    • Reduces bias and variance.
    • Improves accuracy by correcting errors.
  • ⚠️ Caveat: sensitive to noisy data and outliers, so it can overfit without careful tuning.
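
To make the weight-update step above concrete, here is a tiny one-round sketch using the standard AdaBoost formulas; the toy labels and predictions are made up for illustration.

```python
# One AdaBoost round on a toy example: misclassified samples get heavier weights.
# The formulas are the standard AdaBoost ones; the toy labels/predictions are made up.
import numpy as np

y_true = np.array([ 1,  1, -1, -1,  1])   # true labels
y_pred = np.array([ 1, -1, -1, -1, -1])   # weak learner's predictions (2 mistakes)
w = np.full(len(y_true), 1 / len(y_true)) # start with uniform sample weights

miss = y_pred != y_true
eps = np.sum(w[miss])                      # weighted error rate
alpha = 0.5 * np.log((1 - eps) / eps)      # model weight alpha_m

# Misclassified samples are up-weighted, correct ones down-weighted, then normalized.
w = w * np.exp(alpha * np.where(miss, 1.0, -1.0))
w = w / w.sum()
print("error:", eps, "alpha:", round(alpha, 3))
print("new weights:", np.round(w, 3))
```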

4. Stacking 🥞

  • What?: Combine predictions of multiple models using a meta-model.
  • Steps:
    1. Train base models (level-0).
    2. Use their predictions as input to train a meta-model (level-1).
  • Advantages:
    • Leverages model diversity.
    • Often improves performance over individual models.
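
A minimal stacking sketch with scikit-learn: heterogeneous level-0 base models feeding a logistic-regression level-1 meta-model; the models, dataset, and settings are illustrative choices.

```python
# Stacking sketch: level-0 base models feed their predictions to a level-1 meta-model.
# Model choices and settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, n_features=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[  # level-0: diverse base learners
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # level-1 meta-model
    cv=5,  # meta-model is trained on out-of-fold base-model predictions
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```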

5. Key Concepts 🔑

  • Base Learners: Individual models in the ensemble.
  • Diversity: Ensures models make different errors.
  • Aggregation: Combine predictions (e.g., averaging, voting).
  • Random Forest: Bagging + Decision Trees.
  • AdaBoost: Boosting + Decision Stumps.

Mind Map 🧠

```
Ensemble Learning
├── Bagging (Bootstrap Aggregating)
│   ├── Random Forest (Decision Trees)
│   ├── Reduces Variance
│   └── Parallel Training
├── Boosting
│   ├── AdaBoost (Sequential Training)
│   ├── Focuses on Misclassified Samples
│   └── Reduces Bias
└── Stacking
    ├── Combines Predictions with Meta-Model
    └── Leverages Model Diversity
```

Key Symbols 🔑

  • M: Number of models.
  • D: Dataset.
  • G_m(x): Model m's prediction.
  • w_i: Weight of sample i (increased when misclassified in Boosting).
  • α_m: Weight of model m in the final boosted prediction.

You’re ready! 🎉 Just remember Bagging = parallel training, Boosting = sequential training, and Stacking = meta-model! 🚀