18 04 Boosting
Boosting
- Ensemble method combining several weak learners to form a strong learner
- Weak learner: Model doing slightly better than random guessing
- Example: a decision stump (a CART with maximum depth of 1; see the sketch after this list)
- Train an ensemble of predictors sequentially
- Each predictor tries to correct its predecessor
- Most popular boosting methods
- AdaBoost
- Gradient Boosting
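To make the weak-learner idea concrete, here is a minimal sketch (not from the course) comparing a single decision stump with an AdaBoost ensemble of stumps. The synthetic `make_classification` data, the variable names, and `n_estimators=100` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Synthetic binary-classification data (illustrative only, not the course dataset)
X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# A single decision stump is a weak learner
stump = DecisionTreeClassifier(max_depth=1, random_state=1).fit(X_tr, y_tr)

# AdaBoost's default base learner is a stump; the boosted ensemble of many
# stumps usually scores noticeably higher than any single stump
ada = AdaBoostClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)

print('Stump accuracy:    {:.2f}'.format(stump.score(X_te, y_te)))
print('AdaBoost accuracy: {:.2f}'.format(ada.score(X_te, y_te)))
```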
01 AdaBoost
- Adaptive Boosting
- Each predictor pays more attention to the instances wrongly predicted by its predecessor
- Achieved by changing the weights of training instances
- Each predictor is assigned a coefficient alpha
- alpha depends on the predictor's training error
- **Learning Rate**: 0 < eta <= 1
  - Trade-off between eta and the number of estimators: a smaller eta should be compensated by more estimators (illustrated in a sketch after the evaluation example below)
In sklearn: `AdaBoostClassifier` & `AdaBoostRegressor`
Example
```python
# Import DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier
# Import AdaBoostClassifier
from sklearn.ensemble import AdaBoostClassifier

# Instantiate dt
dt = DecisionTreeClassifier(max_depth=2, random_state=1)

# Instantiate ada
# (in newer scikit-learn versions this parameter is named `estimator` instead of `base_estimator`)
ada = AdaBoostClassifier(base_estimator=dt, n_estimators=180, random_state=1)

# Fit ada to the training set
ada.fit(X_train, y_train)

# Compute the probabilities of obtaining the positive class
y_pred_proba = ada.predict_proba(X_test)[:, 1]
```
- Evaluate the AdaBoost Classifier
```python
# Import roc_auc_score
from sklearn.metrics import roc_auc_score

# Evaluate test-set roc_auc_score
ada_roc_auc = roc_auc_score(y_test, y_pred_proba)

# Print roc_auc_score
print('ROC AUC score: {:.2f}'.format(ada_roc_auc))
```
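The example above keeps AdaBoost's default learning rate. As a rough sketch of the eta vs. number-of-estimators trade-off, assuming the same `X_train`/`X_test` split as above (the parameter pairs below are illustrative assumptions, not course settings):

```python
# Illustrative sketch: a smaller learning rate usually needs more estimators
# to reach comparable performance
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score

for eta, n_est in [(1.0, 60), (0.1, 600)]:
    model = AdaBoostClassifier(learning_rate=eta, n_estimators=n_est, random_state=1)
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    print('eta={:<4} n_estimators={:<4} ROC AUC={:.3f}'.format(
        eta, n_est, roc_auc_score(y_test, proba)))
```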
02 Gradient Boosting (GB)
Gradient Boosted Trees
- Sequential correction of predecessor's errors
- Does not tweak the weights of training instances
- Each predictor is trained using its predecessor's residual errors as labels (a from-scratch sketch follows below)
- A CART is used as the base learner
- Shrinkage: each tree's contribution is scaled by a learning rate eta (0 < eta <= 1); a smaller eta calls for more estimators
In sklearn: `GradientBoostingRegressor`, `GradientBoostingClassifier`
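Before the scikit-learn example, here is a minimal from-scratch sketch (not from the course) of the residual-fitting idea for squared-error loss. It assumes the same regression split `X_train`/`y_train`/`X_test` used in the example below; `eta`, `max_depth`, and the number of rounds are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

eta = 0.1                                            # shrinkage / learning rate
pred_train = np.full(len(y_train), y_train.mean())   # initial prediction: the mean
pred_test = np.full(len(X_test), y_train.mean())

for _ in range(100):
    residuals = y_train - pred_train                 # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3).fit(X_train, residuals)
    pred_train += eta * tree.predict(X_train)        # each tree's contribution is shrunk by eta
    pred_test += eta * tree.predict(X_test)
```

Up to details such as the split criterion, this is what `GradientBoostingRegressor` computes for squared-error loss.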
Example
```python
# Import GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate gb
gb = GradientBoostingRegressor(max_depth=4,
                               n_estimators=200,
                               random_state=2)

# Fit gb to the training set
gb.fit(X_train, y_train)

# Predict test set labels
y_pred = gb.predict(X_test)

# Import mean_squared_error as MSE
from sklearn.metrics import mean_squared_error as MSE

# Compute MSE
mse_test = MSE(y_test, y_pred)

# Compute RMSE
rmse_test = mse_test**(1/2)

# Print RMSE
print('Test set RMSE of gb: {:.3f}'.format(rmse_test))  # Test set RMSE of gb: 52.065
```
03 Stochastic Gradient Boosting (SGB)
Cons of Gradient Boosting
- GB involves an exhaustive search procedure
- Each CART is trained to find the best split points and features
- May lead to CARTs using the same split points and maybe the same features
Stochastic Gradient Boosting (SGB)
- Each tree is trained on a random subset of rows of the training data
  - Typically 40-80% of the rows, sampled without replacement
- Features are sampled (without replacement) when choosing split points
- Result: further ensemble diversity
- Effect: adds further variance to the ensemble of trees
Example
```python
# Import GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate sgbr
sgbr = GradientBoostingRegressor(max_depth=4,
                                 subsample=0.9,
                                 max_features=0.75,
                                 n_estimators=200,
                                 random_state=2)

# Fit sgbr to the training set
sgbr.fit(X_train, y_train)

# Predict test set labels
y_pred = sgbr.predict(X_test)

# Import mean_squared_error as MSE
from sklearn.metrics import mean_squared_error as MSE

# Compute test set MSE
mse_test = MSE(y_test, y_pred)

# Compute test set RMSE
rmse_test = mse_test**(1/2)

# Print rmse_test
print('Test set RMSE of sgbr: {:.3f}'.format(rmse_test))  # Test set RMSE of sgbr: 49.979
```