18 04 Boosting
Boosting
- Ensemble method combining several weak learners to form a strong learner
- Weak learner: Model doing slightly better than random guessing
- Example: a decision stump (a CART with maximum depth of 1; see the sketch after this list)
- Train an ensemble of predictors sequentially
- Each predictor tries to correct its predecessor
- Most popular boosting methods
- AdaBoost
- Gradient Boosting
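To make the weak-learner idea concrete, here is a minimal sketch (not from the course) comparing a single decision stump with an AdaBoost ensemble of stumps. The synthetic `make_classification` data, the variable names, and `n_estimators=100` are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

# Synthetic binary-classification data (illustrative only, not the course dataset)
X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# A single decision stump is a weak learner
stump = DecisionTreeClassifier(max_depth=1, random_state=1).fit(X_tr, y_tr)

# AdaBoost's default base learner is a stump; the boosted ensemble of many
# stumps usually scores noticeably higher than any single stump
ada = AdaBoostClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)

print('Stump accuracy:    {:.2f}'.format(stump.score(X_te, y_te)))
print('AdaBoost accuracy: {:.2f}'.format(ada.score(X_te, y_te)))
```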
01 AdaBoost
- Adaptive Boosting
- Each predictor pays more attention to the instances wrongly predicted by its predecessor
- Achieved by changing the weights of training instances
- Each predictor is assigned a coefficient alpha
- alpha depends on the predictor's training error
- **Learning Rate**: 0 < eta <= 1
  - Trade-off between eta and the number of estimators: a smaller eta should be compensated by more estimators (illustrated in a sketch after the evaluation example below)
In sklearn: `AdaBoostClassifier` & `AdaBoostRegressor`
Example
```python
# Import DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier
# Import AdaBoostClassifier
from sklearn.ensemble import AdaBoostClassifier

# Instantiate dt
dt = DecisionTreeClassifier(max_depth=2, random_state=1)

# Instantiate ada
# (in newer scikit-learn versions this parameter is named `estimator` instead of `base_estimator`)
ada = AdaBoostClassifier(base_estimator=dt, n_estimators=180, random_state=1)

# Fit ada to the training set
ada.fit(X_train, y_train)

# Compute the probabilities of obtaining the positive class
y_pred_proba = ada.predict_proba(X_test)[:, 1]
```
- Evaluate the AdaBoost Classifier
```python
# Import roc_auc_score
from sklearn.metrics import roc_auc_score

# Evaluate test-set roc_auc_score
ada_roc_auc = roc_auc_score(y_test, y_pred_proba)

# Print roc_auc_score
print('ROC AUC score: {:.2f}'.format(ada_roc_auc))
```
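The example above keeps AdaBoost's default learning rate. As a rough sketch of the eta vs. number-of-estimators trade-off, assuming the same `X_train`/`X_test` split as above (the parameter pairs below are illustrative assumptions, not course settings):

```python
# Illustrative sketch: a smaller learning rate usually needs more estimators
# to reach comparable performance
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import roc_auc_score

for eta, n_est in [(1.0, 60), (0.1, 600)]:
    model = AdaBoostClassifier(learning_rate=eta, n_estimators=n_est, random_state=1)
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    print('eta={:<4} n_estimators={:<4} ROC AUC={:.3f}'.format(
        eta, n_est, roc_auc_score(y_test, proba)))
```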
02 Gradient Boosting (GB)
Gradient Boosted Trees
- Sequential correction of predecessor's errors
- Does not tweak the weights of training instances
- Each predictor is trained using its predecessor's residual errors as labels (a from-scratch sketch follows below)
- A CART is used as the base learner
- Shrinkage: each tree's contribution is scaled by a learning rate eta (0 < eta <= 1); a smaller eta calls for more estimators
In sklearn: `GradientBoostingRegressor`, `GradientBoostingClassifier`
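Before the scikit-learn example, here is a minimal from-scratch sketch (not from the course) of the residual-fitting idea for squared-error loss. It assumes the same regression split `X_train`/`y_train`/`X_test` used in the example below; `eta`, `max_depth`, and the number of rounds are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

eta = 0.1                                            # shrinkage / learning rate
pred_train = np.full(len(y_train), y_train.mean())   # initial prediction: the mean
pred_test = np.full(len(X_test), y_train.mean())

for _ in range(100):
    residuals = y_train - pred_train                 # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3).fit(X_train, residuals)
    pred_train += eta * tree.predict(X_train)        # each tree's contribution is shrunk by eta
    pred_test += eta * tree.predict(X_test)
```

Up to details such as the split criterion, this is what `GradientBoostingRegressor` computes for squared-error loss.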
Example
```python
# Import GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate gb
gb = GradientBoostingRegressor(max_depth=4,
                               n_estimators=200,
                               random_state=2)

# Fit gb to the training set
gb.fit(X_train, y_train)

# Predict test set labels
y_pred = gb.predict(X_test)

# Import mean_squared_error as MSE
from sklearn.metrics import mean_squared_error as MSE

# Compute MSE
mse_test = MSE(y_test, y_pred)

# Compute RMSE
rmse_test = mse_test**(1/2)

# Print RMSE
print('Test set RMSE of gb: {:.3f}'.format(rmse_test))  # Test set RMSE of gb: 52.065
```
03 Stochastic Gradient Boosting (SGB)
Cons of Gradient Boosting
- GB involves an exhaustive search procedure
- Each CART is trained to find the best split points and features
- May lead to CARTs using the same split points and maybe the same features
Stochastic Gradient Boosting (SGB)
- Each tree is trained on a random subset of rows of the training data
  - Typically 40-80% of the rows, sampled without replacement
- Features are sampled (without replacement) when choosing split points
- Result: further ensemble diversity
- Effect: adds further variance to the ensemble of trees
Example
```python
# Import GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate sgbr
sgbr = GradientBoostingRegressor(max_depth=4,
                                 subsample=0.9,
                                 max_features=0.75,
                                 n_estimators=200,
                                 random_state=2)

# Fit sgbr to the training set
sgbr.fit(X_train, y_train)

# Predict test set labels
y_pred = sgbr.predict(X_test)

# Import mean_squared_error as MSE
from sklearn.metrics import mean_squared_error as MSE

# Compute test set MSE
mse_test = MSE(y_test, y_pred)

# Compute test set RMSE
rmse_test = mse_test**(1/2)

# Print rmse_test
print('Test set RMSE of sgbr: {:.3f}'.format(rmse_test))  # Test set RMSE of sgbr: 49.979
```