
Boosting

  • Ensemble method combining several weak learners to form a strong learner
  • Weak learner: Model doing slightly better than random guessing
    • Example: decision stump (a CART with maximum depth of 1); see the sketch after this list
  • Train an ensemble of predictors sequentially
  • Each predictor tries to correct its predecessor
  • Most popular boosting methods
    • AdaBoost
    • Gradient Boosting
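To make "weak learner" concrete, here is a minimal sketch on an illustrative synthetic dataset (an assumption, not the course data): a single decision stump scores only modestly better than random guessing, which is exactly what boosting exploits.

# Illustrative only: a decision stump as a weak learner on synthetic data
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification problem (assumption, not the course data)
X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Decision stump: a CART limited to a single split (max_depth=1)
stump = DecisionTreeClassifier(max_depth=1, random_state=1)
print(cross_val_score(stump, X, y).mean())  # above the 0.5 of random guessing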

01 AdaBoost

  • Adaptive Boosting
  • Each predictor pays more attention to the instances wrongly predicted by its predecessor
  • Achieved by changing the weights of training instances
  • Each predictor is assigned a coefficient alpha
  • alpha depends on the predictor's training error
  • Learning rate: 0 < eta <= 1
    • Trade-off between eta and the number of estimators: a smaller eta shrinks each predictor's contribution, so more estimators are typically needed (see the sketch after this list)
  • AdaBoostClassifier & AdaBoostRegressor
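A hedged sketch of the eta / n_estimators trade-off via the learning_rate parameter (the two configurations below are illustrative assumptions, not course values). Note that newer scikit-learn releases rename base_estimator to estimator.

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

stump = DecisionTreeClassifier(max_depth=1, random_state=1)

# Full learning rate with relatively few estimators ...
ada_fast = AdaBoostClassifier(base_estimator=stump, n_estimators=50,
                              learning_rate=1.0, random_state=1)

# ... versus a smaller eta compensated by many more estimators
ada_slow = AdaBoostClassifier(base_estimator=stump, n_estimators=400,
                              learning_rate=0.1, random_state=1)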

Example

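The snippets below assume X_train, X_test, y_train, y_test already exist. A minimal stand-in split, using the scikit-learn breast-cancer data as an assumption in place of the course dataset:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Any binary-classification dataset works; this one is just an assumption
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1)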
  • Define the classifier
# Import DecisionTreeClassifier
from sklearn.tree import DecisionTreeClassifier

# Import AdaBoostClassifier
from sklearn.ensemble import AdaBoostClassifier

# Instantiate dt
dt = DecisionTreeClassifier(max_depth=2, random_state=1)

# Instantiate ada
ada = AdaBoostClassifier(base_estimator=dt, n_estimators=180, random_state=1)
  • Train the classifier
# Fit ada to the training set
ada.fit(X_train, y_train)

# Compute the probabilities of obtaining the positive class
y_pred_proba = ada.predict_proba(X_test)[:,1]
  • Evaluate the AdaBoost Classifier
# Import roc_auc_score
from sklearn.metrics import roc_auc_score

# Evaluate test-set roc_auc_score
ada_roc_auc = roc_auc_score(y_test, y_pred_proba)

# Print roc_auc_score
print('ROC AUC score: {:.2f}'.format(ada_roc_auc))

02 Gradient Boosting (GB)

Gradient Boosted Trees

  • Sequential correction of predecessor's error
  • Does not tweak the weights of training instances
  • Each predictor is trained using its predecessor's residual errors as labels (see the sketch after this list)
  • A CART is used as the base learner
  • Shrinkage: each tree's prediction is scaled by a learning rate eta (0 < eta <= 1) before it is added to the ensemble
  • GradientBoostingRegressor , GradientBoostingClassifier
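A minimal hand-rolled sketch of this residual-fitting loop with shrinkage (synthetic data and values are illustrative assumptions, not the library's actual implementation):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Illustrative regression problem (assumption)
X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=2)
eta = 0.1                                    # shrinkage / learning rate

F = np.full_like(y, y.mean())                # initial prediction: the mean label
for _ in range(3):                           # a few boosting rounds for illustration
    residuals = y - F                        # predecessor's residual errors ...
    tree = DecisionTreeRegressor(max_depth=2, random_state=2).fit(X, residuals)
    F = F + eta * tree.predict(X)            # ... become the labels of the next CART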

Example

  • Define, Train, Evaluate
# Import GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate gb
gb = GradientBoostingRegressor(max_depth=4, 
            n_estimators=200,
            random_state=2)

# Fit gb to the training set
gb.fit(X_train, y_train)

# Predict test set labels
y_pred = gb.predict(X_test)

# Import mean_squared_error as MSE
from sklearn.metrics import mean_squared_error as MSE

# Compute MSE
mse_test = MSE(y_test, y_pred)

# Compute RMSE
rmse_test = mse_test**(1/2)

# Print RMSE
print('Test set RMSE of gb: {:.3f}'.format(rmse_test))  # Test set RMSE of gb: 52.065
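The model above keeps the default shrinkage. Continuing from the block above (GradientBoostingRegressor is already imported), a hedged variant setting it explicitly with illustrative values; a smaller learning_rate usually calls for more estimators:

# Same regressor with explicit shrinkage (illustrative values)
gb_shrunk = GradientBoostingRegressor(max_depth=4,
            n_estimators=400,
            learning_rate=0.05,
            random_state=2)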

03 Stochastic Gradient Boosting (SGB)

Cons of Gradient Boosting

  • GB involves an exhaustive search procedure
  • Each CART is trained to find the best split points and features
  • May lead to CARTs using the same split points and maybe the same features

Stochastic Gradient Boosting (SGB)

  • Each tree is trained on a random subset of rows of the training data
    • 40-80%, sampled without replacement
  • Features are sampled (without replacement) when choosing split points
  • Result: further ensemble diversity
  • Effect: adding further variance to the ensemble of trees

Example

# Import GradientBoostingRegressor
from sklearn.ensemble import GradientBoostingRegressor

# Instantiate sgbr
sgbr = GradientBoostingRegressor(max_depth=4, 
            subsample=0.9,
            max_features=0.75,
            n_estimators=200,                                
            random_state=2)

# Fit sgbr to the training set
sgbr.fit(X_train, y_train)

# Predict test set labels
y_pred = sgbr.predict(X_test)

# Import mean_squared_error as MSE
from sklearn.metrics import mean_squared_error as MSE

# Compute test set MSE
mse_test = MSE(y_test, y_pred)

# Compute test set RMSE
rmse_test = mse_test**(1/2)

# Print rmse_test
print('Test set RMSE of sgbr: {:.3f}'.format(rmse_test)) # Test set RMSE of sgbr: 49.979
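On this run, sampling rows and features brings the test RMSE down from 52.065 for plain gradient boosting (above) to 49.979.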