
Gradient Boosted Regression Tree for Classification

The main parameters to tune are n_estimators and learning_rate. They are highly interconnected: a lower learning_rate means that more trees are needed to build a model of similar complexity.

from sklearn.ensemble import GradientBoostingClassifier

# Strong pre-pruning (max_depth=1 builds stumps) and a low learning rate both reduce overfitting
gbrt = GradientBoostingClassifier(random_state=0, max_depth=1, learning_rate=0.01)
gbrt.fit(X_train, y_train)
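
The snippet above assumes X_train and y_train already exist. As a self-contained sketch (using scikit-learn's breast_cancer data purely as a stand-in dataset), the effect of shrinking the trees or the learning rate can be compared like this:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Example data only; any binary classification dataset works the same way
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default settings: 100 trees of depth 3 with learning_rate=0.1
gbrt = GradientBoostingClassifier(random_state=0)
gbrt.fit(X_train, y_train)
print("default      train: {:.3f}  test: {:.3f}".format(
    gbrt.score(X_train, y_train), gbrt.score(X_test, y_test)))

# Stronger pre-pruning: stumps only
gbrt_depth1 = GradientBoostingClassifier(random_state=0, max_depth=1)
gbrt_depth1.fit(X_train, y_train)
print("max_depth=1  train: {:.3f}  test: {:.3f}".format(
    gbrt_depth1.score(X_train, y_train), gbrt_depth1.score(X_test, y_test)))

# Lower learning rate: each tree corrects less, so more trees would be needed
gbrt_slow = GradientBoostingClassifier(random_state=0, learning_rate=0.01)
gbrt_slow.fit(X_train, y_train)
print("lr=0.01      train: {:.3f}  test: {:.3f}".format(
    gbrt_slow.score(X_train, y_train), gbrt_slow.score(X_test, y_test)))

Typically the default settings fit the training set almost perfectly, while the two regularized variants give up some training accuracy in exchange for better generalization.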

Tips

  • To reduce overfitting, apply stronger pre-pruning by limiting the maximum depth, or lower the learning rate;
  • Increasing n_estimators makes the model more complex, which may lead to overfitting. This is unlike random forests, where a higher n_estimators is always better;
  • A common practice is to set n_estimators according to the time and memory budget, and then search over different values of learning_rate (see the sketch after this list);
  • max_depth is another important parameter; it is usually set very low for gradient boosted models, often no deeper than five splits.
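
A minimal sketch of that tuning pattern, assuming n_estimators is fixed to whatever the budget allows and using GridSearchCV over learning_rate (the grid values here are purely illustrative):

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Fix n_estimators to fit the time/memory budget (100 is just an example),
# then cross-validate learning_rate (and optionally max_depth)
param_grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0],
              "max_depth": [1, 3, 5]}
grid = GridSearchCV(GradientBoostingClassifier(random_state=0, n_estimators=100),
                    param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Best cross-validation accuracy: {:.3f}".format(grid.best_score_))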