Gradient Boosted Regression Tree for Classification - Nori12/Machine-Learning-Tutorial GitHub Wiki
Gradient Boosted Regression Tree for Classification
The two main parameters to tune are n_estimators and learning_rate. They are highly interconnected: a lower learning_rate means that more trees are needed to build a model of similar complexity.
```python
from sklearn.ensemble import GradientBoostingClassifier

# Depth-1 trees ("stumps") as weak learners, combined with a low learning rate
gbrt = GradientBoostingClassifier(random_state=0, max_depth=1, learning_rate=0.01)
gbrt.fit(X_train, y_train)
```
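The interplay between learning_rate and n_estimators can be sketched as follows. This is an illustration, not part of the original tutorial; it assumes scikit-learn's built-in breast cancer dataset as example data. A model with a low learning rate and the default 100 trees tends to underfit, while the same slow learner with many more trees reaches a similar complexity to a faster learner:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Example data (assumption: any binary classification dataset would do)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same low learning rate, different numbers of trees: the slow learner
# needs more estimators to build a model of similar complexity.
slow = GradientBoostingClassifier(random_state=0, learning_rate=0.01,
                                  n_estimators=100)
slow_more_trees = GradientBoostingClassifier(random_state=0, learning_rate=0.01,
                                             n_estimators=1000)
for model in (slow, slow_more_trees):
    model.fit(X_train, y_train)
    print(model.n_estimators,
          model.score(X_train, y_train),
          model.score(X_test, y_test))
```

Comparing the printed train/test accuracies shows how adding trees compensates for a small learning_rate.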
Tips
- To reduce overfitting, apply stronger pre-pruning by limiting the maximum depth, or lower the learning rate;
- Increasing n_estimators yields a more complex model, which may overfit. In random forests, by contrast, a higher n_estimators is always better;
- A common practice is to set n_estimators according to the time and memory budget, and then search over different values of learning_rate;
- max_depth is another important parameter and is usually set very low for gradient boosted models, often to no more than five splits.
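The tips above can be combined into a small tuning sketch: fix n_estimators to fit the budget, then search over learning_rate and a shallow max_depth. This is one possible workflow, not the tutorial's prescribed one; the dataset and parameter grid below are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Example data (assumption: substitute your own dataset)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators is fixed by the time/memory budget; the search covers
# learning_rate and a deliberately shallow max_depth.
param_grid = {"learning_rate": [0.01, 0.1, 0.5],
              "max_depth": [1, 3, 5]}
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0, n_estimators=100),
    param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.score(X_test, y_test))
```

The chosen best_params_ will depend on the data; the point is the order of operations, not the specific values.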