AutoML - rudydesplan/book_rating GitHub Wiki


To find the best model, we tried the AutoML feature of Azure Machine Learning.

To do so, we first created a workspace and then launched an AutoML task. We fed the task the data produced by the preprocessing steps with outliers kept (with_outliers.csv). The run took about 45 minutes. Our settings were the following: automatic creation of the validation set, and a random split of the data into 70% for the training set and 30% for the test set. Among the metrics proposed by Azure, the one that best fit our task was the normalized RMSE (NRMSE).
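
As a reminder of what this metric measures, here is a minimal sketch of an NRMSE computation. It assumes the usual definition (RMSE divided by the range of the target values), which is how we understand Azure's normalized RMSE; the figures reported further down are consistent with it, since each RMSE is about 5 times the corresponding NRMSE. The function names and the toy ratings are illustrative only and do not come from the project code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def nrmse(y_true, y_pred):
    """RMSE divided by the range (max - min) of the true values (assumed definition)."""
    value_range = max(y_true) - min(y_true)
    return rmse(y_true, y_pred) / value_range

# Toy ratings (illustrative values, not taken from with_outliers.csv).
y_true = [0.0, 3.5, 4.0, 5.0]
y_pred = [1.0, 3.5, 4.0, 4.0]
print(f"RMSE  = {rmse(y_true, y_pred):.4f}")
print(f"NRMSE = {nrmse(y_true, y_pred):.4f}")
```

Because the metric is scaled by the target range, it can be compared across targets with different units, which is not the case for the raw RMSE.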

The best results came from two ensemble methods: VotingEnsemble and StackEnsemble.

The models used in these ensemble methods were the five that gave the best individual results:

  1. Random Forest with a MinMaxScaler preprocessor (hyperparameters: "bootstrap": false, "max_features": 0.7, "n_estimators": 25)
  2. Random Forest with a MinMaxScaler preprocessor (hyperparameters: "bootstrap": true, "max_features": "sqrt", "n_estimators": 25)
  3. LightGBM with a MaxAbsScaler preprocessor (hyperparameters: "min_data_in_leaf": 20)
  4. Random Forest with a StandardScalerWrapper preprocessor ("with_mean": true, "with_std": true) (hyperparameters: "bootstrap": true, "max_features": 0.4, "min_samples_leaf": 0.005080937188890647, "min_samples_split": 0.0008991789964660114, "n_estimators": 50)
  5. XGBoostRegressor with a MaxAbsScaler preprocessor (hyperparameters: "tree_method": "auto")

| Model | NRMSE | MAE | RMSE | R2 |
|---|---|---|---|---|
| 1 | 0.06112 | 0.20715 | 0.30561 | 0.19516 |
| 2 | 0.061216 | 0.20780 | 0.30608 | 0.19277 |
| 3 | 0.061575 | 0.20283 | 0.30787 | 0.18066 |
| 4 | 0.061804 | 0.20793 | 0.30902 | 0.17753 |
| 5 | 0.062687 | 0.20392 | 0.31343 | 0.15162 |
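
To make the configuration of model 1 concrete, here is a rough scikit-learn sketch of it. It assumes that AutoML's Random Forest corresponds to scikit-learn's `RandomForestRegressor`, and it runs on synthetic stand-in data, because the columns of with_outliers.csv are not described here. The scores it prints are therefore unrelated to the table above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 500 rows, 6 numeric features,
# and a rating-like target clipped to the 0-5 interval.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
y = np.clip(3.9 + 0.3 * X[:, 0] + rng.normal(scale=0.3, size=500), 0.0, 5.0)

# Random 70% / 30% split, mirroring the settings described earlier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Model 1: MinMaxScaler followed by a random forest with the listed hyperparameters.
model = make_pipeline(
    MinMaxScaler(),
    RandomForestRegressor(
        bootstrap=False, max_features=0.7, n_estimators=25, random_state=0
    ),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

rmse = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))
print(f"RMSE on synthetic data: {rmse:.4f}")
```

The other four models could be sketched the same way by swapping the scaler and the estimator; the `random_state` values are only there to make the sketch reproducible.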

Finally, the ensemble methods tested by the AutoML task were:

  • VotingEnsemble (weight of each model: model 1: 0.2, model 2: 0.266, model 3: 0.1333, model 4: 0.0667, model 5: 0.3333)
  • StackEnsemble

| Model | NRMSE | MAE | RMSE | R2 |
|---|---|---|---|---|
| VotingEnsemble | 0.059761 | 0.19929 | 0.29880 | 0.23028 |
| StackEnsemble | 0.060298 | 0.19996 | 0.30149 | 0.21649 |
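
For a regression task, a voting ensemble is generally a weighted average of the predictions of its member models. The sketch below applies the weights listed above to hypothetical per-model predictions for a single book. Dividing by the sum of the weights is our own assumption, made because the rounded weights do not add up to exactly 1; we do not know how Azure handles this internally.

```python
# Weights reported for the VotingEnsemble, in the order of models 1 to 5.
WEIGHTS = [0.2, 0.266, 0.1333, 0.0667, 0.3333]

def voting_predict(predictions, weights=WEIGHTS):
    """Weighted average of the per-model predictions for one sample."""
    if len(predictions) != len(weights):
        raise ValueError("expected exactly one prediction per model")
    total = sum(weights)  # normalise, since the rounded weights sum to about 0.9993
    return sum(w * p for w, p in zip(weights, predictions)) / total

# Hypothetical predictions of models 1 to 5 for one book (illustrative values only).
preds = [3.9, 4.0, 4.1, 3.8, 4.2]
print(f"Ensemble prediction: {voting_predict(preds):.4f}")
```

With these weights, the XGBoostRegressor (model 5) has the largest influence on the ensemble output, even though it has the weakest individual scores of the five.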

For the VotingEnsemble model, the most important features were the following: