ml - not-so-fat/conjurer GitHub Wiki

Motivation for ml

Provides a method to tune the hyperparameters of machine learning algorithms. To use RandomizedSearchCV / GridSearchCV with pandas.DataFrame, we use sklearn_cv_pandas.

Supported processes in the pipeline

Machine learning algorithm

Depending on the argument ml_type, one of the following machine learning algorithms is used:

  • lightgbm (gbm_autosplit.LGBMClassifier or gbm_autosplit.LGBMRegressor)
  • xgboost (gbm_autosplit.XGBClassifier or gbm_autosplit.XGBRegressor)
  • random_forest (sklearn.ensemble.RandomForestClassifier or sklearn.ensemble.RandomForestRegressor)
  • linear_model (sklearn.linear_model.LogisticRegression or sklearn.linear_model.Lasso)
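The ml_type dispatch above can be pictured as a lookup table from each ml_type to a classifier/regressor pair. The sketch below is hypothetical: `make_estimator` and its `is_classification` flag are illustrative names, not conjurer's API, and how conjurer actually decides between the classifier and regressor variant is an assumption. Only the scikit-learn-backed types are included so it runs without lightgbm, xgboost, or gbm_autosplit installed.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import Lasso, LogisticRegression

# Hypothetical mapping: ml_type -> (classifier class, regressor class).
# "lightgbm" and "xgboost" would map to the gbm_autosplit classes listed above;
# they are omitted so this sketch depends on scikit-learn alone.
_ESTIMATORS = {
    "random_forest": (RandomForestClassifier, RandomForestRegressor),
    "linear_model": (LogisticRegression, Lasso),
}


def make_estimator(ml_type, is_classification):
    """Return an unfitted estimator for ml_type (hypothetical helper)."""
    try:
        classifier_cls, regressor_cls = _ESTIMATORS[ml_type]
    except KeyError:
        raise ValueError(f"unsupported ml_type: {ml_type!r}") from None
    # Pick the classification or regression variant of the algorithm.
    return classifier_cls() if is_classification else regressor_cls()


print(make_estimator("linear_model", is_classification=True))
print(make_estimator("random_forest", is_classification=False))
```

Keeping the mapping in a dictionary means supporting another ml_type is a one-line change, and unknown values fail early with a clear error.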

Preprocessing

The following missing-value imputer and scaler are always applied:

  • sklearn.impute.SimpleImputer
  • sklearn.preprocessing.StandardScaler
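A minimal sketch of this preprocessing chained in front of a model with sklearn.pipeline.Pipeline, assuming conjurer applies the imputer before the scaler and uses their default settings (SimpleImputer's default strategy is the column mean). The step names, the toy data, and the Lasso model are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data with missing values in both columns.
X = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0], "b": [10.0, np.nan, 30.0, 40.0]})
y = pd.Series([1.0, 2.0, 3.0, 4.0])

pipeline = Pipeline([
    ("imputer", SimpleImputer()),   # fills NaN with the column mean (default)
    ("scaler", StandardScaler()),   # rescales each column to mean 0, std 1
    ("model", Lasso(alpha=0.1)),    # illustrative final estimator
])
pipeline.fit(X, y)

# Apply only the preprocessing steps (every step except the final model).
X_pre = pipeline[:-1].transform(X)
print(pipeline.named_steps["imputer"].statistics_)
print(X_pre)
```

Because the imputer and scaler live inside the pipeline, a cross-validated search refits them on each training fold, so statistics from the validation fold do not leak into preprocessing. The final step's hyperparameters are addressed with the step-name prefix, e.g. `{"model__alpha": [0.01, 0.1, 1.0]}`.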