ml - not-so-fat/conjurer GitHub Wiki
Motivation of ml
Provides methods to tune the hyperparameters of machine learning algorithms.
To use RandomizedSearchCV / GridSearchCV with pandas.DataFrame, we use sklearn_cv_pandas.
Supported processes in the pipeline
Machine learning algorithm
Depending on the argument ml_type, one of the following machine learning algorithms is used:
- lightgbm (gbm_autosplit.LGBMClassifier or gbm_autosplit.LGBMRegressor)
- xgboost (gbm_autosplit.XGBClassifier or gbm_autosplit.XGBRegressor)
- random_forest (sklearn.ensemble.RandomForestClassifier or sklearn.ensemble.RandomForestRegressor)
- linear_model (sklearn.linear_model.Lasso or sklearn.linear_model.LogisticRegression)
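The mapping from ml_type to an estimator class can be sketched as a simple lookup. This is only an illustration, not conjurer's actual dispatch code: the names ESTIMATORS, select_estimator, and the problem_type argument are hypothetical, and the lightgbm and xgboost entries are left out so the sketch depends on scikit-learn alone.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import Lasso, LogisticRegression

# Hypothetical lookup table; conjurer's internal dispatch may differ.
# The gbm_autosplit-based entries (lightgbm, xgboost) are omitted here.
ESTIMATORS = {
    ("random_forest", "classification"): RandomForestClassifier,
    ("random_forest", "regression"): RandomForestRegressor,
    ("linear_model", "classification"): LogisticRegression,
    ("linear_model", "regression"): Lasso,
}


def select_estimator(ml_type, problem_type):
    """Return a fresh estimator for the given ml_type and problem type."""
    try:
        estimator_class = ESTIMATORS[(ml_type, problem_type)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: {ml_type!r}, {problem_type!r}"
        ) from None
    return estimator_class()
```

For example, select_estimator("linear_model", "regression") returns a Lasso instance with default parameters.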
Preprocessing
The following missing-value imputer and scaler are always applied:
- sklearn.impute.SimpleImputer
- sklearn.preprocessing.StandardScaler
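As a rough sketch of how these preprocessing steps could be chained in front of a model and tuned, the example below uses plain scikit-learn rather than conjurer or sklearn_cv_pandas. The step names, the synthetic data, the column names, and the alpha candidates are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data with some missing values (illustrative only).
rng = np.random.default_rng(0)
feature_columns = ["x1", "x2", "x3"]
df = pd.DataFrame(rng.normal(size=(60, 3)), columns=feature_columns)
df["y"] = 2.0 * df["x1"] - df["x2"] + rng.normal(scale=0.1, size=60)
df.loc[df.index[::10], "x3"] = np.nan  # inject missing values

# Imputation and scaling always run before the model.
pipeline = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", StandardScaler()),
    ("model", Lasso()),
])

# Randomized search over the model's hyperparameters; the "model__"
# prefix routes each parameter to the pipeline step named "model".
search = RandomizedSearchCV(
    pipeline,
    param_distributions={"model__alpha": [0.001, 0.01, 0.1, 1.0]},
    n_iter=3,
    cv=3,
    random_state=0,
)
search.fit(df[feature_columns], df["y"])
best_alpha = search.best_params_["model__alpha"]
```

Because the imputer and scaler are part of the pipeline, they are refit on each cross-validation training fold, which keeps statistics from the validation fold out of the preprocessing.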