ml - not-so-fat/conjurer GitHub Wiki
Motivation of ml
Provides a method to tune the hyperparameters of machine learning algorithms.
To use RandomizedSearchCV / GridSearchCV with a pandas.DataFrame, we use sklearn_cv_pandas.
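This page does not document the sklearn_cv_pandas interface, so the sketch below does not use it. For comparison only, it shows plain scikit-learn RandomizedSearchCV fitted on the columns of a pandas.DataFrame. The synthetic data, the column names x1, x2 and target, and the search space are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic DataFrame for illustration only (hypothetical columns x1, x2, target)
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["target"] = (df["x1"] + df["x2"] > 0).astype(int)

feature_columns = ["x1", "x2"]

# Plain scikit-learn randomized search; lists are sampled uniformly
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={"n_estimators": [10, 20, 50], "max_depth": [2, 4, None]},
    n_iter=5,
    cv=3,
    random_state=0,
)
search.fit(df[feature_columns], df["target"])
print(search.best_params_)
```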
Supported processes in the pipeline
Machine learning algorithms
Based on the argument ml_type, one of the following machine learning algorithms is used:
- lightgbm (gbm_autosplit.LGBMClassifier or gbm_autosplit.LGBMRegressor)
- xgboost (gbm_autosplit.XGBClassifier or gbm_autosplit.XGBRegressor)
- random_forest (sklearn.ensemble.RandomForestClassifier or sklearn.ensemble.RandomForestRegressor)
- linear_model (sklearn.linear_model.Lasso or sklearn.linear_model.LogisticRegression)
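One way to express the ml_type dispatch is a lookup table from (ml_type, task) to the dotted class paths listed above. This is a hypothetical sketch: the is_classification flag, ESTIMATOR_PATHS, estimator_path, and load_estimator_class are illustrative names, not conjurer's API. This page also does not state how the library chooses between the classifier and regressor variants.

```python
import importlib

# Hypothetical lookup table: class paths come from the list above,
# but the (ml_type, is_classification) key is an assumption.
ESTIMATOR_PATHS = {
    ("lightgbm", True): "gbm_autosplit.LGBMClassifier",
    ("lightgbm", False): "gbm_autosplit.LGBMRegressor",
    ("xgboost", True): "gbm_autosplit.XGBClassifier",
    ("xgboost", False): "gbm_autosplit.XGBRegressor",
    ("random_forest", True): "sklearn.ensemble.RandomForestClassifier",
    ("random_forest", False): "sklearn.ensemble.RandomForestRegressor",
    ("linear_model", True): "sklearn.linear_model.LogisticRegression",
    ("linear_model", False): "sklearn.linear_model.Lasso",
}


def estimator_path(ml_type: str, is_classification: bool) -> str:
    """Return the dotted class path for an ml_type and task (hypothetical helper)."""
    try:
        return ESTIMATOR_PATHS[(ml_type, is_classification)]
    except KeyError:
        raise ValueError(f"unsupported ml_type: {ml_type!r}") from None


def load_estimator_class(ml_type: str, is_classification: bool) -> type:
    """Import the class only when requested (hypothetical helper)."""
    module_name, class_name = estimator_path(ml_type, is_classification).rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)


print(estimator_path("linear_model", False))  # sklearn.linear_model.Lasso
```

Resolving classes lazily through importlib means gbm_autosplit only has to be installed when a lightgbm or xgboost model is actually requested.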
Preprocessing
The following missing-value imputer and scaler are always used:
- sklearn.impute.SimpleImputer
- sklearn.preprocessing.StandardScaler
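The sketch below shows one way to chain this preprocessing ahead of an estimator with scikit-learn's Pipeline. It assumes the imputer runs before the scaler and that both use default settings (SimpleImputer then fills NaN with the column mean). Lasso stands in as the estimator. The step names, toy data, and alpha value are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy DataFrame with one missing value (illustration only)
df = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
    "x2": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
})
y = pd.Series([1.0, 2.1, 2.9, 4.2, 5.0, 6.1])

pipeline = Pipeline([
    ("imputer", SimpleImputer()),   # default strategy: replace NaN with the column mean
    ("scaler", StandardScaler()),   # rescale each column to zero mean, unit variance
    ("model", Lasso(alpha=0.1)),    # stand-in estimator (assumption)
])
pipeline.fit(df, y)

# Apply only the preprocessing steps to inspect what the model receives
preprocessed = pipeline[:-1].transform(df)
print(preprocessed)
```

Because imputation and scaling are fitted inside the pipeline, a cross-validation search over the whole pipeline refits them on each training fold instead of leaking statistics from the held-out fold.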