ml - not-so-fat/conjurer GitHub Wiki

Motivation of ml

Provides methods to tune the hyperparameters of machine learning algorithms. To use RandomizedSearchCV / GridSearchCV with pandas.DataFrame input, we use sklearn_cv_pandas.
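The kind of search that sklearn_cv_pandas wraps can be sketched with plain scikit-learn. This is a minimal sketch, assuming a random forest and a small hypothetical search space. It calls RandomizedSearchCV directly on a pandas.DataFrame and does not use the sklearn_cv_pandas API itself.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy data held in a pandas.DataFrame (illustrative only).
rng = np.random.RandomState(0)
X = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
y = (X["a"] + X["b"] > 0).astype(int)

# Hypothetical search space; RandomizedSearchCV samples uniformly from each list.
param_distributions = {
    "n_estimators": [10, 20, 50],
    "max_depth": [2, 4, None],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,        # number of sampled parameter settings
    cv=3,            # 3-fold cross-validation per setting
    random_state=0,  # reproducible sampling
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Swapping in GridSearchCV with `param_grid=param_distributions` would evaluate all nine combinations instead of a random sample of five.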

Supported processes in the pipeline

Machine learning algorithm

Based on the argument ml_type, one of the following machine learning algorithms is used:

lightgbm (gbm_autosplit.LGBMClassifier or gbm_autosplit.LGBMRegressor)
xgboost (gbm_autosplit.XGBClassifier or gbm_autosplit.XGBRegressor)
random_forest (sklearn.ensemble.RandomForestClassifier or sklearn.ensemble.RandomForestRegressor)
linear_model (sklearn.linear_model.Lasso or sklearn.linear_model.LogisticRegression)
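The selection by ml_type can be pictured as a lookup table. This is a hypothetical sketch: the names ESTIMATORS, estimator_path and load_estimator_class, and the task labels "classification" / "regression", are assumptions for illustration and are not taken from conjurer. The dotted class paths are the ones listed above.

```python
import importlib

# Hypothetical mapping from (ml_type, task) to the estimator classes listed above.
# The task labels "classification" / "regression" are assumed for illustration.
ESTIMATORS = {
    ("lightgbm", "classification"): "gbm_autosplit.LGBMClassifier",
    ("lightgbm", "regression"): "gbm_autosplit.LGBMRegressor",
    ("xgboost", "classification"): "gbm_autosplit.XGBClassifier",
    ("xgboost", "regression"): "gbm_autosplit.XGBRegressor",
    ("random_forest", "classification"): "sklearn.ensemble.RandomForestClassifier",
    ("random_forest", "regression"): "sklearn.ensemble.RandomForestRegressor",
    ("linear_model", "classification"): "sklearn.linear_model.LogisticRegression",
    ("linear_model", "regression"): "sklearn.linear_model.Lasso",
}


def estimator_path(ml_type: str, task: str) -> str:
    """Return the dotted class path for (ml_type, task); raise ValueError if unsupported."""
    try:
        return ESTIMATORS[(ml_type, task)]
    except KeyError:
        raise ValueError(
            f"unsupported combination: ml_type={ml_type!r}, task={task!r}"
        ) from None


def load_estimator_class(ml_type: str, task: str):
    """Import the backend lazily, so only the chosen library has to be installed."""
    module_name, class_name = estimator_path(ml_type, task).rsplit(".", 1)
    return getattr(importlib.import_module(module_name), class_name)
```

Keeping the paths as strings and importing on demand means a user who only picks random_forest does not need lightgbm or xgboost installed.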

Preprocessing

The following missing-value imputer and scaler are always applied:

sklearn.impute.SimpleImputer
sklearn.preprocessing.StandardScaler
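A minimal sketch of chaining these two steps in front of a model with sklearn.pipeline.Pipeline. It assumes SimpleImputer's default strategy ("mean") and uses LogisticRegression as a stand-in model; conjurer's actual settings and wiring may differ.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Imputer -> scaler -> model. When this Pipeline is passed to a CV search,
# the imputer and scaler are fit only on each training fold.
# Assumption: SimpleImputer's default strategy ("mean") is used.
pipeline = Pipeline([
    ("imputer", SimpleImputer()),
    ("scaler", StandardScaler()),
    ("model", LogisticRegression()),
])

X = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": [10.0, 20.0, np.nan, 40.0]})
y = [0, 0, 1, 1]
pipeline.fit(X, y)

# Output of the preprocessing steps only (the model step is sliced off).
X_prep = pipeline[:-1].transform(X)
print(X_prep)
```

Inside a search, model hyperparameters are then addressed with the step prefix, for example `model__C` for LogisticRegression's C.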