Project Workflow 2 - sideround/project-ml-onlineshop
1. Data cleaning, visualization, correlation, transformation
- Clean column errors with a function
- Create a random column (to be used in feature selection)
- Column transformation
- Export the cleaned data to CSV (before filling missing values)
- Fill missing values (ffill)
- Export the filled data to CSV (clean_no_nan)
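A minimal sketch of the random-column step, assuming pandas and NumPy. The column names and toy values are hypothetical and do not come from the project's dataset. The idea is that a column of pure noise gives a baseline: any real feature that scores no better than it in feature selection is a candidate to drop.

```python
import numpy as np
import pandas as pd

def add_random_column(df: pd.DataFrame, name: str = "random", seed: int = 42) -> pd.DataFrame:
    """Return a copy of df with an extra column of uniform noise in [0, 1)."""
    rng = np.random.default_rng(seed)  # fixed seed so the column is reproducible
    out = df.copy()
    out[name] = rng.random(len(out))
    return out

# Hypothetical toy frame standing in for the cleaned data.
df = pd.DataFrame({"page_values": [0.0, 12.5, 3.1, 0.0], "revenue": [0, 1, 0, 0]})
df = add_random_column(df)
print(df.shape)  # (4, 3)
```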
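A sketch of the export and forward-fill steps, assuming pandas. The column names and the file names `clean.csv` and `clean_no_nan.csv` are illustrative assumptions. `ffill()` copies the last non-missing value downward, so a NaN in the very first row would stay missing.

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned frame that still has gaps.
df = pd.DataFrame({"month": ["Feb", np.nan, "Mar", np.nan],
                   "bounce_rates": [0.2, np.nan, np.nan, 0.05]})

# Save the cleaned but unfilled version first.
df.to_csv("clean.csv", index=False)

# Forward-fill: each NaN takes the last non-missing value above it in the same column.
clean_no_nan = df.ffill()
clean_no_nan.to_csv("clean_no_nan.csv", index=False)
```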
2. ML Modeling
- Export the encoded data to CSV
- Define X (features) and y (target)
- Balance the data (undersampling with NearMiss)
- Split the data (train_test_split)
- Scale the data (StandardScaler)
- Run each model
- Write a short description of each model
- Do the feature selection with RandomForest
- Rerun the models with only the selected features (probably in the pipeline)
- Visualize model results
- Tune each model's hyperparameters with GridSearchCV
- Rerun each model with the best parameters
- Visualize model results
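A sketch of the split and scale steps with scikit-learn, on synthetic data. The 80/20 ratio and `stratify=y` are illustrative choices, not the project's settings. The scaler is fitted on the training split only and then reused on the test split, so test statistics do not leak into training.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# stratify=y keeps the class ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_s = scaler.transform(X_test)        # apply the same transform to test data
```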
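One way the RandomForest feature selection could use the random column, sketched on synthetic data (3 informative and 3 noise features) rather than the project's columns. Features are kept only if their importance beats the pure-noise column. Impurity-based importances favour high-cardinality features, so scikit-learn's `permutation_importance` is a possible alternative.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X_arr, y = make_classification(n_samples=800, n_features=6, n_informative=3,
                               n_redundant=0, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(6)])
X["random"] = np.random.default_rng(42).random(len(X))  # noise baseline

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X, y)

importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
# Keep only the features that score strictly higher than the random column.
selected = importances[importances > importances["random"]].index.tolist()
print(importances.round(3))
print(selected)
```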
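A sketch of hyperparameter tuning with `GridSearchCV` for one model. The grid values and the `f1` scoring choice are illustrative assumptions; the project's actual grids are not specified. With the default `refit=True`, `best_estimator_` is retrained on the whole training set using the best parameters, which covers the "rerun with the best parameters" step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_estimator_.score(X_test, y_test))  # accuracy on the held-out split
```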
3. ML Modeling (pipeline and deep learning)
- Create a pipeline and try all the models
- Try different scalers
- Try oversampling (SMOTE)
- Other changes (e.g. different train/test split percentages)
- Cross-validation with and without KFold
- Run one deep learning model
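A sketch of running several models through the same scikit-learn `Pipeline`. The model list is an example, not the project's final selection, and the data is synthetic. Putting the scaler inside the pipeline means it is always fitted on training data only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "random_forest": RandomForestClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    pipe = Pipeline([("scaler", StandardScaler()), ("clf", model)])
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)  # test accuracy per model

print(scores)
```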
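Trying different scalers can be sketched by treating the pipeline's scaler step as a grid parameter. The scalers listed and the logistic regression model are illustrative choices; `"passthrough"` skips scaling entirely as a baseline.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

X, y = make_classification(n_samples=400, n_features=6, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression(max_iter=1000))])
# A whole pipeline step can be searched like any other hyperparameter.
param_grid = {"scaler": [StandardScaler(), MinMaxScaler(), RobustScaler(), "passthrough"]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

for params, score in zip(search.cv_results_["params"], search.cv_results_["mean_test_score"]):
    print(params["scaler"], round(score, 3))
```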
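A sketch of cross-validation with and without an explicit `KFold`, on synthetic data. With an integer `cv` and a classifier, scikit-learn uses `StratifiedKFold` without shuffling; an explicit `KFold` object controls shuffling and the seed but does not stratify.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = LogisticRegression(max_iter=1000)

# Integer cv: stratified folds, no shuffling.
default_scores = cross_val_score(model, X, y, cv=5)

# Explicit KFold: shuffled folds with a fixed seed.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
kfold_scores = cross_val_score(model, X, y, cv=kf)

print(default_scores.mean(), kfold_scores.mean())
```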
Task assignment
- 1st part - Sosa (with Isaak, if the function is finished)
- 2nd part - Kristina with Sosa (Sosa handles the visualizations)
- 3rd part - Pau and Jota
- README - Isaak
- Presentation - to be decided