Part 10: Model Selection

What is Model Selection?

Model selection is the process of choosing one final machine learning model from a collection of candidate models trained on a dataset, such that the chosen model best addresses the original problem statement (e.g. the business question the model aims to answer).

Model selection evaluates the candidate models in order to choose the best one; model assessment happens afterwards, estimating how well the chosen model is expected to perform on unseen data in general.

k-fold Cross Validation and Grid Search

See related notebook 10/k-fold_and_grid-search.ipynb.

Note that plain k-fold is different from its extension, stratified k-fold:

[Figure: stratified k-fold splits] (Source: Z² Little, 2020)

Instead of cutting the shuffled data into contiguous test folds, stratified k-fold draws its test samples from each class group, so that every fold preserves the overall class proportions. As in plain k-fold, the data is shuffled only once, before splitting, so the test folds never overlap.
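A minimal sketch, assuming scikit-learn's StratifiedKFold (the iris dataset, fold count, and seed are illustrative, not taken from the notebook):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)

# shuffle=True shuffles the data exactly once, before the folds are cut
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # each test fold keeps the overall class proportions of y
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}")
```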

Another variant is the stratified shuffle split:

[Figure: stratified shuffle split] (Source: Z² Little, 2020)

In a shuffle split, by contrast, the data is reshuffled before every split, which means the test sets may overlap between splits.
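A minimal sketch combining a stratified shuffle split with a grid search, assuming scikit-learn's StratifiedShuffleSplit and GridSearchCV; the SVC estimator and parameter grid are illustrative choices, not from the notebook:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# reshuffles before every split, so test sets may overlap across the 10 splits
cv = StratifiedShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# grid search exhaustively scores every hyperparameter combination under this CV scheme
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```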

Akaike information criterion (AIC) and Bayesian information criterion (BIC)

See the related notebook 10/aic_and_bic.ipynb for a simple application.
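For a model with k estimated parameters, maximized likelihood L̂, and n observations, AIC = 2k − 2 ln(L̂) and BIC = k ln(n) − 2 ln(L̂). Both trade goodness of fit against model complexity, lower values are better, and BIC penalizes complexity more heavily as n grows. A minimal sketch using statsmodels, whose OLS results expose both criteria directly (the synthetic data here is illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(size=100)

# fit ordinary least squares; statsmodels reports AIC/BIC on the results object
results = sm.OLS(y, sm.add_constant(X)).fit()
print(f"AIC = {results.aic:.2f}, BIC = {results.bic:.2f}")  # lower is better
```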

XGBoost

See the related notebooks 10/xg_boost-classifier.ipynb (classifier) and 10/xg_boost-regressor.ipynb (regressor).
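A minimal classifier sketch, assuming the xgboost package's scikit-learn-style XGBClassifier; the dataset and hyperparameters are illustrative starting values, not those used in the notebooks:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# gradient-boosted trees; these hyperparameters are common starting points
clf = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
clf.fit(X_train, y_train)

print(accuracy_score(y_test, clf.predict(X_test)))
```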