List of Models - AgileDataScienceUB/ADS4 GitHub Wiki

Logistic Regression

Logistic regression, despite its name, is a linear model for classification rather than regression. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.
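As a rough illustration of the logistic function mentioned above, the sketch below maps a linear score to a probability. The weights, bias, and feature values are hypothetical and chosen only for the example; this is not the scikit-learn implementation.

```python
import math


def logistic(z):
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample under a linear model."""
    # Linear part: bias plus the weighted sum of the features.
    score = bias + sum(w * x for w, x in zip(weights, features))
    # The logistic function turns the score into a probability.
    return logistic(score)


# Hypothetical weights and features, for illustration only.
p = predict_proba(weights=[0.8, -0.4], bias=0.1, features=[1.0, 2.0])
```

A score of zero corresponds to a probability of 0.5, which is the usual decision boundary for a binary classifier.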

Read more in the scikit-learn User Guide.

We will use the implementation in the class LogisticRegression from the package sklearn. This implementation can fit binary, One-vs-Rest, or multinomial logistic regression with optional L2 or L1 regularization.

Parameters

Our model will optimize the following parameters:

  • penalty : str, ‘l1’ or ‘l2’. Specifies the norm used in the penalization.
  • C : positive float. Inverse of regularization strength; smaller values specify stronger regularization.
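One way to search over these two parameters is a cross-validated grid search. The sketch below is only an assumption about how the tuning might be set up: the synthetic dataset, the candidate values for C, the number of folds, and the choice of the liblinear solver (which supports both ‘l1’ and ‘l2’) are illustrative and do not come from the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate values are assumptions; adjust them to the real problem.
param_grid = {
    "penalty": ["l1", "l2"],
    "C": [0.01, 0.1, 1.0, 10.0],
}

# liblinear is chosen here because it handles both l1 and l2 penalties.
search = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_  # best combination of penalty and C
best_score = search.best_score_    # mean cross-validated accuracy for it
```

After fitting, search.best_estimator_ holds a model refit on the full data with the selected parameters.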

XGBoost

XGBoost (eXtreme Gradient Boosting) is an advanced implementation of the gradient boosting algorithm. We will use the implementation in the package xgboost.

Parameters

There are several tuning techniques for this method. See the following guide

Random forest classifier

A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size, but the samples are drawn with replacement if bootstrap=True.

Parameters

  • n_estimators : integer. The number of trees in the forest.
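The effect of n_estimators can be explored by comparing cross-validated accuracy for a few forest sizes. The sketch below assumes a synthetic dataset, three arbitrary candidate sizes, and 5-fold cross-validation; none of these come from the project itself.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data, used only for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Mean cross-validated accuracy for each candidate number of trees.
scores_by_size = {}
for n_trees in (10, 50, 100):
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        bootstrap=True,  # each tree sees a sample drawn with replacement
        random_state=0,
    )
    scores_by_size[n_trees] = cross_val_score(forest, X, y, cv=5).mean()
```

More trees generally cost more training time, so the comparison helps pick the smallest forest that gives acceptable accuracy.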