ML: Algorithm By Function

Regression

In Statistics, Regression is defined as "a measure of the relation between the mean value of one variable (e.g. output) and corresponding values of other variables (e.g. time and cost)."

Hence these algorithms are concerned with modelling the relationship between variables, which is iteratively refined using a measure of error in the predictions made by the model.

  • Ordinary Least Squares Regression (OLSR)
  • Linear Regression
  • Logistic Regression
  • Stepwise Regression
  • Multivariate Adaptive Regression Splines (MARS)
  • Locally Estimated Scatterplot Smoothing (LOESS)
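
As an illustration, here is a minimal sketch of ordinary least squares fitting with scikit-learn; the data below is synthetic toy data, not from any particular source:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: y is roughly 2*x + 1 plus noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + 1 + rng.normal(0, 0.5, size=50)

model = LinearRegression()   # fits by minimising squared prediction error
model.fit(X, y)

print(model.coef_, model.intercept_)   # learned slope and intercept
print(model.predict([[5.0]]))          # prediction for a new input
```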

Instance-based (or Memory-based)

Instead of performing explicit generalisation (or approximation, as seen in regression), these algorithms compare new problem instances (data) with instances seen in training (training data points), which have been stored in memory (a database). A similarity measure is used in this comparison to make a prediction.

  • k-Nearest Neighbor (kNN)
  • Learning Vector Quantization (LVQ)
  • Self-Organizing Map (SOM)
  • Locally Weighted Learning (LWL)
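
For example, a minimal kNN sketch with scikit-learn; the Iris dataset and a neighbourhood of 5 are just illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training" just stores the instances; prediction compares a new point
# to the stored ones using a distance metric (Euclidean by default).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))   # accuracy on held-out instances
```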

Regularisation (Penalisation)

In statistics, regularisation is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.

The goal of a good ML model is to capture the underlying pattern and ignore the noise. Any time an algorithm tries to fit the noise in addition to the pattern, it is overfitting.

Overfitting is reduced by penalising complex models and favouring simpler models that are also better at generalising. This is called regularisation.

In simple terms, these are extensions to other algorithms (typically regression algorithms) that penalise models based on their complexity.

The most popular regularisation algorithms are:

  • Ridge Regression
  • Least Absolute Shrinkage and Selection Operator (LASSO)
  • Elastic Net
  • Least-Angle Regression (LARS)
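
A minimal sketch contrasting plain OLS with Ridge (L2 penalty) and LASSO (L1 penalty) in scikit-learn; the synthetic data below has only two informative features, and the penalty strengths are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))   # 20 features, most of them irrelevant
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty drives many to exactly zero

# Count non-zero coefficients: OLS keeps all 20, LASSO prunes most of them
print(np.sum(np.abs(ols.coef_) > 1e-6))
print(np.sum(np.abs(lasso.coef_) > 1e-6))
```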

Decision Tree

The aim of these algorithms is to build a tree-like model of decisions by recursively splitting the training data into subsets based on feature (attribute) values. In this decision tree, each node represents a feature (attribute) on which the data is split, each link (branch) represents a decision (rule), and each leaf represents an outcome (a categorical or continuous value).

Both a) how to pick the first attribute to split on and b) when to stop splitting depend on the specific algorithm.

The most popular decision tree algorithms are:

  • Classification and Regression Tree (CART)
  • Iterative Dichotomiser 3 (ID3)
  • C4.5 and C5.0 (different versions of a powerful approach)
  • Chi-squared Automatic Interaction Detection (CHAID)
  • Decision Stump
  • M5
  • Conditional Decision Trees
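
As an illustration, scikit-learn's DecisionTreeClassifier (an optimised variant of CART) makes the node/branch/leaf structure visible; the depth limit and the short feature names below are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# max_depth is one answer to question b) above: stop after two levels.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each internal node is a feature threshold; each leaf is an outcome.
print(export_text(tree, feature_names=["sl", "sw", "pl", "pw"]))
```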

Bayesian

Bayesian methods are those that explicitly apply Bayes’ Theorem for problems such as classification and regression.

The most popular Bayesian algorithms are:

  • Naive Bayes
  • Gaussian Naive Bayes
  • Multinomial Naive Bayes
  • Averaged One-Dependence Estimators (AODE)
  • Bayesian Belief Network (BBN)
  • Bayesian Network (BN)
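
For example, a minimal Gaussian Naive Bayes sketch with scikit-learn; the Iris dataset is an illustrative choice:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Applies Bayes' theorem assuming features are conditionally independent
# and normally distributed within each class.
nb = GaussianNB().fit(X_train, y_train)
print(nb.score(X_test, y_test))
print(nb.predict_proba(X_test[:1]))   # posterior P(class | features)
```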

Clustering

Clustering, like regression, describes both a class of problem and a class of methods. Clustering methods are typically organised by modelling approach, such as centroid-based and hierarchical. All of these methods are concerned with using the inherent structures in the data to best organise it into groups of maximum commonality.

The most popular clustering algorithms are:

  • k-Means
  • k-Medians
  • Expectation Maximisation (EM)
  • Hierarchical Clustering
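
A minimal k-Means sketch with scikit-learn; the two synthetic blobs below are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two synthetic blobs with no labels -- clustering is unsupervised.
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(5, 1, size=(50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # one centroid per discovered group
print(km.labels_[:5])        # cluster assignment for each point
```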

Association Rule Learning

Association rule learning methods extract rules that best explain observed relationships between variables in data. These rules can discover important and commercially useful associations in large multidimensional datasets that can be exploited by an organization.

The most popular association rule learning algorithms are:

  • Apriori algorithm
  • Eclat algorithm
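
To make the idea concrete, here is a small pure-Python sketch of the Apriori principle (frequent itemsets and rule confidence); the basket data and the 0.6 support threshold are made up purely for illustration:

```python
from itertools import combinations

# Made-up market-basket transactions (illustrative only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
min_support = 0.6  # an itemset is "frequent" if it appears in >= 60% of baskets

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

# Apriori principle: a pair can only be frequent if both of its items are.
items = {i for t in transactions for i in t}
frequent_items = {i for i in items if support({i}) >= min_support}
frequent_pairs = [set(p) for p in combinations(sorted(frequent_items), 2)
                  if support(set(p)) >= min_support]

# A rule X -> Y has confidence support(X and Y) / support(X).
for pair in frequent_pairs:
    for x in pair:
        y = (pair - {x}).pop()
        print(f"{x} -> {y}: confidence {support(pair) / support({x}):.2f}")
```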

Artificial Neural Networks (ANN)

These are models that are inspired by the structure and/or function of biological neural networks.

They are a class of pattern-matching algorithms commonly used for regression and classification problems. However, they form an enormous subfield comprising hundreds of algorithms and variations for all manner of problem types.

e.g.

  • Perceptron
  • Back-Propagation
  • Hopfield Network
  • Radial Basis Function Network (RBFN)
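
As an illustration, a from-scratch perceptron trained with the classic perceptron update rule; learning the AND function is an illustrative toy task:

```python
import numpy as np

# A single perceptron learning AND via the update rule:
#   w += lr * (target - prediction) * x
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])   # AND of the two inputs

w = np.zeros(2)
b = 0.0
lr = 0.1

for epoch in range(10):
    for xi, target in zip(X, y):
        pred = int(np.dot(w, xi) + b > 0)   # step activation
        w += lr * (target - pred) * xi
        b += lr * (target - pred)

print([int(np.dot(w, xi) + b > 0) for xi in X])   # converges to [0, 0, 0, 1]
```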

Deep Learning Algorithms

Deep Learning methods are a modern update to Artificial Neural Networks that exploit abundant cheap computation. They are concerned with building much larger and more complex neural networks. Many methods are concerned with semi-supervised learning problems where large datasets contain very little labeled data.

e.g.

  • Deep Boltzmann Machine (DBM)
  • Deep Belief Networks (DBN)
  • Convolutional Neural Network (CNN)
  • Stacked Auto-Encoders
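
A minimal CNN sketch, assuming TensorFlow/Keras is available; the 28x28 single-channel input and 10-class output follow the common MNIST convention and are illustrative choices:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),  # learn local filters
    tf.keras.layers.MaxPooling2D((2, 2)),                   # downsample feature maps
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),        # one score per class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(x_train, y_train, epochs=5) once image data is loaded.
```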

Dimensionality Reduction

Like clustering methods, dimensionality reduction methods seek and exploit the inherent structure in the data, but in this case in an unsupervised manner, in order to summarise or describe the data using less information.

This can be useful for visualising high-dimensional data or for simplifying data that can then be used in a supervised learning method. Many of these methods can be adapted for use in classification and regression.

e.g.

  • Principal Component Analysis (PCA)
  • Principal Component Regression (PCR)
  • Partial Least Squares Regression (PLSR)
  • Sammon Mapping
  • Multidimensional Scaling (MDS)
  • Projection Pursuit
  • Linear Discriminant Analysis (LDA)
  • Mixture Discriminant Analysis (MDA)
  • Quadratic Discriminant Analysis (QDA)
  • Flexible Discriminant Analysis (FDA)
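
For example, PCA with scikit-learn, projecting the 4-feature Iris data down to 2 components for visualisation:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # labels unused: this is unsupervised

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                      # (150, 2): same rows, fewer columns
print(pca.explained_variance_ratio_)   # variance retained by each component
```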

Ensemble

These are models composed of multiple weaker models that are independently trained and whose predictions are combined in some way to make the overall prediction.

Much effort is put into what types of weak learners to combine and the ways in which to combine them. This is a very powerful class of techniques and as such is very popular.

  • Boosting
  • Bootstrapped Aggregation (Bagging)
  • AdaBoost
  • Stacked Generalization (blending)
  • Gradient Boosting Machines (GBM)
  • Gradient Boosted Regression Trees (GBRT)
  • Random Forest
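
As an illustration, a Random Forest (bagged decision trees whose votes are combined) next to a single tree in scikit-learn; the dataset and the count of 100 estimators are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One weak-ish learner vs. an ensemble of 100 trees, each trained on a
# bootstrap sample, with predictions combined by majority vote.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

print(tree.score(X_test, y_test))
print(forest.score(X_test, y_test))
```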

Conclusion

A few algorithms do not fit under just one of the categories listed above but span a couple of them. For example, Support Vector Machines can be used for regression or classification.

Some are more finely tuned versions of the above algorithms, adapted to a specific domain, e.g. Natural Language Processing (NLP).

Hence this is not an exhaustive list, but it is a good way to classify common ML algorithms by how they function.
