Introduction to Machine Learning

A brief Machine Learning Landscape


An Introduction to Machine Learning

Machine learning (ML) focuses on the study of computer algorithms that are fed collections of data and can improve automatically or through user intervention.

Machine learning can be seen as a part of Artificial Intelligence (AI), and within machine learning there is a further specialized field of knowledge called deep learning (DL).

With machine learning algorithms, models are built from sample data, known as training data. The goal of these models is to make predictions or decisions without being explicitly programmed to perform the task.

Machine learning algorithms are used in a wide variety of current applications; for example, they can be used for:

  • Object Recognition, a computer vision task to automatically identify objects from a digital image or video.
  • Automatic Summarization, for extracting the most relevant information from a text document.
  • Prediction, in which a future event is predicted by analyzing current or historical data.
  • Classification, which is used to identify an object based on its main features and assign it to a set of similar objects.
  • Clustering, the action of grouping objects with similar characteristics.
  • Recommender systems, a widespread application in digital platforms that seeks to predict the "rating" or "preference" a user will assign to a specific item.
  • and more.

(Image Credit: mygreatlearning.com)


The Scikit-learn Library

Scikit-learn

In Python, the Scikit-learn library, built on top of NumPy and SciPy, is the standard tool for solving machine learning problems. It features many machine learning algorithms, such as support vector machines and random forests, as well as utilities for general pre- and post-processing of data. It also includes a large set of worked examples that we can adapt to our own use cases.

Scikit-learn's implementation of neural networks is limited. Unlike scikit-learn, TensorFlow and PyTorch allow you to define custom network architectures, and they support GPU acceleration for training at a much larger scale.

There is also a well-known specialized Python library dedicated to digital image processing called scikit-image. It includes algorithms for image segmentation, geometric transformations, color space manipulation, image analysis, filtering, morphology, feature detection, and more.

Scikit-learn also provides a collection of datasets for learning machine learning, which can be accessed through the sklearn.datasets module; a minimal example follows.
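As a minimal sketch of loading one of these bundled datasets (the iris dataset here is just an illustrative choice):

```python
from sklearn import datasets

# Load the bundled iris dataset and inspect its contents.
iris = datasets.load_iris()
print(iris.data.shape)       # (150, 4) feature matrix
print(iris.target.shape)     # (150,) class labels
print(iris.feature_names)    # names of the 4 measurements
print(iris.target_names)     # the 3 iris species
```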

Machine Learning Process - Infograph


Classical Machine Learning Algorithms

Classical Machine Learning methods can be classified into three major classes:

  • Supervised Learning. In supervised learning, the algorithms construct a function from labeled input-output examples and then use it to predict the output for unlabeled inputs.
  • Unsupervised Learning. In unsupervised learning, the algorithms learn to detect regularities or patterns in untagged data.
  • Weakly Supervised Learning. This approach lies between supervised and unsupervised learning. It occurs when only a noisy, limited, or imprecise set of labels is available to supervise the labeling of large amounts of training data.

Supervised Machine Learning Algorithms

  1. Classification. Classification algorithms predict to which set of items, classes, or categories an object belongs (a short code sketch follows the list of examples below).

Some examples of Classification algorithms:

  • Decision Trees | Examples
  • K-Nearest Neighbors | Examples (Infograph)
  • Naive Bayes | Examples
  • Perceptron | Examples
  • Random Forest | Example
  • Support Vector Machine | Examples (Infograph)
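To make the workflow concrete, here is a minimal sketch using one of the listed algorithms, K-Nearest Neighbors (the dataset, split, and k=5 are illustrative choices, not prescribed by this page):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Binary classification: predict whether a tumor is malignant or benign.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# KNN assigns each sample the majority class among its k nearest
# training points; k=5 is an arbitrary illustrative choice.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print(f"Test accuracy: {knn.score(X_test, y_test):.2f}")
```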

  2. Regression (Prediction). Regression algorithms are used to predict continuous values (see the sketch after the examples).

Some examples of Regression algorithms:

  • Linear Regression | Examples (Infograph)
  • Least Squares | Examples
  • Polynomial Regression | Examples (Infograph)
  • Nonlinear Regression
  • Logistic Regression | Examples (Infograph)
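A minimal sketch of linear regression on synthetic data (the data-generating line y = 3x + 2 is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3 * X.ravel() + 2 + rng.normal(scale=1.0, size=100)

# An ordinary least-squares fit should approximately recover
# the true slope (3) and intercept (2).
reg = LinearRegression().fit(X, y)
print(f"slope = {reg.coef_[0]:.2f}, intercept = {reg.intercept_:.2f}")
```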


Supervised Learning Evaluation

Classifier evaluation metrics (all of them are computed in the sketch after this list):

  • Precision and Recall.
  • Precision = TP/(TP + FP) is the ratio of correctly identified positive cases to all predicted positive cases.
  • Recall = TP/(TP + FN), also known as sensitivity, is the ratio of correctly identified positive cases to all actual positive cases.
  • Accuracy. Accuracy is a statistical measure defined as the number of correct predictions (both True positives (TP) and True negatives (TN)) divided by the total number of predictions made by the classifier, including False positives (FP) and False negatives (FN): Accuracy = (TP + TN)/(TP + TN + FP + FN).
  • F1-score. F1 = 2(Precision × Recall)/(Precision + Recall) = 2TP/(2TP + FP + FN).
  • Area under the curve (AUC).
  • The Confusion Matrix, a table layout that allows visualization of the performance of a classification algorithm.
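A minimal sketch computing these metrics with sklearn.metrics (the label vectors are toy values chosen for illustration):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Toy ground-truth labels and classifier predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))
print("F1-score: ", f1_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```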

Regression learning evaluation: common metrics include the mean absolute error (MAE), the mean squared error (MSE), and the coefficient of determination (R²); a short sketch follows.
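A corresponding sketch for regression metrics (again with toy values):

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy true targets and model predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

print("MAE:", mean_absolute_error(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))
print("R2: ", r2_score(y_true, y_pred))
```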

More on Evaluation Metrics in Scikit-Learn.


Jupyter Notebook Regression Examples


Unsupervised Machine Learning Algorithms

  1. Dimensionality Reduction is the transformation of a dataset from a high-dimensional space into a low-dimensional space, such that the low-dimensional representation preserves the meaningful properties of the original data. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics. (A PCA sketch follows the list of examples below.)

Some examples of Dimensionality Reduction algorithms:

  • Linear Discriminant Analysis - LDA | Examples
  • Latent Semantic Analysis - LSA | Examples
  • Principal Component Analysis - PCA | Examples
  • Singular Value Decomposition - SVD
  • t-distributed Stochastic Neighbor Embedding - t-SNE
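A minimal PCA sketch (dataset and number of components are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Project the 4-dimensional iris data down to 2 principal components.
X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                      # (150, 2)
print(pca.explained_variance_ratio_)   # variance retained per component
```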

  2. Association Rule Learning is a rule-based machine learning method for discovering the rules that determine how or why certain items are connected (a sketch follows the list of examples).

Some examples of Association Rule Learning algorithms:

  • Apriori
  • Equivalence Class Transformation - Eclat
  • Frequent Pattern Growth - FP-Growth
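Scikit-learn does not ship association rule mining; a common choice is the third-party mlxtend package. A minimal Apriori sketch, assuming mlxtend is installed and using an invented toy basket table:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One-hot encoded market-basket data: each row is one transaction.
baskets = pd.DataFrame(
    [[True, True, False],
     [True, True, True],
     [False, True, True],
     [True, True, False]],
    columns=["bread", "milk", "eggs"],
)

# Find itemsets present in at least 50% of transactions, then derive rules.
frequent = apriori(baskets, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```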

  3. Clustering is the action of defining groups of objects (clusters) that are very similar to each other, compared with objects from other groups (a k-means sketch follows the list below).

Some examples of Clustering algorithm families (Comparison):

  • Connectivity models: for example, hierarchical clustering builds models based on distance connectivity. | Examples (Infograph)
  • Centroid models: for example, the k-means algorithm represents each cluster by a single mean vector. | Examples (Infograph)
  • Distribution models: clusters are modeled using statistical distributions, such as multivariate normal distributions.
  • Density models: for example, Density-Based Spatial Clustering of Applications with Noise - DBSCAN and Ordering Points To Identify the Clustering Structure - OPTICS, which define clusters as connected dense regions in the data space. | DBSCAN Examples | OPTICS Examples
  • Soft clustering: each object belongs to each cluster to a certain degree; for example, Fuzzy Clustering.

There is a large collection of Cluster Analysis Algorithms.
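A minimal k-means sketch, a centroid model from the list above (the blob data is synthetic, generated for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 300 points around 3 synthetic centers.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-means assigns each point to the nearest of k centroids; k=3 here
# matches how the data was generated.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
print(kmeans.cluster_centers_)   # the 3 learned mean vectors
```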

Unsupervised Learning Evaluation (Clustering)
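Since clustering has no ground-truth labels, internal metrics are used. One common choice (an illustrative pick, not prescribed by this page) is the silhouette coefficient, which ranges from -1 to 1 and rewards compact, well-separated clusters:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Cluster synthetic data, then score the result without using labels.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(f"Silhouette score: {silhouette_score(X, labels):.2f}")
```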


Unsupervised Learning Jupyter Notebook Examples

Dimensionality Reduction and Clustering Examples


Reinforcement Learning

Reinforcement learning is the third Machine Learning paradigm, alongside supervised learning and unsupervised learning.

Reinforcement learning trains an agent by letting it interact with an environment and rewarding its actions; the agent learns a policy that maximizes cumulative reward while balancing exploration (of uncharted territory) against exploitation (of current knowledge).

Some examples of Reinforcement Learning algorithms (a toy sketch follows):

  • Monte Carlo methods
  • Q-learning
  • State-Action-Reward-State-Action - SARSA
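A self-contained toy sketch of tabular Q-learning (the 5-state corridor environment and all hyperparameters are invented for illustration):

```python
import numpy as np

# Toy environment: a 1-D corridor of 5 states; reward 1 only for
# reaching the rightmost state. Actions: 0 = left, 1 = right.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for _ in range(500):                     # episodes
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy: mostly exploit Q, occasionally explore randomly.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update: move Q[s, a] toward r + gamma * max_a' Q[s', a'].
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))   # "right" (column 1) should dominate in every state
```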

Decision Tree Learning

Decision Tree learning (Infograph)
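Since the infographic is external, here is a minimal scikit-learn sketch of decision tree learning (the dataset and max_depth are illustrative choices); export_text prints the learned if/else rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow decision tree and print its learned decision rules.
data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)
print(export_text(tree, feature_names=data.feature_names))
```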


Ensemble Learning

Ensemble learning combines several base models or algorithms to achieve better predictive performance than any of the constituent models alone. Examples of Ensemble learning algorithms (a sketch follows the list):

  • Adaptive Boosting - AdaBoost (Examples)
  • Gradient Boosting (Examples)
  • XGBoost - eXtreme Gradient Boosting
  • LightGBM - Light Gradient Boosting Machine
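A minimal gradient boosting sketch (dataset and hyperparameters are illustrative choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Gradient boosting builds an ensemble of shallow trees, each new tree
# correcting the residual errors of the ensemble built so far.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                random_state=0)
gb.fit(X_train, y_train)
print(f"Test accuracy: {gb.score(X_test, y_test):.2f}")
```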


References

Cheat Sheets


Created: 04/24/2022 (C. Lizárraga); Last update: 06/07/2022 (C. Lizárraga)

CC BY-NC-SA