Machine Learning - bobbae/gcp GitHub Wiki

Machine learning (ML) is the study of computer algorithms that improve automatically through experience and data.

Machine learning is an application of Artificial Intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.

Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

https://www.youtube.com/watch?v=9MWj__4s9hk&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie

Guidelines for developing ML solutions

https://cloud.google.com/architecture/guidelines-for-developing-high-quality-ml-solutions

Collection of Machine Learning resources

https://github.com/collections/machine-learning

Google Machine Learning Platform Overview

https://www.youtube.com/watch?v=QR_LQQ-vvko

Vertex AI

Vertex AI brings AutoML and AI Platform together into a unified API, client library, and user interface.

AI Hub

AI Hub is a platform that lets us centralize our code and knowledge in a way that can step up the pace of deployment and learnings globally.

AI Platform

AI Platform is a development platform to build AI applications that run on GCP and on-premises.

AutoML

AutoML lets you train high-quality custom machine learning models with minimal effort and machine learning expertise.

BigQuery ML

BigQuery ML lets you create and execute machine learning models in BigQuery using standard SQL queries.

OpenXLA

https://cloud.google.com/blog/products/ai-machine-learning/googles-open-source-momentum-openxla-new-partnerships/

GCP ML Solutions

AutoML

AutoML can be used to create your own custom machine learning models that are tailored to your business needs, and then integrate those models into your applications.

AI Platform and Machine Learning

AI Platform enables many parts of the machine learning (ML) workflow.

ML Solutions Overview

https://cloud.google.com/ai-platform/docs/ml-solutions-overview

Machine Learning Options

https://www.youtube.com/watch?v=pm_-pVPvZ-4

CloudML Engine

https://www.youtube.com/watch?v=m0rqccviLNM

Cloud AutoML to custom model

https://www.youtube.com/watch?v=OHIEZ-Scek8

AutoML NL for custom text classification

https://www.youtube.com/watch?v=ieaqfU1BwJ8

Custom Sentiment Analysis with AutoML Natural Language

https://www.youtube.com/watch?v=CReeC8YuEd8

Vision AI

Cloud Vision includes several options that you can use to integrate machine learning vision models into your applications.

https://www.youtube.com/watch?v=kgxfdTh9lz0

https://www.youtube.com/watch?v=BN8aO0LULyw

Video AI

Video Intelligence includes several options that you can use to integrate machine learning video intelligence models into your applications.

https://www.youtube.com/watch?v=h1zU0Qor9J8

Cloud Natural Language API

The Cloud Natural Language API provides natural language understanding technologies to developers, including sentiment analysis, entity analysis, entity sentiment analysis, content classification, and syntax analysis.

https://cloud.google.com/natural-language/docs

Example of using Classification of Bag of Words via Keras

https://www.youtube.com/watch?v=UFtXy0KRxVI

Gain Insights from Text with Cloud Natural Language API

Qwiklabs GSP097
https://www.qwiklabs.com/focuses/582?parent=catalog

Entity Analysis

https://www.youtube.com/watch?v=3iOtK0sRNMI

RNN & Natural Language generation

https://www.youtube.com/watch?v=MNvT5JekDpg

Cloud Translation

Cloud Translation can dynamically translate text between thousands of language pairs.

AutoML Translation vs Translation API

The Translation API covers a huge number of language pairs and does a great job with general-purpose text. Where AutoML Translation really shines is for the "last mile" between generic translation tasks and specific, niche vocabularies.

Using Python and Translation API

https://www.youtube.com/watch?v=YapTts_An9A

Speech AI

https://cloud.google.com/blog/products/ai-machine-learning/your-ultimate-guide-to-speech-on-google-cloud

https://cloud.google.com/blog/products/ai-machine-learning/learn-how-google-cloud-customers-use-speech-ai-in-innovative-ways

Text-to-Speech

Text-to-Speech converts text or Speech Synthesis Markup Language (SSML) input into audio data of natural human speech.

Cloud Text-to-Speech API using C#

https://www.youtube.com/watch?v=OK1ZmlaFIV8

Speech-to-Text

https://cloud.google.com/speech-to-text/docs

Convert speech to text using Node.js

https://www.youtube.com/watch?v=naZ8oEKuR44

Cloud speech API Codelabs

https://cloud.google.com/blog/products/ai-machine-learning/top-google-cloud-speech-api-codelabs

Speech on Device

https://cloud.google.com/blog/products/ai-machine-learning/speech-on-device-run-server-quality-speech-ai-locally/

Difference between AutoML and Cloud Natural language API

Google AutoML Natural Language is much more powerful than the Natural Language API because it allows the user to train models that are customized for their specific dataset and domain.

Natural Language API

The Google Natural Language API is an easy-to-use interface to a set of powerful, pre-trained NLP models.

The major advantage of the Google Natural Language API is its ease of use. No machine learning skills are required and almost no coding skills.

The Google Natural Language API is a very convenient option for quick, out-of-the-box solutions.

AutoML Natural Language

If the Natural Language API is not flexible enough for your business purposes, then AutoML Natural Language might be the right service.

Machine Learning Crash Course

Step 1, read this comicbook.

Step 2, head over to this tutorial.

Step 3, look at these videos: https://www.youtube.com/playlist?list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF

Step 4, go through ML learning materials

https://hackernoon.com/where-to-learn-machine-and-deep-learning-for-free

https://github.com/eugeneyan/applied-ml

https://github.com/microsoft/ML-For-Beginners

https://youtube.com/channel/UC12LqyqTQYbXatYS9AA7Nuw

Machine Learning & Artificial Intelligence

A Data Scientist models and analyzes key data and continually improves the way the business utilizes data. Data Scientists aim to make accurate predictions about the future using in-depth data modeling and deep learning.

https://towardsdatascience.com/5-minutes-cheat-sheet-explaining-all-machine-learning-models-3fea1cf96f05

Predictive Analytics

https://en.wikipedia.org/wiki/Predictive_analytics

Machine learning Workflows

AI Platform enables many parts of the machine learning (ML) workflow.

https://cloud.google.com/ai-platform/docs/ml-solutions-overview

7 steps of Machine Learning

Gather data, prepare data, choose the model, train the model, evaluate, tune parameters, review prediction or inference.

https://towardsdatascience.com/the-7-steps-of-machine-learning-2877d7e5548e

https://www.youtube.com/watch?v=nKW8Ndu7Mjw

Dataset Search engine

https://datasetsearch.research.google.com/

Supervised vs. Unsupervised learning

A supervised machine learning algorithm (as opposed to an unsupervised machine learning algorithm) is one that relies on labeled input data to learn a function that produces an appropriate output when given new unlabeled data.

https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d

The most common tasks within unsupervised learning are clustering, representation learning, and density estimation. In all of these cases, we wish to learn the inherent structure of our data without using explicitly-provided labels. Some common algorithms include k-means clustering, principal component analysis, and autoencoders. Since no labels are provided, there is no specific way to compare model performance in most unsupervised learning methods.

Two common use-cases for unsupervised learning are exploratory analysis and dimensionality reduction.

In situations where it is either impossible or impractical for a human to propose trends in the data, unsupervised learning can provide initial insights that can then be used to test individual hypotheses.

Dimensionality reduction, which refers to the methods used to represent data using fewer columns or features, can be accomplished through unsupervised methods. In representation learning, we wish to learn relationships between individual features, allowing us to represent our data using the latent features that interrelate our initial features. This sparse latent structure is often represented using far fewer features than we started with, so it can make further data processing much less intensive, and can eliminate redundant features.

Feature Engineering

Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.

Feature Engineering using TFX Pipeline and TensorFlow Transform

https://www.tensorflow.org/tfx/tutorials/tfx/penguin_tft

Feature engineering tutorials

One-Hot Encoding

In digital circuits and machine learning, a one-hot is a group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low (0).
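As a minimal sketch of the idea, the snippet below encodes a categorical value as a one-hot vector. The `one_hot` helper and the `colors` vocabulary are hypothetical names used only for illustration.

```python
def one_hot(value, categories):
    """Return a list with a single 1 at the position of `value` in `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if c == value else 0 for c in categories]

colors = ["red", "green", "blue"]  # hypothetical category vocabulary
encoded = [one_hot(c, colors) for c in ["green", "blue", "red"]]
print(encoded)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Every encoded vector has exactly one high bit, which matches the definition above.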

https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/

https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f

One shot learning

One-shot learning is a classification task where one or a few examples are used to classify many new examples in the future.

https://en.wikipedia.org/wiki/One-shot_learning

Binning

Binning (also called bucketing) is the process of converting a continuous feature into multiple binary features called bins or buckets, typically based on value range.
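A small sketch of value-range binning, assuming the bucket boundaries are chosen by hand. The `age_edges` boundaries and helper names are hypothetical.

```python
from bisect import bisect_right

def bin_index(value, edges):
    """Index of the bucket `value` falls into; `edges` are sorted boundaries,
    so there are len(edges) + 1 buckets in total."""
    return bisect_right(edges, value)

def bin_one_hot(value, edges):
    """Turn a continuous value into binary bucket features."""
    idx = bin_index(value, edges)
    return [1 if i == idx else 0 for i in range(len(edges) + 1)]

age_edges = [18, 35, 65]  # hypothetical buckets: <18, 18-34, 35-64, 65+
print(bin_one_hot(42, age_edges))  # [0, 0, 1, 0]
```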

https://towardsdatascience.com/binning-for-feature-engineering-in-machine-learning-d3b3d76f364a

Normalization

Normalization is the process of converting an actual range of values which a numerical feature can take, into a standard range of values, typically in the interval [-1, 1] or [0, 1].

By normalizing all of our inputs to a standard scale, we're allowing the network to more quickly learn the optimal parameters for each input node.
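One common way to do this is min-max scaling. The sketch below assumes that approach; `min_max_scale` and `heights_cm` are illustrative names, not part of any library.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale `values` so the smallest maps to `low` and the largest to `high`."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant feature carries no information; map everything to `low`.
        return [low for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]

heights_cm = [150.0, 160.0, 170.0, 200.0]  # hypothetical feature column
print(min_max_scale(heights_cm))           # [0.0, 0.2, 0.4, 1.0]
print(min_max_scale(heights_cm, -1.0, 1.0))
```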

Standardization

Standardization (or z-score normalization) is the procedure during which the feature values are rescaled so that they have the properties of a standard normal distribution: a mean of 0 and a standard deviation of 1.
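A minimal sketch of z-score scaling using only the standard library. It assumes the population standard deviation is the one wanted; the `standardize` name is illustrative.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Rescale `values` to zero mean and unit (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)
```

Note that this only shifts and scales the feature; it does not change the shape of its distribution.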

Dealing with Missing Features

In some cases, the data comes to the analyst in the form of a dataset with features already defined. In some examples, values of some features can be missing.

https://towardsdatascience.com/7-ways-to-handle-missing-values-in-machine-learning-1a6326adf79e

Data Imputation Techniques

One technique consists in replacing the missing value of a feature by an average value of this feature in the dataset.
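A sketch of mean imputation, assuming missing values are represented as `None`. The `impute_mean` helper and the `ages` column are hypothetical.

```python
from statistics import fmean

def impute_mean(column):
    """Replace None entries with the mean of the observed values in the column."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = fmean(observed)
    return [mean if v is None else v for v in column]

ages = [30.0, None, 50.0, None, 40.0]  # hypothetical feature with gaps
print(impute_mean(ages))  # [30.0, 40.0, 50.0, 40.0, 40.0]
```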

https://towardsdatascience.com/6-different-ways-to-compensate-for-missing-values-data-imputation-with-examples-6022d9ca0779

Training Datasets

Once you have got your annotated dataset, you can split the dataset into three subsets: training, validation, and test.
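A minimal sketch of such a split. The 70/15/15 proportions and the `split_dataset` name are assumptions for illustration, not a recommendation from the source.

```python
import random

def split_dataset(examples, train=0.7, validation=0.15, seed=0):
    """Shuffle the examples and cut them into training, validation, and test subsets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```

Shuffling before cutting avoids accidentally putting, say, all the most recent examples into the test set, although for time-ordered data a chronological split may be the better choice.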

Underfitting and Overfitting

https://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/

Regularization

https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization

L1 and L2 regularization methods are also combined in what is called elastic net regularization with L1 and L2 regularizations being special cases.

https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c

Evaluation of Models

Once you have a model built using the training set, how can you say how good the model is? You use the test set to assess the model.

https://heartbeat.fritz.ai/introduction-to-machine-learning-model-evaluation-fa859e1b2d7f

Accuracy

Accuracy is not necessarily a relevant or good way of evaluating a model. Accuracy is given by the number of correctly classified examples divided by the total number of classified examples.

Accuracy may be useful when errors in predicting all classes are equally important. In the case of spam/not spam this may not hold. You would tolerate false positives less than false negatives. A false positive may mean you don't get an important email. A false negative is no big deal, even though it is annoying to get a spam message.

Accuracy can be misleading when classes are not equally important or not equally frequent. Predicting clicks from a click stream is skewed because there are very few real positive clicks per rendered page. In other words, almost no clicks can be the norm. In that case, a model that is 99.999% accurate can be created by returning "no click" as the answer every time.
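The click example can be reproduced with a few lines. The data below is made up (one click in 1,000 impressions) purely to illustrate the effect.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical click log: 1 click in 1,000 rendered pages.
y_true = [1] + [0] * 999
always_no_click = [0] * 1000  # a "model" that never predicts a click

print(accuracy(y_true, always_no_click))  # 0.999
```

The score looks excellent even though the model never finds a single click.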

Accuracy and precision in statistics

https://en.wikipedia.org/wiki/Accuracy_and_precision

Bias variance trade-off

https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff

Confusion Matrix

Confusion Matrix is a table that summarizes how successful the classification model is at predicting examples belonging to various classes.

Confusion Matrices can be used to calculate two important performance metrics: precision and recall.

Precision & Recall

The two most frequently used metrics to assess the model are precision and recall. Precision is the ratio of correct positive predictions to the overall number of positive predictions. Recall is the ratio of correct positive predictions to the overall number of positive examples in the test set.
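A short sketch that derives both metrics from the four confusion-matrix counts for a binary problem. The labels and helper names are made up for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision(tp, fp):
    """Correct positive predictions / all positive predictions."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Correct positive predictions / all actual positives."""
    return tp / (tp + fn) if tp + fn else 0.0

y_true = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical ground truth
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]  # hypothetical model output
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)                     # 3 1 1 3
print(precision(tp, fp), recall(tp, fn))  # 0.75 0.75
```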

https://www.youtube.com/watch?v=j-EB6RqqjGI&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=3

https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c

Imagine that you are given an image and asked to detect all the cars within it. Which metric do you use? Because the goal is to detect all the cars, use recall. This may misclassify some objects as cars, but it eventually will work towards detecting all the target objects.

Now say you're given a mammography image, and you are asked to detect whether there is cancer or not. Which metric do you use? Because the task is sensitive to incorrectly identifying an image as cancerous, we must be sure when classifying an image as Positive (i.e. has cancer). Thus, precision is the preferred metric.

F-measure

F-measure is the harmonic mean of Precision and Recall and gives a better measure of the incorrectly classified cases than the Accuracy Metric.
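A minimal sketch of the F1 form of the F-measure. The input values are illustrative.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.5, 1.0), 3))  # 0.667
print(f1(1.0, 0.0))            # 0.0
```

Unlike the arithmetic mean, which would give 0.75 and 0.5 for these two cases, the harmonic mean is pulled toward the weaker of the two values.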

https://machinelearningmastery.com/precision-recall-and-f-measure-for-imbalanced-classification/

Accuracy, precision, recall or F1

https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9

Log loss

Log-loss is indicative of how close the prediction probability is to the corresponding actual/true value (0 or 1 in case of binary classification). The more the predicted probability diverges from the actual value, the higher the log-loss value.
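A sketch of binary log-loss, assuming labels are 0 or 1 and predictions are probabilities of class 1. The clipping constant `eps` is an illustrative choice that avoids taking log(0).

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# The true label is 1 in every case; the loss grows as the prediction drifts away from it.
for p in (0.9, 0.6, 0.1):
    print(p, round(log_loss([1], [p]), 3))
```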

https://towardsdatascience.com/intuition-behind-log-loss-score-4e0c9979680a

Sensitivity and Specificity

https://dzone.com/articles/ml-metrics-sensitivity-vs-specificity-difference

ROC and AUC

Receiver Operating Characteristic curve and Area Under the Curve use a combination of the true positive rate and false positive rate to build up a summary picture of the model performance.

An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters: True Positive Rate and False Positive Rate.

AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example.

AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
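The ranking interpretation of AUC can be computed directly by comparing every positive example with every negative one. This is a quadratic-time sketch for illustration only; the scores are made up, and ties are counted as half a win.

```python
def auc(y_true, scores):
    """Probability that a random positive is scored higher than a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```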

https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc

https://towardsdatascience.com/intuition-behind-roc-auc-score-1456439d1f30

Evaluation of AutoML models

https://cloud.google.com/vertex-ai/docs/training/evaluating-automl-models

7 tips for ML training

https://cloud.google.com/blog/products/ai-machine-learning/7-tips-for-trouble-free-ml-model-training

Classification

Classification is a task that requires the use of machine learning algorithms that learn how to assign a class label to examples from the problem domain. An easy to understand example is classifying emails as “spam” or “not spam.”

Classification algorithms are used when you have a dataset of observations and would like to use the features associated with an observation to predict its class.

Basic Bayes Theorem

Bayes' theorem, named after 18th-century British mathematician Thomas Bayes, is a mathematical formula for determining conditional probability. Conditional probability is the likelihood of an outcome occurring, based on a previous outcome occurring.
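As a worked sketch, Bayes' theorem P(A|B) = P(B|A) P(A) / P(B) can be applied to a toy spam question. All probabilities below are invented for illustration.

```python
def bayes_posterior(prior, likelihood, likelihood_given_not):
    """P(A | B) from P(A), P(B | A) and P(B | not A)."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)  # P(B)
    return likelihood * prior / evidence

# Hypothetical numbers: 20% of mail is spam, the word "offer" appears in
# 60% of spam and in 5% of legitimate mail.
p_spam_given_offer = bayes_posterior(prior=0.2, likelihood=0.6,
                                     likelihood_given_not=0.05)
print(round(p_spam_given_offer, 2))  # 0.75
```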

https://www.youtube.com/watch?v=HZGCoVF3YvM

https://machinelearningmastery.com/naive-bayes-classifier-scratch-python/

Naive Bayes classification methods are quite simple (in terms of model complexity) and commonly used for tasks such as document classification and spam filtering. This algorithm works well for datasets with a large number of features (e.g. a body of text where every word is treated as a feature) but it is naive in the sense that it treats every feature as independent of one another. This is clearly not the case for language, where word order matters when trying to discern meaning from a statement. Nonetheless, these methods have been used quite successfully for various text classification tasks.

Regression & Classification

Regression and classification lead to ways of splitting data.

https://en.wikipedia.org/wiki/Regression_analysis

Classification is a problem of automatically assigning a label to an unlabeled example. Spam detection is a famous example of classification.

https://en.wikipedia.org/wiki/Statistical_classification

Linear Regression

Linear regression is used to predict an outcome given some input value(s). While machine learning classifiers use features to predict a discrete label for a given instance or example, machine learning regressors have the ability to use features to predict a continuous outcome for a given instance or example.
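For a single input feature, the least-squares line has a closed-form solution. The sketch below assumes that simple case; `fit_line` and the data points are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]        # generated from y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)          # 2.0 1.0
print(slope * 5.0 + intercept)   # continuous prediction for x = 5
```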

https://www.youtube.com/watch?v=K_EH2abOp00&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=2

https://www.youtube.com/watch?v=nk2CQITm_eo

Polynomial regression

Polynomial regression is very similar to linear regression, with a slight deviation in treatment of the feature-space.

https://towardsdatascience.com/polynomial-regression-with-scikit-learn-what-you-should-know-bed9d3296f2

Kernelization

https://www.youtube.com/watch?v=wBVSbVktLIY&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=5

Logistic Regression

The goal of logistic regression, as with any classifier, is to figure out some way to split the data to allow for an accurate prediction of a given observation's class using the information present in the features.

https://www.youtube.com/watch?v=YMJtsYIp4kg&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=4

https://www.youtube.com/watch?v=yIYKR4sgzI8

Decision Trees

Decision trees are one of the oldest and most widely-used machine learning models, due to the fact that they work well with noisy or missing data, can easily be ensembled to form more robust predictors, and are incredibly fast at runtime.

Decision trees are desirable in that they scale well to larger datasets, they are robust against irrelevant features, and it is very easy to visualize the reasoning behind a decision tree's predictions.

SVM

Support vector machine (SVM) classifiers work well in complicated feature domains, albeit requiring clear separation between classes.

SVM is a supervised machine learning model that uses classification algorithms for two-group classification problems.

Compared to newer algorithms like neural networks, they have two main advantages: higher speed and better performance with a limited number of samples.

https://www.youtube.com/watch?v=05VABNfa1ds&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=6

https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/

SVMs don't work well with noisy data, and training time scales roughly cubically, O(n^3), with the number of input samples, depending on your implementation.

Random forests

Random forests inherit the benefits of a decision tree model whilst improving upon the performance by reducing the variance.

https://www.youtube.com/watch?v=mld0TnA2jEs&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=8

Boosted trees

Boosting is an iterative process where models are trained in a sequential order.

Content Classification

Content Classification analyzes a document and returns a list of content categories that apply to the text found in the document.

https://cloud.google.com/natural-language/docs/classifying-text

Using AutoML to classify text.

https://www.youtube.com/watch?v=ieaqfU1BwJ8

Classification, redaction, and de-identification

The Cloud Data Loss Prevention (DLP) helps you understand, manage, and protect sensitive data. With the Cloud DLP, you can easily classify and redact sensitive data contained in text-based content and images, including content stored in Google Cloud storage repositories.

https://cloud.google.com/dlp/docs/classification-redaction

Clustering

Clustering is a popular technique to find groups or segments in your data that are similar. This is an unsupervised learning algorithm in the sense that you don't train the algorithm by giving it examples of what you'd like it to do; you just let the clustering algorithm explore the data and provide you with new insights.

K-means clustering

K-means clustering is a simple method for partitioning n data points in k groups, or clusters.
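A compact sketch of the usual iterative procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its points. Random initialization from the data, a fixed iteration count, and the sample points are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

The result depends on the initial centroids in general, so practical implementations usually rerun with several seeds and keep the best partition.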

https://www.youtube.com/watch?v=O6b2L_lYH9k&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=9

https://www.youtube.com/watch?v=4b5d3muPQmA

K-nearest neighbors

The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.

Dimensionality Reduction

Dimensionality reduction is used to reduce the dimension of our feature-space while maintaining the maximum amount of information.

Principal Components Analysis

Principal components analysis (PCA) allows us to take an n-dimensional feature-space and reduce it to a k-dimensional feature-space while maintaining as much information from the original dataset as possible in the reduced dataset.

Autoencoders

Autoencoders are an unsupervised learning technique in which we leverage neural networks for the task of representation learning.

Neural Networks

Neural networks are one of the most popular approaches to machine learning today, achieving impressive performance on a large variety of tasks.

https://www.youtube.com/watch?v=fkqZyYo_ebs&list=PLTl9hO2Oobd-GaTYQWIuIs2yyNy7TYbEj&index=1

Structure

https://developers.google.com/machine-learning/crash-course/introduction-to-neural-networks/anatomy

Neural Networks Representation

Neural networks are a biologically-inspired class of algorithms that attempt to mimic the functions of neurons in the brain. Each neuron acts as a computational unit, accepting input from the dendrites and outputting a signal through the axon terminals. Actions are triggered when a specific combination of neurons is activated.

Activation functions

Activation functions are used to determine the firing of neurons in a neural network. Given a linear combination of inputs and weights from the previous layer, the activation function controls how we'll pass that information on to the next layer.
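A sketch of a single neuron with two widely used activation functions, sigmoid and ReLU. The weights, bias, and inputs are arbitrary illustrative values.

```python
import math

def sigmoid(z):
    """Squash z into (0, 1). Overflows for very large negative z; fine for a sketch."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through unchanged and clamp negatives to zero."""
    return max(0.0, z)

def neuron(inputs, weights, bias, activation):
    """Linear combination of inputs and weights, followed by the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

inputs, weights, bias = [1.0, 2.0], [0.5, -1.0], 0.5  # z = 0.5 - 2.0 + 0.5 = -1.0
print(neuron(inputs, weights, bias, relu))     # 0.0
print(neuron(inputs, weights, bias, sigmoid))  # roughly 0.269
```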

Backpropagation

Backpropagation computes the gradient of the loss function with respect to the weights of the network for a single input–output example, and does so efficiently, unlike a naive direct computation of the gradient with respect to each weight individually. This efficiency makes it feasible to use gradient methods for training multilayer networks, updating weights to minimize loss; gradient descent, or variants such as stochastic gradient descent, are commonly used.

Gradient descent

Gradient descent is an optimization technique commonly used in training machine learning algorithms. Often when we're building a machine learning model, we'll develop a cost function which is capable of measuring how well our model is doing. This function will penalize any error our model makes by assigning a cost with respect to the current parameter values. By minimizing the cost function we can find the optimal parameters that yield the best model performance.
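A minimal sketch of the update rule on a one-parameter cost function. The cost (w - 3)^2, the starting point, and the step count are made up for illustration.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the cost."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Toy cost(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_best = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(w_best)  # very close to 3.0
```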

https://www.youtube.com/watch?v=sDv4f4s2SB8

Hyperparameter Tuning

It is often necessary to tune hyperparameters by experimentally finding the best combination of values, one per hyperparameter.

Learning Rate

One of the key hyperparameters to set in order to train a neural network is the learning rate for gradient descent. The learning rate parameter scales the magnitude of our weight updates in order to minimize the network's loss function.

If your learning rate is set too low, training will progress very slowly as you are making very tiny updates to the weights in your network. However, if your learning rate is set too high, it can cause undesirable divergent behavior in your loss function.
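The effect can be seen on a toy loss, (w - 3)^2, minimized with plain gradient steps. The three learning rates below are illustrative values chosen to show slow progress, fast convergence, and divergence on this particular function.

```python
def loss_history(learning_rate, steps=20, start=0.0):
    """Run gradient steps on loss(w) = (w - 3)^2 and record the loss after each step."""
    w = start
    history = []
    for _ in range(steps):
        w -= learning_rate * 2 * (w - 3)  # gradient of (w - 3)^2 is 2 * (w - 3)
        history.append((w - 3) ** 2)
    return history

too_low = loss_history(0.01)   # tiny updates: loss is still large after 20 steps
good = loss_history(0.4)       # loss shrinks quickly toward zero
too_high = loss_history(1.1)   # each step overshoots: loss grows without bound

print(too_low[-1], good[-1], too_high[-1])
```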

Convolutional Neural Networks

CNNs are used heavily in image recognition applications of machine learning. Convolutional neural networks provide an advantage over feed-forward networks because they are capable of considering locality of features.

https://www.youtube.com/watch?v=m8pOnJxOcqY&list=PLTl9hO2Oobd-GaTYQWIuIs2yyNy7TYbEj&index=2

https://medium.com/swlh/convolutional-neural-network-for-detecting-cancer-tumors-in-microscopic-images-1acab6481d05

https://www.youtube.com/watch?v=YRhxdVk_sIs

UNet

UNet, which evolved from the traditional convolutional neural network, was first designed and applied in 2015 to process biomedical images. A general convolutional neural network focuses on image classification, where the input is an image and the output is one label. Biomedical cases, however, require us not only to determine whether there is a disease, but also to localise the area of abnormality.

https://towardsdatascience.com/unet-line-by-line-explanation-9b191c76baf5

Recurrent Neural Networks

Recurrent neural networks are good for learning from sequential data.

RNNs are often used in text and speech processing because sentences and texts are naturally sequences of either words/punctuation marks or sequences of characters.

https://www.youtube.com/watch?v=yZv_yRgOvMg&list=PLTl9hO2Oobd-GaTYQWIuIs2yyNy7TYbEj&index=3

https://medium.com/towards-data-science/how-does-masking-work-in-an-rnn-and-variants-and-why-537bf63c306d

https://www.youtube.com/watch?v=LHXXI4-IEns

LSTM

Long short-term memory (LSTM) networks are an extension of recurrent neural networks that effectively extends their memory. They are therefore well suited to learning from important experiences that have very long time lags in between.

https://www.youtube.com/watch?v=QciIcRxJvsM&list=PLTl9hO2Oobd-GaTYQWIuIs2yyNy7TYbEj&index=4

LSTMs enable RNNs to remember inputs over a long period of time.

LSTM, Transformer and BERT

https://www.youtube.com/watch?v=xI0HHN5XKDo

GAN Generative Adversarial Networks

https://www.youtube.com/watch?v=C1YUYWP-6rE&list=PLTl9hO2Oobd-GaTYQWIuIs2yyNy7TYbEj&index=5

Transformers, Attention is all you need

https://www.youtube.com/watch?v=TQQlZhbC5ps

Multi-class Neural Networks

One vs. All

https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/one-vs-all

Softmax

https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/softmax

NN Best Practices

https://developers.google.com/machine-learning/crash-course/training-neural-networks/best-practices

Reinforcement Learning

Reinforcement learning is an approach to machine learning where agents are rewarded for accomplishing some task.

Markov Decision Process

The Markov Decision Process is a method for planning in a stochastic environment.

Monte Carlo learning

The Monte Carlo approach approximates the value of a state-action pair by calculating the mean return from a collection of episodes.
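A sketch of this averaging, assuming the every-visit variant and episodes stored as lists of (state, action, reward) steps. The episode data and the `mc_action_values` name are hypothetical.

```python
from collections import defaultdict

def mc_action_values(episodes, gamma=1.0):
    """Estimate Q(state, action) as the mean discounted return observed after each visit."""
    returns = defaultdict(list)
    for episode in episodes:
        g = 0.0
        # Walk backwards so the return can be accumulated incrementally.
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            returns[(state, action)].append(g)
    return {sa: sum(gs) / len(gs) for sa, gs in returns.items()}

episodes = [
    [("s", "a", 1.0), ("t", "b", 2.0)],  # return from ("s", "a") is 3.0
    [("s", "a", 0.0), ("t", "b", 4.0)],  # return from ("s", "a") is 4.0
]
q = mc_action_values(episodes)
print(q[("s", "a")], q[("t", "b")])  # 3.5 3.0
```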

Model based vs. Instance based learning

Most supervised learning algorithms are model-based, e.g. SVM. Model-based learning algorithms use the training data to create a model that has parameters learned from the training data. After the model has been built, the training data can be discarded.

Instance-based learning algorithms use the whole dataset as the model. One instance-based algorithm frequently used in practice is k-Nearest Neighbors (kNN). In classification, to predict a label for an input example the kNN algorithm looks at the close neighborhood of the input example in the space of feature vectors and outputs the label that it saw the most often in this close neighborhood.
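A minimal kNN classifier sketch using Euclidean distance and a majority vote. The training points, labels, and the choice of k = 3 are illustrative.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the most common label among the k training examples closest to `query`.
    `train` is a list of (feature_vector, label) pairs kept as-is: the data is the model."""
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_predict(train, (1.1, 1.0)))  # a
print(knn_predict(train, (5.0, 5.1)))  # b
```

Sorting the whole training set on every query is the simplest approach; larger datasets typically call for a spatial index or approximate search.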

https://www.kaggle.com/getting-started/179177

Shallow vs. Deep learning

A shallow learning algorithm learns the parameters of the model directly from the features of the training examples. Most supervised learning algorithms are shallow. The exceptions are neural network learning algorithms, specifically those that build neural networks with more than one layer between input and output. Such neural networks are called deep neural networks. In deep neural network learning (or, deep learning), contrary to shallow learning, most model parameters are learned not directly from the features of the training examples, but from the outputs of the preceding layers.

https://www.mathworks.com/discovery/deep-learning.html

Characteristics of a machine learning model

https://subscription.packtpub.com/book/data/9781838820299/1/ch01lvl1sec03/characteristics-of-a-machine-learning-model

https://www.malicksarr.com/type-of-machine-learning-algorithms-the-complete-overview/

https://serokell.io/blog/machine-learning-algorithm-classification-overview

https://towardsdatascience.com/a-tour-of-machine-learning-algorithms-466b8bf75c0a

Natural Language Processing

GCP Translation tools

https://cloud.google.com/blog/products/ai-machine-learning/translation-tools-that-meet-business-needs

TF-IDF Vectorization

Term frequency-inverse document frequency (TF-IDF) vectorization is a mouthful to say, but it's also a simple and convenient way to characterize bodies of text.
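A sketch of one basic TF-IDF variant: term frequency is the term's share of the document, and inverse document frequency is log(N / df) without smoothing. Libraries often use smoothed or normalized forms, so exact values will differ. The documents are made up and assumed to be non-empty.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["the cat sat", "the dog barked", "the cat meowed"]
vectors = tf_idf(docs)
print(vectors[0])
```

A word that appears in every document ("the") gets weight 0, while a word unique to one document ("sat") gets the highest weight in that document.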

Build Text Classification Model using TF-IDF and NLTK

https://www.datacamp.com/community/tutorials/text-analytics-beginners-nltk

NLTK

https://www.nltk.org/book/

BERT

Bidirectional Encoder Representations from Transformers is described in this paper.

https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270

GPT-3

Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text.

https://en.wikipedia.org/wiki/GPT-3

https://www.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3

https://github.com/elyase/awesome-gpt3

https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/

Write with transformer

https://transformer.huggingface.co/doc/gpt

BERT and GPT-3

https://www.ibm.com/blogs/watson/2020/12/how-bert-and-gpt-models-change-the-game-for-nlp/

https://360digitmg.com/gpt-vs-bert

GPT-3 Criticism

https://dl.acm.org/doi/10.1145/3442188.3445922

Stochastic parrots

https://faculty.washington.edu/ebender/stochasticparrots.html

BLOOM

https://bigscience.huggingface.co/blog/bloom

DALL-E

https://openai.com/blog/dall-e/

DALL-E 2

https://openai.com/dall-e-2/

Imagen

https://imagen.research.google/

DALL-E mini

https://blog.paperspace.com/dalle-mini/

Code Editors

https://copilot.github.com/

https://github.com/salesforce/CodeT5

https://github.com/codota/TabNine

ELMo

ELMo is a deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).

https://allennlp.org/elmo

BERT and ELMo

http://jalammar.github.io/illustrated-bert/

Transfer Learning

Transfer learning is the process of training a model on a large-scale dataset and then reusing that pre-trained model as the starting point for learning a different target task.

Transfer learning became popular in NLP thanks to the strong performance of pre-trained representations and models such as Skip-Gram, ULMFiT, ELMo, and BERT.
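
One common form of transfer learning keeps the pre-trained representation frozen and trains only a small new "head" on the target task. The sketch below imitates that in plain Python under heavy assumptions: the `PRETRAINED` vectors are hand-written stand-ins for embeddings learned on a large corpus, the target task is a two-example sentiment classifier, and every name in it is hypothetical.

```python
import math

# Assumption: hand-written stand-ins for vectors learned on a large corpus.
PRETRAINED = {
    "great": [1.0, 0.2],
    "good": [0.8, 0.1],
    "awful": [-0.9, 0.3],
    "bad": [-0.7, 0.2],
    "movie": [0.0, 1.0],
    "film": [0.1, 0.9],
}


def featurize(text):
    """Average the frozen pre-trained vectors; they are never updated."""
    vectors = [PRETRAINED[tok] for tok in text.split() if tok in PRETRAINED]
    if not vectors:
        return [0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def train_head(examples, epochs=200, lr=0.5):
    """Fit a logistic-regression head on top of the frozen features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = featurize(text)
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = 1.0 / (1.0 + math.exp(-z)) - label
            # Gradient step on the head's parameters only.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(text, weights, bias):
    x = featurize(text)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Tiny target-task dataset: 1 = positive, 0 = negative.
train = [("great movie", 1), ("awful movie", 0)]
weights, bias = train_head(train)
print(predict("good film", weights, bias), predict("bad film", weights, bias))
```

The head never saw "good", "bad", or "film" during training; it handles them only because the reused vectors place them near "great", "awful", and "movie".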

https://towardsdatascience.com/transfer-learning-using-elmo-embedding-c4a7e415103c

Attention

https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/

Transformers

https://jalammar.github.io/illustrated-transformer/

Trends

https://www.elderresearch.com/blog/trends-in-natural-language-processing/

Embeddings

An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors to capture some of the semantics of the input by placing semantically similar inputs close together in the embedding space.
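
"Close together" is usually measured with cosine similarity. The following is a minimal sketch with made-up 3-dimensional vectors; real embeddings are learned from data and typically have hundreds of dimensions, and the helper names `cosine_similarity` and `nearest` are illustrative.

```python
import math

# Assumption: hand-picked toy vectors, not learned embeddings.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    return max(
        (other for other in embeddings if other != word),
        key=lambda other: cosine_similarity(embeddings[word], embeddings[other]),
    )


print(nearest("cat"))
print(round(cosine_similarity(embeddings["cat"], embeddings["dog"]), 2))
print(round(cosine_similarity(embeddings["cat"], embeddings["car"]), 2))
```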

https://cloud.google.com/blog/topics/developers-practitioners/meet-ais-multitool-vector-embeddings

Collaborative Filtering

https://developers.google.com/machine-learning/crash-course/embeddings/motivation-from-collaborative-filtering

Categorical Input Data

https://developers.google.com/machine-learning/crash-course/embeddings/categorical-input-data

Translating to a Lower Dimensional Space

https://developers.google.com/machine-learning/crash-course/embeddings/translating-to-a-lower-dimensional-space

Obtaining Embeddings

https://developers.google.com/machine-learning/crash-course/embeddings/obtaining-embeddings

spaCy

spaCy supports a number of transfer and multi-task learning workflows that can often help improve your pipeline’s efficiency or accuracy.

https://applied-language-technology.readthedocs.io/en/latest/notebooks/part_iii/04_embeddings_continued.html

Machine Learning on Source Code

https://github.com/src-d/awesome-machine-learning-on-source-code

ML Frameworks and tools

Feature Store

https://www.featurestore.org/

https://cloud.google.com/vertex-ai/docs/featurestore/overview

https://www.hopsworks.ai/post/feature-store-the-missing-data-layer-in-ml-pipelines

TensorFlow

Created by the Google Brain team, TensorFlow is an open source library for numerical computation and large-scale machine learning.

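TensorFlow Example: Plain and Simple Estimators

TensorFlow has a pretty large API surface, but the part we are going to focus on is the high-level API called Estimators. Note that Estimators are deprecated in TensorFlow 2 in favor of Keras, so the material below applies to older TensorFlow releases.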

https://towardsdatascience.com/plain-and-simple-estimators-d8d3f4c185c1

The Estimators API gives us a nice workflow of getting our raw data, passing it through an input function, setting up our feature columns and model structure, running our training, and running our evaluation.

https://www.youtube.com/watch?v=G7oolm0jU8I

scikit-learn

scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface.

Keras

Keras is a high-level neural network library that lets you define and train neural network models while delegating the numerical computation to a backend. Early versions wrapped Theano and TensorFlow; Keras 3 supports TensorFlow, JAX, and PyTorch as backends.

PyTorch

PyTorch is an open source machine learning library based on the Torch library.

Micrograd

https://github.com/karpathy/micrograd

Tinygrad

https://github.com/geohot/tinygrad

JAX

https://github.com/google/jax

https://colab.research.google.com/github/google/jax/blob/main/docs/notebooks/neural_network_with_tfds_data.ipynb

EvoJAX

https://cloud.google.com/blog/topics/developers-practitioners/evojax-bringing-power-neuroevolution-solve-your-problems

Kubeflow

Kubeflow Pipelines is a platform for building, deploying, and managing multi-step ML workflows based on Docker containers. Kubeflow offers several components that you can use to build your ML training, hyperparameter tuning, and serving workloads across multiple platforms.

MLOps

MLOps is the practice of taking an experimental machine learning model into a production system and keeping it deployed, monitored, and maintained there.

Risks

Fairness

https://wikipedia.org/wiki/Fairness_(machine_learning)

When to not use ML

https://eugeneyan.com/writing/first-rule-of-ml/

Metascience and p-value

https://en.wikipedia.org/wiki/P-value

Diminishing returns on deep learning costs

https://spectrum.ieee.org/deep-learning-computational-cost

Google ML Engineer exam

https://medium.com/google-developer-experts/get-recognized-as-an-ml-expert-with-the-google-professional-ml-engineer-certificate-c85a67e9270d

https://medium.com/@joshcx/how-i-passed-the-google-cloud-professional-machine-learning-engineer-exam-vertex-ai-484c7863bbac

https://towardsdatascience.com/a-comprehensive-study-guide-for-the-google-professional-machine-learning-engineer-certification-1e411db4d2cf

https://medium.com/@datacouch/google-cloud-professional-machine-learning-engineer-certification-preparation-guide-2067478767ff

How to reduce your ML model inference costs

https://medium.com/google-cloud/how-to-reduce-your-ml-model-inference-costs-on-google-cloud-e3d5e043980f

Examples

Machine learning crash course

https://developers.google.com/machine-learning/crash-course

Titanic survival prediction using danfo.js and tensorflow.js

https://danfo.jsdata.org/examples/titanic-survival-prediction-using-danfo.js-and-tensorflow.js

Monitor models for training-serving skew with Vertex AI

https://cloud.google.com/blog/topics/developers-practitioners/monitor-models-training-serving-skew-vertex-ai

BigQuery ML tutorials

https://cloud.google.com/bigquery-ml/docs/tutorials

The Making of an AI Storyteller

https://towardsdatascience.com/the-making-of-an-ai-storyteller-c3b8d5a983f5

Getting started with Keras

https://cloud.google.com/ai-platform/docs/getting-started-keras

Demo of Video Intelligence API

https://medium.com/@zackakil/see-what-video-intelligence-api-can-do-with-this-visualisation-tool-4303e371505

Anomaly detection using River

https://medium.com/spikelab/anomalies-detection-using-river-398544d3536

USAA Insurance Operations

https://cloud.google.com/blog/products/ai-machine-learning/usaa-and-google-cloud-transform-insurance-operations

IT and IIOT Monitoring

https://cloud.google.com/blog/products/ai-machine-learning/usaa-and-google-cloud-transform-insurance-operation

Mapping carbon pollution globally with satellites

https://cloud.google.com/blog/topics/developers-practitioners/mapping-carbon-pollution-globally-satellites

Choose an outfit with AI

https://www.youtube.com/watch?v=o6nGn1euRjk&list=PLIivdWyY5sqLsaG5hNms0D9aZRBE7DHBb&index=7

Machine Learning Glossary

https://developers.google.com/machine-learning/glossary?hl=en

Tutorials

Links

Qwiklabs