Machine Learning - bobbae/gcp GitHub Wiki
Machine learning (ML) is the study of computer algorithms that improve automatically through experience and data.
Machine learning is an application of Artificial Intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
https://www.youtube.com/watch?v=9MWj__4s9hk&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie
Guidelines for developing ML solutions
https://cloud.google.com/architecture/guidelines-for-developing-high-quality-ml-solutions
Collection of Machine Learning resources
https://github.com/collections/machine-learning
Google Machine Learning Platform Overview
https://www.youtube.com/watch?v=QR_LQQ-vvko
Vertex AI
Vertex AI brings AutoML and AI Platform together into a unified API, client library, and user interface.
AI Hub
AI Hub is a platform that lets us centralize our code and knowledge in a way that can step up the pace of deployment and learnings globally.
AI Platform
AI Platform is a development platform to build AI applications that run on GCP and on-premises.
AutoML
AutoML lets you train high-quality custom machine learning models with minimal effort and machine learning expertise.
BigQuery ML
BigQuery ML lets you create and execute machine learning models in BigQuery using standard SQL queries.
OpenXLA
GCP ML Solutions
AutoML
AutoML can be used to create your own custom machine learning models that are tailored to your business needs, and then integrate those models into your applications.
AI Platform and Machine Learning
AI Platform enables many parts of the machine learning (ML) workflow.
ML Solutions Overview
https://cloud.google.com/ai-platform/docs/ml-solutions-overview
Machine Learning Options
https://www.youtube.com/watch?v=pm_-pVPvZ-4
CloudML Engine
https://www.youtube.com/watch?v=m0rqccviLNM
Cloud AutoML to custom model
https://www.youtube.com/watch?v=OHIEZ-Scek8
AutoML NL for custom text classification
https://www.youtube.com/watch?v=ieaqfU1BwJ8
Custom Sentiment Analysis with AutoML Natural Language
https://www.youtube.com/watch?v=CReeC8YuEd8
Vision AI
Cloud Vision includes several options that you can use to integrate machine learning vision models into your applications.
https://www.youtube.com/watch?v=kgxfdTh9lz0
https://www.youtube.com/watch?v=BN8aO0LULyw
Video AI
Video Intelligence includes several options that you can use to integrate machine learning video intelligence models into your applications.
https://www.youtube.com/watch?v=h1zU0Qor9J8
Cloud Natural Language API
The Cloud Natural Language API provides natural language understanding technologies to developers, including sentiment analysis, entity analysis, entity sentiment analysis, content classification, and syntax analysis.
https://cloud.google.com/natural-language/docs
Example of using Classification of Bag of Words via Keras
https://www.youtube.com/watch?v=UFtXy0KRxVI
Gain Insights from Text with Cloud Natural Language API
Qwiklabs GSP097
https://www.qwiklabs.com/focuses/582?parent=catalog
Entity Analysis
https://www.youtube.com/watch?v=3iOtK0sRNMI
RNN & Natural Language generation
https://www.youtube.com/watch?v=MNvT5JekDpg
Cloud Translation
Cloud Translation can dynamically translate text between thousands of language pairs.
AutoML Translation vs Translation API
The Translation API covers a huge number of language pairs and does a great job with general-purpose text. Where AutoML Translation really shines is for the "last mile" between generic translation tasks and specific, niche vocabularies.
Using Python and Translation API
https://www.youtube.com/watch?v=YapTts_An9A
Speech AI
Text-to-Speech
Text-to-Speech converts text or Speech Synthesis Markup Language (SSML) input into audio data of natural human speech.
Cloud Text-to-Speech API using C#
https://www.youtube.com/watch?v=OK1ZmlaFIV8
Speech-to-Text
https://cloud.google.com/speech-to-text/docs
Convert speech to text using Node.js
https://www.youtube.com/watch?v=naZ8oEKuR44
Cloud speech API Codelabs
https://cloud.google.com/blog/products/ai-machine-learning/top-google-cloud-speech-api-codelabs
Speech on Device
Difference between AutoML and Cloud Natural language API
Google AutoML Natural Language is much more powerful than the Natural Language API because it allows the user to train models that are customized for their specific dataset and domain.
Natural Language API
The Google Natural Language API is an easy to use interface to a set of powerful NLP models which have been pre-trained.
The major advantage of the Google Natural Language API is its ease of use. No machine learning skills are required and almost no coding skills.
The Google Natural Language API is a very convenient option for quick, out-of-the-box solutions.
AutoML Natural Language
If the Natural Language API is not flexible enough for your business purposes, then AutoML Natural Language might be the right service.
Machine Learning Crash Course
Step 1, read this comicbook.
Step 2, head over to this tutorial.
Step 3, look at these videos: https://www.youtube.com/playlist?list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF
Step 4, go through ML learning materials
https://hackernoon.com/where-to-learn-machine-and-deep-learning-for-free
https://github.com/eugeneyan/applied-ml
https://github.com/microsoft/ML-For-Beginners
https://youtube.com/channel/UC12LqyqTQYbXatYS9AA7Nuw
Machine Learning & Artificial Intelligence
A Data Scientist models and analyzes key data and continually improves the way the business utilizes data. Data Scientists aim to make accurate predictions about the future using in-depth data modeling and deep learning.
Predictive Analytics
https://en.wikipedia.org/wiki/Predictive_analytics
Machine learning Workflows
AI Platform enables many parts of the machine learning (ML) workflow.
https://cloud.google.com/ai-platform/docs/ml-solutions-overview
7 steps of Machine Learning
Gather data, prepare data, choose the model, train the model, evaluate, tune parameters, review prediction or inference.
https://towardsdatascience.com/the-7-steps-of-machine-learning-2877d7e5548e
https://www.youtube.com/watch?v=nKW8Ndu7Mjw
Dataset Search engine
https://datasetsearch.research.google.com/
Supervised vs. Unsupervised learning
A supervised machine learning algorithm (as opposed to an unsupervised machine learning algorithm) is one that relies on labeled input data to learn a function that produces an appropriate output when given new unlabeled data.
https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d
The most common tasks within unsupervised learning are clustering, representation learning, and density estimation. In all of these cases, we wish to learn the inherent structure of our data without using explicitly-provided labels. Some common algorithms include k-means clustering, principal component analysis, and autoencoders. Since no labels are provided, there is no specific way to compare model performance in most unsupervised learning methods.
Two common use-cases for unsupervised learning are exploratory analysis and dimensionality reduction.
In situations where it is either impossible or impractical for a human to propose trends in the data, unsupervised learning can provide initial insights that can then be used to test individual hypotheses.
Dimensionality reduction, which refers to the methods used to represent data using fewer columns or features, can be accomplished through unsupervised methods. In representation learning, we wish to learn relationships between individual features, allowing us to represent our data using the latent features that interrelate our initial features. This sparse latent structure is often represented using far fewer features than we started with, so it can make further data processing much less intensive, and can eliminate redundant features.
Feature Engineering
Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data.
Feature Engineering using TFX Pipeline and TensorFlow Transform
https://www.tensorflow.org/tfx/tutorials/tfx/penguin_tft
Feature engineering tutorials
- https://www.kdnuggets.com/2018/12/feature-engineering-explained.html
- https://towardsdatascience.com/feature-engineering-for-machine-learning-3a5e293a5114
- https://towardsdatascience.com/feature-engineering-in-machine-learning-23b338ea48f4
One-Hot Encoding
In digital circuits and machine learning, a one-hot is a group of bits among which the legal combinations of values are only those with a single high (1) bit and all the others low (0).
https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
https://hackernoon.com/what-is-one-hot-encoding-why-and-when-do-you-have-to-use-it-e3c6186d008f
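As a minimal sketch of the idea, the snippet below one-hot encodes a list of category strings using only the Python standard library. The function name `one_hot` and the color values are made up for illustration; real pipelines usually rely on a library encoder instead.

```python
def one_hot(values):
    """Map each categorical value to a list with a single 1 and 0s elsewhere."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one "hot" position per example
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot(["red", "green", "blue", "green"])
print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```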
One shot learning
One-shot learning is a classification task where one, or only a few, examples are used to classify many new examples in the future.
https://en.wikipedia.org/wiki/One-shot_learning
Binning
Binning (also called bucketing) is the process of converting a continuous feature into multiple binary features called bins or buckets, typically based on value range.
https://towardsdatascience.com/binning-for-feature-engineering-in-machine-learning-d3b3d76f364a
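A small sketch of binning by value range, assuming hypothetical age boundaries (`age_edges`) chosen only for illustration. Each value is turned into a list of binary features with a single 1 marking its bucket.

```python
import bisect


def bin_feature(value, edges):
    """Return binary bucket features for `value`.

    `edges` are sorted boundaries between buckets, so there are
    len(edges) + 1 buckets in total. A value equal to an edge goes
    into the bucket that starts at that edge.
    """
    bucket = bisect.bisect_right(edges, value)
    bins = [0] * (len(edges) + 1)
    bins[bucket] = 1
    return bins


age_edges = [18, 35, 65]  # hypothetical buckets: <18, 18-34, 35-64, 65+
print(bin_feature(7, age_edges))   # [1, 0, 0, 0]
print(bin_feature(18, age_edges))  # [0, 1, 0, 0]
print(bin_feature(70, age_edges))  # [0, 0, 0, 1]
```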
Normalization
Normalization is the process of converting an actual range of values which a numerical feature can take, into a standard range of values, typically in the interval [-1, 1] or [0, 1].
By normalizing all of our inputs to a standard scale, we're allowing the network to more quickly learn the optimal parameters for each input node.
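One common way to do this is min-max scaling. The sketch below is an illustration under that assumption; the function name `min_max_scale` and the sample values are hypothetical.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly rescale `values` so the smallest maps to `low` and the largest to `high`."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant feature carries no range information; map everything to `low`.
        return [low for _ in values]
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values]


print(min_max_scale([10, 20, 30, 50]))             # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale([10, 20, 30, 50], -1.0, 1.0))  # [-1.0, -0.5, 0.0, 1.0]
```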
Standardization
Standardization (or z-score normalization) is the procedure during which the feature values are rescaled so that they have the properties of a standard normal distribution.
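A minimal sketch of z-score standardization with the standard library: subtract the mean and divide by the standard deviation, so the rescaled feature has mean 0 and standard deviation 1. The function name and sample data are illustrative only, and the population standard deviation is used here as an assumption.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mean) / std for v in values]


z = standardize([2, 4, 4, 4, 5, 5, 7, 9])  # mean 5, standard deviation 2
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Note that this changes the scale of the feature, not the shape of its distribution.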
Dealing with Missing Features
In some cases, the data comes to the analyst in the form of a dataset with features already defined. In some examples, values of some features can be missing.
https://towardsdatascience.com/7-ways-to-handle-missing-values-in-machine-learning-1a6326adf79e
Data Imputation Techniques
One technique consists of replacing the missing value of a feature with the average value of that feature in the dataset.
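The mean-imputation technique can be sketched as follows, assuming missing values are represented as `None` (an assumption for this example; other datasets use NaN or sentinel values).

```python
def impute_mean(column):
    """Replace None entries with the mean of the observed values in the column."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


print(impute_mean([1.0, None, 3.0, None, 5.0]))  # [1.0, 3.0, 3.0, 3.0, 5.0]
```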
Training Datasets
Once you have got your annotated dataset, you can split the dataset into three subsets: training, validation, and test.
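A minimal sketch of such a three-way split. The 70/15/15 proportions and the fixed seed are assumptions for illustration, not a recommendation from the source.

```python
import random


def split_dataset(examples, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle the examples and split them into training, validation, and test subsets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```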
Underfitting and Overfitting
https://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/
Regularization
L1 and L2 regularization methods are also combined in what is called elastic net regularization with L1 and L2 regularizations being special cases.
https://towardsdatascience.com/l1-and-l2-regularization-methods-ce25e7fc831c
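To make the "special cases" remark concrete, here is a sketch of an elastic net penalty term. Parameterizations differ between libraries; the `lam` and `l1_ratio` names and the exact weighting below are one simple convention assumed for this example.

```python
def elastic_net_penalty(weights, lam=1.0, l1_ratio=0.5):
    """Penalty = lam * (l1_ratio * sum|w| + (1 - l1_ratio) * sum w^2)."""
    l1 = sum(abs(w) for w in weights)   # L1 (lasso) term
    l2 = sum(w * w for w in weights)    # L2 (ridge) term
    return lam * (l1_ratio * l1 + (1.0 - l1_ratio) * l2)


w = [3.0, -4.0]
print(elastic_net_penalty(w, l1_ratio=1.0))  # 7.0  -> pure L1
print(elastic_net_penalty(w, l1_ratio=0.0))  # 25.0 -> pure L2
print(elastic_net_penalty(w, l1_ratio=0.5))  # 16.0 -> a mix of both
```

The penalty is added to the training loss, so larger weights cost more and the model is pushed toward simpler solutions.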
Evaluation of Models
Once you have a model built using the training set, how can you say how good the model is? You use the test set to assess the model.
https://heartbeat.fritz.ai/introduction-to-machine-learning-model-evaluation-fa859e1b2d7f
Accuracy
Accuracy is not necessarily a relevant or good way of evaluating a model. Accuracy is given by the number of correctly classified examples divided by the total number of classified examples.
Accuracy may be useful when errors in predicting all classes are equally important. In the case of spam/not spam this may not hold: you would tolerate false positives less than false negatives. A false positive may mean you miss an important email, whereas a false negative is only an annoyance, because a spam message reaches your inbox.
Accuracy is not useful when the classes are not equally important or are heavily imbalanced. Predicting clicks from a click stream is one example, because there are very few real positive clicks per rendered page. In other words, almost no clicks can be the norm. In that case, a model that is 99.999% accurate can be created by returning "no click" as the answer every time.
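The click example can be reproduced with a few lines of Python. The data below is synthetic (one click in 100,000 impressions), chosen only to match the 99.999% figure.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


y_true = [1] + [0] * 99_999  # synthetic data: a single real click
y_pred = [0] * 100_000       # a "model" that always answers "no click"

print(accuracy(y_true, y_pred))  # 0.99999
```

The score looks excellent even though the model never finds the one example that matters.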
Accuracy and precision in statistics
https://en.wikipedia.org/wiki/Accuracy_and_precision
Bias variance trade-off
https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff
Confusion Matrix
Confusion Matrix is a table that summarizes how successful the classification model is at predicting examples belonging to various classes.
Confusion Matrices can be used to calculate two important performance metrics: precision and recall.
Precision & Recall
The two most frequently used metrics to assess the model are precision and recall. Precision is the ratio of correct positive predictions to the overall number of positive predictions. Recall is the ratio of correct positive predictions to the overall number of positive examples in the test set.
https://www.youtube.com/watch?v=j-EB6RqqjGI&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=3
https://towardsdatascience.com/beyond-accuracy-precision-and-recall-3da06bea9f6c
Imagine that you are given an image and asked to detect all the cars within it. Which metric do you use? Because the goal is to detect all the cars, use recall. This may misclassify some objects as cars, but it eventually will work towards detecting all the target objects.
Now say you're given a mammography image, and you are asked to detect whether there is cancer or not. Which metric do you use? If the task is sensitive to incorrectly identifying an image as cancerous, we must be sure when classifying an image as Positive (i.e. has cancer), and precision is the preferred metric. If missing an actual cancer is the costlier error, recall deserves more weight.
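As a sketch of how these quantities relate, the snippet below counts the four confusion-matrix cells for a binary problem and derives precision and recall from them. The labels are made-up toy data.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # correct positives / predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # correct positives / actual positives
    return precision, recall


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 2, 5)
print(precision_recall(y_true, y_pred))  # precision 2/3, recall 0.5
```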
F-measure
F-measure is the harmonic mean of Precision and Recall and gives a better measure of the incorrectly classified cases than the Accuracy Metric.
https://machinelearningmastery.com/precision-recall-and-f-measure-for-imbalanced-classification/
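A short sketch of the F1 score (the F-measure with equal weight on precision and recall). Because it is a harmonic mean, it is pulled toward the smaller of the two values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1_score(0.5, 0.5))  # 0.5
print(f1_score(1.0, 0.0))  # 0.0 (perfect precision cannot compensate for zero recall)
print(f1_score(0.9, 0.1))  # about 0.18, far below the arithmetic mean of 0.5
```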
Accuracy, precision, recall or F1
https://towardsdatascience.com/accuracy-precision-recall-or-f1-331fb37c5cb9
Log loss
Log-loss is indicative of how close the prediction probability is to the corresponding actual/true value (0 or 1 in case of binary classification). The more the predicted probability diverges from the actual value, the higher is the log-loss value.
https://towardsdatascience.com/intuition-behind-log-loss-score-4e0c9979680a
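A minimal sketch of binary log loss, assuming labels are 0 or 1 and predictions are probabilities of class 1. The clipping constant `eps` is an implementation detail added here to avoid taking log(0).

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


labels = [1, 0]
print(log_loss(labels, [0.9, 0.1]))  # about 0.105: confident and right
print(log_loss(labels, [0.6, 0.4]))  # about 0.511: right but unsure
print(log_loss(labels, [0.1, 0.9]))  # about 2.303: confident and wrong
```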
Sensitivity and Specificity
https://dzone.com/articles/ml-metrics-sensitivity-vs-specificity-difference
ROC and AUC
Receiver Operating Characteristic curve and Area Under the Curve use a combination of the true positive rate and false positive rate to build up a summary picture of the model performance.
An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters: True Positive Rate and False Positive Rate.
AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example.
AUC ranges in value from 0 to 1. A model whose predictions are 100% wrong has an AUC of 0.0; one whose predictions are 100% correct has an AUC of 1.0.
https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
https://towardsdatascience.com/intuition-behind-roc-auc-score-1456439d1f30
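The ranking interpretation of AUC can be sketched directly: compare every positive example with every negative example and count how often the positive one gets the higher score. This pairwise version is for illustration only; it is quadratic in the number of examples, and ties are counted as half a win by assumption.

```python
def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half
    return wins / (len(pos) * len(neg))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))   # 1.0 (every positive ranked above every negative)
print(auc([0, 0, 1, 1], [0.9, 0.8, 0.2, 0.1]))   # 0.0 (ranking is exactly reversed)
```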
Evaluation of AutoML models
https://cloud.google.com/vertex-ai/docs/training/evaluating-automl-models
7 tips for ML training
https://cloud.google.com/blog/products/ai-machine-learning/7-tips-for-trouble-free-ml-model-training
Classification
Classification is a task that requires the use of machine learning algorithms that learn how to assign a class label to examples from the problem domain. An easy to understand example is classifying emails as “spam” or “not spam.”
Classification algorithms are used when you have a dataset of observations where we'd like to use the features associated with an observation to predict its class.
Basic Bayes Theorem
Bayes' theorem, named after 18th-century British mathematician Thomas Bayes, is a mathematical formula for determining conditional probability. Conditional probability is the likelihood of an outcome occurring, based on a previous outcome occurring.
https://www.youtube.com/watch?v=HZGCoVF3YvM
https://machinelearningmastery.com/naive-bayes-classifier-scratch-python/
Naive Bayes classification methods are quite simple (in terms of model complexity) and commonly used for tasks such as document classification and spam filtering. This algorithm works well for datasets with a large number of features (e.g., a body of text where every word is treated as a feature) but it is naive in the sense that it treats every feature as independent of one another. This is clearly not the case for language, where word order matters when trying to discern meaning from a statement. Nonetheless, these methods have been used quite successfully for various text classification tasks.
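As a worked illustration of Bayes' theorem in the spam setting, the sketch below computes P(spam | word) from a prior and two likelihoods. All probabilities are invented for the example.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """P(H | E) = P(E | H) * P(H) / P(E), with P(E) expanded over H and not-H."""
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 20% of mail is spam; the word "prize" appears in
# 50% of spam messages and in 5% of legitimate messages.
p_spam_given_word = posterior(prior=0.2, likelihood=0.5, likelihood_if_not=0.05)
print(round(p_spam_given_word, 3))  # 0.714
```

A naive Bayes classifier applies the same rule to many words at once by multiplying their likelihoods, which is where the independence assumption enters.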
Regression & Classification
Regression and classification lead to ways of splitting data.
https://en.wikipedia.org/wiki/Regression_analysis
Classification is a problem of automatically assigning a label to an unlabeled example. Spam detection is a famous example of classification.
https://en.wikipedia.org/wiki/Statistical_classification
Linear Regression
Linear regression is used to predict an outcome given some input value(s). While machine learning classifiers use features to predict a discrete label for a given instance or example, machine learning regressors use features to predict a continuous outcome for a given instance or example.
https://www.youtube.com/watch?v=K_EH2abOp00&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=2
https://www.youtube.com/watch?v=nk2CQITm_eo
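A minimal sketch of fitting a line with one input feature using the ordinary least squares formulas. The data points are synthetic and lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)       # 2.0 1.0
print(slope * 5 + intercept)  # 11.0, the continuous prediction for x = 5
```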
Polynomial regression
Polynomial regression is very similar to linear regression, with a slight deviation in treatment of the feature-space.
Kernelization
https://www.youtube.com/watch?v=wBVSbVktLIY&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=5
Logistic Regression
The goal of logistic regression, as with any classifier, is to figure out some way to split the data to allow for an accurate prediction of a given observation's class using the information present in the features.
https://www.youtube.com/watch?v=YMJtsYIp4kg&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=4
https://www.youtube.com/watch?v=yIYKR4sgzI8
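A sketch of how a trained logistic regression model makes a prediction: a weighted sum of the features is passed through the sigmoid function to get a probability, which is then thresholded. The weights and bias below are hypothetical, not learned from data.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


weights, bias = [1.0, -0.5], -1.0  # hypothetical parameters
print(predict_proba([2.0, 1.0], weights, bias))  # about 0.62
print(predict([2.0, 1.0], weights, bias))        # 1
print(predict([0.0, 2.0], weights, bias))        # 0
```

The set of points where the weighted sum equals zero is the line (or hyperplane) that splits the two classes.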
Decision Trees
Decision trees are one of the oldest and most widely-used machine learning models, due to the fact that they work well with noisy or missing data, can easily be combined into more robust predictors, and are incredibly fast at runtime.
Decision trees are desirable in that they scale well to larger datasets, they are robust against irrelevant features, and it is very easy to visualize the rationale behind a decision tree's predictions.
SVM
Support vector machine classifiers work well in complicated feature domains, although they require clear separation between classes.
SVM is a supervised machine learning model that uses classification algorithms for two-group classification problems.
Compared to newer algorithms like neural networks, they have two main advantages: higher speed and better performance with a limited number of samples.
https://www.youtube.com/watch?v=05VABNfa1ds&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=6
https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/
SVMs don't work well with noisy data, and training time scales roughly cubically, O(n^3), with the input size, depending on your implementation.
Random forests
Random forests inherit the benefits of a decision tree model whilst improving upon the performance by reducing the variance.
https://www.youtube.com/watch?v=mld0TnA2jEs&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=8
Boosted trees
Boosting is an iterative process where models are trained in a sequential order.
Content Classification
Content Classification analyzes a document and returns a list of content categories that apply to the text found in the document.
https://cloud.google.com/natural-language/docs/classifying-text
Using AutoML to classify text.
https://www.youtube.com/watch?v=ieaqfU1BwJ8
Classification, redaction, and de-identification
The Cloud Data Loss Prevention (DLP) helps you understand, manage, and protect sensitive data. With the Cloud DLP, you can easily classify and redact sensitive data contained in text-based content and images, including content stored in Google Cloud storage repositories.
https://cloud.google.com/dlp/docs/classification-redaction
Clustering
Clustering is a popular technique to find groups or segments in your data that are similar. This is an unsupervised learning algorithm in the sense that you don't train the algorithm and give it examples for what you'd like it to do, you just let the clustering algorithm explore the data and provide you with new insights.
K-means clustering
K-means clustering is a simple method for partitioning n data points into k groups, or clusters.
https://www.youtube.com/watch?v=O6b2L_lYH9k&list=PLTl9hO2Oobd9UuNwS9R5Z6HcTesBMCvie&index=9
https://www.youtube.com/watch?v=4b5d3muPQmA
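A compact sketch of the usual k-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). The starting centroids are passed in explicitly here to keep the example deterministic; practical implementations typically choose them randomly or with a seeding heuristic.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(centroids)  # roughly (1.33, 1.33) and (8.33, 8.33)
print(clusters)
```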
K-nearest neighbors
The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems.
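For the classification case, a minimal KNN sketch looks like this: find the k training examples closest to the query and return the most common label among them. The toy training set and the choice of squared Euclidean distance are assumptions for illustration.

```python
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """`train` is a list of (feature_vector, label) pairs."""
    neighbors = sorted(train, key=lambda ex: squared_distance(ex[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "a"), ((1.5, 2.0), "a"), ((2.0, 1.0), "a"),
    ((8.0, 8.0), "b"), ((9.0, 8.5), "b"), ((8.5, 9.0), "b"),
]
print(knn_predict(train, (2.0, 2.0)))  # a
print(knn_predict(train, (7.5, 8.0)))  # b
```

There is no training step: the stored examples themselves act as the model.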
Dimensionality Reduction
Dimensionality reduction is used to reduce the dimension of our feature-space while maintaining the maximum amount of information.
Principal Components Analysis
Principal components analysis (PCA) allows us to take an n-dimensional feature-space and reduce it to a k-dimensional feature-space while maintaining as much information from the original dataset as possible in the reduced dataset.
Autoencoders
Autoencoders are an unsupervised learning technique in which we leverage neural networks for the task of representation learning.
Neural Networks
Neural networks are one of the most popular approaches to machine learning today, achieving impressive performance on a large variety of tasks.
https://www.youtube.com/watch?v=fkqZyYo_ebs&list=PLTl9hO2Oobd-GaTYQWIuIs2yyNy7TYbEj&index=1
Structure
https://developers.google.com/machine-learning/crash-course/introduction-to-neural-networks/anatomy
Neural Networks Representation
Neural networks are biologically-inspired algorithms that attempt to mimic the functions of neurons in the brain. Each neuron acts as a computational unit, accepting input from the dendrites and outputting a signal through the axon terminals. Actions are triggered when a specific combination of neurons is activated.
Activation functions
Activation functions are used to determine the firing of neurons in a neural network. Given a linear combination of inputs and weights from the previous layer, the activation function controls how we'll pass that information on to the next layer.
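A small sketch of a single artificial neuron: it forms the linear combination of inputs and weights, then applies an activation function. ReLU, sigmoid, and tanh are shown as three common choices; the input values, weights, and bias are arbitrary.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


inputs, weights, bias = [1.0, 2.0], [0.5, -1.0], 0.5  # weighted sum z = -1.0
print(neuron(inputs, weights, bias, relu))       # 0.0: ReLU blocks negative sums
print(neuron(inputs, weights, bias, sigmoid))    # about 0.269
print(neuron(inputs, weights, bias, math.tanh))  # about -0.762
```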
Backpropagation
Backpropagation computes the gradient of the loss function with respect to the weights of the network for a single input–output example, and does so efficiently, unlike a naive direct computation of the gradient with respect to each weight individually. This efficiency makes it feasible to use gradient methods for training multilayer networks, updating weights to minimize loss; gradient descent, or variants such as stochastic gradient descent, are commonly used.
Gradient descent
Gradient descent is an optimization technique commonly used in training machine learning algorithms. Often when we're building a machine learning model, we'll develop a cost function which is capable of measuring how well our model is doing. This function will penalize any error our model makes by assigning a cost with respect to the current parameter values. By minimizing the cost function we can find the optimal parameters that yield the best model performance.
https://www.youtube.com/watch?v=sDv4f4s2SB8
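A minimal sketch of gradient descent on a one-parameter cost function, cost(w) = (w - 3)^2, chosen for illustration because its minimum at w = 3 is known in advance.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the cost."""
    w = start
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w


def cost_gradient(w):
    # Derivative of cost(w) = (w - 3) ** 2
    return 2 * (w - 3)


w_best = gradient_descent(cost_gradient, start=0.0)
print(w_best)  # very close to 3.0
```

In a real model `w` is a vector of many parameters and the gradient comes from the training data, but the update rule has the same form.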
Hyperparameter Tuning
It is often necessary to tune hyperparameters by experimentally finding the best combination of values, one per hyperparameter.
Learning Rate
One of the key hyperparameters to set in order to train a neural network is the learning rate for gradient descent. The learning rate parameter scales the magnitude of our weight updates in order to minimize the network's loss function.
If your learning rate is set too low, training will progress very slowly as you are making very tiny updates to the weights in your network. However, if your learning rate is set too high, it can cause undesirable divergent behavior in your loss function.
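The effect of the learning rate can be seen on a toy loss, f(w) = w^2, whose gradient is 2w and whose minimum is at w = 0. The three rates below are picked only to show the slow, well-behaved, and divergent regimes for this particular function.

```python
def distance_from_minimum(learning_rate, steps=50, start=1.0):
    """Run gradient descent on f(w) = w**2 and report how far w ends from 0."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2 * w  # gradient of w**2 is 2*w
    return abs(w)


for lr in (0.001, 0.1, 1.1):
    print(lr, distance_from_minimum(lr))
# 0.001 barely moves from the start, 0.1 gets very close to 0,
# and 1.1 overshoots further on every step, so the distance grows.
```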
Convolutional Neural Networks
CNNs are used heavily in image recognition applications of machine learning. Convolutional neural networks provide an advantage over feed-forward networks because they are capable of considering locality of features.
https://www.youtube.com/watch?v=m8pOnJxOcqY&list=PLTl9hO2Oobd-GaTYQWIuIs2yyNy7TYbEj&index=2
https://www.youtube.com/watch?v=YRhxdVk_sIs
UNet
UNet, evolved from the traditional convolutional neural network, was first designed and applied in 2015 to process biomedical images. A general convolutional neural network focuses on image classification, where the input is an image and the output is one label. Biomedical cases, however, require us not only to distinguish whether there is a disease, but also to localise the area of abnormality.
https://towardsdatascience.com/unet-line-by-line-explanation-9b191c76baf5
Recurrent Neural Networks
Recurrent neural networks are good for learning from sequential data.
RNNs are often used in text and speech processing because sentences and texts are naturally sequences of either words/punctuation marks or sequences of characters.
https://www.youtube.com/watch?v=yZv_yRgOvMg&list=PLTl9hO2Oobd-GaTYQWIuIs2yyNy7TYbEj&index=3
https://www.youtube.com/watch?v=LHXXI4-IEns
LSTM
Long short-term memory (LSTM) networks are an extension of recurrent neural networks that extends their memory. They are therefore well suited to learning from important experiences that have very long time lags in between.
https://www.youtube.com/watch?v=QciIcRxJvsM&list=PLTl9hO2Oobd-GaTYQWIuIs2yyNy7TYbEj&index=4
LSTMs enable RNNs to remember inputs over a long period of time.
LSTM, Transformer and BERT
https://www.youtube.com/watch?v=xI0HHN5XKDo
GAN Generative Adversarial Networks
https://www.youtube.com/watch?v=C1YUYWP-6rE&list=PLTl9hO2Oobd-GaTYQWIuIs2yyNy7TYbEj&index=5
Transformers, Attention is all you need
https://www.youtube.com/watch?v=TQQlZhbC5ps
Multi-class Neural Networks
One vs. All
https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/one-vs-all
Softmax
https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/softmax
NN Best Practices
https://developers.google.com/machine-learning/crash-course/training-neural-networks/best-practices
Reinforcement Learning
Reinforcement learning is an approach to machine learning where agents are rewarded for accomplishing some task.
Markov Decision Process
The Markov Decision Process is a method for planning in a stochastic environment.
Monte Carlo learning
The Monte Carlo approach approximates the value of a state-action pair by calculating the mean return from a collection of episodes.
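A sketch of this averaging, using the every-visit variant as an assumption. Each episode is a list of (state, action, reward) steps; the state names, actions, and rewards are invented for the example.

```python
from collections import defaultdict


def mc_action_values(episodes, gamma=1.0):
    """Estimate Q(state, action) as the mean discounted return observed after it."""
    returns = defaultdict(list)
    for episode in episodes:
        g = 0.0
        # Walk backwards so g accumulates the return that follows each step.
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            returns[(state, action)].append(g)
    return {sa: sum(gs) / len(gs) for sa, gs in returns.items()}


episodes = [
    [("s0", "right", 0.0), ("s1", "right", 1.0)],
    [("s0", "right", 0.0), ("s1", "left", 0.0)],
]
q = mc_action_values(episodes)
print(q[("s0", "right")])  # 0.5, the mean of returns 1.0 and 0.0
print(q[("s1", "right")])  # 1.0
print(q[("s1", "left")])   # 0.0
```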
Model based vs. Instance based learning
Most supervised learning algorithms are model-based, e.g. SVM. Model-based learning algorithms use the training data to create a model that has parameters learned from the training data. After the model is built, the training data can be discarded.
Instance-based learning algorithms use the whole dataset as the model. One instance-based algorithm frequently used in practice is k-Nearest Neighbors (kNN). In classification, to predict a label for an input example the kNN algorithm looks at the close neighborhood of the input example in the space of feature vectors and outputs the label that it saw the most often in this close neighborhood.
https://www.kaggle.com/getting-started/179177
Shallow vs. Deep learning
A shallow learning algorithm learns the parameters of the model directly from the features of the training examples. Most supervised learning algorithms are shallow. The exceptions are neural network learning algorithms, specifically those that build neural networks with more than one layer between input and output. Such neural networks are called deep neural networks. In deep neural network learning (or, deep learning), contrary to shallow learning, most model parameters are learned not directly from the features of the training examples, but from the outputs of the preceding layers.
https://www.mathworks.com/discovery/deep-learning.html
Characteristics of a machine learning model
https://www.malicksarr.com/type-of-machine-learning-algorithms-the-complete-overview/
https://serokell.io/blog/machine-learning-algorithm-classification-overview
https://towardsdatascience.com/a-tour-of-machine-learning-algorithms-466b8bf75c0a
Natural Language Processing
GCP Translation tools
TF-IDF Vectorization
Term frequency-inverse document frequency (TF-IDF) vectorization is a mouthful to say, but it's also a simple and convenient way to characterize bodies of text.
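A minimal sketch of TF-IDF on pre-tokenized documents. Several weighting variants exist; this one assumes term frequency = count / document length and inverse document frequency = log(N / document frequency), with no smoothing.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (each document is a list of tokens)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
vectors = tf_idf(docs)
print(vectors[1])
# "the" appears in every document, so its weight is 0.0;
# "dog" appears in only one document, so it gets the largest weight.
```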
Build Text Classification Model using TF-IDF and NLTK
https://www.datacamp.com/community/tutorials/text-analytics-beginners-nltk
NLTK
BERT
Bidirectional Encoder Representations from Transformers is described in this paper.
https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
GPT-3
Generative Pre-trained Transformer 3 (GPT-3) is an autoregressive language model that uses deep learning to produce human-like text.
https://en.wikipedia.org/wiki/GPT-3
https://www.theguardian.com/commentisfree/2020/sep/08/robot-wrote-this-article-gpt-3
https://github.com/elyase/awesome-gpt3
https://github.blog/2021-06-29-introducing-github-copilot-ai-pair-programmer/
Write with transformer
https://transformer.huggingface.co/doc/gpt
BERT and GPT-3
https://www.ibm.com/blogs/watson/2020/12/how-bert-and-gpt-models-change-the-game-for-nlp/
https://360digitmg.com/gpt-vs-bert
GPT-3 Criticism
https://dl.acm.org/doi/10.1145/3442188.3445922
Stochastic parrots
https://faculty.washington.edu/ebender/stochasticparrots.html
BLOOM
https://bigscience.huggingface.co/blog/bloom
DALL-E
https://openai.com/blog/dall-e/
DALL-E 2
Imagen
https://imagen.research.google/
DALL-E mini
https://blog.paperspace.com/dalle-mini/
Code Editors
https://github.com/salesforce/CodeT5
https://github.com/codota/TabNine
ELMo
ELMo is a deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy).
BERT and ELMo
http://jalammar.github.io/illustrated-bert/
Transfer Learning
Transfer Learning is the process of training a model on a large-scale dataset and then using that pre-trained model as the starting point for learning another target task.
Transfer Learning became popular in the field of NLP thanks to the state-of-the-art performance of different algorithms like ULMFiT, Skip-Gram, ELMo, BERT etc.
https://towardsdatascience.com/transfer-learning-using-elmo-embedding-c4a7e415103c
Attention
Transformers
https://jalammar.github.io/illustrated-transformer/
Trends
https://www.elderresearch.com/blog/trends-in-natural-language-processing/
Embeddings
An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors to capture some of the semantics of the input by placing semantically similar inputs close together in the embedding space.
https://cloud.google.com/blog/topics/developers-practitioners/meet-ais-multitool-vector-embeddings
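"Close together" is often measured with cosine similarity. The sketch below uses three hand-written 3-dimensional vectors as stand-ins for learned embeddings; the numbers are invented and real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, for illustration only.
embeddings = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "banana": [0.10, 0.05, 0.90],
}

print(cosine_similarity(embeddings["king"], embeddings["queen"]))   # close to 1
print(cosine_similarity(embeddings["king"], embeddings["banana"]))  # much lower
```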
Collaborative Filtering
Categorical Input Data
https://developers.google.com/machine-learning/crash-course/embeddings/categorical-input-data
Translating to a Lower Dimensional Space
Obtaining Embeddings
https://developers.google.com/machine-learning/crash-course/embeddings/obtaining-embeddings
spaCy
spaCy supports a number of transfer and multi-task learning workflows that can often help improve your pipeline’s efficiency or accuracy.
Machine Learning on Source Code
https://github.com/src-d/awesome-machine-learning-on-source-code
ML Frameworks and tools
Feature Store
https://cloud.google.com/vertex-ai/docs/featurestore/overview
https://www.hopsworks.ai/post/feature-store-the-missing-data-layer-in-ml-pipelines
TensorFlow
Created by the Google Brain team, TensorFlow is an open source library for numerical computation and large-scale machine learning.
TensorFlow Example: Plain and Simple Estimators
TensorFlow has a pretty large API surface, but the part we are going to focus on is the high-level API called Estimators.
https://towardsdatascience.com/plain-and-simple-estimators-d8d3f4c185c1
The Estimators API gives us a nice workflow of getting our raw data, passing it through an input function, setting up our feature columns and model structure, running our training, and running our evaluation.
https://www.youtube.com/watch?v=G7oolm0jU8I
scikit-learn
scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface.
Keras
Keras is a high-level neural network library. It originally wrapped the numerical computation libraries Theano and TensorFlow, and it lets you define and train neural network models.
PyTorch
PyTorch is an open source machine learning library based on the Torch library.
Micrograd
https://github.com/karpathy/micrograd
Tinygrad
https://github.com/geohot/tinygrad
JAX
EvoJAX
Kubeflow
Kubeflow Pipelines is a platform for building, deploying, and managing multi-step ML workflows based on Docker containers. Kubeflow offers several components that you can use to build your ML training, hyperparameter tuning, and serving workloads across multiple platforms.
MLOps
MLOps is the process of taking an experimental Machine Learning model into a production web system.
Risks
Fairness
https://wikipedia.org/wiki/Fairness_(machine_learning)
When not to use ML
https://eugeneyan.com/writing/first-rule-of-ml/
Metascience and p-value
https://en.wikipedia.org/wiki/P-value
Diminishing returns on deep learning costs
https://spectrum.ieee.org/deep-learning-computational-cost
Google ML Engineer exam
How to reduce your ML model inference costs
Examples
Machine learning crash course
https://developers.google.com/machine-learning/crash-course
Titanic survival prediction using danfo.js and tensorflow.js
https://danfo.jsdata.org/examples/titanic-survival-prediction-using-danfo.js-and-tensorflow.js
Monitor models for training-serving skew with Vertex AI
BigQuery ML tutorials
https://cloud.google.com/bigquery-ml/docs/tutorials
The Making of an AI Storyteller
https://towardsdatascience.com/the-making-of-an-ai-storyteller-c3b8d5a983f5
Getting started with Keras
https://cloud.google.com/ai-platform/docs/getting-started-keras
Demo of Video Intelligence API
Anomaly detection using River
https://medium.com/spikelab/anomalies-detection-using-river-398544d3536
USAA Insurance Operations
IT and IIOT Monitoring
Mapping carbon pollution globally with satellites
Choose an outfit with AI
https://www.youtube.com/watch?v=o6nGn1euRjk&list=PLIivdWyY5sqLsaG5hNms0D9aZRBE7DHBb&index=7
Machine Learning Glossary
https://developers.google.com/machine-learning/glossary?hl=en
Tutorials
- https://developers.google.com/machine-learning/crash-course/
- https://codelabs.developers.google.com/ml-for-developers
- https://www.mygreatlearning.com/blog/machine-learning-tutorial/
- https://github.com/ujjwalkarn/Machine-Learning-Tutorials
- https://cloud.google.com/blog/topics/developers-practitioners/new-ml-learning-path-vertex-ai
- Machine Learning Tutorial
- https://medium.com/sarus/distributed-ml-with-dask-and-kubernetes-on-gcp-97fdd6533736
- https://towardsdatascience.com/preprocessing-time-series-data-for-supervised-learning-2e27493f44ae
- http://themlbook.com/wiki/doku.php?id=start
- https://d2l.ai/
- https://www.coursera.org/learn/machine-learning
- https://www.javatpoint.com/machine-learning
- Self-driving car tutorial: https://youtu.be/Rs_rAxEsAvI
Links
- https://github.com/ashishpatel26/Real-time-ML-Project
- https://github.com/melvfnz/data_science_portfolio
- StatQuest videos
- Google AI Fun Projects
- AI Platform Training and Prediction sample code repo
- Guide to bring code to ML GCP
- Labs and demos for courses for GCP ML and Bigdata Training
- Official repo for Google AI Platform
- Building Machine Learning and Deep Learning Models on GCP
- Hands-On Machine Learning on GCP
- Machine Learning Mastery
- Awesome Machine Learning
Qwiklabs
- https://www.qwiklabs.com/quests/50
- https://www.qwiklabs.com/quests/32
- https://www.qwiklabs.com/focuses/3389?parent=catalog
- https://www.qwiklabs.com/focuses/3393?parent=catalog
- https://google.qwiklabs.com/quests/82
- https://www.qwiklabs.com/focuses/3391?parent=catalog
- https://www.qwiklabs.com/focuses/1241?parent=catalog