AI - robbiehume/CS-Notes GitHub Wiki

Read Later

Useful prompts

ChatGPT

  • I'm going to use this chat for learning more about health topics. I might ask you for answers, tips, or queries to learn more about specific topics. I may also send you links to save. For each link I send, read the article and give a headline title, a short summary (1-2 sentences), and a link to the URL. If you can't access the full article, then use the information from the URL itself or whatever you can get from the page. You don't need to save this information to memory, I just wanted to add it to the context. Do you have any questions or would any additional information be helpful?

Perplexity

  • For this query, accuracy and comprehensiveness are more important than speed. Please conduct an in-depth search and analysis. <query>

Services to look into

Look into

Overview

Humans are a mix of AI approaches

  • Sometimes we know "if this happens then do that" (AI)
  • Sometimes we've seen a lot of similar things before, and we classify them (ML)
  • Sometimes we haven't seen something before, but we have "learned" a lot of similar concepts, so we can make a decision (Deep Learning)
  • Sometimes, we get creative, and based on what we've learned, we can generate content (GenAI)

What is AI

  • AI is a broad field for the development of intelligent systems capable of performing tasks that typically require human intelligence:
    • Perception
    • Reasoning
    • Learning
    • Problem solving
    • Decision making
  • AI is an umbrella term for various techniques
  • Use cases:
    • Intelligent Document Processing (IDP): automatically extract structured data from various types of documents, such as invoices, contracts, and forms

AI Components

  • Data Layer: collect vast amounts of data
  • ML Framework and Algorithm Layer: data scientists and engineers work together to understand use cases, requirements, and frameworks that can solve them
  • Model Layer: implement a model and train it
    • Define the model structure, parameters, and functions, and set an optimizer function
  • Application Layer: how to serve the model and its capabilities to users

What is Machine Learning (ML)

  • ML is a type of AI for building methods that allow machines to learn
  • Data is leveraged to improve computer performance on a set of tasks
  • It's used to make predictions based on data used to train the model
  • You don't explicitly program the rules; instead, you give the data to the algorithm and it builds its own model to classify the data or understand how it's structured

What is Deep Learning (DL)

  • It is a subset of ML
  • It uses neurons and synapses (like our brain) to train a model
  • It's able to process more complex patterns in the data than traditional ML
  • It's called Deep Learning because there's more than one layer of learning
  • Ex:
    • Computer Vision: image classification, object detection, image segmentation
    • NLP: text classification, sentiment analysis, machine translation, language generation
  • To have a good DL model you need a very large amount of input data and a GPU

What is Generative AI (GenAI)

  • It's a subset of Deep Learning
  • It's built on multipurpose foundation models backed by neural networks
  • They can be fine-tuned if necessary to better fit our use cases
  • GenAI utilizes Transformer Models (LLM)
    • They're able to process a sentence as a whole instead of word by word
    • This provides faster and more efficient text processing (less training time)
    • They give relative importance to specific words in a sentence (more coherent sentences)
  • Transformer-based LLMs
    • Powerful models that can understand and generate human-like text
    • Trained on vast amounts of text data from the internet, books, and other sources, and learn patterns and relationships between words and phrases
    • Ex: Google BERT, ChatGPT (based on GPT, the Generative Pre-trained Transformer)
  • Diffusion models for images

When is ML NOT appropriate

  • For deterministic problems (the solution can be computed), it's better to write the computer code that is adapted to the problem
    • If we use (un)supervised learning or reinforcement learning, we may have an "approximation" of the result

Phases of an ML Project

  • Fill in from udemy video 66

Define Business Goals

ML Problem Framing

Data Collection & Preparation

Model Development

Model Evaluation

Model Deployment

Model Monitoring

Model Iterations

Hyperparameter Tuning

Hyperparameter

  • Settings that define the model structure and control the learning algorithm and training process
  • Set before training begins
  • Examples: learning rate, batch size, number of epochs, and regularization
  • Hyperparameters have nothing to do with the data, they're just about the algorithm used to train the model

Hyperparameter Tuning

  • Finding the best hyperparameter values to optimize the model performance
  • Improves model accuracy, reduces overfitting, and enhances generalization

Implementations

  • Grid search, random search
  • Using services such as SageMaker Automatic Model Tuning (AMT)
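
As a rough illustration of grid search and random search (not SageMaker AMT), here's a sketch using scikit-learn; the estimator, parameter values, and dataset are arbitrary choices for demonstration:

```python
# Illustrative hyperparameter tuning with grid search and random search (scikit-learn assumed installed)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values to try (illustrative, not recommendations)
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}

# Grid search: exhaustively tries every combination with 5-fold cross-validation
grid = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
grid.fit(X, y)
print("Grid search best params:", grid.best_params_)

# Random search: samples a fixed number of combinations (cheaper for large search spaces)
rand = RandomizedSearchCV(RandomForestClassifier(random_state=42), param_grid, n_iter=5, cv=5, random_state=42)
rand.fit(X, y)
print("Random search best params:", rand.best_params_)
```

SageMaker AMT automates a similar search in the cloud; the scikit-learn version just shows the idea locally.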

Important Hyperparameters

  • Learning rate
    • How large or small the steps are when updating the model's weights during training
    • High learning rate can lead to faster convergence, but risks overshooting the optimal solution
    • Low learning rate may result in more precise but slower convergence
  • Batch size
    • How many training examples used to update the model weights in one iteration
    • Smaller batches can lead to more stable learning, but require more time to compute
    • Larger batches are faster but may lead to less stable updates
  • Number of Epochs
    • How many times the model will iterate over the entire training dataset
    • Too few epochs can lead to underfitting
    • Too many epochs may cause overfitting
  • Regularization
    • Adjusts the balance between a simple and a complex model
    • Increase regularization to reduce overfitting
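
To show where these hyperparameters actually live, here's a minimal training-loop sketch in PyTorch (assumed installed); the model, data, and specific values are placeholders, not recommendations:

```python
# Illustrative only: shows where each hyperparameter plugs into a training loop (PyTorch assumed)
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: 1,000 samples with 20 features, binary labels
X = torch.randn(1000, 20)
y = torch.randint(0, 2, (1000,))
dataset = TensorDataset(X, y)

# Hyperparameters (set before training begins)
learning_rate = 1e-3   # step size for weight updates
batch_size = 32        # examples per weight update
num_epochs = 10        # full passes over the training data
weight_decay = 1e-4    # L2 regularization strength

loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
```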

Training Data

  • To train our model we must have good data (garbage in --> garbage out)
  • Collecting data, cleaning it, and ensuring it's useful for your purpose is one of the most critical parts of building a good model

Labeled vs Unlabeled Data

  • Labeled data: data that includes both input features and corresponding output labels
    • Ex: dataset with images of animals where each image is labeled with the corresponding animal type (cat, dog, etc.)
    • Use case: supervised learning, where the model is trained to map inputs to known outputs
  • Unlabeled data: data that includes only input features without any output labels
    • Ex: a collection of images without any associated labels
    • Use case: unsupervised learning, where the model tries to find patterns or structures in the data

Structured vs Unstructured Data

  • Structured data: data that is organized in a structured format, often in rows and columns (like Excel)
    • Tabular data: data is arranged in a table with rows representing records and columns representing features
      • Ex: customers database with fields such as name, age, and total purchase amount
    • Time series data: data points collected or recorded at successive points in time
      • Ex: stock prices recorded daily over a year
  • Unstructured data: data that doesn't follow a specific structure and is often text-heavy or multimedia content
    • Text data: unstructured data such as articles, social media posts, or customer reviews
      • Ex: a collection of product reviews from an e-commerce site
    • Image data: data in the form of images, which can vary widely in format and content
      • Ex: images used for object recognition tasks

Training vs Validation vs Test Set

  • Training set: used to train the model
    • Percentage: typically 60-80% of the dataset
    • Ex: 800 labeled images from a dataset of 1,000 images
  • Validation set: used to tune model parameters and validate performance
    • Percentage: typically 10-20% of the dataset
    • Ex: 100 labeled images for hyperparameter tuning (tune the settings of the algorithm to make it more efficient)
  • Test set: used to evaluate the final model performance
    • Percentage: typically 10-20% of the dataset
    • Ex: 100 labeled images to test the model's accuracy
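
One common way to produce the split described above is two successive calls to scikit-learn's train_test_split; this sketch assumes scikit-learn is available and uses placeholder data to hit the typical 80/10/10 percentages:

```python
# Hypothetical 80/10/10 train/validation/test split (scikit-learn assumed)
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 5)           # placeholder features
y = np.random.randint(0, 2, 1000)     # placeholder labels

# First carve out the 80% training set
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=42)
# Then split the remaining 20% evenly into validation and test sets (10% each overall)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 800, 100, 100
```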

Feature Engineering

  • The process of using domain knowledge to select and transform raw data into meaningful features
  • It helps enhance the performance of ML models
  • It's especially meaningful for supervised learning
  • Techniques:
    • Feature Extraction: extracting useful info from raw data
      • Ex: deriving age from date of birth
    • Feature Selection: selecting a subset of relevant features
      • Ex: choosing a subset of the data to use only the important predictors in a regression model
    • Feature Transformation: transforming the data for better model performance
      • Ex: normalizing numerical data
  • Feature Engineering on Structured Data (tabular data)
    • Ex: predicting house prices based on features like size, location, and number of rooms
    • Feature engineering tasks:
      • Feature creation: deriving new features like "price per square foot"
      • Feature selection: identifying and retaining important features such as location or number of bedrooms
      • Feature transformation: normalizing features to ensure they are on a similar scale, which helps algorithms like gradient descent converge faster
  • Feature Engineering on Unstructured Data (text, images)
    • Ex: sentiment analysis of customer reviews
    • Feature engineering tasks
      • Text Data: converting text into numerical features using techniques like TF-IDF or word embeddings
      • Image Data: extracting features such as edges or textures using techniques like convolutional neural networks (CNNs)
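
A small pandas sketch of the three structured-data tasks from the house-price example above (feature creation, selection, and transformation); the column names and values are made up for illustration:

```python
# Illustrative feature engineering on a tiny, made-up housing table (pandas assumed)
import pandas as pd

df = pd.DataFrame({
    "price": [300000, 450000, 250000],
    "sqft": [1500, 2200, 1100],
    "bedrooms": [3, 4, 2],
    "listing_id": [101, 102, 103],
})

# Feature creation: derive a new feature from existing ones
df["price_per_sqft"] = df["price"] / df["sqft"]

# Feature selection: keep only the predictors we believe matter (drop the ID column)
features = df[["sqft", "bedrooms", "price_per_sqft"]]

# Feature transformation: min-max normalize so all features share a similar scale
normalized = (features - features.min()) / (features.max() - features.min())
print(normalized)
```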

ML Algorithms

Supervised Learning

  • Want to learn a mapping function that can predict the output for new unseen input data
  • Models are trained on labeled data: very powerful, but difficult to perform on millions of datapoints
  • Techniques:
    • Classification: predicts a discrete categorical label for the input data
      • Use cases: scenarios where decisions or predictions need to be made between distinct categories (fraud, image classification, customer retention, diagnostics)
      • Examples:
        • Binary classification (one or the other): classify emails as "spam" or "not spam"
        • Multi-class classification (more than two): classify animals in a zoo as "mammal", "bird", or "reptile"
        • Multi-label classification (can assign multiple to one): assign multiple labels to a movie, like "action" and "comedy"
      • Key algorithm: K-nearest neighbors (k-NN) model
    • Regression: predicts a continuous numeric value
      • Use cases: used when the goal is to predict a quantity or real value
      • Examples: probabilities or scores; sales forecasts, temperature predictions
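
Since k-NN is called out above as a key classification algorithm, here's a minimal scikit-learn sketch; the dataset and the choice of k = 5 are just for illustration:

```python
# Illustrative k-NN classifier on the built-in iris dataset (scikit-learn assumed)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# k=5: each prediction is the majority label among the 5 nearest training points
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))
```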

Unsupervised Learning

  • Models work with unlabeled data to find patterns, relationships, groupings, or underlying structures
  • The machine must uncover and create the groups itself, but humans still put labels on the output groups
  • Even though it uses unlabeled data, feature engineering can still help improve the quality of the training data
  • Techniques:
    • Clustering: used to group similar data points together into clusters based on their features
      • Use cases: customer segmentation, targeted marketing, recommender systems
      • Example: customer segmentation:
        • Scenario: e-commerce company wants to segment its customers to understand different purchasing behaviors
        • Data: a dataset contains customer purchase history (e.g. purchase frequency, average order value)
        • Goal: identify distinct groups of customers based on their purchasing behavior
        • Technique: K-means clustering (see the sketch after this list)
        • Outcome: the company can target each segment with tailored marketing strategies
    • Dimensionality Reduction
    • Anomaly Detection:
      • Example: Fraud Detection
        • Scenario: detect fraudulent credit card transactions
        • Data: transaction data, including amount, location, and time
        • Goal: identify transactions that deviate significantly from typical behavior
        • Technique: Isolation Forest
        • Outcome: the system flags potentially fraudulent transactions for further investigation
    • Association Rule Learning:
      • Example: Market Basket Analysis
        • Scenario: supermarket wants to understand which products are frequently bought together
        • Data: transaction records from customer purchases
        • Goal: identify associations between products to optimize product placement and promotions
        • Technique: Apriori algorithm
        • Outcome: the supermarket can place associated products together to boost sales
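
Here's the K-means sketch referenced in the customer-segmentation example; the two features and the choice of 3 clusters are assumptions made up for illustration (scikit-learn assumed available):

```python
# Illustrative K-means clustering on made-up customer data (scikit-learn assumed)
import numpy as np
from sklearn.cluster import KMeans

# Placeholder features: [purchase frequency per month, average order value]
customers = np.array([
    [1, 20], [2, 25], [1, 22],      # low-frequency, low-value shoppers
    [8, 30], [9, 35], [10, 28],     # frequent, moderate-value shoppers
    [3, 200], [2, 180], [4, 220],   # infrequent, high-value shoppers
])

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
labels = kmeans.fit_predict(customers)
print("Cluster assignments:", labels)
print("Cluster centers:\n", kmeans.cluster_centers_)
```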

Semi-Supervised Learning

  • Use a small amount of labeled data and a large amount of unlabeled data to train systems
    • It's useful because labeled data is valuable but expensive to obtain, so labeling only part of the dataset is a happy medium
  • After that, the partially trained algorithm itself labels the unlabeled data
    • This is called pseudo-labeling
  • Now that everything is labeled, retrain the model on the entire dataset
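
A rough sketch of that pseudo-labeling flow using scikit-learn; the confidence threshold, model choice, and synthetic data are all illustrative assumptions:

```python
# Illustrative pseudo-labeling: train on a small labeled set, label the rest, retrain (scikit-learn assumed)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=42)
X_labeled, y_labeled = X[:100], y[:100]   # small labeled portion
X_unlabeled = X[100:]                     # large unlabeled portion

# 1) Train on the labeled data only
model = LogisticRegression().fit(X_labeled, y_labeled)

# 2) Pseudo-label the unlabeled data, keeping only confident predictions (threshold is arbitrary)
probs = model.predict_proba(X_unlabeled).max(axis=1)
confident = probs > 0.9
pseudo_labels = model.predict(X_unlabeled[confident])

# 3) Retrain on labeled + confidently pseudo-labeled data
X_combined = np.vstack([X_labeled, X_unlabeled[confident]])
y_combined = np.concatenate([y_labeled, pseudo_labels])
model = LogisticRegression().fit(X_combined, y_combined)
```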

Self-Supervised Learning

  • Have a model generate pseudo-labels for its own data without having humans label any data first
    • This is useful since having humans label data can be expensive
  • Then, using the pseudo labels, solve problems traditionally solved by supervised learning
  • This is widely used in NLP (to create the BERT and GPT models for example) and in image recognition tasks
  • Self-supervised learning intuitive example:
    • Create "pre-text tasks" to have the model solve simple tasks to learn patterns in the dataset
    • Pretext tasks are not "useful" as-is, but will teach our model to create a "representation" of our dataset
      • Predict any part of the input from any other part
      • Predict the future from the past
      • Predict the masked from the visible
      • Predict any occluded part from all available parts
    • After solving the pre-text tasks, we have a model trained that can solve our end goal: "downstream tasks"

Reinforcement Learning (RL)

  • A type of ML where an agent interacts with an environment and learns to make optimal decisions by receiving rewards or penalties
  • Good YouTube channel: channel, video
  • Key concepts:
    • Agent: the learner or decision-maker
    • Environment: the external system the agent interacts with
    • Action: the choices made by the agent
    • Reward: the feedback from the environment based on the agent's actions
    • State: the current situation of the environment
    • Policy: the strategy the agent uses to determine actions based on the state
  • How does RL work?
    • Learning Process
      • The Agent observes the current State of the Environment
      • It selects an Action based on its Policy
      • The environment transitions to a new State and provides a Reward
      • The Agent updates its Policy to improve future decisions
    • Goal: Maximize cumulative reward over time
  • Example: RL in action (see the code sketch after this section)
    • Scenario: training a robot to navigate a maze
    • Steps: robot (Agent) observes its position (State)
      • Chooses a direction to move (Action)
      • Receives a reward (-1 for taking a step, -10 for hitting a wall, +100 for going to the exit)
      • Updates its Policy based on the Reward and new position
    • Outcome: the robot learns to navigate the maze efficiently over time
  • Applications of RL
    • Gaming – teaching AI to play complex games (e.g., Chess, Go)
    • Robotics – navigating and manipulating objects in dynamic environments
    • Finance – portfolio management and trading strategies
    • Healthcare – optimizing treatment plans
    • Autonomous Vehicles – path planning and decision-making
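
Here's the code sketch referenced in the maze example: a tiny tabular Q-learning loop on a 1-D corridor. The environment, reward values, and hyperparameters are made up, and Q-learning is just one common RL algorithm, not the only approach:

```python
# Toy Q-learning: agent learns to walk right along a 5-cell corridor to reach the exit at cell 4
import random

n_states, n_actions = 5, 2              # states 0..4; actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount factor, exploration rate

def step(state, action):
    """Environment: returns (next_state, reward). Exit at state 4 gives +100, each step costs -1."""
    next_state = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    reward = 100 if next_state == n_states - 1 else -1
    return next_state, reward

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Epsilon-greedy policy: mostly exploit the best known action, sometimes explore
        action = random.randrange(n_actions) if random.random() < epsilon else Q[state].index(max(Q[state]))
        next_state, reward = step(state, action)
        # Q-learning update: move the estimate toward reward + discounted best future value
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

print([row.index(max(row)) for row in Q])  # greedy action per state (the terminal state is never updated)
```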

RLHF: Reinforcement Learning from Human Feedback

  • Use human feedback to help ML models self-learn more efficiently
  • RLHF significantly enhances the model performance
  • In RL there's a reward function. In RLHF, human feedback is incorporated in the reward function, to be more aligned with human goals, wants, and needs
    • First, the model's responses are compared to human responses
    • Then, a human assesses the quality of the model's responses
  • RLHF is used throughout GenAI applications, including LLM models
  • Ex: grading text translations from "technically correct" to "human"
  • For the AWS exam, mostly focus on knowing the 4 steps of RLHF below
  • Example of how RLHF works: internal company knowledge chatbot
    • Data collection
      • Set of human-generated prompts and responses are created
      • “Where is the location of the HR department in Boston?”
    • Supervised fine-tuning of a language model
      • Fine-tune an existing model with internal knowledge
      • Then the model creates responses for the human-generated prompts
      • Responses are mathematically compared to human-generated answers
    • Build a separate reward model
      • Humans can indicate which response they prefer from the same prompt
      • The reward model can now estimate how a human would prefer a prompt response
    • Optimize the language model with the reward-based model
      • Use the reward model as a reward function for RL
      • This part can be fully automated

Model Fit, Bias, and Variance

  • You want no underfitting or overfitting, and low bias and variance

Model Fit

  • In case your model has poor performance, you need to look at its fit
  • Overfitting (high variance): model performs well on the training data but doesn't perform well on the evaluation data
    • This corresponds to high variance
    • Occurs due to:
      • Training data size too small or doesn't represent all possible input values
      • The model trains too long on a single sample set of data
      • Model complexity is high and learns from the "noise" within the training data
    • How to prevent it:
      • Increase training data size
      • Early stopping the training of the model
      • Can do data augmentation to increase the diversity in the dataset
      • Can adjust hyperparameters, but usually not the best method
  • Underfitting (high bias): model performs poorly on training data
    • Could be a problem of having model too simple or poor data features
  • What you want is a balanced fit: neither overfitting nor underfitting
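
As a concrete sketch of the "early stopping" prevention technique above, here's a generic loop that halts when validation loss stops improving; the patience value and the simulated losses are illustrative assumptions rather than any framework's built-in callback:

```python
# Illustrative early stopping: halt when validation loss hasn't improved for `patience` epochs
def train_with_early_stopping(train_one_epoch, compute_val_loss, max_epochs=100, patience=5):
    """train_one_epoch() runs one training epoch; compute_val_loss() returns the current validation loss."""
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = compute_val_loss()
        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch}: validation loss stopped improving")
            break

# Tiny simulated usage: validation loss improves, then plateaus, triggering the stop
losses = iter([1.0, 0.8, 0.7, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76])
train_with_early_stopping(lambda: None, lambda: next(losses), max_epochs=10, patience=3)
```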

Bias (high = underfitting)

  • The difference or error between the predicted and actual value
  • Occurs due to the wrong choice of model or assumptions in the ML process
  • High bias: means the model doesn't closely match the training data
    • Ex: linear regression function on a non-linear dataset
    • Considered as underfitting
  • Reducing the bias
    • Use a more complex model
    • Or increase the number of features

Variance (high = overfitting)

  • How much the performance of a model changes if trained on a different dataset that has a similar distribution
  • High variance: means the model is very sensitive to changes in the training data
    • This occurs when overfitting: the model performs well on training data, but poorly on unseen test data
  • Reducing the variance
    • Use feature selection: consider fewer features (only the important ones)
    • Split the training and test data multiple times

Model Evaluation

Binary and multi-class classification

  • Confusion Matrix
    • Uses true positives (TP) and negatives (TN), false positives (FP) and negatives (FN)
    • It's the best way to evaluate performance of a model that does classifications
    • Can be used for binary classification or for multi-class classification (multi-dimensional confusion matrix)
  • Metrics
    • Precision – Best when false positives (FP) are costly
      • Precision = TP / (TP + FP)
    • Recall – Best when false negatives (FN) are costly
      • Recall = TP / (TP + FN)
    • F1 Score – Best when you want a balance between precision and recall, especially in imbalanced datasets
      • F1 = (2 * Precision * Recall) / (Precision + Recall)
    • Accuracy (rarely used) – Best for balanced datasets
      • Accuracy = (TP + TN) / (TP + TN + FP + FN)
  • For the AWS exam, you don't need to know the formulas, just need to know precision, recall, f1, and accuracy are used for binary classification
  • Area Under the ROC Curve (AUC)
    • Used for performance evaluation of binary classification models making probabilistic predictions
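
A quick sketch computing all of these metrics from made-up predictions with scikit-learn (assumed available); note that AUC needs predicted probabilities rather than hard labels:

```python
# Illustrative: confusion matrix and classification metrics on made-up binary predictions (scikit-learn assumed)
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, accuracy_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]                    # hard class predictions
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]   # predicted probabilities, used for AUC

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, TN, FP, FN:", tp, tn, fp, fn)
print("Precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("F1:       ", f1_score(y_true, y_pred))
print("Accuracy: ", accuracy_score(y_true, y_pred))
print("AUC:      ", roc_auc_score(y_true, y_score))    # uses probabilities, not hard labels
```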

Regression Metrics

  • MAE (Mean Absolute Error)
  • MAPE (Mean Absolute Percentage Error)
  • RMSE (Root Mean Squared Error)
  • R^2 (R-squared)
  • These are used for evaluating models that predict a continuous value (i.e. regressions)
    • For MAE, MAPE, and RMSE, the lower the better
    • For R^2, the closer to 1 the better
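
A matching sketch for the regression metrics, again with made-up values and scikit-learn assumed (RMSE is computed here as the square root of MSE):

```python
# Illustrative regression metrics on made-up predictions (scikit-learn assumed)
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.4, 7.0, 9.5])

print("MAE: ", mean_absolute_error(y_true, y_pred))             # lower is better
print("MAPE:", mean_absolute_percentage_error(y_true, y_pred))  # lower is better
print("RMSE:", np.sqrt(mean_squared_error(y_true, y_pred)))     # lower is better
print("R^2: ", r2_score(y_true, y_pred))                        # closer to 1 is better
```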

Inferencing

  • Inferencing is when a model is making a prediction on new data
  • Real time
    • Computers have to make decisions quickly as data arrives
    • Speed is preferred over perfect accuracy
    • Ex: chatbots
  • Batch
    • Large amount of data that is analyzed all at once
    • Often used for data analysis
    • Speed of the results is usually not a concern, accuracy is
  • Inferencing at the Edge
    • Edge devices usually have less computing power and are close to where the data is generated, in places where internet connections can be limited
    • Small Language Model (SLM) on the edge device
      • Very low latency
      • Low compute footprint
      • Offline capability; local inference
    • Large Language Model (LLM) on a remote server
      • More powerful model
      • Higher latency
      • Must be online to be accessed

Responsible AI & Security

Responsible AI

  • Making sure AI systems are transparent and trustworthy
  • Need to mitigate potential risk and negative outcomes
  • Core facets of Responsible AI
    • Fairness: promote inclusion and prevent discrimination
    • Explainability
    • Privacy and security: individuals control when and if their data is used
    • Transparency:
    • Veracity and robustness: reliable even in unexpected situations
    • Governance: define, implement and enforce responsible AI practices
    • Safety: algorithms are safe and beneficial for individuals and society
    • Controllability: ability to align to human values and intent
  • AWS Services for Responsible AI
    • Bedrock: human or automatic model evaluation
    • Guardrails for Bedrock
    • SageMaker Clarify:
      • FM evaluation on accuracy, robustness, and toxicity
      • Bias detection (e.g., data skewed towards one race)
    • SageMaker Data Wrangler: fix bias by balancing dataset (e.g. with augmented data)
    • SageMaker Model Monitor: quality analysis in production
    • Amazon Augmented AI (A2I): human review of ML predictions
    • Governance: SageMaker Role Manager, Model Cards, and Model Dashboard
    • Also have AWS AI Service Cards for some services

Security

  • Ensure that confidentiality, integrity, and availability are maintained
  • This applies to your data, information assets, and infrastructure

Governance & Compliance

Governance

  • Ensure AI adds value and manages risk in the operation of the business
  • Need clear policies, guidelines, and oversight mechanisms to ensure AI systems align with legal and regulatory requirements

Compliance

  • Ensure adherence to regulations and guidelines
  • Especially for sensitive domains such as healthcare, finance, and legal applications

Prompt tips

Parameters
