AWS AI
Quick comparison
- Transcribe: speech-to-text; Polly: text-to-speech (TTS); Lex: chatbot with voice (speech) & text, recognizes intentions
- Lex: interpret speech; Comprehend: interpret text
- Kendra: search documents; Comprehend: interpret documents
- Bedrock: GenAI FMs; SageMaker JumpStart: open-source and broader models, you manage more of the data/tuning
Services overview
Machine Learning and AI Services
- Amazon SageMaker: Build, train, and deploy machine learning models
- Amazon Bedrock: Build generative AI apps using foundation models
- Amazon Personalize: Real-time personalized recommendations using ML
- Amazon Forecast: Time-series forecasting using machine learning
- Amazon Q Business: AI assistant for business insights and productivity
- Amazon Q Developer: AI-powered assistant for coding and development tasks
Natural Language Processing (NLP) Services
- Amazon Comprehend: Analyze text for insights, sentiment, and entities
- Amazon Translate: Real-time and batch text translation
- Amazon Textract: Extract text and data from scanned documents
Speech and Conversational AI Services
- Amazon Lex: Build chatbots and voice assistants
- Amazon Polly: Convert text to lifelike speech (text-to-speech)
- Amazon Transcribe: Convert speech to text (speech-to-text)
Computer Vision Services
- Amazon Rekognition: Analyze images and videos for objects, people, and activities
Analytics and Data Processing Services
- AWS Glue: Fully managed ETL service for data preparation
- Amazon Kinesis: Real-time streaming data processing
- Amazon QuickSight: Business intelligence (BI) service for dashboards and reports
Security and Compliance Services
- AWS Identity and Access Management (IAM): Secure access and permissions management
- Amazon CloudWatch: Monitor AWS services and applications
- AWS Key Management Service (KMS): Manage encryption keys securely
Overview
Why AWS AI Managed Services
- The primary advantage of using AWS GenAI services to build applications is that they allow you to quickly prototype, deploy, and scale high-performance AI apps
- AWS AI Services are pre-trained ML services for your use cases
- Main benefits
- Responsiveness and availability
- Redundancy and regional coverage: deployed across multiple AZs and AWS regions
- Performance: specialized CPUs and GPUs for specific use cases, which provides cost savings
- Token-based pricing: only pay for what you use
- Provisioned throughput is also offered for predictable workloads and gives cost savings and predictable performance
AWS Cloud
- Shared responsibility model
- AWS is responsible for security "of" the cloud, including infrastructure, hardware, and software
- The customer is responsible for security "in" the cloud, including data, applications, and access management
AWS Global Infrastructure
- Each AWS Region consists of a minimum of three Availability Zones (AZs)
- Each Availability Zone (AZ) consists of one or more discrete data centers
ML terms you may encounter on the exam
GPT, BERT, and GAN are the most common on the exam. Also remember which is best for certain tasks: ResNet (images), WaveNet (audio), GAN (data augmentation), GPT/BERT (language), so that on the exam you can tell which is correct by process of elimination
- GPT (Generative Pre-Trained Transformer): generate human text or computer code based on input prompts
- BERT (Bidirectional Encoder Representations from Transformers): similar intent to GPT, but reads the text in two directions, which makes it great for translation purposes
- GAN (Generative Adversarial Network): models used to generate synthetic data such as images, videos, or sounds that resemble the training data
- Helpful for data augmentation when you have underrepresented training data and want to generate more synthetic samples
- RNN (Recurrent Neural Network): meant for sequential data such as time-series or text; useful in speech recognition, time-series predictions
- ResNet (Residual Network): Deep Convolutional Neural Network (CNN) used for image recognition tasks, object detection, and facial recognition
- SVM (Support Vector Machine): ML algorithm for classification and regression
- WaveNet: model used to generate raw audio waveform, used in Speech Synthesis
- XGBoost (Extreme Gradient Boosting): an implementation of gradient boosting, used for regression and classification tasks
Other definitions
- Foundation Model (FM): a large, general-purpose model that is pre-trained on diverse datasets and can be fine-tuned for downstream tasks
- They use unlabeled datasets for self-supervised learning
- Fine-tuning: the process to further train and refine a pre-trained LLM on a smaller, targeted dataset
- Reinforcement learning: a technique to train an ML model to achieve a goal and maximize cumulative reward
- It uses a trial-and-error process and a reward-based system
- You cannot use reinforcement learning to assess the performance of an FM for text generation
- Transfer Learning: the broader concept of re-using a pre-trained model to adapt to a new related task
- Widely used for image classification and NLP (models like BERT and GPT)
- It can appear on the exam as a general ML concept
- Fine-tuning is a specific kind of transfer learning
- F1 score: used to evaluate a model's accuracy for binary classification
- F1 scores use precision and recall to evaluate how accurately a model classifies the correct class
- You cannot use the F1 score to assess the performance of an FM for text generation
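As a quick worked example (a minimal Python sketch with made-up precision/recall values), F1 is the harmonic mean of precision and recall:

```python
# F1 is the harmonic mean of precision and recall
def f1_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

# Hypothetical values: high precision but low recall drags F1 down
print(f1_score(0.90, 0.50))  # ~0.643
```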
AI model issues
- Not enough training time: leads to low accuracy on both the training data and testing data
- Underfitting: when the model does not identify the relationships in the training data
- It leads to low accuracy on both the training and testing data
- Overfitting: when the model learns from the training data but is unable to perform well when given new data
Tradeoffs of customizing an FM
- Higher cost, higher implementation complexity
GenAI Concepts
Tokenization: converting raw text into a sequence of tokens
- Word-based tokenization: text is split into individual words
- Subword tokenization: some words can be split too (helpful for long words)
- Can experiment at https://platform.openai.com/tokenizer
Context Window: the number of tokens an LLM can consider when generating text
- The larger the context window, the more information and coherence
- Large context windows require more memory and processing power
- It should be the first factor to look at when considering a model
Embeddings: create vectors (arrays of numerical values) out of text, images, or audio
- Vectors have a high dimensionality to capture many features for one input token, such as semantic meaning, syntactic role, sentiment
- Embedding models are great for powering search applications
- Words (tokens) that have a semantic relationship have similar embeddings
- Two ways to visualize embeddings:
- Dimensionality reduction of word embeddings to 2D and graph them
- Color visualization of vectors (tokens that have similar colors are semantically similar)
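To make the "similar embeddings" idea concrete, here is a minimal Python sketch (made-up toy vectors; real embeddings have hundreds or thousands of dimensions) of cosine similarity, the usual way to compare embedding vectors:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors: semantically related words should score closer to 1
king  = [0.8, 0.65, 0.1, 0.2]
queen = [0.75, 0.7, 0.15, 0.2]
apple = [0.1, 0.2, 0.9, 0.7]

print(cosine_similarity(king, queen))  # high (similar meaning)
print(cosine_similarity(king, apple))  # low (unrelated)
```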
ML development pipeline
Data collection
- A step to label, ingest, and aggregate data that you will use for ML model training
- During data collection, you ingest and aggregate data from multiple sources
- Then, you label the data
Model training
Model evaluation
- Can use model evaluation to evaluate a model's performance and metrics
AI model performance techniques
Hyperparameter tuning
- A method to adjust the behavior of an ML algorithm
Feature engineering
- A method to select and transform variables when you create a predictive model
- It includes feature creation, feature transformation, feature extraction, and feature selection
- It enhances the data (e.g. by creating new variables in the training dataset) to ultimately improve model performance
Prompt Engineering
- Prompt Engineering is the process of developing, designing, and optimizing prompts to enhance the output of FMs for your needs
- A good prompt consists of
- Instructions: a task for the model to do (description, how the model should perform)
- Context: external information to guide the model
- Input data: the input for which you want a response
- Output indicator: the output type or format
- Negative Prompting
- A technique where you explicitly instruct the model on what not to include or do in the response
- It helps to
- Avoid unwanted content: reduces the chances of irrelevant or inappropriate content
- Maintain focus: helps the model stay on topic and not stray into areas that are not useful or desired
- Enhance clarity: prevents the use of complex terminology or detailed data, making the output clearer and more accessible
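Putting these pieces together, a hypothetical prompt containing all four components plus a negative instruction might look like this (illustrative text only):

```python
prompt = """Instructions: Summarize the customer review below in one sentence.
Do NOT include any personal names or speculate about information not in the review.

Context: The review is for a wireless keyboard sold on our e-commerce site.

Input data: "The keys feel great and battery life is solid, but the Bluetooth
pairing drops every few hours, which is frustrating."

Output indicator: Respond with a single plain-text sentence."""
```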
Prompt Performance Optimization
- System prompts: how the model should behave and reply
- Temperature (0-1): creativity of the model's output
- Low (ex: 0.2): outputs are more conservative, repetitive, and focused on the most likely response
- High (ex: 1.0): outputs are more diverse, creative, and unpredictable, but may be less coherent
- Top P (0-1):
- Low P (ex: 0.25): only consider the 25% most likely words; will make a more coherent response
- High P (ex: 0.99): consider a broad range of possible words; possibly more creative and diverse output
- Top K: limits the number of probable words
- Low K (ex: 10): smaller number of probable words --> more coherent response
- High K (ex: 500): larger number of probable words --> more diverse and creative response
- Length: maximum length of the response
- Stop Sequences: tokens that signal the model to stop generating output
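A minimal sketch of setting these parameters with the Bedrock Converse API via boto3 (the model ID and region are placeholders; parameter support varies by provider, and Top K is typically passed through additionalModelRequestFields rather than the common config):

```python
import boto3

# Sketch: setting inference parameters on a Bedrock Converse call
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": "Write a haiku about clouds."}]}],
    inferenceConfig={
        "temperature": 0.9,       # higher -> more creative/diverse output
        "topP": 0.9,              # nucleus sampling cutoff
        "maxTokens": 200,         # response length limit
        "stopSequences": ["END"], # stop generating when this sequence appears
    },
    # Top K is model-specific; some providers accept it here:
    additionalModelRequestFields={"top_k": 250},
)
print(response["output"]["message"]["content"][0]["text"])
```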
Prompt Latency
- Latency is how fast the model responds
- It's impacted by a few parameters:
- The model size
- The model type (Llama has a different performance than Claude)
- The number of tokens in the input (the bigger the slower)
- The number of tokens in the output (the bigger the slower)
- Latency is not impacted by Top P, Top K, or Temperature
Prompt Engineering Techniques
- Zero-Shot Prompting: present a task to the model without providing examples or explicit training for that task
- You fully rely on the model's general knowledge
- The larger and more capable the FM, the more likely you'll get good results
- Few-Shot Prompting: present examples of a task to the model to guide its output
- You provide a "few shots" to the model to perform the task
- If you provide one example only, this is called "one-shot" or "single-shot"
- Chain of Thought (CoT) Prompting: divide the task into a sequence of reasoning steps
- This leads to more structure and coherence
- Using a sentence like "think step by step" helps
- Helpful when solving a problem as a human usually requires several steps
- Can be combined with few-shot prompting
- RAG: not considered prompt engineering, but often compared to it on the exam
- Combine the model's capability with external data sources to generate a more informed and contextually rich response
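For contrast with zero-shot, here is a hypothetical prompt combining few-shot examples with a chain-of-thought cue (illustrative text only):

```python
few_shot_cot_prompt = """Classify the sentiment of the final review. Think step by step.

Review: "Arrived broken and support never replied." -> negative
Review: "Does exactly what it says, great value."   -> positive

Review: "Setup took a while, but it works flawlessly now." ->"""
```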
Prompt Templates
- Simplify and standardize the process of generating prompts
- Benefits:
- Processes user input text and output prompts from FMs
- Orchestrates between the FM, action groups, and knowledge bases
- Formats and returns responses to the user
- You can also provide examples with few-shot prompting to improve the model performance
- Prompt templates can be used with Bedrock Agents
- Prompt template injections:
- AKA "Ignoring the prompt template" attack
- Users could try to enter malicious inputs to hijack our prompt and provide information on a prohibited or harmful topic
- Protecting against prompt injections:
- Add explicit instructions (Guardrails) to ignore any unrelated or potential malicious content
Amazon Bedrock
- Used to build GenAI applications on AWS
- It's a fully managed service that provides a unified API to access popular FMs
- You get to keep control of your data used to train the model
- It supports image generation models from providers such as Stability AI or AWS
- You can use Amazon Bedrock to consume FMs through a unified API without the need to train, host, or manage ML models
- This is the most suitable solution for a company that does not want to train or manage ML models for image generation
- Out-of-the box features: RAG, LLM Agents, etc.
- Bedrock Studio: gives nice UI access to Bedrock to your team so they can easily create AI-powered applications
- Watermark detection: checks if an image was generated by Amazon Titan Generator
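A minimal boto3 sketch of the unified-API idea: the bedrock control-plane client lists available FMs, while bedrock-runtime is used to invoke them (the region is an assumption):

```python
import boto3

# Sketch: list the foundation models available to your account in one region
bedrock = boto3.client("bedrock", region_name="us-east-1")

for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["providerName"], "-", model["modelId"])
```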
Pricing
- On-Demand: great for unpredictable workloads, no long-term commitment
- Pay-as-you-go (no commitment); works with Base Models only
- Text models: charged for every input/output token processed
- Embedding models: charged for every input token processed
- Image models: charged for every image generated
- Batch: can provide discounts of up to 50%, but takes longer
- Do multiple predictions at a time (output is a single file in S3)
- Provisioned Throughput: (usually) no cost savings, but maintains capacity and performance
- Purchase model units for a certain time (1 month, 6 months, etc.)
- Throughput: max. number of input/output tokens processed per minute
- Works with Base, Fine-tuned, and Custom Models
Cost savings
- Model pricing type: choose the most cost-effective type (see above) that meets your performance requirements
- Temperature, Top K, Top P: no impact on pricing
- Model size: usually a smaller model will be cheaper (varies based on providers)
- Number of input/output tokens: this is the main driver of cost
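A back-of-the-envelope sketch of token-based cost (the per-token prices below are hypothetical placeholders; check the Bedrock pricing page for real numbers):

```python
# Rough on-demand cost estimate for a text model
# Prices are HYPOTHETICAL placeholders, not real Bedrock pricing
PRICE_PER_1K_INPUT_TOKENS = 0.003   # $ per 1,000 input tokens (made up)
PRICE_PER_1K_OUTPUT_TOKENS = 0.015  # $ per 1,000 output tokens (made up)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS

# 1M requests averaging 500 input / 200 output tokens each -> $4,500.00
print(f"${estimate_cost(500, 200) * 1_000_000:,.2f}")
```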
Model improvement techniques (cheapest to most expensive)
- Prompt engineering: no model training needed (no additional training or fine-tuning)
- RAG: uses external knowledge (FM doesn't need to know everything; less complex)
- No FM changes (no additional computation or fine-tuning)
- Instruction-based Fine-tuning: FM is fine-tuned with specific instructions (requires additional computation)
- Domain Adaptation Fine-tuning: model is trained on a domain-specific dataset (requires intensive computation)
Guardrails for Amazon Bedrock
- Helps you control the interaction between users and FMs
- Can filter undesirable and harmful content
- Can remove PII
- Can reduce hallucinations
- Can be used to ensure that the content aligns with safety and compliance policies
- You have the ability to create multiple Guardrails and monitor and analyze user inputs that can violate the Guardrails
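A minimal sketch of attaching an existing Guardrail to a Converse call with boto3 (the guardrail ID/version and model ID are placeholders):

```python
import boto3

# Sketch: apply an existing Guardrail to a Bedrock Converse call
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder
    messages=[{"role": "user", "content": [{"text": "Tell me about our products."}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-example123",  # placeholder guardrail ID
        "guardrailVersion": "1",
    },
)
# stopReason is "guardrail_intervened" when the Guardrail blocks content
print(response["stopReason"])
```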
Agents in Bedrock
- Agents can manage and carry out various multi-step tasks related to infrastructure provisioning, application deployment, and operational activities
- Task coordination: perform tasks in the correct order and ensure information is passed correctly between tasks
- Agents are configured to perform specific pre-defined action groups
- They can integrate with other systems, services, databases and APIs to exchange data or initiate actions
- They can also leverage RAG to retrieve info when necessary
CloudWatch and Bedrock
- Model Invocation Logging
- This will send logs of all invocations to CloudWatch and/or S3
- This can include text, images, and embeddings
- You can analyze the data further and build alerting thanks to CloudWatch Logs Insights
- CloudWatch Metrics
- Can have Bedrock publish metrics to CloudWatch
- Such as ContentFilteredCount, which helps to see if Guardrails are functioning
- Can also build CloudWatch Alarms on top of Metrics to get alerted when a Guardrail is triggered or when Bedrock exceeds a threshold for a specific metric
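As a sketch, an alarm on a Bedrock metric might look like this with boto3 (the alarm name and threshold are placeholders; AWS/Bedrock is the CloudWatch namespace Bedrock publishes to):

```python
import boto3

# Sketch: alarm when Bedrock invocations exceed a threshold over 5 minutes
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="bedrock-invocation-spike",  # placeholder name
    Namespace="AWS/Bedrock",
    MetricName="Invocations",
    Statistic="Sum",
    Period=300,                            # 5-minute windows
    EvaluationPeriods=1,
    Threshold=10000,                       # placeholder threshold
    ComparisonOperator="GreaterThanThreshold",
)
```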
Model Fine-Tuning in Amazon Bedrock
- Adapt a copy of an FM with your own data
- Fine-tuning will change the weights of the base FM
- Use cases
- A chatbot designed with a particular persona or tone, or geared towards a specific purpose
- E.g. assisting customers or crafting advertisements
- Training using more up-to-date info than what the model previously had access to
- Training with exclusive data
- E.g. your historical emails or messages; records from customer service interactions
- Targeted use cases (categorization, assessing accuracy)
- Training data must:
- Adhere to a specific format
- Be stored in S3
- You must use "provisioned throughput" pricing model to use a fine-tuned model
- This is a different pricing model than on-demand
- Things to know about fine-tuning:
- Re-training an FM requires a higher budget
- You must prepare the data, do the fine-tuning, and evaluate the model
- Instruction-based fine-tuning is usually cheaper as computations are less intense and the amount of data required is usually less
- Running a fine-tuned model is also more expensive (provisioned throughput)
- Fine-tuning is a specific kind of transfer learning, so the answer to a question might be transfer learning instead of fine-tuning
Fine-tuning vs further (continued) pre-training
| Aspect | Further (Continued) Pre-training (AWS) | Fine-Tuning (AWS) | Instruction-Based Fine-Tuning (AWS) |
|---|---|---|---|
| Scope of Training | Full model retraining (high cost) | Updates selected model layers (e.g., output layers) | Aligns the model to human-like responses using multi-task datasets |
| Data Source Type | AWS-hosted datasets or customer-provided domain-specific datasets (e.g., AWS Data Exchange) | Task-specific datasets in S3 or SageMaker Data Wrangler | Instruction datasets, multi-task prompts stored in Amazon S3 |
| Outcome | Domain-specialized models for further customization | Task-optimized models (e.g., fraud detection model) | Multi-task models (e.g., chatbots, Q&A systems using Bedrock Agents) |
| Best AWS Use Case | Building a domain-specific foundation model (e.g., a healthcare LLM) | Training a customer support model for a specific use case | Building a general-purpose chatbot that can complete diverse tasks |
| AWS Cost Impact | Highest due to large datasets and long compute time (e.g., GPU costs) | Moderate (depends on the dataset size and compute resources) | Moderate (depends on instruction dataset size and model size) |
Instruction-based fine-tuning
- Improves the performance of a pre-trained FM on domain-specific tasks
- Domain-specific tasks: further trained on a particular field or area of knowledge
- Instruction-based fine-tuning uses labeled examples that are prompt-response pairs and phrased as instructions
- Purpose: tailors the model for specific tasks by training it on labeled instruction-response pairs
- Effect: makes the model follow instructions more effectively and improves alignment with user expectations
- Example: training an LLM on question-answer pairs to improve performance in customer support
Single-Turn Messaging
- Part of instruction-based fine-tuning
- Components:
- system (optional): context for the conversation
- messages: a list of message objects, each containing:
- role: either user or assistant
- content: the text content of the message
- Ex:
{ "system": "You are a helpful assistant.", "messages": [ {"role": "user", "content": "What is AWS"}, {"role": "assistant", "content": "it's Amazon Web Services"} ] }
Multi-Turn Messaging
- Provide instruction-based fine-tuning for a conversation
- Chatbots = multi-turn environment
- You must alternate between "user" and "assistant" roles
- Ex:
{ "system": "You are an AI assistant specializing in AWS services.", "messages": [ { "role": "user", "content": "Tell me about Amazon SageMaker." }, { "role": "assistant", "content": "Amazon SageMaker is a fully managed service for building, training, and deploying machine learning models at scale." }, { "role": "user", "content": "How does it integrate with other AWS services?" }, { "role": "assistant", "content": "SageMaker integrates with AWS services like S3 for data storage, Lambda for event-driven computing, and CloudWatch for monitoring."} ] }
Continued/further pre-training (domain specialization)
- Also called domain-adaptation fine-tuning, to make a model an expert in a specific domain
- Purpose: further trains a general LLM on additional unlabeled text data, typically domain-specific (e.g., legal, medical, or technical text)
- Data Type: uses raw text or a specialized corpus without explicit instructions or labels
- Effect: expands knowledge and domain expertise but does not necessarily improve instruction-following ability
- Example: training a general LLM on medical textbooks to enhance medical terminology comprehension
Evaluating a model in Amazon Bedrock
- Need to evaluate a model for quality control
- Bedrock comes with some built-in task types:
- Text summarization
- Question and answer
- Text classification
- Open-ended text generation
- ... and others
- You can bring your own prompt dataset or use built-in curated prompt datasets
- Scores can be calculated automatically by a judge model
- Or you can do manual human evaluation
Automated Metrics to Evaluate an FM
ROUGE: Recall-Oriented Understudy for Gisting Evaluation
- A metric that you can use to evaluate the quality of text summarization and text generation
- You can use ROUGE to assess the performance of an FM for text generation
- ROUGE-N: measures the number of matching n-grams between a reference text and the generated text
- Checks how many sequences of n words match between them
- ROUGE-L: longest common sub-sequence between reference and generated text
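A toy Python sketch of the ROUGE-N idea (simplified: no stemming or match clipping, which real implementations handle):

```python
# Toy ROUGE-N: fraction of reference n-grams that appear in the generated
# text (recall-oriented)
def ngrams(text: str, n: int) -> list:
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def rouge_n(reference: str, generated: str, n: int = 1) -> float:
    ref, gen = ngrams(reference, n), ngrams(generated, n)
    matches = sum(1 for g in ref if g in gen)
    return matches / len(ref)

reference = "the cat sat on the mat"
generated = "the cat lay on the mat"
print(rouge_n(reference, generated, 1))  # unigram overlap: 5/6
print(rouge_n(reference, generated, 2))  # bigram overlap: 3/5
```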
BLEU: Bilingual Evaluation Understudy
- Slightly more advanced than ROUGE
- Evaluate the quality of generated text, especially for translations
- Considers precision and penalizes excessive brevity (brevity penalty)
- Looks at a combination of n-grams (1, 2, 3, 4)
BERTScore: Bidirectional Encoder Representations from Transformers
- Looks at the semantic similarity (the actual meaning) between the generated and reference texts
- Uses pre-trained BERT models to compare contextualized embeddings of both texts and computes the cosine similarity between them
- It's capable of capturing more nuance between the texts
Perplexity
- Looks at how well the model predicts the next token (lower is better)
Business Metrics To Evaluate a Model On
- Overall, model response quality is the most important
- User Satisfaction: gather users' feedback and assess their satisfaction with the model responses (e.g. user satisfaction for an e-commerce platform)
- Average Revenue Per User (ARPU): average revenue per user attributed to the GenAI app (e.g. monitor e-commerce user base revenue)
- Cross-Domain Performance: measure the model's ability to perform across different domain tasks (e.g. monitor multi-domain e-commerce platform)
- Conversion Rate: generate recommended desired outcomes such as purchases (e.g. optimizing e-commerce platform for higher conversion rate)
- Efficiency: evaluate the model's efficiency in computation, resource utilization, etc. (e.g. improve production line efficiency)
RAG & Knowledge Base in Bedrock
RAG: Retrieval-Augmented Generation
- RAG is the process of improving the quality and consistency of LLMs by referencing an external data source (knowledge base) that is outside of the LLM's training data sources (e.g. in S3)
- Bedrock takes care of creating Vector Embeddings of your data in a database of your choice
- An augmented prompt gets sent to the LLM as a combination of the query and retrieval text together
- Augmented prompt = query (original prompt) + retrieval text (data from knowledge base vector embeddings)
- Use cases:
- Customer service chatbot
- Knowledge base: products, features, specifications, troubleshooting guides, and FAQs
- RAG application: chatbot that can answer customer queries
- Legal research and analysis
- Knowledge base: laws, regulations, case precedents, legal opinions, and expert analysis
- RAG application: chatbot that can provide relevant information for specific legal queries
- Healthcare question answering
- Knowledge base: diseases, treatments, clinical guidelines, research papers, patients, etc.
- RAG application: chatbot that can answer complex medical queries
RAG Vector Embeddings
- Multiple parts that come together: knowledge base --> document chunks --> embeddings model --> vector database
- The knowledge base (e.g. docs in S3) is broken into chunks and sent through an embeddings model (e.g. Titan, Cohere), which vectorizes the data and stores it in a vector database (OpenSearch, Aurora, MongoDB, Redis, Pinecone)
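A minimal boto3 sketch of querying a Bedrock Knowledge Base with RAG in a single call (the knowledge base ID and model ARN are placeholders):

```python
import boto3

# Sketch: one-call RAG against a Bedrock Knowledge Base
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",  # placeholder
        },
    },
)
print(response["output"]["text"])  # answer grounded in the retrieved chunks
```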
RAG Vector Database Types
- The exam may ask to choose which type is best for a specific situation
Preferred choices because of their high performance
- Amazon OpenSearch Service: search & analytics database
- Real-time similarity queries; stores millions of vector embeddings
- Offers scalable index management and fast nearest-neighbor (kNN) search capability
- Amazon DocumentDB (with MongoDB compatibility): NoSQL database
- Real-time similarity queries; stores millions of vector embeddings
Relational DBs
- Amazon Aurora: relational database, proprietary on AWS
- Amazon RDS for PostgreSQL: relational database, open-source
Graph DB
- Amazon Neptune
RAG Data Sources
- Amazon S3
- Confluence
- Microsoft SharePoint
- Salesforce
- Web pages (your website, social media feed, etc.)
Amazon Q
- A fully managed GenAI assistant for your employees and developers
- It's based on your company's knowledge and data
- Can answer questions, provide summaries, generate content, automate tasks, etc.
- Perform routine actions (e.g. submit time-off requests, send meeting invites)
- It's built on Amazon Bedrock, but you can't choose the underlying FM
Q Data Sources
- Data Connectors (fully managed RAG): connects to 40+ popular enterprise data sources
- AWS services: S3, RDS, Aurora, WorkDocs, etc.
- MS 365, Salesforce, Google Drive, Gmail, Slack, SharePoint, etc.
- Plugins: allow you to interact with 3rd party services
- Jira, ServiceNow, Zendesk, Salesforce, etc.
- Ex: can have it create a Jira issue
- Custom plugins: connects to any 3rd party application using APIs
Q & IAM Identity Center
- You can have users be authenticated through IAM Identity Center
- This ensures users receive responses generated only from the documents they have access to
- IAM Identity Center can be configured with external Identity Providers (IdP)
- Some IdPs include: Google Login, Microsoft Active Directory, etc.
Admin Controls
- Control and customize responses to your organizational needs
- Admin controls are essentially like Bedrock Guardrails
- Can block specific words or topics
- Can have Q respond only with internal knowledge (vs. using external knowledge)
- Controls can be global or topic-level (more granular rules)
Q Apps
- Create GenAI-powered apps without coding, by using natural language
- Makes it super easy for anyone in your company to create
- It can leverage your company's internal data
- Also has the option to use plugins (e.g. Jira)
Q Developer
- Chatbot:
- Can answer questions about the AWS documentation and AWS service selection
- Can answer questions about resources in your AWS account
- Can suggest CLI commands to run to make changes to your account
- Can help you do bill analysis, resolve errors, troubleshoot, and more
- Code companion
- Helps you code new applications (similar to GitHub Copilot)
- Can provide real-time code suggestions and security scans
- Also provides a software agent that can implement features, generate documentation, and bootstrap new projects (boilerplate code)
- It integrates with multiple different IDEs
Q for AWS Services
- Q for QuickSight
- Amazon QuickSight is used to visualize your data and create dashboards about them
- Instead you can use Amazon Q to:
- Create executive summaries of your data
- Ask and answer questions about your data
- Then generate and edit visuals for your dashboards
- Q for EC2
- Q can provide guidance and suggestions for EC2 instance types that are best suited for your new workload
- You can provide requirements using natural language to get even more suggestions or ask for advice by providing other workload requirements
- Q for Chatbot
- AWS Chatbot is a way for you to deploy an AWS Chatbot in a Slack or MS Teams channel that knows about your AWS account
- It can troubleshoot issues, receive notifications for alarms, security findings, billing alerts, and create support requests
- You can access Amazon Q directly in AWS Chatbot to accelerate understanding of AWS services, troubleshoot issues, and identify remediation paths
- Q for Glue
- AWS Glue is an ETL (Extract, Transform, & Load) service used to move data across places
- Amazon Q for Glue can help with:
- Chat:
- Answer general questions about Glue
- Provide links to the documentation
- Data integration code generation:
- Answer questions about AWS Glue ETL scripts
- Generate new code
- Troubleshoot:
- Understand errors in AWS Glue jobs
- Provide step-by-step instructions to find the root cause and resolve your issues
Cloud Services
PartyRock
- For the exam, it's not considered a core service
- GenAI app-building playground (powered by Amazon Bedrock)
- Allows you to experiment creating GenAI apps with various FMs (no coding or AWS account required)
- UI is similar to Amazon Q Apps (with less setup and no AWS account required)
CloudWatch vs CloudTrail vs Inspector
- CloudTrail focuses on logging and auditing, especially for API calls
- CloudWatch focuses on monitoring and operational insight
- Inspector focuses on proactive security assessments
- They often complement each other, with CloudTrail providing the event logs that can be ingested by CloudWatch for deeper operational insights
AWS CloudTrail
- Can use CloudTrail to log actions that are taken by a user, role, or service in your account
- Actions are recorded as events in CloudTrail
- It can track user activity and changes that are made to AWS resources
- However, it does not directly assess the security posture of your environment or identify potential security vulnerabilities
- Instead, it provides a history of AWS API calls for auditing, compliance, and troubleshooting purposes
Amazon CloudWatch
- Can use CloudWatch to gather and view metrics that relate to account resources
- You can use CloudWatch to view the number of API calls to Amazon Bedrock
- However, it does not provide a mechanism to examine which user made the API call
- How to monitor Amazon Bedrock by using CloudWatch: link
Amazon Inspector
- Is a vulnerability management service that continuously scans workloads for software vulnerabilities and unintended network exposure
- It assesses the security and compliance of your AWS resources by performing automated security checks based on best practices and common vulnerabilities
- It can assess EC2 instances and Amazon ECR repositories to provide detailed findings and recommendations for remediation
AWS Artifact
- Provides on-demand access to security and compliance documents
- It does not identify security vulnerabilities across EC2 instances and Amazon ECR repositories or provide recommendations for remediation
AWS Audit Manager
- Helps you assess internal risk with pre-built frameworks that translate evidence from cloud services into security IT audit reports
AWS Config
- Provides a detailed view of your AWS resource configurations
- It helps track resource configurations and changes
- It does not assess security vulnerabilities or compliance against specific regulations or standards
- Instead, it focuses on monitoring resource configurations for compliance with desired configurations and best practices
Amazon Macie
- Can be used to discover, classify, and protect sensitive data that is stored in Amazon S3
- It's useful for data security
AWS Trusted Advisor
- Provides information on how to optimize account environments for cost and performance, while maintaining high security standards
Amazon Titan
- High-performance FM from AWS
- Can be customized with your own data
Amazon AI Hardware
- GPU-based EC2 instances: (P3, P4, P5, ..., G3...G6...)
- AWS Trainium
- ML chip built to perform Deep Learning on 100B+ parameter models
- AWS Inferentia
- ML chip built to deliver inference at high performance and low cost
Amazon SageMaker
Summary for AI Practitioner exam
- SageMaker: end-to-end ML service
- SageMaker Automatic Model Tuning (AMT): tune hyperparameters
- SageMaker Deployment & Inference: real-time, serverless, batch, async
- SageMaker Studio: unified interface for SageMaker
- SageMaker Data Wrangler: explore and prepare datasets, create features
- SageMaker Feature Store: store features metadata in a central place
- SageMaker Clarify: compare models, explain model outputs, detect bias
- SageMaker Ground Truth: RLHF, humans for model grading and data labeling
- SageMaker Model Cards: ML model documentation
- SageMaker Model Dashboard: view all your models in one place
- SageMaker Model Monitor: monitoring and alerts for your model
- SageMaker Model Registry: centralized repository to manage ML model versions
- SageMaker Pipelines: CI/CD for Machine Learning
- SageMaker Role Manager: access control
- SageMaker JumpStart: ML model hub & pre-built ML solutions
- SageMaker Canvas: no-code interface for SageMaker
- MLFlow on SageMaker: use MLFlow tracking servers on AWS
- Network Isolation Mode: run SageMaker job containers without any outbound internet access
- SageMaker DeepAR forecasting algorithm: used to forecast time-series data; leverages Recurrent Neural Networks (RNNs)
Overview
- For AI Practitioner exam, only need to know about SageMaker and its capabilities at a high level
- Fully managed service for developers and data scientists to build ML models
- It's an end-to-end ML service used to:
- Collect and prepare data
- Build and train ML models
- Deploy the models and monitor the performance of the predictions
- SageMaker built-in algorithms:
- Supervised Algorithms
- Linear regressions and classifications
- KNN Algorithms (for classification)
- Unsupervised Algorithms
- Principal Component Analysis (PCA): reduce number of features
- K-means: find grouping within data
- Anomaly Detection
- Textual Algorithms: NLP, summarization, etc.
- Image Processing: classification, detection, etc.
- SageMaker Automatic Model Tuning (AMT)
- Used for hyperparameter tuning
- Saves you time and money
SageMaker Model Deployment and Inference
- Deploy with one click, automatic scaling, no servers to manage (as opposed to self-hosted)
- Managed solution: reduced overhead
- Deployment Types
- Serverless
- Best for handling idle periods between traffic spikes
- You need to be able to tolerate more latency (cold starts) though
- Real-time (lowest latency)
- One prediction at a time
- Suitable for use cases with low latency or high throughput requirements
- It offers the lowest latency; requests must complete within 60 seconds
- Asynchronous (medium latency)
- For large payload sizes up to 1GB
- Suitable for use cases with larger datasets and processing times of up to 1 hour
- Near real-time latency requirements
- Batch transform (highest latency)
- Prediction for an entire dataset (multiple predictions)
- Suitable for offline processing when data can be processed in batches
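A minimal boto3 sketch of calling a deployed real-time endpoint (the endpoint name is a placeholder, and the payload format depends entirely on your model):

```python
import json
import boto3

# Sketch: real-time inference against a deployed SageMaker endpoint
runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

response = runtime.invoke_endpoint(
    EndpointName="my-model-endpoint",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps({"instances": [[5.1, 3.5, 1.4, 0.2]]}),  # model-specific
)
print(json.loads(response["Body"].read()))
```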
SageMaker Studio
- A unified interface for end-to-end ML development
- Can do many things:
- Team collaboration
- Tune and debug ML models
- Deploy ML models
- Automated workflows
SageMaker Data Wrangler
- Helps prepare tabular and image data for machine learning
- Does data preparation, transformation, and feature engineering
- Can also use it to visualize your data
SageMaker Feature Store
- Ingests features from a variety of sources
- Gives you the ability to publish directly from SageMaker Data Wrangler into SageMaker Feature Store
- Features are discoverable within SageMaker Studio
SageMaker Clarify
- Helps you evaluate FMs; it's part of SageMaker Studio
- Can evaluate with human-factors
- Can use built-in datasets or bring your own
- Another important feature is model explainability
- This is a set of tools to help explain how ML models make predictions
- It can also detect and explain biases in your datasets and models
SageMaker Ground Truth
- Used for RLHF
- Model review, customization, and evaluation
- Human feedback for ML
- With SageMaker Ground Truth Plus, you can label data
SageMaker ML Governance
- Can use SageMaker Model Cards to create records and to document details about ML models in a single place
- They support transparent and explainable model development by providing comprehensive, immutable documentation of essential model info
- SageMaker Model Dashboard is a central place to view, search, and explore all models in an AWS account
- It provides insights into model deployment, usage, performance tracking, and monitoring
- Can use SageMaker Role Manager to define user permissions for ML activities
- SageMaker Model Monitor monitors the quality of ML models and data in production
- SageMaker Model Registry: a centralized repository that allows you to track, manage, and version ML models
- Can manage the approval status of a model, automate model deployment, share models, etc.
- SageMaker Pipelines: a workflow that automates the process of building, training, and deploying an ML model
- Allows for CI/CD for ML, enabling you to iterate faster, reduce errors (no manual steps), have repeatable mechanisms, etc.
- A pipeline is composed of Steps, and each Step performs a specific task
- Step types:
- Processing: for data processing (e.g. feature engineering)
- Training: for model training (a separate Tuning step handles hyperparameter tuning)
- AutoML: automatically train a model
- Model: create or register a SageMaker model
- ClarifyCheck: perform drift checks against baselines (data bias, model bias, model explainability)
- QualityCheck: perform drift checks against baselines (data quality, model quality)
SageMaker JumpStart
- Provides pre-trained, open-source (foundation) models for you to use
- It simplifies the process of getting started with machine learning, offering a wide range of ready-to-use solutions that can be easily deployed and modified as needed
- Models can be fully customized for your data
- Models are deployed on SageMaker directly (full control of deployment options)
- (?) It offers FMs that you can use for summarization and audit use cases
- Two Options:
- ML Hub:
- Browse
- Experiment
- Customize
- Deploy
- ML Solutions:
- Access & browse
- Select & customize
- Deploy
SageMaker Canvas
- Can build, evaluate, and deploy ML models with a visual interface (no coding required)
- Can also be used in coordination with Bedrock to fine-tune and deploy language models
- Can leverage Data Wrangler for data preparation
- Has ready-to-use models from Rekognition, Comprehend, and Textract
- Makes it easy to build a full ML pipeline without writing code and by leveraging various AWS AI services
MLFLow on SageMaker
- MLFlow: an open-source tool which helps ML teams manage the entire ML lifecycle
- MLFlow Tracking Servers: used to track runs and experiments
Other AI Services
Amazon Rekognition
- Rekognition is a fully managed AI service that uses deep learning to analyze images and videos
- The base service is pre-trained; for custom models (Rekognition Custom Labels), you provide labeled images to train it
- It provides features such as object and scene detection, facial analysis, and text detection
- It does not modify or generate new images
Amazon Personalize
- Personalize is a fully managed ML service that delivers personalized recommendations, search results, and user segments based on interaction data
- You can use it to target a marketing campaign
- For example, it can recommend segments of users who are most likely to respond to a promotion
Amazon Textract
- Can be used to add document text detection and analysis to applications
- Can use it to identify handwritten text, to extract text from documents, and to extract specific information from documents
- It does not provide access to FMs
Amazon Kendra
- An intelligent search service that provides answers to questions based on the data that is provided (e.g. from documents)
- It uses semantic and contextual understanding to provide specific answers
- Can extract answers from within a document (text, pdf, HTML, PowerPoint, MS Word, FAQs, etc.)
- It does not provide access to FMs
Amazon Q Business
- A generative AI virtual assistant that can answer questions, summarize content, generate content, and complete tasks based on the data that is provided
- It does not provide access to FMs and is not open source
Amazon Polly
- A text-to-speech (TTS) service that can convert text into lifelike speech
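A minimal boto3 sketch (the output path is arbitrary; Joanna is one of Polly's standard en-US voices):

```python
import boto3

# Sketch: convert text to an MP3 with Polly
polly = boto3.client("polly", region_name="us-east-1")

response = polly.synthesize_speech(
    Text="Hello from Amazon Polly!",
    OutputFormat="mp3",
    VoiceId="Joanna",  # a standard en-US voice
)
with open("speech.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```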
Amazon Lex
- Can create conversational interfaces for applications
- It uses natural language understanding and automatic speech recognition to create chatbots
Amazon Comprehend
- Uses natural language processing (NLP) to extract insights and relationships from text or documents
- Language of the text
- Extracts key phrases, places, people, brands, or events
- Etc. (more from udemy video 71)
- Fully managed and serverless
- Sample use cases:
- Analyze customer interactions (emails) to find what leads to a positive or negative experience
- It's useful on its own, but you have the option of Custom Classification
- Organize documents into categories (classes) that you define
- Named Entity Recognition (NER)
- One of the main benefits of Comprehend
- It extracts predefined, general-purpose entities such as people, places, organizations, dates, and other standard categories, from text
- Custom Entity Recognition
- More in udemy video 77
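A minimal boto3 sketch of sentiment and entity detection with Comprehend (the example text is made up):

```python
import boto3

# Sketch: sentiment and entity detection on a short text
comprehend = boto3.client("comprehend", region_name="us-east-1")

text = "I emailed AWS support from Seattle on Monday and got a great response."

sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"])  # e.g. POSITIVE

entities = comprehend.detect_entities(Text=text, LanguageCode="en")
for e in entities["Entities"]:
    print(e["Type"], "->", e["Text"])  # e.g. ORGANIZATION -> AWS
```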
Amazon Translate
- Provides translation between multiple languages
- Cannot be used to improve transcription for domain-specific speech
Amazon Transcribe
- Can be used to convert speech into text
- You can use batch language identification to automatically identify the language of audio files
- If media contains domain-specific or non-standard terms, you can use a custom vocabulary or a custom model to improve the accuracy of the transcriptions
- Amazon Transcribe Medical
- A HIPAA-compliant model tailored for healthcare
Amazon Forecast
- Fully managed service that can deliver highly accurate forecasts
- Ex: predict the future sales of a raincoat
- 50% more accurate than looking at the data itself
- Reduce forecasting time from months to hours
- Use cases: product demand planning, financial planning, resource planning
Amazon Mechanical Turk
- Crowdsourcing marketplace to perform simple human tasks
- Distributed virtual workforce
- Use cases: image classification, data collection, business processing
- Integrates with Amazon A2I, SageMaker Ground Truth, etc.
Amazon Augmented AI (A2I)
- Gives human oversight of ML predictions in production
- The ML model can be built on AWS or elsewhere (SageMaker, Rekognition)