AWS AI
Quick comparison
- Transcribe: speech-to-text; Polly: text-to-speech (TTS); Lex: chatbot with voice (speech) & text, recognizes intentions
- Lex: interpret speech; Comprehend: interpret text
- Kendra: search documents; Comprehend: interpret documents
- Bedrock: GenAI FMs; SageMaker JumpStart: open-source and broader models, you manage more of the data/tuning
Services overview
Machine Learning and AI Services
- Amazon SageMaker: Build, train, and deploy machine learning models
- Amazon Bedrock: Build generative AI apps using foundation models
- Amazon Personalize: Real-time personalized recommendations using ML
- Amazon Forecast: Time-series forecasting using machine learning
- Amazon Q Business: AI assistant for business insights and productivity
- Amazon Q Developer: AI-powered assistant for coding and development tasks
Natural Language Processing (NLP) Services
- Amazon Comprehend: Analyze text for insights, sentiment, and entities
- Amazon Translate: Real-time and batch text translation
- Amazon Textract: Extract text and data from scanned documents
Speech and Conversational AI Services
- Amazon Lex: Build chatbots and voice assistants
- Amazon Polly: Convert text to lifelike speech (text-to-speech)
- Amazon Transcribe: Convert speech to text (speech-to-text)
Computer Vision Services
- Amazon Rekognition: Analyze images and videos for objects, people, and activities
Analytics and Data Processing Services
- AWS Glue: Fully managed ETL service for data preparation
- Amazon Kinesis: Real-time streaming data processing
- Amazon QuickSight: Business intelligence (BI) service for dashboards and reports
Security and Compliance Services
- AWS Identity and Access Management (IAM): Secure access and permissions management
- Amazon CloudWatch: Monitor AWS services and applications
- AWS Key Management Service (KMS): Manage encryption keys securely
Overview
Why AWS AI Managed Services
- The primary advantage of using AWS GenAI services to build applications is that they allow you to quickly prototype, deploy, and scale high-performance AI apps
- AWS AI Services are pre-trained ML services for your use cases
- Main benefits
- Responsiveness and availability
- Redundancy and regional coverage: deployed across multiple AZs and AWS regions
- Performance: specialized CPUs and GPUs for specific use cases, which provides cost savings
- Token-based pricing: only pay for what you use
- Provisioned throughput is also offered for predictable workloads and gives cost savings and predictable performance
AWS Cloud
- Shared responsibility model
- AWS is responsible for security "of" the cloud, including infrastructure, hardware, and software
- The customer is responsible for security "in" the cloud, including data, applications, and access management
AWS Global Infrastructure
- Each AWS Region consists of a minimum of three Availability Zones (AZs)
- Each Availability Zone (AZ) consists of one or more discrete data centers
ML terms you may encounter on the exam
GPT, BERT, and GAN are the most common on the exam. Also remember which is best for certain tasks: ResNet (images), WaveNet (audio), GAN (data augmentation), GPT/BERT (language), so that on the exam you can tell which is correct by process of elimination
- GPT (Generative Pre-Trained Transformer): generate human text or computer code based on input prompts
- BERT (Bidirectional Encoder Representations from Transformers): similar intent to GPT, but reads the text in two directions, which makes it great for translation purposes
- GAN (Generative Adversarial Network): models used to generate synthetic data such as images, videos, or sounds that resemble the training data
- Helpful for data augmentation when you have underrepresented training data and want to generate more synthetic samples
- RNN (Recurrent Neural Network): meant for sequential data such as time-series or text; useful in speech recognition, time-series predictions
- ResNet (Residual Network): Deep Convolutional Neural Network (CNN) used for image recognition tasks, object detection, and facial recognition
- SVM (Support Vector Machine): ML algorithm for classification and regression
- WaveNet: model used to generate raw audio waveform, used in Speech Synthesis
- XGBoost (Extreme Gradient Boosting): an implementation of gradient boosting, used for regression and classification tasks
Other definitions
- Foundation Model (FM): a large, general-purpose model that is pre-trained on diverse datasets and can be fine-tuned for downstream tasks
- They use unlabeled datasets for self-supervised learning
- Fine-tuning: the process to further train and refine a pre-trained LLM on a smaller, targeted dataset
- Reinforcement learning: a technique to train an ML model to achieve a goal and maximize cumulative reward
- It uses a trial-and-error process and a reward-based system
- You cannot use reinforcement learning to assess the performance of an FM for text generation
- Transfer Learning: the broader concept of re-using a pre-trained model to adapt to a new related task
- Widely used for image classification and NLP (models like BERT and GPT)
- It can appear on the exam as a general ML concept
- Fine-tuning is a specific kind of transfer learning
- F1 score: used to evaluate a model's accuracy for binary classification
- F1 scores use precision and recall to evaluate how accurately a model classifies the correct class
- You cannot use the F1 score to assess the performance of an FM for text generation
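As a quick worked example (a minimal Python sketch with made-up precision/recall values), F1 is the harmonic mean of precision and recall:

```python
# F1 is the harmonic mean of precision and recall
def f1_score(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

# Hypothetical values: high precision but low recall drags F1 down
print(f1_score(0.90, 0.50))  # ~0.643
```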
AI model issues
- Not enough training time: leads to low accuracy on both the training data and testing data
- Underfitting: when the model does not identify the relationships in the training data
- It leads to low accuracy on both the training and testing data
- Overfitting: when the model learns from the training data but is unable to perform well when given new data
Tradeoffs of customizing an FM
- Higher cost, higher implementation complexity
GenAI Concepts
Tokenization: converting raw text into a sequence of tokens
- Word-based tokenization: text is split into individual words
- Subword tokenization: some words can be split too (helpful for long words)
- Can experiment at https://platform.openai.com/tokenizer
Context Window: the number of tokens an LLM can consider when generating text
- The larger the context window, the more information and coherence
- Large context windows require more memory and processing power
- It should be the first factor to look at when considering a model
Embeddings: create vectors (arrays of numerical values) out of text, images, or audio
- Vectors have a high dimensionality to capture many features for one input token, such as semantic meaning, syntactic role, sentiment
- Embedding models are great for powering search applications
- Words (tokens) that have a semantic relationship have similar embeddings
- Two ways to visualize embeddings:
- Dimensionality reduction of word embeddings to 2D and graph them
- Color visualization of vectors (tokens that have similar colors are semantically similar)
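To make the "similar embeddings" idea concrete, here is a minimal Python sketch (made-up toy vectors; real embeddings have hundreds or thousands of dimensions) of cosine similarity, the usual way to compare embedding vectors:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors: semantically related words should score closer to 1
king  = [0.8, 0.65, 0.1, 0.2]
queen = [0.75, 0.7, 0.15, 0.2]
apple = [0.1, 0.2, 0.9, 0.7]

print(cosine_similarity(king, queen))  # high (similar meaning)
print(cosine_similarity(king, apple))  # low (unrelated)
```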
ML development pipeline
Data collection
- A step to label, ingest, and aggregate data that you will use for ML model training
- During data collection, you ingest and aggregate data from multiple sources
- Then, you label the data
Model training
Model evaluation
- Can use model evaluation to evaluate a model's performance and metrics
AI model performance techniques
Hyperparameter tuning
- A method to adjust the behavior of an ML algorithm
Feature engineering
- A method to select and transform variables when you create a predictive model
- It includes feature creation, feature transformation, feature extraction, and feature selection
- It enhances the data (e.g. by creating new variables in the training dataset) to ultimately improve model performance
Prompt Engineering
- Prompt Engineering is the process of developing, designing, and optimizing prompts to enhance the output of FMs for your needs
- A good prompt consists of
- Instructions: a task for the model to do (description, how the model should perform)
- Context: external information to guide the model
- Input data: the input for which you want a response
- Output indicator: the output type or format
- Negative Prompting
- A technique where you explicitly instruct the model on what not to include or do in the response
- It helps to
- Avoid unwanted content: reduces the chances of irrelevant or inappropriate content
- Maintain focus: helps the model stay on topic and not stray into areas that are not useful or desired
- Enhance clarity: prevents the use of complex terminology or detailed data, making the output clearer and more accessible
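Putting these pieces together, a hypothetical prompt containing all four components plus a negative instruction might look like this (illustrative text only):

```python
prompt = """Instructions: Summarize the customer review below in one sentence.
Do NOT include any personal names or speculate about information not in the review.

Context: The review is for a wireless keyboard sold on our e-commerce site.

Input data: "The keys feel great and battery life is solid, but the Bluetooth
pairing drops every few hours, which is frustrating."

Output indicator: Respond with a single plain-text sentence."""
```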
Prompt Performance Optimization
- System prompts: how the model should behave and reply
- Temperature (0-1): creativity of the model's output
- Low (ex: 0.2): outputs are more conservative, repetitive, and focused on the most likely response
- High (ex: 1.0): outputs are more diverse, creative, and unpredictable, but may be less coherent
- Top P (0-1):
- Low P (ex: 0.25): only consider the 25% most likely words; will make a more coherent response
- High P (ex: 0.99): consider a broad range of possible words; possibly more creative and diverse output
- Top K: limits the number of probable words
- Low K (ex: 10): smaller number of probable words --> more coherent response
- High K (ex: 500): larger number of probable words --> more diverse and creative response
- Length: maximum length of the response
- Stop Sequences: tokens that signal the model to stop generating output
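A minimal sketch of setting these parameters with the Bedrock Converse API via boto3 (the model ID and region are placeholders; parameter support varies by provider, and Top K is typically passed through additionalModelRequestFields rather than the common config):

```python
import boto3

# Sketch: setting inference parameters on a Bedrock Converse call
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder model ID
    messages=[{"role": "user", "content": [{"text": "Write a haiku about clouds."}]}],
    inferenceConfig={
        "temperature": 0.9,       # higher -> more creative/diverse output
        "topP": 0.9,              # nucleus sampling cutoff
        "maxTokens": 200,         # response length limit
        "stopSequences": ["END"], # stop generating when this sequence appears
    },
    # Top K is model-specific; some providers accept it here:
    additionalModelRequestFields={"top_k": 250},
)
print(response["output"]["message"]["content"][0]["text"])
```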
Prompt Latency
- Latency is how fast the model responds
- It's impacted by a few parameters:
- The model size
- The model type (Llama has a different performance than Claude)
- The number of tokens in the input (the bigger the slower)
- The number of tokens in the output (the bigger the slower)
- Latency is not impacted by Top P, Top K, or Temperature
Prompt Engineering Techniques
- Zero-Shot Prompting: present a task to the model without providing examples or explicit training for that task
- You fully rely on the model's general knowledge
- The larger and more capable the FM, the more likely you'll get good results
- Few-Shot Prompting: present examples of a task to the model to guide its output
- You provide a "few shots" to the model to perform the task
- If you provide one example only, this is called "one-shot" or "single-shot"
- Chain of Thought (CoT) Prompting: divide the task into a sequence of reasoning steps
- This leads to more structure and coherence
- Using a sentence like "think step by step" helps
- Helpful when solving a problem as a human usually requires several steps
- Can be combined with few-shot prompting
- RAG: not considered prompt engineering, but often compared to it on the exam
- Combine the model's capability with external data sources to generate a more informed and contextually rich response
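For contrast with zero-shot, here is a hypothetical prompt combining few-shot examples with a chain-of-thought cue (illustrative text only):

```python
few_shot_cot_prompt = """Classify the sentiment of the final review. Think step by step.

Review: "Arrived broken and support never replied." -> negative
Review: "Does exactly what it says, great value."   -> positive

Review: "Setup took a while, but it works flawlessly now." ->"""
```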
Prompt Templates
- Simplify and standardize the process of generating prompts
- Benefits:
- Processes user input text and output prompts from FMs
- Orchestrates between the FM, action groups, and knowledge bases
- Formats and returns responses to the user
- You can also provide examples with few-shot prompting to improve the model performance
- Prompt templates can be used with Bedrock Agents
- Prompt template injections:
- AKA "Ignoring the prompt template" attack
- Users could try to enter malicious inputs to hijack our prompt and provide information on a prohibited or harmful topic
- Protecting against prompt injections:
- Add explicit instructions (Guardrails) to ignore any unrelated or potential malicious content
Amazon Bedrock
- Used to build GenAI applications on AWS
- It's a fully managed service that provides a unified API to access popular FMs
- You get to keep control of your data used to train the model
- It supports image generation models from providers such as Stability AI or AWS
- You can use Amazon Bedrock to consume FMs through a unified API without the need to train, host, or manage ML models
- This is the most suitable solution for a company that does not want to train or manage ML models for image generation
- Out-of-the box features: RAG, LLM Agents, etc.
- Bedrock Studio: gives nice UI access to Bedrock to your team so they can easily create AI-powered applications
- Watermark detection: checks if an image was generated by Amazon Titan Generator
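A minimal boto3 sketch of the unified-API idea: the bedrock control-plane client lists available FMs, while bedrock-runtime is used to invoke them (the region is an assumption):

```python
import boto3

# Sketch: list the foundation models available to your account in one region
bedrock = boto3.client("bedrock", region_name="us-east-1")

for model in bedrock.list_foundation_models()["modelSummaries"]:
    print(model["providerName"], "-", model["modelId"])
```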
Pricing
- On-Demand: great for unpredictable workloads, no long-term commitment
- Pay-as-you-go (no commitment); works with Base Models only
- Text models: charged for every input/output token processed
- Embedding models: charged for every input token processed
- Image models: charged for every image generated
- Batch: can provide discounts of up to 50%, but takes longer
- Do multiple predictions at a time (output is a single file in S3)
- Provisioned Throughput: (usually) no cost savings, but maintains capacity and performance
- Purchase model units for a certain time (1 month, 6 months, etc.)
- Throughput: max. number of input/output tokens processed per minute
- Works with Base, Fine-tuned, and Custom Models
Cost savings
- Model pricing type: choose the most cost-effective type (see above) that meets your performance requirements
- Temperature, Top K, Top P: no impact on pricing
- Model size: usually a smaller model will be cheaper (varies based on providers)
- Number of input/output tokens: this is the main driver of cost
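A back-of-the-envelope sketch of token-based cost (the per-token prices below are hypothetical placeholders; check the Bedrock pricing page for real numbers):

```python
# Rough on-demand cost estimate for a text model
# Prices are HYPOTHETICAL placeholders, not real Bedrock pricing
PRICE_PER_1K_INPUT_TOKENS = 0.003   # $ per 1,000 input tokens (made up)
PRICE_PER_1K_OUTPUT_TOKENS = 0.015  # $ per 1,000 output tokens (made up)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS

# 1M requests averaging 500 input / 200 output tokens each -> $4,500.00
print(f"${estimate_cost(500, 200) * 1_000_000:,.2f}")
```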
Model improvement techniques (cheapest to most expensive)
- Prompt engineering: no model training needed (no additional training or fine-tuning)
- RAG: uses external knowledge (FM doesn't need to know everything; less complex)
- No FM changes (no additional computation or fine-tuning)
- Instruction-based Fine-tuning: FM is fine-tuned with specific instructions (requires additional computation)
- Domain Adaptation Fine-tuning: model is trained on a domain-specific dataset (requires intensive computation)
Guardrails for Amazon Bedrock
- Helps you control the interaction between users and FMs
- Can filter undesirable and harmful content
- Can remove PII
- Can reduce hallucinations
- Can be used to ensure that the content aligns with safety and compliance policies
- You have the ability to create multiple Guardrails and monitor and analyze user inputs that can violate the Guardrails
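A minimal sketch of attaching an existing Guardrail to a Converse call with boto3 (the guardrail ID/version and model ID are placeholders):

```python
import boto3

# Sketch: apply an existing Guardrail to a Bedrock Converse call
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # placeholder
    messages=[{"role": "user", "content": [{"text": "Tell me about our products."}]}],
    guardrailConfig={
        "guardrailIdentifier": "gr-example123",  # placeholder guardrail ID
        "guardrailVersion": "1",
    },
)
# stopReason is "guardrail_intervened" when the Guardrail blocks content
print(response["stopReason"])
```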
Agents in Bedrock
- Agents can manage and carry out various multi-step tasks related to infrastructure provisioning, application deployment, and operational activities
- Task coordination: perform tasks in the correct order and ensure information is passed correctly between tasks
- Agents are configured to perform specific pre-defined action groups
- They can integrate with other systems, services, databases and APIs to exchange data or initiate actions
- They can also leverage RAG to retrieve info when necessary
CloudWatch and Bedrock
- Model Invocation Logging
- This will send logs of all invocations to CloudWatch and/or S3
- This can include text, images, and embeddings
- You can analyze the data further and build alerting thanks to CloudWatch Logs Insights
- CloudWatch Metrics
- Can have Bedrock publish metrics to CloudWatch
- Such as ContentFilteredCount, which helps to see if Guardrails are functioning
- Can also build CloudWatch Alarms on top of Metrics to get alerted when a Guardrail is triggered or when Bedrock exceeds a threshold for a specific metric
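As a sketch, an alarm on a Bedrock metric might look like this with boto3 (the alarm name and threshold are placeholders; AWS/Bedrock is the CloudWatch namespace Bedrock publishes to):

```python
import boto3

# Sketch: alarm when Bedrock invocations exceed a threshold over 5 minutes
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="bedrock-invocation-spike",  # placeholder name
    Namespace="AWS/Bedrock",
    MetricName="Invocations",
    Statistic="Sum",
    Period=300,                            # 5-minute windows
    EvaluationPeriods=1,
    Threshold=10000,                       # placeholder threshold
    ComparisonOperator="GreaterThanThreshold",
)
```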
Model Fine-Tuning in Amazon Bedrock
- Adapt a copy of an FM with your own data
- Fine-tuning will change the weights of the base FM
- Use cases
- A chatbot designed with a particular persona or tone, or geared towards a specific purpose
- E.g. assisting customers or crafting advertisements
- Training using more up-to-date info than what the model previously had access to
- Training with exclusive data
- E.g. your historical emails or messages; records from customer service interactions
- Targeted use cases (categorization, assessing accuracy)
- Training data must:
- Adhere to a specific format
- Be stored in S3
- You must use "provisioned throughput" pricing model to use a fine-tuned model
- This is a different pricing model than on-demand
- Things to know about fine-tuning:
- Re-training an FM requires a higher budget
- You must prepare the data, do the fine-tuning, and evaluate the model
- Instruction-based fine-tuning is usually cheaper as computations are less intense and the amount of data required is usually less
- Running a fine-tuned model is also more expensive (provisioned throughput)
- Fine-tuning is a specific kind of transfer learning, so the answer to a question might be transfer learning instead of fine-tuning
Fine-tuning vs further (continued) pre-training
| Aspect | Further (Continued) Pre-training (AWS) | Fine-Tuning (AWS) | Instruction-Based Fine-Tuning (AWS) |
|---|---|---|---|
| Scope of Training | Full model retraining (high cost) | Updates selected model layers (e.g., output layers) | Aligns the model to human-like responses using multi-task datasets |
| Data Source Type | AWS-hosted datasets or customer-provided domain-specific datasets (e.g., AWS Data Exchange) | Task-specific datasets in S3 or SageMaker Data Wrangler | Instruction datasets, multi-task prompts stored in Amazon S3 |
| Outcome | Domain-specialized models for further customization | Task-optimized models (e.g., fraud detection model) | Multi-task models (e.g., chatbots, Q&A systems using Bedrock Agents) |
| Best AWS Use Case | Building a domain-specific foundation model (e.g., a healthcare LLM) | Training a customer support model for a specific use case | Building a general-purpose chatbot that can complete diverse tasks |
| AWS Cost Impact | Highest due to large datasets and long compute time (e.g., GPU costs) | Moderate (depends on the dataset size and compute resources) | Moderate (depends on instruction dataset size and model size) |
Instruction-based fine-tuning
- Improves the performance of a pre-trained FM on domain-specific tasks
- Domain-specific tasks: further trained on a particular field or area of knowledge
- Instruction-based fine-tuning uses labeled examples that are prompt-response pairs and phrased as instructions
- Purpose: tailors the model for specific tasks by training it on labeled instruction-response pairs
- Effect: makes the model follow instructions more effectively and improves alignment with user expectations
- Example: training an LLM on question-answer pairs to improve performance in customer support
Single-Turn Messaging
- Part of instruction-based fine-tuning
- Components:
- system (optional): context for the conversation
- messages: a list of message objects, each containing:
- role: either user or assistant
- content: the text content of the message
- Ex:
{ "system": "You are a helpful assistant.", "messages": [ {"role": "user", "content": "What is AWS"}, {"role": "assistant", "content": "it's Amazon Web Services"} ] }
Multi-Turn Messaging
- Provide instruction-based fine-tuning for a conversation
- Chatbots = multi-turn environment
- You must alternate between "user" and "assistant" roles
- Ex:
{ "system": "You are an AI assistant specializing in AWS services.", "messages": [ { "role": "user", "content": "Tell me about Amazon SageMaker." }, { "role": "assistant", "content": "Amazon SageMaker is a fully managed service for building, training, and deploying machine learning models at scale." }, { "role": "user", "content": "How does it integrate with other AWS services?" }, { "role": "assistant", "content": "SageMaker integrates with AWS services like S3 for data storage, Lambda for event-driven computing, and CloudWatch for monitoring."} ] }
Continued/further pre-training (domain specialization)
- Also called domain-adaptation fine-tuning, to make a model an expert in a specific domain
- Purpose: further trains a general LLM on additional unlabeled text data, typically domain-specific (e.g., legal, medical, or technical text)
- Data Type: uses raw text or a specialized corpus without explicit instructions or labels
- Effect: expands knowledge and domain expertise but does not necessarily improve instruction-following ability
- Example: training a general LLM on medical textbooks to enhance medical terminology comprehension
Evaluating a model in Amazon Bedrock
- Need to evaluate a model for quality control
- Bedrock comes with some built-in task types:
- Text summarization
- Question and answer
- Text classification
- Open-ended text generation
- ... and others
- You can bring your own prompt dataset or use built-in curated prompt datasets
- Scores can be calculated automatically by a judge model
- Or you can do manual human evaluation
Automated Metrics to Evaluate an FM
ROUGE: Recall-Oriented Understudy for Gisting Evaluation
- A metric that you can use to evaluate the quality of text summarization and text generation
- You can use ROUGE to assess the performance of an FM for text generation
- ROUGE-N: measures the number of matching n-grams between a reference text and the generated text
- Checks how many sequences of n words match between them
- ROUGE-L: longest common sub-sequence between reference and generated text
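A toy Python sketch of the ROUGE-N idea (simplified: no stemming or match clipping, which real implementations handle):

```python
# Toy ROUGE-N: fraction of reference n-grams that appear in the generated
# text (recall-oriented)
def ngrams(text: str, n: int) -> list:
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def rouge_n(reference: str, generated: str, n: int = 1) -> float:
    ref, gen = ngrams(reference, n), ngrams(generated, n)
    matches = sum(1 for g in ref if g in gen)
    return matches / len(ref)

reference = "the cat sat on the mat"
generated = "the cat lay on the mat"
print(rouge_n(reference, generated, 1))  # unigram overlap: 5/6
print(rouge_n(reference, generated, 2))  # bigram overlap: 3/5
```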
BLEU: Bilingual Evaluation Understudy
- Slightly more advanced than ROUGE
- Evaluate the quality of generated text, especially for translations
- Considers precision and penalizes excessive brevity (brevity penalty)
- Looks at a combination of n-grams (1, 2, 3, 4)
BERTScore: Bidirectional Encoder Representations from Transformers
- Looks at the semantic similarity (the actual meaning) between the generated and reference texts
- Uses pre-trained BERT models to compare contextualized embeddings of both texts and computes the cosine similarity between them
- It's capable of capturing more nuance between the texts
Perplexity
- Looks at how well the model predicts the next token (lower is better)
Business Metrics To Evaluate a Model On
- Overall, model response quality is the most important
- User Satisfaction: gather users' feedback and assess their satisfaction with the model responses (e.g. user satisfaction for an e-commerce platform)
- Average Revenue Per User (ARPU): average revenue per user attributed to the GenAI app (e.g. monitor e-commerce user base revenue)
- Cross-Domain Performance: measure the model's ability to perform across different domain tasks (e.g. monitor multi-domain e-commerce platform)
- Conversion Rate: generate recommended desired outcomes such as purchases (e.g. optimizing e-commerce platform for higher conversion rate)
- Efficiency: evaluate the model's efficiency in computation, resource utilization, etc. (e.g. improve production line efficiency)
RAG & Knowledge Base in Bedrock
RAG: Retrieval-Augmented Generation
- RAG is the process of improving the quality and consistency of LLMs by referencing an external data source (knowledge base) that is outside of the LLM's training data sources (e.g. in S3)
- Bedrock takes care of creating Vector Embeddings of your data in a database of your choice
- An augmented prompt gets sent to the LLM as a combination of the query and retrieval text together
- Augmented prompt = query (original prompt) + retrieval text (data from knowledge base vector embeddings)
- Use cases:
- Customer service chatbot
- Knowledge base: products, features, specifications, troubleshooting guides, and FAQs
- RAG application: chatbot that can answer customer queries
- Legal research and analysis
- Knowledge base: laws, regulations, case precedents, legal opinions, and expert analysis
- RAG application: chatbot that can provide relevant information for specific legal queries
- Healthcare question answering
- Knowledge base: diseases, treatments, clinical guidelines, research papers, patients, etc.
- RAG application: chatbot that can answer complex medical queries
RAG Vector Embeddings
- Multiple parts that come together: knowledge base --> document chunks --> embeddings model --> vector database
- The knowledge base (e.g. docs in S3) is broken into chunks and sent through an embeddings model (e.g. Titan, Cohere), which vectorizes the data and stores it in a vector database (OpenSearch, Aurora, MongoDB, Redis, Pinecone)
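A minimal boto3 sketch of querying a Bedrock Knowledge Base with RAG in a single call (the knowledge base ID and model ARN are placeholders):

```python
import boto3

# Sketch: one-call RAG against a Bedrock Knowledge Base
client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = client.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",  # placeholder
        },
    },
)
print(response["output"]["text"])  # answer grounded in the retrieved chunks
```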
RAG Vector Database Types
- The exam may ask to choose which type is best for a specific situation
Preferred choices because of their high performance
- Amazon OpenSearch Service: search & analytics database
- Real-time similarity queries; stores millions of vector embeddings
- Offers scalable index management and fast nearest-neighbor (kNN) search capability
- Amazon DocumentDB (with MongoDB compatibility): NoSQL database
- Real-time similarity queries; stores millions of vector embeddings
Relational DBs
- Amazon Aurora: relational database, proprietary on AWS
- Amazon RDS for PostgreSQL: relational database, open-source
Graph DB
- Amazon Neptune
RAG Data Sources
- Amazon S3
- Confluence
- Microsoft SharePoint
- Salesforce
- Web pages (your website, social media feed, etc.)
Amazon Q
- A fully managed GenAI assistant for your employees and developers
- It's based on your company's knowledge and data
- Can answer questions, provide summaries, generate content, automate tasks, etc.
- Perform routine actions (e.g. submit time-off requests, send meeting invites)
- It's built on Amazon Bedrock, but you can't choose the underlying FM
Q Data Sources
- Data Connectors (fully managed RAG): connects to 40+ popular enterprise data sources
- AWS services: S3, RDS, Aurora, WorkDocs, etc.
- MS 365, Salesforce, Google Drive, Gmail, Slack, SharePoint, etc.
- Plugins: allow you to interact with 3rd party services
- Jira, ServiceNow, Zendesk, Salesforce, etc.
- Ex: can have it create a Jira issue
- Custom plugins: connects to any 3rd party application using APIs
Q & IAM Identity Center
- You can have users be authenticated through IAM Identity Center
- This ensures users receive responses generated only from the documents they have access to
- IAM Identity Center can be configured with external Identity Providers (IdP)
- Some IdPs include: Google Login, Microsoft Active Directory, etc.
Admin Controls
- Control and customize responses to your organizational needs
- Admin controls are essentially like Bedrock Guardrails
- Can block specific words or topics
- Can have Q respond only with internal knowledge (vs. using external knowledge)
- Controls can be global or topic-level (more granular rules)
Q Apps
- Create GenAI-powered apps without coding, by using natural language
- Makes it super easy for anyone in your company to create
- It can leverage your company's internal data
- Also has the option to use plugins (e.g. Jira)
Q Developer
- Chatbot:
- Can answer questions about the AWS documentation and AWS service selection
- Can answer questions about resources in your AWS account
- Can suggest CLI commands to run to make changes to your account
- Can help you do bill analysis, resolve errors, troubleshoot, and more
- Code companion
- Helps you code new applications (similar to GitHub Copilot)
- Can provide real-time code suggestions and security scans
- Also provides a software agent that can implement features, generate documentation, and bootstrap new projects (boilerplate code)
- It integrates with multiple different IDEs
Q for AWS Services
- Q for QuickSight
- Amazon QuickSight is used to visualize your data and create dashboards about them
- Instead you can use Amazon Q to:
- Create executive summaries of your data
- Ask and answer questions about your data
- Then generate and edit visuals for your dashboards
- Q for EC2
- Q can provide guidance and suggestions for EC2 instance types that are best suited for your new workload
- You can provide requirements using natural language to get even more suggestions or ask for advice by providing other workload requirements
- Q for Chatbot
- AWS Chatbot is a way for you to deploy an AWS Chatbot in a Slack or MS Teams channel that knows about your AWS account
- It can troubleshoot issues, receive notifications for alarms, security findings, billing alerts, and create support requests
- You can access Amazon Q directly in AWS Chatbot to accelerate understanding of AWS services, troubleshoot issues, and identify remediation paths
- Q for Glue
- AWS Glue is an ETL (Extract, Transform, & Load) service used to move data across places
- Amazon Q for Glue can help with:
- Chat:
- Answer general questions about Glue
- Provide links to the documentation
- Data integration code generation:
- Answer questions about AWS Glue ETL scripts
- Generate new code
- Troubleshoot:
- Understand errors in AWS Glue jobs
- Provide step-by-step instructions to find the root cause and resolve your issues
Cloud Services
PartyRock
- For the exam, it's not considered a core service
- GenAI app-building playground (powered by Amazon Bedrock)
- Allows you to experiment creating GenAI apps with various FMs (no coding or AWS account required)
- UI is similar to Amazon Q Apps (with less setup and no AWS account required)
CloudWatch vs CloudTrail vs Inspector
- CloudTrail focuses on logging and auditing, especially for API calls
- CloudWatch focuses on monitoring and operational insight
- Inspector focuses on proactive security assessments
- They often complement each other, with CloudTrail providing the event logs that can be ingested by CloudWatch for deeper operational insights
AWS CloudTrail
- Can use CloudTrail to log actions that are taken by a user, role, or service in your account
- Actions are recorded as events in CloudTrail
- It can track user activity and changes that are made to AWS resources
- However, it does not directly assess the security posture of your environment or identify potential security vulnerabilities
- Instead, it provides a history of AWS API calls for auditing, compliance, and troubleshooting purposes
Amazon CloudWatch
- Can use CloudWatch to gather and view metrics that relate to account resources
- You can use CloudWatch to view the number of API calls to Amazon Bedrock
- However, it does not provide a mechanism to examine which user made the API call
- How to monitor Amazon Bedrock by using CloudWatch: link
Amazon Inspector
- Is a vulnerability management service that continuously scans workloads for software vulnerabilities and unintended network exposure
- It assesses the security and compliance of your AWS resources by performing automated security checks based on best practices and common vulnerabilities
- It can assess EC2 instances and Amazon ECR repositories to provide detailed findings and recommendations for remediation
AWS Artifact
- Provides on-demand access to security and compliance documents
- It does not identify security vulnerabilities across EC2 instances and Amazon ECR repositories or provide recommendations for remediation
AWS Audit Manager
- Helps you assess internal risk with pre-built frameworks that translate evidence from cloud services into security IT audit reports
AWS Config
- Provides a detailed view of your AWS resource configurations
- It helps track resource configurations and changes
- It does not assess security vulnerabilities or compliance against specific regulations or standards
- Instead, it focuses on monitoring resource configurations for compliance with desired configurations and best practices
Amazon Macie
- Can be used to discover, classify, and protect sensitive data that is stored in Amazon S3
- It's useful for data security
AWS Trusted Advisor
- Provides information on how to optimize account environments for cost and performance, while maintaining high security standards
Amazon Titan
- High-performance FM from AWS
- Can be customized with your own data
Amazon AI Hardware
- GPU-based EC2 instances: (P3, P4, P5, ..., G3...G6...)
- AWS Trainium
- ML chip built to perform Deep Learning on 100B+ parameter models
- AWS Inferentia
- ML chip built to deliver inference at high performance and low cost
Amazon SageMaker
Summary for AI Practitioner exam
- SageMaker: end-to-end ML service
- SageMaker Automatic Model Tuning (AMT): tune hyperparameters
- SageMaker Deployment & Inference: real-time, serverless, batch, async
- SageMaker Studio: unified interface for SageMaker
- SageMaker Data Wrangler: explore and prepare datasets, create features
- SageMaker Feature Store: store features metadata in a central place
- SageMaker Clarify: compare models, explain model outputs, detect bias
- SageMaker Ground Truth: RLHF, humans for model grading and data labeling
- SageMaker Model Cards: ML model documentation
- SageMaker Model Dashboard: view all your models in one place
- SageMaker Model Monitor: monitoring and alerts for your model
- SageMaker Model Registry: centralized repository to manage ML model versions
- SageMaker Pipelines: CI/CD for Machine Learning
- SageMaker Role Manager: access control
- SageMaker JumpStart: ML model hub & pre-built ML solutions
- SageMaker Canvas: no-code interface for SageMaker
- MLFlow on SageMaker: use MLFlow tracking servers on AWS
- Network Isolation Mode: run SageMaker job containers without any outbound internet access
- SageMaker DeepAR forecasting algorithm: used to forecast time-series data; leverages Recurrent Neural Networks (RNNs)
Overview
- For AI Practitioner exam, only need to know about SageMaker and its capabilities at a high level
- Fully managed service for developers and data scientists to build ML models
- It's an end-to-end ML service used to:
- Collect and prepare data
- Build and train ML models
- Deploy the models and monitor the performance of the predictions
- SageMaker built-in algorithms:
- Supervised Algorithms
- Linear regressions and classifications
- KNN Algorithms (for classification)
- Unsupervised Algorithms
- Principal Component Analysis (PCA): reduce number of features
- K-means: find grouping within data
- Anomaly Detection
- Textual Algorithms: NLP, summarization, etc.
- Image Processing: classification, detection, etc.
- SageMaker Automatic Model Tuning (AMT)
- Used for hyperparameter tuning
- Saves you time and money
SageMaker Model Deployment and Inference
- Deploy with one click, automatic scaling, no servers to manage (as opposed to self-hosted)
- Managed solution: reduced overhead
- Deployment Types
- Serverless
- Best for handling idle periods between traffic spikes
- You need to be able to tolerate more latency (cold starts) though
- Real-time (lowest latency)
- One prediction at a time
- Suitable for use cases with low latency or high throughput requirements
- It offers the lowest latency; requests must complete within 60 seconds
- Asynchronous (medium latency)
- For large payload sizes up to 1GB
- Suitable for use cases with larger datasets and processing times of up to 1 hour
- Near real-time latency requirements
- Batch transform (highest latency)
- Prediction for an entire dataset (multiple predictions)
- Suitable for offline processing when data can be processed in batches
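A minimal boto3 sketch of calling a deployed real-time endpoint (the endpoint name is a placeholder, and the payload format depends entirely on your model):

```python
import json
import boto3

# Sketch: real-time inference against a deployed SageMaker endpoint
runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

response = runtime.invoke_endpoint(
    EndpointName="my-model-endpoint",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps({"instances": [[5.1, 3.5, 1.4, 0.2]]}),  # model-specific
)
print(json.loads(response["Body"].read()))
```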
SageMaker Studio
- A unified interface for end-to-end ML development
- Can do many things:
- Team collaboration
- Tune and debug ML models
- Deploy ML models
- Automated workflows
SageMaker Data Wrangler
- Helps prepare tabular and image data for machine learning
- Does data preparation, transformation, and feature engineering
- Can also use it to visualize your data
SageMaker Feature Store
- Ingests features from a variety of sources
- Gives you the ability to publish directly from SageMaker Data Wrangler into SageMaker Feature Store
- Features are discoverable within SageMaker Studio
SageMaker Clarify
- Helps you evaluate FMs; it's part of SageMaker Studio
- Can evaluate with human-factors
- Can use built-in datasets or bring your own
- Another important feature is model explainability
- This is a set of tools to help explain how ML models make predictions
- It can also detect and explain biases in your datasets and models
SageMaker Ground Truth
- Used for RLHF
- Model review, customization, and evaluation
- Human feedback for ML
- With SageMaker Ground Truth Plus, you can label data
SageMaker ML Governance
- Can use SageMaker Model Cards to create records and to document details about ML models in a single place
- They support transparent and explainable model development by providing comprehensive, immutable documentation of essential model info
- SageMaker Model Dashboard is a central place to view, search, and explore all models in an AWS account
- It provides insights into model deployment, usage, performance tracking, and monitoring
- Can use SageMaker Role Manager to define user permissions for ML activities
- SageMaker Model Monitor monitors the quality of ML models and data in production
- SageMaker Model Registry: a centralized repository that allows you to track, manage, and version ML models
- Can manage the approval status of a model, automate model deployment, share models, etc.
- SageMaker Pipelines: a workflow that automates the process of building, training, and deploying an ML model
- Allows for CI/CD for ML, enabling you to iterate faster, reduce errors (no manual steps), have repeatable mechanisms, etc.
- A pipeline is composed of Steps, and each Step performs a specific task
- Step types:
- Processing: for data processing (e.g. feature engineering)
- Training: for model training (a separate Tuning step handles hyperparameter tuning)
- AutoML: automatically train a model
- Model: create or register a SageMaker model
- ClarifyCheck: perform drift checks against baselines (data bias, model bias, model explainability)
- QualityCheck: perform drift checks against baselines (data quality, model quality)
SageMaker JumpStart
- Provides pre-trained, open-source (foundation) models for you to use
- It simplifies the process of getting started with machine learning, offering a wide range of ready-to-use solutions that can be easily deployed and modified as needed
- Models can be fully customized for your data
- Models are deployed on SageMaker directly (full control of deployment options)
- (?) It offers FMs that you can use for summarization and audit use cases
- Two Options:
- ML Hub:
- Browse
- Experiment
- Customize
- Deploy
- ML Solutions:
- Access & browse
- Select & customize
- Deploy
SageMaker Canvas
- Can build, evaluate, and deploy ML models with a visual interface (no coding required)
- Can also be used in coordination with Bedrock to fine-tune and deploy language models
- Can leverage Data Wrangler for data preparation
- Has ready-to-use models from Rekognition, Comprehend, and Textract
- Makes it easy to build a full ML pipeline without writing code and by leveraging various AWS AI services
MLFLow on SageMaker
- MLFlow: an open-source tool which helps ML teams manage the entire ML lifecycle
- MLFlow Tracking Servers: used to track runs and experiments
Other AI Services
Amazon Rekognition
- Rekognition is a fully managed AI service that uses deep learning to analyze images and videos
- The base service is pre-trained; for custom models (Rekognition Custom Labels), you provide labeled images to train it
- It provides features such as object and scene detection, facial analysis, and text detection
- It does not modify or generate new images
Amazon Personalize
- Personalize is a fully managed ML service that delivers personalized recommendations, search results, and user segments based on interaction data
- You can use it to target a marketing campaign
- For example, it can recommend segments of users who are most likely to respond to a promotion
Amazon Textract
- Can be used to add document text detection and analysis to applications
- Can use it to identify handwritten text, to extract text from documents, and to extract specific information from documents
- It does not provide access to FMs
Amazon Kendra
- An intelligent search service that provides answers to questions based on the data that is provided (e.g. from documents)
- It uses semantic and contextual understanding to provide specific answers
- Can extract answers from within a document (text, pdf, HTML, PowerPoint, MS Word, FAQs, etc.)
- It does not provide access to FMs
Amazon Q Business
- A generative AI virtual assistant that can answer questions, summarize content, generate content, and complete tasks based on the data that is provided
- It does not provide access to FMs and is not open source
Amazon Polly
- A text-to-speech (TTS) service that can convert text into lifelike speech
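A minimal boto3 sketch (the output path is arbitrary; Joanna is one of Polly's standard en-US voices):

```python
import boto3

# Sketch: convert text to an MP3 with Polly
polly = boto3.client("polly", region_name="us-east-1")

response = polly.synthesize_speech(
    Text="Hello from Amazon Polly!",
    OutputFormat="mp3",
    VoiceId="Joanna",  # a standard en-US voice
)
with open("speech.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```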
Amazon Lex
- Can create conversational interfaces for applications
- It uses natural language understanding and automatic speech recognition to create chatbots
Amazon Comprehend
- Uses natural language processing (NLP) to extract insights and relationships from text or documents
- Language of the text
- Extracts key phrases, places, people, brands, or events
- Etc. (more from udemy video 71)
- Fully managed and serverless
- Sample use cases:
- Analyze customer interactions (emails) to find what leads to a positive or negative experience
- It's useful on its own, but you have the option of Custom Classification
- Organize documents into categories (classes) that you define
- Named Entity Recognition (NER)
- One of the main benefits of Comprehend
- It extracts predefined, general-purpose entities such as people, places, organizations, dates, and other standard categories, from text
- Custom Entity Recognition
- More in udemy video 77
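A minimal boto3 sketch of sentiment and entity detection with Comprehend (the example text is made up):

```python
import boto3

# Sketch: sentiment and entity detection on a short text
comprehend = boto3.client("comprehend", region_name="us-east-1")

text = "I emailed AWS support from Seattle on Monday and got a great response."

sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"])  # e.g. POSITIVE

entities = comprehend.detect_entities(Text=text, LanguageCode="en")
for e in entities["Entities"]:
    print(e["Type"], "->", e["Text"])  # e.g. ORGANIZATION -> AWS
```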
Amazon Translate
- Provides translation between multiple languages
- Cannot be used to improve transcription for domain-specific speech
Amazon Transcribe
- Can be used to convert speech into text
- You can use batch language identification to automatically identify the language of audio files
- If media contains domain-specific or non-standard terms, you can use a custom vocabulary or a custom model to improve the accuracy of the transcriptions
- Amazon Transcribe Medical
- A HIPAA-compliant model tailored for healthcare
Amazon Forecast
- Fully managed service that can deliver highly accurate forecasts
- Ex: predict the future sales of a raincoat
- 50% more accurate than looking at the data itself
- Reduce forecasting time from months to hours
- Use cases: product demand planning, financial planning, resource planning
Amazon Mechanical Turk
- Crowdsourcing marketplace to perform simple human tasks
- Distributed virtual workforce
- Use cases: image classification, data collection, business processing
- Integrates with Amazon A2I, SageMaker Ground Truth, etc.
Amazon Augmented AI (A2I)
- Gives human oversight of ML predictions in production
- The ML model can be built on AWS or elsewhere (SageMaker, Rekognition)