AWS ‐ CP ‐ Module 6 - rFronteddu/general_wiki GitHub Wiki

Introduction To Storage

  • Block storage: block-level storage volumes that attach to EC2 instances like physical hard drives; volumes can be modified while in use without disrupting the instance.
  • Object storage: manages data as objects in a flat address space; scales well and supports metadata for search and analytics.
  • File storage: file systems accessible over a network, supporting many simultaneous users.
  • AWS Storage Services
    • Block Storage - EC2 Instance Store: unmanaged non-persistent, high-performance.
    • Block Storage - Elastic Block Store (EBS): Managed, persistent.
    • Object Storage - Simple Storage Service (S3): Fully managed.
    • File Storage - Elastic File System (EFS): Fully managed NFS file system for on-premises and cloud use.
    • File Storage - FSx: Fully managed for popular file systems.
    • Other - Storage GW: Fully managed, hybrid-cloud to provide cloud storage with on-premises access.
    • Other - Elastic Disaster Recovery: Fully managed, used to recover resources into AWS.

Responsibility Model For Storage

  • Fully Managed: AWS is responsible for the hardware and the entire storage stack (data durability, availability, encryption at rest, and replication). Customers are responsible for data management, access controls, and proper service configuration.
  • Managed: AWS manages the storage infrastructure, hardware redundancy, and volume replication. Customers are responsible for data backup strategies, encryption configuration, volume performance optimization, and capacity planning.
  • Unmanaged: Customers take full responsibility for data management, backup/recovery, encryption, performance optimization, and durability. AWS only maintains the physical hardware and network infrastructure.

Block Storage - Instance Store (Temp) And EBS (Permanent)

  • EC2 instance store - Key takeaway: no data persistence, performant, cost effective. The instance store is block-level storage physically attached to the host computer of the EC2 instance. Data is lost when the instance is stopped or terminated, so it is best for temporary storage needs such as buffers, caches, and scratch data.
  • EBS - Key takeaway: data persistence, portability, automatic replication within an AZ: EBS volumes act like persistent external hard drives, offering consistent and low-latency performance for workloads like databases and file systems. EBS volumes can be backed up, resized, and attached to different EC2 instances. To create an EBS volume, you define the configuration for things like volume size and type. After the volume has been created, it can be attached to an EC2 instance. Incremental backups through EBS snapshots are recommended.
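The incremental nature of EBS snapshots can be sketched in pure Python. This is a hypothetical model of the idea, not the EBS API: the first snapshot copies everything, and each later snapshot stores only blocks that changed since the previous one.

```python
# Hypothetical model of incremental snapshots: only changed blocks are stored.
# Block contents are short strings here; real EBS works on fixed-size blocks.

def take_snapshot(volume_blocks, previous_state=None):
    """Return the blocks this snapshot must actually store."""
    if previous_state is None:
        return dict(volume_blocks)  # first snapshot: full copy
    return {
        idx: data
        for idx, data in volume_blocks.items()
        if previous_state.get(idx) != data
    }

volume = {0: "boot", 1: "app", 2: "logs"}
snap1 = take_snapshot(volume)          # stores all 3 blocks

volume[2] = "logs-day2"                # only block 2 changes
snap2 = take_snapshot(volume, snap1)   # stores just the 1 changed block

print(len(snap1), len(snap2))  # -> 3 1
```

This is why frequent snapshots stay cheap: unchanged blocks are referenced from earlier snapshots rather than copied again.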

EBS Data Lifecycle

  • Describe EBS snapshots and their use cases for data management.
  • Describe the EBS data lifecycle and how it integrates with AWS services.
  • Identify the customer responsibility in relation to EBS snapshots and Amazon Data Lifecycle Manager.

EBS snapshots are point-in-time backups of an EBS volume. Uses include disaster recovery, data migration, volume resizing, and creating consistent backups of production workloads. Snapshots are incremental, so they only save the blocks on the volume that have changed since your most recent snapshot. EBS snapshots are stored redundantly across multiple AZs using S3.

The customer is responsible for scheduling and managing regular EBS snapshots as part of a backup strategy. This includes monitoring costs, deleting unnecessary snapshots, configuring encryption, verifying snapshot integrity, and testing restoration procedures regularly.

The creation, retention, and deletion of EBS snapshots can be automated using Amazon Data Lifecycle Manager, which can schedule snapshots during off-peak hours to minimize performance impact and automatically delete outdated backups to control storage costs.
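Such automation is expressed as a Data Lifecycle Manager policy document. A minimal sketch follows; the field names mirror the DLM API, but the tag, schedule time, and retention count are illustrative assumptions:

```python
# Sketch of a DLM lifecycle policy document (values are illustrative).
# In practice this dict would be passed to boto3's
# dlm.create_lifecycle_policy(..., PolicyDetails=policy_details) call.

policy_details = {
    "ResourceTypes": ["VOLUME"],
    # Snapshot every volume tagged Backup=daily
    "TargetTags": [{"Key": "Backup", "Value": "daily"}],
    "Schedules": [
        {
            "Name": "NightlySnapshots",
            "CreateRule": {
                # run during off-peak hours to minimize performance impact
                "Interval": 24,
                "IntervalUnit": "HOURS",
                "Times": ["03:00"],
            },
            # automatically delete outdated backups to control storage costs
            "RetainRule": {"Count": 7},
        }
    ],
}

schedule = policy_details["Schedules"][0]
print(schedule["RetainRule"]["Count"])  # keep the 7 most recent snapshots
```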

S3

  • Describe Amazon S3, including its benefits and use cases.
  • Identify the security management features of Amazon S3.

Object storage is well-suited for large amounts of unstructured data (documents, images, and videos). S3 stores files as objects in containers known as buckets, and each object can range in size from a few bytes to several terabytes. An object is the fundamental unit of data storage. When you upload a file to S3, it becomes an object and is stored durably across multiple facilities within your chosen Region. Each object includes the data itself, metadata, and a unique identifier called a key. An object is uniquely identified within a bucket by its key, which is essentially its file name.

An S3 bucket is a container for storing objects in Amazon S3. Each bucket has a globally unique name across all of AWS, which helps to identify and organize your stored data. Buckets serve as the basic unit for access control and can hold a virtually unlimited number of objects. Buckets make it possible to group related objects and apply policies at the bucket level.

When creating a bucket, you specify its name and the Region where it will reside. Buckets can be configured with various settings, including versioning, logging, and access permissions.
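The flat key namespace can be illustrated with a toy in-memory model (purely hypothetical, not the S3 API): keys may look like file paths, but a bucket is really just a mapping from keys to objects, and "folders" are only a prefix convention.

```python
# Toy model of an S3 bucket: a flat mapping from keys to objects.
# "Folders" exist only as naming conventions inside the keys.

class Bucket:
    def __init__(self, name):
        self.name = name          # globally unique in real S3
        self.objects = {}         # key -> (data, metadata)

    def put_object(self, key, data, metadata=None):
        self.objects[key] = (data, metadata or {})

    def list_objects(self, prefix=""):
        """Prefix filtering is how S3 simulates folder listings."""
        return sorted(k for k in self.objects if k.startswith(prefix))

b = Bucket("my-example-bucket")
b.put_object("photos/2024/cat.jpg", b"...", {"content-type": "image/jpeg"})
b.put_object("photos/2024/dog.jpg", b"...")
b.put_object("docs/readme.txt", b"...")

print(b.list_objects(prefix="photos/2024/"))
# -> ['photos/2024/cat.jpg', 'photos/2024/dog.jpg']
```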

Benefits:

  • Unlimited storage
  • Object lifecycle management
  • Broad range of use cases

Everything you store in S3 is private by default; permissions must be explicitly granted (for example, making objects public for internet access). More granular control is available through bucket policies (which define the actions allowed on the bucket and its objects), identity-based policies (which control what actions users, groups, or roles can perform on S3 resources), and encryption (at rest in storage and in transit over the network).
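A bucket policy is a JSON document attached to the bucket. Here is a minimal sketch that grants one IAM user read-only access; the bucket name, account ID, and user are fictitious:

```python
# Sketch of an S3 bucket policy; bucket, account, and user are made up.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadOnly",
            "Effect": "Allow",
            # the identity being granted access
            "Principal": {"AWS": "arn:aws:iam::123456789012:user/analyst"},
            # read-only actions: fetch objects and list the bucket
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",      # the bucket itself
                "arn:aws:s3:::example-bucket/*",    # every object in it
            ],
        }
    ],
}

# In practice the serialized document would be applied with
# s3.put_bucket_policy(Bucket="example-bucket", Policy=json.dumps(bucket_policy))
print(bucket_policy["Statement"][0]["Sid"])
```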

S3 Storage Classes and Lifecycle

  • Describe the different Amazon S3 storage classes and how they differ from each other.
  • Describe Amazon S3 Lifecycle policies and how they relate to pricing.

S3 offers various storage classes to suit a variety of workloads with specific performance, access, resiliency, and cost requirements. They're also designed to address data residency requirements, unpredictable access patterns, archival storage needs, and offer the most cost-effective options for different access patterns.

  • S3 Standard: General-purpose storage. Default.

  • S3 Intelligent-Tiering: This tier is useful for data with unknown or changing access patterns. S3 Intelligent-Tiering stores objects in three tiers: a frequent access tier, an infrequent access tier, and an archive instant access tier. S3 monitors access patterns of your data and automatically moves your data to the most cost-effective storage tier based on frequency of access.

  • S3 Standard-Infrequent Access (S3 Standard-IA): For data that is accessed less frequently but requires rapid access when needed. Ideal for storing long-term backups, disaster recovery files, and so on.

  • S3 One Zone-IA: Stores data in a single AZ, reducing costs compared to S3 Standard-IA, which stores data across a minimum of three AZs. This storage class suits customers seeking affordable storage for infrequently accessed data without high availability needs. Good for secondary backups or easily re-creatable data.

  • S3 Express One Zone: Stores data in a single AZ. It was purpose-built to deliver consistent single-digit millisecond data access for your most frequently accessed data and latency-sensitive applications. Data access is up to 10x faster and request costs are up to 80 percent lower than S3 Standard.

  • S3 Glacier Instant Retrieval: For archiving data that is rarely accessed and requires millisecond retrieval. Data stored in this storage class offers a cost savings of up to 68 percent compared to the S3 Standard-IA storage class, with the same latency and throughput performance.

  • S3 Glacier Flexible Retrieval: Offers low-cost storage for archived data that is accessed 1–2 times per year. Expedited retrievals return data in 1–5 minutes; bulk retrievals complete within 5–12 hours at no additional cost. Good for backup, disaster recovery, and offsite data storage needs, and for cases where some data occasionally must be retrieved in minutes.

  • S3 Glacier Deep Archive: Lowest-cost Amazon S3 storage class. It supports long-term retention and digital preservation for data that might be accessed once or twice per year, with a default retrieval time of 12 hours. It is designed for customers that retain data sets for 7–10 years or longer to meet regulatory compliance requirements.

  • S3 Outposts: Outposts delivers object storage to your on-premises AWS Outposts environment using S3 APIs and features, and serves workloads with local data residency requirements. It also helps maintain optimal performance when data must remain in close proximity to on-premises applications.
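Choosing among the Glacier classes is mostly a cost-versus-retrieval-latency trade-off. The tiers above can be summarized in a small lookup table (a sketch; the 3–5 hour figure for Flexible Retrieval's standard tier is from AWS documentation, the rest from the list above):

```python
# Retrieval-time summary for the S3 Glacier storage classes (sketch).
RETRIEVAL_TIMES = {
    "glacier-instant-retrieval": "milliseconds",
    "glacier-flexible-expedited": "1-5 minutes",
    "glacier-flexible-standard": "3-5 hours",
    "glacier-flexible-bulk": "5-12 hours",
    "glacier-deep-archive": "12 hours (default)",
}

def retrieval_time(storage_class):
    """Look up the expected retrieval latency for a Glacier tier."""
    return RETRIEVAL_TIMES[storage_class]

print(retrieval_time("glacier-flexible-expedited"))  # -> 1-5 minutes
```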

S3 Lifecycle

To avoid manually managing your object storage tier configurations, you can use S3 Lifecycle configurations to automate the process. When you define a lifecycle configuration for an object or group of objects, you can automate two types of actions:

  • Transition actions: define when objects should transition to another storage class.
  • Expiration actions: define when objects expire and should be permanently deleted.
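A lifecycle configuration combining both action types might look like the following sketch. The field names mirror the S3 API; the rule ID, prefix, and day counts are illustrative assumptions:

```python
# Sketch of an S3 Lifecycle configuration with transition actions
# and an expiration action. Day counts are illustrative.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "ArchiveThenExpireLogs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},      # applies only to logs/ keys
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # cool down
                {"Days": 90, "StorageClass": "GLACIER"},      # archive
            ],
            "Expiration": {"Days": 365},        # permanently delete after a year
        }
    ],
}

# Applied with s3.put_bucket_lifecycle_configuration(
#     Bucket="example-bucket",
#     LifecycleConfiguration=lifecycle_configuration)
rule = lifecycle_configuration["Rules"][0]
print(rule["Expiration"]["Days"])  # -> 365
```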

Amazon EFS

  • Describe Amazon EFS, including its benefits and use cases.
  • Describe Amazon EFS storage classes and how they can impact pricing.
  • Describe the EFS lifecycle policies and how they interact with storage classes.

EFS is a fully managed, scalable file storage service for use with AWS cloud services and on-premises resources. It operates using the Linux Network File System (NFS) protocol, and automatically scales to petabytes as you add or remove files without disrupting applications. EFS is designed to support a wide variety of workloads and can be accessed by multiple EC2 instances simultaneously.

  • EFS automatically replicates data across multiple AZs in a region.
  • EFS supports thousands of concurrent NFS connections.
  • EFS automatically grows and shrinks as you add and remove files.

Storage classes:

  • Standard: The EFS Standard and EFS Standard-Infrequent Access (Standard-IA) storage classes offer Multi-AZ resilience and the highest levels of durability and availability. They carry a higher cost because of that higher availability and durability.
  • One Zone: The EFS One Zone and EFS One Zone-Infrequent Access (One Zone-IA) storage classes provide additional savings by storing your data in a single AZ. Using just one AZ reduces storage costs compared to the Standard EFS storage classes.
  • Archive: The EFS Archive storage class is cost-optimized for data that is accessed only a few times a year or less and that does not need the sub-millisecond latencies of EFS Standard. EFS Archive offers a storage price up to 50 percent lower than EFS Infrequent Access, providing a more cost-optimized experience for cold, rarely accessed data.

Lifecycle: You can create lifecycle policies that determine when and how files transition between different storage tiers. These automated policies help ensure your data resides in the most cost-effective storage class without manual intervention.

  • Transition to IA: This policy instructs lifecycle management when to move files into the Infrequent Access storage, which is cost-optimized for data that is accessed only a few times each quarter. By default, files that are not accessed in Standard storage for 30 days are transitioned into IA.
  • Transition to Archive: This policy instructs lifecycle management when to move files into the Archive storage class, which is cost-optimized for data that is accessed only a few times each year or less. By default, files that are not accessed in Standard storage for 90 days are transitioned into Archive.
  • Transition to Standard: This policy instructs lifecycle management whether to transition files out of IA or Archive and back into Standard storage when the files are accessed in the IA or Archive storage. By default, files are not moved back to Standard storage, and they remain in the IA or Archive storage class when they are accessed.
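The default transitions described above can be modeled as a simple function of days since a file was last accessed (a hypothetical sketch of the policy logic, not the EFS API):

```python
# Sketch of the default EFS lifecycle transitions:
# 30 days without access -> Infrequent Access, 90 days -> Archive.

def efs_tier(days_since_last_access):
    """Return the storage tier a file would occupy under the defaults."""
    if days_since_last_access >= 90:
        return "Archive"
    if days_since_last_access >= 30:
        return "Infrequent Access"
    return "Standard"

for days in (5, 45, 200):
    print(days, efs_tier(days))
# -> 5 Standard / 45 Infrequent Access / 200 Archive
```

Accessing a file does not move it back to Standard by default, matching the "Transition to Standard" behavior above.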

Amazon FSx

  • Describe Amazon FSx, including its benefits and use cases.
  • Describe the available file system options for Amazon FSx.

FSx is a fully managed file system service that supports multiple file system options, including Windows File Server, Lustre, OpenZFS, and NetApp ONTAP. It handles hardware provisioning, patching, and backups.

AWS Storage GW

  • Describe AWS Storage Gateway, including its benefits and use cases.
  • Describe the three available gateway types in Storage Gateway.

Storage Gateway is a hybrid cloud storage service that makes it possible to seamlessly integrate on-premises environments with AWS Cloud storage. You can use it to extend your local storage to the cloud while maintaining low-latency access to frequently used data. SG can be used to streamline storage management and reduce costs for practical hybrid cloud storage use cases. These include moving backups to the cloud, using on-premises file shares backed by cloud storage, and providing low-latency access to data in AWS for on-premises applications.

  • Provides smooth connectivity between on-premises applications and AWS Cloud storage, preserving existing workflows and minimizing disruption.
  • Centralizes management of hybrid storage environments, enhancing accessibility, security, and compliance.
  • Keeps frequently accessed data local for quick access while managing less-used data in the cloud.
  • Reduces on-premises storage costs by using cloud storage for data archiving, backup, and disaster recovery purposes.

GW Types: Storage Gateway offers three distinct types of gateways to meet different hybrid storage needs:

  • S3 File Gateway: bridges your local environment with S3. It provides on-premises applications with access to virtually unlimited cloud storage through familiar file protocols. S3 File Gateway makes it possible to store and retrieve cloud objects using familiar file operations. When you deploy an S3 File Gateway, it appears to your local systems as a standard file server. Files written to this server are automatically uploaded to S3 while maintaining local access to recently used data through intelligent caching. This means your applications can continue working with files as they always have while the actual data is securely stored in the AWS Cloud.

  • Volume GW: With VGW, you create virtual storage volumes while maintaining local access to your data. It essentially functions as a bridge between your on-premises infrastructure and AWS Cloud storage by presenting your cloud data as iSCSI volumes that can be mounted by your existing applications. Volume Gateway operates in two main configurations:

    • Cached volume mode stores primary data in the cloud while frequently accessed data is cached locally for low-latency access.
    • Stored volume mode keeps your complete dataset locally while asynchronously backing it up to the cloud as EBS snapshots.
  • Tape GW: Makes it possible to replace physical tape infrastructure with virtual tape capabilities while benefiting from the durability and scalability of AWS Cloud storage. Tape Gateway provides an interface that works with existing tape backup software, making the transition from physical tapes to cloud storage seamless. When you deploy a Tape Gateway, it presents itself to your backup applications as standard tape hardware. Your backup software writes data to these virtual tapes just as it would to physical tapes, and the data is stored in Amazon S3. You can also configure Tape Gateway to automatically transition less frequently accessed data to a more cost-effective storage class for long-term retention.
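Cached volume mode behaves like a local cache sitting in front of the cloud copy. A toy sketch of that read path (purely hypothetical, not how the gateway is implemented): primary data lives in the cloud, and a small LRU cache keeps recently read blocks local for low-latency access.

```python
from collections import OrderedDict

# Toy model of Volume Gateway's cached mode: primary data lives in the
# "cloud" dict; a small LRU cache keeps recently read blocks local.

class CachedVolume:
    def __init__(self, cloud_blocks, cache_size=2):
        self.cloud = cloud_blocks
        self.cache = OrderedDict()      # block_id -> data, LRU order
        self.cache_size = cache_size
        self.cloud_reads = 0            # how often we paid cloud latency

    def read(self, block_id):
        if block_id in self.cache:              # local hit: low latency
            self.cache.move_to_end(block_id)
            return self.cache[block_id]
        self.cloud_reads += 1                   # miss: fetch from the cloud
        data = self.cloud[block_id]
        self.cache[block_id] = data
        if len(self.cache) > self.cache_size:   # evict least recently used
            self.cache.popitem(last=False)
        return data

vol = CachedVolume({1: "a", 2: "b", 3: "c"})
vol.read(1); vol.read(2); vol.read(1)  # second read of block 1 hits the cache
print(vol.cloud_reads)  # -> 2
```

Stored volume mode is the inverse design: the full dataset stays local and the cloud copy is the asynchronous backup.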

AWS Elastic Disaster Recovery

  • Describe Elastic Disaster Recovery, including its benefits and use cases.

Elastic Disaster Recovery replicates critical workloads to AWS with minimal downtime. Your servers' block-level data is continuously replicated to AWS, making it ideal for uses that require robust disaster recovery solutions. It supports both physical and virtual servers to enable rapid recovery during disruptions, which is particularly valuable for industries like healthcare where system availability is crucial.

You can use Elastic Disaster Recovery to reduce downtimes and data loss while eliminating the costs associated with maintaining secondary data centers. It also offers non-disruptive disaster recovery testing, meaning it's capable of quickly launching recovery instances when needed.

Benefits:

  • Business Resilience: Maintain business operations with continuous block-level data replication and the ability to recover workloads within minutes during disruptions.
  • Streamlined Disaster Recovery: Automate disaster recovery processes through an intuitive console, reducing complex manual configurations and minimizing the risk of human error.
  • Cost optimization: Eliminate expensive secondary data centers and pay only for what you use, with minimal upfront investment and no standby infrastructure costs.

Resources

| Resource | Description |
| --- | --- |
| EC2 Instance Store User Guide | Temporary storage option that is directly attached to the host computer of an EC2 instance, providing high-performance but non-persistent storage. |
| Amazon Elastic Block Store (Amazon EBS) | Scalable block storage service that provides persistent, high-performance volumes you can attach to your EC2 instances for data storage and applications. |
| Amazon Elastic Block Store (Amazon EBS) FAQ | Frequently asked questions about Amazon EBS. |
| Amazon EBS Snapshots User Guide | EBS snapshots are point-in-time backups of your cloud storage volumes, making it possible to protect data and restore it when needed. |
| Amazon Data Lifecycle Manager User Guide | A service that streamlines the creation, retention, and deletion of Amazon EBS snapshots. |
| Amazon Simple Storage Service (Amazon S3) | A scalable cloud storage service that can store and retrieve any amount of data from anywhere on the web. |
| Amazon Simple Storage Service (Amazon S3) FAQ | Frequently asked questions about Amazon S3. |
| Amazon S3 Storage Classes | Amazon S3 offers various storage classes, from high-performance frequent access to cost-effective archival options, tailored to different data retrieval needs and budget constraints. |
| Amazon S3 Versioning User Guide | Amazon S3 versioning keeps multiple variants of objects, offering recovery from unintended deletions or modifications by preserving every update to your files. |
| Amazon S3 Buckets User Guide | S3 buckets are cloud storage containers that securely hold various types of data, allowing convenient access and management through the AWS online infrastructure. |
| Amazon Elastic File System (Amazon EFS) | A scalable, fully managed file storage service that lets multiple AWS resources access shared data simultaneously without capacity planning. |
| Amazon Elastic File System (Amazon EFS) FAQ | Frequently asked questions about Amazon EFS. |
| Amazon FSx | A fully managed file storage service that lets you launch and run file systems like Windows File Server, Lustre, NetApp ONTAP, and OpenZFS in the AWS Cloud. |
| Amazon FSx for Windows File Server | An Amazon FSx option providing reliable, high-performance file storage compatible with Windows applications in the AWS Cloud. |
| Amazon FSx for NetApp ONTAP | An Amazon FSx option providing file storage with advanced data management capabilities and compatibility with both Windows and Linux workloads on AWS. |
| Amazon FSx for OpenZFS | An Amazon FSx option that provides high-performance, scalable storage using the popular open-source ZFS file system. |
| Amazon FSx for Lustre | An Amazon FSx option designed to accelerate workloads by providing fast data access for compute-intensive applications in AWS. |
| AWS Storage Gateway | A hybrid cloud storage service that provides seamless and secure integration between on-premises environments and AWS Cloud storage services. |
| Amazon S3 File Gateway | A Storage Gateway configuration that provides local file access to S3 objects while caching frequently accessed data locally for faster retrieval. |
| Tape Gateway | A Storage Gateway configuration used for backing up data to Amazon S3 while maintaining compatibility with existing tape-based backup applications. |
| Volume Gateway | A Storage Gateway configuration that provides iSCSI block storage volumes to on-premises applications, offering both cached and stored modes. |