S3 and Glacier - vedratna/aws-learning GitHub Wiki

  • Maximum object size: 5 TB
  • Largest object in a single PUT: 5 GB
  • Multipart upload is recommended for objects larger than 100 MB
  • Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and an S3 bucket. Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
  • You might want to use Transfer Acceleration on a bucket for various reasons, including the following:
      • You have customers that upload to a centralized bucket from all over the world.
      • You transfer gigabytes to terabytes of data on a regular basis across continents.
      • You are unable to utilize all of your available bandwidth over the Internet when uploading to Amazon S3.
  • You can transfer data to and from the acceleration-enabled bucket by using one of the following s3-accelerate endpoint domain names:
      • s3-accelerate.amazonaws.com – to access an acceleration-enabled bucket.
      • s3-accelerate.dualstack.amazonaws.com – to access an acceleration-enabled bucket over IPv6. Amazon S3 dual-stack endpoints support requests to S3 buckets over IPv6 and IPv4.
  • You can point your Amazon S3 PUT object and GET object requests to the s3-accelerate endpoint domain name after you enable Transfer Acceleration. After Transfer Acceleration is enabled, it can take up to 20 minutes for you to realize the performance benefit. However, the accelerate endpoint will be available as soon as you enable Transfer Acceleration.
  • Server-side encryption is about protecting data at rest. Using server-side encryption with customer-provided encryption keys (SSE-C) allows you to set your own encryption keys. When you upload an object, Amazon S3 uses the encryption key you provide to apply AES-256 encryption to your data and then removes the encryption key from memory. Amazon S3 will reject any requests made over HTTP when using SSE-C. For security reasons, treat any key you erroneously send over HTTP as compromised: discard it and rotate your keys as appropriate.
  • For Amazon S3 REST API calls, you have to include the following HTTP request headers:
      • x-amz-server-side-encryption-customer-algorithm
      • x-amz-server-side-encryption-customer-key
      • x-amz-server-side-encryption-customer-key-MD5
  • For presigned URLs, you should specify the algorithm using the x-amz-server-side-encryption-customer-algorithm request header.
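
The two key-related headers are derived from the raw key bytes: the key travels base64-encoded, alongside the base64-encoded MD5 digest of the raw key as an integrity check. A minimal Python sketch of the header derivation (the randomly generated key is for illustration only):

```python
import base64
import hashlib
import os

def sse_c_headers(key: bytes) -> dict:
    """Build the three SSE-C request headers for a 256-bit customer key.

    The key itself is sent base64-encoded; the key-MD5 header carries the
    base64-encoded MD5 digest of the raw key bytes as an integrity check.
    """
    if len(key) != 32:  # AES-256 needs a 256-bit (32-byte) key
        raise ValueError("SSE-C requires a 256-bit key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": base64.b64encode(key).decode(),
        "x-amz-server-side-encryption-customer-key-MD5": base64.b64encode(
            hashlib.md5(key).digest()
        ).decode(),
    }

# Illustration only: generate a throwaway key and build the headers.
headers = sse_c_headers(os.urandom(32))
```

As noted above, these headers must only ever be sent over HTTPS; S3 rejects SSE-C requests made over plain HTTP.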

  • The most effective choice here is to use the S3 sync feature available in the AWS CLI. That way, you can comfortably synchronize the data between your on-premises server and AWS a week before the migration, and on Friday just run one final sync to complete the task. Remember that `aws s3 sync` only uploads the "delta", that is, the files that are new or have changed since the last sync, so it takes only a fraction of the time compared to the other methods.
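
The "delta" behaviour can be illustrated with a toy Python sketch. The real `aws s3 sync` compares size and modification time; this simplified version compares size only, and the file maps are made up for the example:

```python
def sync_delta(local: dict, remote: dict) -> list:
    """Return the keys a sync would upload: files that are missing remotely
    or whose size differs. (A sketch of the idea, not the CLI's algorithm,
    which also considers timestamps.)

    `local` and `remote` map object key -> size in bytes.
    """
    return sorted(
        key for key, size in local.items()
        if key not in remote or remote[key] != size
    )

# Hypothetical state: b.txt changed, c.txt is new, a.txt is unchanged.
local = {"logs/a.txt": 100, "logs/b.txt": 250, "logs/c.txt": 90}
remote = {"logs/a.txt": 100, "logs/b.txt": 200}

print(sync_delta(local, remote))  # -> ['logs/b.txt', 'logs/c.txt']
```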

  • S3 storage classes/tiers: S3 Standard, S3 Intelligent-Tiering, S3 Standard-IA, S3 One Zone-IA, S3 Glacier, S3 Glacier Deep Archive

  • The S3 Intelligent-Tiering storage class is designed to optimize costs by automatically moving data to the most cost-effective access tier, without performance impact or operational overhead. It works by storing objects in two access tiers: one tier that is optimized for frequent access and another lower-cost tier that is optimized for infrequent access. For a small monthly monitoring and automation fee per object, Amazon S3 monitors access patterns of the objects in S3 Intelligent-Tiering, and moves the ones that have not been accessed for 30 consecutive days to the infrequent access tier. If an object in the infrequent access tier is accessed, it is automatically moved back to the frequent access tier. There are no retrieval fees when using the S3 Intelligent-Tiering storage class, and no additional tiering fees when objects are moved between access tiers. It is the ideal storage class for long-lived data with access patterns that are unknown or unpredictable.
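
The tiering behaviour just described can be sketched as a tiny function (an illustration of the rule above, not the service's actual logic):

```python
def intelligent_tier(days_since_last_access: int) -> str:
    """Which S3 Intelligent-Tiering access tier an object sits in:
    30 consecutive days without access moves it to the infrequent tier;
    any access resets the clock and moves it back to the frequent tier.
    """
    return "infrequent" if days_since_last_access >= 30 else "frequent"
```

Since there are no retrieval fees in this class, an access that pulls an object back to the frequent tier costs nothing extra; you only pay the per-object monitoring and automation fee.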

  • Comparison of the storage classes:

| Storage class | Replication | Minimum storage duration | Minimum billable object size | Retrieval |
| --- | --- | --- | --- | --- |
| S3 Standard | ≥ 3 AZs | None | None | Immediate; no retrieval fee |
| S3 Intelligent-Tiering | ≥ 3 AZs | 30 days | None | No retrieval fee; per-object monitoring and automation fee instead |
| S3 Standard-IA | ≥ 3 AZs | 30 days | 128 KB | Retrieval fee (higher than Standard) |
| S3 One Zone-IA | 1 AZ | 30 days | 128 KB | Retrieval fee (higher than Standard) |
| S3 Glacier | ≥ 3 AZs | 90 days | 40 KB | Retrieval fee higher than Standard-IA; retrievals take minutes to hours |
| S3 Glacier Deep Archive | ≥ 3 AZs | 180 days | 40 KB | Retrieval fee higher than Glacier; retrievals take hours |
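
The minimum-duration figures above can be captured in a small lookup table, with a helper showing the billing consequence of deleting early (a sketch based only on the numbers listed here, not a billing calculator):

```python
# Minimum storage duration and minimum billable object size per class,
# taken from the comparison above (None = no minimum). Keys use the
# S3 API names for the storage classes.
STORAGE_CLASSES = {
    "STANDARD":            {"min_days": None, "min_billable_kb": None},
    "INTELLIGENT_TIERING": {"min_days": 30,   "min_billable_kb": None},
    "STANDARD_IA":         {"min_days": 30,   "min_billable_kb": 128},
    "ONEZONE_IA":          {"min_days": 30,   "min_billable_kb": 128},
    "GLACIER":             {"min_days": 90,   "min_billable_kb": 40},
    "DEEP_ARCHIVE":        {"min_days": 180,  "min_billable_kb": 40},
}

def billable_days(storage_class: str, days_stored: int) -> int:
    """Days actually charged: deleting before the class minimum
    still incurs the full minimum storage duration."""
    minimum = STORAGE_CLASSES[storage_class]["min_days"] or 0
    return max(days_stored, minimum)
```

For example, an object deleted from S3 Glacier after 10 days is still billed for the full 90-day minimum.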

  • AWS Elemental MediaStore is an AWS storage service optimized for media. It gives you the performance, consistency, and low latency required to deliver live streaming video content. AWS Elemental MediaStore acts as the origin store in your video workflow. It combines the high performance required by the most demanding media delivery workloads with long-term, cost-effective storage. (Under the hood it uses S3.)

  • S3 event notifications can be published to SNS topics, SQS queues, and Lambda functions
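
As a hedged illustration, a bucket notification configuration that fans events out to all three destination types might look like the following (the account ID, names, and ARNs are placeholders; the field names follow the shape accepted by the S3 `PutBucketNotificationConfiguration` API):

```json
{
  "TopicConfigurations": [
    {
      "TopicArn": "arn:aws:sns:us-east-1:123456789012:my-topic",
      "Events": ["s3:ObjectCreated:*"]
    }
  ],
  "QueueConfigurations": [
    {
      "QueueArn": "arn:aws:sqs:us-east-1:123456789012:my-queue",
      "Events": ["s3:ObjectRemoved:*"]
    }
  ],
  "LambdaFunctionConfigurations": [
    {
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:my-fn",
      "Events": ["s3:ObjectCreated:Put"]
    }
  ]
}
```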

Glacier

  • An account can have up to 1,000 vaults per region
  • Archives in a vault are immutable
  • A vault can hold an unlimited number of archives
  • A single archive can be up to 40 TB (a single upload request is limited to 4 GB; larger archives require multipart upload)
  • Retrieval from Glacier can be Expedited (typically 1-5 minutes), Standard (3-5 hours), or Bulk (5-12 hours)
  • You can retrieve a specific byte range from an archive instead of the whole archive (range retrieval)
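
As a quick reference, the three retrieval options with their typical completion times (AWS's published ranges; indicative, not guaranteed), plus a sketch of a range-retrieval job request. The field names follow the Glacier `InitiateJob` API; the archive ID is a placeholder:

```python
# Typical retrieval times for the three Glacier retrieval options.
RETRIEVAL_TIERS = {
    "Expedited": "1-5 minutes",
    "Standard": "3-5 hours",
    "Bulk": "5-12 hours",
}

# Sketch of an InitiateJob request body that retrieves only the first
# 1 MiB of an archive (range retrieval). ArchiveId is a placeholder.
job_params = {
    "Type": "archive-retrieval",
    "ArchiveId": "EXAMPLE_ARCHIVE_ID",
    "Tier": "Bulk",
    "RetrievalByteRange": "0-1048575",  # inclusive byte range: first 1 MiB
}
```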

Supported lifecycle transitions

  • Amazon S3 supports the following Lifecycle transitions between storage classes using an S3 Lifecycle configuration.
  • You can transition from the following:
      • The S3 Standard storage class to any other storage class.
      • Any storage class to the S3 Glacier or S3 Glacier Deep Archive storage classes.
      • The S3 Standard-IA storage class to the S3 Intelligent-Tiering or S3 One Zone-IA storage classes.
      • The S3 Intelligent-Tiering storage class to the S3 One Zone-IA storage class.
      • The S3 Glacier storage class to the S3 Glacier Deep Archive storage class.

Unsupported lifecycle transitions

  • Amazon S3 does not support any of the following Lifecycle transitions.
  • You can't transition from the following:
      • Any storage class to the S3 Standard storage class.
      • Any storage class to the Reduced Redundancy storage class.
      • The S3 Intelligent-Tiering storage class to the S3 Standard-IA storage class.
      • The S3 One Zone-IA storage class to the S3 Standard-IA or S3 Intelligent-Tiering storage classes.
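
The supported and unsupported lists can be folded into one predicate. The sketch below is a literal encoding of the bullets above, using the storage class names from the S3 API (`STANDARD_IA` for S3 Standard-IA, `ONEZONE_IA` for S3 One Zone-IA, and so on):

```python
ALL_CLASSES = {"STANDARD", "INTELLIGENT_TIERING", "STANDARD_IA",
               "ONEZONE_IA", "GLACIER", "DEEP_ARCHIVE"}

def transition_supported(src: str, dst: str) -> bool:
    """Directly encode the supported-transition rules listed above
    (a literal reading of the lists, not the full current AWS rules)."""
    if src == dst or src not in ALL_CLASSES or dst not in ALL_CLASSES:
        return False
    if dst == "STANDARD":                   # nothing transitions back to Standard
        return False
    if src == "STANDARD":                   # Standard -> any other class
        return True
    if dst in ("GLACIER", "DEEP_ARCHIVE"):  # any class -> the archive classes
        return True
    if src == "STANDARD_IA" and dst in ("INTELLIGENT_TIERING", "ONEZONE_IA"):
        return True
    if src == "INTELLIGENT_TIERING" and dst == "ONEZONE_IA":
        return True
    return False
```

Note the asymmetry the lists describe: transitions only flow "downhill" toward colder storage; there is no path back to S3 Standard.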

Constraints

  • Lifecycle storage class transitions have the following constraints:
  • Object size and transitions from S3 Standard or S3 Standard-IA to S3 Intelligent-Tiering, S3 Standard-IA, or S3 One Zone-IA:
      • When you transition objects from the S3 Standard or S3 Standard-IA storage classes to S3 Intelligent-Tiering, S3 Standard-IA, or S3 One Zone-IA, the following object size constraints apply:
      • Larger objects - for the following transitions, there is a cost benefit to transitioning larger objects:
          • From the S3 Standard or S3 Standard-IA storage classes to S3 Intelligent-Tiering.
          • From the S3 Standard storage class to S3 Standard-IA or S3 One Zone-IA.
      • Objects smaller than 128 KB - Amazon S3 does not transition objects smaller than 128 KB for the following transitions, because it is not cost effective:
          • From the S3 Standard or S3 Standard-IA storage classes to S3 Intelligent-Tiering.
          • From the S3 Standard storage class to S3 Standard-IA or S3 One Zone-IA.
  • Minimum days for transition from S3 Standard or S3 Standard-IA to S3 Standard-IA or S3 One Zone-IA:
      • Before you transition objects from the S3 Standard or S3 Standard-IA storage classes to S3 Standard-IA or S3 One Zone-IA, you must store them for at least 30 days in the S3 Standard storage class. For example, you cannot create a Lifecycle rule to transition objects to the S3 Standard-IA storage class one day after you create them. Amazon S3 doesn't transition objects within the first 30 days because newer objects are often accessed more frequently or deleted sooner than is suitable for S3 Standard-IA or S3 One Zone-IA storage. Similarly, if you are transitioning noncurrent objects (in versioned buckets), you can transition only objects that are at least 30 days noncurrent to S3 Standard-IA or S3 One Zone-IA storage.
  • Minimum 30-day storage charge for S3 Intelligent-Tiering, S3 Standard-IA, and S3 One Zone-IA:
      • The S3 Intelligent-Tiering, S3 Standard-IA, and S3 One Zone-IA storage classes have a minimum 30-day storage charge. Therefore, you can't specify a single Lifecycle rule for both an S3 Intelligent-Tiering, S3 Standard-IA, or S3 One Zone-IA transition and an S3 Glacier or S3 Glacier Deep Archive transition when the latter occurs less than 30 days after the former. The same 30-day minimum applies when you specify a transition from S3 Standard-IA storage to S3 One Zone-IA or S3 Intelligent-Tiering storage. You can specify two separate rules to accomplish this, but you still pay the minimum storage charges.
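
The 128 KB size floor and the 30-day minimum age can be combined into a small per-object check (a sketch of just these two constraints; real Lifecycle evaluation involves more than this):

```python
IA_LIKE = {"INTELLIGENT_TIERING", "STANDARD_IA", "ONEZONE_IA"}

def passes_transition_constraints(src: str, dst: str,
                                  size_kb: float, age_days: int) -> bool:
    """Check the two constraints described above for a single object:
    the 128 KB minimum size and the 30-day minimum age before moving
    to S3 Standard-IA or S3 One Zone-IA.
    """
    if src in ("STANDARD", "STANDARD_IA") and dst in IA_LIKE:
        if size_kb < 128:   # too small to be cost effective; S3 skips it
            return False
    if src in ("STANDARD", "STANDARD_IA") and dst in ("STANDARD_IA", "ONEZONE_IA"):
        if age_days < 30:   # must age at least 30 days first
            return False
    return True
```

For example, a 64 KB object never moves from Standard to Standard-IA, while a 256 KB object can, but only once it is at least 30 days old.
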
  • Amazon WorkDocs is a fully managed, secure content creation, storage, and collaboration service. With Amazon WorkDocs, you can easily create, edit, and share content, and because it’s stored centrally on AWS, access it from anywhere on any device. Amazon WorkDocs makes it easy to collaborate with others, and lets you easily share content, provide rich feedback, and collaboratively edit documents. You can use Amazon WorkDocs to retire legacy file share infrastructure by moving file shares to the cloud. Amazon WorkDocs lets you integrate with your existing systems, and offers a rich API so that you can develop your own content-rich applications. Amazon WorkDocs is built on AWS, where your content is secured on the world's largest cloud infrastructure.
  • Amazon WorkDocs Content Manager is a high-level utility that uploads content to, or downloads it from, an Amazon WorkDocs site. It can be used for both administrative and user applications. For user applications, a developer must construct the Content Manager with anonymous AWS credentials and an authentication token. For administrative applications, the WorkDocs client must be initialized with AWS Identity and Access Management (IAM) credentials, and the authentication token must be omitted in subsequent API calls.