AWS ‐ Storage Services ‐ S3 | S3 Glacier | Elastic Block Store (EBS) | Elastic File System (EFS) - FullstackCodingGuy/Developer-Fundamentals GitHub Wiki
Cost to store 1 GB in US-East-1, in USD per month (updated 2016.dec.20):
- Glacier: $0.004/month (note: major price cut in 2016)
- S3: $0.023/month
- S3-IA (announced in 2015.09): $0.0125/month (+ $0.01/GB retrieval charge)
- EBS: $0.045-$0.10/month (depends on speed: SSD or not) + IOPS costs
- EFS: $0.30/month
Further storage options, which may be used for temporarily storing data while or before processing it:
- SNS
- SQS
- Kinesis stream
- DynamoDB, SimpleDB
The costs above are just samples. Prices differ by region and can change at any time, and data transfer (out to the internet) costs extra. However, they show the ratio between the prices of the services.
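To make the ratio concrete, here is a small sketch that turns the sample per-GB prices into a storage-only monthly bill. The prices are the 2016 US-East-1 samples quoted on this page (for EBS the lower bound of the range is used); the function name and the 500 GB figure are illustrative assumptions, not anything AWS provides.

```python
# Sample prices in USD per GB-month (US-East-1, 2016), copied from this page.
# EBS uses the lower bound of the quoted range; IOPS, retrieval and
# data-transfer charges are deliberately ignored.
PRICE_PER_GB_MONTH = {
    "Glacier": 0.004,
    "S3-IA": 0.0125,
    "S3": 0.023,
    "EBS": 0.045,
    "EFS": 0.30,
}


def monthly_cost(service: str, gigabytes: float) -> float:
    """Storage-only monthly cost in USD for the given amount of data."""
    return PRICE_PER_GB_MONTH[service] * gigabytes


if __name__ == "__main__":
    cheapest = min(PRICE_PER_GB_MONTH.values())
    # List services from cheapest to most expensive, with the ratio to Glacier.
    for name, price in sorted(PRICE_PER_GB_MONTH.items(), key=lambda kv: kv[1]):
        print(f"{name:8s} 500 GB: ${monthly_cost(name, 500):7.2f} "
              f"({price / cheapest:.1f}x the cheapest)")
```

The absolute numbers will be out of date, but the ordering is the point of the comparison.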
There are a lot more differences between these services:
- EFS: generally available (out of preview), but may not yet be available in your region
- Network file system (so it may have higher latency, but it can be shared across several instances, even between regions)
- It is expensive compared to EBS (~10x more), but it gives extra features.
- It's a highly available service.
- It's a managed service.
- You can attach EFS storage to an EC2 instance
- Can be accessed by multiple EC2 instances simultaneously
- Since 2016.dec.20 it's possible to attach your EFS storage directly to on-premises servers via Direct Connect.
- EBS: block storage (so you need to format it). This means you can choose which type of file system you want.
- As it's block storage, you can use RAID 1 (or 0 or 10) across multiple volumes
- It is really fast
- It is relatively cheap
- With the newer announcements from Amazon, you can store up to 16 TB of data per volume on SSDs.
- You can snapshot an EBS volume (while it's still in use) for backup purposes
- But a volume only exists in a particular region (more precisely, in a single Availability Zone). Although you can migrate it to another region, you cannot simply access it across regions (only if you share it via the EC2 instance, but that means you are running a file server)
- You need an EC2 instance to attach it to
- New feature (2017.Feb.15): You can now increase volume size, adjust performance, or change the volume type while the volume is in use. You can continue to use your application while the change takes effect.
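As a rough illustration of the RAID remark above, the sketch below computes the usable capacity you get when combining equally sized EBS volumes in software RAID. The function name and the example sizes are hypothetical; the arithmetic is the standard RAID 0/1/10 capacity rule and is not specific to AWS.

```python
def raid_usable_gib(level: int, volumes: int, size_gib: int) -> int:
    """Usable capacity (GiB) of an array built from equally sized volumes."""
    if volumes < 2:
        raise ValueError("an array needs at least 2 volumes")
    if level == 0:
        # Striping: all capacity is usable, but there is no redundancy.
        return volumes * size_gib
    if level == 1:
        # Mirroring: every volume holds a full copy of the data.
        return size_gib
    if level == 10:
        # Stripe across mirrored pairs: half of the raw capacity is usable.
        if volumes < 4 or volumes % 2:
            raise ValueError("RAID 10 needs an even number of volumes, at least 4")
        return (volumes // 2) * size_gib
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level, count in ((0, 4), (1, 2), (10, 4)):
        print(f"RAID {level} with {count} x 500 GiB:",
              raid_usable_gib(level, count, 500), "GiB usable")
```

Note that you pay for every provisioned volume, so RAID 1 and RAID 10 double the storage cost per usable GiB.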
- S3: an object store (not a file system).
- You can store files and "folders", but you can't have locks, permissions, etc. as you would with a traditional file system
- This means that, by default, you can't just mount S3 and use it as your web server's file system
- But it's perfect for storing the images and videos for your website
- Great for short-term archiving (e.g. a few weeks). It's good for long-term archiving too, but Glacier is more cost efficient.
- Great for storing logs
- You can access the data from every region (extra costs may apply)
- Highly available and redundant. Data loss is extremely unlikely (99.999999999% durability, 99.9% uptime SLA)
- Much cheaper than EBS.
- You can serve the content directly to the internet. You can even run a full (static) website directly from S3, without an EC2 instance
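The quotation marks around "folders" deserve a short illustration: an object store keeps a flat set of keys, and a folder view is just a listing filtered by prefix and grouped by a delimiter. The sketch below is a purely local simulation of that idea (no AWS calls are made); the function name and the sample keys are made up for the example.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Mimic an S3-style listing: return (objects, common_prefixes).

    Keys directly under `prefix` are returned as objects; deeper keys are
    rolled up into one "folder" entry per distinct next path segment.
    """
    objects, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Deeper key: report only the next "folder" level.
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common)


if __name__ == "__main__":
    keys = ["index.html", "img/logo.png", "img/icons/home.svg", "logs/2016/12/20.log"]
    print(list_prefix(keys))          # top level
    print(list_prefix(keys, "img/"))  # inside the "img/" pseudo-folder
```

Because the "folder" is only a naming convention, there is nothing to lock or set permissions on, which is why the file-system features listed above are missing.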
- Glacier: long-term archive storage
- Extremely cheap to store
- Potentially very expensive to retrieve
- Takes up to 4 hours to "read back" your data (so only store items you know you won't need to retrieve for a long time)
As mentioned in JDL's comment, there are several interesting aspects in terms of pricing. For example, Glacier, S3, and EFS allocate storage for you based on your usage, while with EBS you need to predefine the allocated storage, which means you need to overestimate. (Although it's easy to add more storage to your EBS volumes, doing so requires some engineering, so you always "overpay" for your EBS storage, which makes it even more expensive.)
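The "overpay" effect can be put into numbers. The sketch below computes what each gigabyte you actually use costs when you pay for provisioned capacity; the function name, the 500 GB/200 GB figures and the choice of the $0.10 SSD sample price are assumptions made for the example.

```python
def effective_price_per_used_gb(price_per_gb: float,
                                provisioned_gb: float,
                                used_gb: float) -> float:
    """Price per GB actually used when billing is based on provisioned size."""
    if not 0 < used_gb <= provisioned_gb:
        raise ValueError("used_gb must be positive and not exceed provisioned_gb")
    return price_per_gb * provisioned_gb / used_gb


if __name__ == "__main__":
    # EBS: 500 GB provisioned at the sample $0.10/GB, but only 200 GB in use.
    ebs = effective_price_per_used_gb(0.10, 500, 200)
    # S3 bills for usage, so provisioned and used are the same amount.
    s3 = effective_price_per_used_gb(0.023, 200, 200)
    print(f"EBS: ${ebs:.3f} per used GB, S3: ${s3:.3f} per used GB")
```

With these sample figures the effective EBS price is 2.5 times its list price, which widens the gap to the usage-billed services.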
Source: AWS Storage Update – New Lower Cost S3 Storage Option & Glacier Price Reduction
- S3 is an object storage service
- Storage, protection and management
- Terminology:
  - Bucket - a container for objects; isolated storage, like a hard drive where file objects are stored. Limitation: max 100 buckets per AWS account (an account quota that AWS can raise on request)
  - Object - the fundamental S3 entity, stored inside a bucket
Users can create middleware to process the data on S3:
- S3 Object Lambda
- Event notifications on S3 objects
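As a sketch of the event-notification route, the handler below extracts the bucket name and object key from an S3 event, in the shape a Lambda function would receive it. The handler name, the bucket name and the sample event are hypothetical, and the event is trimmed to the fields used here; real events carry more metadata.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the event (a space becomes '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        results.append((bucket, key))
    return results


if __name__ == "__main__":
    # Trimmed, made-up sample of an "object created" notification.
    sample_event = {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": "example-bucket"},
                    "object": {"key": "uploads/my+photo.jpg", "size": 1024},
                },
            }
        ]
    }
    print(handler(sample_event))
```

Real processing (resizing an image, indexing a log file, and so on) would replace the list-building step.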
Tools to analyse the data and provide insights:
- Amazon S3 Storage Lens - understand, analyse and optimize storage; provides a dashboard
- Storage Class Analysis - analyses usage patterns
- S3 Inventory reports - audit reports covering replication and encryption status
- S3 provides read-after-write consistency for PUT and DELETE requests
- Read operations are strongly consistent
- Client side - your app encrypts data before sending it to S3
- Server side (data encryption at rest) - S3 is responsible for encrypting the data while storing it
- Data encryption in transit - via the TLS protocol while data is transferred to and from S3
- Client-side encryption is performed using a "Customer Master Key" (CMK). This key can be stored either inside the application or in separate storage that the application can access
- CMK - a customer-provided key for encryption
- With server-side encryption, S3 manages encryption/decryption along with the associated keys
- 3 types of server-side encryption keys:
  - S3-managed keys (SSE-S3) - encryption happens with a root key which is rotated regularly
  - SSE-KMS - server-side encryption with customer master keys held in AWS Key Management Service
  - SSE-C - server-side encryption with a customer-provided key
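The three server-side options differ mainly in which request headers accompany an upload. The sketch below builds those headers for a PUT request. The helper name `sse_headers` is hypothetical, the header names are the ones I understand the S3 REST API to use, and signing and sending the request are out of scope.

```python
import base64
import hashlib


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Build the server-side-encryption headers for an S3 PUT request."""
    if mode == "SSE-S3":
        # S3 manages the keys itself.
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # otherwise the account's default KMS key is used
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        # The caller supplies a 256-bit key with every request.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        key_md5 = hashlib.md5(customer_key).digest()
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode("ascii"),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(key_md5).decode("ascii"),
        }
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(sse_headers("SSE-S3"))
    print(sse_headers("SSE-KMS", kms_key_id="alias/example-key"))  # hypothetical alias
```

With SSE-C the key travels with each request, so TLS is essential, and losing the key means losing access to the object.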
- An S3 bucket is private to you (the owner) by default
- Resource-based policy - manages access per resource
- User-based policy - manages access per user
Policies help control:
- Who can access
- What resources can be accessed
- Which types of actions are allowed
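A resource-based policy answers exactly those three questions in JSON. The sketch below generates a bucket policy that lets anyone read objects in one bucket; the function name and the bucket name are hypothetical, and a public-read policy is used only because it is the shortest complete example.

```python
import json


def public_read_policy(bucket: str) -> dict:
    """Bucket policy allowing anonymous read access to every object in `bucket`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowPublicRead",
                "Effect": "Allow",
                "Principal": "*",                          # who can access
                "Action": ["s3:GetObject"],                # which actions are allowed
                "Resource": [f"arn:aws:s3:::{bucket}/*"],  # which resources
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(public_read_policy("example-bucket"), indent=2))
```

A user-based (IAM) policy has the same statement structure but omits `Principal`, because it is attached to the user or role it applies to.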
- Amazon CloudWatch alarms - monitor metrics and trigger alarms (e.g. for autoscaling policies)
- AWS CloudTrail logs - record API activity, to be used for auditing
- Amazon S3 access logs - audit the requests made to individual buckets
- AWS Trusted Advisor - builds on the information above and makes recommendations to improve the system's cost, performance and security in line with best practices
Accessing S3:
- REST service
- REST API or AWS SDK (the preferred way to invoke it)
- Requests can be made in an authenticated or anonymous way
- Unless anonymous access has been allowed, the user needs to be authenticated and authorized to access S3
- S3 supports both IPv4 and IPv6
- Moving from IPv4 to IPv6 may require IAM policy changes (for example, in policies that filter on IP addresses)
- Dual-stack endpoint - a special endpoint type that supports both IPv4 and IPv6
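In practice the dual-stack choice shows up in the hostname a client talks to. The sketch below builds virtual-hosted-style endpoint URLs for both cases; the helper name and the bucket are hypothetical, and the hostname pattern is the one I understand the standard commercial regions to use.

```python
def s3_endpoint(bucket: str, region: str, dualstack: bool = False) -> str:
    """Virtual-hosted-style S3 endpoint URL for a bucket.

    With dualstack=True the hostname resolves over both IPv4 and IPv6;
    the regular endpoint is IPv4 only.
    """
    service = "s3.dualstack" if dualstack else "s3"
    return f"https://{bucket}.{service}.{region}.amazonaws.com"


if __name__ == "__main__":
    print(s3_endpoint("example-bucket", "us-east-1"))
    print(s3_endpoint("example-bucket", "us-east-1", dualstack=True))
```

The AWS SDKs expose this as a client configuration option, so you rarely need to build the URL by hand.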
- S3 Glacier is a service for archiving and long-term backup of your data
- It is a safe, secure, durable service
- The primary use case for Glacier is long-term storage
- It offloads administrative overhead (managing infrastructure, system maintenance and updates) to Amazon
- It exposes a RESTful web service to interact with
- It uses the JSON format for request and response data
- Vault - a container in S3 Glacier, similar to an S3 bucket
- Archive - like an object; it goes into a vault
- Jobs - used to retrieve an archive (or a vault inventory), as the operation takes hours to complete
- Notifications - signals triggered when jobs are completed
- Limitations:
  - A user can create max 1000 vaults per AWS region
  - A vault name must be unique within the region
  - A single upload operation handles an archive of max 4 GB; for a bigger file, use the multipart upload operation, which supports archives up to 40 TB
  - Archives cannot be downloaded from the AWS Management Console; use the AWS CLI, the REST API or an AWS SDK
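For archives too large for a single upload, the multipart operation splits the archive into parts, and the part size has to be chosen up front. The sketch below picks the smallest valid part size. The constraints encoded here (part size of 1 MiB times a power of two, at most 4 GiB per part, at most 10,000 parts) are assumptions taken from my understanding of the Glacier multipart upload rules, not from this page, and the function name is hypothetical.

```python
MIB = 1024 * 1024
MAX_PARTS = 10_000           # assumed limit on parts per multipart upload
MAX_PART_SIZE = 4096 * MIB   # assumed largest allowed part (4 GiB)


def choose_part_size(archive_bytes: int) -> int:
    """Smallest part size (1 MiB * 2**n) that fits the archive in MAX_PARTS parts."""
    if archive_bytes <= 0:
        raise ValueError("archive size must be positive")
    part = MIB
    # Double the part size until MAX_PARTS parts are enough for the archive.
    while part * MAX_PARTS < archive_bytes:
        part *= 2
        if part > MAX_PART_SIZE:
            raise ValueError("archive is too large for one multipart upload")
    return part


if __name__ == "__main__":
    size = 100 * 1024 ** 3  # a 100 GiB archive
    print(choose_part_size(size) // MIB, "MiB parts")
```

Smaller parts mean cheaper retries after a network failure, which is why the sketch prefers the smallest size that still fits.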
- EBS is block storage
- It creates block-level volumes for EC2 instances, which are mounted as devices
- Once mounted, a volume looks like any other storage drive in the EC2 instance
- It is independent of the EC2 instance: a volume persists regardless of the instance's lifetime
- EBS Elastic Volumes - dynamically increase the volume size (it cannot be decreased dynamically)
  - Change IOPS capacity or volume type
- Volume types
  - SSD
  - HDD
  - Previous generation (not optimized)
- Limitations
  - Storage capacity - 64 TB max
  - Service limitations - you may not get access to the full 64 TB capacity
  - Partitioning scheme - limits the addressable capacity to at most 64 zettabytes
  - Data block size
- Snapshots - a backup of the data, incremental in nature
- EBS and NVMe on Linux
  - Nitro - a combination of hardware and software; on Nitro-based instances, EBS volumes are exposed as NVMe devices, and the NVMe driver handles the storage
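"Incremental in nature" means that each snapshot only needs to store the blocks that changed since the previous one. The toy model below illustrates the idea with dictionaries that map block numbers to data. It is a conceptual sketch with made-up names, not a description of how EBS implements snapshots internally.

```python
def restore(chain: list) -> dict:
    """Rebuild a volume by replaying snapshots from oldest to newest."""
    state = {}
    for snapshot in chain:
        state.update(snapshot)
    return state


def take_snapshot(volume: dict, previous_chain: list) -> dict:
    """Keep only the blocks that differ from what earlier snapshots captured."""
    known = restore(previous_chain)
    return {index: data for index, data in volume.items() if known.get(index) != data}


if __name__ == "__main__":
    volume = {0: b"boot", 1: b"data-v1", 2: b"logs"}
    first = take_snapshot(volume, [])        # first snapshot: every block
    volume[1] = b"data-v2"                   # one block changes
    second = take_snapshot(volume, [first])  # second snapshot: only block 1
    print(len(first), "blocks, then", len(second), "block")
    print(restore([first, second]) == volume)
```

The storage cost of a snapshot chain therefore grows with the amount of changed data, not with the volume size times the number of snapshots.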
- EFS is similar in purpose to EBS - a serverless solution designed to store data
- Scales on demand; it has a web interface
- AWS-managed infrastructure
- Virtually limitless storage capacity
- Supports the NFS 4.0 and 4.1 protocols
- Multiple compute instances - can be used with EC2 and Lambda functions
- Pay per use
- It has many configuration options
- It spans multiple Availability Zones for high availability
- It can be mounted on EC2 or other compute instances
- It supports lifecycle management - e.g. when data has not been accessed for a certain time, it can automatically be moved to a lower-cost storage class
To set it up:
- Create a file system
- Create mount targets
- Create security groups - to control how the EC2 instance and the EFS file system communicate
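Once the file system, mount targets and security groups exist, the last step is mounting it over NFS from the instance. The sketch below assembles that mount command. The helper name, the file system ID and the mount point are hypothetical; the DNS pattern and NFS options follow what I understand to be AWS's general recommendations for Linux clients.

```python
def efs_mount_command(file_system_id: str, region: str,
                      mount_point: str = "/mnt/efs") -> list:
    """Argument list for mounting an EFS file system with the plain NFS client."""
    dns_name = f"{file_system_id}.efs.{region}.amazonaws.com"
    options = ",".join([
        "nfsvers=4.1",      # EFS speaks NFS 4.0 and 4.1
        "rsize=1048576",    # 1 MiB read buffer
        "wsize=1048576",    # 1 MiB write buffer
        "hard",             # retry indefinitely instead of failing I/O
        "timeo=600",
        "retrans=2",
        "noresvport",       # use a new source port after a reconnect
    ])
    return ["sudo", "mount", "-t", "nfs4", "-o", options, f"{dns_name}:/", mount_point]


if __name__ == "__main__":
    # "fs-12345678" is a placeholder ID, not a real file system.
    print(" ".join(efs_mount_command("fs-12345678", "us-east-1")))
```

The security group on the mount target has to allow inbound NFS traffic (TCP port 2049) from the instance, otherwise the mount will hang.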