03 | Other Storage Services

Snowball

Snowball Simplified

Snowball is a giant physical disk that is used for migrating large quantities of data into AWS. It is a petabyte-scale data transport solution. Using a large disk like Snowball helps to circumvent common large-scale data transfer problems such as high network costs, long transfer times, and security concerns. Snowballs are extremely secure by design and once the data transfer is complete, the Snowballs are wiped clean of your data.

Snowball Key Details

  • Snowball is a strong choice for a data transfer job if you need a secure and quick data transfer ranging in the terabytes to many petabytes into AWS.
  • Snowball can also be the right choice if you don’t want to make expensive upgrades to your existing network infrastructure, if you frequently experience large backlogs of data, if you're located in a physically isolated environment, or if you're in an area where high-speed internet connections are not available or cost-prohibitive.

    As a rule of thumb, if it takes more than one week to upload your data to AWS using the spare capacity of your existing internet connection, then you should consider using Snowball.

  • For example, if you have a 100 Mbps connection that you can solely dedicate to transferring your data and you need to transfer 100 TB of data in total, it will take more than 100 days for the transfer to complete over that connection. You can make the same transfer in about a week by using multiple Snowballs (a rough back-of-the-envelope calculation is sketched below).
  • Here is a reference for when Snowball should be considered, based on the number of days it would take to make the same transfer over an internet connection.

(Screenshot: reference table of transfer times by internet connection speed, used to judge when Snowball becomes worthwhile.)
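
As a rough illustration of the rule of thumb above, here is a minimal sketch that estimates online transfer time from link speed and data size. The 80% utilization figure is an assumption for illustration, not an AWS number.

```python
# Back-of-the-envelope estimate of how long an online transfer would take,
# to compare against the "more than one week -> consider Snowball" rule of thumb.

def transfer_days(data_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Days needed to move `data_tb` terabytes over a `link_mbps` connection."""
    bits_to_move = data_tb * 1e12 * 8              # terabytes -> bits
    effective_bps = link_mbps * 1e6 * utilization  # usable bits per second
    return bits_to_move / effective_bps / 86_400   # seconds -> days

print(f"100 TB over 100 Mbps: ~{transfer_days(100, 100):.0f} days")   # ~116 days
print(f"100 TB over 1 Gbps:  ~{transfer_days(100, 1000):.0f} days")   # ~12 days
```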

Snowball Edge and Snowmobile

  • Snowball Edge is a specific type of Snowball that comes with both compute and storage capabilities via AWS Lambda and specific EC2 instance types. This means you can run code within your Snowball while your data is en route to an Amazon data center. This enables support of local workloads in remote or offline locations and as a result, Snowball Edge does not need to be limited to a data transfer service. An interesting use case is with airliners: planes sometimes fly with Snowball Edges onboard so they can store large amounts of flight data and run the compute needed for the plane's own systems. Snowball Edges can also be clustered locally for even better performance.
  • Snowmobile is an exabyte-scale data transfer solution. It is a data transport solution for up to 100 petabytes of data and is contained within a 45-foot shipping container hauled by a semi-truck. This massive transfer makes sense if you want to move your entire data center with years of data into the cloud.

Storage Gateway

Storage Gateway Simplified

Storage Gateway is a service that connects on-premise environments with cloud-based storage in order to seamlessly and securely integrate an on-prem application with a cloud storage backend.

Storage Gateway Key Details

  • The Storage Gateway service can either be a physical device or a VM image downloaded onto a host in an on-prem data center. It acts as a bridge to send or receive data from AWS.
  • The Storage Gateway VM image can run on top of VMware's ESXi hypervisor or Microsoft's Hyper-V hypervisor.
  • The three types of Storage Gateway are described below (a boto3 sketch of creating a File Gateway share follows this list):
    • File Gateway - Operates via NFS or SMB and is used to store files in S3 over a network filesystem mount point in the supplied virtual machine. Simply put, you can think of a File Gateway as a file system mount on S3.
    • Volume Gateway - Operates via iSCSI and is used to store copies of hard disk drives or virtual hard disk drives in S3.
      • These can be achieved via Stored Volumes or Cached Volumes. Simply put, you can think of Volume Gateway as a way of storing virtual hard disk drives in the cloud.
      • Applications interface with the Volume Gateway over the iSCSI block protocol.
      • Data written to these volumes can be asynchronously backed up into AWS Elastic Block Store (EBS) as point-in-time snapshots of the volumes’ content.
      • These snapshots act as incremental backups that capture only the changed state, similar to how Git stores only the differences between commits. Further, all snapshots are compressed to reduce storage costs.
    • Tape Gateway - Operates as a Virtual Tape Library.
      • This means all S3 features like versioning, lifecycle management, bucket policies, cross-region replication, etc. can be applied as part of Storage Gateway.
      • Tape Gateway offers a durable, cost-effective way of archiving and replicating data into S3 while getting rid of tapes (old-school data storage).
      • The Virtual Tape Library, or VTL, leverages existing tape-based backup infrastructure to store data on virtual tape cartridges that you create on the Tape Gateway.
      • It’s a great way to modernize and move backups into the cloud.
  • Relevant file information passing through Storage Gateway, like file ownership, permissions, timestamps, etc., is stored as metadata on the objects it belongs to. Once these file details are stored in S3, they can be managed natively.
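
To make the File Gateway flow concrete, here is a minimal, hedged sketch using boto3's storagegateway client. The gateway ARN, IAM role, and bucket ARN are placeholders you would substitute after activating a gateway; they are assumptions for illustration, not values from this wiki.

```python
import uuid
import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")

# Placeholder identifiers -- substitute your own after activating a gateway.
GATEWAY_ARN = "arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-EXAMPLE"
ROLE_ARN = "arn:aws:iam::123456789012:role/StorageGatewayS3AccessRole"
BUCKET_ARN = "arn:aws:s3:::my-file-gateway-bucket"

# Create an NFS file share on the File Gateway; files written to the NFS
# mount point on-prem land in the S3 bucket as objects.
share = sgw.create_nfs_file_share(
    ClientToken=str(uuid.uuid4()),   # idempotency token
    GatewayARN=GATEWAY_ARN,
    Role=ROLE_ARN,                   # role the gateway assumes to write to S3
    LocationARN=BUCKET_ARN,          # target bucket
    DefaultStorageClass="S3_STANDARD",
    ClientList=["10.0.0.0/16"],      # on-prem clients allowed to mount the share
)
print("File share ARN:", share["FileShareARN"])
```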

Stored Volumes vs. Cached Volumes

Stored Volumes

  • Stored Volumes let you store data locally on-prem and back the data up to AWS as a secondary data source.

  • Stored Volumes allow low-latency access to entire datasets, while providing high availability over a hybrid cloud solution.

  • Further, you can mount Stored Volumes on application infrastructure as iSCSI drives, so when data is written to these volumes, the data is both written onto the on-prem hardware and asynchronously backed up as snapshots in AWS EBS or S3.

Cached Volumes

  • Cached Volumes differ in that they do not store the entire dataset locally like Stored Volumes do.
  • Instead, AWS is used as the primary data source and the local hardware is used as a caching layer.
  • Only the most frequently used data is retained on the on-prem infrastructure, while the remaining data is served from AWS.
  • This minimizes the need to scale on-prem infrastructure while still maintaining low-latency access to the most frequently referenced data (a boto3 sketch of creating a cached volume follows below).
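
Here is a minimal sketch of provisioning a Cached Volume through boto3, assuming a Volume Gateway has already been activated; the gateway ARN, network interface, and target name are hypothetical placeholders.

```python
import uuid
import boto3

sgw = boto3.client("storagegateway", region_name="us-east-1")

GATEWAY_ARN = "arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-EXAMPLE"

# Create a cached iSCSI volume: S3 holds the full dataset, the gateway's
# local disks cache the hot data, and the application mounts the iSCSI target.
volume = sgw.create_cached_iscsi_volume(
    GatewayARN=GATEWAY_ARN,
    VolumeSizeInBytes=150 * 1024**3,      # 150 GiB volume
    TargetName="app-data",                # becomes part of the iSCSI target name
    NetworkInterfaceId="10.0.1.25",       # gateway VM interface the initiator connects to
    ClientToken=str(uuid.uuid4()),        # idempotency token
)
print("Volume ARN:", volume["VolumeARN"])
print("iSCSI target:", volume["TargetARN"])
```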

AWS Transfer Family

  • AWS Transfer Family is a secure transfer service for moving files into and out of AWS storage services, such as Amazon S3 and Amazon EFS.
  • With Transfer Family, you do not need to run or maintain any server infrastructure of your own.
  • You can provision a Transfer Family server with multiple protocols (SFTP, FTPS, FTP), as shown in the sketch below.
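
A minimal, hedged sketch of provisioning an SFTP-only server and a service-managed user with boto3; the role ARN, bucket path, and SSH key are hypothetical placeholders.

```python
import boto3

transfer = boto3.client("transfer", region_name="us-east-1")

# Create a service-managed SFTP endpoint (no server infrastructure to run).
server = transfer.create_server(
    Protocols=["SFTP"],                      # could also include FTPS/FTP
    IdentityProviderType="SERVICE_MANAGED",  # users managed by Transfer Family itself
    EndpointType="PUBLIC",                   # or "VPC" for a VPC-hosted endpoint
)
server_id = server["ServerId"]

# Add a user whose home directory maps to an S3 bucket (placeholder names).
transfer.create_user(
    ServerId=server_id,
    UserName="analytics-upload",
    Role="arn:aws:iam::123456789012:role/TransferFamilyS3AccessRole",
    HomeDirectory="/my-transfer-bucket/incoming",
    SshPublicKeyBody="ssh-rsa AAAA...EXAMPLE",
)
print("SFTP endpoint:", f"{server_id}.server.transfer.us-east-1.amazonaws.com")
```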

Benefits

  • Fully managed service and scales in real time.
  • You don’t need to modify your applications or run any file transfer protocol infrastructure.
  • Supports up to 3 Availability Zones and is backed by an auto scaling, redundant fleet for your connection and transfer requests.
  • Integration with S3 and EFS lets you capitalize on their features and functionalities as well.
  • Managed File Transfer Workflows (MFTW) is a fully managed, serverless File Transfer Workflow service to set up, run, automate, and monitor processing of files uploaded using Transfer Family.
  • Server endpoint types
    • Publicly accessible
      • Can be changed to a VPC hosted endpoint. The server must be stopped before making the change.
    • VPC hosted
      • Can optionally be set as internet facing. Take note that only SFTP and FTPS are supported for the VPC hosted endpoint.
  • Custom Hostnames
    • Your server host name is the hostname that your users enter in their clients when they connect to your server. You can use a custom domain for this. To redirect traffic from your registered custom domain to your server endpoint, you can use Amazon Route 53 or any DNS provider.
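
Following on from the custom hostname note above, here is a hedged sketch of pointing a custom domain at a Transfer Family endpoint with a Route 53 CNAME record; the hosted zone ID, domain, and server ID are placeholders.

```python
import boto3

route53 = boto3.client("route53")

# Placeholder values -- substitute your own hosted zone, domain, and server ID.
HOSTED_ZONE_ID = "Z0123456789EXAMPLE"
CUSTOM_HOSTNAME = "sftp.example.com"
SERVER_ENDPOINT = "s-1234567890abcdef0.server.transfer.us-east-1.amazonaws.com"

# CNAME the custom hostname to the Transfer Family server endpoint so users
# can connect with sftp.example.com instead of the AWS-generated name.
route53.change_resource_record_sets(
    HostedZoneId=HOSTED_ZONE_ID,
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": CUSTOM_HOSTNAME,
                "Type": "CNAME",
                "TTL": 300,
                "ResourceRecords": [{"Value": SERVER_ENDPOINT}],
            },
        }]
    },
)
```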

Amazon FSx for Windows

Amazon FSx for Windows File Server provides a fully managed, native Microsoft Windows file system.

Key Details

  • With FSx for Windows, you can easily move your Windows-based applications that require file storage to AWS.
  • It is built on Windows Server and exists solely for Microsoft-based applications so if you need SMB-based file storage then choose FSx.
  • FSx for Windows also permits connectivity between on-premise servers and AWS so those same on-premise servers can make use of Amazon FSx too.
  • You can use Microsoft Active Directory to authenticate into the file system.
  • Amazon FSx for Windows provides multiple levels of security and compliance to help ensure your data is protected. Amazon FSx automatically encrypts your data at-rest and in-transit.

  • You can access Amazon FSx for Windows from a variety of compute resources, not just EC2.
  • You can deploy your Amazon FSx for Windows in a single AZ or in a Multi-AZ configuration.
  • You can use SSD or HDD for the storage device depending on your requirements (a provisioning sketch follows this list).
  • FSx for Windows supports daily automated backups and also lets admins take backups when needed.
  • FSx for Windows removes duplicated content and compresses common content.

    By default, all data is encrypted at rest.
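
A minimal, hedged sketch of provisioning a Multi-AZ FSx for Windows file system with boto3; the subnet IDs, security group, and Active Directory ID are hypothetical placeholders.

```python
import boto3

fsx = boto3.client("fsx", region_name="us-east-1")

# Multi-AZ, SSD-backed FSx for Windows file system joined to a managed AD.
# All identifiers below are placeholders.
fs = fsx.create_file_system(
    FileSystemType="WINDOWS",
    StorageCapacity=300,                  # GiB
    StorageType="SSD",                    # or "HDD"
    SubnetIds=["subnet-0aaa1111", "subnet-0bbb2222"],  # one subnet per AZ for Multi-AZ
    SecurityGroupIds=["sg-0ccc3333"],
    WindowsConfiguration={
        "ActiveDirectoryId": "d-1234567890",     # AWS Managed Microsoft AD for auth
        "DeploymentType": "MULTI_AZ_1",          # or "SINGLE_AZ_2" for a single-AZ deployment
        "PreferredSubnetId": "subnet-0aaa1111",  # subnet hosting the preferred file server
        "ThroughputCapacity": 32,                # MB/s
        "AutomaticBackupRetentionDays": 7,       # daily automated backups
    },
)
print("DNS name:", fs["FileSystem"]["DNSName"])
```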

Amazon FSx for Lustre

Amazon FSx for Lustre makes it easy and cost effective to launch and run the open source Lustre file system for high-performance computing applications. With FSx for Lustre, you can launch and run a file system that can process massive data sets at up to hundreds of gigabytes per second of throughput, millions of IOPS, and sub-millisecond latencies.

Key Details

  • FSx for Lustre is compatible with the most popular Linux-based AMIs, including Amazon Linux, Amazon Linux 2, Red Hat Enterprise Linux (RHEL), CentOS, SUSE Linux and Ubuntu.
  • Since the Lustre file system is designed for high-performance computing workloads that typically run on compute clusters, choose EFS for a general-purpose Linux file system if your requirements don't match this use case.

FSx for Lustre can also store and retrieve data directly in S3 on its own by linking the file system to an S3 bucket (see the sketch below).
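
To illustrate that S3 integration, here is a hedged boto3 sketch of creating a scratch Lustre file system linked to an S3 bucket; the subnet and bucket names are placeholders.

```python
import boto3

fsx = boto3.client("fsx", region_name="us-east-1")

# Scratch (short-lived, high-burst) Lustre file system linked to an S3 bucket:
# objects in the bucket appear as files, and results can be exported back.
fs = fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=1200,                     # GiB; 1.2 TiB minimum for scratch file systems
    SubnetIds=["subnet-0aaa1111"],            # placeholder subnet
    LustreConfiguration={
        "DeploymentType": "SCRATCH_2",
        "ImportPath": "s3://my-hpc-dataset",          # lazy-load objects from S3
        "ExportPath": "s3://my-hpc-dataset/results",  # write results back to S3
    },
)
print("Mount name:", fs["FileSystem"]["LustreConfiguration"]["MountName"])
```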