Cloud Migrations and Hybrid Solutions

VMware

  • AWS Management Portal for vCenter enables you to manage your AWS resources using VMware vCenter

    • Installs as a vCenter plugin within your existing vCenter environment
    • Enables you to migrate VMware VMs to Amazon EC2 and manage AWS resources from within vCenter (see the sketch after this list)
    • Select OS, Region, Environment, Subnet, Instance Type, Private IP (optional), Security Group
  • Use Cases:

    • Migration to EC2
    • Reach new geographies from vCenter
    • Self-service AWS portal within vCenter
    • Leverage vCenter experience while getting started with AWS
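For comparison, the programmatic route to the same end result (a VMware VM becoming an EC2 image) is the VM Import/Export API. A minimal boto3 sketch, assuming the exported VMDK has already been uploaded to S3 (the bucket and key names below are placeholders, not values from this wiki):

```python
# Minimal sketch (bucket/key are placeholders): importing an exported VMware
# VMDK from S3 as an EC2 AMI via the VM Import/Export API.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

task = ec2.import_image(
    Description="vCenter VM migrated to EC2",
    DiskContainers=[{
        "Description": "root disk",
        "Format": "vmdk",
        "UserBucket": {
            "S3Bucket": "my-vm-export-bucket",        # placeholder bucket
            "S3Key": "exports/app-server-disk1.vmdk", # placeholder key
        },
    }],
)

# Poll the import task; once it completes, the task result includes the new AMI ID.
status = ec2.describe_import_image_tasks(ImportTaskIds=[task["ImportTaskId"]])
print(status["ImportImageTasks"][0]["Status"])
```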

Migrating Data to Cloud with Storage Gateway

  • Gateway-cached volumes: iSCSI-based block storage; primary data is stored in S3, with frequently accessed data cached locally

  • Gateway-stored volumes: iSCSI-based block storage; primary data is stored on-premises, with asynchronous backups (EBS snapshots) to S3

  • Gateway virtual tape library: iSCSI-based virtual tape solution

  • Snapshots provide a point-in-time copy of the data that has been written to your AWS Storage Gateway volumes.

    • BUT they only capture data that has been written to your storage volume, which can exclude data that has been buffered by either the client or the OS
    • The best way to guarantee that the OS has flushed its buffers is to shut down the machine, take the snapshot, then bring it back online
    • The snapshot (stored in S3) can then be turned into an EBS volume and mounted to an EC2 instance (see the sketch below)
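A minimal boto3 sketch of that last step; the snapshot ID, instance ID, and Availability Zone are placeholder assumptions:

```python
# Minimal sketch (snapshot/instance IDs and AZ are placeholders): turning a
# Storage Gateway snapshot into an EBS volume and attaching it to an EC2 instance.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an EBS volume from the gateway snapshot, in the same AZ as the instance.
volume = ec2.create_volume(
    SnapshotId="snap-0123456789abcdef0",   # placeholder snapshot ID
    AvailabilityZone="us-east-1a",
    VolumeType="gp3",
)

# Wait until the volume is available, then attach it to the instance.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",      # placeholder instance ID
    Device="/dev/sdf",                     # shows up as e.g. /dev/xvdf on the instance
)
```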

Data Pipeline

  • Pre-Lambda service -- Lambda largely replaces this in today's world

  • Web service that helps you process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals

  • Pipeline: the container that holds the data nodes, activities, preconditions, and schedules required to move your data from one location to another (a minimal definition sketch putting these pieces together appears at the end of this list)

    • Activities can run on an EC2 instance or an EMR cluster
    • Data Pipeline will provision and terminate these instance resources for you automatically
  • Can be used on premises

    • AWS supplies a Task Runner package that can be installed on-premises. It continuously polls the AWS Data Pipeline service for work to perform. When it is time to run a particular activity on your on-premises resources, for example executing a database stored procedure or a database dump, AWS Data Pipeline issues the appropriate command to the Task Runner.
  • Datanode: the end destination for your data. Can reference a specific Amazon S3 path. AWS Data Pipeline supports an expression language that makes it easy to reference data which is generated on a regular basis; for example, you could specify the log file name format.

  • Activity: an action that AWS Data Pipeline initiates on your behalf as part of a pipeline (EMR or Hive jobs, copies, SQL queries, or CLI scripts)

    • You can also specify your own custom activities using "ShellCommandActivity"
  • Precondition: a readiness check that can optionally be associated with a data source or activity.

    • If a data source has a precondition, that check must complete successfully before any activities consuming the data source are launched.
    • If an activity has a precondition, the precondition check must complete successfully before the activity is run. This can be useful if you are running an activity that is expensive to compute and should not run until specific criteria are met.
    • Example Preconditions:
      • DynamoDBDataExists - does data exist in the table?
      • DynamoDBTableExists - does the table exist?
      • S3KeyExists - does the S3 key exist?
      • S3PrefixExists - does at least one object exist under that S3 prefix?
      • ShellCommandPrecondition - custom preconditions
  • Schedule: defines when your pipeline activities run and the frequency with which the service expects your data to be available
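Putting the four components together, here is a minimal boto3 sketch of one way such a pipeline could be defined and activated. The bucket names, IAM roles, instance type, and schedule values are illustrative assumptions, not anything prescribed by this wiki:

```python
# Minimal sketch (buckets, roles, and schedule values are assumptions): a daily
# pipeline with a Schedule, an S3 Datanode guarded by an S3KeyExists Precondition,
# and a ShellCommandActivity that runs on an EC2 resource Data Pipeline
# provisions and terminates automatically.
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")

pipeline_id = dp.create_pipeline(name="daily-copy", uniqueId="daily-copy-001")["pipelineId"]

def obj(obj_id, name, **fields):
    """Helper: build a pipeline object; values prefixed with 'ref:' become refValue fields."""
    return {
        "id": obj_id,
        "name": name,
        "fields": [
            {"key": k, "refValue": v[4:]} if v.startswith("ref:") else {"key": k, "stringValue": v}
            for k, v in fields.items()
        ],
    }

dp.put_pipeline_definition(
    pipelineId=pipeline_id,
    pipelineObjects=[
        obj("Default", "Default", scheduleType="cron", schedule="ref:DailySchedule",
            role="DataPipelineDefaultRole", resourceRole="DataPipelineDefaultResourceRole",
            pipelineLogUri="s3://my-logs-bucket/datapipeline/"),
        obj("DailySchedule", "DailySchedule", type="Schedule",            # Schedule
            period="1 day", startAt="FIRST_ACTIVATION_DATE_TIME"),
        obj("InputReady", "InputReady", type="S3KeyExists",               # Precondition
            s3Key="s3://my-input-bucket/exports/latest.csv",
            role="DataPipelineDefaultRole"),
        obj("InputData", "InputData", type="S3DataNode",                  # Datanode
            filePath="s3://my-input-bucket/exports/latest.csv",
            precondition="ref:InputReady"),
        obj("WorkerBox", "WorkerBox", type="Ec2Resource",                 # provisioned/terminated for us
            instanceType="t3.micro", terminateAfter="1 Hour"),
        obj("CopyJob", "CopyJob", type="ShellCommandActivity",            # Activity
            command="aws s3 cp s3://my-input-bucket/exports/latest.csv s3://my-archive-bucket/",
            runsOn="ref:WorkerBox", schedule="ref:DailySchedule",
            input="ref:InputData"),
    ],
)

dp.activate_pipeline(pipelineId=pipeline_id)
```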

Exam Tips

  • AWS Data Pipeline: web service that helps you move data between different AWS compute and storage services
  • Can be integrated on premises
  • Can be scheduled
  • Will provision and terminate resources as and when required.
  • Has largely been replaced by Lambda
  • Pipeline consists of:
    • Datanode
    • Activity
    • Precondition
    • Schedule

Migrations and Networking

  • CIDR Reservations

    • Allowed block size is between /16 and /28
    • If you create more than one subnet in a VPC the CIDR blocks cannot overlap...
    • The first 4 IP addresses and the last IP address in each subnet CIDR block are not available for you to use and cannot be assigned to an instance! For example, in 10.0.0.0/24 (see the sketch at the end of this list):
    • 10.0.0.0: network address
    • 10.0.0.1: reserved by AWS for the VPC router
    • 10.0.0.2: reserved by AWS for mapping to the Amazon-provided DNS
    • 10.0.0.3: reserved by AWS for future use
    • 10.0.0.255: network broadcast address. Broadcast is not supported in a VPC, so the address is reserved
  • VPN to Direct Connect Migrations

    • Add Direct Connect alongside the existing Site-to-Site VPN
    • Add Direct Connect and VPN to the same BGP community
    • Then configure BGP so that your VPN connection has a higher cost than the Direct Connect connection
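A quick way to sanity-check the CIDR reservation rule above, using Python's ipaddress module with a 10.0.0.0/24 subnet (the subnet size is just an assumption for illustration):

```python
# Minimal sketch: enumerating the 5 addresses AWS reserves in an example
# 10.0.0.0/24 subnet with Python's ipaddress module.
import ipaddress

subnet = ipaddress.ip_network("10.0.0.0/24")
addresses = list(subnet)   # all 256 addresses, including network and broadcast

reserved = {
    addresses[0]:  "network address",
    addresses[1]:  "reserved by AWS for the VPC router",
    addresses[2]:  "reserved by AWS for the Amazon-provided DNS",
    addresses[3]:  "reserved by AWS for future use",
    addresses[-1]: "network broadcast address (broadcast not supported in a VPC)",
}

for ip, reason in reserved.items():
    print(ip, "-", reason)

# Addresses actually assignable to instances: 256 total - 5 reserved = 251
print("usable:", subnet.num_addresses - 5)
```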

Exam Tips

  • CIDR blocks between /16 and /28
  • AWS reserves 5 IP addresses per subnet CIDR block
  • You can migrate from VPN to Direct Connect by using BGP: give the VPN connection a higher BGP cost than the Direct Connect connection.