Driver: AWS - adobe/aquarium-fish GitHub Wiki

AWS driver

AWS is the first cloud driver you can use with Aquarium Fish: in addition to your local resources, you can now borrow resources from outside. Integrating those instances into your infrastructure so they behave like local environments takes some extra preparation, but sometimes it is worth it.

How the AWS driver works

Like any other remote driver, this one doesn't consume the resources of your Fish node machine; it uses cloud resources to spin up the environments.

Dedicated hosts

To spin up Mac instances you need a dedicated host, and that is pricey: Apple decided that on AWS you cannot release a Mac dedicated host during the first 24 hours after allocation.

Luckily, we found a workaround that uses another "feature" of Mac machines on AWS: the mandatory scrubbing process, which runs every time you stop or terminate an instance. It puts the dedicated host in a pending state for ~1h30m, but the nice thing is that you don't have to pay for that time.

So when a dedicated host is younger than 24h and has sat idle for a configurable delay (scrubbing_delay), Fish runs an empty instance on it and terminates it to trigger scrubbing. It's not an optimal solution (it consumes 5-9 times more machines), but it lets you pay nothing for the time the dedicated host would otherwise just sit there.

The rest of the dedicated hosts run as usual and are released when they are no longer in use.
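
The decision described above can be sketched as a small function. This is only one reading of the description, not the actual Fish implementation: the function name `next_host_action`, its arguments and the returned action strings are hypothetical.

```python
from datetime import timedelta

# Mac dedicated hosts cannot be released during their first 24 hours.
MIN_ALLOCATION = timedelta(hours=24)


def next_host_action(host_age, idle_for, scrubbing_delay, in_use=False):
    """Pick what to do with a Mac dedicated host (hypothetical helper).

    host_age        - time since the host was allocated
    idle_for        - how long the host has had no instance running
    scrubbing_delay - configured delay; None or zero disables the optimization
    """
    if in_use:
        return "keep"      # a workload is running, leave the host alone
    if host_age >= MIN_ALLOCATION:
        return "release"   # old enough to be released
    if scrubbing_delay and idle_for >= scrubbing_delay:
        return "scrub"     # run and terminate an empty instance
    return "wait"          # give the workload a chance to use the host


if __name__ == "__main__":
    action = next_host_action(
        host_age=timedelta(hours=2),
        idle_for=timedelta(minutes=15),
        scrubbing_delay=timedelta(minutes=10),
    )
    print(action)
```

A host that keeps getting scrubbed is unavailable for most of its first 24 hours, which is why this approach needs several times more hosts than a pool that simply waits.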

Usage

To use the driver you need to:

  • Create the image: you can use the regular AMIs provided by AWS; Aquarium Bait support is tracked in adobe/aquarium-bait#3
  • Put the AWS driver configuration & credentials into the Fish config
  • Run the Aquarium Fish node, create a Label and send an Application to receive the resource you want

Configuration

The driver options are set in the drivers section of the aquarium-fish config file:

drivers:
  - name: aws
    cfg:
      region:     string  # The AWS EC2 region to use
      key_id:     string  # IAM role credential key id
      secret_key: string  # IAM role credential secret key

      account_ids:   []string  # Trusted account IDs to filter vpc, subnet, sg, images, snapshots... Default is the same as creds account
      instance_tags: map       # Instance tags to use when this node provisions them (to identify the node for example)

      dedicated_pool: map  # Managed dedicated pools are used to allocate dedicated hosts on demand and manage their life to save some money
        <name>: map
          type: string  # Type of the dedicated hosts pool (example: "mac2.metal")
          zone: string  # Where to allocate the dedicated host (example: "us-west-2c")
          max:  uint    # Maximum dedicated hosts to allocate

          # A special optimization for Mac dedicated hosts: it sends them into the [scrubbing process]
          # to save money when the host can't be released because of Apple's license minimum of
          # [24 hours].
          #
          # Details:
          #
          # Apple forces AWS and all of its customers to keep Mac dedicated hosts allocated for at
          # least [24 hours]. So after allocation you have no way to release the dedicated host, even
          # if you don't need it. This makes Mac hosts very pricey for any kind of dynamic allocation.
          # To work around this issue, Aquarium implements an optimization that keeps the Mac hosts
          # busy with the [scrubbing process], which is triggered after an instance stop or termination
          # and puts the Mac host in a pending state for 1-2hr. That's the downside of the optimization:
          # you will not be able to use the machine until it becomes available again.
          #
          # That's why the ScrubbingDelay config exists: the Mac host needs some time to give the
          # workload a chance to use it. If the host is not used within this duration, the manager
          # starts the scrubbing process. When the host becomes old enough, the manager releases it
          # to free up space for a fresh Mac in the roster.
          #
          # * When this option is unset or 0, the optimization is disabled.
          # * When it's set, it's the duration to stay idle before allocating and terminating an
          #   empty instance to trigger scrubbing.
          #
          # The current implementation is tied to the state update, which can be API-intensive, so
          # this duration should be >= 1 min, otherwise the API requests will be too frequent.
          #
          # [24 hours]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-mac-instances.html#mac-instance-considerations
          # [scrubbing process]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mac-instance-stop.html
          scrubbing_delay: duration  # Format example: 10m30s
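
To check a `scrubbing_delay` value before putting it into the config, the rule from the comment above (unset or 0 disables the optimization, otherwise the value should be at least 1 minute) can be sketched as follows. The helper names `parse_duration` and `validate_scrubbing_delay` are hypothetical, and the parser only handles the `h`, `m`, `s` and `ms` units of the "10m30s" style format; it is an illustration, not the parser Fish itself uses.

```python
import re
from datetime import timedelta

# Seconds per supported unit; "ms" is listed before "m" in the regex so it wins.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_duration(text):
    """Parse a duration like "10m30s" into a timedelta (hypothetical helper)."""
    text = text.strip()
    if text in ("", "0"):
        return timedelta(0)
    sign = 1
    if text[0] in "+-":
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    pos = 0
    total = 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # unparsed characters between tokens
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # nothing parsed or trailing characters
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=sign * total)


def validate_scrubbing_delay(text):
    """Return None when the optimization is disabled, else the checked delay."""
    delay = parse_duration(text)
    if delay == timedelta(0):
        return None
    if delay < timedelta(minutes=1):
        raise ValueError("scrubbing_delay should be >= 1 min")
    return delay


if __name__ == "__main__":
    print(validate_scrubbing_delay("10m30s"))
```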
          

Account configuration:

The EC2 user doesn't need many permissions, but it's always a good idea to check which requests the aws driver sources actually make, or to use AWS IAM Access Advisor to remove unused permissions:

ec2:AllocateHosts           # Used in dedicated pool to create new dedicated hosts
ec2:CreateSnapshots         # To create snapshots of the disks - for caching
ec2:CreateTags              # Tag the resources we own like instances, volumes & snapshots - very useful
ec2:DescribeHosts           # Get info about the available dedicated hosts to use them during Allocation, also used in dedicated pool for management
ec2:DescribeImages          # Get info about the available images and find their ID's
ec2:DescribeInstances       # List the running instances
ec2:DescribeInstanceTypes   # Used to figure out the architecture of the host to find the right image for triggering mac scrubbing process
ec2:DescribeSecurityGroups  # To locate the security group by name or ID
ec2:DescribeSnapshots       # To list the snapshots and their tags and find the latest ID
ec2:DescribeSubnets         # To find the subnet ID
ec2:DescribeVolumes         # Locate volumes to connect
ec2:DescribeVpcs            # To locate the vpc ID by tag
ec2:ReleaseHosts            # Used in dedicated pool to release dedicated hosts
ec2:RunInstances            # Run instance duh
ec2:StopInstances           # To make a safe snapshot after the instance shutdown
ec2:TerminateInstances      # Terminate instances duh
kms:ListAliases             # Find the kms key ID by alias
servicequotas:ListServiceQuotas # Determine the limits for the project to identify the capacity
servicequotas:ListAWSDefaultServiceQuotas # Determine the limits for the project to identify the capacity
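
The list above can be turned into an IAM policy document. The following is a minimal sketch that prints such a policy as JSON; the `Sid` value, the helper name `build_policy` and the use of `"Resource": "*"` are assumptions for illustration, and you may be able to narrow the resources for your account.

```python
import json

# Actions copied from the permission list above.
FISH_AWS_ACTIONS = [
    "ec2:AllocateHosts",
    "ec2:CreateSnapshots",
    "ec2:CreateTags",
    "ec2:DescribeHosts",
    "ec2:DescribeImages",
    "ec2:DescribeInstances",
    "ec2:DescribeInstanceTypes",
    "ec2:DescribeSecurityGroups",
    "ec2:DescribeSnapshots",
    "ec2:DescribeSubnets",
    "ec2:DescribeVolumes",
    "ec2:DescribeVpcs",
    "ec2:ReleaseHosts",
    "ec2:RunInstances",
    "ec2:StopInstances",
    "ec2:TerminateInstances",
    "kms:ListAliases",
    "servicequotas:ListServiceQuotas",
    "servicequotas:ListAWSDefaultServiceQuotas",
]


def build_policy(actions, resource="*"):
    """Build an IAM policy document allowing the given actions (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AquariumFishAwsDriver",  # assumed name, pick your own
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": resource,  # assumption: not narrowed per action
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(FISH_AWS_ACTIONS), indent=2))
```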

Label definition

Describes the available options of the driver label definition:

definition:
  driver: aws

  options:
    image:          string  # EC2 AMI ID/Name (not a tag) you want to use as image for the instance
    instance_type:  string  # EC2 instance type, [AWS Instance Types](https://aws.amazon.com/ec2/instance-types/)
    security_group: string  # EC2 VPC Security group ID/Name (not a tag) to attach to the instance
    tags:           map     # EC2 Tags to add during instance creation
    encrypt_key:    string  # KMS Key ID or Alias in format "alias/<name>" for newly created disks

    userdata_format: string  # Empty if not needed or "json", "env", "ps1" to store the metadata in instance userdata field
    userdata_prefix: string  # Could be used with "env" or "ps1" format to add some prefix to each flattened key of the metadata

    pool: string  # Which dedicated pool (from configuration) to use to run this instance - otherwise will not use any specific pool

  resources:
    cpu:     uint    # Amount of CPUs (threads), not used: defined by `instance_type`
    ram:     uint    # Amount of memory (in GB), not used: defined by `instance_type`
    network: string  # Empty, VPC ID, Subnet ID or Tag:Value of vpc/subnet, if empty - will use default VPC, if VPC - will use the underused subnet of it

    disks:   map     # Disks to create/use in the VM
      <path>:        # Path of the disk device, [AWS User Guide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html)
        type:  string  # Disk type and additional data in format "<type>[:iops[:throughput]]", "gp3" by default
        label: string  # Additional tags in format "<tag_key>:<tag_value>,..." - empty by default
        size:  uint    # Size of the new disk (in GB), raw disk will be created
        clone: string  # Disk Snapshot ID/Tag:Value to use as a source disk

    lifetime: duration  # Lifetime of the Resource in "1h2m3s" format. If "" or "0" - then default will be used, if negative - no timeout.

NOTICE: For security reasons, names or tags can be used (where supported) only if the owner of the resource is the same as the project.
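
The disk `type` and `label` fields in the label definition pack several values into one string. The sketch below shows one way to read the formats `<type>[:iops[:throughput]]` and `<tag_key>:<tag_value>,...`; the names `DiskType`, `parse_disk_type` and `parse_disk_label` are hypothetical, and how the driver itself treats edge cases such as an empty middle field is an assumption here.

```python
from typing import Dict, NamedTuple, Optional


class DiskType(NamedTuple):
    name: str
    iops: Optional[int]
    throughput: Optional[int]


def parse_disk_type(value: str) -> DiskType:
    """Split "<type>[:iops[:throughput]]"; an empty type falls back to "gp3"."""
    parts = value.split(":") if value else []
    if len(parts) > 3:
        raise ValueError(f"too many fields in disk type: {value!r}")
    name = parts[0] if parts and parts[0] else "gp3"
    # Assumption: an omitted or empty field means "use the AWS default".
    iops = int(parts[1]) if len(parts) > 1 and parts[1] else None
    throughput = int(parts[2]) if len(parts) > 2 and parts[2] else None
    return DiskType(name, iops, throughput)


def parse_disk_label(value: str) -> Dict[str, str]:
    """Split "<tag_key>:<tag_value>,..." into a dict; empty input gives no tags."""
    tags: Dict[str, str] = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, sep, tag_value = pair.partition(":")
        if not sep or not key:
            raise ValueError(f"expected <tag_key>:<tag_value>, got {pair!r}")
        tags[key] = tag_value
    return tags


if __name__ == "__main__":
    print(parse_disk_type("gp3:3000:125"))
    print(parse_disk_label("team:ci,purpose:cache"))
```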

Available ApplicationTask's

The AWS driver supports the following tasks, which can be executed while the instance is running:

Task: snapshot

Takes a snapshot of the instance disks.

  • options:
    • full:bool - with full=true will also create a snapshot of the root (image) disk
  • when:
    • ALLOCATED - executes at any time during the ALLOCATED status. Make sure you have synced the disks you want to snapshot, otherwise the snapshot risks containing incomplete data.
    • DEALLOCATE, RECALLED - executes after ALLOCATED changes to one of those statuses, but before the actual termination procedures. It soft-stops the instance, so you can be sure the data on the disks is consistent.
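
The effect of the `full` option and of the `when` status can be summarized in a short sketch. It only models the behavior described in the list above and is not the driver's code: the `Disk` type, the `plan_snapshot` helper and the device names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Disk:
    device: str
    is_root: bool = False  # True for the root (image) disk


def plan_snapshot(when: str, full: bool, disks: List[Disk]) -> Tuple[bool, List[str]]:
    """Return (stop_instance_first, devices_to_snapshot) for the snapshot task."""
    if when not in ("ALLOCATED", "DEALLOCATE", "RECALLED"):
        raise ValueError(f"unsupported status: {when!r}")
    # Only the post-ALLOCATED statuses soft-stop the instance before the snapshot.
    stop_first = when in ("DEALLOCATE", "RECALLED")
    # The root disk is included only when full=true.
    devices = [d.device for d in disks if full or not d.is_root]
    return stop_first, devices


if __name__ == "__main__":
    disks = [Disk("/dev/sda1", is_root=True), Disk("/dev/sdf")]
    print(plan_snapshot("ALLOCATED", full=False, disks=disks))
    print(plan_snapshot("DEALLOCATE", full=True, disks=disks))
```

With `when` set to ALLOCATED nothing stops the instance, so syncing the disks beforehand stays the caller's responsibility.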

Examples:
