Driver: AWS - adobe/aquarium-fish GitHub Wiki
AWS is the first cloud driver you can use with Aquarium Fish: in addition to your local resources, you can now borrow resources from outside. Integrating those instances into your infrastructure so that they work like the local envs takes some additional preparation, but sometimes it is worth it.
Like any other remote driver, this one doesn't consume the resources of your Fish node machine; it uses cloud resources to spin up the environments.
To spin up Mac instances you need a dedicated host, and it's pricey, because Apple decided that on AWS you cannot release the host during the first 24h after allocation...
Luckily, we found a workaround that utilizes another "feature" of Mac machines on AWS: the mandatory scrubbing process, which runs every time you stop or terminate an instance. It puts the dedicated host in a pending state for ~1h30m, and the nice thing is that you don't have to pay for that time.
So when a dedicated host is younger than 24h and has done nothing for a configurable delay (scrubbing_delay), Fish just runs an empty instance on it to trigger scrubbing. It's not an optimal solution (it consumes 5-9 times more machines), but it lets you pay nothing for the time a dedicated host would otherwise just sit idle.
The rest of the dedicated hosts run as usual and are released when they are no longer in use.
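The host handling described above can be sketched as a small decision function. This is only an illustration of the idea, not the driver's actual code: the names `HostAction` and `decide_host_action` and the exact order of the checks are assumptions made for this example.

```python
from datetime import datetime, timedelta
from enum import Enum

# Minimum time a Mac dedicated host has to stay allocated (see above).
MIN_ALLOCATION = timedelta(hours=24)


class HostAction(Enum):
    KEEP = "keep"        # host is busy or has not been idle long enough
    SCRUB = "scrub"      # run and terminate an empty instance to trigger scrubbing
    RELEASE = "release"  # host is old enough and unused, so it can be released


def decide_host_action(allocated_at, idle_since, now, scrubbing_delay):
    """Pick an action for one dedicated Mac host (hypothetical helper).

    allocated_at:    datetime when the host was allocated
    idle_since:      datetime when the host became idle, None while it runs a workload
    scrubbing_delay: timedelta to stay idle before scrubbing, zero disables it
    """
    if idle_since is None:
        return HostAction.KEEP
    if now - allocated_at >= MIN_ALLOCATION:
        return HostAction.RELEASE
    if scrubbing_delay > timedelta(0) and now - idle_since >= scrubbing_delay:
        return HostAction.SCRUB
    return HostAction.KEEP
```

A caller would evaluate this for every host in a pool on each state update and act on the returned value.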
To use the driver you need to:
- Create the image - you can use the regular AMIs provided by AWS; Aquarium Bait support is tracked in adobe/aquarium-bait#3
- Put the AWS driver configuration & credentials into the Fish config
- Run the Aquarium Fish node, create a Label and send an Application to receive the resource you want
The driver options are described in the drivers section of the aquarium-fish config file:
drivers:
  - name: aws
    cfg:
      region: string # The AWS EC2 region to use
      key_id: string # IAM role credential key id
      secret_key: string # IAM role credential secret key
      account_ids: []string # Trusted account IDs to filter vpc, subnet, sg, images, snapshots... Default is the same as creds account
      instance_tags: map # Instance tags to use when this node provisions them (to identify the node for example)
      dedicated_pool: map # Managed dedicated pools are used to allocate dedicated hosts on demand and manage their life to save some money
        <name>: map
          type: string # Type of the dedicated hosts pool (example: "mac2.metal")
          zone: string # Where to allocate the dedicated host (example: "us-west-2c")
          max: uint # Maximum dedicated hosts to allocate
          # A special optimization for the Mac dedicated hosts that sends them into the [scrubbing process] to
          # save money when we can't release the host due to the [24 hours] min limit of Apple's license.
          #
          # Details:
          #
          # Apple forces AWS and all of its customers to keep the Mac dedicated hosts allocated for at
          # least [24 hours]. So after allocation you have no way to release the dedicated host even if
          # you don't need it. This makes the Mac hosts very pricey for any kind of dynamic allocation.
          # To work around this issue, Aquarium implements an optimization that keeps the Mac hosts
          # busy with the [scrubbing process], which is triggered after an instance stop or termination and
          # puts the Mac host in a pending state for 1-2hr. That's the downside of the optimization, because
          # you will not be able to use the machine until it becomes available again.
          #
          # That's why this ScrubbingDelay config exists - we need to give the workload a chance to
          # utilize the Mac host. If the host is not utilized within this duration, the manager will
          # start the scrubbing process. When the host becomes old enough, the manager will release it
          # to free up space for a fresh Mac in the roster.
          #
          # * When this option is unset or 0 - no optimization is enabled.
          # * When it's set - it's the duration to stay idle before allocating and terminating an empty
          #   instance to trigger scrubbing.
          #
          # The current implementation is tied to the state update, which can consume API requests, so
          # this duration should be >= 1 min, otherwise API requests will be too frequent.
          #
          # [24 hours]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-mac-instances.html#mac-instance-considerations
          # [scrubbing process]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mac-instance-stop.html
          scrubbing_delay: duration # Format example: 10m30s
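To make the duration format and the ">= 1 min" advice concrete, here is a minimal sketch of parsing values such as "10m30s". It is an illustration only: `parse_duration` and `check_scrubbing_delay` are hypothetical helpers, and the sketch assumes only non-negative values built from the `h`, `m` and `s` units shown in this page's examples.

```python
import re
from datetime import timedelta

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_PART = re.compile(r"(\d+)([hms])")


def parse_duration(text):
    """Parse strings such as "10m30s" or "1h2m3s" into a timedelta."""
    if text in ("", "0"):
        return timedelta(0)
    pos = 0
    seconds = 0
    for match in _PART.finditer(text):
        # Every part has to start right where the previous one ended.
        if match.start() != pos:
            raise ValueError(f"invalid duration: {text!r}")
        seconds += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return timedelta(seconds=seconds)


def check_scrubbing_delay(text):
    """Return the delay; reject values that are set but shorter than 1 minute."""
    delay = parse_duration(text)
    if timedelta(0) < delay < timedelta(minutes=1):
        raise ValueError("scrubbing_delay should be at least 1m")
    return delay
```

An unset or zero value passes the check because, as described above, it simply disables the optimization.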
The EC2 user does not need many permissions, but it's always a good idea to check the requests used in the aws driver sources or to use AWS IAM Access Advisor to remove unused permissions:
ec2:AllocateHosts # Used in dedicated pool to create new dedicated hosts
ec2:CopyImage # Used by TaskImage to re-encrypt the temporary image
ec2:CreateImage # Used by TaskImage to create image of the instance
ec2:CreateSnapshots # To create snapshots of the disks - for caching
ec2:CreateTags # Tag the resources we own like instances, volumes & snapshots - very useful
ec2:DeleteSnapshot # Used by TaskImage to complete the image deletion by removing its snapshots
ec2:DeregisterImage # Used by TaskImage to cleanup the tmp image after re-encrypting
ec2:DescribeHosts # Get info about the available dedicated hosts to use them during Allocation, also used in dedicated pool for management
ec2:DescribeImages # Get info about the available images and find their ID's
ec2:DescribeInstanceAttribute # Used by TaskImage to detect instance disks
ec2:DescribeInstanceTypes # Used to figure out the architecture of the host to find the right image for triggering mac scrubbing process
ec2:DescribeInstances # List the running instances
ec2:DescribeSecurityGroups # To locate the security group by name or ID
ec2:DescribeSnapshots # To list the snapshots and their tags and find the latest ID
ec2:DescribeSubnets # To find the subnet ID
ec2:DescribeVolumes # Locate volumes to connect
ec2:DescribeVpcs # To locate the vpc ID by tag
ec2:ReleaseHosts # Used in dedicated pool to release dedicated hosts
ec2:RunInstances # Run instance duh
ec2:StopInstances # To make a safe snapshot after the instance shutdown
ec2:TerminateInstances # Terminate instances duh
kms:ListAliases # Find the kms key ID by alias
servicequotas:ListServiceQuotas # Determine the limits for the project to identify the capacity
servicequotas:ListAWSDefaultServiceQuotas # Determine the limits for the project to identify the capacity
Also, triggering the Mac dedicated host scrubbing process requires the default VPC; please check the details in #71.
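The permissions listed above can be collected into an IAM policy document. The sketch below generates one as JSON; `build_policy` is a hypothetical helper, and granting the actions on `"Resource": "*"` is an assumption made for brevity that you may want to narrow for your account.

```python
import json

# Actions taken from the permission list above.
ACTIONS = [
    "ec2:AllocateHosts", "ec2:CopyImage", "ec2:CreateImage",
    "ec2:CreateSnapshots", "ec2:CreateTags", "ec2:DeleteSnapshot",
    "ec2:DeregisterImage", "ec2:DescribeHosts", "ec2:DescribeImages",
    "ec2:DescribeInstanceAttribute", "ec2:DescribeInstanceTypes",
    "ec2:DescribeInstances", "ec2:DescribeSecurityGroups",
    "ec2:DescribeSnapshots", "ec2:DescribeSubnets", "ec2:DescribeVolumes",
    "ec2:DescribeVpcs", "ec2:ReleaseHosts", "ec2:RunInstances",
    "ec2:StopInstances", "ec2:TerminateInstances", "kms:ListAliases",
    "servicequotas:ListServiceQuotas",
    "servicequotas:ListAWSDefaultServiceQuotas",
]


def build_policy(actions):
    """Build an IAM policy document that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # assumption: not restricted to specific resources
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(ACTIONS), indent=2))
```

The printed JSON can be pasted into the IAM policy editor and then trimmed with the help of Access Advisor.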
The available options of the driver Label definition are described below:
definition:
  driver: aws
  options:
    image: string # EC2 AMI ID/Name/Tag:Value of the image you want to use (Tag:Value is usually a bad idea for reproducibility)
    instance_type: string # EC2 instance type, [AWS Instance Types](https://aws.amazon.com/ec2/instance-types/)
    security_group: string # EC2 VPC Security group ID/Name (not a tag) to attach to the instance
    tags: map # EC2 Tags to add during instance creation
    encrypt_key: string # KMS Key ID or Alias in format "alias/<name>" for newly created disks
    pool: string # Which dedicated pool (from configuration) to use to run the instance - otherwise will not use any specific pool
    userdata_format: string # Empty if not needed or "json", "env", "ps1" to store the metadata in instance userdata field
    userdata_prefix: string # Could be used with "env" or "ps1" format to add some prefix to each flattened key of the metadata
    # TaskImage options
    task_image_name: string # Name for the new image, a "-DATE.TIME" suffix is added to it
    task_image_encrypt_key: string # KMS Key ID or Alias in format "alias/<name>" if need to re-encrypt the newly created AMI snapshots
  resources:
    cpu: uint # Amount of CPUs (threads), not used and defined in `instance_type`
    ram: uint # Amount of memory (in GB), not used and defined in `instance_type`
    network: string # Empty, VPC ID, Subnet ID or Tag:Value of vpc/subnet, if empty - will use default VPC, if VPC - will use the underused subnet of it
    disks: map # Disks to create/use in the VM
      <path>: # Path of the disk device, [AWS User Guide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/device_naming.html)
        type: string # Disk type and additional data in format "<type>[:iops[:throughput]]", "gp3" by default
        label: string # Additional tags in format "<tag_key>:<tag_value>,..." - empty by default
        size: uint # Size of the new disk (in GB), raw disk will be created
        clone: string # Disk Snapshot ID/Tag:Value to use as a source disk
    lifetime: duration # Lifetime of the Resource in "1h2m3s" format. If "" or "0" - then default will be used, if negative - no timeout.
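The disk `type` and `label` fields above pack several values into one string. The following sketch shows one way such strings could be split; it is not the driver's parser, and `parse_disk_type` and `parse_disk_label` are hypothetical names used only for this example.

```python
def parse_disk_type(value):
    """Split "<type>[:iops[:throughput]]" into (type, iops, throughput).

    An empty type falls back to "gp3"; missing numbers are returned as None.
    """
    parts = value.split(":") if value else []
    if len(parts) > 3:
        raise ValueError(f"too many fields in disk type: {value!r}")
    disk_type = parts[0] if parts and parts[0] else "gp3"
    iops = int(parts[1]) if len(parts) > 1 else None
    throughput = int(parts[2]) if len(parts) > 2 else None
    return disk_type, iops, throughput


def parse_disk_label(value):
    """Turn "<tag_key>:<tag_value>,..." into a dict; an empty string gives {}."""
    tags = {}
    for item in value.split(","):
        if not item:
            continue
        key, sep, tag_value = item.partition(":")
        if not sep or not key:
            raise ValueError(f"invalid tag item: {item!r}")
        tags[key] = tag_value
    return tags
```

For instance, a disk entry with `type: "gp3:3000:125"` and `label: "team:qa,purpose:cache"` would yield a gp3 volume description with two numeric settings and two tags.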
NOTICE: For security reasons, wherever names or tags are accepted, you can use them only if the owner is the same as the project.
The AWS driver supports the following tasks, which can be executed during the instance runtime:
Takes a snapshot of the instance disks.
- options:
  - full:bool - with full=true will also create a snapshot of the root (image) disk
- when:
  - ALLOCATED - executes any time during the ALLOCATED status. Make sure you have synced the disks you want to snapshot, otherwise you risk getting incomplete data in the snapshot.
  - DEALLOCATE - executes after ALLOCATED has changed to this status, but before the actual termination procedures. It will soft-stop the instance, so you can be sure the data on the disks will be consistent.
Creates a new AMI from the instance.
- options:
  - full:bool - with full=true will include the attached disks
- when:
  - ALLOCATED - executes any time during the ALLOCATED status. Make sure you have synced the disks you want to create an image from, otherwise you risk getting incomplete data in the AMI.
  - DEALLOCATE - executes after ALLOCATED has changed to this status, but before the actual termination procedures. It will soft-stop the instance, so you can be sure the data on the disks will be consistent.
- Create Label:
- Run application: