Aquarium: Internal Processes - adobe/aquarium-fish GitHub Wiki

The Aquarium cluster has a hidden gem: the internal processes that run in the background, hidden from view.

Overview of the Application process

The main process on Aquarium-Fish is the Application process, which runs when a user requests a Resource (a new environment) to be allocated. It starts with the Election, then Allocation happens, and finally, on user request (or timeout), Deallocation happens.

Every Application (whether it comes from the API or a Gate driver) first goes through the Election Process, which determines which node will manage the incoming resource request. The Application is stored in the internal database so it is quickly shared across the cluster, and then the Election Process starts. Each node that has available resources to execute the Application votes as available; otherwise it votes as not available. Each node then checks whether all active nodes of the cluster have voted and determines the winner; the winner sets the ELECTED state and executes the Application. The last-resort tiebreaker in the vote is always a random number, so more than one winner at a time is rare. If it does happen, or if no node voted as available, the Election Process starts again on the next round in 30 seconds. With just one node in the cluster, the election passes with virtually no delay before allocation.
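A single Election round, as described above, can be sketched in Go. The page only says that a random number is the last resort in the vote, so this sketch skips any earlier criteria and assumes the available node with the highest random number wins. The `Vote` type, its fields, and `electWinner` are hypothetical illustrations, not the Aquarium Fish API.

```go
package main

import "fmt"

// Vote is a hypothetical, simplified vote record; the real Aquarium Fish
// vote structure is not described on this page.
type Vote struct {
	NodeUID   string
	Available bool   // true if the node has resources for the Application
	Rand      uint32 // assumed last-resort tiebreaker
}

// electWinner decides one Election round. It returns ok=false when not every
// active node has voted yet, when no node voted as available, or when two or
// more available nodes share the highest random number (a new round is needed).
func electWinner(activeNodes []string, votes []Vote) (winner string, ok bool) {
	byNode := make(map[string]Vote, len(votes))
	for _, v := range votes {
		byNode[v.NodeUID] = v
	}
	for _, n := range activeNodes {
		if _, voted := byNode[n]; !voted {
			return "", false // still waiting for this node's vote
		}
	}
	var best uint32
	ties := 0
	for _, n := range activeNodes {
		v := byNode[n]
		if !v.Available {
			continue
		}
		switch {
		case winner == "" || v.Rand > best:
			winner, best, ties = n, v.Rand, 1 // new leader
		case v.Rand == best:
			ties++ // another node drew the same number
		}
	}
	if winner == "" || ties > 1 {
		return "", false
	}
	return winner, true
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c"}
	votes := []Vote{
		{NodeUID: "node-a", Available: true, Rand: 7},
		{NodeUID: "node-b", Available: false},
		{NodeUID: "node-c", Available: true, Rand: 42},
	}
	fmt.Println(electWinner(nodes, votes)) // node-c true
}
```

With a single active node, the only vote is immediately complete, so an available node wins in the first round, which matches the "virtually no delay" case.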

The Allocation process mostly happens in the background and prepares everything needed for allocation: for example, it checks resource availability again and picks the Provider Driver to execute. Fish then calls the driver to allocate the Resource. If something goes wrong during this process, the Application returns to the NEW state and the process starts over, giving another Fish node a chance to try its luck. After a few unsuccessful retries the Application goes to the ERROR state; if allocation succeeds, it goes to the ALLOCATED state.
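The status flow above (NEW, ELECTED, then ALLOCATED or back to NEW, and eventually ERROR) can be illustrated with a small Go sketch. The `ProviderDriver` interface, `flakyDriver`, and the retry limit of 3 are assumptions for illustration; the page only says "a few" retries, and in a real cluster each retry goes through a fresh Election that another node may win.

```go
package main

import (
	"errors"
	"fmt"
)

// Status values named in the wiki text.
type Status string

const (
	StatusNew       Status = "NEW"
	StatusElected   Status = "ELECTED"
	StatusAllocated Status = "ALLOCATED"
	StatusError     Status = "ERROR"
)

// ProviderDriver is a hypothetical, minimal driver interface.
type ProviderDriver interface {
	Allocate(label string) (resourceID string, err error)
}

// maxAttempts is an assumed value; the wiki only says "a few" retries.
const maxAttempts = 3

// allocate records the status transitions of one Application. Every attempt
// is simulated locally here instead of via a new Election round.
func allocate(d ProviderDriver, label string) []Status {
	history := []Status{StatusNew}
	for attempt := 1; ; attempt++ {
		history = append(history, StatusElected)
		if _, err := d.Allocate(label); err == nil {
			return append(history, StatusAllocated)
		}
		if attempt == maxAttempts {
			return append(history, StatusError) // out of retries
		}
		history = append(history, StatusNew) // back to NEW for another try
	}
}

// flakyDriver is a hypothetical driver that fails a given number of times
// before succeeding.
type flakyDriver struct{ failuresLeft int }

func (f *flakyDriver) Allocate(label string) (string, error) {
	if f.failuresLeft > 0 {
		f.failuresLeft--
		return "", errors.New("provider temporarily unavailable")
	}
	return "res-" + label, nil
}

func main() {
	fmt.Println(allocate(&flakyDriver{failuresLeft: 1}, "xcode12.2"))
	// [NEW ELECTED NEW ELECTED ALLOCATED]
	fmt.Println(allocate(&flakyDriver{failuresLeft: 5}, "xcode12.2"))
	// [NEW ELECTED NEW ELECTED NEW ELECTED ERROR]
}
```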

After that, the user or a timeout can trigger deallocation at any time, and Fish then commands the Provider Driver to deallocate the Resource.

Allocation/deallocation process

Let's look at how a Jenkins server requests a Resource from the cluster, runs a workload on it, and destroys it afterwards:

img/Aquarium-jenkins_build_process.svg

  • xcode12.2 - the Label created for this example; it uses the macos1015-xcode122-ci image built from the Aquarium Bait packer specification packer/macos1015/xcode122/ci.yml.
  • Application - a request to the cluster to allocate a Resource matching the specification stored in the required Label. It can also carry additional metadata for the Resource to use.
  • Driver - specified in the Label; it can obtain the required Resource in any way, for example as a VM, a Docker container, a cloud VM, or even the hardware machine itself (to run UI tests, for example).
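The three terms above fit together as Application → Label → Driver. The following Go sketch shows that lookup chain under stated assumptions: the `Label`, `Application`, and `Driver` types, the driver name `"vm"`, and the metadata key `"agent_name"` are hypothetical, and the real Aquarium Fish Label format carries more fields than this.

```go
package main

import "fmt"

// Label is a hypothetical, simplified Label: it names a Driver and the
// options (e.g. an image) that Driver needs.
type Label struct {
	Name    string
	Driver  string
	Options map[string]string
}

// Application is a request to allocate a Resource for a Label, plus
// free-form metadata that the Resource may use.
type Application struct {
	LabelName string
	Metadata  map[string]string
}

// Driver obtains a Resource in some way: VM, container, cloud VM, hardware...
type Driver interface {
	Allocate(opts, metadata map[string]string) (string, error)
}

// fakeVMDriver pretends to start a VM from the image named in the options.
type fakeVMDriver struct{}

func (fakeVMDriver) Allocate(opts, metadata map[string]string) (string, error) {
	return "vm from image " + opts["image"], nil
}

// allocate resolves the Application's Label, then dispatches to the Driver
// that the Label names.
func allocate(app Application, labels map[string]Label, drivers map[string]Driver) (string, error) {
	label, ok := labels[app.LabelName]
	if !ok {
		return "", fmt.Errorf("unknown label %q", app.LabelName)
	}
	drv, ok := drivers[label.Driver]
	if !ok {
		return "", fmt.Errorf("label %q: unknown driver %q", label.Name, label.Driver)
	}
	return drv.Allocate(label.Options, app.Metadata)
}

func main() {
	labels := map[string]Label{
		"xcode12.2": {Name: "xcode12.2", Driver: "vm", Options: map[string]string{"image": "macos1015-xcode122-ci"}},
	}
	drivers := map[string]Driver{"vm": fakeVMDriver{}}
	app := Application{LabelName: "xcode12.2", Metadata: map[string]string{"agent_name": "jenkins-agent-1"}}
	fmt.Println(allocate(app, labels, drivers)) // vm from image macos1015-xcode122-ci <nil>
}
```

Keeping the Driver choice inside the Label means the same Application request works unchanged whether the Resource ends up being a VM, a container, or a physical machine.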

Worker election process

The election process is part of the distributed Aquarium Fish system and provides a resilient method for choosing the workload executor:

img/Aquarium_Fish-distributed_election_process.svg

Cluster sync process

Overall, the cluster was designed to be as simple as possible, and thanks to its database architecture it is much simpler than comparable systems. The main complexity is hidden in the p2p mechanisms, but the overall design can be understood in a single read.

img/Aquarium_Fish-how_cluster_sync_works.svg