Aquarium - adobe/aquarium-fish GitHub Wiki
Aquarium originated as an experimental project aimed at managing organizational resources across a fully distributed and heterogeneous system. Designed primarily to function as an internal cloud, Aquarium operates as a user-space application, dynamically provisioning resources without being bound to specific hardware.
Components
- Aquarium Bait - image builder
- Aquarium Fish - resource management daemon
- Aquarium Net Jenkins - Jenkins plugin
- Packer Plugin Aquarium - Packer plugin
How It Works
Aquarium centralizes and unifies resource management, alleviating the need for pipeline schedulers and other resource-intensive systems to handle these tasks independently.
![How Aquarium manages resources](img/Aquarium-how_aquarium_manages_resource.svg)
In simple terms, you integrate a cluster of Aquarium Fish nodes through a plugin, a wrapper built on the Aquarium API, or the Gate Drivers. The cluster handles resource allocation requests from a scheduler and deallocates the resources either on demand or automatically on timeout.
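For illustration, here is a minimal sketch of what such an integration could look like from a client's side. The endpoint path, JSON fields, and label name are assumptions made for this example; the real API is Protobuf-based and is normally consumed through the plugins or Gate Drivers.

```go
package main

// Hypothetical client-side flow for requesting a resource from an Aquarium Fish
// cluster. The endpoint path and JSON fields are assumptions for this sketch.

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// AllocationRequest names the environment (Label) a scheduler wants to run on.
type AllocationRequest struct {
	Label   string `json:"label"`   // versioned environment definition
	Timeout int    `json:"timeout"` // seconds before automatic deallocation
}

func main() {
	// Ask the cluster for a resource matching the hypothetical "macos-build:v3" label.
	body, _ := json.Marshal(AllocationRequest{Label: "macos-build:v3", Timeout: 3600})
	resp, err := http.Post("https://fish-node.example.com/api/v1/application",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("allocation request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("cluster answered with status:", resp.Status)

	// The cluster decides which node runs the workload, provisions the resource,
	// and destroys it on explicit release or when the timeout expires.
}
```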
A Bit of History
Initially, the primary need was for a reliable and stable CI/Release infrastructure. Many organizations struggle with efficient resource management, particularly when dealing with heterogeneous build/test environments such as MacOS and Windows, which demand high performance and minimal overhead. Based on extensive discussions, the following core requirements were identified:
- Secure by default
The architecture needs to be designed with security from the ground up (this item comes first to underline how important it is).
- Preserve build environments for over 10 years
We must ensure that legacy releases remain buildable, especially for long-term contracts (e.g., government obligations).
- Provide protected, immutable build/test environments
Resources are created from images and destroyed after use to prevent cross-contamination between workloads.
- Enable rule-based resource sharing
A separate resource cluster shares resources based on predefined limitations and policies.
- Simplify Jenkins maintenance
This model allows for seamless addition/removal of Jenkins instances, supporting concurrent operations and easy migration.
- Support local development and testing
Full automation and resource management allow for local system emulation and validation across use cases.
- Ensure a fully automated, human-free build process
For security and compliance, manual intervention in build environments is prohibited, supporting secure CI and Release automation.
- Facilitate debugging via environment snapshots
Pipeline failures can be debugged locally or remotely using saved environment snapshots.
- Simplify disaster recovery (business continuity)
Through configuration automation, multiple clusters and p2p routing provide resilience with minimal resource waste.
The first milestone was decoupling Jenkins agents from the controller, supported by stable build images - thus the Aquarium proof of concept (PoC) was born. Experience with cloud services and Kubernetes demonstrated promising methods to meet these challenges, prompting further experimentation.
Aquarium Bait
Aquarium Bait was the first component developed, aimed at reliably baking build and test images for the CI/Release environments. A pure configuration management approach proved overly complex, largely because such systems try to maintain an end state without knowing the prior state of the machine - a strategy that is flawed by nature. Layered image builds address this by executing each configuration script a single time on a known previous image state.
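As a rough sketch of the layering idea (the layer names and scripts below are invented for the example), each image is baked by running its provisioning script exactly once on top of an already-built parent image:

```go
package main

import "fmt"

// Layer describes one image layer: a known parent state plus the script that is
// executed exactly once on top of it. Names here are illustrative only.
type Layer struct {
	Name   string
	Parent string // empty for the base OS image
	Script string // provisioning script applied on top of Parent
}

// buildChain prints the order in which layers would be baked, each one starting
// from the already-built parent image rather than from an unknown live system.
func buildChain(layers []Layer) {
	for _, l := range layers {
		if l.Parent == "" {
			fmt.Printf("bake %s from installer media using %s\n", l.Name, l.Script)
			continue
		}
		fmt.Printf("bake %s on top of image %s using %s\n", l.Name, l.Parent, l.Script)
	}
}

func main() {
	buildChain([]Layer{
		{Name: "macos-base", Script: "base.sh"},
		{Name: "macos-ci", Parent: "macos-base", Script: "ci-tools.sh"},
		{Name: "macos-xcode", Parent: "macos-ci", Script: "xcode.sh"},
	})
}
```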
VMWare VMX (Fusion/Workstation) was chosen for the PoC for several reasons:
- Userspace operation - easy local testing for developers
- Cross-platform - supports Mac, Windows, and Linux (focus on Mac/Linux hosts to minimize automation effort)
- Organizational site license
- Performance - Fusion on MacOS incurred only a 2.1-4.3% performance penalty in real pipelines
- Compared to ESXi:
- No need for hardware-level support; leverages OS-level virtualization APIs
- Simpler licensing model, avoiding CPU-based constraints
- No OS-switch required; VMX supports both VM and native MacOS execution
- Higher portability between dev and prod environments
- ESXi’s features are overkill given MacOS VM license restrictions
- Easier migration to open-source VM/container solutions
- Performance is comparable across VMWare products (~5% variance)
Development focused initially on MacOS (and later Windows) due to automation challenges. VMX remains the most viable option for sandboxing, cross-platform support, and full automation - including shell and UI access via VNC. Images built in VMs can later be deployed to physical hardware.
The first cloud driver implemented was AWS, thanks to the platform's abstraction-friendly architecture. While layered images couldn't be linked together in the cloud, AWS storage still allowed them to be used efficiently.
Aquarium Fish
Aquarium Fish was the next logical step, relying on pre-built images. A distributed system required internal data synchronization, and DQLite was initially selected to avoid implementing cluster consensus mechanisms. However, due to limitations in cross-platform support and its C-based nature, DQLite was replaced with Go-based SQLite with custom synchronization, and later, with a Bitcask key-value store to handle high parallelism.
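To illustrate why the Bitcask model suits highly parallel access, here is a toy in-memory sketch of its core idea: an append-only value log paired with an in-memory key directory. This is a conceptual illustration only, not the store Aquarium Fish actually ships.

```go
package main

// Toy illustration of the Bitcask idea: values are appended to a log and an
// in-memory "keydir" maps each key to the location of its latest value.

import (
	"fmt"
	"sync"
)

type entry struct {
	offset int
	length int
}

type Store struct {
	mu     sync.RWMutex
	log    []byte           // append-only data log (a file on disk in a real Bitcask)
	keydir map[string]entry // latest location of every key
}

func NewStore() *Store {
	return &Store{keydir: make(map[string]entry)}
}

// Put appends the value to the log and points the keydir at the new record,
// so writers never rewrite old data in place.
func (s *Store) Put(key string, value []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.keydir[key] = entry{offset: len(s.log), length: len(value)}
	s.log = append(s.log, value...)
}

// Get looks the key up in memory and reads the value straight from the log,
// allowing many readers to proceed concurrently.
func (s *Store) Get(key string) ([]byte, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	e, ok := s.keydir[key]
	if !ok {
		return nil, false
	}
	return s.log[e.offset : e.offset+e.length], true
}

func main() {
	s := NewStore()
	s.Put("label:macos-build", []byte(`{"version":3}`))
	if v, ok := s.Get("label:macos-build"); ok {
		fmt.Println(string(v))
	}
}
```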
Golang was selected for its dqlite bindings, lack of GIL, extensive package ecosystem, and ability to produce single binaries.
The database object model was tailored to distributed cluster needs. For instance, Jenkins "Labels" became versioned environment definitions, preserving specific image versions for long-term reproducibility. Workload execution decisions were made using cluster-wide voting.
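A hedged sketch of what such a versioned definition could look like; the type and field names below are invented for illustration and do not reflect the real Fish object model:

```go
package main

import "fmt"

// Label is a versioned environment definition: instead of a mutable Jenkins
// label, each version pins the exact image, so old releases stay reproducible.
type Label struct {
	Name    string // e.g. "macos-build"
	Version int    // bumped whenever the definition changes
	Image   string // exact image artifact this version is pinned to
	CPU     int    // illustrative resource requirements for the environment
	RAMGB   int
}

func main() {
	// A pipeline that recorded "macos-build" v2 keeps receiving the same image
	// years later, while newer pipelines pick up v3.
	history := []Label{
		{Name: "macos-build", Version: 2, Image: "macos12-xcode14", CPU: 4, RAMGB: 8},
		{Name: "macos-build", Version: 3, Image: "macos13-xcode15", CPU: 6, RAMGB: 12},
	}
	for _, l := range history {
		fmt.Printf("%s v%d -> %s (%d CPU / %d GB RAM)\n", l.Name, l.Version, l.Image, l.CPU, l.RAMGB)
	}
}
```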
The architecture adopted proven patterns (K-V DB, Protobuf, plugins), evolving into a lightweight user-space service. It also tackled security concerns by sandboxing execution environments, limiting access, and destroying resources post-use.
Aquarium Net Jenkins
Aquarium Net Jenkins was the final component, demonstrating how Jenkins could dynamically request agents from Aquarium Fish. For this to work, both Aquarium Fish and Bait images needed metadata processing (similar to cloud-init), which the plugin provides (agent name, secret, Jenkins URL, etc.).
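The following sketch shows the shape of that handshake from the image side: a small init step reads the metadata provided by the plugin and uses it to connect the agent back to Jenkins. The file path and field names are assumptions for illustration, not the actual metadata format.

```go
package main

// Sketch of a cloud-init-like step inside a Bait image: read the metadata the
// plugin passed to the resource and use it to dial the Jenkins controller.

import (
	"encoding/json"
	"fmt"
	"os"
)

type AgentMetadata struct {
	AgentName   string `json:"agent_name"`   // name Jenkins expects the agent to use
	AgentSecret string `json:"agent_secret"` // secret for that agent connection
	JenkinsURL  string `json:"jenkins_url"`  // controller the agent should dial back to
}

func main() {
	data, err := os.ReadFile("/etc/aquarium/metadata.json") // hypothetical path
	if err != nil {
		fmt.Println("no metadata provided:", err)
		return
	}
	var md AgentMetadata
	if err := json.Unmarshal(data, &md); err != nil {
		fmt.Println("invalid metadata:", err)
		return
	}
	fmt.Printf("connecting agent %q to %s\n", md.AgentName, md.JenkinsURL)
}
```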
Built on the Scripted Cloud Plugin, it simplifies functionality akin to the Kubernetes Plugin. It uses Jenkins’ cloud extension point to detect queued workloads and match them with available Aquarium Fish resources. A request is issued to the cluster, which allocates and provisions a node, connects it back to Jenkins, and tears it down post-execution.
Though relatively simple, the plugin encapsulates the key principles of interacting with the Aquarium API. It uses Protobuf-generated client code and can bootstrap from an init host while dynamically discovering other cluster nodes for redundancy.
Packer Plugin Aquarium
The plugin allows using Aquarium as the builder and resource provider for HashiCorp Packer. It is a recent addition and still in development, but it can already help test image building with Aquarium Bait using the ProxySSH feature of Aquarium Fish. This makes it much easier to organize an automated image build process using nothing but the already established Aquarium ecosystem.
Internal Processes
You can find the internal processes documentation here: Aquarium: Internal Processes