Bazel remote caching - magma/magma GitHub Wiki

:warning: As of January 4, 2023, the Bazel remote cache for Magma is deprecated; see #14796.

:warning: Instructions on how to re-deploy the remote cache can be found in the magma/ci-infra repository.

Bazel's remote cache feature lets builds reuse pre-built artifacts from a remote location, which accelerates them. Our current remote cache is hosted in AWS. By default, it is used only in CI, because local setups cannot be assumed to have the required bandwidth and because of the AWS costs.

How to set up the remote cache for Bazel

The following graph shows which arguments to pass to the script bazel/scripts/remote_cache_bazelrc_setup.sh to set up the remote cache for Bazel. The first two arguments are always required.

- The CACHE_KEY argument specifies which cache entries are considered.
  - Any string is valid.
  - Caches that are already filled include bazel-base-image and magma-dev-vm.
- The REMOTE_DOWNLOAD_OPTIMIZATION argument ('true' or 'false') determines the remote cache download behaviour.
- The optional BAZEL_REMOTE_PASSWORD argument enables write access to the remote cache.

graph TD;
    A[Do you want to use the remote cache?]--Yes-->B;
    A--No-->C[Disk cache is the default.];
    B[Do you have the remote cache password?]--Yes-->D;
    B --No-->H;
    D --Yes--->G[bazel/scripts/remote_cache_bazelrc_setup.sh CACHE_KEY 'false' PASSWORD];
    D[Do you need to use sudo?]--No----> E[bazel/scripts/remote_cache_bazelrc_setup.sh CACHE_KEY 'true' PASSWORD];
    H--Yes-->M[bazel/scripts/remote_cache_bazelrc_setup.sh CACHE_KEY 'false'];
    H[Do you need to use sudo?]--No---> K[bazel/scripts/remote_cache_bazelrc_setup.sh CACHE_KEY 'true'];
    style G fill:#FDD,stroke:#555,stroke-width:4px
    style E fill:#FDD,stroke:#555,stroke-width:4px
    style M fill:#FDD,stroke:#555,stroke-width:4px
    style K fill:#FDD,stroke:#555,stroke-width:4px
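
Read as code, the graph above depends on two inputs: whether sudo is needed (which switches the download optimization off) and whether a password is available (which is appended as the third argument). The following is a minimal sketch of that decision. The helper name remote_cache_setup_cmd and its yes/no convention are made up for illustration; only the script path and the argument order are taken from the graph.

```shell
#!/usr/bin/env bash
# Print the setup command that the decision graph selects.
# Hypothetical helper, not part of the Magma repository.
#   $1 cache key, e.g. bazel-base-image
#   $2 "yes" if bazel has to be run with sudo, otherwise "no"
#   $3 optional remote cache password (enables write access)
remote_cache_setup_cmd() {
  local cache_key="$1" needs_sudo="$2" password="${3:-}"
  local download_optimization="true"
  # The graph disables the download optimization whenever sudo is needed.
  if [ "$needs_sudo" = "yes" ]; then
    download_optimization="false"
  fi
  local cmd="bazel/scripts/remote_cache_bazelrc_setup.sh $cache_key $download_optimization"
  # The password is only appended when one was given.
  if [ -n "$password" ]; then
    cmd="$cmd $password"
  fi
  printf '%s\n' "$cmd"
}

# Read-only access without sudo:
remote_cache_setup_cmd bazel-base-image no
# Write access with sudo (the password here is a placeholder):
remote_cache_setup_cmd magma-dev-vm yes "example-password"
```

The sketch only prints the command so that it can be inspected before it is run.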

Download optimization in CI

The following graph visualizes the remote download options that should be used depending on the commands that need to be run in CI:

graph TD;
   D --Yes-->G[No remote download optimization.];
    D[Is sudo required?]--No--> E;
    E[Is a 'bazel run' executed?] --Yes--> F[ '--remote_download_toplevel' should be used. ];
    E--No-->I['--remote_download_minimal' should be used.];
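
The same choice can be written as a small shell helper, for instance to build a Bazel command line in a CI step. This is a sketch only: the function name remote_download_flag and its yes/no arguments are assumptions for illustration, while the two flags are the ones named in the graph.

```shell
#!/usr/bin/env bash
# Print the remote download flag for a CI job.
# Prints nothing when no download optimization should be used.
# Hypothetical helper, not part of the Magma repository.
#   $1 "yes" if sudo is required, otherwise "no"
#   $2 "yes" if the job executes 'bazel run', otherwise "no"
remote_download_flag() {
  local needs_sudo="$1" uses_bazel_run="$2"
  # With sudo, the graph selects no remote download optimization.
  if [ "$needs_sudo" = "yes" ]; then
    return 0
  fi
  if [ "$uses_bazel_run" = "yes" ]; then
    printf '%s\n' "--remote_download_toplevel"
  else
    printf '%s\n' "--remote_download_minimal"
  fi
}

# Show the command line a job without sudo and without 'bazel run' would use.
echo "bazel test $(remote_download_flag no no) //..."
```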

Solving breaking changes in remote caches

How can this happen?

This can happen, for example, when the version of a dev library that is installed externally (not via Bazel) changes. Let's say folly-dev was downgraded from 0.58 to 0.57 in the Bazel base image.

The caches are used in various CI workflows. For example, bazel.yml runs on the Bazel base image and uses the cache "bazel-base-image". This cache contains artifacts that were linked against folly-dev 0.58, but the versions of externally installed libraries are not part of the respective cache keys. That is, if the workflow runs on a Bazel base container that uses folly-dev 0.57, linking the artifacts will fail.

How to solve this?

This needs to be done in three steps, each of which is a separate PR.

1. In all workflows where the affected caches are used, change the tag "latest" in BAZEL_BASE_IMAGE: "ghcr.io/magma/magma/bazel-base:latest" to the sha that is currently tagged as "latest", e.g., "sha-4a878d8". Find the sha values here. Why? This makes sure that all PRs created from this moment on use the unchanged and working Bazel base image.
2. Wait an appropriate number of days, e.g., three working days, so that all PRs created before step 1 have been rebased. Then create a PR with the actual change, e.g., the change of the folly-dev version in the Bazel base image. Note: After the merge, the new Bazel base image is built and tagged as "latest". All workflows still use the old image, which is pinned by the explicit sha.
3. Directly after the PR from step 2 is merged and the new Bazel base image has been created successfully, change the tag in all workflows back to "latest" and change the cache key, e.g., from "CACHE_KEY: bazel-base-image" to "CACHE_KEY: bazel-base-image-sha-c4de1e5". (Here "sha-c4de1e5" is the sha of the image created after step 2 was merged.) Note: The first workflow runs on master that use the new cache key build the respective cache from scratch, so these runs can take a while. Optionally, the cache can be filled beforehand from a local setup if the password is available.
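
The tag changes in the first and the last step are plain text substitutions in the workflow files. A minimal sketch of such a substitution follows. The helper name set_bazel_base_tag is hypothetical, and the temporary file only stands in for a real workflow file; the image name and the example sha are the ones used above.

```shell
#!/usr/bin/env bash
# Replace the tag of the bazel-base image in one workflow file.
# Hypothetical helper, not part of the Magma repository.
#   $1 workflow file, $2 tag to replace, $3 new tag
set_bazel_base_tag() {
  local file="$1" old_tag="$2" new_tag="$3"
  local image="ghcr.io/magma/magma/bazel-base"
  # Same image name with the dot escaped for use in a sed pattern.
  local image_re='ghcr\.io/magma/magma/bazel-base'
  local tmp
  tmp="$(mktemp)" || return 1
  # Write to a temporary file first, then copy the content back,
  # so the original file keeps its permissions.
  sed "s|${image_re}:${old_tag}|${image}:${new_tag}|g" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  local status=$?
  rm -f "$tmp"
  return "$status"
}

# Demonstration on a throwaway file that mimics a workflow entry.
workflow="$(mktemp)"
printf '%s\n' 'BAZEL_BASE_IMAGE: "ghcr.io/magma/magma/bazel-base:latest"' > "$workflow"
set_bazel_base_tag "$workflow" latest sha-4a878d8   # first step: pin the image
cat "$workflow"
set_bazel_base_tag "$workflow" sha-4a878d8 latest   # last step: back to latest
cat "$workflow"
rm -f "$workflow"
```

In practice the function would be applied to every workflow file that uses the affected caches, and the result reviewed in the PR diff.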

Naming of the cache key

We use the cache name followed by the sha of the oldest image that works with the cache. For example, "bazel-base-image-sha-c4de1e5" references the cache that is used for the Bazel base image and makes clear that the image "sha-c4de1e5" is the first one that is compatible with this cache.
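
As a small illustration, the naming scheme can be expressed as a helper that joins the two parts. The function name cache_key_for and the check for the "sha-" prefix are assumptions made for this sketch.

```shell
#!/usr/bin/env bash
# Build a cache key from the cache name and the sha tag of the
# oldest image that is compatible with the cache.
# Hypothetical helper, not part of the Magma repository.
#   $1 cache name, e.g. bazel-base-image
#   $2 image tag, e.g. sha-c4de1e5
cache_key_for() {
  local cache_name="$1" image_sha_tag="$2"
  # Reject tags that do not look like an image sha tag.
  case "$image_sha_tag" in
    sha-*) ;;
    *) echo "expected a tag of the form sha-<hash>" >&2; return 1 ;;
  esac
  printf '%s-%s\n' "$cache_name" "$image_sha_tag"
}

cache_key_for bazel-base-image sha-c4de1e5   # prints bazel-base-image-sha-c4de1e5
```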

Service administration

The remote cache is deployed to AWS with Terraform. The Terraform code lives in the ci-infra repository of the Magma GitHub organization, in the folder bazel/remote_caching. The remote cache is deployed as an ECS service, currently in us-east-1. You can use CloudWatch Logs to see the server logs and CloudWatch Container Insights to inspect resource consumption.