Bazel remote caching disaster recovery plan - magma/magma GitHub Wiki
:warning: As of 4 January 2023, the Bazel remote cache for magma is deprecated; see #14796.
:warning: Instructions on how to redeploy the remote cache can be found in the magma/ci-infra repository.
Disaster recovery in the case of poisoned caches
- Option 1: Delete the entire cache (requires magma AWS access)
  - Step 1: Set the service's desired task count to zero and wait for the running tasks to stop.
  - Step 2: Manually empty the S3 bucket whose name starts with `bazel-remote-cache-`.
  - Step 3: Set the service's desired task count back to one.
- Option 2: Delete and redeploy the entire remote caching setup (requires magma AWS and ci-infra repo access)
  - Step 1: Tear down the bazel-remote infrastructure by running `terraform destroy -target=aws_s3_bucket.S3CacheBucket` (the `force_destroy = true` option must have been enabled in the Terraform code).
  - Step 2: Redeploy with `terraform init` and `terraform apply`.
- Option 3: Invalidate the cache keys (requires CI codeowner approval)
  - With bazel-remote this can be done by changing the `--remote_cache` URL, e.g. from `https://user:pw@url:9090/current-cache` to `https://user:pw@url:9090/new-cache`.
  - This needs to be changed in all affected workflows and might require rebasing.
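Option 1 of the poisoned-cache procedure can be scripted with the AWS CLI. This is a minimal sketch, assuming the cache service runs on Amazon ECS; the cluster and service names (`bazel-remote-cluster`, `bazel-remote-service`) are hypothetical placeholders, not the real magma resource names. By default it only prints the commands (`DRY_RUN=1`); set `DRY_RUN=0` to run them.

```shell
#!/usr/bin/env bash
# Sketch of "Option 1: Delete the entire cache" for a poisoned bazel-remote cache.
# ASSUMPTIONS (hypothetical, not taken from the magma deployment):
#   - the cache runs as an Amazon ECS service; CLUSTER/SERVICE are placeholders
#   - the caller passes the name of the "bazel-remote-cache-*" bucket, e.g. found with:
#     aws s3api list-buckets --query "Buckets[?starts_with(Name, 'bazel-remote-cache-')].Name" --output text
# DRY_RUN=1 (the default) only prints the commands instead of running them.
set -euo pipefail

CLUSTER="${CLUSTER:-bazel-remote-cluster}"   # hypothetical cluster name
SERVICE="${SERVICE:-bazel-remote-service}"   # hypothetical service name
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Step 1: set the desired task count to zero and wait until no tasks are running.
scale_down() {
  run aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" --desired-count 0
  run aws ecs wait services-stable --cluster "$CLUSTER" --services "$SERVICE"
}

# Step 2: delete every object in the cache bucket (the bucket itself is kept).
empty_bucket() {
  run aws s3 rm "s3://$1" --recursive
}

# Step 3: set the desired task count back to one.
scale_up() {
  run aws ecs update-service --cluster "$CLUSTER" --service "$SERVICE" --desired-count 1
}

# Run the three steps in order, refusing buckets without the expected prefix.
purge_cache() {
  local bucket="$1"
  case "$bucket" in
    bazel-remote-cache-*) ;;
    *) echo "refusing to empty '$bucket': not a bazel-remote-cache-* bucket" >&2
       return 1 ;;
  esac
  scale_down
  empty_bucket "$bucket"
  scale_up
}
```

Source the file and call `purge_cache <bucket-name>`. Waiting on `services-stable` after scaling to zero ensures no task is still serving or writing cache entries while the bucket is being emptied.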
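For Option 3, the URL change has to be made in every affected workflow. Below is a minimal sketch that rewrites the path of the `--remote_cache` URL in all workflow files under a directory. It assumes the URL appears on a line containing `--remote_cache` and that the path names (the placeholders `current-cache` and `new-cache` from the example) contain no regex or sed special characters. The function name `switch_cache_path` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch for "Option 3: Invalidate the cache keys": rewrite the path of the
# --remote_cache URL in every workflow file that still uses the old one.
# ASSUMPTIONS: the URL sits on a line containing --remote_cache, and the
# path names contain no regex/sed special characters.
set -euo pipefail

# Usage: switch_cache_path <workflow_dir> <old_path> <new_path>
switch_cache_path() {
  local dir="$1" old="$2" new="$3" file matches
  # Collect the matching files first so the loop never races with grep;
  # "|| true" keeps "no matches" from aborting under set -e.
  matches="$(grep -rlE -e "--remote_cache.*/${old}" "$dir" || true)"
  [ -n "$matches" ] || return 0
  while IFS= read -r file; do
    # Only touch lines mentioning --remote_cache; writing through a temp
    # file keeps this portable across GNU and BSD sed (no sed -i).
    sed "/--remote_cache/ s#/${old}#/${new}#g" "$file" > "${file}.tmp"
    mv "${file}.tmp" "$file"
    echo "updated ${file}"
  done <<<"$matches"
}
```

For example, `switch_cache_path .github/workflows current-cache new-cache` prints one `updated <file>` line per changed workflow; running it a second time changes nothing.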
Disaster recovery in the case of broken remote cache service
- Option 1: Delete and redeploy the entire remote caching setup (requires magma AWS and ci-infra repo access)
  - Tear down the bazel-remote infrastructure by running `terraform destroy -target=aws_s3_bucket.S3CacheBucket` (the `force_destroy = true` option must have been enabled in the Terraform code). Then redeploy with `terraform init` and `terraform apply`.
- Option 2: Re-implement the GH caches (requires CI codeowner approval).
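Deleting and redeploying the remote caching setup uses the same Terraform sequence whether the cache is poisoned or the service is broken. This is a minimal sketch, assuming it is pointed at the bazel-remote Terraform code from magma/ci-infra via a hypothetical `TF_DIR` variable and that AWS credentials are already configured. It prints the commands by default (`DRY_RUN=1`) and refuses to continue if `force_destroy = true` does not appear in any `.tf` file.

```shell
#!/usr/bin/env bash
# Sketch of "delete and redeploy the entire remote caching setup".
# ASSUMPTIONS: TF_DIR (hypothetical) points at the Terraform code for the
# bazel-remote deployment from magma/ci-infra; AWS credentials are configured.
# DRY_RUN=1 (the default) only prints the commands instead of running them.
set -euo pipefail

TF_DIR="${TF_DIR:-.}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

redeploy_remote_cache() {
  # Coarse pre-check: destroying a non-empty bucket fails unless force_destroy
  # is enabled. This only checks that the setting appears in some .tf file.
  if ! grep -rqE --include='*.tf' 'force_destroy[[:space:]]*=[[:space:]]*true' "$TF_DIR"; then
    echo "force_destroy = true not found under $TF_DIR; enable it first" >&2
    return 1
  fi
  # -chdir (Terraform >= 0.14) runs each command in TF_DIR. destroy and apply
  # stay interactive (no -auto-approve) so each plan can be reviewed first.
  run terraform -chdir="$TF_DIR" destroy -target=aws_s3_bucket.S3CacheBucket
  run terraform -chdir="$TF_DIR" init
  run terraform -chdir="$TF_DIR" apply
}
```

Leaving out `-auto-approve` is deliberate: a targeted destroy also removes resources that depend on the bucket, so it is worth reading the plan before confirming.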