Deploying Notify - alphagov/notifications-manuals GitHub Wiki

Target audience

This is a relatively non-technical overview of how to use the new deploy pipeline, aimed primarily at developers who want to make deployments to production. It explains the overall structure of the new pipeline, but does not go into detail about how the individual pieces work.

What’s new

A single pipeline for the whole of Notify, instead of separate pipelines for each application. This has several benefits:

  • It’s simpler and more maintainable
  • It only needs a single lock per environment
  • It better captures the reality of how Notify is deployed "under the hood"
  • It removes the need for the special tricks that the old pipelines use to ensure that, for example:
    • the app pipelines won’t deploy a version of the infrastructure that hasn’t been deployed into the current environment yet
    • the app pipelines won’t deploy changes to other apps
  • It removes the risk that one pipeline is running the functional tests while another pipeline is applying Terraform, which undermined the value of the functional tests

Each environment now has its own separate Concourse team, with its own copy of the deploy-notify pipeline in it. The primary benefit of this is security: having a separate team for each environment allows us to limit the permissions that the Concourse workers have, and means we can move away from the model where the global Concourse workers have permission to do everything to every resource in every environment.

Continuous deployment by default - changes that are merged to main will automatically be released all the way to production unless the pipeline is instructed differently.

Deployment and testing as separate jobs - if the functional tests fail due to general flakiness, they can be retried in isolation instead of needing an entire new deployment.

Deployment bags

The new deploy pipeline is built around the concept of "deployment bags", which represent point-in-time snapshots of the current versions of all of Notify’s services.

The dev environments and preview each have their own pack-bag job. When this job is triggered, the current versions of all of its inputs are captured in a new version of the deployment bag. The bag is then deployed as a single unit.

The primary benefit of this is that it allows us to think of versions of Notify as a whole, and ensures that only combinations of services that have been tested together are able to be released to production.
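As a rough illustration of the idea (not how the pipeline actually stores bags), a deployment bag can be thought of as an immutable mapping from service name to version. The `DeploymentBag` class, the `pack_bag` function, the `example-app` service and the version strings below are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class DeploymentBag:
    """Hypothetical model: a point-in-time snapshot of every service's version."""
    packed_at: datetime
    versions: Mapping[str, str]  # service name -> version identifier


def pack_bag(current_versions: Mapping[str, str]) -> DeploymentBag:
    """Freeze the versions that are current right now into a new bag."""
    # Copy the inputs so that later changes to them cannot alter this bag.
    frozen = MappingProxyType(dict(current_versions))
    return DeploymentBag(packed_at=datetime.now(timezone.utc), versions=frozen)


# Illustrative service names and versions only.
inputs = {"notifications-api": "abc123", "example-app": "def456"}
bag = pack_bag(inputs)

# A newer version arrives after the bag was packed; the existing bag is unchanged.
inputs["notifications-api"] = "zzz999"
newer_bag = pack_bag(inputs)
```

Here `bag` still refers to the combination of versions that was captured together, while `newer_bag` is a separate snapshot that would need to pass the tests in its own right.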

The staging and production environments lack pack-bag jobs, and instead deploy the most recent version of the deployment bag that passed the previous environment (described in more detail below).

Screenshot 2025-01-30 at 13 54 34

The "meta-pipeline"

This is where the terminology gets slightly confusing. The diagram below shows a simplified overview of "the new deploy pipeline", which we sometimes call the "meta-pipeline" to distinguish it from Concourse’s concept of "pipelines". The diagram shows the steps that a release goes through on its way to production.

The outer boxes in the diagram represent Concourse "teams", the boxes within those represent Concourse "pipelines", and the innermost boxes are a simplified view of the Concourse "jobs" within each pipeline. In reality, the "deploy" job in this diagram consists of several individual Concourse jobs, as described later.

[Diagram: simplified overview of the meta-pipeline, showing Concourse teams, pipelines and jobs]

Image building is handled by the pre-existing Concourse pipelines within the existing notify Concourse team, as it always has been.

When an image building job pushes a new image to ECR, this automatically triggers a run of pack-bag in the preview team. This captures/freezes the current versions of all of Notify’s services, so they can be deployed as a single unit.

When the new bag has been packed, this automatically triggers a run of deploy-notify in the preview team, which deploys that bag to the preview environment. If the deployment is successful and the tests pass, the pipeline tags that release of the deployment bag as passed-preview-<timestamp>.

The staging and production environments lack pack-bag jobs, and instead are triggered when the deployment bag from the previous stage is tagged with the success tag. In this way, we can "chain" as many or as few environments together as we like.
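The chaining described above can be sketched as a check on tag prefixes. This is a simplified model rather than the pipeline's real trigger logic: the `CHAIN` order, the function names and the timestamp format are assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed order of environments. The text above only says that each stage is
# triggered by the success tag from the previous one.
CHAIN = ["preview", "staging", "production"]


def success_tag(environment: str, when: datetime) -> str:
    """Build a tag of the form passed-<environment>-<timestamp>.

    The timestamp format used here is an assumption.
    """
    return f"passed-{environment}-{when.strftime('%Y%m%d%H%M%S')}"


def previous_environment(environment: str) -> Optional[str]:
    """Return the environment immediately upstream, or None for the first."""
    index = CHAIN.index(environment)
    return CHAIN[index - 1] if index > 0 else None


def triggers(tag: str, environment: str) -> bool:
    """True if a bag carrying `tag` should trigger a deploy to `environment`."""
    upstream = previous_environment(environment)
    return upstream is not None and tag.startswith(f"passed-{upstream}-")


tag = success_tag("preview", datetime(2025, 1, 30, 13, 54, 34, tzinfo=timezone.utc))
```

Under this model, adding or removing an environment only means changing the chain: each environment cares solely about the tag written by the one before it.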

The deploy-notify pipeline

deploy-notify is a Concourse pipeline that deploys the whole of Notify to a given environment.

Each environment has its own Concourse team (displayed in the sidebar), and each team has its own copy of the new deploy pipeline, named deploy-notify.

Screenshot 2025-01-30 at 15 32 55

The general structure of the new pipeline is as follows (though there are some environment-specific differences):

  • start-deploy acquires the lock for the current environment (among other things)
  • deploy performs the actual deployment, running Terraform and the db-setup script
  • test runs all of the appropriate test suites for the current environment
  • signal-deploy-completion releases the lock (among other things)
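The ordering of the jobs listed above can be sketched as a simple sequential runner that stops at the first failure. The job names come from the list; the `run_pipeline` function and the idea of representing each job as a callable returning success or failure are assumptions made for illustration, not a description of how Concourse schedules jobs.

```python
from typing import Callable, Dict, List

# Job names from the list above, in the order they run.
JOB_ORDER = ["start-deploy", "deploy", "test", "signal-deploy-completion"]


def run_pipeline(jobs: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each job in order and stop at the first failure.

    Returns the names of the jobs that completed successfully.
    """
    completed: List[str] = []
    for name in JOB_ORDER:
        if not jobs[name]():
            break  # later jobs, including signal-deploy-completion, never run
        completed.append(name)
    return completed


# Example: every job succeeds except the tests.
failing_tests = {
    "start-deploy": lambda: True,
    "deploy": lambda: True,
    "test": lambda: False,
    "signal-deploy-completion": lambda: True,
}
completed_jobs = run_pipeline(failing_tests)
```

In this example `completed_jobs` stops before signal-deploy-completion, which is why a failed test run leaves the environment's lock held.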

Testing

The test job runs all of the appropriate tests for the current environment. At the time of writing, these are:

  • Preview: Functional tests.
  • Staging: API client integration tests, smoke tests & provider tests.
    • This differs from the old world, in which the API client tests were handled by preview.
  • Production: Smoke tests & provider tests.

If the tests fail due to general flakiness, the test job may be retried via the normal Concourse mechanism, without requiring a new deployment.

If the tests fail due to a genuine issue with the new release, then the release will not be allowed to proceed to the next environment. In this case, you will need to roll back your changes, and release the deployment lock (both of these are discussed below).

The production pipeline additionally has a separate tab for the periodic smoke tests that run every 10 minutes.

Screenshot 2025-01-30 at 13 58 24

Locking

When a deployment to a given environment begins, the start-deploy job first acquires a lock, to prevent concurrent deployments to the same environment. Assuming the deployment is successful, the signal-deploy-completion job will then release the lock.

If a deployment is unsuccessful, either because the deployment itself failed, or because the tests failed, the lock will not be released. In this case, you will need to manually invoke the force-unlock-pipeline job in the operator tab to unlock it, before the next deployment may take place.
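The locking behaviour described above can be modelled roughly as follows. `EnvironmentLock` and `deploy_once` are hypothetical names for this sketch, and the in-memory flag merely stands in for whatever lock mechanism the pipeline really uses.

```python
class EnvironmentLock:
    """Hypothetical in-memory stand-in for the per-environment deploy lock."""

    def __init__(self) -> None:
        self.held = False

    def acquire(self) -> None:
        if self.held:
            raise RuntimeError("environment is locked by another deployment")
        self.held = True

    def release(self) -> None:
        self.held = False


def deploy_once(lock: EnvironmentLock, deploy_ok: bool, tests_ok: bool) -> bool:
    """Model one run of the pipeline against a single environment."""
    lock.acquire()  # start-deploy
    if not (deploy_ok and tests_ok):
        # A failed deployment or test run leaves the lock held on purpose.
        return False
    lock.release()  # signal-deploy-completion
    return True


lock = EnvironmentLock()
first = deploy_once(lock, deploy_ok=True, tests_ok=True)    # lock released
second = deploy_once(lock, deploy_ok=True, tests_ok=False)  # lock still held

try:
    deploy_once(lock, deploy_ok=True, tests_ok=True)
    blocked = False
except RuntimeError:
    blocked = True  # the next deployment cannot start

lock.release()  # equivalent of manually running force-unlock-pipeline
third = deploy_once(lock, deploy_ok=True, tests_ok=True)
```

The point of the sketch is that a failed run blocks every subsequent deployment to that environment until someone releases the lock by hand.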

⚠️ Terraform itself also locks the state file during a deployment, so it is important not to interrupt or cancel the deploy job (though it is perfectly safe to interrupt the test job, if desired).

Runbook/common tasks

Pinning a specific version of an app

You can pin a specific version of an app, or mark a version of an app as "bad/broken" using the normal Concourse mechanisms, by selecting the appropriate input of the pack-bag job in the preview environment.

For example, to pin an older release of notifications-api, navigate to the deploy-notify pipeline in the preview team, select the pack-bag tab, select the app you want to pin/deselect, and use the normal Concourse ticks and pins. You can then manually trigger the pack-bag job to kick off a new deployment.

Screenshot 2025-01-30 at 13 56 24

Reverting a deployment

You can also pin or deselect a specific version of the entire deployment bag, using the normal Concourse mechanisms, by selecting the deployment-bag resource in staging or production.

For example, to revert a broken deployment in the production environment, navigate to the deploy-notify pipeline in the production team, select the deployment-bag resource, and de-select the broken release by clicking the check mark next to it. You can then manually trigger the start-deploy job to perform the rollback.

⚠️ The same caveats around reverting deployments that have always applied continue to apply: be very careful when rolling back changes that involve database migrations. The deploy pipeline currently provides no mechanism to roll back a migration.

Screenshot 2025-01-30 at 13 22 13

You can also freeze any environment at a specific version of the deployment bag by selecting the pin next to it, and manually triggering the start-deploy job.

Screenshot 2025-01-30 at 13 22 06
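The effect of de-selecting and pinning versions, as described in this section, can be sketched as a version-selection rule. This is a simplified model of the behaviour rather than Concourse's actual implementation; `BagVersion`, `version_to_deploy` and the example tags are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BagVersion:
    tag: str
    enabled: bool = True   # de-selected (unticked) versions are skipped
    pinned: bool = False   # a pinned version takes precedence over newer ones


def version_to_deploy(versions: List[BagVersion]) -> Optional[BagVersion]:
    """Pick the version the next start-deploy run would use.

    `versions` is ordered from oldest to newest.
    """
    for version in versions:
        if version.pinned:
            return version
    for version in reversed(versions):
        if version.enabled:
            return version
    return None


# Illustrative tags only.
history = [
    BagVersion("passed-staging-1"),
    BagVersion("passed-staging-2"),
    BagVersion("passed-staging-3"),
]
latest = version_to_deploy(history).tag

history[2].enabled = False  # de-select the broken release
after_revert = version_to_deploy(history).tag

history[0].pinned = True    # freeze the environment at an older bag
after_pin = version_to_deploy(history).tag
```

In both cases nothing is deployed until start-deploy is triggered manually; the selection only determines which bag that run picks up.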