# Account wide terraform
This document is intended for engineers who want to learn the basics of what Account-wide Terraform (AWT) is, how to use it to make infra changes and how to use it to debug issues. It assumes you know Terraform and GDS CLI basics.
AWT acts as a bootstrap, deploying the initial infra that other foundational Notify infra depends on. It does not deploy the Notify application or the infra the application requires; those are deployed from `notifications-aws`.
AWT is kept separate from `notifications-aws` for a few reasons:

- Improved security - IAM config for Concourse workers is stored in this repo. If it was stored in `notifications-aws`, a Concourse worker may be able to access this config.
- `notifications-aws` uses roles and some of the other infra declared in this repo, so this repo must be deployed first.
- Application of environment-specific config - Infra changes may need to be applied to a specific env (e.g. for testing new IAM permissions), and they can be tested by applying them to just that environment from this repo.

There are 2 types of envs you can apply AWT changes to:
- `notify-env` - `dev[a-f]`, `preview`, `staging`, `production`
- `notify-deploy-env` - `notify-deploy`, `notify-deploy-staging` (the AWS accounts hosting Concourse)
## How to apply AWT changes

Terraform commands are run via `make` commands in the root of the repo:

```
make <env name> <init | plan | apply | destroy>
```
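For example, to plan changes against the `dev-a` environment you would assume that account's admin role via the GDS CLI and run the matching make target from the repo root (this is the same pattern used in the worked examples below):

```
gds aws notify-dev-a-admin -- make dev-a plan
```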
In rare cases you may need to bootstrap an environment:

```
make <env name> bootstrap
```
NOTE: Changes to the `production` environment must have a cyberthumb before being applied.
There is no pipeline for deploying AWT changes, so any changes to the Terraform must be applied locally.
For example, if you wanted to move away from using Parameter Store (SSM) for storing secrets and move to Secrets Manager instead, you would need to change which permissions the `manipulate_dev_secrets` role has so that it can still access secrets.
This role currently has these permissions:
```
# notify-deploy-env/roles.tf
data "aws_iam_policy_document" "manipulate_dev_secrets" {
  # Other statements omitted for brevity
  statement {
    effect    = "Allow"
    actions   = ["ssm:DescribeParameters"]
    resources = ["*"]
  }
}
```
We currently cannot list secrets in Secrets Manager with the `devsecrets` role:
```
gds aws notify-deploy-staging-devsecrets -s
wesley.hindle@GDS13716 notifications-aws-account-wide-terraform % aws secretsmanager list-secrets

An error occurred (AccessDeniedException) when calling the ListSecrets operation: User: arn:aws:sts::390844751771:assumed-role/wesley.hindle-devsecrets/1753170992424991000 is not authorized to perform: secretsmanager:ListSecrets because no identity-based policy allows the secretsmanager:ListSecrets action
```
To test whether this role can access data in Secrets Manager, we must first grant it access.
```
# notify-deploy-env/roles.tf
data "aws_iam_policy_document" "manipulate_dev_secrets" {
  # Other statements omitted for brevity
  statement {
    effect  = "Allow"
    actions = [
      "ssm:DescribeParameters",
      "secretsmanager:ListSecrets"
    ]
    resources = ["*"]
  }
}
```
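This policy document is rendered into the managed policy that the plan output below shows being updated. A minimal sketch of that wiring (the exact resource layout in `notify-deploy-env/roles.tf` is assumed here, based on the plan output) looks something like:

```
resource "aws_iam_policy" "manipulate_dev_secrets" {
  name   = "ManipulateDevSecrets"
  policy = data.aws_iam_policy_document.manipulate_dev_secrets.json
}
```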
We can then run a `terraform plan` to see the changes. In our case:
```
gds aws notify-deploy-staging-admin -- make notify-deploy-staging plan
```
```
Terraform will perform the following actions:

  # aws_iam_policy.manipulate_dev_secrets will be updated in-place
  ~ resource "aws_iam_policy" "manipulate_dev_secrets" {
        id     = "arn:aws:iam::390844751771:policy/ManipulateDevSecrets"
        name   = "ManipulateDevSecrets"
      ~ policy = jsonencode(
            # Omitted for brevity
          ~ Action = "ssm:DescribeParameters" -> [
              + "ssm:DescribeParameters",
              + "secretsmanager:ListSecrets",
            ]

Plan: 0 to add, 1 to change, 0 to destroy.
```
And then apply it:

```
gds aws notify-deploy-staging-admin -- make notify-deploy-staging apply
```
If we then try to list the secrets again, we can now see them:
```
wesley.hindle@GDS13716 notifications-aws-account-wide-terraform % gds aws notify-deploy-staging-devsecrets -s
wesley.hindle@GDS13716 notifications-aws-account-wide-terraform % aws secretsmanager list-secrets
{
    "SecretList": [
        {
            "Name": "ecr-pullthroughcache/test123/ecr/dockerhub_credentials",
            # Omitted
        }
    ]
}
```
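Note that `secretsmanager:ListSecrets` only lets the role see that secrets exist; reading a secret's value is a separate IAM action, `secretsmanager:GetSecretValue`, which is not granted by the statement shown above. If the role also needed to read values, that permission would need adding and testing in the same way, for example against the secret listed above:

```
# Requires secretsmanager:GetSecretValue on this secret (or on "*") for the assumed role
aws secretsmanager get-secret-value --secret-id ecr-pullthroughcache/test123/ecr/dockerhub_credentials
```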
Occasionally manual changes happen in the AWS console, which can result in new errors. A quick way to check if this is the case is to run a plan (using the command format below) and see what changes are shown:

```
gds aws notify-<env name>-admin -- make <env name> <init | plan | apply | destroy>
```
```
Terraform will perform the following actions:

  # module.readonly_users["wesley.hindle"].aws_iam_role.gds_user_role will be updated in-place
  ~ resource "aws_iam_role" "gds_user_role" {
        id       = "wesley.hindle-readonly"
        name     = "wesley.hindle-readonly"
      ~ tags     = {
          - "Manually-Added-Tag" = "Says Hello" -> null
        }
      ~ tags_all = {
          - "Manually-Added-Tag" = "Says Hello" -> null
            # (2 unchanged elements hidden)
        }
        # (8 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```
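If you want to confirm what is actually in the account before applying, you can inspect the live resource directly. For example, to see the manually added tag on the role from the plan above (assuming your current credentials are allowed to read IAM):

```
aws iam list-role-tags --role-name wesley.hindle-readonly
```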
This example applies to the `notify-env` environments.
> I want to look through some CloudWatch logs, but I'm now unable to, whereas previously I was.

We know that there's some sort of IAM permission issue, but rather than messing about comparing what's on `main` and your branch, we can instead run a `terraform plan` to see what the changes are.
```
# Note: apply runs a plan first before applying any changes
gds aws notify-dev-a-admin -- make dev-a apply
```
```
Terraform will perform the following actions:

  # module.readonly_users["wesley.hindle"].aws_iam_role_policy_attachment.gds_user_role_policy_attachments[0] will be created
  + resource "aws_iam_role_policy_attachment" "gds_user_role_policy_attachments" {
      + id         = (known after apply)
      + policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
      + role       = "wesley.hindle-readonly"
    }

Plan: 1 to add, 0 to change, 0 to destroy.
```
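Before going any further, you can confirm the attachment really is missing from the live role (again assuming your credentials can read IAM):

```
aws iam list-attached-role-policies --role-name wesley.hindle-readonly
```

If `arn:aws:iam::aws:policy/ReadOnlyAccess` is absent from the output, the plan is simply putting back something that was removed manually.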
So we know there is config drift, but some inspection is required before blindly applying the change, to check that it will actually fix our problem. `arn:aws:iam::aws:policy/ReadOnlyAccess` is an AWS-managed policy, so we can easily search online for what permissions this policy grants.
"Version" : "2012-10-17",
"Statement" : [
{
"Sid" : "ReadOnlyActionsGroup1",
"Effect" : "Allow",
"Action" : [
"logs:Describe*",
# Other actions omitted for brevity
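If you'd rather not search online, the same policy document can be fetched with the AWS CLI (assuming your credentials allow `iam:GetPolicy` and `iam:GetPolicyVersion`):

```
# Look up the policy's current default version
aws iam get-policy --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess

# Fetch that version's document, substituting the DefaultVersionId returned above
aws iam get-policy-version --policy-arn arn:aws:iam::aws:policy/ReadOnlyAccess --version-id <DefaultVersionId>
```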
We now have confidence that re-adding this policy to the role will grant the `logs:DescribeLogGroups` action required to be able to see log groups. After applying the change we can view the log groups again.

## How to roll out changes to all environments

NOTE: This section refers to how changes have been rolled out in the past and is not best practice.
Once your changes have been merged in via the PR process, they will then need to be applied to each environment locally, as there is no pipeline to apply changes to this repo. This process will take a few days to complete.
Instructions on how to apply changes can be found in the How to apply AWT changes
section above.
Changes should first be applied to all unoccupied dev environments. You should then communicate on the #govuk-notify-infrastructure-team channel that those who are using dev envs will need to apply these changes to their environments themselves, or that you will do it for them if they wish. You must communicate that this will overwrite any manual changes they have made to AWT's infra, and that it may alter the behaviour of the feature they're working on in their environment.
Once the changes have been applied to all dev envs, you can then roll the changes out to the `staging` environment. After running an apply locally you can then manually trigger a deploy on the `staging` environment, which will run the automated tests and flag any issues the new changes have introduced.
If these tests fail you do not need to worry about pinning the `production` deploy pipeline, as the changes have only been applied to `staging` at this point.
If successful, you should wait a few days until at least 1 other deploy to `production` has successfully rolled out, at which point it can be assumed that the changes have not broken anything. After running an apply locally you can then manually trigger a deploy on the `production` environment.