Production access - alphagov/notifications-manuals GitHub Wiki

Two Eye (2i) process

The GDS Cyber team is alerted every time someone accesses Production.

All actions should be approved by a second team member to make sure they are genuine. See the "Triggering the 2i process" section below. Each approval covers a single action: if you need to SSH onto a machine multiple times, for example, you will need to go through the process each time.

If Cyber get an alert, they'll check for a corresponding approval. If they can't see one, Cyber will contact us via Slack to make sure it's genuine. If the event is out of hours, the Cyber on-call person will be woken up and will in turn wake up the person on Notify PagerDuty to check.

If we need to add or remove alerting, or tweak the configuration (for example, to filter which apps alerts fire for), we should email Gabriel Currie/Romina Ahmad from the Cyber Security team. They'll look at the feasibility and schedule time to do the work.

When the alert will fire

  • if you assume an admin role into the Production AWS account, for example:
    • gds aws notify-prod-admin -l
  • if you assume an admin role into the Deploy AWS account, for example:
    • gds aws notify-deploy-admin -- terraform plan
  • if you go into the production DB with an edit role, for example:
    • gds aws notify-prod-admin -- db-connect.sh notifydb -- psql

Triggering the 2i process

  1. Before you do the action, go to the #cyber-security-notifications channel in Slack.
  2. Click on the 'Workflows' folder at the top, and select 'Action Notification'
  3. Click 'Action Notification'.
  4. Write a brief sentence describing what you are doing, for example:
    • Connecting to the Notify DB with write access, e.g. to update a record.
    • SSHing onto the Notify production ECS cluster, e.g. to run a manual task.
    • Assuming an admin role in Notify AWS, e.g. to run Terraform.
  5. Pick another team member to confirm the action is not unexpected.
  6. Once the team member has confirmed your action, you will get a notification from Slackbot. Then you can do the thing!

If it is an emergency and no other team member is around, for example out of hours, you should still follow the steps above: pick someone, then carry out the action even if they aren't able to approve. If ongoing access is required for an incident, ask for the approval to be "pinned" for up to a few hours in #cyber-security-help.

Database access

See https://github.com/alphagov/notifications-manuals/wiki/Postgres-databases for information on how to connect

Safety tips

Running update queries on production is inherently dangerous. To reduce your chance of learning this the hard way, you should:

  • always use your local environment for testing and development
  • consider whether writing a migration would be better than running a query on production
  • always pair with another developer
  • write the query in a text editor first where you can format it for readability
  • write the where clause first, so if you accidentally hit enter it isn’t a disaster
    • bad: update services set name = 'foo'
    • good: update services where id = 'aaaa-bbbb-cccc-dddd' (not yet valid SQL, so it errors rather than updating every row)
  • write the query as a select first
    • select * from services where id = 'aaaa-bbbb-cccc-dddd'
    • 1 RESULT – looks good
    • update services set name = 'foo' where id = 'aaaa-bbbb-cccc-dddd'
    • consider saving the result of the select somewhere as a backup or audit trail
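
The "select first, then update" tip above can be sketched as a small script. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the real Postgres database, the `services` table has just two columns, and `guarded_rename` is a hypothetical helper, not part of any Notify tooling. It runs the select, keeps the result as a backup, and commits the update only if it touched the same number of rows.

```python
import sqlite3

# In-memory SQLite stands in for the real Postgres database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("create table services (id text primary key, name text)")
conn.executemany(
    "insert into services (id, name) values (?, ?)",
    [("aaaa-bbbb-cccc-dddd", "old name"), ("eeee-ffff-0000-1111", "other service")],
)
conn.commit()


def guarded_rename(conn, service_id, new_name):
    """Hypothetical helper: select first, then update, rolling back on surprises."""
    where = "where id = ?"  # written once and reused by both statements
    # Step 1: run the select and keep the rows as a backup / audit trail.
    before = conn.execute(
        f"select id, name from services {where}", (service_id,)
    ).fetchall()
    if len(before) != 1:
        raise ValueError(f"expected 1 row, select matched {len(before)}")
    # Step 2: run the update with the same where clause.
    cursor = conn.execute(
        f"update services set name = ? {where}", (new_name, service_id)
    )
    # Step 3: commit only if the update touched exactly the rows we inspected.
    if cursor.rowcount != len(before):
        conn.rollback()
        raise RuntimeError(f"update touched {cursor.rowcount} rows, rolled back")
    conn.commit()
    return before


backup = guarded_rename(conn, "aaaa-bbbb-cccc-dddd", "foo")
print(backup)
```

In psql the same idea would be to wrap the update in an explicit transaction, check the reported row count, and only then commit.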

Commands you probably shouldn’t be running anywhere except your local machine

  • insert
  • delete
  • truncate
  • alter table (this may be needed in a migration, but it usually acquires an access exclusive lock on the table, so it can be very, very dangerous)
  • vacuum (unless you really understand the consequences – it can lock tables, sometimes for hours if they’re very big tables)
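
As a rough illustration of the list above, a query written in a text editor could be checked before it is pasted into a production session. The `review_statement` function below is a hypothetical helper, not an existing Notify tool. It only looks at the leading keyword and for the word "where", so it is a prompt for a second look and no substitute for pairing.

```python
import re

# Commands the list above says to avoid outside a local machine.
RISKY_COMMANDS = ("insert", "delete", "truncate", "alter table", "vacuum")


def review_statement(sql):
    """Return warnings for a single SQL statement (rough keyword check only)."""
    # Lowercase, collapse whitespace and drop a trailing semicolon.
    normalised = " ".join(sql.lower().split()).rstrip("; ")
    warnings = []
    for command in RISKY_COMMANDS:
        if normalised == command or normalised.startswith(command + " "):
            warnings.append(f"'{command}' is on the avoid-on-production list")
    # An update with no where clause rewrites every row in the table.
    # Note: this would be fooled by the word "where" inside a string literal.
    if normalised.startswith("update ") and not re.search(r"\bwhere\b", normalised):
        warnings.append("update has no where clause")
    return warnings


for statement in (
    "update services set name = 'foo'",
    "update services set name = 'foo' where id = 'aaaa-bbbb-cccc-dddd'",
    "TRUNCATE services;",
):
    print(statement, "->", review_statement(statement))
```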