What to do during an incident
Security incidents and data breaches
Follow the normal process below, but also follow the GDS guidance. Specifically:
- If the incident is related to cyber security, report it to the Cyber Security team as soon as possible.
- If the incident involves a data breach, report it to the GDS Privacy team at [email protected].
- If the incident involves a data breach, report it to the Cabinet Office Data Protection Officer at [email protected]. This doesn't need to be done until we've reached a conclusion about what data has been lost. Check with the GDS Privacy team, as they might do this on our behalf.
- If the incident involves spamming or phishing from our platform, inform NCSC at [email protected] after you have gathered information on who and what was sent.
You may need to do just one of these or you may need to do all of them depending on the incident.
We also suggest you include the GDS Information Assurance team ([email protected]) in any of these situations.
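The routing rules above can be read as a small checklist. The sketch below is illustrative only: the function name `security_reports` and its flags are hypothetical, and it assumes the Information Assurance team is included whenever any of the situations applies.

```python
def security_reports(cyber_security=False, data_breach=False, spam_or_phishing=False):
    """Return the reporting actions that apply to a security incident.

    Hypothetical helper: the flags mirror the bullet points above.
    """
    actions = []
    if cyber_security:
        actions.append("Report to the Cyber Security team as soon as possible")
    if data_breach:
        actions.append("Report to the GDS Privacy team")
        actions.append(
            "Report to the Cabinet Office Data Protection Officer once it is "
            "clear what data was lost (the GDS Privacy team may do this for us)"
        )
    if spam_or_phishing:
        actions.append("Inform NCSC after gathering who and what was sent")
    if actions:
        # Assumption: Information Assurance is included for any security incident.
        actions.append("Consider including the GDS Information Assurance team")
    return actions


if __name__ == "__main__":
    for step in security_reports(data_breach=True, spam_or_phishing=True):
        print("-", step)
```

An incident can match several rules at once, which is why the function collects a list of actions rather than returning a single one.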
Normal incidents
It's better to assume something is an incident and start documenting it early.
- Nominate an incident lead and a comms lead and write their names in Slack:
  - The incident lead should prioritise technical investigation and fixes.
  - The comms lead should be responsible for communicating the incident to both our users and internal stakeholders.
  - The comms lead should ensure events are well documented in an incident report.
  - The comms lead may delegate note taking in the incident report to another team member if they have too much to do.
  - If an incident is severe or high profile, ask a product manager or designer to help with public comms.
- Comms lead to begin note taking in a Google Doc and share it in #govuk-notify-incident.
- Join the standing incident Meet and remind people you're in there:
  - Only the comms lead and the incident lead need to be in the Meet.
  - Other people can listen in for information, or join if the incident lead or comms lead asks for help. Having a standing Meet allows people to join easily and find where the discussion is happening.
- Comms lead to update Statuspage if necessary.
- Once the incident investigation is in a good state and we understand the impact, the comms lead should notify GDS stakeholders about the incident. Assume the readers have minimal knowledge of Notify, so make sure your email explains the user impact clearly in plain English. Comms to our users should take priority over comms to GDS stakeholders.
  - Send details of the incident to the [email protected] and the [email protected] groups, and CC the notify-support@ email address for reference.
  - Optional: post in #incident if it would help other parts of GDS to know about it.
After the incident
After the incident, the incident lead should:
- Schedule a meeting to review the report.
- Ask a tech lead to update the DSP Monthly Incident Review document. Update the "incidents to review" table for the next meeting with the team name, date of incident, incident priority, link to any incident actions and a one-line description of the incident (with a link to the incident report).
Escalate to someone senior for help
If you are struggling to resolve the incident and want to call in some support, you can use our team contact details. It is better to escalate and find out you didn't need it in the end than not to escalate at all.
If you need a Senior Civil Servant (someone in the Digital Service Platforms senior management team) to help with a P1 incident, there is a Senior Civil Servant (SCS) PagerDuty rota. You can read more about what they would be expecting if called. To page them:
- In PagerDuty, select "New Incident"
- Enter "GOV.UK Notify" in the "Impacted Service" field
- Assign to "DSP SCS escalation" (or "GaaP SCS escalation" if you can't find the DSP one)
- Fill in other fields as needed, and select "Create Incident"
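The steps above use the PagerDuty web interface, which is the documented route. For comparison, here is a minimal sketch of the equivalent request against the PagerDuty REST API. It assumes the team has an API token, which this page does not state. The service ID, escalation policy ID, token and email address are hypothetical placeholders, and the sketch only builds the request without sending it.

```python
import json

PAGERDUTY_INCIDENTS_URL = "https://api.pagerduty.com/incidents"


def build_scs_escalation_request(title, service_id, escalation_policy_id,
                                 api_token, from_email):
    """Build the headers and JSON body for creating a PagerDuty incident.

    service_id should be the ID of the "GOV.UK Notify" service and
    escalation_policy_id the ID of the "DSP SCS escalation" policy.
    Both must be looked up in PagerDuty; none are given on this page.
    """
    headers = {
        "Accept": "application/vnd.pagerduty+json;version=2",
        "Authorization": f"Token token={api_token}",
        "Content-Type": "application/json",
        # PagerDuty requires the email of a valid user in the From header.
        "From": from_email,
    }
    payload = {
        "incident": {
            "type": "incident",
            "title": title,
            "service": {"id": service_id, "type": "service_reference"},
            "escalation_policy": {
                "id": escalation_policy_id,
                "type": "escalation_policy_reference",
            },
        }
    }
    return headers, json.dumps(payload)


if __name__ == "__main__":
    # All values below are placeholders, not real identifiers.
    headers, body = build_scs_escalation_request(
        title="P1: API is unavailable",
        service_id="PXXXXXX",
        escalation_policy_id="PYYYYYY",
        api_token="REPLACE_ME",
        from_email="responder@example.com",
    )
    print("POST", PAGERDUTY_INCIDENTS_URL)
    print(body)
```

Sending the request is left out on purpose, so the example cannot page anyone by accident.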
Severity of incidents
This is a non-exhaustive list of incidents at different severity levels.
Severity | Description | Response time (time to open your laptop, post something in Slack that you are looking at this and start investigating) |
---|---|---|
P1 | API is unavailable | 30 minutes |
P1 | www.notifications.service.gov.uk is unavailable | 30 minutes |
P1 | Text message or email sending is unavailable | 30 minutes |
P2 | Letter sending is unavailable | 30 minutes |
P2 | Delivery receipts are unavailable | 30 minutes |
P2 | Service callbacks are unavailable | 30 minutes |
P2 | Inbound text messages are unavailable | 30 minutes |
P2 | Sending documents by email is unavailable | 30 minutes |
P2 | Downloading documents sent by email is unavailable | 30 minutes |
P2 | Severe delays to text message or email sending | 30 minutes |
P3 | Severe delays to letter sending | Next working day |
P3 | Other minor degraded service | Next working day |
P4 | Incident with no user impact (such as Concourse unavailable) | Next working day |
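The response times in the table can be expressed as a small lookup. This is a sketch, not part of the team's tooling: the names `RESPONSE_TARGETS`, `next_working_day` and `respond_by` are hypothetical, and "next working day" is assumed to mean the next Monday to Friday date, ignoring bank holidays.

```python
from datetime import date, datetime, timedelta
from typing import Union

# Response targets copied from the severity table above.
# None means "next working day" rather than a fixed duration.
RESPONSE_TARGETS = {
    "P1": timedelta(minutes=30),
    "P2": timedelta(minutes=30),
    "P3": None,
    "P4": None,
}


def next_working_day(day: date) -> date:
    """Return the first Monday-to-Friday date after `day`.

    Assumption: bank holidays are not taken into account.
    """
    candidate = day + timedelta(days=1)
    while candidate.weekday() >= 5:  # 5 is Saturday, 6 is Sunday
        candidate += timedelta(days=1)
    return candidate


def respond_by(severity: str, reported_at: datetime) -> Union[datetime, date]:
    """Return when someone should have started looking at the incident."""
    target = RESPONSE_TARGETS[severity]  # raises KeyError for unknown severities
    if target is None:
        return next_working_day(reported_at.date())
    return reported_at + target


if __name__ == "__main__":
    friday_evening = datetime(2024, 3, 1, 17, 0)
    print(respond_by("P1", friday_evening))  # 30 minutes after the report
    print(respond_by("P3", friday_evening))  # the following Monday
```

P1 and P2 share the same 30-minute target in the table, so the severity level affects the response time only at P3 and below.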