PagerDuty
Services and escalation policies
Services
Notify has two "services" set up in PagerDuty. A "service" groups related alerts, has its own integrations and is linked to an escalation policy.
- GOV.UK Notify P1 outages
  - Alerts sent by Email, SNS (CloudWatch), ElastAlert, Concourse, Managed Prometheus, Cronitor, Sentry and Live Call Routing
  - Sends alerts to Slack incident channel
  - Automatically escalates using the "P1 outages" escalation policy (see below)
  - Allows manual escalation to "Escalate to Managers" escalation policy
- GOV.UK Notify Warnings
  - Alerts sent by Email, SNS (CloudWatch), ElastAlert, Concourse, Alertmanager (Prometheus), Managed Prometheus, Cronitor and Sentry
  - Sends alerts to Slack incident channel
  - Automatically escalates using the "Warnings" escalation policy (see below)
  - Allows manual escalation to "Escalate to Managers" escalation policy
Escalation policies
There are three escalation policies configured in PagerDuty:
- Escalate to Managers - This policy is not linked to a service and is therefore not in use
- P1 outages - Escalates P1 outages to:
  - Tech leads during office hours
  - Out Of Hours (OOH) team at other times
  - Unacknowledged incidents are escalated to Notify managers after 5 minutes
- Warnings - Escalates lower priority alerts to Tech leads during office hours
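The routing described by the "P1 outages" and "Warnings" policies can be illustrated with a small model. This is only a sketch for reasoning about who gets paged, not how PagerDuty is configured: the office hours (09:30 to 17:30, Monday to Friday), the function names, and the idea that a Warnings alert notifies nobody outside office hours are all assumptions. The real hours come from the on-call schedules in PagerDuty.

```python
from datetime import datetime, timedelta

# Assumption: office hours are 09:30-17:30, Monday to Friday.
# The real hours are defined by the on-call schedules in PagerDuty.
OFFICE_START = (9, 30)
OFFICE_END = (17, 30)

# From the policy description: unacknowledged P1 incidents go to
# Notify managers after 5 minutes.
MANAGER_ESCALATION_DELAY = timedelta(minutes=5)


def in_office_hours(when: datetime) -> bool:
    """Return True if `when` falls inside the assumed office hours."""
    if when.weekday() >= 5:  # Saturday or Sunday
        return False
    return OFFICE_START <= (when.hour, when.minute) < OFFICE_END


def notified(policy: str, triggered_at: datetime, now: datetime,
             acknowledged: bool) -> list:
    """List who the named policy has notified by `now` (hypothetical model)."""
    office = in_office_hours(triggered_at)
    if policy == "Warnings":
        # Assumption: nobody is notified outside office hours.
        return ["Tech leads"] if office else []
    if policy == "P1 outages":
        people = ["Tech leads" if office else "Out Of Hours team"]
        if not acknowledged and now - triggered_at >= MANAGER_ESCALATION_DELAY:
            people.append("Notify managers")
        return people
    raise ValueError(f"unknown policy: {policy}")
```

For example, `notified("P1 outages", t, t + timedelta(minutes=5), False)` for a weekday mid-morning `t` includes both the tech leads and the Notify managers.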
PagerDuty integrations
Concourse PagerDuty alerts
Concourse has PagerDuty integration keys for both the GOV.UK Notify P1 outages and GOV.UK Notify Warnings services. The Jinja templates define the priority_level for each alert, which determines which integration key is used.
| Type | Name | PagerDuty Service |
|---|---|---|
| smoke | file-uploads | Warnings |
| smoke | text-message-sending | P1 outages |
| smoke | email-sending | P1 outages |
| smoke | letter-sending | Warnings |
| provider | text-message-delivery-receipts | Warnings |
| provider | text-message-receiving | Warnings |
| provider | email-delivery-receipts | P1 outages |
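The table above can be read as a lookup from check to PagerDuty service, and from service to integration key. The sketch below shows that lookup and builds a trigger event in the shape used by the PagerDuty Events API v2. It is illustrative only: the key placeholders, the `build_event` helper, the `source` string and the severity mapping are assumptions, and it is not confirmed here that Concourse sends events in this format.

```python
# (type, name) -> PagerDuty service, copied from the table above.
SERVICE_FOR_CHECK = {
    ("smoke", "file-uploads"): "Warnings",
    ("smoke", "text-message-sending"): "P1 outages",
    ("smoke", "email-sending"): "P1 outages",
    ("smoke", "letter-sending"): "Warnings",
    ("provider", "text-message-delivery-receipts"): "Warnings",
    ("provider", "text-message-receiving"): "Warnings",
    ("provider", "email-delivery-receipts"): "P1 outages",
}

# Hypothetical placeholders: the real integration keys are held by Concourse.
INTEGRATION_KEYS = {
    "P1 outages": "<p1-outages-integration-key>",
    "Warnings": "<warnings-integration-key>",
}


def build_event(check_type: str, name: str, summary: str) -> dict:
    """Build an Events API v2 style trigger event for a failed check."""
    service = SERVICE_FOR_CHECK[(check_type, name)]
    return {
        "routing_key": INTEGRATION_KEYS[service],
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            # Assumption: any string identifying the failing check will do.
            "source": f"concourse/{check_type}/{name}",
            # Assumption: P1 checks map to "critical", the rest to "warning".
            "severity": "critical" if service == "P1 outages" else "warning",
        },
    }
```

Calling `build_event("smoke", "email-sending", "Email smoke test failed")` selects the P1 outages key, while a `letter-sending` failure selects the Warnings key.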
Sentry PagerDuty alerts
See the information on the Sentry page.
Holiday
Before hols
- Wait until the day before you go on holiday
- Log in to PagerDuty and go to Escalation Policies (under the "People" nav item): https://governmentdigitalservice.pagerduty.com/escalation_policies
- For each of "GOV.UK Notify - P1 outages" and "GOV.UK Notify - Warnings":
  - Click the cog icon to edit
  - Remove your on-call schedule from step 1 ("Notify the following users or schedules")
- Consider setting a Slackbot reminder for the first morning you're back to remind you to re-add yourself to the escalation policies.
After hols
When you return from holiday, add your on-call schedule back to step 1 of each escalation policy by following the steps above in reverse.
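The before and after steps amount to removing a schedule from, and later re-adding it to, the first rule of each escalation policy. The sketch below shows that edit on a plain dictionary. The shape (`escalation_rules`, `targets`, `schedule_reference`) follows my understanding of the escalation policy object in the PagerDuty REST API and should be checked against the API reference before use. The helper names and the schedule ID `PSCHED1` are hypothetical, and nothing here talks to PagerDuty; the documented process is the manual one in the web UI.

```python
import copy


def remove_schedule(policy: dict, schedule_id: str) -> dict:
    """Return a copy of `policy` with the schedule removed from step 1."""
    updated = copy.deepcopy(policy)
    first_rule = updated["escalation_rules"][0]
    first_rule["targets"] = [
        target for target in first_rule["targets"]
        if not (target["type"] == "schedule_reference"
                and target["id"] == schedule_id)
    ]
    return updated


def add_schedule(policy: dict, schedule_id: str) -> dict:
    """Return a copy of `policy` with the schedule present in step 1."""
    updated = copy.deepcopy(policy)
    targets = updated["escalation_rules"][0]["targets"]
    already_there = any(
        target["type"] == "schedule_reference" and target["id"] == schedule_id
        for target in targets
    )
    if not already_there:
        targets.append({"id": schedule_id, "type": "schedule_reference"})
    return updated


# Hypothetical policy with two schedules in step 1.
example_policy = {
    "name": "GOV.UK Notify - Warnings",
    "escalation_rules": [
        {
            "escalation_delay_in_minutes": 5,
            "targets": [
                {"id": "PSCHED1", "type": "schedule_reference"},
                {"id": "PSCHED2", "type": "schedule_reference"},
            ],
        }
    ],
}

before_holiday = remove_schedule(example_policy, "PSCHED1")
after_holiday = add_schedule(before_holiday, "PSCHED1")
```

Both helpers return a modified copy and leave the input untouched, which makes it easy to compare the policy before and after the change.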