10: DESIGN A NOTIFICATION SYSTEM - swchen1234/systemDesign GitHub Wiki

Step 1 - Understand the problem and establish design scope

  • What types of notifications does the system support?
  • Is it a real-time system? Which devices are supported?
  • What triggers notifications?
  • Will users be able to opt-out?
  • How many notifications are sent out each day?

Step 2 - Propose high-level design and get buy-in

Three components:

    1. Different types of notifications
    2. Contact info gathering flow
    3. Notification sending/receiving flow

1. Different types of notifications

  • iOS push notification
  • Android push notification
  • SMS message
  • Email

All four types follow the same structure. Taking iOS as the example, the components are:

  • Provider: A provider builds and sends notification requests to third-party services (e.g. Apple Push Notification Service (APNS)).
  • Device token: This is a unique identifier used for sending push notifications.
  • Payload: This is a JSON dictionary that contains a notification’s payload.
  • Third-party service: Each notification type has its own third-party service.
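
As an illustration of the provider, device token, and payload described above, here is a minimal sketch of how a provider might assemble an iOS request. The `aps` dictionary with `alert`, `badge`, and `sound` keys follows Apple's documented payload layout; the helper name `build_ios_payload`, the sample token, and the message text are hypothetical.

```python
import json

def build_ios_payload(title: str, body: str, badge: int = 1) -> str:
    """Return the JSON payload for one iOS push notification."""
    payload = {
        "aps": {
            "alert": {"title": title, "body": body},
            "badge": badge,
            "sound": "default",
        }
    }
    return json.dumps(payload)

# Hypothetical device token; real tokens are issued to the app by APNS.
device_token = "a1b2c3d4e5f6"

# The provider pairs the token with the payload before calling the third-party service.
request = {
    "device_token": device_token,
    "payload": build_ios_payload("Game Request", "Bob wants to play chess"),
}
```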

2. Contact info gathering flow

  • When a user installs our app or signs up for the first time, API servers collect the user's contact info and store it in the database.
  • Database storage can be split into multiple tables that hold mobile device tokens, phone numbers, email addresses, etc.
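
The table split could look roughly like the sketch below, which uses an in-memory SQLite database. The table names (`users`, `devices`), the column names, and the assumption that one user may own several devices are illustrative choices, not something the notes prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id      INTEGER PRIMARY KEY,
    email        TEXT,
    phone_number TEXT
);
CREATE TABLE devices (
    device_id    INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    device_token TEXT NOT NULL,
    platform     TEXT NOT NULL  -- assumed values: 'ios' or 'android'
);
""")

def register_user(email, phone_number):
    """Store contact info at sign-up and return the new user ID."""
    cur = conn.execute(
        "INSERT INTO users (email, phone_number) VALUES (?, ?)",
        (email, phone_number),
    )
    return cur.lastrowid

def register_device(user_id, device_token, platform):
    """Store a device token when the app is installed on a device."""
    conn.execute(
        "INSERT INTO devices (user_id, device_token, platform) VALUES (?, ?, ?)",
        (user_id, device_token, platform),
    )

def device_tokens(user_id):
    """Return all device tokens registered for a user, oldest first."""
    rows = conn.execute(
        "SELECT device_token FROM devices WHERE user_id = ? ORDER BY device_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```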

3. Notification sending/receiving flow

  • Service 1 to N: A service can be a micro-service, a cron job, or a distributed system that triggers notification sending events.
  • Notification system: The notification system is the centerpiece of sending/receiving notifications.
  • Third-party services: Extensibility matters when integrating with them. Good extensibility means a flexible system in which a third-party service can easily be plugged in or unplugged. Another important consideration is that a third-party service might be unavailable in new markets or in the future.
  • iOS, Android, SMS, Email: The channels through which users receive notifications.

Three problems are identified in this design:

  • Single point of failure (SPOF): A single notification server means SPOF.
  • Hard to scale: The notification system handles everything related to push notifications in one server. It is challenging to scale databases, caches, and different notification processing components independently.
  • Performance bottleneck: Processing and sending notifications can be resource intensive. For example, constructing HTML pages and waiting for responses from third-party services could take time. Handling everything in one system can overload it, especially during peak hours.

High-level design (improved)

  • Move the database and cache out of the notification server.
  • Add more notification servers and set up automatic horizontal scaling.
  • Introduce message queues to decouple the system components.

Notification servers:

  • Provide APIs for services to send notifications. Those APIs are only accessible internally or by verified clients to prevent spam.
  • Carry out basic validations to verify emails, phone numbers, etc.
  • Query the database or cache to fetch data needed to render a notification.
  • Put notification data into message queues for parallel processing.
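
The basic validation step could be sketched as follows. The request shape (`channel` and `to` fields), the function name `validate_request`, and both regular expressions are assumptions for illustration; the email pattern is deliberately loose, and the phone pattern assumes E.164-style numbers.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+[1-9]\d{7,14}$")  # '+' followed by 8 to 15 digits

KNOWN_CHANNELS = ("ios", "android", "sms", "email")

def validate_request(req: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the request is acceptable."""
    errors = []
    channel = req.get("channel")
    recipient = req.get("to", "")
    if channel not in KNOWN_CHANNELS:
        errors.append("unknown channel")
    elif channel == "email" and not EMAIL_RE.match(recipient):
        errors.append("invalid email address")
    elif channel == "sms" and not PHONE_RE.match(recipient):
        errors.append("invalid phone number")
    return errors
```

Rejecting malformed requests here keeps them out of the message queues, so workers do not spend third-party calls on notifications that can never be delivered.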

Cache: User info, device info, and notification templates are cached.

DB: It stores data about users, notifications, settings, etc.

Message queues: They remove dependencies between components and serve as buffers when high volumes of notifications are to be sent out. Each notification type is assigned a distinct message queue, so an outage in one third-party service will not affect other notification types.

Workers: Workers are a set of servers that pull notification events from message queues and send them to the corresponding third-party services.

Put them together:

  1. A service calls APIs provided by notification servers to send notifications.
  2. Notification servers fetch metadata such as user info, device token, and notification setting from the cache or database.
  3. A notification event is sent to the corresponding queue for processing. For instance, an iOS push notification event is sent to the iOS PN queue.
  4. Workers pull notification events from message queues.
  5. Workers send notifications to third-party services.
  6. Third-party services send notifications to user devices.
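
The flow above can be sketched in a single process, with one queue per notification type. Everything here is a stand-in: `queue.Queue` replaces a real message broker, the `sent` list replaces calls to third-party services, and the function names and event fields are hypothetical.

```python
import queue

CHANNELS = ("ios", "android", "sms", "email")
queues = {channel: queue.Queue() for channel in CHANNELS}  # one queue per notification type
sent = []  # stands in for calls to third-party services

def send_notification(event: dict) -> None:
    """Notification server side: route the event to the queue for its channel."""
    queues[event["channel"]].put(event)

def run_worker(channel: str) -> int:
    """Worker side: drain one channel's queue and hand each event to its third-party service."""
    handled = 0
    q = queues[channel]
    while not q.empty():
        event = q.get()
        sent.append((channel, event["to"], event["body"]))  # placeholder for the third-party call
        q.task_done()
        handled += 1
    return handled

send_notification({"channel": "ios", "to": "token-123", "body": "Your order shipped"})
send_notification({"channel": "email", "to": "user@example.com", "body": "Your order shipped"})
run_worker("ios")  # only the iOS queue is drained; the email event stays queued
```

Because each channel has its own queue, a stalled email worker leaves the email event waiting without delaying the iOS notification.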

Step 3 - Design deep dive

Reliability

How to prevent data loss?

To prevent data loss, the notification system persists notification data in a notification log database and implements a retry mechanism, so a notification that fails to send can be retried.
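
A minimal sketch of persistence plus retry, assuming a dictionary in place of the notification log database and a `ConnectionError` as the failure signal from the third-party call. The function name `send_with_retry` and the record fields are hypothetical.

```python
notification_log = {}  # stands in for the notification log database: event_id -> record

def send_with_retry(event_id, send, max_attempts=3):
    """Record the event in the log, then call `send` until it succeeds or attempts run out."""
    record = notification_log.setdefault(event_id, {"status": "pending", "attempts": 0})
    while record["attempts"] < max_attempts:
        record["attempts"] += 1
        try:
            send()
            record["status"] = "sent"
            return True
        except ConnectionError:
            continue  # a real worker would wait (back off) before the next attempt
    record["status"] = "failed"  # stays in the log so it can be retried or alerted on later
    return False
```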

Will users receive the same notification more than once?

When a notification event first arrives, we check whether it has been seen before by looking up its event ID. If it has been seen before, it is discarded. Otherwise, we send out the notification.
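
The event ID check can be sketched as below. The in-memory set is an assumption made for brevity; with several workers the seen IDs would need to live in a shared store.

```python
seen_event_ids = set()  # assumed in-memory store; multiple workers would need a shared one

def should_send(event_id: str) -> bool:
    """Return True the first time an event ID is seen and False for any repeat."""
    if event_id in seen_event_ids:
        return False
    seen_event_ids.add(event_id)
    return True
```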

Additional components and considerations

  • Notification template: Provides preformatted templates to simplify writing notifications.
  • Notification setting: Defined by each user.
  • Rate limiting: Limit the number of notifications a user receives.
  • Retry mechanism: If retries keep failing, an alert will be sent out to developers.
  • Security in push notifications: appKey and appSecret are used to secure push notification APIs [6]. Only authenticated or verified clients are allowed to send push notifications using our APIs.
  • Monitor queued notifications: A key metric to monitor is the total number of queued notifications.
  • Events tracking: Analyze notification click-through and open rates to better understand users.
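
For the rate limiting item, one possible approach is a per-user sliding window, sketched below. The class name `UserRateLimiter`, the sliding-window choice, and passing the current time in as `now` (to keep the example deterministic) are all assumptions; the notes do not specify an algorithm.

```python
from collections import defaultdict, deque

class UserRateLimiter:
    """Allow at most `limit` notifications per user within a sliding `window` of seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self._sent_at = defaultdict(deque)  # user_id -> timestamps of recent sends

    def allow(self, user_id: str, now: float) -> bool:
        timestamps = self._sent_at[user_id]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

A worker would call `allow(user_id, time.time())` before sending and skip or defer the notification when it returns `False`.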

Updated design