PFC Watchdog Design - mykolaf/SONiC GitHub Wiki

PFC Watchdog in SONiC

High Level Design Document

Rev 0.1


Revision
Rev | Date | Author         | Change Description
0.1 |      | Marian Pritsak | Initial version

About this Manual

This document provides general information about the implementation of the PFC Watchdog feature in SONiC. The PFC watchdog is designed to detect and mitigate PFC storms received on each port. PFC pause frames are used in lossless Ethernet to stop the link partner from sending packets. This back-pressure can propagate through the whole network and cause it to stop forwarding traffic. The PFC watchdog detects abnormal back-pressure caused by receiving excessive PFC pause frames. It mitigates the situation by temporarily disabling the pause caused by PFC. The PFC watchdog has three functional blocks: detection, mitigation and restoration.

Scope

This document describes the high level design of the PFC WD feature.

Definitions/Abbreviation

Table 2: Abbreviations
Definitions/Abbreviation | Description
PFC                      | Priority Flow Control
WD                       | Watchdog
ACL                      | Access Control List

1 PFC WD Subsystem Requirements Overview

1.1 Functional requirements

  • Support PAUSE duration counters (M)
  • Support egress ACL for drop mitigation action (S)

2 Modules Design

2.1 App DB

2.1.1 PFC WD Table

; Defines PFC WD configuration on a port
key                              = PFC_WD_TABLE:ifname           ; configuration for watchdog on port
; field                          = value
state                            = "enabled"/"disabled"          ; state of WD as configured by user
t0                               = 1*3DIGIT                      ; pause duration in msecs after which
                                                                 ; the queue is considered to be in PFC
                                                                 ; storm and the watchdog action is triggered.
t1                               = 1*3DIGIT                      ; time in msecs after which a queue
                                                                 ; in PFC storm state is restored
                                                                 ; if no pause frames were received.
action                           = "drop"/"forward"              ; action taken by watchdog in case
                                                                 ; of PFC storm detection.
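The following minimal C++ sketch validates one PFC_WD_TABLE entry, read as a field/value map. It is one way the schema above could be checked. The struct, enum and function names are hypothetical. The bounds follow the schema literally: 1*3DIGIT allows one to three decimal digits, so t0 and t1 are capped at 999 ms.

```cpp
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Hypothetical in-memory view of one PFC_WD_TABLE:<ifname> entry.
enum class PfcWdAct { Drop, Forward };

struct PfcWdPortConfig {
    bool enabled;
    unsigned t0Ms;  // detection time
    unsigned t1Ms;  // restoration time
    PfcWdAct action;
};

// Parses a 1*3DIGIT value: one to three decimal digits (0..999).
static std::optional<unsigned> parseMs(const std::string& s) {
    if (s.empty() || s.size() > 3) return std::nullopt;
    unsigned v = 0;
    for (char c : s) {
        if (!std::isdigit(static_cast<unsigned char>(c))) return std::nullopt;
        v = v * 10 + static_cast<unsigned>(c - '0');
    }
    return v;
}

// Returns std::nullopt if a field is missing or violates the schema.
std::optional<PfcWdPortConfig> parsePfcWdEntry(const std::map<std::string, std::string>& fields) {
    auto get = [&](const char* name) -> const std::string* {
        auto it = fields.find(name);
        return it == fields.end() ? nullptr : &it->second;
    };
    const std::string* state = get("state");
    const std::string* t0 = get("t0");
    const std::string* t1 = get("t1");
    const std::string* action = get("action");
    if (!state || !t0 || !t1 || !action) return std::nullopt;

    PfcWdPortConfig cfg{};
    if (*state == "enabled") cfg.enabled = true;
    else if (*state == "disabled") cfg.enabled = false;
    else return std::nullopt;

    if (*action == "drop") cfg.action = PfcWdAct::Drop;
    else if (*action == "forward") cfg.action = PfcWdAct::Forward;
    else return std::nullopt;

    auto t0v = parseMs(*t0);
    auto t1v = parseMs(*t1);
    if (!t0v || !t1v) return std::nullopt;
    cfg.t0Ms = *t0v;
    cfg.t1Ms = *t1v;
    return cfg;
}
```

Read literally, this rule rejects a restoration time of 5000 ms, such as the one in the minigraph sample in section 2.6.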

2.2 PFC_WD DB

Orchagent can use different strategies or criteria for storm detection, so it needs to tell syncd which counters to poll.

2.2.1 PFC WD Queue Table

; Defines schema for queues that must be polled by syncd, and corresponding counters
key                            = "PFC_WD_STATE:"queueId         ; WD queue entry
; field                        = value
PORT_COUNTER_ID_LIST           = 1*64VCHAR                      ; ',' separated list of port counter IDs to poll
QUEUE_COUNTER_ID_LIST          = 1*64VCHAR                      ; ',' separated list of queue counter IDs to poll
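This small sketch shows how orchagent might serialize one entry of this table. It assumes the counter IDs are already available as strings, and the helper names are hypothetical. It enforces the 1*64VCHAR rule, where VCHAR is printable ASCII excluding space (%x21-7E) as defined in RFC 5234.

```cpp
#include <optional>
#include <string>
#include <vector>

// Joins counter IDs into the ','-separated form used by PORT_COUNTER_ID_LIST
// and QUEUE_COUNTER_ID_LIST. Returns std::nullopt if the result would break
// the 1*64VCHAR rule (empty, longer than 64 chars, or non-VCHAR characters).
std::optional<std::string> joinCounterIds(const std::vector<std::string>& ids) {
    std::string out;
    for (const auto& id : ids) {
        if (id.empty()) return std::nullopt;  // would produce ",," or a trailing ','
        if (!out.empty()) out += ',';
        out += id;
    }
    if (out.empty() || out.size() > 64) return std::nullopt;
    for (char c : out) {
        if (c < 0x21 || c > 0x7E) return std::nullopt;  // VCHAR = %x21-7E
    }
    return out;
}

// Builds the PFC WD queue table key for a queue identifier.
std::string pfcWdQueueKey(const std::string& queueId) {
    return "PFC_WD_STATE:" + queueId;
}
```

The 64-character bound is tight if counter names rather than numeric IDs are stored. Two SAI queue counter names already take 58 characters.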

2.3 COUNTERS DB

2.3.1 COUNTERS table

; Defines schema for queue counters that are updated by PFC WD
key                                            = "COUNTERS:"queueId     ; WD queue entry
; field                                        = value
PFC_WD_QUEUE_STATS_STORM_DETECTED              = 1*4DIGIT               ; number of detected storms (deadlocks)
PFC_WD_QUEUE_STATS_STORM_RESTORED              = 1*4DIGIT               ; number of restorations
PFC_WD_QUEUE_STATS_TX_PACKETS                  = 1*20DIGIT              ; total packets transmitted during storms
PFC_WD_QUEUE_STATS_TX_DROPPED_PACKETS          = 1*20DIGIT              ; total Tx packets dropped due to storms
PFC_WD_QUEUE_STATS_RX_PACKETS                  = 1*20DIGIT              ; total packets received during storms
PFC_WD_QUEUE_STATS_RX_DROPPED_PACKETS          = 1*20DIGIT              ; total Rx packets dropped due to storms
PFC_WD_QUEUE_STATS_TX_PACKETS_LAST             = 1*20DIGIT              ; packets transmitted during last storm
PFC_WD_QUEUE_STATS_TX_DROPPED_PACKETS_LAST     = 1*20DIGIT              ; Tx packets dropped due to last storm
PFC_WD_QUEUE_STATS_RX_PACKETS_LAST             = 1*20DIGIT              ; packets received during last storm
PFC_WD_QUEUE_STATS_RX_DROPPED_PACKETS_LAST     = 1*20DIGIT              ; Rx packets dropped due to last storm
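The counters above keep running totals separately from values for the last storm only. The C++ sketch below shows one way to maintain them. The *_LAST fields are reset when a new storm is detected, and every per-poll delta is added to both sets. It assumes those deltas are already known for a stormed queue; how they are measured is platform specific. All names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical per-queue mirror of the PFC_WD_QUEUE_STATS_* fields.
struct PfcWdQueueStats {
    uint64_t stormDetected = 0;
    uint64_t stormRestored = 0;
    // Totals over all storms.
    uint64_t txPackets = 0, txDroppedPackets = 0, rxPackets = 0, rxDroppedPackets = 0;
    // Values for the most recent storm only.
    uint64_t txPacketsLast = 0, txDroppedPacketsLast = 0, rxPacketsLast = 0, rxDroppedPacketsLast = 0;
};

// Packet deltas observed on a stormed queue during one polling interval.
struct StormDelta {
    uint64_t tx = 0, txDropped = 0, rx = 0, rxDropped = 0;
};

void onStormDetected(PfcWdQueueStats& s) {
    ++s.stormDetected;
    // A new storm starts, so the *_LAST counters describe only this storm.
    s.txPacketsLast = s.txDroppedPacketsLast = s.rxPacketsLast = s.rxDroppedPacketsLast = 0;
}

void onStormPoll(PfcWdQueueStats& s, const StormDelta& d) {
    s.txPackets += d.tx;               s.txPacketsLast += d.tx;
    s.txDroppedPackets += d.txDropped; s.txDroppedPacketsLast += d.txDropped;
    s.rxPackets += d.rx;               s.rxPacketsLast += d.rx;
    s.rxDroppedPackets += d.rxDropped; s.rxDroppedPacketsLast += d.rxDropped;
}

// The *_LAST values are left untouched, so they keep describing the storm that just ended.
void onStormRestored(PfcWdQueueStats& s) { ++s.stormRestored; }
```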

2.4 Criteria for storm detection

As different vendors support different counters, every ASIC vendor must be able to decide how to tell whether a queue is stormed. For instance, one possible criterion, which uses the pause duration counter, is:

   (SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES > 0
    && SAI_QUEUE_STAT_PACKETS.current - SAI_QUEUE_STAT_PACKETS.last == 0
    && SAI_PORT_STAT_PFC_[queue]_RX_PKTS.current - SAI_PORT_STAT_PFC_[queue]_RX_PKTS.last > 0)
   ||
   (SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES == 0
    && SAI_QUEUE_STAT_PACKETS.current - SAI_QUEUE_STAT_PACKETS.last == 0
    && SAI_PORT_STAT_PFC_[queue]_RX_PAUSE_DURATION.current - SAI_PORT_STAT_PFC_[queue]_RX_PAUSE_DURATION.last >= t0 * delta)
   // delta is the fraction of t0 during which the queue must have been paused (e.g. 0.9)

These criteria are coded in a .lua script for the Redis DB, which orchagent calls periodically according to the configured timers.
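For illustration, the criterion above can also be written in C++ rather than Lua, which makes its two branches explicit. This is only a sketch: the QueueSample struct and the isStormed function are hypothetical. It assumes the pause duration counter and t0 use the same time unit. It also assumes no counter went backwards between the two polls (see 2.7.1).

```cpp
#include <cstdint>

// Counters read for one queue at one poll (hypothetical struct; fields
// correspond to the SAI counters used in the criterion above).
struct QueueSample {
    uint64_t occupancyBytes;      // SAI_QUEUE_STAT_CURR_OCCUPANCY_BYTES (gauge)
    uint64_t txPackets;           // SAI_QUEUE_STAT_PACKETS
    uint64_t pfcRxPackets;        // SAI_PORT_STAT_PFC_[queue]_RX_PKTS
    uint64_t pfcRxPauseDuration;  // SAI_PORT_STAT_PFC_[queue]_RX_PAUSE_DURATION
};

// Returns true if the queue looks stormed between two consecutive polls.
// t0 must be in the same unit as the pause duration counter (assumption);
// delta is the fraction of t0 the queue must have been paused, e.g. 0.9.
bool isStormed(const QueueSample& cur, const QueueSample& last, double t0, double delta) {
    // Both branches require that the queue transmitted nothing.
    if (cur.txPackets != last.txPackets) return false;

    if (cur.occupancyBytes > 0) {
        // Data is waiting, nothing leaves, and pause frames keep arriving.
        return cur.pfcRxPackets > last.pfcRxPackets;
    }
    // Queue is empty: decide by how long the priority stayed paused.
    const double paused = static_cast<double>(cur.pfcRxPauseDuration - last.pfcRxPauseDuration);
    return paused >= t0 * delta;
}
```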

2.5 Action Handlers

Since different platforms might choose different mitigation and restoration actions, the PfcWdAction class defines a common interface for these handlers:

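#include "sai.h"  // sai_status_t, sai_object_id_t

class PfcWdAction
{
public:
   virtual ~PfcWdAction() = default;
   virtual sai_status_t MitigateHandler(sai_object_id_t port, uint32_t queueId) = 0;  // queue entered storm
   virtual sai_status_t RestoreHandler(sai_object_id_t port, uint32_t queueId) = 0;   // queue left storm
};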

Two actions are currently defined, drop and forward. Each provides its own mitigation (mark) and restoration (unmark) handler.

2.6 Sections in minigraph

Watchdog settings are specified per port in the device minigraph, under DeviceInfos/DeviceInfo/EthernetInterfaces/EthernetInterface. All configuration goes under the DeviceInfos/DeviceInfo/EthernetInterfaces/EthernetInterface/PfcWatchdog tag. If this tag is absent, the watchdog is considered disabled for that port. The PfcWatchdog tag has the following children:

  • Action: mitigation action to be taken in case of a pause storm.
  • DetectionTime: time interval for storm detection.
  • RestorationTime: time interval for restoring a queue from the storm.

Sample of the minigraph with the PFC WD settings for port Ethernet1:
...
<DeviceInfos>
...
	<DeviceInfo>
		<EthernetInterfaces>
			...
			<EthernetInterface>
				<InterfaceName>Ethernet1</InterfaceName>
				...
				<PfcWatchdog>
					<Action>drop</Action>
					<DetectionTime>200</DetectionTime>
					<RestorationTime>5000</RestorationTime>
				</PfcWatchdog>
				...
			</EthernetInterface>
			...
		</EthernetInterfaces>
	</DeviceInfo>
</DeviceInfos>
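The sketch below shows the mapping from this minigraph element to the App DB entry of section 2.1.1. It assumes DetectionTime maps to t0 and RestorationTime maps to t1. It also assumes that a port without PfcWatchdog gets state "disabled". XML parsing is left out, and the struct and function names are hypothetical. Values are copied through unchanged, so the sample RestorationTime of 5000 would still have to satisfy the 1*3DIGIT rule given for t1.

```cpp
#include <map>
#include <optional>
#include <string>

// Values read from one EthernetInterface/PfcWatchdog element (hypothetical).
struct MinigraphPfcWd {
    std::string action;           // <Action>
    std::string detectionTime;    // <DetectionTime>
    std::string restorationTime;  // <RestorationTime>
};

// Maps one port's minigraph settings to PFC_WD_TABLE:<ifname> fields.
// std::nullopt stands for an absent <PfcWatchdog> tag, i.e. a disabled watchdog.
std::map<std::string, std::string> toPfcWdTableFields(const std::optional<MinigraphPfcWd>& wd) {
    if (!wd) return {{"state", "disabled"}};
    return {
        {"state", "enabled"},
        {"t0", wd->detectionTime},
        {"t1", wd->restorationTime},
        {"action", wd->action},
    };
}

std::string pfcWdTableKey(const std::string& ifname) { return "PFC_WD_TABLE:" + ifname; }
```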

2.7 Events for Resetting PFC WD

A set of external events can fully or partially invalidate the current state of the watchdog.

2.7.1 Counters Reset

If counters are reset, the PFC WD state is undefined and the watchdog should skip one polling interval. SONiC does not use the API that resets counters on the ASIC, so this event is ignored.

2.7.2 Port Going Down

If a port's state changes to DOWN, all of its queues that are marked as stormed are restored.

2.7.3 Queue Reconfiguration or Removal

Queue configuration changes must go through the PFC WD proxy so that the watchdog can update its internal state. If PFC is disabled on a queue that is marked as stormed, the queue is restored.
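The bookkeeping behind these two events can be sketched as a set of stormed (port, queue) pairs. The class name and the restore callback are hypothetical. The callback stands in for the restore handler of the configured action.

```cpp
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <utility>

// Hypothetical tracker of queues currently marked as stormed.
class StormedQueues {
public:
    using RestoreFn = std::function<void(const std::string& port, uint32_t queue)>;

    explicit StormedQueues(RestoreFn restore) : restore_(std::move(restore)) {}

    void markStormed(const std::string& port, uint32_t queue) { stormed_.insert({port, queue}); }
    bool isStormed(const std::string& port, uint32_t queue) const {
        return stormed_.count({port, queue}) != 0;
    }

    // Port went DOWN: restore every stormed queue on that port.
    void onPortDown(const std::string& port) {
        // Pairs are ordered by port first, so this port's queues are contiguous.
        auto it = stormed_.lower_bound({port, 0});
        while (it != stormed_.end() && it->first == port) {
            restore_(it->first, it->second);
            it = stormed_.erase(it);
        }
    }

    // PFC disabled on a queue: restore it if it was stormed.
    void onPfcDisabled(const std::string& port, uint32_t queue) {
        auto it = stormed_.find({port, queue});
        if (it == stormed_.end()) return;
        restore_(port, queue);
        stormed_.erase(it);
    }

private:
    std::set<std::pair<std::string, uint32_t>> stormed_;
    RestoreFn restore_;
};
```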

2.8 CLI

To let the user set and view PFC WD configuration and statistics, the pfcwatchdog CLI tool should provide the following functionality:

  • Show watchdog configuration (per port).
  • Show watchdog statistics (per port/queue).
  • Enable watchdog on the specified port(s).
  • Disable watchdog on the specified port(s).

Every command should accept an optional list of ports to apply to. If no list is given, the command applies to all ports.
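A sketch of this port-selection rule, with a hypothetical helper name. An empty list means "all ports". As a design assumption not stated above, the whole request is rejected if any named port is unknown, instead of that port being skipped silently.

```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// Resolves the ports a pfcwatchdog command applies to (hypothetical helper).
// Returns std::nullopt if any requested port does not exist.
std::optional<std::vector<std::string>> selectPorts(const std::vector<std::string>& allPorts,
                                                    const std::vector<std::string>& requested) {
    if (requested.empty()) return allPorts;  // no filter: every port
    for (const auto& p : requested) {
        if (std::find(allPorts.begin(), allPorts.end(), p) == allPorts.end()) {
            return std::nullopt;
        }
    }
    return requested;
}
```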

3 Flows

3.1 Reading SAI Counters

All port counters are stored in COUNTERS DB. To get all the information it needs from hardware, the watchdog extends this by making syncd read additional values from SAI: it subscribes queues for polling in the PFC_WD DB. Upon every counters update, syncd calls the appropriate .lua script to check whether any queue changed its state, and notifies orchagent of every change.
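The per-queue logic behind this flow can be modelled as a two-state machine. In this C++ sketch, all names are hypothetical. The detection verdict comes from the criterion in section 2.4, and restoration follows the t1 rule of PFC_WD_TABLE: no pause frames received for t1 ms. The notify callback stands in for the notification sent to orchagent. Where exactly the restoration timer runs is not specified above, so placing it here is an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

enum class QueueState { Operational, Stormed };

// Hypothetical per-queue state machine, run after every counters update.
class QueueStateTracker {
public:
    using Notify = std::function<void(const std::string& queueId, QueueState newState)>;

    QueueStateTracker(uint64_t t1Ms, Notify notify) : t1Ms_(t1Ms), notify_(std::move(notify)) {}

    void onPoll(const std::string& queueId, uint64_t nowMs, bool stormCriterionMet, bool pauseFramesSeen) {
        Entry& e = queues_[queueId];  // unknown queues start as Operational
        if (e.state == QueueState::Operational) {
            if (stormCriterionMet) {
                e.state = QueueState::Stormed;
                e.lastPauseMs = nowMs;
                notify_(queueId, e.state);  // only state changes are reported
            }
            return;
        }
        if (pauseFramesSeen) {
            e.lastPauseMs = nowMs;  // storm still ongoing: restart the t1 window
        } else if (nowMs - e.lastPauseMs >= t1Ms_) {
            e.state = QueueState::Operational;
            notify_(queueId, e.state);
        }
    }

private:
    struct Entry {
        QueueState state = QueueState::Operational;
        uint64_t lastPauseMs = 0;
    };
    uint64_t t1Ms_;
    Notify notify_;
    std::map<std::string, Entry> queues_;
};
```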

3.2 Watchdog Main Thread

Orchagent subscribes to syncd notifications about queue deadlock. When it is notified that a queue became locked, it applies the configured action (drop or forward). When the opposite notification is received, it restores the queue.
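The orchagent side can be sketched as one action object per port, invoked for each notification. To compile without SAI headers, the interface mirrors PfcWdAction from section 2.5 but takes a port name instead of a sai_object_id_t. Every name here is hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Stand-in for PfcWdAction, keyed by port name instead of a SAI object ID.
class PfcWdActionIface {
public:
    virtual ~PfcWdActionIface() = default;
    virtual bool mitigate(const std::string& port, uint32_t queueId) = 0;
    virtual bool restore(const std::string& port, uint32_t queueId) = 0;
};

enum class StormEvent { Storm, Restore };

// Hypothetical dispatcher holding the configured action for each port.
class PfcWdDispatcher {
public:
    void setAction(const std::string& port, std::shared_ptr<PfcWdActionIface> action) {
        actions_[port] = std::move(action);
    }

    // Returns false if no action is configured for the port or the handler failed.
    bool onNotification(const std::string& port, uint32_t queueId, StormEvent ev) {
        auto it = actions_.find(port);
        if (it == actions_.end()) return false;
        return ev == StormEvent::Storm ? it->second->mitigate(port, queueId)
                                       : it->second->restore(port, queueId);
    }

private:
    std::map<std::string, std::shared_ptr<PfcWdActionIface>> actions_;
};
```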

3.3 WD Drop Action

3.3.1 Detect Handler

The mitigation handler disables PFC on the marked queue and sets its reserved buffer to 0.

3.3.2 Restore Handler

The restoration handler returns the reserved buffer to its initial value and re-enables PFC on the unmarked queue.
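The drop handlers can be sketched against an in-memory model of the queue configuration. This is not SAI code: QueueConfig and PfcWdDropAction are hypothetical, and one action instance is assumed per port. The original reserved buffer size is saved at mitigation time so that restoration can return it.

```cpp
#include <cstdint>
#include <map>

// Hypothetical model of a queue's PFC-related configuration. A real handler
// would apply these changes through SAI; no SAI calls are shown here.
struct QueueConfig {
    bool pfcEnabled = true;
    uint64_t reservedBufferBytes = 0;
};

class PfcWdDropAction {
public:
    // Mitigation: disable PFC and zero the reserved buffer, remembering its size.
    void mitigate(uint32_t queueId, QueueConfig& q) {
        savedReserved_.emplace(queueId, q.reservedBufferBytes);  // keeps the first saved value
        q.reservedBufferBytes = 0;
        q.pfcEnabled = false;
    }

    // Restoration: put the reserved buffer back, then re-enable PFC.
    bool restore(uint32_t queueId, QueueConfig& q) {
        auto it = savedReserved_.find(queueId);
        if (it == savedReserved_.end()) return false;  // queue was never mitigated
        q.reservedBufferBytes = it->second;
        q.pfcEnabled = true;
        savedReserved_.erase(it);
        return true;
    }

private:
    std::map<uint32_t, uint64_t> savedReserved_;  // queueId -> original reserved size
};
```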

3.4 WD Forward Action

3.4.1 Detect Handler

The mitigation handler disables PFC on the stormed queue. The queue then no longer honors pause frames from the link partner and forwards all packets.

3.4.2 Restore Handler

The restore handler re-enables PFC on the queue so that it continues to work in lossless mode as configured by the user.
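The forward handlers can be sketched on a similar in-memory model (hypothetical names, one instance per port). Only the PFC setting is touched and buffers are left alone. The user's original setting is saved so that restoration returns exactly to it.

```cpp
#include <cstdint>
#include <map>

// Hypothetical model of the only setting the forward action changes.
struct QueuePfc {
    bool pfcEnabled = true;
};

class PfcWdForwardAction {
public:
    // Mitigation: ignore pause frames so the queue keeps transmitting.
    void mitigate(uint32_t queueId, QueuePfc& q) {
        saved_.emplace(queueId, q.pfcEnabled);  // remember the user's setting
        q.pfcEnabled = false;
    }

    // Restoration: return to the user-configured (lossless) setting.
    bool restore(uint32_t queueId, QueuePfc& q) {
        auto it = saved_.find(queueId);
        if (it == saved_.end()) return false;  // queue was never mitigated
        q.pfcEnabled = it->second;
        saved_.erase(it);
        return true;
    }

private:
    std::map<uint32_t, bool> saved_;  // queueId -> original pfcEnabled
};
```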
