RoCEv2 Configuration
This guide has been tested only with Mellanox ConnectX-4 NICs.
Server Configuration
-
Download the network driver from the Mellanox website. Version 4.4-1.0.0.0 was the latest at the time of writing.
# wget http://www.mellanox.com/downloads/ofed/MLNX_OFED-4.4-1.0.0.0/MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu16.04-x86_64.tgz
-
Extract the downloaded .tgz file and run the installation script.
# tar xzf MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu16.04-x86_64.tgz
# ./MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu16.04-x86_64/mlnxofedinstall
-
Restart the driver.
# service openibd restart
-
Follow the Mellanox guide for auto-configuration of PFC via LLDP DCBX. Replace /dev/mst/mt4115_pciconf0 and ens21f0 with your own MST device and network interface name, respectively. Refer to the guide for a detailed explanation.
# mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI module - Success
Loading MST PCI configuration module - Success
Create devices
Unloading MST PCI module (unused) - Success
# mst status
MST modules:
------------
    MST PCI module is not loaded
    MST PCI configuration module loaded
MST devices:
------------
/dev/mst/mt4115_pciconf0 - PCI configuration cycles access.
                           domain:bus:dev.fn=0000:82:00.0 addr.reg=88 data.reg=92
                           Chip revision is: 00
# mlxconfig -d /dev/mst/mt4115_pciconf0 set LLDP_NB_DCBX_P1=TRUE \
    LLDP_NB_TX_MODE_P1=2 LLDP_NB_RX_MODE_P1=2 LLDP_NB_DCBX_P2=TRUE \
    LLDP_NB_TX_MODE_P2=2 LLDP_NB_RX_MODE_P2=2
Device #1:
----------
Device type:    ConnectX4
PCI device:     /dev/mst/mt4115_pciconf0
Configurations:                  Next Boot       New
         LLDP_NB_DCBX_P1         False(0)        True(1)
         LLDP_NB_TX_MODE_P1      OFF(0)          ALL(2)
Apply new Configuration? ? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
# mlxfwreset -d /dev/mst/mt4115_pciconf0 --level 3 reset
Requested reset level for device, /dev/mst/mt4115_pciconf0:
3: Driver restart and PCI reset
Continue with reset?[y/N] y
-I- Sending Reset Command To Fw             -Done
-I- Stopping Driver                         -Done
-I- Resetting PCI                           -Done
-I- Starting Driver                         -Done
-I- Restarting MST                          -Done
-I- FW was loaded successfully.
# mlnx_qos -i ens21f0 -d fw --trust dscp
DCBX mode: Firmware controlled
Priority trust state: dscp
dscp2prio mapping:
    prio:0 dscp:07,06,05,04,03,02,01,00,
    prio:1 dscp:15,14,13,12,11,10,09,08,
    prio:2 dscp:23,22,21,20,19,18,17,16,
    prio:3 dscp:31,30,29,28,27,26,25,24,
    prio:4 dscp:39,38,37,36,35,34,33,32,
    prio:5 dscp:47,46,45,44,43,42,41,40,
    prio:6 dscp:55,54,53,52,51,50,49,48,
    prio:7 dscp:63,62,61,60,59,58,57,56,
Cable len: 7
PFC configuration:
    priority    0   1   2   3   4   5   6   7
    enabled     0   0   0   0   0   0   0   0
tc: 0 ratelimit: unlimited, tsa: vendor
    priority:  1
tc: 1 ratelimit: unlimited, tsa: vendor
    priority:  0
tc: 2 ratelimit: unlimited, tsa: vendor
    priority:  2
tc: 3 ratelimit: unlimited, tsa: vendor
    priority:  3
tc: 4 ratelimit: unlimited, tsa: vendor
    priority:  4
tc: 5 ratelimit: unlimited, tsa: vendor
    priority:  5
tc: 6 ratelimit: unlimited, tsa: vendor
    priority:  6
tc: 7 ratelimit: unlimited, tsa: vendor
    priority:  7
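To avoid copying the MST device path by hand, the device lines can be pulled out of the mst status output. The following is a minimal sketch, not part of the Mellanox guide: the helper name list_mst_devices is hypothetical, and it assumes that device lines begin with /dev/mst/ as they do in the output shown above.

```shell
#!/usr/bin/env bash
# list_mst_devices (hypothetical helper): read `mst status` output on stdin
# and print every MST device path, one per line.
# Assumption: device lines start with /dev/mst/, as in the output above.
list_mst_devices() {
    awk '$1 ~ /^\/dev\/mst\// { print $1 }'
}

# Example (requires root and a started MST driver):
#   mst status | list_mst_devices
```

On a host with several adapters this prints one path per device, so pick the one that matches the NIC you are configuring.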
-
Enable ECN for all priority queues.
# for i in {0..7}; do echo 1 > /sys/class/net/ens21f0/ecn/roce_np/enable/$i; done
# for i in {0..7}; do echo 1 > /sys/class/net/ens21f0/ecn/roce_rp/enable/$i; done
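The two loops above can be wrapped in a small function that takes the interface name as a parameter and stops with an error if an expected sysfs entry is missing. This is only a sketch: the function name enable_roce_ecn and its optional second argument (an alternative sysfs root, useful for dry runs) are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# enable_roce_ecn (hypothetical helper): enable ECN for priorities 0-7 on both
# the notification point (roce_np) and the reaction point (roce_rp).
#   $1: network interface name, e.g. ens21f0
#   $2: sysfs net root (optional, defaults to /sys/class/net)
enable_roce_ecn() {
    local ifname="$1"
    local root="${2:-/sys/class/net}"
    local role prio path
    for role in roce_np roce_rp; do
        for prio in 0 1 2 3 4 5 6 7; do
            path="$root/$ifname/ecn/$role/enable/$prio"
            # Fail early if the driver does not expose this entry.
            if [ ! -e "$path" ]; then
                echo "missing: $path" >&2
                return 1
            fi
            echo 1 > "$path" || return 1
        done
    done
}

# Example (as root): enable_roce_ecn ens21f0
```

These sysfs writes are not expected to survive a reboot or a driver restart, so the function may need to be rerun after either.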
-
(Optional) Enable ECN for TCP traffic.
# sysctl -w net.ipv4.tcp_ecn=1
Switch Configuration
-
Run LLDP on the switch.
switch(config)# lldp run
-
Enable PFC and DCBX on the switch. The following example configures only port Et1/1; repeat it for every port connected to a server node.
switch(config)# interface et1/1
switch(config-if-Et1/1)# priority-flow-control mode on
switch(config-if-Et1/1)# dcbx mode ieee
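When many ports connect to server nodes, the per-port commands can be generated rather than typed. The sketch below only prints the same three commands shown above for each port name passed to it. The function name gen_pfc_dcbx_config and the port names in the example are hypothetical, and how the output is applied to the switch (pasted into config mode or otherwise) is left to you.

```shell
#!/usr/bin/env bash
# gen_pfc_dcbx_config (hypothetical helper): print the per-port PFC and DCBX
# commands from the step above for every port name given as an argument.
gen_pfc_dcbx_config() {
    local port
    for port in "$@"; do
        printf 'interface %s\n' "$port"
        printf '   priority-flow-control mode on\n'
        printf '   dcbx mode ieee\n'
    done
}

# Example with made-up port names:
#   gen_pfc_dcbx_config et1/1 et2/1 et3/1
```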
After DCBX is configured on the switch, you may need to restart the server-side driver:
# service openibd restart
-
Enable ECN. The minimum and maximum thresholds need to be adjusted for your environment.
switch(config)# qos random-detect ecn global-buffer minimum-threshold 100 segments maximum-threshold 1000 segments
Validation
-
Check the mlnx_qos output on each server node. Replace ens21f0 with your own network interface name. In the output below, the enabled row under PFC configuration shows 1 for every priority.
# mlnx_qos -i ens21f0 -d fw --trust dscp
DCBX mode: Firmware controlled
Priority trust state: dscp
dscp2prio mapping:
    prio:0 dscp:07,06,05,04,03,02,01,00,
    prio:1 dscp:15,14,13,12,11,10,09,08,
    prio:2 dscp:23,22,21,20,19,18,17,16,
    prio:3 dscp:31,30,29,28,27,26,25,24,
    prio:4 dscp:39,38,37,36,35,34,33,32,
    prio:5 dscp:47,46,45,44,43,42,41,40,
    prio:6 dscp:55,54,53,52,51,50,49,48,
    prio:7 dscp:63,62,61,60,59,58,57,56,
Receive buffer size (bytes): 130944,130944,0,0,0,0,0,0,
Cable len: 7
PFC configuration:
    priority    0   1   2   3   4   5   6   7
    enabled     1   1   1   1   1   1   1   1
    buffer      1   1   1   1   1   1   1   1
tc: 0 ratelimit: unlimited, tsa: vendor
    priority:  1
tc: 1 ratelimit: unlimited, tsa: vendor
    priority:  0
tc: 2 ratelimit: unlimited, tsa: vendor
    priority:  2
tc: 3 ratelimit: unlimited, tsa: vendor
    priority:  3
tc: 4 ratelimit: unlimited, tsa: vendor
    priority:  4
tc: 5 ratelimit: unlimited, tsa: vendor
    priority:  5
tc: 6 ratelimit: unlimited, tsa: vendor
    priority:  6
tc: 7 ratelimit: unlimited, tsa: vendor
    priority:  7
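Checking this output by eye on every node is tedious, so the enabled row can be tested in a script. The following sketch assumes, as in the output above, that all eight priorities are expected to have PFC enabled; if your switch negotiates PFC on only some priorities, the condition needs to be adapted. The function name pfc_all_enabled is hypothetical.

```shell
#!/usr/bin/env bash
# pfc_all_enabled (hypothetical helper): read `mlnx_qos` output on stdin and
# succeed only if an "enabled" row exists and every value in it is 1.
pfc_all_enabled() {
    awk '
        $1 == "enabled" {
            found = 1
            for (i = 2; i <= NF; i++) if ($i != 1) bad = 1
        }
        END { exit ((found && !bad) ? 0 : 1) }
    '
}

# Example (as root, on each server node):
#   mlnx_qos -i ens21f0 | pfc_all_enabled && echo "PFC enabled on all priorities"
```

The function returns a non-zero status both when some priority is disabled and when no enabled row is found at all, so a failure is worth confirming against the raw mlnx_qos output.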