RoCEv2 Configuration

This guide has been tested only with Mellanox ConnectX-4 NICs.

Server Configuration
  1. Download the network driver (MLNX_OFED) from the Mellanox website. Version 4.4-1.0.0.0 was the latest at the time of writing.

    # wget http://www.mellanox.com/downloads/ofed/MLNX_OFED-4.4-1.0.0.0/MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu16.04-x86_64.tgz
    
  2. Extract the downloaded .tgz file and run the installation script.

    # tar xzf MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu16.04-x86_64.tgz
    # ./MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu16.04-x86_64/mlnxofedinstall
    
  3. Restart the driver.

    # service openibd restart
    
  4. Follow the Mellanox guide to auto-configure PFC via LLDP DCBX. Replace /dev/mst/mt4115_pciconf0 and ens21f0 with your own MST device and network interface name, respectively. Refer to the guide for a detailed explanation.

    # mst start
    Starting MST (Mellanox Software Tools) driver set
    Loading MST PCI module - Success
    Loading MST PCI configuration module - Success
    Create devices
    Unloading MST PCI module (unused) - Success
    
    # mst status
    MST modules:
    ------------
        MST PCI module is not loaded
        MST PCI configuration module loaded
    
    MST devices:
    ------------
    /dev/mst/mt4115_pciconf0         - PCI configuration cycles access.
                                       domain:bus:dev.fn=0000:82:00.0 addr.reg=88 data.reg=92
                                       Chip revision is: 00
    
    # mlxconfig -d /dev/mst/mt4115_pciconf0 set LLDP_NB_DCBX_P1=TRUE \
    LLDP_NB_TX_MODE_P1=2 LLDP_NB_RX_MODE_P1=2 LLDP_NB_DCBX_P2=TRUE \
    LLDP_NB_TX_MODE_P2=2 LLDP_NB_RX_MODE_P2=2
    
    Device #1:
    ----------
    
    Device type:    ConnectX4       
    PCI device:     /dev/mst/mt4115_pciconf0
    
    Configurations:                              Next Boot       New
             LLDP_NB_DCBX_P1                     False(0)        True(1)
             LLDP_NB_TX_MODE_P1                  OFF(0)          ALL(2)
    
     Apply new Configuration? ? (y/n) [n] : y
    Applying... Done!
    -I- Please reboot machine to load new configurations.
    
    # mlxfwreset -d /dev/mst/mt4115_pciconf0 --level 3 reset
    
    Requested reset level for device, /dev/mst/mt4115_pciconf0:
    
    3: Driver restart and PCI reset 
    Continue with reset?[y/N] y
    -I- Sending Reset Command To Fw             -Done
    -I- Stopping Driver                         -Done
    -I- Resetting PCI                           -Done
    -I- Starting Driver                         -Done
    -I- Restarting MST                          -Done
    -I- FW was loaded successfully.
    
    # mlnx_qos -i ens21f0 -d fw --trust dscp
    DCBX mode: Firmware controlled
    Priority trust state: dscp
    dscp2prio mapping:
    	prio:0 dscp:07,06,05,04,03,02,01,00,
    	prio:1 dscp:15,14,13,12,11,10,09,08,
    	prio:2 dscp:23,22,21,20,19,18,17,16,
    	prio:3 dscp:31,30,29,28,27,26,25,24,
    	prio:4 dscp:39,38,37,36,35,34,33,32,
    	prio:5 dscp:47,46,45,44,43,42,41,40,
    	prio:6 dscp:55,54,53,52,51,50,49,48,
    	prio:7 dscp:63,62,61,60,59,58,57,56,
    Cable len: 7
    PFC configuration:
    	priority    0   1   2   3   4   5   6   7
    	enabled     0   0   0   0   0   0   0   0   
    tc: 0 ratelimit: unlimited, tsa: vendor
    	 priority:  1
    tc: 1 ratelimit: unlimited, tsa: vendor
    	 priority:  0
    tc: 2 ratelimit: unlimited, tsa: vendor
    	 priority:  2
    tc: 3 ratelimit: unlimited, tsa: vendor
    	 priority:  3
    tc: 4 ratelimit: unlimited, tsa: vendor
    	 priority:  4
    tc: 5 ratelimit: unlimited, tsa: vendor
    	 priority:  5
    tc: 6 ratelimit: unlimited, tsa: vendor
    	 priority:  6
    tc: 7 ratelimit: unlimited, tsa: vendor
    	 priority:  7
    
  5. Enable ECN for all priority queues.

    # for i in {0..7}; do echo 1 > /sys/class/net/ens21f0/ecn/roce_np/enable/$i; done
    # for i in {0..7}; do echo 1 > /sys/class/net/ens21f0/ecn/roce_rp/enable/$i; done
    
  6. (Optional) Enable ECN for TCP traffic.

    # sysctl -w net.ipv4.tcp_ecn=1
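
Steps 4 and 5 above hard-code the MST device and the interface name. The following is a minimal sketch of how those two values might be parameterized. The function names first_mst_device and enable_roce_ecn, and the optional sysfs-root argument, are hypothetical helpers introduced here for illustration; they are not part of any Mellanox tool. The sketch assumes the sysfs layout shown in step 5.

```shell
# Hypothetical helpers (not part of MLNX_OFED or MFT).

# Print the first /dev/mst/* device path found in `mst status` output (stdin).
first_mst_device() {
    awk '$1 ~ /^\/dev\/mst\// { print $1; exit }'
}

# Enable RoCE ECN for priorities 0-7 on both the notification point (roce_np)
# and the reaction point (roce_rp), as in step 5.
#   $1: network interface name
#   $2: sysfs root, defaults to /sys (an assumption added here so the
#       function can be tried against a scratch directory first)
enable_roce_ecn() {
    local iface="$1" root="${2:-/sys}" role prio dir
    for role in roce_np roce_rp; do
        dir="$root/class/net/$iface/ecn/$role/enable"
        for prio in 0 1 2 3 4 5 6 7; do
            # Stop at the first failed write so a wrong interface name is noticed.
            echo 1 > "$dir/$prio" || return 1
        done
    done
}
```

With these defined in a root shell, the device could be picked up with DEV=$(mst status | first_mst_device) and ECN enabled with enable_roce_ecn ens21f0. On a machine with more than one Mellanox device, taking the first entry may not be what you want, so check the mst status output by hand.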
    
Switch Configuration
  1. Run LLDP on the switch.

    switch(config)# lldp run
    
  2. Enable PFC and DCBX on the switch. The following example configures port Et1/1 only; repeat it for every port connected to a server node.

    switch(config)# interface et1/1
    switch(config-if-Et1/1)# priority-flow-control mode on
    switch(config-if-Et1/1)# dcbx mode ieee
    

    You may need to restart the server-side driver via service openibd restart after DCBX is configured on the switch.

  3. Enable ECN. The minimum and maximum thresholds need to be adjusted for your environment.

    switch(config)# qos random-detect ecn global-buffer minimum-threshold 100 segments maximum-threshold 1000 segments
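
Because step 2 has to be repeated for every server-facing port, it may help to generate the per-port commands rather than type them. The following is a minimal sketch run on an admin workstation, not on the switch. The function name print_pfc_dcbx_config is a hypothetical helper, and the port names in the usage note are placeholders. It only prints text and assumes the same command syntax as step 2.

```shell
# Hypothetical helper: print the step 2 interface commands for each port
# name passed as an argument. Nothing is sent to the switch.
print_pfc_dcbx_config() {
    local port
    for port in "$@"; do
        printf 'interface %s\n' "$port"
        printf '   priority-flow-control mode on\n'
        printf '   dcbx mode ieee\n'
    done
}
```

For example, print_pfc_dcbx_config et1/1 et2/1 prints six lines, three per port, which can be reviewed and then pasted into the switch's configuration mode.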
    
Validation
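  1. Check the mlnx_qos output on each server node. Replace ens21f0 with your own network interface name. In the output, the enabled row under PFC configuration should show 1 for the priorities negotiated with the switch (all eight in this example), instead of the 0 values seen in step 4 of Server Configuration.
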
    # mlnx_qos -i ens21f0 -d fw --trust dscp
    DCBX mode: Firmware controlled
    Priority trust state: dscp
    dscp2prio mapping:
            prio:0 dscp:07,06,05,04,03,02,01,00,
            prio:1 dscp:15,14,13,12,11,10,09,08,
            prio:2 dscp:23,22,21,20,19,18,17,16,
            prio:3 dscp:31,30,29,28,27,26,25,24,
            prio:4 dscp:39,38,37,36,35,34,33,32,
            prio:5 dscp:47,46,45,44,43,42,41,40,
            prio:6 dscp:55,54,53,52,51,50,49,48,
            prio:7 dscp:63,62,61,60,59,58,57,56,
    Receive buffer size (bytes): 130944,130944,0,0,0,0,0,0,
    Cable len: 7
    PFC configuration:
            priority    0   1   2   3   4   5   6   7
            enabled     1   1   1   1   1   1   1   1   
            buffer      1   1   1   1   1   1   1   1   
    tc: 0 ratelimit: unlimited, tsa: vendor
             priority:  1
    tc: 1 ratelimit: unlimited, tsa: vendor
             priority:  0
    tc: 2 ratelimit: unlimited, tsa: vendor
             priority:  2
    tc: 3 ratelimit: unlimited, tsa: vendor
             priority:  3
    tc: 4 ratelimit: unlimited, tsa: vendor
             priority:  4
    tc: 5 ratelimit: unlimited, tsa: vendor
             priority:  5
    tc: 6 ratelimit: unlimited, tsa: vendor
             priority:  6
    tc: 7 ratelimit: unlimited, tsa: vendor
             priority:  7
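
When there are many server nodes, reading this output by eye gets tedious. The following is a minimal sketch of an automated check. The function name pfc_enabled_on_all_priorities is a hypothetical helper, and the parsing assumes the output layout shown above (a "PFC configuration:" line followed by an "enabled" row with eight values); other mlnx_qos versions may print a different layout.

```shell
# Hypothetical helper: read mlnx_qos output on stdin and succeed only if the
# PFC "enabled" row lists eight priorities and every one of them is 1.
pfc_enabled_on_all_priorities() {
    awk '
        /PFC configuration:/ { in_pfc = 1; next }
        in_pfc && $1 == "enabled" {
            found = 1
            ok = (NF == 9)            # the word "enabled" plus eight values
            for (i = 2; i <= NF; i++)
                if ($i != "1") ok = 0
            exit                      # jump to END with found/ok set
        }
        END { exit !(found && ok) }   # status 0 only when all eight are 1
    '
}
```

It could be used as mlnx_qos -i ens21f0 -d fw --trust dscp | pfc_enabled_on_all_priorities && echo "PFC enabled on all priorities". If your deployment enables PFC on only some priorities, the all-ones condition is too strict and would need to be adapted.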