Debugging TCP Connection using iperf3. - futurewei-cloud/alcor-control-agent GitHub Wiki

Purpose

This page records our experience testing the performance of ACA + OVS: the issue we ran into and how we solved it.

Set Up and workflow

[Figure: update_setup_04142021]

ifconfig and ovs-vsctl show output after the initial setup:

ifconfig on the physical host:

# ifconfig

enp2s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.20.92  netmask 255.255.255.0  broadcast 0.0.0.0
        ether aabbcc  txqueuelen 1000  (Ethernet)
        RX packets 4880723674  bytes 898298280737 (898.2 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 40884940  bytes 111210722143 (111.2 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

f6f1bf7934004_l: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 9a:8c:c4:ae:6b:c8  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vxlan_sys_4789: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65000
        ether abcde  txqueuelen 1000  (Ethernet)
        RX packets 1922041  bytes 100097000 (100.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 15026611  bytes 41804450036 (41.8 GB)
        TX errors 0  dropped 48 overruns 0  carrier 0  collisions 0

ifconfig inside the container where the port resides:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.0.2  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 6c:dd:ee:00:00:02  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ovs-vsctl on the physical host:

# ovs-vsctl show
844fa782-f448-46b9-9663-bd9ad7495c9a
    Bridge br-int
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port "f6f1bf7934004_l"
            tag: 1
            Interface "f6f1bf7934004_l"
        Port br-int
            Interface br-int
                type: internal
    Bridge br-tun
        Port vxlan-generic
            Interface vxlan-generic
                type: vxlan
                options: {df_default="true", egress_pkt_mark="0", in_key=flow, out_key=flow, remote_ip=flow}
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port br-tun
            Interface br-tun
                type: internal
    ovs_version: "2.9.8"

Results of testing the basic ping

root@9fe5749176e4:/usr/local/apache2# ping -I 10.0.0.2 10.0.0.3
PING 10.0.0.3 (10.0.0.3) from 10.0.0.2 : 56(84) bytes of data.
64 bytes from 10.0.0.3: icmp_seq=1 ttl=64 time=0.185 ms
64 bytes from 10.0.0.3: icmp_seq=2 ttl=64 time=0.091 ms
64 bytes from 10.0.0.3: icmp_seq=3 ttl=64 time=0.091 ms
64 bytes from 10.0.0.3: icmp_seq=4 ttl=64 time=0.076 ms
^C
--- 10.0.0.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 80ms
rtt min/avg/max/mdev = 0.076/0.110/0.185/0.045 ms

Ping works as expected.

Try to test TCP performance using iperf

Server side output:

root@7e77e40c37a7:/usr/local/apache2# iperf -s -B 10.0.0.3 -i1
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 10.0.0.3
TCP window size:  128 KByte (default)
----------------------------------------------------

Client side output:

root@9fe5749176e4:/usr/local/apache2# iperf -c 10.0.0.3 -i1 -t10
------------------------------------------------------------
Client connecting to 10.0.0.3, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 33214 connected with 10.0.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   107 KBytes   880 Kbits/sec
[  3]  1.0- 2.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  2.0- 3.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  3.0- 4.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  4.0- 5.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  5.0- 6.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  6.0- 7.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  7.0- 8.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  8.0- 9.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  9.0-10.0 sec  0.00 Bytes  0.00 bits/sec
[  3]  0.0-10.2 sec   107 KBytes  86.1 Kbits/sec

The transfer stalls after the first second. Let's see what tcpdump shows:

# tcpdump -i f6f1bf7934004_l
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on f6f1bf7934004_l, link-type EN10MB (Ethernet), capture size 262144 bytes
10:56:07.076404 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [S], seq 3849303330, win 64240, options [mss 1460,sackOK,TS val 3238083790 ecr 0,nop,wscale 7], length 0
10:56:07.077178 IP 10.0.0.3.5001 > 10.0.0.2.33216: Flags [S.], seq 2492407180, ack 3849303331, win 65160, options [mss 1460,sackOK,TS val 2886471229 ecr 3238083790,nop,wscale 7], length 0
10:56:07.077228 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [.], ack 1, win 502, options [nop,nop,TS val 3238083790 ecr 2886471229], length 0
10:56:07.077475 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [P.], seq 1:7241, ack 1, win 502, options [nop,nop,TS val 3238083791 ecr 2886471229], length 7240
10:56:07.077597 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [P.], seq 7241:14481, ack 1, win 502, options [nop,nop,TS val 3238083791 ecr 2886471229], length 7240
10:56:07.095190 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [.], seq 14481:15929, ack 1, win 502, options [nop,nop,TS val 3238083808 ecr 2886471229], length 1448
10:56:07.299199 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [.], seq 1:1449, ack 1, win 502, options [nop,nop,TS val 3238084012 ecr 2886471229], length 1448
10:56:20.123218 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [.], seq 1:1449, ack 1, win 502, options [nop,nop,TS val 3238096836 ecr 2886471229], length 1448

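The flags show that the three-way handshake completes ([S], [S.], [.]), so the two ports are connected. However, 10.0.0.3 never acknowledges any data, and 10.0.0.2 keeps retransmitting the first segment (seq 1:1449), so the content cannot be sent from port_1 to port_2. Note also that the first data segments are 7240 bytes long, well above the negotiated MSS of 1460, which indicates that segmentation is being offloaded on this interface.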
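The same stall can be spotted programmatically. The following is a minimal sketch, assuming tcpdump's default one-line text format as shown in the capture above. The function name `find_retransmissions` is hypothetical, the sample lines are abbreviated copies of the capture (TCP options dropped), and the heuristic (a segment that starts below the highest sequence number already sent on that flow) would also flag reordered packets.

```python
import re

# Matches "IP SRC > DST: Flags [...], seq START:END" in tcpdump's default text output.
# Lines without a START:END range (SYN, pure ACK) are ignored.
SEG_RE = re.compile(
    r"IP (?P<src>\S+) > (?P<dst>\S+): Flags \[[^\]]*\], seq (?P<start>\d+):(?P<end>\d+)"
)

def find_retransmissions(lines):
    """Return (flow, start, end) for every data segment that re-sends old bytes."""
    highest_end = {}    # (src, dst) -> highest relative sequence number sent so far
    retransmitted = []
    for line in lines:
        m = SEG_RE.search(line)
        if not m:
            continue
        flow = (m["src"], m["dst"])
        start, end = int(m["start"]), int(m["end"])
        if start < highest_end.get(flow, 0):
            retransmitted.append((flow, start, end))
        highest_end[flow] = max(highest_end.get(flow, 0), end)
    return retransmitted

# Abbreviated lines from the capture above (options omitted for brevity).
capture = [
    "10:56:07.076404 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [S], seq 3849303330, win 64240, length 0",
    "10:56:07.077475 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [P.], seq 1:7241, ack 1, win 502, length 7240",
    "10:56:07.077597 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [P.], seq 7241:14481, ack 1, win 502, length 7240",
    "10:56:07.095190 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [.], seq 14481:15929, ack 1, win 502, length 1448",
    "10:56:07.299199 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [.], seq 1:1449, ack 1, win 502, length 1448",
    "10:56:20.123218 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [.], seq 1:1449, ack 1, win 502, length 1448",
]

for flow, start, end in find_retransmissions(capture):
    print(f"{flow[0]} -> {flow[1]} re-sent seq {start}:{end}")
```

On this capture the sketch reports the two re-sends of seq 1:1449, which are the last two lines of the tcpdump output.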

Investigation and Solution

We spent a good amount of time trying to figure this out. In the end, we found the solution here.

We made two changes to our existing setup:

  1. Turning off TCP Segmentation Offload (TSO) on the related interfaces with the command ethtool -K interface_name tso off. You can check whether TSO is enabled on an interface with ethtool -k interface_name (lowercase -k shows the offload settings, uppercase -K changes them).
  2. Adjusting the MTU of the related interfaces. One reason the TCP packets couldn't get through is that the virtual ports and the physical interface had the same MTU, so the VXLAN encapsulation overhead (50 bytes) and the VLAN header (4 bytes) weren't taken into account. In our case, the MTU of the physical interface must be at least (VXLAN overhead + VLAN header) larger than the MTU of the virtual interface(s). As the physical interface's MTU is 1500, we set the MTU of the virtual ports to 1500 - 50 - 4 = 1446 and tested again.
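The MTU arithmetic from the list above can be written down as a small helper. This is only a sketch: `inner_mtu` and `suggested_commands` are hypothetical names, the overhead values are the ones quoted on this page, and the `ip link set` command is an assumption about how the MTU could be applied (this page does not say which tool was used). The commands are only printed, not executed.

```python
VXLAN_OVERHEAD = 50  # bytes of VXLAN encapsulation overhead, as quoted on this page
VLAN_HEADER = 4      # bytes of VLAN header, as quoted on this page

def inner_mtu(physical_mtu, vxlan_overhead=VXLAN_OVERHEAD, vlan_header=VLAN_HEADER):
    """MTU for the virtual ports so that encapsulated packets fit the physical MTU."""
    mtu = physical_mtu - vxlan_overhead - vlan_header
    if mtu <= 0:
        raise ValueError(f"physical MTU {physical_mtu} is too small for the overhead")
    return mtu

def suggested_commands(interface, physical_mtu):
    """Build (but do not run) the two adjustments described above for one interface."""
    return [
        f"ethtool -K {interface} tso off",                              # change 1: disable TSO
        f"ip link set dev {interface} mtu {inner_mtu(physical_mtu)}",   # change 2 (assumed tool)
    ]

for physical in (1500, 9000):
    print(f"physical MTU {physical} -> virtual port MTU {inner_mtu(physical)}")

for command in suggested_commands("f6f1bf7934004_l", 1500):
    print(command)
```

For a physical MTU of 1500 this gives 1446, the value used in the test that follows.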

ifconfig on the physical host:

# ifconfig

enp2s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.20.92  netmask 255.255.255.0  broadcast 0.0.0.0
        ether aabbc  txqueuelen 1000  (Ethernet)
        RX packets 4909278420  bytes 905092171754 (905.0 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 44394728  bytes 116699221738 (116.6 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

f6f1bf7934004_l: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1446
        ether bbcde  txqueuelen 1000  (Ethernet)
        RX packets 79004  bytes 5086655124 (5.0 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 97847  bytes 6474018 (6.4 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 90096  bytes 7902529 (7.9 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 90096  bytes 7902529 (7.9 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vxlan_sys_4789: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65000
        ether abcde  txqueuelen 1000  (Ethernet)
        RX packets 2019881  bytes 105200964 (105.2 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 18536083  bytes 47068328608 (47.0 GB)
        TX errors 0  dropped 86 overruns 0  carrier 0  collisions 0

ifconfig inside the container:

root@9fe5749176e4:/usr/local/apache2# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1446
        inet 10.0.0.2  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 6c:dd:ee:00:00:02  txqueuelen 1000  (Ethernet)
        RX packets 172209  bytes 11388194 (10.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 154990  bytes 9907169316 (9.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

iperf results:

Server:
root@7e77e40c37a7:/usr/local/apache2# iperf -s -B 10.0.0.3 -i1
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 10.0.0.3
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  4] local 10.0.0.3 port 5001 connected with 10.0.0.2 port 33228
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 1.0 sec   453 MBytes  3.80 Gbits/sec
[  4]  1.0- 2.0 sec   457 MBytes  3.84 Gbits/sec
[  4]  2.0- 3.0 sec   459 MBytes  3.85 Gbits/sec
[  4]  3.0- 4.0 sec   467 MBytes  3.91 Gbits/sec
[  4]  4.0- 5.0 sec   461 MBytes  3.87 Gbits/sec
[  4]  5.0- 6.0 sec   454 MBytes  3.81 Gbits/sec
[  4]  6.0- 7.0 sec   454 MBytes  3.81 Gbits/sec
[  4]  7.0- 8.0 sec   469 MBytes  3.94 Gbits/sec
[  4]  8.0- 9.0 sec   456 MBytes  3.83 Gbits/sec
[  4]  9.0-10.0 sec   459 MBytes  3.85 Gbits/sec
[  4]  0.0-10.0 sec  4.48 GBytes  3.85 Gbits/sec


Client:
root@9fe5749176e4:/usr/local/apache2# iperf -c 10.0.0.3 -i1 -t10
------------------------------------------------------------
Client connecting to 10.0.0.3, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 33228 connected with 10.0.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec   456 MBytes  3.83 Gbits/sec
[  3]  1.0- 2.0 sec   458 MBytes  3.84 Gbits/sec
[  3]  2.0- 3.0 sec   458 MBytes  3.84 Gbits/sec
[  3]  3.0- 4.0 sec   466 MBytes  3.91 Gbits/sec
[  3]  4.0- 5.0 sec   461 MBytes  3.87 Gbits/sec
[  3]  5.0- 6.0 sec   455 MBytes  3.81 Gbits/sec
[  3]  6.0- 7.0 sec   454 MBytes  3.81 Gbits/sec
[  3]  7.0- 8.0 sec   469 MBytes  3.93 Gbits/sec
[  3]  8.0- 9.0 sec   457 MBytes  3.84 Gbits/sec
[  3]  9.0-10.0 sec   458 MBytes  3.84 Gbits/sec
[  3]  0.0-10.0 sec  4.48 GBytes  3.85 Gbits/sec

After the adjustment, the iperf TCP test between the two container ports succeeds.

One more question

We are glad that the connection works. However, this is a performance test, and the two physical hosts use fibre interfaces with a speed of 10000 Mb/s (10 Gb/s), so about 3.85 Gbits/sec between the containers is not what we're looking for. Below are the iperf results between the two physical fibre interfaces:

Server:
# iperf -s -B 192.168.20.93 -i1
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 192.168.20.93
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  4] local 192.168.20.93 port 5001 connected with 192.168.20.92 port 50554
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 1.0 sec  1.08 GBytes  9.30 Gbits/sec
[  4]  1.0- 2.0 sec  1.09 GBytes  9.38 Gbits/sec
[  4]  2.0- 3.0 sec  1.09 GBytes  9.36 Gbits/sec
[  4]  3.0- 4.0 sec  1.09 GBytes  9.38 Gbits/sec
[  4]  4.0- 5.0 sec  1.09 GBytes  9.37 Gbits/sec
[  4]  5.0- 6.0 sec  1.09 GBytes  9.39 Gbits/sec
[  4]  6.0- 7.0 sec  1.09 GBytes  9.36 Gbits/sec
[  4]  7.0- 8.0 sec  1.09 GBytes  9.38 Gbits/sec
[  4]  8.0- 9.0 sec  1.09 GBytes  9.36 Gbits/sec
[  4]  9.0-10.0 sec  1.09 GBytes  9.37 Gbits/sec
[  4]  0.0-10.6 sec  11.5 GBytes  9.36 Gbits/sec

Client:
# iperf -c 192.168.20.93 -i1 -t30
------------------------------------------------------------
Client connecting to 192.168.20.93, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.20.92 port 50554 connected with 192.168.20.93 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  1.09 GBytes  9.33 Gbits/sec
[  3]  1.0- 2.0 sec  1.09 GBytes  9.38 Gbits/sec
[  3]  2.0- 3.0 sec  1.09 GBytes  9.36 Gbits/sec
[  3]  3.0- 4.0 sec  1.09 GBytes  9.37 Gbits/sec
[  3]  4.0- 5.0 sec  1.09 GBytes  9.38 Gbits/sec
[  3]  5.0- 6.0 sec  1.09 GBytes  9.38 Gbits/sec
[  3]  6.0- 7.0 sec  1.09 GBytes  9.36 Gbits/sec
[  3]  7.0- 8.0 sec  1.09 GBytes  9.38 Gbits/sec
[  3]  8.0- 9.0 sec  1.09 GBytes  9.36 Gbits/sec
[  3]  9.0-10.0 sec  1.09 GBytes  9.37 Gbits/sec
^C[  3]  0.0-10.6 sec  11.5 GBytes  9.37 Gbits/sec

The difference is large (about 9.37 vs. 3.85 Gbits/sec), and we were worried that ACA + OVS was adding that much overhead.

Is it overhead though?

After some investigation, we found that the difference in iperf throughput wasn't overhead introduced by ACA + OVS, but a result of the MTU values of the interfaces.

As mentioned above, the MTU of the physical interfaces was 1500, which is the default value. However, if we enable jumbo frames on the physical interface by setting its MTU to 9000, and set the virtual devices' MTU to 8946 (9000 - 50 - 4 = 8946), we can fully utilize the capacity of these fibre devices, even from the ports inside the containers.
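One way to see why the MTU matters is to estimate how many packets per second the sender must produce for a given throughput. The sketch below is a back-of-the-envelope calculation, not a measurement. It assumes IPv4 (20 bytes), a TCP header without options (20 bytes), and the TCP timestamp option (12 bytes, visible as `nop,nop,TS` in the earlier capture). The function names are hypothetical, and the interpretation that per-packet processing cost is the limiting factor is an assumption that this page does not measure directly.

```python
IPV4_HEADER = 20     # bytes, assuming no IP options
TCP_HEADER = 20      # bytes, base TCP header
TCP_TIMESTAMPS = 12  # bytes, nop,nop,TS option as seen in the tcpdump capture

def tcp_payload(mtu):
    """TCP payload bytes that fit in one packet at the given interface MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER - TCP_TIMESTAMPS

def packets_per_second(gbits_per_sec, mtu):
    """Approximate full-sized packets per second needed for the given throughput."""
    bytes_per_sec = gbits_per_sec * 1e9 / 8
    return bytes_per_sec / tcp_payload(mtu)

# Throughput figures are the iperf summaries reported on this page.
for label, gbits, mtu in (
    ("containers, MTU 1446", 3.85, 1446),
    ("containers, MTU 8946", 9.83, 8946),
):
    print(f"{label}: {tcp_payload(mtu)} payload bytes/packet, "
          f"~{packets_per_second(gbits, mtu):,.0f} packets/sec")
```

Under these assumptions, the jumbo-frame run carries more than twice the throughput with fewer packets per second, which is consistent with the packet rate, rather than the byte rate, being the limit at the smaller MTU.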

ifconfig on physical host:

# ifconfig

enp2s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 192.168.20.92  netmask 255.255.255.0  broadcast 0.0.0.0
        ether bbbbc  txqueuelen 1000  (Ethernet)
        RX packets 4923108139  bytes 908021707382 (908.0 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 56390522  bytes 134846312070 (134.8 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

f6f1bf7934004_l: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8946
        ether bbcde  txqueuelen 1000  (Ethernet)
        RX packets 154990  bytes 9907169316 (9.9 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 172209  bytes 11388194 (11.3 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vxlan_sys_4789: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 65000
        ether abcde  txqueuelen 1000  (Ethernet)
        RX packets 2094242  bytes 109074044 (109.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 21990981  bytes 52063482444 (52.0 GB)
        TX errors 0  dropped 86 overruns 0  carrier 0  collisions 0

ifconfig inside container:

root@9fe5749176e4:/usr/local/apache2# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 8946
        inet 10.0.0.2  netmask 255.255.0.0  broadcast 0.0.0.0
        ether 6c:dd:ee:00:00:02  txqueuelen 1000  (Ethernet)
        RX packets 279501  bytes 18603666 (17.7 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 252482  bytes 15272089008 (14.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

iperf result:

Server:

root@7e77e40c37a7:/usr/local/apache2# iperf -s -B 10.0.0.3 -i1
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 10.0.0.3
TCP window size:  128 KByte (default)
------------------------------------------------------------
[  4] local 10.0.0.3 port 5001 connected with 10.0.0.2 port 33244
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0- 1.0 sec  1.14 GBytes  9.82 Gbits/sec
[  4]  1.0- 2.0 sec  1.14 GBytes  9.83 Gbits/sec
[  4]  2.0- 3.0 sec  1.14 GBytes  9.83 Gbits/sec
[  4]  3.0- 4.0 sec  1.14 GBytes  9.83 Gbits/sec
[  4]  4.0- 5.0 sec  1.14 GBytes  9.83 Gbits/sec
[  4]  5.0- 6.0 sec  1.14 GBytes  9.84 Gbits/sec
[  4]  6.0- 7.0 sec  1.14 GBytes  9.83 Gbits/sec
[  4]  7.0- 8.0 sec  1.14 GBytes  9.82 Gbits/sec
[  4]  8.0- 9.0 sec  1.14 GBytes  9.82 Gbits/sec
[  4]  9.0-10.0 sec  1.14 GBytes  9.82 Gbits/sec
[  4]  0.0-10.0 sec  11.4 GBytes  9.83 Gbits/sec

Client:

root@9fe5749176e4:/usr/local/apache2# iperf -c 10.0.0.3 -i1 -t10
------------------------------------------------------------
Client connecting to 10.0.0.3, TCP port 5001
TCP window size:  325 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 33244 connected with 10.0.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  1.15 GBytes  9.84 Gbits/sec
[  3]  1.0- 2.0 sec  1.14 GBytes  9.83 Gbits/sec
[  3]  2.0- 3.0 sec  1.14 GBytes  9.83 Gbits/sec
[  3]  3.0- 4.0 sec  1.14 GBytes  9.83 Gbits/sec
[  3]  4.0- 5.0 sec  1.14 GBytes  9.83 Gbits/sec
[  3]  5.0- 6.0 sec  1.15 GBytes  9.84 Gbits/sec
[  3]  6.0- 7.0 sec  1.14 GBytes  9.83 Gbits/sec
[  3]  7.0- 8.0 sec  1.14 GBytes  9.83 Gbits/sec
[  3]  8.0- 9.0 sec  1.14 GBytes  9.81 Gbits/sec
[  3]  9.0-10.0 sec  1.14 GBytes  9.83 Gbits/sec
[  3]  0.0-10.0 sec  11.4 GBytes  9.83 Gbits/sec

We are happy to see that the iperf result inside the containers is on par with the result on the physical hosts, which shows that ACA + OVS doesn't add much overhead to our connection.