Debugging TCP Connection using iperf3. - futurewei-cloud/alcor-control-agent GitHub Wiki
Purpose
This page records our experience testing the performance of ACA + OVS: the issue we ran into and how we solved it.
Set Up and workflow
ifconfig and ovs-vsctl show output after the initial setup:
ifconfig on the physical host:
# ifconfig
enp2s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.20.92 netmask 255.255.255.0 broadcast 0.0.0.0
ether aabbcc txqueuelen 1000 (Ethernet)
RX packets 4880723674 bytes 898298280737 (898.2 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 40884940 bytes 111210722143 (111.2 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
f6f1bf7934004_l: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
ether 9a:8c:c4:ae:6b:c8 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vxlan_sys_4789: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65000
ether abcde txqueuelen 1000 (Ethernet)
RX packets 1922041 bytes 100097000 (100.0 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 15026611 bytes 41804450036 (41.8 GB)
TX errors 0 dropped 48 overruns 0 carrier 0 collisions 0
ifconfig inside the container where the port resides:
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.0.2 netmask 255.255.0.0 broadcast 0.0.0.0
ether 6c:dd:ee:00:00:02 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
ovs-vsctl show on the physical host:
# ovs-vsctl show
844fa782-f448-46b9-9663-bd9ad7495c9a
Bridge br-int
Port patch-tun
Interface patch-tun
type: patch
options: {peer=patch-int}
Port "f6f1bf7934004_l"
tag: 1
Interface "f6f1bf7934004_l"
Port br-int
Interface br-int
type: internal
Bridge br-tun
Port vxlan-generic
Interface vxlan-generic
type: vxlan
options: {df_default="true", egress_pkt_mark="0", in_key=flow, out_key=flow, remote_ip=flow}
Port patch-int
Interface patch-int
type: patch
options: {peer=patch-tun}
Port br-tun
Interface br-tun
type: internal
ovs_version: "2.9.8"
Results of testing the basic ping
root@9fe5749176e4:/usr/local/apache2# ping -I 10.0.0.2 10.0.0.3
PING 10.0.0.3 (10.0.0.3) from 10.0.0.2 : 56(84) bytes of data.
64 bytes from 10.0.0.3: icmp_seq=1 ttl=64 time=0.185 ms
64 bytes from 10.0.0.3: icmp_seq=2 ttl=64 time=0.091 ms
64 bytes from 10.0.0.3: icmp_seq=3 ttl=64 time=0.091 ms
64 bytes from 10.0.0.3: icmp_seq=4 ttl=64 time=0.076 ms
^C
--- 10.0.0.3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 80ms
rtt min/avg/max/mdev = 0.076/0.110/0.185/0.045 ms
Ping works as expected.
Try to test TCP performance using iperf
Server side output:
root@7e77e40c37a7:/usr/local/apache2# iperf -s -B 10.0.0.3 -i1
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 10.0.0.3
TCP window size: 128 KByte (default)
----------------------------------------------------
Client side output:
root@9fe5749176e4:/usr/local/apache2# iperf -c 10.0.0.3 -i1 -t10
------------------------------------------------------------
Client connecting to 10.0.0.3, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.2 port 33214 connected with 10.0.0.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 107 KBytes 880 Kbits/sec
[ 3] 1.0- 2.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 2.0- 3.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 3.0- 4.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 4.0- 5.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 5.0- 6.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 6.0- 7.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 7.0- 8.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 8.0- 9.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 9.0-10.0 sec 0.00 Bytes 0.00 bits/sec
[ 3] 0.0-10.2 sec 107 KBytes 86.1 Kbits/sec
This looks wrong: after the first second, no data gets through at all. Let's see what tcpdump says:
# tcpdump -i f6f1bf7934004_l
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on f6f1bf7934004_l, link-type EN10MB (Ethernet), capture size 262144 bytes
10:56:07.076404 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [S], seq 3849303330, win 64240, options [mss 1460,sackOK,TS val 3238083790 ecr 0,nop,wscale 7], length 0
10:56:07.077178 IP 10.0.0.3.5001 > 10.0.0.2.33216: Flags [S.], seq 2492407180, ack 3849303331, win 65160, options [mss 1460,sackOK,TS val 2886471229 ecr 3238083790,nop,wscale 7], length 0
10:56:07.077228 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [.], ack 1, win 502, options [nop,nop,TS val 3238083790 ecr 2886471229], length 0
10:56:07.077475 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [P.], seq 1:7241, ack 1, win 502, options [nop,nop,TS val 3238083791 ecr 2886471229], length 7240
10:56:07.077597 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [P.], seq 7241:14481, ack 1, win 502, options [nop,nop,TS val 3238083791 ecr 2886471229], length 7240
10:56:07.095190 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [.], seq 14481:15929, ack 1, win 502, options [nop,nop,TS val 3238083808 ecr 2886471229], length 1448
10:56:07.299199 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [.], seq 1:1449, ack 1, win 502, options [nop,nop,TS val 3238084012 ecr 2886471229], length 1448
10:56:20.123218 IP 10.0.0.2.33216 > 10.0.0.3.5001: Flags [.], seq 1:1449, ack 1, win 502, options [nop,nop,TS val 3238096836 ecr 2886471229], length 1448
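From the Flags of each line, we can see that the TCP three-way handshake completes ([S], [S.], [.]), so the two ports are connected. However, 10.0.0.3 never acknowledges any data segment, and 10.0.0.2 keeps retransmitting seq 1:1449, so the content cannot be sent from port_1 to port_2. Note also that the first data segments are 7240 bytes long, well above the negotiated MSS of 1460, which is what a capture typically looks like when segmentation is offloaded.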
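To make the retransmission pattern easier to spot in a longer capture, here is a minimal sketch that reads tcpdump text output and flags data segments that start below the highest sequence number already sent on the same flow. The function name find_retransmits is hypothetical, and the heuristic is deliberately crude: it assumes the one-line tcpdump format shown above and would misreport packets that are merely reordered.

```shell
#!/bin/sh
# Read tcpdump text output on stdin and print every data segment whose
# starting sequence number is below the highest end sequence number
# already seen for the same "source > destination" flow.
find_retransmits() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                # Only data segments carry a "seq START:END," range.
                if ($i != "seq" || $(i + 1) !~ /^[0-9]+:[0-9]+,$/) continue
                range = $(i + 1); sub(/,$/, "", range)
                split(range, p, ":")
                dst = $5; sub(/:$/, "", dst)
                flow = $3 " > " dst
                if ((flow in max_end) && p[1] + 0 < max_end[flow]) {
                    print "retransmitted: " flow " seq " range
                } else {
                    max_end[flow] = p[2] + 0
                }
            }
        }
    '
}
```

A possible way to use it on a live capture is tcpdump -l -nn -i f6f1bf7934004_l tcp port 5001 | find_retransmits, where -l keeps the tcpdump output line-buffered. Run against the capture above, it would report the two repeated seq 1:1449 segments.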
Investigation and Solution
We spent a good deal of time trying to figure this out. In the end, we found the solution here.
We made two changes to our existing setup:
- Turning off TCP Segmentation Offload (TSO) on the related interfaces with the command ethtool -K interface_name tso off. You can check whether TSO is enabled on an interface with ethtool -k interface_name.
- Adjusting the MTU of the related interfaces. One reason the TCP packets couldn't get through is that the virtual ports and the physical interface had the same MTU, so the VXLAN overhead (50 bytes) and the VLAN header (4 bytes) weren't taken into account. In our case, the MTU of the physical interface must be at least (VXLAN overhead + VLAN header) larger than the MTU of the virtual interface(s). As the physical interface's MTU is 1500, we set the MTU of the virtual ports to 1500 - 50 - 4 = 1446 and tested again:
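Before looking at the results, the two changes can be sketched as a small shell helper. This is an illustration under stated assumptions, not the exact commands we ran: the function names virtual_mtu and print_fix_commands are hypothetical, ip link set is one common way to change an MTU (the page does not record which command was used), and the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Overhead figures taken from the explanation above.
VXLAN_OVERHEAD=50   # VXLAN encapsulation overhead, in bytes
VLAN_OVERHEAD=4     # VLAN header, in bytes

# Print the MTU a virtual interface should use for a given physical MTU.
virtual_mtu() {
    echo $(( $1 - $VXLAN_OVERHEAD - $VLAN_OVERHEAD ))
}

# Print (do not run) the commands for one virtual interface.
# $1 = interface name, $2 = MTU of the physical interface.
print_fix_commands() {
    echo "ethtool -K $1 tso off"
    echo "ip link set dev $1 mtu $(virtual_mtu "$2")"
}

# Example: the host-side port from the ovs-vsctl output, physical MTU 1500.
print_fix_commands f6f1bf7934004_l 1500
```

The printed commands need root privileges when run for real, and the same MTU has to be applied to eth0 inside the container as well.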
ifconfig on the physical host:
# ifconfig
enp2s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.20.92 netmask 255.255.255.0 broadcast 0.0.0.0
ether aabbc txqueuelen 1000 (Ethernet)
RX packets 4909278420 bytes 905092171754 (905.0 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 44394728 bytes 116699221738 (116.6 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
f6f1bf7934004_l: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1446
ether bbcde txqueuelen 1000 (Ethernet)
RX packets 79004 bytes 5086655124 (5.0 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 97847 bytes 6474018 (6.4 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 90096 bytes 7902529 (7.9 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 90096 bytes 7902529 (7.9 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vxlan_sys_4789: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65000
ether abcde txqueuelen 1000 (Ethernet)
RX packets 2019881 bytes 105200964 (105.2 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 18536083 bytes 47068328608 (47.0 GB)
TX errors 0 dropped 86 overruns 0 carrier 0 collisions 0
ifconfig inside the container:
root@9fe5749176e4:/usr/local/apache2# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1446
inet 10.0.0.2 netmask 255.255.0.0 broadcast 0.0.0.0
ether 6c:dd:ee:00:00:02 txqueuelen 1000 (Ethernet)
RX packets 172209 bytes 11388194 (10.8 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 154990 bytes 9907169316 (9.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
iperf results:
Server:
root@7e77e40c37a7:/usr/local/apache2# iperf -s -B 10.0.0.3 -i1
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 10.0.0.3
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 4] local 10.0.0.3 port 5001 connected with 10.0.0.2 port 33228
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 1.0 sec 453 MBytes 3.80 Gbits/sec
[ 4] 1.0- 2.0 sec 457 MBytes 3.84 Gbits/sec
[ 4] 2.0- 3.0 sec 459 MBytes 3.85 Gbits/sec
[ 4] 3.0- 4.0 sec 467 MBytes 3.91 Gbits/sec
[ 4] 4.0- 5.0 sec 461 MBytes 3.87 Gbits/sec
[ 4] 5.0- 6.0 sec 454 MBytes 3.81 Gbits/sec
[ 4] 6.0- 7.0 sec 454 MBytes 3.81 Gbits/sec
[ 4] 7.0- 8.0 sec 469 MBytes 3.94 Gbits/sec
[ 4] 8.0- 9.0 sec 456 MBytes 3.83 Gbits/sec
[ 4] 9.0-10.0 sec 459 MBytes 3.85 Gbits/sec
[ 4] 0.0-10.0 sec 4.48 GBytes 3.85 Gbits/sec
Client:
root@9fe5749176e4:/usr/local/apache2# iperf -c 10.0.0.3 -i1 -t10
------------------------------------------------------------
Client connecting to 10.0.0.3, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.2 port 33228 connected with 10.0.0.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 456 MBytes 3.83 Gbits/sec
[ 3] 1.0- 2.0 sec 458 MBytes 3.84 Gbits/sec
[ 3] 2.0- 3.0 sec 458 MBytes 3.84 Gbits/sec
[ 3] 3.0- 4.0 sec 466 MBytes 3.91 Gbits/sec
[ 3] 4.0- 5.0 sec 461 MBytes 3.87 Gbits/sec
[ 3] 5.0- 6.0 sec 455 MBytes 3.81 Gbits/sec
[ 3] 6.0- 7.0 sec 454 MBytes 3.81 Gbits/sec
[ 3] 7.0- 8.0 sec 469 MBytes 3.93 Gbits/sec
[ 3] 8.0- 9.0 sec 457 MBytes 3.84 Gbits/sec
[ 3] 9.0-10.0 sec 458 MBytes 3.84 Gbits/sec
[ 3] 0.0-10.0 sec 4.48 GBytes 3.85 Gbits/sec
After the adjustment, the TCP connection between the two container ports works, as the iperf results show.
One more question
We are glad that the connection works. However, this is a performance test, and the two physical hosts use fibre interfaces with a speed of 10000 Mb/s (10 Gb/s), so the iperf speed between the containers (about 3.85 Gbits/sec) is not what we're looking for. Below are the iperf results between the two physical fibre interfaces:
Server:
# iperf -s -B 192.168.20.93 -i1
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 192.168.20.93
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 4] local 192.168.20.93 port 5001 connected with 192.168.20.92 port 50554
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 1.0 sec 1.08 GBytes 9.30 Gbits/sec
[ 4] 1.0- 2.0 sec 1.09 GBytes 9.38 Gbits/sec
[ 4] 2.0- 3.0 sec 1.09 GBytes 9.36 Gbits/sec
[ 4] 3.0- 4.0 sec 1.09 GBytes 9.38 Gbits/sec
[ 4] 4.0- 5.0 sec 1.09 GBytes 9.37 Gbits/sec
[ 4] 5.0- 6.0 sec 1.09 GBytes 9.39 Gbits/sec
[ 4] 6.0- 7.0 sec 1.09 GBytes 9.36 Gbits/sec
[ 4] 7.0- 8.0 sec 1.09 GBytes 9.38 Gbits/sec
[ 4] 8.0- 9.0 sec 1.09 GBytes 9.36 Gbits/sec
[ 4] 9.0-10.0 sec 1.09 GBytes 9.37 Gbits/sec
[ 4] 0.0-10.6 sec 11.5 GBytes 9.36 Gbits/sec
Client:
iperf -c 192.168.20.93 -i1 -t30
------------------------------------------------------------
Client connecting to 192.168.20.93, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.20.92 port 50554 connected with 192.168.20.93 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 1.09 GBytes 9.33 Gbits/sec
[ 3] 1.0- 2.0 sec 1.09 GBytes 9.38 Gbits/sec
[ 3] 2.0- 3.0 sec 1.09 GBytes 9.36 Gbits/sec
[ 3] 3.0- 4.0 sec 1.09 GBytes 9.37 Gbits/sec
[ 3] 4.0- 5.0 sec 1.09 GBytes 9.38 Gbits/sec
[ 3] 5.0- 6.0 sec 1.09 GBytes 9.38 Gbits/sec
[ 3] 6.0- 7.0 sec 1.09 GBytes 9.36 Gbits/sec
[ 3] 7.0- 8.0 sec 1.09 GBytes 9.38 Gbits/sec
[ 3] 8.0- 9.0 sec 1.09 GBytes 9.36 Gbits/sec
[ 3] 9.0-10.0 sec 1.09 GBytes 9.37 Gbits/sec
^C[ 3] 0.0-10.6 sec 11.5 GBytes 9.37 Gbits/sec
The difference is big (3.85 vs. 9.37 Gbits/sec), and we were worried that ACA + OVS was adding that much overhead.
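To put a number on the gap, the following sketch pulls the summary bandwidth out of saved iperf output and expresses one figure as a percentage of another. The helper names summary_bandwidth and percent_of are hypothetical, and the parsing assumes the iperf text format shown on this page, where the last interval line is the overall summary.

```shell
#!/bin/sh
# Print the bandwidth value and unit from the last "... bits/sec" line
# of iperf output read on stdin (the overall summary line).
summary_bandwidth() {
    awk '/bits\/sec$/ { value = $(NF - 1); unit = $NF } END { print value, unit }'
}

# Print $1 as a percentage of $2, rounded to the nearest integer.
percent_of() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", 100 * a / b }'
}

# Container-to-container vs. host-to-host client summaries from above.
percent_of 3.85 9.37
```

With the client-side summaries above, the containers reach roughly 41% of the host-to-host throughput. Note that percent_of does not convert units, so both inputs must use the same one.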
Is it overhead though?
After some investigation, we found that the difference in iperf speeds wasn't overhead introduced by ACA + OVS, but a result of the MTU values of the interfaces.
As mentioned above, the MTU of the physical interfaces was 1500, the default value. However, if we enable jumbo frames on the physical interface by setting its MTU to 9000 and set the virtual devices' MTU to 8946 (9000 - 50 - 4 = 8946), we can fully utilize the capacity of these fibre devices, even from ports inside the containers.
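One way to check that a given MTU really works end to end is a ping with fragmentation prohibited and a payload that exactly fills the MTU. The sketch below only builds such a command; it is not something we ran for this page. It assumes the Linux iputils ping (where -M do sets the don't-fragment behaviour) and IPv4 without options, and the function names are hypothetical.

```shell
#!/bin/sh
# An ICMP echo payload is the MTU minus the 20-byte IPv4 header
# (no options) and the 8-byte ICMP header.
ICMP_HEADERS=28

# Print the largest ping payload that fits in the given MTU.
ping_payload_for_mtu() {
    echo $(( $1 - $ICMP_HEADERS ))
}

# Print (do not run) a probe command. $1 = MTU to verify, $2 = target address.
mtu_probe_command() {
    echo "ping -M do -c 3 -s $(ping_payload_for_mtu "$1") $2"
}

# Example: probe the container-to-container path at the jumbo-frame MTU.
mtu_probe_command 8946 10.0.0.3
```

If the probe printed for 8946 gets replies from inside the container while a payload one byte larger fails, the virtual MTU and the underlying path agree.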
ifconfig on the physical host:
# ifconfig
enp2s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 192.168.20.92 netmask 255.255.255.0 broadcast 0.0.0.0
ether bbbbc txqueuelen 1000 (Ethernet)
RX packets 4923108139 bytes 908021707382 (908.0 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 56390522 bytes 134846312070 (134.8 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
f6f1bf7934004_l: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 8946
ether bbcde txqueuelen 1000 (Ethernet)
RX packets 154990 bytes 9907169316 (9.9 GB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 172209 bytes 11388194 (11.3 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
vxlan_sys_4789: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 65000
ether abcde txqueuelen 1000 (Ethernet)
RX packets 2094242 bytes 109074044 (109.0 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 21990981 bytes 52063482444 (52.0 GB)
TX errors 0 dropped 86 overruns 0 carrier 0 collisions 0
ifconfig inside the container:
root@9fe5749176e4:/usr/local/apache2# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 8946
inet 10.0.0.2 netmask 255.255.0.0 broadcast 0.0.0.0
ether 6c:dd:ee:00:00:02 txqueuelen 1000 (Ethernet)
RX packets 279501 bytes 18603666 (17.7 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 252482 bytes 15272089008 (14.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
iperf results:
Server:
root@7e77e40c37a7:/usr/local/apache2# iperf -s -B 10.0.0.3 -i1
------------------------------------------------------------
Server listening on TCP port 5001
Binding to local address 10.0.0.3
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 4] local 10.0.0.3 port 5001 connected with 10.0.0.2 port 33244
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 1.0 sec 1.14 GBytes 9.82 Gbits/sec
[ 4] 1.0- 2.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 4] 2.0- 3.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 4] 3.0- 4.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 4] 4.0- 5.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 4] 5.0- 6.0 sec 1.14 GBytes 9.84 Gbits/sec
[ 4] 6.0- 7.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 4] 7.0- 8.0 sec 1.14 GBytes 9.82 Gbits/sec
[ 4] 8.0- 9.0 sec 1.14 GBytes 9.82 Gbits/sec
[ 4] 9.0-10.0 sec 1.14 GBytes 9.82 Gbits/sec
[ 4] 0.0-10.0 sec 11.4 GBytes 9.83 Gbits/sec
Client:
root@9fe5749176e4:/usr/local/apache2# iperf -c 10.0.0.3 -i1 -t10
------------------------------------------------------------
Client connecting to 10.0.0.3, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.2 port 33244 connected with 10.0.0.3 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 1.15 GBytes 9.84 Gbits/sec
[ 3] 1.0- 2.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 3] 2.0- 3.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 3] 3.0- 4.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 3] 4.0- 5.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 3] 5.0- 6.0 sec 1.15 GBytes 9.84 Gbits/sec
[ 3] 6.0- 7.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 3] 7.0- 8.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 3] 8.0- 9.0 sec 1.14 GBytes 9.81 Gbits/sec
[ 3] 9.0-10.0 sec 1.14 GBytes 9.83 Gbits/sec
[ 3] 0.0-10.0 sec 11.4 GBytes 9.83 Gbits/sec
We are happy to see that the iperf result inside the containers (9.83 Gbits/sec) is on par with the result measured between the physical hosts (9.36 Gbits/sec at MTU 1500), which indicates that ACA + OVS doesn't add much overhead to our connection.