OVS - hpaluch/hpaluch.github.io GitHub Wiki

OVS

OVS is Open vSwitch, an open source software switch. Its homepage is here: https://docs.openvswitch.org/en/latest/intro/what-is-ovs/

You can use OVS at many levels:

  • as simple learning switch (like Linux Bridge)

  • as locally managed switch (configuration stored in database called OVS-DB) with various useful features:

    • VLAN support
    • bonding (grouping more LAN adapters together for higher speed and/or redundancy)
    • QoS
    • various tunnels (Geneve, GRE, VXLAN, ...)
  • or even more: OVS managed centrally using controller software and OpenFlow protocol. This is used for advanced solutions like OVN or Faucet

My first OVS tutorial

Here is my first tutorial on how to use OVS. We will create 3 logical switches (emulating isolated networks) and connect two of them with a Layer 2 Geneve tunnel over dc-link:

+---------+              +---------+
| dc-west |              | dc-east |
+---------+              +---------+
         \                /
          \             /
            +---------+
            | dc-link |
            +---------+

Where:

  • dc-west - "Data Center West" - logical OVS switch running 2 VMs (actually using Network namespaces to emulate them):

    • vm1-west - 192.168.100.1
    • vm2-west - 192.168.100.2
  • dc-east - "Data Center East" - logical OVS switch running 2 VMs (actually using Network namespaces to emulate them):

    • vm3-east - 192.168.100.3
    • vm4-east - 192.168.100.4
  • NOTE: We will use a Layer 2 (Ethernet MAC addresses) tunnel to connect these two localities, so they use the same IP network (192.168.100.0/24)! From the VMs' perspective all 4 VMs are on the same IP network - they are not aware that it is actually tunneled to the other DC...

  • dc-link - logical switch emulating the Internet link between those 2 "data centers". It will carry a Geneve (or other) tunnel that connects the dc-east and dc-west networks (which share the same IP network!). dc-link will have 2 IPs:

    • link1-west - 10.200.200.1 (assigned to device tep0 below)
    • link2-east - 10.200.200.2 (assigned to device tep1 below)
    • these will be tunnel endpoints

Requirements for tutorial:

  • single Debian 11 host (or even VM is OK). I will call it deb11-ovs
  • all traffic will use only local (internal) OVS switches. So it should be safe to try it even on a remote machine, however you must ensure that my IP ranges and device names do not collide with your setup.

Installing OVS:

  • all commands to be run as root (Debian 10+ somehow removed sudo from default installation, which is now add-on package)

  • ensure that your system is up-to-date:

    apt-get update
    apt-get dist-upgrade
    # reboot if system components were updated
    
  • now install OVS packages:

    apt-get install openvswitch-switch tcpdump
    # if you like ifconfig, route and friends also install:
    apt-get install net-tools
    

Now create all 3 bridges (or actually "OVS Switches"):

  • create script 10_setup_switches.sh with contents:

    #!/bin/bash
    set -euo pipefail
    # Create OVS Switches - each switch simulates one "datacenter" network
    for i in dc-west dc-east dc-link 
    do
     set -x
     ovs-vsctl add-br $i
     set +x
    done
    echo "Listing switches: "
    ovs-vsctl list-br
    echo "OK: dumping OVS configuration"
    ovs-vsctl show
    exit 0
    
  • grant executable permissions using chmod +x 10_setup_switches.sh and run it using ./10_setup_switches.sh

  • it should produce output like:

    + ovs-vsctl add-br dc-west
    + set +x
    + ovs-vsctl add-br dc-east
    + set +x
    + ovs-vsctl add-br dc-link
    + set +x
    Listing switches: 
    dc-east
    dc-link
    dc-west
    OK: dumping OVS configuration
    56ae4467-60fe-4819-a83e-714bfa59a74b
        Bridge dc-west
            Port dc-west
                Interface dc-west
                    type: internal
        Bridge dc-link
            Port dc-link
                Interface dc-link
                    type: internal
        Bridge dc-east
            Port dc-east
                Interface dc-east
                    type: internal
        ovs_version: "2.15.0"
    
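Re-running the script above fails once the bridges already exist, because add-br refuses to create a bridge under a name that is already taken. A minimal sketch of an idempotent variant follows. It relies on the --may-exist option of ovs-vsctl; the DRY_RUN variable and the run and setup_switches helper names are made up for this illustration and are not part of OVS.

```shell
#!/bin/bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create every bridge given as argument.
# --may-exist turns add-br into a no-op for a bridge that already exists.
setup_switches() {
    local br
    for br in "$@"
    do
        run ovs-vsctl --may-exist add-br "$br"
    done
}
```

Calling DRY_RUN=1 setup_switches dc-west dc-east dc-link only prints the three ovs-vsctl commands, so it is safe to try as a normal user. Without DRY_RUN it has to be run as root.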

Now we will prepare 2 devices for tunneling called tep0 and tep1 (Tunnel Endpoint):

  • create script 20-setup-tunnel-dev.sh with contents:

    #!/bin/bash
    set -euo pipefail
    
    # logical switch used for tunnel
    dc=dc-link
    
    # it is common to use "TEPx" device name for "Tunnel Endpoints"
    # these devices will be used to tunnel traffic from our "dc-west" to "dc-east"
    
    for eth in tep0 tep1
    do
    	set -x
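    	# create the port and mark it internal in a single transaction
    	ovs-vsctl --may-exist add-port $dc $eth -- set interface $eth type=internal
    	set +x
    done
    exit 0
    
  • mark it executable and run it. NOTE: when add-port and "set interface ... type=internal" were run as two separate commands, the first invocation threw an error (the second one was OK), most likely because OVS tried to open a system device that did not exist yet. Doing both in a single ovs-vsctl call should avoid that.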

Before reboot we should prepare static IP configuration for those tunnel devices:

  • append to /etc/network/interfaces:

    # OVS tutorial - tunnel endpoints for switch dc-link
    auto tep0
    iface tep0 inet static
    	address 10.200.200.1/24
    
    auto tep1
    iface tep1 inet static
    	address 10.200.200.2/24
    
  • try to enable those devices manually using:

    /sbin/ifup tep0
    /sbin/ifup tep1
    
  • verify that they are properly configured using:

    ip -br -4 a | egrep -w '^tep[01]'
    tep0             UNKNOWN        10.200.200.1/24 
    tep1             UNKNOWN        10.200.200.2/24 
    
  • NOTE: it is OK that state is UNKNOWN - it is normal for internal device (including lo for loopback)

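The check above can also be scripted. The sketch below reads "ip -br -4 a" output on standard input and tests whether a given device carries a given address. The function name has_addr is hypothetical and not part of any tool.

```shell
#!/bin/bash
# has_addr DEVICE ADDRESS
# Reads "ip -br -4 a" output on stdin; succeeds when DEVICE lists ADDRESS.
has_addr() {
    awk -v dev="$1" -v addr="$2" '
        $1 == dev {
            # in brief output, fields 3..NF hold the addresses
            for (i = 3; i <= NF; i++) if ($i == addr) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}
```

For example: ip -br -4 a | has_addr tep0 10.200.200.1/24 && echo "tep0 OK"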
Now reboot the system using init 6 and verify with the same command that the OVS devices still exist:

init 6
# after reboot try:
ip -br -4 a
lo               UNKNOWN        127.0.0.1/8          ### Loopback
eth0             UP             192.168.100.173/24   ### physical network interface of deb11-ovs
tep0             UNKNOWN        10.200.200.1/24      ### tunnel endpoint for dc-west
tep1             UNKNOWN        10.200.200.2/24      ### tunnel endpoint for dc-east

Now we will set up 4 VMs (as namespaces):

  • create script create_all_vms.sh with contents:

    #!/bin/bash
    set -eu
    
    create_vm() {
    	[ $# -eq 2 ] || {
    		echo "ERROR: Invalid number of arguments $# != 2" >&2
    		exit 1
    	}
    	# arguments
    	local region="$1" # east|west
    	local num="$2"     # vm number 1..4
    	# computed variables
    	local dc="dc-$region"
    	local ns="vm$num-$region"
    	local vmip=192.168.100.$num
    	local eth="vm${num}eth"
    
    	set -x
    	# Add Port (internal network device) to OVS Switch in a single transaction
    	ovs-vsctl --may-exist add-port $dc $eth -- set interface $eth type=internal
    	# create namespace if it does not exist yet
    	ip netns | grep -Fwo "$ns" || ip netns add $ns
    
    	# configure and/or replace loopback
    	ip netns exec $ns ip a show dev lo
    	ip netns exec $ns ip a replace 127.0.0.1/8 dev lo
    	ip netns exec $ns ip link set lo up
    	# first check if our LAN device already exists in $ns namespace
    	# if not, move it to Namespace
    	ip netns exec $ns ip link show $eth || ip link set dev $eth netns $ns
    	# configure IP Address and netmask
    	ip netns exec $ns ip addr replace $vmip/24 dev $eth
    	ip netns exec $ns ip link set $eth up
    
    	# dump current values
    	ip netns exec $ns ip -br -4 l
    	ip netns exec $ns ip -br -4 a
    	set +x
    
    }
    
    create_vm west 1
    create_vm west 2
    create_vm east 3
    create_vm east 4
    exit 0
    
  • mark it executable and run it

  • to see Interface configuration of all namespaces create script dump_all_vms.sh with contents:

    #!/bin/bash
    set -eu
    
    for ns in $(ip netns | awk '{print $1}' | sort)
    do
    	echo "Dumping Namespace '$ns':"
    	ip netns exec $ns ip -br -4 l | sed 's/^/  /'
    	ip netns exec $ns ip -br -4 a | sed 's/^/  /'
    	ip netns exec $ns ip r | sed 's/^/  /'
    	ip netns exec $ns ip n | sed 's/^/  /'
    done
    exit 0
    
  • mark it executable and run it. Here is the expected output (only the link and address lines are shown, the route and neighbour lines are omitted):

    Dumping Namespace 'vm1-west':
      lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
      vm1eth           UNKNOWN        ea:1e:e8:88:a8:fe <BROADCAST,MULTICAST,UP,LOWER_UP> 
      lo               UNKNOWN        127.0.0.1/8 
      vm1eth           UNKNOWN        192.168.100.1/24 
    Dumping Namespace 'vm2-west':
      lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
      vm2eth           UNKNOWN        2a:db:dd:b4:3a:83 <BROADCAST,MULTICAST,UP,LOWER_UP> 
      lo               UNKNOWN        127.0.0.1/8 
      vm2eth           UNKNOWN        192.168.100.2/24 
    Dumping Namespace 'vm3-east':
      lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
      vm3eth           UNKNOWN        9e:4f:72:b2:33:33 <BROADCAST,MULTICAST,UP,LOWER_UP> 
      lo               UNKNOWN        127.0.0.1/8 
      vm3eth           UNKNOWN        192.168.100.3/24 
    Dumping Namespace 'vm4-east':
      lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
      vm4eth           UNKNOWN        4e:72:d9:e9:19:19 <BROADCAST,MULTICAST,UP,LOWER_UP> 
      lo               UNKNOWN        127.0.0.1/8 
      vm4eth           UNKNOWN        192.168.100.4/24 
    
  • please note that it is pretty normal that link status is UNKNOWN for internal interfaces including Loopback.

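The naming scheme that create_vm derives from its two arguments can be sketched on its own, which makes it easy to see which device and address belong to which namespace. vm_params is a hypothetical helper, not part of the scripts above; it only prints values and changes nothing on the system.

```shell
#!/bin/bash
# vm_params REGION NUM
# Prints bridge, namespace, device and address (with prefix length),
# following the same rules as the computed variables in create_vm.
vm_params() {
    local region="$1" num="$2"
    echo "dc-$region vm$num-$region vm${num}eth 192.168.100.$num/24"
}
```

The four values can be read back into variables, for example: read -r dc ns eth addr <<< "$(vm_params east 3)"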
Now we should test with ping in namespace that:

  • only VMs in same region are reachable. So these IPs should be reachable:

    • VM: vm1-west <-> vm2-west
    • VM: vm3-east <-> vm4-east
  • however if you try to ping from east to west (or back) it should be unreachable.

Example of pings that should work:

# OK: ping from vm1-west to vm2-west
ip netns  exec vm1-west ping -c 2  192.168.100.2

But this will not work (yet):

# Should fail: ping from vm1-west to vm3-east:
ip netns  exec vm1-west ping -c 2  192.168.100.3

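The expectation before the tunnel exists can be written down as a small rule: two VMs reach each other only when their namespace names share the region suffix. The sketch below encodes that rule. same_region and check_ping are hypothetical names; check_ping needs root and the namespaces created earlier, and its prediction is valid only until the tunnel is set up.

```shell
#!/bin/bash
# same_region NS_A NS_B
# Succeeds when both names end in the same region, e.g. vm1-west and vm2-west.
same_region() {
    [ "${1##*-}" = "${2##*-}" ]
}

# check_ping SRC_NS DST_NS DST_ADDR
# Pings DST_ADDR from SRC_NS and compares the result with what
# same_region predicts. Succeeds when prediction and result agree.
check_ping() {
    local src="$1" dst="$2" addr="$3" expect=fail actual=fail
    same_region "$src" "$dst" && expect=ok
    ip netns exec "$src" ping -c 2 -W 1 "$addr" >/dev/null 2>&1 && actual=ok
    echo "$src -> $dst: expected $expect, got $actual"
    [ "$expect" = "$actual" ]
}
```

For example: check_ping vm1-west vm2-west 192.168.100.2 should report ok for both, while check_ping vm1-west vm3-east 192.168.100.3 should report fail for both at this stage.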
Setting up Tunnel

  • Now moment of truth!

  • we will set up a Geneve tunnel

  • create script 30-setup-tunnel.sh with contents:

    #!/bin/bash
    set -euo pipefail
    
    for p in gre0 gre1
    do
    	ovs-vsctl --if-exists del-port $p
    done
    ovs-vsctl add-port dc-west gre0 -- set interface gre0 type=geneve \
           	options:remote_ip=10.200.200.2 options:local_ip=10.200.200.1
    ovs-vsctl add-port dc-east gre1 -- set interface gre1 type=geneve \
           	options:remote_ip=10.200.200.1 options:local_ip=10.200.200.2
    exit 0
    
  • NOTE: we must set local_ip for the tunnel endpoints, otherwise the kernel will use loopback and packets will miss OVS - so the tunnel will not work. When you are using OVS on real PCs this is usually not a problem, because such traffic cannot be routed via loopback.

  • NOTE: I originally planned to use a GRE tunnel but later switched to Geneve, thus the devices are still named gre0 and gre1. Otherwise it should work well.

  • make it executable and run it.

  • finally ping between dc-east and dc-west should work, for example:

    # ping from vm1-west to vm3-east
    ip netns  exec vm1-west ping -c 2  192.168.100.3
    # ping from vm1-west to vm4-east
    ip netns  exec vm1-west ping -c 2  192.168.100.4
    
  • you can also monitor the traffic (surprisingly still listening on loopback, since both tunnel endpoints live on the same host):

    tcpdump -e -n -p -i lo
    
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    # request from tep0 to tep1
    16:44:29.376860 00:00:00:00:00:00 > 00:00:00:00:00:00, \
      ethertype IPv4 (0x0800), length 148: 10.200.200.1.36185 > 10.200.200.2.6081: \
      Geneve, Flags [none], vni 0x0, proto TEB (0x6558): ea:1e:e8:88:a8:fe > 4e:72:d9:e9:19:19, \
      ethertype IPv4 (0x0800), length 98: 192.168.100.1 > 192.168.100.4: \
      ICMP echo request, id 29820, seq 1, length 64
    # response from tep1 to tep0
    16:44:29.377164 00:00:00:00:00:00 > 00:00:00:00:00:00, \
      ethertype IPv4 (0x0800), length 148: 10.200.200.2.36185 > 10.200.200.1.6081: \
      Geneve, Flags [none], vni 0x0, proto TEB (0x6558): 4e:72:d9:e9:19:19 > ea:1e:e8:88:a8:fe, \
      ethertype IPv4 (0x0800), length 98: 192.168.100.4 > 192.168.100.1: \
      ICMP echo reply, id 29820, seq 1, length 64
    
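The two lengths in the capture (148 bytes for the outer frame, 98 bytes for the inner one) differ by the encapsulation overhead. Here is a short worked sketch of that arithmetic, assuming IPv4 transport and a Geneve header without options (the capture shows none). The variable and function names are only illustrative.

```shell
#!/bin/bash
# Header sizes in bytes, assuming IPv4 transport and Geneve without options.
OUTER_ETH=14   # outer Ethernet header
OUTER_IP=20    # outer IPv4 header
OUTER_UDP=8    # UDP header (destination port 6081)
GENEVE=8       # Geneve base header
INNER_ETH=14   # inner Ethernet header, carried as tunnel payload

# geneve_outer_len INNER_FRAME_LEN
# Prints the length of the encapsulated (outer) frame.
geneve_outer_len() {
    echo $(( $1 + OUTER_ETH + OUTER_IP + OUTER_UDP + GENEVE ))
}

# geneve_inner_mtu UNDERLAY_MTU
# Prints the largest inner IP packet that fits into one underlay packet.
geneve_inner_mtu() {
    echo $(( $1 - OUTER_IP - OUTER_UDP - GENEVE - INNER_ETH ))
}
```

geneve_outer_len 98 prints 148, which matches the capture. geneve_inner_mtu 1500 prints 1450, which hints at why interfaces inside the tunnel may need a smaller MTU than the link that carries it.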

TODO: MTU issues

Please see really nice tutorial on MTU issues and how to solve them:

More tips:

  • to see statistics of traffic from/to various ports of OVS switch you can use OpenFlow command (ovs-ofctl), for example to dump statistics from switch dc-link use:

    ovs-ofctl dump-ports dc-link
    
    OFPST_PORT reply (xid=0x2): 3 ports
      port tep1: rx pkts=12, bytes=748, drop=0, errs=0, frame=0, over=0, crc=0
               tx pkts=13, bytes=1006, drop=0, errs=0, coll=0
      port tep0: rx pkts=13, bytes=824, drop=0, errs=0, frame=0, over=0, crc=0
               tx pkts=13, bytes=1006, drop=0, errs=0, coll=0
      port LOCAL: rx pkts=0, bytes=0, drop=25, errs=0, frame=0, over=0, crc=0
               tx pkts=0, bytes=0, drop=0, errs=0, coll=0
    
  • to see the OVS DB content you can just less /var/lib/openvswitch/conf.db (it is basically a log of metadata + one-line JSON changes)

  • you can also try more civilized form:

    ovs-vsctl list Open_vSwitch
    
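The LOCAL port in the dump-ports sample shows 25 dropped packets, which is easy to miss by eye. Here is a sketch that pulls the rx drop counter per port from "ovs-ofctl dump-ports" output read on standard input. rx_drops is a hypothetical helper and assumes the output layout shown in the sample.

```shell
#!/bin/bash
# rx_drops
# Reads "ovs-ofctl dump-ports" output on stdin and prints "PORT DROPS"
# for every receive (rx) line.
rx_drops() {
    awk '
        $1 == "port" && $3 == "rx" {
            name = $2
            sub(/:$/, "", name)           # strip the trailing colon
            for (i = 4; i <= NF; i++) {
                if ($i ~ /^drop=/) {
                    n = $i
                    gsub(/[^0-9]/, "", n) # keep only the number
                    print name, n
                }
            }
        }
    '
}
```

For example: ovs-ofctl dump-ports dc-link | rx_drops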

What to do after reboot:

  • verify that tep0 and tep1 are configured after reboot:

    ip -br -4 a
    
  • if they are missing (typically it happens after 2nd boot) you have to reactivate them using:

    /sbin/ifup tep0
    /sbin/ifup tep1
    ip -br -4 a
    
  • you also have to again re-create all "VMs" (network namespace configurations) using: ./create_all_vms.sh

  • now tunnels should again work, for example:

    ip netns  exec vm1-west ping -c 2  192.168.100.4
    
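The recovery steps above can be collected into one function. This is only a sketch: restore_lab, run and DRY_RUN are hypothetical names, the script path assumes that create_all_vms.sh sits in the current directory, and running it for real needs root.

```shell
#!/bin/bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Bring both tunnel endpoints up again, then re-create the namespaces.
restore_lab() {
    local dev
    for dev in tep0 tep1
    do
        run /sbin/ifup "$dev"
    done
    run ./create_all_vms.sh
}
```

DRY_RUN=1 restore_lab only prints the three commands, so the order can be reviewed before anything is changed.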

So this is the end of my first OVS tutorial. It barely scratches the surface, but it definitely takes a lot of time to learn OVS...

Resources used for this tutorial (and many others I forgot):

NOTE: Many guides on Internet are actually based on those from Scott Lowe so you should definitely start with Scott's blog and compare it with those guides.