DTLS 1.2 connection ID based load balancer - eclipse-californium/californium GitHub Wiki

Note: Built-In-DTLS-CID-Load-Balancer-Cluster is the current approach for load-balancing based on CID.

DTLS 1.2 load-balancer

Using DTLS 1.2 on the internet comes with pain caused by NATs, which may change a peer's address mid-connection. It's therefore strongly recommended to use the DTLS 1.2 connection ID extension.

The DTLS 1.2 connection ID offers even more: the base to build a "stateless load-balancer". In a cluster, the task of associating the negotiated keys is extended by routing each record to the node in the cluster which performed the handshake. Simon Bernard had that vision already in 2017 (CID for nginx and lvs mailing list), but at that time neither the specification nor an implementation of the connection ID was in place. That has changed now. Since californium 2.0.0-M14 a connection ID implementation is available, and with the "connection id generator" of 2.0.0-M15, all pieces are ready to build such a cluster. All? OK, some may need improvements :-).

The DTLS 1.2 CID load-balancer demonstration setup

It requires a modern Linux (I use Ubuntu 18.04 LTS) installed on three boxes: one for the load-balancer and two for the nodes. Additionally, a client is needed.

Prepare the nodes

Both nodes require an installed Java and the cf-plugtest-server. If you consider setting it up as a systemd service, please read Californium as old style systemd service. To be used as a node, the records with the responses must be sent back through the load-balancer box. That's achieved by setting up that load-balancer box as gateway.

#! /bin/sh
 
ip route del default
# adapt this according your ip subnet setup
ip route del 192.168.178.0/24
 
# adapt this according your load-balancer ip and interface setup
ip route add 192.168.178.118/32 dev wlan0
# adapt this according your load-balancer ip
ip route add default via 192.168.178.118
# add routes to other nodes in the subnet, if required

Download script

The script assumes that your node is in subnet "192.168.178.0/24", and that your load-balancer runs on "192.168.178.118" and is reachable via the "wlan0" device. This setup routes all IPv4 traffic via the load-balancer, even the traffic that previously stayed within the subnet. If you want to reach other nodes in the subnet directly, add routes to those nodes as well. I chose this setup to be able to test the cluster also from a client node in the subnet. If that is not required and no clients are in the subnet, other routes for the local interface of the nodes are possible.
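As a sketch, the routing commands above can be parameterized so each node only needs its load-balancer IP, subnet, and interface. This hypothetical helper (names and defaults are assumptions, adapt to your setup) merely prints the commands for review; pipe the output to `sh` as root to apply:

```shell
#!/bin/sh
# Hypothetical helper: print the node routing commands for a given
# load-balancer IP, subnet, and interface, for review before applying.
LB_IP="${1:-192.168.178.118}"
SUBNET="${2:-192.168.178.0/24}"
DEV="${3:-wlan0}"

cat <<EOF
ip route del default
ip route del ${SUBNET}
ip route add ${LB_IP}/32 dev ${DEV}
ip route add default via ${LB_IP}
EOF
```

Printing first instead of executing makes it harder to cut a box off the network with a typo in the gateway address.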

The cf-plugtest-server must be configured to use the node-id in the connection id. Therefore adjust the values

DTLS_CONNECTION_ID_LENGTH=6
DTLS_CONNECTION_ID_NODE=1

in the "CaliforniumPlugtest.properties" of the first node. This configuration file is created the first time the application is started. It's only read on startup, so restart the application after adjusting the values. On the second node, use

DTLS_CONNECTION_ID_LENGTH=6
DTLS_CONNECTION_ID_NODE=2
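The two values can also be patched with `sed` once the properties file exists. The following demonstration works on a scratch copy in `/tmp` (the path and sample content are assumptions for illustration); on a real node, point `PROPS` at the generated "CaliforniumPlugtest.properties" and set the node's own id:

```shell
#!/bin/sh
# Demonstration on a scratch file standing in for the generated
# properties; replace PROPS with the real path on an actual node.
PROPS=/tmp/CaliforniumPlugtest.properties
NODE_ID=2

# scratch content standing in for the generated defaults
printf 'DTLS_CONNECTION_ID_LENGTH=0\nDTLS_CONNECTION_ID_NODE=0\n' > "$PROPS"

# set the cid length and the node id
sed -i -e "s/^DTLS_CONNECTION_ID_LENGTH=.*/DTLS_CONNECTION_ID_LENGTH=6/" \
       -e "s/^DTLS_CONNECTION_ID_NODE=.*/DTLS_CONNECTION_ID_NODE=${NODE_ID}/" \
       "$PROPS"

grep DTLS_CONNECTION_ID "$PROPS"
```

Remember that the file is only read on startup, so restart the cf-plugtest-server afterwards.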

Prepare the load-balancer

Run the script with "sudo".

#! /bin/sh

if [ -z "$1" ]  ; then
    INTERFACE_IN=
else 
    INTERFACE_IN="-i $1"
fi

LOADBA_IP=192.168.178.118
NODE_1_IP=192.168.178.123
NODE_2_IP=192.168.178.124

# configure the ipvs UDP timeout to 15s
ipvsadm --set 0 0 15

echo "Remove IP LVS for port 5683 and 5684"
ipvsadm -D -u ${LOADBA_IP}:5683
ipvsadm -D -u ${LOADBA_IP}:5684
 
echo "Create IP LVS 5683 (coap without DTLS)"
ipvsadm -A -u ${LOADBA_IP}:5683 -s rr
ipvsadm -a -u ${LOADBA_IP}:5683 -r ${NODE_1_IP} -m
ipvsadm -a -u ${LOADBA_IP}:5683 -r ${NODE_2_IP} -m
 
echo "Create IP LVS 5684 (coaps with DTLS)"
ipvsadm -A -u ${LOADBA_IP}:5684 -s rr
ipvsadm -a -u ${LOADBA_IP}:5684 -r ${NODE_1_IP} -m
ipvsadm -a -u ${LOADBA_IP}:5684 -r ${NODE_2_IP} -m
 
# NAT LB using DTLS 1.2 connection ID
# IPv4 layer, RFC 791, minimum 20 bytes, use IHL to calculate the effective size
# IPv6 layer, RFC 2460, "next header" => currently not supported :-)  
# UDP layer, RFC 768, 8 bytes
# DTLS 1.2 header first 3 bytes "19 fe fd" for TLS12_CID (25 / 0x19)
# DTLS 1.2 header with cid, offset 3 + 2 + 6 = 11 
# (UDP header / 8 bytes + offset / 11 bytes - u32 shift / 3 bytes => 16)
echo "Create DTLS_NAT"
iptables -t nat -N DTLS_NAT
echo "Prepare DTLS_NAT"
iptables -t nat -F DTLS_NAT
# rule cid == 1 => NODE_1_IP
iptables -t nat -A DTLS_NAT -m u32 --u32 "0>>22&0x3C@ 7&0xFFFFFF=0x19FEFD && 0>>22&0x3C@ 16&0xFF=1" -j DNAT --to-destination ${NODE_1_IP}
# rule cid == 2 => NODE_2_IP
iptables -t nat -A DTLS_NAT -m u32 --u32 "0>>22&0x3C@ 7&0xFFFFFF=0x19FEFD && 0>>22&0x3C@ 16&0xFF=2" -j DNAT --to-destination ${NODE_2_IP}
 
echo "Remove PREROUTING to DTLS NAT $1"
iptables -t nat -D PREROUTING ${INTERFACE_IN} -p udp --dport 5684 -j DTLS_NAT
echo "Forward PREROUTING to DTLS NAT $1"
iptables -t nat -A PREROUTING ${INTERFACE_IN} -p udp --dport 5684 -j DTLS_NAT
 
#enable ipv4 forwarding
echo "1" > /proc/sys/net/ipv4/ip_forward

# set DNAT timeout to 15s
echo "15" > /proc/sys/net/netfilter/nf_conntrack_udp_timeout
echo "15" > /proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream

#some cmd to list the states and see, what's going on
#statistic:
#  iptables -t nat -L -nvx
#reset statistics: 
#  iptables -t nat -Z
#dump connection tracking of iptables: 
#  conntrack -L
#dump connections in ipvs: 
#  ipvsadm -L -n

Download script

(The download script contains additional firewall rules to protect the DTLS 1.2 endpoint at port 5684 from foreign traffic. Those are not strictly required for the load-balancer, therefore they are only in the download script.)

The script assumes that the load-balancer is on box "192.168.178.118", and the nodes on "192.168.178.123" (DTLS_CONNECTION_ID_NODE=1) and "192.168.178.124" (DTLS_CONNECTION_ID_NODE=2).
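The offsets used by the u32 match can be illustrated without iptables: in a tls12_cid record the content type 0x19 and version 0xfefd occupy the first three bytes of the UDP payload, and after the 2-byte epoch and 6-byte sequence number the first CID byte (the node id here) sits at payload offset 11 (3 + 2 + 6). A plain-shell walk over a made-up sample record (all bytes are assumptions for illustration):

```shell
#!/bin/sh
# Made-up tls12_cid record head as hex: 19fefd (type + version),
# 0001 (epoch), 000000000005 (48-bit sequence number), then a
# 6-byte CID starting with the node id 0x02.
RECORD="19fefd000100000000000502aabbccddee"

TYPE=$(echo "$RECORD" | cut -c1-6)    # bytes 0-2: type + version
CID1=$(echo "$RECORD" | cut -c23-24)  # byte 11: first CID byte / node id

echo "type/version: $TYPE, node id byte: $CID1"
```

This prints `type/version: 19fefd, node id byte: 02`, matching the `16&0xFF` read in the u32 rule (8-byte UDP header + offset 11 - 3-byte u32 shift = 16).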

The script uses both the ipvs (ipvsadm) and the iptables DNAT. That may surprise, but both are used for different periods. The ipvs is used for the handshake itself, which doesn't use a cid in all handshake records. The node which negotiates the handshake generates a cid with a fixed prefix (0x01|0x02 ....). If the ipvs route times out in a quiet period and a TLS12_CID record is received afterwards, that prefix (0x01|0x02) is used to establish a (short-lived) DNAT route back to the node which negotiated the handshake. The routes of both periods are not kept long-term; both may time out rather quickly, they just need to cover your maximum response time so a response can still be routed back. The ipvs UDP timeout, used during the handshake, can be configured with

ipvsadm --set 0 0 15

to 15s. The iptables dnat timeout used for the TLS12_CID traffic is controlled by

/proc/sys/net/netfilter/nf_conntrack_udp_timeout
/proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream

and can be configured by writing the timeout to them. It's important not to use longer timeouts. Other NATs do not only change the IP endpoint of a peer; sometimes they reuse such an IP endpoint for another peer. With a longer timeout, the cluster DNAT routes may be "accidentally" used for such a peer and may result in forwarding the record to the wrong node.
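The same timeouts can also be set via `sysctl`, which is equivalent to writing to `/proc` directly (a sketch, assuming the `nf_conntrack` module is loaded); neither form survives a reboot, so add the keys to `/etc/sysctl.conf` if they should persist:

```shell
# set the conntrack UDP timeouts to 15s, as root; add these keys to
# /etc/sysctl.conf to make them persistent across reboots
sysctl -w net.netfilter.nf_conntrack_udp_timeout=15
sysctl -w net.netfilter.nf_conntrack_udp_timeout_stream=15
```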

After the expiration of these timeouts, no peer-specific state is stored in the load-balancer, only the node-specific rules to DNAT the TLS12_CID records according to the contained CID.

Plain coap is only load-balanced using the ipvs. After the connection state expires, the client may be forwarded to a different node. So it's only sticky for that timeout.

Encrypted coaps without "DTLS connection ID" is also only load-balanced using the ipvs. After the connection state expires, the client may be forwarded to a different node. So it's only sticky for that timeout and therefore requires frequent new handshakes.

Use the DTLS 1.2 CID load-balancer demonstration

cf-browser in californium.tools

The cf-browser in californium.tools (branch "main") uses DTLS with connection ID since version 2.0.x. californium.tools - cf-browser

The 3.5.0 build of the tools is deployed in the Eclipse maven repository and can be downloaded: cf-browser-3.5.0.jar

Start it using java -jar cf-browser-3.5.0.jar.

If a GET for the root resource is executed, the description contains the node id. Other requests are also valid. If you pause after a DTLS request for more than the timeout, you may check that neither the ipvs nor the DNAT has connections left.

sudo ipvsadm -L -n
sudo iptables -t nat -L -nvx

The hits on the DNAT rules show that the load-balancing based on the connection ID works.

Using openjdk with openjfx sometimes seems to be broken. If you use openjdk-8, ensure that openjfx is also a java-8 version. Maybe this guide helps to fix it: openjfx-8.

sudo apt install openjfx=8u161-b12-1ubuntu2 libopenjfx-jni=8u161-b12-1ubuntu2 libopenjfx-java=8u161-b12-1ubuntu2
sudo apt-mark hold openjfx libopenjfx-jni libopenjfx-java

If you are using another Java version, or the guide didn't work for you, try another JDK distribution with JFX.

For openjdk 11 it's required to install openjfx separately. To start, the used modules javafx.controls and javafx.fxml need to be provided. For Ubuntu 18.04 that is achieved by

java --module-path /usr/share/openjfx/lib --add-modules javafx.controls,javafx.fxml -jar cf-browser-3.5.0.jar