Switch - UofG-netlab/BPFabric GitHub Wiki

Design

Like a traditional switch, the BPFabric switch is separated into two planes: the control plane and the data plane.

The control plane establishes the connection to the BPFabric controller. Through the Southbound API, it allows the controller to add, update and remove functions in the switch's data plane, as well as inspect and update the switch's forwarding tables. In BPFabric, the control plane implementation is called the agent.

The data plane receives packets from the switch's ports and executes each function in the pipeline until a forwarding decision has been made. Each function in the pipeline can query and update its lookup table(s). The forwarding decision results in the packet being sent to a specific port, flooded to all other ports, sent to the controller or dropped.

Implementations

BPFabric currently provides two switch implementations for experimentation. The two switches are fully interchangeable and provide the exact same feature set.

The SoftSwitch is a user-space switch that relies on AF_PACKET to receive and send raw frames between interfaces. This implementation is lightweight, has few dependencies, and can be easily extended, modified and debugged. It works with any network interface supporting AF_PACKET. However, this implementation cannot achieve very high throughput, especially for small packet sizes.

The DPDKSwitch is also a user-space switch, but it relies on the DPDK framework to receive and send raw frames between interfaces. Using DPDK, vfio-pci drivers and hugepages, packets can be processed very quickly. This implementation achieves much higher throughput than the SoftSwitch but requires compatible hardware.

Both switch implementations are fundamentally very similar. The agent, which provides the control plane, is identical, and both have a data plane that receives (ingress) packets from a per-port RX ring and sends (egress) packets to the output port's TX ring. The main difference is that in the SoftSwitch implementation the RX and TX rings are filled and emptied by the kernel through the AF_PACKET socket, while in the DPDKSwitch they are filled and emptied by the DPDK framework.

Every port has two rings: an RX ring, on which packets that need to pass through the BPFabric pipeline are received, and a TX ring, on which packets are queued before they are sent.

The data plane iterates over every RX ring, checking whether new packets must be processed. If so, metadata including the source port and a timestamp is prepended to the packet. The packet is then passed through each stage of the pipeline until a forwarding decision is made. Once a forwarding decision is made and an output port is selected, the packet is queued on that port's TX ring.
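The metadata step can be illustrated with a small sketch. The header layout below (input port, seconds, nanoseconds, frame length, each a 32-bit little-endian integer) is an assumption made for illustration only; the page does not specify the actual layout, and the names `METADATA` and `prepend_metadata` are hypothetical.

```python
import struct
import time

# Hypothetical metadata layout: in_port, sec, nsec, length (4 x uint32).
METADATA = struct.Struct("<IIII")


def prepend_metadata(in_port, frame, now_ns=None):
    """Return the frame with a metadata header placed in front of it."""
    if now_ns is None:
        now_ns = time.time_ns()
    sec, nsec = divmod(now_ns, 1_000_000_000)
    return METADATA.pack(in_port, sec, nsec, len(frame)) + frame
```

A pipeline stage would then read the source port and timestamp from the first `METADATA.size` bytes and find the original frame immediately after them.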

A forwarding decision is made when a pipeline stage returns PORT, FLOOD, DROP or CONTROLLER. If a stage makes no forwarding decision, the packet is passed to the next stage in the pipeline. If the pipeline ends without a forwarding decision, the packet is dropped.
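The decision rules above can be sketched as a short loop. This is an illustrative model, not the switch's actual code: the `Action` enum, the `NEXT` value used to mean "no decision", and `run_pipeline` are hypothetical names, and stages are modelled as plain Python callables rather than compiled eBPF functions.

```python
from enum import Enum, auto
from typing import Callable, List, Optional, Tuple


class Action(Enum):
    NEXT = auto()        # hypothetical: stage made no forwarding decision
    PORT = auto()
    FLOOD = auto()
    DROP = auto()
    CONTROLLER = auto()


Decision = Tuple[Action, Optional[int]]
Stage = Callable[[bytes], Decision]


def run_pipeline(stages: List[Optional[Stage]], packet: bytes) -> Decision:
    """Run each installed stage in order until one makes a decision."""
    for stage in stages:
        if stage is None:            # empty pipeline slot
            continue
        action, port = stage(packet)
        if action is not Action.NEXT:
            return action, port
    # No stage decided: the packet is dropped.
    return Action.DROP, None
```

For example, a pipeline whose first stage returns `NEXT` and whose second stage returns `(Action.PORT, 3)` yields `(Action.PORT, 3)`, while an empty pipeline yields `(Action.DROP, None)`.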

Agent

The agent establishes a connection to the controller. If the connection cannot be established, it pauses for 5 seconds and retries.
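A minimal sketch of this retry behaviour is shown below. It assumes a TCP connection, which the page does not state, and takes `connect` and `sleep` as parameters so the loop can be exercised without a network. The function name is hypothetical.

```python
import socket
import time


def connect_with_retry(host, port, retry_delay=5.0,
                       connect=socket.create_connection, sleep=time.sleep):
    """Keep trying to reach the controller, pausing between attempts."""
    while True:
        try:
            return connect((host, port))
        except OSError:
            # Controller not reachable yet: wait, then try again.
            sleep(retry_delay)
```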

Once the connection is established, the agent sends a Hello message to the controller indicating which version of BPFabric is running (currently this can only be 1) and the datapath identifier (dpid) of this switch. The controller replies with a Hello including the version it supports and a dpid that is always 0. At this point the connection between the switch and the controller is established.
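The exchange can be sketched as follows. The wire encoding is not described on this page, so the fixed binary layout used here (a 32-bit version followed by a 64-bit dpid) is purely an assumption for illustration, as are the function names. Rejecting a mismatched version is also an assumption about how an agent might react.

```python
import struct

# Hypothetical encoding: version (uint32) and dpid (uint64), network order.
HELLO = struct.Struct("!IQ")
SUPPORTED_VERSION = 1


def build_hello(dpid):
    """Encode the agent's Hello carrying its version and dpid."""
    return HELLO.pack(SUPPORTED_VERSION, dpid)


def check_controller_hello(data):
    """Accept the controller's Hello only if the versions match."""
    version, _dpid = HELLO.unpack(data)   # controller dpid is always 0
    return version == SUPPORTED_VERSION
```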

Once the Hello handshake is done, the agent listens for incoming messages from the controller.

  • FunctionAddRequest: The controller requests that a new function be installed. The agent checks that the pipeline index at which the function should be installed is valid. If it is, the agent uninstalls any function already at this stage and frees its lookup tables. The new function is then installed at this stage in the pipeline and its tables are allocated. On x86 machines the function is just-in-time compiled from eBPF to x86. The agent then replies to the controller with a FunctionAddReply indicating whether the function was successfully installed.
  • FunctionRemoveRequest: The controller requests that a function be removed. The agent first checks that the requested stage is valid. If it is, the function is removed from the pipeline and its lookup tables are deallocated. The agent then replies to the controller indicating whether the function was successfully removed.
  • FunctionListRequest: The controller requests the list of functions installed on this switch. The agent iterates over every installed function in the pipeline and creates a FunctionListEntry for each stage, including the name of the stage, its index in the pipeline and the number of packets that have passed through it. All the entries are sent to the controller in a FunctionListReply.
  • TablesListRequest: The controller requests the list of lookup tables defined by a function. The agent first validates that the requested function exists. If so, it creates a TableDefinition per table, including the table type, key size, value size and maximum number of entries in the table. The agent replies with the list of table definitions if the specified function was valid.
  • TableListRequest: The controller requests the content of a specific lookup table of the provided function. The agent first checks that the function and lookup table are valid. If so, it creates a TableListReply with the status of the request, the definition of the requested table, the number of items in the table and the content of the table. The format of the entries depends on the type of the table.
  • TableEntryGetRequest: The controller requests a specific entry in a table of a function. The agent first checks that the function, table and entry exist. It returns a TableEntryGetReply to the controller with the status of the lookup and, if the entry exists, its value in the table.
  • TableEntryInsertRequest: The controller requests that a new entry be inserted into a specific table of the provided function. The agent checks that the function and table are valid. If so, it adds an entry to the table with the key and value provided. If the entry already exists it is overwritten; otherwise it is inserted. The agent replies with a TableEntryInsertReply indicating whether the insertion was successful.
  • TableEntryDeleteRequest: The controller requests that an entry be removed from a function's table. The agent checks that the function and table are valid. If so, it deletes the entry from the table and frees any associated resources. It replies to the controller indicating whether the deletion was successful.
  • PacketOut: The controller requests that a packet be sent on a specific port. The request contains the raw packet as well as the output port to send it to. The agent passes this packet to the data plane, which enqueues it on the port's output queue.
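The validation and replace-on-install behaviour described in the list above can be modelled with a small in-memory sketch. It is not the agent's implementation: the class and method names, the pipeline length of 4, and the use of Python dictionaries for lookup tables are all assumptions, and replies are reduced to booleans or tuples instead of protocol messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_STAGES = 4  # hypothetical pipeline length


@dataclass
class InstalledFunction:
    name: str
    tables: Dict[str, Dict[bytes, bytes]] = field(default_factory=dict)
    packets_seen: int = 0


class Agent:
    def __init__(self, stages: int = MAX_STAGES):
        self.pipeline: List[Optional[InstalledFunction]] = [None] * stages

    def _valid(self, index: int) -> bool:
        return 0 <= index < len(self.pipeline)

    def _table(self, index: int, table: str):
        if not self._valid(index) or self.pipeline[index] is None:
            return None
        return self.pipeline[index].tables.get(table)

    def function_add(self, index, name, table_names) -> bool:
        if not self._valid(index):
            return False
        # Overwriting the slot discards the old function and its tables.
        self.pipeline[index] = InstalledFunction(
            name, {t: {} for t in table_names})
        return True

    def function_remove(self, index) -> bool:
        if not self._valid(index):
            return False
        self.pipeline[index] = None
        return True

    def function_list(self):
        return [(i, f.name, f.packets_seen)
                for i, f in enumerate(self.pipeline) if f is not None]

    def table_entry_insert(self, index, table, key, value) -> bool:
        entries = self._table(index, table)
        if entries is None:
            return False
        entries[key] = value     # overwrite if present, insert otherwise
        return True

    def table_entry_get(self, index, table, key):
        entries = self._table(index, table)
        if entries is None or key not in entries:
            return (False, None)
        return (True, entries[key])

    def table_entry_delete(self, index, table, key) -> bool:
        entries = self._table(index, table)
        if entries is None or key not in entries:
            return False
        del entries[key]
        return True
```

As a usage example, after `agent.function_add(0, "learningswitch", ["inports"])` (illustrative names), inserting the same key twice into `inports` leaves only the second value, and installing another function at index 0 discards the `inports` table along with the old function.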

The agent can also send messages to the controller, either in response to events triggered by the data plane's functions or when the forwarding decision is CONTROLLER.

  • CONTROLLER: The forwarding decision CONTROLLER is used when a function determines that the decision should be made by the controller. Instead of being forwarded to another data plane port, the packet is sent to the controller, which can parse it and then insert table entries or install new functions. A PacketIn message containing the length and the content of the packet is sent from the agent to the controller.
  • bpf_notify: A function asks the agent to notify the controller that an event occurred. This is not a forwarding decision, but it can be used to inform the controller of an event of interest. A Notify message containing an identifier for the event and the data attached to it is sent from the agent to the controller.
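The two upstream message kinds can be modelled as simple records. This is a sketch under assumptions: the field names and the `ControllerChannel` class are hypothetical, and messages are collected in a list instead of being serialised onto the controller connection.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class PacketIn:
    data_len: int   # length of the packet
    data: bytes     # content of the packet


@dataclass(frozen=True)
class Notify:
    event_id: int   # identifier chosen by the function
    data: bytes     # data attached to the event


class ControllerChannel:
    """Collects the messages the agent would send to the controller."""

    def __init__(self):
        self.outbox: List[Union[PacketIn, Notify]] = []

    def on_controller_decision(self, packet: bytes) -> None:
        # Forwarding decision was CONTROLLER: hand the packet upstream.
        self.outbox.append(PacketIn(len(packet), packet))

    def on_bpf_notify(self, event_id: int, data: bytes) -> None:
        # Not a forwarding decision: the packet continues in the pipeline.
        self.outbox.append(Notify(event_id, data))
```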

SoftSwitch

For the data plane, the SoftSwitch starts by creating an AF_PACKET socket for each interface provided as an argument. Each socket is then allocated RX and TX ring buffers to store received packets and packets queued for transmission. Once the rings are allocated and the sockets configured, the network interfaces are initialised.

The SoftSwitch then starts its control plane by starting the agent as described above. It provides the agent with a function that queues a packet on an output port, which is used to handle PacketOut.

Once the ports are initialised and the control plane is running, the SoftSwitch iterates over every port's RX ring and checks whether packets are available to be processed. If a packet is available, it prepends the metadata and executes the function pipeline for this packet. Once the pipeline is done and a forwarding decision has been made, the packet is either dropped, queued on the TX ring of the output port(s) for transmission, or sent to the control plane.
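The polling loop can be sketched with in-memory queues standing in for the AF_PACKET rings. Everything here is illustrative: `Port`, `poll_once`, the `decide` callback (which stands in for metadata prepending plus the pipeline) and the string action names are assumptions, not the SoftSwitch's API.

```python
from collections import deque


class Port:
    def __init__(self):
        self.rx = deque()   # frames received, waiting for the pipeline
        self.tx = deque()   # frames queued for transmission


def poll_once(ports, decide, to_controller):
    """Make one pass over every port's RX ring and apply the decision."""
    for in_port, port in enumerate(ports):
        while port.rx:
            frame = port.rx.popleft()
            action, out_port = decide(in_port, frame)
            if action == "PORT":
                ports[out_port].tx.append(frame)
            elif action == "FLOOD":
                # Queue on every port except the one it arrived on.
                for index, other in enumerate(ports):
                    if index != in_port:
                        other.tx.append(frame)
            elif action == "CONTROLLER":
                to_controller(frame)
            # "DROP": the frame is simply discarded.
```

The real switch would call such a pass repeatedly; a single pass is shown so the effect on each TX queue is easy to follow.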

DPDK Switch

For the data plane, the DPDK switch starts by initialising an mbuf pool, which stores the incoming network frames. Then, for each enabled port, it creates the RX and TX queues. At this point the interfaces are initialised and the RX and TX queues are allocated.

The DPDK switch then starts its control plane by starting the agent as described above. It provides the agent with a function that queues a packet on an output port, which is used to handle PacketOut.

At this point it creates multiple threads of execution, each tied to a logical processing core of the machine running the switch, based on the EAL arguments provided. Each processing thread is responsible for processing the data coming from one port.

Each thread, and hence each port, first checks whether its TX queue should be sent. If so, it sends any queued packets. It then checks whether the receive queue contains any packets to be processed. If so, each packet is prepended with metadata and passed through the pipeline. Once the pipeline is done and a forwarding decision has been made, the packet is either dropped, queued on the TX queue of the output port(s) for transmission, or sent to the control plane.
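The ordering inside each per-port thread (flush TX first, then drain RX) can be sketched as a single iteration. This is a simplified model using Python queues rather than DPDK calls: `PortQueues`, `worker_step`, the `wire` list standing in for the NIC, and the `process` callback standing in for metadata prepending plus the pipeline are all hypothetical.

```python
from collections import deque


class PortQueues:
    def __init__(self):
        self.rx = deque()   # frames received on this port
        self.tx = deque()   # frames queued for transmission
        self.wire = []      # stands in for frames handed to the NIC


def worker_step(port, process):
    """One iteration of a per-port worker: flush TX, then drain RX."""
    # 1. Send whatever is queued for transmission on this port.
    sent = len(port.tx)
    port.wire.extend(port.tx)
    port.tx.clear()
    # 2. Run every received frame through the pipeline callback.
    handled = 0
    while port.rx:
        process(port.rx.popleft())
        handled += 1
    return sent, handled
```

In this model, a frame that the callback queues back onto the same port during step 2 is only transmitted by the next call to `worker_step`.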