BOSH vs WebSockets for emulation of TCP socket - pvgupta24/Jitsi-Meet-Concepts GitHub Wiki

Introduction

The Transmission Control Protocol (TCP), a byte-stream-oriented protocol, is the most frequently used transport-layer protocol for establishing a network connection between two entities.

As stated in XEP-0124,

The nature of the device or network can prevent an application from maintaining a long-lived TCP connection to a server or peer.

Therefore, there was a need for an alternative connection method that emulates the behavior of a long-lived TCP connection using a sequenced series of requests and responses exchanged over short-lived connections. The appropriate request-response semantics are widely available via HTTP.

BOSH

Bidirectional-streams Over Synchronous HTTP (BOSH) is mainly used to carry XMPP traffic over HTTP, so that clients that cannot hold a raw TCP connection, such as web browsers, can still connect to XMPP servers.

BOSH is designed to transport any data efficiently with minimal latency in both directions. It is more bandwidth-efficient and responsive than most other bidirectional HTTP-based transport protocols used for both push and pull semantics. BOSH achieves this efficiency and low latency by using long polling with multiple synchronous HTTP request/response pairs.

It allows for real-time communication between a browser (client) and a server. The client sends an HTTP request, and the server holds that request open for as long as it has no data to send. When data becomes available, the server returns it in the HTTP response, which completes the request, and the client immediately issues a new one. This reduces the number of requests, as the browser is not continuously polling the server.
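The hold-and-respond cycle described above can be sketched with a small in-process simulation. This is only an illustration under stated assumptions: `ConnectionManager`, `handle_request`, and `long_poll` are hypothetical names, a `queue.Queue` stands in for the network, and no real HTTP is involved.

```python
import queue
import threading


class ConnectionManager:
    """Toy stand-in for the server side of a long-polling setup (hypothetical)."""

    def __init__(self):
        self._outbox = queue.Queue()

    def push(self, payload):
        # Called when the server side has data for the client.
        self._outbox.put(payload)

    def handle_request(self, wait):
        # Hold the request open until data arrives or `wait` seconds pass.
        try:
            return self._outbox.get(timeout=wait)
        except queue.Empty:
            return ""  # empty response: the client simply asks again


def long_poll(manager, rounds, wait):
    """Issue `rounds` sequential requests and collect the non-empty responses."""
    received = []
    for _ in range(rounds):
        response = manager.handle_request(wait)
        if response:
            received.append(response)
    return received


manager = ConnectionManager()
# Make one message available 0.05 s after the client starts waiting.
timer = threading.Timer(0.05, manager.push, args=["<message/>"])
timer.start()
result = long_poll(manager, rounds=2, wait=0.5)
timer.join()
print(result)
```

The first request returns as soon as the message is pushed; the second is held for the full `wait` period and comes back empty, which is the cost the client pays for staying reachable.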

BOSH supports session attachment and resumption, and it requires a "connection manager" that sits between the client and the server.
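As a rough sketch of why a session can be attached to or resumed: in XEP-0124 each request is a `<body/>` wrapper that carries a session ID (`sid`) and a request ID (`rid`) that increases by one per request, so whoever knows the current pair can continue the session. The `BoshSession` class and the sample `sid` and `rid` values below are illustrative assumptions, not part of any library.

```python
import xml.etree.ElementTree as ET

BOSH_NS = "http://jabber.org/protocol/httpbind"  # namespace defined by XEP-0124


class BoshSession:
    """Tracks the two values needed to continue a session (illustrative only)."""

    def __init__(self, sid: str, rid: int):
        self.sid = sid
        self.rid = rid

    def next_body(self, payload: str = "") -> str:
        # Every request carries the session ID and a request ID that grows by one.
        self.rid += 1
        return (
            f'<body rid="{self.rid}" sid="{self.sid}" xmlns="{BOSH_NS}">'
            f"{payload}</body>"
        )


# Hypothetical values, as if handed over by whatever created the session.
session = BoshSession(sid="SomeSID", rid=1573741820)
first = ET.fromstring(session.next_body("<presence xmlns='jabber:client'/>"))
second = ET.fromstring(session.next_body())  # empty body: just keeps polling
print(first.get("rid"), second.get("rid"))
```

Because the state is just these two values, a page can hand them to a freshly loaded script, which then continues the same session instead of logging in again.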

WebSockets

WebSocket is a protocol (RFC 6455) with an accompanying browser API, introduced alongside HTML5, that can replace BOSH as a transport and opens an interactive communication session between the user's browser and a server. The API lets us send messages to a server and receive event-driven responses without having to continuously poll the server for a reply. WebSockets are supported by most browsers today.

It works well with existing web infrastructure: it can pass through most firewalls and web proxies because a WebSocket connection starts as an ordinary HTTP/HTTPS request on the standard ports and is then upgraded in place, instead of opening a new connection on a new port. BOSH, being based on the HTTP long-polling technique, suffers from high transport overhead compared to XMPP's native binding to TCP, as mentioned in RFC 7395.
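To make the HTTP starting point concrete, here is a minimal sketch of the one computation a server performs during the WebSocket opening handshake in RFC 6455: it derives the `Sec-WebSocket-Accept` response header from the client's `Sec-WebSocket-Key` request header. The function name `accept_key` is an assumption made for this example; the GUID and the sample key come from RFC 6455.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    # Concatenate key and GUID, hash with SHA-1, then base64-encode the digest.
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from the RFC 6455 handshake example.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))
```

For the sample key this yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`, matching the example in the RFC. After this exchange the same TCP connection stops carrying HTTP and carries WebSocket frames instead.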

Comparison

Latency

The BOSH client and connection manager need to set up a new HTTP request after every packet transmission and after every request timeout, re-establishing the underlying connection whenever it is not kept alive. This leads to high latency and latency jitter.

Because low jitter often matters more than average latency in real-time applications, WebSocket connections have the advantage: their latency and jitter are similar to those of raw TCP connections. Even under ideal conditions, the client-to-server latency of BOSH communication (and therefore the round-trip latency) will always be higher than with WebSockets, because BOSH still has to abide by HTTP request-response semantics. Even though HTTP streaming enables multiple responses per request (by splitting a single response into multiple parts), each client message is a new request.

Packet overhead

In WebSockets, small messages carry two bytes of framing overhead (plus a four-byte masking key on client-to-server frames). In BOSH, every message carries HTTP request and response headers, which easily add up to 180+ bytes per round-trip. In addition, each message is wrapped in an XML element that carries session attributes.
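The difference can be put into numbers with a short calculation. The sketch below computes the WebSocket frame header size from the RFC 6455 framing rules and compares it with the roughly 180 bytes of HTTP headers quoted above. The 180-byte constant is that approximation, not a measurement, and the function names are chosen only for this example.

```python
def ws_header_size(payload_len: int, masked: bool) -> int:
    """Size in bytes of a WebSocket frame header under RFC 6455 framing."""
    size = 2  # FIN/opcode byte plus mask-bit/7-bit-length byte
    if payload_len > 65535:
        size += 8  # 64-bit extended payload length
    elif payload_len > 125:
        size += 2  # 16-bit extended payload length
    if masked:
        size += 4  # masking key, required on client-to-server frames
    return size


def overhead_share(payload_len: int, overhead: int) -> float:
    """Fraction of the bytes on the wire that is overhead rather than payload."""
    return overhead / (payload_len + overhead)


HTTP_HEADER_BYTES = 180  # approximate per-round-trip figure from the text

payload = 100  # a small stanza, in bytes
ws_share = overhead_share(payload, ws_header_size(payload, masked=False))
bosh_share = overhead_share(payload, HTTP_HEADER_BYTES)
print(f"WebSocket: {ws_share:.1%}, BOSH headers alone: {bosh_share:.1%}")
```

For a 100-byte payload the unmasked WebSocket header is about 2% of the bytes sent, while the HTTP headers alone make up about 64%, before counting the XML wrapper.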

Complexity

A complex JavaScript library is required to implement the BOSH semantics, which increases latency and jitter compared to a native, in-browser implementation.

References