Understanding WebRTC - pvgupta24/Jitsi-Meet-Concepts GitHub Wiki

History of WebRTC

In the early 1990s, when the World Wide Web was first created, it was built on a page-centric model, i.e., browsers navigated from one page to another to present new content and to update HTML pages. A new approach to web browsing began to develop around the year 2000, which became standardized as the XMLHttpRequest (XHR) API. The XHR API allowed web developers to build websites that didn't need to navigate to new pages to update their content or user interface. Instead, a page could call server-based web services to fetch structured data and snippets of other content.

Now the web is undergoing another transformation that enables web browsers to stream data directly to each other without routing it through intermediary servers. This approach to communication between devices over the web is called Web Real-Time Communication, or WebRTC for short. A network in which two devices are connected directly, without going through an intermediary server, is called a peer-to-peer network. WebRTC therefore uses peer-to-peer communication, so the media itself does not need to pass through a third-party server.

The different stages of WebRTC connection

Although WebRTC uses a peer-to-peer network, the initial step still requires coordination through a server, usually a web server or a signalling server. This enables the two devices to find one another, share contact details, negotiate a session and then, finally, establish the direct peer-to-peer streams of media that flow between them. There are five main stages in establishing a WebRTC connection. They are as follows.

1. Connect users

The first step is to connect two users. The simplest option is for both users to visit the same website. This page can then identify each browser and connect both of them to a shared signalling server. The webpage uses a unique token to link communication between the two users. This token can be thought of as a room ID or conversation ID. For example, in the case of Jitsi, the first user selects a token, say room1, and sends it to the second user. The token is appended to the URL https://meet.jit.si to give https://meet.jit.si/room1. When both users open https://meet.jit.si/room1, the connection is initiated.
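
As a rough illustration of how a shared token becomes a shared address, the sketch below builds the room URL from a token. The `room_url` helper is hypothetical and is not part of Jitsi; the percent-encoding step is an assumption about how an arbitrary token could be made safe for a URL path.

```python
from urllib.parse import quote

# Base URL taken from the Jitsi example in the text.
BASE_URL = "https://meet.jit.si/"


def room_url(token: str) -> str:
    """Return the URL that both users open to land in the same room (hypothetical helper)."""
    # Percent-encode the token so spaces or slashes cannot change the path.
    return BASE_URL + quote(token, safe="")


print(room_url("room1"))    # https://meet.jit.si/room1
print(room_url("my room"))  # https://meet.jit.si/my%20room
```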

2. Start signals

Now that the two users have a shared token, they can exchange signalling messages to set up a WebRTC connection. Signalling messages help the two users establish and control their WebRTC connection. The WebRTC standards don't define which signalling protocol to use; this is left to the developer. WebSockets are a common choice, since they generally have lower latency and overhead than HTTP-based alternatives such as BOSH and XHR polling.
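
Because the signalling protocol is left to the developer, the following is only a minimal in-memory sketch of what a signalling server does: it forwards messages between the peers that share a token. The `SignallingRelay` class, its method names and the message fields are all assumptions made for illustration; a real deployment would carry these messages over a transport such as WebSockets.

```python
from collections import defaultdict


class SignallingRelay:
    """Hypothetical in-memory relay that forwards messages between peers sharing a token."""

    def __init__(self):
        # Maps token -> {peer_id: list of undelivered messages}.
        self._rooms = defaultdict(dict)

    def join(self, token, peer_id):
        """Register a peer under the shared token with an empty inbox."""
        self._rooms[token][peer_id] = []

    def send(self, token, sender, message):
        """Queue the message for every other peer in the room."""
        for peer_id, inbox in self._rooms[token].items():
            if peer_id != sender:
                # The relay tags the sender but never inspects the payload.
                inbox.append({"from": sender, **message})

    def receive(self, token, peer_id):
        """Return and clear the messages waiting for this peer."""
        inbox = self._rooms[token][peer_id]
        messages = list(inbox)
        inbox.clear()
        return messages


relay = SignallingRelay()
relay.join("room1", "alice")
relay.join("room1", "bob")
relay.send("room1", "alice", {"type": "offer"})
print(relay.receive("room1", "bob"))  # [{'from': 'alice', 'type': 'offer'}]
```

The relay only needs to know which peers share a token. What the messages contain is decided by the two browsers.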

3. Find candidates

In this step, the two browsers exchange information about their networks and how they can be contacted. This process is called finding candidates, and at the end of it each browser should be mapped to a directly accessible network interface and port. Each browser is likely to be sitting behind a router that may be using Network Address Translation (NAT) to connect the local network to the internet. Finding a way to connect through these types of routers is commonly known as NAT Traversal.
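
To make "candidate" more concrete, the sketch below splits one candidate line into named fields. The field order follows the standard ICE candidate attribute, but the sample line is invented for illustration (it uses documentation-range addresses) and the `parse_candidate` helper is hypothetical. A candidate of type `srflx` describes the public address and port that a browser behind NAT is reachable on, while `raddr` and `rport` give the local address it maps to.

```python
def parse_candidate(line):
    """Split an ICE candidate attribute into named fields (illustrative helper)."""
    prefix = "candidate:"
    if line.startswith(prefix):
        line = line[len(prefix):]
    fields = line.split()
    foundation, component, transport, priority, address, port = fields[:6]
    candidate = {
        "foundation": foundation,
        "component": int(component),
        "transport": transport.lower(),
        "priority": int(priority),
        "address": address,
        "port": int(port),
    }
    # The remaining fields are key/value pairs such as "typ srflx" or "rport 46154".
    extras = fields[6:]
    candidate.update(zip(extras[0::2], extras[1::2]))
    return candidate


# Invented sample line, not captured from a real session.
sample = ("candidate:842163049 1 udp 1677729535 203.0.113.5 46154 "
          "typ srflx raddr 192.168.1.10 rport 46154")
parsed = parse_candidate(sample)
print(parsed["typ"], parsed["address"], parsed["port"])  # srflx 203.0.113.5 46154
```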

4. Negotiate media sessions

Now that both browsers know how to talk to each other, they must also agree on the type and format of media (for example, audio and video). This is usually negotiated using an offer/answer model built upon the Session Description Protocol (SDP).
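
The offer/answer idea can be sketched as follows: one side lists the codecs it can use for each media type, and the other side replies with the subset it also supports. The SDP fragment is a heavily trimmed, invented example, and `offered_codecs` and `answer_codecs` are hypothetical helpers. As a simplification, preference order is read from the order of the `a=rtpmap` lines, and codec names are compared case-sensitively.

```python
# Trimmed, invented SDP fragment: one audio and one video section.
OFFER_SDP = """\
m=audio 9 UDP/TLS/RTP/SAVPF 111 0
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
m=video 9 UDP/TLS/RTP/SAVPF 96 98
a=rtpmap:96 VP8/90000
a=rtpmap:98 VP9/90000
"""


def offered_codecs(sdp):
    """Map each media kind (audio, video) to the codec names listed under it."""
    codecs = {}
    kind = None
    for line in sdp.splitlines():
        if line.startswith("m="):
            # "m=audio 9 ..." opens a new media section.
            kind = line[2:].split()[0]
            codecs[kind] = []
        elif line.startswith("a=rtpmap:") and kind is not None:
            # "a=rtpmap:111 opus/48000/2" -> "opus"
            encoding = line.split(" ", 1)[1]
            codecs[kind].append(encoding.split("/")[0])
    return codecs


def answer_codecs(offer, supported):
    """Keep only the offered codecs the answering side supports, in offer order."""
    return {kind: [name for name in names if name in supported]
            for kind, names in offer.items()}


offer = offered_codecs(OFFER_SDP)
print(offer)                                  # {'audio': ['opus', 'PCMU'], 'video': ['VP8', 'VP9']}
print(answer_codecs(offer, {"opus", "VP8"}))  # {'audio': ['opus'], 'video': ['VP8']}
```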

5. Start using WebRTC

Once this has all been completed, the browsers can finally start streaming media to each other, either directly through their peer-to-peer connections or via any media relay gateway they have fallen back to using.
