Engage Encoder Information - rallytac/pub GitHub Wiki

Engage Encoders (and Decoders)

The following are the CODECs supported by Engage. Most CODECs offer multiple encoding rates, trading network bandwidth utilization against the quality of the audio transmitted over the network.

A Word About Audio Quality

Now, in typical bi-directional/full-duplex VoIP systems, audio quality is most often judged as a combination of latency (the time it takes for audio spoken by one person to reach another) and the similarity between the audio spoken and the audio heard, i.e. how good the received audio sounds compared to what was sent in the first place.

That's because the audio you hear is not actually the audio that was spoken. Rather, what you hear on the other side of a digitized (i.e. VoIP) communication is an algorithmically generated sound that closely mimics the original, based on a description of the audio sent by the transmitting entity. Reproducing the audio on the receiving side in this way is also referred to as synthesis.

In a half-duplex scenario (something like a Push-To-Talk setup), however, latency is less of an issue, as the expectation is that there will naturally be a delay between someone saying something and getting a response. Because Engage is very often used in that kind of environment, we generally measure quality on the similarity portion alone, removing latency from the equation.

[Also, latency varies so much with network environments and often-changing conditions beyond our control that it makes little sense to provide a blanket quality assessment or guideline that includes latency.]

LASS

Hence ... in the tables below, we have a column for Latency-Adjusted Similarity Score, or LASS, which measures how similar the audio heard is to the audio originally transmitted over a notionally ideal network (one with no packet loss or corruption and zero latency). This score is expressed as a percentage, with 100% being an exact match and 0% indicating no similarity between what was transmitted and what was received.

This doesn't mean that a low-scoring CODEC is unintelligible. Rather, it simply means that the audio on the receiving end sounds less like what was captured on the transmitting end.

For example: Linear/Raw PCM undergoes no compression or decompression and therefore, assuming all other things are equal (such as no packet loss or corruption), we should expect it to have a perfect score of 100%. Which it does. G.711 µ-law, however, does compress and decompress the audio a little, resulting in a score of 99.98%. That comes pretty close to perfect reproduction. The tradeoff is that G.711 incurs some computation on both ends and suffers a little in similarity, but occupies only half the network bandwidth of Linear/Raw PCM.

GSM, on the other hand, is more computationally expensive on both ends but still gives a pretty good score of 91.27%. And it occupies only 13.3 kbps of network bandwidth, compared to 64 kbps for G.711 and 128 kbps for Linear PCM.

Looking at the other end of the scale, we find CODEC2 coding at 0.45 kbps with a terrible score of 11.80%. The audio is still intelligible but sounds awfully robotic and "breathy". But ... it uses only 0.45 kbps of network bandwidth. Compared to GSM, that's around a 96% bandwidth saving!
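The bandwidth-saving figure is simple arithmetic: one minus the ratio of the two bitrates. Here is a minimal sketch comparing the 0.45 kbps CODEC2 rate against the other bitrates quoted in this section; the function and constant names are ours, purely for illustration.

```python
def bandwidth_savings_pct(low_kbps: float, reference_kbps: float) -> float:
    """Percentage of bandwidth saved by using low_kbps instead of reference_kbps."""
    if reference_kbps <= 0:
        raise ValueError("reference bitrate must be positive")
    return (1.0 - low_kbps / reference_kbps) * 100.0


# Bitrates quoted in this document (kbps).
CODEC2_LOWEST_KBPS = 0.45
REFERENCES_KBPS = {
    "GSM 6.10": 13.3,
    "G.711": 64.0,
    "Linear PCM": 128.0,
}

if __name__ == "__main__":
    for name, kbps in REFERENCES_KBPS.items():
        saved = bandwidth_savings_pct(CODEC2_LOWEST_KBPS, kbps)
        print(f"CODEC2 @ {CODEC2_LOWEST_KBPS} kbps vs {name} @ {kbps} kbps: {saved:.1f}% saved")
```

Against GSM this works out to roughly 96.6%; against G.711 or Linear PCM the saving is over 99%.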

Frankly, ultra-low-bandwidth CODECs like CODEC2 are not suited to most uses. However, there are situations where bandwidth is in extremely short supply, and then the only option is to make do with a low-quality, low-bandwidth CODEC.

Intelligibility

The discussion above about LASS is honestly somewhat academic, because what we're doing there is a purely mathematical comparison between two audio waves (the transmitted wave and the received wave) to see how they match up. Even a very low score doesn't mean the audio is unintelligible to humans; it's just a measure of the mathematical exactness of the two waves.
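This page doesn't say how LASS itself is calculated, so the sketch below is not Engage's formula. It's a hypothetical `similarity_pct` function that illustrates the general idea of a purely mathematical comparison between a transmitted and a received wave: cosine similarity of the two sample sequences, clamped to the 0 to 100% range. It assumes the waves are already time-aligned, i.e. latency has been removed.

```python
import math
from typing import Sequence


def similarity_pct(sent: Sequence[float], received: Sequence[float]) -> float:
    """Hypothetical waveform similarity in percent (NOT the actual LASS formula).

    Cosine similarity of two time-aligned sample sequences, clamped to [0, 100].
    """
    if len(sent) != len(received):
        raise ValueError("waveforms must be the same length")
    dot = sum(a * b for a, b in zip(sent, received))
    norm_sent = math.sqrt(sum(a * a for a in sent))
    norm_recv = math.sqrt(sum(b * b for b in received))
    if norm_sent == 0 or norm_recv == 0:
        return 0.0  # silence on either side: treat as no measurable similarity
    cosine = dot / (norm_sent * norm_recv)
    return max(0.0, min(1.0, cosine)) * 100.0


if __name__ == "__main__":
    # A 1 kHz tone sampled at 8 kHz, and a coarsely quantized copy of it.
    tone = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(80)]
    coarse = [round(x * 4) / 4 for x in tone]
    print(f"identical: {similarity_pct(tone, tone):.2f}%")
    print(f"quantized: {similarity_pct(tone, coarse):.2f}%")
```

Note that a measure like this says nothing about intelligibility, which is exactly the point made above.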

For purposes of human processing (that's a nice way of saying how real people hear and understand the audio), there are a few ways to measure this, including the purely subjective Mean Opinion Score (MOS), where people listen to audio and give their opinion, and algorithmic/objective standards such as Perceptual Evaluation of Speech Quality (PESQ) and Perceptual Objective Listening Quality Analysis (POLQA). Adding values from those methods to this document is on our list of TODOs, but not just yet.

Suffice it to say, though, that neither we nor anyone else would even contemplate using a CODEC that is not intelligible, so it's OK to assume that any CODEC listed in this document is safe for human consumption.

G.711

Coding delay of 0.2ms
RTP packetization must be a multiple of 20ms
Max RTP packetization is 60ms

Codec	Value	Default RTP Payload ID	LASS %
G.711 ulaw @ 64kbps	1	0	99.98
G.711 alaw @ 64kbps	2	8	99.98
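Reading the packetization rule as "a whole multiple of 20ms, up to the 60ms maximum" (our interpretation), the allowed G.711 packetization intervals and the audio payload each RTP packet carries can be worked out with a little arithmetic. The helper names below are hypothetical, and the byte counts exclude RTP, UDP and IP header overhead.

```python
def valid_ptimes(modulus_ms: int, max_ms: int) -> list[int]:
    """Packetization intervals (ms) that are whole multiples of modulus_ms, up to max_ms."""
    return list(range(modulus_ms, max_ms + 1, modulus_ms))


def payload_bytes(bitrate_bps: int, ptime_ms: int) -> int:
    """Encoded audio bytes per RTP packet (RTP/UDP/IP headers not included)."""
    return bitrate_bps * ptime_ms // 1000 // 8


if __name__ == "__main__":
    # G.711: 64 kbps, multiples of 20ms, 60ms maximum.
    for ptime in valid_ptimes(20, 60):
        print(f"G.711 @ {ptime}ms: {payload_bytes(64_000, ptime)} bytes per packet")
```

This prints 160, 320 and 480 bytes for 20, 40 and 60ms. The same helpers apply to the other CODECs on this page; for G.729A at 8 kbps, for instance, a 10ms packet carries 10 bytes of audio.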

GSM

Coding delay of 2.3ms
RTP packetization is fixed at 30ms

Codec	Value	Default RTP Payload ID	LASS %
GSM 6.10 @ 13.3kbps	3	3	91.27

G.729A (As of Engage 1.240.9080)

Coding delay of 0.1ms
RTP packetization must be a multiple of 10ms
Max RTP packetization is 80ms

Codec	Value	Default RTP Payload ID	LASS %
G.729A @ 8kbps	4	18	60.93

Linear/Raw PCM

RTP packetization must be a multiple of 10ms
Max RTP packetization is 60ms
Samples are encoded as signed 16-bit integers in little-endian format

Codec	Value	Default RTP Payload ID	LASS %
Linear PCM @ 128kbps	5	99	100.00
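A minimal sketch of the Linear PCM sample format using Python's standard `struct` module, where the `<h` format code means a signed 16-bit integer in little-endian byte order. The 8,000 samples per second figure is our own inference from 128 kbps divided by 16 bits per sample, assuming a single (mono) channel; the table doesn't state it. Function names are illustrative.

```python
import struct
from typing import Sequence

BITS_PER_SAMPLE = 16
BITRATE_BPS = 128_000
# Inferred, assuming mono: 128,000 bits/s / 16 bits per sample = 8,000 samples/s.
SAMPLE_RATE_HZ = BITRATE_BPS // BITS_PER_SAMPLE


def pcm16le_encode(samples: Sequence[int]) -> bytes:
    """Pack signed 16-bit samples into little-endian bytes ('<' = little-endian, 'h' = int16)."""
    return struct.pack(f"<{len(samples)}h", *samples)


def pcm16le_decode(payload: bytes) -> list[int]:
    """Unpack little-endian signed 16-bit samples."""
    if len(payload) % 2:
        raise ValueError("payload length must be a whole number of 16-bit samples")
    return list(struct.unpack(f"<{len(payload) // 2}h", payload))


if __name__ == "__main__":
    print(pcm16le_encode([1, -2]))  # b'\x01\x00\xfe\xff'
    samples_per_20ms = SAMPLE_RATE_HZ * 20 // 1000
    print(samples_per_20ms, "samples per 20ms")
```

At that rate a 20ms packet holds 160 samples, or 320 bytes of audio.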

AMR Narrowband

Coding delay of 2.5ms
RTP packetization is fixed at 20ms

Codec	Value	Default RTP Payload ID	LASS %
AMR Narrowband @ 4.75kbps	10	122	56.88
AMR Narrowband @ 5.15kbps	11	122	55.61
AMR Narrowband @ 5.9kbps	12	122	58.63
AMR Narrowband @ 6.7kbps	13	122	55.31
AMR Narrowband @ 7.4kbps	14	122	63.03
AMR Narrowband @ 7.95kbps	15	122	62.97
AMR Narrowband @ 10.2kbps	16	122	64.64
AMR Narrowband @ 12.2kbps	17	122	64.58
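Because AMR Narrowband packetization is fixed at 20ms, each rate in the table corresponds to a fixed number of speech bits per frame (bitrate multiplied by 0.02 seconds). The Value-to-bitrate mapping below is copied from the table; the function name is hypothetical. The AMR RTP payload format (RFC 4867) also adds a small payload header and table-of-contents entry and rounds up to whole octets, so the on-the-wire payload is slightly larger than these figures.

```python
# Engage "Value" -> AMR Narrowband bitrate in bits per second (from the table above).
AMR_NB_RATES_BPS = {
    10: 4_750, 11: 5_150, 12: 5_900, 13: 6_700,
    14: 7_400, 15: 7_950, 16: 10_200, 17: 12_200,
}
FRAME_MS = 20  # AMR Narrowband packetization is fixed at 20ms


def speech_bits_per_frame(value: int) -> int:
    """Speech bits in one 20ms AMR-NB frame for the given Engage Value."""
    return AMR_NB_RATES_BPS[value] * FRAME_MS // 1000


if __name__ == "__main__":
    for value in sorted(AMR_NB_RATES_BPS):
        print(value, speech_bits_per_frame(value), "bits/frame")
```

This ranges from 95 bits per frame at 4.75 kbps up to 244 bits per frame at 12.2 kbps.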

Opus

Coding delay of 0.9ms
Max RTP packetization is 100ms

Codec	Value	Default RTP Payload ID	LASS %
Opus @ 6kbps	20	118	82.85
Opus @ 8kbps	21	118	81.13
Opus @ 10kbps	22	118	93.74
Opus @ 12kbps	23	118	95.47
Opus @ 14kbps	24	118	96.87
Opus @ 16kbps	25	118	96.51
Opus @ 18kbps	26	118	69.47
Opus @ 20kbps	27	118	88.94
Opus @ 22kbps	28	118	53.33
Opus @ 24kbps	29	118	55.55
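The Opus rows show that a higher bitrate doesn't always produce a higher LASS (14 kbps scores 96.87 while 24 kbps scores 55.55). If you were choosing a rate purely by LASS under a bandwidth cap, a hypothetical selection helper might look like the sketch below. It ignores intelligibility, CPU cost and RTP/UDP/IP overhead, so treat it as an illustration rather than a recommendation.

```python
from typing import NamedTuple, Optional, Sequence


class CodecOption(NamedTuple):
    name: str
    value: int       # Engage "Value" column
    kbps: float      # nominal CODEC bitrate
    lass_pct: float  # "LASS %" column


# Opus rows copied from the table above.
OPUS_OPTIONS = [
    CodecOption("Opus @ 6kbps", 20, 6, 82.85),
    CodecOption("Opus @ 8kbps", 21, 8, 81.13),
    CodecOption("Opus @ 10kbps", 22, 10, 93.74),
    CodecOption("Opus @ 12kbps", 23, 12, 95.47),
    CodecOption("Opus @ 14kbps", 24, 14, 96.87),
    CodecOption("Opus @ 16kbps", 25, 16, 96.51),
    CodecOption("Opus @ 18kbps", 26, 18, 69.47),
    CodecOption("Opus @ 20kbps", 27, 20, 88.94),
    CodecOption("Opus @ 22kbps", 28, 22, 53.33),
    CodecOption("Opus @ 24kbps", 29, 24, 55.55),
]


def best_within_budget(options: Sequence[CodecOption], budget_kbps: float) -> Optional[CodecOption]:
    """Hypothetical policy: highest LASS whose bitrate fits; ties go to the lower bitrate."""
    affordable = [o for o in options if o.kbps <= budget_kbps]
    if not affordable:
        return None
    return max(affordable, key=lambda o: (o.lass_pct, -o.kbps))


if __name__ == "__main__":
    for budget in (5, 8, 13, 24):
        choice = best_within_budget(OPUS_OPTIONS, budget)
        print(budget, "kbps ->", choice.name if choice else "nothing fits")
```

With budgets of 8, 13 and 24 kbps this picks the 6, 12 and 14 kbps rates respectively, and nothing fits a 5 kbps budget.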

Speex

Coding delay of 20ms
RTP packetization must be a multiple of 20ms
Max RTP packetization is 60ms
Available with Engage 1.208.9046 onward

Codec	Value	Default RTP Payload ID	LASS %
Speex Narrowband @ 2.15kbps	30	97	17.27
Speex Narrowband @ 3.95kbps	31	97	50.13
Speex Narrowband @ 5.95kbps	32	97	40.00
Speex Narrowband @ 8kbps	33	97	51.20
Speex Narrowband @ 11kbps	34	97	61.46
Speex Narrowband @ 15kbps	35	97	64.14
Speex Narrowband @ 18.2kbps	36	97	65.31
Speex Narrowband @ 24.6kbps	37	97	65.68

Codec2

Coding delay varies based on encoder
RTP packetization varies based on encoder
Available with Engage 1.212.9050 onward
More information on Wikipedia

Codec	Value	Default RTP Payload ID	LASS %
Codec2 @ 0.45kbps	40	88	11.80
Codec2 @ 0.7kbps	41	87	12.23
Codec2 @ 1.2kbps	42	85	12.99
Codec2 @ 1.3kbps	43	84	16.46
Codec2 @ 1.4kbps	44	83	12.56
Codec2 @ 1.6kbps	45	82	17.67
Codec2 @ 2.4kbps	46	81	10.98
Codec2 @ 3.2kbps	47	80	19.97

MELPe

Coding delay varies based on encoder
RTP packetization varies based on encoder
Available with Engage 1.212.9050 onward
More information on Wikipedia

Codec	Value	Default RTP Payload ID	LASS %
MELPe @ 0.6kbps	50	79	12.65
MELPe @ 1.2kbps	51	78	19.96
MELPe @ 2.4kbps	52	77	13.94