Engage Encoder Information - rallytac/pub GitHub Wiki

Engage Encoders (and Decoders)

The following are the CODECs supported by Engage. Most CODECs offer multiple encoding rates, trading network bandwidth utilization against the quality of the audio transmitted over the network.

A Word About Audio Quality

In typical bi-directional/full-duplex VoIP systems, audio quality is most often judged by a combination of latency (the time it takes for audio spoken by one individual to reach another) and the similarity of the audio heard to the audio spoken - i.e. how good the received audio sounds compared to what was sent in the first place.

Similarity matters because the audio that you hear is not actually the audio that was spoken. Rather, what you hear on the other side of a digitized (i.e. VoIP) communication is an algorithmically-generated sound that closely mimics the original audio based on the description of the audio sent by the transmitting entity. The process of reproducing the audio on the receiving side is also referred to as synthesis.

In a half-duplex scenario (something like a Push-To-Talk setup), latency is slightly less of an issue, however, as the expectation is that there will naturally be a delay between someone saying something, and getting a response. So, because Engage is very often used in that kind of environment, we generally tend to measure quality based on the similarity portion, removing latency from the equation.

[Also, latency is so variable in nature based on network environments and often-changing conditions beyond our control that it makes little sense to provide a blanket quality assessment or guideline on audio quality to include latency.]

LASS

Hence ... in the tables below, we have a column for Latency-Adjusted Similarity Score, or LASS, that gives a measure of the similarity of audio heard to that originally transmitted over a notionally ideal network environment (a network that has no packet loss or corruption; with zero latency). This score is expressed as a percentage, with 100% being an exact match and 0% indicating no similarity between what was transmitted and what was received.

This doesn't mean that a low-scoring CODEC is unintelligible. Rather, it simply means that the audio sounds less like the original on the receiving end than it did when it was captured on the transmitting end.
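
To make the idea of a percentage similarity score concrete, here is a minimal sketch of one possible way to compare a transmitted and a received waveform. The formula behind LASS is not given in this document, so the metric below (normalized correlation at zero lag, clamped to 0..100) and the name `similarity_percent` are assumptions chosen purely for illustration.

```python
import math

def similarity_percent(sent, received):
    """Return a 0-100 similarity score for two equal-length sample sequences.

    Assumption: normalized correlation at zero lag, clamped at 0.
    This is an illustrative metric, NOT the actual LASS formula.
    """
    if not sent or len(sent) != len(received):
        raise ValueError("both signals must be non-empty and of equal length")
    dot = sum(s * r for s, r in zip(sent, received))
    energy_sent = sum(s * s for s in sent)
    energy_recv = sum(r * r for r in received)
    if energy_sent == 0 or energy_recv == 0:
        # Two silent signals match exactly; silence versus sound does not.
        return 100.0 if energy_sent == energy_recv else 0.0
    correlation = dot / math.sqrt(energy_sent * energy_recv)
    return max(0.0, correlation) * 100.0

# A 440 Hz tone sampled at 8 kHz, and a coarsely quantized copy that
# stands in for a lossy encode/decode round trip.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]
lossy = [round(x * 8) / 8 for x in tone]

print(f"identical: {similarity_percent(tone, tone):.2f}%")
print(f"quantized: {similarity_percent(tone, lossy):.2f}%")
```

As with LASS, a lower number from a metric like this only says that the two waveforms differ more; it says nothing directly about whether a listener can understand the speech.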

For example: Linear/Raw PCM undergoes no compression or decompression and therefore - assuming all other things are equal, such as no packet loss or corruption - we should expect it to have a perfect score of 100%. Which it does. G.711 mulaw, however, does compress and decompress the audio a little - resulting in a score of 99.98%. That comes pretty close to perfect reproduction. The tradeoff is that G.711, while incurring some computation on both ends and suffering a little in similarity, occupies only half the network bandwidth of Linear/Raw PCM.

GSM, on the other hand, is more computationally expensive on both ends but still gives a pretty good score of 91.27%. And it only occupies 13.3 kbps of network bandwidth compared to 64 kbps for G.711 and 128 kbps for Linear PCM.

Looking at the other end of the scale, we find CODEC2 at 0.45 kbps coming in at the terrible score of 11.80%. The audio is still intelligible but sounds awfully robotic and "breathy". But ... it only uses 0.45 kbps of network bandwidth. That's a saving of more than 96% compared with GSM's 13.3 kbps!
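
The bandwidth comparisons above are simple ratios. The sketch below works them out from the bit rates quoted in the text. The helper name `bandwidth_savings_percent` is hypothetical, and treating GSM as the baseline for the "around 96%" figure is an assumption (it is the baseline that matches the arithmetic).

```python
def bandwidth_savings_percent(candidate_kbps, baseline_kbps):
    """Percentage of bandwidth saved by using the candidate rate instead of the baseline."""
    if baseline_kbps <= 0:
        raise ValueError("baseline_kbps must be positive")
    return (1.0 - candidate_kbps / baseline_kbps) * 100.0

# Bit rates quoted in the text above (kbps).
BASELINES_KBPS = {"GSM 6.10": 13.3, "G.711": 64.0, "Linear PCM": 128.0}
CODEC2_LOWEST_KBPS = 0.45

for name, kbps in BASELINES_KBPS.items():
    saved = bandwidth_savings_percent(CODEC2_LOWEST_KBPS, kbps)
    print(f"Codec2 @ {CODEC2_LOWEST_KBPS} kbps vs {name} @ {kbps} kbps: {saved:.1f}% saved")

# G.711 against Linear PCM: the "half the bandwidth" tradeoff.
print(f"G.711 vs Linear PCM: {bandwidth_savings_percent(64.0, 128.0):.1f}% saved")
```

These figures cover the encoded audio only; RTP, UDP and IP headers add per-packet overhead that is not included here.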

Frankly, ultra-low bandwidth CODECs like CODEC2 are not suited for most uses. However, there are situations where bandwidth is in extremely short supply so then the only option is to use a low-quality, low-bandwidth CODEC to make do with what's available.

Intelligibility

The discussion above about LASS is, honestly, somewhat academic, because what we're doing there is performing a purely mathematical comparison between two audio waves (the transmitted wave and the received wave) to see how they match up. A super low score doesn't mean the audio is unintelligible to humans. Rather, it's just a measure of the mathematical exactness of the two waves.

For purposes of human processing (that's a nice way of saying how real people hear and understand the audio), there are a few ways to measure this, including the purely subjective Mean Opinion Score (MOS) test, where people listen to audio and give their opinion, and algorithmic/objective measuring standards such as Perceptual Evaluation of Speech Quality (PESQ) and Perceptual Objective Listening Quality Analysis (POLQA). Adding values from those methods to this document is on our list of TODOs, but not just yet.

Suffice it to say, though, that neither we nor anyone else would even contemplate using a CODEC that is not intelligible, so it's OK to assume that any CODEC listed in this document is safe for human consumption.

G.711

  • Coding delay of 0.2ms
  • RTP packetization is modulus 20ms
  • Max RTP packetization is 60ms
| Codec | Value | Default RTP Payload ID | LASS % |
|---|---|---|---|
| G.711 ulaw @ 64kbps | 1 | 0 | 99.98 |
| G.711 alaw @ 64kbps | 2 | 8 | 99.98 |
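
The packetization limits above determine how much encoded audio goes into each RTP packet. The sketch below lists the allowed intervals for G.711 and the resulting audio payload size. It assumes that "modulus 20ms" means the interval must be a whole multiple of 20ms; the function names are hypothetical and are not part of any Engage API.

```python
def valid_packetizations_ms(step_ms, max_ms):
    """Packetization intervals that are multiples of step_ms, up to and including max_ms."""
    return list(range(step_ms, max_ms + 1, step_ms))

def payload_bytes(bitrate_kbps, packet_ms):
    """Encoded audio bytes per packet, excluding RTP/UDP/IP headers.

    kbps multiplied by milliseconds gives bits, so dividing by 8 gives bytes.
    """
    return bitrate_kbps * packet_ms / 8

# G.711: multiples of 20ms, at most 60ms, encoded at 64 kbps.
for interval_ms in valid_packetizations_ms(20, 60):
    size = payload_bytes(64, interval_ms)
    print(f"{interval_ms}ms packet -> {size:.0f} bytes of G.711 audio")
```

Longer intervals spread the fixed per-packet header cost over more audio, at the price of adding to the delay before each packet can be sent.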

GSM

  • Coding delay of 2.3ms
  • RTP packetization is fixed at 30ms
| Codec | Value | Default RTP Payload ID | LASS % |
|---|---|---|---|
| GSM 6.10 @ 13.3kbps | 3 | 3 | 91.27 |

G.729A (As of Engage 1.240.9080)

  • Coding delay of 0.1ms
  • RTP packetization is modulus 10ms
  • Max RTP packetization is 80ms
| Codec | Value | Default RTP Payload ID | LASS % |
|---|---|---|---|
| G.729A @ 8kbps | 4 | 18 | 60.93 |

Linear/Raw PCM

  • RTP packetization is modulus 10ms
  • Max RTP packetization is 60ms
  • Samples are encoded as signed, 16-bit integers in Little Endian format
| Codec | Value | Default RTP Payload ID | LASS % |
|---|---|---|---|
| Linear PCM @ 128kbps | 5 | 99 | 100.00 |
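
As an illustration of the sample format described above, the following sketch packs and unpacks signed 16-bit little-endian samples with Python's standard `struct` module. The function names are hypothetical. The 8 kHz figure in the example is inferred from 128 kbps divided by 16 bits per sample and assumes a single (mono) channel.

```python
import struct

def pack_pcm16le(samples):
    """Serialize integer samples (-32768..32767) as signed 16-bit little-endian bytes."""
    return struct.pack(f"<{len(samples)}h", *samples)

def unpack_pcm16le(payload):
    """Parse signed 16-bit little-endian bytes back into a list of integers."""
    count = len(payload) // 2  # any trailing odd byte is ignored
    return list(struct.unpack(f"<{count}h", payload[:count * 2]))

# 10ms of silence at an assumed 8 kHz mono sample rate: 80 samples, 160 bytes.
frame = pack_pcm16le([0] * 80)
print(len(frame))

# The low-order byte comes first: 1 becomes 01 00, and -2 becomes fe ff.
print(pack_pcm16le([1, -2]).hex())
```

The `<` prefix in the format string forces little-endian byte order regardless of the host CPU, which matters when the same payload is produced and consumed on different platforms.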

AMR Narrowband

  • Coding delay of 2.5ms
  • RTP packetization is fixed at 20ms
| Codec | Value | Default RTP Payload ID | LASS % |
|---|---|---|---|
| AMR Narrowband @ 4.75kbps | 10 | 122 | 56.88 |
| AMR Narrowband @ 5.15kbps | 11 | 122 | 55.61 |
| AMR Narrowband @ 5.9kbps | 12 | 122 | 58.63 |
| AMR Narrowband @ 6.7kbps | 13 | 122 | 55.31 |
| AMR Narrowband @ 7.4kbps | 14 | 122 | 63.03 |
| AMR Narrowband @ 7.95kbps | 15 | 122 | 62.97 |
| AMR Narrowband @ 10.2kbps | 16 | 122 | 64.64 |
| AMR Narrowband @ 12.2kbps | 17 | 122 | 64.58 |

Opus

  • Coding delay of 0.9ms
  • Max RTP packetization is 100ms
| Codec | Value | Default RTP Payload ID | LASS % |
|---|---|---|---|
| Opus @ 6kbps | 20 | 118 | 82.85 |
| Opus @ 8kbps | 21 | 118 | 81.13 |
| Opus @ 10kbps | 22 | 118 | 93.74 |
| Opus @ 12kbps | 23 | 118 | 95.47 |
| Opus @ 14kbps | 24 | 118 | 96.87 |
| Opus @ 16kbps | 25 | 118 | 96.51 |
| Opus @ 18kbps | 26 | 118 | 69.47 |
| Opus @ 20kbps | 27 | 118 | 88.94 |
| Opus @ 22kbps | 28 | 118 | 53.33 |
| Opus @ 24kbps | 29 | 118 | 55.55 |

Speex

  • Coding delay of 20ms
  • RTP packetization is modulus 20ms
  • Max RTP packetization is 60ms
  • Available with Engage 1.208.9046 onward
| Codec | Value | Default RTP Payload ID | LASS % |
|---|---|---|---|
| Speex Narrowband @ 2.15kbps | 30 | 97 | 17.27 |
| Speex Narrowband @ 3.95kbps | 31 | 97 | 50.13 |
| Speex Narrowband @ 5.95kbps | 32 | 97 | 40.00 |
| Speex Narrowband @ 8kbps | 33 | 97 | 51.20 |
| Speex Narrowband @ 11kbps | 34 | 97 | 61.46 |
| Speex Narrowband @ 15kbps | 35 | 97 | 64.14 |
| Speex Narrowband @ 18.2kbps | 36 | 97 | 65.31 |
| Speex Narrowband @ 24.6kbps | 37 | 97 | 65.68 |

Codec2

  • Coding delay varies based on encoder
  • RTP packetization varies based on encoder
  • Available with Engage 1.212.9050 onward
  • More information on Wikipedia
| Codec | Value | Default RTP Payload ID | LASS % |
|---|---|---|---|
| Codec2 @ 0.45kbps | 40 | 88 | 11.80 |
| Codec2 @ 0.7kbps | 41 | 87 | 12.23 |
| Codec2 @ 1.2kbps | 42 | 85 | 12.99 |
| Codec2 @ 1.3kbps | 43 | 84 | 16.46 |
| Codec2 @ 1.4kbps | 44 | 83 | 12.56 |
| Codec2 @ 1.6kbps | 45 | 82 | 17.67 |
| Codec2 @ 2.4kbps | 46 | 81 | 10.98 |
| Codec2 @ 3.2kbps | 47 | 80 | 19.97 |

MELPe

  • Coding delay varies based on encoder
  • RTP packetization varies based on encoder
  • Available with Engage 1.212.9050 onward
  • More information on Wikipedia
| Codec | Value | Default RTP Payload ID | LASS % |
|---|---|---|---|
| MELPe @ 0.6kbps | 50 | 79 | 12.65 |
| MELPe @ 1.2kbps | 51 | 78 | 19.96 |
| MELPe @ 2.4kbps | 52 | 77 | 13.94 |
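
One way to use the tables above is to pick the highest-scoring CODEC that fits a given bandwidth budget. The sketch below does that over a hand-copied subset of the rows. The `CodecEntry` type and `best_under_budget` function are hypothetical helpers, not part of Engage, and ranking by LASS alone is a simplification, since LASS measures waveform similarity and not intelligibility, latency or CPU cost.

```python
from typing import NamedTuple, Optional

class CodecEntry(NamedTuple):
    name: str
    value: int      # the "Value" column from the tables
    kbps: float     # encoded audio bit rate
    lass: float     # LASS %

# A subset of the rows listed in this document.
CODECS = [
    CodecEntry("G.711 ulaw", 1, 64.0, 99.98),
    CodecEntry("GSM 6.10", 3, 13.3, 91.27),
    CodecEntry("G.729A", 4, 8.0, 60.93),
    CodecEntry("Linear PCM", 5, 128.0, 100.00),
    CodecEntry("AMR Narrowband", 17, 12.2, 64.58),
    CodecEntry("Opus", 20, 6.0, 82.85),
    CodecEntry("Opus", 24, 14.0, 96.87),
    CodecEntry("Opus", 25, 16.0, 96.51),
    CodecEntry("Speex Narrowband", 33, 8.0, 51.20),
    CodecEntry("Codec2", 40, 0.45, 11.80),
    CodecEntry("Codec2", 47, 3.2, 19.97),
    CodecEntry("MELPe", 51, 1.2, 19.96),
]

def best_under_budget(budget_kbps: float) -> Optional[CodecEntry]:
    """Return the entry with the highest LASS whose bit rate fits the budget, or None."""
    candidates = [c for c in CODECS if c.kbps <= budget_kbps]
    return max(candidates, key=lambda c: c.lass) if candidates else None

for budget in (128.0, 16.0, 8.0, 2.0, 0.1):
    choice = best_under_budget(budget)
    if choice is None:
        print(f"{budget} kbps: nothing fits")
    else:
        print(f"{budget} kbps: {choice.name} @ {choice.kbps}kbps (value {choice.value}, LASS {choice.lass}%)")
```

Because the scores are not strictly increasing with bit rate within a CODEC family, a search like this can return a lower-rate entry even when a higher-rate one also fits.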