Engage Encoder Information - rallytac/pub GitHub Wiki
Engage Encoders (and Decoders)
The following are CODECs supported by Engage. Most CODECs have multiple encoding rates, each trading network bandwidth utilization against the quality of the audio transmitted over the network.
A Word About Audio Quality
Now, in typical bi-directional/full-duplex VoIP systems, audio quality is most often a combination of latency (the time it takes for audio spoken by one individual to reach another) and similarity (how closely the audio heard on the receiving end matches what was spoken in the first place).
Similarity matters because the audio that you hear is not actually the audio that was spoken. Rather, what you hear on the other side of a digitized (i.e. VoIP) communication is algorithmically generated sound that closely mimics the original audio, based on the description of that audio sent by the transmitting entity. Reproducing the audio this way on the receiving side is also referred to as synthesis.
In a half-duplex scenario (such as a Push-To-Talk setup), latency is less of an issue because participants naturally expect a delay between someone saying something and getting a response. Because Engage is very often used in that kind of environment, we generally measure quality on similarity alone, removing latency from the equation.
[Also, latency varies so much with the network environment, and with often-changing conditions beyond our control, that it makes little sense to fold it into a blanket assessment or guideline on audio quality.]
LASS
Hence, in the tables below, we have a column for Latency-Adjusted Similarity Score (LASS), which measures how similar the audio heard is to the audio originally transmitted over a notionally ideal network (one with no packet loss or corruption, and zero latency). The score is expressed as a percentage, with 100% being an exact match and 0% indicating no similarity between what was transmitted and what was received.
This doesn't mean that a low-scoring CODEC is unintelligible. It simply means that the audio produced on the receiving end sounds less like the audio captured on the transmitting end.
For example: Linear/Raw PCM undergoes no compression or decompression and therefore, assuming all other things are equal (no packet loss or corruption), we should expect it to have a perfect score of 100%. Which it does. G.711 ulaw, however, does compress and decompress the audio a little, resulting in a score of 99.98%. That comes pretty close to perfect reproduction. The tradeoff is that G.711 incurs some computation on both ends and suffers a little in similarity, but occupies only half the network bandwidth of Linear/Raw PCM.
GSM, on the other hand, is more computationally expensive on both ends but still gives a pretty good score of 91.27%. And it occupies only 13.3 kbps of network bandwidth, compared to 64 kbps for G.711 and 128 kbps for Linear PCM.
Looking at the other end of the scale, we find CODEC2 coding at 0.45 kbps with a terrible score of 11.80%. The audio is still intelligible but sounds awfully robotic and "breathy". But ... it uses only 0.45 kbps of network bandwidth. That's around 96% in bandwidth savings compared to GSM (and over 99% compared to G.711)!
Frankly, ultra-low bandwidth CODECs like CODEC2 are not suited for most uses. However, there are situations where bandwidth is in extremely short supply, and then the only option is to use a low-quality, low-bandwidth CODEC to make do with what's available.
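The bandwidth comparisons above are simple ratio arithmetic: savings = (1 - codec rate / reference rate) x 100. A minimal sketch of that calculation follows; the helper name `savings_pct` is purely illustrative and not part of Engage, and the bitrates are the ones quoted on this page.

```python
"""Bandwidth-savings arithmetic for the bitrates quoted on this page."""

# Bitrates in kbps, taken from the text and tables on this page.
BITRATES_KBPS = {
    "Linear PCM": 128.0,
    "G.711": 64.0,
    "GSM 6.10": 13.3,
    "Codec2 @ 0.45kbps": 0.45,
}


def savings_pct(codec_kbps: float, reference_kbps: float) -> float:
    """Percentage of bandwidth saved by using codec_kbps instead of reference_kbps."""
    if reference_kbps <= 0:
        raise ValueError("reference bitrate must be positive")
    return (1.0 - codec_kbps / reference_kbps) * 100.0


if __name__ == "__main__":
    low = BITRATES_KBPS["Codec2 @ 0.45kbps"]
    for name in ("GSM 6.10", "G.711", "Linear PCM"):
        pct = savings_pct(low, BITRATES_KBPS[name])
        print(f"Codec2 @ 0.45kbps vs {name}: {pct:.1f}% less bandwidth")
    # G.711 vs Linear PCM: the "half the bandwidth" case.
    print(f"G.711 vs Linear PCM: {savings_pct(64.0, 128.0):.1f}% less bandwidth")
```

Run as a script, this prints about 96.6% against GSM, 99.3% against G.711 and 99.6% against Linear PCM, plus 50.0% for G.711 against Linear PCM. Note these figures cover codec payload only; per-packet RTP/UDP/IP header overhead is not included.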
Intelligibility
The discussion above about LASS is, honestly, somewhat academic, because what we're doing there is a purely mathematical comparison between two audio waves (the transmitted wave and the received wave) to see how well they match up. Even a very low score doesn't mean the audio is unintelligible to humans; it's just a measure of the mathematical exactness of the two waves.
For purposes of human processing (that's a nice way of saying how real people hear and understand the audio), there are a few ways to measure quality. These include the purely subjective Mean Opinion Score (MOS), where people listen to audio and give their opinion, and algorithmic/objective standards such as Perceptual Evaluation of Speech Quality (PESQ) and Perceptual Objective Listening Quality Analysis (POLQA). Adding values from those methods to this document is on our list of TODOs, but not just yet.
Suffice to say, though, that neither we nor anyone else would contemplate using a CODEC that is not intelligible; so it's OK to assume that any CODEC listed in this document is safe for human consumption.
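This page doesn't spell out the exact formula behind LASS, so the sketch below is only a hypothetical stand-in showing what "a purely mathematical comparison between two audio waves" can look like: the Pearson correlation of two time-aligned waveforms, clamped to 0-100%. The function name `similarity_pct` and the choice of correlation are assumptions for illustration, not Engage's method.

```python
import math
from typing import Sequence


def similarity_pct(sent: Sequence[float], received: Sequence[float]) -> float:
    """Hypothetical waveform similarity (NOT the actual LASS formula):
    Pearson correlation of two equal-length, time-aligned signals as a
    percentage, with negative correlation clamped to 0%."""
    if len(sent) != len(received) or len(sent) < 2:
        raise ValueError("signals must be time-aligned and of equal length (>= 2 samples)")
    mean_s = sum(sent) / len(sent)
    mean_r = sum(received) / len(received)
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(sent, received))
    var_s = sum((s - mean_s) ** 2 for s in sent)
    var_r = sum((r - mean_r) ** 2 for r in received)
    if var_s == 0 or var_r == 0:
        # A constant (e.g. silent) signal has no waveform shape to correlate.
        return 100.0 if list(sent) == list(received) else 0.0
    r = cov / math.sqrt(var_s * var_r)
    # An inverted or unrelated waveform is treated as "no similarity".
    return max(0.0, r) * 100.0


if __name__ == "__main__":
    rate = 8000  # samples per second
    tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 10)]
    # Crude, deterministic "lossy reproduction": round each sample to steps of 0.25.
    coarse = [round(x * 4) / 4 for x in tone]
    print(f"identical:          {similarity_pct(tone, tone):.2f}%")
    print(f"coarsely quantized: {similarity_pct(tone, coarse):.2f}%")
```

A measure like this rewards matching waveform shape, which is why a parametric vocoder can score low while still sounding intelligible: it reproduces how speech sounds, not the exact wave.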
G.711
- Coding delay of 0.2ms
- RTP packetization must be a multiple of 20ms
- Max RTP packetization is 60ms
Codec | Value | Default RTP Payload ID | LASS % |
---|---|---|---|
G.711 ulaw @ 64kbps | 1 | 0 | 99.98 |
G.711 alaw @ 64kbps | 2 | 8 | 99.98 |
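The G.711 packetization rules (multiples of 20ms, up to 60ms) and its 64 kbps rate together fix how many bytes of audio each RTP packet carries. A minimal sketch, assuming packetization is expressed in whole milliseconds; the helper names are illustrative only and are not Engage configuration fields.

```python
G711_BITRATE_BPS = 64_000  # 8,000 samples/s x 8 bits per sample
G711_STEP_MS = 20          # packetization must be a multiple of 20 ms
G711_MAX_MS = 60           # ...and no longer than 60 ms


def is_valid_framing(framing_ms: int, step_ms: int, max_ms: int) -> bool:
    """True if framing_ms is a positive multiple of step_ms no larger than max_ms."""
    return 0 < framing_ms <= max_ms and framing_ms % step_ms == 0


def allowed_framings(step_ms: int, max_ms: int) -> list[int]:
    """Every packetization interval permitted by a 'multiple of step, up to max' rule."""
    return list(range(step_ms, max_ms + 1, step_ms))


def payload_bytes(bitrate_bps: int, framing_ms: int) -> int:
    """Codec payload per RTP packet (RTP/UDP/IP headers excluded).
    bitrate_bps * framing_ms / 1000 gives bits; dividing by 8 gives bytes."""
    return bitrate_bps * framing_ms // 8_000


if __name__ == "__main__":
    for ms in allowed_framings(G711_STEP_MS, G711_MAX_MS):
        print(f"{ms} ms -> {payload_bytes(G711_BITRATE_BPS, ms)} bytes of G.711 payload")
```

For G.711 this yields 160, 320 and 480 bytes for 20, 40 and 60ms. Longer packets mean fewer headers per second but more buffering delay before each packet can be sent.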
GSM
- Coding delay of 2.3ms
- RTP packetization is fixed at 30ms
Codec | Value | Default RTP Payload ID | LASS % |
---|---|---|---|
GSM 6.10 @ 13.3kbps | 3 | 3 | 91.27 |
G.729A (As of Engage 1.240.9080)
- Coding delay of 0.1ms
- RTP packetization must be a multiple of 10ms
- Max RTP packetization is 80ms
Codec | Value | Default RTP Payload ID | LASS % |
---|---|---|---|
G.729A @ 8kbps | 4 | 18 | 60.93 |
Linear/Raw PCM
- RTP packetization must be a multiple of 10ms
- Max RTP packetization is 60ms
- Samples are encoded as signed, 16-bit integers in Little Endian format
Codec | Value | Default RTP Payload ID | LASS % |
---|---|---|---|
Linear PCM @ 128kbps | 5 | 99 | 100.00 |
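Since Linear PCM samples are signed 16-bit integers in Little Endian format, Python's standard `struct` module can pack and unpack them directly (`<` selects little-endian, `h` a signed 16-bit integer). The 8 kHz mono sample rate used below is an assumption derived from 128 kbps / 16 bits per sample; the helper names are illustrative.

```python
import struct

SAMPLE_RATE_HZ = 8_000  # assumption: 128 kbps / 16 bits per sample, mono
BYTES_PER_SAMPLE = 2    # signed 16-bit integers


def encode_pcm(samples: list[int]) -> bytes:
    """Pack signed 16-bit samples as little-endian bytes (raises struct.error if out of range)."""
    return struct.pack(f"<{len(samples)}h", *samples)


def decode_pcm(payload: bytes) -> list[int]:
    """Unpack little-endian signed 16-bit samples from a payload."""
    if len(payload) % BYTES_PER_SAMPLE:
        raise ValueError("payload length must be a whole number of 16-bit samples")
    return list(struct.unpack(f"<{len(payload) // BYTES_PER_SAMPLE}h", payload))


def samples_per_packet(framing_ms: int) -> int:
    """Number of samples carried by one packet of framing_ms milliseconds."""
    return SAMPLE_RATE_HZ * framing_ms // 1000


if __name__ == "__main__":
    payload = encode_pcm([1, -2, 256])
    print(payload.hex())        # 0100feff0001
    print(decode_pcm(payload))  # [1, -2, 256]
    n = samples_per_packet(20)
    print(f"20 ms at {SAMPLE_RATE_HZ} Hz = {n} samples = {n * BYTES_PER_SAMPLE} bytes")
```

Note the byte order: `-2` becomes `fe ff`, least-significant byte first. A receiver on a big-endian platform, or one expecting network byte order, has to swap bytes when decoding.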
AMR Narrowband
- Coding delay of 2.5ms
- RTP packetization is fixed at 20ms
Codec | Value | Default RTP Payload ID | LASS % |
---|---|---|---|
AMR Narrowband @ 4.75kbps | 10 | 122 | 56.88 |
AMR Narrowband @ 5.15kbps | 11 | 122 | 55.61 |
AMR Narrowband @ 5.9kbps | 12 | 122 | 58.63 |
AMR Narrowband @ 6.7kbps | 13 | 122 | 55.31 |
AMR Narrowband @ 7.4kbps | 14 | 122 | 63.03 |
AMR Narrowband @ 7.95kbps | 15 | 122 | 62.97 |
AMR Narrowband @ 10.2kbps | 16 | 122 | 64.64 |
AMR Narrowband @ 12.2kbps | 17 | 122 | 64.58 |
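With packetization fixed at 20ms, each AMR Narrowband rate corresponds to a fixed number of speech bits per frame (rate x 20ms). Rates are kept as integer bits per second below so the arithmetic is exact. This is a sketch; the mapping from the table's Value column is copied from the table, while the function name is hypothetical. The counts exclude any RTP payload header and octet-alignment padding.

```python
# Engage "Value" -> AMR-NB bitrate in bits per second, from the table above.
AMR_NB_RATES_BPS = {
    10: 4_750, 11: 5_150, 12: 5_900, 13: 6_700,
    14: 7_400, 15: 7_950, 16: 10_200, 17: 12_200,
}
FRAME_MS = 20  # AMR Narrowband packetization is fixed at 20 ms


def speech_bits_per_frame(engage_value: int) -> int:
    """Speech bits produced per 20 ms frame at the given encoder value."""
    return AMR_NB_RATES_BPS[engage_value] * FRAME_MS // 1000


if __name__ == "__main__":
    for value, bps in AMR_NB_RATES_BPS.items():
        print(f"value {value}: {bps / 1000:g} kbps -> {speech_bits_per_frame(value)} bits/frame")
```

This yields 95 bits per frame at 4.75kbps up to 244 bits at 12.2kbps.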
Opus
- Coding delay of 0.9ms
- Max RTP packetization is 100ms
Codec | Value | Default RTP Payload ID | LASS % |
---|---|---|---|
Opus @ 6kbps | 20 | 118 | 82.85 |
Opus @ 8kbps | 21 | 118 | 81.13 |
Opus @ 10kbps | 22 | 118 | 93.74 |
Opus @ 12kbps | 23 | 118 | 95.47 |
Opus @ 14kbps | 24 | 118 | 96.87 |
Opus @ 16kbps | 25 | 118 | 96.51 |
Opus @ 18kbps | 26 | 118 | 69.47 |
Opus @ 20kbps | 27 | 118 | 88.94 |
Opus @ 22kbps | 28 | 118 | 53.33 |
Opus @ 24kbps | 29 | 118 | 55.55 |
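The Opus LASS figures don't rise steadily with bitrate (14kbps scores 96.87% while 22kbps scores 53.33%), so picking "the highest bitrate you can afford" isn't necessarily the best strategy by this measure. The sketch below filters the table by a minimum LASS and returns the cheapest qualifying option; the threshold approach and the name `cheapest_meeting` are illustrative assumptions, and LASS is only one criterion (see the intelligibility discussion above).

```python
# (Engage value, bitrate in kbps, LASS %) copied from the Opus table above.
OPUS_OPTIONS = [
    (20, 6, 82.85), (21, 8, 81.13), (22, 10, 93.74), (23, 12, 95.47),
    (24, 14, 96.87), (25, 16, 96.51), (26, 18, 69.47), (27, 20, 88.94),
    (28, 22, 53.33), (29, 24, 55.55),
]


def cheapest_meeting(min_lass: float, options=OPUS_OPTIONS):
    """Return the lowest-bitrate (value, kbps, lass) whose LASS >= min_lass, or None."""
    candidates = [opt for opt in options if opt[2] >= min_lass]
    return min(candidates, key=lambda opt: opt[1]) if candidates else None


if __name__ == "__main__":
    for threshold in (80.0, 90.0, 95.0, 97.0):
        print(threshold, cheapest_meeting(threshold))
```

For example, a 95% threshold selects Opus @ 12kbps (value 23), while 97% returns `None` because no Opus rate in the table reaches it.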
Speex
- Coding delay of 20ms
- RTP packetization must be a multiple of 20ms
- Max RTP packetization is 60ms
- Available with Engage 1.208.9046 onward
Codec | Value | Default RTP Payload ID | LASS % |
---|---|---|---|
Speex Narrowband @ 2.15kbps | 30 | 97 | 17.27 |
Speex Narrowband @ 3.95kbps | 31 | 97 | 50.13 |
Speex Narrowband @ 5.95kbps | 32 | 97 | 40.00 |
Speex Narrowband @ 8kbps | 33 | 97 | 51.20 |
Speex Narrowband @ 11kbps | 34 | 97 | 61.46 |
Speex Narrowband @ 15kbps | 35 | 97 | 64.14 |
Speex Narrowband @ 18.2kbps | 36 | 97 | 65.31 |
Speex Narrowband @ 24.6kbps | 37 | 97 | 65.68 |
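Speex has a noticeably larger coding delay (20ms) than the other CODECs listed. A rough way to see what that means on the sending side is to add it to the packetization interval, since a full packet of audio must be buffered before it can be sent. This is a simplified model assumed for illustration: it ignores capture buffering, network transit and the receiver's jitter buffer, and the function name is hypothetical.

```python
SPEEX_CODING_DELAY_MS = 20  # "Coding delay of 20ms"
SPEEX_STEP_MS = 20          # packetization must be a multiple of 20 ms
SPEEX_MAX_MS = 60           # maximum packetization


def sender_delay_ms(framing_ms: int, coding_delay_ms: int = SPEEX_CODING_DELAY_MS) -> int:
    """Rough sender-side delay: packet buffering time plus coding delay.
    Excludes capture buffering, network transit and receiver jitter buffering."""
    if not (0 < framing_ms <= SPEEX_MAX_MS and framing_ms % SPEEX_STEP_MS == 0):
        raise ValueError(
            f"Speex packetization must be a multiple of {SPEEX_STEP_MS} ms up to {SPEEX_MAX_MS} ms"
        )
    return framing_ms + coding_delay_ms


if __name__ == "__main__":
    for ms in (20, 40, 60):
        print(f"{ms} ms packets -> ~{sender_delay_ms(ms)} ms before audio leaves the sender")
```

Under this model, 20, 40 and 60ms packets give roughly 40, 60 and 80ms respectively; in a Push-To-Talk setting that is usually tolerable, as discussed above.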
Codec2
- Coding delay varies based on encoder
- RTP packetization varies based on encoder
- Available with Engage 1.212.9050 onward
- More information on Wikipedia
Codec | Value | Default RTP Payload ID | LASS % |
---|---|---|---|
Codec2 @ 0.45kbps | 40 | 88 | 11.80 |
Codec2 @ 0.7kbps | 41 | 87 | 12.23 |
Codec2 @ 1.2kbps | 42 | 85 | 12.99 |
Codec2 @ 1.3kbps | 43 | 84 | 16.46 |
Codec2 @ 1.4kbps | 44 | 83 | 12.56 |
Codec2 @ 1.6kbps | 45 | 82 | 17.67 |
Codec2 @ 2.4kbps | 46 | 81 | 10.98 |
Codec2 @ 3.2kbps | 47 | 80 | 19.97 |
MELPe
- Coding delay varies based on encoder
- RTP packetization varies based on encoder
- Available with Engage 1.212.9050 onward
- More information on Wikipedia
Codec | Value | Default RTP Payload ID | LASS % |
---|---|---|---|
MELPe @ 0.6kbps | 50 | 79 | 12.65 |
MELPe @ 1.2kbps | 51 | 78 | 19.96 |
MELPe @ 2.4kbps | 52 | 77 | 13.94 |
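Unlike AMR, Opus and Speex, which share one default RTP payload ID across all their rates, Codec2 and MELPe give each rate its own default ID. Assuming both ends use those defaults, a receiver can tell which rate a packet carries from the 7-bit payload type field of the fixed RTP header (the low 7 bits of the second byte, per RFC 3550). The sketch below shows that lookup; the mapping comes from the tables above, and the function names are hypothetical, not Engage APIs.

```python
import struct

# Default RTP payload ID -> (codec, kbps, Engage value), from the Codec2 and MELPe tables.
PAYLOAD_ID_TO_CODEC = {
    88: ("Codec2", 0.45, 40), 87: ("Codec2", 0.7, 41), 85: ("Codec2", 1.2, 42),
    84: ("Codec2", 1.3, 43), 83: ("Codec2", 1.4, 44), 82: ("Codec2", 1.6, 45),
    81: ("Codec2", 2.4, 46), 80: ("Codec2", 3.2, 47),
    79: ("MELPe", 0.6, 50), 78: ("MELPe", 1.2, 51), 77: ("MELPe", 2.4, 52),
}


def rtp_payload_type(packet: bytes) -> int:
    """Extract the 7-bit payload type from a fixed 12-byte RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("shorter than the 12-byte fixed RTP header")
    if packet[0] >> 6 != 2:
        raise ValueError("not RTP version 2")
    return packet[1] & 0x7F  # top bit of byte 1 is the marker bit


def identify(packet: bytes):
    """Return (codec, kbps, value) for a default Codec2/MELPe payload ID, else None."""
    return PAYLOAD_ID_TO_CODEC.get(rtp_payload_type(packet))


if __name__ == "__main__":
    # Header fields: V=2 (0x80), marker + PT, sequence number, timestamp, SSRC.
    header = struct.pack("!BBHII", 0x80, 0x80 | 83, 1, 0, 0x1234)
    print(identify(header))  # ('Codec2', 1.4, 44)
```

If a deployment overrides the default IDs, this table would need to be replaced with whatever mapping is actually configured.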