Latency - Fraunhofer-IIS/iec61937-13 GitHub Wiki

IEC 61937-13 latency is the time required to transmit a complete MPEG-H 3D Audio data frame via IEC 61937-13 to allow the continuous decoding/rendering process on the receiving device.

IEC 61937-13 latency depends on two transmission parameters, the burst repetition period and the sampling rate mode, and on one MPEG-H parameter, the maximum size of the MPEG-H 3D Audio data frame.

Since the burst repetition period and the MPEG-H audio frame are not aligned to each other (they can even differ in length), the start of transmission of a data frame may have to wait for up to one additional burst repetition period.

In addition, MPEG-H has a variable bit rate that (depending on the parameters used) can lead to bit rate peaks at which a single MPEG-H 3D Audio data frame needs more than one burst period to be transmitted. In that case, the latency is calculated from the worst-case number of burst periods needed to transmit one audio data frame.

Burst repetition period

The burst repetition period for IEC 61937-13 data frames, as defined by the IEC 61937-13 standard, can be 768, 1024, 1536, 2048, 3072 or 4096 samples. At 48 kHz sampling frequency, these burst repetition periods correspond to 16 ms, 21.33 ms, 32 ms, 42.67 ms, 64 ms and 85.33 ms respectively.
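The durations above follow directly from dividing the period length in samples by the sampling frequency. A minimal sketch of that arithmetic (the function name `burst_period_ms` is chosen here for illustration and is not part of any IEC 61937-13 implementation):

```python
# Burst repetition periods (in samples) listed in the text for IEC 61937-13.
BURST_REPETITION_PERIODS = (768, 1024, 1536, 2048, 3072, 4096)


def burst_period_ms(period_samples: int, sampling_rate_hz: int = 48_000) -> float:
    """Duration of one burst repetition period in milliseconds."""
    return 1000.0 * period_samples / sampling_rate_hz


for period in BURST_REPETITION_PERIODS:
    print(f"{period:4d} samples -> {burst_period_ms(period):.2f} ms")
```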

For low latency it is best to use one of the short repetition periods (768 or 1024 samples), since this reduces the delay contributed by each burst period. Whether 768 or 1024 samples gives the shorter overall delay depends on the selected coding parameters.

The burst repetition period is independent of the audio frame length that is used by MPEG-H (since MPEG-H audio frame length is not constant anyway).

IMPLEMENTATION RESTRICTION: the IEC 61937-13 receiver and sender implementation only needs to consider a burst repetition period of 1024 samples. All other burst repetition periods defined by IEC 61937-13 can be ignored.

Maximal MPEG-H 3D Audio data frame size

MPEG-H uses a bit reservoir mechanism to allow peak bit rates that are higher than the average bit rate. This way a higher number of bits can be assigned to portions of the audio signal that are more complex to code than other portions. MPEG-H 3D Audio data frames may get so big that some do not fit into a single IEC 61937 data burst, even if the average bit rate is lower than the maximum bit rate of the IEC 61937 link.

Therefore, MPEG-H 3D Audio data frames may be partitioned (spill over) to more than one IEC 61937 data burst, if they don't fit into the current data burst. Figure 7 illustrates an MPEG-H 3D Audio data frame that extends over two IEC 61937 data bursts.

Figure 7: MPEG-H 3D Audio burst payload for overlapping data frame

The size of the bit reservoir is 6144 bits times the number of audio channels used, minus the average size of an MPEG-H data frame. For the MPEG-H 3D Audio baseline (BL) profile level 3, the maximum number of channels is 32. The maximum size of the bit reservoir is therefore 32 * 6144 = 196608 bits minus the average size of an MPEG-H 3D Audio data frame, and the largest data frame size is 32 * 6144 = 196608 bits.
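The relationship between channel count, largest frame size and bit reservoir size can be sketched as follows. The function names are illustrative only, and the average frame size used in the example call is the 1.2 Mbps value from Table 2:

```python
BITS_PER_CHANNEL = 6144  # per-channel limit quoted in the text


def max_frame_bits(num_channels: int) -> int:
    """Largest possible MPEG-H 3D Audio data frame, in bits."""
    return BITS_PER_CHANNEL * num_channels


def bit_reservoir_bits(num_channels: int, average_frame_bits: int) -> int:
    """Bit reservoir size: largest frame size minus the average frame size."""
    return max_frame_bits(num_channels) - average_frame_bits


# BL profile level 3 allows up to 32 channels.
print(max_frame_bits(32))
print(bit_reservoir_bits(32, average_frame_bits=26839))
```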

Table 2 shows the average and maximal MPEG-H 3D Audio data frame sizes for different bitrates. All calculations are done for 1024 samples audio frame size and 48 kHz sampling rate with 32 channels.

Table 2: MPEG-H 3D Audio data frame sizes (BL profile, level 3)

| Bitrate | Average audio data frame size | Maximal audio data frame size |
|---|---|---|
| 1.2 Mbps (~ 38 kbps per channel) | 26839 bits | 196608 bits |
| 2.5 Mbps (~ 78 kbps per channel) | 55924 bits | 196608 bits |
| 3.125 Mbps (~ 100 kbps per channel) | 69894 bits | 196608 bits |

IEC 61937-13 payload size

The maximal data burst payload size of an IEC 61937-13 data burst depends on the burst repetition period and the sample rate multiplier. The sample rate multiplier is 1 for non-HBR mode or 2, 4, 8 or 16 for HBR mode. The maximum data burst payload sizes are given in Tables 5, 8, 9, 10 and 11 of IEC 61937-13.

The maximal MPEG-H 3D Audio data frame sizes are slightly smaller than the maximal data burst payload sizes because the burst payload headers have to be taken into account. They may be reduced further by additional headroom reserved, for example, for MHAS_USERINTERACTION packets and system sound support.

Table 3 shows the maximum burst payload sizes and audio frame sizes for 1024 samples burst repetition period and all possible sampling rate modes.

Table 3: IEC 61937-13 payload sizes (1024 samples burst repetition period)

| IEC sampling rate mode / bitrate mode | Max burst payload size | Max audio frame size (considering payload headers, see 1. below) | Max audio frame size (considering payload headers and additional headroom, see 2. below) |
|---|---|---|---|
| 1 x audio sampling rate, 48 kHz / 1.5 Mbps | 32640 bits (4080 Bytes) | 32544 bits (4068 Bytes) | 26144 bits (3268 Bytes) |
| 2 x audio sampling rate, 96 kHz / 3 Mbps | 65408 bits (1022 * 8 = 8176 Bytes) | 65280 bits (8160 Bytes) | 58880 bits (7360 Bytes) |
| 4 x audio sampling rate, 192 kHz / 6 Mbps | 130944 bits (2046 * 8 = 16368 Bytes) | 130816 bits (16352 Bytes) | 124416 bits (15552 Bytes) |
| 8 x audio sampling rate, 384 kHz / 12 Mbps | 262016 bits (4094 * 8 = 32752 Bytes) | 261888 bits (32736 Bytes) | 255488 bits (31936 Bytes) |
| 16 x audio sampling rate, 768 kHz / 24 Mbps | 524160 bits (8190 * 8 = 65520 Bytes) | 524032 bits (65504 Bytes) | 517632 bits (64704 Bytes) |

  1. Max payload size for an audio frame per IEC payload, considering the payload header:
  • for 1 x audio sample rate: max burst payload size minus 12 bytes
  • for 2/4/8/16 x audio sample rates: max burst payload size minus 16 bytes
  2. Additional headroom of 300 kbps (~6400 bits / 800 Bytes per 1024 samples frame) reserved for e.g. MHAS_USERINTERACTION packets
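The two right-hand columns of Table 3 can be reproduced from the maximum burst payload sizes and the two notes above. The following is a sketch under the assumption that the header size and the headroom are simply subtracted from the payload size; the names are hypothetical and the payload sizes are copied from Table 3:

```python
# Max burst payload size in bits per sample rate multiplier (from Table 3,
# 1024 samples burst repetition period).
MAX_BURST_PAYLOAD_BITS = {1: 32640, 2: 65408, 4: 130944, 8: 262016, 16: 524160}

HEADER_BYTES_NON_HBR = 12  # note 1: 1 x audio sample rate
HEADER_BYTES_HBR = 16      # note 1: 2/4/8/16 x audio sample rates

# Note 2: 300 kbps of headroom over one 1024-sample frame at 48 kHz.
HEADROOM_BITS = 300_000 * 1024 // 48_000  # 6400 bits (800 bytes)


def max_audio_frame_bits(multiplier: int, with_headroom: bool = False) -> int:
    """Bits available for audio frame data in one burst payload."""
    header_bytes = HEADER_BYTES_NON_HBR if multiplier == 1 else HEADER_BYTES_HBR
    bits = MAX_BURST_PAYLOAD_BITS[multiplier] - 8 * header_bytes
    if with_headroom:
        bits -= HEADROOM_BITS
    return bits


for m in MAX_BURST_PAYLOAD_BITS:
    print(m, max_audio_frame_bits(m), max_audio_frame_bits(m, with_headroom=True))
```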

IMPLEMENTATION NOTE: the table above contains payload sizes for all sample rate multipliers. For the IEC 61937-13 receiver/sender implementation, only the rows for the 4x and 16x sample rate multipliers need to be considered.

IEC 61937-13 sample rate mode

Table 4 shows, for each possible IEC sampling rate mode, the number of burst payloads required to carry a data frame of maximal size and the corresponding latency. The values assume the maximal MPEG-H 3D Audio data frame size of the MPEG-H 3D Audio baseline (BL) profile level 3 and a burst repetition period of 1024 samples.

Table 4: IEC 61937-13 latency (BL profile, level 3) (1024 samples burst repetition period)

| IEC sampling rate mode / bitrate mode | Required IEC burst payloads for maximal audio data frame size | Required IEC burst payloads for maximal audio data frame size with additional headroom |
|---|---|---|
| 1 x audio sampling rate, 48 kHz / 1.5 Mbps | 7 IEC payloads → 149.32 msec latency | 8 IEC payloads → 170.64 msec latency |
| 2 x audio sampling rate, 96 kHz / 3 Mbps | 4 IEC payloads → 85.32 msec latency | 4 IEC payloads → 85.32 msec latency |
| 4 x audio sampling rate, 192 kHz / 6 Mbps | 2 IEC payloads → 42.66 msec latency | 2 IEC payloads → 42.66 msec latency |
| 8 x audio sampling rate, 384 kHz / 12 Mbps | 1 IEC payload → 21.33 msec latency | 1 IEC payload → 21.33 msec latency |
| 16 x audio sampling rate, 768 kHz / 24 Mbps | 1 IEC payload → 21.33 msec latency | 1 IEC payload → 21.33 msec latency |

The values shown in Table 4 are worst-case values for MPEG-H 3D Audio baseline (BL) profile level 3. The latency can be lower if fewer than 32 codec core channels are used. However, an IEC 61937-13 receiver has to be prepared for the worst case.
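The payload counts in Table 4 appear to follow from dividing the worst-case frame size by the usable bits per burst and rounding up. The sketch below works under that assumption (each burst that carries part of the frame is assumed to offer the full per-burst capacity from Table 3); the names are illustrative. Because the table multiplies by a burst period rounded to 21.33 ms, the printed latencies can differ from the table in the last digit.

```python
# Worst-case frame size for BL profile level 3: 32 channels * 6144 bits.
MAX_FRAME_BITS = 32 * 6144

# One 1024-sample burst repetition period at 48 kHz, in milliseconds.
BURST_PERIOD_MS = 1000 * 1024 / 48_000

# Usable bits per burst, copied from Table 3:
# multiplier -> (payload headers only, payload headers plus headroom).
USABLE_BITS_PER_BURST = {
    1: (32544, 26144),
    2: (65280, 58880),
    4: (130816, 124416),
    8: (261888, 255488),
    16: (524032, 517632),
}


def required_bursts(frame_bits: int, usable_bits: int) -> int:
    """Smallest number of bursts whose combined capacity holds the frame."""
    return -(-frame_bits // usable_bits)  # integer ceiling division


for multiplier, (plain, with_headroom) in USABLE_BITS_PER_BURST.items():
    n_plain = required_bursts(MAX_FRAME_BITS, plain)
    n_headroom = required_bursts(MAX_FRAME_BITS, with_headroom)
    print(
        f"{multiplier:2d}x: {n_plain} bursts ({n_plain * BURST_PERIOD_MS:.2f} ms), "
        f"{n_headroom} with headroom ({n_headroom * BURST_PERIOD_MS:.2f} ms)"
    )
```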

IMPLEMENTATION NOTE: the table above contains latency indications for all sample rate multipliers. For the IEC 61937-13 receiver/sender implementation, only the rows for the 4x and 16x sample rate multipliers need to be considered.

IEC 61937-13 latency

IEC 61937-13 latency is the time required to transmit the number of burst payloads needed for a maximum-length audio data frame (see 4.4), plus one additional burst period for alignment between IEC burst frames and audio frames.

The latency caused by IEC 61937-13 transmission ranges from approx. 40 ms (short repetition period, high sample rate factor) to more than 200 ms; a low IEC sample rate mode is the largest contributor to high latency.

To keep latency of a decoding device low, it is strongly recommended that the device is always operated with constrained parameters that are optimized for low latency.

For IEC 61937-13 settings, recommended constraints are:

  • Burst repetition period is always 1024 samples
  • IEC sample rate factor is 4 for SPDIF and HDMI ARC type connections and 16 for HDMI forward and HDMI eARC connections

This limits the IEC 61937-13 latency to approx. 65 ms for SPDIF and HDMI ARC connections and approx. 45 ms for HDMI forward and HDMI eARC connections.
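These figures are consistent with the definition given above: the number of required burst payloads plus one burst period for alignment, multiplied by the burst period duration. A minimal sketch, with the payload counts taken from Table 4 (2 for a sample rate factor of 4, 1 for a factor of 16) and a function name chosen here for illustration:

```python
def iec_latency_ms(
    required_bursts: int,
    period_samples: int = 1024,
    sampling_rate_hz: int = 48_000,
) -> float:
    """Transmission latency: required bursts plus one alignment burst period."""
    return (required_bursts + 1) * 1000 * period_samples / sampling_rate_hz


# Sample rate factor 4 (SPDIF, HDMI ARC): 2 bursts for a maximal frame.
print(f"factor 4:  {iec_latency_ms(2):.2f} ms")

# Sample rate factor 16 (HDMI forward, HDMI eARC): 1 burst for a maximal frame.
print(f"factor 16: {iec_latency_ms(1):.2f} ms")
```

The results (64 ms and about 42.67 ms) sit just below the rounded 65 ms and 45 ms limits stated in the text.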