Timestamp handling - Fraunhofer-IIS/iec61937-13 GitHub Wiki

As mentioned in section 'MPEG-H 3D Audio Special Features', MPEG-H 3D Audio provides features that are not found in other audio coding schemes. One of them is that an encoded audio data frame need not represent a constant number of PCM audio samples.

Truncated MPEG-H 3D Audio data frames

Truncated MPEG-H 3D Audio data frames can for example be employed to switch an encoded audio stream concurrently with an encoded video stream as shown in Figure 4.

Figure 4: Time stamps with audio truncation

The figure shows a video stream and an audio stream in which the program is switched at video frame number 5. The timing is shown for a video frame rate of 50 fps, audio data frames of 1024 samples per (non-truncated) frame and a sampling rate of 48 kHz. The time stamps of the audio and video frames are shown at the vertical arrows with a time base of 90000 ticks per second. The audio changes at the same instant as the video by means of a truncated MPEG-H 3D Audio data frame.
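The truncated frame length follows from the time stamps. The sketch below is a minimal illustration, assuming that both streams start at tick 0 and that frames are numbered from 0 (an assumption about the figure, chosen because it reproduces the 704 samples quoted in Table 1). All function names are hypothetical.

```python
from fractions import Fraction

TICKS_PER_SECOND = 90_000   # time base of the time stamps
VIDEO_FPS = 50              # video frame rate
SAMPLE_RATE = 48_000        # audio sampling rate in Hz
FRAME_LENGTH = 1024         # samples per non-truncated audio data frame


def video_pts(frame_index):
    """Time stamp (in ticks) of a video frame, assuming frame 0 starts at tick 0."""
    return Fraction(frame_index * TICKS_PER_SECOND, VIDEO_FPS)


def audio_pts(frame_index):
    """Time stamp (in ticks) of an audio frame, assuming no earlier truncation."""
    return Fraction(frame_index * FRAME_LENGTH * TICKS_PER_SECOND, SAMPLE_RATE)


def ticks_to_samples(ticks):
    """Convert a duration in ticks to a duration in audio samples."""
    return ticks * SAMPLE_RATE / TICKS_PER_SECOND


switch_pts = video_pts(5)   # the program is switched at video frame 5
frame4_pts = audio_pts(4)   # audio frame 4 is the one that gets truncated
truncated_length = ticks_to_samples(switch_pts - frame4_pts)

print(switch_pts, frame4_pts, truncated_length)
```

Under these assumptions, audio frame 4 must end at the switch point, so its length is the distance between the two time stamps expressed in samples.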

Figure 5 and Table 1 show how the burst payloads for the audio data frames 4 – 6 may be constructed.

Figure 5: MPEG-H 3D Audio burst payload with truncation

NOTE: For completeness with respect to the IEC 61937-13 standard, the following table also contains the header values for the non-HBR mode, even though this mode is not required for IEC 61937-13 receiver and sender implementations.

Table 1 - MPEG-H 3D Audio burst payload header structure entries for truncated data frame

| Header Structure Entry | Value in Normal Structure | Value in HBR Structure | Notes |
|---|---|---|---|
| Data Offset 4 | 26 | 32 | 8 bytes Pa-Pd + 2 Header Structures + Stop Header Structure (0, 0, 0) |
| Data Size 4 | Size of MPEG-H 3D Audio data frame 4 | Size of MPEG-H 3D Audio data frame 4 | |
| PCM Offset 4 | 0 | 0 | |
| Data Offset 5 | 26 + size of data frame 4 | 32 + size of data frame 4 | 8 bytes Pa-Pd + 2 Header Structures + Stop Header Structure (0, 0, 0) + size of data frame 4 |
| Data Size 5 | Size of MPEG-H 3D Audio data frame 5 | Size of MPEG-H 3D Audio data frame 5 | |
| PCM Offset 5 | Number of PCM samples in data frame 4 | Number of PCM samples in data frame 4 | 704 for the example given in Figure 4 |
| Data Offset 6 | 20 | 24 | 8 bytes Pa-Pd + Header Structure + Stop Header Structure (0, 0, 0) |
| Data Size 6 | Size of MPEG-H 3D Audio data frame 6 | Size of MPEG-H 3D Audio data frame 6 | |
| PCM Offset 6 | Number of PCM samples in data frame 4 | Number of PCM samples in data frame 4 | 704 for the example given in Figure 4. PCM offset carries on until next truncated data frame |
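
The Data Offset arithmetic described in the Notes column can be sketched as follows. This is an illustration only: the per-entry sizes of 6 bytes (normal) and 8 bytes (HBR) are inferred from the table values (26 = 8 + 3 × 6, 32 = 8 + 3 × 8) rather than quoted from the standard, and the function name and the frame sizes used in the example are hypothetical.

```python
PREAMBLE_BYTES = 8  # Pa-Pd

# Bytes per header structure; inferred from the table values, not from the standard.
ENTRY_BYTES = {"normal": 6, "hbr": 8}


def data_offsets(frame_sizes, mode="hbr"):
    """Return the Data Offset of each data frame in one burst payload.

    frame_sizes: sizes in bytes of the data frames carried in the burst.
    mode: "normal" or "hbr".
    """
    entry = ENTRY_BYTES[mode]
    # One header structure per data frame plus the stop header structure (0, 0, 0).
    offset = PREAMBLE_BYTES + (len(frame_sizes) + 1) * entry
    offsets = []
    for size in frame_sizes:
        offsets.append(offset)
        offset += size  # the next data frame follows directly after this one
    return offsets


# Burst with data frames 4 and 5 (sizes of 300 and 500 bytes are made up).
print(data_offsets([300, 500], "normal"))  # [26, 326]
print(data_offsets([300, 500], "hbr"))     # [32, 332]
# Burst with data frame 6 only.
print(data_offsets([500], "normal"))       # [20]
print(data_offsets([500], "hbr"))          # [24]
```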

Transmitter operation

An IEC 61937-13 transmitter has to insert correct PCM offset values to ensure correct timing of the output of the decoded PCM samples at the receiver. The PCM offset values can usually be derived from higher level system timing information.

For example in MPEG-2 transport stream based systems the PCM offset values can be derived from the presentation time stamps of the transport stream. In MPEG-4 file format based systems the PCM offset values can be derived from the sample duration information contained in the audio track of an MP4 file.
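
As a rough sketch of the second case, the PCM offsets could be accumulated from per-frame sample durations. The code below is not taken from the standard. It assumes that the first frame starts exactly at a burst reference point, that reference points are spaced one burst repetition period apart, and that each frame is carried in the burst whose period contains the start of its playout. The function name is hypothetical.

```python
def pcm_offsets(frame_lengths, burst_period=1024):
    """Return a (burst_index, pcm_offset) pair for each audio data frame.

    frame_lengths: decoded length of each frame in PCM samples, e.g. taken
                   from the sample durations of an MP4 audio track.
    burst_period:  burst repetition period in audio samples.
    """
    result = []
    start = 0  # playout start of the current frame, relative to the first reference point
    for length in frame_lengths:
        # Assumption: the frame belongs to the burst whose period contains its start.
        burst_index, offset = divmod(start, burst_period)
        result.append((burst_index, offset))
        start += length
    return result


# A truncated frame of 704 samples followed by two full frames.
print(pcm_offsets([704, 1024, 1024]))  # [(0, 0), (0, 704), (1, 704)]
```

With these inputs the truncated frame and the frame after it share one burst, and the offset of 704 samples carries over into the next burst.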

Receiver operation

An IEC 61937-13 receiver has to use the PCM offset value to play out the decoded PCM samples at the correct instant of time. The PCM offset value specifies the time difference, in audio samples, between the start of playout of a decoded MPEG-H 3D Audio data frame and the reference point of the burst payload that contains this MPEG-H 3D Audio data frame. Figure 6 illustrates this relative timing for the example of a truncated MPEG-H 3D Audio data frame given in section 'Truncated MPEG-H 3D Audio data frames'.

Figure 6: Playout timing

The PCM offset of audio frame 4 is 0, so the playout of the decoded PCM samples of audio frame 4 starts immediately at the reference point R’n of the burst payload that contains audio frame 4. Audio frame 4 is a truncated MPEG-H 3D Audio data frame with a length of 704 samples. Playout of the decoded PCM samples of audio frame 5 starts 704 samples after the same reference point R’n because audio frames 4 and 5 are contained in the same burst payload.

In this example we assume that the length of a non-truncated MPEG-H 3D Audio data frame and the IEC 61937 burst repetition period are both 1024 audio samples. Therefore the PCM offset of audio frame 6 is again 704 and the playout of the decoded PCM samples of audio frame 6 starts 704 samples after the reference point R’n+1 of the next burst payload.
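
The playout positions in this example can be checked with a few lines of arithmetic. The sketch below measures positions in samples relative to R’n and assumes that consecutive reference points are exactly one burst repetition period apart. The function and variable names are hypothetical.

```python
BURST_PERIOD = 1024  # burst repetition period in audio samples (as assumed in this example)


def playout_start(burst_index, pcm_offset, burst_period=BURST_PERIOD):
    """Playout start in samples relative to R'n, where burst_index 0 is the burst at R'n."""
    return burst_index * burst_period + pcm_offset


# (frame number, burst index relative to n, PCM offset, decoded length in samples)
frames = [
    (4, 0, 0, 704),     # truncated frame, starts at R'n
    (5, 0, 704, 1024),  # same burst payload as frame 4
    (6, 1, 704, 1024),  # next burst payload, reference point R'n+1
]

starts = [playout_start(burst, offset) for _, burst, offset, _ in frames]
ends = [start + length for start, (_, _, _, length) in zip(starts, frames)]

print(starts)  # [0, 704, 1728]
# Each frame starts exactly where the previous one ends, so playout is gapless.
print(all(s == e for s, e in zip(starts[1:], ends)))  # True
```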

Note that the length of a non-truncated audio frame and the burst repetition period don't need to be identical. If they differ, the PCM offset values will be different for subsequent audio frames.

Note also that the reference points R’ in the PCM output will be delayed in comparison to the reference points R of the IEC 61937-13 stream, because an audio device needs time to fully receive and decode the MPEG-H 3D Audio data frames (see section 'Latency' for more details on latency).