Reader's guide to IEC 61937-13 - Fraunhofer-IIS/iec61937-13 GitHub Wiki
The explanations on this page are meant as supplementary information to the text found in IEC 61937-13, IEC 61937-1 and -2. It is assumed that the reader has a basic understanding of the burst transmission defined in IEC 61937-1 and -2.
Principles of Operation
The IEC 61937 family of standards defines how to transport coded audio bitstreams via the IEC 60958 digital audio interface used for S/PDIF and HDMI connections, including ARC and eARC.
IEC 61937-1 and -2 define the basic approach as follows:
- The original linear PCM samples are partitioned into PCM audio frames of constant size, e.g. 1024 samples
- Each PCM frame is encoded into an audio data frame
- The audio data frame is formatted according to IEC 61937-1 and IEC 61937-2 and transmitted as a data burst within the time interval that would be needed to transport the original number of PCM samples.
- A sync header at the start of the data burst enables the receiver to detect the data burst and extract the data frame from the input stream.
- After decoding the data frame, the resulting PCM samples can be used as if they had been directly transmitted as PCM samples.
Figure 1 illustrates these principles of operation of IEC 61937.
Figure 1: IEC 61937 Principle of Operation
Note that an IEC 61937 transmitting device usually does not itself encode the PCM audio samples into encoded audio frames. Instead, it usually receives already encoded audio frames. For example, a set-top box may receive an encoded audio bitstream via a satellite link and transmit it through IEC 61937 to a TV set, or a Blu-ray player may read an encoded audio bitstream from a Blu-ray disc and transmit it through IEC 61937 to an AVR.
The sync header mentioned in bullet point 4 of the above list is shown as Pa – Pd in Figure 1. Pa – Pd are four 16-bit words. Pa and Pb contain a synchronization bit pattern. Pc specifies the type of the coded audio data payload and Pd specifies the size of the coded audio payload. For MPEG-H 3D Audio the bits 0-4 of Pc are set to a decimal value of 25.
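As a rough illustration of the sync header described above, the following Python sketch scans a sequence of 16-bit words for Pa and Pb and checks the data type in Pc. The sync word values 0xF872 and 0x4E1F are those defined in IEC 61937-1. The function names are hypothetical, and the input is assumed to be already split into 16-bit words.

```python
PA_SYNC = 0xF872        # sync word Pa (IEC 61937-1)
PB_SYNC = 0x4E1F        # sync word Pb (IEC 61937-1)
MPEGH_DATA_TYPE = 25    # bits 0-4 of Pc for MPEG-H 3D Audio


def find_sync_header(words):
    """Return (index, pc, pd) for the first Pa/Pb pair found, or None.

    `words` is a sequence of 16-bit integers (assumed input format).
    """
    for i in range(len(words) - 3):
        if words[i] == PA_SYNC and words[i + 1] == PB_SYNC:
            return i, words[i + 2], words[i + 3]
    return None


def carries_mpegh(pc):
    """True if bits 0-4 of Pc signal MPEG-H 3D Audio (decimal 25)."""
    return (pc & 0x1F) == MPEGH_DATA_TYPE
```

A receiver would typically run such a search only until it has locked onto the burst repetition period; the sketch ignores that and any error handling.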
In normal operation mode, the data rate of an IEC 61937 link is the same as would be needed for an uncompressed 16-bit stereo PCM signal, i.e. a gross data rate of 48 kHz * 2 * 16 = 1536 kbps for a 48 kHz sample rate. Furthermore, high bit rate (HBR) modes are available that operate at 2, 4, 8 or 16 times the normal data rate.
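The arithmetic above can be written down as a small Python helper. This is only a sketch: the function name and parameters are hypothetical, and the HBR modes are modelled simply as an integer multiple of the normal rate, as stated in the text.

```python
def link_rate_bps(sample_rate_hz=48_000, hbr_multiple=1):
    """Gross data rate of the link in bits per second.

    Normal mode corresponds to 16-bit stereo PCM; HBR modes run at
    2, 4, 8 or 16 times that rate.
    """
    if hbr_multiple not in (1, 2, 4, 8, 16):
        raise ValueError("unsupported HBR multiple")
    channels = 2
    bits_per_sample = 16
    return sample_rate_hz * channels * bits_per_sample * hbr_multiple
```

For the default arguments this gives 1,536,000 bit/s, i.e. the 1536 kbps quoted above.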
MPEG-H 3D Audio Special Features
MPEG-H 3D Audio enables features that are not found in other audio coding schemes. Therefore, IEC 61937-13 specifies additional steps that are necessary for an MPEG-H 3D Audio Stream. In particular, two properties of MPEG-H 3D Audio do not fit directly into the basic principles of operation of IEC 61937 as outlined in section 'Principles of Operation':
- Each MPEG-H data frame may represent a varying number of PCM samples. This depends both on the internal encoder states and on the truncation mechanism, which allows PCM frames to be shortened to support sample-accurate changes of the audio scene configuration (e.g. a change from stereo to 7.1+4 channels at an arbitrary PCM sample).
- MPEG-H uses a bit reservoir mechanism to allow peak bit rates that are higher than the average bit rate. As a result, data frames are of variable size and can become so big that they do not fit into a single IEC data burst. Instead, data frames need to be split over more than one IEC data burst (see maximal MPEG-H 3D Audio data frame size).
IEC 61937-13 defines the following mechanism to implement the necessary functionality to support the MPEG-H 3D Audio special features:
- A variable number of MPEG-H 3D Audio data frames may be transmitted in one IEC 61937 data burst.
- MPEG-H 3D Audio data frames may be partitioned (spill over) to more than one IEC 61937 data burst, if they don't fit into the current data burst.
- Each MPEG-H 3D Audio data frame is accompanied by a timing offset (relative to the start of the data burst) that tells where the decoded PCM samples from this data frame are to be placed.
This flexibility requires additional bookkeeping information, which is provided in a burst payload header. For every data frame in the burst, this header signals:
- the "data offset", i.e. an offset in bytes pointing to the first bit of the first byte of the data frame
- the "data size", i.e. the length (in bytes) of the data frame
- the "PCM offset", i.e. a temporal offset for the first PCM value of the decoded data frame
This mechanism is illustrated in the following Figure 2 and Figure 3.
Figure 2: MPEG-H 3D Audio data-burst structure
Figure 3: MPEG-H 3D Audio burst payload
Further examples are given in IEC 61937-13 specification section 5.3.3.
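To make the role of the three header fields more concrete, the following Python sketch models one header entry per data frame and uses it to cut the data frames out of a buffer. It deliberately does not parse the binary header layout, which is defined in the tables of IEC 61937-13. The class and function names are hypothetical, and it is an assumption of this sketch that the data offset counts from the start of the buffer passed in.

```python
from dataclasses import dataclass


@dataclass
class FrameEntry:
    data_offset: int  # byte offset of the data frame (assumed: from buffer start)
    data_size: int    # length of the data frame in bytes
    pcm_offset: int   # temporal offset of the first decoded PCM sample


def slice_frames(buffer, entries):
    """Return a list of (pcm_offset, frame_bytes, complete) tuples.

    `complete` is False when the frame extends past the end of the
    buffer, i.e. when it spills over into the next data burst.
    """
    result = []
    for e in entries:
        chunk = bytes(buffer[e.data_offset:e.data_offset + e.data_size])
        result.append((e.pcm_offset, chunk, len(chunk) == e.data_size))
    return result
```

A real receiver would keep the incomplete part of a spilled-over frame and append the remainder from the following burst before passing the frame to the decoder.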
"MPEG-H 3D Audio" and "MPEG-H 3D Audio HBR"
The operation mode of an IEC 61937 transmission is specified by the Pc word of the header. The bits 0 – 6 specify the payload type and operation mode.
- Bits 0 – 4 of Pc set to decimal 25 and bits 5 – 6 of Pc set to decimal 0 specify MPEG-H 3D Audio
- Bits 0 – 4 of Pc set to decimal 25 and bits 5 – 6 of Pc set to decimal 1 specify MPEG-H 3D Audio high bitrate (HBR). In case of MPEG-H 3D Audio HBR bits 11 – 12 of Pc specify the multiple (2, 4, 8 or 16) of the data rate compared to non-HBR mode.
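The bit fields of Pc listed above can be extracted as in the following Python sketch. The function name and the returned dictionary are hypothetical. The value of bits 11 – 12 is returned as a raw code, because the mapping of that code to the multiples 2, 4, 8 and 16 has to be taken from IEC 61937-13.

```python
def decode_pc(pc):
    """Classify a 16-bit Pc word; returns None if it is not MPEG-H 3D Audio."""
    data_type = pc & 0x1F          # bits 0-4
    mode = (pc >> 5) & 0x3         # bits 5-6
    rate_code = (pc >> 11) & 0x3   # bits 11-12, only meaningful in HBR mode
    if data_type != 25:
        return None
    if mode == 0:
        return {"format": "MPEG-H 3D Audio", "hbr_rate_code": None}
    if mode == 1:
        return {"format": "MPEG-H 3D Audio HBR", "hbr_rate_code": rate_code}
    return None  # other values of bits 5-6 are not covered by this sketch
```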
The format of the payload header differs slightly between non-HBR and HBR mode: in non-HBR mode the data size and data offset fields of the payload header each occupy 2 bytes, whereas in HBR mode they each occupy 3 bytes (see Table 3 and Table 6 of IEC 61937-13).
IMPLEMENTATION RESTRICTION: only the MPEG-H 3D Audio HBR modes 4x and 16x need to be supported by a device implementation. The non-HBR mode and the HBR modes 2x and 8x can be ignored.
MPEG-H 3D Audio data frame
IEC 61937-13 employs the MHAS framing format. MHAS, defined in ISO/IEC 23008-3, section 14, is a self-contained stream format for transporting MPEG-H 3D Audio data. An MPEG-H 3D Audio data frame is a sequence of one or more MPEG-H 3D Audio Stream Packets (MHAS packets). In MPEG terms an MPEG-H 3D Audio data frame is usually called an access unit (AU).
MHAS packets consist of a MHASPacketType field, a MHASPacketLabel field, a MHASPacketLength field and a MHASPacketPayload. The MHASPacketType field specifies the type of payload in the packet. The MHASPacketLabel field indicates which packets belong together (there may be a main stream and zero or more sub streams). The MHASPacketLength field gives the length of the MHASPacketPayload in bytes.
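The packet structure described above can be illustrated with a minimal Python reader for a single MHAS packet. The field widths used here, escapedValue(3,8,8) for the type, escapedValue(2,8,32) for the label and escapedValue(11,24,24) for the length, are an assumption based on ISO/IEC 23008-3, section 14, and should be verified against the standard. All class and function names are hypothetical.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""

    def __init__(self, data, bit_pos=0):
        self.data = data
        self.pos = bit_pos

    def read(self, n):
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def escaped_value(reader, n1, n2, n3):
    """escapedValue(): extend the value while a field is all ones."""
    value = reader.read(n1)
    if value == (1 << n1) - 1:
        extra = reader.read(n2)
        value += extra
        if extra == (1 << n2) - 1:
            value += reader.read(n3)
    return value


def read_mhas_packet(data, byte_pos=0):
    """Return (packet_type, label, payload, next_byte_pos)."""
    reader = BitReader(data, byte_pos * 8)
    packet_type = escaped_value(reader, 3, 8, 8)     # assumed widths
    label = escaped_value(reader, 2, 8, 32)          # assumed widths
    length = escaped_value(reader, 11, 24, 24)       # assumed widths
    start = reader.pos // 8  # header is byte-aligned with these widths
    return packet_type, label, data[start:start + length], start + length
```

Calling `read_mhas_packet` repeatedly with the returned `next_byte_pos` walks through all packets of a data frame. The sketch does not check that the payload is completely present in `data`.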
Each MPEG-H 3D Audio Data Frame shall contain exactly one MHAS Packet with MHASPacketType PACTYP_MPEGH3DAFRAME and MHASPacketLabel in the range between 1 and 16 (this is the main stream).
An MPEG-H 3D Audio Data Frame may contain zero or more MHAS Packets with MHASPacketType PACTYP_MPEGH3DAFRAME and MHASPacketLabel greater than 16 (these are the sub streams). An MPEG-H 3D Audio Data Frame may contain additional MHAS Packets of other types; if present, a MHAS Packet with MHASPacketType PACTYP_MPEGH3DACFG, PACTYP_AUDIOSCENEINFO, or PACTYP_AUDIOTRUNCATION shall precede the MHAS Packet of Type PACTYP_MPEGH3DAFRAME.
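The composition rules above can be expressed as a simple check over the packets of one data frame, as in this Python sketch. Packets are represented as (type name, label) pairs to avoid depending on numeric PACTYP values. The function name is hypothetical, and it is an assumption of this sketch that the ordering rule refers to the main stream PACTYP_MPEGH3DAFRAME packet.

```python
MUST_PRECEDE_FRAME = {
    "PACTYP_MPEGH3DACFG",
    "PACTYP_AUDIOSCENEINFO",
    "PACTYP_AUDIOTRUNCATION",
}


def is_valid_data_frame(packets):
    """Check a list of (type_name, label) pairs against the rules above."""
    main = [
        i for i, (ptype, label) in enumerate(packets)
        if ptype == "PACTYP_MPEGH3DAFRAME" and 1 <= label <= 16
    ]
    if len(main) != 1:
        return False  # exactly one main stream frame packet is required
    main_index = main[0]
    # Config, scene info and truncation packets must come before the frame.
    return all(
        i < main_index
        for i, (ptype, _) in enumerate(packets)
        if ptype in MUST_PRECEDE_FRAME
    )
```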
For more detailed information about the different PACTYPs see ISO/IEC 23008-3, section 14. Information about which MHAS packets form an MPEG-H 3D Audio data frame is usually provided by an upstream transport layer, e.g. an MPEG-2 transport stream or the MP4 file format.
Timing Information
As not all MPEG-H 3D Audio data frames represent the same number of decoded PCM audio samples, additional timing information is provided by the PCM offset field in the burst payload header for each MPEG-H 3D Audio data frame. The PCM offset field specifies the temporal offset, in PCM samples, of the first decoded PCM sample of the MPEG-H 3D Audio data frame with respect to the reference point R. The reference point R is the first bit of Pa of the IEC 61937 sync header.
More detailed information is provided in chapter 'Timestamp handling'.
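As a simplified illustration of the PCM offset, the following Python sketch places the first decoded sample of a data frame on an output timeline. It assumes that the reference point R of burst number n lies n burst repetition periods after the start of the stream, counted in PCM samples, and that the PCM offset is simply added to it. The function name and parameters are hypothetical.

```python
def first_sample_position(burst_index, pcm_offset, repetition_period=1024):
    """Position of the first decoded PCM sample of a data frame.

    burst_index:       number of the data burst, starting at 0 (assumption)
    pcm_offset:        PCM offset from the burst payload header
    repetition_period: burst repetition period in PCM samples
    """
    reference_point = burst_index * repetition_period  # R of this burst
    return reference_point + pcm_offset
```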
MPEG-H 3D audio data frame length
IEC 61937-13 defines 6 different values of MPEG-H 3D Audio data frame length (1024, 2048, 4096, 768, 1536 and 3072) that can be used for the burst repetition period. These 6 values are the most common audio frame sizes of MPEG-H 3D Audio, if neither internal resampling nor truncation occurs. However, due to the flexibility enabled by the burst payload header, any of these burst repetition periods can be used to transport any MPEG-H 3D Audio Stream.
IMPLEMENTATION RESTRICTION: it is recommended that only the most common frame size for MPEG-H, 1024 samples, is used for transporting MHAS via IEC 61937-13. All other frame lengths can be ignored by IEC 61937-13 receiver and sender implementations.
Burst spacing
IEC 61937-1 and IEC 61937-2 specify a burst spacing, i.e. the four 16-bit words preceding each sync header (Pa – Pd) shall be zero. This means that the maximum burst payload size is reduced by the size of the sync header and the burst spacing. Table 5 of IEC 61937-13 lists the maximum burst payload sizes for the 6 different burst repetition periods for non-HBR mode. Tables 8 – 11 of IEC 61937-13 list the maximum burst payload sizes for the 6 different burst repetition periods for the different HBR modes.
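The reduction described above can be sketched in Python as follows. This is only an estimate derived from the text: it assumes that one burst repetition period carries 2 channels of 16-bit words per sample, scaled by the HBR multiple, and subtracts the four-word sync header and the four-word burst spacing. The authoritative values are those in the tables of IEC 61937-13. The function name is hypothetical.

```python
def max_burst_payload_bytes(repetition_period=1024, hbr_multiple=1):
    """Estimated maximum burst payload size in bytes."""
    bytes_per_word = 2
    words_per_sample = 2  # two 16-bit channels per sample period (assumption)
    period_bytes = (repetition_period * words_per_sample
                    * bytes_per_word * hbr_multiple)
    sync_header_bytes = 4 * bytes_per_word    # Pa, Pb, Pc, Pd
    burst_spacing_bytes = 4 * bytes_per_word  # four zero words before Pa
    return period_bytes - sync_header_bytes - burst_spacing_bytes
```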