HDMI forward connection - Fraunhofer-IIS/iec61937-13 GitHub Wiki

The High-Definition Multimedia Interface (HDMI) is the dominant interface for audiovisual CE devices such as TV sets, AV receivers, set-top boxes or Blu-ray / DVD players.

HDMI is a unidirectional connection from a source device, like a set-top box, to a sink device, like a TV set. HDMI carries digital video and audio and provides a means of negotiating device capabilities, e.g. supported video resolutions and audio codecs.

HDMI defines a return channel for audio (ARC) from sink to source, e.g. from a TV set to an AV receiver, but support for ARC is optional.

Versions 1.4b and 2.1a of the HDMI Specification are currently in force (as of summer 2022). HDMI Specification version 2.1a incorporates HDMI Specification 1.4b by reference and defines additional and improved functionality.

Audio functionality is specified in section 7 of HDMI 1.4b and in section 9 of HDMI 2.1a. Audio and video data are transmitted in a multiplexed way from the source device to the sink device. The audio data is formatted according to IEC 60958/61937. Therefore, uncompressed PCM audio as well as compressed audio bitstreams can be transmitted. For MPEG-H, formatting according to IEC 61937-13 applies.

The sink device (e.g. TV set) signals its video and audio capabilities to the source device (e.g. set-top box) via EDID messages (see below). The source device selects an option from the capabilities of the sink device and signals the format in use to the sink device via HDMI InfoFrames. For audio, the source device signals the format via Audio InfoFrames, IEC 60958 channel status bits and IEC 61937 burst info and/or stream data.

Signaling within the Audio InfoFrame is generic (refer to stream header) concerning the codec type, so that low level HDMI driver software can be written in a generic way independent of the used codec. Codec specific details are signaled within the burst info / stream data as described in IEC 61937-13.

Depending on the selected bitrate for the transmission (either non-HBR or HBR with factors 2, 4, 8 or 16), data will be transferred in either HDMI Audio packet layout 0 (representing 1 “SPDIF line” of 2 audio channels) or HDMI Audio packet layout 1 (representing 4 “SPDIF lines” of 2 audio channels each, i.e. 8 audio channels) with appropriate sampling rates to generate the multiplication factors 1, 2, 4, 8 or 16 as needed.

If HDMI Audio packet layout 1 is used, the data is multiplexed on the 4 “SPDIF lines” on a sample by sample basis as described in section 7.6.2 of HDMI 1.4b, Table 7-8:

(HDMI 1.4b) Table 7-8 High Bitrate Audio Stream Packet Layout

| Subpkt 0 | Subpkt 1 | Subpkt 2 | Subpkt 3 |
|---|---|---|---|
| Frame x+0 | Frame x+1 | Frame x+2 | Frame x+3 |

This means that multiplexing / de-multiplexing is necessary to use the 4 stereo paths provided by the typical HDMI chip as a single bitstream formatted according to IEC 61937.
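The sample-by-sample distribution over the four sub-packets can be illustrated with a minimal sketch. It assumes each IEC 60958 frame is represented by an opaque value (here a plain integer); the function names `split_to_lines` and `merge_lines` are hypothetical and do not correspond to any HDMI driver API.

```python
from typing import List, Sequence

NUM_LINES = 4  # sub-packets 0..3, one per "SPDIF line"


def split_to_lines(frames: Sequence[int]) -> List[List[int]]:
    """Source side: distribute consecutive frames round-robin over the 4 lines."""
    if len(frames) % NUM_LINES != 0:
        raise ValueError("frame count must be a multiple of 4")
    return [list(frames[i::NUM_LINES]) for i in range(NUM_LINES)]


def merge_lines(lines: Sequence[Sequence[int]]) -> List[int]:
    """Sink side: re-interleave the 4 lines into a single frame sequence."""
    if len(lines) != NUM_LINES or len({len(line) for line in lines}) != 1:
        raise ValueError("expected 4 lines of equal length")
    merged: List[int] = []
    for packet in zip(*lines):  # one packet carries frames x+0 .. x+3
        merged.extend(packet)
    return merged


frames = list(range(8))              # stand-ins for frames x+0 .. x+7
lines = split_to_lines(frames)
print(lines)                         # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(merge_lines(lines) == frames)  # True
```

In a real system the frames would be 32-bit IEC 60958 subframe pairs delivered by the HDMI chip; the sketch only shows the ordering.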

For 48 kHz basic sampling frequency, the full table of HDMI transmission modes is:

| Bitrate multiplication factor | HDMI mode | Sampling frequency |
|---|---|---|
| 1 | Layout 0 (2 channels) | 48 kHz |
| 2 | Layout 0 (2 channels) | 96 kHz |
| 4 | Layout 0 (2 channels) | 192 kHz |
| 8 | Layout 1 (8 channels) | 96 kHz |
| 16 | Layout 1 (8 channels) | 192 kHz |

All of these modes need to be supported by an MPEG-H capable sink device with HDMI input.
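As a rough illustration, the table above can be expressed as a lookup, together with a check that each mode really yields its multiplication factor (number of 2-channel lines times the sampling frequency ratio relative to 48 kHz). The names `HdmiAudioMode`, `mode_for_factor` and `effective_factor` are hypothetical and chosen only for this sketch.

```python
from typing import NamedTuple

BASE_RATE_HZ = 48_000  # basic sampling frequency of the table above


class HdmiAudioMode(NamedTuple):
    layout: int                 # HDMI Audio packet layout (0 or 1)
    channels: int               # 2 for layout 0, 8 for layout 1
    sampling_frequency_hz: int


# Bitrate multiplication factor -> transmission mode, copied from the table.
_MODES = {
    1: HdmiAudioMode(0, 2, 48_000),
    2: HdmiAudioMode(0, 2, 96_000),
    4: HdmiAudioMode(0, 2, 192_000),
    8: HdmiAudioMode(1, 8, 96_000),
    16: HdmiAudioMode(1, 8, 192_000),
}


def mode_for_factor(factor: int) -> HdmiAudioMode:
    """Return the transmission mode for a bitrate multiplication factor."""
    try:
        return _MODES[factor]
    except KeyError:
        raise ValueError(f"unsupported multiplication factor: {factor}") from None


def effective_factor(mode: HdmiAudioMode) -> int:
    """Number of 2-channel lines times the sampling frequency ratio."""
    lines = mode.channels // 2
    return lines * mode.sampling_frequency_hz // BASE_RATE_HZ


for factor in (1, 2, 4, 8, 16):
    mode = mode_for_factor(factor)
    assert effective_factor(mode) == factor
    print(factor, mode)
```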

EDID and CTA-861

EDID (Extended Display Identification Data) enables plug-and-play capabilities. EDID data is transmitted over the display data channel (DDC) of an HDMI interface from sink to source. It is also used in other display interfaces like DVI and DisplayPort. The EDID data is stored in the sink and describes the audio and video formats that the sink device (e.g. AVR) is capable of receiving and rendering.

Audio formats are described with Short Audio Descriptors (SAD) as defined in CTA-861. For MPEG-H the Short Audio Descriptor is constructed as follows:

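Table 5 - Short Audio Descriptor for MPEG-H

| Byte# | Bit 7 | Bit 6 | Bit 5 | Bit 4 | Bit 3 | Bit 2 | Bit 1 | Bit 0 |
|---|---|---|---|---|---|---|---|---|
| 1 | F17=0 | Audio Format Code=1111 (bits 6-3) | | | | MPEG-H 3DA level (bits 2-0: F12, F11, F10) | | |
| 2 | F27=0 | 192 kHz | 176.4 kHz | 96 kHz | 88.2 kHz | 48 kHz | 44.1 kHz | 32 kHz |
| 3 | Audio Coding Extension Type Code=0x0B (bits 7-3) | | | | | F32=0 | BP (F31) | LCP (F30) |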

The bits ‘F17, F27, F32, F31, F30’ are flags that signal Audio Format Code dependent values. Bit F30 signals the Low Complexity Profile (LCP) when set to 1, and bit F31 signals the Baseline Profile (BP) when set to 1. All other flag bits are reserved and shall be set to 0.

Bits ‘F12, F11, F10’ are used to signal the MPEG-H 3DA Level according to the following table:

| F12 | F11 | F10 | MPEG-H 3DA level |
|---|---|---|---|
| 0 | 0 | 0 | Unspecified |
| 0 | 0 | 1 | Level 1 |
| 0 | 1 | 0 | Level 2 |
| 0 | 1 | 1 | Level 3 |
| 1 | 0 | 0 | Level 4 |
| 1 | 0 | 1 | Level 5 |
| 1 | 1 | 0 | Reserved |
| 1 | 1 | 1 | Reserved |

If the MPEG-H 3DA Level is unspecified, at least Level 3 shall be supported.
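To make the bit layout concrete, the following sketch decodes a 3-byte Short Audio Descriptor according to the tables above. It assumes the descriptor has already been extracted from the EDID data; the function name `parse_mpegh_sad` and the returned dictionary keys are hypothetical. The reserved flags F17, F27 and F32 are not validated here.

```python
from typing import Dict, List, Union

AUDIO_FORMAT_CODE = 0x0F      # "Audio Format Code=1111", byte 1 bits 6-3
EXT_TYPE_MPEGH_3DA = 0x0B     # Audio Coding Extension Type Code, byte 3 bits 7-3

# Sampling frequencies in kHz, indexed by bit position 0..6 of byte 2.
_RATES_KHZ = (32.0, 44.1, 48.0, 88.2, 96.0, 176.4, 192.0)

# Level codes 6 and 7 are reserved.
_LEVELS = {0: "Unspecified", 1: "Level 1", 2: "Level 2",
           3: "Level 3", 4: "Level 4", 5: "Level 5"}

SadInfo = Dict[str, Union[str, bool, List[float]]]


def parse_mpegh_sad(sad: bytes) -> SadInfo:
    """Decode an MPEG-H Short Audio Descriptor (3 bytes)."""
    if len(sad) != 3:
        raise ValueError("a Short Audio Descriptor is 3 bytes long")
    byte1, byte2, byte3 = sad
    if (byte1 >> 3) & 0x0F != AUDIO_FORMAT_CODE:
        raise ValueError("Audio Format Code is not 1111")
    if byte3 >> 3 != EXT_TYPE_MPEGH_3DA:
        raise ValueError("Audio Coding Extension Type Code is not 0x0B")
    return {
        "level": _LEVELS.get(byte1 & 0x07, "Reserved"),           # F12..F10
        "sampling_rates_khz": [rate for bit, rate in enumerate(_RATES_KHZ)
                               if byte2 & (1 << bit)],
        "baseline_profile": bool(byte3 & 0x02),                   # F31 (BP)
        "low_complexity_profile": bool(byte3 & 0x01),             # F30 (LCP)
    }


# Example: Level 3, 48 kHz only, Low Complexity Profile.
info = parse_mpegh_sad(bytes([0x7B, 0x04, 0x59]))
print(info)
```

A caller that receives "Unspecified" as the level would, per the rule above, treat the sink as supporting at least Level 3.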
