NHML Format

The old NHNT Format is a very useful tool for multiplexing data, but is not user-friendly at all when dealing with complex cases such as multi-source media files or NHNT authoring (timing modification, data removal or insertion).

The NHML format was therefore developed at Telecom ParisTech to provide more control over the imported data source and to give the user the tools to easily modify the multiplexing process.

The NHML format is an XML-based description of a media file, just like NHNT, with some major enhancements. It has been supported since GPAC 0.4.2.

To obtain sample NHML files, run MP4Box -nhml trackID srcFile on an existing file.

The NHML file format

Just like any XML file, the file must begin with the usual XML declaration. The file encoding shall be UTF-8.

The root element of an NHML file is the NHNTStream.

Syntax

<NHNTStream baseMediaFile="..." specificInfoFile="..." trackID="..." inRootOD="..." DTS_increment="..." timeScale="..." streamType="..." objectTypeIndication="..." mediaType="..." mediaSubType="..." width="..." height="..." parNum="..." parDen="..." sampleRate="..." numChannels="..." bitsPerSample="..." compressorName="..." codecVersion="..." codecRevision="..." codecVendor="..." temporalQuality="..." spatialQuality="..." horizontalResolution="..." verticalResolution="..." bitDepth="..." >

    <NHNTSample />    
        ...    
    <NHNTSample />
    
</NHNTStream>

Semantics

  • baseMediaFile : indicates the default location of the stream data. If not set, the file with the same name and extension .media is assumed to be the source.
  • specificInfoFile : indicates the location of the decoder configuration data if any.
  • trackID : indicates a desired trackID for this media when importing to IsoMedia. Value type: unsigned integer. Default Value: 0.
  • inRootOD : indicates if the imported stream is present in the InitialObjectDescriptor. Value type: "yes", "no". Default Value: "no".
  • DTS_increment : indicates a default time increment between two consecutive samples. Value type: unsigned integer. Default Value: 0.
  • timeScale : indicates the time scale in which the time stamps are given. Value type: unsigned integer. Default Value: 1000 or sample rate if specified.
  • streamType : identifies the media streamType as specified in MPEG-4 (0x04: Visual, 0x05: audio, ...). Officially supported stream types are listed here.
  • objectTypeIndication : identifies the media type as specified in MPEG-4. For example, 0x40 for MPEG-4 AAC. Officially supported object types are listed here.
  • mediaType : indicates the 4CC media type (handler) as used in IsoMedia. Not needed if streamType is specified. Value Type: 4 byte string. Officially supported handler types are listed here.
  • mediaSubType : indicates the 4CC media subtype (codec) to use in IsoMedia. This subtype will identify the sample description used (stsd table). Not needed if streamType is specified. Value Type: 4 byte string. Officially supported codec types are listed here.
  • width, height : indicate the dimensions of a visual media. Ignored if the media is not video (streamType 0x04 or mediaType "vide"). Value Type: unsigned integer.
  • parNum, parDen : indicate the numerator and denominator of the pixel aspect ratio of a visual media. Ignored if the media is not video (streamType 0x04 or mediaType "vide"). Value Type: unsigned integer.
  • sampleRate : indicates the sample rate of an audio media. Ignored if the media is not audio (streamType 0x05 or mediaType "soun"). Value Type: unsigned integer.
  • numChannels : indicates the number of channels of an audio media. Ignored if the media is not audio (streamType 0x05 or mediaType "soun"). Value Type: unsigned integer.
  • bitsPerSample : indicates the number of bits per audio sample for an audio media. Ignored if the media is not audio (streamType 0x05 or mediaType "soun"). Value Type: unsigned integer.

All other parameters are used when creating a custom sample description in IsoMedia (i.e., when not using an MPEG-4 streamType and objectTypeIndication). Their semantics are given in the QT (and IsoMedia) file format specification.
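
As an illustration of the stream-level attributes, here is a minimal Python sketch that builds an NHNTStream root element for an MPEG-4 AAC track. The helper name make_aac_stream and the file name audio.media are hypothetical. Writing streamType and objectTypeIndication as decimal strings, and using a DTS_increment of 1024 (one AAC frame), are assumptions, not requirements stated on this page.

```python
import xml.etree.ElementTree as ET


def make_aac_stream(base_media_file, sample_rate, num_channels):
    """Build an NHNTStream root element describing an MPEG-4 AAC audio track.

    Only attributes documented in the semantics list are used.
    """
    attrs = {
        "baseMediaFile": base_media_file,
        "trackID": "1",
        # Values from the semantics list: 0x05 is audio, 0x40 is MPEG-4 AAC.
        # Assumption: the values are written in decimal.
        "streamType": str(0x05),
        "objectTypeIndication": str(0x40),
        # Time stamps expressed in audio samples.
        "timeScale": str(sample_rate),
        # Assumption: each AAC frame carries 1024 samples.
        "DTS_increment": "1024",
        "sampleRate": str(sample_rate),
        "numChannels": str(num_channels),
    }
    return ET.Element("NHNTStream", attrs)


stream = make_aac_stream("audio.media", 44100, 2)
print(ET.tostring(stream, encoding="unicode"))
```

Because streamType is given, mediaType and mediaSubType are left out, as the semantics above allow.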

Each access unit is then described with a NHNTSample element.

Syntax

<NHNTSample DTS="..." duration="..." CTSOffset="..." isRAP="..." isSyncShadow="..." mediaOffset="..." dataLength="..." mediaFile="..." xmlFrom="..." xmlTo="..." />

Semantics

  • DTS : decoding time stamp of the sample. If not set, the previous sample DTS (or 0) plus the specified DTS_increment or the previous sample duration is used. Value type: unsigned integer. Default Value: 0.
  • duration : sets the duration of the sample. The duration set on the last sample will change the track duration. Default Value: 0.
  • CTSOffset : offset between the decoding and the composition time stamp of the sample. Value type: unsigned integer. Default Value: 0.
  • isRAP : indicates if the sample is a random access point or not. Value type: "yes", "no". Default Value: "no".
  • isSyncShadow : indicates if the sample is a sync shadow sample (IsoMedia storage only). Value type: "yes", "no". Default Value: "no".
  • mediaOffset : indicates the position of the first byte of this sample in the media source file. Value type: unsigned integer. Default Value: 0.
  • dataLength : indicates the size of this sample. Value type: unsigned integer. Default Value: 0.
  • mediaFile : indicates the media source file to use. If not set, the baseMediaFile is used.
  • xmlFrom : if the source file is XML data, indicates the location of the first element to copy from the XML document. The location can be "doc.start", "elt_id.start" or "elt_id.end". Elements are identified through their "id", "xml:id" or "DEF" attributes.
  • xmlTo : if the source file is XML data, indicates the location of the last element to copy from the XML document. The location can be "doc.end", "elt_id.start" or "elt_id.end". Elements are identified through their "id", "xml:id" or "DEF" attributes.
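
The relation between mediaOffset and dataLength can be sketched in Python for the common case where all access units are stored back to back in a single media file. The helper add_samples, the file name video.media and the frame sizes are hypothetical values chosen for illustration; a real file would take its sizes from the actual bitstream.

```python
import xml.etree.ElementTree as ET


def add_samples(stream, frame_sizes, dts_increment, rap_interval=1):
    """Append one NHNTSample per frame, assuming frames are contiguous in the file."""
    offset = 0
    for index, size in enumerate(frame_sizes):
        ET.SubElement(stream, "NHNTSample", {
            "DTS": str(index * dts_increment),
            # Mark every rap_interval-th sample as a random access point.
            "isRAP": "yes" if index % rap_interval == 0 else "no",
            # Position of the first byte of this sample in the source file.
            "mediaOffset": str(offset),
            # Size of this sample in bytes.
            "dataLength": str(size),
        })
        offset += size
    return stream


stream = ET.Element("NHNTStream", {"baseMediaFile": "video.media", "timeScale": "1000"})
add_samples(stream, frame_sizes=[1200, 300, 280, 310], dts_increment=40, rap_interval=4)

for sample in stream.findall("NHNTSample"):
    print(ET.tostring(sample, encoding="unicode"))
```

Each mediaOffset is the sum of the dataLength values of the preceding samples, so the fourth sample here starts at byte 1780.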

Bitstream construction

As of 0.5.1, it is possible to describe bit sequences when importing NHML. Both NHNTStream and NHNTSample may have child bitstream constructors, called BS. These elements allow assembling bytes and files as needed to construct the sample or the sampleDescription. For an NHNTStream element, the BS elements shall be encapsulated in a DecoderSpecificInfo element. The content of the DecoderSpecificInfo element is then inserted in the ESD (MPEG-4 Systems), or after the base sampleDescription (ISOBMFF generic).

Syntax

<BS  bits="..." value="..." mediaOffset="..." mediaFile="..." dataLength="..." text="..." fcc="..."/>

Semantics

  • bits : number of bits used to write the value
  • value : integer value to write
  • float : float value to write (32 bits)
  • double : double value to write (64 bits)
  • mediaFile or dataFile: file to get data from
  • mediaOffset or dataOffset: offset in the file
  • mediaLength or dataLength: number of bytes to copy from the file
  • text or string: writes text without trailing 0. If bits is set, first writes the size of the text string using bits bits
  • fcc: writes a four character code on 32 bits
  • ID128: writes a 128 bit value given in hexadecimal
  • data64: writes data given encoded in base64. If bits is set, first writes the size of the data using bits bits.
  • data: writes data given in hexadecimal. If bits is set, first writes the size of the data using bits bits.
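
To make the packing rules above more concrete, the following Python sketch mimics three of the BS forms: bits/value, fcc, and text with an optional size prefix. It is not GPAC code. The BitWriter class and its method names are hypothetical, and two assumptions are made that this page does not state: values are packed most significant bit first, and the size written before a text string is its length in UTF-8 bytes.

```python
class BitWriter:
    """Accumulates fields bit by bit and returns them as bytes (MSB first)."""

    def __init__(self):
        self._bits = []

    def write_int(self, value, nbits):
        # Equivalent in spirit to <BS bits="nbits" value="value"/>.
        if value < 0 or value >= (1 << nbits):
            raise ValueError("value does not fit in the requested number of bits")
        for shift in range(nbits - 1, -1, -1):
            self._bits.append((value >> shift) & 1)

    def write_bytes(self, data):
        for byte in data:
            self.write_int(byte, 8)

    def write_fcc(self, code):
        # Equivalent in spirit to <BS fcc="code"/>: four characters on 32 bits.
        raw = code.encode("ascii")
        if len(raw) != 4:
            raise ValueError("a four character code needs exactly 4 characters")
        self.write_bytes(raw)

    def write_text(self, text, length_bits=None):
        # Equivalent in spirit to <BS text="..."/>, with no trailing 0.
        # If length_bits is set, the byte length is written first.
        raw = text.encode("utf-8")
        if length_bits is not None:
            self.write_int(len(raw), length_bits)
        self.write_bytes(raw)

    def to_bytes(self):
        if len(self._bits) % 8 != 0:
            raise ValueError("bitstream is not byte aligned")
        out = bytearray()
        for start in range(0, len(self._bits), 8):
            byte = 0
            for bit in self._bits[start:start + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


writer = BitWriter()
writer.write_int(1, 7)                  # 7-bit field holding 1
writer.write_int(0, 1)                  # 1-bit flag, completes the first byte
writer.write_fcc("abcd")                # four character code
writer.write_text("hi", length_bits=8)  # 8-bit size, then the text bytes
payload = writer.to_bytes()
print(payload.hex())
```

Fields narrower than a byte share bytes with their neighbours, which is why the 7-bit and 1-bit fields above occupy a single byte together.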

Example

This example was used to generate files conforming to ISO/IEC 14496-18 AMD1. It shows how the bitstream constructor is used to create a custom font sample description fntC in the stsd entry called fnt1. The duration on the last sample is used to extend the duration of the track.

<?xml version="1.0" encoding="UTF-8"?>
<NHNTStream version="1.0" timeScale="1000" trackID="1" mediaType="fdsm" mediaSubType="fnt1">
   <DecoderSpecificInfo>
      <BS id="size" bits="32" value="24" />
      <!-- box size is 4+4+3+strlen(TriodPostnaja)-->
      <BS id="type" fcc="fntC" />
      <BS id="fontFormat" bits="7" value="1" />
      <BS id="storeFont" bits="1" value="0" />
      <BS id="fontName" bits="8" text="TriodPostnaja" />
      <BS id="fontSubsetID" bits="7" value="1" />
      <BS id="reserved" bits="1" value="1" />
   </DecoderSpecificInfo>
   <NHNTSample DTS="0" isRAP="yes" mediaFile="TriodPostnaja_subsets/TriodPostnaja_CyrillicCaps.ttf" />
   <NHNTSample DTS="2000" isRAP="yes" mediaFile="TriodPostnaja_subsets/TriodPostnaja_CyrillicSmall.ttf" />
   <NHNTSample DTS="4000" isRAP="yes" mediaFile="TriodPostnaja_subsets/TriodPostnaja_LatinCaps.ttf" />
   <NHNTSample DTS="6000" isRAP="yes" mediaFile="TriodPostnaja_subsets/TriodPostnaja_LatinSmall.ttf" />
   <NHNTSample DTS="8000" duration="4000" isRAP="yes" mediaFile="TriodPostnaja_subsets/TriodPostnaja_symbols+numerals.ttf" />
</NHNTStream>
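
The box size of 24 written in the first BS element can be cross-checked by summing the bit widths of all the fields in the DecoderSpecificInfo. The short Python sketch below does this; the field list is transcribed from the example, and the only assumption is that the font name is counted in UTF-8 bytes.

```python
font_name = "TriodPostnaja"

# (field id, width in bits), in the order the BS elements appear in the example.
fields = [
    ("size", 32),
    ("type", 32),                 # four character code "fntC"
    ("fontFormat", 7),
    ("storeFont", 1),
    ("fontName size prefix", 8),  # bits="8" on the text element
    ("fontName", 8 * len(font_name.encode("utf-8"))),
    ("fontSubsetID", 7),
    ("reserved", 1),
]

total_bits = sum(width for _, width in fields)
assert total_bits % 8 == 0, "the box must end on a byte boundary"
box_size = total_bits // 8
print(box_size)  # 24
```

This matches the comment in the example: 4 bytes of size, 4 bytes of type, 3 bytes for the two 7+1 bit pairs and the text size prefix, plus the 13 bytes of the font name.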

Using BS constructor outside of NHML

As of revision 5601, it is possible to convert an XML file containing BS elements directly to a binary file using MP4Box -bin source.xml. The source file can be any XML file, not just an NHML file. BS elements can furthermore be located in child nodes if needed.
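
As a rough illustration, the Python sketch below generates a small non-NHML document whose BS elements sit in child nodes. The element names Payload, Header and Body are hypothetical, since the text above only says that any XML file is accepted; the BS attributes are the ones documented in the semantics list. The sketch only produces the XML and does not invoke MP4Box.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical wrapper elements: only the BS children carry data to write.
root = ET.Element("Payload")

header = ET.SubElement(root, "Header")
ET.SubElement(header, "BS", {"fcc": "demo"})               # 32-bit four character code
ET.SubElement(header, "BS", {"bits": "16", "value": "1"})  # 16-bit integer

body = ET.SubElement(root, "Body")
ET.SubElement(body, "BS", {"bits": "8", "text": "hello"})  # 8-bit size, then the text

# Serialize with an XML declaration and UTF-8 encoding.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="UTF-8", xml_declaration=True)
xml_bytes = buffer.getvalue()
print(xml_bytes.decode("utf-8"))
```

Saving xml_bytes as source.xml and running MP4Box -bin source.xml would then be the conversion step described above.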