NHML Format - aureliendavid/gpac GitHub Wiki
The old NHNT Format is a very useful tool for multiplexing data, but is not user-friendly at all when dealing with complex cases such as multi-source media files or NHNT authoring (timing modification, data removal or insertion).
The NHML format was therefore developed at Telecom ParisTech to provide more control over the imported data source and to give the user the tools to easily modify the multiplexing process.
The NHML format is an XML-based description of a media file that covers the same ground as NHNT, with some major enhancements. This format has been supported since GPAC 0.4.2.
To obtain some sample NHML files, simply use MP4Box -nhml trackID srcFile
Like any XML file, an NHML file must begin with the usual XML declaration. The file encoding SHALL be UTF-8.
The root element of an NHML file is NHNTStream.
<NHNTStream baseMediaFile="..." specificInfoFile="..." trackID="..." inRootOD="..." DTS_increment="..." timeScale="..." streamType="..." objectTypeIndication="..." mediaType="..." mediaSubType="..." width="..." height="..." parNum="..." parDen="..." sampleRate="..." numChannels="..." bitsPerSample="..." compressorName="..." codecVersion="..." codecRevision="..." codecVendor="..." temporalQuality="..." spatialQuality="..." horizontalResolution="..." verticalResolution="..." bitDepth="..." >
<NHNTSample />
...
<NHNTSample />
</NHNTStream>
- baseMediaFile: indicates the default location of the stream data. If not set, the file with the same name and the extension .media is assumed to be the source.
- specificInfoFile: indicates the location of the decoder configuration data, if any.
- trackID: indicates a desired trackID for this media when importing to IsoMedia. Value type: unsigned integer. Default Value: 0.
- inRootOD: indicates if the imported stream is present in the InitialObjectDescriptor. Value type: "yes", "no". Default Value: "no".
- DTS_increment: indicates a default time increment between two consecutive samples. Value type: unsigned integer. Default Value: 0.
- timeScale: indicates the time scale in which the time stamps are given. Value type: unsigned integer. Default Value: 1000 or sample rate if specified.
- streamType: identifies the media streamType as specified in MPEG-4 (0x04: Visual, 0x05: audio, ...). Officially supported stream types are listed here.
- objectTypeIndication: identifies the media type as specified in MPEG-4. For example, 0x40 for MPEG-4 AAC. Officially supported object types are listed here.
- mediaType: indicates the 4CC media type (handler) as used in IsoMedia. Not needed if streamType is specified. Value Type: 4 byte string. Officially supported handler types are listed here.
- mediaSubType: indicates the 4CC media subtype (codec) to use in IsoMedia. This subtype will identify the sample description used (stsd table). Not needed if streamType is specified. Value Type: 4 byte string. Officially supported codec types are listed here.
- width, height: indicate the dimensions of a visual media. Ignored if the media is not video (streamType 0x04 or mediaType "vide"). Value Type: unsigned integer.
- parNum, parDen: indicate the pixel aspect ratio of a visual media. Ignored if the media is not video (streamType 0x04 or mediaType "vide"). Value Type: unsigned integer.
- sampleRate: indicates the sample rate of an audio media. Ignored if the media is not audio (streamType 0x05 or mediaType "soun"). Value Type: unsigned integer.
- numChannels: indicates the number of channels of an audio media. Ignored if the media is not audio (streamType 0x05 or mediaType "soun"). Value Type: unsigned integer.
- bitsPerSample: indicates the number of bits per audio sample for an audio media. Ignored if the media is not audio (streamType 0x05 or mediaType "soun"). Value Type: unsigned integer.
All other parameters are used when creating a custom sample description in IsoMedia (i.e., when not using MPEG-4 streamType and objectTypeIndication). Their semantics are given in the QT (and IsoMedia) file format specification.
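The default rules listed above can be summarised in a short Python sketch. It only illustrates the documented defaults and is not GPAC's importer: the function name resolve_stream_defaults is hypothetical, attribute values are kept as strings, and the .media default is assumed to mean the NHML file name with its extension replaced.

```python
from pathlib import PurePosixPath


def resolve_stream_defaults(nhml_path, attrs):
    """Return a copy of the NHNTStream attributes with documented defaults filled in.

    nhml_path: path of the NHML file (used for the baseMediaFile default).
    attrs: dict of attributes actually present on the NHNTStream element.
    """
    resolved = dict(attrs)
    # Assumption: "same name and extension .media" means replacing the extension.
    default_media = str(PurePosixPath(nhml_path).with_suffix(".media"))
    resolved.setdefault("baseMediaFile", default_media)
    resolved.setdefault("trackID", "0")
    resolved.setdefault("inRootOD", "no")
    resolved.setdefault("DTS_increment", "0")
    # timeScale defaults to the sample rate when one is given, else to 1000.
    resolved.setdefault("timeScale", resolved.get("sampleRate", "1000"))
    return resolved


if __name__ == "__main__":
    # Hypothetical audio stream description with only a sample rate set.
    print(resolve_stream_defaults("audio.nhml", {"sampleRate": "44100"}))
```

With only sampleRate set on a file named audio.nhml, the sketch resolves baseMediaFile to audio.media and timeScale to 44100.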
Each access unit is then described with an NHNTSample element.
<NHNTSample DTS="..." duration="..." CTSOffset="..." isRAP="..." isSyncShadow="..." mediaOffset="..." dataLength="..." mediaFile="..." xmlFrom="..." xmlTo="..." />
- DTS: decoding time stamp of the sample. If not set, the previous sample DTS (or 0) plus the specified DTS_increment or the previous sample duration is used. Value type: unsigned integer. Default Value: 0.
- duration: sets the duration of the sample. The duration set on the last sample will change the track duration. Default Value: 0.
- CTSOffset: offset between the decoding and the composition time stamp of the sample. Value type: unsigned integer. Default Value: 0.
- isRAP: indicates if the sample is a random access point or not. Value type: "yes", "no". Default Value: "no".
- isSyncShadow: indicates if the sample is a sync shadow sample (IsoMedia storage only). Value type: "yes", "no". Default Value: "no".
- mediaOffset: indicates the position of the first byte of this sample in the media source file. Value type: unsigned integer. Default Value: 0.
- dataLength: indicates the size of this sample. Value type: unsigned integer. Default Value: 0.
- mediaFile: indicates the media source file to use. If not set, the baseMediaFile is used.
- xmlFrom: if the source file is XML data, indicates the location of the first element to copy from the XML document. The location can be "doc.start", "elt_id.start" or "elt_id.end". Elements are identified through their "id", "xml:id" or "DEF" attributes.
- xmlTo: if the source file is XML data, indicates the location of the last element to copy from the XML document. The location can be "doc.end", "elt_id.start" or "elt_id.end". Elements are identified through their "id", "xml:id" or "DEF" attributes.
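The DTS fallback rule can be read in more than one way, so the following Python sketch spells out one possible interpretation. It assumes that the first sample without a DTS starts at 0, and that a non-zero duration on the previous sample takes precedence over DTS_increment. The function name resolve_dts is hypothetical and the actual importer may behave differently.

```python
def resolve_dts(samples, dts_increment=0):
    """Compute a DTS for every sample, applying the fallback rule when DTS is absent.

    samples: list of dicts holding the attributes of each NHNTSample element.
    dts_increment: value of DTS_increment on the NHNTStream element.
    """
    resolved = []
    prev_dts = None
    prev_duration = 0
    for sample in samples:
        if "DTS" in sample:
            dts = int(sample["DTS"])
        elif prev_dts is None:
            # Assumption: the first sample starts at 0 when no DTS is given.
            dts = 0
        else:
            # Assumption: a non-zero previous duration wins over DTS_increment.
            step = prev_duration if prev_duration else dts_increment
            dts = prev_dts + step
        resolved.append(dts)
        prev_dts = dts
        prev_duration = int(sample.get("duration", 0))
    return resolved


if __name__ == "__main__":
    # Hypothetical sample list mixing explicit and implicit time stamps.
    samples = [{}, {}, {"duration": "50"}, {}, {"DTS": "1000"}, {}]
    print(resolve_dts(samples, dts_increment=40))
```

Under these assumptions, an explicit DTS always resets the running time stamp, and later samples continue from it.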
As of GPAC 0.5.1, it is possible to describe bit sequences when importing NHML. Both NHNTStream and NHNTSample may have child bitstream constructors, called BS.
These elements allow assembling bytes and files as needed to construct the sample or the sampleDescription. For an NHNTStream element, the BS elements shall be encapsulated in a DecoderSpecificInfo element. The content of the DecoderSpecificInfo element is then inserted in the ESD (MPEG-4 Systems), or after the base sampleDescription (ISOBMFF generic).
<BS bits="..." value="..." mediaOffset="..." mediaFile="..." dataLength="..." text="..." fcc="..."/>
- bits: number of bits used to write the value
- value: integer value to write
- float: float value to write (32 bits)
- double: double value to write (64 bits)
- mediaFile or dataFile: file to get data from
- mediaOffset or dataOffset: offset in the file
- mediaLength or dataLength: number of bytes to copy from the file
- text or string: writes text without trailing 0. If bits is set, first writes the size of the text string using bits bits
- fcc: writes a four character code on 32 bits
- ID128: writes a 128 bit value given in hexadecimal
- data64: writes data given encoded in base64. If bits is set, first writes the size of the data using bits bits.
- data: writes data given in hexadecimal. If bits is set, first writes the size of the data using bits bits.
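As an illustration of the data, data64 and ID128 attributes, the Python sketch below shows what bytes each description would produce. It is not GPAC code: the function name bs_bytes is hypothetical, hexadecimal input is assumed to be given without a 0x prefix, the size prefix is assumed to count bytes and to be written big-endian, and only byte-aligned values of bits are handled.

```python
import base64


def bs_bytes(attr, value, bits=0):
    """Encode one byte-oriented BS attribute as described in the list above."""
    if attr == "data":
        # Assumption: plain hexadecimal digits, no "0x" prefix.
        payload = bytes.fromhex(value)
    elif attr == "data64":
        payload = base64.b64decode(value)
    elif attr == "ID128":
        payload = bytes.fromhex(value)
        if len(payload) != 16:
            raise ValueError("ID128 expects 32 hexadecimal digits")
        return payload  # fixed 128-bit size, so no size prefix is written
    else:
        raise ValueError(f"unsupported attribute: {attr}")
    if bits:
        if bits % 8:
            raise ValueError("this sketch only handles byte-aligned size prefixes")
        # Assumption: the size is the payload length in bytes, big-endian.
        return len(payload).to_bytes(bits // 8, "big") + payload
    return payload


if __name__ == "__main__":
    print(bs_bytes("data", "cafe").hex())
    print(bs_bytes("data64", "AQID", bits=16).hex())
```

Here the base64 string AQID decodes to three bytes, so setting bits to 16 prepends a two-byte size field.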
The following example was used to generate files conforming to ISO/IEC 14496-18 AMD1. It shows how the bitstream constructor is used to create a custom font sample description, fntC, in the stsd entry called fnt1. The duration on the last sample is used to extend the duration of the track.
<?xml version="1.0" encoding="UTF-8"?>
<NHNTStream version="1.0" timeScale="1000" trackID="1" mediaType="fdsm" mediaSubType="fnt1">
<DecoderSpecificInfo>
<BS id="size" bits="32" value="24" />
<!-- box size is 4+4+3+strlen(TriodPostnaja)-->
<BS id="type" fcc="fntC" />
<BS id="fontFormat" bits="7" value="1" />
<BS id="storeFont" bits="1" value="0" />
<BS id="fontName" bits="8" text="TriodPostnaja" />
<BS id="fontSubsetID" bits="7" value="1" />
<BS id="reserved" bits="1" value="1" />
</DecoderSpecificInfo>
<NHNTSample DTS="0" isRAP="yes" mediaFile="TriodPostnaja_subsets/TriodPostnaja_CyrillicCaps.ttf" />
<NHNTSample DTS="2000" isRAP="yes" mediaFile="TriodPostnaja_subsets/TriodPostnaja_CyrillicSmall.ttf" />
<NHNTSample DTS="4000" isRAP="yes" mediaFile="TriodPostnaja_subsets/TriodPostnaja_LatinCaps.ttf" />
<NHNTSample DTS="6000" isRAP="yes" mediaFile="TriodPostnaja_subsets/TriodPostnaja_LatinSmall.ttf" />
<NHNTSample DTS="8000" duration="4000" isRAP="yes" mediaFile="TriodPostnaja_subsets/TriodPostnaja_symbols+numerals.ttf" />
</NHNTStream>
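The box size of 24 declared in the DecoderSpecificInfo above can be checked with a small bit-packing sketch in Python. It mimics the BS elements of the example under the assumption that bits are written most significant bit first and that the text size prefix counts bytes. BitWriter and build_fntc are hypothetical names, not part of GPAC.

```python
class BitWriter:
    """Minimal MSB-first bit writer used to emulate a sequence of BS elements."""

    def __init__(self):
        self.bits = []

    def write_int(self, value, nbits):
        # Emit the nbits lowest bits of value, most significant bit first.
        for shift in range(nbits - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def write_bytes(self, data):
        for byte in data:
            self.write_int(byte, 8)

    def to_bytes(self):
        if len(self.bits) % 8:
            raise ValueError("bitstream is not byte-aligned")
        out = bytearray()
        for i in range(0, len(self.bits), 8):
            byte = 0
            for bit in self.bits[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


def build_fntc():
    """Pack the fntC box of the example, one call per BS element."""
    name = "TriodPostnaja".encode("utf-8")
    writer = BitWriter()
    writer.write_int(24, 32)         # size
    writer.write_bytes(b"fntC")      # type (fcc)
    writer.write_int(1, 7)           # fontFormat
    writer.write_int(0, 1)           # storeFont
    writer.write_int(len(name), 8)   # size prefix of fontName (bits="8")
    writer.write_bytes(name)         # fontName, no trailing 0
    writer.write_int(1, 7)           # fontSubsetID
    writer.write_int(1, 1)           # reserved
    return writer.to_bytes()


if __name__ == "__main__":
    box = build_fntc()
    print(len(box), box.hex())
```

The packed result is 24 bytes long: 4 for the size, 4 for the type, 3 for the bit fields and the name size prefix, and 13 for the font name, which matches the comment in the example.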
As of revision 5601, it is possible to convert an XML file containing BS syntax elements directly to a binary file using MP4Box -bin source.xml. The source file can be any XML file, not just an NHML file. BS elements can also be located in child nodes if needed.