DataSegment

Require Path: 'DataSegment'

The DataSegment interface is a way to deal with binary data sources. Much of the time a DataSegment is a thin wrapper around a Blob, but it can also represent other data sources, like a remote URL download. Like a Blob, it has a .type string property which is intended to be used as a MIME-style type descriptor. Unlike a Blob, it does not necessarily have a fixed, known size, and many of its methods return Promises instead of immediate values (whereas reading data from a Blob requires going through the FileReader interface).

Creating a DataSegment

The DataSegment.from(source [,typeDescriptor]) function takes a source that can be:

  • a Blob
  • another DataSegment
  • an array of 'application/x-joinable'-type DataSegments/Blobs (see "Advanced Concatenation" below)

If the optional second parameter typeDescriptor is specified, it can be one of:

  • a type name string, like: 'audio/wav'
  • a type descriptor string, with a name and one or more name=value parameters separated by semicolons, like: 'text/plain; charset=utf-8'
  • an array containing a type name string and an object for the parameters, like ['text/plain', {charset:'utf-8'}]
  • a TypeDescriptor object

If typeDescriptor is not specified, then:

  • ...if source is a Blob, and source.type is not the empty string, the type is taken from source.type.
  • ...if source is an array of 'application/x-joinable'-type DataSegments/Blobs, and they also specify the data type that they all join together to become, then that type is used.
    (see "Advanced Concatenation" below)
  • ...otherwise, 'application/octet-stream' is used.
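
For example, a minimal sketch of wrapping a Blob (the Blob contents here are only illustrative):

var blob = new Blob(['hello, world'], {type: 'text/plain'});

// type taken from the Blob itself: 'text/plain'
var seg1 = DataSegment.from(blob);

// type descriptor string with a parameter
var seg2 = DataSegment.from(blob, 'text/plain; charset=utf-8');

// equivalent array notation
var seg3 = DataSegment.from(blob, ['text/plain', {charset:'utf-8'}]);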

You can also create a DataSegment as a slice of an existing DataSegment by using dataSegment.getSegment(typeDescriptor [, offset [, minLength [, maxLength]]]).

  • offset is usually specified as a non-negative integer, but it can also be 'suffix', meaning that the segment is to be sliced from the end of available data
  • if offset is not specified, it is assumed to be 0
  • minLength and maxLength are also usually specified as non-negative integers, but they can also be 'all', which means every available byte from offset
  • if minLength and maxLength are both unspecified, they default to 0 and 'all' respectively
  • if minLength is specified but maxLength is not, maxLength is taken to be the same as minLength
  • minLength must be less than or equal to maxLength, where 'all' is equivalent to +Infinity
  • if offset + minLength would exceed dataSegment.maxLength, a RangeError is thrown

dataSegment.getSegment(t) and DataSegment.from(dataSegment, t) are equivalent.
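
For example, a few illustrative slices (the offsets, lengths and types here are arbitrary):

// fixed-length slice: 16 bytes starting at offset 4 (minLength = maxLength = 16)
var header = dataSegment.getSegment('application/octet-stream', 4, 16);

// open-ended slice: every available byte from offset 20 onwards
var rest = dataSegment.getSegment('application/octet-stream', 20, 0, 'all');

// the last 8 bytes of available data (offset 'suffix' slices from the end)
var trailer = dataSegment.getSegment('application/octet-stream', 'suffix', 8);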

Advanced Concatenation

If the source parameter of DataSegment.from() is an array of 'application/x-joinable'-type DataSegments/Blobs:

  • ...the list may be rearranged according to the order type parameter
    For example, if the list contains an 'application/x-joinable; order=2' followed by an 'application/x-joinable; order=1', the two elements will be swapped around
  • ...an error will be thrown if there is more than one with type parameter last=true, or if there is one with a higher order than the one that has last=true.
    (It's not required for any part to have last=true, though.)
  • ...if every element has the same to type parameter, the default type (that is, the type that will be used if the second parameter of DataSegment.from() is not specified) will be the common value of to, followed by '; x-join-parts=' and the length of each part, separated by commas

For example, if you use DataSegment.from([a, b]) to concatenate:

  • a: a 100-byte 'application/x-joinable; to=some/type'
  • b: a 250-byte 'application/x-joinable; to=some/type'

...the result will have a type of 'some/type; x-join-parts=100,250'
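
In code, that example might look like this (assuming a and b are the two segments described above):

var joined = DataSegment.from([a, b]);
console.log(joined.type); // 'some/type; x-join-parts=100,250'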

Reading Data

dataSegment.getBytes([offset [, minLength [, maxLength]]]) returns a Promise that resolves to a Uint8Array containing the data that was read. offset, minLength and maxLength have the same semantics as in .getSegment(), except that if minLength is not specified, it is taken as 'all' instead of 0. If at least minLength bytes cannot be read from the data source, the Promise is rejected.

dataSegment.getBytes(15, 10)
.then(function(bytes) {
  console.log('10 bytes read at offset 15:', bytes);
})
.catch(function() {
  console.log('less than 25 bytes available!');
});

Even if minLength is 0, the byte array is guaranteed to have a length of at least 1, unless there are genuinely no bytes left to be read from the data source.

dataSegment.getArrayBuffer() and .getDataView() take the same parameters as .getBytes() but the returned Promise resolves to an ArrayBuffer/DataView instead.

dataSegment.getInt32(offset, littleEndian).then(...) is equivalent to:

dataSegment.getDataView(offset, 4, 4)
.then(function(dataView) {
  return dataView.getInt32(0, littleEndian);
})
.then(...)

There are similar parallels for the other DataView.get<Type>() methods, as well as .getInt64() and .getUint64(), which attempt to read a 64-bit integer as either a JavaScript number (if it can be represented that way without loss of precision) or a string containing the integer encoded as a hexadecimal literal (with the 0x prefix, preceded by - for negative numbers).
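
Since the resolved value may be either a number or a hexadecimal string, callers should handle both cases. A minimal sketch (the offset and endianness here are arbitrary):

dataSegment.getUint64(0, true)
.then(function(value) {
  if (typeof value === 'number') {
    // representable without loss of precision
    console.log('value:', value);
  }
  else {
    // too large for a number: a string like '0x123456789abcdef0'
    console.log('big value:', value);
  }
});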

Segment Data Length

The .minLength and .maxLength fields hold values equivalent to the parameters passed in to .getSegment(). Note that they may be normalized according to the context, instead of the literal values passed in (for example, 'maxLength' may be reduced to fit the number of available bytes, if known).

If .minLength and .maxLength are the same value (and that value is not 'all') then .hasFixedLength will be true and .fixedLength will be the common value. Otherwise, .hasFixedLength will be false and .fixedLength will be NaN.

Calling .withFixedLength() will return a Promise. If .hasFixedLength is true, the Promise will resolve to the same segment (or an equivalent clone, if .isUnique is true). Otherwise, it will resolve to a new DataSegment where .hasFixedLength is true, and .fixedLength can be used.
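
A sketch of how this might be used when an exact byte count is needed:

dataSegment.withFixedLength()
.then(function(fixedSegment) {
  // fixedSegment.hasFixedLength is true here,
  // so fixedSegment.fixedLength is a usable number
  console.log('length in bytes:', fixedSegment.fixedLength);
});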

Segment Data Type

Like Blob, DataSegment does not require that its type descriptor be any of the actual official IANA-registered media types. By convention, if the type category (the part up to /) is one of the official ones like application/ and image/, and the subtype is non-official, the subtype should begin with the x- prefix. If the top-level type is non-official, the convention is not to prefix the subtype in this way.

To maintain compatibility with the Blob .type property, DataSegment's type descriptor is also required to be 7-bit ASCII, all lower-case. In order to preserve upper-case letters and other characters that cannot normally be used in a type parameter value (including ;), these characters should be percent-encoded. When a parameter value is an arbitrary string instead of a fixed value, it is usually preferable to use the ['category/subtype', {param:value, ...}] notation to define the type descriptor, as the percent-encoding of value will be handled automatically.
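
For example (the title parameter here is purely hypothetical):

// array notation: percent-encoding is handled automatically
var seg = DataSegment.from(blob, ['text/plain', {title: 'Hello; World'}]);

// the equivalent descriptor string, encoded by hand:
// the upper-case letters, the space and the ';' are all percent-encoded
var seg2 = DataSegment.from(blob, 'text/plain; title=%48ello%3b%20%57orld');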

You can get information about a DataSegment's type using the following properties:

  • .typeDescriptor: a TypeDescriptor object
  • .type: 'category/subtype; paramName=paramValue'
    (equivalent to .typeDescriptor.toString())
  • .typeName: 'category/subtype' (i.e. parameters are removed, if any)
    (equivalent to .typeDescriptor.name)
  • .typeCategory: 'category'
    (equivalent to .typeDescriptor.category)
  • .subtype: 'subtype'
    (equivalent to .typeDescriptor.subtype)
  • .typeParameters: {paramName:'paramValue', ...} (equivalent to .typeDescriptor.parameters)
  • .isBrowserReady: true if the type is one of the Browser-Ready Types

These are all read-only properties. To change any part of the type, use DataSegment.from(segment, newTypeDescriptor).
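
For example, for a segment typed 'text/plain; charset=utf-8':

var seg = DataSegment.from(blob, 'text/plain; charset=utf-8');
console.log(seg.type);           // 'text/plain; charset=utf-8'
console.log(seg.typeName);       // 'text/plain'
console.log(seg.typeCategory);   // 'text'
console.log(seg.subtype);        // 'plain'
console.log(seg.typeParameters); // {charset: 'utf-8'}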

Blob Construction

dataSegment.getBlob() returns a Promise that resolves to a Blob that contains the segment data and has the same .type as dataSegment.

dataSegment.asBlobParameter will either be null or, if possible, contain the segment data in one of the types that the new Blob() constructor accepts as elements of its array parameter.
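
A sketch of getting the full data as a Blob, for example to hand to the browser as an object URL:

dataSegment.getBlob()
.then(function(blob) {
  // blob.type matches dataSegment.type
  var url = URL.createObjectURL(blob);
  // ...use the URL, then release it when finished:
  // URL.revokeObjectURL(url);
});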

Ensuring Uniqueness

Sometimes a DataSegment gets re-used. For example, if you do this:

var newSegment = DataSegment.from(dataSegment);

...newSegment is likely to end up assigned the same object as dataSegment. This is no problem if you are treating them as immutable, but it will not be what you want if you are planning to add custom fields/methods to these objects that might interfere with each other.

To combat this, you can use the .isUnique field. This field is false by default, but if you set it to true, .getSegment() will instead return an equivalent clone, and so should any other method that might be tempted to return the exact same object.

You can also use the .unique() helper method. What this does is:

  • if .isUnique is true:
      • call .getSegment(this.type) to create a new clone
      • set clone.isUnique to true
      • return clone
  • if .isUnique is false:
      • set .isUnique to true
      • return this
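
A minimal sketch of that logic (relying on .getSegment() returning a clone when .isUnique is already true, as described above):

DataSegment.prototype.unique = function() {
  if (this.isUnique) {
    // already unique: return a fresh equivalent clone
    var clone = this.getSegment(this.type);
    clone.isUnique = true;
    return clone;
  }
  // not yet unique: claim this object and return it as-is
  this.isUnique = true;
  return this;
};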

It is meant to be used like this, before setting any custom fields (or before putting the object in some context where it may later be given custom fields):

var newSegment = dataSegment.getSegment(dataSegment.type, offset).unique();
newSegment.blabla = 'this';
// ...
dataSegment = dataSegment.unique();
dataSegment.blabla = 'that'; // does not clobber newSegment.blabla

Splitting

Splitting is an operation that turns one DataSegment into a stream of component DataSegments, in whichever way seems appropriate according to its type descriptor.

This is a multi-purpose operation that covers several distinct use-cases:

  • Retrieve each chunk, in turn, of a "chunk stream"-based format
  • Disambiguate a file or piece of data that could be one of a number of things
    (Or even several of them, at once: you might get multiple "overlapping" DataSegments that cover the same raw data, interpreted in different ways.)
  • Get a DataSegment representing the original data from a compressed, encrypted or otherwise encoded DataSegment
    Note: At the time that you receive a decoded DataSegment object, the actual decoding process may not have completed (or even started) yet. That's why it is said to be a DataSegment representing (and not containing) the original data. If you want to get something that definitely contains the entire original data, decoded as quickly as possible, the best way is probably to use .getBlob().then(...)
  • For data that is not technically compressed but is tightly-packed (e.g. bitplanes), provide an expanded, byte-aligned alternate version that is easier to deal with
    For example: In a 16-color image where each palette entry's RGB values are packed as 5:6:5 bits, and the pixel data packs 2 pixels into every byte, provide alternate versions of the palette and pixel data where each RGB component and every pixel gets its own byte.
  • For data that directly represents certain basic kinds of media (e.g. a static image, a sound, a text document), a converted "browser-ready" version should be provided wherever possible (see Browser-Ready Types), as well as the "raw" original form

dataSegment.split( [filter,] [func] ) always returns a Promise. What the Promise ultimately resolves to depends on whether you passed a callback function for func or not:

/* way #1 */
dataSegment.split()
.then(function(componentSegments) {
  // wait until all components have been found,
  // then get an array of DataSegments
});

/* way #2 */
dataSegment.split(function(componentSegment) {
  // deal with each component immediately
  // as it is discovered
})
.then(function() {
  // do something once it's been confirmed
  // there are no more components to come
  // (no "components" parameter is passed here,
  //  unlike in way #1)
});

As well as the optional callback func, there is an optional filter parameter that allows you to specify which data type(s) you are interested in. You can pass any of the following for filter:

  • A type descriptor string
  • An array of type descriptor strings
  • A regular expression to match against the type name (only the 'category/subtype' part, not including parameters)
  • A TypeDescriptor object
  • A TypeDescriptor.filter() object

The split operation has access to this filter and may decide to change the way it works depending on which types the filter specifies, like skipping over whole parts of the file if it knows that there's nothing the filter will accept there.
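
For example, a sketch that only asks for image components, using a regular expression filter:

dataSegment.split(/^image\//, function(componentSegment) {
  // called only for components whose type name
  // matches the regular expression
  console.log('image component:', componentSegment.type);
})
.then(function() {
  console.log('no more image components to come');
});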

Splitting and Data Validity

If the DataSegment's data is invalid in some way, the Promise returned by .split() will only fail on the most basic verification errors, like a sudden unexpected end-of-file when the format specifies there's supposed to be more data there, or the wrong "magic number" set in a header.

In particular, checksums are not checked.

Bottom line: Do not assume that something must be 100% valid just because .split() did not fail.
