DataSegment - radishengine/drowsy GitHub Wiki
Require Path: `'DataSegment'`
The DataSegment interface is a way to deal with binary data sources. Much of the time a DataSegment is a thin wrapper around a Blob, but it can also represent other data sources, such as a remote URL download. Like a Blob, it has a `.type` string property, which is intended to be used as a MIME-style type descriptor. Unlike a Blob, it does not necessarily have a fixed, known size, and many of its methods return Promises instead of immediate values (unlike using the FileReader interface to read data from a Blob).
The `DataSegment.from(source [, typeDescriptor])` function takes a `source` that can be:

- ...another DataSegment
- ...a Blob
- ...an ArrayBuffer, or an ArrayBufferView (like Uint8Array, DataView etc.)
- ...an array of these, to be concatenated together
  (usually in the order given, but not always -- see "Advanced Concatenation" below)
If the optional second parameter `typeDescriptor` is specified, it can be either:

- a type name string, like: `'audio/wav'`
- a type descriptor string, with a name and one or more `name=value` parameters separated by semicolons, like: `'text/plain; charset=utf-8'`
- an array containing a type name string and an object for the parameters, like `['text/plain', {charset:'utf-8'}]`
- a TypeDescriptor object
If `typeDescriptor` is not specified, then:

- ...if `source` is a Blob, and `source.type` is not the empty string, the type is taken from there
- ...if `source` is an array of `'application/x-joinable'`-type DataSegments/Blobs, and they also specify the data type that they all join together to become, then that type is used
  (see "Advanced Concatenation" below)
- ...otherwise, `'application/octet-stream'` is used
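As a rough illustration (this is not the library's actual code, and the x-joinable array case is omitted for brevity), the defaulting logic amounts to something like:

```javascript
// Sketch of the type-defaulting rules above (not the library's actual code).
// `source` is anything Blob-like with a .type string property.
function defaultType(source, typeDescriptor) {
  if (typeDescriptor) return typeDescriptor;           // explicit type wins
  if (source && typeof source.type === 'string' && source.type !== '') {
    return source.type;                                // inherit the Blob's type
  }
  return 'application/octet-stream';                   // final fallback
}
```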
You can also create a DataSegment as a slice of an existing DataSegment by using `dataSegment.getSegment(typeDescriptor [, offset [, minLength [, maxLength]]])`.

- `offset` is usually specified as a non-negative integer, but it can also be `'suffix'`, meaning that the segment is to be sliced from the end of the available data
- if `offset` is not specified, it is assumed to be `0`
- `minLength` and `maxLength` are also usually specified as non-negative integers, but they can also be `'all'`, which means every available byte from `offset`
- if `minLength` and `maxLength` are both unspecified, they are taken to be `0` and `'all'` respectively
- if `minLength` is specified but `maxLength` is not, `maxLength` is taken to be the same as `minLength`
- `minLength` must be less than or equal to `maxLength`, where `'all'` is equivalent to `+Infinity`
- if `minLength` bytes from `offset` would exceed `dataSegment.maxLength`, a RangeError is thrown
`dataSegment.getSegment(t)` and `DataSegment.from(dataSegment, t)` are equivalent.
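As a sketch of the length rules above (an illustration only, not the library's implementation), the argument normalization could look like this:

```javascript
// Hypothetical normalization of getSegment()'s optional arguments,
// following the rules listed above.
function normalizeLengths(offset, minLength, maxLength) {
  if (offset === undefined) offset = 0;
  if (minLength === undefined && maxLength === undefined) {
    minLength = 0;
    maxLength = 'all';
  } else if (maxLength === undefined) {
    maxLength = minLength;               // fixed-length slice
  }
  const asNumber = v => (v === 'all' ? +Infinity : v);
  if (asNumber(minLength) > asNumber(maxLength)) {
    throw new RangeError('minLength must be <= maxLength');
  }
  return { offset, minLength, maxLength };
}
```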
If the `source` parameter of `DataSegment.from()` is an array of `'application/x-joinable'`-type DataSegments/Blobs:

- ...the list may be rearranged according to the `order` type parameter.
  For example, if the list contains an `'application/x-joinable; order=2'` followed by an `'application/x-joinable; order=1'`, the two elements will be swapped around
- ...an error will be thrown if there is more than one with the type parameter `last=true`, or if there is one with a higher `order` than the one that has `last=true`.
  (It's not required for any part to have `last=true`, though.)
- ...if every element has the same `to` type parameter, the default type (that is, the type that will be used if the second parameter of `DataSegment.from()` is not specified) will be the common value of `to`, concatenated with `'; x-join-parts='`, followed by the lengths of each of the parts, separated by `,` commas
For example, if you use `DataSegment.from([a, b])` to concatenate:

- `a`: a 100-byte `'application/x-joinable; to=some/type'`
- `b`: a 250-byte `'application/x-joinable; to=some/type'`

...the result will have a type of `'some/type; x-join-parts=100,250'`
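The derivation of that joined type can be sketched as a small self-contained helper (hypothetical code, not the library's own; part objects are assumed to carry `type` and `size` fields, and `last=true` checking is omitted):

```javascript
// Sketch: derive the default joined type from a list of x-joinable parts,
// per the rules above.
function joinedType(parts) {
  const entries = parts.map(p => {
    // crude parse of e.g. 'application/x-joinable; order=1; to=some/type'
    const params = {};
    for (const piece of p.type.split(';').slice(1)) {
      const [k, v] = piece.trim().split('=');
      params[k] = v;
    }
    return { size: p.size, params };
  });
  // rearrange according to the optional numeric `order` parameter
  entries.sort((a, b) => (+a.params.order || 0) - (+b.params.order || 0));
  const to = entries[0].params.to;
  if (!to || !entries.every(e => e.params.to === to)) {
    return 'application/octet-stream'; // no common `to` type
  }
  return to + '; x-join-parts=' + entries.map(e => e.size).join(',');
}
```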
`dataSegment.getBytes([offset [, minLength [, maxLength]]])` returns a Promise that resolves to a Uint8Array containing the data that was read. `offset`, `minLength` and `maxLength` have the same semantics as in `.getSegment()`, except that if `minLength` is not specified, it is taken as `'all'` instead of `0`. If at least `minLength` bytes cannot be read from the data source, the Promise fails.
```javascript
dataSegment.getBytes(15, 10)
.then(function(bytes) {
  console.log('10 bytes read at offset 15:', bytes);
})
.catch(function() {
  console.log('less than 25 bytes available!');
});
```
Even if `minLength` is `0`, the byte array is guaranteed to have a length of at least 1 unless there are genuinely no bytes to be read from the data source.
`dataSegment.getArrayBuffer()` and `.getDataView()` take the same parameters as `.getBytes()` but the returned Promise resolves to an ArrayBuffer/DataView instead.
`dataSegment.getInt32(offset, littleEndian).then(...)` is equivalent to:

```javascript
dataSegment.getDataView(offset, 4, 4)
.then(function(dataView) {
  return dataView.getInt32(0, littleEndian);
})
.then(...)
```
There are similar parallels for the other `DataView.get<Type>()` methods, and also `.getInt64()` and `.getUint64()`, which will attempt to read a 64-bit integer as either a JavaScript number (if it can be represented this way without loss of precision) or a string containing the integer encoded as a hexadecimal literal (with the `0x` prefix, and `-` before that for negative numbers).
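The number-or-hex-string behaviour can be illustrated with a plain DataView (a sketch only; the real `.getInt64()` reads from the segment and returns a Promise):

```javascript
// Sketch of the number-or-hex-string result described above, reading the
// first 8 bytes of a DataView.
function readInt64(dataView, littleEndian) {
  const big = dataView.getBigInt64(0, littleEndian);
  if (big >= BigInt(Number.MIN_SAFE_INTEGER) &&
      big <= BigInt(Number.MAX_SAFE_INTEGER)) {
    return Number(big);                  // exactly representable as a number
  }
  // fall back to a hex literal string, with '-' before the '0x' prefix
  return big < 0n ? '-0x' + (-big).toString(16) : '0x' + big.toString(16);
}
```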
The `.minLength` and `.maxLength` fields hold values equivalent to the parameters passed in to `.getSegment()`. Note that they may be normalized according to the context, instead of the literal values passed in (for example, `.maxLength` may be reduced to fit the number of available bytes, if known).
If `.minLength` and `.maxLength` are the same value (and that value is not `'all'`), then `.hasFixedLength` will be `true` and `.fixedLength` will be the common value. Otherwise, `.hasFixedLength` will be `false` and `.fixedLength` will be `NaN`.
Calling `.withFixedLength()` will return a Promise. If `.hasFixedLength` is `true`, the Promise will resolve to the same segment (or an equivalent clone, if `.isUnique` is `true`). Otherwise, it will resolve to a new DataSegment where `.hasFixedLength` is `true` and `.fixedLength` can be used.
Like Blob, DataSegment does not require that its type descriptor be any of the actual official IANA-registered media types. By convention, if the type category (the part up to the `/`) is one of the official ones like `application/` and `image/`, and the subtype is non-official, the subtype should begin with the `x-` prefix. If the top-level type is non-official, the convention is not to prefix the subtype in this way.
To maintain compatibility with the Blob `.type` property, DataSegment's type descriptor is also required to be 7-bit ASCII, all lower-case. In order to preserve upper-case letters and other characters that cannot normally be used in a type parameter value (including `;`), these characters should be percent-encoded. When a parameter value is an arbitrary string instead of a fixed value, it is usually preferable to use the `['category/subtype', {param: value, ...}]` notation to define the type descriptor, as the percent-encoding of `value` will be handled automatically.
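A minimal sketch of such a percent-encoding step, assuming that anything outside lower-case ASCII letters, digits and a few safe punctuation characters gets escaped (the library's exact character set may differ):

```javascript
// Hypothetical encoder for a type parameter value, keeping the descriptor
// 7-bit, lower-case ASCII as described above.
function encodeParamValue(value) {
  return value.replace(/[^a-z0-9._-]/g, ch =>
    '%' + ch.charCodeAt(0).toString(16).padStart(2, '0'));
}
```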
You can get information about a DataSegment's type using the following properties:

- `.typeDescriptor`: a TypeDescriptor object
- `.type`: `'category/subtype; paramName=paramValue'` (equivalent to `.typeDescriptor.toString()`)
- `.typeName`: `'category/subtype'` (i.e. parameters are removed, if any; equivalent to `.typeDescriptor.name`)
- `.typeCategory`: `'category'` (equivalent to `.typeDescriptor.category`)
- `.subtype`: `'subtype'` (equivalent to `.typeDescriptor.subtype`)
- `.typeParameters`: `{paramName: 'paramValue', ...}` (equivalent to `.typeDescriptor.parameters`)
- `.isBrowserReady`: `true` if the type is one of the Browser-Ready Types

These are all read-only properties. To change any part of the type, use `DataSegment.from(segment, newTypeDescriptor)`.
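A hypothetical parser shows how these properties relate to the full descriptor string (this is not the library's TypeDescriptor implementation):

```javascript
// Sketch: derive the type-related properties from a descriptor string.
function describeType(type) {
  const [name, ...paramParts] = type.split(';').map(s => s.trim());
  const [category, subtype] = name.split('/');
  const parameters = {};
  for (const part of paramParts) {
    const eq = part.indexOf('=');
    parameters[part.slice(0, eq)] = part.slice(eq + 1);
  }
  return { type, typeName: name, typeCategory: category, subtype, parameters };
}
```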
`dataSegment.getBlob()` returns a Promise that resolves to a Blob that contains the segment data and has the same `.type` as `dataSegment`.

`dataSegment.asBlobParameter` will either be `null` or, if possible, contain the segment data in one of the types that the `new Blob()` constructor accepts as elements of its `array` parameter.
Sometimes a DataSegment gets re-used. For example, if you do this:

```javascript
var newSegment = DataSegment.from(dataSegment);
```

...`newSegment` is likely to end up assigned the same object as `dataSegment`. This is no problem if you are treating them as immutable, but it will not be what you want if you are planning to add custom fields/methods to these objects that might interfere with each other.
To combat this, you can use the `.isUnique` field. This field is `false` by default, but if you set it to `true`, `.getSegment()` will instead return an equivalent clone, and so should any other method that might be tempted to return the exact same object.
You can also use the `.unique()` helper method. What this does is:

- if `.isUnique` is `true`:
  - call `.getSegment(this.type)` to create a new clone
  - set `clone.isUnique` to `true`
  - return `clone`
- if `.isUnique` is `false`:
  - set `.isUnique` to `true`
  - return `this`

It is to be used like this, before using any custom fields (or putting the object in some context where it may later be given custom fields):

```javascript
var newSegment = dataSegment.getSegment(dataSegment.type, offset).unique();
newSegment.blabla = 'this';
// ...
dataSegment = dataSegment.unique();
dataSegment.blabla = 'that'; // does not clobber newSegment.blabla
```
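The `.unique()` behaviour can be sketched on a minimal stand-in object (here `Object.assign` stands in for the `getSegment(this.type)` clone step; real DataSegments obviously do more):

```javascript
// Sketch of the .unique() logic described above.
function unique(segment) {
  if (segment.isUnique) {
    const clone = Object.assign({}, segment); // stand-in for getSegment(this.type)
    clone.isUnique = true;
    return clone;
  }
  segment.isUnique = true; // first call: mark the object and return it as-is
  return segment;
}
```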
Splitting is an operation that turns one DataSegment into a stream of component DataSegments, in whichever way seems appropriate according to its type descriptor.
This is a multi-purpose operation that covers several distinct use-cases:
- Retrieve each chunk, in turn, of a "chunk stream"-based format
- Disambiguate a file or piece of data that could be one of a number of things
  (Or even several of them, at once: you might get multiple "overlapping" DataSegments that cover the same raw data, interpreted in different ways.)
- Get a DataSegment representing the original data from a compressed, encrypted or otherwise encoded DataSegment
  Note: At the time that you receive a decoded DataSegment object, the actual decoding process may not have completed (or even started) yet. That's why it is said to be a DataSegment representing (and not containing) the original data. If you want to get something that definitely contains the entire original data, decoded as quickly as possible, the best way is probably to use `.getBlob().then(...)`
- For data that is not technically compressed but is tightly-packed (e.g. bitplanes), provide an expanded, byte-aligned alternate version that is easier to deal with
  For example: In a 16-color image where each palette entry's RGB values are packed as 5:6:5 bits, and the pixel data packs 2 pixels into every byte, provide alternate versions of the palette and pixel data where each RGB component and every pixel gets its own byte.
- For data that directly represents certain basic kinds of media (e.g. a static image, a sound, a text document), a converted "browser-ready" version should be provided wherever possible (see Browser-Ready Types), as well as the "raw" original form
`dataSegment.split( [filter,] [func] )` always returns a Promise. What the Promise ultimately resolves to depends on whether you passed a callback function for `func` or not:
```javascript
/* way #1 */
dataSegment.split()
.then(function(componentSegments) {
  // wait until all components have been found,
  // then get an array of DataSegments
});

/* way #2 */
dataSegment.split(function(componentSegment) {
  // deal with each component immediately
  // as it is discovered
})
.then(function() {
  // do something once it's been confirmed
  // there are no more components to come
  // (no "components" parameter is passed here,
  // unlike in way #1)
});
```
As well as the optional callback `func`, there is an optional `filter` parameter that allows you to specify which data type(s) you are interested in. You can pass any of the following for `filter`:

- A type descriptor string
- An array of type descriptor strings
- A regular expression to match against the type name (only the `'category/subtype'` part, not including parameters)
- A TypeDescriptor object
- A TypeDescriptor.filter() object
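A sketch of how such a filter might be matched against a component's type name (a hypothetical helper, not the library's code; the TypeDescriptor-based forms are omitted):

```javascript
// Sketch: accept or reject a component type name for the filter forms above.
function filterAccepts(filter, typeName) {
  if (typeof filter === 'string') {
    // compare against the name part only, ignoring any parameters
    return filter.split(';')[0].trim() === typeName;
  }
  if (Array.isArray(filter)) {
    return filter.some(f => filterAccepts(f, typeName));
  }
  if (filter instanceof RegExp) {
    return filter.test(typeName);
  }
  return true; // no filter: accept everything
}
```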
The split operation has access to this filter and may decide to change the way it works depending on which types the filter specifies, like skipping over whole parts of the file if it knows that there's nothing the filter will accept there.
If the DataSegment's data is invalid in some way, the Promise returned by `.split()` will only fail on the most basic verification errors, like a sudden unexpected end-of-file when the format specifies there's supposed to be more data there, or the wrong "magic number" set in a header. In particular, checksums are not checked.

Bottom line: do not assume that something must be 100% valid just because `.split()` did not fail.