This page is an attempt to summarize the core concepts of the project.

This page is intended as a starting point for understanding the project on a technical level. Maintainers should aim to keep it up to date and to avoid a high level of detail: keep each entry brief here, and expand on the concept in further page(s).

Format

A Format describes a way to interpret a stream of bytes, giving structure and context to what would otherwise be just a long list of numbers.

Format Descriptor String

A Format Descriptor String is a string of the form:

category/subtype
category/subtype; param=value
category/subtype; param="quoted value" ("enclosing quotes" are not part of the value)
category/subtype; param1=value1; param2=value2; param3=value3 (...and so on)

Descriptor strings must be all lower-case, 7-bit ASCII characters, with no new-line characters and no unprintable characters.

Format Name

The Format Name is the first part of a descriptor string, of the form 'category/subtype', without any of the semicolon-separated parameters that may come after it in the full descriptor.

Format names must be all lower-case, 7-bit ASCII characters, with no new-line characters and no unprintable characters.

Format Parameters

The Format Parameters are the semicolon-separated list of key=value pairs that come after the format name in a format descriptor string.

Parameter values may be:

enclosed in double-quotes ("), to allow the ; semicolon character to appear in the value, and to prevent leading and trailing whitespace from being trimmed
percent-encoded, to allow for all other characters (including upper-case letters, and double-quotes)

Parameter names may also be percent-encoded, but not quoted.

Octet-Stream Format

The Octet-Stream Format is the format specified by descriptor string 'application/octet-stream'. It describes data for which there is no known structure or context, and which should be treated as nothing other than a series of bytes.

Format Descriptor Object

A Format Descriptor Object is an object that represents a Format Descriptor String, providing automatic handling of parameter encoding, etc.

The main operations are:

to parse a descriptor object from a full descriptor string
to construct a descriptor object from a format name and a parameters object
to turn a descriptor object into a full descriptor string
to extract the component parts, as strings: category, subtype, parameter(s)
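
As a rough illustration of the parse and stringify operations, here is a minimal sketch that follows the descriptor rules described above. The function names parseDescriptor and formatDescriptor, and the plain object they exchange, are assumptions made for this example and not the project's actual API. It also assumes that percent-decoding is applied to quoted values, which the rules above do not state explicitly.

```javascript
// Hypothetical helpers: the names and the returned shape are assumptions,
// not the project's actual API.
function parseDescriptor(descriptor) {
  // Split on semicolons, except those inside a double-quoted value.
  const parts = descriptor.match(/(?:"[^"]*"|[^;])+/g) || [];
  if (parts.length === 0) throw new SyntaxError('empty descriptor');
  const [category, subtype] = parts[0].trim().split('/');
  const parameters = {};
  for (const part of parts.slice(1)) {
    const eq = part.indexOf('=');
    if (eq < 0) continue; // ignore fragments that are not key=value pairs
    const name = decodeURIComponent(part.slice(0, eq).trim());
    let value = part.slice(eq + 1).trim();
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1); // enclosing quotes are not part of the value
    }
    parameters[name] = decodeURIComponent(value);
  }
  return { category, subtype, parameters };
}

function formatDescriptor({ category, subtype, parameters = {} }) {
  // Percent-encode every character outside a conservative lower-case safe set,
  // using lower-case hex digits so the result stays all lower-case.
  const encode = (text) =>
    text.replace(/[^a-z0-9._-]/gu, (ch) =>
      Array.from(new TextEncoder().encode(ch), (byte) =>
        '%' + byte.toString(16).padStart(2, '0')).join(''));
  const pairs = Object.entries(parameters).map(
    ([name, value]) => `; ${encode(name)}=${encode(String(value))}`);
  return `${category}/${subtype}${pairs.join('')}`;
}
```

This sketch always writes parameter values in percent-encoded form rather than quoted form, so a value that was parsed from a quoted string comes back out percent-encoded.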

Format Filter

A Format Filter is an object that represents a range of format descriptors. For example, a filter might represent any format in a particular category.

The main functionality of a filter is to test whether a given descriptor (as a string or an object) is within that range.

Every descriptor object is also a filter object: a filter that happens to represent a range of exactly one format, itself.
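
To make the idea of "testing whether a descriptor is within a range" concrete, here is a minimal sketch of a category filter. The function names and the includes method are assumptions made for this example, not the project's actual interface, and for brevity the sketch only accepts descriptor strings.

```javascript
// Hypothetical filter shape: the method name `includes` is an assumption.
// This sketch only accepts descriptor strings, not descriptor objects.
function formatNameOf(descriptor) {
  // The format name is everything before the first semicolon.
  return descriptor.split(';')[0].trim();
}

// Build a filter that covers every format in one category.
function categoryFilter(category) {
  return {
    includes: (descriptor) =>
      formatNameOf(descriptor).split('/')[0] === category,
  };
}
```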

Filter Format List

Format filters can generate the following lists, where possible (i.e. where the list would not be infinitely long):

list of each format descriptor included by the filter
list of each format descriptor excluded by the filter
list of each format name included by the filter
list of each format name excluded by the filter
list of each format category included by the filter
list of each format category excluded by the filter

Browser-Ready Format Filter

The Browser-Ready Format Filter is a filter that should include every format that can be directly used in one of the following contexts:

the src of an <img>, <audio> or <video> element
the .data property of an ImageData object
the .copyToChannel()/.copyFromChannel() methods of AudioBuffer
the srcdoc of an <iframe> element
the textContent of a <pre> element

The filter will be different depending on the current browser's advertised support for various known formats in the <audio> and <video> elements, using their .canPlayType() methods.
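
As a sketch of how the browser's advertised support could be probed, the helper below keeps only the candidate types that a media element reports as playable. The helper name playableFormats and the candidate list passed to it are assumptions made for this example. The canPlayType() method itself is a standard method of audio and video elements, and returns an empty string for types the browser cannot play.

```javascript
// Hypothetical helper: keeps the candidate types that the given media element
// reports as playable. canPlayType() returns '' for unsupported types, and
// 'maybe' or 'probably' otherwise.
function playableFormats(mediaElement, candidates) {
  return candidates.filter((type) => mediaElement.canPlayType(type) !== '');
}
```

In a browser, this could be called with document.createElement('audio') or document.createElement('video') as the first argument.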

Format Dispatch

A Format Dispatch is a mapping from some external format identifier to a format descriptor.

The obvious example is a case-insensitive mapping for filename extensions (.txt, .gif etc.) but it should not be assumed that this is the only one in practice.
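
A minimal sketch of the filename-extension case follows. The table entries and the function name are illustrative assumptions. Falling back to application/octet-stream for unknown extensions is also an assumption, chosen because that format describes data with no known structure.

```javascript
// Hypothetical extension table: the entries are illustrative only.
const extensionDispatch = new Map([
  ['txt', 'text/plain'],
  ['gif', 'image/gif'],
]);

function descriptorForFilename(filename) {
  const dot = filename.lastIndexOf('.');
  if (dot < 0) return 'application/octet-stream'; // no extension at all
  // Lower-case the extension so the lookup is case-insensitive.
  const extension = filename.slice(dot + 1).toLowerCase();
  return extensionDispatch.get(extension) || 'application/octet-stream';
}
```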

Format Operation

A Format Operation is any kind of functionality that has a common interface across every format where it makes sense to perform it, but that needs its own custom-written implementation for each format that provides it.

Format Operation: Split

Splitting is a format operation that turns one data segment into a stream of component data segments.

This is a multi-purpose operation that covers several distinct use-cases:

Retrieve each chunk of a "chunk stream"-based format.
Disambiguate data that could be any one of a number of things.
(Or even several of them, at once: you might get multiple "overlapping" results that cover the same raw data, interpreted in different ways.)
If this is a compressed, encrypted or otherwise-encoded data stream, splitting it should yield the original, decoded data.
(Unless it is unknown, the original data's format descriptor should be included within the encoded data's format descriptor, as a parameter called encoded.)
For data that is not technically compressed, but is tightly-packed (e.g. bitplanes), provide an expanded, byte-aligned alternate version that is easier to deal with.
For example: In a 16-color image where each palette entry's RGB values are packed as 5:6:5 bits, and the pixel data packs 2 pixels into every byte, provide alternate versions of the palette and pixel data where each RGB component and every pixel gets its own byte.
For data that is not already in one of the formats listed in the browser-ready format filter, but can be automatically converted to one of those formats, perform the automatic conversion and include the converted version as well as the original data.
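
The 16-color example in the list above can be sketched as follows. The function names are hypothetical, and the bit layout (red in the top 5 bits, first pixel in the high nibble) is an assumption: real formats may order these fields differently. Scaling each color component up to the full 0..255 range is also an assumption, since the text above only requires that each component gets its own byte.

```javascript
// Expand one 16-bit 5:6:5 palette entry into three bytes [r, g, b].
// Assumes red occupies the top 5 bits, then green (6 bits), then blue (5 bits).
function unpackRgb565(entry) {
  const r5 = (entry >> 11) & 0x1f;
  const g6 = (entry >> 5) & 0x3f;
  const b5 = entry & 0x1f;
  // Scale each field to 0..255 by replicating its high bits into the low bits.
  return [(r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2)];
}

// Expand packed 4-bit pixels (two per byte) into one palette index per byte.
// Assumes the high nibble holds the first pixel of each pair.
function unpackNibbles(packed) {
  const out = new Uint8Array(packed.length * 2);
  packed.forEach((byte, i) => {
    out[2 * i] = byte >> 4;
    out[2 * i + 1] = byte & 0x0f;
  });
  return out;
}
```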

You can optionally pass a Format Filter to specify which format(s) you are interested in getting out of the split. The Format Handler has access to this filter object, and can make decisions based on it. For example, it might skip over whole parts of the file if it can tell there's nothing there that would be accepted by the filter, or skip automatic conversions.

Format Operation: Create Volume

This operation creates a new empty volume that is configured according to the requirements and restrictions of the format.

For example, the volume may be set up to reflect:

whether the format only supports a single flat collection, or a full hierarchy with folders and subfolders
whether path names are case-sensitive
path name restrictions, like disallowing certain characters, or restricting them to a maximum length
custom path encoding: with the path name restrictions in place, percent-encoding may be unnecessary, and there may be a custom path name separator character (e.g. \ instead of /)

Format Operation: Populate Volume

This operation takes a volume (usually an empty volume, created previously via the Create Volume operation on the same format) and begins the process of adding paths to the volume, and associating them with data and metadata.

This operation completes when the volume is fully populated, but you can also use volume listeners to keep tabs on the process while it is still ongoing.

The parameters for the populate operation are:

a path range to specify which paths to add
Default: a range that covers every path
finishing move: what to do with the volume (or part of the volume) once it is known to be fully populated
- 'nothing'.
- 'seal': seal it.
- 'freeze': freeze it. (default)

Format Operation: Scriptify

This operation generates a SquareScript document from the data.

Format Capabilities Object

A Format Capabilities Object is an object associated with a format that describes which operations are available for that format.

For each available operation there will be either a simple true value, or something more descriptive.

Split:
- A format filter describing every format that might be provided by the split.

Format Handler

A Format Handler is an object associated with a specific format name (not full descriptor) that defines custom data operations associated with that format.

The main operations are:

get a capabilities object for this handler's format
get a capabilities object for a given full descriptor that has the same format name as this handler (for where capabilities are different based on the parameters)
perform a format operation on a given data segment

Data Segment

A Data Segment is an object that represents a stream of bytes.

Each Data Segment has an assigned format, a minimum length (which may be zero) and a maximum length (which may be infinite).

The default format is application/octet-stream, which means a generic stream of bytes.

The main operations:

get actual bytes from the stream, in ArrayBuffer-based form
get a URI for the data (which may be a data: or blob: scheme URI)
create a new segment from part of this one, assigning it its own format
with the help of a format handler for this segment's assigned format:
- perform a format operation on this segment's data

Data Record

A Data Record is a type of object whose only named properties are getter/setter pairs that internally map value accesses onto an ArrayBuffer (via method calls on a DataView or typed array created for that buffer).

Data Record Constructor

A Data Record object constructor function should always take the same parameters as the DataView constructor: an ArrayBuffer, a byte offset, and the length in bytes.
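
A minimal sketch of a data record follows. The record type SizeRecord and its layout (a 16-bit width followed by a 16-bit height, both little-endian) are invented for this example. The DataView is held in a private field so that the only named properties are the getter/setter pairs.

```javascript
// Hypothetical record type: a 16-bit width followed by a 16-bit height,
// both stored little-endian. The layout is invented for illustration.
class SizeRecord {
  #view; // private, so the only named properties are the accessors below

  // Same parameters as the DataView constructor.
  constructor(buffer, byteOffset, byteLength) {
    this.#view = new DataView(buffer, byteOffset, byteLength);
  }

  get width() { return this.#view.getUint16(0, true); }
  set width(value) { this.#view.setUint16(0, value, true); }

  get height() { return this.#view.getUint16(2, true); }
  set height(value) { this.#view.setUint16(2, value, true); }
}
```

Writes through the record land directly in the underlying buffer, so for example `new SizeRecord(buffer, 4, 4)` reads and writes bytes 4 to 7 of buffer.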

Path

A Path is a series of path names and/or path ordinals intended to express a location in a hierarchical (subfolder-based) system.

(A flat system can also express locations using paths. In this case, valid paths would always be a series of exactly one name/ordinal.)

Root Path

The Root Path is a path of length zero, intended to refer to the system's root container.

Path Name

A Path Name is part of a path used to differentiate the path's location from other locations that may exist at the same depth-level in the system hierarchy.

Any string value can be a valid path name, except the empty string ''.

Path Ordinal

A Path Ordinal is part of a path used to differentiate the path's location from other locations that may exist at the same depth-level in the system hierarchy.

A path ordinal is always a finite integer. It must be a number value (i.e. (typeof po === 'number') must be true where po is a path ordinal), and in particular, not a string representation of a number. (Otherwise, it would not be possible to differentiate it from a path name.)

Path ordinal 0 means the first path, 1 means the second, and so on. (The actual ordering method -- what it is that makes a path "first" or "second" etc. -- is unspecified. It needs to be determined by the specific system that this path is being used for.)

For negative ordinals, -1 means the final path, -2 means the second-to-last, and so on.
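
The ordinal rules above can be sketched as a small helper that turns an ordinal into a zero-based index, given the number of sibling paths at that level. The function name, and the choice to return -1 for an out-of-range ordinal, are assumptions made for this example.

```javascript
// Hypothetical helper: resolves a path ordinal to a zero-based index among
// `siblingCount` siblings. Returns -1 when the ordinal is out of range.
function resolveOrdinal(ordinal, siblingCount) {
  // Number.isInteger is false for strings, NaN and the infinities.
  if (!Number.isInteger(ordinal)) {
    throw new TypeError('a path ordinal must be a finite integer number value');
  }
  // Negative ordinals count back from the end: -1 is the final path.
  const index = ordinal < 0 ? siblingCount + ordinal : ordinal;
  return index >= 0 && index < siblingCount ? index : -1;
}
```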

Path Object

A Path Object is an object that represents a path.

It is an Array, or Array-like object, where the property p.length is a finite non-negative integer and numeric properties p[0] through p[p.length-1] are valid path names or path ordinals.

Main functionality:

get a stringified version of the path, in standard encoding

Standard Path Encoding

Standard Path Encoding is a string representation of a path that is consistent across all kinds of path-based systems.

Encoding rules:

zero-length path (the root): '<root>'
path name: use percent-encoding, with the same rules as encodeURIComponent()
path ordinal: enclose the ordinal number in [ square brackets ]
series: join encoded components together with / forward slashes /
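
The encoding rules above can be sketched in a few lines. The function name encodePath is an assumption made for this example.

```javascript
// Hypothetical helper implementing the encoding rules listed above.
// Accepts an Array or Array-like path object.
function encodePath(path) {
  if (path.length === 0) return '<root>';
  return Array.from(path, (part) =>
    typeof part === 'number'
      ? '[' + part + ']'          // path ordinal
      : encodeURIComponent(part)  // path name
  ).join('/');
}
```

For example, `encodePath(['docs', 0, 'a/b c.txt', -1])` gives `'docs/[0]/a%2Fb%20c.txt/[-1]'`. Because encodeURIComponent() escapes the characters /, [, ], < and >, an encoded path name cannot be mistaken for a separator, an ordinal or the root marker.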

Path Range Object

A Path Range Object is an object that describes a linear set of paths found between two reference point paths.

A range object has the following properties:

.firstPath: no paths in the range can come "before" this path object, and if .excludeFirstPath is true, they cannot be equal to it, either
Default: []
.excludeFirstPath: boolean flag set to indicate that the .firstPath is an exclusive bound, not an inclusive bound
Default: false
.lastPath: no paths in the range can come "after" this path object, and if .excludeLastPath is true, they cannot be equal to it, either
Default: [-1]
.excludeLastPath: boolean flag set to indicate that the .lastPath is an exclusive bound, not an inclusive bound
Default: false
.minDepthLevel: the minimum number of names or ordinals a path in the range must have
Default: 0
.maxDepthLevel: the maximum number of names or ordinals a path in the range may have
Default: +Infinity

Taken together, the defaults for each property specify a range that covers every possible path.

The standard way to create an empty path range is to set .maxDepthLevel to a negative number. (Level 0 means that the root path is still included.)
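
A minimal sketch of a range factory that applies the defaults listed above, plus a depth check, might look like this. The function names are assumptions made for this example. The sketch only checks depth levels, because the ordering of paths relative to .firstPath and .lastPath depends on the specific system.

```javascript
// Hypothetical factory: fills in the documented defaults for any
// property that the caller does not override.
function createPathRange(overrides = {}) {
  return Object.assign({
    firstPath: [],
    excludeFirstPath: false,
    lastPath: [-1],
    excludeLastPath: false,
    minDepthLevel: 0,
    maxDepthLevel: Infinity,
  }, overrides);
}

// Depth check only: ordering against firstPath/lastPath is system-specific.
function isDepthInRange(range, path) {
  return path.length >= range.minDepthLevel &&
         path.length <= range.maxDepthLevel;
}
```

With these helpers, `createPathRange({ maxDepthLevel: -1 })` describes an empty range: no path, not even the root, can have a negative length.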

Note that the path range interface is compatible with the path object interface, so an object could be both. For example, to turn a path object into a path range that includes only itself:

path.firstPath = path.lastPath = path;
path.excludeFirstPath = path.excludeLastPath = false;
path.minDepthLevel = path.maxDepthLevel = path.length;

...or alternatively, a path object could also be a range that includes all of its descendant paths, instead of itself:

path.firstPath = path.concat([0]);
path.lastPath = path.concat([-1]);
path.excludeFirstPath = path.excludeLastPath = false;
path.minDepthLevel = path.length;
path.maxDepthLevel = +Infinity;

Volume

A Volume is a data structure that maintains an ordered list of paths, and associates named values with them: typically a Data Segment named data, and then maybe some additional metadata values, like a Date value, named timestamp.

Volume Listener

Volume Listeners are a notification mechanism that lets you hook into current or future state on a volume.

Single-Path Listener

Types of volume listener for a specific path:

named value is set
named value is deleted
all named values frozen

Multi-Path Listener

Types of volume listener for a path range (may be every path across the volume):

all paths sealed
all paths frozen

Volume Path Encoding

By default, a volume will use standard path encoding. This can be overridden, to allow for more "natural" path encoding for a particular kind of volume.

Volume Path Name Collation

Every Volume must have a Collation Function that is specified when the Volume is created, and must not change once the Volume is initialized.

The Collation Function is a string comparison function that takes two string arguments a and b, and returns:

a negative number if a < b
a positive number if a > b
zero if a === b

The default Collation Function is based on comparing strings using the standard JavaScript comparison operators. This means that by default a Volume uses case-sensitive paths. To create a Collation Function that is case-insensitive, it is recommended to look at String.prototype.localeCompare() (which also optionally provides number-aware string sorting, so for example '2' < '10').
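
As a sketch of both approaches, the first function below compares strings with the standard comparison operators, and the second uses String.prototype.localeCompare() with options for case-insensitive, number-aware ordering. The function names, the fixed 'en' locale and the choice of the 'accent' sensitivity level (which ignores case but still distinguishes accented letters) are assumptions made for this example.

```javascript
// Case-sensitive collation using the standard comparison operators.
function defaultCollation(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Case-insensitive, number-aware collation. The 'en' locale is fixed here
// only to keep the results independent of the host's default locale.
function caseInsensitiveCollation(a, b) {
  return a.localeCompare(b, 'en', { sensitivity: 'accent', numeric: true });
}
```

Note that the default comparison places '10' before '2', because it compares character by character, while the number-aware version places '2' before '10'.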

Sealing a Volume

Sealing a volume means declaring that no new paths are going to be added to it.

Sealing can be done across the whole volume, or just part of it (specified by a path set).

Note that, while no new paths can be added, this does not mean that the data and metadata associated with an existing path cannot be changed. It's also valid for paths to be removed, but if they are, they cannot be re-added. The only guarantee is no new paths.

Sealing is important to do, in case there are any listeners that are waiting to do something once it is confirmed that there are no more paths to come.

Freezing a Volume

Freezing a volume means that the data and metadata associated with the volume's paths are in their final state, and cannot change again.

Freezing can be done across the whole volume, or just part of it (specified by a path set).

Freezing a volume also means that no new paths can be added (so freezing a volume automatically seals it as well), and that paths cannot be deleted.

Volume Access Transform

Giving a volume an Access Transform means setting it up in such a way that some or all operations on it are transparently translated into operations on a different volume.

Access Transform Pattern: Mount as Subfolder

All accesses and listeners within a certain subfolder on one volume are mapped onto accesses/listeners on another volume (starting from the root, instead of the original subfolder path).

Access Transform Pattern: Secondary Data Stream

A volume access transformation to simulate systems where there can be two data streams associated with the same path, primary and secondary. The data stream from the secondary volume appears as a metadata value for paths on the primary volume.

Access Transform Pattern: Lazy Clone

A volume access transformation which just mirrors another volume directly, until you try to add or change something: these modifications are only added to the proxy volume, not the original one.

TODO: Decide whether deleting a path on the proxy should restore the cloned version, or set a special "deleted" version?

Access Transform Pattern: Current Working Directory

A volume that maps onto a subdirectory path on another volume, where the path can then be changed, similar to cd/chdir at a command prompt.

SquareScript

SquareScript is a simplistic mini-language, intended as a target for transforming "alien" code into a manageable common format, that can then (hopefully) be run via a SquareScript interpreter.

SquareScript syntax is based on JavaScript Arrays. It is designed to be JSON-compatible.

SquareScript Document

A SquareScript document may be an Array, a JSON-encoded Array, or an object with a .toJSON() method that returns an Array.

The root of a SquareScript document is a step, often a block.

SquareScript Step

All commands and structures in a SquareScript document are based on steps. A step is an Array containing:

The step name as a string, unless this step is a Block
Any number of step parameters, each of which is either:
- A string/number/boolean/null literal
- A step

For JSON compatibility, number literal parameters may not be Infinity, -Infinity or NaN.

A step always evaluates to a value (null is used where no obvious evaluation exists), unless it is a Comment step.

Step parameters are always evaluated in order from first to last, unless it is an Orderless step.

SquareScript Block

A Block is a special kind of SquareScript step where:

There is no step name (it is the only kind of step without one)
Every parameter must be a step, not a literal value

A block always resolves to null.
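
The structural rules for steps and blocks can be sketched as a recursive check. The function names are assumptions made for this example, as are the step names 'print' and 'add' used in the usage note. Treating an empty Array as a valid block with no steps is also an assumption.

```javascript
// JSON-compatible literals: string, boolean, null, or a finite number.
function isLiteral(value) {
  return value === null ||
    typeof value === 'string' ||
    typeof value === 'boolean' ||
    (typeof value === 'number' && Number.isFinite(value));
}

// Hypothetical validator for the step structure described above.
function isStep(value) {
  if (!Array.isArray(value)) return false;
  if (typeof value[0] === 'string') {
    // Named step: the name, then any number of literals or nested steps.
    return value.slice(1).every((p) => isLiteral(p) || isStep(p));
  }
  // Block: no step name, and every parameter must itself be a step.
  return value.every((p) => isStep(p));
}
```

For example, the block `[['print', ['add', 1, 2]], ['print', 'done']]` passes this check, while `['add', 1, NaN]` and `[['print', 1], 2]` do not.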

SquareScript Import

SquareScript documents can specify external entities to import, using the Import to Scope step.

SquareScript Flavor

A SquareScript Flavor is a specialization of SquareScript created to suit a particular domain.

Each flavor has:

its own set of step handlers, extended from the standard ones
its own set of macro handlers, extended from the standard ones
its own global scope, for documents to import value(s) from and export value(s) to

SquareScript Step Handler

A Step Handler is a normal JavaScript function. It should assume that it will be called with an undefined this context. There is no general restriction on the kind of value it can return, or the values it might get as parameters.

If a Step Definition has the property .isPure set to true, this means the function always returns the same results for the same input parameters, and it should have no side-effects (e.g. logging).

SquareScript Macro Handler

A Macro Handler is a function that takes a step as an input parameter, and returns a new step to replace it with.

Both steps should always be Arrays. The first element of the input step will be the name of the macro. Take care not to return the input step unchanged, and not to let two macros return each other: either case leads to an infinite loop.
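
One way an interpreter could guard against runaway expansion is sketched below. The function name, the use of a Map for the macro handlers and the expansion limit are all assumptions made for this example, and the sketch only expands the top-level step, not nested ones.

```javascript
// Hypothetical expansion loop with a guard against infinite expansion.
// `macroHandlers` is assumed to be a Map from macro name to handler function.
function expandMacro(step, macroHandlers, maxExpansions = 100) {
  let current = step;
  let count = 0;
  while (macroHandlers.has(current[0])) {
    count += 1;
    if (count > maxExpansions) {
      throw new Error('macro expansion did not terminate: ' + current[0]);
    }
    const next = macroHandlers.get(current[0])(current);
    if (next === current) {
      throw new Error('macro returned its input unchanged: ' + current[0]);
    }
    current = next;
  }
  return current;
}
```

With a hypothetical macro 'double' defined as `(step) => ['add', step[1], step[1]]`, expanding `['double', 4]` yields `['add', 4, 4]`.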

SquareScript Global Scope

The SquareScript Global Scope is the object used by the Import to Scope and Export from Scope steps.

SquareScript Import Resolution

Until its imports are resolved, attempting to run a SquareScript document in a SquareScript interpreter may throw errors.

You can get a list of the unresolved imports for a SquareScript document using SquareScript.getUnresolvedImports(script) and can resolve them by setting each import name as a property name on an object and then calling SquareScript.resolveImports(script, importsObject).