hypr - wtsi-hgi/hgi-web GitHub Wiki

hypr: A Hypermedia Type

Note that this is non-normative and has evolved from an earlier iteration of this paper, which provides the motivation for a new format and takes stock of the mechanisms that are available, the design constraints that have been imposed and the system that ultimately needs to be modelled.

To summarise:

  • We wish to represent an information system on a connected graph in a self-describing way.

  • We get a lot for free just off the back of HTTP[1] and the RESTful architecture[2]; this should be exploited to the fullest extent to avoid duplication of effort.

  • We strive for simplicity and regularity in our representation, using progressive enhancement and type inference as means to specificity and precision, where needed.

  • Serialisation will be offered only as a JSON[3] derivative. (Other serialisations, such as XML, would be possible, but are not deemed to be worth the effort.)

Once matured, a formal specification will be written and registered with the IANA, initially using the media type application/vnd.hypr, with the ultimate intent of standardisation to application/hypr.

General Structure

The general structure of a hypr document will be to bisect a vertex's representation into its edges and state. Edges are defined simply as key-value pairs, where the key denotes the link relation associated with IRI(s) as values. Likewise, state is a key-value store which comes with optional type definitions and semantics for linking.

Throughout the following definitions, ABNF[4] will be used to describe the grammar. Such definitions are not to be taken strictly, in the sense of whitespace invariance, character literals and undefined implications or derivations from host grammars (these are described within angle brackets).

For the purposes of linking, IRIs are assumed and URI templates[5] -- lifted on to IRIs -- are employed to denote parameterised edges. For convention's sake, these are defined as follows:

json-iri          = <JSON serialised IRI string>

json-iri-template = <JSON serialised IRI template string>

Thus we have the following general structure:

vertex = "{" edges [ "," state ] "}"

Note that state can be omitted for the sake of providing a minimal representation for symlinking.

Edges

A vertex must have at least a loop (a link to itself); otherwise, edges may be arbitrary, although some link relations carry reserved semantics within hypr documents, as discussed later:

edges           = DQUOTE "links" DQUOTE ":" edge-hash

edge-hash       = "{"
                    self-link 
                    *( "," cond-link )
                    *( "," state-link )
                    [ "," collection-link ]
                    *( "," rel-link )
                  "}"

self-link       = DQUOTE "self" DQUOTE ":" json-iri

cond-link       = DQUOTE cond-key DQUOTE ":" json-iri

cond-key        = "next" / "prev" / "base"

state-link      = DQUOTE state-key DQUOTE ":" link

state-key       = <JSON key matching state key>

collection-link = DQUOTE collection-key DQUOTE ":" json-iri-template

collection-key  = <JSON key matching state's collection key>

rel-link        = DQUOTE relation DQUOTE ":"
                  ( link / "[" link 1*( "," link ) "]" )

relation        = <JSON hash key>

link            = json-iri

Note that while we refer to vertices' edges at a technical level, the format calls them "links" for familiarity's sake.

When a vertex's loop differs from the request IRI, this should indicate a symbolic link. This ought to be picked up by the server, before reaching the client. How it responds to this inconsistency is implementation dependent; ultimately, however, the response should be the representation of the loop IRI. This can be done via an appropriate HTTP redirect (i.e., 303 See Other, or 302 Found), or by simply substituting the terminal vertex's representation in the same hop. Any state data or non-loop links in a symlink vertex should be ignored.

(Note that there is an argument for the vertex's loop to be a fully qualified IRL, rather than an arbitrary IRI. Part of its purpose is to encapsulate self-reference and an IRI may not provide enough information for dereferencing.)
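
As a rough illustration of the symlink rule, a server might compare the loop against the request IRI before responding. The sketch below assumes plain string comparison of IRIs (no normalisation) and chooses a 303 redirect; the name `resolveLoop` and the shape of its return value are hypothetical and not part of hypr.

```javascript
// Decide how to respond to a request, given the stored vertex
// (Hypothetical helper; IRIs are compared as plain strings)
function resolveLoop(vertex, requestIri) {
  var loop = vertex.links.self;

  if (loop === requestIri) {
    return { symlink: false, body: vertex };
  }

  // Symlink: any state or non-loop links are ignored; only the target matters
  return { symlink: true, status: 303, location: loop };
}
```

An implementation that prefers to serve the terminal representation in the same hop would look up `location` itself instead of returning a redirect.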

Otherwise, there are four classes of relation-IRI pair:

  • Conditional links are only relevant to conditional representations, per their definition in the previous section. Their semantics are discussed later, but are the responsibility of the server.

  • State links are associated by their relation name with arbitrary keys within the vertex's state. This allows arbitrary linking of state elements.

  • A vertex may have at most one collection associated with it. This is defined both as a link relation and matching state element, by key. This differs from the above in that the IRI must be templated and, provided the collection is mutable, implies the locus of POST requests.

  • Beyond these, arbitrary pairs can be defined, with the restriction of a single IRI per relation lifted. Note, however, that there are three reserved relations of this class, discussed later: docs, view and contract. Otherwise, how relations are to be interpreted is application dependent.

Note, for the latter, multiple edges under a particular link relation ought to be unique. This isn't mandated by the specification, but should be enforced by any implementation.
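
The rules above could be checked along the following lines. This is a minimal sketch rather than a complete validator: it only covers the mandatory loop, the single-IRI restriction on conditional links and the uniqueness of multiple edges under one relation. The name `checkLinks` and its messages are hypothetical.

```javascript
var CONDITIONAL = ['next', 'prev', 'base'];

// Returns a list of problems found in a "links" hash (empty if none)
function checkLinks(links) {
  var problems = [];

  // Every vertex needs a loop, which must be a single IRI string
  if (!links || typeof links.self !== 'string') {
    return ['Missing "self" link'];
  }

  Object.keys(links).forEach(function (rel) {
    var value = links[rel],
        seen  = Object.create(null);

    // Conditional links take exactly one IRI
    if (CONDITIONAL.indexOf(rel) !== -1 && typeof value !== 'string') {
      problems.push('"' + rel + '" must be a single IRI');
    }

    // Multiple edges under one relation ought to be unique
    if (Array.isArray(value)) {
      value.forEach(function (link) {
        // Foreign links are objects; compare them by their href
        var iri = typeof link === 'string' ? link : link && link.href;

        if (seen[iri]) {
          problems.push('Duplicate link under "' + rel + '": ' + iri);
        }
        seen[iri] = true;
      });
    }
  });

  return problems;
}
```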

Foreign Resource Interface

You need to build a system that is futureproof; it's no good just making a modular system. You need to realise that your system is just going to be a module in some bigger system to come, and so you have to be part of something else, and it's a bit of a way of life.

Tim Berners-Lee

The above method of defining edges is specific to resources within a hypr graph. It is minimal because hypr is designed to infer protocol semantics from its graphical structure. However, as part of the wider Internet, it would be useful to link with non-hypr resources. To this end, foreign resource links can be augmented with metadata to facilitate better client control. Specifically, answering the following questions:

  • What can be done with the resource? Presumably, it can be fetched, maybe it can also be updated or deleted. This corresponds with the HTTP Allow response header and, by encoding this, a client can provide appropriate interfaces. (By encoding this, rather than assuming the foreign server responds with a correct Allow header, we save an OPTIONS hop and also declare explicitly how a hypr resource can interact with a foreign one.)

  • What constraints apply to the resource, when reading or writing? For example, what media type should a client accept, or how should a request be encoded, etc.?

link             =/ foreign-link

foreign-link     =  "{"
                      DQUOTE "href" DQUOTE ":" json-iri
                      [ "," DQUOTE "allow" DQUOTE ":" foreign-allow ]
                      [ "," DQUOTE "accept" DQUOTE ":" foreign-accept ]
                      [ "," DQUOTE "content" DQUOTE ":" foreign-content ]
                    "}"

foreign-allow    =  http-method / "[" http-method 1*3( "," http-method ) "]"

http-method      =  DQUOTE ( "GET" / "POST" / "PUT" / "DELETE" ) DQUOTE

foreign-accept   =  media-range
                 =/ "{"
                      DQUOTE "type" DQUOTE ":" media-range
                      [ "," DQUOTE "charset" DQUOTE ":" accept-charset ]
                      [ "," DQUOTE "encoding" DQUOTE ":" accept-encoding ]
                      [ "," DQUOTE "language" DQUOTE ":" accept-language ]
                    "}"

media-range      =  <JSON serialised string, per RFC7231 §5.3.2>

accept-charset   =  <JSON serialised string, per RFC7231 §5.3.3>

accept-encoding  =  <JSON serialised string, per RFC7231 §5.3.4>

accept-language  =  <JSON serialised string, per RFC7231 §5.3.5>

foreign-content  =  content-type
                 =/ "{"
                      DQUOTE "type" DQUOTE ":" content-type
                      [ "," DQUOTE "encoding" DQUOTE ":" content-encoding ]
                    "}"

content-type     =  <JSON serialised string, per RFC7231 §3.1.1.1>

content-encoding =  <JSON serialised string, per RFC7231 §3.1.2.2>

If the allow value is omitted, then GET is assumed. Note that, when allowing multiple methods, the JSON array is implied (but not specified) as having unique elements.

The accept and content values mimic a relevant subset of the respective Accept-* and Content-* HTTP headers. If they are omitted, or partially specified, then a client is free to defer to its defaults.

The accept value implies an allowance for GET. If the allow value exists but does not include GET, then the accept value should be ignored. Similarly, a content value implies PUT and/or POST; in which case, the allow value must contain at least one of these, else it too is to be ignored.

A resource that only allows DELETE seems an unlikely scenario, to say the least! Nonetheless, it is specifiable, for the sake of completeness. OPTIONS and HEAD support are not required.
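
A client might normalise a foreign link before use, applying the defaults and "ignore" rules just described. The following is a sketch under assumptions: `normaliseForeignLink` is a hypothetical helper, it only accepts the object form of a foreign link, and it reads the content rule as being checked against the allow value.

```javascript
function normaliseForeignLink(link) {
  // allow may be a single method or an array; GET is assumed if omitted
  var allow  = link.allow === undefined ? ['GET'] : [].concat(link.allow),
      result = { href: link.href, allow: allow },
      writes = allow.indexOf('PUT') !== -1 || allow.indexOf('POST') !== -1;

  // accept is only meaningful when GET is allowed
  if (link.accept !== undefined && allow.indexOf('GET') !== -1) {
    result.accept = link.accept;
  }

  // content is only meaningful when PUT and/or POST is allowed
  if (link.content !== undefined && writes) {
    result.content = link.content;
  }

  return result;
}
```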

State

Vertex state is represented as a key-value store. At its simplest, keys are just associated with values. However, a variety of type level annotations can be added to enforce increasingly stronger guarantees.

state        =  DQUOTE "state" DQUOTE ":" state-hash

state-hash   =  "{" element *( "," element ) "}"

element      =  DQUOTE state-key DQUOTE ":" element-data

state-key    =  <JSON hash key>

element-data =  element-val
             =/ "{"
                  DQUOTE "value" DQUOTE ":" element-val ","
                  type-def
                "}"

element-val  =  <Any valid JSON> / collection
             ;  n.b., "Any valid JSON" also covers arrays and objects

collection   =  "[" [ coll-val *( "," coll-val ) ] "]"
             ;  n.b., the JSON array must be homogeneous (or empty)

coll-val     =  json-string / vertex

json-string  =  <JSON string>

Note that the entire state section is optional, but if it exists, it must contain at least one element. (That is, an empty JSON object is not valid state.)

An element's value can be arbitrary, but should always match its type (if present). A state element without a type definition should be considered immutable and inherit its type from its JSON primitive type. This is clearly limiting, but allowed.

If there is a collection element -- i.e., where the state key matches a link relation with a templated IRI -- then its value must be one of two things:

  • Complex: A (potentially empty) JSON array of fully embedded hypr representations, corresponding to the subordinate resources.

  • Simplex: A (potentially empty) JSON array of valid/existing subordinate resource names, per the template key as a means of dereference.

The depth at which collections are embedded (i.e., for collections of collections of...) must be set by the server; either globally or at a resource level. When the embedding depth reaches zero, then the representation should switch to the simplex representation. For example, a depth of one would embed the first level of subordinates in their entirety, but second level collections would be represented in simplex.

Server implementations may wish to specify a means for limitless embedding -- i.e., as collections are guaranteed to not contain non-trivial cycles, this amounts to embedding the entire subtree -- although this may not be a particularly useful feature! The depth setting may also be overridable on request (e.g., using a proprietary HTTP header). Also note that there is no duplex form, where simplex and complex representations are mixed.
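
The depth rule can be sketched as a recursive function over a server-side store. Everything about the store here is an assumption made for illustration: a hypothetical in-memory object keyed by IRI, where each node holds plain `state` and at most one `collection` (its key, IRI template and member names). The template expansion is deliberately naive and handles a single variable only.

```javascript
function represent(store, iri, depth) {
  var node   = store[iri],
      vertex = { links: { self: iri } },
      state  = {},
      coll   = node.collection;

  Object.keys(node.state || {}).forEach(function (key) {
    state[key] = node.state[key];
  });

  if (coll) {
    vertex.links[coll.key] = coll.template;

    state[coll.key] = coll.members.map(function (name) {
      // Depth exhausted: simplex form, i.e., just the subordinate's name
      if (depth <= 0) {
        return name;
      }

      // Naive expansion of one {variable}; not a full URI template expansion
      var child = coll.template.replace(/\{[^}]+\}/, encodeURIComponent(name));
      return represent(store, child, depth - 1);
    });
  }

  // An empty state section is illegal, so omit it altogether
  if (Object.keys(state).length) {
    vertex.state = state;
  }

  return vertex;
}
```

With a depth of one, the first level of subordinates is embedded in full and their own collections fall back to arrays of names; type definitions are left out of this sketch.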

Clearly the complex form necessitates subordinate loops being included, hence providing their means of dereference. However, as each subordinate is necessarily of the same type, this would be at the expense of duplicating information. While the bandwidth impact of doing so may be mitigated by HTTP compression, this redundantly complicates the representation. To resolve this -- and the related problem of not supplying enough type information for simplex embedding -- the subordinate type is lifted out from the embedded state and represented in the collection's subtype.

The exact mechanics of this are discussed in the next section, but the following serve as an example:

Simplex

{
  "links": {
    "self":       "/people",
    "collection": "/people/{id}"
  },
  "state": {
    "collection": {
      "value": ["foo", "bar", "quux"],
      "type": {
        "primitive": "collection",
        "subtype": {
          "id": {
            "primitive": "text",
            "mutable":   false
          },
          "name": {
            "primitive": "text",
            "label":     "Full Name"
          }
        }
      }
    }
  }
}

Note that the simplex form can omit the type definition entirely -- the fact that it's a collection can be inferred from the templated IRI -- making it impossible to POST to the collection and forcing any client to assume type information from its JSON primitives. This gives a minimal representation which may be appropriate in read-only scenarios.

Complex

{
  "links": {
    "self":       "/people",
    "collection": "/people/{id}"
  },
  "state": {
    "collection": {
      "value": [
        {
          "links": {
            "self": "/people/foo"
          },
          "state": {
            "id":   "foo",
            "name": "Joe Bloggs"
          }
        },
        {
          "links": {
            "self": "/people/bar"
          },
          "state": {
            "id":   "bar",
            "name": "President Business"
          }
        },
        {
          "links": {
            "self": "/people/quux"
          },
          "state": {
            "id":   "quux",
            "name": "Darth Vader"
          }
        }
      ],
      "type": {
        "primitive": "collection",
        "subtype": {
          "id": {
            "primitive": "text",
            "mutable":   false
          },
          "name": {
            "primitive": "text",
            "label":     "Name"
          }
        }
      }
    }
  }
}

Here, collection has been chosen as the collection element key for clarity. However it can be arbitrary and should be chosen to avoid collisions with other state elements while, ideally, remaining meaningful (e.g., the same as the IRI's basename would probably be appropriate; people in this example seems fitting).

When collections are only minimally embedded into vertices (i.e., the simplex case, when only enough information is present to dereference) it is clearly inapplicable for key queries to descend into them. However, the presence of a collection should always be transparent to the query (i.e., it is always present in the representation, at all embedding depths).

Type Definitions

Gradual typing affords more than just the ability to reason about data, in terms of its meaning. A suitably rich type system -- when exposed to a client -- could aid in generating an appropriate user interface. For example, if age were defined to be an integer between 0 and 120, a client could render a slider widget rather than a plain text input.

JSON only admits a handful of primitive types: null, Unicode strings (usually encoded in UTF-8, but this is not a requirement), numbers (again, usually but not necessarily double precision floating point) and Booleans; with syntax for arrays (indexed collections) and objects (hashed collections). To meet the complexity of our typing system, we must therefore overload what we have by subtyping (parameterising) wherever necessary:

Primitive     JSON Primitive   Subtype
Bottom        Null
Textual       String           (Pattern; IRI; E-Mail)
Numeric       Number           (Set; Range; Stepping)
Temporal      String           Date-Time
Logical       Boolean
Enumeration   Collection
Raw           String           Media Type + Encoding

Note that parenthetical subtypes are considered optional enhancements; for example, a string could come with a regular expression. Furthermore, enumerations can be represented as either an array or an object; the latter's key-value model affording greater fidelity.

Thus two fields are required to represent type efficiently: primitive and subtype. The former is limited to JSON primitives: null, text (for strings), number and bool. An optional dependent subtype can also be specified:

  • Textual subtypes:

    • Temporal data as datetime, per ISO 8601[6].
    • Raw data as its IANA media type, per [7], with its encoding scheme (if omitted, base64 is assumed).
    • Additionally, a regular expression can be used as a subtype to specify pattern validation, prepended with a slash character. (The regexp engine is implementation dependent, but must minimally support JavaScript's capabilities[8].)
    • The predefined patterns iri and email are also defined for IRIs[9] and e-mail addresses[10], respectively. This is largely for the client's benefit.
  • Numeric subtypes:

    • Sets can be specified as int or float.
    • Ranges can be specified using open, half-open or closed interval notation; an empty string at either terminal can be used to denote extremities (i.e., ±∞).
    • Stepping can also be specified with a range by appending it after a slash character. (Stepping must be with respect to a finite interval minimum.)

If a subtype is not specified, then the semantics of the JSON primitive type take effect. Note also that it's unlikely that a null type would ever be needed in practice (as opposed to just omitting the data element from the state).

The following serve to illustrate various examples:

Plain text:

primitive: text

PNG image:

primitive: text
subtype:   image/png

IRI:

primitive: text
subtype:   iri

Integers:

primitive: number
subtype:   int

Even numbers between 0 and 10 (inclusive):

primitive: number
subtype:   int[0,10]/2

Positive odd numbers:

primitive: number
subtype:   int[1,)/2

Tenths from 0 up to, but excluding, 1:

primitive: number
subtype:   float[0,1)/0.1

Normalised Gaussian integers [for example's sake, only!]:

primitive: text
subtype:   /^(0|-?([1-9]\d*|(([1-9]\d*[+-])?([2-9]?|[1-9]\d+)i)))$
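
Since the regular expression engine must at least match JavaScript's, a client could test a pattern subtype roughly as follows. This is a sketch: `matchesPattern` is a hypothetical name, and it assumes the pattern is used exactly as written after the leading slash, with no implicit anchoring or flags.

```javascript
function matchesPattern(value, subtype) {
  // Pattern subtypes are distinguished by their leading slash
  if (typeof value !== 'string' || subtype.charAt(0) !== '/') {
    return false;
  }

  return new RegExp(subtype.slice(1)).test(value);
}

// The Gaussian integer subtype from above, as a JavaScript string literal
var gaussian = '/^(0|-?([1-9]\\d*|(([1-9]\\d*[+-])?([2-9]?|[1-9]\\d+)i)))$';

matchesPattern('3+4i', gaussian); // accepted
matchesPattern('1+1i', gaussian); // rejected: the normalised form is 1+i
```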

The range of subtypes may be extended, whereas primitive types are obviously fixed. The above is formally defined as follows:

type        =  DQUOTE primitive DQUOTE
            =/ enumeration

primitive   =  "null"
            =/ "text"
            =/ "number"
            =/ "bool"

enumeration =  "{" kv-pair *( "," kv-pair ) "}"

kv-pair     =  DQUOTE key DQUOTE ":" json-string

key         =  <JSON hash key>

subtype     =  DQUOTE textual-st DQUOTE
            =/ DQUOTE numeric-st DQUOTE

textual-st  =  "datetime"
            =/ media-type [ ";" encoding ]
            =/ "iri"
            =/ "email"
            =/ "/" regexp

media-type  =  <IANA registered media type>

encoding    =  "base64" / "percent" / "raw"
            ;  base64 encoding
            ;  percent encoding, per RFC1738
            ;  no encoding

regexp      =  <Serialised regular expression>

numeric-st  =  set [ range ]

set         =  "int" / "float"

range       =  interval [ "/" stepping ]

interval    =  ( "[" / "(" ) [ min ] "," [ max ] ( "]" / ")" )

min         =  <Serialised numeric expression>
            ;  <= max

max         =  <Serialised numeric expression>
            ;  >= min

stepping    =  <Serialised numeric expression>
            ;  > 0 && finite min && <= max - min
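
To make the numeric subtype grammar concrete, here is a sketch of how a client might parse and apply it. The function names are hypothetical, min, max and stepping are assumed to be plain decimal literals rather than general numeric expressions, and a small tolerance is used for stepping because of floating point rounding.

```javascript
function parseNumericSubtype(subtype) {
  var m = /^(int|float)(?:([\[(])([^,]*),([^\])]*)([\])])(?:\/(.+))?)?$/.exec(subtype);

  if (!m) {
    throw new Error('Not a numeric subtype: ' + subtype);
  }

  var parsed = {
    set:     m[1],
    min:     m[3] ? Number(m[3]) : -Infinity,  // empty terminal: unbounded
    max:     m[4] ? Number(m[4]) : Infinity,
    minOpen: m[2] !== '[',
    maxOpen: m[5] !== ']',
    step:    m[6] ? Number(m[6]) : null
  };

  // Stepping must be with respect to a finite interval minimum
  if (parsed.step !== null && !isFinite(parsed.min)) {
    throw new Error('Stepping needs a finite minimum: ' + subtype);
  }

  return parsed;
}

function matchesNumericSubtype(value, subtype) {
  var t = parseNumericSubtype(subtype), steps;

  if (typeof value !== 'number' || !isFinite(value)) { return false; }
  if (t.set === 'int' && Math.floor(value) !== value) { return false; }
  if (t.minOpen ? value <= t.min : value < t.min) { return false; }
  if (t.maxOpen ? value >= t.max : value > t.max) { return false; }

  if (t.step !== null) {
    // Count steps from the minimum; allow for floating point error
    steps = (value - t.min) / t.step;
    if (Math.abs(steps - Math.round(steps)) > 1e-9) { return false; }
  }

  return true;
}
```

For instance, `matchesNumericSubtype(4, 'int[0,10]/2')` holds, whereas 5 fails the stepping and 1 falls outside the half-open interval of `float[0,1)/0.1`.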

To account for homogeneous collections and optionality, a type can be quantified. The same syntax is used per regular expression quantifiers, with the same aliases for "at most one" (i.e., optional), "at least one" and "any amount". The default quantifier, if not specified, is "exactly one".

quant      =  DQUOTE quantifier DQUOTE

quantifier =  "{" exact "}"            ; exact
           =/ "{" min "," [ max ] "}"  ; range
           =/ "?"                      ; at most one {0,1}
           =/ "+"                      ; at least one {1,}
           =/ "*"                      ; any amount {0,}

exact      =  <Serialised integer>
           ;  > 0

min        =  <Serialised integer>
           ;  >= 0 && < max

max        =  <Serialised integer>
           ;  > min

When only a single value is quantified ({1} or {0,1}), then the data should be represented as a scalar value (or omitted in the optional case). Otherwise, data should be represented as a JSON array.
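
The quantifier rules might be applied as in the following sketch, which covers non-collection primitives only. `parseQuantifier` and `satisfiesQuantifier` are hypothetical names; an omitted value is modelled as `undefined`.

```javascript
function parseQuantifier(q) {
  if (q === undefined) { return { min: 1, max: 1 }; }  // default: exactly one
  if (q === '?') { return { min: 0, max: 1 }; }
  if (q === '+') { return { min: 1, max: Infinity }; }
  if (q === '*') { return { min: 0, max: Infinity }; }

  var m = /^\{(\d+)(,(\d*))?\}$/.exec(q), min, max;

  if (!m) {
    throw new Error('Bad quantifier: ' + q);
  }

  min = Number(m[1]);
  max = m[2] === undefined ? min : (m[3] === '' ? Infinity : Number(m[3]));

  // An exact quantity must be positive; a range needs min < max
  if ((m[2] === undefined && min === 0) || (m[2] !== undefined && min >= max)) {
    throw new Error('Bad quantifier: ' + q);
  }

  return { min: min, max: max };
}

function satisfiesQuantifier(data, q) {
  var range = parseQuantifier(q);

  // {1} and {0,1}: scalar data, which may be omitted only if optional
  if (range.max === 1) {
    return data === undefined ? range.min === 0 : !Array.isArray(data);
  }

  // Anything else: a JSON array of a suitable length
  return Array.isArray(data) &&
         data.length >= range.min &&
         data.length <= range.max;
}
```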

A default value can also be specified, which applies to scalar data. Thus, if a homogeneous collection is quantified, the default value represents that of individual elements. (Note that this is largely a UX consideration, but still must be enforced if data is omitted from a request.) If quantified as mandatory and no default is specified, then the server should raise a 4xx error when data is omitted from a request.

By default, a typed data element is mutable, whereas an untyped one isn't. A typed element can be made immutable. In this case, it is implied that the data is generated -- or in some way handled -- by the server. Typed elements should come with a label (regardless of their mutability) to provide UX context to the client (cf. HTML's label element); non-typed elements or label-less elements should probably be hidden from user consumption.

Note that immutability does not imply necessity, but any quantification should be ignored in requests, as it cannot be satisfied by a client.

The above enables rich typing on scalar data and homogeneous collections thereof, which are embedded within the vertex's state. However, it is also necessary to type subordinate vertex (i.e., heterogeneous) collections. These come in two flavours: genuine subordinates and linked collections. Both can be accommodated with a new primitive type of collection and a suitably defined subtype, which we introduced earlier.

Note that for collection primitives, the quantifier defaults to * -- any amount -- rather than {1}, in the sense that it applies as the quantification on the collection's size. (As such, exact quantifiers are representative of fixed length collections where all items must be present; a tuple. It is unlikely that such collections would be useful in real life!) Furthermore, the default value for collection primitives is not applicable and, if present, should be ignored.

  • Genuine subordinates can be typed against a hash of primitive type definitions, recursively per the above.

  • Linked collections can be typed by IRI to its collection head. Note that the protocol semantics against such a collection imply linking, rather than creation; i.e., a POST to a linked collection will create a link vertex that refers back to the respective item in the genuine collection, per the specified IRI.

primitive      =/ "collection"

subtype        =/ collection-st

collection-st  =  json-iri
               =/ "{" coll-type *( "," coll-type ) "}"

coll-type      =  DQUOTE coll-key DQUOTE ":" definition

coll-key       =  <JSON key matching collection's state key>

Thus the final type definition is specified as follows:

type-def   = DQUOTE "type" DQUOTE ":" definition

definition = "{"
               major-type
               [ "," minor-type ]
               [ "," label ]
               [ "," quantity ]
               [ "," default ]
               [ "," mutable ]
             "}"

major-type = DQUOTE "primitive" DQUOTE ":" type

minor-type = DQUOTE "subtype" DQUOTE ":" subtype
           ; n.b., subtype depends on primitive

label      = DQUOTE "label" DQUOTE ":" <JSON string>

quantity   = DQUOTE "quantity" DQUOTE ":" quant

default    = DQUOTE "default" DQUOTE ":" <Scalar JSON data>

mutable    = DQUOTE "mutable" DQUOTE ":" ( "true" / "false" )

Programmatic Contracts

Programmatic precondition contracts can be specified on each vertex by using the contract link relation and supplying the IRI(s) to the code (there is no facility to inline the code within the representation), per the Fielding constraint of "Code on Demand". There is no presumption on the language (i.e., JavaScript is not mandated) or mechanisms it employs to specify contracts for the client to execute. Despite this agnosticism, a common interface is still required:

  • The contract should be callable, taking the vertex state as its primary input.

  • The contract should be a predicate and, ideally, also provide a channel for signalling errors back to the caller.

  • If the environment allows it, contracts should be pure functions (i.e., not allowing side effects); otherwise, the client should at least protect state from mutation and ideally sandbox all contract execution appropriately.

  • Multiple contracts per vertex are allowed; in which case, their results are conjoined to give the final outcome. (State is immutable, per the above, so order doesn't matter.)

  • Contracts should be executed after type checking.

  • Clients are free to ignore contracts. However, servers are obliged to execute them before processing requests. Either way, it is implied that the code is the same for both parties.

For example, if state is passed that includes numeric height and weight fields, using JavaScript with CommonJS modules and the "error first callback" convention, a contract could be coded as follows:

// True when x lies strictly between min and max
var between = function(x, min, max) {
  return x > min && x < max;
};

module.exports = function(state, callback) {
  var err    = [],
      passed = false,
      bmi;

  // Make sure state is immutable
  // (This should be done by the client, but is here for illustration)
  if (!Object.isFrozen(state)) {
    Object.freeze(state);
  }

  // Reject invalid values
  // (This should be done in type checking)
  if (!between(state.height, 0.5, 2.5)) {
    err.push(new Error('Height should be between 0.5 and 2.5m'));
  }

  if (!between(state.weight, 0, 200)) {
    err.push(new Error('Weight should be between 0 and 200kg'));
  }

  // Predication on BMI: weight (kg) divided by the square of height (m)
  // (Comments on the validity of BMI can be forwarded to /dev/null)
  if (!err.length) {
    bmi    = state.weight / (state.height * state.height);
    passed = between(bmi, 25, 30);

    if (!passed) {
      err.push(new Error('Abnormal BMI'));
    }
  }

  // Return: error-first callback, with the predicate's outcome second
  callback(passed ? null : err, passed);
};

The contract resource would need to signal how a client is to interpret and execute it within its response. Its Content-Type header would be appropriate, providing it can specify the necessary fidelity (e.g., application/javascript can't currently be parameterised to indicate CommonJS modules). In the event that such information cannot be conveyed, the resource's media type must instead be parameterised to express the interface for the client (see later).
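
Where a vertex has several contracts, a client or server could conjoin them along these lines. This is only a sketch: `runContracts` is a hypothetical name, contracts are assumed to follow the "error first callback" convention with errors passed as an array (or a single Error), each contract is assumed to call back exactly once, and the state is protected with a shallow freeze of a copy.

```javascript
function runContracts(contracts, state, done) {
  // Protect state from mutation (shallow copy, shallow freeze)
  var frozen  = Object.freeze(Object.assign({}, state)),
      pending = contracts.length,
      errors  = [];

  if (pending === 0) {
    return done(null, true);
  }

  contracts.forEach(function (contract) {
    contract(frozen, function (err, passed) {
      if (err) {
        errors = errors.concat(err);
      } else if (!passed) {
        errors.push(new Error('Contract failed'));
      }

      // Conjunction: the outcome is true only if every contract passed
      pending -= 1;
      if (pending === 0) {
        done(errors.length ? errors : null, errors.length === 0);
      }
    });
  });
}
```

Because the outcome is a conjunction over immutable state, the order in which the contracts call back does not affect the result.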

Human Consumption

Of course, while self-describing data -- as far as machines are concerned -- is the aim, in the context of an API there is some expectation of human interaction. To facilitate this, two link relations on a vertex have reserved semantics: docs and view.

The first specifies references to human readable documentation for each vertex. This can cover anything that is felt appropriate, but should probably focus on application semantics at a vertex and state level. This will enable human agents to more efficiently process and navigate the API, without having to manually parse raw JSON.

In addition to this, presentation templates can be associated with a vertex such that its state can be rendered in a more palatable way, rather than deferring to (or in concert with) the semantics of the data's type definitions. While this is clearly a cosmetic consideration, which adds burden on development and maintenance, it would facilitate customisability and, as UX is very important to human users, adoption.

How these resources are represented and interpreted by the clients is arbitrary and out of the scope of this specification. They are also to be considered optional and the client is free to ignore them.

(n.b., These links are to be encoded into the representation, rather than using the Link response header, for the sake of encapsulation, should it be taken out of the context of HTTP.)

Conditional Representation

If the vertex state is filtered using conditional queries, then the vertex's loop must reflect this. For example:

{
  "links": {
    "self": "/?select=id"
  }
}

(n.b., For the sake of brevity, the state section in this and all other examples within this section have been omitted.)

In the case of slicing, next and prev relations must be inserted by the server, where appropriate, maintaining the (normalised) values of any other queries. It is the server's responsibility to calculate appropriate slices to remain within a collection's bounds.

For example:

{
  "links": {
    "self": "/people?q=(dept=hr)&slice=10:20",
    "next": "/people?q=(dept=hr)&slice=20:30",
    "prev": "/people?q=(dept=hr)&slice=0:10"
  }
}

Note that, clearly, collection vertices must be indexed consistently for slicing to work. How this is done is up to the server implementation (transparent to any client), but should ultimately rely on static vertex metadata, rather than something internal to its state (e.g., creation time or IRI, as opposed to some arbitrary field).
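
A server could derive the slicing links as sketched below. The helper `sliceLinks` is hypothetical; it assumes that a slice is a half-open start:end pair (as the example above suggests), that the requested slice is already within bounds, and that any other queries have been normalised into a single query string beforehand.

```javascript
function sliceLinks(path, query, start, end, total) {
  var size   = end - start,
      prefix = path + '?' + (query ? query + '&' : ''),
      links  = { self: prefix + 'slice=' + start + ':' + end };

  // Only offer a next slice if there is anything beyond this one
  if (end < total) {
    links.next = prefix + 'slice=' + end + ':' + Math.min(end + size, total);
  }

  // Only offer a previous slice if this one doesn't start at the beginning
  if (start > 0) {
    links.prev = prefix + 'slice=' + Math.max(start - size, 0) + ':' + start;
  }

  return links;
}
```

Calling `sliceLinks('/people', 'q=(dept=hr)', 10, 20, 100)` reproduces the links in the example above; near the end of a collection the next slice is clamped to the collection's size.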

The server may optionally -- but preferably -- include an unfiltered link on a conditional representation using the base relation:

{
  "links": {
    "self": "/products?select=id,desc&q=(and(cost>=50)(cost<=100))",
    "base": "/products"
  }
}

If the conditional representation contains no state, then the state section should be omitted from the response (rather than supplying an illegal, empty state). Alternatively, the server may respond with 204 No Content, if that makes more sense.

Conditional Updates

It would be nice to have an analogue to SQL's UPDATE...WHERE clause, but it wouldn't make sense within this model. That is because the querying operates at a resource level (i.e., vertices), rather than at the state level; it just uses state to drive the query.

For example, in a relational database, you cannot update a selection of tables based on their shared content in a single operation. You update a single table, based on record content (as records are homogeneous). While resources that are part of a collection will be homogeneous, in general this is not true and resources are otherwise designed to be independent key-value stores.

Thus, to conditionally update a collection of vertices, one would have to do so from within the client by filtering the collection (i.e., a conditional representation of the collection's parent vertex) and then applying a PUT (or DELETE) against each vertex in the result set, in turn. (That said, it would be advantageous for clients to implement such functionality.)

That notwithstanding, a PUT, POST or DELETE request against a conditional representation of a resource should act against the holistic resource (i.e., as though no query were applied). The caveat is that, if a selection query masks mandatory state, then the server must not allow the respective PUT and/or POST requests, as not enough information will be available for them to be correctly formed; it does so by removing them from the Allow response header in that instance.

Query Persistence

It may be useful to create resources that persist queries against another resource (cf. SQL views). This can save recreating complex logic and mitigates against any address length limits imposed by browsers or other clients.

(Note that the original HTTP specification does not specify a maximum IRI length, but allows the server to respond with a 414 Request-URI Too Long error code if it cannot handle it. The refreshed specification recommends that implementations support request lines of at least 8000 octets. In reality, as of writing, browsers and search engines support up to around 2KB.)

This can be done in a similar way to how symlinking is proposed: The self link must contain a non-trivial query component and, importantly, its logical address must differ from the request IRI; additionally, the representation should contain no state section (although this doesn't differ from a query that returns no results).

For example, say the representation of a resource at /depts/hr was as follows:

{
  "links": {
    "self": "/people?q=(dept=hr)"
  }
}

The server can then "do the right thing" and serve the appropriate conditional representation. (Ideally, this would not induce an HTTP redirect.) Furthermore, persistent query resources (PQRs) should be transparent to additional queries -- i.e., each acts as though it were a normal hypr resource -- effectively making them chainable by conjunction.

Continuing the example, a resource at /depts/hr/managers may be represented as:

{
  "links": {
    "self": "/depts/hr?q=(not(staff=null))&select=name,mail"
  }
}

Note that the edges to these vertices ought to be represented somewhere within the graph, otherwise the resources would be undiscoverable. For instance, the /depts/hr representation could have a managers link relation that points to /depts/hr/managers.

Thus, links in a PQR should augment (and override) those in the representation it's querying against. This also implies that PQRs can ultimately inherit the operations implied against them from their base representation, the semantics of which should follow the previous section.
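To make the chaining concrete, the following sketch walks a chain of PQRs back to its base resource, collecting the queries to be applied in conjunction. It assumes the stored representations are available in a dictionary keyed by address (a hypothetical stand-in for the server's own lookup) and it compares only the path part of each IRI when deciding whether the logical address differs.

```python
from urllib.parse import parse_qsl, urlsplit


def is_pqr(request_iri, representation):
    """Self link has a query, points elsewhere, and there is no state section."""
    target = urlsplit(representation["links"]["self"])
    return (
        bool(target.query)
        and target.path != urlsplit(request_iri).path
        and "state" not in representation
    )


def resolve(request_iri, representations):
    """Return (base address, queries), with the innermost query first."""
    queries, iri, seen = [], request_iri, set()
    while iri in representations and is_pqr(iri, representations[iri]):
        if iri in seen:
            raise ValueError("cyclic persistent query: " + iri)
        seen.add(iri)
        target = urlsplit(representations[iri]["links"]["self"])
        queries.insert(0, parse_qsl(target.query))
        iri = target.path
    return iri, queries


# The two persistent queries from the examples above
pqrs = {
    "/depts/hr": {"links": {"self": "/people?q=(dept=hr)"}},
    "/depts/hr/managers": {
        "links": {"self": "/depts/hr?q=(not(staff=null))&select=name,mail"}
    },
}

base, queries = resolve("/depts/hr/managers", pqrs)
```

How the collected queries are then combined into a single conditional representation is left to the server; the sketch only recovers the base address and the ordered list of query components.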

Failure Representation

To represent HTTP errors (i.e., 4xx or 5xx status codes), we return a very simple representation consisting of the request's loop and a single, textual (but otherwise untyped and immutable) state element of error, which gives a human-readable description of the failure. The HTTP status code is not encoded into the representation as this -- amongst other metadata -- can be obtained directly from the response headers.

For example, presuming a 404 Not Found status:

{
  "links": { "self": "/this/is/not/the/resource/you/are/looking/for" },
  "state": { "error": "Resource not found." }
}
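A server-side helper for producing such a document could be as small as the following sketch. The function name is hypothetical, and the status code is deliberately left out of the body, per the above.

```python
import json


def failure(request_iri, description):
    """Build the failure representation: the request's loop plus an error."""
    return {
        "links": {"self": request_iri},
        "state": {"error": description},
    }


body = json.dumps(
    failure("/this/is/not/the/resource/you/are/looking/for", "Resource not found.")
)
```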

Media Type Parameters

The media type takes the following parameters:

  • charset (optional): Definitively specify the document's character encoding, overriding any internal definition (default to UTF-8).

  • qkeys (optional): Override the default query component key tuple (i.e., (select, q, slice)). All keys must appear, following the same ordering.

  • interface (optional): Specify any conditions on the client required to invoke contract code. TODO: This still needs more thought...

  • ext (optional): A comma delimited list of semantic and syntactic hypr extensions applicable to the resource. (Note that a registry of extensions and an acceptance policy will thus be needed in some capacity.)

For example:

application/vnd.hypr; charset=utf-8; qkeys=(k,v,slice)
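A client might read these parameters with something like the sketch below. It is deliberately naive: it splits on semicolons and does not handle quoted parameter values, and the shape of the returned dictionary is an assumption made for illustration.

```python
DEFAULT_QKEYS = ("select", "q", "slice")


def parse_media_type(header):
    """Split a hypr media type into its type and recognised parameters."""
    media_type, *raw_params = [part.strip() for part in header.split(";")]
    params = {}
    for raw in raw_params:
        if raw:
            key, _, value = raw.partition("=")
            params[key.strip().lower()] = value.strip()

    qkeys = DEFAULT_QKEYS
    if "qkeys" in params:
        qkeys = tuple(k.strip() for k in params["qkeys"].strip("()").split(","))
        if len(qkeys) != len(DEFAULT_QKEYS):
            raise ValueError("qkeys must name all three query keys, in order")

    ext = [e.strip() for e in params.get("ext", "").split(",") if e.strip()]
    return {
        "type": media_type.lower(),
        "charset": params.get("charset", "UTF-8"),
        "qkeys": qkeys,
        "ext": ext,
    }


parsed = parse_media_type("application/vnd.hypr; charset=utf-8; qkeys=(k,v,slice)")
```

The interface parameter is not interpreted here, as its semantics are still open.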

Formatting a Request

When a PUT or POST request is made, the request body should be a JSON object that corresponds with the respective resource's state section, collapsed to simple key-value pairs (i.e., with no additional type information). For example:

{
  "id":   1234,
  "name": "foo",
  "mail": "[email protected]"
}

That is, PUT and POST resources must accept application/json requests. The primitive types of the data values in the JSON request must match their definitions in the resource's representation (i.e., before "proper" type checking is performed). Resources may also, optionally, accept application/x-www-form-urlencoded requests, at the implementation's discretion.

A PUT or POST request must fail with a 400 Bad Request error if:

  • The request omits any state that is defined to be mandatory.
  • The request fails type checking in any other way.
  • Any programmatic contracts applied against the request return false.

If a PUT or POST request is made that attempts to change immutable or undefined data, then the server has the option to either:

  • Accept the request and transparently ignore any invalid state transitions, while applying anything that validates. While this would not put resources into an invalid state, it might be a little "passive aggressive" from a UX point of view!

  • Fail with a 4xx error. This is the better option, presuming clients are built in such a way as to minimise this occurrence. The error should probably again be the generic 400 Bad Request; however, 405 Method Not Allowed and 409 Conflict may be appropriate (although they run the risk of being confused with an unsupported method or an edit conflict, respectively).

Note that if data that is defined to be immutable is missing from the request, then that should be ignored by the server, regardless of the above policy. PUT requests must only apply to the requested resource and do not descend into embedded resources, in the presence of a collection.
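The two policies can be compared side by side in a small sketch. Several things here are assumptions: a key counts as "undefined" if it is absent from the current state, success is reported as 200, and keys omitted from the request simply keep their current values.

```python
def apply_put(current, request, immutable, strict=True):
    """Return (status, new_state) under the strict or lenient policy."""
    invalid = [
        key
        for key, value in request.items()
        if key not in current or (key in immutable and value != current[key])
    ]
    if invalid and strict:
        # Preferred policy: refuse the whole request
        return 400, current
    new_state = dict(current)
    for key, value in request.items():
        if key not in invalid:
            new_state[key] = value
    return 200, new_state


current = {"id": 1, "name": "foo"}

strict_result = apply_put(current, {"id": 2, "name": "bar"}, {"id"})
lenient_result = apply_put(current, {"id": 2, "name": "bar"}, {"id"}, strict=False)
```

Under either policy, a request that leaves out the immutable `id` altogether is accepted, since missing immutable data is ignored.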

DELETE requests do not require a request body.

PATCH[11]

It may be useful for resources that support PUT requests to also support PATCH requests, to modify state by diff. As the underlying representation is JSON based, this would necessitate a JSON diff'ing method, of which there are currently two standards:

  • JSON Patch[12], which delimits a sequence of operations to apply to a JSON document, not dissimilar to a traditional textual diff.

  • JSON Merge Patch[13], which applies changes based on structural differences between the two JSON documents.

Support for this -- including whichever method to employ -- is left unto the whim of server developers, where the diff is applied to a resource's virtual state representation. (That is, the representation's state section, collapsed to simple key-value pairs, with no additional type information, per the above.)
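For illustration, here is how the JSON Merge Patch option might look when applied to the virtual state. The `merge_patch` function follows the algorithm given in RFC 7386; the `collapse` helper is an assumption, treating any object with a "value" key as a full-form state element.

```python
def collapse(state):
    """Reduce a state section to simple key-value pairs (no type information)."""
    return {
        key: element["value"]
        if isinstance(element, dict) and "value" in element
        else element
        for key, element in state.items()
    }


def merge_patch(target, patch):
    """Apply a JSON Merge Patch, returning a new document."""
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            # null in the patch removes the member
            result.pop(name, None)
        else:
            result[name] = merge_patch(result.get(name), value)
    return result


state = {
    "id": 123,
    "name": {"value": "Joe Bloggs", "type": {"primitive": "text", "label": "Name"}},
    "aliases": {"value": ["jb123"], "type": {"primitive": "text"}},
}

patched = merge_patch(collapse(state), {"name": "Jo Bloggs", "aliases": None})
```

The result would still need to pass the same validation as a PUT before being committed.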

LINK and UNLINK[14]

Edges ought to be stable, insofar as they're either set, derived from state or generated by the server. It doesn't make sense for an arbitrary user to create or remove edges via the proposed LINK and UNLINK methods. The only time when direct edge manipulation makes sense is during resource/schema definition, which would be outside the remit of end users.

As such, LINK and UNLINK request support is not defined.

Examples

Minimal:

{
  "links": {
    "self": "/"
  }
}

Simple, untyped state, with linked data:

{
  "links": {
    "self":    "/people/foo",
    "manager": "/people/bar"
  },
  "state": {
    "name":    "Joe Bloggs",
    "mail":    "[email protected]",
    "manager": "President Business"
  }
}

Typed state:

{
  "links": {
    "self": "/people/foo"
  },
  "state": {
    "id": 123,
    "name": {
      "value": "Joe Bloggs",
      "type": {
        "primitive": "text",
        "label":     "Name"
      }
    },
    "aliases": {
      "value": ["jb123"],
      "type": {
        "primitive": "text",
        "label":     "Nicknames",
        "quantity":  "{1,}"
      }
    },
    "photo": {
      "value": "/9j/4AAQSk...",
      "type": {
        "primitive": "text",
        "subtype":   "image/jpeg;base64",
        "label":     "Avatar",
        "quantity":  "?"
      }
    },
    "dob": {
      "value": "1981-09-25",
      "type": {
        "primitive": "text",
        "subtype":   "datetime",
        "label":     "Date of Birth"
      }
    },
    "bookmarks": {
      "value": [
        "http://www.sanger.ac.uk",
        "https://en.wikipedia.org/wiki/Main_Page"
      ],
      "type": {
        "primitive": "text",
        "subtype":   "iri",
        "label":     "Favourites",
        "quantity":  "*",
        "default":   "http://www.sanger.ac.uk"
      }
    }
  }
}

Untyped, simplex collection:

{
  "links": {
    "self":       "/people",
    "collection": "/people/{id}"
  },
  "state": {
    "collection": ["foo", "bar", "quux"]
  }
}
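A client could turn this representation into member addresses along the following lines. The sketch assumes the collection link is a template with a single simple variable and that each collection value is substituted into it; only that simplest form of URI template expansion is handled.

```python
import re
from urllib.parse import quote


def member_iris(representation):
    """Expand the collection link template once per collection member."""
    template = representation["links"]["collection"]
    iris = []
    for member in representation["state"]["collection"]:
        encoded = quote(str(member), safe="")
        # Substitute the first (and assumed only) {variable} in the template
        iris.append(re.sub(r"\{[^{}]+\}", lambda _match: encoded, template, count=1))
    return iris


people = {
    "links": {"self": "/people", "collection": "/people/{id}"},
    "state": {"collection": ["foo", "bar", "quux"]},
}

iris = member_iris(people)
```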

See earlier for an example of a typed simplex and complex collection.

Typed, simplex collection, with documentation and subordinate contract relations:

{
  "links": {
    "self":       "/people",
    "docs":       "/docs/people",
    "contract":   "/contracts/checkPerson",
    "collection": "/people/{person}"
  },
  "state": {
    "collection": {
      "value": ["foo", "bar", "quux"],
      "type": {
        "primitive": "collection",
        "subtype": {
          "person": {
            "primitive": "text",
            "mutable":   false
          },
          "name": {
            "primitive": "text",
            "label":     "Full Name"
          },
          "email": {
            "primitive": "text",
            "subtype":   "email",
            "label":     "E-Mail Address(es)",
            "quantity":  "*"
          },
          "dob": {
            "primitive": "text",
            "subtype":   "datetime",
            "label":     "Date of Birth",
            "quantity":  "?"
          }
        }
      }
    }
  }
}

Paginated linked collection, with a simple foreign resource:

{
  "links": {
    "self":   "/departments/hr?slice=3:6",
    "prev":   "/departments/hr?slice=:3",
    "next":   "/departments/hr?slice=6:9",
    "base":   "/departments/hr",
    "people": "/departments/hr/{person}",
    "logo": {
      "href":   "/assets/logo.png",
      "accept": "image/png"
    }
  },
  "state": {
    "id": "hr",
    "description": {
      "value": "Human Resources",
      "type": {
        "primitive": "text",
        "label":     "Department Name"
      }
    },
    "people": {
      "value": ["foo", "bar", "quux"],
      "type": {
        "primitive": "collection",
        "subtype":   "/people",
        "label":     "Staff",
        "quantity":  "+"
      }
    }
  }
}
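The pagination links in this example follow a regular pattern, which a server might generate as sketched below. The slice semantics are inferred from the example alone (a start:stop pair, with an omitted start meaning the beginning), and the optional `total` argument is an assumption used to suppress the next link on the last page.

```python
def slice_links(base, start, stop, total=None):
    """Build self, base, prev and next links for one page of a collection."""
    size = stop - start
    links = {"self": f"{base}?slice={start}:{stop}", "base": base}
    if start > 0:
        prev_start = max(0, start - size)
        # An empty start is used for the first page, as in the example
        links["prev"] = f"{base}?slice={prev_start or ''}:{start}"
    if total is None or stop < total:
        links["next"] = f"{base}?slice={stop}:{stop + size}"
    return links


page = slice_links("/departments/hr", 3, 6)
```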

Frequently Asked Questions

What does "hypr" stand for?

Originally, it didn't stand for anything; it just (according to its author) looked and sounded cool. Since then, we've justified it with a recursive backronym: "hypr yields pluripotent representations".

Why is there no versioning?

It's not uncommon for document formats to embed within them an indication of the specification they follow, so that they can be consumed appropriately. hypr documents lack this because:

  • Documents are meant to be representations of vertices and the specification version is tangential to this.

  • Representations are designed to be as lightweight as possible and cramming a version string in there complicates things, albeit trivially.

  • While nothing precludes future changes to the specification, it has been designed to have a broad remit. Nonetheless, should changes occur, these ought to manifest themselves as extensions (signalled via a resource's media type parameter). Only under critical circumstances will features be deprecated and this will apply retroactively.

Why not namespace or use sigils with hypr-specific link relations?

Using sigils on link relations, to distinguish them from arbitrary relations, is the approach that HAL[15] and JSON-LD[16] take; that is, prepending their semantic relations with an underscore or commercial at symbol, respectively. An elaboration of this would be to use a more developed namespacing scheme, such as hypr.self.

This would detract from the simplicity of the specification and imply that the semantics of hypr-specific link relations are somehow different to generic relations. This is not the case: the relations have been chosen specifically to be meaningful in a way that should not cause conflicts.

(Note also that registered link relations are specified[17] as being a string that starts with a lowercase alphabetic character, followed by any number of lowercase alphabetic characters, digits, dots and dashes. While namespacing is still possible, it would specifically forbid the likes of @self or _contract, etc.)
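The registered relation grammar is easy to check mechanically. The sketch below encodes it as a regular expression (a lowercase letter, then lowercase letters, digits, dots or dashes); the function name is hypothetical.

```python
import re

# reg-rel-type per Web Linking: LOALPHA *( LOALPHA | DIGIT | "." | "-" )
REG_REL_TYPE = re.compile(r"[a-z][a-z0-9.-]*")


def is_registered_style(relation):
    """True if the relation name fits the registered link relation grammar."""
    return REG_REL_TYPE.fullmatch(relation) is not None
```

So `self` and `hypr.self` would pass, whereas sigil-prefixed names such as `@self` and `_contract` would not.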

Why allow only one collection, at most?

This is a simplification that does not compromise the generality of the graph structure, but allows for easier operations against a vertex. Specifically, a POST on a vertex that admits multiple collections would need to be analysed to deduce which collection it applies to. It is quite conceivable that such a disambiguation, based on POST data alone, would be impossible.

What is hypr's H Factor[18]?

  1. Link Support
  • LE (embedding links): Yes, within the link section, corresponding to respective state items.
  • LO (outbound links): Yes, within the link section, where there is no corresponding state or reserved semantic usage.
  • LT (templated queries): Yes, albeit implicitly as all representations can be queried by their type definition, given client support.
  • LN (non-idempotent updates): Yes, given client support.
  • LI (idempotent updates): Yes, given client support.
  2. Control Data Support
  • CR (control data for read requests): For hypr resources, this is implied and thus not required; otherwise, it can be explicitly stated for foreign resources.
  • CU (control data for update requests): For hypr resources, this is implied and thus not required; otherwise, it can be explicitly stated for foreign resources.
  • CM (control data for interface methods): For hypr resources, this is implied by the graph structure and the Allow header; otherwise, it can be explicitly stated for foreign resources.
  • CL (control data for links): Yes, per the link section.

i.e., Support ALL the things!

Why JavaScript?

JavaScript is the one true language.

Sarcasm aside (and now we've wiped off the rotten fruit), JavaScript is not mandated by this specification. However, realistically -- given its hegemony in the browser space, which represents a significant vector for client development -- hypr has been designed with JavaScript idioms in mind, simply to facilitate usage.

Isn't hypr's typing syntax a bit complex?

The type annotation syntax has been designed with progressive enhancement in mind. That is, the degree of expressiveness is the developer's prerogative, while accommodating a high degree of fidelity, should it be required.

Unfortunately, there are no widely-used type checking or annotation conventions in use with (plain) JavaScript or JSON, which is why we've invented our own. However, we have tried to strike a balance between familiarity, brevity and functionality. Representations are largely for machine consumption and generation, but have nonetheless been designed to be tractable to human users.

Why subtype text as an IRI when you already have links?

Good question... This may change!

Why allow collapsible structures?

Many hypr structures have two forms: a hash of various options (some of which may be omitted) and a short, scalar form which corresponds to the fundamental option and assumes defaults for the rest.

The clearest example is state element values which, in the short form, are considered immutable and inherit their type from their JSON type, in contrast to the fidelity available in their full form. That is, compare the following state block:

{
  "someElement": "some value"
}

...versus this one:

{
  "someElement": {
    "value": "some value",
    "type": {
      "primitive": "text",
      "mutable":   false
    }
  }
}

These representations are equivalent. From a human perspective, the short form makes sense, in that it's highly tractable. However, the primary consumer of the data will be software and, with this inconsistent interface, it would need to be written to either check which form it is dealing with, or "upsample" all short forms at runtime.

This is thus a valid concern, as it introduces (arguably) needless complexity in the name of readability. However, making such a clear distinction for immutable data, we believe, pushes the balance in our favour. Although only just!

(The bandwidth saving is quite trivial, so doesn't factor into the decision. However, for a service that deals with primarily read-only data, this may be beneficial.)

In other places where this collapsing is allowed, the short form is considered to be the primary form, insofar as it's more likely to be used. The expanded form is used to tweak defaults and, likewise, make a clear semantic distinction between model classes.
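For state elements, the "upsampling" mentioned above might look like the following sketch. Only the mapping from JSON strings to the text primitive is taken from the examples in this document; names for other primitives would have to be supplied by the caller through the hypothetical `primitive_names` argument.

```python
def upsample(state, primitive_names=None):
    """Expand short-form state elements into their full form."""
    names = {str: "text"} if primitive_names is None else primitive_names
    full = {}
    for key, element in state.items():
        if isinstance(element, dict) and "value" in element:
            # Already in full form: keep as-is
            full[key] = element
        elif type(element) in names:
            full[key] = {
                "value": element,
                "type": {"primitive": names[type(element)], "mutable": False},
            }
        else:
            raise ValueError(f"no primitive name known for: {key}")
    return full


expanded = upsample({"someElement": "some value"})
```

Normalising once, on receipt, means the rest of a client only ever has to deal with the full form.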

Acknowledgements

Per the discussion on hypermedia, the best ideas from existing representation formats -- particularly Collection+JSON[19] -- have been considered in hypr's design. Additional kudos should go to Leonard Richardson and Mike Amundsen for their book[20] on RESTful web APIs.

Discussion and iteration on hypr's specification was aided by my colleagues in the Human Genetics Informatics team and the wider informatics community at the Wellcome Trust Sanger Institute. Specific thanks go to Josh Randall and Irina Colgiu.

Next > Procedural-Graph Interface

References

  1. Fielding, R. et al (1999) Hypertext Transfer Protocol -- HTTP/1.1; IETF RFC2616 (see also RFC7230-7235)

  2. Fielding, R. T. (2000) Architectural Styles and the Design of Network-Based Software Architectures; University of California, Irvine; PhD Thesis

  3. Bray, T. (ed.) (2014) The JavaScript Object Notation (JSON) Data Interchange Format; IETF RFC7159

  4. Crocker, D. (ed.) et al (2008) Augmented BNF for Syntax Specifications: ABNF; IETF RFC5234

  5. Gregorio, J. et al (2012) URI Template; IETF RFC6570

  6. Klyne, G. et al (2002) Date and Time on the Internet: Timestamps; IETF RFC3339

  7. Nottingham, M. et al (eds.) (Retr. 2015) Link Relations; IANA Registry

  8. ECMAScript WG (2011) ECMAScript Language Specification; Ecma International, Standard ECMA-262 5.1 Edition (specifically §15.10)

  9. Duerst, M. et al (2005) Internationalized Resource Identifiers (IRIs); IETF RFC3987

  10. Resnick, P. (ed.) (2008) Internet Message Format; IETF RFC5322

  11. Dusseault, L. et al (2010) PATCH Method for HTTP; IETF RFC5789

  12. Bryan, P. et al (eds.) (2013) JavaScript Object Notation (JSON) Patch; IETF RFC6902

  13. Hoffman, P. et al (2014) JSON Merge Patch; IETF RFC7386

  14. Snell, J. (2014) HTTP Link and Unlink Methods; IETF Internet-Draft

  15. Kelly, M. (2013) HAL - Hypertext Application Language

  16. Sporny, M. et al (2014) JSON-LD 1.0; W3C Recommendation

  17. Nottingham, M. (2010) Web Linking; IETF RFC5988

  18. Amundsen, M. (2010) H Factor: Hypermedia Types

  19. Amundsen, M. (2013) Collection+JSON - Hypermedia Type

  20. Richardson, L. et al (2013) RESTful Web APIs; Sebastopol, CA: O'Reilly
