hypr - wtsi-hgi/hgi-web GitHub Wiki
Note that this is non-normative and has evolved from an earlier iteration of this paper, which provides the motivation for a new format and takes stock of the mechanisms that are available, the design constraints that have been imposed and the system that ultimately needs to be modelled.
To summarise:
- We wish to represent an information system on a connected graph in a self-describing way.
- We get a lot for free just off the back of HTTP[1] and the RESTful architecture[2]; this should be exploited to the fullest extent to avoid duplication of effort.
- We strive for simplicity and regularity in our representation, using progressive enhancement and type inference as means to specificity and precision, where needed.
- Serialisation will be offered only as a JSON[3] derivative. (Other serialisations, such as XML, would be possible, but are not deemed to be worth the effort.)
Once matured, a formal specification will be written and registered with the IANA, initially using the media type application/vnd.hypr, with the ultimate intent of standardisation to application/hypr.
The general structure of a hypr document will be to bisect a vertex's representation into its edges and state. Edges are defined simply as key-value pairs, where the key denotes the link relation associated with IRI(s) as values. Likewise, state is a key-value store which comes with optional type definitions and semantics for linking.
Throughout the following definitions, ABNF[4] will be used to describe the grammar. Such definitions are not to be taken strictly, in the sense of whitespace invariance, character literals and undefined implications or derivations from host grammars (these are described within angle brackets).
For the purposes of linking, IRIs are assumed and URI templates[5] -- lifted on to IRIs -- are employed to denote parameterised edges. For convention's sake, these are defined as follows:
json-iri = <JSON serialised IRI string>
json-iri-template = <JSON serialised IRI template string>
Thus we have the following general structure:
vertex = "{" edges [ "," state ] "}"
Note that state can be omitted for the sake of providing a minimal representation for symlinking.
A vertex must have at least a loop (a link to itself). Otherwise, edges may be arbitrary; however, as discussed later, some link relations have special semantic meaning within hypr documents:
edges = DQUOTE "links" DQUOTE ":" edge-hash
edge-hash = "{"
            self-link
            *( "," cond-link )
            *( "," state-link )
            [ "," collection-link ]
            *( "," rel-link )
            "}"
self-link = DQUOTE "self" DQUOTE ":" json-iri
cond-link = DQUOTE cond-key DQUOTE ":" json-iri
cond-key = "next" / "prev" / "base"
state-link = DQUOTE state-key DQUOTE ":" link
state-key = <JSON key matching state key>
collection-link = DQUOTE collection-key DQUOTE ":" json-iri-template
collection-key = <JSON key matching state's collection key>
rel-link = DQUOTE relation DQUOTE ":"
           ( link / "[" link 1*( "," link ) "]" )
relation = <JSON hash key>
link = json-iri
Note that while we refer to vertices' edges at a technical level, the format calls them "links" for familiarity's sake.
When a vertex's loop differs from the request IRI, this should indicate a symbolic link. This ought to be picked up by the server, before reaching the client. How the server responds to this inconsistency is implementation dependent; however, the response should ultimately be the representation of the loop IRI. This can be done via an appropriate HTTP redirect (i.e., 303 See Other, or 302 Found), or by simply supplanting the terminal vertex's representation in the same hop. Any state data or non-loops in a symlink vertex should be ignored.
(Note that there is an argument for the vertex's loop to be a fully qualified IRL, rather than an arbitrary IRI. Part of its purpose is to encapsulate self-reference and an IRI may not provide enough information for dereferencing.)
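The following JavaScript sketch illustrates the two server behaviours. It resolves a request IRI in one of two ways: it answers with a 303 redirect to the loop, or it follows the symlink chain to the terminal vertex within the same hop. The `serve` function, the `lookup` callback (which maps an IRI to a stored vertex, or returns `undefined`) and the `maxHops` guard against cyclic symlinks are hypothetical. IRIs are compared as plain strings, without any normalisation.

```javascript
// Hypothetical server-side helper: resolve a request IRI against stored vertices,
// treating a vertex whose loop differs from its own IRI as a symbolic link.
function serve(requestIri, lookup, { redirect = false, maxHops = 10 } = {}) {
  let iri = requestIri;
  for (let hop = 0; hop <= maxHops; hop++) {
    const vertex = lookup(iri);
    if (!vertex) return { status: 404 };

    const loop = vertex.links.self;
    if (loop === iri) {
      // Terminal vertex: serve its representation as-is
      return { status: 200, body: vertex };
    }
    // Symlink vertex: its state and non-loop links are ignored
    if (redirect) {
      return { status: 303, location: loop };
    }
    iri = loop; // supplant the terminal vertex's representation instead
  }
  throw new Error(`Symlink chain from ${requestIri} exceeds ${maxHops} hops`);
}
```

The redirect variant exposes the loop IRI to the client. Following the chain instead saves a round trip.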
Otherwise, there are four classes of relation-IRI pair:
- Conditional links are only relevant to conditional representations, per their definition in the previous section. Their semantics are discussed later, but are the responsibility of the server.
- State links are associated by their relation name with arbitrary keys within the vertex's state. This allows arbitrary linking of state elements.
- A vertex may have at most one collection associated with it. This is defined both as a link relation and a matching state element, by key. This differs from the above in that the IRI must be templated and, provided the collection is mutable, implies the locus of POST requests.
- Beyond these, arbitrary pairs can be defined, with the restriction of a single IRI per relation lifted. Note, however, that there are three reserved relations of this class, discussed later: docs, view and contract. Otherwise, how relations are to be interpreted is application dependent.
Note that, for the latter, multiple edges under a particular link relation ought to be unique. This isn't mandated by the specification, but should be enforced by any implementation.
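Each class is determined by the relation name, the state keys and the shape of the IRI, so a client or server can sort a vertex's edges mechanically. The JavaScript sketch below shows one way to do this. The `classifyLinks` name, its return shape and the rule that any `{...}` expression marks a template are assumptions rather than part of the format. The reserved `docs`, `view` and `contract` relations are simply left in the general class.

```javascript
// Relations reserved for conditional representations
const CONDITIONAL = new Set(["next", "prev", "base"]);

// Treat any IRI containing a {...} expression as a template (assumption)
function isTemplate(target) {
  return typeof target === "string" && /\{[^}]+\}/.test(target);
}

function classifyLinks(vertex) {
  const links = vertex.links || {};
  const stateKeys = new Set(Object.keys(vertex.state || {}));
  const result = { self: links.self, conditional: {}, state: {}, collection: null, relation: {} };

  if (typeof result.self !== "string") {
    throw new Error("A vertex must have a loop (self link)");
  }
  for (const [rel, target] of Object.entries(links)) {
    if (rel === "self") continue;
    if (CONDITIONAL.has(rel)) {
      result.conditional[rel] = target;
    } else if (stateKeys.has(rel) && isTemplate(target)) {
      if (result.collection) throw new Error("A vertex may have at most one collection");
      result.collection = { key: rel, template: target };
    } else if (stateKeys.has(rel)) {
      result.state[rel] = target;
    } else {
      // Arbitrary relations may carry one link or an array of them
      result.relation[rel] = Array.isArray(target) ? target : [target];
    }
  }
  return result;
}
```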
You need to build a system that is futureproof; it's no good just making a modular system. You need to realise that your system is just going to be a module in some bigger system to come, and so you have to be part of something else, and it's a bit of a way of life.
— Tim Berners-Lee
The above method of defining edges is specific to resources within a hypr graph. It is minimal because hypr is designed to infer protocol semantics from its graphical structure. However, as part of the wider Internet, it would be useful to link with non-hypr resources. To this end, foreign resource links can be augmented with metadata to facilitate better client control. Specifically, answering the following questions:
- What can be done with the resource? Presumably, it can be fetched; maybe it can also be updated or deleted. This corresponds with the HTTP Allow response header and, by encoding this, a client can provide appropriate interfaces. (By encoding this, rather than assuming the foreign server responds with a correct Allow header, we save an OPTIONS hop and also declare explicitly how a hypr resource can interact with a foreign one.)
- What constraints apply to the resource, when reading or writing? For example, what media type should a client accept, or how should a request be encoded, etc.?
link =/ foreign-link
foreign-link = "{"
               DQUOTE "href" DQUOTE ":" json-iri
               [ "," DQUOTE "allow" DQUOTE ":" foreign-allow ]
               [ "," DQUOTE "accept" DQUOTE ":" foreign-accept ]
               [ "," DQUOTE "content" DQUOTE ":" foreign-content ]
               "}"
foreign-allow = http-method / "[" http-method 1*3( "," http-method ) "]"
http-method = DQUOTE ( "GET" / "POST" / "PUT" / "DELETE" ) DQUOTE
foreign-accept = media-range
               =/ "{"
                  DQUOTE "type" DQUOTE ":" media-range
                  [ "," DQUOTE "charset" DQUOTE ":" accept-charset ]
                  [ "," DQUOTE "encoding" DQUOTE ":" accept-encoding ]
                  [ "," DQUOTE "language" DQUOTE ":" accept-language ]
                  "}"
media-range = <JSON serialised string, per RFC7231 §5.3.2>
accept-charset = <JSON serialised string, per RFC7231 §5.3.3>
accept-encoding = <JSON serialised string, per RFC7231 §5.3.4>
accept-language = <JSON serialised string, per RFC7231 §5.3.5>
foreign-content = content-type
                =/ "{"
                   DQUOTE "type" DQUOTE ":" content-type
                   [ "," DQUOTE "encoding" DQUOTE ":" content-encoding ]
                   "}"
content-type = <JSON serialised string, per RFC7231 §3.1.1.1>
content-encoding = <JSON serialised string, per RFC7231 §3.1.2.2>
If the allow value is omitted, then GET is assumed. Note that, when allowing multiple methods, the JSON array is implied (but not specified) as having unique elements.
The accept and content values mimic a relevant subset of the respective Accept-* and Content-* HTTP headers. If they are omitted, or partially specified, then a client is free to defer to its defaults.
The accept value implies an allowance for GET. If the allow value exists but does not include GET, then the accept value should be ignored. Similarly, a content value implies PUT and/or POST; in which case, the allow value must contain at least one of these, else the content value is likewise to be ignored.
A resource that only allows DELETE seems an unlikely scenario, to say the least! Nonetheless, it is specifiable, for the sake of completeness.
OPTIONS and HEAD support are not required.
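The defaulting and "ignore" rules above can be captured in a small normalisation step. This JavaScript sketch reads the rules literally. An omitted `allow` means GET only, so a `content` value is dropped unless PUT or POST is listed explicitly. The `normaliseForeignLink` name and its return shape are hypothetical.

```javascript
const METHODS = ["GET", "POST", "PUT", "DELETE"];

// Normalise a foreign-link object ({ href, allow?, accept?, content? })
function normaliseForeignLink(link) {
  if (typeof link.href !== "string") throw new Error("A foreign link needs an href");

  // Omitted allow means GET; a single method may be given as a bare string
  const allow = [...new Set(link.allow === undefined ? ["GET"] : [].concat(link.allow))];
  for (const method of allow) {
    if (!METHODS.includes(method)) throw new Error(`Unsupported method: ${method}`);
  }

  // accept only makes sense when GET is allowed
  let accept = link.accept === undefined ? null : link.accept;
  if (accept !== null && !allow.includes("GET")) accept = null;

  // content only makes sense when PUT and/or POST is allowed
  let content = link.content === undefined ? null : link.content;
  if (content !== null && !allow.includes("PUT") && !allow.includes("POST")) content = null;

  // Expand the string shorthands into their object forms
  if (typeof accept === "string") accept = { type: accept };
  if (typeof content === "string") content = { type: content };

  return { href: link.href, allow, accept, content };
}
```

For example, `{ "href": "http://example.org/upload", "allow": "PUT", "content": "text/csv" }` normalises to a PUT-only link whose request body should be sent as `text/csv`.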
Vertex state is represented as a key-value store. At its simplest, keys are just associated with values. However, a variety of type level annotations can be added to enforce increasingly stronger guarantees.
state = DQUOTE "state" DQUOTE ":" state-hash
state-hash = "{" element *( "," element ) "}"
element = DQUOTE state-key DQUOTE ":" element-data
state-key = <JSON hash key>
element-data = element-val
             =/ "{"
                DQUOTE "value" DQUOTE ":" element-val ","
                type-def
                "}"
element-val = <Any valid JSON> / collection
            ; n.b., "Any valid JSON" also covers arrays and objects
collection = "[" [ coll-val *( "," coll-val ) ] "]"
           ; n.b., the JSON array must be homogeneous (or empty)
coll-val = json-string / vertex
json-string = <JSON string>
Note that the entire state section is optional, but if it exists, it must contain at least one element. (That is, an empty JSON object is not valid state.)
An element's value can be arbitrary, but should always match its type (if present). A state element without a type definition should be considered immutable and inherit its type from its JSON primitive type. This is clearly limiting, but allowed.
If there is a collection element -- i.e., where the state key matches a link relation with a templated IRI -- then its value must be one of two things:
- Complex: A (potentially empty) JSON array of fully embedded hypr representations, corresponding to the subordinate resources.
- Simplex: A (potentially empty) JSON array of valid/existing subordinate resource names, per the template key as a means of dereference.
The depth at which collections are embedded (i.e., for collections of collections of...) must be set by the server; either globally or at a resource level. When the embedding depth reaches zero, then the representation should switch to the simplex representation. For example, a depth of one would embed the first level of subordinates in their entirety, but second level collections would be represented in simplex.
Server implementations may wish to specify a means for limitless embedding -- i.e., as collections are guaranteed to not contain non-trivial cycles, this amounts to embedding the entire subtree -- although this may not be a particularly useful feature! The depth setting may also be overridable on request (e.g., using a proprietary HTTP header). Also note that there is no duplex form, where simplex and complex representations are mixed.
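Here is a JavaScript sketch of a server-side serialiser that makes the depth rule concrete. It assumes a hypothetical `loadVertex(iri)` that returns a stored vertex as `{ links, state, collectionKey }`. The collection element is assumed to hold subordinate names (simplex form), either bare or as `{ value, type }`. The collection template is assumed to have a single variable, as in the `/people/{id}` examples. Subordinates are assumed to be stored with untyped state, since their type is lifted into the collection's subtype.

```javascript
// Expand a single-variable IRI template such as "/people/{id}" (assumption:
// only simple {var} expressions are used)
function expand(template, name) {
  return template.replace(/\{[^}]+\}/, encodeURIComponent(name));
}

// Render a vertex, embedding its collection in complex form while depth > 0
// and falling back to simplex form (bare names) once depth reaches zero
function render(iri, depth, loadVertex) {
  const { links, state = {}, collectionKey } = loadVertex(iri);
  const out = { links: { ...links } };
  if (Object.keys(state).length > 0) out.state = { ...state }; // never emit an empty state

  if (collectionKey) {
    const element = state[collectionKey];
    const typed = !Array.isArray(element); // { value, type } vs. bare array
    const names = typed ? element.value : element;
    const rendered = depth > 0
      ? names.map((name) => render(expand(links[collectionKey], name), depth - 1, loadVertex))
      : names.slice();
    out.state[collectionKey] = typed ? { ...element, value: rendered } : rendered;
  }
  return out;
}
```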
Clearly the complex form necessitates subordinate loops being included, hence providing their means of dereference. However, as each subordinate is necessarily of the same type, this would be at the expense of duplicating information. While the bandwidth impact of doing so may be mitigated by HTTP compression, this redundantly complicates the representation. To resolve this -- and the related problem of not supplying enough type information for simplex embedding -- the subordinate type is lifted out from the embedded state and represented in the collection's subtype.
The exact mechanics of this are discussed in the next section, but the following serve as an example:
Simplex
{
  "links": {
    "self": "/people",
    "collection": "/people/{id}"
  },
  "state": {
    "collection": {
      "value": ["foo", "bar", "quux"],
      "type": {
        "primitive": "collection",
        "subtype": {
          "id": {
            "primitive": "text",
            "mutable": false
          },
          "name": {
            "primitive": "text",
            "label": "Full Name"
          }
        }
      }
    }
  }
}
Note that the simplex form can omit the type definition entirely -- the fact that it's a collection can be inferred from the templated IRI -- making it impossible to POST to the collection and forcing any client to assume type information from its JSON primitives. This gives a minimal representation which may be appropriate in read-only scenarios.
Complex
{
  "links": {
    "self": "/people",
    "collection": "/people/{id}"
  },
  "state": {
    "collection": {
      "value": [
        {
          "links": {
            "self": "/people/foo"
          },
          "state": {
            "id": "foo",
            "name": "Joe Bloggs"
          }
        },
        {
          "links": {
            "self": "/people/bar"
          },
          "state": {
            "id": "bar",
            "name": "President Business"
          }
        },
        {
          "links": {
            "self": "/people/quux"
          },
          "state": {
            "id": "quux",
            "name": "Darth Vader"
          }
        }
      ],
      "type": {
        "primitive": "collection",
        "subtype": {
          "id": {
            "primitive": "text",
            "mutable": false
          },
          "name": {
            "primitive": "text",
            "label": "Name"
          }
        }
      }
    }
  }
}
Here, collection has been chosen as the collection element key for clarity. However, it can be arbitrary and should be chosen to avoid collisions with other state elements while, ideally, remaining meaningful (e.g., the same as the IRI's basename would probably be appropriate; people in this example seems fitting).
When collections are only minimally embedded into vertices (i.e., the simplex case, when only enough information is present to dereference) it is clearly inapplicable for key queries to descend into them. However, the presence of a collection should always be transparent to the query (i.e., it is always present in the representation, at all embedding depths).
Gradual typing affords more than just the ability to reason about data, in terms of its meaning. A suitably rich type system -- when exposed to a client -- could aid in generating an appropriate user interface. For example, if age were defined to be an integer between 0 and 120, a client could render a slider widget rather than a plain text input.
JSON only admits a handful of primitive types: null, Unicode strings (usually encoded in UTF-8, but this is not a requirement), numbers (again, usually but not necessarily double precision floating point) and Booleans; with syntax for arrays (indexed collections) and objects (hashed collections). To meet the complexity of our typing system, we must therefore overload what we have by subtyping (parameterising) wherever necessary:
Primitive | JSON Primitive | Subtype |
---|---|---|
Bottom | Null | |
Textual | String | (Pattern; IRI; E-Mail) |
Numeric | Number | (Set; Range; Stepping) |
Temporal | String | Date-Time |
Logical | Boolean | |
Enumeration | Collection | |
Raw | String | Media Type + Encoding |
Note that parenthetical subtypes are considered optional enhancements; for example, a string could come with a regular expression. Furthermore, enumerations can be represented as either an array or an object; the latter's key-value model affording greater fidelity.
Thus two fields are required to represent type efficiently: primitive and subtype. The former is limited to JSON primitives: null, text (for strings), number and bool. An optional dependent subtype can also be specified:
- Textual subtypes:
  - Temporal data as datetime, per ISO 8601[6].
  - Raw data as its IANA media type, per [7], with its encoding scheme (if omitted, base64 is assumed).
  - Additionally, a regular expression can be used as a subtype to specify pattern validation, prepended with a slash character. (The regexp engine is implementation dependent, but must minimally support JavaScript's capabilities[8].)
  - The predefined patterns iri and email are also defined for IRIs[9] and e-mail addresses[10], respectively. This is largely for the client's benefit.
- Numeric subtypes:
  - Sets can be specified as int or float.
  - Ranges can be specified using open, half-open or closed interval notation; an empty string at either terminal can be used to denote extremities (i.e., ±∞).
  - Stepping can also be specified with a range by appending it after a slash character. (Stepping must be with respect to a finite interval minimum.)
If a subtype is not specified, then the semantics of the JSON primitive type take effect. Note also that it's unlikely that a null type would ever be needed in practice (as opposed to just omitting the data element from the state).
The following serve to illustrate various examples:
Plain text:
primitive: text
PNG image:
primitive: text
subtype: image/png
IRI:
primitive: text
subtype: iri
Integers:
primitive: number
subtype: int
Even numbers between 0 and 10 (inclusive):
primitive: number
subtype: int[0,10]/2
Positive odd numbers:
primitive: number
subtype: int[1,)/2
Tenths from 0 up to, but excluding, 1:
primitive: number
subtype: float[0,1)/0.1
Normalised Gaussian integers [for example's sake, only!]:
primitive: text
subtype: /^(0|-?([1-9]\d*|(([1-9]\d*[+-])?([2-9]?|[1-9]\d+)i)))$
The range of subtypes may be extended, whereas primitive types are obviously fixed. The above is formally defined as follows:
type = DQUOTE primitive DQUOTE
=/ enumeration
primitive = "null"
=/ "text"
=/ "number"
=/ "bool"
enumeration = "{" kv-pair *( "," kv-pair ) "}"
kv-pair = DQUOTE key DQUOTE ":" json-string
key = <JSON hash key>
subtype = DQUOTE textual-st DQUOTE
=/ DQUOTE numeric-st DQUOTE
textual-st = "datetime"
=/ media-type [ ";" encoding ]
=/ "iri"
=/ "email"
=/ "/" regexp
media-type = <IANA registered media type>
encoding = "base64" / "percent" / "raw"
; base64 encoding
; percent encoding, per RFC1738
; no encoding
regexp = <Serialised regular expression>
numeric-st = set [ range ]
set = "int" / "float"
range = interval [ "/" stepping ]
interval = ( "[" / "(" ) [ min ] "," [ max ] ( "]" / ")" )
min = <Serialised numeric expression>
; <= max
max = <Serialised numeric expression>
; >= min
stepping = <Serialised numeric expression>
; > 0 && finite min && <= max - min
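The numeric subtype grammar is compact enough to parse with a single regular expression. The JavaScript sketch below parses a subtype and then validates a value against the result. It assumes min, max and stepping are plain decimal literals, although the grammar's "serialised numeric expression" may allow more. It also uses a small tolerance so that fractional steps such as 0.1 survive floating-point rounding. The `parseNumericSubtype` and `checkNumber` names are hypothetical.

```javascript
// Matches e.g. "int", "int[0,10]/2", "int[1,)/2" and "float[0,1)/0.1"
const NUMERIC_ST = /^(int|float)(?:([\[(])([^,]*),([^\])]*)([\])])(?:\/(.+))?)?$/;

function parseNumericSubtype(subtype) {
  const m = NUMERIC_ST.exec(subtype);
  if (!m) throw new SyntaxError(`Invalid numeric subtype: ${subtype}`);
  const [, set, open, lo, hi, close, step] = m;
  const spec = { set, min: -Infinity, max: Infinity, minInclusive: false, maxInclusive: false, step: null };

  if (open !== undefined) {
    // An empty terminal denotes an unbounded (infinite) extremity
    if (lo.trim() !== "") { spec.min = Number(lo); spec.minInclusive = open === "["; }
    if (hi.trim() !== "") { spec.max = Number(hi); spec.maxInclusive = close === "]"; }
    if (Number.isNaN(spec.min) || Number.isNaN(spec.max) || spec.min > spec.max) {
      throw new SyntaxError(`Invalid interval in ${subtype}`);
    }
  }
  if (step !== undefined) {
    spec.step = Number(step);
    // Stepping must be positive, anchored to a finite minimum and fit in the interval
    if (!(spec.step > 0) || !Number.isFinite(spec.min) || spec.step > spec.max - spec.min) {
      throw new SyntaxError(`Invalid stepping in ${subtype}`);
    }
  }
  return spec;
}

function checkNumber(x, spec) {
  if (typeof x !== "number" || !Number.isFinite(x)) return false;
  if (spec.set === "int" && !Number.isInteger(x)) return false;
  if (spec.minInclusive ? x < spec.min : x <= spec.min) return false;
  if (spec.maxInclusive ? x > spec.max : x >= spec.max) return false;
  if (spec.step !== null) {
    const k = (x - spec.min) / spec.step;
    if (Math.abs(k - Math.round(k)) > 1e-9) return false; // tolerate rounding error
  }
  return true;
}
```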
To account for homogeneous collections and optionality, a type can be quantified. The same syntax is used per regular expression quantifiers, with the same aliases for "at most one" (i.e., optional), "at least one" and "any amount". The default quantifier, if not specified, is "exactly one".
quant = DQUOTE quantifier DQUOTE
quantifier = "{" exact "}" ; exact
=/ "{" min "," [ max ] "}" ; range
=/ "?" ; at most one {0,1}
=/ "+" ; at least one {1,}
=/ "*" ; any amount {0,}
exact = <Serialised integer>
; > 0
min = <Serialised integer>
; >= 0 && < max
max = <Serialised integer>
; > min
When only a single value is quantified ({1} or {0,1}), the data should be represented as a scalar value (or omitted in the optional case). Otherwise, data should be represented as a JSON array.
A default value can also be specified, which applies to scalar data. Thus, if a homogeneous collection is quantified, the default value represents that of individual elements. (Note that this is largely a UX consideration, but still must be enforced if data is omitted from a request.) If quantified as mandatory and no default is specified, then the server should raise a 4xx error when data is omitted from a request.
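The following JavaScript sketch shows how a server might interpret quantifiers. `parseQuantifier` maps each form to a `{ min, max }` pair. `checkQuantity` then applies the scalar/array representation rule and the mandatory/default rule to a single request value. Both names and the `RequestError` class are assumptions made for illustration. Defaults are applied only to omitted scalar data, because for arrays the text says the default describes individual elements.

```javascript
class RequestError extends Error {
  constructor(message) {
    super(message);
    this.status = 400; // any 4xx would do; 400 Bad Request is assumed here
  }
}

// "?", "+", "*", "{n}" or "{min,[max]}"; "{1}" (exactly one) when unspecified
function parseQuantifier(q = "{1}") {
  if (q === "?") return { min: 0, max: 1 };
  if (q === "+") return { min: 1, max: Infinity };
  if (q === "*") return { min: 0, max: Infinity };
  const m = /^\{(\d+)(?:(,)(\d*))?\}$/.exec(q);
  if (!m) throw new SyntaxError(`Invalid quantifier: ${q}`);
  const min = Number(m[1]);
  if (m[2] === undefined) {
    if (min === 0) throw new SyntaxError("An exact quantifier must be > 0");
    return { min, max: min };
  }
  const max = m[3] === "" ? Infinity : Number(m[3]);
  if (!(min < max)) throw new SyntaxError(`Invalid quantifier range: ${q}`);
  return { min, max };
}

// Validate (and possibly default) a single state value from a request
function checkQuantity(value, { min, max }, defaultValue) {
  const scalar = max === 1; // {1} and {0,1} are represented as scalars
  if (value === undefined) {
    if (scalar && defaultValue !== undefined) return defaultValue;
    if (min > 0) throw new RequestError("Mandatory data omitted");
    return undefined;
  }
  if (scalar) {
    if (Array.isArray(value)) throw new RequestError("Expected a scalar value");
    return value;
  }
  if (!Array.isArray(value)) throw new RequestError("Expected a JSON array");
  if (value.length < min || value.length > max) {
    throw new RequestError(`Expected between ${min} and ${max} items`);
  }
  return value;
}
```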
By default, a typed data element is mutable, whereas an untyped one isn't. A typed element can be made immutable; in this case, it is implied that the data is generated -- or in some way handled -- by the server. Typed elements should come with a label (regardless of their mutability) to provide UX context to the client (cf. HTML's label element); non-typed elements or label-less elements should probably be hidden from user consumption.
Note that immutability does not imply necessity, but any quantification should be ignored in requests, as it cannot be satisfied by a client.
The above enables rich typing on scalar data and homogeneous collections thereof, which are embedded within the vertex's state. However, it is also necessary to type subordinate vertex (i.e., heterogeneous) collections. These come in two flavours: genuine subordinates and linked collections. Both can be accommodated with a new primitive type of collection and a suitably defined subtype, which we introduced earlier.
Note that for collection primitives, the quantifier defaults to * -- any amount -- rather than {1}, in the sense that it applies as the quantification on the collection's size. (As such, exact quantifiers are representative of fixed length collections where all items must be present; a tuple. It is unlikely that such collections would be useful in real life!) Furthermore, the default value for collection primitives is not applicable and, if present, should be ignored.
- Genuine subordinates can be typed against a hash of primitive type definitions, recursively per the above.
- Linked collections can be typed by the IRI of their collection head. Note that the protocol semantics against such a collection imply linking, rather than creation; i.e., a POST to a linked collection will create a link vertex that refers back to the respective item in the genuine collection, per the specified IRI.
primitive =/ "collection"
subtype =/ collection-st
collection-st = json-iri
=/ "{" coll-type *( "," coll-type ) "}"
coll-type = DQUOTE coll-key DQUOTE ":" definition
coll-key = <JSON key matching collection's state key>
Thus the final type definition is specified as follows:
type-def = DQUOTE "type" DQUOTE ":" definition
definition = "{"
             major-type
             [ "," minor-type ]
             [ "," label ]
             [ "," quantity ]
             [ "," default ]
             [ "," mutable ]
             "}"
major-type = DQUOTE "primitive" DQUOTE ":" type
minor-type = DQUOTE "subtype" DQUOTE ":" subtype
; n.b., subtype depends on primitive
label = DQUOTE "label" DQUOTE ":" <JSON string>
quantity = DQUOTE "quantity" DQUOTE ":" quant
default = DQUOTE "default" DQUOTE ":" <Scalar JSON data>
mutable = DQUOTE "mutable" DQUOTE ":" ( "true" / "false" )
Programmatic precondition contracts can be specified on each vertex by using the contract link relation and supplying the IRI(s) to the code (there is no facility to inline the code within the representation), per the Fielding constraint of "Code on Demand". There is no presumption on the language (i.e., JavaScript is not mandated) or mechanisms it employs to specify contracts for the client to execute. Despite this agnosticism, a common interface is still required:
- The contract should be callable, taking the vertex state as its primary input.
- The contract should be a predicate and, ideally, also provide a channel for signalling errors back to the caller.
- If the environment allows it, contracts should be pure functions (i.e., not allowing side effects); otherwise, the client should at least protect state from mutation and ideally sandbox all contract execution appropriately.
- Multiple contracts per vertex are allowed; in which case, their results are conjoined to give the final outcome. (State is immutable, per the above, so order doesn't matter.)
- Contracts should be executed after type checking.
- Clients are free to ignore contracts. However, servers are obliged to execute them before processing requests. Either way, it is implied that the code is the same for both parties.
For example, if state is passed that includes numeric height (in metres) and weight (in kilograms) fields, using JavaScript with CommonJS modules and the "error first callback" convention, a contract could be coded as follows:
var between = function(x, min, max) {
  return x > min && x < max;
};

module.exports = function(state, callback) {
  var err = [],
      passed = false,
      bmi;

  // Make sure state is immutable
  // (This should be done by the client, but is here for illustration)
  if (!Object.isFrozen(state)) {
    Object.freeze(state);
  }

  // Reject invalid values
  // (This should be done in type checking)
  if (!between(state.height, 0.5, 2.5)) {
    err.push(new Error('Height should be between 0.5 and 2.5m'));
  }

  if (!between(state.weight, 0, 200)) {
    err.push(new Error('Weight should be between 0 and 200kg'));
  }

  // Predication on BMI: weight (kg) divided by the square of height (m)
  // (Comments on the validity of BMI can be forwarded to /dev/null)
  if (!err.length) {
    bmi = state.weight / (state.height * state.height);
    passed = between(bmi, 18.5, 25); // the conventional "normal" band

    if (!passed) {
      err.push(new Error('Abnormal BMI'));
    }
  }

  // Return
  callback(passed ? null : err, passed);
};
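On the client side, the conjunction rule for multiple contracts could be implemented along the following lines. This JavaScript sketch assumes every contract follows the error-first callback interface of the example above. The `runContracts` name, the shallow copy-and-freeze of state and the treatment of a thrown exception as a failed contract are assumptions. The sketch performs no real sandboxing.

```javascript
// Run several contracts against the same (frozen) state and conjoin the results.
// Each contract is called as contract(state, callback) with callback(err, passed).
function runContracts(contracts, state, done) {
  const frozen = Object.freeze({ ...state }); // shallow: nested objects stay mutable
  const errors = [];
  let pending = contracts.length;
  let passed = true;

  if (pending === 0) return done(null, true);

  contracts.forEach((contract) => {
    let settled = false;
    const callback = (err, ok) => {
      if (settled) return; // ignore repeated callbacks from a misbehaving contract
      settled = true;
      if (err) errors.push(...[].concat(err));
      passed = passed && ok === true;
      if (--pending === 0) done(passed ? null : errors, passed);
    };
    try {
      contract(frozen, callback);
    } catch (e) {
      callback(e, false); // a throwing contract counts as a failure
    }
  });
}
```

Because the state is frozen and the results are conjoined, the order in which contracts complete does not affect the outcome.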
The contract resource would need to signal how a client is to interpret and execute it within its response. Its Content-Type header would be appropriate, providing it can specify the necessary fidelity (e.g., application/javascript can't currently be parameterised to indicate CommonJS modules). In the event that such information cannot be conveyed, the resource's media type must instead be parameterised to express the interface for the client (see later).
Of course, while self-describing data -- as far as machines are concerned -- is the aim, in the context of an API there is some expectation of human interaction. To facilitate this, two link relations on a vertex have reserved semantics: docs and view.
The first specifies references to human readable documentation for each vertex. This can cover anything that is felt appropriate, but should probably focus on application semantics at a vertex and state level. This will enable human agents to more efficiently process and navigate the API, without having to manually parse raw JSON.
In addition to this, presentation templates can be associated with a vertex such that its state can be rendered in a more palatable way, rather than deferring to (or in concert with) the semantics of the data's type definitions. While this is clearly a cosmetic consideration, which adds burden on development and maintenance, it would facilitate customisability and, as UX is very important to human users, adoption.
How these resources are represented and interpreted by the clients is arbitrary and out of the scope of this specification. They are also to be considered optional and the client is free to ignore them.
(n.b., These links are to be encoded into the representation, rather than using the Link response header, for the sake of encapsulation, should it be taken out of the context of HTTP.)
If the vertex state is filtered using conditional queries, then the vertex's loop must reflect this. For example:
{
  "links": {
    "self": "/?select=id"
  }
}
(n.b., For the sake of brevity, the state section in this and all other examples within this section has been omitted.)
In the case of slicing, next and prev relations must be inserted by the server, where appropriate, maintaining the (normalised) values of any other queries. It is the server's responsibility to calculate appropriate slices to remain within a collection's bounds.
For example:
{
  "links": {
    "self": "/people?q=(dept=hr)&slice=10:20",
    "next": "/people?q=(dept=hr)&slice=20:30",
    "prev": "/people?q=(dept=hr)&slice=0:10"
  }
}
Note that, clearly, collection vertices must be indexed consistently for slicing to work. How this is done is up to the server implementation (transparent to any client), but should ultimately rely on static vertex metadata, rather than something internal to its state (e.g., creation time or IRI, as opposed to some arbitrary field).
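Once the current slice is known, computing the paging links is mechanical. The JavaScript sketch below makes several assumptions. `slice=from:to` is taken to denote the half-open range [from, to), which is consistent with the example above. The server is assumed to know the collection's `total` size. Every other query parameter is copied verbatim. `sliceLinks` is a hypothetical name, and `next` is clamped so that it never runs past the end of the collection.

```javascript
// Matches the slice parameter and its bounds within a request IRI
const SLICE = /([?&]slice=)(\d+):(\d+)(?=&|#|$)/;

function sliceLinks(requestIri, total) {
  const m = SLICE.exec(requestIri);
  if (!m) throw new Error("Request IRI has no slice query");
  const from = Number(m[2]);
  const size = Number(m[3]) - from;
  if (size <= 0) throw new Error("Slice must be non-empty");

  // Rewrite only the slice bounds, copying every other query parameter verbatim
  const withSlice = (a, b) => requestIri.replace(SLICE, (_, prefix) => `${prefix}${a}:${b}`);

  const links = { self: requestIri };
  if (from + size < total) {
    links.next = withSlice(from + size, Math.min(from + 2 * size, total));
  }
  if (from > 0) {
    links.prev = withSlice(Math.max(0, from - size), from);
  }
  return links;
}
```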
The server may optionally -- but preferably -- include an unfiltered link on a conditional representation using the base relation:
{
  "links": {
    "self": "/products?select=id,desc&q=(and(cost>=50)(cost<=100))",
    "base": "/products"
  }
}
If the conditional representation contains no state, then the state section should be omitted from the response (rather than supplying an illegal, empty state). Alternatively, the server may respond with 204 No Content, if that makes more sense.
It would be nice to have an analogue to SQL's UPDATE...WHERE clause, but it wouldn't make sense within this model. That is because the querying operates at a resource level (i.e., vertices), rather than at the state level; it just uses state to drive the query.
For example, in a relational database, you cannot update a selection of tables based on their shared content in a single operation. You update a single table, based on record content (as records are homogeneous). While resources that are part of a collection will be homogeneous, in general this is not true and resources are otherwise designed to be independent key-value stores.
Thus, to conditionally update a collection of vertices, one would have to do so from within the client by filtering the collection (i.e., a conditional representation of the collection's parent vertex) and then applying a PUT (or DELETE) against each vertex in the result set, in turn. (That said, it would be advantageous for clients to implement such functionality.)
That notwithstanding, a PUT, POST or DELETE request against a conditional representation of a resource should act against the holistic resource (i.e., as though no query were applied). The caveat is that if a selection query masks mandatory state, then the server must not allow the respective PUT and/or POST requests -- as not enough information will be available for them to be correctly formed -- by removing them from the Allow response header in that instance.
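A server could derive the trimmed Allow header along the following lines. In this JavaScript sketch, `typeDefs` (which maps state keys to type definitions) and `selected` (the keys kept by a selection query, or `null`) are hypothetical inputs. "Mandatory" is interpreted as mutable, without an applicable default, and quantified with a minimum of at least one. That interpretation is an assumption drawn from the quantifier and mutability rules above.

```javascript
// Is this typed element something a PUT/POST body must supply?
function isMandatory(def) {
  if (def.mutable === false) return false; // immutable data cannot come from clients
  if (def.primitive !== "collection" && def.default !== undefined) return false;
  const q = def.quantity !== undefined ? def.quantity : (def.primitive === "collection" ? "*" : "{1}");
  return !(q === "?" || q === "*" || /^\{0,/.test(q));
}

// Methods to advertise in Allow for a (possibly) conditional representation
function allowFor(methods, typeDefs, selected) {
  if (selected === null) return methods.slice();
  const masksMandatory = Object.entries(typeDefs)
    .some(([key, def]) => !selected.includes(key) && isMandatory(def));
  return masksMandatory ? methods.filter((m) => m !== "PUT" && m !== "POST") : methods.slice();
}
```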
It may be useful to create resources that persist queries against another resource (cf. SQL views). This can save recreating complex logic and mitigates against any address length limits imposed by browsers or other clients.
(Note that the original HTTP specification does not specify a maximum IRI length, but allows the server to respond with a 414 Request-URI Too Long error code if it cannot handle it. The refreshed specification recommends that senders and recipients support request-lines of at least 8,000 octets. In reality, as of writing, browsers and search engines support up to 2KB.)
This can be done in a similar way to how symlinking is proposed: the self link must contain a non-trivial query component and, importantly, its logical address must differ from the request IRI; additionally, the representation should contain no state section (although this doesn't differ from a query that returns no results).
For example, say the representation of a resource at /depts/hr was as follows:
{
  "links": {
    "self": "/people?q=(dept=hr)"
  }
}
The server can then "do the right thing" and serve the appropriate conditional representation. (Ideally, this would not induce an HTTP redirect.) Furthermore, persistent query resources (PQR) should be transparent to additional queries -- i.e., acting as though it were a normal hypr resource -- effectively making them chainable by conjunction.
Continuing the example, a resource at /depts/hr/managers may be represented as:
{
  "links": {
    "self": "/depts/hr?q=(not(staff=null))&select=name,mail"
  }
}
Note that the edges to these vertices ought to be represented somewhere within the graph, otherwise the resources would be undiscoverable. For instance, the /depts/hr representation could have a managers link relation that points to /depts/hr/managers.
Thus, links in a PQR should augment (and override) those in the representation it's querying against. This also implies that PQRs ultimately inherit the operations available against them from their base representation, with the semantics described in the previous section.
To represent HTTP errors (i.e., 4xx or 5xx status codes), we return a very simple representation consisting of the request's loop and a single, textual (but otherwise untyped and immutable) state element of error, which gives a human-readable description of the failure. The HTTP status code is not encoded into the representation as this -- amongst other metadata -- can be obtained directly from the HTTP response.
For example, presuming a 404 Not Found status:
{
  "links": { "self": "/this/is/not/the/resource/you/are/looking/for" },
  "state": { "error": "Resource not found." }
}
The media type takes the following parameters:
- charset (optional): Definitively specify the document's character encoding, overriding any internal definition (defaults to UTF-8).
- qkeys (optional): Override the default query component key tuple (i.e., (select, q, slice)). All keys must appear, following the same ordering.
- interface (optional): Specify any conditions on the client required to invoke contract code. TODO: This still needs more thought...
- ext (optional): A comma delimited list of semantic and syntactic hypr extensions applicable to the resource. (Note that a registry of extensions and an acceptance policy will thus be needed in some capacity.)
For example:
application/vnd.hypr; charset=utf-8; qkeys=(k,v,slice)
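A client or server has to extract these parameters from a Content-Type or Accept value. The following is a rough sketch that assumes unquoted parameter values separated by semicolons. `parseHyprMediaType` is a hypothetical name. Quoted-string values are not handled, and the interface parameter is skipped because its semantics are still open.

```typescript
interface HyprMediaType {
  charset: string;
  qkeys: [string, string, string];
  ext: string[];
}

// Hypothetical parser for the parameters listed above (unquoted values only;
// "interface" is ignored because it is still undecided).
function parseHyprMediaType(header: string): HyprMediaType {
  const [essence, ...params] = header.split(";").map((part) => part.trim());
  if (essence.toLowerCase() !== "application/vnd.hypr") {
    throw new Error(`not a hypr media type: ${essence}`);
  }
  // Defaults: UTF-8 and the (select, q, slice) key tuple.
  const result: HyprMediaType = { charset: "utf-8", qkeys: ["select", "q", "slice"], ext: [] };
  for (const param of params) {
    const eq = param.indexOf("=");
    if (eq < 0) continue; // malformed parameter: ignored in this sketch
    const name = param.slice(0, eq).trim().toLowerCase();
    const value = param.slice(eq + 1).trim();
    if (name === "charset") {
      result.charset = value.toLowerCase();
    } else if (name === "qkeys") {
      // All three keys must be present, in the default tuple's order.
      const m = /^\(\s*([^,()\s]+)\s*,\s*([^,()\s]+)\s*,\s*([^,()\s]+)\s*\)$/.exec(value);
      if (!m) throw new Error(`qkeys must be a (select, q, slice) triple: ${value}`);
      result.qkeys = [m[1], m[2], m[3]];
    } else if (name === "ext") {
      result.ext = value.split(",").map((e) => e.trim()).filter((e) => e.length > 0);
    }
  }
  return result;
}

const mt = parseHyprMediaType("application/vnd.hypr; charset=utf-8; qkeys=(k,v,slice)");
console.log(mt.qkeys);
```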
When a PUT or POST request is made, the request body should be a JSON object that corresponds to the respective resource's state section, collapsed to simple key-value pairs (i.e., with no additional type information). For example:
{
"id": 1234,
"name": "foo",
"mail": "[email protected]"
}
That is, resources that accept PUT and POST requests must accept application/json request bodies. The primitive type of each data value in the JSON request must match its definition in the resource's state (i.e., before "proper" type checking is performed). Resources may also, optionally, accept application/x-www-form-urlencoded requests, at the implementation's discretion.
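Deriving that request body from a typed representation amounts to stripping the type annotations. This is a minimal sketch. It assumes that an element is in full form exactly when it is a JSON object with a value key, and it passes short-form elements through unchanged. `collapseState` is an illustrative name, and the example state values are made up.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumption: full-form elements are JSON objects that carry a "value" key.
function isFullForm(element: Json): element is { [key: string]: Json } {
  return typeof element === "object" && element !== null && !Array.isArray(element) && "value" in element;
}

// Hypothetical helper: collapse a state section into plain key-value pairs.
function collapseState(state: Record<string, Json>): Record<string, Json> {
  const body: Record<string, Json> = {};
  for (const key of Object.keys(state)) {
    const element = state[key];
    // Drop the "type" annotation and keep only the value itself.
    body[key] = isFullForm(element) ? element["value"] : element;
  }
  return body;
}

const exampleState: Record<string, Json> = {
  id: 1234,
  name: { value: "foo", type: { primitive: "text", label: "Name" } },
  mail: { value: "foo@example.com", type: { primitive: "text", subtype: "email" } },
};
console.log(JSON.stringify(collapseState(exampleState)));
```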
A PUT or POST request must fail with a 400 Bad Request error if:
- The request omits any state that is defined to be mandatory.
- The request fails type checking in any other way.
- Any programmatic contracts applied against the request return false.
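The first two checks can be mechanised from the type definitions. The sketch below is one possible reading, not the specified algorithm. It assumes that an element is mandatory unless its quantity is "?" or "*". It maps only the text primitive onto a JSON string and models contracts as plain predicates. All names here are hypothetical.

```typescript
// Minimal slice of a type definition used by these checks.
interface TypeDef {
  primitive: string;
  quantity?: string;
}
// Assumption: contracts are modelled as predicates over the request body.
type Contract = (body: Record<string, unknown>) => boolean;

// Assumed mapping from hypr primitives to JSON types; only "text" is covered.
const JSON_TYPE_FOR_PRIMITIVE: Record<string, string> = { text: "string" };

// Hypothetical helper: lists the reasons a request deserves 400 Bad Request
// (an empty array means it passes these checks).
function badRequestReasons(
  body: Record<string, unknown>,
  types: Record<string, TypeDef>,
  contracts: Contract[] = [],
): string[] {
  const reasons: string[] = [];
  for (const key of Object.keys(types)) {
    const def = types[key];
    const value = body[key];
    // Assumption: only "?" and "*" make an element optional.
    const optional = def.quantity === "?" || def.quantity === "*";
    if (value === undefined) {
      if (!optional) reasons.push(`missing mandatory state: ${key}`);
      continue;
    }
    const expected = JSON_TYPE_FOR_PRIMITIVE[def.primitive];
    // Multi-valued elements arrive as arrays, so check each member.
    const members: unknown[] = Array.isArray(value) ? value : [value];
    if (expected !== undefined && members.some((m) => typeof m !== expected)) {
      reasons.push(`type mismatch for ${key}: expected ${def.primitive}`);
    }
  }
  contracts.forEach((contract, i) => {
    if (!contract(body)) reasons.push(`contract ${i} returned false`);
  });
  return reasons;
}

const personTypes: Record<string, TypeDef> = {
  name: { primitive: "text" },
  aliases: { primitive: "text", quantity: "*" },
  dob: { primitive: "text", quantity: "?" },
};
console.log(badRequestReasons({ aliases: ["jb123"] }, personTypes));
```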
If a PUT or POST request attempts to change immutable or undefined data, then the server has two options:
- Accept the request and transparently ignore any invalid state transitions, while applying anything that validates. While this would not put resources into an invalid state, it might be a little "passive aggressive" from a UX point of view!
- A better option -- presuming a client were built in such a way as to minimise this occurrence -- would be to fail with a 4xx error. This should again probably be the generic 400 Bad Request error; however, 405 Method Not Allowed and 409 Conflict may be appropriate (although these run the risk of being confused with an unsupported method or an edit conflict, respectively).
Note that if data that is defined to be immutable is missing from the request, then the server should ignore this, regardless of the above policy. PUT requests apply only to the requested resource; in the presence of a collection, they do not descend into embedded resources.
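Both policies can share one code path. They differ only in what happens once invalid keys have been found. This is a minimal sketch built on several assumptions:
- Mutability is known per key.
- Keys absent from the mutability record count as undefined data.
- Re-sending an immutable value unchanged is not an attempted change (compared with strict equality).

`applyUpdate` and its outcome shape are hypothetical.

```typescript
type Policy = "ignore" | "reject";

interface UpdateOutcome {
  status: number; // 400 on rejection; 200 stands in for the server's success code
  state: Record<string, unknown>;
  invalid: string[]; // keys that tried to change immutable or undefined data
}

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Hypothetical helper: `mutability` maps each defined state key to whether
// it is mutable; keys not listed there are undefined data.
function applyUpdate(
  current: Record<string, unknown>,
  mutability: Record<string, boolean>,
  request: Record<string, unknown>,
  policy: Policy,
): UpdateOutcome {
  // Starting from a copy means immutable keys missing from the request
  // simply keep their current values, as required above.
  const next: Record<string, unknown> = { ...current };
  const invalid: string[] = [];
  for (const key of Object.keys(request)) {
    const value = request[key];
    if (hasOwn(mutability, key) && mutability[key]) {
      next[key] = value;
    } else if (!hasOwn(mutability, key) || value !== current[key]) {
      // Undefined key, or a genuine attempt to change an immutable one.
      invalid.push(key);
    }
    // Otherwise an immutable value was re-sent unchanged, which is harmless.
  }
  if (policy === "reject" && invalid.length > 0) {
    return { status: 400, state: current, invalid };
  }
  return { status: 200, state: next, invalid };
}

const currentState = { id: 1, name: "foo" };
const personMutability: Record<string, boolean> = { id: false, name: true };
console.log(applyUpdate(currentState, personMutability, { id: 2, name: "bar" }, "ignore"));
```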
DELETE requests do not require a request body.
It may be useful for resources that support PUT requests to also support PATCH[11] requests, to modify state by diff. As the underlying representation is JSON based, this would necessitate a JSON diffing method, of which there are currently two standards:
- JSON Patch[12], which describes a sequence of operations to apply to a JSON document, not dissimilar to a traditional textual diff.
- JSON Merge Patch[13], which applies changes based on structural differences between two JSON documents.
Support for this -- including which method to employ -- is left to the discretion of server developers, where the diff is applied to a resource's virtual state representation. (That is, the representation's state section, collapsed to simple key-value pairs with no additional type information, per the above.)
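For servers that choose JSON Merge Patch, the algorithm from RFC 7386 is short enough to sketch in full. The version below follows the RFC's pseudocode but copies its inputs instead of mutating them. The function names are our own. Applying the patch to the collapsed state is the only hypr-specific part.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isJsonObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// JSON Merge Patch (RFC 7386), written without mutating its inputs.
function mergePatch(target: Json | undefined, patch: Json): Json {
  // A patch that is not an object replaces the target outright.
  if (!isJsonObject(patch)) return patch;
  // A non-object target is discarded and treated as an empty object.
  const result: JsonObject = isJsonObject(target) ? { ...target } : {};
  for (const name of Object.keys(patch)) {
    const value = patch[name];
    if (value === null) {
      delete result[name]; // null means "remove this member"
    } else {
      result[name] = mergePatch(result[name], value);
    }
  }
  return result;
}

// Applied to a collapsed (virtual) state:
const virtualState: Json = { id: 1234, name: "foo", mail: "foo@example.com" };
const patched = mergePatch(virtualState, { name: "bar", mail: null });
console.log(JSON.stringify(patched)); // {"id":1234,"name":"bar"}
```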
Edges ought to be stable, insofar as they're either set, derived from state or generated by the server. It doesn't make sense for an arbitrary user to create or remove edges via the proposed LINK and UNLINK methods[14]. The only time when direct edge manipulation makes sense is during resource/schema definition, which would be outside the remit of end users.
As such, support for LINK and UNLINK requests is not defined.
Minimal:
{
"links": {
"self": "/"
}
}
Simple, untyped state, with linked data:
{
"links": {
"self": "/people/foo",
"manager": "/people/bar"
},
"state": {
"name": "Joe Bloggs",
"mail": "[email protected]",
"manager": "President Business"
}
}
Typed state:
{
"links": {
"self": "/people/foo"
},
"state": {
"id": 123,
"name": {
"value": "Joe Bloggs",
"type": {
"primitive": "text",
"label": "Name"
}
},
"aliases": {
"value": ["jb123"],
"type": {
"primitive": "text",
"label": "Nicknames",
"quantity": "{1,}"
}
},
"photo": {
"value": "/9j/4AAQSk...",
"type": {
"primitive": "text",
"subtype": "image/jpeg;base64",
"label": "Avatar",
"quantity": "?"
}
},
"dob": {
"value": "1981-09-25",
"type": {
"primitive": "text",
"subtype": "datetime",
"label": "Date of Birth"
}
},
"bookmarks": {
"value": [
"http://www.sanger.ac.uk",
"https://en.wikipedia.org/wiki/Main_Page"
],
"type": {
"primitive": "text",
"subtype": "iri",
"label": "Favourites",
"quantity": "*",
"default": "http://www.sanger.ac.uk"
}
}
}
}
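The quantity values above ("?", "*", "+" and "{1,}") read like regular-expression quantifiers. The sketch below checks a value against them under three assumptions, none of them normative:
- An omitted quantity means exactly one value.
- "{n}" and "{m,n}" are also permitted.
- An absent or null value counts as zero values.

```typescript
interface Bounds {
  min: number;
  max: number; // Infinity when unbounded
}

// Hypothetical interpretation of "quantity" as a regex-style quantifier.
function quantityBounds(quantity?: string): Bounds {
  if (quantity === undefined) return { min: 1, max: 1 }; // assumed default: exactly one
  switch (quantity) {
    case "?": return { min: 0, max: 1 };
    case "*": return { min: 0, max: Infinity };
    case "+": return { min: 1, max: Infinity };
  }
  // "{n}", "{m,}" or "{m,n}"
  const m = /^\{(\d+)(,(\d*))?\}$/.exec(quantity);
  if (!m) throw new Error(`unrecognised quantity: ${quantity}`);
  const min = Number(m[1]);
  const max = m[2] === undefined ? min : m[3] === "" ? Infinity : Number(m[3]);
  if (max < min) throw new Error(`empty range: ${quantity}`);
  return { min, max };
}

// An absent/null value counts as zero, an array as its length, a scalar as one.
function satisfiesQuantity(value: unknown, quantity?: string): boolean {
  const { min, max } = quantityBounds(quantity);
  const count = value === undefined || value === null ? 0 : Array.isArray(value) ? value.length : 1;
  return count >= min && count <= max;
}

console.log(satisfiesQuantity(["jb123"], "{1,}"));
```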
Untyped, simplex collection:
{
"links": {
"self": "/people",
"collection": "/people/{id}"
},
"state": {
"collection": ["foo", "bar", "quux"]
}
}
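A client can turn this simplex collection into member IRIs by expanding the collection link's URI template[5] once per member. The sketch handles only RFC 6570 level 1 ("{var}") templates with a single variable, using a simplified variable-name syntax. The helper names are hypothetical.

```typescript
// RFC 6570 simple string expansion keeps only unreserved characters
// (ALPHA / DIGIT / "-" / "." / "_" / "~") as-is. encodeURIComponent also
// leaves ! ' ( ) * unescaped, so those are percent-encoded explicitly.
function encodeSimple(value: string): string {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase(),
  );
}

// Level 1 expansion only: "{var}" with no operators or modifiers.
// Undefined variables expand to the empty string, as in RFC 6570.
function expandLevel1(template: string, vars: Record<string, string>): string {
  return template.replace(/\{([A-Za-z0-9_]+)\}/g, (_whole: string, name: string) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? encodeSimple(vars[name]) : "",
  );
}

// Hypothetical helper: bind each collection member to the template's variable.
function memberIris(collectionTemplate: string, members: string[]): string[] {
  const variable = /\{([A-Za-z0-9_]+)\}/.exec(collectionTemplate);
  if (!variable) throw new Error("collection link is not a URI template");
  const varName = variable[1];
  return members.map((member) => expandLevel1(collectionTemplate, { [varName]: member }));
}

console.log(memberIris("/people/{id}", ["foo", "bar", "quux"]));
```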
See earlier for an example of a typed simplex and complex collection.
Typed, simplex collection, with documentation and subordinate contract relations:
{
"links": {
"self": "/people",
"docs": "/docs/people",
"contract": "/contracts/checkPerson",
"collection": "/people/{person}"
},
"state": {
"collection": {
"value": ["foo", "bar", "quux"],
"type": {
"primitive": "collection",
"subtype": {
"person": {
"primitive": "text",
"mutable": false
},
"name": {
"primitive": "text",
"label": "Full Name"
},
"email": {
"primitive": "text",
"subtype": "email",
"label": "E-Mail Address(es)",
"quantity": "*"
},
"dob": {
"primitive": "text",
"subtype": "datetime",
"label": "Date of Birth",
"quantity": "?"
}
}
}
}
}
}
Paginated linked collection, with a simple foreign resource:
{
"links": {
"self": "/departments/hr?slice=3:6",
"prev": "/departments/hr?slice=:3",
"next": "/departments/hr?slice=6:9",
"base": "/departments/hr",
"people": "/departments/hr/{person}",
"logo": {
"href": "/assets/logo.png",
"accept": "image/png"
}
},
"state": {
"id": "hr",
"description": {
"value": "Human Resources",
"type": {
"primitive": "text",
"label": "Department Name"
}
},
"people": {
"value": ["foo", "bar", "quux"],
"type": {
"primitive": "collection",
"subtype": "/people",
"label": "Staff",
"quantity": "+"
}
}
}
}
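As the logo relation shows, a link may be either a bare IRI or an object carrying href plus control data such as accept. This is a minimal sketch of reading both forms uniformly. `getLink` and `acceptFor` are hypothetical. Defaulting to the hypr media type when accept is absent is our assumption.

```typescript
type LinkValue = string | { href: string; accept?: string };

interface ResolvedLink {
  href: string;
  accept?: string;
}

// Hypothetical helper: read a relation in either its short (string) or
// expanded ({ href, ... }) form, returning undefined if it is absent.
function getLink(links: Record<string, LinkValue>, rel: string): ResolvedLink | undefined {
  if (!Object.prototype.hasOwnProperty.call(links, rel)) return undefined;
  const value = links[rel];
  return typeof value === "string" ? { href: value } : { ...value };
}

// Assumption: with no explicit "accept", the target is another hypr resource.
function acceptFor(link: ResolvedLink): string {
  return link.accept ?? "application/vnd.hypr";
}

const hrLinks: Record<string, LinkValue> = {
  self: "/departments/hr?slice=3:6",
  prev: "/departments/hr?slice=:3",
  next: "/departments/hr?slice=6:9",
  base: "/departments/hr",
  people: "/departments/hr/{person}",
  logo: { href: "/assets/logo.png", accept: "image/png" },
};
console.log(getLink(hrLinks, "logo"));
```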
What does "hypr" stand for?
Originally, it didn't stand for anything; it just (according to its author) looked and sounded cool. Since then, we've justified it with a recursive backronym: "hypr yields pluripotent representations".
Why is there no versioning?
It's not uncommon for document formats to embed within them an indication of the specification they follow, so that they can be consumed appropriately. hypr documents lack this because:
- Documents are meant to be representations of vertices, and the specification version is tangential to this.
- Representations are designed to be as lightweight as possible, and cramming a version string in there complicates things, albeit trivially.
- While nothing precludes future changes to the specification, it has been designed to have a broad remit. Nonetheless, should changes occur, these ought to manifest themselves as extensions (signalled via the ext parameter of a resource's media type). Only under critical circumstances will features be deprecated, and this will apply retroactively.
Why not namespace or use sigils with hypr-specific link relations?
Using sigils on link relations, to distinguish them from arbitrary relations, is the approach that HAL[15] and JSON-LD[16] take; that is, prefixing their semantic relations with an underscore or a commercial at symbol, respectively. An elaboration of this would be to use a more developed namespacing scheme, such as hypr.self.
This would detract from the simplicity of the specification and imply that the semantics of hypr-specific link relations are somehow different to generic relations. This is not the case: the relations have been chosen specifically to be meaningful in a way that should not cause conflicts.
(Note also that registered link relations are specified[17] as a string that starts with a lowercase letter, followed by any lowercase letters, digits, dots and dashes. While namespacing would still be possible, this would specifically forbid the likes of @self or _contract, etc.)
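The registered-relation syntax from RFC 5988 (reg-rel-type) is easy to check mechanically. Doing so shows why sigil-prefixed names fall outside it while a dotted namespace does not. This is a small sketch, and the function name is our own.

```typescript
// RFC 5988 reg-rel-type: LOALPHA *( LOALPHA / DIGIT / "." / "-" )
const REG_REL_TYPE = /^[a-z][a-z0-9.-]*$/;

function isRegisteredRelSyntax(rel: string): boolean {
  return REG_REL_TYPE.test(rel);
}

for (const rel of ["self", "hypr.self", "@self", "_contract"]) {
  console.log(rel, isRegisteredRelSyntax(rel));
}
```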
Why allow only one collection, at most?
This is a simplification that does not compromise the generality of the graph structure, but allows for easier operations against a vertex. Specifically, a POST on a vertex that admits multiple collections would need to be analysed to deduce which collection it applies to. It is quite conceivable that such disambiguation, based on POST data alone, would not be possible.
What is hypr's H Factor[18]?
- Link Support
  - LE (embedding links): Yes, within the link section, corresponding to respective state items.
  - LO (outbound links): Yes, within the link section, where there is no corresponding state or reserved semantic usage.
  - LT (templated queries): Yes, albeit implicitly, as all representations can be queried by their type definition, given client support.
  - LN (non-idempotent updates): Yes, given client support.
  - LI (idempotent updates): Yes, given client support.
- Control Data Support
  - CR (control data for read requests): For hypr resources, this is implied and thus not required; otherwise, it can be explicitly stated for foreign resources.
  - CU (control data for update requests): For hypr resources, this is implied and thus not required; otherwise, it can be explicitly stated for foreign resources.
  - CM (control data for interface methods): For hypr resources, this is implied by the graph structure and the Allow header; otherwise, it can be explicitly stated for foreign resources.
  - CL (control data for links): Yes, per the link section.
i.e., Support ALL the things!
Why JavaScript?
JavaScript is the one true language.
Sarcasm aside (and now we've wiped off the rotten fruit), JavaScript is not mandated by this specification. However, realistically -- given its hegemony in the browser space, which represents a significant vector for client development -- hypr has been designed with JavaScript idioms in mind, simply to facilitate usage.
Isn't hypr's typing syntax a bit complex?
The type annotation syntax has been designed with progressive enhancement in mind. That is, the degree of expressiveness is the developer's prerogative, while accommodating a high degree of fidelity, should it be required.
Unfortunately, there are no widely-used type checking or annotation conventions in use with (plain) JavaScript or JSON, which is why we've invented our own. However, we have tried to strike a balance between familiarity, brevity and functionality. Representations are largely for machine consumption and generation, but have nonetheless been designed to be tractable to human users.
Why subtype text as an IRI when you already have links?
Good question... This may change!
Why allow collapsible structures?
Many hypr structures have two forms: a hash of various options (some of which may be omitted) and a short, scalar form which corresponds to the fundamental option and assumes defaults for the rest.
The clearest example is state element values. In the short form, these are considered immutable and inherit their type from their JSON type, in contrast to the fidelity of their full form. That is, compare the following state block:
{
"someElement": "some value"
}
...versus this one:
{
"someElement": {
"value": "some value",
"type": {
"primitive": "text",
"mutable": false
}
}
}
These representations are equivalent. From a human perspective, the short form makes sense, in that it's highly tractable. However, the primary consumer of the data will be software and, with this inconsistent interface, it would need to be written to either check which form it is dealing with, or "upsample" all short forms at runtime.
This is thus a valid concern, as it introduces (arguably) needless complexity in the name of readability. However, making such a clear distinction for immutable data, we believe, pushes the balance in our favour. Although only just!
(The bandwidth saving is quite trivial, so doesn't factor into the decision. However, for a service that deals with primarily read-only data, this may be beneficial.)
In other places where this collapsing is allowed, the short form is considered to be the primary form, insofar as it's more likely to be used. The expanded form is used to tweak defaults and, likewise, make a clear semantic distinction between model classes.
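The "upsampling" option mentioned above might look like the following sketch, which normalises every state element to its full form. It assumes that a full-form element is a JSON object with a value key. The "text" primitive for strings comes from the example above. The primitive names used for numbers and booleans are placeholders, since this page does not define them.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface FullElement {
  value: Json;
  type: { [option: string]: Json };
}

// "text" for strings comes from the example above; "number" and "boolean"
// are placeholder primitive names (assumptions, not defined on this page).
const PRIMITIVE_FOR_JSON_TYPE: Record<string, string> = {
  string: "text",
  number: "number",
  boolean: "boolean",
};

function isJsonObject(value: Json): value is { [key: string]: Json } {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical helper: always return the full form of a state element.
function upsample(element: Json): FullElement {
  if (isJsonObject(element) && "value" in element) {
    // Already in full form: copy it, tolerating a missing "type" block.
    const type = element["type"];
    return { value: element["value"], type: isJsonObject(type) ? { ...type } : {} };
  }
  const primitive = PRIMITIVE_FOR_JSON_TYPE[typeof element];
  if (primitive === undefined) {
    throw new Error("no short-form type inference for null, arrays or objects");
  }
  // Short forms are immutable and take their primitive from their JSON type.
  return { value: element, type: { primitive, mutable: false } };
}

console.log(JSON.stringify(upsample("some value")));
```

With this in place, client code only ever handles the full form, at the cost of one normalisation pass per representation.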
Per the discussion on hypermedia, the best ideas from existing representation formats -- particularly Collection+JSON[19] -- have been considered in hypr's design. Additional kudos should go to Leonard Richardson and Mike Amundsen for their book[20] on RESTful web APIs.
Discussion and iteration on hypr's specification was aided by my colleagues in the Human Genetics Informatics team and the wider informatics community at the Wellcome Trust Sanger Institute. Specific thanks go to Josh Randall and Irina Colgiu.
[1] Fielding, R. et al. (1999) Hypertext Transfer Protocol -- HTTP/1.1; IETF RFC2616 (see also RFC7230-7235)
[2] Fielding, R. T. (2000) Architectural Styles and the Design of Network-Based Software Architectures; University of California, Irvine; PhD Thesis
[3] Bray, T. (ed.) (2014) The JavaScript Object Notation (JSON) Data Interchange Format; IETF RFC7159
[4] Crocker, D. (ed.) et al. (2008) Augmented BNF for Syntax Specifications: ABNF; IETF RFC5234
[5] Gregorio, J. et al. (2012) URI Template; IETF RFC6570
[6] Klyne, G. et al. (2002) Date and Time on the Internet: Timestamps; IETF RFC3339
[7] Nottingham, M. et al. (eds.) (Retr. 2015) Link Relations; IANA Registry
[8] ECMAScript WG (2011) ECMAScript Language Specification; Ecma International, Standard ECMA-262 5.1 Edition (specifically §15.10)
[9] Duerst, M. et al. (2005) Internationalized Resource Identifiers (IRIs); IETF RFC3987
[10] Resnick, P. (ed.) (2008) Internet Message Format; IETF RFC5322
[11] Dusseault, L. et al. (2010) PATCH Method for HTTP; IETF RFC5789
[12] Bryan, P. et al. (eds.) (2013) JavaScript Object Notation (JSON) Patch; IETF RFC6902
[13] Hoffman, P. et al. (2014) JSON Merge Patch; IETF RFC7386
[14] Snell, J. (2014) HTTP Link and Unlink Methods; IETF Internet-Draft
[15] Kelly, M. (2013) HAL - Hypertext Application Language
[16] Sporny, M. et al. (2014) JSON-LD 1.0; W3C Recommendation
[17] Nottingham, M. (2010) Web Linking; IETF RFC5988
[18] Amundsen, M. (2010) H Factor: Hypermedia Types
[19] Amundsen, M. (2013) Collection+JSON - Hypermedia Type
[20] Richardson, L. et al. (2013) RESTful Web APIs; Sebastopol, CA: O'Reilly