The "id" conundrum - sgpinkus/json-schema GitHub Wiki
THIS WIKI IS OBSOLETE. PLEASE SEE THE NEW JSON-SCHEMA-ORG/JSON-SCHEMA-SPEC REPOSITORY.
Note
The situation has evolved since this writeup. It remains a "fun" read, however ;)
What? Conundrum?
Yes, conundrum. This keyword, defined in draft v3 (see below), is a major source of disagreement even between members of the GitHub organization. I (the author of this page) say it must go in its current form, or be re(de)fined, so as to avoid its innumerable traps, while other members say it is OK as it is.
What the draft says
5.27. id
This attribute defines the current URI of this schema (this attribute
is effectively a "self" link). This URI MAY be relative or absolute.
If the URI is relative it is resolved against the current URI of the
parent schema it is contained in. If this schema is not contained in
any parent schema, the current URI of the parent schema is held to be
the URI under which this schema was addressed. If id is missing, the
current URI of a schema is defined to be that of the parent schema.
The current URI of the schema is also used to construct relative
references such as for $ref.
In layman's terms: if you encounter an id keyword, wherever it may be in the schema, then you MAY consider that the URI for that particular subschema is the value of id resolved against the current root schema's URI.
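To make the resolution rule concrete, here is a minimal sketch in Python. It follows the draft's wording (a relative id is resolved against the parent schema's current URI, and a missing id inherits it). The function name collect_uris is hypothetical, and the sketch makes simplifying assumptions: every nested JSON object is treated as a potential subschema, arrays are not traversed, and JSON Pointer tokens are not escaped.

```python
from urllib.parse import urljoin


def collect_uris(schema, base_uri, path=""):
    """Yield (json_pointer, current_uri) for every JSON object in a schema.

    A string-valued "id" is resolved against the parent's current URI;
    an object without "id" inherits the parent's current URI.
    Assumption: any nested object counts as a subschema.
    """
    if not isinstance(schema, dict):
        return
    current = base_uri
    if isinstance(schema.get("id"), str):
        current = urljoin(base_uri, schema["id"])
    yield path, current
    for key, value in schema.items():
        yield from collect_uris(value, current, f"{path}/{key}")
```

Calling it with the URI the schema was loaded from as base_uri gives the "current URI" of each object, which is exactly the information the examples further down show to be unreliable.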
What this keyword influences
OK, it influences many things, not only validation, which is my primary concern. But when it comes to validation, you may have to address other schemas if you encounter a JSON Reference.
And I am adamant that JSON Reference processing, when it comes to validation, must be, to quote Eben Moglen in this speech (personal recommendation: watch that video, it is really worth it), "reliable, reproducible and certain".
And id does not make that guarantee. At all.
Can this be fixed?
Yes indeed. It would require additional provisions to the draft to get rid of all problems completely. See the bottom of the page.
Is id used today?
Well, that is a good question indeed. I know of no implementation which uses id by the spec. NOT ONE. id is mostly used as a string identifier for schemas. Which was not its intended usage.
In fact, most schemas I have seen written so far don't use id for addressing but JSON Pointer (with reason: JSON Pointer is unambiguous).
Now, on to examples
Fasten your seatbelts.
Duplicate ids
Yes, the wording above does not forbid that. This schema is valid:
{
    "id": "http://foo.bar",
    "subschema": {
        "id": "http://foo.bar"
    }
}
What is http://foo.bar supposed to point to?
So is this schema:
{
    "id": "http://foo.bar",
    "subschema": {
        "id": "#foo"
    },
    "subschema2": {
        "id": "#foo"
    }
}
What is http://foo.bar#foo?
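A processor can at least detect this situation. The following is a rough sketch, not taken from any existing implementation: the hypothetical find_duplicate_ids resolves every id the way the draft describes and reports each URI claimed more than once, together with the JSON Pointers of the claimants. As a simplification it assumes every nested object is a subschema and does not look inside arrays.

```python
from collections import defaultdict
from urllib.parse import urljoin


def find_duplicate_ids(schema, base_uri):
    """Return {uri: [json_pointer, ...]} for URIs declared by more than one "id"."""
    seen = defaultdict(list)

    def walk(node, current, path):
        if not isinstance(node, dict):
            return
        if isinstance(node.get("id"), str):
            # Resolve against the parent's current URI, as the draft says.
            current = urljoin(current, node["id"])
            seen[current].append(path)  # "" is the pointer to the root
        for key, value in node.items():
            walk(value, current, f"{path}/{key}")

    walk(schema, base_uri, "")
    return {uri: paths for uri, paths in seen.items() if len(paths) > 1}
```

Run against the two schemas above, it flags http://foo.bar (root and /subschema) in the first and http://foo.bar#foo (/subschema and /subschema2) in the second. Detecting the clash is the easy part; the draft still gives no rule for which claimant wins.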
Conflicting URIs
Say you have a schema at http://foo.bar/schema.json which reads:
{
    "id": "http://foo.bar/schema.json",
    "subschema": {
        "id": "schema2.json",
        "type": "integer"
    }
}
and a schema at http://foo.bar/schema2.json which reads:
{
    "type": "boolean"
}
Remember the specification? In theory, an implementation MAY consider that schema subschema in the first schema has URI... http://foo.bar/schema2.json! Which means you end up with conflicting contents for the same URI.
And there is worse. Look at that:
{
    "$schema": "http://json-schema.org/draft-03/schema#",
    "subschema": {
        "id": "http://json-schema.org/draft-03/schema#"
    }
}
Now, some background:
- http://json-schema.org/draft-03/schema# is the canonical URI of the meta-schema;
- this meta-schema is itself a JSON Schema, and it validates all schemas written against it;
- $schema says "this is the meta-schema this schema should be valid against".
What you have effectively done here is jeopardize schema validation itself. Congratulations ;)
Unreachable content
Yes, id can do that for you. Witness:
{
    "id": "http://foo.bar/x.json#/subschema",
    "subschema": {
        "whatever": [ "you", "want" ]
    }
}
You load this schema. You take for granted that the id at the root of the schema is the effective URI of this schema. And you cannot access subschema AT ALL.
Oh, and there is this situation too:
{
    "id": "http://foo.bar/schema.json",
    "subschema": {
        "id": "children/otherschema.json"
    }
}
Now, let us say that you have this JSON Reference to resolve:
{
    "$ref": "http://foo.bar/children/otherschema.json"
}
but there is no content at that absolute URI. That means:
- if you are currently "in" http://foo.bar/schema.json, the reference resolves successfully;
- if you are outside of it, it fails to resolve at all...
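The context dependence can be reproduced with a toy resolver. This is only a sketch under stated assumptions: register_ids and resolve are hypothetical names, the registry is a plain dict standing in for whatever an implementation keeps in memory, and nothing is ever fetched over the network (which matches the premise that there is no content at the absolute URI).

```python
from urllib.parse import urljoin


def register_ids(schema, base_uri, registry):
    """Record the URI that each string-valued "id" claims for its object."""
    if not isinstance(schema, dict):
        return
    current = base_uri
    if isinstance(schema.get("id"), str):
        current = urljoin(base_uri, schema["id"])
        registry[current] = schema
    for value in schema.values():
        register_ids(value, current, registry)


def resolve(ref, registry):
    """Look a reference up in the registry; return None when it is unknown."""
    return registry.get(ref)


schema = {
    "id": "http://foo.bar/schema.json",
    "subschema": {"id": "children/otherschema.json"},
}
ref = "http://foo.bar/children/otherschema.json"

registry = {}
before = resolve(ref, registry)  # "outside": the schema has not been processed
register_ids(schema, "http://foo.bar/schema.json", registry)
after = resolve(ref, registry)   # "inside": the id made the URI resolvable
```

Here before is None and after is the subschema object: the same reference gives two different outcomes depending on what the processor happens to have seen, which is the opposite of "reliable, reproducible and certain".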
How to fix that
Here are the suggested rules for fixing this mess:
id in root schemas
- In root schemas, id MUST be absolute. It MUST have no, or an empty, fragment part.
- Implementations SHOULD ignore the value of id if the rules above are not met.
- If the schema has been loaded from a URI other than the one mentioned in id, implementations SHOULD consider that the schema URI is the loading URI, not the one in id.
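As a sketch of how the three rules for root schemas could fit together (the function names are hypothetical, and "absolute" is taken here to mean "has a scheme", which is my reading rather than something the rules spell out):

```python
from urllib.parse import urldefrag, urlsplit


def root_id_is_acceptable(value):
    """True if value is an absolute URI with no, or an empty, fragment."""
    if not isinstance(value, str):
        return False
    parts = urlsplit(value)
    return bool(parts.scheme) and parts.fragment == ""


def effective_root_uri(schema, loading_uri=None):
    """Pick the URI of a root schema under the suggested rules.

    - an unacceptable "id" is ignored;
    - the loading URI, when there is one, takes precedence over "id";
    - otherwise an acceptable "id" is used, minus any empty fragment.
    """
    if loading_uri is not None:
        return loading_uri
    declared = schema.get("id")
    if root_id_is_acceptable(declared):
        return urldefrag(declared).url
    return None
```

One consequence this makes visible: under these rules a root id only matters when the schema was not loaded from a URI at all (for instance, when it was supplied inline). With the "unreachable content" schema above, effective_root_uri returns None when no loading URI is given, since its id carries a non-empty fragment.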
id in subschemas
- In subschemas, id MUST be a fragment only URI. The fragment MUST NOT be empty, and MUST NOT start with a solidus (/) [this is to avoid conflicts with JSON Pointer].
- The same id MUST NOT be used twice in the same schema.
- Implementations MUST raise an exception if the rules above are not met.
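The subschema rules are mechanical enough to check in a few lines. The sketch below is one possible reading, with hypothetical names (InvalidIdError, check_subschema_ids); it assumes every nested object is a subschema, skips arrays, and leaves the root id alone since that one falls under the root schema rules.

```python
class InvalidIdError(ValueError):
    """Raised when a subschema "id" breaks the suggested rules."""


def check_subschema_ids(schema):
    """Validate subschema ids; return {id: json_pointer} when all is well."""
    seen = {}

    def walk(node, path, is_root):
        if not isinstance(node, dict):
            return
        value = node.get("id")
        if not is_root and isinstance(value, str):
            if not value.startswith("#"):
                raise InvalidIdError(f"{path}: id is not fragment only: {value!r}")
            if value == "#":
                raise InvalidIdError(f"{path}: id has an empty fragment")
            if value.startswith("#/"):
                raise InvalidIdError(f"{path}: id looks like a JSON Pointer: {value!r}")
            if value in seen:
                raise InvalidIdError(f"{path}: id {value!r} already used at {seen[value]}")
            seen[value] = path
        for key, child in node.items():
            walk(child, f"{path}/{key}", False)

    walk(schema, "", True)
    return seen
```

With these checks, the "#foo" duplicate, the "schema2.json" subschema and the "children/otherschema.json" subschema from the examples above are all rejected up front instead of silently changing what a URI resolves to.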