The "id" conundrum - sgpinkus/json-schema GitHub Wiki

THIS WIKI IS OBSOLETE. PLEASE SEE THE NEW JSON-SCHEMA-ORG/JSON-SCHEMA-SPEC REPOSITORY.

Note

The situation has evolved since this writeup. It remains a "fun" read, however ;)

What? Conundrum?

Yes, conundrum. This keyword, defined in draft v3 (see below), is a major source of disagreement, even between members of the GitHub organization. I (the author of this page) say it must either go in its current form or be re(de)fined so as to avoid its innumerable traps, while other members say it is OK as it is.

What the draft says

5.27.  id

   This attribute defines the current URI of this schema (this attribute
   is effectively a "self" link).  This URI MAY be relative or absolute.
   If the URI is relative it is resolved against the current URI of the
   parent schema it is contained in.  If this schema is not contained in
   any parent schema, the current URI of the parent schema is held to be
   the URI under which this schema was addressed.  If id is missing, the
   current URI of a schema is defined to be that of the parent schema.
   The current URI of the schema is also used to construct relative
   references such as for $ref.

In layman's terms: if you encounter an id keyword, wherever it may be in the schema, then you MAY consider that the URI for that particular subschema is the value of id resolved against the current URI of its parent schema (or, for a root schema, against the URI under which the schema was addressed).
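To make the resolution rule concrete, here is a minimal Python sketch of how the paragraph quoted from the draft can be read. The function name `current_uris`, the sample schema and the example.com URIs are assumptions of this sketch, as is the choice to treat every nested JSON object as a potential subschema; JSON Pointer escaping is left out for brevity.

```python
from urllib.parse import urljoin


def current_uris(node, base_uri, pointer=""):
    """Yield (JSON Pointer, current URI) for every object found in a schema.

    Assumption: every nested object is treated as a potential subschema.
    """
    if isinstance(node, dict):
        declared = node.get("id")
        if isinstance(declared, str):
            # "id" is resolved against the current URI of the parent schema.
            base_uri = urljoin(base_uri, declared)
        # Without "id", the current URI is simply inherited from the parent.
        yield pointer, base_uri
        for key, value in node.items():
            yield from current_uris(value, base_uri, f"{pointer}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from current_uris(value, base_uri, f"{pointer}/{index}")


# Hypothetical schema, addressed at the hypothetical URI below.
schema = {
    "properties": {
        "a": {"id": "sub/a.json", "items": {"id": "#item"}},
        "b": {"type": "string"},
    }
}
uris = dict(current_uris(schema, "http://example.com/root.json"))
```

In this sketch `/properties/a` ends up with the current URI http://example.com/sub/a.json and its `items` member with http://example.com/sub/a.json#item, while `/properties/b` simply inherits the URI under which the root was addressed.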

What this keyword influences

OK, it influences many things, not only validation, which is my primary concern. But when it comes to validation, you may have to address other schemas if you encounter a JSON Reference.

And I am adamant that JSON Reference processing, when it comes to validation, must be, to quote Eben Moglen in this speech (personal recommendation: watch that video, it is really worth it), "reliable, reproducible and certain".

And id does not make that guarantee. At all.

Can this be fixed?

Yes indeed. It would require additional provisions to the draft to get rid of all problems completely. See the bottom of the page.

Is `id` used today?

Well, that is a good question indeed. I know of no implementation which uses id as the spec defines it. NOT ONE. id is mostly used as a plain string identifier for schemas, which was not its intended usage.

In fact, most schemas I have seen written so far don't use id for addressing but JSON Pointer (with good reason: JSON Pointer is unambiguous).

Now, on to examples

Fasten your seatbelts.

Duplicate ids

Yes, the wording above does not forbid that. This schema is valid:

{
    "id": "http://foo.bar",
    "subschema": {
        "id": "http://foo.bar"
    }
}

What is http://foo.bar supposed to point to?

So is this schema:

{
    "id": "http://foo.bar",
    "subschema": {
        "id": "#foo"
    },
    "subschema2": {
        "id": "#foo"
    }
}

What is http://foo.bar#foo?
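Nothing in the draft stops an implementation from at least detecting such collisions. The following is a minimal sketch, not taken from any existing validator: the helper name `find_duplicate_ids` is hypothetical, and it assumes that every nested object carrying a string-valued id is a subschema.

```python
from collections import defaultdict
from urllib.parse import urljoin


def find_duplicate_ids(schema, base_uri=""):
    """Map each resolved id claimed more than once to the JSON Pointers claiming it."""
    seen = defaultdict(list)

    def walk(node, base, pointer):
        if isinstance(node, dict):
            declared = node.get("id")
            if isinstance(declared, str):
                # Resolve against the parent's current URI, as the draft says.
                base = urljoin(base, declared)
                seen[base].append(pointer)
            for key, value in node.items():
                walk(value, base, f"{pointer}/{key}")
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, base, f"{pointer}/{index}")

    walk(schema, base_uri, "")
    return {uri: where for uri, where in seen.items() if len(where) > 1}


first = {"id": "http://foo.bar", "subschema": {"id": "http://foo.bar"}}
second = {
    "id": "http://foo.bar",
    "subschema": {"id": "#foo"},
    "subschema2": {"id": "#foo"},
}
duplicates_first = find_duplicate_ids(first)
duplicates_second = find_duplicate_ids(second)
```

For the first schema the report lists the root (empty pointer) and `/subschema` under http://foo.bar; for the second it lists `/subschema` and `/subschema2` under http://foo.bar#foo. Detecting the clash does not tell you which one should win, which is the whole problem.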

Conflicting URIs

Say you have a schema at http://foo.bar/schema.json which reads:

{
    "id": "http://foo.bar/schema.json",
    "subschema": {
        "id": "schema2.json",
        "type": "integer"
    }
}

and a schema at http://foo.bar/schema2.json which reads:

{
    "type": "boolean"
}

Remember the specification? In theory, an implementation MAY consider that the subschema member of the first schema has the URI... http://foo.bar/schema2.json! Which means you end up with two conflicting contents for the same URI.
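The clash can be reproduced in a few lines. This is only a sketch: the `fetched` dictionary stands in for whatever a plain HTTP fetch of the two URIs would return, and the variable names are mine.

```python
from urllib.parse import urljoin

# What lives at each URI according to a plain fetch (stand-in for the network).
fetched = {
    "http://foo.bar/schema.json": {
        "id": "http://foo.bar/schema.json",
        "subschema": {"id": "schema2.json", "type": "integer"},
    },
    "http://foo.bar/schema2.json": {"type": "boolean"},
}

root_uri = "http://foo.bar/schema.json"
embedded = fetched[root_uri]["subschema"]

# The URI the embedded subschema claims for itself via its relative id.
claimed_uri = urljoin(root_uri, embedded["id"])

# True when the same URI now designates two different contents.
conflict = claimed_uri in fetched and fetched[claimed_uri] != embedded
```

Here `claimed_uri` is http://foo.bar/schema2.json, and `conflict` is true: one content says "integer", the other says "boolean".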

And there is worse. Look at that:

{
    "$schema": "http://json-schema.org/draft-03/schema#",
    "subschema": {
        "id": "http://json-schema.org/draft-03/schema#"
    }
}

Now, some background:

http://json-schema.org/draft-03/schema# is the canonical URI of the meta-schema;
this meta-schema is itself a JSON Schema, and it validates all schemas written against it;
$schema says "this is the meta-schema this schema should be valid against".

What you have effectively done here is jeopardize schema validation itself. Congratulations ;)

Unreachable content

Yes, id can do that for you. Witness:

{
    "id": "http://foo.bar/x.json#/subschema",
    "subschema": {
        "whatever": [ "you", "want" ]
    }
}

You load this schema. You take for granted that the id at the root of the schema is the effective URI of this schema. And you cannot access subschema _AT ALL_.
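One way to see why, as a small sketch using only Python's standard URI handling (the variable names are mine): the reference you would naturally use to reach subschema resolves to exactly the URI that the root already claims for itself.

```python
from urllib.parse import urldefrag, urljoin

root_id = "http://foo.bar/x.json#/subschema"

# A JSON Pointer reference to "subschema", resolved against the root's id.
reference_to_subschema = urljoin(root_id, "#/subschema")

# The reference and the root's own id are one and the same URI.
same_uri = reference_to_subschema == root_id

# Splitting the id shows the pointer that the root has effectively hijacked.
document_uri, pointer = urldefrag(root_id)
```

If the implementation honours the root id, that URI designates the whole document, so the pointer `/subschema` within http://foo.bar/x.json no longer leads to subschema.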

Oh, and there is this situation too:

{
    "id": "http://foo.bar/schema.json",
    "subschema": {
        "id": "children/otherschema.json"
    }
}

Now, let us say that you have this JSON Reference to resolve:

{
    "$ref": "http://foo.bar/children/otherschema.json"
}

but there is no content at that absolute URI. That means:

if you are currently "in" http://foo.bar/schema.json, the reference resolves successfully;
if you are outside of it, it fails to resolve at all...
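The sketch below illustrates that dependence on context. The `Registry` class is hypothetical and does not mirror any real implementation: its `remote` dictionary stands in for the network, and embedded ids only become known once the document containing them has been loaded.

```python
from urllib.parse import urljoin


class Registry:
    """Toy resolver: knows remote documents, plus ids seen in loaded ones."""

    def __init__(self, remote):
        self.remote = remote  # URI -> document, stand-in for the network
        self.known = {}       # URI -> schema, filled in as documents load

    def load(self, uri):
        document = self.remote[uri]
        self.known.setdefault(uri, document)
        self._index(document, uri)
        return document

    def _index(self, node, base):
        if isinstance(node, dict):
            declared = node.get("id")
            if isinstance(declared, str):
                base = urljoin(base, declared)
                self.known.setdefault(base, node)
            for value in node.values():
                self._index(value, base)
        elif isinstance(node, list):
            for value in node:
                self._index(value, base)

    def resolve(self, ref):
        if ref in self.known:
            return self.known[ref]
        if ref in self.remote:
            return self.load(ref)
        raise LookupError(f"cannot resolve {ref}")


remote = {
    "http://foo.bar/schema.json": {
        "id": "http://foo.bar/schema.json",
        "subschema": {"id": "children/otherschema.json"},
    }
}
ref = "http://foo.bar/children/otherschema.json"

registry = Registry(remote)
try:
    registry.resolve(ref)
    resolved_before_load = True
except LookupError:
    resolved_before_load = False

registry.load("http://foo.bar/schema.json")
resolved_after_load = registry.resolve(ref)
```

The same reference fails before http://foo.bar/schema.json has been loaded and succeeds afterwards, so the outcome depends on what the processor happens to have seen already.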

How to fix that

Here are the suggested rules for fixing this mess:

`id` in root schemas

In root schemas, id MUST be absolute. It MUST have no, or an empty, fragment part.
Implementations SHOULD ignore the value of id if the rules above are not met.
If the schema has been loaded from a URI other than the one mentioned in id, implementations SHOULD consider that the schema URI is the loading URI, not the one in id.
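As a rough illustration, the three rules above could be checked as follows. This is a sketch of one possible reading, not a reference implementation: the function name `effective_root_uri` is hypothetical, "absolute" is approximated by "has a scheme", and `loading_uri=None` models a schema that was not loaded from any URI (for example, one parsed from a string).

```python
from urllib.parse import urldefrag, urlsplit


def effective_root_uri(schema, loading_uri=None):
    """Return the URI to use for a root schema, or None if there is none."""
    if loading_uri is not None:
        # Third rule: the loading URI takes precedence over whatever id says
        # (and if both agree, the result is the same anyway).
        return loading_uri

    declared = schema.get("id")
    if not isinstance(declared, str):
        return None

    parts = urlsplit(declared)
    is_absolute = bool(parts.scheme)        # approximation of "absolute"
    has_fragment = bool(parts.fragment)     # "#" alone counts as empty
    if not is_absolute or has_fragment:
        # Second rule: ignore an id which breaks the first rule.
        return None

    # Drop a trailing empty fragment so both accepted spellings agree.
    return urldefrag(declared)[0]
```

With this reading, the id http://foo.bar/x.json#/subschema from the "Unreachable content" example is simply ignored, and an id only matters when the schema has no loading URI at all.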

`id` in subschemas

In subschemas, id MUST be a fragment-only URI. The fragment MUST NOT be empty, and MUST NOT start with a solidus (/) [this is to avoid conflicts with JSON Pointer].
The same id MUST NOT be used twice in the same schema.
Implementations MUST raise an exception if the rules above are not met.
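The subschema rules lend themselves to a simple recursive check. Again, this is only a sketch under stated assumptions: `InvalidIdError` and `check_subschema_ids` are hypothetical names, and every nested object with a string-valued id is treated as a subschema, which a real implementation would need to refine (an enum value could legitimately contain an "id" member, for instance).

```python
class InvalidIdError(ValueError):
    """Raised when a subschema id breaks the proposed rules."""


def check_subschema_ids(root):
    """Return {id: JSON Pointer} for all subschema ids, or raise InvalidIdError."""
    seen = {}

    def walk(node, pointer, is_root):
        if isinstance(node, dict):
            declared = node.get("id")
            if not is_root and isinstance(declared, str):
                # Fragment only, non-empty, and not starting with a solidus.
                if (not declared.startswith("#") or declared == "#"
                        or declared.startswith("#/")):
                    raise InvalidIdError(f"illegal id {declared!r} at {pointer}")
                # The same id must not appear twice in the same schema.
                if declared in seen:
                    raise InvalidIdError(
                        f"id {declared!r} used at {seen[declared]} and {pointer}"
                    )
                seen[declared] = pointer
            for key, value in node.items():
                walk(value, f"{pointer}/{key}", False)
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{pointer}/{index}", False)

    walk(root, "", True)
    return seen
```

Run against the examples on this page, the second "Duplicate ids" schema and the "Conflicting URIs" schema would both be rejected, the former for reusing #foo and the latter for using schema2.json, which is not a fragment-only URI.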