Namespace Handling Woes - mblinn/goavro GitHub Wiki

Namespace Handling Woes

Namespace handling, and some of the other name handling, in goavro is currently a bit borken. It's inconsistent with both the spec and the Java implementation. In the current master (https://github.com/linkedin/goavro/tree/ad9aa16a0ebf65af62bf3a49881a0c63340e6caa as of this writing), there's even a failing test related to namespace handling, huzzah!

Anyhow this page explains the situation, a fix I made, and a fix that still needs to be made.

This is based on my reading of the Avro spec here - https://avro.apache.org/docs/1.7.7/spec.html#schemas, with an attempt to validate it against the corresponding Java implementation, which I'm assuming is the most canonical implementation.

The Story Thus Far

Avro fairly precisely specifies a name format that applies in several situations.

  1. as a name for a named type (record, fixed, enum)
  2. as a name for a field in a record
  3. as the name for a symbol in an enum

A name, quoth the spec, must:

start with [A-Za-z_]

subsequently contain only [A-Za-z0-9_]
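The two rules above collapse into a single regular expression. Here is a minimal sketch in Go; `isValidName` and `nameRE` are hypothetical names for illustration, not part of goavro's API.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRE encodes the two rules quoted from the spec: the first character
// must be in [A-Za-z_], and every later character must be in [A-Za-z0-9_].
var nameRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// isValidName is a hypothetical helper (not a goavro function) that reports
// whether s is a plain Avro name.
func isValidName(s string) bool {
	return nameRE.MatchString(s)
}

func main() {
	for _, s := range []string{"User", "_private", "a1", "1abc", "has.dot", ""} {
		fmt.Printf("%q valid: %v\n", s, isValidName(s))
	}
}
```

Note that a dot is not in either character class, so a plain name can never contain one. That is what separates a name from a fullname.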

Additionally, Avro specifies a fullname, that is, a name prefixed with a namespace. The definition of a namespace is:

A namespace is a dot-separated sequence of such names

Per my reading of the spec, and the Java implementation, a fullname may be used for 1, that is, as a name for a named type but may not be used for 2 or 3.

As an example the following is valid:

{"namespace": "tst.example",
  "type": "record",
  "name": "tst.someother.User",
  "fields": [
      {"name": "aName", "type": "string"}
  ]
}

but this one is not:

{"namespace": "tst.example",
 "type": "record",
 "name": "tst.someother.User",
 "fields": [
     {"name": "invalid.spot.for.a.fullname.aName", "type": "string"}
 ]
}

because a fullname cannot be used for a field name.
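For named types, my understanding of the spec is that a name containing a dot is taken to be a fullname, and any separately specified namespace is then ignored. The sketch below shows that resolution for the valid example above; `resolveFullname` is a hypothetical helper, not something goavro exposes.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveFullname is a hypothetical helper (not a goavro function).
// If name contains a dot, it is treated as a fullname: everything before
// the last dot is the namespace and the namespace argument is ignored.
// Otherwise the namespace argument is used as-is.
func resolveFullname(name, namespace string) (ns, short string) {
	if i := strings.LastIndex(name, "."); i >= 0 {
		return name[:i], name[i+1:]
	}
	return namespace, name
}

func main() {
	// The "name" attribute is a fullname, so "tst.example" is ignored.
	ns, short := resolveFullname("tst.someother.User", "tst.example")
	fmt.Println(ns, short) // prints: tst.someother User

	// A plain name picks up the enclosing namespace.
	ns, short = resolveFullname("User", "tst.example")
	fmt.Println(ns, short) // prints: tst.example User
}
```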

Java Implementation Plot Twist

As stated before, Avro specifies the definition of a namespace like so:

A namespace is a dot-separated sequence of such names

Meaning each portion of a period-separated namespace must be a valid name, so a namespace like the following:

$.%.!.^

leading to a fullname like:

$.%.!.^.SomeName

should be disallowed, as its parts are not themselves valid names.
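A strict check that follows the spec's wording would split the namespace on dots and validate every component as a name. This is a sketch only: `validateNamespaceStrict` is a hypothetical helper, and treating an empty namespace as "no namespace" is my assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nameRE matches a plain Avro name: [A-Za-z_] followed by [A-Za-z0-9_]*.
var nameRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// validateNamespaceStrict is a hypothetical helper (not a goavro function)
// that requires every dot-separated component to be a valid name.
func validateNamespaceStrict(ns string) error {
	if ns == "" {
		// Assumption: an empty namespace means "no namespace" and is allowed.
		return nil
	}
	for _, part := range strings.Split(ns, ".") {
		if !nameRE.MatchString(part) {
			return fmt.Errorf("namespace %q: component %q is not a valid name", ns, part)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateNamespaceStrict("tst.example")) // accepted
	fmt.Println(validateNamespaceStrict("$.%.!.^"))     // rejected at "$"
	fmt.Println(validateNamespaceStrict("tst..example")) // rejected: empty component
}
```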

However, the Java implementation doesn't appear to actually validate the namespace portion of a fullname, or namespaces in other contexts like type declarations. It happily compiles the following schema, for instance:

{"namespace": "tst.example",
 "type": "record",
 "name": "$.%.User",
 "fields": [
     {"name": "aName", "type": "string"}
 ]
}

Here the namespace portion of the fullname for the top-level record contains some characters that don't meet the definition of a name. So there's that.
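The lenient behavior described above can be sketched as: split the fullname at its last dot and validate only the trailing name. This is an illustration of the observed behavior, not the Java code and not goavro's code; `validateFullnameLenient` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nameRE matches a plain Avro name: [A-Za-z_] followed by [A-Za-z0-9_]*.
var nameRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// validateFullnameLenient is a hypothetical helper (not a goavro function).
// It checks only the part after the last dot and deliberately leaves the
// namespace portion unvalidated.
func validateFullnameLenient(fullname string) error {
	short := fullname
	if i := strings.LastIndex(fullname, "."); i >= 0 {
		short = fullname[i+1:]
	}
	if !nameRE.MatchString(short) {
		return fmt.Errorf("fullname %q: %q is not a valid name", fullname, short)
	}
	return nil
}

func main() {
	fmt.Println(validateFullnameLenient("$.%.User"))  // accepted: namespace is not checked
	fmt.Println(validateFullnameLenient("tst.$User")) // rejected: the name itself is invalid
}
```

The trade-off is that schemas the spec would reject are accepted, in exchange for agreeing with what the Java implementation appears to do.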

Goavro Current State

Currently (as of https://github.com/linkedin/goavro/tree/ad9aa16a0ebf65af62bf3a49881a0c63340e6caa) goavro seems to have a broken attempt to handle fullnames in record definitions, complete with a failing test!

I fixed the problem that is breaking that test in a commit here - https://github.com/mblinn/goavro/commit/9d6255a99eeefd4fb7d40c9b2614dadc1be66ab6 (as well as another failing test in the previous commit). When I did so, I made it work the way I think the Java version works (i.e. without doing any validation on the namespace portion of the fullname) rather than the way the Avro spec says it should work.

That fix doesn't attempt to handle validation of field names, which should also be validated to be names (but not fullnames).
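For what it's worth, the missing field-name check could look something like the sketch below. It works directly on schema JSON using `encoding/json` rather than on goavro's internals; `recordSchema` and `validateFieldNames` are hypothetical names, and only top-level record fields are handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// nameRE matches a plain Avro name. It has no dot in either character
// class, so it rejects fullnames.
var nameRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// recordSchema captures only the parts of a record schema needed here.
type recordSchema struct {
	Name   string `json:"name"`
	Fields []struct {
		Name string `json:"name"`
	} `json:"fields"`
}

// validateFieldNames is a hypothetical helper (not a goavro function) that
// requires every field name of a top-level record to be a plain name.
func validateFieldNames(schemaJSON string) error {
	var rec recordSchema
	if err := json.Unmarshal([]byte(schemaJSON), &rec); err != nil {
		return err
	}
	for _, f := range rec.Fields {
		if !nameRE.MatchString(f.Name) {
			return fmt.Errorf("record %q: field name %q is not a valid name", rec.Name, f.Name)
		}
	}
	return nil
}

func main() {
	good := `{"type":"record","name":"tst.someother.User","fields":[{"name":"aName","type":"string"}]}`
	bad := `{"type":"record","name":"tst.someother.User","fields":[{"name":"invalid.spot.for.a.fullname.aName","type":"string"}]}`
	fmt.Println(validateFieldNames(good)) // accepted
	fmt.Println(validateFieldNames(bad))  // rejected: the field name contains dots
}
```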