Namespace Handling Woes - mblinn/goavro GitHub Wiki
Namespace Handling Woes
Namespace handling, and some of the other name handling, in goavro is currently a bit broken. It's inconsistent with both the spec and the Java implementation. In the current master (https://github.com/linkedin/goavro/tree/ad9aa16a0ebf65af62bf3a49881a0c63340e6caa as of this writing), there's even a failing test related to namespace handling, huzzah!
Anyhow, this page explains the situation, a fix I made, and a fix that still needs to be made.
This is based on my reading of the Avro spec (https://avro.apache.org/docs/1.7.7/spec.html#schemas), with an attempt to validate against the corresponding Java implementation, which I assume to be the most canonical one.
The Story Thus Far
Avro fairly precisely specifies a name format that applies in several situations.
1. as a name for a named type (record, fixed, enum)
2. as a name for a field in a record
3. as the name for a symbol in an enum
A name, quoth the spec, must:
start with [A-Za-z_]
subsequently contain only [A-Za-z0-9_]
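The two rules above fit in a single regular expression. Here is a minimal sketch in Go; `nameRE` and `isValidName` are hypothetical helpers made up for illustration and are not part of goavro's API.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRE encodes the two rules quoted above: a first character from
// [A-Za-z_], followed by zero or more characters from [A-Za-z0-9_].
var nameRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// isValidName is a hypothetical helper (not a goavro function) that
// reports whether s is a plain Avro name, i.e. not a fullname.
func isValidName(s string) bool {
	return nameRE.MatchString(s)
}

func main() {
	for _, s := range []string{"User", "_private", "aName", "9lives", "has.dot", ""} {
		fmt.Printf("%q valid name: %v\n", s, isValidName(s))
	}
}
```

Note that a dot is never allowed here, so a fullname such as `tst.example.User` does not pass as a plain name.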
Additionally, Avro specifies a fullname, that is, a name prefixed with a namespace. The definition of a namespace is:
A namespace is a dot-separated sequence of such names
Per my reading of the spec and the Java implementation, a fullname may be used for case 1, that is, as the name of a named type, but not for cases 2 or 3 (field names and enum symbols).
As an example the following is valid:
{"namespace": "tst.example",
"type": "record",
"name": "tst.someother.User",
"fields": [
{"name": "aName", "type": "string"}
]
}
but this one is not:
{"namespace": "tst.example",
"type": "record",
"name": "tst.someother.User",
"fields": [
{"name": "invalid.spot.for.a.fullname.aName", "type": "string"}
]
}
because a fullname cannot be used for a field name.
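To make the two examples concrete, here is a rough Go sketch of how a parser might treat them. It relies on the spec's rule that when the `name` attribute contains a dot it is taken as a fullname and any `namespace` attribute is ignored. `resolveFullname` and `checkFieldName` are hypothetical names made up for this illustration; they are not goavro functions.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// simpleName matches a plain Avro name (no dots allowed).
var simpleName = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// resolveFullname is a hypothetical helper. Per the spec, a name that
// contains a dot is already a fullname and the namespace is ignored;
// otherwise the namespace (if any) is prefixed to the name.
func resolveFullname(name, namespace string) string {
	if strings.Contains(name, ".") || namespace == "" {
		return name
	}
	return namespace + "." + name
}

// checkFieldName is a hypothetical helper that rejects anything other
// than a plain name, which rules out fullnames as field names.
func checkFieldName(name string) error {
	if !simpleName.MatchString(name) {
		return fmt.Errorf("invalid field name %q: must be a plain name, not a fullname", name)
	}
	return nil
}

func main() {
	// The record name from both examples: the dotted name wins.
	fmt.Println(resolveFullname("tst.someother.User", "tst.example")) // tst.someother.User
	// A plain name picks up the enclosing namespace.
	fmt.Println(resolveFullname("User", "tst.example")) // tst.example.User

	fmt.Println(checkFieldName("aName")) // <nil>
	fmt.Println(checkFieldName("invalid.spot.for.a.fullname.aName"))
}
```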
Java Implementation Plot Twist
As stated before, Avro defines a namespace like so:
A namespace is a dot-separated sequence of such names
This means each portion of a period-separated namespace must be a valid name, so a namespace like the following:
$.%.!.^
leading to a fullname like:
$.%.!.^.SomeName
should be disallowed, as its parts are not themselves valid names.
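A strict, spec-following check could look roughly like the Go sketch below. `isValidNamespace` and `isValidFullnameStrict` are hypothetical helpers, not goavro functions. For simplicity the sketch treats the empty string as invalid, so a caller would have to handle a missing namespace separately.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nameRE matches a single plain Avro name.
var nameRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// isValidNamespace is a hypothetical helper implementing the strict
// reading of the spec: every dot-separated segment must be a name.
// An empty string yields one empty segment and is therefore rejected.
func isValidNamespace(ns string) bool {
	for _, part := range strings.Split(ns, ".") {
		if !nameRE.MatchString(part) {
			return false
		}
	}
	return true
}

// isValidFullnameStrict applies the same rule to a whole fullname,
// since a fullname is a namespace followed by a dot and a name.
func isValidFullnameStrict(full string) bool {
	return isValidNamespace(full)
}

func main() {
	fmt.Println(isValidNamespace("tst.example"))            // true
	fmt.Println(isValidNamespace("$.%.!.^"))                // false
	fmt.Println(isValidFullnameStrict("$.%.!.^.SomeName"))  // false
	fmt.Println(isValidFullnameStrict("tst.someother.User")) // true
}
```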
However, the Java implementation doesn't appear to validate the namespace portion of a fullname, or namespaces in other contexts like type declarations. It happily compiles the following schema, for instance:
{"namespace": "tst.example",
"type": "record",
"name": "$.%.User",
"fields": [
{"name": "aName", "type": "string"}
]
}
Here the namespace portion of the fullname for the top-level record contains characters that don't meet the definition of a name. So there's that.
Goavro Current State
Currently (as of https://github.com/linkedin/goavro/tree/ad9aa16a0ebf65af62bf3a49881a0c63340e6caa) goavro seems to have a broken attempt to handle fullnames in record definitions, complete with a failing test!
I fixed the problem that is breaking that test in a commit here: https://github.com/mblinn/goavro/commit/9d6255a99eeefd4fb7d40c9b2614dadc1be66ab6 (and fixed another failing test in the previous commit). When I did so, I made it work the way I think the Java version works (i.e., without doing any validation on the namespace portion of the fullname) rather than the way the Avro spec says it should work.
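For illustration only, the lenient behavior described above might be sketched in Go as follows. This is not the code from the commit; `splitFullname` and `checkFullnameLenient` are hypothetical names, and the sketch assumes the fullname is split at its last dot with only the final segment validated.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// nameRE matches a single plain Avro name.
var nameRE = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// splitFullname is a hypothetical helper that splits at the last dot.
// Everything before it is the namespace; the rest is the name.
func splitFullname(full string) (namespace, name string) {
	if i := strings.LastIndex(full, "."); i >= 0 {
		return full[:i], full[i+1:]
	}
	return "", full
}

// checkFullnameLenient validates only the name portion and leaves the
// namespace unchecked, mirroring the behavior attributed to the Java
// implementation above (an assumption, not verified here).
func checkFullnameLenient(full string) error {
	_, name := splitFullname(full)
	if !nameRE.MatchString(name) {
		return fmt.Errorf("invalid name %q in fullname %q", name, full)
	}
	return nil
}

func main() {
	fmt.Println(checkFullnameLenient("$.%.User"))           // <nil>
	fmt.Println(checkFullnameLenient("tst.someother.User")) // <nil>
	fmt.Println(checkFullnameLenient("tst.example.9User"))  // error: name starts with a digit
}
```

A spec-strict variant would additionally check every segment of the namespace returned by `splitFullname`.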
This fix doesn't attempt to handle validation of field names, which should also be validated as names (but not fullnames).