Concepts NameAndID - UBOdin/mimir GitHub Wiki
Ugh. Case sensitivity is a nightmare.
TL;DR
- NEVER use `String` for identifiers.
- Use `mimir.algebra.ID` (case-sensitive) everywhere you can.
- NEVER change the case on an `ID` (i.e., DO NOT use `_.id.toUpperCase` or `_.id.toLowerCase` or Oliver will be very upset).
- Use `sparsity.Name` to talk to case-insensitive or variably-cased external interfaces, but normalize it to an `ID` before using it internally.
The problem.
SQL is normally not case-sensitive, except when identifiers are quoted (e.g., "foo" or `foo`), in which case the identifier is case sensitive.
Data sources vary in their case sensitivity. For example, Spark follows the SQL quoting model, while filesystems and URLs are typically case sensitive (unless you're on a Mac). Data sources also vary in how they standardize names. For example, Spark downcases variable names, while many SQL implementations up-case them.
In short, in prior iterations of the Mimir code, we've had a mountain of bugs, hacks, and ugly workarounds dealing with case-sensitivity issues. These issues are now largely gone thanks to three rules.
- Never never never ever use `String` to store an identifier.
- All identifiers in Mimir's internals are case-sensitive. You acknowledge this contract by wrapping all identifiers in `mimir.algebra.ID`.
- The interfaces between Mimir and the outside world may need case-insensitive identifiers. `sparsity.Name` is used for this purpose. If its `quoted` field is set, the name is case-sensitive. If not set, the name is case-insensitive. `Name`s should only be used at the interfaces between Mimir and the outside world and resolved into their case-sensitive form before use. An unquoted name should be matched against every candidate ID using `equalsIgnoreCase` (or equivalent), and the first matching ID should replace it.
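The resolution rule above can be sketched roughly as follows. This is a minimal illustration, not Mimir's actual code: the `ID` and `Name` case classes here are simplified, hypothetical stand-ins for `mimir.algebra.ID` and `sparsity.Name` (whose real definitions may differ), and `NameResolver.resolve` is an invented helper name.

```scala
// Hypothetical, simplified stand-ins for mimir.algebra.ID and sparsity.Name.
// The real classes may have different fields and constructors.
final case class ID(id: String)
final case class Name(name: String, quoted: Boolean = false)

object NameResolver {
  // Resolve an externally supplied Name to one of the known, case-sensitive IDs.
  // - Quoted names must match a candidate exactly.
  // - Unquoted names match case-insensitively; the first matching candidate wins.
  // Returns None when no candidate matches.
  def resolve(name: Name, candidates: Seq[ID]): Option[ID] =
    if (name.quoted) candidates.find(_.id == name.name)
    else candidates.find(_.id.equalsIgnoreCase(name.name))
}
```

Under these assumptions, `NameResolver.resolve(Name("FOO"), Seq(ID("Foo"), ID("bar")))` yields `Some(ID("Foo"))`, while the quoted `Name("FOO", quoted = true)` yields `None`. Keeping the case-insensitive comparison inside one boundary helper like this means the rest of the code only ever sees an already-resolved `ID`.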
********** BEGIN Message from supreme high leader Oliver ***********
* I don't want to see *anywhere* in the code ANY of the following
* - [var].id.toUpperCase
* - [var].id.toLowerCase
* - [var].id.equalsIgnoreCase
* or anything along these lines. In fact, unless you have a
* particularly good reason to do so (several acceptable reasons
* listed below), you should NEVER access [var].id. Acceptable
* reasons include:
* - You're talking to a backend that is case sensitive (e.g. Spark)
* - You're printing debug information.
* - The id gets immediately wrapped in a StringPrimitive and shoved into the MetadataBackend.
* If you're talking to a mixed-case backend (e.g., GProM), ID
* values MUST be treated as quoted.
********** END Message from supreme high leader Oliver ***********
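As one illustration of "ID values MUST be treated as quoted", the sketch below renders an ID as a double-quoted SQL identifier before handing it to a backend. It is only a sketch under assumptions: `ID` is a hypothetical stand-in for `mimir.algebra.ID`, `SqlQuoting.quoted` is an invented helper, and double quotes are the standard SQL delimiter, which a particular backend may or may not accept.

```scala
// Hypothetical, simplified stand-in for mimir.algebra.ID.
final case class ID(id: String)

object SqlQuoting {
  // Render an ID as a double-quoted SQL identifier so that a mixed-case
  // backend preserves its exact case. Embedded double quotes are escaped
  // by doubling them, per the standard SQL convention.
  def quoted(id: ID): String =
    "\"" + id.id.replace("\"", "\"\"") + "\""
}
```

This falls under the "talking to a backend" exception for accessing `.id`: the case of the identifier is passed through untouched, never upper- or lower-cased.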