Plan: 0.5.0 Symbols - monadgroup/axiom GitHub Wiki

This is a planning document for the 0.5.0 Symbols feature. The feature described here is in flux and will probably differ from the final implementation, but any feedback is welcome and will be used to evolve the feature further.

High-Level Description

Symbols are globally unique identifiers represented by a name. That is, across a whole Axiom project, two symbols with the same name are equal, and two symbols with different names are not. Symbols are primarily used as descriptive names for arbitrary constants, such as the values that decide which of a set of branched nodes is switched in (e.g. different types of oscillators).

From a user perspective, then, symbols can be thought of as static strings, i.e. character strings that can't be inspected at runtime (although some runtime operations might be useful in the future). The only operations defined for symbols are equality and inequality (== and !=).

The feature is composed of several subcomponents:

A new opaque symbol type which can be created with a symbol literal (see Syntax below)
A new symbol wire type
Overloads for the == and != operators for comparing symbols
A basic :symbol control that displays the name of the symbol and allows connections
A :symbol[] extractor control
A :select rich control with the following display modes:
- "Dropdown" shows a preset list of symbol names. Selecting one changes the value of the control to that symbol
- "Radio" shows a list of radio buttons for a preset list of symbol names.
- "Knob" (name tbd) shows a regular knob that snaps at equal positions for each possible option in a preset list of symbols.
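As a rough illustration of the "Knob" mode, the snapping can be sketched as mapping a knob position onto an index into the preset list. This is only a sketch: the normalized 0.0 to 1.0 position range, the function name `knob_to_symbol`, and the nearest-position rounding are assumptions made for the example, not part of the plan.

```python
def knob_to_symbol(position, options):
    """Snap a normalized knob position to one of the preset symbol names.

    Assumption: `position` is in the range 0.0..1.0 and the options sit at
    equally spaced positions, with the first at 0.0 and the last at 1.0.
    """
    if not options:
        raise ValueError("a knob needs at least one option")
    # Clamp out-of-range positions to the ends of the knob.
    position = min(max(position, 0.0), 1.0)
    # Pick the option whose snap position is closest to the knob position.
    index = round(position * (len(options) - 1))
    return options[index]
```

With three options, the snap positions in this sketch are 0.0, 0.5 and 1.0, so a position of 0.2 selects the first option and 0.8 selects the last.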

Syntax

In Maxim code, 'symbol literals' look like double-quoted string literals. Some examples:

myVar = "a symbol"
out:symbol = "nice"

"some symbol" == "some symbol" # true
"some symbol" == "another symbol" # false
"Some Symbol" == "some symbol" # false

Symbol literals support UTF-8 and the following escape sequences:

\n for newline
\r for carriage return
\t for tab
\\ for backslash
\" for double quote
\x7F for an 8-bit character code
\u{7FFF} for a Unicode character code

Note that symbol literals cannot contain null characters.
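The escape rules above can be sketched as a small decoder. This is a minimal illustration in Python, not the actual Maxim lexer: the function name `decode_symbol_literal` is hypothetical, it takes only the text between the double quotes, and the choice to reject unknown escapes with an error is an assumption.

```python
# Single-character escapes listed above.
SIMPLE_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", '"': '"'}


def decode_symbol_literal(body):
    """Decode the text between the quotes of a symbol literal into its name."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of literal")
        kind = body[i + 1]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind == "x":
            # \xHH: exactly two hex digits giving an 8-bit character code.
            digits = body[i + 2 : i + 4]
            if len(digits) != 2:
                raise ValueError("\\x needs two hex digits")
            out.append(chr(int(digits, 16)))
            i += 4
        elif kind == "u":
            # \u{H...}: a Unicode character code in braces.
            end = body.find("}", i)
            if body[i + 2 : i + 3] != "{" or end == -1:
                raise ValueError("malformed \\u{...} escape")
            out.append(chr(int(body[i + 3 : end], 16)))
            i = end + 1
        else:
            raise ValueError(f"unknown escape \\{kind}")
    name = "".join(out)
    # Symbol literals cannot contain null characters, however they are written.
    if "\x00" in name:
        raise ValueError("symbol literals cannot contain null characters")
    return name
```

For example, `decode_symbol_literal(r'\u{7FFF}')` yields a one-character name, while `decode_symbol_literal(r'\x00')` raises an error because of the null-character rule.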

Internal Representation

From a language perspective, the actual name of a symbol can be lost after compile-time: all that needs to be known is whether two symbols are equal. In the editor, however, it is beneficial to be able to map an internal symbol value back to the original name.

To support this, the runtime maintains a table of uniqued symbol names (meaning each name only appears once). When a symbol is referenced it is looked up in the table, and the pointer to the symbol name string is used as the symbol value. This allows the editor to simply use the pointer when displaying the symbol name.
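The table described above is essentially string interning. The following Python sketch shows the idea under one stated substitution: Python has no raw pointers, so a position in a list stands in for the pointer to the name string. The class and method names (`SymbolTable`, `intern`, `name_of`) are hypothetical.

```python
class SymbolTable:
    """Table of uniqued symbol names; each name is stored exactly once."""

    def __init__(self):
        self._values = {}  # name -> symbol value
        self._names = []   # symbol value -> name (stand-in for the pointer)

    def intern(self, name):
        """Return the symbol value for `name`, adding it on first reference."""
        value = self._values.get(name)
        if value is None:
            value = len(self._names)
            self._values[name] = value
            self._names.append(name)
        return value

    def name_of(self, value):
        """Map a symbol value back to its name, as the editor would for display."""
        return self._names[value]
```

Referencing the same name twice returns the same value, so equality between symbols reduces to comparing two integers, and names that differ only in case get different values.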

In an exported project, there's no need to be able to determine the original name of a symbol (at least at the moment), so the text is never included. Instead, the exporter determines a unique index for each symbol.

Internally, symbols are stored as 64-bit unsigned integers so that they can hold a pointer of up to 64 bits. Both of the storage representations mentioned above allow symbol comparison to be a plain integer comparison, which is easily fast enough for what we need.
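For the exported case, one way to picture the index assignment is the sketch below. The function name `assign_export_indices` and the first-appearance ordering are assumptions for illustration; the plan only requires that each distinct symbol gets a unique index.

```python
def assign_export_indices(referenced_names):
    """Give each distinct symbol name a unique index.

    Assumption: indices are handed out in order of first appearance. Only the
    indices would be written to the export; the names are dropped.
    """
    indices = {}
    for name in referenced_names:
        if name not in indices:
            indices[name] = len(indices)
    return indices


def symbols_equal(a, b):
    """Compare two symbol values; both are plain unsigned integers."""
    return a == b
```

Given references to "sine", "saw", "sine" and "square", this sketch assigns the indices 0, 1 and 2, and comparing two symbols never needs the original text.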