Strings - Spicery/Nutmeg GitHub Wiki

Literal (Short) Strings

Literal strings in Nutmeg can be single-quoted or double-quoted, following the convention established by languages such as Python and JavaScript. Although this redundant syntax is somewhat wasteful, it does make strings that involve quotes easier to read. Literal strings correspond to const (fully immutable) objects at runtime. Strings can be arbitrarily long but occupy a single line (no unescaped newlines or returns).

The backslash (\) character is used to escape characters that would otherwise be awkward to write. Nutmeg includes four escaping conventions: single character escapes, Unicode escapes, HTML character entity escapes, and HTML numerical character escapes.

Short Escapes

Nutmeg supports the following single-character escapes, which will be familiar from languages such as C, JavaScript and C#:

  • \\, \" and \' the basic escapes for backslash, double-quote and single-quote marks
  • \n, newline
  • \r, carriage return
  • \s, space
  • \t, tab
  • \v, vertical tab
  • \uXXXX, a Unicode code point, where each X stands for a hex digit (regex [0-9A-F])

HTML Character Entity Reference Escapes

Nutmeg allows the standard HTML character entity references. Each starts with the escape sequence \&, followed by a character entity name such as 'copy' or 'diamond', and is closed by a semicolon (;). Examples:

  • \&copy;, the copyright symbol ©.
  • \&spades;, the spade symbol familiar from playing cards ♠.

HTML Numerical Character Reference Escapes

Similarly, Nutmeg supports HTML numeric character references. These begin with \&# and are terminated by a semicolon (;).

  • \&#9733;, the star symbol ★
  • \&#8594;, the right arrow →
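
The four escaping conventions can be modelled outside Nutmeg. The following is a minimal Python sketch, not Nutmeg's actual tokeniser: the function name decode_escapes is hypothetical, entity names are looked up in Python's own html.entities.html5 table, and accepting the hexadecimal form \&#x...; is an assumption, since the text above only shows decimal references.

```python
import re
from html.entities import html5  # maps names such as "copy;" to characters

# Single-character escapes listed under "Short Escapes".
SHORT_ESCAPES = {
    "\\": "\\", '"': '"', "'": "'",
    "n": "\n", "r": "\r", "s": " ", "t": "\t", "v": "\v",
}

# One alternative per escaping convention, tried from left to right.
ESCAPE = re.compile(
    r"\\(?:u([0-9A-F]{4})"                 # \uXXXX (upper-case hex, as documented)
    r"|&#([0-9]+|[xX][0-9A-Fa-f]+);"       # \&#NNNN; (hex form is an assumption)
    r"|&([A-Za-z][A-Za-z0-9]*);"           # \&name;
    r"|(.))",                              # any other single character
    re.DOTALL,
)


def decode_escapes(body: str) -> str:
    """Expand backslash escapes in the body of a (non-raw) string literal."""

    def expand(match):
        hex4, number, name, short = match.groups()
        if hex4 is not None:
            return chr(int(hex4, 16))
        if number is not None:
            if number[0] in "xX":
                return chr(int(number[1:], 16))
            return chr(int(number))
        if name is not None:
            if name + ";" not in html5:
                raise ValueError(f"unknown character entity: {name}")
            return html5[name + ";"]
        if short not in SHORT_ESCAPES:
            raise ValueError(f"unknown escape: \\{short}")
        return SHORT_ESCAPES[short]

    return ESCAPE.sub(expand, body)
```

Rejecting unknown escapes with an error, rather than passing them through, is a design choice of this sketch; the text above does not say how Nutmeg treats them.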

Raw Strings [Planned feature]

It is occasionally necessary to write strings that contain one or more backslashes, and having to escape them can rapidly become tedious and unreadable. Raw strings are introduced by a backslashed quote mark. They have no escape mechanism: they are simply terminated at the next matching quote mark.

  • \"This is a raw string in which backslashes have no special meaning."
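
Because raw strings are only planned, the following Python sketch is an illustration of the rule as described here rather than a description of any implementation. The function name scan_raw_string is hypothetical.

```python
def scan_raw_string(source: str, start: int = 0):
    """Scan a raw string that begins at source[start].

    Returns (contents, index just past the closing quote).
    """
    opener = source[start:start + 2]
    if len(opener) != 2 or opener[0] != "\\" or opener[1] not in "\"'":
        raise ValueError("a raw string starts with a backslashed quote mark")
    quote = opener[1]
    # No escape mechanism: the very next matching quote ends the string.
    end = source.find(quote, start + 2)
    if end == -1:
        raise ValueError("unterminated raw string")
    return source[start + 2:end], end + 1
```

One consequence of the rule is visible in the sketch: a raw string cannot contain the quote mark that opened it, so the other quote style has to be used when the text includes one.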

Long Strings [Planned feature]

Long strings are used to include a string that consists of several lines. They are loosely modelled on the Java long-string proposal and are indentation-sensitive.

  • Long strings are started and finished by triple quotes (either single or double), each of which occupies a line of its own. (Triple quotes that are opened and closed on the same line are allowed but do not count as long strings.)
  • The indentation of the opening triple and the closing triple must match. In this context, indentation means a sequence of whitespace characters and they match if they are an identical sequence (e.g. space-space-space-space does not match tab even if the indentation level is 4).
  • The lines between the opening and closing marks must start with the same indentation as the opening/closing marks. This indentation is removed from the front of each line. A line with different indentation is a tokenisation error. The quoted lines may have additional indentation, of course.
  • Trailing spaces must be escaped or will be automatically trimmed.
  • Escapes are introduced by backslash and work as normal.
  • Within the opening/closing triple quotes any occurrences of the triple quotes must be escaped.
  • To make a long string 'raw', prefix the triple quote with a backslash.

### The lines of this poem will not start with whitespace.
poem :=
    """
    Mary had a little lamb
    Its fleece was white as snow
    And everywhere that Mary went
    That lamb was sure to go
    """

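The indentation rules above can be sketched in Python. Since long strings are a planned feature, this is only one reading of the rules: the function name read_long_string is hypothetical, escape processing and the raw variant are left out, and joining the lines without a trailing newline is an assumption. The text does not say how completely empty lines are treated, so the sketch applies the matching rule literally and rejects them.

```python
def read_long_string(lines):
    """Take the source lines from the opening triple quote to the closing
    one (inclusive) and return the text of the long string."""
    opener, *body, closer = lines
    # Indentation is the exact run of whitespace characters, not a width.
    indent = opener[:len(opener) - len(opener.lstrip())]
    quote = opener[len(indent):]
    if quote not in ('"""', "'''"):
        raise ValueError("opening triple quote must occupy a line of its own")
    if closer != indent + quote:
        raise ValueError("closing triple quote must match the opening one")
    result = []
    for line in body:
        if not line.startswith(indent):
            raise ValueError("tokenisation error: indentation does not match")
        # Strip the shared indentation, then trim unescaped trailing spaces.
        result.append(line[len(indent):].rstrip(" "))
    return "\n".join(result)
```

Comparing the indentation as a character sequence, rather than as a column count, is what makes four spaces and one tab fail to match.
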
Stylistic Guidelines

The choice of single or double quotes is often arbitrary, and that can make code look a bit messy. So, in the interests of stylistic consistency, we recommend that projects consistently prefer one choice over the other when all else is equal. And, for what it is worth, we prefer single-quotes over double-quotes as we think they make the page less cluttered - but you are free to ignore this. We do not endorse any stylesheet or guideline that cramps your style, as that would infringe the spirit of Nutmeg.

In particular, you are entirely free to invent your own stylistic guidelines, e.g. double-quotes for text that users might see on the web page and single-quotes for text that only appears in a printed PDF.

Semantics

Strings can be thought of as vectors (1D arrays) of characters. The characters are considered to be part of the vector, which means that they cannot be shared. When you access a character you get back an immutable snapshot. Like all built-in types, strings come in different flavours of mutability: mutable strings, limited strings and const strings.

  • Const strings are const vectors of characters, which means they can't change at all. Sealing has no effect.
  • Limited strings allow you to update their parts but not to add or remove parts. Sealing a limited string prevents the characters being updated. Limited strings are almost as efficient as const strings and often provide all the flexibility you need.
  • Mutable strings are the most general form of string. They are mutable, sealable vectors of owned-characters, which means:
    • You can add, remove and update characters.
    • Because the characters are owned, you cannot get access to the character-object. When you access a character you get a const version of the character, which may or may not be a copy (the terminology for this is snapshot). When you update a character at a position with a new character C, a snapshot of C is taken first.
    • Once it is sealed no further changes are allowed and the string may be treated as const.
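
The three flavours and the effect of sealing can be modelled with a small Python class. This is a toy model for illustration only: the class SketchString and its methods are hypothetical and say nothing about Nutmeg's real API or representation. A one-character Python str is immutable, so it stands in for a snapshot here.

```python
class SketchString:
    """Toy model of const, limited and mutable strings."""

    def __init__(self, text, flavour="mutable"):
        if flavour not in ("const", "limited", "mutable"):
            raise ValueError(f"unknown flavour: {flavour!r}")
        self._chars = list(text)
        self._flavour = flavour
        # A const string behaves as if it had been sealed from the start.
        self._sealed = flavour == "const"

    def seal(self):
        """Forbid all further changes; has no effect on a const string."""
        self._sealed = True

    def __getitem__(self, index):
        # Returns an immutable one-character str, never an internal object.
        return self._chars[index]

    def __setitem__(self, index, char):
        # Updating is allowed for unsealed limited and mutable strings.
        if self._sealed:
            raise TypeError("string is const or sealed")
        self._chars[index] = str(char)

    def append(self, char):
        # Growing is allowed only for unsealed mutable strings.
        if self._sealed or self._flavour != "mutable":
            raise TypeError("only unsealed mutable strings can grow")
        self._chars.append(str(char))

    def __str__(self):
        return "".join(self._chars)
```

In this model a limited string accepts s[0] = 'b' but rejects append, and every flavour rejects both once seal has been called.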

Literal strings, like all literals in Nutmeg, are const. Not only are they fully immutable but equal literal strings are typically the same object.

See Also