Number Literals - 7ombie/phantasm GitHub Wiki

PHANTASM uses its own syntax for number literals. The abstract grammar is shown below. Because it is a token grammar, a literal cannot contain any whitespace:

[sign] mantissa [operator exponent]

Note: The terms mantissa and exponent are used here only as names for the two main parts of a number literal: the required part, and an optional part that modifies the magnitude of the required part.

The Mantissa

The mantissa can be defined in one of four ways:

  • Decimal Integer: One or more decimal digits (e.g. 0 or 153).
  • Decimal Float: One or more decimal digits, followed by a dot, then one or more decimal digits (e.g. 0.0 or 1.53).
  • Hexadecimal Integer: A hash (#) character, followed by one or more hexadecimal digits (e.g. #10 or #7F).
  • Hexadecimal Float: A hash (#) character, followed by one or more hexadecimal digits, then a dot, then one or more hexadecimal digits (e.g. #10.7F).

However the mantissa is expressed, it can optionally be peppered with underscore (_) separators, though each separator must have a digit on both sides of it (so an underscore cannot be leading, trailing, doubled, or next to the dot or the hash).

Lowercase hexadecimal digits are supported (though official documentation will always use uppercase hexadecimals).
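As a sketch of how a mantissa's digits map to a value, the Python below computes the exact value of a mantissa that has already been separated from its sign and hash prefix. The name `mantissa_value` is hypothetical, and using `Fraction` for exact results is a choice made for this illustration, not something this page specifies.

```python
from fractions import Fraction


def mantissa_value(digits: str, radix: int) -> Fraction:
    """Exact value of a mantissa such as "1_000", "1.53", "7F" or "10.7f".

    Hypothetical helper: `digits` is assumed to be already validated by the
    grammar, with no sign and no hash prefix. `radix` is 10 or 16.
    """
    digits = digits.replace("_", "")          # separators carry no value
    whole, _, frac = digits.partition(".")
    value = Fraction(int(whole, radix))       # int() accepts either hex case
    if frac:
        # Each fractional digit is worth radix ** -position.
        value += Fraction(int(frac, radix), radix ** len(frac))
    return value


print(mantissa_value("10.7F", 16))   # 4223/256, i.e. 16 + 127/256
```

Because separators and letter case are normalised first, "#7f" and "#7F", or "1_000" and "1000", produce the same value.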

The Sign

The sign is just an optional plus (+) or minus (-) character, with the usual implications.

Note: Hexadecimal notation is normally used to express (implicitly) positive integers, but an explicit sign is permitted. The sign is always the first character of a number literal, so it must precede the hash character (e.g. +#7F or -#80.EE).

The Operator & Exponent

However a mantissa is expressed, and whether or not it is signed, its value can be raised or lowered by n orders of magnitude using the exponentiation syntax, which appends an operator and an exponent directly to the mantissa.

The operator is either a backslash (\), which raises the value of the mantissa by n orders of magnitude, or a slash (/), which lowers it. The exponent is always an integer written without a sign, and is implicitly positive. For example:

1\3         ; 1,000
1/3         ; 0.001
1.5\6       ; 1,500,000

The exponent (when present) is always expressed using the same base (decimal or hexadecimal) as the corresponding mantissa, so hexadecimal exponents do not require (or permit) another hash prefix. For example:

#FF\6       ; 4,278,190,080                 (#FF000000)
#1.F/2      ; 0.007568359375                (#0.01F)
#1.F/A      ; 1.7621459846850485e-12        (#0.0000000001F)

The operators work as follows, where the radix (10 or 16) is implied by the notation of the mantissa:

mantissa\exponent => mantissa * (radix ** exponent)
mantissa/exponent => mantissa / (radix ** exponent)
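A minimal Python sketch of these two rules, using exact fractions so the examples above can be checked without rounding noise. `scale` is a hypothetical helper; it assumes the mantissa value has already been computed, and that the exponent arrives as the digit string written after the operator (read in the mantissa's radix, underscores allowed).

```python
from fractions import Fraction


def scale(mantissa: Fraction, radix: int, operator: str, exponent: str) -> Fraction:
    """Apply `\\` (raise) or `/` (lower) by `exponent` orders of magnitude."""
    n = int(exponent.replace("_", ""), radix)   # same radix as the mantissa
    if operator == "\\":
        return mantissa * radix ** n            # mantissa * (radix ** exponent)
    if operator == "/":
        return mantissa / radix ** n            # mantissa / (radix ** exponent)
    raise ValueError(f"unknown operator {operator!r}")


# #1.F is 1 + 15/16 = 31/16, so #1.F/2 is 31/16 / 16**2.
print(scale(Fraction(31, 16), 16, "/", "2"))    # 31/4096 (0.007568359375)
```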

Note: In practice, exponents will be relatively small numbers, but underscore separators are still grammatically valid.
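Putting the sign, mantissa, operator and exponent rules together, the token grammar can be sketched as a single Python regular expression. This is an illustration derived from the rules on this page, not PHANTASM's own tokenizer; `LITERAL` and `split_literal` are hypothetical names.

```python
import re

# Digit runs in which every "_" has a digit on both sides.
DEC = r"[0-9](?:_?[0-9])*"
HEX = r"[0-9A-Fa-f](?:_?[0-9A-Fa-f])*"

# [sign] mantissa [operator exponent], with the exponent in the mantissa's radix.
LITERAL = re.compile(
    rf"""
    (?P<sign>[+-])?
    (?:
        \#(?P<hex>{HEX}(?:\.{HEX})?)            # hexadecimal mantissa
        (?:(?P<hop>[\\/])(?P<hexp>{HEX}))?      # optional hex exponent, no second hash
      |
        (?P<dec>{DEC}(?:\.{DEC})?)              # decimal mantissa
        (?:(?P<dop>[\\/])(?P<dexp>{DEC}))?      # optional decimal exponent
    )
    """,
    re.VERBOSE,
)


def split_literal(text: str):
    """Split a literal into (sign, radix, mantissa, operator, exponent)."""
    m = LITERAL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a number literal: {text!r}")
    if m["hex"] is not None:
        return m["sign"] or "", 16, m["hex"], m["hop"], m["hexp"]
    return m["sign"] or "", 10, m["dec"], m["dop"], m["dexp"]


print(split_literal("-#80.EE"))   # ('-', 16, '80.EE', None, None)
print(split_literal("1.5\\6"))    # ('', 10, '1.5', '\\', '6')
```

Inputs such as "1__0", "#-7F" (sign after the hash) or "#1.F/#2" (a second hash) do not match, so `split_literal` rejects them.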

Evaluation

Like WAT (though with a different grammar), PHANTASM uses the same number literals for all numbers, whether integers or floats, and the context determines how each literal is ultimately encoded.

PHANTASM uses two contexts for encoding number literals: an integer context and a float context. For example, i32 1 and f32 1 use exactly the same number literal, but the two results must be encoded differently.

Before any literal can be encoded, its text must first be evaluated to a numeric value, with any exponentiation applied.

When evaluating floats, the mantissa and any exponent are always converted to 64-bit floating point values, and any exponentiation is computed using 64-bit floating-point arithmetic.
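A sketch of float-context evaluation in Python, assuming Python's `float` (an IEEE 754 binary64 value) stands in for the 64-bit float. `eval_float` is hypothetical, its arguments are assumed to come from an already-validated literal, and converting hexadecimal mantissas with `float.fromhex` is this sketch's choice rather than a documented detail of PHANTASM.

```python
from typing import Optional


def eval_float(sign: str, radix: int, mantissa: str,
               operator: Optional[str] = None,
               exponent: Optional[str] = None) -> float:
    """Evaluate a literal in a float context using binary64 arithmetic."""
    digits = mantissa.replace("_", "")
    # Both conversions round to the nearest binary64 value.
    value = float.fromhex(digits) if radix == 16 else float(digits)
    if operator is not None:
        # The exponent is also converted to a float, as is radix ** exponent.
        n = float(int(exponent.replace("_", ""), radix))
        factor = float(radix) ** n
        value = value * factor if operator == "\\" else value / factor
    return -value if sign == "-" else value


print(eval_float("", 16, "1.F", "/", "2"))   # 0.007568359375
```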

When evaluating integers, the mantissa and any exponent are usually converted to signed, unbounded integers, with exponentiation using integer arithmetic. This is especially significant when lowering the magnitude (using the / operator), as it implies integer division.

To make number literals as flexible as possible, an exception is made when an integer is expressed using floating-point notation (e.g. i32 1.5\6). In that case, the 64-bit floating-point logic is used, and the result is converted to an integer by rounding to the nearest whole number, always rounding up when the fractional part is exactly .5.
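The integer-context rules, including the floating-point exception, can be sketched in Python as follows. `eval_int` is a hypothetical name. The page only says that lowering implies "integer division"; this sketch divides the unsigned magnitude and applies the sign afterwards, so negative results truncate toward zero, which is an assumption.

```python
import math
from typing import Optional


def eval_int(sign: str, radix: int, mantissa: str,
             operator: Optional[str] = None,
             exponent: Optional[str] = None) -> int:
    """Evaluate a literal in an integer context."""
    digits = mantissa.replace("_", "")
    n = int(exponent.replace("_", ""), radix) if operator else 0
    if "." not in digits:
        # Integer notation: exact, unbounded integer arithmetic.
        value = int(digits, radix)
        if operator == "\\":
            value *= radix ** n
        elif operator == "/":
            value //= radix ** n   # magnitude only (assumed truncation)
        return -value if sign == "-" else value
    # Float notation: binary64 arithmetic, then round half up.
    x = float.fromhex(digits) if radix == 16 else float(digits)
    if operator == "\\":
        x *= float(radix) ** n
    elif operator == "/":
        x /= float(radix) ** n
    if sign == "-":
        x = -x
    whole = math.floor(x)
    return whole + 1 if x - whole >= 0.5 else whole


print(eval_int("", 10, "1.5", "\\", "6"))   # 1500000
print(eval_int("", 10, "1234", "/", "2"))   # 12
```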

Users must be mindful of the fundamental differences between integers and floats (ranges, rounding and so on), particularly when lowering by an exponent, or when using floating-point logic to define integers greater than about nine quadrillion (9\15). Beyond 2 ** 53 (9,007,199,254,740,992), a 64-bit float cannot represent every integer exactly.
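Both pitfalls can be reproduced in a few lines of Python. The comments relate each line to the literal it imitates under the rules above; `round_half_up` is a hypothetical helper, and the code shows the arithmetic, not PHANTASM's implementation.

```python
import math


def round_half_up(x: float) -> int:
    """Nearest whole number, with halves rounded up (towards +infinity)."""
    whole = math.floor(x)
    return whole + 1 if x - whole >= 0.5 else whole


# Lowering by an exponent: integer notation uses integer division,
# float notation uses float division followed by rounding.
print(7 // 10 ** 1)                      # like i32 7/1   -> 0
print(round_half_up(7.0 / 10.0 ** 1))    # like i32 7.0/1 -> 1

# Above 2**53 (about 9\15) not every integer has an exact binary64 value.
print(2 ** 53)                                      # 9007199254740992
print(round_half_up(float("9007199254740993.0")))   # 9007199254740992, not ...993
```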

Encoding

When an evaluation happens in a float context, the result is a 64-bit float, which can be encoded directly as an f64, or rounded to the nearest f32 when required. Both encodings use IEEE 754.
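A minimal Python sketch of the float encodings, assuming the WebAssembly binary format, which stores IEEE 754 values little-endian. The function names are hypothetical.

```python
import struct


def encode_f64(x: float) -> bytes:
    # Python floats are already binary64, so this is a direct copy.
    return struct.pack("<d", x)


def encode_f32(x: float) -> bytes:
    # struct narrows the value to binary32 (round to nearest on typical
    # platforms) and raises OverflowError if it is finite but too large.
    return struct.pack("<f", x)


print(encode_f64(1.0).hex())   # 000000000000f03f
print(encode_f32(1.0).hex())   # 0000803f
```

Note that f32 1 encodes as the four bytes above, whereas i32 1 becomes a single LEB128 byte (01), which is why the same literal needs its context.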

In an integer context, the result is always a signed integer. The width of the required data type, together with whether it is signed or unsigned, implies a valid range. If the result of the evaluation is outside that range, compilation is aborted with an error message. Otherwise, the result is encoded using the appropriate LEB128 encoding scheme (signed or unsigned).
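The range check and both LEB128 variants can be sketched in Python as below. The function names are hypothetical, the bit width and signedness are supplied by the caller, and raising `ValueError` stands in for aborting compilation.

```python
def check_range(value: int, bits: int, signed: bool) -> None:
    """Raise ValueError if value does not fit the target integer type."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    if not lo <= value <= hi:
        kind = "signed" if signed else "unsigned"
        raise ValueError(f"{value} does not fit in a {bits}-bit {kind} integer")


def uleb128(value: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, low bits first, high bit = 'more'."""
    if value < 0:
        raise ValueError("unsigned LEB128 needs a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def sleb128(value: int) -> bytes:
    """Signed LEB128: stop once the remaining bits are pure sign extension."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # Python's >> on ints is arithmetic, so the sign is kept
        sign_bit = byte & 0x40
        if (value == 0 and not sign_bit) or (value == -1 and sign_bit):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)


def encode_int(value: int, bits: int, signed: bool) -> bytes:
    check_range(value, bits, signed)
    return sleb128(value) if signed else uleb128(value)


print(encode_int(624485, 32, signed=False).hex())   # e58e26
print(encode_int(-123456, 32, signed=True).hex())   # c0bb78
```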