Vellum Manual - mr-martian/quest-editor GitHub Wiki
Introduction
Vellum is best described as being a procedural language that feels very much like a functional language, while technically being neither. Like most modern languages, it is multi-paradigm, albeit with an intentional neglect of object-orientation at the language level. It blends elements from the
- procedural
- functional
- concatenative
- generic
programming paradigms to make (what we hope is) a rich and productive programming experience.
Intro
Structured syntax
The main way Vellum is intended to be used is via the structured syntax. It provides a rich and familiar programming experience, which should be easy to approach for most programmers. The structured syntax can represent anything that can be expressed in an applicative style, so it is possible to use Vellum entirely using the structured syntax, treating it as if it were just an ordinary procedural language with functional aspects, though with more familiarity with the language, it is expected that Substrate will be used when its properties are useful.
UFCS
Vellum has uniform function call syntax. That is, f(a)
and a.f()
can call the same function. These rules also subsume some of the functionality of argument-dependent lookup (ADL) in C++. Specifically, given let a : A
, and an unqualified function name f
, the rules are as follows: [Todo: formalize these rules better.]
- For the expression
f(a)
, look upf
in the current scope, and then attempt to type-match it with eithera
or&a
as the argument, in that order. If no function can be bound, look upf
in the scope ofA
, and repeat type matching. - For the expression
a.f()
, look upf
in the scope ofA
, and then attempt to type-match it with eithera
,&a
, ora^
as the argument, in that order. If no function can be bound, look upf
in the current scope, ignoring variable names, and repeat type-matching. - For the expression
a.f
, look upf
in the scope ofA
. If that lookup finds a function callable asf(a)
, but does not type-check, then rewrite the expression toa.f()
. - In the case of
a.f(b)
, for any number of parameters in place ofb
, if lookup ofa.f(b)
fails, but finds a candidate function in the scope ofA
such thata.f()
would succeed, rewrite the expression toa.f()(b)
and retry, but do not perform this rewrite more than once.
The latter two rules allow leaving off empty parentheses for function calls with .
where it is unambiguous. [Note: These rules are very complex and will probably be altered to be simpler and more intuitive.] The rules in this section are applied after any rewriting of overloaded operator expressions into ordinary function calls, or other similar rewrites that produce unqualified names.
Substrate syntax
The computational model underlying Vellum is very different from most languages you may be familiar with. Instead of the usual imperative paradigm, or the somewhat less common functional one, Vellum is concatenative. Additionally, it provides direct access to that low-level model via the sub-language named Substrate, so named because it is the foundation the rest of the language is built on. A primer on concatenative programming from a general background is included later in this manual; this section exists to describe Vellum's implementation of the concept.
Substrate expressions are, syntactically, extremely simple. There is only one primary syntactic primitive, the term. Terms include literals, variables, function names, certain keywords, variable bindings and unbindings, and quotations, which are similar to lambdas; though this is not an exhaustive list. A substrate expression is simply a list of terms. Almost all of the syntactic complexity of structured Vellum is completely absent in Substrate.
However, and perhaps uniquely among concatenative languages, Vellum has function overloading, and particularly, variadic functions; this requires a more sophisticated way to name things than just identifiers, so Vellum introduces discriminators, which are parts of terms rather than being terms in themselves. There are three kinds of discriminator, two of which, the scope resolution discriminator (ex: scope::id
) and the signature discriminator (ex: (fn(T, U) -> R) id
), are useful in structured Vellum as well, while the overload disambiguation discriminator (ex: id!3
) is almost entirely useless outside of Substrate because its meaning is redundant with the information encoded by the syntax.
Unlike most concatenative languages, Substrate is statically-typed and statically-bound, which each impose certain restrictions compared to languages such as Forth. Instead of being described by a stack, the Substrate execution environment is instead described by a "notional tuple" and the set of names in scope. Every term has an environment effect, which is a description of its affects on the notional tuple, and a lexical effect, which is its affect on the set of names in scope. Because Vellum uses lexical scoping and static binding, lexical effects are generally uncommon and obvious, being essentially limited to variable bindings and function/procedure definitions, and so need little discussion.
In languages such as Forth, it is possible to redefine any term at all at any time, including literals, so that for instance, the term 4
actually pushes the value 5
to the stack. Vellum has static binding of names, and only allows valid identifiers to be defined (no keywords or literals), which makes such redefinition impossible. This is considered to be no loss at all, but rather an improvement, because there is no valid use-case for such tricks.
Static typing is inconsistent with dynamic environment effects, which means that every term must consume and output a fixed number of values, which must have statically-resolvable types. Usually, the compiler is able to keep track of these effects without any annotation, however, sometimes it is necessary to use signature discriminators to explicitly declare stack effects. If such an annotation disagrees with what the compiler can determine, the program is ill-formed.
The overload disambiguation discriminator is used to define the number of inputs an overloaded function name should consume. When not explicitly disambiguated, function overload resolution in Substrate works in a different, and sometimes unwanted, way, where (with the exception of certain operators), given a notional tuple such as ... w x y z
and a term such as n::f
identifying an overloaded function, will attempt to bind n::f(z)
, n::f(y, z)
, n::f(x, y, z)
, n::f(w, x, y, z)
, ..., in turn, binding the first function which successfully type-checks. If n::f(z)
is valid, but n::f(x, y, z)
was intended, the programmer would have to write n::f!3
instead, forcing the compiler to pass exactly 3 arguments, no more and no less, and if the call cannot be bound, the program is ill-formed. Additionally, if an overloaded function takes many arguments but has forms taking fewer, even if the type checking is fully unambiguous, having to consider so many bindings could be slow for the compiler, so it is a good idea to specify for any overloaded function.
The operators which are an exception to the usual rules for order of binding are those with both binary and unary forms, such as +
and &
. Pending a case-by-case review, such operators are tried with two operands, and terms such as +!1
must be used to invoke the unary forms.
Additionally, instead of using ()
to invoke callables, the call
term is used. TODO: define how call
works (the semantics assumed in loc_stats.vll may not be sound.)
Substrate flow control
In Substrate, several flow control keywords work in different ways to how they work in structured Vellum.
if
takes as inputs (... condition true-function false-function)
, plus whatever the environment effects of the branches are (they must be compatible). There is no else
for if
, it always takes exactly those inputs. unless
has the same caveats as if
does. (? :)
(space required) is exactly the same as if
. (?:)
and (??)
both take (... input else-function)
, where the first argument is a simple value and the latter is invokable, but otherwise work exactly the same as you would expect from the structured syntax.
TODO: match
Though they are not strictly control flow operators, the logical AND and OR operators do short-circuit in structured Vellum. TODO.
TODO: Do loops even exist in Substrate?
For result expressions, the order is changed from 'result' [label] [expression]
to expression 'result'[label]
(there can be no space between result
and the :
of the label), and the same for the other related keywords. The equivalent of structured nullary result
is None result
. However, in general, result-expressions are not very natural within Substrate expressions.
Definitions
Types
This section describes the Vellum type system.
Builtin types
These types are built into the language. All compound types are ultimately composed of some combination of builtin types.
Integers
Like many languages, Vellum has two families of integer types: signed and unsigned. Each can be declared to have any bit width. The signed types are spelled iN
or int(N)
and the unsigned types are spelled unsigned(N)
or uN
, for any values of N
between 1 and an implementation-defined limit of no less than 65,536 (2^16) and no greater than 2147483647 (2^31-1). (i0
and u0
are valid, but are aliases of None
and are not integer types. Attempting to specify a width greater than 2147483647 is a syntax error.)
The int(N)
and unsigned(N)
types are, in fact, builtin templated struct
s defined in the compiler. This means that they fit in very naturally with the rest of Vellum's type system.
Additionally, the type iSize
is defined to be an alias to the signed integer type with the width of a pointer on the machine. (There is no complementary uSize
, as it is not generally needed in Vellum, but there is a type abi::size_t
provided by the library, intended for interfacing with C and C++. Several other ABI-compatibility integer types are also defined in the library. For more details, see the ABI/FFI section.)
Integer types are also used to represent string code units, often ambiguously called "characters". Encoding is specified externally. For ABI compatibility with C, the type abi::wchar_t
is defined as an alias to an implementation-defined integer type. The types abi::char
, abi::char8_t
, abi::char16_t
, and abi::char32_t
are also defined as aliases to the obvious corresponding integer types.
Floating-point Numbers
Vellum has two binary floating-point types, Float32
and Float64
. These are defined to be IEEE754 single and double precision, respectively, and generally correspond to C's float
and double
, respectively. (There is also abi::long_double
, which is an implementation-defined type compatible with the C long double
type on the platform.)
Unlike the integer types, the floating-point types are not part of a larger generalized family. There is an underlying template for the purposes of genericity, but its argument is an enum
, defined as follows:
enum vll::Float_Tag {
single,
double,
long_double,
// implementation-defined
}
struct Float(f : vll::Float_Tag);
Bool
The Bool
type is in fact a builtin enum
type, exactly equivalent to:
enum(u1) Bool #[exhaustive, castable] {
false = 0,
true = 1,
}
let const false = Bool::false;
let const true = Bool::true;
Except that Bool
, true
, and false
are all keywords and thus this code would not be valid Vellum. (Note however that despite having a literal following the ::
, which is normally invalid syntax, Bool::true
is a valid construct which is identical to the unqualified literal true
.)
That is, Bool
is an enumeration of two values which is explicitly convertible to and from an underlying one-bit unsigned integer type and, in fact, explicitly or contextually convertible from any integer type at all, and from optional or optional-semantic types as well.
Byte
The type Byte
is the type of raw memory.
References and pointers
Vellum has 3 reference type categories, which are spelled &T
(reference to T
), &mut T
(mutable reference to T
), and &const T
(reference to immutable T
). Both &mut T
and &const T
can be implicitly converted to &T
, but the reverse conversions are impossible. Additionally, an lvalue of type T
can be implicitly converted to &T
(or &const T
if valid) in certain situations, such as function calls, but not to &mut T
. The unary prefix &
operator may be used on an lvalue of type T
to obtain a &mut T
.
Vellum also distinguishes between references to scalar (non-array) types (&T
), array types of fixed bound (&[N]T
), and array types of unknown or dynamic bound (&[]T
), as well as (for multidimensional arrays) combinations of the latter two. For scalar references, the unary postfix ^
operator may be used on any kind of reference to obtain an lvalue of the same mutability. The .
operator may implicitly dereference a scalar pointer when looking up the name to bind. No other operations are defined.
For array references, the postfix []
operator can be used to index the referenced array and yield an lvalue referring to an element thereof. Additionally, pointer arithmetic may be performed on array references. No other operations are defined. The .
operator may be used on array references when the called function acts on the reference itself, but not when the called function acts on the element type. That is, pa.f
will never call pa[0].f
.
Vellum does not have null references. Any reference is guaranteed to refer to an object of the specified type. Additionally, dynamic array references keep track of the size of the referenced array. A regular reference may refer to a mutable object and may alias with either mutable or immutable references (but never both). Immutable references cannot refer to mutable objects and cannot alias with mutable references. This lack of aliasing allows for certain optimizations to be made. If nullable reference semantics are desired, use an optional reference instead. This specific case is optimized to avoid storing a discriminator in the union, instead using the underlying null pointer as a discriminating value.
For compatibility with unsafe languages such as C, raw pointer types are also defined. These may be qualified as mutable or immutable in the same way as for references. Note that *const T
(as opposed to *mut T
and *T
) has no direct equivalent in C or C++ and therefore cannot be used with those languages' ABIs.
References may be converted to the corresponding raw pointer type. For references to arrays of unknown bound, this strips the size. For references to arrays of known bound, two conversions exist, one preserving the bound and one dropping it (thus, &[N]T
is convertible to both *[N]T
and *T
). A raw pointer may be null. Raw pointers can be dereferenced with either the ^
or []
operators. Thus, *T
acts somewhat like an unchecked (?&T)|(?&[]T)
.
A raw pointer may be pattern-matched on to eliminate null; the other branch may be a non-optional reference of compatible mutability (not to an array of unknown bound, however, because the dynamic size information cannot be re-introduced).
Optional and Result
Denoted by the ?
type constructor (as in ?T
), the optional type is in fact a language-level alias for the union type T | None
. However, several builtin operators are defined for this specific union configuration.
The Result type, denoted by the !
type constructor, is an alias for T | Fail
.
Owner and Shared
The primary means for managing dynamic memory is with the owner(T)
and shared(T)
smart pointers. owner(T)
models unique ownership and may not be copied, only moved. shared(T)
models shared ownership via reference counting, and therefore allows copying. Generally it is uncommon to need explicit heap allocations, and when you do need them, owner(T)
is typically used at least an order of magnitude more often than shared(T)
is. Unlike references, smart pointers may be null.
The declaration
let mut x : owner(T) = f();
declares x
as a mutable owner of an immutable object. That is, x
may be mutated (such as to release the owned object and obtain a new one) but the owned object cannot be. In contrast,
let y : owner(mut T) = f();
declares y
as an immutable owner of a mutable object. That is, the T
instance may be mutated, but y
will always own exactly that object until its destruction.
None
The None
type and singleton is a builtin alias of the empty tuple ()
. It serves as the primary Unit type for Vellum, though it is possible to define others using struct
s. It is used to represent empty values and expressions with no meaningful result. It is also used as the replacement for void
return types in C.
Because there is only one value of type None
, the token None
refers to both the value and the type of that value.
In type theory, the Unit type is the analogue of the number 1, and acts as a "multiplicative constant" in that a product type (tuple) gains no extra values by including it. That is, (T, None)
is compatible in both directions with (T)
.
Noreturn
Noreturn
is the Bottom type in Vellum. It is a type with no values, aliasing the empty union
. It is impossible to instantiate a value of type Noreturn
and is used as a signal that an expression does not return a result to its caller. It is the type of flow control operators like result
and end
, which jump to a different part of the program, as well as functions such as abort
, which end execution of the program. As an empty union, Noreturn
is a supertype of (is conformable with in expressions) all other types, and cannot itself be a union alternative.
In type theory, the Bottom type is the analogue of the number 0, and acts as an "additive constant", in that a sum type (union) gains no extra values by including it. That is, T | Noreturn
is compatible in both directions with (and is in fact definitionally identical to) T
.
Type
Type
is a builtin alias for a union type with alternatives for struct
s, enum
s, tuple
s, union
s, arrays, references, pointers, and functions. It is used for reflection and metaprogramming, and is most commonly used as a parameter type for templates. Values of type Type
may be constructed at runtime, but no mutable operations can be performed on them; they are merely tokens which can be inspected to get such properties as the name of a type.
__intN
and __uintN
The __intN
and __uintN
type families are used as the lowest-level implementation types of integers within the compiler (for instance, __int32
is the type of the argument to the int(N)
template). There is almost no reason to use them directly, as they are less featured than the regular int(N)
and unsigned(N)
types. The full list of low-level integer types is exactly: __int8
, __int16
, __int32
, __int64
, __uint8
, __uint16
, __uint32
, and __uint64
. Each such type is an independent type; they are not instantiations of templates. No other bit widths of low-level integers are defined.
Enums
enum-definition ::= 'enum' enum-name '{' enumerator {',' enumerator} [','] '}'
enumerator ::= identifier ['=' expression]
An enum
is a type with enumerated, named values. They are similar to scoped enum
s in C++, but by default the conversion to integer type is one-way only. Several attributes are provided to extend the basic functionality.
The builtin Bool
type is in fact just an enum
type with two values. The only special properties it has are that its enumerators, true
and false
, are also keywords, and certain unique builtin operators are defined for it.
enum
types may be templated.
Tuples
Tuples, also known as product types in type theory, are the most basic way to aggregate multiple values together into a single object. A tuple is comprised of an ordered, fixed-size, heterogenously-typed sequence of anonymous subobjects. Tuples are implicitly created and unwrapped by the language in many contexts. Because the members are anonymous, a tuple type is exactly identified by its list of members. A tuple of tuples is conformable with a tuple formed from the concatenation of all of its member tuples. A tuple of a single element is conformable with that element's bare type.
The None
type is in fact an alias for the empty tuple. Tuples may contain the None
type, but it may be implicitly dropped or introduced by conversions, given that concatenation with an empty tuple is the identity function.
The primary tuple type constructor is the parenthesized comma-separated list. In contexts where commas and parentheses have special meaning, tuple types may also be constructed with the named tuple(Ts...)
type constructor. The empty tuple may be written as ()
, but None
is generally the preferred spelling.
Unions
Unions, also known as variants or as sum types in type theory, are used when a value may be of exactly one of several possible alternative types. Union alternatives are unordered and duplicates are ignored, which is to say that a union type is comprised of a set, rather than a list, of types.
The Noreturn
type is in fact an alias for the empty union.
The primary union type constructor is |
. When in generic variadic contexts, where binary operators are less ergonomic to write, the named union(Ts...)
type constructor may also be used.
Functions/closures
Functions and closure types are categorized along several axes. The most obvious one is the type signature, which is a combination of a tuple of argument types and a return type (which may be, but doesn't have to be, a tuple), separated by ->
. Every such combination is a unique function type signature. The normal tuple conformability rules do not apply to function arguments or return types; however, function type signatures that differ only in the insertion or removal of None
are still conformable.
Aside from the infinite set of type signatures, function types are divided by purity and by captures. A pure function type is denoted by the fn
keyword, while an impure function type is denoted by the proc
keyword. References to pure functions can be converted to references to impure functions. Functions may capture either a simple reference, which is denoted by the &
character before the argument list, or they may capture an arbitrary tuple of values, which are enclosed in braces before the argument list. A reference to non-capturing function can be converted to a reference to reference-capturing function.
Language linkage is part of the type of a function, and whether function types of different language linkage are convertible is implementation-defined.
No other conversions amongst function types are valid.
Function types are written as (described via a form resembling a regex) (fn|proc) (&|{Cs...})? (As...) -> R
, where As...
represents the arguments, R
represents the return type, and Cs
represents the value captures, if any. A reference to a function type is written by placing the &
before the fn
or proc
keyword. Function types may need to be parenthesized in order to include them in unions.
Arrays
Arrays are built into Vellum's type system. Both fixed-size and dynamic array types are provided, and multidimensional types may have any combination of bounds. Dynamic bounds are indicated by the lack of a number in the type-id. A fixed-size linear array is spelled [N]T
, while a dynamically-sized linear array is spelled []T
. A square array of size 10×10 is spelled [10, 10]T
, while a 2D array whose size is unknown in both dimensions is spelled [,]T
. Arrays of constant size are allocated on the stack by default, while arrays of dynamic bound are allocated automatically on the heap. This allocation is managed via RAII.
Structures
struct-declaration ::= 'struct' struct-name [argument-list] (';' | struct-body)
struct-body ::= '{' {struct-member} '}'
struct-member ::= data-member-definition
| [access-key] fn-declaration
| [access-key] default-fn-definition
| 'defer' trait-id 'to' identifier new-line
| 'implements' trait-id new-line
| alias-definition
data-member-definition ::= 'let' var-name ':' type-id access-specifier
access-specifier ::= '{' (access-key [',' access-key2] ['mut']) '}'
access-key ::= 'public' | access-key2
access-key2 ::= 'module' | 'private'
default-fn-definition ::= [access-key] 'fn' default-fn-id '=' 'default' ';'
default-fn-id ::= 'new' [argument-list] [type-constraint]
The most powerful way to define a new type in Vellum is the struct
.
This
Within the body of a struct
, the type This
refers to that struct. This allows for a more uniform and less noisy declaration syntax.
Type conversions
Generally, conversions between unrelated types is disallowed. However, certain conversions are deemed safe and can be performed implicitly. For instance, integer types can be implicitly converted to wider integer types, and any type may be implicitly converted to a union type containing it. Safe conversions are pure and cannot result in a loss of data. All other conversions are considered potentially-unsafe, and can only occur at defined locations in the program, including converting contexts and explicit casts. Certain operators apply the potentially-unsafe conversion to Bool
to some of their operands; in this instance the operator is said to be a converting context. This conversion is as-if a condition expression x
were replaced with x.cast(Bool)
. Function calls can apply the object-to-reference conversion, normally performed by the unary &
operator, to their arguments.
Explicit casts are done with the fn as(val, type)
and fn cast(val, type)
reserved functions. They are identical except that as
shall be safe, as described above, and may be considered for implicit type conversions. They simply convert their argument val
to the type type
, and can be defined by the user for struct
s. as
and cast
are somewhat similar to the new
reserved function, in that both result in a new object of a particular type. The difference is that as
and cast
are defined in the scope of the original type, while new
is defined in the scope of the target type. Type conversion functions should be pure; a warning is issued for declaring either function with the signature proc (val, type)
. It is an error if c.cast(Bool)
resolves to an impure function when called implicitly in a converting context.
Declarations
Variables
var-declaration ::= 'let' var-definition var-initializer ';'
| 'let' 'const' var-definition var-initializer ';'
| 'let' 'mut' var-definition [var-initializer] ';'
var-definition ::= var-name [':' type-constraint]
var-initializer ::= equal-initializer | braced-initializer
equal-initializer ::= '=' expression
braced-initializer ::= '{' expression-list '}'
(* no mutable global variables *)
global-var-declaration ::= 'let' var-definition var-initializer ';'
| 'let' 'const' var-definition var-initializer ';'
Variables are introduced by the let
keyword. By default, variables are constant, meaning they cannot be modified after declaration. Constant variables must be initialized, while uninitialized mutable variables must have their type fully specified. The mut
keyword can be used to declare a mutable variable, which can be modified or reassigned after declaration. The const
keyword is used to declare a variable whose value is computed at compile-time and is usable in further compile-time computation; this differs from the meaning of the const
reference qualifier. Variables at module or namespace scope may not be declared with the mut
qualifier.
Variables may also be introduced by function parameters, in such instances, the syntax is similar but the let
keyword is dropped, initializers correspond to default arguments, are not required for constant parameters, nor are type constraints required for mutable parameters without default arguments, because the initialization of function arguments is performed by the caller. In all other respects, function parameters behave as if they were variables declared in a hypothetical scope enclosing only their declarations in order followed by, and returning the result of, the body of the function.
Variables are bound to scopes and have lifetimes. The scope of a variable is the context in which its name is visible, which generally extends from immediately after its point of declaration until the end of the enclosing scope. The lifetime of a variable depends on where it is declared. At module or namespace scope, a variable's lifetime is that of the whole program. At struct
scope, a variable's lifetime is bound to that of an instance of the enclosing struct
. For more detail, see the section on structures.
At local scope, a variable's lifetime is from when its declaration is evaluated until the scope is exited. Local variables are destroyed promptly on exit, in reverse order of declaration (as if pushed to a stack), which enables a form of automatic resource management known as RAII, or Resource Acquisition Is Initialization.
Local variables, however, have the additional property that their lifetimes, as well as scopes, may be explicitly ended by the program at any point, using the consume
operator, as long as it is unconditional, meaning there can never be an expression in which whether a variable is in scope depends on which possible path control flow took to reach it, or the program is ill-formed.
This constraint is purely syntactic and does not take into account any reachability analysis the compiler may be able to perform. If a variable is consume
d in one branch of an if
statement, it must be consume
d in the other as well, and the same goes for match
. Only variables local to the body of a loop may be consume
d within it.
The consume
operator triggers either a destructive move, if its value is used, or else simple destruction. Unlike ordinary move operations, a variable does not need to be mutable to be consume
d, because the variable ceases to exist after destruction. Because variable scopes begin after their complete declaration, it is possible to write
let mut a : i32;
a := 1;
let a = consume a;
which has the effect of replacing the mutable variable a
with a constant variable also named a
, having the same value. At a higher level, it may be described as dropping the mutability of a
, and 'finalizing' its value. [Note: may consider replacing this feature with an explicit 'finalize' feature.]
Functions and closures
fn-declaration ::= prototype fn-body
| prototype ';'
fn-body ::= block | '=' expression ';' | '=' 'delete' [reason-expr] ';'
prototype ::= fn-prologue fn-name argument-list [return-spec]
anon-prototype ::= fn-prologue argument-list [return-spec]
return-spec ::= '->' type-constraint
fn-prologue ::= ['extern' [string-literal]] ('fun' | 'proc')
argument-list ::= '(' {arg-definition [default-arg]} ')'
arg-definition ::= arg-name [':' arg-type-constraint]
The basic unit of behavior in Vellum, as in all procedural languages, is the function/procedure. Vellum maintains a distinction in the type system between pure and impure functions. Pure functions are introduced by the fn
keyword, while impure functions, or 'procedures' in Vellum parlance, are introduced with the proc
keyword. Pure functions are free of side-effects and are referentially transparent, which means that they are very easy to reason about. Additionally, pure functions can be executed at compile-time, which enables very powerful and seamless metaprogramming. However, aside from the most trivial programs, at some point, some manner of stateful interaction with the environment must be performed for the program to do its job. Thus, procedures exist.
Signatures
A function signature represents its type.
fn(InType) -> OutType
proc(InType) -> OutType
It is relatively uncommon in Vellum to have to fully specify the types of all arguments to a function or procedure. Most functions, rather, are generic. This may sound problematic if you are familiar with most other statically typed programming languages, because generic functions are more difficult to write. However, in Vellum, the most common way to be generic is to simply write the simplest code that works. Type annotations are generally extra details in Vellum, usually not critical to understanding programs.
Linkage
Functions and procedures have language linkage. This is denoted by extern
plus any string constant. This is used for linking to code written in other languages. If unspecified, the default language linkage is extern "Vellum"
. extern "C"
and extern "C++"
are the most likely other values.
ABI
Vellum is a native language, and like all native languages, provides the ability to call or be called by programs written in any other native language. However, native interfaces are dominated by—nay, defined by—C and its particular strengths and weaknesses. Thus, no matter what is natural in any other language, they must support the data representations and the level of abstraction provided by C. Vellum is no different. The industry-standard way that C interoperability is achieved is ABI compatibility. Vellum additionally aims to target C++ ABI compatibility, in order to more efficiently communicate higher-level concepts such as ownership semantics. However, C++ is a much more complex target.
For this purpose, Vellum provides a number of ABI-compatibility features. Other than language linkage, most of these are in the forms of type definitions in the std::abi
namespace. Using these types, function signatures can be written that are compatible between languages.
Statements
Vellum is an expression-oriented language, which means that statements are a very minor part of the syntax. In fact, the only kinds of statements that are not also expressions are variable declarations and assignments.
Templates
Various constructs besides functions can also be parametrized; the mechanism Vellum provides for this is templates. struct
, enum
, and trait
declarations can all be templated.
Traits
Concatenative Macros
Reflection
Expressions
Primary expressions
Literals
Flow control
All control flow in Vellum is done with expressions, rather than statements. This is unusual for a procedural, structured language, but it is the standard for concatenative languages. Aside from having atypical and complex syntaxes, all control flow keywords are simply operators.
Conditionals
if-expression ::= ('if' | 'unless') expression block ['else' (block | if-expression)]
| expression '?' expression ':' expression
Binary-conditional expressions include if
-expressions and expressions using the operators ? :
, ?:
, and ??
. The latter two are infix binary operators. match
-expressions are also conditional expressions, but are described in a separate subsection for clarity.
if
-expressions and the ? :
operator work in essentially exactly the same way. The difference is that if
requires its branches to be brace-enclosed blocks, but its else
branch is optional, while ? :
takes three plain expressions but its else-equivalent is non-optional. if
is intended to be used for conditional execution of multiple lines of code, where the result is usually unused, while ? :
is intended to be used within expressions and its result is expected to always be used. If the else
block of an if
-expression is not present, the behavior is as if an empty block were present. unless
acts exactly as if not
.
The 'orelse' operator ?:
, also known colloquially as the 'elvis' operator, returns its left operand if it evaluates to true, or else its right operand. a ?: b
is similar in effect to a ? a : b
, except that a
is evaluated only once. It is an error if the conversion of its left operand to Bool
potentially-modifies that object.
The 'recovery' operator ??
acts as a type-level conditional operator. If the left operand's dynamic type is None
or Fail
, it evaluates and returns its right operand. Otherwise, it returns its left operand. It is similar to the "null-coalescing" operator found in many dynamic/GC languages, except that Vellum does not have universal nullability, rather it uses sum types to represent optionality. (TODO: figure out how to handle None
results which can fail)
match
match-expression ::= 'match' [label] expression '{'
match-rule { ',' match-rule} [',' [trivial-match]]
'}'
match-rule ::= match-specifier '->' (expression | block)
match-specifier ::= argument-list | case-specifier
(* can stack cases *)
case-specifier ::= 'case' primary-expression [case-specifier]
match
is like a generalization of switch
in many other languages. It can be used to inspect objects in complex ways, including matching on dynamic type or on values of members. Semantically, each match specifier is checked in order and the first matching one is evaluated. match
expressions may be labeled, which allows result
expressions to be bound to them by name, even in the case of nesting.
Loops
(* All loops can be labeled, allowing for named multi-level break *)
loop-expression ::=
(* See below for examples *)
'for' [label] for-loop-header block ['else' expression]
(* while and do-while loops:
while keepgoing() { go() }
do { go() } until stop();
*)
| loop-constraint [label] condition block ['else' expression]
| 'do' [label] block loop-constraint condition (';' | ['else' expression])
(* infinite loop:
loop {
go();
break if stop()
go();
}
'loop' loops have no else because they have no condition that can be false
*)
| 'loop' [label] block
for-loop-header ::=
(* iteration:
for i in range { print(i) }
*)
var-definition 'in' expression
(* counting:
//count 0 to 9
for i = 0 -> 10 { print(i) }
// count 10 to 1
for i = 10 ; -1 { print(i) }
// count -10 to 8 by 2s
for i = -10 -> 10 ; +2 { print(i) }
*)
| loop-var-init ['->' expression]
[';' ('+' | '-' | ) integer-literal]
(* counting from 0:
// count 0 to 9
for i -> 10 { print(i) }
*)
| ['mut'] var-definition
[';' ('+' | '-' | ) integer-literal]
(* generalized for loop:
for i = 10 while i > 1 ; i / 2 { print(i) }
for i = begin() while i != end() ; i.next() { print(i) }
*)
| loop-var-init loop-constraint
[';' expression]
loop-constraint ::= 'while'
(* equivalent to `while not` *)
| 'until'
loop-var-init ::= ['mut'] var-definition equal-initializer
| '_'
label ::= ':' identifier
| ':' ('for' | 'while' | 'do' | 'loop')
condition ::= expression
| 'let' var-definition equal-initializer
There are four basic types of loop expressions: loop
, while
, do
, and for
loops. Any kind of loop can be labeled, which allows continue
, end
, and result
-based expressions to be linked to specific loops despite nesting. The identifiers used for labels do not share a namespace with local variables, but may not be the same as any enclosing label or function/procedure name. Unlike in other cases, the last expression in a loop body is not used as the result of the loop, because it may be evaluated arbitrarily many times. Instead, the result of a loop expression can be provided via either the operand of a result
expression, or it is provided by evaluating an else
clause attached to the loop. If there is no else
clause, it is as if one simply containing None
were present.
There is a special kind of result
expression solely for loops, called break
. break
, by default, exits the nearest enclosing loop with result None
. A label, operand, or both may be provided, in which case it acts exactly the same as result
. The control flow expressions continue
and end
are unique to loops. continue
jumps directly to the end of the loop body, evaluating local destructors but nothing else. It may be labeled, but takes no operand. end
is similar to break
, but it jumps to the else
clause instead of to outside the loop. It can be labeled, but cannot take an operand. As normal, the result type of all of these operators is Noreturn
.
The simplest kind of loop is the loop
-loop, which simply repeatedly evaluates a block until explicitly exited by some means. Other than lacking an else
clause (because it would be unreachable), loop
loops are semantically defined by a direct translation to while true
loops.
while
(including until
) and do
loops are very similar, in that they repeatedly evaluate their body while a condition holds. while
loops evaluate the condition first, then the loop body, then repeat, while do
loops evaluate the loop body first, then the condition, then repeat. Thus, a do
loop will always evaluate its body at least once. If the loop-constraint of a while
or do
loop is while
, the body is evaluated if the condition evaluates to true
, while if the loop-constraint is until
, the body is evaluated if the condition evaluates to false
. In the event that the loop exits because of the condition, the loop's else
clause, if any, is executed.
for
loops are the highest-level form of loops, and have the distinguishing feature of binding an iteration variable. The most general form is generalized for loop, which binds an iteration variable, evaluates a condition (which may be constrained by either while
or until
), evaluates the loop body, and then rebinds the iteration variable to the result of an iteration-expression. If no iteration-expression is provided, the iteration variable is unchanged.
For while
loops, the condition can be either an expression or a variable declaration, in the latter case, the variable is initialized once and lives until the end of the loop's associated else
expression, and its value is used as the expression for the loop condition. For do
loops, which check their condition at the end of the body, as well as loop
loops, which do not check any condition, a declaration can be inserted before the body, but it is not implicitly used as the condition.
In the translation to substrate, while
, do
, and generalized for
loops are translated to tail-recursive immediately-invoked lambda expressions.
The simplest other form of for
loop is the counting for
loop, which iterates over a range of numbers until it reaches an upper bound. Semantically, it is defined by a straightforward translation from something like for i = init -> bound; + incr
to for i = init while i < bound ; i + incr
. If no increment is provided, it is assumed to be +1
if init <= bound
and -1
if init > bound
The other form is the range for loop, known in many languages as a "foreach" loop. It binds the loop variable to each element of a range in turn, where a range is any iterable type. Range for loops are semantically defined by a translation to generalized for loop which caches the end iterator and loops from the begin iterator to it, binding a local variable in the loop body to the result of dereferencing the iterator.
result
and yield
result-expression ::= 'result' [label] [expression]
| 'yield' [label] [expression]
| 'return' [expression]
| 'break' [label] [expression]
Result expressions are used to exit scopes prematurely, potentially carrying a value. All result expressions, including those with operands, have type Noreturn
. result
simply returns to an outer scope, while yield
suspends execution of the generator expression or coroutine it is within, allowing it to be resumed at a later point. return
is equivalent to result
except that it is restricted and can only return from the nearest enclosing function. break
is equivalent to result
except that by default it exits the nearest enclosing loop expression. result
with no label returns from the nearest enclosing labelable scope of any kind, that is, loops, match expressions, or functions/procedures. yield
suspends and returns from the nearest enclosing generator expression or coroutine. [N.B.: coroutines and generator expressions have yet to be operationally defined.]
[Experimental: A labeled result
can be used to exit from nested lambdas and reference-capturing local functions/procedures that meet certain restrictions, that can be summarized as maintaining a predictable stack/notional tuple state. Recursing using annotated #[call/cc]
calls is allowed, but not if the lambda or local function/procedure potentially recurses in any other way. This is used in the language to lower loops to substrate expressions, but probably should be strongly avoided in regular code due to breaking the expected execution model.]
Compound expressions
Arithmetic operators
operator | arity | position | meaning |
---|---|---|---|
+ |
2 | infix | addition |
+? |
2 | infix | checked addition |
+% |
2 | infix | wrapping addition |
+ |
1 | prefix | unary plus* |
- |
2 | infix | subtraction |
-% |
2 | infix | wrapping subtraction |
-? |
2 | infix | checked subtraction |
- |
1 | prefix | negation |
* |
2 | infix | multiplication |
/ |
2 | infix | division |
% |
2 | infix | remainder |
mod |
2 | infix | modulus |
* The unary plus operator is defined in the core language, for numeric operands, to return its operand unchanged. It exists for consistency with unary minus, better known as negation.
Bitwise operators
operator | arity | position | meaning |
---|---|---|---|
& |
2 | infix | bitwise AND |
| |
2 | infix | bitwise OR |
xor |
2 | infix | bitwise exclusive OR |
~ |
1 | prefix | bitwise complement |
<< |
2 | infix | left shift |
>> |
2 | infix | right shift |
Logical operators
operator | arity | position | meaning |
---|---|---|---|
and |
2 | infix | logical AND |
or |
2 | infix | logical OR |
not |
1 | prefix | logical complement |
The logical AND and OR operators short-circuit, as in most languages.
Relational Operators
operator | arity | position | meaning | Overloadable |
---|---|---|---|---|
== |
2 | infix | equal to | Yes |
!= |
2 | infix | not equal to | No |
<=> |
3 | infix | 3-way comparison* | Yes |
< |
2 | infix | less than | No |
> |
2 | infix | greater than | No |
<= |
2 | infix | less than or equal to | No |
>= |
2 | infix | greater than or equal to | No |
* Also known as the spaceship operator. It is used as a basis for the more usual comparison operators.
Assignment Operators
operator | arity | position | meaning |
---|---|---|---|
= |
2 | infix | initialization* |
:= |
2 | infix | assignment* |
+= |
2 | infix | increase |
+?= |
2 | infix | checked increase |
+%= |
2 | infix | wrapping increase |
-= |
2 | infix | decrease |
-?= |
2 | infix | checked decrease |
-%= |
2 | infix | wrapping decrease |
*= |
2 | infix | multiply |
/= |
2 | infix | divide |
%= |
2 | infix | remainder |
* Technically, the =
and :=
tokens are not operators, but rather syntactic elements. They cannot be used within expressions, but only in their dedicated syntactic constructs.
Indirection operators
Operator | arity | position | meaning |
---|---|---|---|
& |
1 | prefix | address-of |
. |
2 | postfix | member access |
^ |
1 | postfix | dereference* |
[] |
2+ | postfix | element access |
Contrasting with C's prefix *
, the Vellum dereference operator ^
is postfix, such that all pointer-consuming operators are uniformly on the right. Also contrasting with C, the .
operator will automatically* dereference pointers, rendering the C ->
operator redundant.
* Due to UFCS, the .
operator may find a function taking a pointer as an argument and thus it will not dereference the argument. This is usually the desired behavior anyway, but in rare instances it may be necessary to write p^.x
, which is actually the direct equivalent to ->
in C. The lack of need for parentheses make a dedicated operator for this case unhelpful.
Special Operators
Operator | arity | position | meaning |
---|---|---|---|
(: |
1 | prefix | begin substrate expression |
:) |
1 | postfix | end substrate expression |
:: |
2 | infix | scope resolution |
! |
2 | infix | overload disambiguation |
|> |
2 | infix | pipeline rewrite |
# |
1 | prefix | reflection |
@ |
1 | prefix | injection |
consume |
1 | prefix | destructive move |
drop |
1 | prefix | discard value |
?: |
2 | infix | orelse |
?? |
2 | infix | recovery |
The sequence (: ... :)
is an abbreviation of, and exactly identical in meaning to, the sequence substrate { ... }
.
::
and !
are discriminators, rather than being operators in the conventional sense, and have the same infix syntax even within substrate expressions. ::
is used to refer to a member of a specified scope. !
is used to disambiguate overloaded functions by determining the number of arguments to pass. This is typically useful only within substrate expressions, which otherwise pass the least number of arguments that they can to an overloaded function. The other kind of discriminator is to prefix the function name with its signature in parentheses, which is needed only when taking a reference to a function, and never when calling it.
\|>
is the pipeline rewrite operator.
#
and @
are the reflection and injection operators, respectively.
The consume
operator is described in the section on variables. The drop
operator is used to explicitly discard the value of an expression; this is analogous to casting to void
in languages such as C and C++, and in the structured syntax it is typically used only to suppress warnings. In substrate expressions, however, it is an essential low-level primitive which discards the rightmost value from the notional tuple.
?:
and ??
are conditional operators, described in detail in the section on control flow.
Custom operators
Custom operators may be declared by operator declarations, which assign a fixity and precedence to a particular custom operator.
Functions and Procedures
Vellum supports several mechanisms by which an object can participate in the environment.
- Functions, which may not have side effects.
- Procedures, which may have side effects.
- Overloaded data structures (not finalized)
Vellum has two syntaxes for handling callable functions and procedures:
- Structured Syntax
- Concatenative Syntax
A Primer for Concatenative Programming
The concatenative substrate of Vellum is what makes it somewhat unique, as concatenative programming is largely overlooked by most programmers, but nonetheless quite powerful. It is a useful statement to say it is the dual of Lisp, in that rather than everything being a list that may be reduced, in concatenative expressions, everything is a function composed of other functions.
Others have noted that concatenative programming is like an amplifier: bad or naive programmers produce unreadable write-once code, whereas good programmers can write brilliant, crystal clear programs. Be warned that concatenative syntax is seductive and easy to misuse and as such, we include a synaptic guide to avoid misunderstandings and avoidable mistakes.
Tenets of Concatenative Programming
Words and Vocabularies
Concatenative programming languages borrow much terminology from Forth, the first concatenative language, most notably, the terms word and vocabulary. A word is analogous to a function, and a vocabulary is analogous to a namespace. Forth didn't have explicit namespaces, using only the global namespace, but modern languages like Factor and Kitten do. The distinction between words and functions is at least somewhat useful because as will be explained later, functions in applicative languages are not equivalent to words in concatenative languages. We will hence borrow this terminology. Keep in mind that for all intents and purposes, a function is a word, but a word is not necessarily a function.
Point Free Style
Point Free programming is a style in which named variables are not used. Instead, data is passed from one function to another function, note necessarily given an identifier, like a variable. This is similar to the Unix pipeline, in which a program often provides text data to stdout, and the |
operator can pipe it in to the next program without the original data needing to be explicitly named, such as in the example of find . | grep "snakes and dogs"
.
Named variables are useful whenever a single invariant value appears multiple times in an expression, such as in mathematical formulas. It obfuscates the intent to use stack shuffling (described below) to avoid binding a name to a value to compute such things, and defining words for each value pollutes the environment. and so there are mechanisms to create scoped variable names. In general, however, concatenative expressions should not be used in such situations directly, as the infix notation of the main language is more than adequate for expressing those operations.
Stack Shuffling
Functions in most procedural languages make use of a stack to store temporary local variables. In concatenative programming, this is more true than ever (although the vocobulary is another story, analogous to the heap). Concatenative programming requires stack shuffling for the lowest level of abstraction. The operations in plain english are
- duplicate (the top item)
- swap (the top two items)
- rotate (the top three items i.e. item 3->1, 1->2, 2->3)
- drop (the top item)
- over (duplicate the second item, put it on top)
These words are the interface between the low level, literal machine operations, and the high level, (hopefully) crystal clear abstractions. It is common for mid-level words to use these to get data from the environment, but it is usually discouraged as they more often than not impair the readability and reasonability of the code.
Everything is a function
Some languages, like Forth, take this to an absurd extreme, allowing this as valid:
: 5 4 ;
Which redefines 5
as a word that puts 4
onto the stack. This is not a valuable feature in any language outside of experimental or esoteric programming or code obfuscators (whose value is questionable to begin with), and hence most modern concatenative languages do not allow this. Nonetheless, it highlights the notion that all expressions in a concatenative language, even literals, behave exactly like regular functions that consume no data and return themselves.
The Concatenative Monoid
A monoid in functional programming is a group with an identity value and a set that is closed under all operations. There must also be at least one operation that is commutative and associative, and every operation like this must have a corresponding inverse operation.
Concatenative programming takes the concept of a monoid to another extreme because all functions accept the working environment as an argument and return the new working environment as a result. This is referred to as Stack Polymorphism in stack based languages. Vellum's model includes a stack in much the same way that procedural languages use stack allocation, but it also takes advantage of registers where possible.
Quotations
Quotations are one of several methods for metaprogramming in Vellum (others including reflection, AST transformation, and templates). They can be thought of as anonymous words and create a higher level of indirection. Higher order functions (functions that accept functions as arguments) in applicative languages are analogous to words that accept quotations. Quotations, however, provide more power than lambdas or anonymous functions because they also allow for the creation of concatenative macros, i.e. transforming one or more simple functions into a more complex function at compile time.
Concatenative Best Practices
- Definitions should be no longer than 10 words in the vast majority of cases, and have very descriptive names.
- Definitions should always have a clear environment effect. Many concatenative languages are stack based and hence refer to this as the stack effect. Vellum uses a different model for better conceptual compatibility with the main applicative language and hence refers to this as the environment effect
- Quotations should not contain large sub-expressions, but preferably one or two clearly named words or literals.
- Avoid stack shuffling in mid-level abstraction words. One of the powers of Vellum is that you can blend applicative code with concatenative code. Feel free to create functions to let the compiler handle stack shuffling for you so you can focus (and let those reading your code focus) on the intent.
While this may be obvious, concatenative programming is not a silver bullet. It can make expressing some solutions easier or more clearly, but this usually requires a great deal of care to do well. The benefit is programs that are easier to read, but there is always the risk of writing incomprehensible code.
Further Reading
Will add more resources later.
Structured Syntax
Concatenative Syntax
Signatures
TBD