6.13 Create your own language with a lot of transpiling: in LispE - naver/lispe GitHub Wiki
Are you frustrated by the increasingly complex syntaxes of modern programming languages? Do you long for a straightforward, elegant language that aligns with your needs? There’s a solution: you can create your own programming language.
This might seem daunting at first. A review of existing literature reveals intimidating concepts like abstract syntax trees and BNF grammars, which can discourage even the most determined learners. You might abandon the idea, returning to familiar tools like Python—only to encounter errors, such as a variable switching from a list to an integer unexpectedly.
However, persistence pays off. In computer science, ideas can often be realized through practical implementation. This article presents a viable approach to designing your own language using a specific toolset.
We’ll use LispE, a distinctive variant of Lisp hosted on Naver’s GitHub. Naver, a major Korean company, provides a wide range of internet services, including search engines and e-commerce platforms. Unlike traditional Lisp implementations that rely on linked lists, LispE uses arrays as its core data structure. This offers advantages, such as efficient indexed access to list elements. While linked lists are supported, they are not central to LispE’s design.
This article is hosted in the wiki of the referenced GitHub repository.
Important: Pre-compiled versions for Windows and Mac (including M1 and Intel) are available here. For Linux, refer to the compiling on Linux page.
Let’s explore how to create a custom language with LispE.
If you’re still reading, you’re likely motivated to learn more. LispE provides the tools needed to build a powerful yet simple language. Here’s an example of what it might look like:
function fact(a)
   if a <> 1 then
      a * fact(a-1)
   else
      1
   endif
endfunction
a = fact(10)
println a
This syntax avoids unnecessary punctuation like semicolons or excessive parentheses, offering a clean and efficient structure. You could design an alternative syntax, but starting with a simple approach is practical and helps avoid unnecessary complexity.
For a complete example, see Example. It includes instructions to execute the Basic example stored in the code variable directly in LispE.
A grammar formally defines a language using mathematical notation. It also follows its own conventions, which we’ll now examine.
Computer science relies heavily on such conventions, established through rigorous development over time. Consider the instruction A = 10. We can define a rule to describe it:
assignment := variable %= number
Here, variable identifies a variable name, and number recognizes a numeric value. The % escapes the = operator. Since variables can hold more than numbers, we use the disjunction operator (^) to list possible types:
assignment := variable %= [calculus^variable^number^string]
The [..] brackets group the disjunction of four options. number, string, and variable are defined externally during transpilation, a process we’ll use extensively.
For calculus, we define two rules:
operator := [%< %<]^[%> %>]^[%^ %^]^[%* %*]^%&^%|^%+^%-^%*^%/^%%^%^
calculus := [number^string] [operator [number^string]]+
Transpilation converts code from one language to another—for example, transforming Java into C++. In this case, we’ll convert our grammar into a LispE program. Each rule becomes a function, with disjunctions as or and conjunctions as and. Consider:
assignment := variable %= [calculus^variable^number^string]
operator := [%< %<]^[%> %>]^[%^ %^]^[%* %*]^%&^%|^%+^%-^%*^%/^%%^%^
calculus := [number^string] [operator [number^string]]+
The assignment rule translates to this LispE function:
(defun C_assignment (tokens i0 v)
   (check (< (car i0) (size tokens))
      (setq v0 ())
      (if (and
            (setq i1 (clone i0))
            (setq v1 ())
            (C_variable tokens i1 v1)
            (compare tokens "=" i1 v1 nil)
            (or
               (C_calculus tokens i1 v1)
               (C_variable tokens i1 v1)
               (C_number tokens i1 v1)
               (C_string tokens i1 v1)
            )
            (set@ i0 0 (car i1))
            (setq v0 v1)
         )
         (push v (cons 'assignment v0))
      )
   )
)
The C_ prefix avoids conflicts with native LispE commands. Functions like C_variable and C_number are defined externally, allowing precise control over parsing.
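To make the rule-to-function mapping concrete outside LispE, here is a minimal Python sketch of the same idea. It is not the code that the LispE compiler generates: the names `match_variable`, `match_number`, `match_literal` and `match_assignment` are hypothetical, the token tests are deliberately simplistic, and the rule is reduced to two alternatives. Each function receives the tokens and a position, and returns a node with the new position, or `None` on failure.

```python
def match_variable(tokens, i):
    """Succeed when the token at position i looks like an identifier."""
    if i < len(tokens) and tokens[i].isidentifier():
        return ("variable", tokens[i]), i + 1
    return None


def match_number(tokens, i):
    """Succeed when the token at position i is an unsigned integer."""
    if i < len(tokens) and tokens[i].isdigit():
        return ("number", int(tokens[i])), i + 1
    return None


def match_literal(tokens, i, text):
    """Consume an escaped symbol such as %= without adding it to the tree."""
    if i < len(tokens) and tokens[i] == text:
        return i + 1
    return None


def match_assignment(tokens, i):
    """Simplified rule: assignment := variable %= [variable^number]"""
    # Conjunction: every step must succeed, in order.
    left = match_variable(tokens, i)
    if left is None:
        return None
    target, j = left
    j = match_literal(tokens, j, "=")
    if j is None:
        return None
    # Disjunction: keep the first alternative that succeeds.
    for alternative in (match_variable, match_number):
        right = alternative(tokens, j)
        if right is not None:
            value, k = right
            return ("assignment", target, value), k
    return None


tree, position = match_assignment(["A", "=", "10"], 0)
print(tree)  # ('assignment', ('variable', 'A'), ('number', 10))
```

Returning the new position instead of mutating a shared index plays the same role as cloning i0 into i1 in the LispE version: a failed alternative leaves the caller's position untouched.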
This process requires a list of tokens, created through tokenization—splitting a string into meaningful units. For example:
"A = 10" --> ("A" "=" "10")
LispE provides tokenize_rules for this, using customizable rules (see rules.lisp):
(setq tok (tokenizer_rules))
(setq tokens (tokenize_rules tok "A=10")) ; ("A" "=" "10")
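For readers who want to see what such a tokenizer does internally, the following Python sketch splits a string with one regular expression. It is an illustration only and does not reproduce the rules used by tokenize_rules; the pattern and the `tokenize` function are assumptions made for this example. Operators are emitted one character at a time, so `<>` comes out as two tokens.

```python
import re

# One alternative per token family; order matters, the first match wins.
TOKEN_PATTERN = re.compile(r'''
    \d+(?:\.\d+)?      # integer or decimal numbers
  | [A-Za-z_]\w*       # identifiers and keywords
  | "[^"]*"            # double-quoted strings (no escape handling)
  | \S                 # any other single non-space character
''', re.VERBOSE)


def tokenize(text):
    """Return the list of tokens found in text, skipping whitespace."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("A=10"))    # ['A', '=', '10']
print(tokenize("a <> 1"))  # ['a', '<', '>', '1']
```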
Applying our rules:
(setq tok (tokenizer_rules))
(setq tokens (tokenize_rules tok "A=10"))
(setq v ())
(C_assignment tokens '(0) v)
The result: v contains (assignment (variable "A") (number 10)), an Abstract Syntax Tree (AST). To demonstrate further, let’s parse if a < 10 then a = a + 1 endif using the Basic grammar (Basic):
(setq tokens (tokenize_rules tok "if a < 10 then a = a + 1 endif"))
(setq v ())
(C_analysis tokens '(0) v)
; v: (if (comparison (variable "a") (comparator "<") (anumber 10)) (then (declaration (variable "a") (computing (variable "a") (operator "+") (anumber 1)))))
The grammar dictates the structure, setting the stage for translation into LispE code.
The grammar-to-code transpiler is here: compiler. It processes the grammar to produce basic.lisp. Each grammar line becomes a LispE function, with external definitions for token detection (e.g., C_anumber).
We use pattern programming to traverse tokens, converting rules into (and...) structures and disjunctions into (or...). LispE allows variable initialization within these constructs, simplifying generation. Square brackets [..] add (and...) layers for sequence accuracy (see comparator). Kleene operators (?, +, *) become specialized functions (O_, P_, S_). Once compiled, basic.lisp is reusable.
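The Kleene operators can be pictured as small wrappers around an existing matching function. The Python sketch below is one possible reading, not the generated O_, P_ and S_ functions themselves: `literal`, `optional`, `plus` and `star` are hypothetical names, and a matcher is assumed to take the tokens and a position and to return a list of nodes with the new position, or `None` on failure.

```python
def literal(text):
    """Build a matcher that accepts exactly one given token."""
    def match(tokens, i):
        if i < len(tokens) and tokens[i] == text:
            return [text], i + 1
        return None
    return match


def optional(matcher):
    """`?`: zero or one occurrence; never fails."""
    def match(tokens, i):
        result = matcher(tokens, i)
        return result if result is not None else ([], i)
    return match


def star(matcher):
    """`*`: zero or more occurrences; never fails."""
    def match(tokens, i):
        nodes = []
        while True:
            result = matcher(tokens, i)
            # Stop on failure, or when nothing was consumed (avoids looping forever).
            if result is None or result[1] == i:
                return nodes, i
            nodes.extend(result[0])
            i = result[1]
    return match


def plus(matcher):
    """`+`: one or more occurrences; fails when the first attempt fails."""
    repeat = star(matcher)

    def match(tokens, i):
        first = matcher(tokens, i)
        if first is None:
            return None
        rest, j = repeat(tokens, first[1])
        return first[0] + rest, j
    return match


print(plus(literal("a"))(["a", "a", "b"], 0))  # (['a', 'a'], 2)
print(star(literal("a"))(["b"], 0))            # ([], 0)
```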
Here’s the AST for the fact function in LispE:
(function
   "fact"
   (variables (variable "a"))
   (if (comparison
         (variable "a")
         (comparator "<" ">")
         (anumber 1)
      )
      (then
         (computing
            (variable "a")
            (operator "*")
            (call
               "fact"
               (computing
                  (variable "a")
                  (operator "-")
                  (anumber 1)
               )
            )
         )
      )
      (else (anumber 1))
   )
)
This tree mirrors Lisp’s structure, with prefix notation and nested parentheses. This resemblance stems from Lisp’s unique trait: its code is inherently an AST. Unlike other languages that require complex parsing, Lisp lets you write the tree directly, making compilation straightforward.
The AST is ready, but we need LispE-executable code. For A = 10 + 20, we aim for (setq A (+ 10 20)). For:
function fact(a)
   if a <> 1 then
      a * fact(a-1)
   else
      1
   endif
endfunction
We target:
(defun fact(a)
   (if (neq a 1)
      (* a (fact (- a 1)))
      1
   )
)
Pattern programming is essential here. The AST’s sublists begin with keywords (e.g., computing, variable), set by basic.lisp via lines like:
(push v (cons 'assignment v0))
Each rule produces a concise tree, omitting redundant elements (e.g., = in !assignment := variable %= number). The transpiler, tailored to the grammar, is here: transpiler. It uses defpat to define parsing functions, such as:
(defpat parsing ( ['assignment $ d] )...)
(defpat parsing ( ['function name parameters $ code] )...)
The $ captures remaining list elements, and the first keyword enables efficient indexing.
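The dispatch-on-first-keyword idea can be imitated in Python with a dictionary of handlers, where star-unpacking stands in for the `$` capture. This is only an analogy to defpat, not how LispE implements it; `HANDLERS`, `emit` and the `emit_*` functions are hypothetical names introduced for this sketch.

```python
def emit_variable(name):
    """(variable "A") becomes the bare name A."""
    return name


def emit_number(value):
    """(number 10) becomes the text 10."""
    return str(value)


def emit_assignment(target, *rest):
    """(assignment target value...) becomes (setq target value...)."""
    # *rest plays the role of `$ d`: it collects every remaining element.
    values = " ".join(emit(node) for node in rest)
    return "(setq " + emit(target) + " " + values + ")"


# The first element of each sublist selects the handler in one lookup.
HANDLERS = {
    "assignment": emit_assignment,
    "variable": emit_variable,
    "number": emit_number,
}


def emit(node):
    head, *arguments = node
    return HANDLERS[head](*arguments)


print(emit(("assignment", ("variable", "A"), ("number", 10))))  # (setq A 10)
```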
See how Basic becomes LispE in Example. The transpile function (transpile) processes the code variable:
(defun transpile (code)
   (setq tree (abstract_tree code))
   (setq code '(__root__))
   (push code '(trace true))
   (ife (in (car tree) "Error")
      (setq code (list 'println (join tree " ")))
      (loop line (cdr tree)
         (setq c (parsing line))
         (push code c)
      )
   )
   code
)
abstract_tree builds the AST, and parsing transforms it. For fact, the function keyword triggers:
(defpat parsing ( ['function name parameters $ code] )
   (setq code (maplist 'parsing code false))
   (nconcn (list 'defun (atom name) (map 'parsing (cdr parameters)) code))
)
Recursive parsing ensures consistency across the tree.
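As a rough illustration of this recursive descent through the tree, the Python sketch below converts the AST of fact, written as nested lists, into LispE-style source text. It is a simplified stand-in for the real transpiler: `to_lisp` and `COMPARATORS` are hypothetical, only the node types of this one example are covered, computing is assumed to be binary, and each branch of the if is assumed to hold a single expression.

```python
# Maps the tokens stored in a comparator node to an operator name (assumed).
COMPARATORS = {("<", ">"): "neq", ("<",): "<", (">",): ">", ("=",): "="}


def to_lisp(node):
    """Recursively turn one AST node into LispE-style source text."""
    head, *rest = node
    if head == "variable":
        return rest[0]
    if head == "anumber":
        return str(rest[0])
    if head == "computing":
        # Binary case only: (computing left (operator op) right)
        left, (_, op), right = rest
        return f"({op} {to_lisp(left)} {to_lisp(right)})"
    if head == "comparison":
        left, (_, *symbols), right = rest
        return f"({COMPARATORS[tuple(symbols)]} {to_lisp(left)} {to_lisp(right)})"
    if head == "call":
        name, *arguments = rest
        return "(" + " ".join([name] + [to_lisp(a) for a in arguments]) + ")"
    if head in ("then", "else"):
        return " ".join(to_lisp(child) for child in rest)
    if head == "if":
        return "(if " + " ".join(to_lisp(child) for child in rest) + ")"
    if head == "function":
        name, (_, *parameters), *body = rest
        names = " ".join(to_lisp(p) for p in parameters)
        code = " ".join(to_lisp(child) for child in body)
        return f"(defun {name} ({names}) {code})"
    raise ValueError(f"unknown node: {head}")


fact_ast = [
    "function", "fact",
    ["variables", ["variable", "a"]],
    ["if",
     ["comparison", ["variable", "a"], ["comparator", "<", ">"], ["anumber", 1]],
     ["then", ["computing", ["variable", "a"], ["operator", "*"],
               ["call", "fact",
                ["computing", ["variable", "a"], ["operator", "-"], ["anumber", 1]]]]],
     ["else", ["anumber", 1]]],
]

print(to_lisp(fact_ast))
# (defun fact (a) (if (neq a 1) (* a (fact (- a 1))) 1))
```

Every branch calls `to_lisp` on its children, so a nested expression such as the argument of the recursive call is handled by the same code as a top-level one.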
The real code handles complexities like dim complicated[5][5][5] (flattened arrays with atshape) and Data 10,20,30, "A", "B" EndData for list creation. Explore examples in example.
Grammars struggle with indented languages like Python due to spacing conventions. A workaround adds markers, as in indentation_python.lisp, yielding:
def Loss(a, X,Y):
    s = []
    for x in X:
        s.append(sum([w * e for (w,e) in zip(a,x)]))
    end#
    return sum([(yy - e)**2 for (yy,e) in zip(Y,s)])
end#
This enables Python-to-LispE transpilation, though the grammar is left as an exercise.
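The marker insertion itself is easy to prototype. The following Python sketch shows one way to do it, under stated assumptions: it is not a port of indentation_python.lisp, the function name `add_block_markers` is hypothetical, indentation is assumed to use spaces only, and continuation lines or multi-line strings are not handled. It keeps a stack of open indentation widths and emits a marker each time a line returns to a shallower level.

```python
def add_block_markers(source, marker="end#"):
    """Insert a closing marker wherever an indented block ends."""
    result = []
    stack = [0]  # indentation widths of the blocks currently open
    for line in source.splitlines():
        if not line.strip():  # blank lines do not open or close blocks
            result.append(line)
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent > stack[-1]:
            stack.append(indent)  # a deeper line opens a new block
        while indent < stack[-1]:
            stack.pop()  # a shallower line closes blocks
            result.append(" " * stack[-1] + marker)
        result.append(line)
    while len(stack) > 1:  # close whatever is still open at end of input
        stack.pop()
        result.append(" " * stack[-1] + marker)
    return "\n".join(result)


example = """def Loss(a, X, Y):
    s = []
    for x in X:
        s.append(x)
    return s"""

print(add_block_markers(example))
```

With explicit end markers in place, block boundaries become ordinary tokens, so a grammar of the kind described earlier no longer needs to reason about whitespace.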
This article outlines the steps to create a programming language, using LispE due to the Lisp family’s suitability for such tasks. Many historical languages began as Lisp implementations. Modify the grammar slightly, and you can craft variants like basicois, a French Basic (see basicois_example.lisp), using the compiler.
LispE offers robust features (see help), supporting modern programming paradigms. While Lisp’s syntax may challenge some, custom formalisms can unlock its potential for tailored language design.