ASR Design - lcompilers/lpython Wiki
Abstract Semantic Representation (ASR) Design
How to Contribute
You can modify online at https://github.com/lcompilers/lpython/wiki/ASR-Design by pressing the "Edit" button, or you can modify locally using git and push:
git clone [email protected]:lcompilers/lpython.wiki.git cd lpython.wiki vi ASR-Design.md git commit -a git push origin master
Quick History and Motivation
The LFortran compiler was initially designed using AST (Abstract Syntax Tree) and annotating the tree with types and a symbol table. Then the AST + annotations were lowered to LLVM. The semantic analysis code that took AST and appended type annotations and symbol table, together with lowering to LLVM was very complicated, because it was not clear what exactly the output of the semantic analysis should be, and if a bug is in the AST+annotations lowering stage, or in the semantic analysis itself (or both). The code was getting complex and hard to develop. We searched for a solution and came up with an intermediate representation that we call ASR (Abstract Semantic Representation). The semantic analysis takes AST and returns ASR. The LLVM backend takes ASR and creates LLVM. Adding ASR to LFortran cleaned up the code and set firm boundaries what the input and output of each stage of the compiler is.
The previous paragraph is the basic motivation for ASR, but there is still a lot of freedom left in the details how the ASR should look like. This document systematically discusses the ASR design, and provides documentation how to modify it or add to it as the needs of LCompilers evolve.
A simple intuitive definition of ASR:
- We can think of ASR as the simplest core symbolic representation of the surface language. All the cruft is abstracted away, but all the important things stay, nothing was lost. We can then use ASR to manipulate the language. To think of the language, we can simply think of ASR. ASR was designed to be equivalent to the language. At the same time, it can immediately be lowered to machine code (and obviously anything higher level like LLVM), everythig is simple and clear how to do so.
To achieve this, we need to go deeper.
There are two main requirements:
We want ASR to be a standalone representation, as far from the original surface language (Python, Fortran, ...) as possible, without losing any semantic information, so that it is always possible to convert ASR back to the original language, but we do not need to deal with often complicated surface level semantics (such as scope rules, what type is 1/2, and so on).
We want ASR to have all the semantics figured out locally, so that that the backends (LLVM, WebAssembly, etc.) can generate code immediately by visiting each node without having to do extra passes over ASR to figure things out. Everything must be accessible either directly in the node, or via a few pointer dereferences (for example it is ok to go from an ASR node to its parent symbol table, then up to the module symbol table, then to the ASR node of the module, to determine its name; as long as it is just pointer hops; it would not be ok to loop over ASR to examine each module to find which one contains the given node).
In addition to the above, there are several other requirements that guide the detailed structure of ASR, such as:
In Debug mode we want quick compilation. So we want each language feature to be faithfully represented close to the original form and not do any kind of complicated transformations, so that we can quickly generate LLVM or other code. In particular, in theory, just source code -> AST -> ASR -> LLVM passes are needed. We should not need to do any kind of rewriting of ASR after it is constructed. Everything should be in the form that we can immediately lower to LLVM (and other representations).
So as an example of the prior point: in Fortran (and possibly later in Python too) to determine the type such as
real(dp)we need to know the compile time value of
dp. So we always force the frontend to provide a compile time value of any expression if it exists and store it in ASR. Then we can quickly determine the value of
dpwhen it is encountered and determine the type. But we do not want to be doing any more than that, for performance reasons in Debug mode.
In Release mode we want the best optimizations, represented as ASR -> ASR passes, so we want to be able to represent all high level optimizations in ASR. We should eventually do function inlining, code rewriting, dead code elimination, loop unrolling, moving things around, promote allocatable arrays to stack arrays, etc. The optimizations left for LLVM: promoting variables to registers, some simple cleanup of generated LLVM code (like "i + 1 - 1"), and any low level optimization that LLVM can do, as long as it plays nicely with ours. We should generate such LLVM code so that LLVM can finish the job.
We want a verify() pass that we run after we construct ASR and after each ASR -> ASR pass to check that ASR conforms to our semantic requirements for it.
We want any valid ASR (i.e., ASR that passes the verify() check) to always be possible to lower it to LLVM. In general all backends should always generate correct code for any ASR that passes verify(). The only possible error a backend can say is "Not implemented yet". In practice, a given backend might choose to call some ASR passes to rewrite some ASR nodes with some other ones (for example the LLVM backend calls an ASR pass to rewrite for loops to while loops, so that we only have to implement and maintain while loops in the LLVM backend), but that is just an implementation detail of the backend, from the user perspective the LLVM backend can accept any valid ASR code.
We do not prescribe how the given ASR node / feature is implemented in the backend. We are not tied to any ABI (Application Binary Interface). The backend can choose any representation it wants. Some nodes have a flag like
abi = BindC, in which case we are requiring the node to be ABI compatible with C; this can then be used for C interoperability. We can add more abi, like GFortran, Python, etc.
Symbol table: each symbol (such as a variable) is just a pointer to the symbol "declaration" in the symbol table. The symbol table must be scoped to allow variables being declared in different scopes. The backend can thus lookup everything about each symbol, as well as maintain a separate hashtable with backend specific information for each symbol. The alternative design is to duplicate the information from the symbol table into each symbol, but that would make ASR bigger and it would be harder for the backend to figure out if two symbols are the same.
We try to not duplicate information in ASR, keep it unique (that is, have a symbol table, not duplicate it).
Within the constrains above, we try to keep ASR as simple and abstract as possible. For example we merged Subroutines/Functions into just Functions (one less node to implement and worry about in the backends).
We often evaluate a change in ASR based on if it simplifies code in the LLVM and other backends. For example, we used to have a BinOp that handled all types (real, integer, complex), and each backend had a switch based on type. So we split BinOp into IntegerBinOp, RealBinOp and ComplexBinOp. This simplified the backends.
Backends that lower (LLVM, WebAssembly, x86) can use the compile time value as well as can internally apply any ASR -> ASR simplification (such as for loops -> while loops) to make things easier and to keep the final code more optimized.
Backends that compile to surface languages (Python, C, C++, Fortran, Julia) should keep the code as readable and as close to the original surface language (that the ASR was generated from) as possible. That means, they should not apply compile time value for expressions (e.g., they should print
25is available in ASR as the value) and they should not rewrite ASR.
At a technical level, we want to represent ASR in some simple high level format (we chose ASDL) that is on the order of 100 lines long and automatically generate highly efficient C++ code to create, walk/visit, serialize/deserialize, print etc. (on the order of 10,000 lines).
This still leaves many possible design decisions, such as:
- Have statements and expressions vs just expressions
- How arrays are handled (Array(Integer) vs Integer(dimension * ))
- Should we use basic blocks (that do not contains branches).
- Should we enforce canonical ASR, and let the frontend always create ASR in canonical form
- How should function signatures be handled (separate type, etc.)
- How to handle types? Currently types are duplicated / copied for each expression (each expr has a type) and each symbol (in the symbol table) as well as any other place that requires a type. An alternative (perhaps better?) design would be to have a (global?) type table (like a symbol table), and we would reference a given type by a pointer (just like a Var references a symbol), and appropriate serialization/deserialization.
- Related: Should types by unique? Currently we compare types like
(Integer ....)always by comparing all details each time. It would be much faster to just compare unique integer IDs. But how often does that happen? Often the types have to be compatible but not always exactly equal. Although we should make things so that types have to be exactly equal (otherwise just cast it first). For example a function call and function signature should then have exactly the same type IDs.
- Structs (derived types) and classes could have an ID and we just use the ID everywhere. Classes and derived types are in symbol table, so we sort of already do this (half way)
- Function signatures could be types, and stores in the type table
- Idea: types could just be stored in the symbol table, some already are (class/derived type). Some common types like current Integer+kinds (such as
i32) could be stored in the global symbol table always, custom types (function signatures, classes, derived types, etc.) could be stored in the symbol table where they first appear; if they are needed as part of a function definition, they could be stored inside the function's symbol table and referenced there; types are then equal if and only if they have the same pointer (serialization/deserialization machinery will handle this properly). If a type is declared in one module and used elsewhere, it has to be accessed via ExternalSymbol, just like any other symbol.
There are many designs possible, we opted to implement as much of the runtime library as possible as regular ASR code (in the surface language itself), that gets compiled (and optimized) without any special handling. Functions like search, trigonometric functions, most array functions (sum, mean, linear algebra), etc. can all be implemented as pure ASR code.
If a given operation cannot be implemented using existing ASR nodes, in other words, it requires access to the implementation of the feature (say the size of an array, which depends on how the array descriptor is implemented), then we add a dedicated ASR node (such as ArraySize).
Some features (say
clock_time) could be implemented using an ASR node, and
maybe they should. We chose to keep ASR simple, and we implement these by
calling into C, because each surface language has very specific, system and
platform dependent functionality, so we simply implement this in C, then in
Python/Fortran that calls into C. From ASR's perspective, the user facing
function is regular ASR, that internally calls into C.
File IO currently is represented in ASR in quite Fortran specific way. We'll probably refactor it into a language independent way as much as we can. The rest will have to be handled via C calls.
Distinguishing between a pointer and a non-pointer variable
Posting the discussion I and @certik had about this,
- Var can store both pointer and non-pointers.
- And pointer can be used as a non-pointer variable (in Fortran,
- So, sometimes its not clear how many times you have load to the
llvm::Value*associated with a
- Like say,
integer, pointer :: pi integer :: i integer :: a(4) pi => i a(pi) = 0
- Now the ASR will have
Assignment(ArrayItem(Var(a), Var(pi)), 0).
- Now in LLVM
struct*(pointer to struct),
- Well when you visit
Var(a), you can't load
llvm::Value* tmp. Because you need a pointer.
- But when you visit
Var(pi)you need to load twice becuase you need
- For now I am using
ptr_loadswhich specifies how many times to load a pointer when you
- Keep using
ptr_loadsas the general solution for now as it works and is simple. Easy to override and restore between multiple calls.
This is one of the ambiguities which should be handled in ASR ideally but doing too much abstraction in ASR right now is not a good idea. Instead we should learn, experiment and then after some time when backends stabilise and we implement all the crucial features, we can make ASR handle these kinds of ambiguities.
Also later writing a complete C frontend (LC) that can compile any C code to ASR and LLVM will forces us to design this well, as pointers are inherent in C programs.
Ensuring deepcopy by default
There is one more thing but more concerned about the backend. We should make sure we are deep-copying objects by default. Right now there is some non-uniformity in that regard. We need to test and make this consistent. We can’t shallow copy objects because otherwise we will face the same performance issues as CPython and also our optimizations ASR->ASR will break.
Handling of lower bound, stride, offset and column/row order
Currently the lower bound is part of the type, and can be any runtime (or compile time) expression. In each function the lower bound can be set for any argument (or a local array), and the expression for it must be computable at the point when the function is called (effectively at the call site).
The stride, offset and column/row order is currently either not implemented, or only part of the runtime array descriptor, which is a backend choice.
We should investigate if it makes sense to move stride, offset (padding of each row/dimension) and column/row order into ASR just like the lower bound is. It seems these features only influence how the array is indexed, so the ArrayIndex (and ArraySection).
An idea is to store the lower index, stride, offset/padding and column/row order only when it is needed, that is, with each ArrayIndex, ArraySection (and anywhere else it is needed).
With runtime array descriptors with strides, many compilers generate two versions of the function: one that only operates on contiguous input arguments, and it can use efficient vector instructions, and another one that assumes general (runtime) stride, which is not vectorized (or not as efficiently). Whenever the backend does not know what to generate, it means our type system is not "ahead of time" enough, it leaves too many things unspecified. So one idea is to move things more to ASR (or even the frontend) to fully specify array indexing and other features so that the backend knows exactly at all times what and how to index an array, and consequently what exact code to generate for a function (it should not generate two versions ---- this should be a choice of the frontend (or ASR->ASR optimization passes) to generate two versions, of that is what it wants).
Here are some older relevant documents for reference: