Takeaways - BHsketch/Athena GitHub Wiki
Takeaways - what I have learnt so far.
Some macro and micro decisions
Stage-wise approach
After a lot of contemplation, I realized there was too much to consider at once, and I wasn’t a robot. So I decided to take some baby steps and work on a non-Turing-complete language first. This was good because it let me get started instead of just staring at the screen.
Input double-buffer
I realized I needed an input buffer when I had to “spit out” tokens again after reading a wrong one. It will also be helpful when I have to look ahead by a varying number of characters in order to recognize tokens like “==” or “<=”.
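To make the lookahead and “spit out” idea concrete, here is a minimal sketch. It is not the code in this repository: `InputBuffer` and `scanLess` are hypothetical names, and the buffer holds the whole input in one string instead of two fixed-size halves, so it only illustrates peeking and retracting, not a true double buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: keeps the whole input in memory, so it shows
// lookahead and retraction only, not the two-half buffer scheme.
class InputBuffer {
public:
    explicit InputBuffer(std::string text) : text_(std::move(text)) {}

    // Look at the character `offset` positions ahead without consuming it.
    // Returns '\0' past the end of the input.
    char peek(std::size_t offset = 0) const {
        return pos_ + offset < text_.size() ? text_[pos_ + offset] : '\0';
    }

    // Consume one character. The position always advances, even past the
    // end, so that every next() can be undone by exactly one retract().
    char next() {
        char c = peek();
        ++pos_;
        return c;
    }

    // "Spit out" the last `count` characters that were consumed.
    void retract(std::size_t count = 1) {
        assert(count <= pos_);
        pos_ -= count;
    }

private:
    std::string text_;
    std::size_t pos_ = 0;
};

// Recognize "<" or "<=". Returns an empty string if the input does not
// start with '<'; in that case nothing is consumed.
std::string scanLess(InputBuffer& in) {
    if (in.peek() != '<') return "";
    in.next();                          // consume '<'
    if (in.next() == '=') return "<=";  // two-character operator
    in.retract();                       // read one too far, give it back
    return "<";
}
```

The same read-then-retract pattern should carry over to “==” versus “=”; a real double buffer would add reloading of each half on top of this.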
Expression precedence inherent in the grammar
The LLVM Kaleidoscope tutorials implement operator-precedence parsing for expressions. I found that overly complicated and went with the approach presented in the book instead: making precedence inherent in the grammar itself, so that each precedence level gets its own grammar rule.
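As a rough illustration of precedence living in the grammar, here is a small recursive-descent sketch with one function per grammar rule. The grammar, the `ExprParser` name, and the choice to evaluate integers directly (rather than build a syntax tree) are all assumptions made for this example; it also does not skip whitespace.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Assumed grammar (lower rules bind tighter):
//   expr   -> term   { ('+' | '-') term }
//   term   -> factor { ('*' | '/') factor }
//   factor -> digit+ | '(' expr ')'
class ExprParser {
public:
    explicit ExprParser(std::string src) : src_(std::move(src)) {}

    int parse() {
        int value = expr();
        if (pos_ != src_.size()) throw std::runtime_error("trailing input");
        return value;
    }

private:
    char peek() const { return pos_ < src_.size() ? src_[pos_] : '\0'; }

    int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = src_[pos_++];
            int rhs = term();  // '*' and '/' are handled inside term()
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src_[pos_++];
            int rhs = factor();
            if (op == '/' && rhs == 0) throw std::runtime_error("division by zero");
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    int factor() {
        if (peek() == '(') {
            ++pos_;  // consume '('
            int value = expr();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;  // consume ')'
            return value;
        }
        if (!std::isdigit(static_cast<unsigned char>(peek())))
            throw std::runtime_error("expected a number");
        int value = 0;
        while (std::isdigit(static_cast<unsigned char>(peek())))
            value = value * 10 + (src_[pos_++] - '0');
        return value;
    }

    std::string src_;
    std::size_t pos_ = 0;
};
```

Because `expr` only ever sees whole `term`s, multiplication binds tighter than addition without any precedence table, and the loops make the operators left-associative.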
Enums and Classes
Kaleidoscope used an enum to represent token kinds. Or at least that is what I understood in the initial reading. The problem loops back to one of the advantages of object-oriented programming, that is, being able to implement a hierarchy of objects that share common and uncommon functionality. Each word in the code has a meaning. And to capture that meaning in a useful manner (for example, storing the type of an array along with its length based on what is inside the square brackets that follow) we must implement them as objects with appropriate attributes. And since some of these attributes will be common, we need a hierarchy, constructor functions etc. This can’t be achieved with a simple enum. We need classes.
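A minimal sketch of what such a hierarchy could look like follows. The class names (`Token`, `Word`, `Type`, `Array`), the `Tag` enum, and the 4-byte `int` are assumptions for illustration, not the classes actually used in this project. An enum is still used as a tag, but the attributes live in the objects.

```cpp
#include <memory>
#include <string>
#include <utility>

enum class Tag { Num, Id, Basic, Array };

// Common base: every token has a tag and can print itself.
struct Token {
    Tag tag;
    explicit Token(Tag t) : tag(t) {}
    virtual ~Token() = default;
    virtual std::string toString() const = 0;
};

// A word carries its lexeme (keywords, identifiers, type names).
struct Word : Token {
    std::string lexeme;
    Word(std::string s, Tag t) : Token(t), lexeme(std::move(s)) {}
    std::string toString() const override { return lexeme; }
};

// A type is a word that also knows its storage width in bytes.
struct Type : Word {
    int width;
    Type(std::string s, Tag t, int w) : Word(std::move(s), t), width(w) {}
};

// An array type stores its element type and the length read from "[...]".
struct Array : Type {
    std::shared_ptr<const Type> of;
    int length;
    Array(std::shared_ptr<const Type> elem, int len)
        : Type("[]", Tag::Array, elem->width * len),  // total width
          of(std::move(elem)),
          length(len) {}
    std::string toString() const override {
        return of->toString() + "[" + std::to_string(length) + "]";
    }
};
```

For example, an `Array` built from a 4-byte `int` type with length 10 would report a width of 40 and print as `int[10]`, which a bare enum value could not express.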
Some things need improvement
Class hierarchies are not on point
In this initial version, all keywords are on the same level per se. It’s a flat hierarchy. However, words have meaning behind them (however poetic that may sound); meaning that is implemented through attributes. It is this attribute space that should decide the class hierarchy, and that hierarchy is what I’m missing at the moment.
Readability
Modeling operators like “==” and “&&” as words instead of simple numbers seems to have some utility in that it makes the code more readable, though I’m not sure if that is its only use. Readability has always sounded more like a feature than a necessity to me; however, the bigger the codebases I work with, the more necessary it seems.
When in a compiler, do as lex and yacc do
Currently, I have kept the lexer and parser as simple as possible. I haven’t yet dived deep into how lex and yacc work, but on a cursory level, lex implements a transition diagram in code. There seem to be some nuances that make this approach better than a simple switch-case or if statement.
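To show what “a transition diagram in code” can mean, here is a small table-driven sketch for the operators `<`, `<=`, `>`, `>=`, `=` and `==`. This is not what lex actually generates; the state names, the character classes, and `matchRelop` are made up for this example. The point is only that the control flow stays the same while the diagram lives in a table.

```cpp
#include <cstddef>
#include <string>

// States of the (assumed) transition diagram.
enum State { Start, SawLt, SawGt, SawEq, SawTwo, Dead, StateCount };
// Input characters are grouped into classes to keep the table small.
enum CharClass { Less, Greater, Equal, Other, ClassCount };

CharClass classify(char c) {
    switch (c) {
        case '<': return Less;
        case '>': return Greater;
        case '=': return Equal;
        default:  return Other;
    }
}

// kNext[state][class] is the state reached on that input.
constexpr State kNext[StateCount][ClassCount] = {
    /* Start  */ {SawLt, SawGt, SawEq,  Dead},
    /* SawLt  */ {Dead,  Dead,  SawTwo, Dead},
    /* SawGt  */ {Dead,  Dead,  SawTwo, Dead},
    /* SawEq  */ {Dead,  Dead,  SawTwo, Dead},
    /* SawTwo */ {Dead,  Dead,  Dead,   Dead},
    /* Dead   */ {Dead,  Dead,  Dead,   Dead},
};

bool accepting(State s) { return s >= SawLt && s <= SawTwo; }

// Length of the longest operator starting at text[start], or 0 if none.
std::size_t matchRelop(const std::string& text, std::size_t start = 0) {
    State state = Start;
    std::size_t lastAccept = 0;
    for (std::size_t i = start; i < text.size(); ++i) {
        state = kNext[state][classify(text[i])];
        if (state == Dead) break;
        if (accepting(state)) lastAccept = i - start + 1;  // longest match so far
    }
    return lastAccept;
}
```

Adding a new operator then means adding rows or columns to the table rather than another branch, which may be one of the nuances that makes this approach scale better than nested if statements.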
Move semantics?
In the Kaleidoscope tutorials, they seem to have used move semantics. It looks like a useful thing to incorporate. However, I haven’t done it yet because I didn’t want to bite off more than I could chew.
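For reference, this is roughly how move semantics shows up when syntax-tree nodes own their children through `std::unique_ptr`, which is the general pattern the Kaleidoscope tutorials use. The node types here (`Expr`, `Number`, `Binary`) are simplified stand-ins invented for this sketch, and `std::make_unique` needs C++14 or later.

```cpp
#include <memory>
#include <string>
#include <utility>

struct Expr {
    virtual ~Expr() = default;
    virtual std::string show() const = 0;
};

struct Number : Expr {
    int value;
    explicit Number(int v) : value(v) {}
    std::string show() const override { return std::to_string(value); }
};

struct Binary : Expr {
    char op;
    std::unique_ptr<Expr> lhs;
    std::unique_ptr<Expr> rhs;

    // The children are taken by value and moved into the node, so the
    // node becomes their sole owner and nothing is copied.
    Binary(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}

    std::string show() const override {
        return "(" + lhs->show() + " " + op + " " + rhs->show() + ")";
    }
};

// Build "left op right" by handing ownership of both subtrees to the parent.
std::unique_ptr<Expr> makeBinary(char op,
                                 std::unique_ptr<Expr> left,
                                 std::unique_ptr<Expr> right) {
    return std::make_unique<Binary>(op, std::move(left), std::move(right));
}
```

After a call such as `makeBinary('+', std::move(a), std::move(b))`, `a` and `b` are null and the tree is freed automatically when its root goes out of scope, so no manual `delete` is needed.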